Exploratory Data Analysis and Visualization for the Mean Marketer (Part 1)

The amount of digital information in the world grows faster every day, and by some estimates the digital universe will reach a whopping 90 zettabytes by the year 2020. With a data tsunami of this size, the number of insights waiting to be extracted is enormous. This data revolution is already changing our everyday lives, from search engines that answer our every query within seconds to systems that flag fraudulent credit card transactions.

If you work in online marketing, or in any role that involves data analysis, the ability to understand, analyze, and communicate statistical observations will not only be a requirement but, in some cases, could elevate your career to new levels. Don’t believe me? Take a look at the following stats and quotes:

“It doesn’t do you any good to find out something interesting and new if you can’t tell people about it. Once you have an insight, you must be able to talk about it compellingly,”

- Suzanne Fogel, chair of the marketing department at DePaul University.

“Employment of data analysts is expected to grow 45 percent by 2018, much faster than average.”

- U.S. Bureau of Labor Statistics

“Advancements in computing capabilities and analytical software have made it faster and cheaper for analysts to solve problems. As problem solving becomes cheaper and faster, more firms will have the ability to employ analysts,” 

- U.S. Bureau of Labor Statistics

“I keep saying that the sexy job in the next 10 years will be statisticians,”

- Hal Varian, Chief Economist at Google.

Hopefully by now you are convinced that the ability to analyze data and communicate the results is worth developing. In this post, I’d like to look at a few aspects of the exploratory data analysis (EDA) phase of statistics. In particular, I am going to look at how to categorize, analyze, and graphically display three different types of data analysis situations involving two variables, such as the relationship between the referring search engine and conversion rate, or between the number of backlinks and PageRank. Thankfully, the field of statistics has been around since at least the 5th century BC, and in most cases there are guidelines and procedures to follow. Let’s begin.

The first step when analyzing two variables is classifying each variable as one of two types: categorical or quantitative. Categorical variables place each observation into one of several separate groups, or categories, identified by labels; examples include colors, animals, and cities. Quantitative variables take numerical values that represent a measurable quantity; examples include batting average, number of backlinks, and population.
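
To make the distinction concrete, here is a minimal Python sketch that labels a list of observations as categorical or quantitative. The function name variable_type and the rule it applies (if every observation is a real number, call the variable quantitative) are assumptions for illustration, not a standard statistical test. Numbers used purely as labels, such as ZIP codes, are really categorical and still need a human judgment call.

```python
from numbers import Real


def variable_type(values):
    """Label observations as 'quantitative' or 'categorical'.

    Assumed heuristic: if every observation is a real number (booleans
    excluded), treat the variable as quantitative; otherwise treat it as
    categorical. Numeric labels such as ZIP codes will be misclassified.
    """
    if not values:
        raise ValueError("need at least one observation")
    all_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if all_numeric else "categorical"


# Made-up sample values for examples named in the definitions above.
print(variable_type(["red", "green", "blue"]))  # categorical (colors)
print(variable_type([0.301, 0.275, 0.288]))     # quantitative (batting average)
print(variable_type([120, 45, 3300]))           # quantitative (backlinks)
```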

After determining each variable’s type, it is time to determine the role each variable plays. In almost all situations involving two variables, analysts try to explain the outcome of one variable based on the other. For example, when examining whether the search engine a visitor comes from affects conversion rate, the search engine is assigned the explanatory role because it is used to explain conversions, which play the response role. Below are a few examples; in each, the variable after the word “by” is the explanatory variable and the variable before it is the response variable:

  1. Explain the number of conversions by the number of visits.
  2. Explain the number of conversions by the type of keyword.
  3. Predict the type of device a visitor is using by the search engine the visitor came from.
  4. Predict which search engine a visitor came from by the number of visits.

After examining the above examples, think about how the explanatory variable tries to explain the response variable. Once this connection is made, you can identify the role-type classification, written as the explanatory variable’s type followed by the response variable’s type. The list below shows the four role-type classifications:

The four role-type classifications (explanatory – response), numbered to match the examples above:

  1. Quantitative – Quantitative (Q – Q)
  2. Categorical – Quantitative (C – Q)
  3. Categorical – Categorical (C – C)
  4. Quantitative – Categorical (Q – C)
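
The classification step can be sketched in Python as well. This is a minimal illustration, not an established tool: the Scenario class and role_type function are hypothetical names, and the explanatory and response types for the four examples are assigned by hand, because deciding which variable explains which is the analyst’s judgment, not something the code can infer.

```python
from dataclasses import dataclass

# Short codes for the two variable types described above.
CODES = {"quantitative": "Q", "categorical": "C"}


@dataclass
class Scenario:
    description: str
    explanatory: str  # type of the explanatory variable
    response: str     # type of the response variable


def role_type(s: Scenario) -> str:
    """Return the role-type classification as 'explanatory – response'."""
    return f"{CODES[s.explanatory]} – {CODES[s.response]}"


# The four examples from the list above, with types assigned by hand.
scenarios = [
    Scenario("conversions by visits", "quantitative", "quantitative"),
    Scenario("conversions by keyword type", "categorical", "quantitative"),
    Scenario("device type by search engine", "categorical", "categorical"),
    Scenario("search engine by visits", "quantitative", "categorical"),
]

for number, s in enumerate(scenarios, start=1):
    print(f"{number}. {s.description}: {role_type(s)}")
```

Running it prints the four classifications in the same order as the list above: Q – Q, C – Q, C – C, and Q – C.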

To recap, an analyst can now identify whether a variable is quantitative or categorical, determine whether it plays the response or explanatory role, and assign the pair of variables to a specific role-type classification. In my next post, I will walk through an example where we can look at various role-type classifications and discuss the best graphical method to analyze and communicate each situation.