Exploratory Data Analysis and Visualization for the Mean Marketer – Part 2

In the previous post, we looked at how an analyst can identify whether a variable is quantitative or categorical, determine whether it plays the response or explanatory role, and assign the situation a specific role-type classification.  With this information at hand, I’d like to walk through an example where we can examine various role-type classifications and determine the best graphical method to analyze and communicate the data.

In our example, we will pretend that Inc. Inc.’s marketing department is attempting to allocate next quarter’s digital marketing budget.  They have limited information so a thorough analysis is mandatory to meet next quarter’s profitability goals.  Additionally, Inc. Inc. recently downsized their marketing department after missing last quarter’s profitability goals and has provided you, an intern, with the opportunity to step up and make recommendations on next quarter’s budget allocation.  Inc. Inc. has provided you with a spreadsheet that contains four columns: visits, conversions, search engine and keyword type.  It is up to you to analyze and present your recommendations for next quarter’s marketing budget.

First, you decide to examine the relationship between visits and conversions.  You hope to identify a strong correlation between visits and conversions which will justify allocating marketing’s entire budget into the cheapest channel.  In this situation you will be looking at a quantitative to quantitative (Q-Q) role-type classification.  In order to effectively display a Q-Q relationship, a scatter graph should be used.  Additionally, when analyzing two variables the explanatory variable will be graphed on the horizontal X-axis, and the response variable on the vertical Y-axis.   Below you will find some links that explain, in detail, how to create a scatter graph with various software packages.  You will also find a scatter graph representing the relationship between Inc. Inc.’s conversion and visit data.

  1. Excel Scatter Graph
  2. Google Docs
  3. R
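
For readers who would rather script the chart, here is a minimal Python sketch using matplotlib. The visit and conversion numbers are made up for illustration and are not Inc. Inc.’s data, and the function name is my own. The only point it makes is the axis convention: the explanatory variable (visits) goes on the X-axis and the response variable (conversions) on the Y-axis.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt

# Hypothetical numbers for illustration only
visits = [120, 340, 560, 610, 800, 950]
conversions = [4, 9, 7, 15, 11, 20]


def scatter_visits_vs_conversions(x, y):
    """Draw a Q-Q scatter graph: explanatory on X, response on Y."""
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("Visits (explanatory)")
    ax.set_ylabel("Conversions (response)")
    ax.set_title("Conversions vs. visits")
    return fig, ax


fig, ax = scatter_visits_vs_conversions(visits, conversions)
# fig.savefig("visits_vs_conversions.png") would write the chart to disk
```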


Two things should always be examined when analyzing a scatter graph: the overall pattern and any remarkable deviations from it.  Below you will find a list of the forms, directions, and strengths a pattern can take within a scatter graph.

  1. Linear (Form): Defined as points scattered about a straight line.
  2. Curvilinear (Form): Defined as points scattered about a curved line.
  3. Positive (Direction): Defined as the pattern moving from the lower left to the upper right.
  4. Negative (Direction): Defined as the pattern moving from the upper left to the lower right.
  5. Neither (Direction): Does not fit either pattern described above.  For example, this could be a U-shaped line.
  6. Weak (Strength): The points are widely scattered around the line.
  7. Strong (Strength): The points cluster tightly around the line.
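
Direction and strength can also be put into a number. The sketch below computes Pearson’s correlation coefficient r by hand and turns it into a rough label. Two caveats: the 0.7 cutoff between "weak" and "strong" is an arbitrary assumption of mine, not a standard, and r only measures linear association, so a curvilinear pattern such as a U-shape can score close to zero even when the relationship is obvious on the graph.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def describe(r, strong_cutoff=0.7):
    """Label direction and strength; strong_cutoff is an assumed threshold."""
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "neither"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{strength}, {direction}"


# A perfectly linear toy example and a U-shaped one
linear_r = pearson_r([1, 2, 3], [2, 4, 6])
u_shape_r = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(describe(linear_r))
print(describe(u_shape_r))
```

Always read the number alongside the scatter graph itself, since the label says nothing about form or about outliers.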

In the scatter graph containing Inc. Inc.’s data, we can now recognize that there is a weak, positive linear relationship.  Based on this information, you decide that allocating marketing’s entire budget to the cheapest channel may not be the best move, as you now know that conversions correlate only weakly with the number of visits.

Next, you consider that the type of keyword or the search engine that a visitor originates from may play a role in conversion rate.  Here you will be looking at a categorical to quantitative (C-Q) relationship, and side-by-side boxplots should be used to effectively analyze the information.  A boxplot will let you visualize the distribution (range, median, quartiles) of the response variable by category.  Below, I have included a few links that will explain how to do this in detail for some popular packages.  Additionally, you will find two sets of side-by-side boxplots comparing conversion rate across search engines and keyword categories.

  1. Excel Scatter Graph
  2. R


In the boxplots above you can compare the minimum, first quartile, median, third quartile, and maximum conversion rate for Google, Bing, branded keywords, and non-branded keywords.  By examining these distribution metrics, you can see that Bing converts at a higher rate than Google.  Furthermore, in the keyword category boxplot you can see that branded keywords convert at a much higher rate than non-branded keywords.  With this knowledge you decide to allocate a larger percentage of next quarter’s marketing budget to branded keywords in Bing.
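
The five values a boxplot draws can also be computed directly, which is handy for checking a chart. Here is a small sketch using only Python’s standard library. The conversion rates are invented for illustration, and the "inclusive" quartile method is one of several conventions, so other tools may report slightly different quartiles for the same data.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the values a boxplot displays."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical conversion rates (%) per search engine, for illustration only
rates = {
    "Google": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Bing": [2.0, 3.0, 4.0, 5.0, 6.0],
}

summaries = {engine: five_number_summary(v) for engine, v in rates.items()}
for engine, summary in summaries.items():
    print(engine, summary)
```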

Since your deadline is only a few days away, you decide to examine just one more relationship: the relationship between device and search engine.  With this information you hope to determine if users of certain devices (e.g. cell phones, laptops, or desktops) favor particular search engines.  In this instance, a categorical to categorical (C-C) relationship will be examined, and a two-way table should be used. Below is an illustration.

Here we can see that 67% of laptop users use Google while 33% use Bing, and that users of every device favor Google as their search engine.  However, in the case of desktops we find that the percentage of people who use Bing is moderately higher than for any other device.  With this information you decide to allocate slightly more of next quarter’s marketing budget to desktop search, as you know Bing typically converts at a higher rate.
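
A two-way table of this kind is just a count of each device and search engine pair, converted to percentages within each device. The sketch below builds one with Python’s standard library. The visit records are hypothetical; I chose the counts so that the laptop row works out to the 67% and 33% split mentioned above, and the other rows are purely illustrative.

```python
from collections import Counter

# Hypothetical (device, search engine) records, for illustration only
records = (
    [("laptop", "Google")] * 2 + [("laptop", "Bing")] * 1
    + [("desktop", "Google")] * 3 + [("desktop", "Bing")] * 2
    + [("cell phone", "Google")] * 3 + [("cell phone", "Bing")] * 1
)


def row_percentages(pairs):
    """Percentage of each search engine within each device (rows sum to ~100)."""
    pair_counts = Counter(pairs)
    device_totals = Counter(device for device, _ in pairs)
    table = {}
    for (device, engine), count in pair_counts.items():
        share = round(100 * count / device_totals[device])
        table.setdefault(device, {})[engine] = share
    return table


table = row_percentages(records)
for device, row in table.items():
    print(device, row)
```

Percentages are taken within each device because device is the explanatory variable here, so each row answers the question "given this device, which search engine do people use?"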

Finally, you combine all of the data and recommend that 70% of next quarter’s marketing budget be allocated to branded searches on Bing originating from desktop devices.  Several months later you receive Q3’s profit reports and see that marketing spend returned a threefold increase on investment.  In light of this, Inc. Inc. offers you a promotion and asks you to optimize their marketing spend using more advanced techniques!

Now, of course we know that this example was exclusively used to highlight the various graphical devices that can be used to communicate different role-type classifications, and in real life situations numerous other factors would need to be accounted for.


Actionable Insights

As search engine algorithms become smarter, the means by which SEOs perform their job will be changing, and the modes of yesteryear will no longer be applicable.  In order to launch effective organic search campaigns, marketing channels will need to be integrated and disparate data sources will need to be combined.

In this post a few fundamental concepts of exploratory data analysis and visualization were covered; they can be directly applied to many aspects of SEO, including backlink analysis and keyword research.  However, I hope many SEOs will take a few of the concepts here and start exploring relationships between segmented marketing channels (e.g. TV ads and backlinks, or sponsorships and social mentions).  Finally, I must add that correlation doesn’t equal causation, so while relationships may be found, keep an eye out for lurking variables.  A quick recap can be found below.

  • Variables: Can be either Categorical or Quantitative
  • Roles: Can be either Explanatory or Response
  • Role-type classifications
    • Quantitative to Quantitative (Q-Q): Graphically displayed with a scatter graph
    • Categorical to Quantitative (C-Q): Graphically displayed by a side-by-side box plot
    • Categorical to Categorical (C-C): Graphically displayed by a two-way table
    • Quantitative to Categorical (Q-C): Not covered, as it is beyond the scope of this post
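
If it helps to keep the recap at hand while working, it can be written as a small lookup. This is only a convenience sketch; the function and dictionary names are my own, and Q-C returns nothing because it was not covered in this post.

```python
# Keys are (explanatory variable type, response variable type)
CHART_FOR_ROLE_TYPE = {
    ("quantitative", "quantitative"): "scatter graph",
    ("categorical", "quantitative"): "side-by-side boxplot",
    ("categorical", "categorical"): "two-way table",
}


def suggest_chart(explanatory_type, response_type):
    """Return the display covered in this post, or None if not covered."""
    return CHART_FOR_ROLE_TYPE.get((explanatory_type, response_type))


print(suggest_chart("quantitative", "quantitative"))
print(suggest_chart("quantitative", "categorical"))
```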