In an earlier post on time series, I explained how a time series data set has to be examined to see how the data can best be incorporated into an advanced analytics model. There are a number of advanced models, but the one probably being rediscovered most right now is the marketing mix model: a regression analysis of several marketing channels, with sales or market share as the dependent variable.
This post will look at what goes into planning a marketing mix model that makes budgets more effective when faced with media complexity.
The value of a marketing mix model
Marketing mix modeling has gained importance because of the rise in the number of media channels that can trigger customer activity. When I launched my consultancy in 2009, analytics was designed to examine website activity from a home computer. Today that activity can come from almost anywhere. A mobile device or a networked consumer good, like a car or a refrigerator, can be a starting or intermediary point for a customer to make a purchase or research a brand online. That activity can now be captured as data, so a marketing mix model aims to blend the data into a regression that helps marketers better estimate which activity on a set of channels drives sales.
Where to start with a model
1. Import libraries
As I mentioned in the time series post, libraries carry the functions needed for the model. In this case, the helpful libraries are the ones that handle data wrangling and building the regression. For a regression you can use the base R functions cor() for correlation, lm() to fit the regression on your data, and summary() to summarize the results. You may also want a visualization library like ggplot2 for more plotting options.
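As a rough sketch of how those base R functions fit together, the snippet below runs cor(), lm(), and summary() on a small synthetic data set. The column names (tv, search, sales) and every number are made up for illustration, and ggplot2 is used only if it happens to be installed.

```r
# Synthetic weekly data; column names and values are illustrative assumptions.
set.seed(42)
mix <- data.frame(
  tv     = runif(52, 50, 150),   # hypothetical TV units per week
  search = runif(52, 200, 600)   # hypothetical paid search units per week
)
mix$sales <- 1000 + 4 * mix$tv + 1.5 * mix$search + rnorm(52, sd = 50)

# cor(): pairwise correlations between the channels and sales
channel_cor <- cor(mix)
print(round(channel_cor, 2))

# lm(): regress sales on the channels
fit <- lm(sales ~ tv + search, data = mix)

# summary(): coefficients, standard errors, and R-squared for the fit
fit_summary <- summary(fit)
print(fit_summary)

# ggplot2 is optional; build a scatter plot only when the package is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mix, aes(x = tv, y = sales)) + geom_point()
}
```

The correlation matrix is a quick first look at which channels move with sales; the regression summary is where the per-channel coefficients come from.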
2. Add and wrangle the data
Wrangling is just another term for organizing data. A number of data types can go into a marketing mix model, many of which represent retail activity, such as an organization’s events and sponsorships, new client accounts, sales by product line, economic data representing macroeconomic forces for each market, and competitor advertising spend. You can pick according to what you are interested in modeling, but the data types should be uniform where possible.
The steps for tidy data apply in wrangling (I explain a few tidy data steps here). But there are some additional considerations.
- Each column of channel data should be clearly labeled with the media it represents
- The time series for each media channel being examined must cover the same time period at the same intervals
- Media sources appear in the data as units, not dollars. This choice is meant to eliminate price fluctuations that can skew the analysis.
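The considerations above can be turned into a few quick checks before modeling. This is a minimal sketch in base R; the data frame, its column names (tv_grps, search_clicks, email_sends), and its values are hypothetical stand-ins for your own wrangled data.

```r
# Hypothetical wrangled data: one row per week, one clearly named column per channel.
channels <- data.frame(
  week          = seq(as.Date("2023-01-02"), by = "week", length.out = 12),
  tv_grps       = c(80, 95, 70, 110, 90, 85, 100, 75, 60, 105, 115, 90),
  search_clicks = c(400, 420, 390, 510, 480, 450, 470, 430, 380, 500, 520, 460),
  email_sends   = c(1000, 0, 1000, 0, 1000, 0, 1000, 0, 1000, 0, 1000, 0)
)

media_cols <- setdiff(names(channels), "week")

# Every channel has a value for every period, so all series share the same length
complete_series <- !any(is.na(channels[media_cols]))

# Periods are evenly spaced (7 days apart here), so the series line up in time
evenly_spaced <- length(unique(diff(as.numeric(channels$week)))) == 1

# Channels are numeric unit counts, not currency strings such as "$1,200"
all_numeric <- all(vapply(channels[media_cols], is.numeric, logical(1)))

# Stop early if any check fails
stopifnot(complete_series, evenly_spaced, all_numeric)
```

Checks like these will not catch every data problem, but they flag the mismatched lengths and mixed units that most often distort a regression.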
There are libraries that can easily import data from databases and URLs. One example is the data.table package, which extends the data.frame object class in R and adds performance improvements.
You may also seek libraries with test data that can be used when you are interested in experimenting with unique channels where actual data is scarce.
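If no packaged test data fits the channel you want to explore, one alternative (a swapped-in technique, not a library data set) is to simulate the data yourself. The sketch below assumes a made-up "true" relationship between two hypothetical channels and sales, so the coefficients that come back from the regression can be compared with the values that went in.

```r
# Simulated two-year weekly data set; every name and number here is an assumption.
set.seed(2024)
n_weeks <- 104

sim <- data.frame(
  week          = seq(as.Date("2022-01-03"), by = "week", length.out = n_weeks),
  tv_grps       = round(runif(n_weeks, 0, 120)),
  podcast_spots = rpois(n_weeks, lambda = 6)   # stand-in for a scarce, newer channel
)

# Assumed "true" relationship used to generate sales, plus random noise
base_sales <- 5000
sim$sales <- base_sales + 12 * sim$tv_grps + 40 * sim$podcast_spots +
  rnorm(n_weeks, sd = 150)

# Because the true effects are known, the fitted coefficients can be checked against them
sim_fit <- lm(sales ~ tv_grps + podcast_spots, data = sim)
print(round(coef(sim_fit), 1))
```

Simulated data says nothing about your real market, but it is a safe way to rehearse the workflow before actual channel data arrives.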
3. Creating the regression for analysis
The regression model relates sales or market share to the channel data with coefficients for each channel. The channel data is in units to ease the comparison.
The regression takes the form of the following linear equation:
Sales (or Market share) = B0 + B1 x1 + B2 x2 + B3 x3
The equation has two components. B0 represents base-level sales or market share without advertising (assumed constant), while the rest of the equation represents sales activity due to channels x1, x2, x3, and so forth.
The sales activity in the equation explains how a one-unit increase in a given channel would change sales (or market share) by that channel's coefficient (B1, B2, B3, etc.). You can then evaluate the calculated sales with your budget and real-world constraints in mind, creating a contribution table that compares the sales lift from a given channel when the other channels are held constant. That comparison quantifies the effects of the different advertising media.
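To make the equation and the contribution table concrete, here is a minimal sketch using lm() on synthetic data. The channels are named x1, x2, and x3 to match the equation; the data values are assumptions, and the contribution table shown (coefficient times average units per channel) is one simple way to build such a table, not the only one.

```r
# Synthetic data for three channels measured in units; values are illustrative only.
set.seed(7)
n <- 52
mmm <- data.frame(
  x1 = runif(n, 50, 150),
  x2 = runif(n, 200, 600),
  x3 = runif(n, 0, 40)
)
mmm$sales <- 2000 + 5 * mmm$x1 + 1.2 * mmm$x2 + 15 * mmm$x3 + rnorm(n, sd = 80)

# Fit Sales = B0 + B1 x1 + B2 x2 + B3 x3
fit <- lm(sales ~ x1 + x2 + x3, data = mmm)
b <- coef(fit)   # b["(Intercept)"] is B0; b["x1"], b["x2"], b["x3"] are B1 to B3

# Contribution table: each channel's coefficient times its average weekly units,
# i.e. the sales attributed to that channel in an average week, others held constant
channels  <- c("x1", "x2", "x3")
avg_units <- colMeans(mmm[channels])

contribution <- data.frame(
  channel          = channels,
  coefficient      = unname(b[channels]),
  avg_units        = unname(avg_units),
  avg_contribution = unname(b[channels] * avg_units)
)

# Share of average fitted sales; the remainder is the base level B0
contribution$share_of_fitted <- contribution$avg_contribution / mean(fitted(fit))
print(contribution)
```

Because the model includes an intercept, the base level plus the three average contributions adds up to average sales, which is a handy sanity check on the table.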
Limitations of the model
There are some limitations to this analysis, however. Marketing mix models emphasize immediate channel responses to advertising media, overlooking long-term brand recognition as a sales factor. This limitation may once have seemed philosophical, but as brand image becomes linked to digital channels, especially on social media, brands are seeing their brand equity, and their sales, rapidly impacted by real-life events, as discussed here.
Nevertheless, the marketing mix model does provide solid answers for high-level attribution of marketing channels to a business model. With better data and improved modeling tools in data science languages like R and Python, marketers can use the model to direct their marketing budget to the right channels and improve revenue outcomes.