Reaching the next level in Data Analytics

How do you answer the perennial dinner party question "So, what do you do for a living?" I've given up on the accurate and detailed answer - too many people glazed over, started sleeping or just walked away looking sympathetically at the rambling nerd. For a while I dabbled with the outright lie "I'm a luge test pilot". It got the conversation off to a flying start, but the revelation of the less than glamorous truth produced the same result as the honest answer, only amplified.

Nowadays I offer "I do data", and surprisingly, it's the top performer.

Data is the new coal.
Data is sexy!
Data is controversial and (surprisingly) exciting.
Everyone loves data. Everyone wants data. Now!

Here's the problem. Having data isn't the prize. It's barely the first step on the journey to getting answers. And that's the thing - we consciously crave the data but we don't let the subconscious voice be heard. The subconscious voice offers some important questions:

"Why do I want this data?"
"When I get it, what am I going to use it for?"
"How will it help answer questions?"
"And which questions will it help me answer?"
"Why am I here?"

Okay, hopefully the last question is rarely needed, but if we just stopped, listened carefully for a moment and reflected on why our data appetite needs to be satisfied, we'd be better users of data. We'd see that having the data is not the be-all and end-all.

The more we ask meta-questions about our data collection efforts, the clearer it becomes how powerful data can be. Data collection has a cost. The data must have an ROI in commercial and tactical terms. It needs to deliver on our bottom line and deliver value: a.k.a. knowledge.

Taking the next step in consuming data rather than acquiring more data is a liberation. Properly employing time series data for the first time and using it as an asset rather than a prize is a threshold moment to be appreciated.

Here are 5 case studies where data pillaging turned into data capitalisation. We like to think they will help liberate you in your data usage too. This is for everyone who's never thought: "I couldn't have done this without the data."

1. Optimisation

First, we'll consider a major online shoe retailer. They have really good transactional data - high volume, high quality - and it looks splendid in a visualisation like the one below.

Optimisation Visualisation

So what?

Fantastic! Just having this data in one chart seems like a win... but... so what? Knowing that millions of training shoes are sold is great as a vanity metric but it's hardly stimulating. Little value comes from using such a chart in terms of business optimisation.

Never be satisfied by tidy-looking charts and the dimensions they happen to offer. The chart is a great starting point as it provokes us to look deeper and ask more of the data. For example, who buys the trainers and when?

Use the data better

Consider two key dimensions needed to ask useful questions of the data: time and behaviour. The time dimension will reveal seasonality - not just seasonal peaks but also the frequency of transactions per user.

Thinking about users, we want to see if there is any pattern regarding new users and returning users. How does behaviour change on repeat visits?

Taking the behaviour dimension first, the startling realisation is that new and returning users behave exactly the same. The plots of sales for new and returning users are identical - their product choices are consistent.

However, when we introduce the time dimension we see that there is a strong regularity to purchase behaviour. Whilst sales are spread over the year, users are rhythmic in their purchasing: both new and returning users make annual purchases, with returning users confirming the habit over time.

This insight provides a signal we can use every month - to promote the site through marketing optimisation and to personalise the site itself. We have an opportunity to optimise the ads and the site together - the ideal combination.

Every month, we can look back 12 months and see which users are likely to revisit for their annual shoe purchase. Let's build an audience and start to gently remind them of the great brand they used last year. Not just a "spray and pray" campaign, though. Knowing what they bought last time needs to inform the content of the ads this year. Training shoes evolve: this year's model is an update to last year's, with better pricing and more choice. The tactics are straightforward.

When the click happens, make sure the site responds to their previous visits - not in a creepy, stalking manner. Just make sure the product categories of likely interest are front and centre - easy to find. Subtle reductions in purchase friction make the site more aligned with the probable user intent.

There's another analysis we can perform. The same query based on 11-month-old data will show who was likely to revisit but didn't. This cohort needs a different treatment. These folks may have forgotten about the site, or they didn't know about the new models in their favourite brand. We'll incentivise them with a specific coupon to motivate the annual visit, as well as including the range and pricing update.
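As a rough sketch of how the two cohorts might be pulled from a transaction log - the data frame, field names and date windows below are purely illustrative:

  library(dplyr)

  # Hypothetical transaction log: one row per purchase
  transactions <- tibble::tibble(
    user_id       = c("u1", "u2", "u3", "u4"),
    purchase_date = as.Date(c("2017-06-10", "2017-06-20", "2017-05-01", "2017-01-15")),
    product       = c("TrainerA", "TrainerB", "TrainerA", "TrainerC")
  )

  today <- as.Date("2018-06-15")

  last_purchases <- transactions %>%
    group_by(user_id) %>%
    summarise(last_date    = max(purchase_date),
              last_product = product[which.max(purchase_date)])

  # Cohort 1: last bought roughly a year ago - due the annual visit, so show the reminder ads
  due_for_reminder <- filter(last_purchases, last_date >= today - 370, last_date <= today - 340)

  # Cohort 2: the anniversary has already passed with no repeat purchase - send the coupon
  lapsed <- filter(last_purchases, last_date < today - 370)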

This kind of tripwire marketing is super effective in terms of the results it delivers as well as the investment required to get it moving. The data is key, of course, but the return on spending £5 on 1,000 users who are likely to purchase makes it a smarter spend than spraying £1 on ALL users, who have a lower probability of buying from you.

Takeaway

Avoid looking at data with a single perspective. Don't just stare at dashboards. Introduce time and behavioural variables if they're absent. Ask questions with a strategic suffix: "What does this data mean with respect to the strategic goal Y?"

Use tripwires to preempt likely actions and act when behaviour deviates from the expected norm.

2. Machine Learning

Hurray, we have lots of data about sessions and the content that was consumed during those sessions!

content analysis

So what?

Page level data tells us very little about user behaviour. There's no session context or even pan-session relevance. We need to join this data up for it to be useful.

Use the data better

We can easily stitch session data together and then think in terms of users across multiple sessions. Great - so we can see what content is being consumed across a bunch of sessions. What next?

We identify patterns in the data. A simple heuristic analysis identified a common pattern where users navigate from our homepage to the /about page. The problem was that our /about page wasn't a great user experience. Google told us the page was slow.

PageSpeed Insights

As the common pattern was to go from the homepage to /about, why not use this insight to second-guess the user's choice and prerender the page? Using the prerender resource hint lets us get the page ready to show, resulting in an apparently fast page.

Having hard-coded the prerender based on the heuristic analysis, we saw the average document interactive time drop from 3 seconds to 2 seconds. Not bad for a quick and dirty hack, but we can do better.

Deeper analysis of the data shows that based on the last 6 pages the user saw (even across sessions), we can predict the next page the user will navigate to with an accuracy approaching 90%. How is this done and what action can we take with this prediction?

Take a simple six-page journey like this:

  1. /
  2. /about
  3. /blog
  4. /service/google-analytics
  5. /careers
  6. /contact

This is categorical data - the variables are labels rather than numbers. If we want to build a statistical model on this data, we need to respect that many machine learning algorithms prefer numerical data to categorical data. We need to process this data. Having recorded the last 6 page paths as a JSON object, we can reprocess the object using one-hot encoding, as shown in the table below.

Categorical Data

This is quite simple to do using R.
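A minimal sketch - the tiny data frame below is purely illustrative, and base R's model.matrix() does the encoding:

  # Illustrative journey: the last six page paths seen by a user
  paths <- data.frame(
    step = 1:6,
    page = c("/", "/about", "/blog", "/service/google-analytics", "/careers", "/contact")
  )

  # model.matrix() expands the categorical 'page' column into 0/1 indicator columns
  one_hot <- model.matrix(~ page - 1, data = paths)
  colnames(one_hot) <- sub("^page", "", colnames(one_hot))
  one_hot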

Asking an R user where one-hot encoding is used is like asking a fish where there is water; they can't point to it as it is everywhere.

Now we can feed this numeric data into our model, predict the next page and prerender it in real time.
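Our production model runs in the cloud (more on the plumbing below), but as a purely hypothetical local sketch of the idea - predicting the next page from 0/1 indicators of recently seen pages - something like this shows the shape of the approach. The training data here is randomly generated, so the predictions themselves are meaningless:

  library(nnet)  # ships with R; multinom() fits a multinomial log-linear model

  set.seed(42)
  pages <- c("home", "about", "blog", "service", "careers", "contact")

  # Hypothetical training set: 0/1 indicators for pages seen recently,
  # plus the page the user actually visited next
  train <- as.data.frame(matrix(rbinom(600 * 6, 1, 0.3), ncol = 6,
                                dimnames = list(NULL, paste0("saw_", pages))))
  train$next_page <- factor(sample(pages, 600, replace = TRUE))

  # Fit the classifier and predict the next page for one visitor
  fit <- multinom(next_page ~ ., data = train, trace = FALSE)
  new_visitor <- train[1, paste0("saw_", pages)]
  predict(fit, newdata = new_visitor)                   # most likely next page
  predict(fit, newdata = new_visitor, type = "probs")   # full probability vector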

We use a simple Custom HTML tag in GTM to manage a FIFO queue in local storage holding the last 6 page paths. From there:

  1. GTM sends the queue as JSON to a Cloud Function on Google Cloud.
  2. The Cloud Function passes the JSON to an R script running on a Google Cloud virtual machine, which handles the one-hot encoding.
  3. The encoded page paths go to our model in Google Cloud ML, which predicts the next page path.
  4. The prediction is passed back to GTM, which then prerenders the page:

Machine Learning prediction

Inside a few hundred milliseconds, the page is prerendered and ready to delight the user with amazing speed:

Page Pre-rendering

Takeaway

The prerender technique itself isn't a ConversionWorks invention. It was originally (I think) seen in the wild thanks to the brilliant work of Mark Edmondson, and latterly it has been packaged up as Google's Guess.js. The takeaway here is the realisation that content consumption data in isolation is of limited value relative to what can be done.

Page-scoped data points tell you very little about the user - they lack context. You need to consider the user when analysing content consumption - users are non-linear, so think pan-session. And don't rely on raw data alone - with the right choice of tool, processing raw data is quick and relatively simple.

3. Forecasting

How long can you stare at this kind of chart?

Line Chart

How often do you look back at last year's data? Two years ago? You probably use a year-on-year comparison to see how you're doing this year compared to last. Seeing where you've been can only guide optimisation so far. What if you could look forward?

So what?

Using time series data only as a rear-view mirror means missing out on an amazing opportunity. Use your time series data to look into the future. We don't advise ignoring historic data completely - use it in combination with a forecast to better inform decision making.

Use the data better

Staring at a chart and making a decision about future performance is hard for humans. Let's get the machine to do the heavy lifting on the numbers using time series decomposition. Time series data can be decomposed into three components: the trend, the seasonality and the noise (or random) component.

Time Series - Trend Seasonality Noise

The accuracy of this technique is dependent on the strength of the seasonal component and the stability of the noise component. Less noise and more seasonality means we can combine the trend with the seasonal component to determine where the chart line is going next.

We've previously used the Holt-Winters method via the commonly available forecast package for R. Other techniques are available, such as ARIMA, Prophet and ETS.

It's worth performing forecasts against observed data to calibrate the accuracy of each model, and then using the best fit for your business.

Again, sticking with R, the ability to perform a simple query on our Google Analytics data using Mark Edmondson's googleAnalyticsR package makes life very simple:

Google Analytics R
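The query looks something like this - the view id and date range are placeholders for your own:

  library(googleAnalyticsR)

  ga_auth()  # interactive OAuth against the Google Analytics account

  # Daily sessions for a couple of years (the view id is a placeholder)
  ga_data <- google_analytics(
    viewId     = 123456789,
    date_range = c("2017-01-01", "2018-12-31"),
    metrics    = "sessions",
    dimensions = "date"
  )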

Having the data, we then build the forecast and plot the chart as seen below:

Google Analytics forecasting
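A minimal sketch of the forecasting step, assuming the ga_data frame from the query above (daily data treated as weekly-seasonal, base R's HoltWinters() plus the forecast package):

  library(forecast)

  # Treat daily sessions as a weekly-seasonal time series
  sessions_ts <- ts(ga_data$sessions, frequency = 7)

  # Decompose into trend, seasonal and random components
  plot(decompose(sessions_ts))

  # Holt-Winters fit and a 30-day-ahead forecast
  fit <- HoltWinters(sessions_ts)
  fc  <- forecast(fit, h = 30)
  plot(fc)

  # Check the fitted values against the observed data before trusting the forecast
  accuracy(fc)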

Takeaway

Again, it has to be noted, this isn't a new technique developed by ConversionWorks. Check out the easy-to-follow example on dartistics.com.

The intended takeaway here is that using your data only to reflect on the past has limited value for optimisation and looking forward. The examples here show how simple and accessible the technique is. Now you can use a forecast for benchmarking against a test program, setting goals, and addressing "what if?" questions.

4. The commercial solution

Any sensible business strives to manage costs. You might find cost savings for your business, but you may also find that spending the least isn't always best. Be mindful of the value of an investment as well as the cost.

This maxim is incredibly relevant to Google Ads. There's a widespread school of thought that seeks out the cheapest Google Ads traffic by optimising for the lowest cost per acquisition (CPA).

This is like trying to find the cheapest lawyer when you're in a legal bind or the cheapest healthcare when you need medical advice - it's not always the wisest choice. Indeed, a low CPA will often optimise for lots of low-value traffic. Optimising for the wrong metric will not deliver the best results.

So what?

Rather than buying cheap sessions, consider where your valuable traffic has come from in the past. Which campaigns and keywords deliver the highest revenue per user? Understanding where the value is coming from will give you direction for your optimisation.

Use the data better

Once you have cohorts of top revenue generators, you can see which campaigns and keywords work and which don't. Using revenue per user in combination with returning users, you'll now optimise for customer lifetime value.
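As a sketch of the kind of roll-up involved - the user-level export below is simulated; in practice it would come from your analytics or CRM data:

  library(dplyr)

  # Simulated user-level export: acquisition campaign, lifetime revenue,
  # and whether the user ever returned
  set.seed(1)
  users <- tibble::tibble(
    campaign = sample(c("Campaign X", "Campaign Y", "Control"), 3000, replace = TRUE),
    revenue  = round(rexp(3000, rate = 1 / 40), 2),
    returned = rbinom(3000, 1, 0.25)
  )

  users %>%
    group_by(campaign) %>%
    summarise(
      users            = n(),
      revenue_per_user = mean(revenue),
      returning_rate   = mean(returned)
    ) %>%
    arrange(desc(revenue_per_user))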

How? Say, for example, you've identified your top campaigns: X and Y. You can test these against a control campaign. Both top campaigns outperform the control: X converts at 4% and Y converts at 5%. Both deliver a higher revenue per user and have considerably higher return visitor percentages. Which one to choose?

A typical A/B test will require tens of thousands of visitors to give you sufficient confidence about which to choose, but with a "Go for it" test you can make the call sooner and optimise faster. Using the technique published by Conductrics, a "Go for it" test assumes there is no downside to calling the wrong winner. Both campaigns are potential winners - but which one?

Using the "power" of the test in this scenario we can tell if we have enough data to just call the current leader as the winner.

A test's power is the probability of correctly rejecting the null hypothesis when it is false; a test's power is influenced by the choice of significance level for the test, the size of the effect being measured, and the amount of data available.

We can use one line of R to see if we have enough data yet:

power.prop.test(n = NULL, p1 = 0.04, p2 = 0.05, sig.level = 0.5, power = .95, alternative = "one.sided", strict = FALSE)

The output from this calculation tells us that, with one treatment converting at 4% and the other at 5%, we need 2,324 observations per treatment (or 4,648 in total). The conventional approach of calling the test at 95% confidence requires a total of 18,598 observations.
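For comparison, the conventional call is the same one-liner with the significance level set to 0.05:

  # Conventional one-sided test at the 5% significance level:
  # roughly 9,299 observations per treatment (about 18,598 in total)
  power.prop.test(n = NULL, p1 = 0.04, p2 = 0.05, sig.level = 0.05,
                  power = .95, alternative = "one.sided", strict = FALSE)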

Test Power

Takeaway

Optimise paid ad campaigns for lifetime value rather than cost per acquisition. Use "Go For It" tests to accelerate optimisation. Think about users:

  1. Sessions don't convert
  2. Users buy from you
  3. Understand which users
  4. Understand how they differ from low value users

5. Political Jiu Jitsu

Jiu Jitsu is defined as "a Japanese martial art and a method of close combat for defeating an armed and armored opponent in which one uses either a short weapon or none." Let's talk about close combat in the boardroom, and about using data to combat faith-based marketing.

When you hear "I think X", you need to be able to say "We know Y". Data is powerful for supporting and refuting opinions. For example, we had a client who had nominated revenue as their "North Star metric":

The North Star Metric is the single metric that best captures the core value that your product delivers to customers. Optimizing your efforts to grow this metric is key to driving sustainable growth across your full customer base.

The issue here is that revenue is ignorant of profit, and profit is a more fundamental measure of business success. However, capturing the profit margin for a transaction instead of revenue could result in the margin data appearing in the web page source as part of the tracking script.

Understandably, clients are reluctant to publish margin data on a webpage. Alongside the resistance to changing the KPI, the reluctance to publish margin data meant we had to pivot our approach in order to engage in close-quarters combat with data.

So what?

Maintaining the current page-based transaction measurement, we added a server-side Measurement Protocol call to Google Analytics to record product-level dimensions against each item in the transaction: the cost of manufacture for each product, plus any tax component. This tackled the concern about exposing margin data to the wider public.
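As a rough sketch of the kind of server-side hit involved (the property id, client id, transaction details and custom dimension index are all placeholders - the real call carried the client's own product cost data):

  library(httr)

  # Hypothetical Measurement Protocol hit recording cost data against a product
  resp <- POST(
    "https://www.google-analytics.com/collect",
    encode = "form",
    body = list(
      v      = 1,                   # protocol version
      tid    = "UA-XXXXXXX-1",      # GA property id (placeholder)
      cid    = "555.1234567890",    # client id matching the page-based hit
      t      = "event",
      ec     = "backoffice",
      ea     = "product_cost",
      ni     = 1,                   # non-interaction hit
      pa     = "purchase",
      ti     = "T-1001",            # transaction id to join on
      pr1id  = "SKU-42",            # product SKU
      pr1pr  = 79.99,               # unit price
      pr1cd1 = 31.50                # product-scoped custom dimension: unit cost
    )
  )
  status_code(resp)  # the Measurement Protocol returns 200 even for malformed hits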

Inside GA we were then able to build a margin calculated metric. This metric was then introduced to the regular business dashboards, with revealing results.

While revenue looked healthy, a large proportion of transactions appeared with negative profitability. This was a major surprise to the client.

When multiple promotions were run in combination with price adjustments, we could easily see transactions that had a toxic mix of products. High-ticket items sold well, but with lower frequency than consumable products. The profitability of high-ticket items was diluted or destroyed by transactions that also contained multiple consumables on offer.

Use the data better

The immediate action for the client was to address their merchandising tactics. Changes to pricing management and offer coupons, in combination with campaign bidding adjustments, had dramatic effects on the business's profitability.

The changes achieved a 60% increase in ROI inside 6 months.

Takeaway

Transitioning the online strategy from revenue to margin enabled the right action to be taken at the right time for the right product with the most valuable metric in mind. Choosing the right data collection architecture made the choice of metric possible.

Wrapping up

We've discussed 5 case studies and 5 well-known techniques. We can see how easy it is to fall into the habit of not really using the data - taking data as the prize rather than realising its true value through action.

For each client in these case studies we've seen an amazing transition. Having seen how the value of their data asset can be unlocked with some thought and reflection, we have conversations that go from "Oh, you can do that?!" to "We can also think about this, try that...or that...and, oh my!"

The ramp-up from unconscious incompetence (not realising they're stuck staring at a dashboard) to conscious competence (making an effort to think beyond the data visualisation and moving into action) is fast. So fast that it quickly becomes unconscious competence, where the automatic default is to consider the value before data collection is finalised. We now know we can capture the data, but before that problem is solved, let's make sure it's only the data we need and that we can actually use it.

It's liberating and so powerful.
