## A Case Study In The Application Of Statistics & Probability In Web Analytics

Explaining what happened in the past is a common task for most web analytics professionals. While some will argue that web analytics is like driving by watching the rear view mirror, understanding where problems lie is a key step to driving improvements.

The presentation, the Case of the Missing Ring, is a case study that explains the gradual decline in traffic to the contemporary jewellery website DefiniteStyle. In doing so it outlines a number of critical concepts that web analytics professionals must integrate into their day-to-day practice in order to be effective. These include:

- A basic understanding of probability and statistical concepts.
- Gathering data from multiple sources.
- Applying an iterative approach to analysis.
- Trying to prove yourself wrong in order to get closer to the truth.

## Understanding Probability and Statistical Concepts

While I can’t claim to have a large enough sample to prove that statistical literacy is lacking in the web analytics profession, my anecdotal experience is that few professionals working in web or digital analytics have sufficient statistical skills to do justice to our profession.

Challenge #1: How do you answer when asked “What is the average bounce rate?” or “Is a 2% conversion rate good?” These are questions I hear very often from our clients and prospects.

Challenge #2: If you have ever used the term “statistically significant”, can you simply explain the concepts of confidence level and confidence interval?

Think carefully before reading on.

The web/digital analytics industry is still young, very young in the scheme of things. WebTrends was founded in 1995 and a few tools predate this by a couple of years, but when compared to the discipline of statistics with hundreds of years of development, there is so much that web analytics professionals can learn.

*Statistics helps us to understand the past and to filter out that which is not important from that which may be important. From this we can make future forecasts and estimate the probability that a change we make will result in a positive or negative impact.*
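To make Challenge #2 concrete, here is a minimal sketch of a confidence interval for a conversion rate. The confidence level (95% here) is the long-run share of intervals built this way that would contain the true rate; the confidence interval is the range computed from one particular sample. The visit and conversion counts are invented for illustration, and the normal (Wald) approximation used is only one of several methods and is known to be rough when counts are small.

```python
from math import sqrt
from statistics import NormalDist


def conversion_interval(conversions, visits, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a conversion rate."""
    p = conversions / visits
    # z is the standard normal quantile for the chosen confidence level
    # (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / visits)
    # Clamp to the valid range for a proportion.
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical figures: the same 2% conversion rate on two sample sizes.
small = conversion_interval(10, 500)
large = conversion_interval(1_000, 50_000)

print(f"500 visits:    {small[0]:.2%} to {small[1]:.2%}")
print(f"50,000 visits: {large[0]:.2%} to {large[1]:.2%}")
```

Both samples report a 2% conversion rate, but the interval from 500 visits is roughly ten times wider than the one from 50,000 visits, which is why a single percentage says little on its own.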

In the first challenge above, both questions are significantly flawed. Firstly, to address any question that involves an average, we must understand how that average is constituted, including its variance and composition, best described through the concepts of the standard deviation and probability distribution.
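As a small illustration of why an average alone is not enough, the sketch below compares two sets of daily bounce rates that share the same mean but have very different spreads. The numbers and the `summarize` helper are invented for this example and do not come from the case study.

```python
from statistics import mean, pstdev

# Hypothetical daily bounce rates (%) for two sites over one week.
site_a = [34, 35, 36, 35, 34, 36, 35]
site_b = [10, 60, 20, 50, 35, 5, 65]


def summarize(rates):
    """Return (mean, population standard deviation), rounded to 1 decimal."""
    return round(mean(rates), 1), round(pstdev(rates), 1)


for name, rates in (("Site A", site_a), ("Site B", site_b)):
    avg, spread = summarize(rates)
    print(f"{name}: mean {avg}%, standard deviation {spread}")
```

Both sites average 35%, yet Site A is stable from day to day while Site B swings widely. Quoting "35%" for both would hide exactly the variation an analyst needs to see.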

Secondly, both questions imply further questions about the population we are studying. The average bounce rate for visitors looking for contact details would be far higher than for those looking for product information. There are many more subtleties that influence how an analyst should respond to these questions. The sad reality is that more often than not the answer given is something like “35% bounce rate is good” or “2% conversion is acceptable” without any significant thought about the actual question that needs to be answered and therefore the accuracy of the answer.
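The effect of population mix can be sketched with a few lines of arithmetic. The segment names follow the example above, but the visit and bounce counts are hypothetical, chosen only to show how the overall figure moves when the mix of visitors changes while each segment behaves exactly as before.

```python
def overall_bounce_rate(segments):
    """Overall bounce rate from a dict of segment -> (visits, bounces)."""
    total_visits = sum(visits for visits, _ in segments.values())
    total_bounces = sum(bounces for _, bounces in segments.values())
    return total_bounces / total_visits


# Hypothetical month 1: mostly product researchers.
month_1 = {
    "contact details": (2_000, 1_600),      # 80% bounce
    "product information": (8_000, 2_000),  # 25% bounce
}

# Hypothetical month 2: same per-segment bounce rates, different mix.
month_2 = {
    "contact details": (5_000, 4_000),      # still 80% bounce
    "product information": (5_000, 1_250),  # still 25% bounce
}

print(f"Month 1 overall: {overall_bounce_rate(month_1):.1%}")
print(f"Month 2 overall: {overall_bounce_rate(month_2):.1%}")
```

The overall bounce rate rises from 36.0% to 52.5% even though neither segment changed its behaviour. Without knowing which population sits behind the average, "is 35% good?" cannot be answered.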

## Gathering Data From Multiple Sources

A key problem with web analytics tools is that they only provide information from one perspective, that of the events recorded when a visitor interacts with a website. The case study shows that in order to understand the fluctuations in the volume of traffic to this particular site, we need to understand the broader trends beyond the site. In this instance, Google Insights was used to identify patterns in the potential traffic that could conceivably come to the site (represented by overall search volume) and compare these to the actual traffic patterns that the site received.

If we only look at the problem from the perspective of the web analytics tool (e.g. Google Analytics), we see only the symptom (a drop in traffic) and can merely guess at its cause. By combining multiple data sources and applying statistical methods such as simple regression analysis, it is possible to discount factors that are unlikely to have caused the problem. It is important to note, however, that these techniques describe the past. A strong correlation may suggest a causal link, but it is no guarantee that one exists. To find the real cause you must run a controlled experiment.
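A simple regression of this kind can be sketched in plain Python. The monthly figures below are invented for illustration (they are not the DefiniteStyle data), and `fit_line` is a hypothetical helper that fits an ordinary least-squares line and reports R², the share of variation in site visits that moves with the search-volume index.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y on x: (slope, intercept, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical monthly data: overall search-volume index for the topic
# and visits recorded by the site's own analytics tool.
search_index = [100, 95, 90, 84, 80, 76]
site_visits = [5000, 4800, 4450, 4250, 4000, 3850]

slope, intercept, r_squared = fit_line(search_index, site_visits)
print(f"slope={slope:.1f} visits per index point, R^2={r_squared:.3f}")
```

With these made-up numbers the site's traffic tracks overall search demand very closely, which would make a site-specific fault a less likely explanation for the decline. It would still be a description of the past, not proof of cause.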

## Applying An Iterative Approach To Analysis

The first answer is not always (perhaps rarely) the best answer. Analysis is hard work, and when we stumble across a plausible answer it is very tempting to stop right there and claim to have solved the puzzle. Good analysis means taking different perspectives, finding multiple plausible answers, and then using tests to find the most robust ones.

## Trying To Prove Yourself Wrong In Order To Get Closer To The Truth

This is really an extension of the last point. Good analysis, like good science, stands the test of repeatability. The key method for getting as close as possible to the true cause or explanation is to seek to falsify your findings. This means working hard to locate evidence that contradicts them. When a finding withstands many efforts to prove it false, it is far more likely to be both repeatable and useful.

Web analytics is a great discipline but it is very important to realize that we stand on the shoulders of giants from whom we can learn much.

## About Rod Jacka

Rod Jacka is a veteran of web analytics and was one of the first people in the world to be certified by Google to join their *Google Analytics Certified Company* program in 2006. With over 16 years’ experience, he is widely regarded as a leading expert in how to apply insights from tools like Google Analytics to real-world problems.

Rod founded Panalysis in 2001 after a long period of frustration at hearing people say things such as “my website gets a million hits a month”. Knowing that there was much more to be gained than simply counting hits, he set up Panalysis to provide insight and commercially actionable data from web analytics tools. Rod also writes for the Panalysis blog. Follow him on Google+.