So, you’re minding your own business, sifting through a client’s analytics data for some epic, revelatory insight that’ll blow their minds. All of a sudden, you get this feeling that something weird is happening. You could’ve sworn your client had 3,132 visits from Canadians who found you with organic search, but you have a few advanced segments on, and now the traffic with the exact same filter conditions is actually 2,984 visits.
You flip back and forth between the two reports with a pit forming in your stomach. Is there something wrong with my profile filters? Has something gone horribly wrong with the tracking code I gave the client to put on their website? Is Google Analytics… unreliable? How am I going to explain this to my client?
Nope, none of the above – your data is being sampled.
Note: If you see any concepts or terms in this article you’re not terribly familiar with, try consulting my web analytics primer.
What Is Data Sampling?
Sampling is a process by which a part of a larger body of data (a sample, if you will) is used to extrapolate the rest of the data. While sampling can’t produce 100% ‘down-to-the-last-digit’ numbers, it is accurate enough to spot trends in the data – the ups and downs in a sampled dataset will usually correspond and be proportional to the ups and downs of the full dataset. Sampling, because it processes a smaller amount of data representative of the whole, lessens the load on the computers doing the processing.
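To make the idea concrete, here is a minimal Python sketch of sampling and extrapolation. It is not how Google Analytics works internally. The visit data, the 3% share of Canadian visits, and the `estimate_from_sample` helper are all made up for illustration.

```python
import random

def estimate_from_sample(visits, predicate, sample_rate, seed=0):
    """Estimate how many visits match `predicate` using only a random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample_size = max(1, int(len(visits) * sample_rate))
    sample = rng.sample(visits, sample_size)
    matches_in_sample = sum(1 for v in sample if predicate(v))
    # Scale the sample count back up to the size of the full dataset.
    return round(matches_in_sample * len(visits) / sample_size)

# Hypothetical dataset: 100,000 visits, 3% of them from Canada.
visits = [{"country": "Canada" if i % 100 < 3 else "Other"} for i in range(100_000)]

def is_canadian(visit):
    return visit["country"] == "Canada"

exact = sum(1 for v in visits if is_canadian(v))        # counts every visit
estimate = estimate_from_sample(visits, is_canadian, sample_rate=0.1)

print("exact:", exact)
print("estimate from a 10% sample:", estimate)
```

The estimate lands close to the exact count without matching it digit for digit, which is the same kind of gap you see between a sampled and an unsampled report.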
When Does Sampling Happen?
Sampling occurs when you attempt to filter or manipulate a particularly huge amount of data: either a large number of visits on its own, or a large number of visits combined with a complex set of filters in custom reports or advanced segments.
Why Does Google Analytics Sample Data?
Google does this because even their servers can’t always be expected to hold endlessly large datasets in a finite amount of working memory. Imagine trying to solve a complicated math problem in your head. Certain numbers divide pretty easily into one another, and you can handle those pretty well. But messy numbers with lots of decimals might make things a bit too complicated to keep in your short-term memory, and now you have to resort to a calculator or a piece of paper. Same thing goes with computers… except they don’t have anything else to turn to when their ‘short-term memory’ gets overwhelmed.
Note: Your data isn’t affected or changed in any way by sampling – the only difference is in which set of data Google has brought to the surface in its report.
Why Are My Numbers Different When The Filters Haven’t Changed?
This is why a number you saw in one of Google Analytics’ standard reports can come out different in a custom report or advanced segment, even though it should be exactly the same: the second number may have been calculated from a sample rather than from the full dataset.
Here, we have a volume of data which has been filtered for a particular country – my lovely homeland of Canada.
In this next image, you’ll see that when I applied an additional filter, sampling was used to try and lessen the burden on Google’s servers. Because of the sampling, the number of visitors from Canada is different from that seen in the previous image.
At What Point Does Google Analytics Start Sampling?
Sampling kicks in at different points depending on which version of Google Analytics you happen to be using:
- Free: More than one million unique dimension combinations in standard reports, or more than 500,000 for special queries such as custom reports, advanced segments, inline filters, or any other case where the data has not been pre-stored and pre-aggregated.
- Premium: More than 50 million unique dimension combinations in any type of query. Learn more about Google Analytics Premium at analyticspremium.com.
Note that sampling is done at the web property level and not at the profile level. So even if you’re manipulating data at the level of an individual profile, the settings on that profile (such as profile-based filters) won’t apply – the decision of whether or not to sample is based on the original data captured before profile-level filtering.
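The thresholds above can be written down as a small lookup. This is only a sketch of the rules as described in this article, not an official API. The `will_sample` function and its parameter names are hypothetical, and the volume you pass in should be the property-level figure, before any profile filters.

```python
# Thresholds as listed in this article (illustrative only, not an official API).
SAMPLING_THRESHOLDS = {
    ("free", "standard"): 1_000_000,
    ("free", "special"): 500_000,       # custom reports, advanced segments, inline filters
    ("premium", "standard"): 50_000_000,
    ("premium", "special"): 50_000_000,
}

def will_sample(edition, query_type, property_volume):
    """Return True if a query over `property_volume` exceeds the listed threshold.

    `property_volume` is the property-level volume, measured before
    profile-level filters are applied.
    """
    return property_volume > SAMPLING_THRESHOLDS[(edition, query_type)]

print(will_sample("free", "special", 600_000))     # over the 500,000 threshold
print(will_sample("free", "standard", 600_000))    # under the one million threshold
print(will_sample("premium", "special", 600_000))  # under the 50 million threshold
```

The same 600,000 figure triggers sampling for a special query on the free version, but not for a standard report or on Premium.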
Can I Prevent Google Analytics From Sampling?
The level of sampling in your data can be adjusted, but never eliminated. You can tweak the balance between higher accuracy and lower processing time. If you use a larger sample to increase the accuracy of your data, you’ll have to wait longer for your report to load. If you use a smaller sample because you’re in a hurry, your data might not be entirely accurate – just accurate enough to spot trends in the data.
Step 1: Select the square icon, made up of a bunch of very small squares, just below the date.
Step 2: Adjust the slider to taste – further to the left means faster load but less accuracy, and further to the right means slower load but more accuracy.
Step 3: Wait for your report to load back up.
Another way to avoid sampling would simply be to select a date range that includes fewer than 500,000 visits. One way to do this is to set up automatic daily exports, whose data can then be analyzed in a specialized tool outside of Google Analytics.
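If you already know roughly how many visits you get per day, you can work out how to slice a long date range into pieces that each stay under the limit. The Python sketch below does this greedily. The `split_date_range` helper and the daily visit counts are hypothetical, and the actual export step is left out, since it depends on the tool you use.

```python
from datetime import date, timedelta

def split_date_range(daily_visits, limit=500_000):
    """Group consecutive days into chunks that each total fewer than `limit` visits.

    `daily_visits` is a list of (day, visit_count) pairs in date order.
    A single day at or above the limit still gets its own chunk,
    and that chunk would still be sampled.
    """
    chunks, current_days, running_total = [], [], 0
    for day, visits in daily_visits:
        # Close the current chunk if adding this day would reach the limit.
        if current_days and running_total + visits >= limit:
            chunks.append((current_days[0], current_days[-1]))
            current_days, running_total = [], 0
        current_days.append(day)
        running_total += visits
    if current_days:
        chunks.append((current_days[0], current_days[-1]))
    return chunks

# Hypothetical traffic: ten days at 120,000 visits per day.
start = date(2013, 1, 1)
daily_visits = [(start + timedelta(days=i), 120_000) for i in range(10)]

chunks = split_date_range(daily_visits)
for first_day, last_day in chunks:
    print(first_day, "to", last_day)
```

Each chunk can then be queried or exported separately and the results added together outside of Google Analytics.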
How Much Sampling Is Too Much Sampling?
It’s worth stressing that a difference in the exact number between reports isn’t always a problem. After all, there are differences, and then there are statistically significant differences. The statistically significant differences are the ones you need to worry about, as they can’t be explained away by sampling – if the difference in numbers is large enough to be statistically significant, then you need to start worrying about the way your analytics tool is set up.
Analytics legend Avinash Kaushik has an article on statistical significance and a helpful spreadsheet to help you calculate whether or not a particular difference really matters.
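For a rough feel of what “explained by sampling” means, here is a back-of-the-envelope check in Python. It is not the method from the spreadsheet mentioned above. It is a simple 95% margin of error using the normal approximation, and the total visit count and sample size are assumptions picked for illustration. Only the 3,132 and 2,984 figures come from the example at the top of this article.

```python
import math

def sampling_margin(estimated_count, total_visits, sample_size, z=1.96):
    """Approximate 95% margin of error (in visits) for a sampled count.

    Uses the normal approximation for a proportion and ignores the
    finite population correction, so the margin errs on the wide side.
    """
    p = estimated_count / total_visits
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    return z * standard_error * total_visits

def within_sampling_error(unsampled, sampled, total_visits, sample_size):
    """True if the gap between the two reports fits inside the margin of error."""
    margin = sampling_margin(sampled, total_visits, sample_size)
    return abs(unsampled - sampled) <= margin

# 3,132 and 2,984 are the figures from the intro; the other two are assumed.
total_visits = 1_000_000   # hypothetical visits in the date range
sample_size = 250_000      # hypothetical number of visits in the sample

margin = sampling_margin(2_984, total_visits, sample_size)
print("margin of error:", round(margin), "visits")
print("explained by sampling:", within_sampling_error(3_132, 2_984, total_visits, sample_size))
```

Under these assumed numbers, the 148-visit gap sits inside the margin, so sampling alone could account for it. A much larger gap would be a reason to look at your setup.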
As usual, please leave a comment below if you have any questions or any new analytics mysteries for me to dig into.