We've all been there. Everything on the surface looks like it's running smoothly. Data is coming in. The 30,000-foot view of your account looks like business as usual. You start upping your analytics game. Maybe you took some training and you're getting your hands dirty asking the tough questions of your data. But how do you know if you can trust your data in the first place?
Before I dive into the various reasons your data can be messed up, let me define one specific kind of "messed up". Within each of the standard reports Google provides, there is what's called the "Explorer" view, and it's the most commonly used view. At the bottom of this report is the data table, broken down into columns and rows. Each row is unique: it represents one distinct value of the dimension you're looking at. Therein lies the problem. What you think of as a single campaign, page of your site, source, medium, or whatever may be split across multiple rows. Generally the most active row rises to the top, but that row may hold only a portion of the true data.
Read on to discover ten ways your data may be secretly compromised.
View filters do a wonderful job of segmenting your data into nice little buckets for analysis, and newer users don't leverage them as much as they should. While most people think "segmentation" when it comes to filters, the vast majority of filters I apply to my views are for clean-up. Search and replace filters do a great job of making Request URIs human-readable. Uppercase and lowercase filters excel at keeping report rows from splitting, since row values are case sensitive. Use filters to clean up your data, not just segment it. View filters are the catch-all solution to cleaning your data.
Learn more about Google Analytics View Filters.
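To see why clean-up filters matter, here is a minimal Python sketch that mimics a lowercase filter followed by a search and replace filter on the Request URI, then totals pageviews per report row. The hit data, the `/p?id=123` pattern, and all function names are hypothetical. GA applies real view filters while processing incoming hits, so this only models their effect on report rows.

```python
import re
from collections import Counter

# Hypothetical raw hits as (request_uri, pageviews).
RAW_HITS = [
    ("/Products/Shoes", 40),
    ("/products/shoes", 55),
    ("/PRODUCTS/SHOES", 5),
    ("/p?id=123", 30),  # hypothetical opaque CMS URL for the same page
]

# Hypothetical search-and-replace rule: (pattern, replacement).
SEARCH_REPLACE_RULES = [(re.compile(r"^/p\?id=123$"), "/products/shoes")]


def lowercase_filter(uri: str) -> str:
    """Model of a Lowercase filter applied to the Request URI field."""
    return uri.lower()


def search_replace_filter(uri: str) -> str:
    """Model of a Search and Replace filter applied to the Request URI field."""
    for pattern, replacement in SEARCH_REPLACE_RULES:
        uri = pattern.sub(replacement, uri)
    return uri


def report_rows(hits, filters=()):
    """Apply the filters in order to each hit, then total pageviews per unique row."""
    rows = Counter()
    for uri, pageviews in hits:
        for apply_filter in filters:
            uri = apply_filter(uri)
        rows[uri] += pageviews
    return dict(rows)


if __name__ == "__main__":
    print(report_rows(RAW_HITS))  # four separate rows for what is really one page
    print(report_rows(RAW_HITS, [lowercase_filter, search_replace_filter]))
```

Filter order matters here just as it does in a view: because the lowercase filter runs first, any search pattern that follows only needs to match lowercase URIs.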
On the flip side, the most messed up accounts I've encountered do employ filters, just poorly. Always keep in mind that filters are destructive: they irrevocably alter your data as it comes into the GA reporting database. The solution is a deployment process for your filters, plus a single view that stays unfiltered and untouched by any destructive configuration options. Create a quick and dirty testing view, apply your filters there first, and vet the data before touching your main views. Without a deployment process, one bad filter can permanently damage your historical data.
Always have a "raw data" view on your account.
If you go to your referrals report and see your own domain listed as a referral, you've got a self-referral issue. Why is this a problem? Because you have absolutely no idea where those people actually came from. All of that traffic is now unattributable to your marketing efforts. It gets even worse when you've spent hard cash on a marketing effort and have no idea what percentage of its traffic ended up as self-referrals.
There's a ton of information out there on how to fix this, but in my experience the most common cause is one or more pages of your website lacking the Google Analytics Tracking Code (GATC). If a visitor hits an untagged page during their visit, the last tagged page they viewed counts as an exit, and the next tagged page counts as an entrance with your own domain as the referrer. From that point on, the visit starts over. The page I most often find missing the tag is the 404 error page.
Learn more about common causes for self-referrals in Google Analytics.
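Since a missing tag is the usual culprit, one quick audit is to scan your pages' HTML for the tracking snippet. The sketch below assumes the Universal Analytics analytics.js snippet, which registers the property with a `ga('create', ...)` call. The property ID `UA-12345-1`, the page contents, and the function name are hypothetical, and fetching the HTML (from a crawler or your templates) is left out. If you deploy GA through Google Tag Manager instead, you would search for the container snippet.

```python
import re
from typing import Dict, List

# Hypothetical property ID; replace with your own.
TRACKING_ID = "UA-12345-1"

# Assumption: pages use the analytics.js snippet, which contains a call like
# ga('create', 'UA-12345-1', 'auto');  Single or double quotes are accepted.
CREATE_CALL = re.compile(
    r"ga\(\s*['\"]create['\"]\s*,\s*['\"]" + re.escape(TRACKING_ID) + r"['\"]"
)


def untagged_pages(pages: Dict[str, str]) -> List[str]:
    """Return the sorted paths whose HTML lacks the create call for TRACKING_ID."""
    return sorted(path for path, html in pages.items() if not CREATE_CALL.search(html))


if __name__ == "__main__":
    site = {
        "/": "<head><script>ga('create', 'UA-12345-1', 'auto');</script></head>",
        "/about": '<script>ga("create", "UA-12345-1", "auto");</script>',
        "/404": "<h1>Page not found</h1>",  # the usual suspect
    }
    print(untagged_pages(site))  # ['/404']
```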
Query parameters are simply values appended to a URL in the format ?key=value&key2=value2. They can be stripped out view by view on the View Settings page of the Admin menu. These values can mean a wide array of things, and content management systems (CMSs) and external tracking applications commonly use them. Query parameters don't always determine the content of the page. Sometimes they're just session or user data the CMS needs to pass from page to page, and that kind of parameter holds no value to you as an analyst.
Remember that each row in your content reports is unique. Every query parameter, and every combination of them, gets its own row, which splits the data for a single page across multiple rows. Odds are you're only looking at the busiest row for a given page, the one that column sorting surfaces. The metrics in that row may be incomplete, since any number of trailing rows for the same page could be split off by query parameters. To check for this problem, use the inline filter in your content report to search for "?". That surfaces every entry with a query parameter.
To exclude URL query parameters, click "Admin" in the top navigation and choose "View Settings" for the view you want to change.
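Here is a minimal Python sketch of what excluding query parameters does to your content report rows: the listed parameters are removed from each Request URI before pageviews are totaled per row. The parameter names `sessionid` and `ref`, the hit data, and the function names are hypothetical; in GA you simply enter your own parameter names in the Exclude URL Query Parameters field.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session/user parameters a CMS might append.
EXCLUDED_PARAMS = frozenset({"sessionid", "ref"})

HITS = [
    ("/pricing", 100),
    ("/pricing?sessionid=abc", 20),
    ("/pricing?sessionid=xyz&ref=nav", 10),
    ("/pricing?plan=pro", 15),  # a parameter that does change the content
]


def strip_params(uri: str, excluded=EXCLUDED_PARAMS) -> str:
    """Drop excluded query parameters from a Request URI, keeping the rest in order."""
    parts = urlsplit(uri)
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in excluded]
    return urlunsplit(("", "", parts.path, urlencode(kept), ""))


def rows_with_query(hits):
    """Rough equivalent of typing '?' into the report's inline filter."""
    return [uri for uri, _ in hits if "?" in uri]


def report_rows(hits, transform=lambda uri: uri):
    """Total pageviews per row after transforming each Request URI."""
    rows = Counter()
    for uri, pageviews in hits:
        rows[transform(uri)] += pageviews
    return dict(rows)


if __name__ == "__main__":
    print(rows_with_query(HITS))
    print(report_rows(HITS))                # four rows for the pricing page
    print(report_rows(HITS, strip_params))  # session noise folded into /pricing
```

Note that `/pricing?plan=pro` keeps its own row: only the parameters you list are removed, which is why it's worth deciding parameter by parameter which ones carry meaning.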
Another incredibly easy content clean-up trick is setting the default page in your view settings. This combines the root path "/" with whatever page you specify as your default page. For example, if you tell it "index.php" is your default page, "/index.php" and "/" in your content reports will be lumped together into one clean row. Any additions, such as query parameters or subfolders, will still break out into their own rows. That's a good thing!
To add a default page, click "Admin" in the top navigation and choose "View Settings" for the view you want to change.
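The default page setting can be modeled the same way. This sketch assumes that GA appends the default page to any Request URI whose path ends in a slash, so "/" and "/index.php" land in a single row (reported here as "/index.php"), while subfolders and query strings still get their own rows. The `index.php` value, the hits, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical default page, as entered in View Settings.
DEFAULT_PAGE = "index.php"

HITS = [
    ("/", 200),
    ("/index.php", 50),
    ("/blog/", 30),     # a subfolder keeps its own row
    ("/?lang=en", 5),   # a query string keeps its own row
]


def apply_default_page(uri: str, default_page: str = DEFAULT_PAGE) -> str:
    """Append the default page to a path ending in '/', leaving any query intact."""
    path, sep, query = uri.partition("?")
    if path.endswith("/"):
        path += default_page
    return path + sep + query


def report_rows(hits):
    """Total pageviews per row after applying the default page rule."""
    rows = Counter()
    for uri, pageviews in hits:
        rows[apply_default_page(uri)] += pageviews
    return dict(rows)


if __name__ == "__main__":
    print(report_rows(HITS))
```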
Aside from bad data, you might even be looking in the wrong place. As a developer, I'm a stickler for good naming conventions: they should be consistent, easy to understand, and human-readable. If you open your account and can't explain within 10 seconds what each account, web property, and view is for, you have a real problem. Account setups range from the most basic to the uber complex. Take a big step back, figure out how everything ties together, and decide whether it's the best setup for you before you fully trust the data.
Here are some naming conventions for Google Tag Manager (GTM), but they apply equally well to GA.
The chief benefit of annotations is providing context for changes to your account when you view historical data. You probably remember any changes made to your GA account over the last three months, and you could explain away spikes in traffic or modifications here and there. But what about a year from now? You won't remember every filter or development change that affected the data, which is why annotations are important. When you're building out historical reports, you'll have a nice, neat log of what the data means so you can draw true insights.
Check out this feature walk-through.
This is a big one that view filters can fix. Campaign tracking in GA is a great feature; however, it requires a level of project management to keep the incoming data accurate. Campaign tags are case sensitive, meaning that if you launch three campaigns and list the medium as EMAIL, Email, and email, they will show up as three completely different mediums. It's highly recommended that your marketing team have a shared document they can access and collaborate on when crafting campaign tags. As an added measure, you can add a lowercase view filter on campaign medium, source, and so on to solve the case problem completely.
Check out the URL Builder, a useful tool for building campaign tags.
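If your team builds tagged links in bulk, a small helper can enforce the same discipline as the shared document and the lowercase filter. This Python sketch appends the standard campaign parameters (`utm_source`, `utm_medium`, `utm_campaign`, and optionally `utm_term` and `utm_content`) to a landing page URL and lowercases every value, so EMAIL, Email, and email always collapse into one medium. The function name, the lowercasing convention, and the example URL are assumptions for illustration, not part of the URL Builder itself.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_campaign_url(
    base_url: str,
    source: str,
    medium: str,
    campaign: str,
    term: Optional[str] = None,
    content: Optional[str] = None,
) -> str:
    """Append lowercase, trimmed utm_* parameters to base_url (hypothetical helper)."""
    params = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
        "utm_term": term,
        "utm_content": content,
    }
    # Normalize case and whitespace; skip optional parameters that were not given.
    tags = {key: value.strip().lower() for key, value in params.items() if value}
    parts = urlsplit(base_url)
    # Keep any query string the landing page already has, then add the tags.
    query = "&".join(q for q in (parts.query, urlencode(tags)) if q)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


if __name__ == "__main__":
    print(build_campaign_url("https://www.example.com/landing", "Newsletter", "EMAIL", "Spring Sale"))
    # https://www.example.com/landing?utm_source=newsletter&utm_medium=email&utm_campaign=spring+sale
```

Lowercasing at the source and lowercasing with a view filter are complementary: the helper keeps new links clean, while the filter catches anything tagged by hand.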
The best practice for placing the GATC on your website changes over time, much like the technology itself. I would love to tell you right here and now where to put it, but for the sake of making this article timeless, I'll tell you how to always get it right. When you create a web property, and in the tracking code section of your GA admin panel, you'll find a brief description of exactly where the code needs to go. When I troubleshoot bad data, this is one of the very first things I check, and it explains away too many problems to list: pageview inaccuracies, self-referrals, wrong time on site, off page timings, erroneous bounces... the list goes on and on.
Learn more about the recommended code setup.
Check the Universal Analytics Upgrade Center to make sure you're running the most up-to-date version.