A common question you might ask when going through your Google Analytics data is “Where do my users come from?” The answer helps you understand how users find your website or app and which sources of traffic are working well (or not). Using this information wisely, you can quantify the value of your paid marketing campaigns, optimise Organic traffic, find out how well your newsletter emails are performing and more!
You probably know your main traffic sources but do you know how many possible traffic sources there are? Literally, how many sources do you think your site could have? Apart from the campaigns you’ve set up in AdWords or emails, have you ever considered what other traffic sources are available? There are more traffic sources than you may think.
You might not know this but there are potentially hundreds, if not thousands, of traffic sources! This makes it a major challenge for Google Analytics to work out where all your users came from and to present accurate data in reports. Google Analytics always tries as hard as possible to give you accurate data about your traffic sources but sometimes it’s not so straightforward.
Enter the Direct source and (none) medium. You’d expect Direct traffic to represent your most loyal users who know how to find your app or site by name. Right? Not always. There’s more to Direct than you might initially think.
This article covers exactly what Direct means, how to understand it, how to tell genuine Direct traffic from a tracking error and how to fix the errors.
How does Google Analytics calculate a traffic source?
When a user arrives on your site, how does GA decide whence they came? GA uses a fairly simple algorithm, which is described in the flow chart below. It is long, but keep on going!
There are a few technical terms in this diagram so let’s work through a series of use cases to better understand what they mean.
How does GA identify visits from Organic search?
Go to Google, search for ConversionWorks and click through to my homepage. How would GA know you came from a Google search? Thanks to the magic that enables the internet to run (HTTP), we can see in the Request Headers that the site you came from, the Referrer, was https://www.google.co.uk.
Okay, but looking at my GA data in the Acquisitions report (Source/Medium) I can see google / organic, google / cpc and a bunch of referrals from other Google properties. How does GA differentiate between a Google search visit, a click on an AdWords ad and the other referrals?
Being a Google product and knowing that Google knows a thing or two about search, GA knows how to recognise a visit from a search engine. You can see the list of known search engines and how they’re identified using the Referrer value here.
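To make the Referrer check a little more concrete, here is a minimal Python sketch. It is not GA's actual implementation: the `KNOWN_SEARCH_ENGINES` list is a tiny, hypothetical stand-in for GA's much longer list, and the matching rule (the first label of the hostname after any `www.`) is an assumption made purely for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical, heavily abridged stand-in for GA's list of known search engines.
KNOWN_SEARCH_ENGINES = ("google", "bing", "duckduckgo")


def search_engine_from_referrer(referrer: str) -> Optional[str]:
    """Return the search engine name if the referrer looks like one, else None."""
    host = urlparse(referrer).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    # Assumption: only the main property (e.g. google.co.uk) counts as search,
    # so other properties such as mail.google.com fall through.
    first_label = host.split(".")[0]
    return first_label if first_label in KNOWN_SEARCH_ENGINES else None
```

With this rule, a referrer of https://www.google.co.uk is treated as a google search visit, while a click from another Google property is left to be handled as a referral.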
How does GA identify visits from a paid ad click?
Okay, so GA can spot a referring search engine but what about paid traffic? When we talk about paid traffic we might mean AdWords or DoubleClick for example. If paid campaigns using these ad networks are set up using auto-tagging then GA will look for a query parameter on the URL when users land on the site. A user coming to ConversionWorks from a click on an AdWords ad will have a query string parameter named gclid appended to the URL like this:
A DoubleClick ad click will have a query string parameter named dclid. GA can see these parameters (gclid and dclid) and can decide not only that this visitor came from a paid ad click but also which ad they clicked, which campaign, the cost of the click and so on. Pretty amazing and super valuable data!
If the user came from a Google property with no query string data and it wasn’t a search property then the visit must be a referral.
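The gclid and dclid check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how GA itself is written; the function name `paid_click_id` and any URL you pass in are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse, parse_qs


def paid_click_id(landing_url: str) -> Optional[Tuple[str, str]]:
    """Return (parameter name, value) for the first auto-tagging ID found."""
    params = parse_qs(urlparse(landing_url).query)
    for name in ("gclid", "dclid"):
        if name in params:
            return name, params[name][0]
    # No auto-tagging parameter: this was not recognised as a paid ad click.
    return None
```

For a made-up landing URL such as https://www.example.com/?gclid=abc123, the function reports the gclid parameter and its value; for a URL with no such parameter it returns None.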
How does GA identify visits from custom campaigns?
GA looks for some other standard query string parameters too – the utm parameters:
You can append these query string parameters to links in emails, social shares, non-Google ad networks or links in PDF documents. You can use these very powerful parameters to control the data in your acquisition reports. You can use the official Google URL Builder to help build your links.
Be careful though! Using these parameters on internal links on your website (from one page on your site to another page on your site) changes the campaign value mid-visit, which makes GA start a new session artificially. You probably do not want that to happen. Best use these guys only when linking from an external source that you want to track explicitly.
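If you would rather build tagged links in code than by hand, a small helper along these lines is one way to do it. It is a sketch, assuming you only need the three most commonly used parameters (utm_source, utm_medium and utm_campaign); the function name `add_utm` and the example values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append utm_source, utm_medium and utm_campaign to an external link."""
    parts = urlparse(url)
    # Keep any query string parameters the link already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))
```

Calling `add_utm("https://www.example.com/offers", "newsletter", "email", "spring_sale")` yields a link ending in `?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale`. Building the names in one place means nobody has to type them by hand for each link.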
What does Direct mean and when can it happen?
If GA can’t determine that a user landed on your site from a recognised campaign, search or social source, there’s no existing campaign data, it’s a new session and no referral data is available: you’ll find yourself at the very bottom of the traffic source algorithm flow chart – it’s a Direct visit.
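Pulling the use cases together, the order of the checks can be sketched roughly as follows. Treat this as a simplified reading of the steps described in this article, not GA's real algorithm: the `SEARCH_HOSTS` table is a hypothetical stand-in, the "(not set)" fallback for a missing utm_medium is an assumption, and branches such as social sources, DoubleClick and self-referrals are left out.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for GA's list of recognised search engines.
SEARCH_HOSTS = {
    "www.google.co.uk": "google",
    "www.google.com": "google",
    "www.bing.com": "bing",
}


def classify(landing_url: str, referrer: str = "",
             previous: Optional[Tuple[str, str]] = None) -> Tuple[str, str]:
    """Return a (source, medium) pair for a new session."""
    params = parse_qs(urlparse(landing_url).query)
    if "gclid" in params:                      # auto-tagged AdWords click
        return ("google", "cpc")
    if "utm_source" in params:                 # manually tagged campaign
        medium = params.get("utm_medium", ["(not set)"])[0]  # assumed fallback
        return (params["utm_source"][0], medium)
    host = urlparse(referrer).hostname
    if host in SEARCH_HOSTS:                   # recognised search engine
        return (SEARCH_HOSTS[host], "organic")
    if host:                                   # any other referring site
        return (host, "referral")
    if previous is not None:                   # existing campaign data is reused
        return previous
    return ("(direct)", "(none)")              # bottom of the flow chart
```

Only when every earlier check comes up empty does the session land on (direct) / (none), which is why anything that hides the campaign or referrer information ends up inflating Direct.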
But is it really Direct?
Just because GA decided a user was Direct doesn’t necessarily mean this is exactly what happened. The analytical mind will always question a data source. Is this data a true reflection of reality? How can I trust this data? How can I prove this is correct?
What grounds do we have to doubt the accuracy of the data? How do we reasonably question the data and go about calibrating it?
Questioning your data
Let’s assume you have some expectations regarding the volume of traffic you’ll get from ‘owned’ campaigns such as Email, AdWords, DoubleClick or Social. In addition to expected volume, you probably also know where users are going to land on the site. Now you need to check your data. Ask:
- Are users landing on the pages you expect?
- Are the landing pages being hit by the right campaigns?
- Does the session volume look right?
Acquisition report gross error checking
This kind of “gross error checking” exercise is seldom performed, yet it is essential if you’re going to trust your data.
Compare your click data from AdWords with GA data. Are you seeing the right number of sessions compared to clicks? The numbers may not match exactly but anywhere within 5% is about right.
Check other source/medium combinations for your landing pages in GA. Seeing anything untoward or untrustworthy? If your campaign traffic volume is south of your expected value then this is where you might see more Direct traffic than you might expect.
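The clicks versus sessions comparison is simple arithmetic, so it is easy to automate. Here is a minimal sketch of the 5% rule of thumb mentioned above; the function name and the figures in the usage note are made up for illustration.

```python
def within_tolerance(clicks: int, sessions: int, tolerance: float = 0.05) -> bool:
    """True if GA sessions are within `tolerance` of the ad platform's clicks."""
    if clicks == 0:
        # No clicks reported: only zero sessions counts as a match.
        return sessions == 0
    return abs(sessions - clicks) / clicks <= tolerance
```

For example, 1,000 clicks against 960 sessions is a 4% gap and passes, while 1,000 clicks against 900 sessions is a 10% gap and deserves a closer look at where the missing sessions were attributed.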
Is this a good or a bad thing? How do you spot and fix issues?
Users can always type a URL into their address bar or click a bookmark, of course, and this is genuine Direct traffic. However, there is also a chance Google Analytics may not be able to correctly attribute the user’s session to a traffic source, in which case the session is flagged as Direct. Here are a few scenarios:
- Clicking from a secure site that uses https to an insecure site that uses http
- Clicks from apps
- Untagged or incorrectly tagged links (most common)
- Measurement Protocol hits
If any of the scenarios listed above occurs, GA will flag the session as Direct, which may not be right.
HTTPS to HTTP
This is the way the internet works. If you’re on a secure site that uses https, part of the security is that when you click through to an insecure site using http, the insecure site is prevented from seeing where you came from – no referrer data is available to GA on the insecure site.
Secure sites like Google and Facebook are quite clever in that they do expose referrer information when you click through from their secure pages to insecure pages on other sites. We don’t need to go into how they do this in this post but the simplest solution is to run your site on https. This is good for your users. Give them peace of mind knowing their browsing experience is secure and you’ll have no worries about losing referral data. That’s an easy trade. Talk to your engineers and get it done already!
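The rule itself fits in one line of Python. This sketch models only the default downgrade behaviour described in this section; it deliberately ignores the ways a site can override that default, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def referrer_survives(from_url: str, to_url: str) -> bool:
    """Default rule only: the referrer is dropped on an https to http click."""
    downgrade = (urlparse(from_url).scheme == "https"
                 and urlparse(to_url).scheme == "http")
    return not downgrade
```

A click from https://blog.example.org to http://www.example.com loses its referrer, whereas the same click to https://www.example.com does not, which is the whole argument for moving your site to https.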
Clicks from apps
If users click on links to your site from within an app, GA can’t see which site they came from because they didn’t come from a site! They came from an app, not a site in a browser. The app won’t necessarily send referral information, which confounds GA, and you end up with incorrect Direct traffic.
It’s quite possible the clicks from apps are valuable. If you treat clicks from apps as a monetisable channel then you need to track these clicks properly.
Use utm tagging (also known as manual tagging) to decorate the links in the app with campaign data. If you’ve never done this, take a look at this handy resource provided by Google.
Untagged or incorrectly tagged links
This is a very similar scenario to the last one. Maybe you don’t have links in apps but if you have links in emails or maybe even PDF documents, Word documents or Excel spreadsheets, these are not browsers and might not send referral data for GA to latch on to. You need to use manual tagging again.
What if you are using manual tagging but had a little finger trouble? You did test the links, right? They went through to the right page but did you check the GA data for the right source, medium and campaign values?
If you click this link, you’ll end up on our homepage:
Looks okay? Can you spot the issue?
utm_sorce is not a correct utm parameter; it should be utm_source. Make sure to double check every time you create a campaign link or (better) use an automatic solution. A great way to test a link is to click it once and watch the real time reports in GA to confirm the right source, medium and campaign appear.
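One possible "automatic solution" is a small check that flags misspelt parameter names before a link goes out. The sketch below assumes the five classic utm parameter names are the only valid ones; extend `VALID_UTM` if your setup uses others. The function name and the example URL are hypothetical.

```python
from typing import List
from urllib.parse import parse_qsl, urlparse

# Assumption: the five classic campaign parameters are the only valid ones.
VALID_UTM = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def suspicious_params(url: str) -> List[str]:
    """Return utm_-prefixed parameter names that are not recognised."""
    names = [name for name, _ in
             parse_qsl(urlparse(url).query, keep_blank_values=True)]
    return sorted({n for n in names if n.startswith("utm_") and n not in VALID_UTM})
```

Run against a made-up link such as https://www.example.com/?utm_sorce=newsletter&utm_medium=email, it returns `['utm_sorce']`, catching the finger trouble before any traffic is misattributed.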
Measurement protocol hits
Have you heard of the Internet of Things? Internet-connected devices that can talk to other things on the internet: fridges, fitness trackers, cows… yes, even farm animals. None of these things are browsers but they can all potentially send data to Google Analytics using the Measurement Protocol. The Measurement Protocol is what makes Universal Analytics truly universal. It’s a technique provided by Google Analytics that lets technology other than web browsers be measured using GA.
GA data sent via the Measurement Protocol might be flagged as Direct if it is not decorated with campaign information. You can check the data to see if these hits are from things rather than users quite simply. Knowing that things are things and not browsers means we can use common dimensions in GA to see real browsers. Real browsers will automatically expose the screen resolution, the computer operating system and the Flash version being used amongst others. These appear as dimensions in GA reports.
So, for example, a property that was only populated with Measurement Protocol hits might show (not set) for all Measurement Protocol hits on the Operating System dimension. Similarly, you would see no Flash Version, no screen resolution or Screen Colours. These dimensions are all available to see in the Audience -> Technology -> Browsers & OS report.
See how adding a secondary dimension of Source / Medium helps us narrow down the data to check exactly what’s going on? This is a useful technique to learn and use.
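Coming back to the point that Measurement Protocol hits need campaign information to avoid being flagged as Direct, here is a sketch of what a decorated hit body might look like. It uses the Universal Analytics Measurement Protocol parameter names (cs, cm and cn for campaign source, medium and name); the tracking ID, client ID and campaign values are placeholders, and the code only builds the payload rather than sending it.

```python
from urllib.parse import urlencode


def build_hit(tracking_id: str, client_id: str, page: str,
              source: str, medium: str, campaign: str) -> str:
    """Build a URL-encoded pageview payload that carries campaign data."""
    payload = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder in the usage below)
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dp": page,          # document path
        "cs": source,        # campaign source
        "cm": medium,        # campaign medium
        "cn": campaign,      # campaign name
    }
    return urlencode(payload)
```

Something like `build_hit("UA-XXXXX-Y", "555", "/status", "fridge", "iot", "kitchen")` produces a body that would show up in reports under fridge / iot instead of (direct) / (none), assuming the device posts it to the collection endpoint.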
This essay has shown how GA decides where a user came from. You’ve seen how this can work and you’ve seen how this can fail. Knowing these details, plan a review of your traffic source data. Do some gross error checking. Do some calibration. Check your data and build confidence in the numbers.
If you find any holes, you’re better armed with explanations and fixes. You may find more value in certain channels and optimisation opportunities in others.
You’re on your way to using data more wisely. Good!