How To Get Past Last-Touch Attribution With Google Analytics
Posted by willcritchlow
In last week’s Whiteboard Friday “Kill the Head or Chase the Tail”, Rand and I started by discussing how to gain true insight into what kind of keywords are leading people to discover your brand and ultimately driving conversions for your business (clue: it’s probably not branded search phrases, despite what your analytics reports are telling you). Today, I’m going to demonstrate one way of measuring this more accurately in Google Analytics.
The problem is well described by the ever-excellent Avinash Kaushik in his post Measuring Upper Funnel Keywords (although nominally about paid search, his description applies equally well to natural search, except that you aren’t paying for the traffic in the same way). It can be summarised by thinking about all those reports we have all seen showing branded search terms as the best-converting keywords. While this is true in the sense that the individual finally converted after searching for the brand, it’s clearly not how they found out about your services. For the purposes of setting strategy, you need a more detailed understanding of the “visitor acquisition” channels that eventually lead to conversions. Sam’s superb post on SEOmoz’s conversion rate lessons from 2009 touches on this in point 2.
Enter multi-touch analytics tracking.
Most analytics packages use last-touch attribution by default, meaning that conversions are allocated to the most recent source of a visit for that visitor. Here we are interested in first-touch or even multi-touch attribution models, to understand how visitors are influenced over time by repeated visits to the site. If you are interested in analytics packages that can track multiple touches ‘out of the box’, I recommend reading John Santangelo’s YOUmoz post on Google Analytics alternatives.
First-touch tracking in Google Analytics
Patrick at Blogstorm has written about overriding last-click attribution (something I also discussed in my presentation Analytics Every SEO Should Know, which Scott linked to from the Whiteboard Friday). But this method only works when you can specify the exact URL of the landing page, including parameters, because it relies on the utm_nooverride parameter. This works fine for email and PPC traffic, but doesn’t help with tracking organic search traffic.
For this, we need a slightly more involved method.
In my presentation, I touched on the function setVar and a custom function called superSetVar, but in the updates announced in October last year, the GA team released a new function called setCustomVar, which is now the best one to use. For this purpose, we want to track variables at the visitor level.
In your GA tracking code, you want to check for the presence of the __utma cookie, which will be present only if the user is a returning visitor. If it is not present, use the JavaScript variable document.referrer to set a visitor-level custom variable (named something like “original referrer”) and use location.pathname to set a second visitor-level custom variable (named something like “original landing page”). Take care not to re-use custom variable slots you are using elsewhere in your analytics.
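A minimal sketch of that check, assuming the asynchronous `_gaq` snippet. In a real page this would be JavaScript that runs before the first pageview is tracked; here it is modelled as a Python function so the logic can be tested. It takes the cookie string, `document.referrer` and `location.pathname` as inputs and returns the `_setCustomVar` commands it would push. The slot numbers 1 and 2 and the `(direct)` placeholder for an empty referrer are assumptions; the trailing `1` is the ga.js scope value for visitor-level variables.

```python
# Hypothetical Python model of the first-touch check; in a real page this runs as
# JavaScript before the first pageview is tracked.
VISITOR_SCOPE = 1  # ga.js custom-variable scopes: 1 = visitor, 2 = session, 3 = page


def parse_cookies(cookie_header):
    """Turn a document.cookie-style string ("a=1; b=2") into a dict."""
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def first_touch_commands(cookie_header, referrer, pathname,
                         referrer_slot=1, landing_slot=2):
    """Return the _gaq commands to queue on a first visit, or [] for a returning visitor."""
    if "__utma" in parse_cookies(cookie_header):
        # Returning visitor: leave the values recorded on their first visit alone.
        return []
    return [
        ["_setCustomVar", referrer_slot, "original referrer",
         referrer or "(direct)", VISITOR_SCOPE],  # "(direct)" is an assumed placeholder
        ["_setCustomVar", landing_slot, "original landing page",
         pathname, VISITOR_SCOPE],
    ]


if __name__ == "__main__":
    print(first_touch_commands("", "http://www.google.com/search?q=blue+widgets", "/widgets"))
```

Choose slots that are free in your profile. Because the variables are visitor-level, GA keeps attaching the first-visit values to that visitor’s later sessions, which is what lets you report conversions against them.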
You will probably then want to add a filter to your analytics profile to convert the raw referrer into referring keywords, using a filter like this one for getting detailed PPC keyword information (but without restricting it to PPC traffic). You might also want to pull the original source (which you can work out from the referrer and landing page) out into a separate variable.
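If you would rather parse the stored referrer after exporting it than with a profile filter, a rough Python sketch of splitting it into a source and a keyword might look like this. The engine list and query-parameter names (`q` for Google and Bing, `p` for Yahoo!) are assumptions, and a referrer alone can’t tell you whether a click was paid or organic.

```python
from urllib.parse import urlparse, parse_qs

# Assumed hostname fragments and the query parameter each engine used for the keyword.
SEARCH_ENGINES = {"google.": "q", "bing.": "q", "yahoo.": "p"}


def source_and_keyword(referrer, own_host):
    """Split a stored 'original referrer' into (source, keyword); keyword is None if unknown."""
    if not referrer or referrer == "(direct)":
        return ("direct", None)
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    if host == own_host or host.endswith("." + own_host):
        return ("internal", None)
    for fragment, param in SEARCH_ENGINES.items():
        if fragment in host:
            values = parse_qs(parsed.query).get(param)  # parse_qs also decodes '+' to space
            return (fragment.rstrip("."), values[0] if values else None)
    return (host, None)  # any other site: the referring hostname is the source


print(source_and_keyword("http://www.google.com/search?q=blue+widgets", "example.com"))
```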
With all this set up, you will be able to run conversion reports by original keyword for a given original source and see conversion information based on first-click attribution. I would expect you to see the long tail contributing far more than it does in the standard reports, and branded search much less (not zero, of course: there will still be first-touch branded searches driven by PR, offline marketing etc.).
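To see why the two models tell such different stories, here is a small Python sketch that credits each conversion to its first and to its last touchpoint. The visitor paths and keywords (including the “acme” brand) are made up for illustration.

```python
from collections import Counter


def conversions_by_touch(visitors):
    """visitors: (touchpoints_in_visit_order, converted) pairs.
    Returns (first_touch_counts, last_touch_counts) over the visitors who converted."""
    first, last = Counter(), Counter()
    for touches, converted in visitors:
        if converted and touches:
            first[touches[0]] += 1
            last[touches[-1]] += 1
    return first, last


# Hypothetical paths: "acme widgets" plays the role of the branded search.
visitors = [
    (["blue widget reviews", "acme widgets"], True),
    (["acme widgets"], True),
    (["cheap blue widgets", "widgets", "acme widgets"], True),
    (["widgets"], False),
]
first, last = conversions_by_touch(visitors)
print("last touch:", last.most_common())    # every conversion lands on the branded term
print("first touch:", first.most_common())  # credit spreads across three different keywords
```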
Multi-touch attribution modelling
If you are feeling especially hardcore, you can dig even deeper into this whole mess by attempting to capture multiple touchpoints. The idea here is to attribute conversions not only to first and last touches but also to give so-called assists to touchpoints along the way (e.g. a conversion path could look like long-tail keyword > head keyword > branded search > direct visit; in this scenario, you might want to give the head and branded searches some credit for the conversion).
This becomes especially important if different departments contribute to the marketing: you would like to be able to give some credit to the departments that bring the visitor in, some to the channels that keep the visitor returning and some to the channel that finally converts them.
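One way to share credit along a path is a “position-based” split: fixed shares for the first and last touches, with the remainder divided equally among the assists in between. The 40/20/40 default below is a common convention and an assumption of this sketch, not something this post prescribes; a linear model (equal credit per touch) would be just as valid a starting point.

```python
from collections import defaultdict


def position_based_credit(touches, first_share=0.4, last_share=0.4):
    """Split one conversion's credit across its path: fixed shares for the first and
    last touches, with the remainder shared equally as 'assists' among the middle ones."""
    if not touches:
        return {}
    if len(touches) == 1:
        return {touches[0]: 1.0}
    credit = defaultdict(float)
    middle = touches[1:-1]
    if not middle:
        # No assists: split everything between first and last in proportion to their shares.
        total = first_share + last_share
        credit[touches[0]] += first_share / total
        credit[touches[-1]] += last_share / total
        return dict(credit)
    credit[touches[0]] += first_share
    credit[touches[-1]] += last_share
    assist = (1.0 - first_share - last_share) / len(middle)
    for touch in middle:
        credit[touch] += assist
    return dict(credit)


path = ["long-tail keyword", "head keyword", "branded search", "direct visit"]
print(position_based_credit(path))
```

Summing these per-conversion dictionaries across all conversions gives each channel (or department) its share of the credit.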
I haven’t set this up with the new GA functions, but the basic process would involve writing something similar to the superSetVar function for the new setCustomVar. The idea would be to stuff repeat-visit information into the custom variables. This information is almost certainly unusable via the interface, and you will likely need to export it to Excel and play there (most likely with Pivot Tables; you all know how much I love them). It’s been a little while since we ran a conference call (that link is to a recording of the one I did on Excel), but I’m planning the next one, so go and sign up if you aren’t already on that mailing list.
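Because setCustomVar overwrites a slot’s value, a superSetVar-style approach would need to read the stored value, append the current visit’s touchpoint and write it back. A hypothetical Python sketch of the append step: the `|` delimiter and the `max_chars` cap are placeholders (custom variables have a documented length limit, so check the current figure), and when the value overflows it drops the oldest middle touches so the first touch survives.

```python
def append_touch(stored_value, touch, sep="|", max_chars=64):
    """Append one visit's touchpoint to a delimited custom-variable value.
    If the result is too long, drop the oldest middle touches but keep the first touch.
    max_chars is a placeholder, not the real GA limit."""
    touches = stored_value.split(sep) if stored_value else []
    touches.append(touch.replace(sep, " "))  # keep the delimiter out of the data
    while len(touches) > 2 and len(sep.join(touches)) > max_chars:
        del touches[1]
    return sep.join(touches)[:max_chars]


value = ""
for visit in ["google:blue widget reviews", "google:widgets", "(direct)"]:
    value = append_touch(value, visit)
print(value)
```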
If you’re hardcore enough to really want this information, you can probably work out the details! If anyone has done it and wants to write up detailed instructions, I’ll happily update this post with a link to your explanation.
View-through conversions
The missing piece of the puzzle if you are doing multi-touch attribution modelling is giving ‘assists’ to branding events such as the viewing of a display advert (without a clickthrough). Rich, our PPC guru at Distilled, wrote an introduction to Google’s viewthrough conversion metric.
There are all kinds of privacy concerns in extending this further, but the data needed to do this across whole platforms is out there (e.g. understanding the search funnels that eventually led to your site). The signs are that we are going to get ever more information like this, particularly from Google, which is obviously always looking for ways to persuade its customers to spend in areas outside (the generally cheaper) branded search!
I love analytics and statistics, so I’d love to hear your favourite tips and tricks in the comments.
I’m sure future conference calls in my schedule will involve analytics tips and tricks so go ahead and sign up if you’d like to hear when they are running. You also might be interested in a post I wrote about integrating Google Website Optimizer with Google Analytics on SearchEngineLand.
Illustrating the Long Tail
Filed under: SEO, Search Engine Marketing, Search Engines
Posted by randfish
The long tail of search demand has been around since the dawn of web search and, since that time, search marketers have been attempting to tap into the powerful stream that high quantities of unique content can provide. I recently came across some great data from Hitwise (about 1 year old, but still highly relevant) showing off just how substantive the long tail can be. Bill Tancer’s post – Sizing Up the Long Tail – gives some stats:
…the head and body together only account for 3.25% of all search traffic! In fact, the top terms don’t account for much traffic:
• Top 100 terms: 5.7% of all search traffic
• Top 500 terms: 8.9% of all search traffic
• Top 1,000 terms: 10.6% of all search traffic
• Top 10,000 terms: 18.5% of all search traffic
This means that if you had a monopoly over the top 1,000 search terms across all search engines (which is impossible), you’d still be missing out on 89.4% of all search traffic. There’s so much traffic in the tail it is hard to even comprehend. To illustrate, if search were represented by a tiny lizard with a one-inch head, the tail of that lizard would stretch for 221 miles.
Top 10,000 Search Terms by Percentage of All Search Traffic
The truth is my research is still greatly understating the true size of the tail because:
• The Hitwise sample contains 10 million U.S. Internet users and a complete data set would uncover much larger portions of the long tail.
• The data set I used filtered out adult searches.
• I only looked at three months’ worth of data (which were some of the slower months for search engines).
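Because the quoted figures are cumulative, the share contributed by each band of terms is the difference between consecutive rows. A quick Python check using only Bill’s numbers:

```python
# Cumulative share of all search traffic captured by the top-N terms (Hitwise figures quoted above).
CUMULATIVE_SHARE = [(100, 5.7), (500, 8.9), (1_000, 10.6), (10_000, 18.5)]


def band_shares(cumulative):
    """Convert cumulative top-N shares into (first_rank, last_rank, share) bands.
    The final band (last_rank None) is everything outside the top N."""
    bands, prev_n, prev_share = [], 0, 0.0
    for n, share in cumulative:
        bands.append((prev_n + 1, n, round(share - prev_share, 1)))
        prev_n, prev_share = n, share
    bands.append((prev_n + 1, None, round(100.0 - prev_share, 1)))
    return bands


for first_rank, last_rank, share in band_shares(CUMULATIVE_SHARE):
    label = f"{first_rank:,}+" if last_rank is None else f"{first_rank:,}-{last_rank:,}"
    print(f"terms {label}: {share}% of search traffic")
```

Terms ranked 1,001 to 10,000 add 7.9 points, and everything outside the top 10,000 accounts for the remaining 81.5%.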
To help put this in perspective, I made a few spiffy charts that can help to illustrate these points:
In this first chart, you can see a representation of Hitwise’s data from the four chunks Bill broke down.
In this next representation, I’m showing the classic “long tail” style curve, but color-coded to help show the various areas of keyword demand. Note that you could conceptually argue that 9,000 of the top 10,000 terms (those ranked 1,001 to 10,000) belong in the chunky middle. Bill classified them that way in his post, but I tend to think that at those demand levels, we’re still talking about “head” of the curve figures.
For both of these graphics, there’s a large, high-res version available by clicking the chart. You can find lots more on our Free Charts page.
Google Link: Command – Busting the Myths
Filed under: Search Engine Marketing, Search Engines
Posted by randfish
I’m a big Google fan – my wife often sleeps in their t-shirts, I speak on panels with Googlers all the time and I’ve even got a Google water bottle for working out (which happens all of once a month these days). However, I am NOT a fan of the Google link command, and I’m shocked by the number of folks who operate in and around the SEO, webdev and technology industries who haven’t realized how unreliable it is.
Here’s what Google themselves have to say on the matter:
You can perform a Google search using the link: operator to find a sampling of links to any site. For instance, [link:www.google.com] will list web pages that have links pointing to the Google home page. Note there can be no space between the “link:” and the web page URL.
To see a much larger sampling of links to any verified site in Webmaster Tools:
- On the Webmaster Tools Home page, click the site you want.
- Under Your site on the web, click Links to your site.
Note: Not all links to your site may be listed. This is normal.
Here’s what Matt Cutts (head of Google’s Webspam team) had to say in a video on the subject:
The short answer is that historically, we only had room for a very small percentage of backlinks because web search was the main part and we didn’t have a ton of servers for link colon queries and so, we have doubled or increased the amount of backlinks that we show over time for link colon, but it is still a sub-sample. It’s a relatively small percentage. And I think that that’s a pretty good balance, because if you just automatically show a ton of backlinks for any website then spammers or competitors can use that to try to reverse engineer someone’s rankings.
Google itself is telling us not to pay too much attention to the link command, but that doesn’t seem to be stopping folks. Let the myth-busting commence.
Myth #1 – The Google Link Command Returns Accurate Numbers
Nope. Not even close. Google itself says the numbers aren’t accurate and that it’s showing a small sub-sample. The numbers bear this out, too. Compare the link count from the Google link command with the number inside Google’s Webmaster Tools (once you verify your site, you’ll see it shown). Here are the stats for SEOmoz, for example:

Google’s link command claims 1,590 links. Let’s see what Webmaster Tools says:

Hmm… 381,403 seems slightly larger than 1,590. In fact, the link command is showing me 0.4% of what Webmaster Tools says exists. Running this analysis on a few other domains that we have access to in Webmaster Tools, I saw numbers ranging from 0.1% to 4.4% (meaning there isn’t even a consistent ratio between the two counts).
Myth #2 – The Google Link Command Returns Important Links
Tragically, a long time ago (pre-2004), Google did show only important links via the link: command, which created the myth that exists to this day. In fact, the links shown in the link: command have no particular importance or relevance. They are truly a random sample, including links that are nofollowed, links from pages that have had PageRank penalties applied to them, and links that do pass link juice and value.
Myth #3 – The Google Link Command Returns Links in Some Kind of Order
No one in SEO has been able to show any ordering of any kind in the Google link: command’s results. Important, well-known websites may be listed on page 2 or page 20 of the results, and the same goes for spam, scrapers and low-quality sites that Google’s likely not counting. In Site Explorer and the web results, Yahoo! appears to do some type of ordering, tending to show more important links, pages and sites before less important ones (though not with great consistency). Unfortunately, many SEOs suspect that, should Microsoft’s deal to power Yahoo! with Bing results go through, Yahoo! is unlikely to maintain its own web index (and thus link, linkdomain and Site Explorer will be gone).

As exemplified above, Google appears to be very random indeed when showing link: results.
Myth #4 – The Google Link Command Returns a Numerically Representative Count of Links
This is possibly the most disturbing myth of all, primarily because so many operators in the SEO field believe it and track the link: command count as a reliable, useful metric. Nothing could be further from the truth – and here’s some data to help back it up:
| Root Domain | Google Link: # | Yahoo! Linkdomain # | Linkscape Count |
|---|---|---|---|
| Yahoo.com | 3,650 | 331,000,000 | 201,681,667 |
| Recovery.gov | 7,550 | 328,000 | 155,780 |
| Facebook.com | 165,000 | 567,000,000 | 116,748,934 |
| Real.com | 11,400 | 4,600,000 | 5,596,165 |
| Adobe.com | 51,200 | 124,000,000 | 78,550,468 |
| Reddit.com | 18,300 | 128,000,000 | 29,071,291 |
| Twitter.com | 224,000 | 515,000,000 | 132,528,763 |
| Salon.com | 12,300 | 3,420,000 | 1,535,342 |
| SEOmoz.org | 1,590 | 957,000 | 486,405 |
| NYTimes.com | 7,990 | 21,200,000 | 12,884,758 |
| TurkeyDayRun.com | 3 | 68 | 22 |
| Ninme.com | 539 | 42,000 | 3,149 |
| Burgerking.com | 942 | 106,000 | 23,761 |
| Alaskaair.com | 1,010 | 44,000 | 38,358 |
| Smashingmagazine.com | 8,730 | 1,130,000 | 592,054 |
| Smithsonian.org | 4,860 | 25,700 | 14,545 |
I collected the data above on the spur of the moment, so I won’t try to claim great statistical integrity. However, looking at Google’s link: command results, the best I can say is that Google has some relationship to the others within 1-2 orders of magnitude, though they may be directionally inaccurate much of the time as well. Just look at NYTimes.com, for example: Google claims it has two-thirds of the links that Salon.com has, yet Yahoo! and Linkscape agree that, in fact, NYTimes.com has more than 6x Salon.com’s link total.
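To put numbers on that inconsistency, this Python sketch uses only the counts from the table above: it expresses each domain’s Google link: count as a percentage of its Yahoo! linkdomain count, and repeats the NYTimes.com vs. Salon.com comparison for all three sources.

```python
# (Google link:, Yahoo! linkdomain, Linkscape) counts copied from the table above.
COUNTS = {
    "Yahoo.com": (3_650, 331_000_000, 201_681_667),
    "Recovery.gov": (7_550, 328_000, 155_780),
    "Facebook.com": (165_000, 567_000_000, 116_748_934),
    "Real.com": (11_400, 4_600_000, 5_596_165),
    "Adobe.com": (51_200, 124_000_000, 78_550_468),
    "Reddit.com": (18_300, 128_000_000, 29_071_291),
    "Twitter.com": (224_000, 515_000_000, 132_528_763),
    "Salon.com": (12_300, 3_420_000, 1_535_342),
    "SEOmoz.org": (1_590, 957_000, 486_405),
    "NYTimes.com": (7_990, 21_200_000, 12_884_758),
    "TurkeyDayRun.com": (3, 68, 22),
    "Ninme.com": (539, 42_000, 3_149),
    "Burgerking.com": (942, 106_000, 23_761),
    "Alaskaair.com": (1_010, 44_000, 38_358),
    "Smashingmagazine.com": (8_730, 1_130_000, 592_054),
    "Smithsonian.org": (4_860, 25_700, 14_545),
}
GOOGLE, YAHOO, LINKSCAPE = 0, 1, 2


def google_share(column):
    """Google link: count as a percentage of another source's count, per domain."""
    return {d: 100.0 * row[GOOGLE] / row[column] for d, row in COUNTS.items()}


def ratio(domain_a, domain_b, column):
    """How many times larger domain_a's count is than domain_b's in one column."""
    return COUNTS[domain_a][column] / COUNTS[domain_b][column]


shares = google_share(YAHOO)
low, high = min(shares, key=shares.get), max(shares, key=shares.get)
print(f"Google/Yahoo! ranges from {shares[low]:.4f}% ({low}) to {shares[high]:.1f}% ({high})")
for name, col in (("Google", GOOGLE), ("Yahoo!", YAHOO), ("Linkscape", LINKSCAPE)):
    print(f"{name}: NYTimes.com has {ratio('NYTimes.com', 'Salon.com', col):.2f}x Salon.com's links")
```

The Google share swings from about 0.001% (Yahoo.com) to roughly 19% (Smithsonian.org), and Google puts NYTimes.com at about 0.65x Salon.com while Yahoo! and Linkscape put it at roughly 6.2x and 8.4x.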
These are not numbers you want to hang your hat (or any crucial business decisions) on.
Myth #5 – The Google Link Command Tracks Accurately Over Time
Unfortunately, I don’t have data points I can show, but our observations over time indicate that Google’s link count in Webmaster Tools might rise, along with the Yahoo! and Linkscape link counts, yet the Google link: command will show lower numbers. The reverse is sometimes also the case. Without directional consistency, even when compared against their own counts, it’s very hard to take the Google link: count seriously.
Myth #6 – The Google Link Command is Up to Date
Most SEOs & webmasters have noticed that the Google link: counts update infrequently, inconsistently and most often in correlation with toolbar PageRank updates (another data point I’ll need to tackle in a future post). These updates from Google occur every 2-10 months with little warning about when they’re coming or have happened. If you watch sites like closely, they’ll report many of these as they occur.
The next time someone tells you their Google link: command numbers as a metric for SEO, competitive analysis or anything else, make sure they read this post. Google’s not nearly as up-front with the information as they should be (honestly, removing the link command would save so much time and effort for poor site owners who get needlessly confused), but hopefully as a community, we can help build more awareness around this issue.
Hulu is Seeing Record Numbers
comScore Video Metrix has released its monthly look at the performance of online video content properties. As usual, Google sites dominate the picture, largely because of YouTube, which gets 99% of Google’s video views.
The real story, however, is that Hulu is achieving record numbers. The site ranked number 2 during the month of October (though significantly behind Google, with 3.1% market share compared to Google’s 37.7%), with an all-time high of 856 million videos viewed.
On top of that, the average Hulu viewer watched 20.1 videos during the month, representing another record for the site. This amounts to about 2 hours of videos per viewer.
Here’s a look at the top ten online video content properties for the month of October:

Some other highlights (not Hulu-specific) from comScore’s findings:
- The top video ad networks in terms of their actual delivered reach were: BrightRoll Video Network with 16.5 percent penetration of online video viewers, Tremor Media Video Network with 15.5 percent, and BBE with 13.6 percent.
- 84.4 percent of the total U.S. Internet audience viewed online video.
- The average online video viewer watched 10.8 hours of video.
- 125.3 million viewers watched nearly 10.4 billion videos on YouTube.com (83.1 videos per viewer).
- 41.1 million viewers watched 313.5 million videos on MySpace.com (7.6 videos per viewer).
- The duration of the average online video was 3.9 minutes.
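A quick sanity check on the per-viewer figures, using only the numbers quoted above. The YouTube result comes out at about 83.0 rather than 83.1, presumably because “nearly 10.4 billion” is itself rounded. The implied Hulu audience and minutes per video are derived estimates, not comScore figures, and treat “about 2 hours” as exactly 120 minutes.

```python
# Figures quoted from comScore's October report above: (videos viewed, unique viewers).
REPORTED = {
    "YouTube.com": (10.4e9, 125.3e6),  # "nearly 10.4 billion" videos
    "MySpace.com": (313.5e6, 41.1e6),
}
HULU_VIDEOS = 856e6
HULU_VIDEOS_PER_VIEWER = 20.1
HULU_MINUTES_PER_VIEWER = 120  # assumption: "about 2 hours" taken as exactly 120 minutes


def videos_per_viewer(videos, viewers):
    return videos / viewers


for site, (videos, viewers) in REPORTED.items():
    print(f"{site}: {videos_per_viewer(videos, viewers):.1f} videos per viewer")

implied_hulu_viewers = HULU_VIDEOS / HULU_VIDEOS_PER_VIEWER  # derived estimate
print(f"Hulu: roughly {implied_hulu_viewers / 1e6:.1f} million viewers implied")
print(f"Hulu: roughly {HULU_MINUTES_PER_VIEWER / HULU_VIDEOS_PER_VIEWER:.1f} minutes per video")
```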
It is worth noting that Nielsen released some data last week, which put Facebook in third place among video sites, just behind YouTube and Hulu. According to them, Facebook had about 217.8 million streams in October. Make of that what you will.
Advanced Link Analysis Charts
Posted by willcritchlow
Bored of sorting massive lists of links in all kinds of different directions to understand the link profile of a new site?
Struggle to understand how to gather actual insights about link profiles from lists of thousands of links and persuade management of the actions needed?
Don’t panic. Help is at hand.
I’m going to share some data visualisation tips today that I reckon I could use to beat up on Rand in a presentation-off (umm, again). We have recently been doing some deep dives into clients’ and prospects’ link profiles, which gave me an excuse to mash up some Linkscape API data in Excel. I’ve used Linkscape data, but you could use any link analysis tool you like as long as you can get some metric to sort the linking domains by (I have used domain mozTrust in most of the examples below). Equally, I’ve used Excel, but you can use any data analysis package you like. If you want to use Excel, you will need the Analysis ToolPak add-in (for the Histogram tool).
I’ll get into how to make the charts in a minute, but first I’m going to just show you some pretty pictures:
Impress the boss
This one is of questionable use (I think there are better ways of actually visualising the data) but it’s pretty, and bosses like pretty (allegedly). This is a surface chart of the number of linking domains by domain mozTrust, shown across 4 data points – all links, links to the homepage and links to the next two strongest pages:

The bit of insight this does give us at a glance is that the vast majority of the site’s very low domain mozTrust (DmT) links go to the homepage, and that the most trusted domains linking to the site (those at the very top of the DmT scale) don’t link to the homepage or the next two strongest pages.
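The data behind these charts is just a histogram per target page. Here is a minimal Python sketch that mirrors how Excel’s Histogram tool assigns values to bins (each bin counts values above the previous edge and up to and including its own), then builds an “all links” row plus one row per target page. The input format (one (target page, linking-domain DmT) pair per linking domain) and the page names are hypothetical.

```python
import bisect
from collections import defaultdict


def excel_style_histogram(values, bins):
    """Count values like Excel's Histogram tool: each bin holds values above the previous
    edge and up to (and including) its own edge; anything above the last edge is 'More'.
    `bins` must be sorted in ascending order."""
    counts = {edge: 0 for edge in bins}
    counts["More"] = 0
    for value in values:
        i = bisect.bisect_left(bins, value)  # index of the first edge >= value
        counts[bins[i] if i < len(bins) else "More"] += 1
    return counts


def histogram_by_target(links, bins):
    """links: one (target_page, linking_domain_dmt) pair per linking domain.
    Returns an 'all links' row plus one row per target page."""
    per_page = defaultdict(list)
    for page, dmt in links:
        per_page[page].append(dmt)
    rows = {"all links": excel_style_histogram([dmt for _, dmt in links], bins)}
    for page, values in per_page.items():
        rows[page] = excel_style_histogram(values, bins)
    return rows


links = [("/", 0.5), ("/", 1.0), ("/", 1.5), ("/location-a", 6.8),
         ("/location-b", 7.2), ("/", 9.0)]
for page, row in histogram_by_target(links, [1, 2, 3, 4, 5, 6, 7, 8]).items():
    print(page, row)
```

Each row can then be charted as one line (or one slice of the surface), which is all the charts in this post need.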
Here is the same chart showing only links to the homepage compared with all links, which shows the top end a little more clearly:

Gathering insights
I think this data is actually easier to see as a line chart like this (locations A and B are the top two strongest pages on the site after the homepage):

What we can just about see here are some bumps at the top end of the DmT scale in the light blue line, which is the same bit of insight I mentioned above.
Drilling down
Diving into this data to show only the top end of the DmT scale, we get:

And we see that although the homepage and these top two location pages are the most powerful pages on the site, they are not the ones with the links from the biggest / most trusted sites. This is an area for further examination that would be hard to discover by looking at endless lists of links.
This is just an example of the kind of insight you can gather. I’m showing off tools and techniques here rather than specific insights. I’ll leave you to do your own playing to discover interesting things about your clients and competitors. I didn’t know what I was going to find when I started diving into the data for this site. You likely won’t know either, but graphs are great discovery tools. Sometimes, of course, you find nothing of interest:

Comparing just the top two pages doesn’t give us any very meaningful insights except that the big links out at 6.5-7 DmT to location A probably explain why it’s more powerful than B. It might be more insightful at a lower granularity.
Equally, I haven’t yet learnt to understand the meaning that I am sure is buried in charts like this one:

This is the number of links to a whole site by the mR (mozRank) of the linking page. Like the mythical guys who can understand network traffic by watching LEDs blink on routers, I’d love to be able to look at this kind of chart and really understand things. The closest I’ve got so far is that I think these charts should look roughly smooth in the absence of manipulation. If we assume that the difficulty of acquiring a link is roughly correlated with its strength, and that we get links at a rate inversely proportional to their difficulty, then I think this chart should look roughly like a Poisson distribution:

Which this one does, so I’m happy.
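A rough way to compare a chart like this with that Poisson hunch: bin the linking pages’ mR values into whole-number bins (the same upper-edge-inclusive rule Excel’s Histogram tool uses), fit λ to the mean bin and compare observed with expected counts. Treating a continuous score like mR as Poisson-distributed integer bins is an approximation and an assumption of this sketch, and eyeballing the two columns is not a formal goodness-of-fit test. The scores are made up.

```python
import math
from collections import Counter


def integer_bins(values):
    """Bin scores into integer bins labelled by their upper edge (k-1 < x <= k),
    matching Excel's Histogram tool with bins 0, 1, 2, ..."""
    return dict(Counter(math.ceil(v) for v in values))


def poisson_expected(bin_counts):
    """Expected count per bin if the bin labels followed a Poisson distribution
    with the same mean as the observed data."""
    n = sum(bin_counts.values())
    lam = sum(k * c for k, c in bin_counts.items()) / n
    return {k: n * math.exp(-lam) * lam ** k / math.factorial(k)
            for k in sorted(bin_counts)}


scores = [0.5, 1.2, 1.8, 2.0, 2.5, 3.1]  # hypothetical linking-page mR values
observed = integer_bins(scores)
expected = poisson_expected(observed)
for k in sorted(observed):
    print(f"mR bin {k}: observed {observed[k]}, Poisson-expected {expected[k]:.2f}")
```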
Persuading management / bosses
The next thing some of these charts help with is making the case to management when you know something is true but they need more persuading. This next example takes two different sites (neither of them is the site above) that are in different industries but have remarkably similar link characteristics at the macro level (don’t ask me how I found these sites – I am just that sad). The spider chart shows how similar they are:

However, if we dig in a little further, we find quite a difference behind the scenes:

The red site seems to have loads more decent links (mR 4, 5, 6) than the blue site. So how does the blue site end up with similar domain metrics?
It’s all about the relatively small number of very powerful links the blue site has. Zooming in on mR 6 & 7 links:

If you were just to look at this chart, you might imagine that the red site was getting more juice passed via these links than the blue site is. However, you’d be fooled by the logarithmic scale. In terms of total juice passed by just these mR 6 and 7 links, the actual story is:

In other words, the blue site is competing almost purely on the basis of the big mR 7 links it has that the red site doesn’t. That’s kinda interesting in terms of strategy generation isn’t it?
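On a logarithmic scale, one point of mR represents a large multiple of link value, so a handful of mR 7 links can outweigh many more mR 6 ones. A sketch of the comparison, assuming each whole mR point is worth `base` times the one below (the base of 10 is a placeholder, not an official mozRank constant) and using made-up counts rather than the two sites’ real data:

```python
def relative_juice(links_by_mr, base=10):
    """Sum of count * base ** mR over the links. Assumes each whole mR point is worth
    `base` times the one below it; the base is a placeholder, not a published constant."""
    return sum(count * base ** mr for mr, count in links_by_mr.items())


red = {6: 20, 7: 0}   # hypothetical: many mR 6 links, no mR 7
blue = {6: 5, 7: 3}   # hypothetical: fewer links overall, but a few at mR 7
print(relative_juice(red), relative_juice(blue))
```

With these numbers the blue site comes out ahead even though the red site has more links in total, and the conclusion holds for any base above about 7.5 here.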
How do you do this analysis?
Pretty much everything in this post was generated using the histogram function in Excel running over Linkscape API data. It’s pretty straightforward with the online help. The only gotchas I noticed that you might need to know about were:
- Align the ‘bins’ (which are the x-axis values on most of the charts above) either with mR / mT intervals (e.g. 1, 2, 3, 4, …) or go much more granular (e.g. 0.1, 0.2, 0.3, ….). Anything in between tends to generate artifacts
- The bin range has to be on the same sheet as the data – if you try to pull in a bin range from another sheet, it fails silently
- If you want to do the surface chart, you need to do some interpolation between your points. In the examples above, I just did a linear interpolation (i.e. drawing a straight line between the different page levels) – so if the homepage has 100 mR 2 links and the next page has 50 mR 2 links, I just created 10 imaginary pages with 55, 60, 65, 70… mR 2 links to spread the surface out far enough to see it. This may not be the best way of doing things. I’d love to hear from anyone who has a better method
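The linear interpolation in the last point can be sketched in Python as follows: each imaginary page’s count in a bin sits on a straight line between the two real pages’ counts. The bins and counts are hypothetical, and `steps` sets how many imaginary pages to insert.

```python
def interpolate_rows(row_a, row_b, steps):
    """Return `steps` rows evenly spaced strictly between row_a and row_b,
    interpolating each bin (column) linearly."""
    return [
        [a + (b - a) * i / (steps + 1) for a, b in zip(row_a, row_b)]
        for i in range(1, steps + 1)
    ]


homepage = [100, 40, 10]   # hypothetical link counts in mR bins 2, 3 and 4
next_page = [50, 20, 5]
for row in interpolate_rows(homepage, next_page, steps=9):
    print(row)
```

With the homepage at 100 and the next page at 50 in a bin, nine steps give 95, 90, ..., 55.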
Thanks to foliovision for the photo from the ProSEO seminar.
Technorati Tags
linkbuilding, analysis, visualisation, visualization, data