Google Analytics Site Speed considered misleading

… maybe one of their mathematicians can explain the difference between mean and median to the marketing folks?

Google Analytics has a very handy Site Speed feature which tracks how long your users' browsers took to load the page. Unfortunately, all of the timing reports make a novice statistical mistake by reporting the average (mean) rather than a more robust metric such as the median or the 90th percentile. Many people have heard that averages are sensitive to outliers, but it's easy to forget just how badly a reported average can misrepresent something as variable as Internet traffic. Here are two pictures showing why it's not even worth looking at the Site Speed value:

World map of load times for a single AJAX request: note the United States at 5.9 seconds!
Drilling down revealed a US average of around 0.3 seconds for every state except New Jersey, and even there the high average was limited to one small town with a shockingly high average of 47 seconds! Fortunately, the data we actually need is available: the performance tab displays the distribution of timings, allowing us to see that even when considering only traffic from that same town, the vast majority (97%) of requests loaded in a tenth of a second or less and 99% loaded in under one second.

Since the outlying values occurred only a single-digit number of times globally and are extremely high (does anyone really wait an hour for a web page to load?), it's almost certain that they reflect some sort of measurement error in the browser. This is to be expected on the Internet: Flickr famously observed a reported load time which pre-dated that page being added to the site. It's why you need to use something like a 95th percentile or a histogram for any kind of real-world performance reporting, so you can measure and act on values which are representative of what most of your users experience rather than wasting time chasing chimeras.
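If you export raw timings and want the same kind of distribution view yourself, a rough sketch might look like the following. The bucket boundaries and the sample timings are made up for illustration (they are not Google Analytics' actual buckets or data); the sample was chosen only to mirror the 97% and 99% proportions mentioned above.

```python
from bisect import bisect_right

# Hypothetical bucket upper bounds in seconds (an assumption, not GA's real buckets).
BUCKET_BOUNDS = [0.1, 1.0, 10.0, 60.0, 3600.0]


def share_at_or_below(samples, bounds=BUCKET_BOUNDS):
    """Return {bound: fraction of samples that are <= bound}."""
    ordered = sorted(samples)
    n = len(ordered)
    # bisect_right counts how many sorted values are <= the bound.
    return {bound: bisect_right(ordered, bound) / n for bound in bounds}


# Synthetic timings: mostly fast requests plus one absurd value.
timings = [0.05] * 970 + [0.5] * 20 + [5.0] * 9 + [2820.0]

shares = share_at_or_below(timings)
for bound, share in shares.items():
    print(f"<= {bound:>6.1f}s: {share:.1%}")
```

With this synthetic sample, 97% of requests fall at or below 0.1 seconds and 99% at or below one second, which is the part of the picture a single average hides.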

In summary: 3 data points out of 213,000 are enough to skew the average by a factor of 10 or more. When using Google Analytics, pretend the summary page doesn't exist and look at the performance distribution instead.
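To get a feel for the arithmetic, here is a minimal sketch using entirely synthetic numbers: 212,997 requests at 0.1 seconds plus three bogus 20-hour timings. Only the 3-in-213,000 ratio comes from the report above; the individual values are assumptions, and `summarize` is a hypothetical helper rather than anything Google Analytics provides.

```python
import statistics


def summarize(samples):
    """Return the mean, median and nearest-rank 95th percentile (seconds)."""
    ordered = sorted(samples)
    n = len(ordered)
    # Nearest-rank percentile: smallest value with at least 95% of samples
    # at or below it. The rank is the integer ceiling of 0.95 * n.
    rank = (95 * n + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# Synthetic data (assumed values, not the real Analytics export).
typical = [0.1] * 212_997   # ordinary requests
bogus = [72_000.0] * 3      # three measurement errors of 20 hours each

clean = summarize(typical)
skewed = summarize(typical + bogus)

for label, stats in (("clean", clean), ("skewed", skewed)):
    print(
        f"{label:>6}: mean={stats['mean']:.3f}s "
        f"median={stats['median']:.3f}s p95={stats['p95']:.3f}s"
    )
```

With these made-up numbers the mean rises from 0.1 seconds to roughly 1.1 seconds, while the median and the 95th percentile stay at 0.1 seconds.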
