Web analytics is all about making decisions from the data. But how can you be sure of the quality of the data you investigate, and of the recommendations you provide from it? Whilst the numbers may be accurate and reflect what happened on your site thanks to a successful tagging implementation, are they statistically significant? Furthermore, once you've uncovered what's a fluke and what's not, how can you illustrate this succinctly in your reporting?
Unfortunately, with a few limited exceptions, vendors don't provide any indication of the robustness of the data in their consoles. Wouldn't it be great if, for a selected time period, you could segment your site and see that, although there's a marked difference in conversion (or your metric of preference), it's only significant at the 50% level? Or, alternatively, that what appears to be only a small difference in bounce rate is actually statistically significant? Until that day comes, though, you need to be able to do it yourself. Avinash wrote about a couple of related topics a while back - applying statistical limits to reporting and an overview of statistical significance. In a more recent post, Anil Batra highlights the importance of not rushing to pick the winning page from an A/B test. And in the last few days, Alec Cochrane has written a great piece on how to improve the statistical significance of a model based on web analytics data.
There are plenty of statistical tests out there, each suited to different datasets and situations, but for the purposes of this post I'll focus on just two, both of which are listed here amongst others.
The two-proportion z-test compares whether the specified proportions of two samples are statistically different from one another.
This test can be applied to a number of reports within web analytics, but its main use would be for comparing the click-through rate or response rate of two campaigns to determine whether one is conclusively better than the other. The beauty of this test is that it requires only four values - the two proportions (as percentages) and the two sample sizes - and as such it can be calculated without the use of a spreadsheet.
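If you'd rather script the calculation than work it through by hand, here's a minimal sketch of the two-proportion z-test in Python; the campaign figures in the usage example are invented purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-proportion z-test: are two observed rates statistically different?

    p1, p2 -- observed proportions (e.g. click-through rates) as decimals
    n1, n2 -- the corresponding sample sizes (e.g. impressions per campaign)
    """
    # Pooled proportion under the null hypothesis that the two rates are equal
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical example: campaign A converts 4.2% of 5,000 visits,
# campaign B converts 3.6% of 4,800 visits
z, p = two_proportion_z_test(0.042, 5000, 0.036, 4800)
print(f"z = {z:.2f}, p = {p:.3f}")  # significant at the 95% level only if p < 0.05
```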
The second test is the two-sample t-test, which determines whether the means of two samples are statistically different from each other, given the two sample sizes and sample standard deviations.
Because it requires the standard deviations of both samples, this test takes more time to compute, as the user needs to download the data series in question. It has a variety of uses: for example, comparing whether the average values of a given metric for two segments are statistically different, or looking at the same data series before and after an external event to determine whether that event has had a statistically significant effect on the data.
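As a rough sketch, the calculation might look like the following in Python. I've used Welch's variant of the two-sample t-test, which doesn't assume equal variances (the post doesn't specify a particular variant, so that choice is an assumption), and the before/after figures are invented for illustration.

```python
from math import sqrt


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-test (Welch's variant, which does not assume equal variances).

    mean1, mean2 -- sample means (e.g. average pages per visit for each period)
    sd1, sd2     -- sample standard deviations
    n1, n2       -- sample sizes
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se ** 4 / (
        (sd1 ** 2 / n1) ** 2 / (n1 - 1) + (sd2 ** 2 / n2) ** 2 / (n2 - 1)
    )
    return t, df  # compare |t| against the critical value for df (t-table or scipy)


# Hypothetical example: average pages per visit before and after a site redesign
t, df = welch_t_test(4.1, 2.3, 900, 3.8, 2.1, 850)
print(f"t = {t:.2f}, df = {df:.0f}")
```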
Now that you're confident you know which results are statistical flukes and which aren't, how do you go about illustrating this in your reporting? One option would be to include the t-test results and significance levels in your reporting, but this is likely to clutter your reports and potentially confuse the reader. A neater way might be to colour-code the values to illustrate their confidence level, if you're happy to introduce different text colours to your report. For time series data, you can add the mean and the upper and lower bounds to a graph, to show which peaks and troughs merit further investigation.
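As a simple sketch of the time series approach, the bounds could be computed along these lines; the two-standard-deviation threshold and the daily visit figures are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def control_limits(series, num_sd=2):
    """Mean and upper/lower bounds for a time series, so that points
    falling outside the band can be flagged for further investigation."""
    m = mean(series)
    s = stdev(series)
    return m, m + num_sd * s, m - num_sd * s


# Hypothetical daily visit counts
visits = [1040, 980, 1125, 990, 1010, 1430, 955, 1005]
centre, upper, lower = control_limits(visits)
outliers = [v for v in visits if v > upper or v < lower]
print(f"mean = {centre:.0f}, band = [{lower:.0f}, {upper:.0f}], flagged: {outliers}")
```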
Of course, once you've come up with a clear and succinct way of displaying this statistical information, you still need to explain it to your stakeholders, not all of whom will have a background in statistical analysis. Demonstrating the robustness of the data to them, and how the varying levels of robustness are determined, will not only provide extra confidence in the decisions you're recommending from the data, but also illustrate the importance of asking constructive questions of the data, rather than slavishly following what it suggests at first glance.
Images from Wikipedia.org
Hi Lawrence,
Thanks for the mention :)
I like the idea of including the confidence levels in the reports, colour coded. It's sometimes difficult explaining to business people the maths and statistics involved in things like this - making it easy for them, without their having to think about what's going on in the background, can only be a good thing.
Alec
Hi Lawrence,
You mentioned in your post that the z-test "can be applied to a number of reports within web analytics" - I was wondering if you could elaborate on that. I definitely agree that it would be appropriate to use this for A/B tests or campaigns that have test/control groups, where you do get random sampling. But would it also be appropriate to use, for example, on traffic sources reports, such as comparing conversion rates between Yahoo and Bing, or conversion rates for keywords? http://www.michaelwhitaker.com/blog/wp-content/uploads/2011/05/bingyahoo.png
Many thanks,
Michael