Digital Analytics Review: 2010

Wednesday, December 15, 2010

Privacy, reputation and ethics

The public's grievances with tracking are not going away, fuelled by articles in the WSJ, extensions which block tracking and murmurings of a tracking ban. In an attempt to engage with and inform the public, the WAA have recently updated their code of ethics. In it they propose a list of statements that websites should agree to, centred on privacy, transparency, consumer control and education.

Although some believe that a ban is inevitable, if hard to enforce, we can begin to fight back by considering how a website owner's decision to monitor traffic responsibly can affect their reputation. The decision to adhere to the code or not will likely be affected by how concerned a site's visitors are with privacy and data security, as well as the policy/code's perceived cost (implementing and enforcing the relevant processes, displaying it on the site, etc.).

Although websites and their visitors vary, it's likely that in order to avoid the potential negative effects on reputation for a relatively small implementation cost, most would choose to publicly sign up to the code. With individual complaints now able to build momentum into public campaigns, websites need to take reputation management very seriously. Would publicly signing up to the WAA code pacify privacy campaigners? Not entirely - the code requires the public's trust that it is being faithfully enforced, and trust is one of the current stumbling blocks. This is why a clear, intuitive argument for tracking, backed up by the site's privacy policy and support for the code, is required: it makes a compelling case that tracking is in both parties' interests, and that upholding the principles of the code is too.

And yet not everyone is aware of this debate, and some have yet to take the decision. This is where the WAA needs to keep on evangelising, talking to the likes of the WSJ and putting across our side of the argument. We can do our bit by signing up to the code and improving our own sites' privacy policies. With both sides of the argument becoming more vocal, the number of those existing in blissful ignorance should soon diminish.

Tuesday, November 9, 2010

Using browser data as a net sophistication proxy

Today marks Firefox's 6th birthday, and what better way to celebrate than with a blogpost on browser data.

Within web analytics packages there are plenty of metrics and dimensions that describe user behaviour on your site. However, without resorting to external sources it's hard to build a picture of individuals and their characteristics as opposed to the behaviour they exhibit on your site. Even so, there are data within web analytics packages that can hint at these characteristics. One example of this is the browser breakdown report: a user's choice of browser works as a proxy for their level of internet sophistication.

Historically, we could say with a fair degree of confidence that those who didn't use Internet Explorer were more advanced users of the internet than those who did. In more recent years, although this still rings true it's not as black and white as it used to be, as IE's market share diminishes in light of the general public's increasing awareness of the alternatives. There are of course exceptions to this - those who use multiple browsers, or those using the internet in a work environment where their browser choice is restricted, although this can be overcome. However, in general those who use non-IE browsers are by definition exhibiting preferences that indicate their more sophisticated use of the internet.

This definition of sophistication can be improved by looking at browser versions rather than just browsers themselves. Doing this could give you an indication of how early adopters (those using dev or beta versions of browsers) interact with your site as opposed to luddites (those still on IE6), and gives you more flexibility in how you define people. The downside to this though is that you need to keep your definitions up to date, as browser updates now come thick and fast. And, of course, you don't have to stop there with your definitions - adding other dimensions (for example keywords used or keyword count) can further refine them.
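
As a rough illustration of the idea, the segment definitions could be written down as a small set of rules. This is only a sketch: the function name, the segment labels and the sample rows are all made up for the example, and the version rules would need revisiting as new browser releases appear.

```python
from collections import Counter


def sophistication_segment(browser: str, version: str) -> str:
    """Map a browser/version pair to an illustrative sophistication segment."""
    browser = browser.strip().lower()
    version = version.strip().lower()

    # Pre-release builds suggest an early adopter, whatever the browser.
    if any(tag in version for tag in ("dev", "beta")):
        return "early adopter"

    if browser == "internet explorer":
        major = version.split(".")[0]
        # IE6 and older: the "luddites" described above.
        if major.isdigit() and int(major) <= 6:
            return "luddite"
        return "mainstream"

    # Any other browser implies an active choice of an alternative.
    return "sophisticated"


# Made-up rows standing in for an export of the browser/version report.
visits = [
    ("Internet Explorer", "6.0"),
    ("Internet Explorer", "8.0"),
    ("Firefox", "3.6"),
    ("Firefox", "4.0 beta"),
    ("Chrome", "7.0"),
]

segment_counts = Counter(sophistication_segment(b, v) for b, v in visits)
print(segment_counts)
```

The same labels could then be used to build segments in whichever analytics tool you use, bearing in mind that they describe browser choice first and sophistication only by proxy.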

Creating segments based on these definitions can open up a lot of insights into your site behaviour and traffic sources. However, you need to bear in mind that it is a proxy, and first and foremost it describes the difference in behaviour of visitors using different browsers - so if you see some weird and wacky behaviour as a result of this, your first port of call should be to check how your site functions for this browser rather than put it down to users being less/more sophisticated than average. That said, with some common sense and imagination you can uncover plenty of interesting stuff using different interpretations of "standard" web analytics dimensions.

Tuesday, October 19, 2010

Improving Engagement

Engagement's back on the menu. Eric Peterson gave a webinar recently with an excellent overview of engagement and discussed some new white papers written to measure it. However, due to the ambiguous nature of engagement, these measurement techniques (despite being the best attempt so far) are quite complicated and need to be tailored to the individual website.

Historically we measured engagement using page views per visit, time on site etc, but there was no way to define positive or negative sentiment, or distinguish an engaged visitor from a lost one. Currently companies are trying to bring in other datasets at their disposal to help, such as social and Voice of the Customer, but it's still fuzzy and subjective. We need to spend more time thinking about what engagement is, rather than just thinking of it as "someone who digs my site". To what extent are current methodologies for engagement measurement capturing actual engagement? Are we using the metrics at our disposal to define the concept of engagement as well as measure it?

Ideally we'd define engagement by what the visitors to our sites are thinking whilst carrying out their activities on the site as much as by what they did. But this isn't an ideal world. What we need to avoid doing is defining it by just selecting the metrics we can hold up against them. Broadly, an engaged visitor is likely to view more pages and exhibit an increased propensity to interact with your site, whether internally (e.g. leaving comments on posts) or externally (e.g. linking to your site). Obviously this again is dependent on the site and site type so can't be defined too tightly, creating the engagement paradox - we have a limited number of valid metrics at our disposal to capture behaviour that is too varied to define accurately. But it gets worse - we also need to bear in mind that visitors are unique and as such will interact differently on a site. Another thing we might need to take into consideration is which stage of the customer lifecycle they're at.

Whilst we can argue the toss at which stage a visitor would become engaged, we can certainly agree that the latter stages would define engaged visitors. However, visitors in these different latter categories would likely display different types of behaviour, even though they were "engaged". A visitor who's yet to make a purchase but is close to making a decision would behave differently to a loyal multi-buyer. To me this highlights the fact that with the current tools at our disposal it's going to be hard work to build an engagement model anytime soon.

Might this change? Looking to the future, the increased importance of mobile to analytics and its implications for future web behaviour (geolocation) will bring more parameters and data to be used in the calculation of engagement. Whether this will make the calculation of engagement easier or not is debatable. Perhaps this is somewhere that the paid tools can bring some innovation to the market by looking to build an engagement feature into the interface? We're forever hearing the concerns around the amount of data available, but the lack of information coming out of it - this would be a great opportunity to right that wrong.

Tuesday, September 28, 2010

The future of web analytics

As the web analytics industry matures its future is still uncertain. In this post I'll have a look at some of the questions that we'll need to answer soon.

There's been a lot of takeover activity in the industry of late, and two of the main analytics commentators, John Lovett and Eric Peterson, have written about the mergers and what they mean. Now only WebTrends is left as an independent tool, zigging whilst the others zag, and IBM is splitting at the seams with its three recent acquisitions, to make 23 in the last four years. Whilst they may not have released plans to close any of these products down, one has to wonder if this will be beneficial for the industry, with the potential for stifling innovation, which is what the industry craves at the moment.

Then there's the bifurcation in tools debate. Some maintain that to do truly sophisticated analytics you need a more powerful (and expensive) tool, with the likes of Google Analytics being left to the marketers. Whilst Google doesn't offer a visitor-level intelligence tool as some of the paid solutions do, no-one can deny the progress the tool has made in recent years. But will it ever truly catch up and end the bifurcation of tools (and is it in their commercial interests to do so)? And what about Google's future itself - how reliant on its parent is Google Analytics? With Facebook and others starting to take on the big G, and its recent attempts to enter the social arena backfiring, the company's future isn't guaranteed, and its analytics package isn't at the top of its list of priorities. However, the tool has no clear competitors in the free arena, with Yahoo! Analytics maintaining its non-mainstream enterprise-only position for the foreseeable future. What if a new (suitably big) entrant decided to get in on the free game? Perhaps Microsoft might reconsider their exit from this field? If they or another did, it could force Google to further up its game.

Finally, there's the soft side of analytics - the skills required to do the job. Currently a knowledge of statistics isn't that important; being business savvy or having coding knowledge is more helpful. But what if other factors change? Will the rise of mobile require a more technical person to understand the intricacies of it? Will the rise of intuitively-designed and easy-to-implement analytical packages mean that company knowledge and the interpretation of these numbers becomes more important to bring context and relevancy? As sites evolve and improve through competition and analytical insight, will visitor-level tools become obligatory for commercial sites? If they do, the skill set of the web analyst will have to expand too.

Wednesday, September 15, 2010

The future of the web and the implications for its measurement

Guessing the future of the web is a game that everyone likes to play, but because it's still early days and the web is still volatile, the forecaster normally ends up looking silly. But I'm going to carry on and make some predictions anyway and have a think about what this means for those in the web analytics community.

Although mobile typically accounts for less than 10% of visits to websites, it's obvious that with the rise of smartphones and tablets its share of web browsing will soon dominate. Handset manufacturers will focus on the devices' power and bringing in new functionality, with more consumer focus being put on the operating system and software. What new apps will be developed? Currently geolocation is the flavour of the month, with Facebook joining in the fun. In my experience the geography data fed back through web analytics tools is not that accurate. With the future of the internet becoming more reliant on geography, this might be something we need to improve. Whilst there are again privacy implications for improving this accuracy, imagine the potential for finding out where your customers are when browsing your site. Or being able to integrate check-in data with your web analytics data?

With issues around privacy, complaints about its applications and other negative publicity, Facebook seems to be peaking. We regularly hear about the risks of putting all your marketing eggs in the Facebook basket, but does the web analytics industry not risk doing the same thing? However, companies are now not only building relationships with customers within these arenas on fan pages, but monitoring what's said about them within these arenas outside of their fan pages. Sentiment analysis is one area with real potential but is reliant on still-developing artificial intelligence. To me, this is closely linked with the struggle to get to Web 3.0 - the semantic web, where we try to bring more meaning to the content on the internet, and build relationships between data and datasets. Thinking about how we struggle to manage our data now, and how the providers struggle to present it, makes it clear how much progress will be required to accurately manage, link and present this new era of data. I think that one of the largest challenges facing our industry is how this is managed, owned and presented in the future, perhaps second only to how we address our current privacy issues.

The majority of the world is still coming to terms with the implications of the "always available" internet, and its potential for increased communication whether it be for good or ill. As the authorities attempt to track illicit online behaviour, there's a growing confusion between monitoring civilians' behaviour and data, and web analytics. Whilst I believe we need to step up to this and nip it in the bud, I would hope that eventually the public takes a more relaxed attitude to tracking, in the way that they do to store loyalty cards, for example. We also need to consider the implications of a generation growing up with the internet as its main resource for entertainment and education. It doesn't seem beyond the realms of possibility for future companies to be set up to help 18-year-olds change their identity and escape their permanently documented youthful transgressions, as hypothesised by Eric Schmidt recently. Might there be an opportunity for building tools to help individuals track their online presence? Whilst the ease with which students can now research information will help them discover more, on the downside it's now easier to plagiarise others' work for assignments and communicate in exams. The recently introduced Tynt Tracer may be further developed to help track illicit copying in this framework, with analytics agencies being set up to monitor other people's sites rather than their own in order to optimise their site.

Indeed, it's this side of analytics that I think we need to be considering now. Whilst the model of working for a company to help optimise their website is the current standard, perhaps we should start thinking outside the box. The internet is now central to more and more people's lives, and whilst this will continue to drive this existing model for those in the web analytics industry, there are opportunities to be had for working on other sites. These could be governmental, educational, looking at analysing external sites for a company, as suggested above, or indeed working for individuals, perhaps to measure the data held on them by other companies? All in all, this shows that the web analytics industry should be kept quite busy keeping up with developments on the internet.

I'd love to hear your thoughts on this - am I way off the mark? Have I missed something which you think we need to consider?

Tuesday, August 24, 2010

Improving self-improvement: a call for open-source education

It was simpler in the olden days: you bought the tool, read the documentation and voila! you'd taught yourself web analytics (well, almost). Now, to be at the top of your game in this business you need to be continuously learning. One of the many great reasons for working in the web analytics industry is its rate of development, with lots of new tools and techniques being introduced, and different thoughts abound on how to do the job properly.

There are a variety of learning resources available to the budding web analyst. There are many blogs in the web analytics field debating the latest issues, giving advice and suggesting new ways to tackle old problems (I've listed a few in my blog list to the right, if you're interested). There are also forums, books, and white papers provided by consultancies and vendors, catering to those in the visual learner category. For those auditory learners there are a number of podcasts out there (see also banner to the right). This then leaves the excitingly named kinesthetic learners who learn by doing, which sounds like the perfect opportunity to plug the Analysis Exchange.

So there are a number of places a web analyst can rely on to keep up-to-date with what's going on. But this puts me in mind of the former US Secretary of Defense, Donald Rumsfeld, talking about known unknowns. These resources are all great at helping you find out information about things you know exist but know little about, the known unknowns. But what about the unknown unknowns? How can you get a definitive list of everything that a web analyst should know, to determine if you're on top of it all? I believe that this is something that the WAA is missing. Whilst they currently have the syllabus for the WAA Certification, publishing a list of the areas involved in "Web Analytics" might help define the role of the web analyst better, and help them in their efforts to define themselves too. It could help build a coherent self-referenced set of pages on the intricacies of web analytics, with suggestions for the metrics and reports to use for given scenarios. Whilst there's plenty of information out there providing overviews of web analytics and the tools to use, quite often the advice contained glosses over the details, or is one-dimensional, failing to mention other related reports or analyses that could be carried out. This then would become the definitive site for a web analytics education.

The science of web analytics has been around for a while now. So why hasn't this "open-source" educational resource been created yet? Being spoon-fed the information isn't the best way to learn - what good, curious web analyst would want to learn this way? With the current web analytics sphere being very tools-centric it becomes harder to share information as silos develop. And there's also an element of self-interest. Handing out the information on a plate loses business for practitioners; it also spoils book sales.

And yet, I still feel that open-source education is the way to move forwards. Whilst the web analytics industry has been around for a while, it's still not mature. The public doesn't trust it, and whilst the majority of companies have at least one web analytics solution on their site, there's little evidence it's being used to its potential, with only the largest or bravest allowing their online strategy to be steered by it. In order to deal with this, we need to grow the number of individuals with the necessary knowledge to become advocates, dedicated to analysing their website on a full time basis. Restricting the ease with which they can learn is a short-termist approach - we need to think about the long term. By growing an army of trained web analysts, the case for the benefits of analytics can be made to those businesses still too small or immature to have made the transition, transforming companies from being satisfied with a list of their top 10 pages to ones competing on analytics, to paraphrase Stephane Hamel's OAMM model. As a critical mass of sites that truly use analytics is reached, the remainder will have to engage or die. Competition breeds improvements in techniques and ideas. Then, as the world learns that sophisticated web analytics requires sufficient resourcing, the opportunity for consulting services and more specialist knowledge will grow, and the availability of information on the internet becomes irrelevant. No-one teaches themselves accountancy - they hire an accountant. By sharing now, we can create the demand for tomorrow.

Thursday, August 19, 2010

A Levels and KPIs - an analogy

Today is A-level results day in the UK, when high school students find out if they've got the grades to go to university. Traditionally this is the day the media go to town to bash students or standards, depending on your view point. Students become angry that society has deemed them unworthy of the high grades they get, and society finds it hard to believe the ubiquitous A grades that are handed out are really reflective of the students' understanding.

In my mind there are two clear problems with A level results here - the grading classification itself and how students are being taught, and both these issues rear their heads in the web analytics world.

Firstly, the grading. Here in the UK A levels are marked from A to E, with a new A* grade being introduced this year to try and distinguish the really bright students from the bright ones. However, as more and more students receive the better grades as the years pass, it becomes hard to distinguish the brighter students from the bright - the metric isn't transparent. This is a problem that many analysts try to overcome in their reporting. If the KPI doesn't clearly indicate what's going on, it's going to be hard to take action. Knowing that 50% of your visitors viewed 3 or more pages of your site, and are thus "engaged", doesn't help too much. Knowing the distribution of page views per visit allows you to isolate the extreme cases, and determine who's really ploughing through your site compared to those who just view three pages. In the case of A levels, replacing the grading classification with a simple % scoring system would allow universities to see a more accurate reflection of the students' abilities, and compare them with others.
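
To make the point concrete, here is a small sketch comparing a threshold KPI with the underlying distribution. The two lists of page views per visit are invented for the example, as are the function names; the 3-page threshold is the one used above.

```python
from collections import Counter


def engaged_share(page_views, threshold=3):
    """Share of visits with at least `threshold` page views."""
    return sum(1 for pv in page_views if pv >= threshold) / len(page_views)


def page_view_distribution(page_views):
    """Share of visits at each page-view depth, ordered by depth."""
    counts = Counter(page_views)
    total = len(page_views)
    return {depth: counts[depth] / total for depth in sorted(counts)}


# Two invented sites: both have half their visits at 3+ pages,
# but the visits behind that figure look nothing alike.
site_a = [1, 1, 3, 3, 1, 3, 1, 3]
site_b = [1, 1, 12, 25, 1, 40, 1, 18]

for name, views in (("site A", site_a), ("site B", site_b)):
    print(name, engaged_share(views), page_view_distribution(views))
```

Both sites report the same "engaged" share, yet only the distribution shows that one set of visitors stops at three pages while the other is ploughing through dozens.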

Secondly, metric manipulation. A common complaint is that students are being taught to pass exams, rather than being taught to broaden their knowledge of a subject. Complaints abound that first year university students are unable to string a sentence together or display an understanding of basic numeracy, but they do have a lot of A grades to their name. Back in the world of Web Analytics this manipulation often rears its head too, for example on content sites, where articles are split into multiple pages requiring the reader to click on links to view the next page, thus generating extra page views. This not only proves frustrating for the visitor, but also implies an artificially high level of engagement with the site.

Of course KPIs are essential in the world of web analytics - they're our bread and butter. And whilst we strive to improve our sites through the monitoring of these KPIs, we need to bear some things in mind. A KPI is useless unless it accurately depicts the outcomes you're trying to monitor. And manipulating metrics is essentially cheating. And as our Mums all told us, when you cheat, you're only cheating yourself.

Monday, August 2, 2010

Multichannel: the digital holy grail or a poisoned chalice?

It is the received wisdom that the 360 degree view of the customer should be the ultimate goal of all marketing functions. By combining their customers' online and offline activities the company can monitor how frequently and through which channels these customers are touched by or touch them, ascertain how the customers respond to incentives, and use this information to determine how best to market to them. But should this be the goal of analytical functions? Might it actually be a waste of time?

There are a number of implementation issues that have to be overcome for this sort of project to avoid turning from the digital holy grail to a poisoned chalice. Firstly, accuracy. Whilst we've been told before that accuracy isn't so important in web analytics, this isn't the case when it comes to combining multiple databases - having an all-singing-all-dancing database means nothing if your data's not up-to-scratch. Your offline data needs to be regularly cleaned to remove deceased customers and update addresses, otherwise your finely-honed marketing campaign will be flawed from the start. In addition to this there's the problem of linking the offline to the online records - if any of these are inaccurate, then you're going to have problems merging the two. So there's little point executing this project unless you're satisfied that databases can be accurately linked. Finally, this isn't cheap. This of course is no reason not to implement a scheme providing the ROI warrants it. But can that be guaranteed, given some of these potential pitfalls?

Once the implementation problems are out of the way, you face your next hurdle (which ideally you'd have considered before the implementation). This relates to the type of industry your company sits in. A multichannel programme is going to be of little use to you if your customers don't purchase from you that frequently, which could be the case if you're in a very competitive industry, or one where the purchase frequency is low (cars, for example). If your customers don't purchase from you through any channel that frequently, this causes two problems for you: the statistical robustness of the data is weakened and the accuracy of the data you hold on them is likely to deteriorate between purchases, as customers' identifying information changes. Finally, and crucially, this data only deals with the purchasers through your differing channels, and obviously won't pick up those who fail to purchase, assuming that visitors to your site only login during the purchasing process. It provides little help for converting prospects by determining why they didn't purchase from you.

And this leads me on to my main point - assuming you've got this far and set up an accurate multichannel database with a 360-degree view of your customers who are purchasing frequently enough from you to keep it all together, what next? It won't have happened overnight, and it won't have been free. Is being able to determine that customer A has responded better to a direct mail than an email campaign, whereas customer B only purchases online, really going to provide huge insight? Obviously insight is there to be found, and it may be of benefit to some companies. But wouldn't a well constructed and segmented email campaign have already told you that customer A didn't respond well? Looking at it in terms of the opportunity cost of implementing such a costly and time-consuming scheme, isn't there something more productive you could have done instead? For example, segmenting your email campaigns more effectively, and analysing recipients' on-site behaviour compared to other visitors, might give you an improved response rate to your campaigns.

It seems to me that this is the end result of a marketer's fantasy gone mad, with little thought for the practical realities of its implementation and shortfalls. It's part of a familiar scenario: we've got too much data, and we're struggling to deliver true, clear insight. So what do we do to solve the problem? Bring in more data, or try and link existing datasets in an attempt to find it. But often there is no identifiable answer, and actually customer A asked his partner who happens to be customer B to buy it for him online. Focussing on the basics could provide as much ROI as implementing a multichannel solution.

Again, let me reiterate, I'm not saying there's nothing to be gained from doing this; just that it's being sold as solving all our problems and being some sort of Utopian ideal, when in fact it takes a lot of time and money to implement, and there are still many key questions out there that remain unanswered.

Friday, July 16, 2010

Isolating the outcomes of change

Working in digital analytics gives you access to a huge amount of data which with the wrong mindset can cause problems and complaints of drowning. But viewed positively and intelligently it allows you access to huge amounts of insight. Sometimes though, in our eagerness to measure improvements after making changes to a site, we can forget that this treasure-trove of data can help us to refine our measurement and filter out underlying influences on our results. For example, say you've recently made some changes to one of your landing pages and the bounce rate appears to fall in the period afterwards. But what if your marketing team lowered the spending on the PPC campaign on the same day, bringing less traffic to the page? How can you strip this out to get the true effect of your changes on the bounce rate?

A nice way of removing the effects of one other variable is a scatter plot graph. It can show that a site's conversion rate is a diminishing function of the amount of paid traffic the site receives - that paid search traffic's quality is inversely related to its quantity. A site can have varying daily levels of conversion with no change to the site itself, dependent entirely on the level of paid traffic it receives.

Graphing different time periods of the site's visits against conversion rate before and after a change to the site and putting a non-linear trend line through these different periods allows you to monitor its performance. If the changes have worked, the curve moves up and to the right - for a given level of visits, the conversion increases. Instead of determining the average conversion rate before and after the change, you can now compare the distance between the two curves, removing the effect of any fluctuation in traffic - i.e. the position of the observations on the curve.
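
A minimal sketch of that comparison follows. It assumes a logarithmic trend line of the form conversion = a + b * ln(visits), which is just one possible choice of non-linear curve, and it uses invented daily figures built so that the true uplift is 0.4 percentage points; none of the numbers or function names come from a real site.

```python
import math


def fit_log_curve(visits, conversion_rates):
    """Least-squares fit of conversion = a + b * ln(visits)."""
    xs = [math.log(v) for v in visits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(conversion_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, conversion_rates))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(curve, visits):
    """Conversion rate the fitted curve implies at a given level of visits."""
    a, b = curve
    return a + b * math.log(visits)


def gap_at(before_curve, after_curve, visits):
    """Vertical distance between the two curves at the same traffic level."""
    return predict(after_curve, visits) - predict(before_curve, visits)


# Invented daily observations; traffic happened to be lower after the change.
before_visits = [1000, 2000, 4000, 8000]
after_visits = [500, 1500, 3000, 6000]
before_rates = [0.060 - 0.005 * math.log(v) for v in before_visits]
after_rates = [0.064 - 0.005 * math.log(v) for v in after_visits]

before_curve = fit_log_curve(before_visits, before_rates)
after_curve = fit_log_curve(after_visits, after_rates)

# Naive comparison: difference in the simple average of daily rates.
naive_change = (sum(after_rates) / len(after_rates)
                - sum(before_rates) / len(before_rates))
# Curve comparison: difference at the same level of visits.
curve_gap = gap_at(before_curve, after_curve, 2000)

print(f"naive change: {naive_change:.4f}, gap between curves: {curve_gap:.4f}")
```

Because traffic was lower in the "after" period, the naive before/after average overstates the improvement, whereas the gap between the curves recovers the uplift that was built into the data. With real, noisy observations the fit would be far less tidy.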

Another option is to graph a page's bounce rate against entrances in a similar manner to the previous conversion rate example and compare the performance of two different pages, or different stages of a campaign. For example, you can compare the initial stage of a new PPC campaign with subsequent months to see how well it's being optimised to improve relevance and reduce bounce rate.

As with many things, there are caveats to this approach. The first example won't give you a specific measurement to determine the improvement in conversion, but it should serve as a pointer as to how much of the change in conversion was attributable to the page change and how much to paid search traffic fluctuations, allowing you to calculate it manually if need be.

Furthermore, it requires a decent quantity of traffic to allow for a statistically valid conversion or bounce rate and a sizeable number of observations at varying levels of traffic to build the curve. This is not so easy in the digital measurement world as sites and pages change regularly.

Finally, this method only allows you to monitor the effect of one other variable on your changes. There is another tool, regression analysis, which can be used to explain the effect of more than one variable in a statistically descriptive manner; I hope to cover it in a future post.

Friday, July 2, 2010

The Russian spy ring - an analytical parallel

The recent unearthing of a Russian spy ring working in the United States and the facts uncovered about how its members operated made me think of a couple of parallels with the last post about the public's perception of the secretive world of tracking.

On first impressions one would assume that a spy ring working undercover would be run efficiently, gathering a lot of valuable information, and providing critical insight to the spy-master. However, in this case it soon transpired that the spy ring was actually struggling to function, having issues with both technology and communication. The parallels with some (not all!) company analytics functions are quite clear. Perceived by the general public as an all-knowing collection of spooks, in fact some companies are quite disorganised, struggling to determine what insight their managers need, let alone deliver it. Communication and technology issues can quickly get in the way of extracting clean data from which analysis can provide the necessary insight.

Many of the spy ring's issues centred around a desire amongst some for a return to the "good old days" when things were simpler, be it hiding packages in the ground under bottle tops, or using maps with stamps on. Similarly, there is still a reluctance among many companies to fully embrace digital analytics, with concerns about both the quantity and quality of data, or a timidity around entering the social media arena, let alone measuring it.

As I mentioned in the previous post, we can help ourselves by becoming more transparent and making our case more clearly. But having a better understanding around tracking with the visitors to our sites counts for nothing if we're not organised and brave enough to deliver the insight to drive the change the visitors need.

Friday, June 25, 2010

Bridging the Numerati-Ignoscenti tracking divide?

I've just finished reading an informative book on the likes of us - "They've got your number" by Stephen Baker. The book talks about the Numerati - "mathematicians who are mapping our behaviour" in various industries, not just e-commerce, for example in the workplace, in politics, blogging and healthcare. There were a number of themes in the book, none of which came as a surprise. For example, Baker talks about the large amounts of data available in each scenario, and how powerful mathematical tools and knowledgeable analysts are required not only to derive insight but to interpret this data correctly. In the chapter on terrorism he pointed out how important it is that the NSA's (or GCHQ's) analysis is correct first time; in other industries, as Avinash likes to point out, we can (and should) learn from our mistakes; indeed, failing makes success easier.

Whilst Baker's book didn't try and paint a picture of illicit snooping and stir up the usual scare stories, it did get me thinking about how this subject is perceived by the general public. There is a lot of information available on internet technology, and more of this is filtering into the public arena. For example, browser selection is becoming more sophisticated; whereas a couple of years ago Firefox was the preserve of net geeks, now my parents are using it - Microsoft's share of the market is eroding. But it's not just browser choice that people are becoming more au fait with; it's the contents of the options menu within the browser, and with it cookie blocking, then private browsing and opt-out addons.

Whilst we should respect the wish for privacy of those who've chosen to block cookies, adopt private browsing or install these addons, we should not be scared of making the case for tracking so that these people have all the facts at their disposal before they make their decision. As people become more aware of the perceived murky world of corporate tracking, without a clear counter-argument being proposed it's easy for the public to assume it's of no benefit to them, or worse. And yet, one of the most popular websites on the planet is in that position precisely because of its tracking. People agree that Amazon is a great site, and are impressed by its cross-selling abilities and its recommendations based on their search history (both on and off the site). It surely shouldn't be hard to use this to sell the benefits of tagging a site. Whilst it's becoming fashionable to talk about how we live in a "Big Brother" society with constant surveillance, be it CCTV or online tracking, it should be possible to make the distinction between a true "Big Brother" society whereby monitoring takes place to crush dissent, and one which is built to help people do what they want to on a website more effectively.

So how do we go about getting rid of this "Big Brother" image before the battle's lost?
1. Site Transparency. A clearly stated (i.e. not legal speak) and up-front privacy policy page (i.e. not hidden away in the smallest font possible somewhere inaccessible), explaining the methods used and the information gleaned.
2. Present a clear case to the public. Whilst the case is clear, how it should be communicated is less so. Is this something for the WAA to do? The case needs to be made globally, and whilst they have a presence across many countries, this is something which needs to get into the living rooms of people across the world. Web analytics is being discussed in German and American parliaments at the moment; maybe petitioning your local politician to raise a question could bring it into the public domain. What is clear is that the internet is a global phenomenon, and, as with policing it, lobbying it is hard to do.
3. Better education. In a previous post I discussed the importance of educating children about the internet. IT is an important topic, and learning about using the internet is a major part of it, be it tracking, site construction or communication. Informing young people of all the facts at an early age is the best way to remove this image, if a slightly long-termist one...
4. Improve your site! Earn the right to stop people deleting your cookies - people would be more reluctant to delete their cookies for a site if they got an Amazon experience from it.

So there we have it, my thoughts on how we can turn the ignoscenti into the cognoscenti. Have I left anything out? I'd love to hear your comments.

Wednesday, June 9, 2010

Applying statistical rigour to web analytics reporting

Web analytics is all about making decisions from the data. But how can you be sure of the quality of the data you investigate, and the recommendations you provide from it? Whilst the numbers may be accurate and reflect what happened on your site thanks to a successful tagging implementation, are they statistically significant? Furthermore, once you've uncovered what's fluke and what's not, how can you illustrate this succinctly in your reporting?

Unfortunately, with a few limited exceptions, vendors don't provide any indication of the robustness of the data in their consoles. Wouldn't it be great if, for a selected time period, you could segment your site and see that although there's a marked difference in conversion (or metric of preference), it's only significant at the 50% level? Or, alternatively, that what appears to be only a small difference in bounce rate is actually statistically significant? Until that day comes though, you need to be able to do it yourself. Avinash wrote on a couple of related topics a while back - applying statistical limits to reporting and an overview of statistical significance. In a more recent post, Anil Batra highlights the importance of not rushing to pick the winning page from an A/B test. And in the last few days, Alec Cochrane has written a great piece on how to improve the statistical significance of a web analytics data-based model.

There are plenty of statistical tests out there, with different datasets and situations that call for them, but for the purpose of this post I'll focus on just two, both of which are listed here amongst others.

The two proportion z test compares whether the specified proportions of two samples are statistically different from one another.

This test can be applied to a number of reports within web analytics, but its main use would be for comparing the click-through rate or response rate of two campaigns to determine whether one is conclusively better than the other. The beauty of this test is that it only requires four values - the two proportions (%s) and the two sample sizes, and as such can be calculated without use of a spreadsheet.
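
As a rough sketch of how little is needed, the test can be written in a few lines of Python using only the four values mentioned above. The function name and the example campaign figures are hypothetical, and the sketch assumes the common pooled-proportion form of the test with a two-sided p-value; other variants exist.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal.

    p1, p2 are proportions between 0 and 1 (e.g. click-through rates);
    n1, n2 are the sample sizes (e.g. impressions or visits).
    """
    # Pooled proportion under the null hypothesis of no difference
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference between the two proportions
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical campaigns: 5.0% response from 2,000 sends vs 4.0% from 2,000
z, p_value = two_proportion_z_test(0.05, 2000, 0.04, 2000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these made-up figures the one-point gap in response rate is not significant at the 5% level, which is exactly the sort of result that is easy to miss when eyeballing a report.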

The second test is the two-sample t-test which determines whether the means of two samples are statistically different from each other for the two given sample sizes and sample standard deviations.

Because it requires the standard deviations of both samples, this result takes more time to compute, as the user has to download the series data in question. This test has a variety of uses, for example comparing whether the different average values of a given metric for two segments are statistically different, or perhaps looking at the same data series before and after an external event takes place to determine whether it has had a statistically significant effect on the data or not.
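
Once the series data has been downloaded, the calculation itself is short. The sketch below is an illustration rather than a prescribed method: it assumes Welch's variant of the two-sample t-test (which does not require the two samples to have equal variances), the function name and data are hypothetical, and it returns the t statistic and degrees of freedom to compare against a t table or a statistics package rather than computing a p-value itself.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 denominator)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Squared standard error contributed by each sample
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical daily values of a metric before and after an external event
before = [10, 12, 14, 16, 18]
after = [11, 13, 15, 17, 19]
t, df = welch_t(before, after)
print(f"t = {t:.2f} on {df:.1f} degrees of freedom")
```

The larger the absolute value of t relative to the critical value for those degrees of freedom, the less likely the difference in means is a fluke.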

Now that you're confident that you know which results are statistical flukes and which ones aren't, how do you go about illustrating this in your reporting? One option would be to include the t test results and significance levels in your reporting, but this is likely to clutter your reports as well as potentially confuse the reader. A neater way might be to colour code the values to illustrate their confidence level if you're happy to introduce different text colours to your report. For time series data you can add the mean, and upper and lower bounds to a graph, to show which peaks and troughs merit further investigation.
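
The bounds for a time series graph can be sketched along these lines. This assumes the bounds are set at the mean plus or minus a multiple of the standard deviation (two by default here, which is my assumption rather than a rule), and the function name and data are hypothetical.

```python
import statistics

def flag_outliers(series, k=2.0):
    """Return (mean, lower, upper, flagged) where flagged lists the
    (index, value) pairs falling outside mean +/- k standard deviations."""
    mean = statistics.mean(series)
    sd = statistics.stdev(series)
    lower, upper = mean - k * sd, mean + k * sd
    flagged = [(i, x) for i, x in enumerate(series) if x < lower or x > upper]
    return mean, lower, upper, flagged

# Hypothetical daily visits with one unusual day
visits = [100, 102, 98, 101, 99, 100, 150, 97]
mean, lower, upper, flagged = flag_outliers(visits)
print(f"mean {mean:.1f}, bounds {lower:.1f} to {upper:.1f}, investigate: {flagged}")
```

Note that a large spike inflates the standard deviation it is measured against, so for short series it may be worth calculating the bounds from a baseline period that excludes the days under investigation.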

Of course, once you've come up with a clear and succinct way of displaying this statistical information, you still need to explain it to your stakeholders, not all of whom will have a background in statistical analysis. Demonstrating the robustness of the data to them and how the varying levels of robustness are determined will not only provide extra confidence in the decisions you're recommending from the data, but illustrate the importance of asking constructive questions of the data, rather than slavishly following what the data suggests at first glance.

Images from Wikipedia.org

Tuesday, May 25, 2010

The Times Paywall - Not such a bad idea?

The Times of London (owner Mr R. Murdoch) recently announced it would be introducing a paywall to its two sites, due to go live at the start of June, and has been receiving a lot of coverage about it. Previous experiments with paywalls have yielded poor results, and there has been a lot of discussion, especially on social networking sites, about how setting up a walled garden and attempting to work against the principle of an open internet will ultimately hurt Mr Murdoch. Furthermore, keeping content behind a paywall would not only limit the availability of the site's news, but also its reporters. Users of Twitter would no longer be able to link to their work, and see it be shared across the internet - their brand as well as their paper's would be curtailed.

Looking at it in purely monetary terms, this is clearly a beneficial move. Mr Murdoch needs to ensure that

V_a C_a R + P V_a > V_b C_b R

where a denotes after and b before the paywall is introduced, V is the number of visitors, C the display ad clickthrough rate, R the revenue per click, and P the subscription price.

Making some assumptions about the share of visits to visitors before and after the paywall change, taking the visitor values for The Times with the forecast 95% fall in traffic afterwards, and assuming that the clickthrough rate will improve following the introduction of a paywall suggests that unless revenue per click is more than approximately £26, this will generate more revenue for News Corp. This demonstrates that this move will clearly benefit them (unless anyone knows of an ad that regularly yields that sort of revenue per click!). Of course the cost to the brand in terms of damaged reputation, reduced visibility and possible loss of staff is not so easy to calculate.
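
The break-even point falls out of the inequality by rearranging for R: the paywall generates more revenue as long as R is less than P x Va / (Vb x Cb - Va x Ca), provided ad clicks fall overall. A minimal sketch of that rearrangement follows. The function name is hypothetical and the input figures are placeholders chosen purely for illustration; they are not the values behind the £26 estimate and will not reproduce it.

```python
def breakeven_revenue_per_click(v_before, c_before, v_after, c_after, price):
    """Revenue per click R at which the paywall exactly breaks even, i.e.
    V_a*C_a*R + P*V_a == V_b*C_b*R. Below this R the paywall earns more.

    Visitors, clickthrough rates and price must all refer to the same period.
    """
    # Ad clicks lost as a result of the fall in traffic
    lost_clicks = v_before * c_before - v_after * c_after
    if lost_clicks <= 0:
        # Ad clicks did not fall, so the paywall wins whatever R is
        return None
    return price * v_after / lost_clicks

# Placeholder inputs: a 95% fall in visitors, clickthrough doubling from 1% to 2%
r = breakeven_revenue_per_click(
    v_before=1_000_000, c_before=0.01,
    v_after=50_000, c_after=0.02,
    price=1.0,
)
print(f"Paywall earns more unless revenue per click exceeds {r:.2f}")
```

Like the inequality itself, this treats every visitor after the change as paying the subscription price.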

In terms of analytics, though, things get more interesting. Reducing the amount of traffic to the site in this way should tighten the audience profile - effectively removing the drifters, and bringing the online profile of visitors closer to the offline one. This then should enable more effective advertising, appealing to the more shared interests of the new profile. Also, the new traffic should be more engaged with the site, generating more page views per visit, and thus more opportunity for clicking on the ads, as well as helping any behavioural advertising the company may be using. And of course in terms of the site itself rather than the advertising, the more engaged traffic should improve the conversion funnels the site has, such as engaging with their live chat facility.

I'll be interested to hear what News Corp has to say about the effects of its paywall from this perspective in a month or so!

Monday, May 24, 2010

Education in web analytics: Teaching kids how to measure the internet proper

Following my previous post on education for those in the web analytics industry, I thought I'd have a look at the opportunities for those still in full-time education to develop the skills to enter the industry.

Recent events have encouraged me to take a more active interest in the education system, and although I'd never encourage my children to follow my career just for the sake of it, I've been thinking about which courses I would recommend to a student to develop the necessary skills and knowledge to be a success in the digital measurement industry. In my opinion, there are two sides to this. The first is a generic internet education, which I would hope would be included in the "standard" ICT courses (for those in the UK) up to and including GCSE level.

Delving into the internet section of an ICT course, children would need to be made aware of general internet skills. Being taught about the privacy and safety implications of using the internet should be as important as the Green Cross Code. I would also consider a basic understanding of how websites are built, covering coding and infrastructure, to be an important part of any ICT course. Then, more specifically related to the web analytics area, a discussion of the way sites are tracked; rather than looking at it from the standard "all tracking is evil and intrusive" perspective, considering why businesses do it, and how it could be beneficial to the customer when executed correctly. One would hope this would also be covered in any Business Studies course available today. Finally, it would be desirable if they covered internet terminology so that once and for all people could distinguish between visits and hits! Obviously these topics would likely constitute a module or part of one in a larger ICT course, but imagine how much better the world would be if all youngsters were taught this...

The second of the two key areas that a student would need schooling in relates to the analytical nature of the role. Assuming that the student had covered the more generic knowledge just discussed, they would then need to build on that with both further education and development. Of course, as before, there's no way that a course specifically tailored to web analytics would exist at this level, but rather the student would need a combination of broader courses to provide the more detailed knowledge required. These courses might include statistics (for obvious reasons), with a course in econometrics helping to teach them about models and statistical tests, and a course in marketing to give them the background of the problems they'll be looking to solve in the future. However, being a web analyst requires more than a knowledge of data manipulation and sales; a critical and curious mind which questions assumptions and asks the right questions is paramount. No one course can create this, but something which encourages detailed analysis and looking at an argument from multiple angles would help; perhaps philosophy or history?

Once they've got this far they'd have a good idea of the theory behind the internet, and the necessary skill set to analyse and ask questions of the data. All they'd need then is some hands-on experience of the role itself - some pre-on-the-job training. And this is where the innovative Analysis Exchange comes in, linking students, mentors and donors (non-profit and NGO websites). The mentors provide the tuition to the student, who provides a piece of analysis to a grateful charitable website. Not only does this give students a fantastic opportunity to both learn and shine, but due to the close-knit nature of the web analytics community, it gives them a great way to advertise their skills on completion of the project.

So there you go kids - that's how to get into the #measure industry. For those of you already in the industry, what would you recommend to an aspiring student?

Wednesday, May 12, 2010

Education in web analytics: Experience vs qualifications

This is the first of a couple of posts that loosely fit under the category of education in the web analytics field. This one looks at some of the courses and qualifications out there for web analysts, and the benefits of them versus everyday experience.

There are a number of courses that are available to web analysts looking to improve their knowledge. One of the most well-known is the UBC Award of Achievement in Web Analytics. Whilst being a comprehensive course, it's geared towards people with little prior knowledge of the topic, with the first module named an "Introduction to Web Analytics". There is also an extension course, the Web Intelligence Certificate from the UBC and UC Irvine Extension, which as a prerequisite requires the UBC Award of Achievement to have been completed, and focuses on coursework.

The WAA Base Camp consists of two days of training which allows the student to get a "solid foundation of online marketing analytics knowledge and authoritative course material in a workshop environment". Unfortunately, as with a lot of WAA content, it seems to be very US-centric, with no courses available outside of North America this year, and no previous workshops outside of there since September 2008.

One complaint about why people don't join the WAA is that there's no proof of the member's quality. This has been answered by the WAA with new certification. There's been some discussion about its validity, its availability to non-WAA members and concern around pricing and how to take it - see here for an example. The WAA have responded to these in an attempt to make the issue clear. There's general agreement that it's tough, with two to three years of experience required; its aim is to test your analysis skills rather than your knowledge of a particular analytics package, and it has a quasi-MBA slant to it. In a recent post Stéphane Hamel discusses the benefits of taking the test.

The Google Analytics IQ test is proving to be quite popular. So much so that they've recently had to make it tougher, raising the pass mark to 80%. Obviously this is tied to one package, and has an 18 month shelf life.

There seems to be agreement that certification is better for those earlier on in their careers. Those who've been in the industry longer have less need to demonstrate and prove their skills. However, those with experience can easily be siloed - all too often I've seen adverts for web analyst positions which talk more about the need for experience with a particular analytics package, rather than concentrating on the skills required to do the job properly. Having a qualification like the WAA certification would demonstrate your ability to bring the necessary mindset and skills to the role to successfully carry it out. The individual could show their knowledge of the technical, marketing and statistical skill sets required for this job, rather than have to spell it out on a CV. The technical knowledge one would have acquired through gaining the qualification would go a long way to helping out with any package-specific issues that may be faced in a new role.

The downside is that many of these qualifications are expensive (especially if you have to travel to take them as Steve Jackson notes in his afore-referenced blogpost). But labour economics dictates that as long as prospective employers can be assured of the quality of the test, this should only go to highlight the candidate's quality (and self-belief) - that they were willing and confident enough to invest in themselves to set themselves apart from the market. That the qualification has to be re-taken regularly for such a changing market would also demonstrate the candidate's quality.

Finally, the obligatory UK navel-gazing section. Whilst in the US there is more knowledge about the web analytics industry, here in Blighty fewer companies and managers are familiar with it. They may well be aware of what a web analyst does and how they can help provide insight and drive action etc (although possibly not to the extent a similar position in the States would), but I suspect they wouldn't be so familiar with the role of the WAA, or have any knowledge of the varying types of qualifications out there. This then, for now, lessens the impact of obtaining one of these qualifications on this side of the pond, although hopefully this will change soon.

Monday, May 3, 2010

The UK election - a measured perspective: Part 2

In my last post I looked at how the three parties' websites were tracked, and how their Twitter social media campaigns were running.

To recap, there was little evidence of anything other than implementing the vanilla tags for Google Analytics, with no custom tracking of any sort to be seen. The three parties all had Twitter accounts, and the numbers depicted quite different levels of advancement. The Tories had the largest number of followers and also the most engaged followers according to Twitalyzer, with 100% clout and the highest influence and impact ratings of the three. The Lib Dems, whilst (only just) having the smallest number of followers, had a more engaged following than Labour, with higher clout, impact and influence metrics than the governing party. Furthermore, these metrics were all still rising, indicating a campaign that is being optimised. The Labour campaign was flat-lining for impact and influence, and whilst rising for clout, was still way behind the other two parties.

The last post looked at the 30 days of data in Twitalyzer up to April 16th, effectively looking at the first half of the official campaign. This time around we look at the data in the run up to the election. This should show us how the campaigns have fared since then, and possibly give an idea of who has run the best online campaign, and how offline events have affected the online campaigns. Who knows, I may even inspire you to bet on the actual outcome should you fancy a flutter.

This time around, we look at the 30 days of data in Twitalyzer up to May 2nd, 4 days before the election. Twitalyzer uses a 30 day moving average, with the score for an account changing every 7 days. Rather than look at the levels for each week, I shall look solely at the moving average for this time period versus the last. Note also that the diagrams below use change indicators that refer to the 7 day time period just mentioned rather than the one being used for this analysis.

Before we look at the Twitalyzer numbers in detail, we'll take a quick peek at the number of followers each of the accounts has to give a very rough indication of how the parties are faring. Both the Conservative and Labour parties have seen steady growth in their follower counts, with the Conservatives growing by 11% to just shy of 30k, and Labour by 10% to 16k. The Lib Dems saw a large 38% jump in followers, taking them to almost 19k followers, and easily taking second place off Labour. This reflects the surge in popularity that Nick Clegg, the leader of the Lib Dems, has seen following his successful appearances on the Leaders Debates. Another sketchy indicator of the quality of the campaigns of each of the parties is the ratio of followers to following. Last time around both Labour and the Conservatives had fewer followers than those they followed - unrequited love? This time around things were different, with only Labour having fewer followers than those they followed. On this measure the Conservatives were the most successful, now having 7% more followers than those they followed, with the Lib Dems slipping from 5% to 1%.

To recap, in terms of Twitter followers the Conservatives remain firmly in the lead, with the massive surge in popularity for Nick Clegg not really affecting them. It has, however, propelled the Lib Dems into second place.

Turning now to the Twitalyzer numbers it can be seen that whilst there has been some change to the performance of the campaigns based on the five metrics, as with the follower data the Conservatives remain in control.

Their impact has fallen to 38.6% from the previous analysis's 40.3%, which given their increase in followers since that time period suggests this is due to factors that have caused the influence metric to fall to 54.1% from 58%. As you may recall, influence takes references and retweets into account, whereas clout only looks at references - given that the Tories' clout score has remained at 100%, this indicates a fall in the number of retweets the Conservatives' account has been receiving - whilst people are happy to mention the account, they're not so keen on spreading what the Tories have to say. However, the generosity metric for the Tories has also risen dramatically since the last analysis, to stand at 34.7%, up from 3.7%! This suggests that as well as having engaged followers (although possibly not as engaged as they were a fortnight ago) the Conservative Twitter account is now engaging better with them too, with a nearly tenfold increase in the share of its tweets that reference other accounts. Inspection of the actual account reveals that this is actually retweets of accounts affiliated with the Tory campaign, so is less engagement with followers, and more spreading the message.

The Labour party have been having a poor campaign offline, and this appears to be the case online as well. With the exception of generosity, all their metrics fell between the two time periods. Given that this presumably takes into account the 10% growth in followers, this indicates a marked fall in follower engagement, and suggests that the message that the Labour account is delivering isn't inspiring people enough to spread it. As with the previous period, all the follower engagement metrics for the Labour account are lower than both the Tory and Lib Dem accounts, and are now becoming more so. It is hard to determine the effect of Gordon Brown's gaffe late last week where he accused a member of the public of being "bigoted" on the Twitter account performance, given the seven day time periods used by Twitalyzer and the dates used here for the Twitter follower count, but it is improbable that this is the main cause of the downturn in their performance as this began before the gaffe. Although it can't be proved, it does appear that this specific event has not had the large negative effect on Labour's Twitter campaign that Nick Clegg's positive performance had on his party's campaign. This one-way relationship (if it exists) could give us some limited insight into the type of follower the social media campaigns have - being more positively affected by the campaign it suggests that party account followers are more affiliated with their party than the typical member of the public (as one would expect).

The Lib Dem Twitter account has seen a slight increase in impact, rising to 32.6% from 30.5% a fortnight ago. This reflects the large increase in followers, and will have been held back by the smaller increase in its follower engagement, represented here by the influence metric which only rose from 45.9% to 47.1%. The clout metric actually fell between the two periods, from 97.7% to 92.5%. This indicates an increase in the number of retweets, but a fall in references, suggesting that the new followers that the account has received (probably off the back of Nick Clegg's TV performances) are happy to retweet its message, but less likely to engage directly with the account.

It would appear then that the Labour account has deteriorated since the last analysis, when it wasn't performing well anyway, and is clearly behind the other two parties. The Lib Dems, whilst seeing a large increase in followers, and improvements in their influence, engagement and impact metrics, have seen a fall in their generosity and clout metrics, indicating an increase in the number of retweets as well as other accounts that they reference, but a fall in references by others. Whilst the Tory campaign has deteriorated since the last analysis, with both the impact and influence metrics falling, indicating a fall in their tweets being retweeted, they are still spreading their message more effectively than the other parties. Traditionally the Tories have had a stronger online presence than the other main parties, primarily in the blogosphere, and this goes some way to explaining the strength of their social media campaign.

Of course, this has only a slight correlation with the outcome of the election on Thursday, but it does show that the Conservative party have managed to generate a solid social media campaign (for Twitter) in the run up to the election, engaging better with their (larger number of) followers. Obviously, the numbers involved here are small relative to the size of the electorate, and to a certain extent their campaigns are preaching to the converted. This election will be won by convincing swing voters to vote for a party in key constituencies, and it will take more than Twitter or Facebook campaigns to do this.

Oh, and my prediction for the result? The Tories to take the largest share of the votes with a hung parliament overall. And the whole thing to begin again in under a year.

Friday, April 16, 2010

The UK election - a measured perspective

Now that I'm getting into the swing of the UK election, I've been having a look at the three parties' websites and assessing how they're doing. Rather than the usual analysis of how the sites look and function, which has already been done here, I thought I'd do a quick post trying to see what tracking exists, and look at their social media strategy.

Analytics

On inspection of the websites' tracking, using my Chrome extensions, it appears that all three are using Google Analytics, although none of them were using the latest asynchronous version. Furthermore, there was no evidence of the code being tweaked. For the pages of the three sites that I visited, there was no evidence of any events being set up in Google Analytics, for example to track exit links, or people signing up to their email campaigns. There was also no sign of any custom variables being set when I carried out any significant actions (e.g. signing up for an email - I wasn't dedicated enough to this post to make any donations).

Social Media

The Labour website has a very basic set of social media links, pointing to just a Facebook page and their Twitter account.

The Conservative website has the most comprehensive set of links for its social media strategy, including ones to their Twitter account, Facebook fan page, YouTube Channel and iTunes podcast amongst others.

The Liberal Democrats website is more advanced than the Labour site in terms of its links, incorporating a YouTube channel link, but is not as comprehensive as the Conservatives.

However, just comparing what tools are used does not give an indication of how active their social media campaign is. We can have a snoop on how one element of their social media campaign is doing by looking at Twitalyzer (Super site by Mr Eric T Peterson - great review of its capabilities here) for the three parties.

The most noticeable chart here is the improving clout metric (the relative likelihood that the Twitter username will appear when searched for) which has been steadily rising, and is now rated at 80.7% for the last 30 days, indicating an increased presence in the last 30 days. The engagement metric (measuring the type of interaction the user has in Twitter by examining the ratio of people referenced by the user to the number of people referencing them) is 0%, and has registered no change over the last 30 days! This indicates that the account is referencing very few of the Twitterers who mention it - closer inspection of their account reveals that whilst they do reference others, a lot are the same account, and in no way match the RTs of their own account. The Tories (see next chart) suffer the same problem, although they have much higher Impact, Influence (similar to Clout, but looks at RTs and references) and Clout metrics. This indicates a higher number of followers (at currently c. 27k, nearly twice Labour's 15k), unique references and retweet rate amongst others.

Until yesterday, when the Liberal Democrat leader clearly won the leaders' debate, the Liberal Democrats were very much in third place in this contest. Looking at their number of followers shows that whilst they trail the other two, the gap between them and Labour is very small.

Their Twitalyzer data shows a different picture, suggesting that their followers are more engaged than Labour's (although not as much as the Conservatives'). On every metric bar Generosity (the share of tweets that retweet others) the Lib Dems trump Labour, suggesting that their followers are more likely to retweet and mention them than Labour's followers are to retweet and mention Labour, and that they are more likely to appear in a Twitter search. They manage a 0.5% engagement score, indicating they are doing a better job of talking to those that reference them.
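To make the two ratio metrics a little more concrete, here is a rough sketch in Python based purely on the descriptions given in this post: engagement as the ratio of people the account references to people referencing it, and generosity as the share of the account's tweets that retweet others. Twitalyzer's actual formulas are not published here, so the function names, the inputs and the counts in the example are all assumptions for illustration only.

```python
def engagement(people_referenced_by_account: int,
               people_referencing_account: int) -> float:
    """People the account references per person referencing it, as a %.

    Assumed reading of the post's description, not Twitalyzer's formula.
    """
    if people_referencing_account == 0:
        return 0.0
    return 100.0 * people_referenced_by_account / people_referencing_account


def generosity(retweets_of_others: int, total_tweets: int) -> float:
    """Share of the account's tweets that retweet other users, as a %."""
    if total_tweets == 0:
        return 0.0
    return 100.0 * retweets_of_others / total_tweets


# Hypothetical counts: an account mentioned by 200 people that
# references just one of them back would score 0.5% on engagement.
print(engagement(1, 200))
print(generosity(5, 50))
```

On this reading, an account can have a large, vocal following and still score close to zero on engagement if it rarely talks back, which is the pattern described for Labour and the Tories.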

Of course, all this refers to activity in the last 30 days - it is the next 30 days that are more important. However, the trends and levels suggest that the Tories' larger body of Twitter followers is more engaged with them than the rival parties' followers are with theirs, and that the Lib Dems and Labour are close, with the Lib Dems interacting better with their followers than the incumbent party. This might then have a bearing on future polls, and loosely corroborates the pre-debate polling suggesting a tentative Conservative lead. It also corroborates evidence that the Tories have a larger share of voters who are determined to vote and have made up their mind.

How these scores change in light of yesterday's debate and the events of the coming weeks is another matter, and one I hope to look at soon. Let me know your thoughts on these numbers - what would you look at, and would you interpret the numbers differently?

Tuesday, April 13, 2010

Chrome Analytics Extensions

I've been using Google's new browser for a while now and, as a Google Fan Boy, have fallen head over heels in love with it. Now that it incorporates extensions, Google's answer to Firefox's addons, the browser's really come into its own, and has convinced a lot of Firefox users to see the light. However, the volume of analytics extensions isn't as large for Chrome as it is for Firefox. In fact, the addon I used to use with Firefox, Ghostery, doesn't exist for Chrome. I've done some hunting around to see what's out there for the discerning Chrome-using analyst, whether it be for monitoring tracking on other sites, or for helping you with your own sites.

BuiltWith Technology Profiler is a collection of easily-accessible stats for a site. Divided into a number of sections, the first (and most important!) is right at the top: "Analytics and Tracking". This lists the analytics providers to the website in question, with a bit of blurb as to what sort of information each would provide the site owner with - handy for those who use this tool from a non-analytics perspective. Other sections here are also of interest - Widgets, detailing bolt-ons the site uses (for example WordPress plugins); Aggregation functionality (RSS, Feedburner); and Document Information (the Meta information behind the site) amongst others. Whilst being quite wordy, the information is clearly laid out, and easy to get at. This extension then is great for people who want a quick overview of the tracking behind the site, and thus a good replacement for Ghostery.

META SEO Inspector is a bit harder on the eyes, but has a lot of helpful data, particularly from a site-owner's perspective. As its title suggests, it provides SEO information for a site, which it groups into four categories - Information Box, SEO Tools, Site Safety and Other. Within the information box, it begins with the Meta data, which spells out word for word the contents of the various categories. It also has sections on links, showing for example whether canonical links have been specified and listing no-follow links, and a scripts section that lists the various tracking codes, but without naming them. If a script contains multiple URLs these will all be listed, so it can get a bit messy if more than one tracking provider is being used. META SEO Inspector also has an SEO tools function, which contains a list of tools, each linking to its respective site with information on the site in question.

Chrome SEO promises to provide Google Analytics integration shortly, but for now concentrates on links, ranking and indexed pages, and is essentially a more compact version of the META SEO Inspector extension.

For me, the BuiltWith Technology Profiler provides the best information on the tracking on a site for now, managing to provide the information that Ghostery does for Firefox. That said, a slicker, more informative version would be welcome.

Whilst looking for some extensions to compete with Ghostery, I came across the ClickyChrome extension, which is handy if you've got a Clicky analytics account (but not very if you haven't). Clicking on the icon brings up a mini-dashboard, with metrics around the number of visitors, actions, time, bounce rate and goals. The extension allows you to track more than one website, with two drop downs allowing you to choose your site, and the timeframe. The extension comes into its own when someone visits your site - being a real time tool, Clicky enables its extension icon to alert you if you have visitors in near real time, so you can open your account up and see what they're up to whilst they're still there!

This being a new(ish) browser with a frequent update schedule and dedicated coders, I wouldn't be surprised if the quantity and quality of analytics extensions increased over the coming months. I'll be keeping an eye out for new developments. If you use an extension for Chrome that I've left off here that you like (or know of an addon for Firefox that is so super-whizzy it'd tempt me back to the dark side), then please let me know in the comments.

Wednesday, March 24, 2010

Free versus paid tools: A review

I've been mulling this one over for a while now. I've worked with paid and free tools before, but in my current role I use both Google and Yahoo Analytics on my company's sites. However, for all their strengths, there are some things that the free tools can't provide me with. Should I fork out the company's cash to bring in a paid solution? This post aims to provide an overview of the strengths of the various tools, look at the two sides in the paid / free tool discussions going on in the Digital Analytics world, and start me on the long process of making up my mind!

Free Tools

Some free tools are big players in the web analytics market. In the WAA 2009 Outlook Survey 30% of respondents used a free vendor solution, whilst a recent Forrester report stated that 53% of enterprises surveyed used a free solution as their primary tool, and 71% use one in some capacity.

There are currently two main players in the free tools market - Yahoo! Analytics and Google Analytics. The latter is freely available and the most popular free tool, being used to track 28% of all websites. Yahoo! Analytics is not so easy to acquire, and thus isn't so widespread. There are, of course, other free tools out there such as Woopra and GetClicky, which focus more on simple real-time reporting, and as such will be excluded from this post. Whilst the reporting available in Yahoo! Analytics is more advanced than Google Analytics, is it an "Enterprise Class" solution? What makes a tool "Enterprise Class"? I've spoken to a number of people who believe that Yahoo! Analytics is strong enough to compete with the paid tools out there; not many people would say the same of Google, at least for now.

First off, let's compare the two free tools. James Dutton has a fantastic post analysing the two. His analysis breaks the two tools into various categories, and scores them accordingly. Yahoo! comes out better than Google for most, quite often by a large amount. User management, code customisation, report bookmarking and filtering all fare better under Yahoo! than Google. However, Google has a much better support network, both in terms of documented help files and in terms of forums and online guidance. Campaign tracking is also better for Yahoo! than Google, as are path analysis and custom report design.

James' post is so comprehensive there's little, if anything, that needs to be added. In my experience Google is the more user-friendly of the two, although I've had more experience of it. However, Yahoo! Analytics' real strength is its adaptability, allowing the user to amend reports on the fly, drill down into the data and then link to related reports. With its limited help facilities it might take you a while to figure out how to do it, but once you have you're away.

When people talk about "Enterprise Class" solutions, what do they mean? Essentially the ability to tailor your tool to allow you to measure your data how you want to, be it custom segmentation, metrics or dimensions, or the reporting itself. In this post Eric Peterson discussed Google and Yahoo! Analytics, and concluded that "I don't think that Google Analytics is appropriate for free-standing use within the true enterprise". Yahoo! Analytics, however, "is an Enterprise-class solution," according to Eric, designed to support businesses with custom data collection, reporting and segmentation needs. Nevertheless, whilst Yahoo! may have the edge among the two main free tools, don't discount Google altogether. Google have clear plans to become "Enterprise Class", and its improvement over the last few years has been astounding, moving from a basic tracking mechanism to a tool which is now pushing the boundaries of the industry, and forcing established companies to justify their fees, and in some cases catch up. Analytics intelligence, custom alerts, segmentation, advanced analysis, Mobile, custom variables - the list is long and distinguished. Whilst many tools have these features, no provider has released so much so quickly. And, true to Google's ethos, the tools are easy to use and interpret.

Paid Tools

So that's the free tools covered. But what distinguishes them from their paid competitors, other than the price tag? And just how important are these extra features?

A recent analysis by Forrester looked at the main players in the paid web analytics tools market, as well as the aforementioned free tools. Firstly they looked at the factors that respondents considered when selecting a vendor. The most important were data accuracy and the flexibility of the tool, both in terms of servicing the business needs and in terms of reporting - an enterprise tool!

They then took these factors and others, and ranked the various tools accordingly. Briefly returning to the free tools, they ranked Yahoo! Analytics behind Google Analytics, although this was due solely to the product and corporate strategy weightings that they used; in terms of the current offering, Yahoo! was comfortably ahead of Google.

Returning to the paid tools, the big four (Omniture, Coremetrics, Unica and WebTrends) were ranked closely, and significantly ahead of the two free tools. They highlighted these tools' data handling, reporting and analysis, in addition to the ancillary marketing applications and plug-ins. Looking at the scores for these tools it is interesting how each has clear perceived strengths and weaknesses; there is no one clear winner from the list - all have at least one drawback. Furthermore, for each section there is either a clear winner, a clear loser or both. If a company were just using this report to make their choice, it would be a simple matter of determining which categories were most important to them and the choice would be clear(er).
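The point about weightings is worth a quick illustration. The sketch below, in Python, shows how the same category scores can produce a different ranking depending on how much weight a company gives each category. The tool names, categories, scores and weights are all made up for the example; none of them come from the Forrester report, and the simple weighted average is just one plausible way of combining scores.

```python
from typing import Dict, List


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of category scores, normalised by total weight."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[cat] * w for cat, w in weights.items()) / total_weight


def rank_tools(tool_scores: Dict[str, Dict[str, float]],
               weights: Dict[str, float]) -> List[str]:
    """Tool names ordered from highest to lowest weighted score."""
    return sorted(tool_scores,
                  key=lambda tool: weighted_score(tool_scores[tool], weights),
                  reverse=True)


# Entirely hypothetical scores (0-5) for two made-up tools.
tool_scores = {
    "Tool A": {"data_handling": 4, "reporting": 2, "support": 3},
    "Tool B": {"data_handling": 2, "reporting": 5, "support": 3},
}

# A company that cares most about reporting...
reporting_first = {"data_handling": 1, "reporting": 3, "support": 1}
# ...versus one that cares most about data handling.
data_first = {"data_handling": 3, "reporting": 1, "support": 1}

print(rank_tools(tool_scores, reporting_first))
print(rank_tools(tool_scores, data_first))
```

With the reporting-heavy weights Tool B comes out on top; with the data-heavy weights the order flips. Neither set of underlying scores has changed, only what the buyer says matters most.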

Comparing the paid tools to the free ones for their current offerings (leaving out strategy rankings), it's clear that the paid tools are comfortably ahead of the free. However, the gap is smallest for the metrics, dimension & correlations and reporting categories. It's the data handling and service support that distinguish the two groups, where paying the money delivers the most return. Of course there are plenty of consultants out there who would be willing to help out with any technical implementation issues you may have for the free tools, no doubt for significantly less than the cost of an annual licence fee for a paid tool.

Let the fight commence

OK, so now we know the strengths and weaknesses of the different tools. The question is, does it pay to pay? We've looked at the two types of tools, now let's look at the two camps. On the one side of the fence are those who support the use of paid tools, claiming that the free ones just don't cut it. The powerful analysis engines that can be plugged into the main tool enable us to slice and dice the data in new and insightful ways, looking at non-aggregated data. Eric Peterson has recently been talking about the bifurcation of tools: Yahoo! Analytics and/or Google Analytics are great for business users, but you need an enterprise tool (e.g. Omniture Insights or WebTrends Visitor Intelligence) for the analyst to deliver really detailed insight. It's clear that these tools are more powerful than the free ones, and with the right resource can deliver more insight than the free ones ever could. By combining a powerful tool with a free one, or setting up some simple dashboards in the powerful tool, companies can deliver the appropriate level of data to their employees.

On the other side there are those who claim that the free tools, whilst not being as powerful as the paid ones, provide the analyst with enough information. They state that it's as much about the quality of the analyst as the tool. The analyst needs to ask the right questions of the tool and correctly interpret the resulting data - simply exporting a data puke and mailing it out won't do anything, even if it did come out of a very flashy and expensive tool. In a recent post Avinash explains that setting up a powerful paid tool takes time, and you get a hobbled tool - you'll need to upgrade and get plugins to get all the information. This wastes time and money, and still won't present insights on a plate to you. Applying this to the 10/90 rule, if you've not got the resource to bring in a large enough analytical function that can manipulate these tools to the best of their ability, don't waste the money on an expensive tool.

Of course, both Eric and Avinash have their reasons for being on their respective sides of the fence; for different reasons, both would be out of a job if they weren't. In many ways, though, they agree with each other, and just choose to focus on different parts of the issue. No-one disputes that paid tools are better than the free tools, although, conversely, no-one would dispute that the free tools are catching up. The issue boils down to implementation and resource, both financial and human.

On the one hand, if a company has unlimited resource and cash, has the management in place to see through an effective implementation, and is not desperate for results, then the best route to take would be installing an expensive tool. Conversely, if a company has limited cash and human capital, little time or knowledge for installing and needs results straight away, it should be obvious what to do. Of course, most companies lie between these two extremes, so the decision is rarely that straightforward. What is clear though is that implementing a complicated web analytics tool is a drawn-out process, requiring not only the technical knowledge, but great planning skills and the full engagement of the various stakeholders involved. Simple it is not. To further complicate matters, you have to contend with the issue of trying to map what you need the tool to do to what it can do, not what you're told it can do. Unless you really know what your goals for the tool are, you risk being sold a pup.

Obligatory UK Navel-Gazing Section

At this point, in a brief aside, I'd like to focus on the UK, where most companies have acknowledged web analytics but not yet engaged with it. Obviously there are exceptions, and the blue-chips here are ahead of the rest of us. But the fact remains that for most companies, senior management are aware of web analytics and the power it has to drive through change and appreciate the need to investigate it, but have not been convinced of the benefits of fully implementing it. Most companies don't have a full time web analyst, let alone a team, and haven't allocated the money to buy in a tool. This will happen, but we are still at an early stage in the cycle. Effectively we are being asked to prove what can be done with a free tool before we shell out £££s on something better.

So what's the conclusion to all this? As any guru or university student will tell you, the correct answer is "it depends". If you've got the resources (money and people) to justify a paid tool, then you might reap the benefits as part of a medium to long term plan to improve the analytics of your company. But it requires dedication, good management and focus to pull this off. I've seen companies falter with their paid solution as the implementation has let them down, and people drown in the data and data issues. Focus on the human capital first, and apply the most suitable tool to it to derive the insight we all crave.
