Website Statistics & Analysing Traffic Logs

What's a hit, a unique, a bounce rate?

Don't assume others know what the terms mean: Many unsophisticated webmasters use "hits" when describing the number of visitors they've had, and some use it to refer to page views. Neither is correct. A page like this one, with several small images and calls to CSS files, could show up in the stats as 20 hits each time it is loaded in a browser. The hits number does look more impressive, though, and is often used in auction listings and sales memoranda to "big up" the site being sold. A savvy buyer would not assume the seller is using terms correctly.

Understand the traffic terminology yourself: Unless the webmaster is familiar with the terms, he's at a disadvantage when assessing traffic stats. In brief: a visitor is a human visitor (as opposed to a bot) from a particular IP. He is counted as a unique visitor unless he has already visited recently, in which case he's a repeat visitor. Each time he loads a page in his browser it counts as a page view. When the average number of page views per visitor is high, the site is said to be sticky. Each time a page is loaded it registers multiple hits, with the hit count depending on the number of independent elements on the page. The bounce rate is the percentage of visitors who arrive at a page and leave shortly after without visiting other pages on the site. The terms hits, uniques, visitors, impressions, page views, bounce rates, referrers etc. are covered in more detail elsewhere.

Understand the rest of the stats terminology too: When it comes to analysing statistics, numerous terms mean different things in different stats analysis packages, but these are constants: Referrer, Path, Entry and Exit Page, Server Response, HTTP Status Codes, Server Logs, User Agents, Bots, Spambots, Scrapers, Email Harvesters etc. are defined on our Glossary Page.
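To make the arithmetic behind those terms concrete, here is a minimal sketch in Python. The data is invented purely for illustration: each tuple stands for one visit, recording the visitor's IP and the number of pages viewed in that visit.

```python
# Hypothetical data: each tuple is one visit - (visitor IP, pages viewed in that visit).
sessions = [("1.2.3.4", 5), ("1.2.3.4", 1), ("5.6.7.8", 1), ("9.9.9.9", 3)]

visits = len(sessions)                                        # 4 visits
uniques = len({ip for ip, _ in sessions})                     # 3 unique visitors
page_views = sum(pages for _, pages in sessions)              # 10 page views
pages_per_visit = page_views / visits                         # 2.5 - a measure of "stickiness"
bounce_rate = sum(1 for _, p in sessions if p == 1) / visits  # 0.5, i.e. a 50% bounce rate
```

Note that hits don't appear here at all: they depend on how many files each page pulls in, not on visitor behaviour.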
Understanding what the logs are

When a request is made of the server for a file, the server records various facts. These include the IP the request came from, the date and time of the request, the referrer, the user agent (which identifies the browser and operating system), the file requested, the response code and the number of bytes served. These records are created not just the first time someone lands on a site but also each time they click a link on the site to navigate within or away from it.

Anatomy of a server log entry

The log entries are saved as simple text. Each line of text represents one request of the webserver. Here's a rough example:

102.111.111.111 - - [20/Apr/2009:00:23:48 -0400] "GET /1062-confirm-site-traffic/index.htm HTTP/1.0" 200 12451 "http://www.experienced-people.co.uk/1058-due-diligence/" "Mozilla/4.05 (Macintosh; I; PPC)"

This is what the fields are:

IP Address: 102.111.111.111 - the IP of the machine that requested the file. It could be a hostname instead of a number, and it is sometimes masked for privacy reasons.
Date Stamp: 20/Apr/2009:00:23:48 -0400 - the record of the time the request was made.
File Requested: GET /1062-confirm-site-traffic/index.htm
Status Code: 200 - (HTTP status codes are defined on our Glossary Page)
Bytes Downloaded: 12451 - should normally correspond to the size of the file served.
Referrer Page: the visitor came from http://www.experienced-people.co.uk/1058-due-diligence/
User Agent: Mozilla/4.05 (Macintosh; I; PPC) - details of the software used to access the file.
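To show how those fields can be pulled apart, here is a minimal sketch in Python (not any particular stats package) that splits a line in the common "combined" log format into its parts. The field names are our own labels, chosen for illustration.

```python
import re

# Regular expression for one line of an Apache-style "combined" access log.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('102.111.111.111 - - [20/Apr/2009:00:23:48 -0400] '
        '"GET /1062-confirm-site-traffic/index.htm HTTP/1.0" 200 12451 '
        '"http://www.experienced-people.co.uk/1058-due-diligence/" '
        '"Mozilla/4.05 (Macintosh; I; PPC)"')

match = LOG_PATTERN.match(line)
if match:
    entry = match.groupdict()
    print(entry["ip"])          # 102.111.111.111
    print(entry["status"])      # 200
    print(entry["user_agent"])  # Mozilla/4.05 (Macintosh; I; PPC)
```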
So what do traffic statistics programs do?

A good stats package or analysis program takes that raw data and collates it for easy presentation and digestion. It can create graphs and cross-reference the data to show, for example, how many visitors to a certain page used a particular browser.
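As a rough illustration of that collation step, assuming a list of entries shaped like the dictionary produced by the parsing sketch above, a few lines of Python can already answer "which pages get the most views?" and "which browsers do visitors use?":

```python
from collections import Counter

def summarise(entries):
    """Tally page views per URL and requests per user agent from parsed log entries."""
    # entries are dicts like the one built by the parsing sketch above
    views_per_page = Counter(e["request"].split()[1] for e in entries if e["request"])
    hits_per_agent = Counter(e["user_agent"] for e in entries)
    return views_per_page, hits_per_agent

# pages, agents = summarise(parsed_entries)
# pages.most_common(10)   # the ten most-requested URLs
```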
WHERE'S THE GOLD?

Realise the wealth of information that's available in the stats

Usually when the client logs into their account they select a time period and can see and/or download graphs and charts showing, for example, the number of uniques per day. But there's a lot more to be gleaned from the stats.

Check Geographic Distribution of Traffic: Traffic from countries like the US, UK and Australia is considered more valuable than traffic from places like China and India. The justification is economic - advertisers pay more for traffic in the rich countries.

Check Time Distribution of Traffic: The stats could tell you if the traffic is seasonal. Proxy sites, for example, tank during holidays when their main users, students, are not in school. A buyer looking at a proxy site just before the holidays needs to know that traffic is not going to continue at the level of the last two months. A tech site's stats may show that traffic dips sharply every summer. A gift site's stats may show just two peaks in the year - Christmas and Valentine's Day. Traffic stats provided for November to February cannot be extrapolated into an estimate of annual traffic levels.
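For the time-distribution check, one quick way to spot seasonality (again assuming entries parsed as in the earlier sketch) is to bucket requests by month and look for dips and peaks:

```python
from collections import Counter
from datetime import datetime

def views_per_month(entries):
    """Count requests per calendar month; sharp dips or peaks hint at seasonal traffic."""
    months = Counter()
    for e in entries:
        # Timestamps look like "20/Apr/2009:00:23:48 -0400"; drop the timezone offset.
        ts = datetime.strptime(e["timestamp"].split()[0], "%d/%b/%Y:%H:%M:%S")
        months[ts.strftime("%Y-%m")] += 1
    return months
```

The geographic check needs an extra ingredient - an IP-to-country lookup - which the better stats packages build in.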
How do sellers prove traffic? Most sellers have at least one of the options below, and many will have multiple stats records. It goes without saying that a buyer using multiple inputs in his due diligence builds a clearer picture of the site's visitor activity. There are essentially two main routes to recording traffic, visitor and referrer history: one involves server logs (and their analysis via programs like AWStats and Analog), the other involves javascript tracking code on each page of the site.
Option 1: Most hosting companies provide some form of free stats. They have one or more log-analysis programs such as AWStats, Analog or Webalizer installed on their servers, analysing their clients' server logs. Sellers of smaller sites may provide only screenshots of the stats rather than access to them. Caution: Photoshop has made altering images quite easy. It's preferable to get full password access to the statistics themselves, and some sellers do provide this. For those that don't, we have a handy tip (next page) that may give you full access to the stats anyway!
Option 2: Some websites have javascript tracking code (or page tags) on every page. The most used code by far is from Google Analytics, followed by Urchin (now merged into Google Analytics). The Google Analytics code bypasses the server logs altogether: the javascript on each page logs the visitor details and stores them on Google's servers. Webmasters - and any "guests" they designate - can monitor the website activity via a browser.

Advantages with this method: Javascript-based logging is much more reliable than the server logs. Further, Google Analytics offers functionality far beyond that of the server logs (as analysed by programs like AWStats). For example, the webmaster can set goals and tracking to see how well his different advertising campaigns are performing. He can also integrate his AdSense account with the Analytics account to get an even clearer picture of the earnings and which pages they derive from. Obviously, any extra tracking or functionality enabled means more data for the prospective buyer.

Disadvantages: Not every site has javascript tracking and... those that do may not have the script on every page. Another disadvantage is that the script can't track visitors who have javascript turned off in their browser (anywhere from 0-10%, depending on whether the typical visitor is web savvy or not). This understates the traffic by a small amount.

Should a site have Google Analytics, it is normal for the buyer to insist on access to the Analytics login. Other traffic programs may allow guest access as well. (How to enable guest access in Google Analytics: sign into your Analytics account > click on your URL in the main table > click on "User Manager" just below the table > click on the "Add User" text to the right of the dark grey bar > provide the guest's email address, select the URL in the lower box and click on "Add".)
Working on the raw stats and making any sense out of them takes an analytics program such as 123LogAnalyzer.
Caveats

1. While javascript tracking under-counts visitors by a small amount - by excluding those with javascript disabled - server based logs overstate the traffic by a huge amount. A lot of apparent visitors are actually not human but bots, and they don't add value to the site. Some stats analysis packages allow you to strip the main bots out of the figures (a rough way to do the same by hand is sketched after this list).

2. Bear in mind that if you are dealing with a wiki, forum, blog etc. it may not be just about the number of visitors - e.g. blogs have RSS subscribers who don't show up in the server stats.

3. Don't forget to inspect the error logs - they offer a lot of useful information on where errors were encountered in the normal operation of the site (due to factors like faulty scripts or missing pages). For example, they may disclose that the site doesn't have a robots.txt file, or that a particularly high-traffic page has now been removed (perhaps it had a copyright violation and the hosting company forwarded a DMCA warning!). A program like the free Error Log Viewer may come in handy to make sense of error log files.

4. Site owners signed up to free versions of stats programs like StatCounter may have limited stats available for inspection, and these may go back only a short time.

5. Multiple sites can be hosted under a single Virtual Private Server (VPS) or hosting account. In these cases the traffic logs are often an amalgamation of figures for multiple sites. The obvious danger is misreading the figures as pertaining to a single site.

6. The same thing can happen with Google Analytics. It's not unknown for sellers to use the same Google Analytics code on multiple sites. When analysing the GA stats, all that traffic supposedly coming to Site A may actually be aggregated traffic from sites A, B, C and D.
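Relating to caveat 1, here is a rough, hypothetical sketch of filtering bot requests out of parsed log entries by user-agent substring; the signature list is illustrative only, not a complete catalogue of crawlers:

```python
# Illustrative bot signatures - real crawler lists are much longer.
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp", "baiduspider", "crawl", "spider", "bot")

def human_entries(entries):
    """Keep only entries whose user agent doesn't look like a known crawler."""
    return [e for e in entries
            if not any(sig in e["user_agent"].lower() for sig in BOT_SIGNATURES)]
```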
Tip: Hire an expert to help with your due diligence

Next Page: Due diligence on traffic logs