... and I do not have any hair to spare!
We run a Drupal site that is getting about 60,000 page views a day. We use Google Analytics and AWStats to analyze our traffic, and the Google Analytics page views are coming in about 13% lower than AWStats. This is after we have aggressively filtered AWStats to make sure we're only counting actual page views.
We tried loading single pixel images two ways at the same time, first with the IMG tag, and then with javascript. While the numbers did not match perfectly, they were never more than 3% off, with the IMG totals being slightly higher overall. This would suggest that at most only a very small number of our site visitors have disabled javascript.
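For anyone who wants to repeat the comparison, here is a minimal sketch of the JavaScript half of such a test. It is not the code we actually ran: the `/px.gif` path, the `via` parameter and the `pixelUrl` helper are all hypothetical names used for illustration. The IMG half would simply be a plain image tag pointing at the same file with `via=img`, and the two request counts can then be compared in the server log.

```javascript
// Hypothetical helper: builds a pixel URL with a random suffix so that
// browser and proxy caches do not swallow the request.
function pixelUrl(base, via) {
  const nonce = Date.now().toString(36) + Math.random().toString(36).slice(2);
  return `${base}?via=${encodeURIComponent(via)}&n=${nonce}`;
}

// Only fires in a browser; visitors without JavaScript never request this URL.
if (typeof document !== "undefined") {
  new Image().src = pixelUrl("/px.gif", "js"); // "/px.gif" is an assumed path
}
```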
We loaded small (<1K) images at the top and bottom of all pages, and the bottom image gets about 4% fewer requests than the top image. This would suggest that 4% of our site visitors are bouncing before the bottom of the page loads. We are about to test this theory with one-pixel images to see if the load size / load time has any effect on the percentage.
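To be clear about how a percentage like this is worked out: it is the shortfall of the later counter relative to the earlier one. A small sketch follows. The function name and the counts are made up for illustration and are not our real figures.

```javascript
// Relative drop-off of `later` against `earlier`, as a percentage of `earlier`.
function dropOffPercent(earlier, later) {
  if (earlier <= 0) throw new RangeError("earlier count must be positive");
  return ((earlier - later) / earlier) * 100;
}

// Made-up counts for illustration only; not figures from this site.
const topRequests = 10000;
const bottomRequests = 9600;
console.log(dropOffPercent(topRequests, bottomRequests).toFixed(1) + "%"); // prints "4.0%"
```

The same calculation applies to the GA versus AWStats gap, with the AWStats total as the base.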
We are also experimenting with loading different types of cookies to see if we can isolate how many of the folks that visit our site have disabled cookies.
Finally, I am doing some research to see if I can get some numbers on the difference in effectiveness between Google Analytics and AWStats bot filtering.
For those of you that have tackled this issue - where else should I be looking? Is there anything specific to Drupal I should be looking into?
Thanks!

Comments
Not too uncommon
Andy,
I ran Omniture, Google Analytics, and a log based solution for a month. Omniture and Google Analytics were nearly identical; however, I noticed a similar discrepancy between the javascript based solutions and the logfile solutions. There are a few things that lead to this...
Mike O.
Hmm ...
Mike:
We have AWStats configured to rather aggressively filter bots but, of course, there is no way to compare it to GA's filtering. Browser type does not seem to affect the numbers much.
AWStats is log file based, so all HTTP(S) GETs and HEADs are logged. The 64 dollar question is: what percentage of site visitors have caching browsers or caching proxies that do not send a HEAD? And there is always the wild card of AOL, which can simultaneously make its members look like more and fewer visitors than they actually are. We ran an analysis of how many of our site visitors use browsers that are not running / have disabled javascript, and the answer appears to be less than 3%.
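To make the log side concrete, here is a rough sketch of the kind of counting a log based tool does. This is not how AWStats is implemented. It assumes Apache Combined Log Format, and the asset extensions, the bot patterns and the sample lines are all hypothetical. The point is only that every filtering rule here is a judgment call that GA never has to make, because bots and asset requests mostly never run its JavaScript.

```javascript
// Combined Log Format:
// host ident user [time] "METHOD path proto" status bytes "referer" "agent"
const LINE = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/;
const ASSET = /\.(?:gif|jpe?g|png|css|js|ico)(?:\?|$)/i; // assumed asset list
const BOT = /bot|crawl|spider|slurp/i;                   // assumed bot markers

function countPageViews(lines) {
  let views = 0;
  for (const line of lines) {
    const m = LINE.exec(line);
    if (!m) continue; // skip lines that do not parse
    const [, , , method, path, status, agent] = m;
    if (method !== "GET") continue;                        // ignore HEAD, POST, ...
    if (status !== "200" && status !== "304") continue;    // keep OK and Not Modified
    if (ASSET.test(path) || BOT.test(agent)) continue;     // drop assets and bots
    views += 1;
  }
  return views;
}

// Hypothetical sample lines using documentation IP ranges.
const sampleLog = [
  '203.0.113.5 - - [10/Mar/2009:08:15:02 +0000] "GET /node/42 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [10/Mar/2009:08:15:03 +0000] "GET /misc/drupal.js HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
  '198.51.100.9 - - [10/Mar/2009:08:16:00 +0000] "GET /node/42 HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
  '203.0.113.7 - - [10/Mar/2009:08:17:00 +0000] "HEAD /node/42 HTTP/1.1" 200 0 "-" "Mozilla/5.0"',
  '203.0.113.7 - - [10/Mar/2009:08:18:00 +0000] "GET /node/7 HTTP/1.1" 304 - "-" "Mozilla/5.0"',
];
console.log(countPageViews(sampleLog)); // prints 2
```

Only the first and last sample lines count: the others are an asset, a bot and a HEAD request.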
We have the GA js just before the /BODY tag. This is because that is where Google recommends putting it, and because if it is near the opening BODY tag and the GA js is slow to load, it holds the whole page up. My reason for putting the images at the top and bottom (with the bottom one right near the GA js) was to see what the bounce rate was before the GA js ran. Interestingly, the early results of the one pixel images are that they both load at about the same rate.
Thanks for the Google Analytics module suggestion - we'll check this out!
Andy Forbes
CTO, WorthPoint
http://www.worthpoint.com/
andy.forbes@worthpoint.com
Andy Forbes
CEO
Yeaton Partners
(e) aforbes@yeatonpartners.com
(w) http://www.yeatonpartners.com/
People can "disable" Google
People can "disable" Google Analytics tracking and still have JavaScript enabled, e.g. by blocking requests to the analytics server using their systems' hosts file. But probably only very few people do this.
Server timeouts of the analytics servers may also play a small role. Still, this and the aspects you mentioned do not seem to account for a 13% difference.
--
ramiro.org
We have a similar difference
We have a similar difference in our traffic reports between AWStats and Analytics. My guess is that the real number is probably somewhere between the two, accounting for the various subtleties of both.
If you do have a more conclusive or scientific method, I'd like to know. The standard for site traffic audits is the ABCe. They do not accept Analytics as reliable, but rather will collect your log files and process them themselves and make a decision. As it costs some £2000 to do this, it would be nice to know what the figures are likely to be before doing it.
Server time?
A large proportion of the discrepancies can usually be put down to the time an analysis starts and ends.
If you look at a whole month you often find the numbers are fairly close.
Unless you happen to have your server set to 'Google Time' (whatever that is), you'll regularly see a drift between numbers if examined daily, as GA will use California time while your server will use local time.
This is particularly annoying in certain time zones, as your peak traffic time may occur yesterday or tomorrow as far as GA is concerned.
For example, our server is set to GMT - if someone looks at our site when they get to work, GA thinks it happened yesterday (9 hour timeshift).
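A quick way to see the effect is to bucket the same hit by day in both zones. This is only a sketch: the `dayInZone` helper and the timestamp are made up, and it assumes the reporting side really does bucket by California time (`America/Los_Angeles`) as described above.

```javascript
// Returns the calendar day (YYYY-MM-DD) on which `date` falls in `timeZone`.
function dayInZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// Hypothetical hit: 07:30 GMT on 15 January 2009.
const hit = new Date("2009-01-15T07:30:00Z");
console.log(dayInZone(hit, "UTC"));                 // 2009-01-15 (server log day)
console.log(dayInZone(hit, "America/Los_Angeles")); // 2009-01-14 (California day)
```

The same hit lands on different days in the two reports, so daily totals drift even when monthly totals agree.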
Specifics as to why differences are important?
All the reasons above are valid, as far as I can see, as to why there might be some differences.
Do you use the GA code on error pages?
By default the Analytics module excludes user directories, admin and track I believe.
Also there are some options for including site search which could inflate / deflate page views.
This is because the site search tracking is done using the following example:
pageTracker._trackPageview("/search/node?search=my search");
Likewise, if you track PDF downloads, clicks on outbound links, etc., you are creating additional page views.
You can use GA filters to exclude these pageviews if needed.
If you are trying to get the numbers to match in different stat packages what is the reasoning?
I believe the GA module allows you to cache the GA.js code locally which may help speed up load times.
Also you can include the GA tracking script at the top of pages, but keep in mind that if someone doesn't let the page fully load and you put the ga code at the top of the page it will count a page impression even if the page didn't fully load. Not sure if that would work for you or not?
Finally, if you have a particularly busy website, GA will sometimes show sampled data due to the load required to show full detailed reports.
PHP Files - Upping AWStats Counts
In my search to find the difference between our AWStats numbers and our Google Analytics numbers, I happened across an explanation that actually makes sense to me: AWStats (presumably) records each PHP file as a pageview while GA does not. So, the more PHP references you have in a page, the more pageviews AWStats will register. Sure, there are probably issues with the GA code being blocked by browsers, etc. but it seems that the GA numbers are going to be more reliable overall. Mind you, if you are relying on pageviews for online ad sales, the AWStats numbers are preferable but not necessarily honest. Hope that helps...
This doesn't make sense.
This doesn't make sense. AWStats parses server logs, which contain records for requested URLs. There are no records for included PHP scripts. Or do you mean something else?
--
ramiro.org
Piwik
Has anyone done any comparisons with http://drupal.org/project/piwik? It's the non-Google Google Analytics that's installable locally.