Counting anonymous node views

yhager's picture

On a D5 site with high (registered) traffic, there is also a requirement to count node views by anonymous visitors.

The existing setup is using Boost, and node view count is done using an ajax call on the pages, into a hook_menu that increments the counter.

Serving static files through Boost does take load off Drupal, but the ajax call then bootstraps Drupal in full anyway, just for the sake of counting. To me it sounds like removing Boost altogether from this setup, and letting the Drupal page cache + memcache serve the data quickly AND do the counting, would give the same effect with less hassle.

How would you implement anonymous node views counting?

Comments

markus_petrux's picture

Just make sure to send non-caching headers.

If you need to pass arguments and you wish to hide that the image is really a script, you can build the filename dynamically, like "node-xxx.gif", and then use mod_rewrite to transform the request into something like "node-counter.php?nid=xxx".

This script probably just needs DB access, so you can do something pretty light if you use the PHP driver itself. Or, if you need Drupal, you can bootstrap only to the phase where the DB connection is opened.
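A minimal sketch of such a counter script, assuming the file name node-counter.php and the nid parameter from the example above. The helper function names are made up for illustration, and the actual DB increment is left as a comment because it depends on how you connect to the database.

```php
<?php
// Hypothetical node-counter.php: answers an <img> request with a 1x1 GIF.

// Return a positive node id from the query parameters, or 0 if invalid.
function counter_extract_nid(array $query) {
  if (!isset($query['nid']) || !is_string($query['nid'])) {
    return 0;
  }
  if (!preg_match('/^[1-9][0-9]*$/', $query['nid'])) {
    return 0;
  }
  return (int) $query['nid'];
}

// Headers that ask browsers and proxies not to cache the image.
function counter_no_cache_headers() {
  return array(
    'Content-Type: image/gif',
    'Cache-Control: no-store, no-cache, must-revalidate',
    'Pragma: no-cache',
    'Expires: Thu, 01 Jan 1970 00:00:00 GMT',
  );
}

// A 1x1 GIF, stored base64-encoded.
function counter_pixel() {
  return base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

// Only respond when served by a web server, not from the command line.
if (PHP_SAPI !== 'cli') {
  $nid = counter_extract_nid($_GET);
  if ($nid > 0) {
    // Increment the counter for $nid here (plain DB driver or a
    // Drupal bootstrap limited to the database phase).
  }
  foreach (counter_no_cache_headers() as $header) {
    header($header);
  }
  echo counter_pixel();
}
```

Validating the nid before touching the database matters here, since the script is reachable by anyone who can request the image.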

Bootstrap

yhager's picture

So if we need a Drupal bootstrap anyway, then Boost does not really offload the PHP/DB layers.
With this reasoning, I figured, why not just let the Drupal page cache do its work, cache the pages themselves on a memcache server, and do everything (serve the page and count views) within a single request to the web server.

markus_petrux's picture

Booting Drupal to DRUPAL_BOOTSTRAP_DATABASE requires just a few included files and opening the DB connection.

If you wish to save the included files overhead, then you can use mysql_*() functions provided by PHP directly.

An alternative approach is

cfuller12's picture

An alternative approach is to use page fast cache and memcache as you're doing, store the counter in memcache and then dump the stats to the db via cron. This approach does require a small hack to whatever caching mechanism you're using (cache router, memcache module, etc.) but it works well and allows you to have the fastest architecture for anonymous page loads.

It's all ball bearings these days...

Yes, that would be

yhager's picture

Yes, that would be super-fast, but I am not sure how it will work for multiple web servers, due to the non-locking nature of memcache.

Just share the memcached

wuinfo - Bill Wu's picture

Just share the memcached service among those web servers.
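On the locking concern: memcached applies its increment command atomically on the server, so several web servers sharing one memcached instance can bump the same key without a lock. A rough sketch, assuming PHP's Memcached extension; the key prefix, function names, and server address are hypothetical.

```php
<?php
// Hypothetical key scheme: one counter key per node.
function view_counter_key($nid) {
  return 'node_views:' . (int) $nid;
}

// Count one view. $cache is expected to behave like PHP's Memcached class:
// increment() returns false when the key is missing, and add() only
// succeeds when the key does not exist yet.
function view_counter_hit($cache, $nid) {
  $key = view_counter_key($nid);
  $value = $cache->increment($key, 1);
  if ($value !== false) {
    return $value;
  }
  // First view: create the key. If another server created it in the
  // meantime, add() fails and we increment the existing counter instead.
  if ($cache->add($key, 1)) {
    return 1;
  }
  return $cache->increment($key, 1);
}

// Example wiring (server name and port are assumptions):
// $cache = new Memcached();
// $cache->addServer('memcache.example.internal', 11211);
// view_counter_hit($cache, 123);
```

Counts kept this way can still be lost if memcached evicts the key or restarts before cron writes them to the database.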

Using mod_rewrite sounds

dalin's picture

Using mod_rewrite sounds unnecessary. You can use JavaScript to build the URL with the query string already attached. The PHP file can then just bootstrap the DB and nothing more. I don't think you can get much lighter than that.

--
Dave Hansen-Lange
Web Developer
Advomatic LLC
East Asia Office
Hong Kong

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

1px image is currently on

mikeytown2's picture

1px image is currently on the table...
http://drupal.org/node/331537#comment-1561040
I was looking at using it for page expiration of views, but using it as a counter would work as well.

<img src="1px-image.php?nid=xxx">

is how I would do it.

Prototype Code

mikeytown2's picture

I have some prototype (never ran) code in case someone wants to test it out.
http://drupal.org/node/422620#comment-1722248
Please report in the above issue.

Working code in CVS

mikeytown2's picture

The latest code is in Boost CVS; 6.x-1.x-dev 2009-Jun-21 is the version you want if downloading it. I still need to add JavaScript so I can grab the referrer.

Although outside of Drupal

jredding's picture

Have you considered just grepping the Apache logs and throwing the results back into Drupal? It wouldn't be real-time, but if you already have something like awstats grepping your logs then it's just a quick add-on to that process. In fact you could just let awstats do all the grepping, then steal its results and plug them into Drupal.

That is unless you've turned off Apache logging.

-Jacob Redding

Parsing webserver logs

furmans's picture

I'm using Boost for a Drupal 5.x site and have built something exactly like this for use with the nginx webserver. I have nginx set up to log requests for the cached files to a special log file. Periodically, a cron script parses the log file and updates the correct Drupal tables with the current read counts.

At the moment the implementation is specific to nginx and UNIX/Linux, but I'm assuming it could easily be adapted for use on other systems.
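For anyone who wants to try the log-parsing route, here is a rough sketch of the tallying step. It is not the implementation described above: it assumes the common/combined access log format and unaliased /node/NID paths, and the function name is made up.

```php
<?php
// Tally successful GET requests for paths of the form /node/<nid>.
// Assumes the common/combined access log format; aliased paths would need
// an extra alias-to-nid lookup that is not shown here.
function tally_node_views(array $lines) {
  $counts = array();
  foreach ($lines as $line) {
    // Match the request line and status code,
    // e.g. "GET /node/12 HTTP/1.1" 200
    $pattern = '#"GET /node/(\d+)(?:\?[^ "]*)? HTTP/[0-9.]+" (\d{3}) #';
    if (!preg_match($pattern, $line, $m)) {
      continue;
    }
    // Skip redirects, errors and other non-200 responses.
    if ($m[2] !== '200') {
      continue;
    }
    $nid = (int) $m[1];
    $counts[$nid] = isset($counts[$nid]) ? $counts[$nid] + 1 : 1;
  }
  return $counts;
}
```

A cron script could feed it the lines of the log file (for example via file() with FILE_IGNORE_NEW_LINES), add the resulting per-node counts to the database, and then rotate or truncate the log so lines are not counted twice.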

Using cacherouter?

jtrudeau's picture

I made some similar changes for Drupal 6 and cacherouter which store the anonymous views for each URI in a data structure in memcached, then write this data out to the database when cron runs. There is also some basic logic to try not to lose any additional views while the cron process is running.

http://drupal.org/node/462844

Regarding your comment about possibly losing some of these views in a multi-server memcached environment: currently you're unable to capture this data at all, so even if the captured data is 90% accurate that should be fine for identifying trends, right? The only caveat here is that the node statistics are not updated in real time, but we think it's a nice balance between performance and functionality.
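One way to avoid losing views that arrive while cron is running is to subtract exactly what was read instead of deleting or resetting the key. This is only a sketch of that idea, not necessarily what the patch linked above does; it assumes one integer counter per key and an object that behaves like PHP's Memcached class, and the function name is hypothetical.

```php
<?php
// Read a counter and subtract exactly what was read. Views that arrive
// between get() and decrement() stay in the cache for the next cron run.
// $cache is expected to behave like PHP's Memcached class.
function view_counter_drain($cache, $key) {
  // get() returns false on a miss, which casts to 0.
  $seen = (int) $cache->get($key);
  if ($seen > 0) {
    $cache->decrement($key, $seen);
  }
  // The caller adds this amount to the database total.
  return $seen;
}
```

Because memcached cannot list its keys, the cron job still needs some way to know which counters to drain, such as a separately maintained list of node ids.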

Contribute Back

mikeytown2's picture

If anyone codes this up, would you mind posting it? Especially if you're using the 1px image; odds are it will help Boost out (prevent code duplication).

While I don't typically use

Jamie Holly's picture

While I don't typically use boost, here is one method I employ to handle ajax calls without fully bootstrapping Drupal. The AJAX callback is handled by an actual PHP file instead of a menu handler. That file bootstraps the database:

include_once('includes/bootstrap.inc');
drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE);

Now the only queries run are the ones I actually send, and things like loading all the activated modules don't occur. It's a lot more efficient this way.
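To make that concrete for the counting case, here is a sketch of what such a file might do after the database bootstrap. It assumes the statistics module's {node_counter} table and the Drupal 5/6 db_query() and db_affected_rows() functions; the function name is hypothetical, and the two callables are parameters only so the sketch can be exercised outside Drupal.

```php
<?php
// Record one view in the statistics module's {node_counter} table.
// $query and $affected default to the Drupal 5/6 database functions.
function counter_record_view($nid, $query = 'db_query', $affected = 'db_affected_rows') {
  $nid = (int) $nid;
  if ($nid <= 0) {
    return false;
  }
  $now = time();
  // Try to bump an existing row first.
  $query('UPDATE {node_counter} SET daycount = daycount + 1, totalcount = totalcount + 1, timestamp = %d WHERE nid = %d', $now, $nid);
  if (!$affected()) {
    // No row yet for this node: create it.
    $query('INSERT INTO {node_counter} (nid, daycount, totalcount, timestamp) VALUES (%d, 1, 1, %d)', $nid, $now);
  }
  return true;
}

// In the standalone callback file, after
// drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE):
// counter_record_view(isset($_GET['nid']) ? $_GET['nid'] : 0);
```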

Another option is using Google Analytics. If you can handle the data being up to 24 hours stale, then I would look at the Analytics API. Cron can be set up to retrieve the counts from the Analytics API. I do something similar on a site to track views of internal ad campaigns, using Analytics event tracking.


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

Clarification

mikeytown2's picture

There is a reason why the Boost menu callback is "boost_stats.php"; you can replace that with a file.
http://drupal.org/node/545908#blocks

Wouldn't this solve the

Fidelix's picture

Wouldn't this solve the problem for good?
http://drupal.org/project/google_analytics_counter

No load at all for your server, except for reading the counter.
I hope it gets ported to D7 soon.