Posted by kandrupaler on September 26, 2016 at 12:05pm
The core statistics module gets a lot of bad press because it triggers a DB write on every page visit. But how bad is it, really? There are several functional advantages to using it...
Are there any high performance sites which actually use the module? Do you use it?
How about the Node View Count module (https://www.drupal.org/project/nodeviewcount)?
Comments
You don't want any writes at all to the database during your page requests.
The reason is:
You can scale out reads (add DB slaves), but writes are much harder to scale.
One frequent write can bring down an otherwise very healthy site.
To do statistics right, the easiest approach is a store that persists across requests but lives in memory.
One possibility is to use APCu to store the counts, then have a PHP script that reads that data back.
e.g. as simple as:
<?php
// Creates the counter for this node on its first view.
apcu_add('my_statistics:node:[id]', 1);
?>
Then have a script that uses an APCUIterator to retrieve all entries prefixed with my_statistics. (Obviously this has to run on the same webserver, so if you have multiple webservers you need to collect the statistics from each and sum them together.)
Boom Big Data problem solved ;) ...
Be aware that apcu_add() fails when the key already exists, so on every view after the first you need to increment the stored value instead. I think the code goes somewhere along the lines of:
<?php
// Create the counter on the first view; increment it on later views.
if (!apcu_add('my_statistics:node:[id]', 1)) {
  apcu_inc('my_statistics:node:[id]');
}
?>
Hope that helps :).
Oh, this looks like a great approach! The only problem is that we're not using APC. We've got OPcache enabled, so we didn't think there'd be a need for APC...
And you could probably use nodeviewcount, but replace its hook_menu item with a custom, very lightweight PHP script path instead, which counts node views and then batch-inserts them into the database on cron.
The ajax request is needed anyway, unless you only want to count node views from uncached, authenticated users (and even then it won't work with Authcache).
But the ajax request could also be a little nodeJS thingy instead ...
There are multiple specialized analytics tools out there which beat core statistics in every regard (e.g. Piwik and GA). Decoupling analytics from the CMS lets you offload that workload to other machines or to a service provider.
It also helps you to keep a history of metrics even when you change your publishing software.
If you want to reuse analytics data in your site in an automated fashion, then you always can pull in the required metrics via a reporting API (Piwik, GA).
Sure, we're using GA and the Google Analytics Counter module to fetch data. But we can't get real time statistics with this setup. We run cron every 3 hours but that's not real time...
If you need real-time tracking, you could use another stats provider like StatCounter, which will give you more raw data rather than the curated data Google provides.
--
Portland Drupal Developer
Even GA gives real-time data, but the difficulty is getting that data back into Drupal. Is it possible / easy with StatCounter?
GA's real time really isn't good for this, given how it actually tracks the data. For example, you can query the metric rt:pageviews with the dimension rt:pagePath to get views, but that's views per path within whatever time frame real-time reporting uses, and there's no real way of knowing what that window is.
StatCounter may present the same problem: looking at their API (http://api.statcounter.com/docs/v3), it doesn't appear you can specify a time frame. Given that StatCounter retains data for different periods depending on your subscription level, this could lead to bad data as well.
Like mentioned above, some sort of persistent store is best for this: APC (APCu for PHP 5.5+), memcached, Redis, or even something simpler like file storage (a flat file or SQLite). To preserve performance and not interfere with Drupal's page cache, you can put a blank image tag on your pages that points to a PHP script with the NID in the URL or as a query variable, and have that script record the hit. Then, to display the views (if you want them publicly visible), make a JavaScript call from the browser to another script that retrieves them.
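The client side of that image-beacon approach can be very small. In this sketch the script paths and the element id are made up for the example; only the pattern itself comes from the description above:

```javascript
// Build the tracker URL with the node ID as a query variable.
// The /stats/* paths are illustrative; point them at your own scripts.
function beaconUrl(base, nid) {
  return base + '?nid=' + encodeURIComponent(nid);
}

// In the browser:
//   var img = new Image();
//   img.src = beaconUrl('/stats/hit.php', 42);   // records the hit
//   fetch(beaconUrl('/stats/count.php', 42))     // retrieves the total
//     .then(function (r) { return r.text(); })
//     .then(function (n) {
//       document.getElementById('view-count').textContent = n;
//     });
```

Because the hit is recorded by the image request, it works even when the page itself is served from Drupal's page cache or a reverse proxy.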
Another solution I have used is a constantly running Node.JS script; last night it handled over 200,000 page views in an hour (a political site during the debates) and peaked at only 1% CPU usage and 54 MB of memory. With that you don't have to worry about the persistent stores above, since you can keep the views in a variable and then dump them to MySQL or whatever on a cron job.
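The counting half of such a script can be sketched like this. The MySQL write is stubbed out, and all names here are assumptions; only the accumulate-then-dump pattern is from the comment above:

```javascript
// Accumulate views in memory, then hand them off in one batch.
const pending = new Map(); // nid -> views since the last flush

function recordView(nid) {
  pending.set(nid, (pending.get(nid) || 0) + 1);
}

// Run from cron / setInterval: pass the whole batch to a single
// multi-row INSERT (e.g. INSERT ... ON DUPLICATE KEY UPDATE
// views = views + n), then reset the in-memory counters.
function flush(writeBatch) {
  const batch = Array.from(pending, ([nid, views]) => ({ nid, views }));
  pending.clear();
  if (batch.length > 0) {
    writeBatch(batch); // stub: replace with your MySQL client call
  }
  return batch;
}

// setInterval(() => flush(insertIntoMysql), 60 * 1000);
```

If the process crashes you lose at most one interval's worth of counts, which is usually an acceptable trade-off for statistics.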
HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.
Thanks for the great options. Your nodejs setup looks very interesting! For now I think we'll go with a memcache-based solution, since we already have it and it's only half-utilized at any given time.