Drupal Analytics

jtsnow's picture

Overview: This project will improve the tracking and analysis of website traffic in Drupal.

Description: There is a need for better traffic analysis in Drupal. A possible implementation of this project may involve writing a script separate from Drupal that accesses server logs directly. Other features may include an API to allow contrib modules to log events or a way to send statistics to third party tools, such as Google Analytics.

Some ideas:

  • Provide both client and server side tracking APIs.
  • Access server logs to get statistics for files and requests not served by Drupal (JavaScript, images, other files).
  • Be aware of Drupal's page caching mechanisms and work around those.
  • Track actions: use actions as 'goals'?
  • Track JavaScript events.
  • Extensibility: Allow analysis plug-ins to be made.
  • Performance is a big factor! Traffic tracking shouldn't put too big a load on the system, whether in database size or in page load time.
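
To make the extensibility idea in the list above a little more concrete, here is a minimal sketch of an analysis plug-in registry. It is written in Python only for brevity; a Drupal implementation would more likely use hooks. The names `ANALYSES`, `analysis`, and `run_all`, and the shape of the hit records, are all hypothetical.

```python
# Hypothetical registry: each analysis plug-in is a function that receives
# the full list of hit records and returns its own result.
ANALYSES = {}

def analysis(name):
    """Decorator that registers a function as an analysis plug-in."""
    def register(func):
        ANALYSES[name] = func
        return func
    return register

@analysis("hits_per_path")
def hits_per_path(hits):
    counts = {}
    for hit in hits:
        counts[hit["path"]] = counts.get(hit["path"], 0) + 1
    return counts

@analysis("unique_sessions")
def unique_sessions(hits):
    return len({hit["session"] for hit in hits})

def run_all(hits):
    """Run every registered plug-in over the same raw data."""
    return {name: func(hits) for name, func in ANALYSES.items()}

# Assumed record shape: one dict per logged request.
hits = [
    {"session": "s1", "path": "/node/1"},
    {"session": "s1", "path": "/node/2"},
    {"session": "s2", "path": "/node/1"},
]
report = run_all(hits)
print(report)
```

The point of the design is that the core only collects raw hits; each new report is one more registered function and needs no change to the tracker itself.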

Please share your thoughts and ideas.

Mentors:

Difficulty: Hard


Comments

Interested in the discussion

dldege's picture

We battle this a lot so I'm open to the idea and discussion.

I see pros and cons to moving this type of analysis into your own site and database vs. sending all this data to, say, Google or Yahoo, who can store the data (which can get to be a lot of rows) forever and provide really polished dashboards and reports. However, I think there are also limitations in Google Analytics.

Another idea for tracking would be email click through for CRM type campaigns, etc.

We have considered looking into integrating Drupal with Mondrian. You might look at that a little and see what you think.

Dan DeGeest
Lead Software Developer
iMed Studios
http://www.imedstudios.com/labs

Dan DeGeest
Software Developer
Somewhere or Another

Good idea! I'm not entirely

kleinmp's picture

Good idea!

I'm not entirely clear about what you mean by using actions as 'goals', though I am intrigued.

Matthew Klein
matt@zivtech.com

Actions as Goals

jtsnow's picture

Tools such as Google Analytics allow you to set up goals for your visitors. For example, User Registration, Mailing List Signup, or making a purchase are common goals. Google Analytics will show you things like the path users took to complete the goal or at what point they stopped before completing the goal. This is useful for analyzing, for example, what parts of the checkout process on an e-commerce site need to be improved.

The ability to mark an action as a goal would be a simple way to set goals for visitors.
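
As a rough illustration of what a goal report computes, here is a minimal Python sketch that counts how many sessions reached each step of a goal, which shows where visitors drop off. The action names and the `funnel_dropoff` function are made up for the example; they are not part of Drupal's actions system or of Google Analytics.

```python
def funnel_dropoff(sessions, goal_steps):
    """Count the sessions that reached each goal step, in order.

    sessions: {session_id: [action, action, ...]} in chronological order.
    goal_steps: ordered list of actions that make up the goal.
    """
    reached = [0] * len(goal_steps)
    for actions in sessions.values():
        position = 0  # index of the next goal step this session must hit
        for action in actions:
            if position < len(goal_steps) and action == goal_steps[position]:
                reached[position] += 1
                position += 1
    return list(zip(goal_steps, reached))

# Hypothetical checkout goal on an e-commerce site.
checkout = ["view_cart", "enter_address", "pay"]
sessions = {
    "s1": ["view_cart", "enter_address", "pay"],
    "s2": ["view_cart", "enter_address"],
    "s3": ["view_cart", "browse", "view_cart"],
    "s4": ["browse"],
}
funnel = funnel_dropoff(sessions, checkout)
for step, count in funnel:
    print(step, count)
```

With this sample data, three sessions view the cart, two enter an address, and one pays, so the biggest loss is at the payment step.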

I've had the same thought

rwohleb@drupal.org's picture

Google Analytics, for example, is great. However, it's often a lot more than most people need, and it can be confusing for the average Joe to use. It would be nice if there were a purely Drupal option that went beyond any current metrics.

The new system could still use a post-request AJAXy tracker, like Google Analytics, to keep page requests responsive. It could hit a new script that sits outside the Drupal menu callbacks and handles its own bootstrap. The bootstrap would be minimal and just give us access to the base system (e.g., the DB). If we keep the logging step down to a simple DB insert, it would limit overhead and table locking issues.
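
A minimal sketch of the "simple DB insert" idea, using Python and SQLite only so the example is self-contained; a real tracker script would be PHP against the site's own database. The `hit_queue` table and its columns are assumptions made for illustration.

```python
import sqlite3
import time

def open_log(path=":memory:"):
    """Open the database and make sure the (hypothetical) queue table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hit_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " ts REAL NOT NULL,"
        " session_id TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " referrer TEXT)"
    )
    return conn

def log_hit(conn, session_id, path, referrer=None, ts=None):
    """Record one request with a single INSERT; no analysis happens here."""
    conn.execute(
        "INSERT INTO hit_queue (ts, session_id, path, referrer)"
        " VALUES (?, ?, ?, ?)",
        (time.time() if ts is None else ts, session_id, path, referrer),
    )
    conn.commit()

conn = open_log()
log_hit(conn, "abc", "/node/1", referrer="/")
log_hit(conn, "abc", "/node/2", referrer="/node/1")
queued = conn.execute("SELECT COUNT(*) FROM hit_queue").fetchone()[0]
print(queued)
```

Keeping the write path to one statement with no lookups is what keeps the per-request cost low; everything that needs joins or aggregation is deferred.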

Now that the request is logged, there's still the issue of processing. We could treat the incoming request table as a queue and 'clear it out' periodically via cron. The post-processing would contain all of the intelligence that considers sessions, goals, metrics, etc. The trick here would be processing the data efficiently, so that working through the queue table doesn't lock it too much on active sites and the queue is drained fast enough to keep it from growing forever.
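
One way to keep lock times short is to drain the queue in small batches, one short transaction per batch. The sketch below shows that pattern in Python with SQLite; the table names `hit_queue` and `path_totals`, the batch size, and the per-path counting are all assumptions standing in for whatever the real post-processing would do.

```python
import sqlite3
from collections import Counter

def process_queue(conn, batch_size=500):
    """Fold one batch of queued hits into per-path totals, then delete it.

    Returns the number of rows processed (0 when the queue is empty).
    """
    rows = conn.execute(
        "SELECT id, path FROM hit_queue ORDER BY id LIMIT ?", (batch_size,)
    ).fetchall()
    if not rows:
        return 0
    counts = Counter(path for _, path in rows)
    with conn:  # one short transaction per batch
        for path, hits in counts.items():
            conn.execute(
                "INSERT OR IGNORE INTO path_totals (path, hits) VALUES (?, 0)",
                (path,),
            )
            conn.execute(
                "UPDATE path_totals SET hits = hits + ? WHERE path = ?",
                (hits, path),
            )
        # Rows logged after the SELECT have higher ids and are left alone.
        conn.execute("DELETE FROM hit_queue WHERE id <= ?", (rows[-1][0],))
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hit_queue (id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT NOT NULL)"
)
conn.execute("CREATE TABLE path_totals (path TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO hit_queue (path) VALUES (?)",
    [("/a",), ("/b",), ("/a",), ("/a",), ("/b",)],
)
conn.commit()

# A cron run would loop like this until the queue is empty or time runs out.
while process_queue(conn, batch_size=2):
    pass

totals = dict(conn.execute("SELECT path, hits FROM path_totals"))
remaining = conn.execute("SELECT COUNT(*) FROM hit_queue").fetchone()[0]
print(totals, remaining)
```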

On more active sites, where there is more control over the DB, there would be options to increase performance. For example, with MySQL, the InnoDB engine (row-level locking) or, on MyISAM tables, INSERT DELAYED, DELETE QUICK, and possibly DELETE LOW_PRIORITY could be used.

One issue with the client

dldege's picture

One issue with client-side tracking is that it makes it impossible (or at least hard; I'm sure there is a solution) to track things like RSS feeds, AJAX-only requests, file downloads, and so forth. I think the system should provide both client and server side tracking APIs.

I agree with you about the analysis. This is where Google wins right now. However, it can be hard to find what you want in GA, and it is definitely more than needed for many sites, although there is a lot of number crunching done on the data. The 24 hour data lag is also a big drag, especially during development. Certain metrics always come up (time on page, time on site, etc.) that require computation on the raw logs. Perhaps a big consideration for such a project could be module extensibility, so that new analysis options can be plugged in rather than developing all of the analysis up front.
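
For a sense of the computation involved, here is a minimal Python sketch of "time on page" derived from raw hits: within each session, the time spent on a page is the gap until the next request. The record layout `(session_id, timestamp, path)` is an assumption, and the last page of each session is skipped because there is no later request to measure against.

```python
from collections import defaultdict

def time_on_page(hits):
    """Return {path: [seconds, ...]} from (session_id, timestamp, path) hits.

    The final hit of each session has no following request, so its
    duration is unknown and it is left out.
    """
    by_session = defaultdict(list)
    for session_id, ts, path in hits:
        by_session[session_id].append((ts, path))

    durations = defaultdict(list)
    for visits in by_session.values():
        visits.sort()  # chronological order within the session
        for (ts, path), (next_ts, _) in zip(visits, visits[1:]):
            durations[path].append(next_ts - ts)
    return dict(durations)

hits = [
    ("s1", 0, "/"), ("s1", 30, "/about"), ("s1", 90, "/contact"),
    ("s2", 10, "/"), ("s2", 20, "/about"),
]
durations = time_on_page(hits)
averages = {path: sum(d) / len(d) for path, d in durations.items()}
print(averages)
```

Time on site follows the same way: it is the difference between the first and last timestamp of each session.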

Dan DeGeest
Lead Software Developer
iMed Studios
http://www.imedstudios.com

Dan DeGeest
Software Developer
Somewhere or Another

Excellent comments. I'm glad

jtsnow's picture

Excellent comments. I'm glad to see that people have already put some thought into this. I definitely like the idea of having both client and server side tracking APIs.

Also an interesting idea about having analysis plugins.

Awesome idea! :)

mitchell's picture

I currently use three separate modules for this: Firestats, Piwik, and Statistics Pro. I think Statistics Pro would be the most helpful if your goal is to keep the code in Drupal. There are a number of projects already out there for this type of stuff, but I think building on Drupal's contrib repo would yield the coolest result (views, views charts, actions, ajaxy stuff). I definitely agree with the 'Hard' difficulty rating, so best of luck there.

Your perspective on what information the users are after is impeccable. The scope of this project seems large, yet doable. I'll be very interested to see how this proposal comes together, and I'll keep an eye out for possible mentors.

Tying it all together

therzog's picture

The challenge we've been struggling with is tying tracking and analytics from separate sources together into a "5,000 foot view" that makes sense to non-experts but still allows deeper dives for people who are really into tools like Google Analytics. On our site, some users ask big-picture questions like "what's the trend in site visits?" or "what are the most popular site sections?", while other people are more in the weeds: "how many page views on my project/story page?" or "who is linking to my page?"

I agree that Google Analytics is great, but it is too much for most people, and it can be cumbersome to use. So part of what we're doing is developing a module that scrapes our analytics account, taking advantage of the fact that nearly every report can be exported to CSV. Our Drupal site keeps a cookie so that Google thinks it's logged in, and uses curl to send requests like this:

https://www.google.com/analytics/reporting/export?id=YOURACCOUNTID&pdr=2...

(or something like that). Then it parses the result and displays it as a table or Google chart.
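
The parsing step might look roughly like the Python sketch below, which rolls an exported CSV up into pageviews per site section. The column names `Page` and `Pageviews` and the sample rows are invented for the example; the real export layout is not shown in this thread and may differ.

```python
import csv
import io
from collections import Counter

def pageviews_by_section(csv_text):
    """Sum pageviews per top-level site section, most popular first.

    Assumes a header row with 'Page' and 'Pageviews' columns
    (hypothetical; adjust to the actual export).
    """
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "/projects/alpha" -> "/projects"; the front page stays "/"
        section = "/" + row["Page"].strip("/").split("/")[0]
        totals[section] += int(row["Pageviews"])
    return totals.most_common()

sample = """Page,Pageviews
/projects/alpha,120
/projects/beta,80
/stories/launch,50
/,40
"""

table = pageviews_by_section(sample)
for section, views in table:
    print(f"{section:<12}{views:>6}")
```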

So essentially it's a front-end to Google Analytics that can be displayed in blocks or pages within Drupal. This lets us generate simplified or aggregated analytics data that is still consistent with the "deep dive" reports that we generate, and we can integrate it with information from other tools like Digg.

Here is a sample:

That looks pretty awesome.

slip's picture

That looks pretty awesome. Have you posted the code? Are you still developing this and thinking about making a module?

Status?

kristen pol's picture

Wondering if this ever went anywhere... ??

Looks useful!

Kristen

No updates

therzog's picture

There is a module that implements the Google Analytics Data API.

I haven't looked at it to see if it could be used to implement the dashboard I posted above. However, the API is about 80% of the guts behind the dashboard. The remainder is just using an API to query GA to get analytics data that matches certain criteria (e.g., based on a URL pattern) and theming out the result as a table.

PHP tracking

Aveu's picture

I noticed a while ago that my Apache log stats and my GA stats showed significant differences. Looking into this, I have come to the opinion that a lot of people are using various methods to avoid web tracking, such as the AdBlock and NoScript extensions for Firefox. The common factor is that almost all current tracking systems depend on JavaScript, and as more users block or turn off JS in their browsers, our stats become less and less accurate.

Since Drupal is PHP based, I began wondering: why not develop a tracking system that is server based? Yes, that has drawbacks, but overall it would allow for better onsite tracking (logging detailed page-to-page transit path patterns, for example) and at the very least would ensure that all visits have some level of logging outside of the Apache logs. I have run across an open-source (GPL) tracking system called YMMV (Your Mileage May Vary) that could be the seed for such a PHP based tracking Drupal module. I keep tinkering with it but really haven't had time to do any serious work. Here is the link to the project ... http://www.adrianspeyer.com/YMMV/
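
As an illustration of server-side path tracking that needs no JavaScript, here is a minimal Python sketch that counts page-to-page transitions using the Referer field of Apache "combined" format log lines. It is unrelated to how YMMV works; the host `example.com` and the sample lines are made up, and a PHP module would log the same fields directly rather than parse log files.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Apache combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def transitions(lines, site_host):
    """Count (from_path, to_path) pairs for successful on-site navigation."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m or m["status"] != "200":
            continue
        ref = urlparse(m["referer"])
        if ref.netloc == site_host:  # ignore external and empty referrers
            counts[(ref.path or "/", m["path"])] += 1
    return counts

log = [
    '203.0.113.5 - - [10/Mar/2009:10:00:00 +0000] "GET /node/1 HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Mar/2009:10:00:30 +0000] "GET /node/2 HTTP/1.1" 200 734 "http://example.com/node/1" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Mar/2009:10:01:00 +0000] "GET /node/1 HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Mar/2009:10:01:05 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]
paths = transitions(log, "example.com")
for (src, dst), n in paths.most_common():
    print(src, "->", dst, n)
```

Relying on the Referer header is itself a simplification, since browsers and proxies can omit it; a session cookie set by the server would give more reliable paths.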

What do you think?

What Became of Drupal Analytics?

gofair's picture

@jtsnow: How did you get on with your GSOC application? Did the idea ever progress to a project?
@therzog: Love the dashboard queried from GA. Any chance you could share some of your code?
@David Reid: What has happened with http://drupal.org/project/analytics ??

If no progress has been made on building a decent Drupal-specific analytics module, we should formalize the requirements and get the project off the ground. IMO it needs to keep super basic stats server side and then complement them with GA.

What is the consensus on 3rd Party analytics modules for Drupal?

  1. Firestats : http://drupal.org/project/usage/firestats - 176 sites report using the module. Any reason why it does not have more users? (Bad marketing or limited features?)
  2. Stats Pro : http://drupal.org/project/statspro - 878 sites - Flat growth, but looks like it could be good. Good views and charts integration, but no demo site or screenshots to get people interested in using it. Is this deliberate, to keep support queries from noobs down?
  3. Piwik : http://drupal.org/project/piwik - 2518 sites report using it (and rising steadily!). An excellent alternative to GA, but it appears to require a server admin to install the Piwik server software: http://piwik.org/docs/requirements/ - so it might not be appropriate for "small" Drupal sites on shared hosting... we've all been there... :-p
  4. Google Analytics : http://drupal.org/project/google_analytics - obviously has 100k+ sites using it. The downside of Analytics: someone else has all your data. Fine for most sites, not so good for Gov and "sensitive" sites where data/stats have to be internal. It also does not capture users with JavaScript disabled.

So how are we to proceed? Contribute to one of the existing projects (Stats Pro), or start fresh?

Essentially I want to build custom dashboards with site-specific goals so that non-web-savvy website managers/owners can track data. GA is way too complicated for non-technical users, so it simply gets ignored; we need something (even) simpler... Has anyone managed to do this?
Cheers.

SoC 2009
