Top Nodes with Google Analytics: balancing exploitation and exploration for content recommendation using the "multi-armed bandit" algorithm

by danithaca

About me: I'm a PhD student at the University of Michigan School of Information, with a research focus on designing recommender systems. I'm also a Drupal fan and have been using and developing for Drupal for 3 years. Last year I participated in GSoC 2009 and developed the recommender bundle modules. For more information about me, please visit http://michiza.com

Overview: The proposed module will make content recommendations using Google Analytics tracking data and strive for a balance between "exploitation" (recommending popular content) and "exploration" (recommending new and trending content) using the "multi-armed bandit" (MAB) algorithm.

Description:

The module will display a "Top Stories" block, as shown in Figure 1. It is similar to the "Today" block on my.yahoo.com that shows the top stories of today, as shown in Figure 2.


The 5 links in the "Top Stories" block are a mixture of popular items (those with the highest click rates in Google Analytics) and new items (which may or may not receive high click rates). If we only "exploit" by showing the known popular items, we might miss new items that could turn out to be even more popular. On the other hand, if we only "explore" by showing new items, we risk showing too many not-so-good items. The key challenge is to select the 5 "best" items from a large and changing pool, learning from Google Analytics data in real time. The "multi-armed bandit" (MAB) algorithm is a well-established machine learning approach to balancing exploitation and exploration, aiming to maximize the block's overall click rate in the long run. In fact, the "Today" block on my.yahoo.com uses MAB as well.
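To make the trade-off concrete: the proposal cites Auer, 2003 among its MAB variants, and the best-known index policy from Auer and colleagues' work is UCB1, which scores each item by its observed click rate plus a confidence bonus sqrt(2 ln N / n_i), where n_i is how often item i has been shown and N is the total number of impressions. The Python sketch below is an illustration under assumptions, not the author's research code: the stats format (node ID mapped to (impressions, clicks)) and the function name are hypothetical, and taking the top 5 items by index is a simple heuristic for filling several slots at once rather than part of the original single-choice algorithm.

```python
import math


def ucb1_top_k(stats, k=5):
    """Rank node IDs by a UCB1-style index and return the top k.

    stats maps a node ID to (impressions, clicks) for the "Top Stories" block.
    """
    total = sum(shown for shown, _ in stats.values())

    def index(nid):
        shown, clicked = stats[nid]
        if shown == 0:
            return math.inf  # never shown: explore it before anything else
        mean = clicked / shown  # observed click rate (exploitation term)
        bonus = math.sqrt(2 * math.log(total) / shown)  # shrinks as the item is shown more
        return mean + bonus

    # Highest index first; ties broken by node ID for a stable order.
    return sorted(stats, key=lambda nid: (-index(nid), nid))[:k]
```

Never-shown items get an infinite index, so new content is tried first; as an item accumulates impressions its bonus shrinks, and its ranking is driven increasingly by its observed click rate.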

In addition to showing "Top Stories", this module can easily be used to show "Top Products" for e-commerce sites, "Top Videos", "Top Images" for media sites, etc.

To implement the module, I plan to do 4 things:

1. Implement the MAB algorithm in the Recommender API module. I have already implemented 2 versions of the MAB algorithm in Python for my research; they just need to be rewritten in PHP and integrated into Drupal. The two versions are Auer, 2003 and Brezzi & Lai, 2002.

2. Retrieve pageview and click stats from Google Analytics using the GA Data Export API. There is a GA API module that does this, but extra work is needed to integrate it with the MAB algorithm.

3. Make this MAB module work with Taxonomy. That is, for each taxonomy category, it can show a "Top Stories" block only for that category.

4. Integrate with the "Views" module. That is, the "Top Stories" block should also be able to show pictures, CCK fields, etc., for customized display. An issue has already been created for this.
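Item 2 implies a mapping step: Google Analytics reports traffic by page path, while the recommender works with node IDs. A minimal Python sketch, assuming the rows have already been fetched (for instance via the GA API module) as (ga:pagePath, ga:pageviews) pairs; the function name is hypothetical, and only unaliased /node/NID paths are recognized, so a site using path aliases would first need to resolve them to system paths (not shown). Pageviews alone are not clicks on the block; attributing clicks to the "Top Stories" block itself would need additional tracking, which is also outside this sketch.

```python
import re
from collections import Counter

# Matches the node view path, optionally with a trailing slash, query string
# or fragment; /node/NID/edit and aliased paths deliberately do not match.
NODE_PATH = re.compile(r"^/node/(\d+)/?(?:[?#].*)?$")


def pageviews_by_nid(rows):
    """Sum pageviews per node ID from (page_path, pageviews) rows."""
    totals = Counter()
    for path, views in rows:
        match = NODE_PATH.match(path)
        if match:
            # Metric values may arrive as strings, so coerce to int.
            totals[int(match.group(1))] += int(views)
    return dict(totals)
```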

Mentors:

local mentor: Michael Hess (mlhess@drupal.org)

Contact Details:
email: danithaca@gmail.com
Skype: danithaca
IRC: danithaca (usually at #drupal-infrastructure)

Difficulty: Medium
The MAB algorithm has already been implemented in Python; it just needs to be ported to PHP and integrated into Drupal.

Your comments or suggestions are much appreciated. Thanks!

Comments

by skyredwang

Very interesting project!

by Bojhan

Although I agree that this sounds very interesting, I do wonder how different this is from the projects you did last year; it tackles a new topic, but the challenges seem very similar. With that in mind, wouldn't it be interesting to either extend the scope of the project or shift its focus to a data visualization / workflow issue?

For example, the Drupal issue queues suffer from a lack of context: when creating an issue, you usually spend quite some time checking whether the same issue already exists. Wouldn't it be awesome if it suggested similar issues? That would fix a workflow issue.

Data visualization could come from many angles, but in terms of machine learning, shouldn't it learn to create smarter connections between data? For example, from all the module data we have, it could find connections between which modules people use and in what combinations, and make suggestions based on that.

thanks for the suggestion

by danithaca

@Bojhan: thanks for the suggestion. I'll think about the "workflow" and "visualization" improvements, and perhaps submit a separate proposal.
As for how this differs from the recommender modules I developed last year, there are 2 major differences. First, this module will take input from Google Analytics tracking data, which last year's modules could also use to improve their recommendations. Second, this module tries to balance recommendations between "popular items" and "new items" for all users, whereas last year's modules only recommend "relevant items" based on registered users' previous history.

extra work

by danithaca

To make the summer's work more challenging, I'll take some extra steps:

  1. Add GA support to the Browsing History Recommender module, so that it can show a "Users who browsed this node also browsed" block using GA data in addition to the data in the {history} table. An issue was opened at http://drupal.org/node/509848
  2. Visualization improvements: site admins can choose to show a picture (perhaps using an image module or CCK) in addition to the text link in the "Top Stories" block.
  3. Add a "Contextual Top Stories" block (as Bojhan suggested): site admins can choose to anchor the "Top Stories" block to the current node, so that users on different nodes may see different contextual "Top Stories" blocks.
  4. Add a "Make a suggestion" button: users can suggest items for the "Top Stories" block if it doesn't show the items they like. The suggestions will be taken into account by the MAB algorithm.
  5. (optional) Add Mahout support to the Recommender API module, if time permits. This feature is highly requested by the community: http://drupal.org/node/503212
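Extra item 1 amounts to item-to-item co-occurrence counting. A minimal Python sketch, assuming visit-level lists of viewed node IDs are available (from the {history} table or derived from GA data; how to extract them from GA is not covered here). The function name and input format are hypothetical, and this is not necessarily how the Browsing History Recommender module scores items.

```python
from collections import Counter, defaultdict
from itertools import combinations


def also_browsed(sessions, k=5):
    """Map each node ID to up to k nodes most often viewed in the same visit.

    sessions is an iterable of lists of node IDs, one list per visit.
    """
    co_counts = defaultdict(Counter)
    for session in sessions:
        # Count each unordered pair once per visit, ignoring repeat views.
        for a, b in combinations(sorted(set(session)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    # Most frequent co-viewed nodes first; ties broken by node ID.
    return {
        nid: [other for other, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for nid, c in co_counts.items()
    }
```

Raw counts favor globally popular nodes; normalizing them (for example with cosine similarity) is a common refinement.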

The outcome of the proposal would be a new "top_nodes" module and various improvements to the Recommender API module (http://drupal.org/project/recommender) and the Browsing History Recommender module (http://drupal.org/project/history_rec).

by Bojhan

Sounds awesome, looking forward to this project. I hope you can find a mentor to work with on this.

anyone interested in becoming a mentor?

by danithaca

Please contact me at danithaca at gmail dot com. Thanks!

by dave reid

Daniel, I'm possibly interested in mentoring your project. I'm still waiting for all the applications to come in.

Senior Drupal Developer for Lullabot | www.davereid.net | @davereid

by danithaca

Dave: Great! Thanks for considering this. Let me know if you have other ideas for this one.

timelines

by danithaca
  • May 24 - 30 - set up the development environment: generate test cases and scenarios, create module stubs on drupal.org, and study D7 development so that the code is easier to upgrade to D7.
  • May 31 - July 4 (5 weeks) - one week on each of tasks 1-5
  • July 5 - July 11 - contingency week and testing
  • July 12 - submit midterm
  • July 12 - August 1 (3 weeks) - one week on each of tasks 6-8
  • August 2 - August 15 (2 weeks) - contingency weeks and testing; if time permits, work on optional task 9 and a D7 release.
  • August 16 - "pencils down" date
  • August 17 - August 20 - polish up and submit the final report

by rwohleb

Interesting idea. I'd be interested in being a possible co-mentor on this. I've done some work recently against the Google Analytics API in Drupal, so I can give some direction on that.

sounds good!

by danithaca

Thanks rwohleb, let's see if it gets accepted :)

Google Summer of Code 2010
