About me: I'm a PhD student at the University of Michigan School of Information, with a research focus on designing recommender systems. I'm also a Drupal fan, and have been using Drupal and developing for Drupal for 3 years. Last year I participated in GSoC 2009 and developed the recommender bundle modules. For more information about me, please visit http://michiza.com
Overview: The proposed module will make content recommendations using Google Analytics tracking data, and strive for a balance between "exploitation" (recommending popular content) and "exploration" (recommending new and trendy content) using the "multi-armed bandit" (MAB) algorithm.
Description:
The module will display a "Top Stories" block, as shown in Figure 1. It is similar to the "Today" block on my.yahoo.com that shows the top stories of today, as shown in Figure 2.


The 5 links in the "Top Stories" block are a mixture of popular items (those that receive the highest click rates in Google Analytics) and new items (which may or may not receive high click rates). If we only "exploit" by showing the known popular items, we might miss new items that could turn out to be more popular. On the other hand, if we only "explore" by showing new items, we run the risk of showing too many not-so-good items. The key challenge is to select the 5 "best" items from a large and changing pool of items, learning from Google Analytics data in real time. The "multi-armed bandit" (MAB) algorithm is a well-studied machine learning approach to balancing exploitation and exploration, with the goal of maximizing the block's overall click rate in the long run. In fact, the "Today" block on my.yahoo.com uses MAB as well.
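To make the exploitation/exploration trade-off concrete, here is a minimal sketch of one common bandit rule, a UCB1-style index. It is only an illustration and not necessarily either of the MAB variants proposed for the module. The function names and the `stats` mapping are hypothetical, and the click and impression counts are assumed to have been collected already.

```python
import math


def ucb1_score(clicks, impressions, total_impressions):
    """UCB1-style index: observed click rate plus an exploration bonus."""
    if impressions == 0:
        # Items that were never shown get top priority (pure exploration).
        return float("inf")
    mean = clicks / impressions
    # The bonus shrinks as an item accumulates impressions.
    bonus = math.sqrt(2.0 * math.log(total_impressions) / impressions)
    return mean + bonus


def pick_top_stories(stats, k=5):
    """stats maps item id -> (clicks, impressions); returns k item ids."""
    total = sum(impressions for _, impressions in stats.values())
    ranked = sorted(
        stats,
        key=lambda item: ucb1_score(stats[item][0], stats[item][1], total),
        reverse=True,
    )
    return ranked[:k]
```

With this rule, an item with few impressions can outrank a well-known popular item until enough data shows that its click rate is actually lower.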
In addition to showing "Top Stories", this module can easily be used to show "Top Products" for e-commerce sites, "Top Videos", "Top Images" for media sites, etc.
To implement the module, I plan to do 4 things:
1. Implement the MAB algorithm in the Recommender API module. I have already implemented 2 versions of the MAB algorithm in Python for my research; they just need to be rewritten in PHP and integrated into Drupal. The two versions are Auer, 2003 and Brezzi & Lai, 2002.
2. Retrieve pageview and click stats from Google Analytics using the GA Data Export API. There is a GA API module that does the job, but extra work is needed to integrate that module with the MAB algorithm.
3. Make this MAB module work with Taxonomy. That is, for each taxonomy category, it can show a "Top Stories" block only for that category.
4. Integrate with the "Views" module. That is, the "Top Stories" block should also be able to show pictures, cck_fields, etc., for customized display. An issue has already been created for this here.
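As a rough illustration of how steps 2 and 3 could fit together, the sketch below groups tracking rows by taxonomy category so that each category gets its own pool of candidate items. The row layout (`category`, `item`, `clicks`, `impressions`) is a hypothetical, simplified format and not the actual output of the GA Data Export API or the GA API module.

```python
from collections import defaultdict


def stats_by_category(rows):
    """Aggregate rows into {category: {item_id: (clicks, impressions)}}."""
    grouped = defaultdict(dict)
    for row in rows:
        # Start from (0, 0) the first time an item appears in a category.
        clicks, impressions = grouped[row["category"]].get(row["item"], (0, 0))
        grouped[row["category"]][row["item"]] = (
            clicks + row["clicks"],
            impressions + row["impressions"],
        )
    return dict(grouped)
```

Each per-category mapping could then be handed to whichever bandit rule is chosen, producing a separate "Top Stories" block per category.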
Mentors:
local mentor: Michael Hess (mlhess@drupal.org)
Contact Details:
email: danithaca@gmail.com
Skype: danithaca
IRC: danithaca (usually at #drupal-infrastructure)
Difficulty: Medium
The MAB algorithm has already been implemented in Python. It just needs to be ported to PHP and integrated into Drupal.
Your comments or suggestions are much appreciated. Thanks!

Comments
Very interesting project!
Although I agree that this sounds very interesting, I do wonder how different it is from the projects you did last year. It attacks a new topic, but the challenges seem very similar. With that in mind, wouldn't it be interesting to either extend the scope of the project or shift its focus to a data visualization or workflow issue?
For example, the Drupal issue queues suffer from a lack of context, and when creating an issue you usually spend quite some time checking whether the same issue already exists. Wouldn't it be awesome if the queue suggested similar issues? That would fix a workflow issue.
Data visualization could come from many angles, but in terms of machine learning, it should learn to create smarter connections between data. For example, from all the module data that we have, it could learn which modules people are using and in what combinations, and make proper suggestions based on that.
thanks for the suggestion
@Bojhan: thanks for the suggestion. I'll think about the "workflow" and "visualization" improvements, and perhaps submit a separate proposal.
In terms of the difference from the other recommender modules I developed last year, I guess there are 2 major differences. First, this module will take input from Google Analytics tracking data, which could also be used by the recommender modules I developed last year to improve the recommendation results. Second, this module tries to balance recommendations on "popular items" and "new items" for all users. In comparison, the modules I developed last year only recommend "relevant items" based on registered users' previous history.
extra work
To make the work more challenging for me in the summer, I'll take some extra steps:
The outcome of the proposal would be a new "top_nodes" module and various improvements to the [[http://drupal.org/project/recommender|Recommender API]] module and the [[http://drupal.org/project/history_rec|Browsing History Recommender]] module.
Sounds awesome, looking forward to this project. Hope you can find a mentor to work with on this.
anyone interested in becoming a mentor?
please contact me at danithaca at gmail dot com. thanks!
Daniel, I'm possibly interested in mentoring your project. I'm still waiting for all the applications to come in.
Senior Drupal Developer for Lullabot | www.davereid.net | @davereid
Dave: Great! Thanks for considering this. Let me know if you have other ideas for this one.
timelines
Interesting idea. I'd be interested in being a possible co-mentor on this. I've done some work recently against the Google Analytics API in Drupal, so I can give some direction on that.
sounds good!
Thanks rwohleb, let's see if it gets accepted :)