Making Drupal Smart: The Recommender Bundle.

danithaca's picture

Overview: The Recommender Bundle provides a set of modules that generate recommendations and personalized views in a wide range of areas. Examples include "Customers who bought this also bought" for Ubercart, Facebook-like new-friend suggestions for social networking sites, YouTube-like related videos for media sites, and the classic case of personalized node recommendations based on users' access history.

More details: The applications described above are the key value-add feature of eCommerce sites such as Amazon, social websites such as del.icio.us or Facebook, and media providers such as Last.fm, YouTube or iTunes Genius. Unfortunately, Drupal doesn't support this popular feature very well, perhaps because the algorithms involve heavy math and quite sophisticated matrix calculations. In fact, Netflix is offering a one-million-dollar prize just for improving recommendation accuracy by 10%.

To handle the math, I wrote the Recommender API. It doesn't have algorithms as fancy as the ones used by the leading Netflix Prize team (perhaps I'll add those later), but it does have some commonly used algorithms, such as the classical "correlation coefficient", which can produce good results. However, the Recommender API only handles the math; someone has to write additional modules to use it in Drupal. I wrote two such modules as examples: User-to-user Recommendation and OG Similar Groups. Writing additional modules on top of the Recommender API is not too hard, but someone has to write them for each area, including eCommerce (Ubercart), media sites, and social networking sites.
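To make the "correlation coefficient" idea concrete, here is a minimal Python sketch of user-to-user similarity using the Pearson correlation over the items two users have both rated. It is only an illustration under assumptions: the function names, the rating dictionaries and the sample data are hypothetical and are not taken from the Recommender API.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to say anything
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return cov / sqrt(var_x * var_y)

def most_similar_users(ratings, user, top_n=3):
    """Rank the other users by their correlation with `user`."""
    scores = [
        (pearson(ratings[user], other_ratings), other)
        for other, other_ratings in ratings.items()
        if other != user
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(other, score) for score, other in scores[:top_n]]

# Hypothetical data: user -> {node id: rating}
ratings = {
    "alice": {"n1": 5, "n2": 3, "n3": 1},
    "bob": {"n1": 4, "n2": 3, "n3": 2},
    "carol": {"n1": 1, "n2": 3, "n3": 5},
}
similar = most_similar_users(ratings, "alice")
```

In this toy data "bob" rates in the same direction as "alice" and "carol" in the opposite direction, so "bob" comes out as the closest user. Recommendations would then be drawn from items the closest users liked.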

There are already some Drupal modules that generate content recommendations, for example, the "More Like This" module, the "Similar By Terms" module, the "ApacheSolr More Like This" module, the "relevant content" module, and so on. But all of those modules require text content or taxonomies. They can't be used to recommend media/video content without ample text descriptions, and they can't generate recommendations based on abstract things such as "product ordering history" on an eCommerce site. What I propose here can do all of those things based on users' history, which bypasses the text requirement. The closest thing to what I propose is the GSoC 2006 project, the Content Recommendation Engine (CRE). But CRE requires VotingAPI, and it uses the SlopeOne algorithm, which is fast but doesn't usually give good results and is subject to malicious manipulation.

In short, I propose to write a package of recommender-based modules to be used in a wide range of areas. The modules can generate useful recommendations for users based on a variety of algorithms. Hopefully, this can improve Drupal's competitiveness in media sites, eCommerce sites, and social networking sites. It could also lay out the framework for future development of similar modules.

About me: I'm a PhD student majoring in Information Science. My research focus is designing recommender algorithms for online communities. Since I love Drupal and recommender systems are my thing, I would probably write the modules described here even without SoC, if I have the time. But participating in SoC would make sure that all this happens by the end of the summer. I'm looking forward to making this contribution to the Drupal community.

Difficulty: Medium.

Any comments? Thanks!

Comments

Awesome!!!

Alex UA's picture

This is a really great proposal. I haven't used your modules yet, but I'll definitely give them a try.

One question I have is how this could work with Context, which we now use for a lot of projects and which seems very closely related in its aim (though not identical).

Anyway, this gets a big +1 from me. Let's see what comments we get and we'll put it on the official idea list.

Alex Urevick-Ackelsberg
ZivTech: Illuminating Technology

danithaca's picture

Thanks Alex! I'll study the code of Context and see how it can be integrated with the proposed Recommender Bundle. One possible direction is to use the Recommender Bundle to generate the list of similar/recommended items and feed it to Context to trigger Context blocks. I'll put it on my list.

Sounds great

Owen Barton's picture

This is an interesting problem, and a useful tool.

I think the challenges here are (a) making the user interface easy to understand, and allowing admins to easily weight and combine inputs/scores in a manner that is simple, reliable and doesn't depend on understanding the numerical distribution of the data, and (b) making it easy for modules to provide a variety of relevancy scores and algorithms without each of them needing to manage its own running statistics to keep the scores within some normalized range defined by the API (e.g. 0-100).

I have been doing quite a bit of thinking about this and previously did some work on a (never completed) module called the relevancy API, which you can check out at http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/relevancy/. It doesn't provide any full relevancy algorithms, but what I did do was make it really easy for multiple sources (which could be simple, such as counting the number of common taxonomy terms, or smart, such as text analysis) to each contribute a relevancy index between the 2 objects under consideration. The more advanced algorithms could potentially be based on inputs from several simple sources of relevancy. It works on multiple axes, such as nodes alone ("quality"), node-node relevancy and user-node relevancy. It then (optionally) does a statistical normalization of each of the algorithm scores so that they all initially have equal(ish) weight and a standard distribution. The admin would then (although I didn't get to this part) be able to weight the normalized scores differently, weight/sum the different axes in different situations, and use views to filter/sort.
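The normalization step described above could look roughly like the following Python sketch: each source's raw scores are rescaled to mean 0 and standard deviation 1 (a z-score), after which an admin-chosen weight per source is applied and the results are summed. This is an assumption about one reasonable way to do it, not the relevancy API's actual code; the source names, weights and node keys are hypothetical.

```python
from statistics import mean, pstdev

def normalize(scores):
    """Rescale one source's raw scores to mean 0 and standard deviation 1."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: the source carries no ranking information.
        return {key: 0.0 for key in scores}
    return {key: (value - mu) / sigma for key, value in scores.items()}

def combine(sources, weights):
    """Sum the normalized scores of every source, scaled by its weight."""
    combined = {}
    for name, scores in sources.items():
        weight = weights.get(name, 1.0)
        for key, z in normalize(scores).items():
            combined[key] = combined.get(key, 0.0) + weight * z
    return combined

# Hypothetical raw scores of three candidate nodes against one reference node.
sources = {
    "common_terms": {"node/1": 1, "node/2": 2, "node/3": 3},
    "text_similarity": {"node/1": 0.9, "node/2": 0.5, "node/3": 0.1},
}
combined = combine(sources, {"common_terms": 1.0, "text_similarity": 2.0})
best = max(combined, key=combined.get)
```

Because both sources end up on the same scale, the weights express relative importance directly, without the admin needing to know that one source counts terms and the other returns a fraction.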

Your Recommender API seems further along, especially in the area of advanced algorithms; however, it might be interesting to look at combining it with some of the normalization/standardization layer from the relevancy API.

Either way, I am interested in this and I would be happy to help out and mentor (if the project goes ahead) as needed.

danithaca's picture

Thanks grugnog! Those are good suggestions! Looking forward to working with you regardless of whether this goes to SoC. I'll look at the relevancy API and discuss it with you later :)

Moving to Official Ideas list

Alex UA's picture

I created a new term for "Official Ideas" as a tab on the group: http://groups.drupal.org/node/18103/official-ideas

I'm not sure if anyone objects to me creating the list this way, but it seems a bit better to have it here rather than over on the documentation page. I'll move the ideas to the documentation page as we move forward, but my plan is to use groups.drupal.org for all planning (which I consider this to be).

I'll post this to the group as a separate discussion, but consider this an "official idea"!

Alex Urevick-Ackelsberg
ZivTech: Illuminating Technology

AI for Drupal

kvantomme's picture

I think this could be the start of an AI framework for Drupal, so I think it's a really good project. If you haven't already done so, you should probably build a service plugin architecture for your framework, so that for projects where performance becomes an issue you could plug in services that run separately from Drupal (e.g. not in PHP).

Is there any place for a more general Bayesian logic in your framework (e.g. a Bayesian classifier API not necessarily for recommending content would be awesome)?

--

I blog and Tweet

--

Check out more of my writing on our blog and my Twitter account.

danithaca's picture

yes. classification is a good idea. that's the next add-on to Recommender API together with community finding algorithms based on social network analysis. my concern is what application area can we use it for. any suggestions?

some use cases for classification

kvantomme's picture

Off the top of my head:
-suggesting organic groups
-suggesting tags
-sorting algorithm so that you first see stuff that is likely to interest you
...

but make it an API so that other people can decide how to use it. You'll have to do some research though about performance, I haven't seen classification with Bayesian filters in PHP for more than say 10-20 classes. I imagine there are some performance issues down there.

yep!

danithaca's picture

suggesting organic groups is already there: OG Similar Groups at http://drupal.org/project/similargroups.

Recommender API is just the API that does the math. I've done some algorithms, and more algorithms such as SVD are on the way. This SoC proposal, Recommender Bundle, is to add some modules based on Recommender API, partly to demonstrate to other people how to use the APIs.

Performance is indeed a concern. Earlier I was thinking of building a Java Bridge module to outsource the math computation to Java, but it might be too complex for the API users. AI algorithms are complex by nature; many of them are at least O(n^3), so even in Java or native C they are still going to be slow.... I'm thinking of just using brute computing force and giving it enough time to run offline.

Some approaches might be employed to improve performance. First, I'm trying to outsource all the complex computation to the database (e.g., calculate standard deviation). Second, developers might use Drush to run the PHP script offline.
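As an illustration of pushing the math into the database, the sketch below computes each user's mean and standard deviation with plain SQL aggregates, using the identity variance = AVG(x*x) - AVG(x)^2 so that it does not rely on a built-in standard deviation function. It uses Python's sqlite3 module with an in-memory database purely for demonstration; the `rating` table and its columns are hypothetical and not the Recommender API's schema.

```python
import sqlite3
from math import sqrt

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rating (uid INTEGER, nid INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO rating VALUES (?, ?, ?)",
    [
        (1, 10, 2.0), (1, 11, 4.0), (1, 12, 6.0),
        (2, 10, 5.0), (2, 11, 5.0),
    ],
)

# The database does the per-user aggregation in a single pass;
# only one small row per user comes back to the application.
rows = conn.execute(
    """
    SELECT uid,
           AVG(score) AS mean,
           AVG(score * score) - AVG(score) * AVG(score) AS variance
    FROM rating
    GROUP BY uid
    ORDER BY uid
    """
).fetchall()

# max() guards against tiny negative values caused by rounding.
stats = {uid: (mean, sqrt(max(variance, 0.0))) for uid, mean, variance in rows}
conn.close()
```

Where the database in use provides a built-in standard deviation aggregate, that could replace the formula. The point of the sketch is only that the application never has to load the raw rows.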

Any other ideas for improving performance?

happy to mentor you

kvantomme's picture

As you can probably tell, I've spent quite a bit of time thinking about this. Before I saw your Recommender Bundle pop up, I was prepping one of our trainees to do a GSoC project on an AI (artificial intelligence) API, and I even have a blog post on it. But then we saw your module show up in the new modules RSS feed, so we decided to go for another project ;)

Bottom line is that I have some crazy ideas about how you could use a Bayesian classifier to do guided tagging, how you could expose that data with JQ, how you should build an AI API that is pluggable so that you can interface with other AI algorithms that live outside of Drupal, etc.

So maybe you could further extend your recommender module into an AI API? I'm happy to mentor you on this topic as a Drupal project coach (I don't write code, but I do know how to do stuff in Drupal).

that sounds good!

danithaca's picture

I was also thinking about other AI algorithms in addition to recommender algorithms. E.g., PageRank is definitely an algorithm I want to implement, and it does not quite fit the name "recommender". In fact, I'm more interested in the math and the API, so that other developers can write insane modules based on them. So I'd be very glad to work with you :)
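For readers unfamiliar with PageRank, here is a minimal Python sketch of the usual power-iteration formulation with a damping factor. It is a generic textbook version under assumptions (the function name, the default damping of 0.85, the fixed iteration count and the toy link graph are all illustrative), not code from the Recommender API.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps a node to the nodes it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph, e.g. nodes linking to other nodes.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

In this toy graph "c" receives links from three nodes and ends up with the highest rank, while "d", which nothing links to, ends up with the lowest.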

I've got a concern about using the name AI API, because some algorithms, such as the collaborative filtering algorithms, don't fit exactly in the category of AI. What about Intelligent API, or Smart API, or even Math API?

the name game

kvantomme's picture

If it's a pluggable framework I don't see any reason why you couldn't call it AI, even if some of your algorithms are more A than I

Dave Reid's picture

Daniel, really great work so far and a great proposal for SoC 2009. We really do need to get some kind of de facto, reliable recommendation API that all the other modules can use instead of the jumbled mess of modules we have right now. Sounds like your Recommender API is a good start.

Senior Drupal Developer for Lullabot | www.davereid.net | @davereid

SNA was another related GSoC project

narres's picture

See http://drupal.org/project/sna (Social Network Analysis Tool)

YaxBalamAhaw's picture

This is a really cool idea. I think I'm going to apply to work on some modules for this :) I've found a lot of bands I like from using last.fm. I can see how this kind of thing would be useful in a lot of different applications.

What kind of modules were you thinking about? I took a quick look at Ubercart's documentation and saw that that module offers a lot of hooks. So that probably wouldn't be too complicated. Does Drupal keep track of what pages a user has visited in the database? If not, how would someone implement that? A general purpose module (like the user-user module) that would offer recommended nodes within certain content types or taxonomy terms would be nice.

cool

danithaca's picture

keep in touch!

Please add to the SoC site!

Alex UA's picture

@danithaca - please post this to the official Summer of Code site! Someone already added this as their proposal, but before I mark it as ineligible I'd like to see this one on the SoC site...

Thanks!

--
Alex Urevick-Ackelsberg
ZivTech: Illuminating Technology

thanks!

danithaca's picture

@Alex UA - thanks a lot for this friendly reminder. i have posted the idea to the SoC site already at http://socghop.appspot.com/student_proposal/show/google/gsoc2009/danitha...

for some reason i didn't receive the update from the SoC site, so i wasn't able to respond in time. i'm updating the comments in the SoC site now. thanks :)

Mockup screenshot

danithaca's picture

Per request by grugnog and dmitri01, I'm attaching some mockup screenshots for the proposed modules.

The first screenshot is for an Ubercart site, which is my 1st proposed module in the Recommender Package.

The second screenshot is for a multimedia site, which is my 2nd proposed module in the Recommender Package. "Users who viewed this also viewed" displays related videos based on the fact that other users viewed some similar videos. "Recommender for you" displays personalized video recommendations based on the user's viewing history.

The third screenshot is for a News/Stories/Blog site, which is my 3rd proposed module in the Recommender Package. Note that this is different from the ApacheSolr MoreLikeThis module because it's not based on similar text/keyword descriptions. Rather, it's based on the community's viewing history.
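The two blocks described for the multimedia site can be sketched with simple co-view counting, shown here in Python. This is only one possible approach, assumed for illustration: the function names and the sample viewing history are hypothetical, and the proposed modules may well use a different algorithm from the Recommender API.

```python
from collections import Counter, defaultdict

def co_view_counts(history):
    """For each item, count how many users viewed it together with each other item."""
    counts = defaultdict(Counter)
    for viewed in history.values():
        items = set(viewed)
        for item in items:
            for other in items:
                if other != item:
                    counts[item][other] += 1
    return counts

def also_viewed(counts, item, top_n=3):
    """'Users who viewed this also viewed': most frequently co-viewed items."""
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

def recommended_for(counts, history, user, top_n=3):
    """Personalized list: unseen items co-viewed most often with what the user has seen."""
    seen = set(history[user])
    scores = Counter()
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

# Hypothetical viewing history: user -> videos viewed
history = {
    "u1": ["v1", "v2", "v3"],
    "u2": ["v1", "v2"],
    "u3": ["v2", "v3", "v4"],
    "u4": ["v1"],
}
counts = co_view_counts(history)
related = also_viewed(counts, "v1")
personal = recommended_for(counts, history, "u2")
```

Note that nothing here looks at titles, tags or descriptions; only the viewing history is used, which is the difference from the text-based modules mentioned above.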

co-mentor?

kvantomme's picture

Would you still need a co-mentor for this project (from the coaching perspective)?

yes.

danithaca's picture

I listed you as my co-mentor in the GSoC application at http://socghop.appspot.com/student_proposal/show/google/gsoc2009/danitha.... Hope it's fine. Thanks!

Owen Barton's picture

Hey Kristof - if you are still interested you should sign up to mentor this project via the SoC site by going to http://socghop.appspot.com/student_proposal/review/google/gsoc2009/danit... and clicking the "I am willing to mentor" button :)

there is already a mentor for this

kvantomme's picture

AFAIK the SoC app only allows 1 mentor or did that change now (I've seen a discussion about it)...

danithaca's picture

I have created the project Wiki page at http://groups.drupal.org/node/22427. I'll update progress milestones on that page.

If you have other related ideas on intelligent algorithms, please join the Intelligence group at http://groups.drupal.org/intelligence (newly created; waiting to be approved)

Thanks!

@ grugnog2 and kvantomme

danithaca's picture

Hi Owen and Kristof,

Thanks for agreeing to be my mentors for my GSoC'09 project. I have created the Wiki page and a group, as mentioned in the earlier comment, and I'm starting to code now. Is there anything else we want to talk about? My email is danithaca@gmail.com. Please shoot me an email sometime so that I know how to contact you if needed :) Also, I'll be on IRC #drupal-dev occasionally. Thanks!

--Daniel

Status

mgenovese's picture

Hi there Daniel,

Just curious how things are going with the Recommender API? Is there a place where you're blogging about your progress?

I'm specifically interested in the Similar by Terms work you'll be doing (integrating with the Recommender API, I assume). Just wanted to track how it's coming along.

Thanks!

Matt
Austin, TX

My Recommender Module

david.mann's picture

I just finished writing my recommendation module for Ubercart. It gives you the following blocks:

  • easyrec: other users also bought
  • easyrec: other users also viewed
  • easyrec: recommendations for user
  • easyrec: most bought items
  • easyrec: most viewed items

It uses easyrec for getting recommendations.

http://drupal.org/project/easyrec_for_ubercart