[It looks like I'm a bit late to the party :) But if you're interested in this functionality, please review and rate this project so it'll be ready to roll next spring. Thanks!]
All organizations, large and small, have a vital need to deliver relevant and timely information to their members. My project will make it easy for organizations to meet this need. I will improve two Drupal modules, write a new module, and write documentation so that organizations can easily add sophisticated news aggregation and recommendation tools to their Drupal websites.
The two modules I will improve are Memetracker and Content Recommendation Engine (CRE). I will also write a new module, Profile API, for creating user profiles that feed into content recommendation.
I wrote Memetracker as part of the 2008 Google Summer of Code. The Memetracker module uses machine learning algorithms to intelligently filter and group all types of content. Its purpose is to find the most interesting conversations and memes on relevant topics as they emerge and to display them to a community in real time.
My goal for the Memetracker module is for it to emulate the functionality of successful commercial memetrackers such as Techmeme, Google News, Tailrank, and Megite. I want it to be a robust open-source implementation of memetracking technology that can be easily plugged into Drupal-based community sites.
The Content Recommendation Engine module is designed to provide personalized content recommendation. It learns what types of content individuals are interested in and recommends new content as it comes in.
Both modules are built on powerful ideas but need quite a bit of work before they are usable in real-world situations.
Some specific things I'd like to change:
Make recommendation more flexible. Right now CRE only considers data from VotingAPI, which is unnecessarily limiting. CRE's goal is to learn what kind of content an individual likes and then suggest additional content that person is likely to find interesting. CRE should be able to learn from signals such as which authors or feeds a person likes, which topics they're interested in, and which articles interest other people similar to them. CRE will also be integrated with the Profile API to further refine its recommendations. A simple example: if Bob always clicks on articles about Mac computers, then new articles about Mac computers will automatically be added to his feed.
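To make the multi-signal idea concrete, here is a minimal content-based sketch in Python (used only for illustration; CRE itself is a PHP Drupal module). It assumes each article can be reduced to a set of feature strings such as `topic:mac`, `author:jane`, or `feed:macworld`; those labels, the function names, and the threshold are hypothetical, not CRE's actual data model. It covers only the author, feed, and topic signals; the "similar people" signal would need collaborative filtering on top of this.

```python
from collections import Counter

# Hypothetical sketch: each article is reduced to a set of feature strings
# such as "topic:mac", "author:jane", or "feed:macworld". This is an
# illustrative data model, not CRE's real one.


def build_profile(clicked_articles):
    """Turn the feature sets of articles a user clicked into normalized weights."""
    counts = Counter()
    for features in clicked_articles:
        counts.update(features)
    total = sum(counts.values())
    if total == 0:
        return {}  # no clicks yet, so no interests learned
    # Weights sum to 1: a feature's weight is its share of all clicked features.
    return {feature: n / total for feature, n in counts.items()}


def score(profile, features):
    """Score an article by summing the user's weights for its features."""
    return sum(profile.get(f, 0.0) for f in features)


def recommend(profile, new_articles, threshold):
    """Return IDs of incoming articles scoring at least `threshold`, best first."""
    scored = sorted(
        ((score(profile, feats), article_id) for article_id, feats in new_articles.items()),
        reverse=True,
    )
    return [article_id for s, article_id in scored if s >= threshold]


# Bob keeps clicking on Mac articles, so a new Mac article clears the threshold.
bob_clicks = [
    {"topic:mac", "feed:macworld"},
    {"topic:mac", "author:jane"},
    {"topic:mac", "feed:ars"},
]
bob = build_profile(bob_clicks)
incoming = {"a1": {"topic:mac", "feed:tuaw"}, "a2": {"topic:linux"}}
print(recommend(bob, incoming, threshold=0.3))  # ['a1']
```

Because the profile is just weighted features, new signal types (a Profile API answer, a VotingAPI vote) can be added as extra feature strings without changing the scoring code.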
Make CRE much more robust under heavy load; currently it doesn't scale well.
Integrate with Views: I think both CRE and Memetracker are natural fits for Views integration. It would make these modules easier for new users to learn and would immediately give CRE and Memetracker a great deal of flexibility in creating custom outputs. This past summer, while building the admin interface, I was stymied as I contemplated the hundreds of potential memetracker types my UI would have to support. In Views, these custom memetrackers would be easy to create. A memetracker that includes two node types written by three authors, plus two different feeds, would be simple to set up in Views but difficult to support with a custom admin UI.
Integrate with Views
Turn memes into nodes
Create an "archive view" so you can view memebrowsing pages from the past, for example to see what the hot news was on December 15, 2007.
Add a classifier algorithm to Memetracker that will automatically place incoming articles into separate memetrackers. For example, suppose you have an agriculture site and want memetrackers on vegetables, fruit, fertilization, and the farming business environment. Any news source you aggregate for these four memetrackers would contain articles belonging to more than one of them. I would add an interface for training the classifier on which articles belong in which of the four memetrackers. (See this issue: http://drupal.org/node/292561)
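One common technique for this kind of routing is a multinomial naive Bayes text classifier. The sketch below assumes that technique; it is not necessarily the algorithm Memetracker will end up using. The admin training interface is reduced to a `train(text, tracker)` call, tokenization is deliberately naive, and each incoming article goes to the single best-scoring memetracker. `NaiveBayesRouter` and its methods are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase words made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesRouter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # training articles per memetracker
        self.word_counts = defaultdict(Counter)  # word frequencies per memetracker
        self.vocab = set()

    def train(self, text, tracker):
        """Record that an admin assigned this article to `tracker`."""
        words = tokenize(text)
        self.doc_counts[tracker] += 1
        self.word_counts[tracker].update(words)
        self.vocab.update(words)

    def classify(self, text):
        """Return the memetracker with the highest posterior log-probability."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best, best_score = None, float("-inf")
        for tracker, n_docs in self.doc_counts.items():
            counts = self.word_counts[tracker]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log-likelihood of each word.
            log_p = math.log(n_docs / total_docs)
            log_p += sum(math.log((counts[w] + 1) / denom) for w in words)
            if log_p > best_score:
                best, best_score = tracker, log_p
        return best


router = NaiveBayesRouter()
router.train("tomato and lettuce harvest tips", "vegetables")
router.train("apple orchard and peach harvest", "fruit")
router.train("nitrogen fertilizer application rates", "fertilization")
router.train("crop prices and farm loans", "farming business")
print(router.classify("new fertilizer rates for nitrogen"))  # fertilization
```

Working in log space avoids floating-point underflow when articles are long; a production version would also want far more training examples than this toy set.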
Import and display images for memes (See this issue: http://drupal.org/node/283752).
Simplify the Memetracker install process. Currently it requires installing several Python libraries; ideally I would rewrite the Python code in C or PHP and ship it in the Memetracker tarball.
Detect interlinking between content
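A rough sketch of interlink detection, assuming each content item's URL and HTML body are available. The `find_interlinks` helper and its input format are hypothetical; it uses only Python's standard `html.parser` and `urllib.parse` modules to collect `<a href>` targets and records an edge whenever one tracked item links to another. A real implementation would also need URL normalization (trailing slashes, query strings, redirects), which this sketch ignores.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the item's own URL.
                self.links.add(urljoin(self.base_url, href))


def find_interlinks(items):
    """items maps URL -> HTML body; return (source, target) pairs between tracked items."""
    edges = set()
    for url, html in items.items():
        collector = LinkCollector(url)
        collector.feed(html)
        for target in collector.links:
            if target in items and target != url:
                edges.add((url, target))
    return edges


items = {
    "http://a.example/post1": '<p>See <a href="http://b.example/post2">this</a></p>',
    "http://b.example/post2": '<p>Original post, <a href="/about">about us</a></p>',
}
print(find_interlinks(items))  # {('http://a.example/post1', 'http://b.example/post2')}
```

The resulting edge set is a small link graph; items that many others link to are natural candidates for the lead story of a meme.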
Add hooks so other modules can easily define their own rules for filtering and sorting memes. For example, a company might want its intranet to always display nodes of type announcement at the top of the page for a certain time period, or a branded news site might want news from its own companies ranked higher than it otherwise would be.
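Drupal hooks are PHP functions discovered by naming convention; the Python sketch below only imitates that pattern with an explicit callback registry so both example rules can be shown end to end. The registry, `register_meme_sort_hook`, the meme fields, and the boost values (a 7-day pin worth +1000, a +5 company boost) are illustrative assumptions. It covers only the sorting half; filter hooks could follow the same shape with callbacks that return a boolean.

```python
from datetime import datetime, timedelta

# Hypothetical registry standing in for Drupal's hook discovery.
_sort_hooks = []


def register_meme_sort_hook(fn):
    """Register fn(meme, now) -> score adjustment; returns fn so it works as a decorator."""
    _sort_hooks.append(fn)
    return fn


def rank_memes(memes, now):
    """Sort memes by base score plus every registered hook's adjustment, highest first."""
    def final_score(meme):
        return meme["score"] + sum(hook(meme, now) for hook in _sort_hooks)
    return sorted(memes, key=final_score, reverse=True)


@register_meme_sort_hook
def pin_announcements(meme, now):
    """Intranet rule: keep announcements on top for 7 days after they are created."""
    if meme["type"] == "announcement" and now - meme["created"] < timedelta(days=7):
        return 1000.0  # assumed to outrank any organic score
    return 0.0


@register_meme_sort_hook
def boost_own_company(meme, now):
    """Branded-site rule: rank the site's own company news a little higher."""
    return 5.0 if meme["source"] == "our-company" else 0.0


now = datetime(2008, 12, 15)
memes = [
    {"id": 1, "type": "story", "source": "wire", "score": 40.0, "created": now},
    {"id": 2, "type": "announcement", "source": "hr", "score": 1.0,
     "created": now - timedelta(days=2)},
    {"id": 3, "type": "story", "source": "our-company", "score": 37.0, "created": now},
]
print([m["id"] for m in rank_memes(memes, now)])  # [2, 3, 1]
```

Keeping the rules as independent additive adjustments means each module can contribute its own rule without knowing about the others.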
Fill out test coverage.
The Profile API would be a small module that lets site admins create “questionnaires” for users. Users' answers would be fed into the Content Recommendation Engine as the starting point for their content-interest profiles.
A music site might ask a question like:
Choose one of the following:
- Wu-Tang Clan
- Rolling Stones
- King Crimson
- Céline Dion
Using answers to questions like this, the system would find users with similar profiles. This means, for example, that the site could start sending personalized content to new users right after registration.
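One simple way to find users with similar profiles is to treat each user's answers as a set of (question, answer) pairs and rank other users by Jaccard similarity. This is an assumed technique, not a settled design, and the question keys, user names, and `most_similar_users` helper below are hypothetical. Content that the nearest users liked could then seed a new user's recommendations.

```python
def similarity(answers_a, answers_b):
    """Jaccard similarity between two users' sets of (question, answer) pairs."""
    pairs_a = set(answers_a.items())
    pairs_b = set(answers_b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def most_similar_users(new_answers, existing, k=2):
    """Return the k existing users whose questionnaire answers best match new_answers."""
    ranked = sorted(
        existing.items(),
        key=lambda item: similarity(new_answers, item[1]),
        reverse=True,
    )
    return [user for user, _ in ranked[:k]]


# Hypothetical questionnaire answers on a music site.
existing = {
    "alice": {"favorite_band": "King Crimson", "decade": "1970s"},
    "bob": {"favorite_band": "Wu-Tang Clan", "decade": "1990s"},
    "carol": {"favorite_band": "King Crimson", "decade": "1990s"},
}
new_user = {"favorite_band": "King Crimson", "decade": "1970s"}
print(most_similar_users(new_user, existing))  # ['alice', 'carol']
```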
Communities have many tools available for aggregating and distributing information. What's missing are open-source tools that filter content using not just human intelligence but also artificial intelligence.
Far too much information is generated daily for any person or organization to sort through manually. These automated tools can be thought of as pre-processors that improve the signal-to-noise ratio, reducing the stress people endure trying to follow the news. When the noise is filtered out, important news is much more likely to be identified and acted upon.
My project will most directly meet the third goal of KDI, "To encourage people to improve their communities by supporting the free exchange of information and ideas." I believe these tools will become basic building blocks for a rich flowering of content aggregation and filtering web applications.
I estimate the whole project will take 3 months.
1 day to port CRE to Drupal 6.
2 weeks to rewrite architecture of CRE so it's not solely dependent on data from VotingAPI.
1 week to investigate and fix bottlenecks in CRE.
1 week to write and test the Profile API.
1 week to investigate and test new algorithms for Memetracker.
2 weeks to integrate CRE and Memetracker with Views.
1 week to remove the Memetracker Python dependency.
1 week to write code to detect interlinking between content.
2 weeks to complete other tasks on the Memetracker module.
1 week to write documentation.
I (Kyle Mathews) will be the main programmer on the project. Depending on my availability when (if?) this project is approved, I may ask other Drupal community members to take on parts of it.
The three modules will be hosted on Drupal.org.
- $40,000 for 1 programmer at $100 / hour
- $40,000 total request
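For reference, the schedule and budget above work out as follows. The last line assumes the paid hours are spread evenly across the schedule, which the plan itself does not state.

```latex
\begin{aligned}
\text{schedule} &= 1\ \text{day} + (2+1+1+1+2+1+1+2+1)\ \text{weeks} = 12\ \text{weeks} + 1\ \text{day} \approx 3\ \text{months} \\
\text{paid hours} &= \frac{\$40{,}000}{\$100/\text{hour}} = 400\ \text{hours} \\
\text{average load} &\approx \frac{400\ \text{hours}}{12\ \text{weeks}} \approx 33\ \text{hours/week}
\end{aligned}
```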