Last updated by kyle_mathews on Wed, 2008-07-30 18:01
Project information
Project pages on drupal.org: Memetracker and MachineLearningAPI
Current status: Writing tests, adding CRUD functionality, writing code to detect interlinking between feed items
Description
I am writing two modules for Drupal as part of Google Summer of Code: Memetracker and MachineLearningAPI. The Memetracker module will use algorithms in MachineLearningAPI to intelligently filter and group content from designated content sources, both internal and external. Its purpose is to find the most interesting conversations and memes within a community as they emerge and display them to that community in real time.
My project will emulate the functionality of successful commercial memetrackers such as Techmeme, Google News, Tailrank, and Megite. It will be an open-source implementation of memetracking technology that can be easily plugged into Drupal-based community sites.
Project schedule
- 1 week (May 25-30) -- Write a post on g.d.o about plans for the architecture, hooks, use cases to support, etc. Spend the rest of the week responding to others' ideas and investigating machine learning algorithms.
- 2 weeks (June 2-13) -- Start stubbing out code. Get something releasable working as soon as possible.
- 1-2 weeks (late June) -- Set up a public testbed for Memetracker. I discussed this in more detail in my proposal. I'll start a number of memetrackers to test algorithms and performance under real-world conditions.
- 2 weeks (early July) -- Experiment with a wide range of algorithm setups and other configurations to discover the optimal setups for different conditions.
- Until the end of summer -- Improve the admin UI, test coverage, bug fixing, documentation, etc.
Status updates
- June 3rd, 2008:
- What did you get done this week?
I planned the architecture for my module and discussed it with my mentors in the Memetracker group.
- What are you planning to do over the next week?
I will start building Memetracker using two simple machine learning algorithms to test the wisdom of different architectural decisions.
- Is there anything you're blocked on?
Nothing at the moment.
- What did you get done this week?
I coded up the architecture discussed in the Memetracker group.
- What are you planning to do over the next week?
Identify a clustering library and integrate it into my module, make the initial code release, and set up a live test site.
- Is there anything you're blocked on?
Finding the right clustering algorithm. There are a number of open-source libraries and I'm not sure which will work best, so I'll just start experimenting.
- What did you get done this week?
Last week I laid out most of the basic architecture in code, and everything worked except the cluster analysis. That part is tricky because it requires integrating a non-PHP third-party library. Progress has slowed a bit. I talked to my brother last week and he preached test-driven development (TDD) to me, i.e. writing tests before code, so I've stopped writing new code to study up on SimpleTest and start writing a test suite for Memetracker.
- What are you planning to do over the next week?
Write unit tests, implement the clustering code, and clean up the code and documentation enough to make a first release.
- Is there anything you're blocked on?
Not really.
- What did you get done this week?
Since my last update I've released three alphas, set up a demo site at memes.educon20.org, fixed loads of bugs, and added a number of optimizations.
- What are you planning to do over the next week?
Write unit tests. Add a page listing feed sources and a page showing clicks in the past 72 hours, CRUD functions for feed sources, and code to detect links.
- Is there anything you're blocked on?
This feature request is very important to Memetracker, which needs some way to fetch content from the web pages that feed items link to. I'd appreciate help here: http://drupal.org/node/283607
- What did you get done this week?
I've released another alpha with additions to the admin UI that allow easy creation of new memetrackers.
- What are you planning to do over the next week?
Add code to detect interlinking between feed items and simplify the meme-browsing UI.
- Is there anything you're blocked on?
Nothing really.
Comments
Interesting case study with meme tracker undertones...
I see you've already found this, Kyle, but for anyone else who missed it:
Eureka! Science News is a fully automated science news website that figures out which stories are "hot" based on some kind of fancy algorithm.
Drupal 5 Support
Kyle,
This is a great module. Do you plan to provide a D5 version? Is there any possibility?
At the moment, no
I don't want to be burdened with the overhead of supporting two versions of the module. If there's still demand once Memetracker reaches a more usable state, I might reconsider, but for now, no.
Kyle Mathews