Memetracker Wiki Page
Project information
Project pages on drupal.org: Memetracker and MachineLearningAPI
Current status: Programming a rough prototype
Description
I am writing two modules for Drupal as part of Google Summer of Code: Memetracker and MachineLearningAPI. The Memetracker module will use algorithms in the MachineLearningAPI to intelligently filter and group content from designated content sources, both internal and external. The module's purpose is to find and display to a community, in real time, the most interesting conversations and memes within the community as they emerge.
My project will emulate the functionality of successful commercial memetrackers such as Techmeme, Google News, Tailrank, and Megite. It will be an open-source implementation of memetracking technology that can be easily plugged into Drupal-based community sites.
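To make the "filter and group" idea concrete, here is a minimal sketch in Python of grouping short text items by word overlap. It is only an illustration under stated assumptions: the function names (`tokenize`, `cosine`, `group_items`), the 0.3 threshold, and the greedy single-pass approach are all hypothetical and are not taken from Memetracker or MachineLearningAPI, which may use quite different algorithms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def group_items(items, threshold=0.3):
    """Greedy single-pass grouping (an assumed, simplified technique).

    Each item joins the first cluster whose seed item is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, member_items)
    for item in items:
        vec = Counter(tokenize(item))
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(item)
                break
        else:
            clusters.append((vec, [item]))
    # Largest clusters first, as a crude stand-in for "most discussed".
    return sorted((members for _, members in clusters), key=len, reverse=True)
```

Calling `group_items` on a list of headlines returns a list of groups, with the groups that collected the most items first. A real memetracker would also need to weigh links, recency, and source authority, which this sketch ignores.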
Project schedule
- 1 week (May 25-30) -- Write post on g.d.o on plans for architecture, hooks, use cases to support etc. Spend the rest of the week responding to others' ideas and investigating machine learning algorithms.
- 2 weeks (June 2-13) -- Start stubbing out code. Get something working as soon as possible that I can release.
- 1-2 weeks (late June) -- Set up public testbed for memetracker. I talked more about this in my proposal. I'll start a number of memetrackers to test algorithms / performance under real-world conditions.
- 2 weeks (early July) -- Experiment with a huge range of different algorithm setups and other configurations to discover the optimal setups for different conditions.
- Until end of summer -- improve admin UI, test coverage, bug fixing, documentation, etc.
Status updates
- June 3rd, 2008:
  - What did you get done this week? I planned the architecture for my module and discussed it with my mentors here.
  - What are you planning to do over the next week? I will start building the memetracker using two simple machine learning algorithms to test the wisdom of different architectural decisions.
  - Is there anything you're blocked on? Nothing at the moment.
- June 10th, 2008:
  - What did you get done this week? I coded up the architecture discussed in the memetracker group.
  - What are you planning to do over the next week? Identify a clustering library and integrate it into my module. Make my initial release of code. Set up a live test site.
  - Is there anything you're blocked on? Finding the right clustering algorithm. There are a bunch of open-source libraries and I'm not sure which will be best. I guess I'll just start experimenting.
- June 17th, 2008:
  - What did you get done this week? Last week I got most of the basic architecture laid out in code, and everything was working except the cluster analysis part, which is tricky because it requires integrating with a non-PHP third-party library. Progress has slowed a bit: I talked to my brother last week and he preached test-driven development (TDD) to me, i.e. writing tests before code, so I've stopped writing new code to study up on SimpleTest and start writing a test suite for Memetracker.
  - What are you planning to do over the next week? Write unit tests. Implement cluster code. Clean up code and documentation sufficiently to make a first release.
  - Is there anything you're blocked on? Not really.



Interesting case study with meme tracker undertones...
I see that you found this already Kyle, but for anyone else who missed it:
Eureka! Science News is a fully automated science news website that figures out what stories are "hot" based on some kinda fancy algorithm.
Drupal 5 Support
Kyle,
This is a great module. Do you plan to provide a D5 version? Is there any possibility?
At the moment no
I don't want to be burdened with the overhead of supporting two versions of the module. If there's still demand when Memetracker arrives at a more usable state, I might reconsider, but for now, no.
Kyle Mathews