Memetracker Wiki Page

Project information

Project pages on drupal.org: Memetracker and MachineLearningAPI

Current status: Writing tests, adding CRUD functionality, writing code to detect interlinking

Description

I am writing two modules for Drupal as part of Google Summer of Code: one called Memetracker and the other called MachineLearningAPI. The Memetracker module will use algorithms in the MachineLearningAPI to intelligently filter and group content from designated content sources, both internal and external. The module's purpose is to find the most interesting conversations and memes within a community as they emerge and display them to that community in real time.

My project will emulate the functionality of successful commercial memetrackers such as Techmeme, Google News, Tailrank, and Megite. It will be an open-source implementation of memetracking technology that can be easily plugged into Drupal-based community sites.

Read my full proposal

Project schedule

  • 1 week (May 25-30) -- Write a post on g.d.o about plans for architecture, hooks, use cases to support, etc. Spend the rest of the week responding to others' ideas and investigating machine learning algorithms.
  • 2 weeks (June 2-13) -- Start stubbing out code. Get something working as soon as possible that I can release.
  • 1-2 weeks (late June) -- Set up a public testbed for Memetracker. I talk more about this in my proposal. I'll start a number of memetrackers to test algorithms and performance under real-world conditions.
  • 2 weeks (early July) -- Experiment with a wide range of algorithm setups and other configurations to discover the optimal setup for different conditions.
  • Until end of summer -- improve admin UI, test coverage, bug fixing, documentation, etc.

Status updates

  • June 3rd, 2008:
    1. What did you get done this week?
    2. I planned the architecture for my module and discussed it with my mentors here.

    3. What are you planning to do over the next week?
    4. I will start building the memetracker using two simple machine learning algorithms to test the wisdom of different architectural decisions.

    5. Is there anything you're blocked on?
    6. Nothing at the moment.

  • June 10th, 2008:
    1. What did you get done this week?
    2. I coded up the architecture discussed in the memetracker group.

    3. What are you planning to do over the next week?
    4. Identify a clustering library and implement it within my module. Make my initial release of code. Set up a live test-site.

    5. Is there anything you're blocked on?
    6. Finding the right clustering algorithm. There's a bunch of open-source libraries and I'm not sure which will be best. I guess I'll just start experimenting.

  • June 17th, 2008:
    1. What did you get done this week?
    2. Last week I got most of the basic architecture laid out in code, and everything was working except the cluster analysis part, which is tricky because it requires integrating with a non-PHP third-party library. Progress has slowed a bit: I talked to my brother last week and he preached test-driven development (TDD) to me, i.e. write tests before code. So I've stopped writing new code to study up on SimpleTest and start writing a test suite for Memetracker.

    3. What are you planning to do over the next week?
    4. Write unit tests. Implement cluster code. Clean up code and documentation sufficient to make a first release.

    5. Is there anything you're blocked on?
    6. Not really.

  • July 18th, 2008:
    1. What did you get done this week?
    2. Since my last update I've released three alphas, set up a demo site at memes.educon20.org, fixed loads of bugs, and added a number of optimizations.

    3. What are you planning to do over the next week?
    4. Write unit tests. Add a page listing feed sources and a page showing clicks in the past 72 hours, CRUD functions for feed sources, and code to detect links.

    5. Is there anything you're blocked on?
    6. This feature request is very important to Memetracker. Memetracker needs a way to get content somehow from web pages that feed items link to. I'd appreciate help here: http://drupal.org/node/283607

  • July 30th, 2008:
    1. What did you get done this week?
    2. I've released another alpha with additions to the admin UI that allow easy creation of new memetrackers.

    3. What are you planning to do over the next week?
    4. Add code to detect interlinking between feed items and simplify the meme-browsing UI.

    5. Is there anything you're blocked on?
    6. Nothing really.
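
The interlinking detection mentioned in the July 30th update could look roughly like the sketch below. It is written in Python for brevity rather than in the module's PHP, and it is not Memetracker's actual code: the `(url, html_body)` item shape and the names `normalize` and `find_interlinks` are hypothetical, chosen only for illustration. The idea is simply to extract each item's `href` targets and keep those that point at another tracked item.

```python
import re
from collections import defaultdict

# Hypothetical feed-item shape: (item_url, html_body). This is an assumption
# for illustration, not Memetracker's schema.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def normalize(url):
    """Build a comparison key: drop the #fragment and any trailing slash."""
    return url.split('#', 1)[0].rstrip('/')


def find_interlinks(items):
    """Return {source_url: set(target_urls)} for links between tracked items."""
    known = {normalize(url) for url, _ in items}
    links = defaultdict(set)
    for url, body in items:
        source = normalize(url)
        for href in HREF_RE.findall(body):
            target = normalize(href)
            # Keep only links that point at another tracked item.
            if target in known and target != source:
                links[source].add(target)
    return dict(links)


if __name__ == "__main__":
    items = [
        ("http://a.example/post-1",
         '<p>See <a href="http://b.example/post-2/">this</a></p>'),
        ("http://b.example/post-2", "<p>No links here.</p>"),
        ("http://c.example/post-3",
         '<a href="http://a.example/post-1#comments">reply</a>'),
    ]
    for source, targets in sorted(find_interlinks(items).items()):
        print(source, "->", sorted(targets))
```

A regular expression is a crude way to find links; a real implementation would more likely use an HTML parser and handle relative URLs and redirects, which this sketch ignores.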

Comments

webchick

I see that you found this already, Kyle, but for anyone else who missed it:

Eureka! Science News is a fully automated science news website that figures out what stories are "hot" based on some kinda fancy algorithm.

Drupal 5 Support

Doktor.Science

Kyle,

This is a great module. Do you plan to provide a D5 version? Is there any possibility?

At the moment no

kyle_mathews

I don't want to be burdened with the overhead of supporting two versions of the module. If there's still demand when Memetracker arrives at a more usable state, I might reconsider, but for now, no.

Kyle Mathews