Hello everyone,
I'm not entirely sure this is the place to post this, but I'm starting here. Feel free to redirect me if necessary.
I'm tasked with using Drupal to build a site that has a "memory" and can help suggest new nodes for you based on your previous node-viewing habits.
Think of it as 2 things:
1) An amazon.com-esque function that says "Hey, I noticed you have 11 page-views on the Drupal Semantic Search Node, have you considered checking out Apache Solr?"
2) An email notification system that tracks your previous searches and emails you notifications (opt-in, of course) when new nodes are published that match your previous search criteria.
So, my questions:
A) Does anyone know if the Apache Solr or other search modules discussed here are headed in either of those directions already? I certainly don't want to re-invent the wheel.
B) Has anyone thought about taking this approach through search, or is it entirely a cron-populates-block type issue where you look through old logs and come up with new suggestions for end-users and then shove that content into a block and email?
Interested in your thoughts.
Thanks!

Comments
You touched on 2 different
You touched on 2 different subjects:
1) Figuring out a recommendation algorithm based on user behavior (and? or?) content "likeness".
2) Notifying users of new content, based on (1)
You can see that the notification part (2) is "easy" once you have (1) =) I could suggest using the OpenSearch module (the DEV version) along with ApacheSolr, which will provide you with RSS feeds for any search result. It might not be exactly what you need, but it works today.
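To make the notification idea concrete, here is a minimal Python sketch of matching newly published nodes against saved searches. It is an illustration only, not how the OpenSearch or ApacheSolr modules work: the names `tokenize` and `match_new_nodes`, the node dictionaries, and the "every search term must appear" rule are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real search engine does far more."""
    return set(text.lower().split())


def match_new_nodes(saved_searches, new_nodes):
    """Return {user: [node titles]} for new nodes that match a saved search.

    saved_searches: list of (user, query) pairs (hypothetical storage format).
    new_nodes: list of dicts with "title" and "body" keys (hypothetical).
    A node matches when it contains every term of the query.
    """
    notifications = defaultdict(list)
    for node in new_nodes:
        words = tokenize(node["title"] + " " + node["body"])
        for user, query in saved_searches:
            terms = tokenize(query)
            if not terms:
                continue  # skip empty queries, which would match everything
            if terms <= words and node["title"] not in notifications[user]:
                notifications[user].append(node["title"])
    return dict(notifications)


saved = [("alice", "apache solr"), ("bob", "views")]
nodes = [
    {"title": "Intro to Apache Solr", "body": "Faceted search with Solr"},
    {"title": "Theming tips", "body": "CSS tricks"},
]
print(match_new_nodes(saved, nodes))
```

A cron job could run something like this over the nodes published since the last run and hand the result to whatever sends the opt-in emails.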
There are a bunch of projects dealing with (1):
* ApacheSolr already suggests "More Like This" items using Solr. It is based only on node content, not on user behavior.
* Browsing History Recommender: stores a log of node views and builds recommendations from that. It's built on top of Recommender API.
* Others: check out this overview of Drupal content recommendation projects.
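As a rough illustration of the behavior-based approach (building recommendations from a log of node views), here is a small Python sketch using simple co-view counting: nodes are suggested when other users viewed them alongside nodes you have already seen. This is not the algorithm used by Browsing History Recommender or Recommender API; the function names and the `(user, node)` log format are assumptions for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_views(view_log):
    """Count how often two nodes were viewed by the same user.

    view_log: list of (user, node) pairs (hypothetical log format).
    Returns {node: Counter({other_node: number_of_shared_viewers})}.
    """
    nodes_by_user = defaultdict(set)
    for user, node in view_log:
        nodes_by_user[user].add(node)

    co_views = defaultdict(Counter)
    for nodes in nodes_by_user.values():
        for a, b in combinations(sorted(nodes), 2):
            co_views[a][b] += 1
            co_views[b][a] += 1
    return co_views


def recommend(user, view_log, co_views, limit=3):
    """Suggest unseen nodes, ranked by co-view counts with the user's seen nodes."""
    seen = {node for u, node in view_log if u == user}
    scores = Counter()
    for node in seen:
        for other, count in co_views.get(node, {}).items():
            if other not in seen:
                scores[other] += count
    return [node for node, _ in scores.most_common(limit)]


log = [
    ("u1", "semantic-search"), ("u1", "apache-solr"),
    ("u2", "semantic-search"), ("u2", "apache-solr"), ("u2", "views"),
    ("u3", "semantic-search"),
]
co_views = build_co_views(log)
print(recommend("u3", log, co_views))  # ['apache-solr', 'views']
```

Counting pairs like this grows quickly with the size of the log, so a sketch of this kind is only practical as a periodic batch job on modest data.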
I haven't had much experience with recommendation algorithms in general, but I can tell you that getting it "right" is tough work.
Doing it is a cycle: deploy an algorithm, try to find out what your users think of the recommendations (useful? timely? on-topic?), make adjustments, repeat. It's not just a matter of checking click-through rates, but of observing real users closely.
Great information
Thanks for the great information! This is a fantastic start.
I will tell you as the
I will tell you, as the author of the Content Recommendation Engine, that it is very, very hard to do with a pure PHP solution. It really needs to be a daemon running in RAM on a separate server.
This stuff is hard to scale otherwise.
I would look heavily to http://lucene.apache.org/mahout/
Solution for #2?
The Favorites module may be a solution for #2 that doesn't require #1, if users are willing to save searches and the searches are generic enough to find related content.
http://drupal.org/project/favorites