Greetings Drupalistas,
I'm working on what feels like the hundredth (but is really the 4th or 5th) project that includes some variety of Facebook-like "Activity Stream" for a Drupal-based community. Having tackled this problem in a number of different ways in the past couple years -- none of which I've really ever loved -- I was tempted to launch a new module project to solve this thing once and for all.
My primary concerns are modularity -- such that anything can potentially be an Activity -- and scalability to work with 100s of 1000s of actions and users.
After some initial review, I found an existing module:
drupal.org/project/activitystream
That led me to post to the devel list, where I discovered many more modules:
http://drupal.org/project/activity
http://drupal.org/project/activity_log
http://drupal.org/project/heartbeat
I'm in the process of reviewing all these and will post the results soon. Unfortunately, but understandably, they all appear to be divergent efforts.
I think it's a laudable goal to develop a good overall module to support this functionality, though this is much easier said than done. Really, it takes a group of people committed to developing for the community in addition to their own projects. I know from experience that this is very hard, but am trying to do better in 2009 (Drupal karma refresh!).
And now a question: should a unit of "Activity" be a node?
PRO:
Nodes are functional. Facebook already lets you comment on every little thing that goes on. This is fun! It would be good for this module to do that too.
It also makes future integration with notification/messaging updates possible, as well as every other wonderful thing nodes can do.
CON:
This will mean huge amounts of nodes, bloating the table. It also means more overhead when logging activity (node_save vs a single optimized db_query). I'm also skeptical that the core node table structure has the right stuff to be queried with maximum efficiency (e.g. nothing to group similar queries by unless I make a ton of node types, etc).
There's also the question of unwanted node functions. You don't really want anyone to edit activities. You also don't want them to start showing up in search queries, etc.
I could see a possible solution in maintaining an optimized index table for queries, as well as nodes for functionality, individual page views, etc. The bloat problem could conceivably be solved by giving activity nodes (and index entries) a maximum TTL, à la watchdog and other big tables.
I've already got a table/query design down for indexing that seems to scale very well to 200k activity entries grouped over 20 types and 5000 users. The core tables from many of the existing modules are also similarly constructed. I suppose the next step is to do some testing around what the overall effects are of having short-lived nodes, and whether or not the other edge cases can be solved.
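A minimal sketch of the "optimized index table plus TTL" idea, written in Python with the standard sqlite3 module so that it runs standalone instead of going through Drupal's database layer. The table name activity_index, its columns and indexes, and the helpers log_activity, recent_activity and purge_expired are all hypothetical; none of them come from the modules listed above or from the design mentioned in the post.

```python
import sqlite3
import time

# Hypothetical index table: one narrow row per activity, no node overhead.
ACTIVITY_SCHEMA = """
CREATE TABLE activity_index (
    aid       INTEGER PRIMARY KEY AUTOINCREMENT,
    uid       INTEGER NOT NULL,   -- acting user
    type      TEXT    NOT NULL,   -- activity type, e.g. 'blog_post'
    target_id INTEGER,            -- id of the object acted on, if any
    created   INTEGER NOT NULL    -- Unix timestamp
);
CREATE INDEX activity_uid_created  ON activity_index (uid, created);
CREATE INDEX activity_type_created ON activity_index (type, created);
"""

def open_activity_db():
    """Create an in-memory database holding the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(ACTIVITY_SCHEMA)
    return db

def log_activity(db, uid, type_, target_id=None, created=None):
    """Record one activity with a single INSERT."""
    created = int(time.time()) if created is None else created
    db.execute(
        "INSERT INTO activity_index (uid, type, target_id, created) "
        "VALUES (?, ?, ?, ?)",
        (uid, type_, target_id, created),
    )

def recent_activity(db, uids, limit=20):
    """Newest activities by any of the given users (e.g. a friend list)."""
    if not uids:
        return []
    marks = ",".join("?" * len(uids))
    return db.execute(
        "SELECT uid, type, target_id, created FROM activity_index "
        f"WHERE uid IN ({marks}) ORDER BY created DESC, aid DESC LIMIT ?",
        (*uids, limit),
    ).fetchall()

def purge_expired(db, max_age_seconds, now=None):
    """Delete rows older than the TTL and return how many were removed."""
    now = int(time.time()) if now is None else now
    cur = db.execute(
        "DELETE FROM activity_index WHERE created < ?",
        (now - max_age_seconds,),
    )
    return cur.rowcount
```

In a Drupal module the purge step would presumably run from cron, the way watchdog rows are trimmed; that wiring is left out here.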
I'm wondering if anyone has done any of their own thinking along these lines and has any comments to add.
Happy New Year!

Comments
I'm in the same "to be" or "not to be" dilemma
And now a question: should a unit of "Activity" be a node?
Definitely not! What's the use of having thousands of nodes of that type in your database? That's the big no-no that's driving me crazy too, because, like you, I want an Activity Stream with certain special features.
Right now, I'm playing a bit with Activity Stream, after trying the other two... but I don't think it's quite working as expected...
Happy New Year!
Rosamunda
Nodes, no, but what about micronodes?
I just brought the topic up on IRC and was sort of thinking out loud / bouncing ideas. What I started thinking is that there's a need for a standardized "micronode" format. Activity is just one module with the nodes/not nodes debate. Guestbook is another. Privatemsg went through it as well, though its private nature makes it less of a good fit. This is my general idea, with only about 10 minutes of thought in it:
What else do we need to do with these micronodes? Any more thoughts?
Michelle
See my Drupal articles and tutorials or come check out the Coulee Region
Brilliant
I've been searching my mind for something similar to micronodes for a long time and I just never found the words to express it. facebook_status could benefit from the same thing. As far as the database, I'm thinking index, timestamp, text, uid, and a way to relate the micronode to something else, probably usually actual nodes. For example in the case of Guestbook there would need to be the UID of the author of the guestbook message and the UID of the user whose guestbook was posted in. For Privatemsg you'd need the "to" and "from" UIDs. For Activity you'd connect to nodes, comments, etc. But that brings up the question of how do we tell what type of content we're referencing? Should there be some sort of hook_micronode that returns the type? But what if--as in the case of Activity--there are multiple types? So maybe we also need a 'reference type' column.
Hmm. Definitely agree with having each module keep its own table. It doesn't seem to me that adding votingapi support would be very difficult, although I can't say I've looked into it. And someone could always write an alternative search/edit/revision/menu module.
Another question: what about theming? What functions do we need? I assume there won't be a page/url for each micronode.
.
Hmm... I'm wondering if we need more than one table? It's been a long time since I've done any database design and I'm very rusty so perhaps this is overkill but...
Table guestbook_micronode
mid - micronode id
uid - author
timestamp
content
Table guestbook_mn_relation
mid - micronode id
rid - relation object id
type - relation type
So the first table would hold the micronode itself and then the second table would hold a relation to an existing object, usually node or user.
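The two tables above can be sketched as follows, using Python's standard sqlite3 module so the example runs on its own. The table and column names follow the comment; the SQL types, the composite primary key on the relation table, and the helpers add_micronode and micronodes_for are assumptions added for illustration.

```python
import sqlite3

# Column names follow the comment above; types and keys are assumptions.
MICRONODE_SCHEMA = """
CREATE TABLE guestbook_micronode (
    mid       INTEGER PRIMARY KEY,  -- micronode id
    uid       INTEGER NOT NULL,     -- author
    timestamp INTEGER NOT NULL,
    content   TEXT    NOT NULL
);
CREATE TABLE guestbook_mn_relation (
    mid  INTEGER NOT NULL REFERENCES guestbook_micronode (mid),
    rid  INTEGER NOT NULL,          -- relation object id
    type TEXT    NOT NULL,          -- relation type, e.g. 'user' or 'node'
    PRIMARY KEY (mid, rid, type)
);
"""

def open_micronode_db():
    """Create an in-memory database holding both tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(MICRONODE_SCHEMA)
    return db

def add_micronode(db, uid, timestamp, content, relations):
    """Insert a micronode and its (rid, type) relations; return the new mid."""
    cur = db.execute(
        "INSERT INTO guestbook_micronode (uid, timestamp, content) "
        "VALUES (?, ?, ?)",
        (uid, timestamp, content),
    )
    mid = cur.lastrowid
    db.executemany(
        "INSERT INTO guestbook_mn_relation (mid, rid, type) VALUES (?, ?, ?)",
        [(mid, rid, type_) for rid, type_ in relations],
    )
    return mid

def micronodes_for(db, rid, type_):
    """All micronodes related to one object, newest first."""
    return db.execute(
        "SELECT m.mid, m.uid, m.content FROM guestbook_micronode m "
        "JOIN guestbook_mn_relation r ON r.mid = m.mid "
        "WHERE r.rid = ? AND r.type = ? ORDER BY m.timestamp DESC",
        (rid, type_),
    ).fetchall()
```

Because relations live in their own table, one micronode can point at several objects at once, for example both a user and a node.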
Just to clarify, I was pointing out the lack of that as a good thing. That would be all overkill for the micronodes.
About theming, normal theme functions would apply. You could theme a micronode just like any other bit of content.
Michelle
.
The only benefit I see with multiple tables is that you don't have to have a one-to-one relationship between the MID and RID; so you could have a micronode related to both a node and a user, for example. I can't come up with a use case though, so maybe that's unnecessary, in which case I'd stick with one table.
About theming, I meant: should any default theme functions be provided? For example, theme('micronode', 'guestbook', $user, $account, $message_id) could return
<div class="micronode guestbook"><div><?php print theme('username', $account); ?></div><div>$text</div><div>$time</div></div>
which would be pretty standard for something involving a user, text, and time.
I've used activity_log which
I've used activity_log, which is extremely simple: it uses Rules to call the module and pass it the required data using Token.
All the module really does is bunch up multiple logs into a single "activity", so that a block shows "Philippe posted new content Node 1, Node 2, Node 3" rather than 3 separate lines.
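The grouping described above can be illustrated with a short Python sketch. This is not activity_log's actual code, which is not shown in this thread; the function group_logs and the (user, action, title) tuple layout are hypothetical.

```python
def group_logs(logs):
    """Collapse log entries that share (user, action) into one line each.

    `logs` is an iterable of (user, action, title) tuples in display order.
    Groups keep the position of their first entry.
    """
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    for user, action, title in logs:
        grouped.setdefault((user, action), []).append(title)
    return [
        "{} {} {}".format(user, action, ", ".join(titles))
        for (user, action), titles in grouped.items()
    ]
```

Grouping on (user, action) is only one possible choice; a real block might also limit each group to a time window so that old and new posts are not merged.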
I know there was a long discussion about this earlier; perhaps we could pick up from there? See http://groups.drupal.org/node/15088
Although I tend to look for generalizations to solutions in my work, I don't think micronodes would fit here, because every module has slightly different requirements and you're going to end up with a different data schema anyway.
This is a very interesting
This is a very interesting discussion. My quick thought is that we should keep the end goal in mind, that is, what we want "Activity" (used generically, not for any specific module) to do. Some recursive or retro thinking:
We need to take a hard, close look at what popular sites like Facebook and Orkut are actually doing:
users of our sites also visit those sites, and there is a general "user expectation" of how things should work
and a close look at what other scripts like Elgg, BuddyPress etc. are doing [ list these, if possible, graphically ]
Which of the different activity modules comes closest to what is already there in the standard sites or scripts? [ list these, if possible, graphically ]
What an end-user or member of my site expects from activity:
The above is very basic; that apart, users may also
- expect to see overall site activity ( except those hidden by privacy settings ) in a page : this page can be a front page panel or block OR the page they are redirected to after login OR a tab in profile page that shows My activity and Site activity
- expect to see icons or images, for example
--- [photo of user A] user A posted a blog titled "kJDHJKhdjksHDJKShdjksHDJShdjkhs ..."
--- [photo of user B] user B posted an image titled "hjadafg" [thumbnail of the image]
--- [photo of user C] changed her profile image to [new photo of user C]
--- [photo of user X] user X and [photo of user Y] user Y became buddies 2 sec ago
[ add to this list if any thing necessary ]
Now the site admin or the super admin should be able, via the admin interface, to
--- determine whether activity is part of the profile, a separate page, or both
--- determine the size of the photo or thumbnail shown in activity
--- set how many words of a title to show before '...' appears, as long titles break a block or page design
--- decide whether RSS feeds are allowed, as there are resource issues with too many RSS feeds
--- set after how many days old activities are auto-deleted/truncated
[ add to this list if any thing necessary ]
Users certainly expect NOT to see activity of persons they blocked/ignored and users certainly expect to hide or delete what they want from the activity list.
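A few of the requirements above (title truncation, an age limit, and hiding blocked users' or deleted entries) can be sketched in a handful of lines of Python. Everything here is hypothetical: the StreamSettings fields, their default values, and the dictionary keys used for an activity are illustrations, not settings from any existing module.

```python
from dataclasses import dataclass

@dataclass
class StreamSettings:
    """Admin-configurable options (names and defaults are assumptions)."""
    title_words: int = 8     # words of a title shown before '...'
    max_age_days: int = 30   # older activities are not shown

def truncate_title(title, max_words):
    """Cut a title after max_words words and mark the cut with '...'."""
    words = title.split()
    if len(words) <= max_words:
        return title
    return " ".join(words[:max_words]) + " ..."

def visible_stream(activities, blocked_uids, hidden_ids, settings, now):
    """Filter and format activities for one viewer.

    Each activity is a dict with keys id, uid, user, verb, title and
    created (Unix seconds). blocked_uids are users the viewer blocked;
    hidden_ids are entries the viewer chose to hide or delete.
    """
    cutoff = now - settings.max_age_days * 86400
    lines = []
    for act in activities:
        if act["uid"] in blocked_uids:
            continue
        if act["id"] in hidden_ids:
            continue
        if act["created"] < cutoff:
            continue
        title = truncate_title(act["title"], settings.title_words)
        lines.append('{} {} "{}"'.format(act["user"], act["verb"], title))
    return lines
```

On a large site the same filters would more likely be applied in the database query than in application code; the loop only makes the rules explicit.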
I think the above are very basic or mostly basic needs for "activity"
The other thing Activity shows, or is going to show, on standard social sites is widget or app activity. This is becoming necessary for a site to be called social networking, and some scripts like phpFox are going to package it, but since Drupal has no system of user apps or gadgets, this obviously cannot come into Activity directly.
I'm definitely on the nodes
I'm definitely on the nodes side. Having comments, and comment threading, would be a huge deal for me. I think in the end everything should be nodes. Otherwise you're relegating some of your content to some unsearchable backwater of unimportance, which might make sense for some people, but I still think you're losing a valuable opportunity for community interaction.
While there are definitely performance benefits to using 'micronodes', I would just say, use nodes and throw more hardware at it. Skimping on database should not come at the expense of functionality imo. (Comments, permissions, taxonomy, searchability, flaggability, votability, statistics, and so on) Being exposed to this wonderful ecosystem of drupal modules, but not taking advantage of it, is a poor choice.
The programmer's view
...is what I miss here.
Let me explain what the problem is from my (programmer's) view.
The node system is pretty, useful and good, but it's only ONE way of handling data, and it's only really usable for some kinds of data: Drupal's native data, and it only makes sense for non-dynamic data.
Please also keep in mind that Drupal's node system is not something you get "for free" in terms of performance and load.
If you really created everything as a node, Drupal would become unusable for bigger sites.
A mid-range social site with, let's say, 25k to 50k users would easily generate some 100k nodes.
We, as a game site, would easily run to over 1000k nodes (I reset our last site at 150k users) because we have A LOT of single data entries (players, items, characters, clans...).
From a database design standpoint, the "all node" idea is a nightmare too, because one thing you try to do as a DB designer is to "design" your database with intelligent data sets. Having more or less only one big table is a real nightmare.
Splitting the "node" type across different tables is a nightmare too, because the unique identifier (the node id) would then have to be generated by a much more complex system.
Drupal is extremely limited compared to other CMSs when it comes to handling external data, meaning non-node data.
That's because Drupal is "hard coded" when it comes to nodes: hard coded to the node id from the node table.
Every comment and other "node related" function and module uses that DB-generated auto-increment value as a unique identifier. It's stored as an integer in the comment DB entry too. It's not even serialized; keep in mind that Drupal serializes most other data before storing it. In the comment table, the node id is technically a "tag"; in the node table it's a MySQL auto-increment value. From a programmer's view those are different things, and this is not a good way of programming, sorry to say.
It's easy to use, but limited. In CMSs like Zikula/Joomla you can, for example, hook comments, ratings and so on into EVERY possible piece of content with 2 lines of code, because you feed the comment entry a generic tag, which can be the site URL but also some module-specific identifier from your content-creating module.
The future of the web is connecting data from different sources. Drupal is in a bad position there at the moment, because it can only fully handle content it stores itself (as a node, of course).
I have just integrated the Wikka wiki into Drupal as a session bridge module. You can see a public snapshot of our work site here:
http://prag112.server4you.de/daipedia/StartPage
The problem is that there is no good way to attach, for example, Drupal comments or ratings to those pages, because the pages come from the Wikka DB. And before the usual answer "why not convert it to nodes/Drupal..." comes: that's not the solution, and it's pretty ill-conceived. Later I want to show the individual player characters (we are an MMORPG) for every player. That data is not even stored in a MySQL database; there is no need. If you think it should be, then think of fully generic data, for example graphical pages of the chaos theory formula. In other CMSs you can create them virtually from the formula parameters. In Drupal you first have to create a node using CCK, then store it just as a placeholder to attach the comments to... It's really an ill-conceived idea when you know how easy it is with other systems.
Again: 2 lines of code and 5 minutes in Zikula or Joomla, because they use what Drupal doesn't have: a generic, "serialized" tag instead of a "hard coded" database table index from another database table.
So, don't think I'm blaming Drupal here; there is nothing wrong with the way Drupal handles its own content.
It is just wrong to think of external content as nodes and to try to use the node approach on every type of content.
What Drupal needs is simply a "virtual node" system, where a module declares content as a node and then we insert it into the Drupal hook queue (or whatever it's called).
And, of course, an extension to comments and other hook features that adds a "tag" entry to those tables. If it is set (or the node is not valid), then we use it as the identifier of what the comment is "hooked" to. That's, by the way, the sense of a "hook".
You can also think of the current way Drupal handles comments as "early" binding, because they are loaded fairly early, when the node is loaded.
With a serialized tag, we first create the content (like the Wikka page), then we hook in the comments, with a "hook" indeed. That's late binding, and it's how Joomla, Zikula and most other CMSs do it. Of course, they don't have anything like Drupal's early binding; there they are limited.
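The "tag" extension proposed above can be sketched as a comments table keyed by either a node id or a free-form tag. The sketch uses Python's standard sqlite3 module; it is not Drupal's comment schema, nor how Joomla or Zikula store comments. The table layout, the helpers add_comment and comments_for, and the tag format 'wikka:StartPage' are all hypothetical.

```python
import sqlite3

# Hypothetical schema: a comment hangs off a node id OR a generic tag.
COMMENT_SCHEMA = """
CREATE TABLE comments (
    cid  INTEGER PRIMARY KEY,
    nid  INTEGER,            -- node id; NULL for non-node content
    tag  TEXT,               -- generic identifier, e.g. a URL or module key
    uid  INTEGER NOT NULL,
    body TEXT    NOT NULL,
    CHECK (nid IS NOT NULL OR tag IS NOT NULL)
);
CREATE INDEX comments_nid ON comments (nid);
CREATE INDEX comments_tag ON comments (tag);
"""

def open_comment_db():
    """Create an in-memory database holding the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(COMMENT_SCHEMA)
    return db

def add_comment(db, uid, body, nid=None, tag=None):
    """Attach a comment to a node (nid) or to arbitrary content (tag)."""
    cur = db.execute(
        "INSERT INTO comments (nid, tag, uid, body) VALUES (?, ?, ?, ?)",
        (nid, tag, uid, body),
    )
    return cur.lastrowid

def comments_for(db, nid=None, tag=None):
    """Late binding: if a tag is given it wins, otherwise use the node id."""
    if tag is not None:
        rows = db.execute(
            "SELECT body FROM comments WHERE tag = ? ORDER BY cid", (tag,)
        )
    else:
        rows = db.execute(
            "SELECT body FROM comments WHERE nid = ? ORDER BY cid", (nid,)
        )
    return [row[0] for row in rows]
```

A template rendering an external page would then call something like comments_for(db, tag="wikka:StartPage") without any placeholder node existing; the price is that the database can no longer enforce that the tag points at real content.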
The way forward for Drupal should be to keep its strengths (early binding with the system hooks in the bootstrap) but also to extend to late binding (usually done in the template; in fact Drupal already does something similar with the template.php file in themes).
Drupal is not the story of nodes but of hooks.
Not all powers the node - all powers the hooks- ;-)
Just so I know : why is it
Just so I know: why is it an issue for you to have a 1000k-row node table? Is your database not able to optimize the queries?
For example, MySQL 5 offers partitioning (since 5.1) and, if it's anything like Oracle's, it helps keep the performance of accessing the single node table as if it were multiple tables. So you could partition by node type and still have blazing performance.
Although I do not agree with having everything as nodes (I like it when a table represents a "business entity"), I tend to resist tweaks that break the architecture's integrity when there are technical solutions to the issue.
Well, first we talk about
Well, first, we are talking about Drupal: it's Drupal's core that accesses the DB, and we don't want to change that, do we?
There are ways, of course; 1000k entries are not that many for really big systems, but that's not the point.
For Drupal's node table it is a lot, and you must keep in mind that we access it a lot: if everything is a node, we access the table multiple times even for one complex page. It scales pretty badly. And the rule "bigger table = slower access" is certainly more or less true for standard systems like a CMS.
I am not sure what you mean by tweaks... We are not talking about architectural integrity. We are talking about something Drupal can't do but others can. It's a missing feature, something we want to add. We are not talking about changing anything in the current architecture.
So, let me give another example:
Create a Drupal site which mirrors all YouTube pages on your site, each with a unique URL on your site but containing only the media links.
Comments, ratings and everything else handled by Drupal.
Drupal can't handle that unless we create at least a node for every page, with all the problems, work and redundant data that entails.
With a real tag I can create such a site with Joomla or Zikula in minutes, just by manipulating the related template hooks.
And NO DB entries are generated except for our own unique content.
That really has nothing to do with Drupal's architecture. It's something different. Don't get confused by the term hook. Also: the fact that you have to create a content node without any content just to have an anchor is not a good example of architectural integrity, is it? :-)
I am just moving my site from Zikula to Drupal, precisely because I found the underlying architecture (and Drupal's module library) better, but the missing real tag hooks and the primitive SEO link "system" are sad. There Drupal is really behind many other systems.
My experience with databases
My experience with databases has been that they are much, much, much better at doing things than we developers think they are. The caveat is that you need to tune them.
I have been amazed at how Oracle has been able to do complex joins over multiple huge tables in zero time. In fact, some coworkers were thinking of doing materialized views on another project and I told them to just test the naïve query: run time was instantaneous and they were floored.
My point is: databases can go a long way before you have to build around their shortcomings. And yes, I am talking about Oracle, but MySQL 5 has some pretty advanced features such as partitioning.
But I get your point that there may be a better way to do what you are doing than creating node entries although what you are doing is substituting a database id (which ensures some sort of referential integrity) with an external id. I'll take a look at what joomla does to understand better.
Re:
Re: micronodes:
http://drupal.org/project/data may be of interest.
http://www.twitter.com/lx_barth
http://www.twitter.com/lxbarth
Outcome?
Josh, what was your final decision?