Feed parsing overview in a wiki

boris mann's picture

Hi all -- I put together a short overview of current modules here: https://svn.bryght.com/dev/wiki/DrupalFeedParsing

Feel free to edit or add comments. It is meant to show all the feed aggregation / feed-to-node modules in one place. We recently wrote a very simple feed_node module that uses the core aggregator.

Budda's feedparser looks good, but basically we need more input and a concerted effort, as well as a migration plan / core integration options.

Comments

Feedparser updated just now

budda's picture

I've just checked in a working alpha update of the package tonight which appears to be parsing feeds and creating nodes ok.
Download the CVS tarball from http://drupal.org/project/feedparser

I hope to add support for the enclosures as a proof-of-concept for the feed parser api "real soon now".

Been so bogged down with work for the past 3-4 months that I've not had chance to push the Feedparser project forward.

I'm looking for feedback on it working/not working for node aggregation on your site. Post in the issue tracker if you have problems. I've only tested it on a clean out of the box install of Drupal 4.7 so far.

Will test

boris mann's picture

"Been so bogged down with work for the past 3-4 months that I've not had chance to push the Feedparser project forward."

Maybe you should get bogged down getting paid to work on Feedparser :P

I'll test and see if this fits in where we need it to head.

Tested and issues in tracker

boris mann's picture

Doesn't seem to work right now after some recent edits.

Thanks for adding Aggregator Node to the wiki page...seems very similar to feed_node, except we made feed_node work with arbitrary node types.

"Work"? What is this "work" of which you speak?

Max Bell's picture

Yay API!

I expanded on Boris' entry somewhat, as I was able to add feeds but could not seem to get it to return content. I am leaving for a couple of days on Saturday, but when I return I have only a single project (itself heavily involved with aggregation, albeit of the usual kind), so I will be certain to beat on the thing and wring as much feedback as possible out of it when I get back.

I'd love to discuss this with you at your convenience when I get back, though.

In a nutshell, having spent a few weeks looking at various aggregators and aggregation methods, what I am seeing is that they tend to be written to deal with unexpected elements by ignoring them. This does not address efforts to extend formats using custom namespaces, nor does it address a specification that is not RSS/RDF/Atom.

Being able to specify an arbitrary node type would allow CCK nodes to be used, but better still, there should be some mechanism that works something like "Hey, I just found an element or namespace nobody told me what to do with, do you want me to show you the tags and let you create a CCK node type to handle this kind of feed item?"
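As a rough illustration of the "I just found an element nobody told me what to do with" idea, here is a minimal Python sketch, not taken from any of the modules discussed here. It assumes a plain RSS 2.0 feed, and the names KNOWN_ITEM_TAGS, split_tag, find_unknown_elements and the sample feed are all hypothetical. Instead of silently dropping unrecognised item children, it reports them grouped by namespace, which is the information a UI would need in order to offer a new node type.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: the RSS 2.0 item elements this hypothetical parser already handles.
KNOWN_ITEM_TAGS = {"title", "link", "description", "pubDate", "guid"}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def find_unknown_elements(feed_xml):
    """Return {namespace: sorted local names} for unrecognised <item> children."""
    root = ET.fromstring(feed_xml)
    unknown = defaultdict(set)
    for item in root.iter("item"):
        for child in item:
            namespace, local = split_tag(child.tag)
            # Anything namespaced, or not in the known set, counts as unknown.
            if namespace or local not in KNOWN_ITEM_TAGS:
                unknown[namespace].add(local)
    return {ns: sorted(names) for ns, names in unknown.items()}


# Made-up feed with one namespaced extension element and one stray plain element.
SAMPLE_FEED = """<rss version="2.0" xmlns:ex="http://example.com/ns">
  <channel>
    <title>Sample</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <ex:rating>5</ex:rating>
      <mood>happy</mood>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for namespace, names in find_unknown_elements(SAMPLE_FEED).items():
        print(namespace or "(no namespace)", names)
```

The returned mapping could then drive a prompt asking whether to map each unknown element to a field on a custom node type.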

This would not only allow users an unprecedented level of control over aggregation, but provide support for customized aggregation, both in terms of display and creation, that isn't present in any other aggregator that I've come across so far. And it just so happens that I have a format I want to use that's SUBSTANTIALLY more complex than RSS, which I am thinking I should probably write up a white paper on when I get back, since, after spending several hours reviewing it, it still isn't entirely coherent to me, much less anybody else.

This oughta be cool...

At present I've now opted to

budda's picture

At present I've now opted to go with the SimplePie parser library. Having watched the issue tracker for the original aggregator.module, I didn't fancy being drowned in bug reports along the lines of "feed XYZ doesn't parse correctly". That doesn't look like fun.

So I have effectively outsourced the boring work to SimplePie developers. I also confirmed with them (Geoff) that it was okay to distribute the latest simplepie.inc file with FeedParser package.

At present the SimplePie parser v1.0 does as you mention and discards data tags it doesn't know about, so it's not ideal for your master plans. However, using this as a base we can now modify the class to put all of a feed's raw data into the $item->data array for custom manipulation by a module, and also strip out some of the sugary crap from simplepie.inc, such as the subscribe_* functions.

SimplePie v2 is already in the works though... with a backwards compatible API thank goodness.

Heh heh...

Max Bell's picture

Indeed, I was quite pleased with the results of your work -- I also read up on SimplePie and downloaded the current beta. I've also since spent much time reading up on the semantic web, XML, etc. and recognize that, ultimately, what I was considering when I posted this requires SAX or its nearest equivalent.
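For anyone wondering what a SAX-style approach would look like, here is a minimal sketch using Python's standard xml.sax module with namespace processing switched on. It is only an illustration under assumptions: the ItemCollector class, the parse_items helper and the sample feed are hypothetical and are not part of Feedparser or SimplePie. The handler keeps every direct child of each item, known or not, as a (namespace, name, text) tuple rather than discarding what it does not recognise.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class ItemCollector(ContentHandler):
    """Collect every direct child of each <item> as (namespace, name, text)."""

    def __init__(self):
        super().__init__()
        self.items = []        # one list of tuples per <item>
        self._depth = 0        # 0 = outside an item, 1 = in item, 2 = in a child
        self._current = None   # (namespace, name) of the child being read
        self._text = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if self._depth == 0:
            # Elements without a namespace are reported with uri None.
            if uri is None and local == "item":
                self.items.append([])
                self._depth = 1
            return
        self._depth += 1
        if self._depth == 2:
            self._current = (uri or "", local)
            self._text = []

    def characters(self, content):
        # Text of nested elements is folded into the enclosing direct child.
        if self._current is not None:
            self._text.append(content)

    def endElementNS(self, name, qname):
        if self._depth == 0:
            return
        if self._depth == 2:
            namespace, local = self._current
            self.items[-1].append((namespace, local, "".join(self._text).strip()))
            self._current = None
        self._depth -= 1


def parse_items(feed_bytes):
    """Stream-parse feed bytes and return the collected items."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = ItemCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(feed_bytes))
    return handler.items


# Made-up feed with one element from a custom namespace.
SAMPLE_FEED = b"""<rss version="2.0" xmlns:ex="http://example.com/ns">
  <channel>
    <item>
      <title>First post</title>
      <ex:rating>5</ex:rating>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for item in parse_items(SAMPLE_FEED):
        print(item)
```

Because the handler sees elements as a stream of events, it never needs to know the feed's vocabulary in advance, which is what makes this style a reasonable fit for arbitrary or extended formats.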

I posted background

Development Seed's picture

I posted background information on the leech module set here: https://svn.bryght.com/dev/wiki/DrupalFeedParsing#LeechandLeechNews

The latest copy is here: http://drupal.org/node/77451#comment-144058
