A summary of feed-related modules and code.
Page needs updating
Feeds
Source: http://drupal.org/project/feeds
Description: Import or aggregate data as nodes, users, taxonomy terms or simple database records.
Status: Actively maintained
Aggregator
Description: core module that parses and aggregates feeds as non-node items.
Current status:
- needs more flexible feed handling -- any invalid XML causes parsing to fail
- category system needs replacing with core taxonomy format
- more hooks to handle different types of feeds needed
- a hook needed to allow easy creation of nodes from feed items
- no paging on the admin feeds page
- causes high load on cron / systems with many feeds (50+)
BorisMann: I've spoken with Dries and he's willing to look at improvements to the core aggregator, but someone needs to spend dedicated time evolving it (if the goal is to have this functionality in core).
There is a discussion here about overhauling the core aggregator: http://drupal.org/node/130942
Aggregator2
Description / Purpose: provides feed nodes and related feed-item nodes
Source: http://drupal.org/project/aggregator2
Current status:
- Aggregation module provides an upgrade path for existing aggregator2 users
- Being deprecated / replaced / redone as leech module
- Capable of reading SSL-encrypted feeds that require a client SSL certificate, as it uses cURL to read the feed.
- leech was only available in the issue thread at http://drupal.org/node/77451, but now has its own project at http://drupal.org/project/leech
- Architecture still highly confusing and code not of high quality
Leech and Leech News
Included modules:
- leech.module
- leech_news.module (integrated into leech in current versions)
- leech_opml.module
- xml_preg_parser.module
- node_template.module
Idea behind it and how it works:
The Leech module is for regularly leeching (fetching, downloading, hence the name) information from remote sites, such as RSS/Atom/OPML feeds, and could also support downloading information via FTP.
It could also be used for things like:
- downloading comic strips
- downloading data from UPnP hardware (http://www.upnp.org/)
- downloading images from an internet camera
- downloading any data available over an internet connection
The Leech module is meant to facilitate leeching different types of information, each with an appropriate parser. Currently there is the leech_news module, which comes with a small PREG parser that is fairly fast and an improvement over aggregator2's RSS/Atom parsing.
leech_news uses the node_template module to turn news items into nodes.
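The regex-based ("PREG") parsing approach described above can be illustrated with a minimal sketch. This is hedged Python for illustration only, not the module's actual PHP code; the feed snippet and field names are invented:

```python
import re

# A tiny invented RSS snippet standing in for a fetched feed.
rss = """<rss><channel>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

# Pull out each <item> block, then the fields inside it, using
# regular expressions instead of a strict XML parser -- fast, and
# tolerant of markup a validating parser would reject outright.
items = []
for block in re.findall(r"<item>(.*?)</item>", rss, re.DOTALL):
    title = re.search(r"<title>(.*?)</title>", block, re.DOTALL)
    link = re.search(r"<link>(.*?)</link>", block, re.DOTALL)
    items.append({
        "title": title.group(1).strip() if title else "",
        "link": link.group(1).strip() if link else "",
    })

print(items)
```

Each resulting dict could then be handed to a node-creation step, which is the role node_template plays for leech_news.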
Thoughts from the developer (ahwayakchih - Marcin) on why it works the way it does
In agg2, the feed node's data is used to create each item's data (i.e. it was "cloned", changing the type and adding the item's data). This worked OK, but created a bit of a mess on the edit page (especially if a site uses more than one vocabulary for feeds and more than one for items - the taxonomy selection boxes become very confusing for end users). So node_template was created to let the admin set up most things on a template, without needing to put them on the feed edit page. With node_template, feeds may create different types of nodes, i.e. one feed can create 'story' nodes, another can create 'page' nodes, and a third could create a CCK-defined node type.
Neither approach (agg2 or node_template) is perfect. Best would be if Drupal had an API for automatically generating new nodes, but it doesn't. Of course we could create nodes by defining most of the required data in PHP (inside the module), but then some other module may require data attached to different node types, and things will break. That's why I tried to imitate the whole process of a user creating a node.
Another great thing to have in Drupal would be better separation of data from its presentation. Many modules add their own things to the node's body on the 'view' call. This makes life harder for themes and other consumers; for example, implementing pay-per-view on one type of data is not easily possible without modifying the module(s). If data were kept separate (i.e. each module declared the data it attaches to nodes, the types of its variables, etc.), we wouldn't have to define huge Form API arrays (they could be generated automatically, unless a module required something special), and we could set access rights on data (e.g. user A can view images attached to a node but can't edit them, even if he can edit the rest of the node). Such separation would also allow leech_news to ask other modules what data they need for a given node (and maybe prepare defaults too), and we wouldn't need to copy an already existing node.
Well, I could try to call the Forms API myself, gather the whole form's data, and then parse it to get the variables, but I'm not sure it would work reliably (even a slight change in the Forms API could break things).
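The template-cloning idea described above can be sketched as follows. This is a hedged Python illustration of the concept, not node_template's real API; the feed IDs, field names, and defaults are all invented:

```python
import copy

# Per-feed templates: the admin configures node type, taxonomy,
# and defaults once, instead of on every feed edit page.
templates = {
    "tech-feed": {"type": "story", "taxonomy": ["technology"], "status": 1},
    "docs-feed": {"type": "page", "taxonomy": ["documentation"], "status": 1},
}

def make_node(feed_id, item):
    """Clone the feed's template, then merge in the item's own data."""
    node = copy.deepcopy(templates[feed_id])
    node.update({"title": item["title"], "body": item["description"]})
    return node

node = make_node("tech-feed", {"title": "New release", "description": "..."})
print(node["type"])  # the feed's template, not the item, decides the node type
```

The point of the design is visible in the last line: two feeds with identical items can still produce different node types, because the template carries everything the item doesn't.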
Feedparser
Description / Purpose: This package provides a drop-in replacement for aggregator.module and an API to extend the feed parsing capabilities of Drupal.
The initial release plans to provide a module that creates standard nodes based on feed items, in addition to a module that creates the standard/current aggregator feed items for users who don't want nodes created.
Planned features include per-feed processors, so some RSS feeds can be turned into nodes and others into traditional Drupal aggregator items, as well as mapping of RSS items to CCK nodes.
Source: http://drupal.org/project/feedparser
Current status:
- recent updates (November 2006)
- Parses feeds using simplepie engine. (http://simplepie.org)
- Per feed settings for flexibility.
- Creates nodes from RSS items - including filtering via Drupal system.
- Aggregates enclosures with RSS items.
- Automatically extracts <category> tags and creates matching taxonomy terms for the RSS item nodes.
- Flexible FeedAPI for extending use of feed data being parsed.
Looking for testers and feedback so that I can work towards a stable release now.
BorisMann: this looks like the best direction for 4.7.
AlfEaton: I like the way this works, and have adapted it to use SimplePie instead of rss_parser.inc. I think the next step is to extract structured content from each item (eg enclosures, GeoRSS, etc) and store these fields so that they can be accessed by themes/output filters/searches/views.
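The <category>-to-taxonomy-term step listed in the features above can be illustrated with a short sketch. This is hedged Python for illustration, not feedparser's or SimplePie's actual code; the feed snippet is invented:

```python
import xml.etree.ElementTree as ET

# Invented feed with <category> tags on each item.
rss = """<rss><channel>
<item><title>A</title><category>Drupal</category><category>Feeds</category></item>
<item><title>B</title><category>Drupal</category></item>
</channel></rss>"""

root = ET.fromstring(rss)
terms = set()    # the vocabulary built up from all aggregated items
item_terms = {}  # per-item term assignment for the created nodes
for item in root.iter("item"):
    title = item.findtext("title")
    cats = [c.text for c in item.findall("category")]
    terms.update(cats)
    item_terms[title] = cats

print(sorted(terms))  # each distinct category becomes a taxonomy term
```

In the module itself, each collected term would be matched against (or added to) a Drupal vocabulary and attached to the node created for that item.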
Feed_node
Description / Purpose: add feed URLs / feed items to arbitrary node types -- you "feed enable" content types, using the core aggregator to store feed URLs and feed items. Next "to do" is to look at extending it to 1) create feed items as nodes and 2) parse feed items and create structured data in nodes (e.g. GeoRSS, PubMed, etc.)
Current status:
- first commits available in local Bryght SVN repos at https://svn.bryght.com/dev/browser/feed_node
- very simple module so far
- meant to easily plug into core and cleanly extend
BorisMann: this is unlikely to be further developed
Aggregator Node
Description / Purpose: Aggregator Node produces new nodes which connect the Drupal node system to the aggregator items. It does not duplicate the data, instead it stores a reference to the aggregator feed item.
Current Status:
- When feed items expire, a node is left pointing to a nonexistent feed item
Source: http://drupal.org/node/59712257575161307
Aggregation
The Aggregation module was created with two main considerations: high performance and a thoroughly studied architecture. It currently supports RSS 2.0, Atom 1.0 and RDF 1.0, but can be expanded to aggregate from any XML feed. The module can handle one image per item if an image exists in the feed (this requires the image module). No module changes are required to parse new feeds.
- Requires PHP 5 and cURL support
- Expandable architecture
- Highly efficient
- Has recently provided an update path for aggregator2 users
- Extracts categories from all aggregated feeds and adds them as terms
Check out the project page for full details...
SimpleFeed
Similar to Feedparser in that it also uses the SimplePie library. Looks like a very clean codebase; Drupal 5 only.
Original from Bryght wiki.
FeedAPI
Recent Google SoC project. Available for Drupal 5, or in beta for 6.x. Can use SimplePie as an engine. Haven't tried it yet. Someone who knows more than me should update this page. ;)
Category Aggregator
Based on the Aggregation module, this module is a feed aggregator that syndicates items using tags/categories as filters. The author is currently looking at the possibility of including this functionality in FeedAPI.