Feed parsers comparison


I plan to compare the different XML feed parsers here in terms of functionality, speed, and the interface each API provides.
Please extend this page with any parsers or feed formats you think are worth covering.

I'll include the following parsers:

  • leech_news_parser
  • SimplePie
  • Aggregation's parser

Example feeds

Code size

Parser                 Code size
leech_news_parser      20K
SimplePie              130K
Aggregation's parser   6.3K

Supported feed types

Parser RSS 0.91 RSS 0.92 RSS 2.0 Atom 0.3 Atom 1.0 RDF
leech_news_parser X X X X X X
SimplePie X X X X X X
Aggregation X X X X

I tested the Aggregation parser's compatibility with some feeds from Aggregator's issue queue. Results are here.

Parsing speed

These are the processing times for the biggest example feed above: http://www.christiannewswire.com/rss/catfeed_2.xml .

Parser                 Time
leech_news_parser      2.41s
SimplePie              10.99s (1s if sanitizing is off)
Aggregation's parser   0.1s

Morbus Iff, morbus@disobey.com -- The above tests are "unfair" to SimplePie, see http://lists.drupal.org/pipermail/development/2007-June/024790.html.

Aron Novak - OK, thanks, I didn't know much about SimplePie's internal structure.
So the situation is that SimplePie provides a lot of extras when it is allowed to do sanitizing:

  • $this->enable_order_by_date(true);
  • $this->remove_div(true);
  • $this->strip_comments(true);
  • $this->strip_htmltags(true);
  • $this->strip_attributes(true);
  • $this->set_image_handler(true);

The values are the averages of three executions. The deviation was negligible.
For SimplePie I had to estimate, because the feed download time was included in the measurement.
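
The measuring approach described above (three runs, averaged, with download time kept out of the figure) could be sketched roughly like this. It is not the original test code: average_parse_time() and the inline sample feed are made up for illustration, and simplexml_load_string only stands in for whichever parser is being timed.

```php
<?php
// Hypothetical helper, not part of any of the parsers compared on this page.
// Runs $parser on already-downloaded XML $runs times and returns the average
// wall-clock time in seconds, so network time never enters the measurement.
function average_parse_time(callable $parser, string $xml, int $runs = 3): float {
  $total = 0.0;
  for ($i = 0; $i < $runs; $i++) {
    $start = microtime(true);
    $parser($xml);
    $total += microtime(true) - $start;
  }
  return $total / $runs;
}

// Tiny inline feed so the sketch is self-contained; a real test would read a
// local copy of the feed that was fetched beforehand.
$sample_xml = '<rss version="2.0"><channel><title>Sample</title>'
  . '<item><title>First</title><link>http://example.com/1</link></item>'
  . '</channel></rss>';

$seconds = average_parse_time('simplexml_load_string', $sample_xml);
printf("average over 3 runs: %.6fs\n", $seconds);
```

Passing the XML as a string means every parser that accepts raw data is timed on the same input; a parser that insists on a URL would still need its download time estimated separately.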

Behaviour, structure of the result / input of the parser

leech_news_parser

Input: the raw XML data
Example usage: $parsed = leech_news_parse_news_feed($xml_data);
Output: a mix of arrays and stdClass objects

SimplePie

Input: URL of the feed; downloading and the like are handled behind the API
Example usage:

$feed = new SimplePie();
$feed->feed_url($url);         // which feed to fetch
$feed->init();                 // fetch and parse the feed
$parsed = $feed->get_items();  // the parsed items

Output: a mix of arrays and SimplePie_Item objects
The parsed data can also be accessed through SimplePie->function() methods. See the API.

Aggregation

Input: the type of the feed, and the XML data processed by simplexml
Example usage: call_user_func_array('aggregation'.$handler_name.'_parse', array($feed_xml, $feed));
Output: It uses the PHP5-only simplexml_load_string, then extracts the data according to the specified feed format. The data is consumed immediately; there is no intermediate stage as in the parsers above.
Extending the parsing capability is trivial. This is the most straightforward parser with the smallest codebase, because most of the work is done inside PHP. But feed-format auto-detection is a must.
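
To give an idea of how little code the simplexml approach needs, here is a minimal sketch of an RSS 2.0 handler. It is not Aggregation's actual code: the function name example_rss20_parse() and the returned array layout are assumptions made for this example.

```php
<?php
// Hypothetical RSS 2.0 handler in the spirit of the approach described above.
// Returns a plain array of items; returns an empty array for anything that
// does not look like RSS.
function example_rss20_parse(string $xml_data): array {
  $xml = simplexml_load_string($xml_data);
  if ($xml === false || !isset($xml->channel)) {
    return array();
  }
  $items = array();
  foreach ($xml->channel->item as $item) {
    // Casting to string pulls the text out of the SimpleXMLElement;
    // a missing element simply becomes an empty string.
    $items[] = array(
      'title' => (string) $item->title,
      'link' => (string) $item->link,
      'description' => (string) $item->description,
    );
  }
  return $items;
}

// Inline sample feed so the sketch runs on its own.
$sample_xml = '<rss version="2.0"><channel><title>Sample</title>'
  . '<item><title>First</title><link>http://example.com/1</link>'
  . '<description>One</description></item>'
  . '<item><title>Second</title><link>http://example.com/2</link></item>'
  . '</channel></rss>';

$items = example_rss20_parse($sample_xml);
foreach ($items as $item) {
  print $item['title'] . ' -> ' . $item['link'] . "\n";
}
```

Supporting another format would mean writing one more small function of the same shape, which is why extending this kind of parser is cheap.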

Mentor's note: I agree. The parser-selection system for Aggregation is tied to Taxonomy, which is a very odd design choice. Automating feed detection == good user experience. - Ken

Aggregation author: I used the taxonomy to utilize its current add/edit/delete capability. With a good argument against this, I could do a move to an external table in an hour tops. As to auto-detection, I got no issues at all with users finding trouble in specifying their feed types. A patch for auto detection could be done in an hour's time. I currently have many more interesting feature requests on the issue queue. Any patches are welcome though!
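
For reference, auto-detection along the lines discussed here could look roughly like the sketch below, which guesses the format from the root element and its namespace. This is not a patch against Aggregation: the function name and the returned labels ('rss20', 'atom10' and so on) are made up for illustration.

```php
<?php
// Hypothetical detector: guesses the feed format from the root element.
// Returns a label such as 'rss20', 'rdf', 'atom03' or 'atom10', or NULL when
// the data is not XML or the root element is not recognised.
function example_detect_feed_type(string $xml_data): ?string {
  // The @ suppresses libxml warnings for input that is not well-formed XML.
  $xml = @simplexml_load_string($xml_data);
  if ($xml === false) {
    return null;
  }
  $root = $xml->getName();
  if ($root === 'rss') {
    // RSS carries its version in an attribute: 0.91, 0.92 or 2.0.
    return 'rss' . str_replace('.', '', (string) $xml['version']);
  }
  if ($root === 'RDF') {
    return 'rdf';
  }
  if ($root === 'feed') {
    $namespaces = $xml->getDocNamespaces();
    if (in_array('http://www.w3.org/2005/Atom', $namespaces, true)) {
      return 'atom10';
    }
    if (in_array('http://purl.org/atom/ns#', $namespaces, true)) {
      return 'atom03';
    }
  }
  return null;
}

print example_detect_feed_type('<rss version="2.0"><channel/></rss>') . "\n";
```

The label could then be used to pick the handler, instead of asking the user to specify the feed type.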

Aron Novak - It is very likely that I'll send you patches like the ones you mentioned. Honestly, my current idea is to make Aggregation's parser the new API's default parser. At the top of the page, you can see that I plan some more tests with the parser.

What version of SimplePie did you use, and what exactly were you trying to test? It's important to know that SimplePie does a lot of data sanitization by default to protect users from malicious feeds, and generally make things more useful -- which, in turn, requires a bit more processing. Also, if you're going to test speed, you should test the post-cached speed instead of calculating the wait-for-the-remote-server-to-respond time into it. -- Ryan Parman

Aron Novak - You are absolutely right, I just failed to publish the test method and the details. (By the way, the SimplePie question is discussed above.)

Ryan Parman - Aron, thanks for updating that. It should be noted that set_stupidly_fast() is only available in the development builds (soon to be 1.0), and we've also spent some time tuning SimplePie for performance. If possible, I'd also like to add an Atom 1.0 feed to your testing: http://www.tbray.org/ongoing/ongoing.atom . This feed would test the correct handling of a very, very complicated Atom 1.0 feed, which SimplePie does supremely well and most others fail at.

Details of the test method

I made the test on a PHP5 + Apache2 server with Zend Platform installed (for debugging reasons).
The versions:

Parser                 Version
leech_news_parser      1.2
SimplePie              1.0 Beta 3.2
Aggregation's parser   3.0

Test code:
You can see the testing code here.
