A FeedAPI Parser using QueryPath

Posted by budda on July 21, 2009 at 11:00pm

I've just spent a couple of hours learning the FeedAPI Parser, erm, API. Pretty straightforward. The reason was to test out the idea of using the nifty QueryPath library to parse RSS feed markup while keeping the PHP code easy to understand.
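To illustrate the "query the markup by path" idea, here is a rough sketch. It is not the attached module: QueryPath is a PHP library, and this stand-in uses Python's standard-library ElementTree path lookups instead. The sample feed, the `parse_rss` name and the shape of the returned dictionary are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <pubDate>Tue, 21 Jul 2009 23:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Pull the channel title, link and items out of an RSS 2.0 string."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    feed = {
        "title": channel.findtext("title", default=""),
        "link": channel.findtext("link", default=""),
        "items": [],
    }
    # Each lookup is a short path query, so the code mirrors the feed layout.
    for item in channel.findall("item"):
        feed["items"].append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Missing elements fall back to an empty string.
            "pubDate": item.findtext("pubDate", default=""),
        })
    return feed


if __name__ == "__main__":
    for entry in parse_rss(SAMPLE_RSS)["items"]:
        print(entry["title"], entry["link"])
```

The point is only that each field is one readable lookup; a real parser would also need to handle Atom, namespaces and feeds without a channel element.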

The module is based on the SimpleXML parser code, with most of it either ripped out or commented out.

I'm not really sure where I'm going with this, and I've done no benchmarks to see if it really is faster than any of the existing libraries like SimplePie. It might be a waste of time!

I've attached the code so far, for reference if anyone wants to pick up on this. Remove the .txt extension from the attached file; I had to add it to get around the Drupal Groups restriction on attachment file types.
Currently I've tested it parsing a WordPress RSS feed (Techcrunch.com) and it's working fine and nippy.

To use the parser you'll need to install the http://drupal.org/project/querypath module too.

Matt Butcher, the author of QueryPath, is currently working on v2.0 of the library, which is claimed to be much quicker, so it might make this parser engine worth pursuing? I've not tested it with the alpha/beta 2.0 library.

Attachment	Size
parser_querypath.zip	2.81 KB

Comments

Cool

Posted by alex_b on July 22, 2009 at 12:49pm

"I've just spent a couple of hours learning the FeedAPI Parser, erm, API"

Um yeah, there is room for improvement. Aron and I are starting to think about a major revamp of FeedAPI. The API is one of the main reasons for this.

Just had a look at the code - seems nice and simple. I'd be really curious how it performs robustness-wise in comparison to SimplePie (very robust) or Parser Common. (Which reminds me of another todo on my list: unit tests for feeds!)

Nice work.

http://www.twitter.com/lx_barth

http://www.twitter.com/lxbarth

Using querypath fully would

Posted by budda on July 22, 2009 at 12:56pm

Using QueryPath fully would still need a lot of safety wrapping around the parser. For example, at the moment if the feed URL can't be reached (404 etc.) then QueryPath throws an exception.

The actual extraction of data is pretty simple, so I doubt (!) there is much room for breakage.
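The kind of safety wrapping meant here could look roughly like the sketch below. It is a Python stand-in using only the standard library, not QueryPath or FeedAPI code; `fetch`, `safe_parse` and the `(root, error)` return convention are assumptions made for the example. The fetcher is passed in as a parameter so the failure paths can be exercised without touching the network.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url, timeout=10):
    """Download the raw feed bytes; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def safe_parse(url, fetcher=fetch):
    """Return (root, None) on success or (None, message) on failure."""
    try:
        data = fetcher(url)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return None, "HTTP error %d for %s" % (err.code, url)
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout and similar.
        return None, "Could not reach %s: %s" % (url, err)
    try:
        return ET.fromstring(data), None
    except ET.ParseError as err:
        # The download worked but the body is not well-formed XML.
        return None, "Malformed XML from %s: %s" % (url, err)


if __name__ == "__main__":
    root, error = safe_parse("http://example.com/feed.xml")
    print(error if error else root.tag)
```

Separating "could not fetch" from "could not parse" means the caller can log the right reason and move on to the next feed instead of dying on an uncaught exception.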

What would be a reliable test of parsing speeds though in order to compare parsers?

"What would be a reliable

Posted by alex_b on July 31, 2009 at 9:22pm

"What would be a reliable test of parsing speeds though in order to compare parsers?"

A system where you have resource usage under control, a varied range of test feeds (different formats, lengths, element types) and good instrumentation.

Right now, I only have the first item :p and the last item so-so. There is some logging of performance going on in FeedAPI, but I doubt that it's sufficient for measuring parsing performance.

These are 2 items to put on the backlog: test feeds, better logging.
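As a starting point for the test-feeds-plus-instrumentation idea, a minimal timing harness might look like the sketch below. Everything in it is an assumption for illustration: it is Python rather than PHP, the two parsers are standard-library stand-ins (ElementTree and minidom) rather than real FeedAPI parsers, and the feeds are generated in memory rather than loaded from a collection of real RSS and Atom fixtures.

```python
import statistics
import time
import xml.etree.ElementTree as ET
from xml.dom import minidom


def benchmark(parsers, feeds, repeats=5):
    """Return {parser_name: {feed_name: median seconds per parse}}."""
    results = {}
    for parser_name, parse in parsers.items():
        results[parser_name] = {}
        for feed_name, xml_text in feeds.items():
            timings = []
            for _ in range(repeats):
                start = time.perf_counter()
                parse(xml_text)
                timings.append(time.perf_counter() - start)
            # The median is less sensitive to one-off spikes than the mean.
            results[parser_name][feed_name] = statistics.median(timings)
    return results


def count_items_etree(xml_text):
    """Stand-in parser 1: count items with ElementTree."""
    return len(ET.fromstring(xml_text).findall("./channel/item"))


def count_items_minidom(xml_text):
    """Stand-in parser 2: count items with minidom."""
    return len(minidom.parseString(xml_text).getElementsByTagName("item"))


def make_feed(n):
    """Build a synthetic RSS 2.0 feed with n items (hypothetical test data)."""
    items = "".join("<item><title>Post %d</title></item>" % i for i in range(n))
    return '<rss version="2.0"><channel><title>t</title>%s</channel></rss>' % items


PARSERS = {"etree": count_items_etree, "minidom": count_items_minidom}
FEEDS = {"short": make_feed(10), "long": make_feed(1000)}

if __name__ == "__main__":
    for parser_name, per_feed in benchmark(PARSERS, FEEDS).items():
        for feed_name, seconds in per_feed.items():
            print("%-8s %-6s %.6f s" % (parser_name, feed_name, seconds))
```

This only times parsing of in-memory strings, so network time is excluded on purpose. It says nothing about memory use or robustness, and synthetic feeds are no substitute for a varied set of real ones.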

http://www.twitter.com/lx_barth

http://www.twitter.com/lxbarth
