XPath-based parser for core aggregator

Aron Novak

Imagine if the RSS/Atom/whatever parser in Drupal core were robust and solid, so that it could even serve third-party modules like Feeds.
My concept relies on an associative array full of XPath expressions, which a smart piece of code could use directly to build the final $feed object. The high-level goal of the new parser is this: maintaining the parser would mainly mean maintaining this array, nothing more.
XPath also seems to be a more future-proof solution for parsing than what we have in core now.
I have a patch at http://drupal.org/node/1268232, which is really just a sketch of the concept. It discards some functionality, and it does not include solid namespace handling, which I clearly have in mind for a robust parser.
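
On the namespace point, one possible direction (a minimal sketch, not what the patch does) is to bind a single prefix to whatever namespace the feed's root element declares, using PHP's DOMXPath::registerNamespace(). The helper name example_xpath_for() and the choice of "x" as the prefix are assumptions made for illustration only.

```php
<?php
// Hypothetical helper, not part of the patch: bind the namespace of the
// document's root element (if it has one) to a single prefix, so the same
// prefixed expressions can be reused for differently namespaced feeds.
function example_xpath_for(DOMDocument $doc, $prefix = 'x') {
  $xpath = new DOMXPath($doc);
  $ns = $doc->documentElement->namespaceURI;
  if (!empty($ns)) {
    $xpath->registerNamespace($prefix, $ns);
  }
  // Note: if no namespace was bound, expressions using the prefix will fail,
  // so a real parser would need a fallback here.
  return $xpath;
}

// Two tiny Atom documents that differ only in their namespace URI
// (Atom 1.0 and the older Atom 0.3).
$atom10 = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example 1.0</title></feed>';
$atom03 = '<feed xmlns="http://purl.org/atom/ns#"><title>Example 0.3</title></feed>';

$titles = array();
foreach (array($atom10, $atom03) as $xml) {
  $doc = new DOMDocument();
  $doc->loadXML($xml);
  $xpath = example_xpath_for($doc);
  // The same expression works for both documents.
  $titles[] = $xpath->evaluate('string(/x:feed/x:title)');
}
```

With this approach the expressions themselves would not need to know which Atom version they are looking at. Feeds that mix several namespaces (Dublin Core, content:encoded and so on) would still need extra prefixes registered explicitly.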

To save you some time, here is the "big" associative array from the current patch:

array(
    'title' => array(
      '/rss/channel/title',
      '/x:feed/x:title',
    ),
    'description' => array(
      '/rss/channel/description',
    ),
    'link' => array(
      '/x:feed/x:link/@href',
      '/rss/channel/link',
    ),
    'items' => array(
      'children' => array(
        '/rss/channel/item',
        '/x:feed/x:entry',
      ),
      'title' => array(
        'title',
        'x:title',
      ),
      'description' => array(
        'description',
        'x:summary',
      ),
      'link' => array(
        'link',
        'x:link/@href',
      ),
      'author' => array(
        'author',
        'x:author',
      ),
      'guid' => array(
        'id',
        'x:id',
        'guid',
        'link',
      ),
      'timestamp' => array(
        '#process' => 'strtotime',
        'pubDate',
        'pubdate',
        'issued',
        'created',
        'modified',
        'published',
        'updated',
      ),
    ),
  );

That way you can get a rough idea of what I mean without looking at the code.
If you have any thoughts on this, please share them here!
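
To make the idea a bit more concrete, here is a rough sketch of how a small piece of code could walk such an array and build a $feed object. It is not the code from the patch: the function names example_extract() and example_parse_feed() are hypothetical, the map is abridged (it uses x:updated for Atom dates, which is my own assumption), and the "x" prefix is simply bound to the root element's namespace, with the Atom 1.0 namespace as a fallback so that x: expressions stay valid on plain RSS.

```php
<?php
// Illustrative sketch only; names are hypothetical and not from the patch.

// Return the first non-empty value among the candidate XPath expressions,
// passed through the optional '#process' callback.
function example_extract(DOMXPath $xpath, array $expressions, DOMNode $context) {
  foreach ($expressions as $key => $expression) {
    if (!is_int($key)) {
      // Skip control keys such as '#process'.
      continue;
    }
    $value = trim($xpath->evaluate("string($expression)", $context));
    if ($value !== '') {
      return isset($expressions['#process'])
        ? call_user_func($expressions['#process'], $value)
        : $value;
    }
  }
  return NULL;
}

// Build a feed object from raw XML, driven entirely by the expression map.
function example_parse_feed($xml, array $map) {
  $doc = new DOMDocument();
  if (!@$doc->loadXML($xml)) {
    return FALSE;
  }
  $xpath = new DOMXPath($doc);
  // Assumption: "x" means "the namespace of the root element".
  $ns = $doc->documentElement->namespaceURI;
  $xpath->registerNamespace('x', $ns ? $ns : 'http://www.w3.org/2005/Atom');

  $feed = new stdClass();
  $feed->items = array();
  foreach ($map as $field => $expressions) {
    if ($field !== 'items') {
      $feed->$field = example_extract($xpath, $expressions, $doc);
    }
  }
  $item_map = isset($map['items']) ? $map['items'] : array('children' => array());
  foreach ($item_map['children'] as $expression) {
    foreach ($xpath->query($expression) as $node) {
      $item = array();
      foreach ($item_map as $field => $expressions) {
        if ($field !== 'children') {
          // Item-level expressions are relative to the item node.
          $item[$field] = example_extract($xpath, $expressions, $node);
        }
      }
      $feed->items[] = $item;
    }
  }
  return $feed;
}

// Abridged map in the same shape as the array above.
$map = array(
  'title' => array('/rss/channel/title', '/x:feed/x:title'),
  'items' => array(
    'children' => array('/rss/channel/item', '/x:feed/x:entry'),
    'title' => array('title', 'x:title'),
    'link' => array('link', 'x:link/@href'),
    'timestamp' => array('#process' => 'strtotime', 'pubDate', 'x:updated'),
  ),
);

$rss = '<rss version="2.0"><channel><title>Example feed</title>'
  . '<item><title>First</title><link>http://example.com/1</link>'
  . '<pubDate>Thu, 01 Jan 2015 00:00:00 +0000</pubDate></item>'
  . '</channel></rss>';
$feed = example_parse_feed($rss, $map);
```

The point of the sketch is that supporting a new element or format would only mean adding another expression to the map; the two functions would stay untouched.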