SimplePie memory usage

alaa wrote:

I've noticed that with SimpleFeed and feedparser, PHP runs out of memory when cron is set to parse a large number of feeds.

It turns out this is due to a PHP bug: http://bugs.php.net/bug.php?id=33595

SimplePie objects are full of circular references and self-references, so in practice no object is destroyed until the cron run ends: every feed you load and parse stays in memory until then.
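A rough way to see the effect, sketched in Python rather than PHP: disabling Python's cycle collector leaves plain reference counting, which approximates the PHP behavior described in the bug report. `ParsedFeed` and `live_after_cron` are hypothetical names standing in for a parser object that holds a reference to itself; nothing here comes from SimplePie.

```python
import gc
import weakref


class ParsedFeed:
    """Hypothetical stand-in for a parser object with a self-reference."""

    def __init__(self, url):
        self.url = url
        self.parser = self  # self-reference: the refcount never drops to zero


def live_after_cron(urls):
    """Parse each URL in turn and count how many objects are still alive."""
    gc.disable()  # model a runtime with reference counting only
    try:
        refs = []
        for url in urls:
            feed = ParsedFeed(url)
            refs.append(weakref.ref(feed))
            del feed  # the last outside reference is gone, yet the object survives
        return sum(1 for r in refs if r() is not None)
    finally:
        gc.enable()
```

`live_after_cron(["a", "b", "c"])` returns 3: every loop iteration dropped its only outside reference, but all three objects are still in memory, just like every parsed feed staying around until the cron run is over.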

From talking to the developers in #simplepie, I don't think they'll be implementing a workaround in the near future.

Comments

Ah ha!

Boris Mann wrote:

Thanks for the post; this makes a lot of sense. And it's not just large numbers of feeds: a small number of feeds with LOTS of items in them causes it too.

alaa wrote:

It also means you can't afford to make copies of any SimplePie object, since the copies will never be destroyed. It forces you to code in a very particular way, and it will never scale.

The question is: what do we do? Is there another PHP library for parsing XML feeds? Do we need a Drupal-specific parser? Should it use PHP 5 features like SimpleXML?

Or do we just accept this memory limitation and live with it?

One solution would be to parse only a few feeds per cron run and run cron more often. The trouble is that you can never guarantee there is enough memory (unless you allow obscene amounts), and because Drupal 5 uses the cron_semaphore variable, if cron.php runs out of memory it won't run again until an hour has passed, even if you run cron every 10 minutes.
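The batching idea, and why the semaphore undermines it, can be sketched roughly like this in Python. `FeedRow`, `cron_run`, `semaphore_blocks` and `STALE_AFTER` are hypothetical names that model the behavior described above; they are not Drupal's API. Each run refreshes only the `limit` feeds checked longest ago, and a run is refused while an earlier run's semaphore is less than an hour old.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

STALE_AFTER = 3600  # seconds; models the one-hour lockout described above


@dataclass
class FeedRow:
    url: str
    last_checked: float  # Unix timestamp of the last refresh


def semaphore_blocks(semaphore_started: Optional[float], now: float) -> bool:
    """True while an earlier run's semaphore is set and not yet stale.

    A run that dies from running out of memory never clears its semaphore,
    so later runs are skipped until STALE_AFTER seconds have passed."""
    return semaphore_started is not None and now - semaphore_started < STALE_AFTER


def cron_run(feeds: List[FeedRow], limit: int, refresh: Callable[[str], None],
             now: Optional[float] = None) -> List[str]:
    """Refresh at most `limit` feeds, oldest first, and return their URLs."""
    now = time.time() if now is None else now
    batch = sorted(feeds, key=lambda f: f.last_checked)[:limit]
    for feed in batch:
        refresh(feed.url)
        feed.last_checked = now
    return [f.url for f in batch]
```

With cron scheduled every 10 minutes, a run that dies at minute 0 leaves the semaphore set, so `semaphore_blocks` stays true for the next five scheduled attempts. That lost hour is why a small batch size alone doesn't make this approach safe.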

FeedAPI

Boris Mann wrote:

Well, the reason for using SimplePie was to avoid maintaining support for all the different edge cases of "liberal" feed parsing. The Aggregation module has a built-in parser that was merged into the new FeedAPI module.

I can't believe we might have to maintain our own parser because there isn't a single decent library out there... It almost makes me want to plug in something like a Python-based parser.

Exactly

msameer wrote:

I'd say we've tried almost every RSS feed parsing library, to no avail.

The good thing about an external parser is that PHP won't accumulate the extra memory, so it won't fail as easily. The best thing about Python is that it already has a mature RSS parsing library.
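One way to get that isolation is to run each parse in a short-lived child process, so whatever the parser leaks is handed back to the operating system when the child exits. A minimal sketch in Python, assuming RSS 2.0 input; `parse_in_child` and `CHILD_SCRIPT` are hypothetical names, and the stdlib `xml.etree.ElementTree` parser is only a placeholder for a real liberal feed parser (it will reject malformed feeds that such a parser would accept).

```python
import json
import subprocess
import sys

# Script executed in the child process: read RSS XML on stdin and print the
# item titles as JSON. ElementTree is strict; it stands in for a real parser.
CHILD_SCRIPT = r"""
import json
import sys
import xml.etree.ElementTree as ET

root = ET.fromstring(sys.stdin.read())
titles = []
for item in root.iter("item"):
    title = item.find("title")
    titles.append(title.text if title is not None and title.text else "")
json.dump(titles, sys.stdout)
"""


def parse_in_child(xml_text: str) -> list:
    """Parse one feed in a fresh interpreter; its memory is freed on exit."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=xml_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(result.stdout)
```

The trade-off is one process start-up per call, so handing a child several feeds at once may be preferable when there are hundreds of them.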

alex_b wrote:

That's good to know.

Does this memory leak also occur with all of SimplePie's default sanitization algorithms turned off?

See:

http://lists.drupal.org/pipermail/development/2007-June/024790.html

and:

http://groups.drupal.org/node/4519

alaa wrote:

Yes, it happens just by creating SimplePie objects, no matter what options are set.

It's a bug in PHP's garbage collector. It also affects PEAR's XML_Feed_Parser and any objects with circular, recursive or self-references.

m3avrck wrote:

Just a note: I released a SimpleFeed beta today with significant performance improvements, including a fix for this memory issue (see the comments in the code where I did this).

Running much faster, using less memory, cheers!

(Note: there are lots of DB changes, so please recreate your tables. This is still in development and there has been no release yet, so there is no upgrade path, same as Drupal core :-p)

Partner at Detroit Venture Partners. Sold ParentsClick to A&E. Ex-Drupal dev. Cornell Engineering alum. Tech pioneer leading startup renaissance in Detroit.

gsnedders wrote:

There's no way around it. You can't have fixed it (or worked around it). The only way to fix it is to patch PHP; fixing this bug is one of the PHP Summer of Code projects, due to be in PHP 6.

Not SimplePie's fault

Skyzyx wrote:

This bug has been very frustrating for the SimplePie developers. SimplePie runs out of memory when processing a large number of feeds; the bug is in PHP, not SimplePie, yet SimplePie gets blamed for sucking.

I agree that, for people trying to integrate SimplePie into their projects, it doesn't matter whose fault it is (feed parsing needs to work), but at the moment our hands are tied. We're certainly interested in any patches that can help alleviate this issue, but so far we've received lots of complaints and no helpful suggestions.

I'm keenly interested in m3avrck's workaround. I'm digging through the SimpleFeed package right now trying to find where he patched it. I'll follow up with anything I find, and hopefully we'll have a much more bearable solution ASAP.

m3avrck wrote:

If you are using the latest version of SimpleFeed, you can download SimplePie 1.1; together they have no more memory issues. The combination is working well on 500+ feeds every 15 minutes on one of my sites.
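The thread doesn't show exactly what SimpleFeed or SimplePie 1.1 changed. A general way to live with a collector that only counts references is to break the reference cycle by hand before dropping the last reference to an object. Here is a sketch of that technique in Python, with the cycle collector switched off to mimic PHP; `Feed`, `Item`, `teardown()`, `process_all` and `leaked_after` are hypothetical names, not SimplePie's API.

```python
import gc
import weakref


class Feed:
    def __init__(self, url):
        self.url = url
        self.items = []

    def teardown(self):
        """Hypothetical explicit destructor: drop the parent/child links so
        reference counting alone can free the feed and its items."""
        for item in self.items:
            item.feed = None
        self.items = []


class Item:
    def __init__(self, feed, title):
        self.feed = feed  # back-reference to the parent: forms a cycle
        self.title = title
        feed.items.append(self)


def process_all(urls, handle, teardown=True):
    """Load, handle and discard one feed at a time."""
    for url in urls:
        feed = Feed(url)
        for i in range(3):
            Item(feed, f"{url}#{i}")
        handle(feed)
        if teardown:
            feed.teardown()  # break the cycle before the last reference goes
        del feed


def leaked_after(urls, teardown=True):
    """Count Feed objects still alive after a run with the cycle collector off."""
    refs = []
    gc.disable()
    try:
        process_all(urls, lambda f: refs.append(weakref.ref(f)), teardown)
        return sum(1 for r in refs if r() is not None)
    finally:
        gc.enable()
```

`leaked_after(["a", "b", "c"])` returns 0, while the same run with `teardown=False` leaves all 3 feeds alive. The catch, as alaa points out above, is that every copy or extra reference has to be accounted for: anything still pointing into the cycle keeps it alive.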

alex_b wrote:

BTW, the same is true for the SimplePie parser in FeedAPI, of course.

RSS & Aggregation
