QueryPath - just the job for scraping

By budda

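If you're still scraping content from other sites by loading the page over HTTP and picking through it with a mixture of regular expressions and string searches, you should check out the QueryPath library! It's a PHP library that lets you select parts of an HTML or XML document with jQuery-style CSS selectors.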

With a bit of fiddling I've managed to scrape forum posts and extract usernames, dates and content in just a few lines of code, without any complex regexes.
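QueryPath itself is PHP. The sketch below only illustrates the same idea using Python's standard library: select elements by tag and class, then read their text, instead of pattern-matching the raw HTML. It assumes some hypothetical forum markup, where each `div.post` holds a `span.username`, a `span.date` and a `div.content`. It also assumes the page is well-formed XHTML, because `xml.etree.ElementTree` is a strict XML parser. Real-world HTML usually needs a more lenient parser. Note that ElementTree's `[@class='...']` predicate matches the whole attribute value, not one class among several as a CSS selector would.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ForumPost:
    username: str
    date: str
    content: str


# Hypothetical forum markup; a real site's theme will use its own class names.
SAMPLE = """
<html><body>
  <div class="post">
    <span class="username">alice</span>
    <span class="date">2009-06-01</span>
    <div class="content">First post!</div>
  </div>
  <div class="post">
    <span class="username">bob</span>
    <span class="date">2009-06-02</span>
    <div class="content">Welcome, <em>alice</em>.</div>
  </div>
</body></html>
"""


def text_of(parent, path):
    """Return all text under the first element matching path, or '' if absent."""
    el = parent.find(path)
    return "".join(el.itertext()).strip() if el is not None else ""


def scrape_posts(markup):
    """Extract one ForumPost per div.post using element queries, not regexes."""
    root = ET.fromstring(markup)  # strict parser: the markup must be well-formed
    posts = []
    for post in root.iterfind(".//div[@class='post']"):
        posts.append(ForumPost(
            username=text_of(post, "span[@class='username']"),
            date=text_of(post, "span[@class='date']"),
            content=text_of(post, "div[@class='content']"),
        ))
    return posts


for p in scrape_posts(SAMPLE):
    print(p.username, p.date, p.content)
```

Because `itertext()` walks nested elements, the `<em>` inside bob's post doesn't need any special handling. Doing the same with a regex is where things usually get fiddly.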

There's a handy getting-started tutorial by the QueryPath author published over on IBM's developerWorks site.

Once you've extracted your values into PHP variables, you can use drupal_execute() to create nodes from your fresh content, or generate an RSS feed from the data outside of Drupal, which is what I've been doing.
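The RSS side of that workflow could look roughly like the sketch below. Again this is Python rather than the PHP you'd use alongside QueryPath and drupal_execute(). The channel URL, the post data and the `(username, datetime, content)` tuple layout are all made-up assumptions for illustration. It emits the three channel elements RSS 2.0 requires (`title`, `link`, `description`) plus one `item` per scraped post. Each `pubDate` is in RFC 822 style via `email.utils.format_datetime`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title, channel_link, channel_desc, posts):
    """Build an RSS 2.0 document from (username, datetime, content) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_desc
    for username, posted, content in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Post by {username}"
        # ElementTree escapes characters such as & and < when serialising.
        ET.SubElement(item, "description").text = content
        # pubDate expects an RFC 822 date; posted must be timezone-aware.
        ET.SubElement(item, "pubDate").text = format_datetime(posted)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical scraped data and channel details.
posts = [
    ("alice", datetime(2009, 6, 1, 12, 0, tzinfo=timezone.utc), "First post!"),
    ("bob", datetime(2009, 6, 2, 9, 30, tzinfo=timezone.utc), "Welcome, alice & co."),
]

print(build_rss("Scraped forum", "http://example.com/forum",
                "Posts scraped from example.com", posts))
```

Building the feed with an XML library rather than string concatenation means scraped content containing `&` or `<` can't break the feed.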

It can also be a great way to migrate a site when you don't have access to the database behind it. Just scrape away!

Web Scraping