If you're still scraping content from other sites by running a mixture of regular expressions and string searches over the raw HTML of a page, you should check out the QueryPath library!
With a bit of fiddling I've managed to scrape forum posts and extract usernames, dates and content in a small number of lines, without any complex regular expressions.
There's a handy getting-started tutorial by the QueryPath author published over at the IBM developerWorks site.
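To give a rough idea of what that looks like, here is a minimal sketch rather than my actual scraper. It assumes QueryPath 3.x installed through Composer, and the forum markup and class names (`post`, `author`, `date`, `content`) are made up for illustration, so you'd swap in whatever selectors the target site really uses.

```php
<?php
// Assumption: QueryPath installed with "composer require querypath/querypath".
require 'vendor/autoload.php';

// Hypothetical forum markup. In practice you would pass a URL or the
// fetched page source instead of an inline string.
$html = <<<HTML
<html><body>
  <div class="post">
    <span class="author">alice</span>
    <span class="date">2010-03-01</span>
    <div class="content">First post!</div>
  </div>
  <div class="post">
    <span class="author">bob</span>
    <span class="date">2010-03-02</span>
    <div class="content">Welcome aboard.</div>
  </div>
</body></html>
HTML;

$posts = array();

// htmlqp() is QueryPath's forgiving HTML parser; the second argument is a
// CSS selector, so the loop visits each post container in turn.
foreach (htmlqp($html, 'div.post') as $post) {
  // branch() works on a copy of the current match, so each find() starts
  // from the post container regardless of the QueryPath version in use.
  $posts[] = array(
    'username' => trim($post->branch()->find('.author')->text()),
    'date'     => trim($post->branch()->find('.date')->text()),
    'content'  => trim($post->branch()->find('.content')->text()),
  );
}

print_r($posts);
```

The CSS selectors do the work that the regular expressions used to, and they tend to survive small changes in the page markup much better.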
Once you've extracted your values into PHP variables you can use drupal_execute() to create nodes from your fresh content, or generate an RSS feed from the data outside of Drupal, which is what I've been doing.
This can also be a great way to migrate a site if you don't have access to the database behind it. Just scrape away!
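For the RSS route outside of Drupal, something along these lines should work. It's a sketch using PHP's built-in DOM extension, not the exact code I run: `build_rss_feed()` and `rss_add_child()` are hypothetical helper names, and the `username`, `date` and `content` keys are just an assumed shape for the scraped data.

```php
<?php
// Hypothetical helper: append <$name>$text</$name> to $parent.
// createTextNode() takes care of escaping &, < and > in scraped text.
function rss_add_child(DOMDocument $doc, DOMNode $parent, $name, $text) {
  $child = $doc->createElement($name);
  $child->appendChild($doc->createTextNode($text));
  $parent->appendChild($child);
  return $child;
}

// Hypothetical helper: build an RSS 2.0 document from scraped posts.
function build_rss_feed(array $posts, $feed_title, $feed_link) {
  $doc = new DOMDocument('1.0', 'UTF-8');
  $doc->formatOutput = true;

  $rss = $doc->createElement('rss');
  $rss->setAttribute('version', '2.0');
  $doc->appendChild($rss);

  // RSS 2.0 requires a title, link and description on the channel.
  $channel = $doc->createElement('channel');
  $rss->appendChild($channel);
  rss_add_child($doc, $channel, 'title', $feed_title);
  rss_add_child($doc, $channel, 'link', $feed_link);
  rss_add_child($doc, $channel, 'description', 'Scraped posts from ' . $feed_link);

  foreach ($posts as $post) {
    $item = $doc->createElement('item');
    $channel->appendChild($item);
    rss_add_child($doc, $item, 'title', 'Post by ' . $post['username']);
    rss_add_child($doc, $item, 'description', $post['content']);

    // Only emit pubDate when the scraped date string can be parsed.
    $timestamp = strtotime($post['date']);
    if ($timestamp !== false) {
      rss_add_child($doc, $item, 'pubDate', gmdate(DATE_RSS, $timestamp));
    }
  }

  return $doc->saveXML();
}

// Example data in the assumed shape; real values come from the scraper.
$posts = array(
  array('username' => 'alice', 'date' => '2010-03-01', 'content' => 'First post!'),
  array('username' => 'bob', 'date' => '2010-03-02', 'content' => 'Tips & tricks for <b>scraping</b>.'),
);

echo build_rss_feed($posts, 'Example forum', 'http://example.com/forum');
```

Building the feed with DOM calls instead of string concatenation means stray ampersands or angle brackets in the scraped posts can't break the XML.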
