MIT's Piggy Bank, Solvent for web scraping

Has anyone seen these products from MIT's Simile project?

  • Piggy Bank - "Piggy Bank is an extension to the Firefox Web browser that turns it into a “Semantic Web browser,” letting you make use of existing information on the Web in more useful and flexible ways not offered by the original Web sites."
  • Solvent - "Solvent is a Firefox extension that helps you write Javascript screen scrapers for Piggy Bank."

I have only begun to check this out, but it looks very cool.

Piggy Bank needs web pages to embed information in a format that it can understand. This format is called RDF (Resource Description Framework), and its main advantage is that it makes machine processing a lot easier. Unfortunately, at this very early stage, few web pages embed or link to such "purer" RDF information, so Piggy Bank can run a specific screen scraper on specific pages to "extract" the information it needs.
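
To make "a format machines can process" concrete: RDF describes data as (subject, predicate, object) statements, and N-Triples is one simple line-based way to write them down. The Python sketch below is only an illustration, not Piggy Bank code; the example.org post URI is a hypothetical placeholder, while the Dublin Core `title`/`creator` property URIs are real.

```python
# Minimal sketch: represent RDF statements as (subject, predicate, object)
# triples and serialize them in the N-Triples line format.
# The example.org URI is a hypothetical placeholder; objects are plain literals.
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # absolute IRI of the resource being described
    predicate: str  # absolute IRI of the property
    obj: str        # literal value, kept as a plain string here


def escape_literal(value: str) -> str:
    """Escape the characters N-Triples requires to be escaped inside literals."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize literal-valued triples, one statement per line."""
    return "\n".join(
        f'<{t.subject}> <{t.predicate}> "{escape_literal(t.obj)}" .'
        for t in triples
    )


DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triples = [
    Triple("http://example.org/post/1", DC_TITLE, "MIT's Piggy Bank"),
    Triple("http://example.org/post/1", DC_CREATOR, "dado"),
]
print(to_ntriples(triples))
```

Because every statement is a self-contained line of three identifiers, a program can merge data from many pages without knowing anything about their original layout.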

In short, screen scrapers let you turn a regular web page into a semantic web page, freeing the data from the page or site that contains it.
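
The idea of a screen scraper can be sketched in a few lines. Real Piggy Bank scrapers are JavaScript written with Solvent; the Python version below only illustrates the technique using the standard library's `html.parser`. The page markup (`li.event`, `span.name`, `span.date`) and the `http://example.org/vocab#` vocabulary are hypothetical assumptions chosen for the example.

```python
# Minimal sketch of a screen scraper: pull data out of ordinary HTML and
# re-express it as RDF-style (subject, predicate, object) triples.
# The markup layout, class names, and example.org vocabulary are hypothetical.
from html.parser import HTMLParser
from typing import Optional

EX = "http://example.org/vocab#"  # hypothetical vocabulary namespace


class EventScraper(HTMLParser):
    """Collects text of <span class="name|date"> inside each <li class="event">."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[dict[str, str]] = []
        self._field: Optional[str] = None  # span we are currently inside

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "event" in classes:
            self.events.append({})  # start a new scraped item
        elif tag == "span" and self.events:
            for field in ("name", "date"):
                if field in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.events:
            current = self.events[-1]
            current[self._field] = current.get(self._field, "") + data.strip()


def scrape(html: str, page_url: str) -> list[tuple[str, str, str]]:
    """Turn the page's event list into triples, minting one IRI per item."""
    parser = EventScraper()
    parser.feed(html)
    parser.close()
    triples = []
    for i, event in enumerate(parser.events, start=1):
        subject = f"{page_url}#event{i}"
        for field, value in event.items():
            triples.append((subject, EX + field, value))
    return triples


sample_page = """
<ul>
  <li class="event"><span class="name">Sample event</span>
      <span class="date">2024-05-01</span></li>
</ul>
"""
for triple in scrape(sample_page, "http://example.org/events"):
    print(triple)
```

The scraper is tied to one page layout, which is why Piggy Bank runs a particular scraper only on the particular pages it was written for.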

Comments

Krake.IO - cloud based web scraping application

Hey guys,

Do check out our new cloud-based web scraping service, Krake.IO. It comes with a funky GUI. No more writing tons of code just to scrape a website: do it in a few clicks.