Last updated by lolandese on Fri, 2024-03-01 09:45
When writing custom or contrib modules aimed at web scraping, it makes most sense to use a library rather than reinvent the wheel. Furthermore, it is advisable to first write a generic module that handles scraping requests and can map the results into the fields of a specific content type, much like the Feeds module does. Targeting a specific site with specific selectors could then build on that, either through a UI or in a separate module's code.
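The generic mapping idea above can be sketched with nothing but PHP's built-in DOM extension: a function that takes raw HTML (in Drupal this would typically be fetched with Guzzle) and an array mapping field names to selectors. The function name, field names, and sample markup below are all hypothetical illustrations, not an existing module's API.

```php
<?php

// Minimal sketch of a generic selector-to-field mapping, assuming the
// calling code has already fetched the raw HTML of the target page.
// $mapping pairs a (hypothetical) field machine name with an XPath selector.
function scrape_fields(string $html, array $mapping): array {
  $doc = new \DOMDocument();
  // Suppress warnings caused by imperfect real-world markup.
  @$doc->loadHTML($html);
  $xpath = new \DOMXPath($doc);
  $values = [];
  foreach ($mapping as $field => $selector) {
    $node = $xpath->query($selector)->item(0);
    // Store the first match's text, or NULL when the selector finds nothing.
    $values[$field] = $node ? trim($node->textContent) : NULL;
  }
  return $values;
}

// Usage: map two content-type fields to selectors on a sample page.
$html = '<html><body><h1>Example title</h1><div class="byline">Jane Doe</div></body></html>';
$fields = scrape_fields($html, [
  'title' => '//h1',
  'field_author' => '//div[@class="byline"]',
]);
print_r($fields);
```

A site-specific module or UI would then only have to supply its own mapping array, keeping the request handling and field mapping generic.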
Here is a list of useful links. If adding links, make sure they include working PHP examples. Furthermore, capture links in the Internet Archive's Wayback Machine so that if a URL is removed it can be replaced with the web archive's snapshot. In the Open Source spirit, solutions that work without a subscription (API key) are preferred, although when effective anti-bot measures are in place on the target site, a paid service is often almost inevitable.