Html To Node Module

igalarza's picture

Hello!

I've been looking for a module that imports HTML pages as Drupal nodes, similar to the Feeds module but importing static web pages according to some criteria. There isn't anything like that, is there?

I started working on this idea for a contest. I'm using the simplehtmldom module to extract information from a static web page and turn it into node fields, according to previously established criteria.

What do you think about this idea?

Regards.

Comments

There are a lot of import modules already

rfay's picture

You should look at Comparison of content and user import and export modules.

As a general solution, the Migrate module with tablewizard is mighty nice.

Import HTML

sreynen's picture

I think rfay's link was supposed to go here. This sounds close to the Import HTML module mentioned in the comments there. But that's a run-once tool for importing; if you want the periodic updating available in Feeds, I don't think that exists. I've been thinking about playing with that myself. There's a lot of targeted, search-engine-type functionality that could be built on top of that. It may be best done as a hook_cron wrapper around a run-once tool like Import HTML, to avoid duplicating code.
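To make the "periodic wrapper around a run-once tool" idea concrete, here is a minimal sketch in Python rather than Drupal PHP. It is not how Import HTML or Feeds works. The names `refresh`, `fetch`, `import_once` and `seen` are hypothetical, and skipping unchanged pages by comparing a content hash is my own assumption about how duplicate imports could be avoided.

```python
import hashlib


def refresh(urls, fetch, import_once, seen):
    """Re-run a run-once importer only for pages whose content changed.

    fetch(url) returns the page HTML as a string, import_once(url, html)
    is the existing run-once importer, and seen maps each URL to the hash
    of the last imported version. All three are hypothetical stand-ins.
    """
    updated = []
    for url in urls:
        html = fetch(url)
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if seen.get(url) != digest:
            import_once(url, html)  # reuse the importer, no duplicated logic
            seen[url] = digest
            updated.append(url)
    return updated
```

A scheduler (cron, or hook_cron in Drupal) would call `refresh` on each run and keep `seen` in persistent storage between runs.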

Import HTML

igalarza's picture

My idea is quite close to the Import HTML module, but not exactly the same.

I want to import pages from sites that can be updated, not necessarily static pages. Given a URL, the module retrieves the page's HTML and converts the information in it into CCK fields, the node title, and so on. I'm using simplehtmldom to select the data from the HTML page.
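The HTML-to-fields step might look roughly like the sketch below. It is written in Python with only the standard library, not with simplehtmldom, and it is not the module's actual code. The rule format (field name mapped to a tag and an optional class) and the names `FieldExtractor` and `extract_fields` are assumptions made for illustration. Fetching the URL is left out; the HTML is passed in as a string.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Capture the text of the first element matching each rule."""

    def __init__(self, rules):
        super().__init__()
        # rules: field name -> (tag, css_class or None); hypothetical format
        self.rules = rules
        self.fields = {}
        self._active = None  # name of the field being captured, if any
        self._depth = 0      # nesting depth of same-named tags
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._active is not None:
            if tag == self.rules[self._active][0]:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.rules.items():
            if field in self.fields:
                continue  # only the first match per field is kept
            if tag == want_tag and (want_class is None or want_class in classes):
                self._active, self._depth, self._chunks = field, 1, []
                break

    def handle_endtag(self, tag):
        if self._active is None or tag != self.rules[self._active][0]:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace in the collected text
            self.fields[self._active] = " ".join("".join(self._chunks).split())
            self._active = None

    def handle_data(self, data):
        if self._active is not None:
            self._chunks.append(data)


def extract_fields(html, rules):
    parser = FieldExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.fields
```

For example, `extract_fields(html, {"title": ("title", None), "body": ("div", "content")})` returns a dictionary with the page title and the text of the first `div` whose class list contains `content`; those values would then be saved as the node title and a CCK field.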

Sorry for not explaining clearly, I'm not fluent in English.

Regards.

Basic implementation of HtmlToNode module

igalarza's picture

I implemented a simple version of the idea that I explained in this thread. You can download the code and also see an example of nodes imported with this module.

The module uses the Easy Field Validation module to validate forms, but that can change if it's a bad idea. It also depends on the Simplehtmldom API and CCK.

If a developer is interested in "adopting" this module and uploading it to drupal.org, I would be happy. I have never done that, and I'm not very clear on how to do it.

Regards

Import large HTML as Drupal book

dr jason guo's picture

I've implemented a solution to import large static HTML files as Drupal books. I'm not sure if this is what you need but what we are required to do is to convert large reports of hundreds of pages that are in PDF or Word format into HTML. The HTML will then be divided at a given heading level and imported into our Drupal site as a book.
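Dividing the HTML at a given heading level could be sketched as follows. This is an illustration in Python, not the code of the solution described here. The function name `split_at_heading` and the returned structure are hypothetical, and using regular expressions assumes clean, well-formed converter output with headings that are not nested inside each other.

```python
import re


def split_at_heading(html, level=2):
    """Split html into sections that each start at an <hN> heading.

    Returns (preamble, sections): preamble is everything before the first
    heading, and each section is a dict with a "title" and a "body".
    """
    heading = re.compile(
        r"<h%d\b[^>]*>(.*?)</h%d\s*>" % (level, level), re.I | re.S
    )
    matches = list(heading.finditer(html))
    sections = []
    for i, m in enumerate(matches):
        # A section runs until the next heading of the same level
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        title = re.sub(r"<[^>]+>", "", m.group(1)).strip()  # drop inline tags
        sections.append({"title": title, "body": html[m.end():end].strip()})
    preamble = html[:matches[0].start()].strip() if matches else html.strip()
    return preamble, sections
```

Each section would then become one book page, with deeper headings (for example `h3` when splitting at `h2`) staying inside the page body.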

The basic idea is to implement a plugin for the Feeds module and automate everything except the conversion from PDF/Word to HTML. All tables, graphs, figures, and footnotes are retained, and all cross-references are re-linked automatically. With this automation we are able to process a couple of thousand pages' worth of reports every month and add them to our Drupal site.

Some examples of the imported documents are here: http://doccloud.xing.net.au/sample-documents
