"Document Import Module": an unfinished project from SoC 2008 we would like to continue in 2009

msapega's picture

Project page is http://drupal.org/node/236461

For more information, see http://groups.drupal.org/node/10890
Discussion: http://groups.drupal.org/node/9929

The goal of this project is to create a plugin-based import module for Drupal that allows the upload of office suite file formats, which would then be parsed into Drupal nodes. This would allow even novice CMS users to generate content using a familiar office productivity suite.

PROJECT DETAILS:
Features to Implement:
-The Router module
  -Create a set of hooks other modules implement to parse the various data types
  -Allow the parsers to specify mime types that they are interested in and keep an internal registry of those types
  -Handle files put to the server routing the data appropriately.
  -Create the appropriate node(s) for the uploaded data
  -Map any metadata to the correct node fields
-The Parser Modules
  -Microsoft Office
    -Parse the .doc file type provided by various versions of Word.
    -Parse the .xls file type provided by various versions of Excel.
  -Open Document Support
    -Support the various file types under Open Document
    -Implement parsers for ODT, ODS, and possibly ODP
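
To make the router/parser split above more concrete, here is a minimal sketch, written in Python for brevity rather than as Drupal PHP. It is only an illustration under stated assumptions: `register_parser`, `route_upload`, and `parse_odt` are hypothetical names, a "node" is modelled as a plain dict, and using the first paragraph as the title is a placeholder for real metadata mapping. The one thing it relies on that is not an assumption is that an ODT file is a zip archive whose text lives in `content.xml`.

```python
import html
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A "node" is just a dict here; a real module would create Drupal nodes.
Node = Dict[str, str]
Parser = Callable[[bytes], List[Node]]

# Internal registry: MIME type -> parser callable (hypothetical structure).
_registry: Dict[str, Parser] = {}


def register_parser(mime_types: List[str], parser: Parser) -> None:
    """Let a parser plugin declare the MIME types it is interested in."""
    for mime in mime_types:
        _registry[mime.lower()] = parser


def route_upload(mime_type: str, data: bytes) -> List[Node]:
    """Hand uploaded bytes to whichever parser registered for the MIME type."""
    parser = _registry.get(mime_type.lower())
    if parser is None:
        raise ValueError(f"no parser registered for {mime_type!r}")
    return parser(data)


# XML namespace used for text elements in OpenDocument content.xml.
ODF_TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def parse_odt(data: bytes) -> List[Node]:
    """Extract paragraph text from an ODT file and build a single node."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    paragraphs = []
    for p in root.iter(f"{{{ODF_TEXT_NS}}}p"):
        text = "".join(p.itertext()).strip()
        if text:
            paragraphs.append(text)
    body = "\n".join(f"<p>{html.escape(t)}</p>" for t in paragraphs)
    # Assumption: the first paragraph doubles as the node title.
    title = paragraphs[0] if paragraphs else "Untitled"
    return [{"title": title, "body": body}]


register_parser(["application/vnd.oasis.opendocument.text"], parse_odt)
```

A .doc or .xls parser would register itself the same way under its own MIME type, so the router never needs to know which formats exist.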

Styles to Support:
These are the minimum styles that any parser should support:
-Lists with the appropriate formatting / styles, (ul, ol, dl)
-Bold text (strong, b)
-Italicized text (em, i)
-Underlined text (u)
-Paragraphs and breaks
-Basic symbols: certain symbols should be supported (and encoded) by default (&, copyright, trademark, etc.)
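
The style list above could be handled by a small shared helper that every parser calls. The following is a rough sketch in Python, not a proposed implementation: the style names (`bold`, `italic`, `underline`), the "run" representation (a text string plus a list of style names), and all function names are assumptions made up for illustration.

```python
import html

# Hypothetical mapping from a parser's internal style names to HTML tags.
STYLE_TAGS = {"bold": "strong", "italic": "em", "underline": "u"}

# Symbols encoded as named entities in addition to the HTML-special characters.
SYMBOL_ENTITIES = {"\u00a9": "&copy;", "\u2122": "&trade;"}


def encode_symbols(text):
    """Escape &, < and >, then encode the copyright and trademark signs."""
    escaped = html.escape(text, quote=False)
    for char, entity in SYMBOL_ENTITIES.items():
        escaped = escaped.replace(char, entity)
    return escaped


def render_run(text, styles):
    """Wrap one run of text in the tags for each of its styles, innermost first."""
    out = encode_symbols(text)
    for style in styles:
        tag = STYLE_TAGS[style]
        out = f"<{tag}>{out}</{tag}>"
    return out


def render_paragraph(runs):
    """Render a list of (text, styles) runs as one HTML paragraph."""
    return "<p>" + "".join(render_run(t, s) for t, s in runs) + "</p>"
```

For example, `render_paragraph([("AT&T ", []), ("rocks", ["bold"])])` produces `<p>AT&amp;T <strong>rocks</strong></p>`. Lists and line breaks are left out here and would need their own handling.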

Comments

I am interested in joining

irinaz's picture

I am interested in joining forces on this project

I am going to start working

YaxBalamAhaw's picture

I am going to start working on a proposal for this idea. Anybody know of any good way to parse .doc files that would work on your average cheap shared server?

I don't know of a good way

gdd's picture

I don't know of a good way to do it out of the box on shared hosting, but there are a ton of resources for external libraries in the threads from last year. You might also want to contact Chris Bradford to find out how much he got done since maybe you can pick up where he left off. I know he's in SoC again this year, so he should be around.

I would love to see this project get off the ground again.

Mentors?

YaxBalamAhaw's picture

Anyone willing to be a mentor for this idea if my proposal is accepted?

I think I'm going to look into the Google Documents List Data API to see if that might be helpful in parsing. All the other solutions I've found for parsing the old Microsoft Office files would require permissions that are usually not available on cheap web hosting packages.

yes, I am ready to mentor this

irinaz's picture

We plan to include the capability to upload multiple files and create nodes from them.

Willing to co-mentor

dldege's picture

Dan DeGeest
Lead Software Developer
iMed Studios
http://www.imedstudios.com/labs

Dan DeGeest
Software Developer
Somewhere or Another