Content and User Import and Export Modules

greggles

There are a multitude of different import and export modules in existence to handle the complex task of getting data from other systems into Drupal. This wiki page is an attempt to focus the efforts of the module developers and provide a handy guide for users looking to make a decision.

Importing/exporting/transforming data is a complex process, and each of these modules approaches it in a different way.

Comparison of import/export modules:


Import and Export (Generic)

Collaboration

Module Import / Export API Taxonomy Import/Export Transformations
Project page links here
Communicated with other maintainer(s) / ? /

General Info

Module Import / Export API Taxonomy Import/Export Transformations
Maintained Limited Yes Kinda
Releases (Status) d6 (dev), d5 (stable), d4.7 (stable) d6 (stable), d5 (stable) d6 (beta)
Dependencies Drupal Core ? CTools (only for UI), jQuery UI recommended
Target Audience Developer ? Site Admin, Developer
Documentation API (extensive) Yes API (extensive), End User (rudimentary)
Weight/Drupal Core Upgrade Path Heavy, but easy to upgrade ? Complex but modular
Track Record Proven against several thousand records (not highly scalable) ? Proven against import of 200 records (CSV file), converted to Drupal nodes. (erm.)

Functionality

Module Import / Export API Taxonomy Import/Export Transformations
Interface API, UI ? API, UI
Input CSV, XML, or another Drupal database (can be extended to support other formats) XML, CSV, TCS, RDF CSV, XML (via XPath), can be extended to support other formats.
Internet Imports Doable through hooks ? Not yet!
Interoperability Supports CCK. Can support any module through hooks. ? Plugin framework for providing whatever functionality. Current extension modules support CSV, XML, Drupal nodes.
Processing (Batch, Cron, Single) Single (can be scripted to run via cron or in batch) ? Single
Output Anything in the Drupal database, XML, CSV... (can be extended to other formats) XML, RDF Nodes (using Transformations -- Drupal data), ... (or any type that can be processed by plugins)
Reporting Per-import pass/fail only (not stored) ? Per-import pass/fail only (not stored)
Settings saved for reuse ? ? ?

Import and Export (Drupal to Drupal)

Collaboration

Module Backup and Migrate Deploy Node Export Yamm
Project page links here No No No Yes
Communicated with other maintainer(s) ? ? ? /

General Info

Module Backup and Migrate Deploy Node Export Yamm
Maintained Yes Yes Yes Yes
Releases (Status) d6 (stable), d5 (stable) d6 (dev) d6 (stable), d5 (stable) d6 (beta1)
Dependencies ? ? ? DataSync, PHP 5.2
Target Audience ? ? ? Site Admin, Developer
Documentation README.txt Handbook pages ? API (extensive), End User (enough)
Weight/Drupal Core Upgrade Path ? ? ? Not that complex
Track Record ? ? ? Works for maintainers, needs community testing

Functionality

Module Backup and Migrate Deploy Node Export Yamm
Interface ? ? ? API, UI
Input Database dump (SQL file) Drupal content and configuration Drupal nodes Any Drupal data on site, content type, nodes, users, vocabulary in core, easily extensible
Internet Imports ? ? ? No
Interoperability ? ? ? Should support any module that extends node and users, API is highly and easily extensible
Processing (Batch, Cron, Single) Multiple backup schedules ? ? Cron and Batch on system side (needs you to have a UNIX host)
Output Database dump (SQL file) Drupal content and configuration Drupal nodes Any supported Drupal data that server and client handles
Reporting ? ? ? Weak watchdog output, DataSync reports
Settings saved for reuse Yes ? ? ?

Import (Generic)

Collaboration

Module CSV Parser Import Import HTML Migrate Node Import Taxonomy CSV Import User Import
Project page links here No Yes Yes Yes Yes No Yes
Communicated with other maintainer(s) ? Migrate Import / Export API ? / ? /

General Info

Module CSV Parser Import Import HTML Migrate Node Import Taxonomy CSV Import User Import
Maintained Yes Yes Yes Yes Yes Yes Yes
Releases (Status) d6 (alpha) d6 (dev) d6 (dev), d5 (stable), d4.7 (dev) d6.x (dev) d6 (RC), d5 (stable), d4.7 (stable, unsupported) d6 (stable), d5 (stable), d4.7 (stable) d6 (stable), d5 (stable), d4.7 (stable)
Dependencies FeedAPI (and Feed Element Mapper recommended) Drupal Core / Views, Table Wizard, Schema Drupal Core, Date API ? /
Target Audience ? Developer Site Admin Developer Site Admin ? Site Admin
Documentation README.txt Example modules included (extensive) Walkthrough and READMEs included, Code commented verbosely In progress In progress Help page API, End User
Weight/Drupal Core Upgrade Path ? 91 lines of code. Easy to upgrade Complex, but becoming more modular Heavy Complex but modular ? Medium (d6 up. testing time)
Track Record ? Proven against import of +1.7 million records Works for me. Does often need template tweaking on a per-site basis. Proven against import of +3 million users ? ? Infinitely scalable via cron runs. Tests via SimpleTest (d5)

Functionality

Module CSV Parser Import Import HTML Migrate Node Import Taxonomy CSV Import User Import
Interface ? API API (in dev), UI API, UI API, UI ? API, UI
Input A CSV file or a zipped CSV file Oracle, MSSQL, MySQL, any datasource PHP can communicate with / CSV, XML, HTML, images, video, flash ... any files HTML site mirror, as through wget. / XML, XHTML, HTML, Also all resource files (images, attachments) into /files/* . MySQL, CSV, TSV. Table Wizard hooks allow support of any file type. / Doable through hooks CSV, TSV, ... (any delimiter separated values file) CSV file (comma-separated values), or a copy-and-paste list CSV
Internet Imports ? web pages, feeds ... anything accessible on the internet including other APIs Proof-of-concept (single page http import demo) available. Remote spidering, maybe one day. Doable through hooks No ? No
Interoperability ? Supports any module that extends nodes or users Plugin framework available for extending node fields. Mostly undocumented. Support for third-party (CCK, nodewords etc) extensions available. Supports CCK, Email Registration, Content Profile. Can support any module through hooks. Can support any module through hooks. ? Can support any module through hooks.
Processing (Batch, Cron, Single) ? Batch API d5: ran a Single process with timeout risk; d6: Batch API Cron, Single d5: Single; d6: Batch, Cron ? Cron, Single
Output Data into Drupal content types Nodes (any content type), Users, Roles, Taxonomy Nodes (also additional node fields, images, attachments, downloads) Nodes, Comments, Users, ... (hooks to define any destination) Nodes, ... (Comments, Taxonomy, Users in development) Terms in existing or new vocabularies Users, Profiles, Nodeprofile, Organic Groups, ... (hooks to support anything else)
Reporting ? Pass/Fail stored in import tables for developer use Mixed Per-object errors/warnings/notices stored Per-object errors/warnings/notices stored ? Itemised error report to user.
Settings saved for reuse ? ? ? ? No ? Yes

Import (Source-Specific)

Collaboration

Module Import Typepad Joomla Phorum phpBB2Drupal vBulletin to Drupal Wordpress Import WP2Drupal
Project page links here Yes Yes Yes No No Yes Yes
Communicated with other maintainer(s) ? ? ? ? ? ? ?

General Info

Module Import Typepad Joomla Phorum phpBB2Drupal vBulletin to Drupal Wordpress Import WP2Drupal
Maintained ? ? Yes Yes Yes Yes Yes
Releases (Status) d6 (stable), d5 (stable), d4.7 (dev) d5 (stable) d5 (dev) d6 (stable) d6 (rc), d5 (stable) d6 (stable), d5 (stable) d6 (stable)
Documentation ? ? Limited - README.txt Handbook pages, README.txt, directions on screen ? Stable WIP - readme.txt

Functionality

Module Import Typepad Joomla Phorum phpBB2Drupal vBulletin to Drupal Wordpress Import WP2Drupal
Input ? ? Phorum DB in MySQL phpBB 2 or 3 vBulletin forums and users WordPress eXtended RSS (v2.1+) WordPress DB in MySQL (v?)
Processing (Batch, Cron, Single) ? ? Cron ? ? Single Single
Output ? ? ? Drupal equivalents to phpBB data: user profiles, containers, forums, topics, comments, attachments... Drupal forums and users ? ?
Reporting ? ? Limited messages during import. ? ? Limited, after import Limited, during import
Settings saved for reuse ? ? ? ? ? ? ?

Export

Collaboration

Module HTML Export Profile CSV Services Views Bonus Pack Views Datasource
Project page links here No No (author disabled contact from g.d.o) Yes Yes Yes
Communicated with other maintainer(s) ? ? ? ? ?

General Info

Module HTML Export Profile CSV Services Views Bonus Pack Views Datasource
Maintained Postponed; looking for co-maintainers ? ? ? ?
Releases (Status) d6 (stable), d5 (stable) d6 (dev), d5 (stable) ? d6 (beta), d5 (alpha), d4.7 (stable) d6 (alpha), d5 (alpha)
Dependencies ? ? ? ? ?
Target Audience ? ? ? ? ?
Documentation Blog ? ? ? ?
Weight/Drupal Core Upgrade Path ? ? ? ? ?
Track Record ? ? ? ? ?

Functionality

Module HTML Export Profile CSV Services Views Bonus Pack Views Datasource
Interface ? ? ? ? ?
Input Drupal sites Users, Profile data (may be deprecated in 6.x since Views can deal with profile data) ? ? ?
Processing (Batch, Cron, Single) ? ? ? ? ?
Output Static HTML files CSV XML XML, CSV, TXT, Word.doc (all using included "export" package) XML (raw), OPML, Atom, Simile/Exhibit JSON, Canonical JSON, JSONP/JSON in script, FOAF, SIOC, DOAP, hCard, hCalendar, Geo
Reporting ? ? ? ? ?
Settings saved for reuse ? ? ? Saved with the view ?

See also



Open Letter to Import Module Developers

cyberswat - Fri, 2009-04-17 21:48

It looks like the wiki is mostly complete and contains a lot of good information comparing a solid cross section of Import modules. Each of us has a valid approach to handling imports, and I don't want to detract from anyone's efforts ... but there is a strong argument that the Drupal community as a whole benefits from collaboration. I've been brainstorming with a few others about this issue and would like to reach out to you for your thoughts, opinions and interest in collaborating.

If you're interested, we would like to provide an opportunity for us to get together and talk. Mike has a great group going at http://groups.drupal.org/migration-drupal that I think we could leverage, and comments have been enabled on the wiki page at http://groups.drupal.org/node/21338 ... In addition to that, I would like to propose a dual-layered code sprint on the weekend of June 27th and 28th. We are having a Drupal Camp in Colorado with a capacity of up to 400 attendees. The most desirable outcome would be for each of you to make it to that event so we can come up with a way to make Drupal imports rock for everyone. The likelihood of that happening in person may be a bit lofty, so I would like to propose coupling the code sprint with a virtual code sprint similar to http://groups.drupal.org/node/18443

Following this approach would give us almost 2 months to first hash out amongst ourselves the direction we would like to take and organize community support.

I'd like to receive your thoughts on this proposal before taking it further. Thanks so far for all the great work!


thoughts

dman - Sat, 2009-04-18 01:41

Well first, I'm not going to be in person anywhere (unless I get sponsorship or something like I did last code sprint). :-/

However, where I'm at is:
- enough players have been sniffing around to push import_html forward, so the d6 will be out and about within a month.
Certain API bits are moving forward, and certain functions have dropped out to become modular (contrib or glue modules can catch the raw data and the newly formed node pre-save and add their own data extraction additions via hooks)

I've watched the birth and death of import/export API (last activity April 2007) and am wary of looking for the one true answer. (I DID think it was possible in the beginning.)
Migrating from different data sources, which is pretty much what the tools in this realm do, is a unique job every time.

I've looked hard at my own and others' code and I've not seen the point at which these jobs hit common ground. Node Import is a different tool; I use it myself, learned from it, and even patched it a bit. It's the closest to import_html of the lot, but I've not yet found a function that could be placed in common.

Maybe others can see some points of commonality. I'd be interested to hear. My docs are verbose on methodology and a walkthrough illustrates what can be done.

FYI, here's a slideshow of someone's import experiments.
http://www.slideshare.net/emmajane/moving-in-how-to-port-your-content-fr...

.dan.


Commonality

mradcliffe - Sun, 2009-04-19 03:47

The common portion of pretty much all import modules is the actual import. From what I looked at and from my experiences writing a custom wordpress import module (old wp version), saving node, term, comment, and possibly other content data should be the same. It's the data source and data mapping that's different.

It also breaks down with users: passwords stored under a plethora of different encryption types can't be transferred over unless they're plaintext.
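Since the source and the mapping are the parts that differ, a minimal sketch of the mapping half may help. It assumes each source record arrives as an associative array; `map_row()` and the WordPress-style column names are hypothetical, and each map entry is either a source column name or a closure that computes the value from the whole row.

```php
<?php
// Sketch: map one source record onto destination fields. Only the mapping
// differs between importers; the save step can be shared.
// map_row() and the column names below are hypothetical.

/**
 * Each $fieldMap entry is either a source column name (copied as-is) or a
 * Closure that receives the whole row and returns the computed value.
 */
function map_row(array $row, array $fieldMap): array {
  $mapped = [];
  foreach ($fieldMap as $destination => $source) {
    if ($source instanceof Closure) {
      $mapped[$destination] = $source($row);
    }
    elseif (array_key_exists($source, $row)) {
      $mapped[$destination] = $row[$source];
    }
    // Columns missing from the row are skipped rather than set to null.
  }
  return $mapped;
}

// Hypothetical WordPress-style row.
$fieldMap = [
  'title' => 'post_title',
  'body' => 'post_content',
  'type' => function (array $row) {
    return $row['post_type'] === 'page' ? 'page' : 'story';
  },
];
$row = ['post_title' => 'Hello', 'post_content' => 'First post', 'post_type' => 'post'];
$values = map_row($row, $fieldMap);
```

Using a Closure check (rather than is_callable()) keeps plain column names such as 'date' from being mistaken for PHP function names.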


Different object types

dman - Sun, 2009-04-19 04:16

Different object types (nodes vs users) have different needs.
But at the bare metal, all the node import routines have just one section in common :

<?php
$new_node = (object) array(
  'title' => $new_title,
  'body' => $new_body,
  'type' => $new_type,
  // plus some housekeeping like $user
);
node_save($new_node);
?>

It's getting it to the point where you can plug the values into a node object that is the hard bit, and that's highly individual.
We MAY be able to find some useful common routines or CRUD to assist with creating some bits (I needed a bit of extra work to be able to create arbitrary menus on the fly, for example), but the point we truly have in common is already API: node_save().

... Of course, this is because I look at the entire task from a Drupal API POV. I've seen lots of other scripts that try to do their import via direct database updates. Ug. Those guys may have different things in common, but I'm not in that camp.
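A minimal sketch of the shared half described above: wrap the mapped values in an object with some housekeeping defaults and hand it to a save step. `build_node()` is hypothetical, the default keys (uid, status, created, changed) are assumed to match the node columns of the Drupal version in use, and a stand-in closure replaces node_save() so the sketch runs outside Drupal.

```php
<?php
// Sketch: wrap mapped values in a node-like object with housekeeping
// defaults, then pass it to a save step. build_node() is hypothetical;
// $save below is a stand-in for node_save() so this runs without Drupal.

function build_node(array $values, int $uid, ?int $now = null): stdClass {
  $now = $now ?? time();
  $defaults = [
    'uid' => $uid,
    'status' => 1,
    'created' => $now,
    'changed' => $now,
  ];
  // The + operator keeps keys already in $values and fills in the rest.
  return (object) ($values + $defaults);
}

$saved = [];
$save = function (stdClass $node) use (&$saved) {
  // Pretend the save assigns an ID, as node_save() does with $node->nid.
  $node->nid = count($saved) + 1;
  $saved[] = $node;
};

$node = build_node(['title' => 'Hello', 'body' => 'First post', 'type' => 'story'], 1, 1240000000);
$save($node);
```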


Different commonalities

jcfiala - Sun, 2009-04-19 18:01

Really? Because the commonalities that I've become used to are different:

You need to decide which data you need to import and from where.
You need to decide how to handle reporting successes and failures, and what to do with them.
You need to set up a process to do the work in slices, because importing 50,000 nodes in one php request usually fails - happily, the batch api helps a lot with that these days.

These are all tasks which you have to do on every import, and having code to support and make these tasks easier is useful.
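A sketch of the "work in slices" pattern, loosely modeled on the `$context['sandbox']` / `$context['finished']` convention of Drupal's Batch API operation callbacks, but not calling Drupal at all. `import_slice()` and the simulated bad record are hypothetical; successes and failures are collected in `$context['results']` so they can be reported once the run is done.

```php
<?php
// Sketch: process a fixed number of items per call and report progress
// as a fraction, in the style of a Batch API operation callback.

function import_slice(array $items, callable $importOne, int $limit, array &$context): void {
  if (!isset($context['sandbox']['progress'])) {
    // First call: initialise progress tracking and the result log.
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($items);
    $context['results'] = ['ok' => [], 'failed' => []];
  }
  $slice = array_slice($items, $context['sandbox']['progress'], $limit);
  foreach ($slice as $item) {
    try {
      $importOne($item);
      $context['results']['ok'][] = $item['id'];
    }
    catch (Exception $e) {
      $context['results']['failed'][$item['id']] = $e->getMessage();
    }
    $context['sandbox']['progress']++;
  }
  $max = $context['sandbox']['max'];
  // 1 (or more) means done; anything less asks for another call.
  $context['finished'] = $max > 0 ? $context['sandbox']['progress'] / $max : 1;
}

// Stand-in driver: Batch API would spread these calls over separate requests.
$items = [];
for ($i = 1; $i <= 7; $i++) {
  $items[] = ['id' => $i];
}
$importOne = function (array $item) {
  if ($item['id'] === 5) {
    throw new Exception('missing title');  // simulated bad record
  }
};
$context = [];
$calls = 0;
do {
  import_slice($items, $importOne, 3, $context);
  $calls++;
} while ($context['finished'] < 1);
```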

-john


Thanks, I agree with the

mradcliffe - Sun, 2009-04-19 18:51

Thanks, I agree with the second two, but the first doesn't seem to be a commonality; code-wise, at least, Migrate, WordPress Import and User Import don't go about it in a common way, IMO. I guess abstractly it's all common: find data, get data into a proper array/object, save data.

A handler class system would be nice.
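One way such a handler class system could look, as a sketch only: each source implements fetch() and map(), while the loop and the save step are shared. `ImportHandler`, `ArrayHandler` and `run_import()` are hypothetical names, and the save callable stands in for whatever the destination (node_save() or similar) would be.

```php
<?php
// Sketch: per-source handlers supply fetch() and map(); the runner and
// the save step are shared. All names here are hypothetical.

interface ImportHandler {
  /** Yields raw source records (rows, parsed pages, feed items...). */
  public function fetch(): iterable;

  /** Turns one raw record into destination values, or null to skip it. */
  public function map($record): ?array;
}

final class ArrayHandler implements ImportHandler {
  private $rows;

  public function __construct(array $rows) {
    $this->rows = $rows;
  }

  public function fetch(): iterable {
    return $this->rows;
  }

  public function map($record): ?array {
    if (empty($record['title'])) {
      return null;  // nothing worth importing
    }
    return ['title' => $record['title'], 'type' => 'story'];
  }
}

/** Shared runner: returns how many records were handed to $save. */
function run_import(ImportHandler $handler, callable $save): int {
  $count = 0;
  foreach ($handler->fetch() as $record) {
    $values = $handler->map($record);
    if ($values !== null) {
      $save($values);
      $count++;
    }
  }
  return $count;
}

$saved = [];
$count = run_import(
  new ArrayHandler([['title' => 'A'], ['title' => ''], ['title' => 'B']]),
  function (array $values) use (&$saved) {
    $saved[] = $values;
  }
);
```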

I really need to look at the Import module. I was excited when I first saw it on the session list at DCDC, but the session was dropped. I haven't had time to browse through its API yet.


Regarding the report of the demise of Import/Export API

ngaur - Mon, 2009-04-20 11:56

As of the past few days, things are looking promising for a 6.x release of the Import/Export API some time soon. Some people are reporting success already and a beta quality release sounds like it's not far off.


We've had a good amount of

cyberswat - Mon, 2009-04-20 14:53

We've had a good amount of email communication between the authors of the WP2Drupal, Import/Export API, Migrate, and Import API modules. It sounds like the desire to start unifying our efforts is there ... now we just need to figure out what we will be doing and how. As expected, a code sprint where we can all collaborate is unlikely, but we can still move forward. I understand that we all know our own modules quite well, but in an effort to fully understand the situation we need to take the time to evaluate the other modules in this category. The Migrate module's maintainer came up with these questions and I think they are a good starting point ... I'll be posting my answers after testing the modules listed above:

"As a starting point, I think an interesting exercise would be for each of us to look over the various modules in the same category, then write up:

  1. What do I consider the single most important unique value of my module?
  2. For each of the other modules, what's the one aspect I would most want to add to my own module?

If we can first converge on the unique value of each of these modules, we can move from there to figuring out how to make them work together..."


The stages of migration

mikeryan - Thu, 2009-04-30 20:56

Sorry I've been quiet here - I was off on vacation last week, and I'm slowly getting caught up. I haven't had time yet to look closely at the other modules. But, some quick thoughts on the prospects of a common framework...

My experience doing large-scale migrations - not just with Drupal, but also with a big ERP (payroll/HR) migration a couple of years ago - is that the full process needs to look something like:

  1. Analysis - The incoming data needs to be fully understood. Even in the rare event that it's documented, the real data never exactly matches the documentation.
  2. Consolidation - Bring the incoming data into a common form.
  3. Import - The actual creation of the result objects from the common form. This is not a one-time event, this is a process to be developed, so you need to be able to easily back out and redo the import process as you refine it (fix bugs, add features, etc.).
  4. Auditing - how do you validate the results?

Obviously, not all stages need be implemented in a single module. But, by separating input (Analysis and Consolidation) from output (Import, Auditing), I do think one can have a common framework, dealing with various sources (HTML, Wordpress, etc.) independently (more-or-less) from destinations (nodes, users, etc.).

Mike Ryan
http://mikeryan.name/
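Stage 3's requirement to back out and redo an import can be sketched with a map from source ID to destination ID, recorded as each object is created. Assumptions: `MappedImport` is hypothetical, and the in-memory `$store` stands in for Drupal's tables; a real module would keep the map in its own database table and delete nodes or users through the API rather than with `unset()`.

```php
<?php
// Sketch: record source ID => destination ID so a run can be rolled back
// and redone. $store is an in-memory stand-in for the real database.

final class MappedImport {
  /** @var array source ID => destination ID */
  private $map = [];
  /** @var array destination ID => saved values */
  public $store = [];
  private $nextId = 1;

  public function import(array $rows): int {
    $created = 0;
    foreach ($rows as $sourceId => $values) {
      if (isset($this->map[$sourceId])) {
        continue;  // Already imported: rerunning does not duplicate it.
      }
      $destId = $this->nextId++;
      $this->store[$destId] = $values;
      $this->map[$sourceId] = $destId;
      $created++;
    }
    return $created;
  }

  public function rollback(): int {
    foreach ($this->map as $destId) {
      unset($this->store[$destId]);
    }
    $removed = count($this->map);
    $this->map = [];
    return $removed;
  }
}

$rows = ['wp-10' => ['title' => 'Hello'], 'wp-11' => ['title' => 'About']];
$run = new MappedImport();
$first = $run->import($rows);   // creates 2
$again = $run->import($rows);   // creates 0: both already mapped
$undone = $run->rollback();     // removes 2
$redo = $run->import($rows);    // creates 2 again, with fresh IDs
```

Keeping the map separate from the destination objects is what makes "fix a bug, roll back, re-import" a routine step rather than a manual cleanup.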


I agree with Mike's analysis

dman - Sun, 2009-05-03 01:57

I agree with Mike's analysis of the phases - it's certainly the way I go about it.
Steps 1-2 are the hard part, and the bit that's different for everybody.
There may be some crossover we can do between the approaches for step 3, however: a queue of pending items, preview and rollback.

FWIW, my 'common form' for import_html is vanilla XHTML with a bunch of semantic meta tags and a few microformats. Once I've got from 'clunky data' to standardized clean data, I know the import will go smoothly.
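The step-3 crossover suggested above (a queue of pending items with preview) might look roughly like this. It is a sketch under assumptions: `PendingQueue` is hypothetical, the only validation rule is a non-empty title, and the save callable stands in for node_save().

```php
<?php
// Sketch: a pending-item queue with a dry-run preview. preview() reports
// problems without saving; commit() saves only items that pass.

final class PendingQueue {
  private $items = [];

  public function add(string $key, array $values): void {
    $this->items[$key] = $values;
  }

  /** Returns [key => problem] for items that would fail; saves nothing. */
  public function preview(): array {
    $problems = [];
    foreach ($this->items as $key => $values) {
      if (trim($values['title'] ?? '') === '') {
        $problems[$key] = 'missing title';
      }
    }
    return $problems;
  }

  /** Saves items that pass preview and removes them from the queue. */
  public function commit(callable $save): array {
    $problems = $this->preview();
    $saved = [];
    foreach ($this->items as $key => $values) {
      if (!isset($problems[$key])) {
        $save($values);
        $saved[] = $key;
        unset($this->items[$key]);
      }
    }
    return $saved;
  }

  public function pending(): array {
    return array_keys($this->items);
  }
}

$queue = new PendingQueue();
$queue->add('about.html', ['title' => 'About us']);
$queue->add('old.html', ['title' => '']);
$report = $queue->preview();
$savedKeys = $queue->commit(function (array $values) {
  // node_save() would go here inside Drupal.
});
```

Items that fail preview stay in the queue, so they can be fixed at the 'common form' stage and committed on a later pass.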
There is a second 'common form' inside step 3 - the Drupal PHP object itself - which is what some import mechanisms go straight to. I find that a bit too close to the bone and (although I do it) putting too much care into adding elements to $node always feels like I'm bypassing the API and almost touching the DB directly.
Maybe we just need better CRUD tools.

With taxonomy_xml, (import/export taxonomy terms) I broke the different formats up into their own libraries - which scan the input and return a queue of almost-complete Drupal $term objects. For that one, queuing and partial imports was harder as I needed internal references most of the time. I needed to save a term to get its ID so that another term could point to it as parent etc.
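The parent-ID problem described above (a term must be saved to get its ID before another term can name it as parent) can be sketched as repeated passes over the queue, saving a term only once its parent has an ID. `save_terms_in_order()` is hypothetical, parent 0 is used for top-level terms, and the `$saveTerm` closure is a stand-in for the real save call that returns the new ID. Input that never resolves (a missing parent or a cycle) raises an exception instead of looping forever.

```php
<?php
// Sketch: save terms parents-first, translating temporary parent keys
// into the IDs assigned by the (stand-in) save call.

function save_terms_in_order(array $terms, callable $saveTerm): array {
  $ids = [];  // temporary key => saved term ID
  $pending = $terms;
  while ($pending) {
    $progress = false;
    foreach ($pending as $key => $term) {
      $parent = $term['parent'] ?? null;
      if ($parent !== null && !isset($ids[$parent])) {
        continue;  // Parent not saved yet; retry on the next pass.
      }
      $term['parent'] = $parent === null ? 0 : $ids[$parent];
      $ids[$key] = $saveTerm($term);
      unset($pending[$key]);
      $progress = true;
    }
    if (!$progress) {
      throw new RuntimeException('Unresolvable parents: ' . implode(', ', array_keys($pending)));
    }
  }
  return $ids;
}

// Stand-in save: hands out sequential IDs and remembers what was saved.
$nextId = 100;
$saved = [];
$saveTerm = function (array $term) use (&$nextId, &$saved) {
  $tid = $nextId++;
  $saved[$tid] = $term;
  return $tid;
};

// Child listed before its parent, as can happen in XML/RDF input.
$terms = [
  'oak' => ['name' => 'Oak', 'parent' => 'trees'],
  'trees' => ['name' => 'Trees'],
];
$ids = save_terms_in_order($terms, $saveTerm);
```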


Joining

JerryH - Fri, 2009-06-12 05:01

Just thought I'd say hello :)

I've been looking for an area to get involved with Drupal for a while, and seeing as I wrote the ImpEx for vBulletin (and 120 migration systems), this might be a good place to start!

Just reading over what is going on and getting up to speed, so I can lend a hand with some thoughts.

Cheers,
Jerry


Righty looks like a

JerryH - Sat, 2009-06-13 20:27

Righty looks like a combination of transformations sitting on top of the Import / Export API.


Middleware

omega8cc - Sat, 2009-06-27 16:34

Modules table overhaul

s470r1 - Sun, 2009-07-05 18:56

Hi all, been using Drupal since 4.6, used to make some small contributions to d.o here and there quite some time ago (satori at abc3400 dot be).

Anyways, I've done a major edit of the modules tables. Apparently, there's no way to view the revisions nor the logs, so I'm reposting my log message here for posterity:

Massive overhaul;
Grouped and ordered table rows logically, added links where applicable (eg. module names), added lots of info regarding modules' releases and dev status, reordered field text for global consistency, etc...
Still loads to do though, eg. cleaning up the "Files Imports " and "Internet Imports" rows. (Get rid of 'em? Could all be merged with the "Input" row...)

MODULE MAINTAINERS: I've taken the utmost care not to let any data get lost or corrupted, but please review, just in case.

Peace


I've been evaluating all of

cyberswat - Tue, 2009-08-18 15:37

I've been evaluating all of this information for a long time and have come to the conclusion that it would be in the best interest of the Drupal community to keep efforts focused on the best solution. For that reason I've discontinued the Import module and placed a reference to the Migrate module on its project page. I still don't think the Migrate module is where it needs to be, but it is leaps and bounds above the other modules in regards to planning and approach. I'll be using Migrate with all of my upcoming projects and look forward to submitting patches.


New module

pounard - Tue, 2009-08-18 21:09

Hi, I'm currently developing a Drupal-to-Drupal synchronization/migration module; should it be in this list?
See it here: http://drupal.org/project/yamm

Pierre.


My brain literally just

cyberswat - Tue, 2009-08-18 22:34

My brain literally just popped from the weight of my jaw hanging open in disbelief.


@pounard - Please add it -

mikeryan - Tue, 2009-08-18 23:17

@pounard - Please add it - it looks very promising!

@cyberswat - Thanks for that note. I'm sorry that I never did do a really thorough review of all the modules covered here, although I did take a quick look at each of them. I'm not convinced there is a "single solution" - there is room for different domains (Drupal-to-Drupal migration/synchronization vs. import from external CMSs, not to mention the export side), and different approaches (quick-and-simple vs. kitchen-sink). I do feel Migrate is the best available external-CMS/kitchen-sink solution (naturally, since I've built it to meet my needs), but there's room for other modules.

I think it is still very valuable to maintain comparisons among modules in this general space, both to help people find the best module for their particular situation and to help developers avoid reinventing the wheel. Let's keep it up - but let's talk about the scalability of the chart, which is difficult to manage - can we do a Google spreadsheet or something like that?

Kevin, I'm looking forward to seeing your improvements to Migrate.

Thanks.

Mike Ryan
http://mikeryan.name/


Yamm module added to matrix

pounard - Wed, 2009-08-19 09:50

@mikeryan did it.

@all My module is more of a Drupal-to-Drupal mass synchronization module than an import/export module, so I think some rows in the general table are not really relevant for it.

EDIT: typo

Pierre.


Note on this wiki page

pounard - Wed, 2009-08-19 10:16

On small screens, blocks cover the right side of the matrix. This is quite annoying: to read it fully I had to use Firebug to set the whole blocks region to "display: none;", which is not very elegant.

Pierre.


New table: "Import and export"

juan_g - Mon, 2009-08-31 11:52

Pierre wrote:

On small screens, blocks cover the right side of the matrix.

I've moved the modules that import and export, from "Import (Generic)" to their own new table, "Import and export". This was necessary, I think, and I've made it in two steps to verify it carefully. As a secondary effect, this wiki page is more readable now.


Node Export

epersonae2 - Tue, 2009-08-25 15:25

http://drupal.org/project/node_export It's not in the table now, and I just used it yesterday and found it pretty helpful for Drupal-to-Drupal data migration. I'll try to add it in, although I'm not associated with the project at all. (It turned out to be the best solution for moving a webform from one site to another.)


Node Export and other modules added

juan_g - Tue, 2009-09-01 00:55

I've just added a column for Node Export with a few details (and ordered the table columns alphabetically). Others may complete the data.

Also, I've added some of the most used modules in this field, according to the drupal.org's list of import/export modules (link ordered by usage stats).


This is a big chart and I'm

trish t - Mon, 2009-09-14 19:18

This is a big chart and I'm not seeing one of the main deciding factors -- can import profiles be saved?

E.g., User Import and Backup and Migrate both permit saved profiles, which allows a site dev to preconfigure some and then pass the routine along to a site admin, and that works for site admins across a wide range of skill levels. I can't pass along Node Import with the same ease at all.

This variable is perhaps more meaningful than 'target audience'

How about 'saved import profiles'?
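A minimal sketch of settings saved for reuse: a developer stores a named set of import settings once, and an admin reloads it by name. `ImportProfiles` and the setting keys are hypothetical, and a JSON file in the temp directory is an assumption made only to keep the example self-contained; a Drupal module would more likely store this in the database.

```php
<?php
// Sketch: store and reload named import settings. Storage in a JSON file
// is an assumption for this example only.

final class ImportProfiles {
  private $file;

  public function __construct(string $file) {
    $this->file = $file;
  }

  private function all(): array {
    if (!is_file($this->file)) {
      return [];
    }
    return json_decode(file_get_contents($this->file), true) ?: [];
  }

  public function save(string $name, array $settings): void {
    $profiles = $this->all();
    $profiles[$name] = $settings;
    file_put_contents($this->file, json_encode($profiles, JSON_PRETTY_PRINT));
  }

  public function load(string $name): array {
    $profiles = $this->all();
    if (!isset($profiles[$name])) {
      throw new InvalidArgumentException("No saved settings named '$name'");
    }
    return $profiles[$name];
  }
}

$file = sys_get_temp_dir() . '/import_profiles_demo.json';
if (is_file($file)) {
  unlink($file);  // start from a clean slate for the demo
}

// The developer saves the settings once...
(new ImportProfiles($file))->save('staff-csv', [
  'delimiter' => ',',
  'first_row_is_header' => true,
  'field_map' => ['name' => 0, 'mail' => 1],
  'roles' => ['staff'],
]);

// ...and a site admin later reruns the import by name.
$loaded = (new ImportProfiles($file))->load('staff-csv');
```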


New rows: "Settings saved for reuse"

juan_g - Mon, 2009-09-14 20:59

I've added those requested new rows (just the html, and few details, because of lack of time), to facilitate that others can add the data. Since "profiles" could be confused with user profiles, the new rows are labeled "Settings saved for reuse".


sure - sounds reasonable

trish t - Tue, 2009-09-15 20:02

Thanks. And sure, that sounds reasonable.

Fits in with the language of the user_import module - http://drupal.org/node/137672

While it's essentially the same routine, backup_migrate uses the variables $profiles and $profile_id in a file named profile.inc, which adds a bit of confusion to the mix.


Case study with Migrate module

juan_g - Fri, 2009-10-16 14:26

The recent case study on the front page of drupal.org, Trócaire: Working for a Just World, says:

"we used Migrate to pull in the data from the existing CMS"

(...)

"The decision was taken to switch to Drupal, but there was still the issue of migrating all the existing content, files and users to Drupal. Using the excellent Migrate and Table Wizard modules, created by Mike Ryan and Moshe Weitzman of Cyrve, this task was made much easier. Over 2000 pages, 500+ files, almost 600 taxonomy terms and close to 100 users were successfully migrated across."

(...)

"Migrate, Table Wizard and Schema - allowed relatively pain-free migration of all existing content and users to Drupal."

(...)

"Trócaire were happy to contribute back all custom code and patches developed during the project to Drupal. Here are some of the main ones:"

(...)

"Migrate - #484404, #551254, #541700, #540908, #411156, #525996, #426824 and #520264."