Migrate in Drupal 8: DrupalCon Denver BOF

mikeryan's picture

This week at DrupalCon Denver, we held a BOF to discuss the proposal at http://drupal.org/node/1052692 to incorporate some form of the Migrate module into core in Drupal 8 to handle major version upgrades.

Migrate module roadmap

My plan is for Migrate 2.4 to be the last functional release of Migrate V2, with improvements to the file destination and field handlers as the main theme. Migrate V2 would then go into maintenance mode (bug fixes only) while we focus 99% on D8, possibly backporting that core API (plus contrib enhancements) to D7 as Migrate V3. Whether that backport happens depends on demand and on the availability of the underlying APIs in D7 (similar to the dbtng and autoload modules in D6).

Migrate features to go into core

  • Management of map tables (tracking source->destination relationships).
  • Source/destination abstractions.
  • Handlers/listeners - ability to manipulate the data at key points in its flow from source to destination (equivalents of prepare, prepareRow, complete, callbacks...).
  • Handling of simple relationships - stubs, sourceMigration.
  • Dependency handling
  • Highwater marks
  • Dynamic migrations (may actually become the default, or even only mode)
  • Simple UI for performing migrations (upgrades) - two buttons, import and rollback.
  • Support for PDO and XML sources.
  • Example module supporting automated tests (as wine.inc in migrate_example does now).
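To make the highwater mark idea from the list above concrete, here is a minimal sketch, assuming each source row carries a 'changed' timestamp. The function name and row layout are hypothetical and are not taken from Migrate or core.

```php
<?php
// Hypothetical sketch: none of these names come from Migrate or Drupal core.

/**
 * Returns the rows newer than $highwater, plus the updated highwater value.
 *
 * @param array $rows Source rows, each with a 'changed' timestamp.
 * @param int $highwater Largest 'changed' value seen by the previous run.
 */
function sketch_filter_by_highwater(array $rows, int $highwater): array {
  $fresh = [];
  $new_mark = $highwater;
  foreach ($rows as $row) {
    // Only rows changed since the last run need to be processed again.
    if ($row['changed'] > $highwater) {
      $fresh[] = $row;
      $new_mark = max($new_mark, $row['changed']);
    }
  }
  return ['rows' => $fresh, 'highwater' => $new_mark];
}

$rows = [
  ['id' => 1, 'changed' => 100],
  ['id' => 2, 'changed' => 250],
  ['id' => 3, 'changed' => 300],
];
// The first run starts with no mark; a later run resumes from a stored value.
$first = sketch_filter_by_highwater($rows, 0);
$second = sketch_filter_by_highwater($rows, 250);
```

The returned 'highwater' value is what would be stored for the next run, so repeated imports only touch rows that changed in the meantime.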

Migrate features not to go into core

  • Rest of migrate_ui.
  • Field mapping annotations (description, issueGroup, etc.).
  • systemOfRecord = DESTINATION (although this may be difficult to add at the contrib level - we need to be sure the core API gives us the right entry points to be able to do this).
  • Drush commands (well, maybe)
  • Support for Oracle and MS SQL sources.
  • Support for CSV and JSON sources (or should one or both of these go in core?).
  • Examples beyond any that support tests.

Improvements to make over contrib Migrate

  • Smarter sources - Right now, the source and destination classes aren't quite mirror images. Sources operate at the data layer - SQL, XML, CSV - while destinations are at the next layer of abstraction - nodes, users, terms. We would like an infrastructure that supports mapping a Drupal 7 node (source) to a Drupal 8 node (destination). How to deal with either a direct DB connection or XML feed as the source of those Drupal 7 nodes?
  • Better handling of complex relationships and structured data, like collection fields.
  • Destination support for any core data not currently covered by the contrib module.

Support for D2D upgrade path

  • We need to migrate configuration (including content type and field definitions) - so, a Drupal variable source and a D8 config file destination.
  • DX is critical - contrib modules will need to implement this API for their upgrades, and we don't want it to be an impediment to contributors.
  • Is Batch API adequate for big migration/upgrade jobs? Short of including drush with core, what else can we do? Recommend using the contrib module with drush support? Or, can we implement the drush commands in core?
  • This should first support PDO databases as a source, then the output from the Web Services Initiative, then more general sources (e.g., aggregator/services-based feeds from pre-8 versions of Drupal).

Next steps

I see three pieces to be committed to core:

  1. Import API - the general purpose get-stuff-into-Drupal classes.
  2. Upgrade API - Should offer a dirt-simple DX for both core components and contrib modules to transform data from another Drupal installation into Drupal 8. The source could be any version of Drupal, including 8, so "Upgrade API" may not be the best name for this.
  3. Core upgrade support - implementations of the Upgrade API to pull data from other Drupal installs. First, of course, would be Drupal 7 as a source - it would be great to also support Drupal 6 and Drupal 8 as sources, with contrib support for earlier Drupal versions.

So, let's start figuring out the architecture of the Import and Upgrade APIs. I will add separate posts to the group for those.

Participants

Thanks to everyone who came to the BOF and contributed to the above:

Mike Ryan (mikeryan)
Andrew Morton (drewish)
Kristofer Widholm
Ashok Modi (btmash)
Adrian Rollett (acrollet)
Camilla Krag Jensen (naxoc)
Joe Stewart (joestewart)

Comments

Other improvements

mikeryan's picture

A couple of other thoughts on improvements to make with the Import API a.k.a. Migrate V3:

  1. Right now createStub() is an optional Migration method - it really should live in the destination class and have a default implementation, with the ability to disable stub creation if necessary.

  2. Whether systemOfRecord = DESTINATION goes into core or contrib, we really would like to implement the basic semantics in a single shared place, rather than have each destination import() implementation have to deal with it.
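As a rough illustration of the first point, a destination base class could carry a default createStub() plus a switch to turn stubs off. Everything here apart from the createStub() name is a hypothetical stand-in, not existing Migrate code.

```php
<?php
// Hypothetical sketch; class names and the $allowStubs flag are illustrative.

abstract class SketchStubDestination {
  /** Set to FALSE in a subclass to disable stub creation. */
  protected $allowStubs = TRUE;

  /** Saves a record and returns its new destination ID. */
  abstract protected function save(array $values): int;

  /**
   * Default stub: a placeholder record that a later import overwrites.
   * Returns the stub's destination ID, or NULL when stubs are disabled.
   */
  public function createStub(string $source_id): ?int {
    if (!$this->allowStubs) {
      return NULL;
    }
    return $this->save(['title' => 'Stub for ' . $source_id, 'stub' => TRUE]);
  }
}

/** In-memory destination so the example runs without Drupal. */
class SketchMemoryDestination extends SketchStubDestination {
  public $records = [];

  protected function save(array $values): int {
    $this->records[] = $values;
    return count($this->records);
  }
}

/** A destination that opts out of stub creation. */
class SketchNoStubDestination extends SketchMemoryDestination {
  protected $allowStubs = FALSE;
}

$with = new SketchMemoryDestination();
$stub_id = $with->createStub('article-42');
$without = new SketchNoStubDestination();
$no_stub = $without->createStub('article-42');
```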

Mike Ryan

Sorry...

mikeryan's picture

It's been a while - I have been pondering the broad design of the API off-and-on (and I have some POC work on migrate_d2d to commit as soon as it's semi-functional). I kept imagining in just a few days the design would be ready to share with the community, but somehow it never quite was... So, I'll post my semi-coherent thoughts here for the moment - I'll pull together a wiki once I've got some feedback on the high-level concepts.

Where the proposed Drupal 8 class or method has a close analog in Migrate, the Migrate name follows in parentheses. The Import API will basically consist of the base classes from Migrate’s includes folder. The destination support now in plugins/destinations would live as implementations of ImportDestination in the individual targets (ideally, one entity destination replacing the current node/user/etc.). Analogous to Database API, the source support now in plugins/sources will be in a Driver subdirectory/namespace. TBD: Review of the proposed plugin architecture and how we can leverage it.

The Upgrade API will be based on the work now going on in the migrate_d2d module. The base classes such as DrupalEntityImporter (DrupalNodeMigration et al) would be part of the Upgrade API, and the version-specific derivations would be... Not quite sure where... Perhaps under Driver?

As for a UI for upgrades, not to mention whatever help users would need to bootstrap the process... Working on some ideas in migrate_d2d, to be committed soon.

Import API

ImportController

Right now, the support for handling multiple migrations - returning lists, running processes in the right order, enforcing dependencies - is either in static MigrationBase methods, in global functions like migrate_migrations(), or handled directly by the UI/drush. I think we should have an ImportController class managing this stuff in the Drupal 8 Import API. Separating it out would make it easier to do things like defining import playlists.

One point of flexibility that could be added is making the possible operations dynamic. Right now we have hard-coded import and rollback operations, as well as stuff like stop, reset, etc. A means for contrib or custom modules to dynamically add operations such as audit/analyze/etc. would be nice (Moshe points out this issue).

<?php
public function registerImportStep($machine_name, $class_name, array $arguments);
public function deregisterImportStep($machine_name);
?>
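One way those two methods might be fleshed out is as a simple registry keyed by machine name. Only the two method signatures come from this post; the class name, the getStep() and listSteps() helpers, and the stand-in step class are assumptions made for the sake of a runnable sketch.

```php
<?php
// Hypothetical sketch: only the two register/deregister signatures are from the post.

class SketchImportController {
  /** @var array Registered steps, keyed by machine name. */
  protected $steps = [];

  public function registerImportStep($machine_name, $class_name, array $arguments) {
    $this->steps[$machine_name] = [
      'class' => $class_name,
      'arguments' => $arguments,
    ];
  }

  public function deregisterImportStep($machine_name) {
    unset($this->steps[$machine_name]);
  }

  /** Instantiates a registered step, passing its stored arguments. */
  public function getStep($machine_name) {
    if (!isset($this->steps[$machine_name])) {
      throw new InvalidArgumentException("Unknown import step: $machine_name");
    }
    $class = $this->steps[$machine_name]['class'];
    return new $class($this->steps[$machine_name]['arguments']);
  }

  /** Returns the machine names of all registered steps. */
  public function listSteps() {
    return array_keys($this->steps);
  }
}

/** Stand-in step class so the example is runnable. */
class SketchArticleStep {
  public $arguments;

  public function __construct(array $arguments) {
    $this->arguments = $arguments;
  }
}

$controller = new SketchImportController();
$controller->registerImportStep('articles', 'SketchArticleStep', ['source_version' => 7]);
$controller->registerImportStep('users', 'SketchArticleStep', []);
$step = $controller->getStep('articles');
$controller->deregisterImportStep('users');
$remaining = $controller->listSteps();
```

Storing the class name and arguments rather than an instance keeps registration cheap; a step object only gets built when something actually asks for it.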

ImportPlaylist

An ordered list of ImportStep objects, with dependencies. Analogous to groups in Migrate, but rather than a group being an attribute of a migration class, an ImportStep could belong to multiple ImportPlaylist objects.
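A minimal sketch of how a playlist could order its steps by dependency, using a depth-first topological sort. The class and method names are hypothetical; nothing like this exists in Migrate today.

```php
<?php
// Hypothetical sketch: the class and its methods are illustrative only.

class SketchImportPlaylist {
  /** @var array Step name => list of step names it depends on. */
  protected $dependencies = [];

  public function add(string $step, array $depends_on = []): self {
    $this->dependencies[$step] = $depends_on;
    return $this;
  }

  /** Returns the step names ordered so dependencies always come first. */
  public function ordered(): array {
    $ordered = [];
    $state = [];
    foreach (array_keys($this->dependencies) as $step) {
      $this->visit($step, $state, $ordered);
    }
    return $ordered;
  }

  /** Depth-first visit that appends a step after all of its dependencies. */
  protected function visit(string $step, array &$state, array &$ordered): void {
    if (!array_key_exists($step, $this->dependencies)) {
      throw new RuntimeException("Step $step is not in this playlist");
    }
    $current = $state[$step] ?? 'new';
    if ($current === 'done') {
      return;
    }
    if ($current === 'visiting') {
      throw new RuntimeException("Circular dependency at $step");
    }
    $state[$step] = 'visiting';
    foreach ($this->dependencies[$step] as $dependency) {
      $this->visit($dependency, $state, $ordered);
    }
    $state[$step] = 'done';
    $ordered[] = $step;
  }
}

$playlist = (new SketchImportPlaylist())
  ->add('comments', ['articles', 'users'])
  ->add('articles', ['users'])
  ->add('users');
$order = $playlist->ordered();

// The same step name can appear in a second, independent playlist.
$users_only = (new SketchImportPlaylist())->add('users');
```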

ImportStep (MigrationBase)

MigrationBase was abstracted out from the Migration class to permit inserting steps into the import process that didn’t fit the model of moving data row-by-row from source to destination, the most common use cases being disabling/enabling stuff during the import process, or creating necessary fields. Much of what it currently holds may be managed by ImportController, but it still will be helpful to have this level of abstraction to support import processes that don’t fit the DataImportStep model.

DataImportStep (Migration)

The heart of the API - this class manages the relationship between one data source and one data destination, moving data one row/object at a time.

ImportSource (MigrateSource)

An Iterable class, returning one row of data at a time from the appropriate source. Implements stuff like idlist, itemlimit, highwater marks (or should the controller take that on?).
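For illustration, a source that honors idlist and itemlimit could look roughly like this. The sketch uses IteratorAggregate with a generator rather than a hand-written Iterator, and an in-memory array stands in for the real data; the class name and constructor arguments are assumptions.

```php
<?php
// Hypothetical sketch: an in-memory array stands in for a real data source.

class SketchArraySource implements IteratorAggregate {
  protected $rows;
  protected $idlist;
  protected $itemlimit;

  /**
   * @param array $rows Rows keyed however you like, each with an 'id'.
   * @param array $idlist If non-empty, only these IDs are returned.
   * @param int $itemlimit If greater than 0, stop after this many rows.
   */
  public function __construct(array $rows, array $idlist = [], int $itemlimit = 0) {
    $this->rows = $rows;
    $this->idlist = $idlist;
    $this->itemlimit = $itemlimit;
  }

  /** Yields one row at a time, applying idlist and itemlimit. */
  public function getIterator(): Traversable {
    $count = 0;
    foreach ($this->rows as $row) {
      if ($this->idlist && !in_array($row['id'], $this->idlist, TRUE)) {
        continue;
      }
      if ($this->itemlimit > 0 && $count >= $this->itemlimit) {
        return;
      }
      $count++;
      yield $row;
    }
  }
}

$rows = [
  ['id' => 1, 'title' => 'One'],
  ['id' => 2, 'title' => 'Two'],
  ['id' => 3, 'title' => 'Three'],
  ['id' => 4, 'title' => 'Four'],
];
$limited = iterator_to_array(new SketchArraySource($rows, [], 2));
$picked = iterator_to_array(new SketchArraySource($rows, [2, 4]));
```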

ImportDestination (MigrateDestination)

Represents the destination - hopefully a single “entity” destination should be doing most of the work in D8.

Another thought

I just started thinking about this - while it appeals to our sense of symmetry to treat sources and destinations as parallel concepts, maybe that’s not the best model. The destination is really much more tightly coupled to the importer (migration) class than the source is. Consider the Drupal-to-Drupal migration example - while, say, an article migration would only have one possible destination implementation, wouldn’t it be nice if the source were a true plugin? I.e., although migrate_d2d right now hard-codes the source as a MigrateSourceSQL (import directly from a PDO database), wouldn’t it be great if we could choose whether to import the articles from the DB or an XML feed? Without changing a single line of the DataImportStep implementation? We could just plug in a different source class, which just would need to satisfy the contract of what to deliver in $row.
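Here is a small sketch of that "contract for $row" idea. The interface, both source classes, and the import class are hypothetical, and a JSON string stands in for the XML feed purely to keep the example free of extra extensions.

```php
<?php
// Hypothetical sketch: the interface and class names are illustrative only.

/** Contract: every row delivered has 'id' and 'title' keys. */
interface SketchRowSource {
  public function rows(): iterable;
}

/** Stands in for a direct database source. */
class SketchDbSource implements SketchRowSource {
  protected $records;

  public function __construct(array $records) {
    $this->records = $records;
  }

  public function rows(): iterable {
    foreach ($this->records as $record) {
      yield ['id' => $record['nid'], 'title' => $record['title']];
    }
  }
}

/** Stands in for a feed source (JSON here instead of XML). */
class SketchFeedSource implements SketchRowSource {
  protected $json;

  public function __construct(string $json) {
    $this->json = $json;
  }

  public function rows(): iterable {
    foreach (json_decode($this->json, TRUE) as $item) {
      yield ['id' => (int) $item['guid'], 'title' => $item['name']];
    }
  }
}

/** The import step never needs to know which source it was given. */
class SketchArticleImport {
  public function run(SketchRowSource $source): array {
    $imported = [];
    foreach ($source->rows() as $row) {
      $imported[$row['id']] = strtoupper($row['title']);
    }
    return $imported;
  }
}

$import = new SketchArticleImport();
$from_db = $import->run(new SketchDbSource([['nid' => 1, 'title' => 'Hello']]));
$from_feed = $import->run(new SketchFeedSource('[{"guid": "1", "name": "Hello"}]'));
```

Each source is responsible for normalizing its own raw data into the agreed row shape, which is what lets the import class stay unchanged.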

ImportMap (MigrateMap)

Maintains the relationship between source items and destination items.
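Stripped to its essentials, the map is a two-way lookup between source IDs and destination IDs. The sketch below keeps it in memory; the class and method names are hypothetical and do not mirror the real MigrateMap interface.

```php
<?php
// Hypothetical sketch: an in-memory stand-in for a map table.

class SketchImportMap {
  /** @var array Source ID => destination ID. */
  protected $map = [];

  /** Records that a source item was imported as a destination item. */
  public function save(string $source_id, int $destination_id): void {
    $this->map[$source_id] = $destination_id;
  }

  /** Returns the destination ID for a source ID, or NULL if not imported. */
  public function lookupDestinationId(string $source_id): ?int {
    return $this->map[$source_id] ?? NULL;
  }

  /** Returns the source ID for a destination ID, or NULL if unknown. */
  public function lookupSourceId(int $destination_id): ?string {
    $source_id = array_search($destination_id, $this->map, TRUE);
    return $source_id === FALSE ? NULL : (string) $source_id;
  }

  /** Forgets a mapping, e.g. after the destination item is rolled back. */
  public function delete(string $source_id): void {
    unset($this->map[$source_id]);
  }
}

$map = new SketchImportMap();
$map->save('legacy-12', 101);
$destination = $map->lookupDestinationId('legacy-12');
$source = $map->lookupSourceId(101);
$map->delete('legacy-12');
$after_rollback = $map->lookupDestinationId('legacy-12');
```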

ImportPropertyMapping (MigrateFieldMapping)

Represents how a given destination field will be derived from the source data. This is a fluent API, with a number of chained methods beyond the basic “copy this source field to this destination field”. Should make this easier to extend (pluggable) in D8.
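To show what the fluent style means in practice, here is a minimal sketch of a chained mapping. The method names echo the kind of chaining Migrate's field mappings offer, but this class and its apply() method are hypothetical, not the actual implementation.

```php
<?php
// Hypothetical sketch of a fluent mapping; not the real MigrateFieldMapping.

class SketchPropertyMapping {
  protected $destination;
  protected $source;
  protected $default = NULL;
  protected $separator = NULL;
  protected $callbacks = [];

  public function __construct(string $destination, string $source) {
    $this->destination = $destination;
    $this->source = $source;
  }

  /** Value to use when the source row lacks the field. */
  public function defaultValue($value): self {
    $this->default = $value;
    return $this;
  }

  /** Splits a string source value into multiple values. */
  public function separator(string $separator): self {
    $this->separator = $separator;
    return $this;
  }

  /** Functions applied to the value, in order. */
  public function callbacks(callable ...$callbacks): self {
    $this->callbacks = $callbacks;
    return $this;
  }

  /** Derives the destination value from one source row. */
  public function apply(array $row) {
    $value = $row[$this->source] ?? $this->default;
    if ($this->separator !== NULL && is_string($value)) {
      $value = explode($this->separator, $value);
    }
    foreach ($this->callbacks as $callback) {
      $value = $callback($value);
    }
    return $value;
  }
}

$tags = (new SketchPropertyMapping('field_tags', 'tags'))
  ->separator(',')
  ->callbacks(function (array $values) {
    return array_map('trim', $values);
  });
$tag_values = $tags->apply(['tags' => 'php, drupal ,migrate']);

$status = (new SketchPropertyMapping('status', 'published'))->defaultValue(1);
$status_value = $status->apply([]);
```

Because every configuration method returns the mapping itself, making the API pluggable would mostly be a matter of letting plugins contribute further chainable steps.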

Handlers

These former handler classes and their derivatives will be replaced by the plugin mechanism now being developed for core:

  • MigrateHandler
  • MigrateDestinationHandler
  • MigrateFieldHandler

Upgrade API

The first question here is, is “Upgrade” the right name for this? Because, although the use case at hand is upgrading from Drupal 7 to Drupal 8, this API should be general enough to support importing from any other Drupal installation into the current one - it could handle Drupal 6 to Drupal 8, Drupal 5 to Drupal 8, someday Drupal 10 to Drupal 9 (and vice versa). In light of this, “Drupal” rather than “Upgrade” may be the base of API names in this proposal.

Mike Ryan

...

sun's picture

On ImportDestination: at least starting with D8, some data imported from D7 (and earlier) will need to be migrated into configuration system objects instead of entities. So the migrate process will have to be able to, say, select all node types from the D7 database, convert that data, and write it out into D8 configuration files. (This will most likely affect all "configurable thingies".)

The goal for D8 is to generally differentiate between 3 forms: content (entities), state (persistent non-content/non-configuration data), and configuration (ex variables + configurable thingies).

Thus, on the migrate destination side, there would likely have to be three CRUD/conversion handlers that are able to convert the raw/old input data into the new expected format, suitable for the actual CRUD/storage controller of the respective format (i.e., entity's DatabaseStorageController, state's generic key/value store, and configuration system's Config file object format).
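A very rough sketch of that three-way split, assuming each incoming record has already been classified as content, state, or configuration. The function, the 'kind' key, and the storage labels are all hypothetical placeholders, not D8 APIs.

```php
<?php
// Hypothetical sketch: the handlers only label where the data would go.

/**
 * Passes a raw record to the conversion handler for its kind.
 *
 * @param array $record Has a 'kind' (content, state, configuration) and 'data'.
 * @param array $handlers Callables keyed by kind.
 */
function sketch_route_record(array $record, array $handlers): array {
  $kind = $record['kind'];
  if (!isset($handlers[$kind])) {
    throw new InvalidArgumentException("No handler for kind: $kind");
  }
  return $handlers[$kind]($record['data']);
}

$handlers = [
  // Content would end up in the entity storage controller.
  'content' => function (array $data) {
    return ['storage' => 'entity', 'values' => $data];
  },
  // State would end up in a generic key/value store.
  'state' => function (array $data) {
    return ['storage' => 'key_value', 'values' => $data];
  },
  // Configuration would be written out as config files.
  'configuration' => function (array $data) {
    return ['storage' => 'config_file', 'values' => $data];
  },
];

// A D7 node type becomes configuration, while a node stays content.
$node_type = sketch_route_record(
  ['kind' => 'configuration', 'data' => ['type' => 'article']],
  $handlers
);
$node = sketch_route_record(
  ['kind' => 'content', 'data' => ['title' => 'Hello']],
  $handlers
);
```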

Also note that the configuration system supports an export/import mechanism of its own (for staging configuration between dev/stage/production sites), which will resolve interdependencies within the configuration being imported (e.g., a new entity bundle and a new field need to be imported before the field instance can be imported). I'd think that the migration process to D8 should only produce the configuration files that need to be imported, and then fire off the config import mechanism before importing any data.

Daniel F. Kudwien
netzstrategen

API names

juan_g's picture

Quoting Mike Ryan: "The first question here is, is “Upgrade” the right name for this? (...) In light of this, “Drupal” rather than “Upgrade” may be the base of API names in this proposal."

I think the names you are considering are already fine.

For example, if it was just one API, used mainly to import and upgrade, it could be simply named "Import/Upgrade API".

Or, since you seem to prefer two APIs, one can be "Import API" like now, and the other either "Upgrade API" or more comprehensively "Drupal to Drupal API", just like the current sandbox.

Migrate D2D

rivimey's picture

Just a couple of thoughts, if this is not too late:

migrate_d2d_ui is currently quite easy to use if it can describe what you want to do, but if not then it seems you have to abandon it entirely. It would be great if the ui code could export its config to code, from which you can then add in whatever is needed.

I am glad you are emphasising the same-drupal-version migrate as a requirement - that is exactly what I've been doing with migrate all 3 times I've used it.

Content migration, import, and export
