Weekly calls #1 meeting notes

Posted by chx on October 10, 2013 at 9:00pm
Last updated by chx on Thu, 2013-10-10 21:04

10 October 2013

Present: chx, eliza411, marvil07, moshe, mike ryan, alex weber, brent dunn, ashok modi

Architecture Overview
Moshe: concerned sources may be too specific
chx: extensible base classes, for example drupal 6 node would be an extension of the sql driver

most of what’s happening is moving code around, little new code being introduced

Mike Ryan: one goal for D8 is to switch out the base layer, to get the nodes from web services or PDO

The process step just wants a Drupal 6 node. It doesn’t care where it comes from. There is no abstract Drupal 6 knowledge - the processing step takes in an array representing a node and doesn’t care how it’s constructed. There is nothing reusable between PDO / CSV / SOAP, etc.

Mike Ryan: As long as the sources provide the expected array structure, we’re okay.
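
A minimal sketch of the point above, in Python rather than PHP and not taken from the Migrate code: two unrelated sources feed the same processing step, which only sees a plain array. The field names (nid, title), the table layout and the function names are assumptions made for illustration.

```python
import csv
import io
import sqlite3
from typing import Any, Dict, Iterator

# Hypothetical row shape: the notes do not define the real node array.
Row = Dict[str, Any]


def sql_source(conn: sqlite3.Connection) -> Iterator[Row]:
    """Yield node rows from a SQL table as plain dicts."""
    for nid, title in conn.execute("SELECT nid, title FROM node ORDER BY nid"):
        yield {"nid": nid, "title": title}


def csv_source(text: str) -> Iterator[Row]:
    """Yield node rows from CSV text as plain dicts of the same shape."""
    for record in csv.DictReader(io.StringIO(text)):
        yield {"nid": int(record["nid"]), "title": record["title"]}


def process(row: Row) -> Row:
    """The processing step only sees the dict, not the source that built it."""
    return {"nid": row["nid"], "title": str(row["title"]).strip().upper()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER, title TEXT)")
conn.execute("INSERT INTO node VALUES (1, ' hello ')")

from_sql = [process(row) for row in sql_source(conn)]
from_csv = [process(row) for row in csv_source("nid,title\n1, hello \n")]
```

Both lists come out identical, which is the contract Mike Ryan describes: a source is acceptable as long as it provides the expected array structure.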

ID mapping
chx: ID mapping needs abstraction
Mike Ryan: D7 has an abstract method for the ID map … nothing but sql has been implemented, so the abstraction already exists
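
To illustrate what an abstract ID map with a single SQL implementation could look like, here is a rough Python sketch. It is not the D7 Migrate interface: the class names, method names and table layout are hypothetical.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class IdMap(ABC):
    """Hypothetical abstraction: remembers which source ID became which destination ID."""

    @abstractmethod
    def save(self, source_id: int, dest_id: int) -> None: ...

    @abstractmethod
    def lookup(self, source_id: int) -> Optional[int]: ...


class SqlIdMap(IdMap):
    """SQL-backed implementation (the only kind the notes say exists so far)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS id_map "
            "(source_id INTEGER PRIMARY KEY, dest_id INTEGER)"
        )

    def save(self, source_id: int, dest_id: int) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO id_map VALUES (?, ?)", (source_id, dest_id)
        )

    def lookup(self, source_id: int) -> Optional[int]:
        found = self.conn.execute(
            "SELECT dest_id FROM id_map WHERE source_id = ?", (source_id,)
        ).fetchone()
        return found[0] if found else None


class MemoryIdMap(IdMap):
    """A second backend, added only to show the abstraction can be swapped."""

    def __init__(self) -> None:
        self._map: Dict[int, int] = {}

    def save(self, source_id: int, dest_id: int) -> None:
        self._map[source_id] = dest_id

    def lookup(self, source_id: int) -> Optional[int]:
        return self._map.get(source_id)


sql_map = SqlIdMap(sqlite3.connect(":memory:"))
mem_map = MemoryIdMap()
for id_map in (sql_map, mem_map):
    id_map.save(42, 7)
```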

Moshe: Let’s talk about configuration entities - what we’re losing.
When migrations are defined in PHP, you can do dynamic things with them. Mappings are in the config entity … sometimes you want them to be dynamic, e.g. generated by a for loop over 1 - 100 to create 100 mappings.

chx: You could have a script create the .yml file, but that’s a cop-out. Nothing stops you from writing a dynamic process plugin to do what moshe has described. The default is column mapping. Write the plugin; it gets the row and is expected to change the row object. Do what you will with the row.
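
A hedged sketch of that idea in Python, not the actual plugin API: a default column-mapping plugin next to a "dynamic" plugin that loops over 1 - 100 instead of listing 100 mappings in configuration. The Row class, the plugin signatures and the col_N / field_N names are all assumptions.

```python
from typing import Any, Dict


class Row:
    """Hypothetical row object: source values in, destination values out."""

    def __init__(self, source: Dict[str, Any]) -> None:
        self.source = dict(source)
        self.destination: Dict[str, Any] = {}

    def set_destination_property(self, name: str, value: Any) -> None:
        self.destination[name] = value


def column_mapping(row: Row, config: Dict[str, Any]) -> None:
    """Default behaviour: copy named source columns to destination properties."""
    for dest_name, source_name in config["mapping"].items():
        row.set_destination_property(dest_name, row.source.get(source_name))


def numbered_fields(row: Row, config: Dict[str, Any]) -> None:
    """Dynamic plugin: generates col_1..col_N -> field_1..field_N in a loop."""
    for i in range(1, config["count"] + 1):
        row.set_destination_property(f"field_{i}", row.source.get(f"col_{i}"))


source = {"node_title": "Hello"}
source.update({f"col_{i}": i * 10 for i in range(1, 101)})

row = Row(source)
column_mapping(row, {"mapping": {"title": "node_title"}})
numbered_fields(row, {"count": 100})
```

The configuration for the dynamic plugin stays a single number; the 100 mappings only exist at run time.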

moshe: how do you set more than one process plugin?
chx: set it in the config entity: the key process contains a list of process plugins+configurations

destination plugin receives the data last in the chain of process plugins - it’s only a little different from process because it needs to help with rollbacks
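
The chain described above could be sketched like this. It is an illustration in Python under stated assumptions, not the config entity schema: the "process" key comes from the notes, while the plugin names, their configuration keys and the MemoryDestination class are made up.

```python
from typing import Any, Callable, Dict

Values = Dict[str, Any]


def column_map(source: Values, dest: Values, config: Values) -> None:
    """Copy source columns to destination properties."""
    for dest_name, source_name in config["mapping"].items():
        dest[dest_name] = source[source_name]


def uppercase(source: Values, dest: Values, config: Values) -> None:
    """Transform a property that an earlier plugin already set."""
    name = config["property"]
    dest[name] = str(dest[name]).upper()


# Hypothetical plugin registry, keyed by plugin name.
PLUGINS: Dict[str, Callable[[Values, Values, Values], None]] = {
    "column_map": column_map,
    "uppercase": uppercase,
}

# Stand-in for the config entity: "process" lists plugins plus their configuration.
MIGRATION = {
    "process": [
        {"plugin": "column_map", "mapping": {"title": "node_title"}},
        {"plugin": "uppercase", "property": "title"},
    ]
}


class MemoryDestination:
    """Receives the data last; unlike process plugins it must support rollback."""

    def __init__(self) -> None:
        self.saved: Dict[int, Values] = {}
        self._next_id = 1

    def import_row(self, dest: Values) -> int:
        dest_id = self._next_id
        self._next_id += 1
        self.saved[dest_id] = dict(dest)
        return dest_id

    def rollback(self, dest_id: int) -> None:
        self.saved.pop(dest_id, None)


def run(migration: Values, source: Values, destination: MemoryDestination) -> int:
    dest: Values = {}
    for step in migration["process"]:
        PLUGINS[step["plugin"]](source, dest, step)
    return destination.import_row(dest)


destination = MemoryDestination()
new_id = run(MIGRATION, {"node_title": "hello"}, destination)
```

Calling destination.rollback(new_id) afterwards removes the saved item again, which is the one responsibility the process plugins do not have.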

moshe: error handling and successful saves, is it changing?
chx: hasn’t tackled this

performance instrumentation - now that xhprof has matured, it’s no longer a priority

Naming: call it field, column or property? Moshe and Mike: property is clearer.
Inside the row plugin instance, store the destination as an associative array for symmetry reasons. Most of the processing will want to use a setter method to set a single property at a time; therefore how we store it is less important. There’s no real structure, so an array seemed cleaner.

source should be associative array - class implies structure

destination plugin will take the pile of data and figure out how to create the actual domain object it works on and save it.

moshe: entity_create takes the entity type name (the string node) as its first argument; the second is an array.

Decision: keep destination an array.
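
As a rough illustration of why a plain array is enough, the sketch below has a destination plugin turn the array into a domain object on its own. The entity_create function here is a Python stand-in that only mirrors the (type name, array) call shape moshe mentions; the Entity and NodeDestination classes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Entity:
    """Hypothetical domain object built by the destination."""

    entity_type: str
    values: Dict[str, Any] = field(default_factory=dict)


def entity_create(entity_type: str, values: Dict[str, Any]) -> Entity:
    """Stand-in mirroring the (type name, array of values) call shape."""
    return Entity(entity_type, dict(values))


class NodeDestination:
    """Takes the pile of data, builds the object it works on, and saves it."""

    def __init__(self) -> None:
        self.storage: List[Entity] = []

    def import_row(self, destination: Dict[str, Any]) -> Entity:
        entity = entity_create("node", destination)
        self.storage.append(entity)
        return entity


# Process plugins set one property at a time on a plain associative array.
destination: Dict[str, Any] = {}
destination["title"] = "Hello"
destination["status"] = 1

node = NodeDestination().import_row(destination)
```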

migrating core config is the priority (not views, at this time) - we’ll need provider modules.
Version priority: 6, 7, 8, then 5. Drupal 5 can get done in contrib; migrating views is higher.

We have to port everyone’s configuration (to state system and config system) / will need a CMI destination. We might need to write a module for 6 or for 7, that puts into the db or writes to files or provides a REST interface to things that only exist in code.

Next agenda items:

error handling, logging.
We’ve never migrated revisions before; how will we handle that? Note: https://drupal.org/node/1528028 is a prerequisite. Let's pretend it happened (and perhaps help it happen).
Preserving IDs (previously avoided) even though we have UUIDs

Comments

php-etl

Posted by cosmicdreams on October 11, 2013 at 12:55am

Are there ideas in this library that can be of use?
https://github.com/docteurklein/php-etl

Software Engineer @ The Nerdery

Hardly

Posted by chx on October 11, 2013 at 1:08am

We have our architecture set up; we have the code already in Migrate that will be poured into that mold. The hard challenges ahead of us are Drupal-bound (keeping IDs, keeping node revisions and just getting the whole thing done), not generic ETL problems. The architecture is not far from an ETL tool either: a source plugin does the extraction, a series of process plugins does the transformation and the destination plugin does the loading. However, when I looked at the problem I thought that bringing the word Load into the Drupal world, where we already call the Read operation from CRUD Load, would introduce a lot of confusion. We can do that in D9 by renaming entity_load to entity_read and we perhaps should.
