Posted on behalf of dman@drupal.org:
From http://basement.greenash.net.au/soc2006/ImportExportApiModule?show_comments=1#comment_19:
Where is a good place for discussion? A Drupal.org thread, the issues register, or somewhere here? I'd rather keep it in the open than in emails.
Granted you may not be ready for a code review yet :) I had a quick look and was surprised you are attacking the database directly. Wouldn't it be better to let Drupal continue to handle the serialization, and get (e.g.) user_load() or node_load() to instantiate the PHP object, and base your mappings on that structure? By digging directly at the database (clever though your structure mapping system is) you are bypassing a lot of Drupal goodness, especially once you come to node_load().
-- DanMorrison (2006-06-16 18:41:10)

Comments
This group?
I think that this group is probably the best place to have discussion. Hope you don't mind my cross-posting your comment here.
Jeremy Epstein - GreenAsh
Right place for certain
This is the right place, and it is a great debate to have. In fact, your point about whether node/user load return already themed values is a really clear-headed tip on how one can vet code for bad smells.
Querying engine
You are quite justified in being surprised that I'm "attacking the database directly". However, considering what the Import / Export API has to do, I feel that it is the right way.
First of all, the DB querying engine that I've written is very efficient for grabbing data in bulk, unlike functions such as node_load(), which use queries over and over again for each instance of a node.
Second, the querying engine puts data into a standard structured form, whereas the various *_load() functions in Drupal are inconsistent in what they return, and there is no way to guarantee consistency, since contrib modules can add new badly-formatted fields through things such as hook_nodeapi('load'). With the API's hook_def_alter(), OTOH, modules can add new fields, but their format is guaranteed to be consistent.
Third, much of the "goodness" in the *_load() functions is not needed, since they process some values rather than returning them in raw form, and sometimes even in a way that is specific to HTML output (although wherever this happens, it is really a bug, as this should all be done through themable functions in Drupal). The Import / Export API has its own 'process' hook, which is able to emulate what's needed from the *_load() functions, and which can happily leave out what's not relevant.
Having said all this, I still think that you have a very valid point, and I still share your reservations about bypassing Drupal's standard functions. Also, I doubt that a direct querying approach will be practical for the import process - i.e. bypassing node_save() would probably be more trouble than it's worth, especially since INSERTs have to be done one query at a time anyway, unlike SELECTs.
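(To make the bulk-versus-per-item point above concrete, here is a minimal sketch. It is written in Python against an in-memory SQLite database purely for illustration; it is not the module's querying engine and not Drupal code. The `node` table layout and the helper names `run`, `load_one` and `load_bulk` are assumptions made up for this example. It only shows that a per-item loader issues one SELECT per node, while a bulk loader fetches the same rows with a single query.)

```python
import sqlite3

def make_db():
    """Create a tiny in-memory table standing in for a 'node' table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO node (nid, type, title) VALUES (?, ?, ?)",
        [(1, "page", "About"), (2, "story", "News"), (3, "page", "Contact")],
    )
    conn.commit()
    return conn

def run(conn, sql, params, log):
    """Execute a query and record it in `log` so the queries can be counted."""
    log.append(sql)
    return conn.execute(sql, params).fetchall()

def load_one(conn, nid, log):
    """Per-item load: one SELECT per node, like calling a *_load() function in a loop."""
    rows = run(conn, "SELECT nid, type, title FROM node WHERE nid = ?", (nid,), log)
    nid, node_type, title = rows[0]
    return {"nid": nid, "type": node_type, "title": title}

def load_bulk(conn, nids, log):
    """Bulk load: a single SELECT ... IN (...) covering every requested node."""
    nids = list(nids)
    if not nids:
        return []
    placeholders = ", ".join("?" for _ in nids)
    sql = f"SELECT nid, type, title FROM node WHERE nid IN ({placeholders}) ORDER BY nid"
    return [{"nid": r[0], "type": r[1], "title": r[2]} for r in run(conn, sql, nids, log)]

conn = make_db()

per_item_log = []
per_item = [load_one(conn, nid, per_item_log) for nid in (1, 2, 3)]

bulk_log = []
bulk = load_bulk(conn, (1, 2, 3), bulk_log)

print(len(per_item_log), "queries per item vs", len(bulk_log), "in bulk")
```

(Both approaches return the same records here; the difference is only in how many round trips to the database are needed, which is what matters for a large export.)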
Jeremy Epstein - GreenAsh
Thoughts
Meh, I didn't get back to you directly, it being a wiki with no notification and all, and I forgot to check back after a few days. And now IE doesn't persist cookies on this groups site (?) Anyway, we're here now.
I appreciate your decisions I guess, if you are thinking more about bulk imports. I don't see what the danger from other modules really is, but I guess you deserve to be able to pick and choose which processes you apply and which you don't. If you are doing your own process() or node_invoke() type thing when needed, it should be fine.
Some of them really are quite useful, like path or menu. Sure, you CAN rebuild the work they do, but I found it easier (in my import_html.module) to just set $node->path and then call node_submit($node). Remaking the DB calls for that job would have been so wrong.
Other ones that really modify the content may be unwanted, but I don't work with any of those at the moment. Surely most of them run as filters anyway?
I imagined that node_load() and node_save() just did validation and serialization. It's node_prepare() and node_show() you need to avoid. OTTOMH.
When you start thinking in bulk, you also need to think about creating your navigation on-the-fly, not to mention cross-referencing between sibling files you might be importing simultaneously (or even asynch). I had fun with that on my attempt. Have you given thought to those issues? Finding relative images and links when it arrives on the other side?
.dan.
Another use-case scenario
I've done some work on netnews.module.
It's a 2-way interface to an nntp server, reading NNTP and copying it in as a Drupal forum node, and vice versa, sending a copy of new forum posts to NNTP as new messages.
I was half-way through some code when I realized I wanted an import API rather than dig around and remake the node insert calls again.
Anyway, I added attachment support, and saw some things you'd have to deal with.
...Like attachments being added to a node from a filesystem or other external source that's NOT 'upload' protocol. Seems upload.module, which sorta owns the files table, doesn't cope too well with being told an existing path.
How on earth should you export attachments anyway? May not be the core API's job, but I'd need a hook that it can be done through...
Just a few thoughts...
.dan.
Attachments etc
At the moment, I haven't got any solid ideas for how to handle attachments, so any brainwaves that you (or other people) have are most welcome! However, my thinking so far is that 'user files' (i.e. attachments, images, audio, etc) will require their own 'get' and 'put' engine (e.g. 'userfile'). In fact, this is the main reason why I have introduced the 'get and put API' into the module - because files are not stored in the database - they're stored on the filesystem - and so they require their own engines separate from the database engines. Since files in Drupal are also tracked in the 'files' table in the DB, however, they'll also need a corresponding entity whose primary engine IS the DB.
Like I said, a UI is technically out of scope for this project. However, I'm thinking that ultimately, the default UI for exporting attachments would consist of a 'download' link for each file; alternate UIs could do more advanced things, such as packing multiple attachments into a tarball that could then be downloaded (but which would probably require non-standard PHP libs/extensions, or 3rd party libraries - don't know, haven't researched such things). Similarly, the default UI for importing attachments would consist of a series of 'upload' file form fields for each file; and alternate UIs could do things such as accept a tarball and then unpack it.
BTW, I would recommend that you wait until I have written the definition for the base 'node' entity, and for the various core node types, before you start writing node import/export things that depend on this API. I haven't written all the core definitions yet, and for good reason too: the API is not yet stable enough! But when I do write the 'node' definition, I'm planning for 'node' to be one entity (which is not actually defined through hook_def(), but which is just defined in a stand-alone function), and for each node type to be a separate 'node_typefoo' entity (which grabs the base definition from the stand-alone function, adds to it, and then passes its definition along through hook_def). Nodeapi-like modifying of node type fields will be done, of course, through hook_def_alter().
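(A rough sketch of the layering described above, for anyone trying to picture it. It is written in Python for illustration only and is not the API's actual code: the function names `base_node_def`, `node_story_def`, `path_def_alter` and `build_defs`, the field keys `type` and `table`, and the example fields are all assumptions. The idea shown is that a stand-alone function supplies the base 'node' definition, each node type copies and extends it, alter hooks may add fields afterwards, and every field is checked against one format so that additions stay consistent.)

```python
# Hypothetical field format: every field must declare these keys.
REQUIRED_FIELD_KEYS = {"type", "table"}

def base_node_def():
    """Stand-alone base definition; not registered as an entity by itself."""
    return {
        "fields": {
            "nid": {"type": "int", "table": "node"},
            "title": {"type": "string", "table": "node"},
        }
    }

def node_story_def():
    """A node type entity: starts from the base definition and adds to it."""
    definition = base_node_def()
    definition["fields"]["teaser"] = {"type": "string", "table": "node_revisions"}
    return definition

def path_def_alter(defs):
    """An alter hook: another module adds a field to every node type entity."""
    for name, definition in defs.items():
        if name.startswith("node_"):
            definition["fields"]["path"] = {"type": "string", "table": "url_alias"}

def build_defs(def_hooks, alter_hooks):
    """Collect definitions, apply alter hooks, then enforce the field format."""
    defs = {name: hook() for name, hook in def_hooks.items()}
    for alter in alter_hooks:
        alter(defs)
    for name, definition in defs.items():
        for field_name, field in definition["fields"].items():
            missing = REQUIRED_FIELD_KEYS - field.keys()
            if missing:
                raise ValueError(f"{name}.{field_name} is missing {sorted(missing)}")
    return defs

defs = build_defs({"node_story": node_story_def}, [path_def_alter])
print(sorted(defs["node_story"]["fields"]))
```

(Because the check runs after the alter hooks, a module that adds a malformed field is rejected instead of silently changing the shape of the data, which is the consistency guarantee being argued for earlier in the thread.)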
Jeremy Epstein - GreenAsh
File archives in PHP
Just did a bit of digging around, and I have plenty of good news! The Zlib library (enabled by default on most PHP installations) supports reading and writing .gz files, and seems to be our best bet for allowing import and export of attachments in a single archive file (although a .gz file holds just one compressed stream, so bundling several attachments would also need a container format such as tar). There is also the Zip library (not enabled by default in PHP), which allows reading (but not writing) of standard zip files. And there is the Bzip2 library (also not enabled by default in PHP), which allows reading and writing of bz2 compressed files.
Looks like the default UI for attachments might be able to get away with handling .gz archives! Can anyone comment on whether it's safe to rely on Zlib being installed on the majority of PHP boxes?
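(For illustration, here is what packing several attachments into one gzip-compressed tarball and unpacking it again can look like. This is a Python sketch using the standard `tarfile` module, not PHP and not code from this project; the helper names `pack_attachments` and `unpack_attachments` and the sample file names are made up. It assumes the attachments are small enough to hold in memory.)

```python
import io
import tarfile

def pack_attachments(files):
    """Pack a {filename: bytes} mapping into a single .tar.gz blob."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # the tar header must state the member size
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

def unpack_attachments(blob):
    """Read a .tar.gz blob back into a {filename: bytes} mapping."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():  # skip directories and other special entries
                files[member.name] = tar.extractfile(member).read()
    return files

attachments = {
    "files/report.txt": b"quarterly numbers",
    "files/logo.png": b"\x89PNG placeholder bytes",
}
archive = pack_attachments(attachments)
restored = unpack_attachments(archive)
print(sorted(restored))
```

(The tar layer carries the file names and sizes, and gzip only compresses the resulting stream. A real importer would also want to validate member names before writing anything to disk, since an archive can contain paths that point outside the target directory.)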
Jeremy Epstein - GreenAsh
dman
Well, Dan, looks like you've become my unofficial 3rd mentor for this project. Congratulations, and thanks for your help and feedback so far!
Jeremy Epstein - GreenAsh
Sorry -- stupid comment
I'm editing this comment because it was stupid (wish I could delete it). I somehow got confused and wasn't seeing posts in this group later than June of last year. So I posted asking whether it was still alive, which obviously it is (now I see the posts after subscribing...) So, sorry for the spam.