Data APIs for Drupal 7 and web services support

Posted by nedjo

There's growing interest in the Drupal community in renewing our core data handling APIs. Doing so will increase consistency and efficiency and lower barriers to entry. It will also be a key step in enabling transactional web services.

In Drupal 6 we took some impressive first steps. What should we tackle for Drupal 7?

The attached paper, written by Nedjo Rogers and Henrique Recidive and sponsored by CivicSpace, aims to carry this discussion forward and map out both some conceptual space and concrete development tasks.

An accompanying post will present for review and discussion a proposed Active Record implementation for Drupal, taking advantage of features built into PHP 5.

Please, wade in and review, critique, and improve!

(I'm posting this to the Schema API and Services groups as the two most closely matching this topic. We could consider forming a new Data API group as well.)

Attachment: drupal_dataapi.pdf (156.73 KB)

Comments

Lazy loading -- no

Posted by chx

PHP magic methods are slow, and you want to add them to fundamental load functions? While I am really grateful for this work, I find it frightening that the mere existence of the Record module will now be used as an argument for how we should do all this. My experience with the installer was not pleasant, and that was a fairly independent subsystem. I have no desire to clean up or refactor a patch that affects all of core.

For example, Record uses class notation. This is not what we agreed on in Barcelona. The agreement was to try full-out OOP with file handling, since that is broken enough that a new implementation cannot break it further, then evaluate, and plan in the meantime. A good OOP framework is great, but if it is not good enough, you can't hack your way out the way you can with procedural code. There is also the argument that people who do not come from academia find OOP alien and strange, while those who do come from academia can still easily write procedural code. In other words, if you go full OOP you will raise the bar and lose a lot of contributors. I could be one of them; I do not know.

What we agree on here is the theory: yes, we definitely need to add relationships to the Schema API and build out CRUD around that.
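For readers who haven't looked at the Record module: "lazy loading" in this sense usually means deferring a load until a property is first read. In PHP 5 this is typically done with the `__get()` magic method. The minimal sketch below illustrates only that general pattern. The `LazyRecord` class and its loader callback are hypothetical and are not taken from the Record module. The closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical illustration of lazy loading via PHP's __get() magic method.
// This is not the Record module's actual code.
class LazyRecord {
  private $loader;        // Callable that fetches the full record.
  private $data = NULL;   // Filled on first field access.
  public $loadCount = 0;  // Counts how often the loader actually ran.

  public function __construct($loader) {
    $this->loader = $loader;
  }

  // PHP calls this only for properties that are not declared/accessible,
  // so every read of a record field goes through a method call.
  public function __get($name) {
    if ($this->data === NULL) {
      $this->data = call_user_func($this->loader);
      $this->loadCount++;
    }
    return isset($this->data[$name]) ? $this->data[$name] : NULL;
  }
}

$user = new LazyRecord(function () {
  // Stand-in for a database query.
  return array('uid' => 42, 'name' => 'remote user');
});
echo $user->name, "\n";  // First read triggers the load.
echo $user->uid, "\n";   // Served from the already-loaded data.
```

The structure shows the cost chx is pointing at. Every read of a field that isn't a declared property goes through a `__get()` call instead of a direct array or property access, and in this proposal that would happen in the most frequently called load paths.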

More

Posted by chx

I warned people in Barcelona that this path would be hard, because this time we need to write code, test, and benchmark, knowing that much of it will be thrown away no matter how much work went into it. It's simply not enough to draw up a skeleton and pry at it. I learned this the hard way with the menu system: if not for Peter Wolanin, I do not know what would have become of the new menu system, because there were just too many little (or not so little) things to find out and write. And this time, certain methods could make it impossible to implement those things. We are changing just too much at the same time.

Interesting

Posted by tjholowaychuk

Some really nice points made in the PDF. I'm no expert, but I definitely agree with your idea of centralizing all the calls for each object type. I imagine that would also reduce the learning curve slightly.

vision media

Issues as discussed in Barcelona

Posted by calebgilbert

Discussions held in Barcelona, including several active core contributors, concerning the direction of Drupal and OOP:

Part I:
[DrupalCon Barcelona 2007] Drupal and PHP 5

Part II:
[DrupalCon Barcelona 2007] Drupal and PHP 5 OOP

Record module and OOP only one example

Posted by nedjo

I had a good check-in with chx. Our paper has probably left the impression that we're set on OOP. Let me rush to clarify: not at all.

The key challenge we're up against can be summarized as:

  1. large and 'core' pieces of core need renewing
  2. the scope of change is such that much of this can't be done gradually.

To warrant sweeping change, we have to be very sure of the direction. Are we going to get it right in one go? No way. So how do we start?

Probably by clarifying the basic vision, and then digging into several potential implementations.

Henrique's draft Record module is not our answer to what to do for D7. Rather, it's an attempt to push forward the conversation by giving us something concrete to look at.

Should we be considering some basic OOP, if we can find a Drupal way of doing so? Should we consider some of the new possibilities in PHP 5? Certainly. Is this the only option, or necessarily the best one? Certainly not.

Henrique's Record module would be relatively easy to redo without the classes.

We're going to need some healthy false starts. Probably we need a few different approaches that we can tear apart, benchmark, and then selectively learn from. That's the spirit that the Record module is offered in.

Henrique will be posting a writeup on his Record module soon to give some better context.

Thanks for wading in, chx and others. Keep those critiques and comments coming!

False starts

Posted by chx

The big, big problem here is that we need more than starts. We discussed and agreed that the more promising starts will need to be carried all the way through: you will need to fork Drupal temporarily and try to redo most of the important parts of core, knowing that all the work might be in vain. Some of the work will also need to be redone, because core development won't stop in the meantime. Having been through two major changes (FAPI and menu) and two smaller but still quite big ones (installer, configurable content types), I must say it's never as easy as it seems.

Nicely put

Posted by nedjo

And you've got me thinking. Is there a way we can have one common patch that would support a number of implementations that we can develop and compare?

In the previous dataapi patch, http://drupal.org/node/113435#comment-249734, I roughed in the ability to override a default generic set of data API methods with a custom one:

<?php
// Allow for custom dataapi handling methods.
// Must come after file.inc.
$dataapi_include = variable_get('drupal_dataapi_include', '');
if ($dataapi_include && file_exists($dataapi_include)) {
  // Use the custom implementation named in the variable.
  require_once './' . $dataapi_include;
}
else {
  // Fall back to the default generic implementation.
  require_once './includes/dataapi.inc';
}
?>

Possibly we could do the same here. That is, we would (a) define a minimal spec that all the candidate Data API implementations would share, (b) patch/fork core to enable that, with a minimal default implementation, and then (c) develop the different implementations in parallel.
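Here is a minimal procedural sketch of steps (a) and (c). It assumes a single shared entry point that dispatches to whichever candidate implementation is configured. All function names are hypothetical. A static variable stands in for `variable_get()`/`variable_set()` so the example runs outside Drupal.

```php
<?php
// Hypothetical stand-in for variable_get()/variable_set(): stores the
// name of the active Data API implementation.
function example_dataapi_implementation($set = NULL) {
  static $impl = 'default';
  if ($set !== NULL) {
    $impl = $set;
  }
  return $impl;
}

// Minimal default implementation: reports what it would save.
function example_default_save($type, array $data) {
  return array('implementation' => 'default', 'type' => $type, 'data' => $data);
}

// A competing candidate implementation, developed in parallel.
function example_record_save($type, array $data) {
  return array('implementation' => 'record', 'type' => $type, 'data' => $data);
}

// The shared "minimal spec": callers only ever use this function.
function example_dataapi_save($type, array $data) {
  $function = 'example_' . example_dataapi_implementation() . '_save';
  if (!function_exists($function)) {
    // Fall back to the default, as the include override above does.
    $function = 'example_default_save';
  }
  return $function($type, $data);
}

$a = example_dataapi_save('node', array('title' => 'My content'));
example_dataapi_implementation('record');
$b = example_dataapi_save('node', array('title' => 'My content'));
```

Because callers depend only on the shared entry point, each candidate can be benchmarked against the same calling code simply by switching the configured name.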

Document process

Posted by Chris Charlton

Are there plans to formalize documentation or development specifications? At least outlines of (potential) APIs/functions, UML, or even wikis, for core team members and people looking to contribute?

Chris Charlton, Author & Drupal Community Leader, Enterprise Level Consultant

I teach you how to build Drupal Themes http://tinyurl.com/theme-drupal and provide add-on software at http://xtnd.us

Most of all: thanks

Posted by chx

I was just too scared of the OOP code to properly express my gratitude for getting the ball rolling. I am sure that in the end we will have a great Drupal 7! Keep the code coming :)

Posted by Amazon

Web applications are becoming increasingly interactive. Drupal has adapted to this need by incorporating richer interactivity with jQuery. The next generation of rich Internet applications will take advantage of technologies like Flex, Silverlight, and XUL. How can Drupal support these rich applications?

First, we need consistent external APIs that can interact with this new layer. To have good external APIs, we need consistent internal APIs. Many software services are beginning to expose their own data APIs (Microsoft Project Astoria, Google's GData, and the Facebook Data API, to name a few).

Second, rich applications will take advantage not only of data inside Drupal but also of external data sources, as in mash-ups. Mash-ups will also require these consistent APIs and support for web services.

Dries has laid out seven steps for a killer Drupal 7 release. Good data APIs are a foundation for four of them: better UIs like DabbleDB, better internal APIs, better external APIs, and web services support.

Cheers,
Kieran

To seek, to strive, to find, and not to yield

New Drupal career! Drupal profile builders.
Try pre-configured and updatable profiles on CivicSpaceOnDemand

AMFPHP is golden

Posted by toursheet

The Services API is golden with AMFPHP: easy interaction between Flex and Drupal modules! We're using it for a ton of calls, including mash-ups.

http://drupal.org/project/amfphp

OpenSocial: Data API

Posted by Amazon

More reasons why a consistent data API is necessary to keep open source a viable alternative to Software as a Service, with open standards.

http://code.google.com/apis/opensocial/docs/gdata/people/developers_guid...

Great contribution

Posted by moshe weitzman

It will take many heroic efforts like this white paper to get us where we want to be. Thanks for this.

  • Puzzle #5: seems like a nice to have, not really a fundamental principle.

The paper is spot on. Let's do it.

One other module to note is the Object driver. It is an extremely simple set of CRUD functions. I think it is immediately grokkable by all, which is a great thing.

Puzzle #5

Posted by nedjo

Puzzle #5: seems like a nice to have, not really a fundamental principle.

Here's what we have for #5:

Create and update operations can accept data without IDs set and respond appropriately. For example, we can save a node with an associated user (author), that user being identified not by ID but by an array of properties. If the user does not exist, she/he will be created as a user. If the user does exist already, the corresponding user id will be saved.

Agreed that this is not strictly necessary. But consider our current approaches for distinguishing create from update transactions. In most cases, we do so by the presence or absence of an ID value. If the ID field is filled, we assume an update. If not, it's create. (Or, quirkily, we assume update if both an ID field and another identifying field are filled, and delete if an ID field is filled but another identifying field is not.)
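The ID-based convention described above can be sketched in a few lines. `example_save_operation()` and its `$id_key` parameter are hypothetical names, not an existing core function, and the delete quirk is not modeled.

```php
<?php
// Hypothetical sketch of the current convention: an empty ID means
// "create", a filled ID means "update". $id_key names the ID field.
function example_save_operation(array $record, $id_key) {
  // empty() also treats a missing key, 0, '' and NULL as "no ID".
  return empty($record[$id_key]) ? 'create' : 'update';
}
```

The weakness for data exchange follows directly. A record arriving from another site carries no local ID, so under this rule it is always treated as a create, even when a matching item already exists.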

My feeling is that, to really enable sharing of data across instances and platforms, we need to free ourselves from having to know specific IDs.

Mobility of structured data - freedom from the specific IDs of a given site - is a key need when we're aiming for seamless intersite data pooling. But because we're stuck on IDs, currently we need whole new systems for importing or exporting data (witness the incredibly complex Import/Export API).

Consider the task of externally publishing/creating a series of nodes over time by a given user on a given site. A user is uniquely identified by username and mail, and together these two fields are enough to create a new user record (assuming for the sake of argument that there are no other custom required user fields).

Currently, we would need to approach this in a series of steps, e.g.,

  1. Determine if the user exists. If so, get the user's ID.
  2. If not, create a new user, and get the ID.
  3. Create the node, assigning it the ID of the user.
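For comparison, the three steps above might look like this against an in-memory stand-in for the users and node tables. The `example_*` helpers and the sequential ID generation are assumptions made so the sketch runs on its own. Real code would go through the database layer.

```php
<?php
$users = array();  // uid => account; stand-in for the users table.
$nodes = array();  // nid => node; stand-in for the node table.

// Step 1: look up an existing user by username and mail.
function example_find_uid(array $users, $username, $mail) {
  foreach ($users as $uid => $account) {
    if ($account['username'] === $username && $account['mail'] === $mail) {
      return $uid;
    }
  }
  return NULL;
}

// Step 2: create the user and return the new ID.
// Sequential IDs are an assumption that only holds with no deletions.
function example_create_user(array &$users, $username, $mail) {
  $uid = count($users) + 1;
  $users[$uid] = array('uid' => $uid, 'username' => $username, 'mail' => $mail);
  return $uid;
}

$uid = example_find_uid($users, 'remote user', 'remote@example.com');
if ($uid === NULL) {
  $uid = example_create_user($users, 'remote user', 'remote@example.com');
}

// Step 3: create the node, assigning it the user's ID.
$nid = count($nodes) + 1;
$nodes[$nid] = array('nid' => $nid, 'title' => 'My content', 'uid' => $uid);
```

Every caller that publishes content this way has to repeat the lookup-or-create dance itself.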

Instead, I'd like to be able to feed something like the following:

<?php
$node = array(
  'title' => 'My content',
  '#user' => array(
    'username' => 'remote user',
    'mail' => 'remote@example.com',
  ),
);
?>

The #user array is processed first. If there is an existing user, that account is matched (according to defined matching criteria) and saved (if it has any changed parameters). If it is not matched, a new account is created. In either case, the uid value is added to the #user array, so that it is present when the node is saved.
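Below is a rough sketch of that pre-processing step. It assumes username plus mail as the matching criteria and uses in-memory tables as a stand-in for the database. The `example_*` function names are hypothetical. Copying the resolved uid onto the node itself is an additional assumption about how the node save would consume it.

```php
<?php
// Hypothetical: match an existing account on username + mail, or create one.
function example_resolve_user(array &$users, array $account) {
  foreach ($users as $uid => $existing) {
    if ($existing['username'] === $account['username'] && $existing['mail'] === $account['mail']) {
      // Matched: merge in any changed properties, keeping the stored uid.
      $users[$uid] = array_merge($existing, $account, array('uid' => $uid));
      return $uid;
    }
  }
  // Not matched: create a new account (sequential IDs for the sketch only).
  $uid = count($users) + 1;
  $users[$uid] = array_merge($account, array('uid' => $uid));
  return $uid;
}

// Hypothetical node save that processes #user before saving the node.
function example_node_save(array &$nodes, array &$users, array $node) {
  if (isset($node['#user'])) {
    // Add the uid to the #user array so it is present at save time.
    $node['#user']['uid'] = example_resolve_user($users, $node['#user']);
    $node['uid'] = $node['#user']['uid'];
  }
  $node['nid'] = count($nodes) + 1;
  $nodes[$node['nid']] = $node;
  return $node;
}

$users = array();
$nodes = array();
$node = array(
  'title' => 'My content',
  '#user' => array('username' => 'remote user', 'mail' => 'remote@example.com'),
);
$first = example_node_save($nodes, $users, $node);   // Creates the user.
$second = example_node_save($nodes, $users, $node);  // Reuses the same user.
```

The same `$node` array can be fed repeatedly from another site without it ever knowing the local uid.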

In the conceptual code at http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/dataapi/dat...
I suggested a matching mode for save operations:

<?php
/**
 * @param $mode
 *   A constant representing the loading mode. Values are:
 *   - DATAAPI_MATCH_MODE_LOOSE. To determine if the item exists, match by
 *     registered #match_field values.
 *   - DATAAPI_MATCH_MODE_STRICT. Only match an existing item by ID.
 */
?>
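The sketch below shows one way the two modes might behave. The constant values, the `example_match()` helper, and the rule that loose matching requires every registered match field to agree are all assumptions. None of this is the dataapi module's actual code.

```php
<?php
// Hypothetical values; only the constant names come from the docblock.
define('DATAAPI_MATCH_MODE_LOOSE', 1);
define('DATAAPI_MATCH_MODE_STRICT', 2);

/**
 * Returns the ID of an existing item that $item matches, or NULL.
 *
 * $items maps ID => stored item, $id_key names the ID field, and
 * $match_fields lists the registered #match_field values for the type.
 */
function example_match(array $items, array $item, $id_key, array $match_fields, $mode) {
  // Both modes accept an ID that refers to a stored item.
  if (!empty($item[$id_key]) && isset($items[$item[$id_key]])) {
    return $item[$id_key];
  }
  // Strict mode matches by ID only; loose mode needs fields to compare.
  if ($mode === DATAAPI_MATCH_MODE_STRICT || empty($match_fields)) {
    return NULL;
  }
  // Loose mode: every registered match field must agree.
  foreach ($items as $id => $existing) {
    $all_match = TRUE;
    foreach ($match_fields as $field) {
      if (!isset($item[$field], $existing[$field]) || $item[$field] !== $existing[$field]) {
        $all_match = FALSE;
        break;
      }
    }
    if ($all_match) {
      return $id;
    }
  }
  return NULL;
}
```

Returning NULL rather than creating anything keeps the decision about whether to create or update with the caller, so the same helper could serve both save and load paths.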

The basic idea: if we build support for optional "loose" matching into our Data APIs from the start, we get a whole bunch of interoperability practically for free, avoiding the mess of building a whole new set of parallel APIs for data exchange as opposed to local data handling.

Unique keys

Posted by recidive

For the example you provided, we could use 'unique keys'. The user table has 'name' as a unique key, and this suffices to match a user without a primary key (uid). I guess we could add 'email' as a unique key as well.

I don't think every object can be matched without a primary key. How could we match a node? Node titles are not unique. I saw there is a DATAAPI_MATCH_BEHAVIOR_ALL flag on the node table definition in the dataapi module. So should we match a node only if all the fields we provide match, e.g. a combination of node title, node type, and uid?
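The two behaviors raised here could be expressed as a single comparison function, sketched below. Only the `DATAAPI_MATCH_BEHAVIOR_ALL` name comes from the dataapi module. Its value, the companion `DATAAPI_MATCH_BEHAVIOR_ANY` constant, and `example_fields_match()` are hypothetical. Under "any", a single matching unique field such as `name` identifies a user. Under "all", a node matches only when title, type, and uid all agree.

```php
<?php
// Hypothetical values; DATAAPI_MATCH_BEHAVIOR_ANY is an assumed companion.
define('DATAAPI_MATCH_BEHAVIOR_ANY', 'any');
define('DATAAPI_MATCH_BEHAVIOR_ALL', 'all');

// Does stored $existing match incoming $item on $fields?
function example_fields_match(array $existing, array $item, array $fields, $behavior) {
  $hits = 0;
  foreach ($fields as $field) {
    if (isset($item[$field], $existing[$field]) && $item[$field] === $existing[$field]) {
      $hits++;
    }
  }
  if ($behavior === DATAAPI_MATCH_BEHAVIOR_ALL) {
    // Every listed field must be supplied and equal.
    return $hits > 0 && $hits === count($fields);
  }
  // "Any": one matching field (e.g. a unique key) is enough.
  return $hits > 0;
}
```

With "any" matching, a user whose mail has changed is still found by `name`, and the save then updates the mail. That is only safe because `name` is unique.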

Jotspot Data Model and DATAAPI

Posted by Amazon

Google now has two data models since its acquisition of JotSpot. One is the JotSpot data model:

http://www.slideshare.net/scottmcmullan/introducing-the-jotspot-data-mod...

The other is GData:

http://code.google.com/apis/gdata/overview.html

Kieran

great for RIA

Posted by g10

For RIA (flex, flash, silverlight…) development, this is great!

The main disadvantage at the moment is that several methods return skinned/styled data (with HTML included). This makes the data unusable in any view other than HTML, so you can't build a service from it.
A Data API would solve this, as would consistent separation of logic and view, even without OOP.
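Here is a minimal illustration of the separation being asked for. The data function returns a plain array, and HTML is produced only in a separate rendering step, so a service can serialize the same array. The function names are hypothetical. `json_encode()` (available since PHP 5.2) merely stands in for whatever format a service layer such as AMFPHP would actually use.

```php
<?php
// Hypothetical data function: returns structured data, no markup.
function example_node_data($nid) {
  // Stand-in for a database load.
  return array('nid' => $nid, 'title' => 'My content & more');
}

// Hypothetical rendering step: the only place HTML is produced.
function example_node_html(array $node) {
  return '<h2>' . htmlspecialchars($node['title'], ENT_QUOTES, 'UTF-8') . '</h2>';
}

$node = example_node_data(1);
$html = example_node_html($node);  // For the web page.
$json = json_encode($node);        // For a Flex/Flash/Silverlight client.
```

Both outputs come from the same array, so a new client format needs only a new serializer, not a new API.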

Still, is this the right approach? If OOP (and MVC patterns) were used, most of this work would already be done. There would be no need to implement a separate API, because the public methods of the controller would form the API. That would make it much more useful for building services.

Anyhow, a transition to full OOP doesn't happen in one night, so a Data API would be a nice intermediate step. And if modules implemented their own Data APIs as well, that would be a great improvement for building services.

a transition to full OOP doesn't happen in one night

Posted by chx

Who said it will happen in any number of nights? Does anyone read or listen to what I am saying, or am I just making an ass out of my mouth?

--

Posted by g10

OK, my bad… I skimmed through this thread and got excited by the combination of the words 'Drupal' and 'OOP' ;)
(which can happen when such a thread is posted in a group of mainly Flex/Flash devs…)

Help Drupal adoption - migration and importing

Posted by Amazon

From Dries's "State of Drupal" survey:

What did you use before Drupal?
Home grown CMS: 42.1%
WordPress: 23.5%

A Data API could help improve importing data and improve the adoption of Drupal.

Kieran

Puzzle #5 needs database introspection

Posted by Chris Johnson

To properly implement solutions to puzzle #5 (key, no key) and not completely kill Drupal's performance, the code will need to know the best attributes to use against the datastore. This seems to imply either an in-core schema accurately reflecting the database implementation, or database introspection.
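As a sketch of the schema-based route: a Schema API-style definition already records which columns can identify a row, in its `'primary key'` and `'unique keys'` entries. The hypothetical helper below picks the first such key whose columns are all present in the incoming data. The trimmed `users` definition mirrors the `name` unique key recidive mentions. Because unique keys are backed by unique indexes, a lookup on the chosen columns stays cheap.

```php
<?php
// Trimmed Schema API-style definition (field definitions omitted).
$schema = array(
  'users' => array(
    'primary key' => array('uid'),
    'unique keys' => array('name' => array('name')),
  ),
);

// Hypothetical: returns the columns of the first usable key (primary key
// first, then unique keys) that are all present in $data, or NULL.
function example_identifying_key(array $table, array $data) {
  $candidates = array($table['primary key']);
  if (!empty($table['unique keys'])) {
    foreach ($table['unique keys'] as $columns) {
      $candidates[] = $columns;
    }
  }
  foreach ($candidates as $columns) {
    // Usable only if no column of the key is missing from $data.
    if (count(array_diff($columns, array_keys($data))) === 0) {
      return $columns;
    }
  }
  return NULL;
}
```

When no key is usable (for example, mail alone for users), the code can refuse to match rather than fall back to an unindexed scan.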

Since this was authored, we

Posted by moshe weitzman

Since this was authored, we now have a fields-in-core patch and a PDO DB layer. Is anyone interested in reviving this?
