Drupal Data API Design Sprint, Day 2

Posted by nedjo on February 6, 2008 at 6:23am
Last updated by bjaspan on Sun, 2008-02-10 03:32

On this second day, we:

Defined use cases for Drupal as a web development platform, as opposed to a CMS
Elaborated a set of target outputs for the design sprint
Further developed key solution components and their implications for Drupal design

Use cases: Drupal as a web development platform

What should Drupal as a web development platform be able to do? By answering this question, we both clarify the rationale for a renewed data API and bring into clearer focus some key aspects and requirements of that API.

High-level use cases:

Stand-alone web site
Web Services server
Web Services client
Legacy/proprietary front-end (e.g., pre-existing database)

What Drupal provides is an infrastructure that lets a large number of contrib authors extend what is otherwise a very standard web services stack.

Target outputs

Plan for next six months
Outline of minimal CCK in core
Breakdown between existing CCK core/contrib
Broader long term vision for data model
Next steps after minimal CCK in core, and who could do them

Key challenges and solution components

Multiple (local and external) data stores

When we support external data stores in addition to local ones, we get a multiplicity of possible sources for a given object. Some examples:

1. A page that is a local blob of data.
2. A user representing a human being with access to the system.
3. Events in a third-party system that we add no value to.
4. Artwork on a remote server that we are not storing data for, but that we want users to be able to, e.g., comment on. That is, while the object itself is external, we are storing local data linked to this external object.

(See attached image below.)

Local representation of remote entities

Problem: if an entity is external, when and how do we represent it locally?

Taking example 4, we’re storing a comment that needs to be tied to something. What do we tie it to?

We need a local identifier for this external object. Accordingly, we need a method, e.g., get_id(). We might store the external object ID (say, a URL) directly, or we might create a local linking table in which a local ID references the external one.

We need to have an ID and metadata. We need to be able to tell that this is my object and it has these fields. Whether the fields are loaded locally or from an external source matters less.

In example 3, above, we have events that we provide no value to, so we can say that Drupal simply represents them. Do those remote data have a unique ID in our local system?

It seems that we need to create an ID for an external object when a user decides they want to comment on it.
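
The linking-table option could be sketched roughly as follows. This is an illustration in Python, not Drupal code: the class ExternalIdMap, the add_comment() helper, and the example URL are all hypothetical, and only the get_id() name comes from the discussion above. The sketch assumes a local ID is created lazily, the first time something local (here, a comment) needs to point at the external object.

```python
import itertools


class ExternalIdMap:
    """Hypothetical linking table: (entity type, external ID) -> local ID."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._local_ids = {}

    def get_id(self, entity_type, external_id):
        """Return the local ID for an external object, creating it on first use."""
        key = (entity_type, external_id)
        if key not in self._local_ids:
            self._local_ids[key] = next(self._next_id)
        return self._local_ids[key]

    def lookup(self, entity_type, external_id):
        """Return the local ID if one was ever assigned, else None."""
        return self._local_ids.get((entity_type, external_id))


id_map = ExternalIdMap()
comments = []  # locally stored comments, each tied to a local ID


def add_comment(entity_type, external_id, text):
    """Store a comment locally against the external object's local ID."""
    local_id = id_map.get_id(entity_type, external_id)
    comments.append({"target": local_id, "text": text})
    return local_id
```

External objects that nobody has commented on never receive a local ID in this sketch, which matches the idea that an ID is only created when a user decides to comment.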

Entity operations set

Problem: What is the basic set of operations that an entity type (local or remote) may need to support?

We currently have a data model for our SQL content that we call nodes. That data model provides a particular object structure, with a set of operations. E.g., load and save nodes, search nodes. We can search by properties (e.g., author) and by keyword.

Returning to our example of an external artwork entity type as loaded from a web service, the artwork model needs to provide a number of operations.

search
get ID
load
possibly, save

If we search over nodes and artwork, we execute two searches, since we have two separate data sources, and then merge the results. If a user wants to vote or comment on an external item, we need the identifier for that item so that we can link the vote or comment to it.
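
A two-source search with merged results might look something like this Python sketch. The function names, the sample data, and the choice to sort merged results by title are assumptions made for illustration; the notes do not specify how results would be ranked.

```python
def search_nodes(keyword):
    """Stand-in for a local, SQL-backed node search (data is invented)."""
    nodes = {1: "Gallery opening", 2: "Staff picnic"}
    return [("node", nid, title) for nid, title in nodes.items()
            if keyword.lower() in title.lower()]


def search_art(keyword):
    """Stand-in for a search against a remote artwork web service."""
    art = {"http://example.com/art/7": "Gallery at Dusk"}
    return [("art", url, title) for url, title in art.items()
            if keyword.lower() in title.lower()]


# One search function per data source.
SEARCH_BACKENDS = [search_nodes, search_art]


def search_all(keyword):
    """Run one search per data source and merge into a single sorted list."""
    results = []
    for backend in SEARCH_BACKENDS:
        results.extend(backend(keyword))
    # Sorting by title is an arbitrary choice for this sketch.
    return sorted(results, key=lambda r: r[2].lower())
```

Each result carries its entity type and its identifier, so a later vote or comment can be tied back to the right item regardless of which source it came from.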

Entities have to have a collection of properties. In the interface paradigm, the entity interface specifies that you have to have an ID and a title, plus more. Nodes currently implement our SQL-based model, i.e., the SQL interface. Artwork will need to implement the entity interface. E.g., Views will need to be able to draw on anything that implements the entity model, not just SQL items.

So we end up with different types of entity that we can feed into our general API calls, e.g.:

drupal_load('node', $id);
drupal_load('art', $id);

An entity may be entirely local, entirely external, or a hybrid.
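
To make the interface idea concrete, here is a minimal Python sketch of a generic load call that dispatches on entity type. It is not Drupal code: the class names, the registry, and the sample records are hypothetical, and only load and the drupal_load name are taken from the notes.

```python
from abc import ABC, abstractmethod


class EntityType(ABC):
    """Operations any entity type, local or remote, is expected to offer."""

    @abstractmethod
    def load(self, entity_id):
        """Return a dict with at least 'id' and 'title', or None."""


class NodeType(EntityType):
    """Local entities; a dict stands in for the SQL tables."""

    _rows = {1: {"id": 1, "title": "About us"}}

    def load(self, entity_id):
        return self._rows.get(entity_id)


class ArtType(EntityType):
    """External entities; a dict stands in for a web service response."""

    _remote = {"a7": {"id": "a7", "title": "Gallery at Dusk"}}

    def load(self, entity_id):
        return self._remote.get(entity_id)


# Hypothetical registry of the entity types known to the site.
ENTITY_TYPES = {"node": NodeType(), "art": ArtType()}


def drupal_load(entity_type, entity_id):
    """Dispatch a generic load call to whichever entity type owns the ID."""
    return ENTITY_TYPES[entity_type].load(entity_id)
```

A caller such as Views would only depend on the shared interface, so it would not need to know whether a given type is backed by SQL or by a remote service.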

External data store metadata and operations: attached to fields or centrally pooled?

Problem: in the case that multiple fields refer to the same external data source, do they need to store metadata independently and make separate transactions?

Example: two fields, one price and other ISBN, both loaded from an Amazon record. Does each field need to store how to fetch this Amazon record? When the record is loaded, are there two separate requests for data?

Possibly, rather than being loaded in series, all fields need to be evaluated and then smart loading executed based on knowledge of common sources.

Likely this problem is approached in a way similar to what we will do with SQL-based fields.
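
The "evaluate all fields first, then load smartly" idea could be sketched as below. This is an assumption-laden Python illustration: the field definitions, the fetch_amazon_record() stub, the record ID, and the returned values are all invented, and no real Amazon API is being called.

```python
from collections import defaultdict

fetch_log = []  # records each simulated remote request


def fetch_amazon_record(record_id):
    """Stand-in for a web service call; the returned data is invented."""
    fetch_log.append(record_id)
    return {"price": 19.99, "isbn": "0000000000"}


# Hypothetical registry mapping a service name to its fetch function.
FETCHERS = {"amazon": fetch_amazon_record}

# Each field names its source and the key it wants from that source's record.
FIELDS = {
    "price": {"source": ("amazon", "B001"), "key": "price"},
    "isbn": {"source": ("amazon", "B001"), "key": "isbn"},
}


def load_fields(fields):
    """Group fields by source, fetch each source once, then fill in values."""
    by_source = defaultdict(list)
    for name, spec in fields.items():
        by_source[spec["source"]].append(name)

    values = {}
    for (service, record_id), names in by_source.items():
        record = FETCHERS[service](record_id)  # one request per distinct source
        for name in names:
            values[name] = record[fields[name]["key"]]
    return values
```

With both fields pointing at the same record, only one request is made, and the knowledge of how to fetch the record lives in one place rather than on each field.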

A field type for every data store?

Problem: We have field types. Currently we have, e.g., text fields and number fields. Is the storage mode part of the field type, so that we would have a text database field type, an external text field type, etc.? Or is the storage mode a setting of the field, like the formatter?

A concrete example: a locally stored book item, with a title, an author, and a price that is loaded from an external web service. How is this third field implemented? As a number field type? As an Amazon price field type?

Three options:

The data source is a property of a field type that can be set.
One field type extends another field type.
Field types are grouped into classes, such that there is, e.g., a 'number' class that might include various number field types. A renderer would describe itself as supporting not a finite set of field types but all field types of this class ('number').

Initial inclination is towards the third approach.
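
The third option might be sketched like this in Python. The field type names, the class names, and the formatting rules are hypothetical; the point is only that a renderer registers against a class of field types rather than against each type individually.

```python
# Hypothetical registry: each field type declares the class it belongs to.
FIELD_TYPES = {
    "number_integer": "number",
    "amazon_price": "number",
    "text_local": "text",
}


def render_number(value):
    """Render any field of class 'number' with two decimal places."""
    return f"{value:,.2f}"


def render_text(value):
    """Render any field of class 'text' as a plain string."""
    return str(value)


# Renderers register against a class, not against individual field types.
RENDERERS = {"number": render_number, "text": render_text}


def render_field(field_type, value):
    """Look up the field type's class and use that class's renderer."""
    field_class = FIELD_TYPES[field_type]
    return RENDERERS[field_class](value)
```

Under this arrangement, adding a new externally backed number field type means adding one entry to the registry; existing number renderers pick it up without changes.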

Modelling external data sources

Problem: Should we have an equivalent of the schema API for each external data source?

Example: Amazon SimpleDB. Each row may have a different set of attributes. If there's a web service that gives you access to that kind of data, we can't model that in the schema API.

Initial inclination: leave any data structure knowledge to the individual modules implementing entities.

Request formats

Our requests come currently in several forms:

path
additional GET variables
POST operations
cookie data

To support web services and multiple rendering formats, we will need to add support for further request formats. For example, a web service may need to accept incoming XML requests (through a raw POST body).

Multiple formats and the menu router system

We may need to create Drupal pipes. Currently the pipeline is three steps:

object loader, load object
call page callback
feed to theme('page')

It may help to ask whether we need additional steps, or whether we need to make this a configurable pipeline. It's not impossible. E.g., the idea of making the page callback an array has come up before, but so far it hasn't made sense. Now we may see a powerful use for it: given an array of callbacks, call the first, pipe its result to the second, and pipe that result to the third.

Currently, we have node_load, callback, theme('page'). Rather than fixing these steps under given names, we may want to allow each item to declare its own array of handlers, providing a lot more flexibility.
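
A configurable pipeline of this kind could look roughly like the following Python sketch. The route table, the render_json() handler, and the sample node are assumptions for illustration; node_load, the page callback, and the theme step mirror the three steps named above but are simplified stand-ins, not Drupal's functions.

```python
import json


def node_load(node_id):
    """Stand-in loader; a real one would read from storage."""
    return {"id": node_id, "title": "About us"}


def page_callback(node):
    """Build structured page content from the loaded object."""
    return {"heading": node["title"], "node_id": node["id"]}


def theme_page(content):
    """Render the content as HTML."""
    return f"<h1>{content['heading']}</h1>"


def render_json(content):
    """Alternative final step: render the same content as JSON."""
    return json.dumps(content, sort_keys=True)


# Each router item declares its own pipeline instead of three fixed steps.
ROUTES = {
    "node/%": [node_load, page_callback, theme_page],
    "node/%/json": [node_load, page_callback, render_json],
}


def run_pipeline(path, argument):
    """Feed the argument to the first handler, then pipe each result onward."""
    result = argument
    for handler in ROUTES[path]:
        result = handler(result)
    return result
```

Because only the last handler differs between the two routes, supporting another output format is a matter of declaring a different final step rather than writing a new page callback.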

Understanding what's in node module

Problem: if entities are a data model that can be implemented by various mechanisms, some of what is currently in node module will need to move to the entity level.

What happens in nodeapi?

Problem: if we’re going to provide superior ways of registering fields to nodes and potentially deprecate some of the operations we have, we need to ensure we replace any key functionality.

Take nodeapi 'load' as an example. In this op currently we can:

add to the node object
modify existing node attributes
fire additional actions

Clearly, fields will cover the first of these cases. By that point, fields should be rich enough to cover it, though we may keep support for other, non-field additions to the node object through the existing hook. The advice may be: please use the field API approach unless there is a good reason to do otherwise.

Taking the second case, should we still be able to modify existing node attributes? This is possible currently in nodeapi load only if modules are "misbehaving" by modifying the node reference directly rather than returning an array of new properties. So, likely, we won't need to explicitly support what is not officially available now.
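
One way the "additions only" contract could be enforced is sketched below in Python. This is a hypothetical design, not a description of how nodeapi behaves today: the hook names and data are invented, and the choice to silently ignore attempts to replace an existing attribute is an assumption.

```python
def taxonomy_load(node):
    """A well-behaved hook: returns new properties and leaves the node alone."""
    return {"terms": ["news"]}


def stats_load(node):
    """Another well-behaved hook adding a single property."""
    return {"views": 42}


def rogue_load(node):
    """Tries to override an existing attribute through its return value."""
    return {"title": "Hijacked"}


LOAD_HOOKS = [taxonomy_load, stats_load, rogue_load]


def invoke_load(node):
    """Merge each hook's additions; existing attributes are never replaced."""
    for hook in LOAD_HOOKS:
        additions = hook(dict(node))  # pass a copy so hooks cannot mutate the node
        for key, value in additions.items():
            node.setdefault(key, value)
    return node
```

Passing a copy closes off the "modify the reference directly" route, and setdefault() keeps the original title even when a hook returns a conflicting value.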

Types and subtypes

What entities can have subtypes?

Problem: currently we have one entity type, nodes, that takes subtypes, and a model in which fields are associated with these subtypes. Can other entity types (e.g., user) have subtypes?

Assume that we have two classes of users on a given site, staff members and the general public. For staff members, we have specific authentication information (e.g., an NT domain). One way to handle this might be as user subtypes. That is, we have two subtypes of users, staff and public, each with their own fields.

As another example (to follow an earlier scenario), take the comment entity type. An entity type is defined by its unique behaviours or characteristics. In this case, a comment has threading functionality. This is part of what makes it a comment.

In our example, there are two subtypes of comment. The first is a video comment, which has a nodereference field (a reference to the video being commented on). The second is a normal comment, which has a text area. We could handle the first as a video comment entity subtype, defined by the fields it has.

In this view, yes, non-node entity types can have subtypes.

Two potential models

While we can draw broad conclusions, we don't know enough yet to have a clear consensus on the structure of entity types, subtypes, and associated fields. We elaborated two alternate architectures.

Model 1: Entities define a key-space, and a set of Fields. Different Entities may define that their key-space is internally or externally controlled. In this model, almost everything becomes a "Node"/Entity, but Users are not Entities.

Model 2:

"Entity" is a generic interface that all Entity Types can be expected to offer. An Entity Type (such as node, user, file, comment, Thing-in-3rd-party-system, etc.) defines:
  a key-space (an ID that is unique within a given Entity Type, which may be locally sequenced in the case of nodes or externally sequenced in the case of a read-only external data source),
  some set of behaviors (methods?),
  a set of properties(?),
  one or more Sub Types.
A Sub Type (such as a node type) defines a set of Fields that are present on that Sub Type.
Each Field may be implemented to have local data, remote data, or calculated data (from the rest of the Entity)?
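
Model 2 could be expressed as a small set of data structures, sketched here in Python. The class and attribute names are hypothetical, behaviors and properties are left out because the notes mark them as open questions, and the staff/public user example is borrowed from the subtype discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """A field on a Sub Type; storage is 'local', 'remote', or 'calculated'."""
    name: str
    storage: str


@dataclass
class SubType:
    """A Sub Type defines the set of Fields present on it."""
    name: str
    fields: list = field(default_factory=list)


@dataclass
class EntityTypeDef:
    """An Entity Type defines a key-space and one or more Sub Types."""
    name: str
    key_space: str  # 'local' or 'external' sequencing of IDs
    sub_types: dict = field(default_factory=dict)

    def add_sub_type(self, sub_type):
        self.sub_types[sub_type.name] = sub_type

    def fields_for(self, sub_type_name):
        """Return the names of the Fields defined on the given Sub Type."""
        return [f.name for f in self.sub_types[sub_type_name].fields]


user = EntityTypeDef("user", key_space="local")
user.add_sub_type(
    SubType("staff", [Field("name", "local"), Field("nt_domain", "remote")])
)
user.add_sub_type(SubType("public", [Field("name", "local")]))
```

Here user.fields_for("staff") yields both fields while user.fields_for("public") yields only the name, showing fields attached to subtypes of a non-node entity type.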

DrupalCon proposal (go vote for this!): http://boston2008.drupalcon.org/session/future-fields

Agenda

Wednesday, February 6

We will begin by defining in detail the pieces of a minimal initial implementation of CCK fields in core, then look at next steps. We will begin at around 11:00, with Larry joining us at 1:00.

Thursday, February 7

Karen, Yves, and Nedjo available until early afternoon for followup work and planning.

Friday, February 8 and Saturday, February 9

Nedjo, Karoly, and Larry to dig into initial code sketches.

Attachment	Size
data_hydra.gif	21.91 KB