Much has been said (http://groups.drupal.org/node/9297) about how fields should be structured in D7 core, what aspects of fields can be changed, and how those changes are implemented. It is (past) time to move forward on implementing fields in core and in this post I am proposing an answer.
Disclaimer: I make a lot of declarative statements in this post. Obviously I do not have unilateral authority; this is just a proposal.
Field storage
Contrary to what we "decided" at DADS, D7 will continue to support per-content-type and per-field storage as we already do. I spelled out my reasons at http://groups.drupal.org/node/9297#comment-37050. The short summary is "performance"; until someone measures otherwise, we're assuming that fewer joins are more efficient than more joins. Since we already have the code to support both storage methods, the implementation cost is low.
The Fields API
At DADS, we declared/clarified some basic concepts:
- A Field Type is a data type implemented by a module: text, nodereference, address, etc. The module defines semantics and functionality.
- A Field is a specific configuration of a type. Fields have settings that define the unique characteristics of that field. All fields share some properties, such as: field type, name, cardinality (currently "multiple"), sharable. A Field Type can define additional properties; for example, the Text field type defines the property "formatted", which can be FALSE (plain text) or TRUE (user chooses input format).
- A Field Instance is the binding of a Field to a Content Type. Field Instances have Settings that relate to the association, such as: display name (label), weight, input widget, display format.
A field instance's settings do not affect the field's underlying data and can be changed without altering the field. By contrast, a Field's settings cannot be changed because changing any of them constitutes fundamentally altering the field's identity. However, a field can be "migrated" to a new field (of the same or different field type) to provide new functionality or semantics for the existing data. More on this later.
D6 CCK created the first version of the Fields API. It is a big step forward but does not cleanly separate the concept of Field and Field Instance. We currently have the function:
<?php
function content_field_instance_create($field);
?>
The $field argument contains information defining both the Field and the Field Instance. (Almost) any property of a field or setting can be specified. create() will create the Field if it does not exist and then create the Field Instance; if some Field Instance settings are not provided, they will be inherited from an existing Field Instance if one exists, from the Field settings, and from system-wide defaults. If a non-shared Field Instance of the Field already exists, the Field is immediately converted from a non-shared field to a shared field, requiring a change in database schema. And so forth. content_field_instance_update() is similar. A lot happens inside these functions and I think their complexity is keeping us in a straitjacket.
In D7, the Field API will cleanly separate Fields from Field Instances. We will have two create functions:
<?php
function content_field_create_field($field);
function content_field_create_instance($instance);
?>
create_field()'s $field will contain the field name and field type, and can accept all global (e.g. field type, name, required, cardinality, sharable) and type-specific (e.g. formatted) field settings. create_field() will probably do nothing but store the information in the content_field table.
create_instance()'s $instance will contain the field name, the content type to bind to, and all global (e.g. label, weight) and type-specific (???) per-instance settings. create_instance() cannot change any Field settings (e.g. sharable, cardinality); those settings are simply meaningless in the $instance argument. create_instance() also imposes the Field constraints; for example, if you try to create a second instance of a field created as non-sharable, create_instance() will fail.
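To make the split concrete, here is a minimal in-memory sketch of how the two create functions might divide the work. Everything in it is an assumption for illustration: the sketch_* function names, the array keys (field_name, type_name, sharable, cardinality), and the defaults are hypothetical, and a real implementation would write to the database rather than a static array.

```php
<?php
// Hypothetical in-memory registry standing in for the content_field table
// and whatever table ends up holding instances.
function &sketch_field_registry() {
  static $registry = array('fields' => array(), 'instances' => array());
  return $registry;
}

// Stores the Field definition and nothing else.
function sketch_create_field(array $field) {
  $registry = &sketch_field_registry();
  // Assumed defaults, for illustration only.
  $field += array('cardinality' => 1, 'sharable' => FALSE);
  $registry['fields'][$field['field_name']] = $field;
  return $field;
}

// Binds an existing Field to a content type, enforcing Field constraints.
function sketch_create_instance(array $instance) {
  $registry = &sketch_field_registry();
  $name = $instance['field_name'];
  if (!isset($registry['fields'][$name])) {
    return FALSE; // The Field must be created first.
  }
  $existing = isset($registry['instances'][$name]) ? $registry['instances'][$name] : array();
  if (isset($existing[$instance['type_name']])) {
    return FALSE; // Already bound to this content type.
  }
  if (!$registry['fields'][$name]['sharable'] && count($existing) > 0) {
    return FALSE; // A non-sharable Field allows only one instance.
  }
  // Field-level settings are meaningless in $instance, so they are dropped.
  unset($instance['sharable'], $instance['cardinality']);
  $registry['instances'][$name][$instance['type_name']] = $instance;
  return TRUE;
}

sketch_create_field(array('field_name' => 'field_subtitle', 'type' => 'text', 'sharable' => FALSE));
// First binding succeeds.
$first = sketch_create_instance(array('field_name' => 'field_subtitle', 'type_name' => 'story', 'label' => 'Subtitle'));
// Second binding of a non-sharable field fails.
$second = sketch_create_instance(array('field_name' => 'field_subtitle', 'type_name' => 'page', 'label' => 'Subtitle'));
```

Note that the instance function never touches the stored Field definition; it only reads it to check constraints.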
The Fields API will also have some fairly straightforward functions such as:
<?php
function content_field_rename_field($old_name, $new_name);
function content_field_update_instance($instance);
function content_field_delete_instance($instance);
function content_field_delete_field($field);
?>
Note that content_field_update_instance() will be "straightforward", even though the current implementation is not, because, like create_instance(), it enforces the constraints of the field; everything it does is pretty lightweight. The heavy lifting is reserved for "field migration."
Field migration
Conspicuously absent from the functions above is:
<?php
function content_field_update_field($field);
?>
This is the magical function that can convert fields between shareable and non-shareable, between cardinality 1, N, and unlimited, from "plain text" to "formatted text", and from "text" to "nodereference". Notice that these are all changes of Field settings, not Instance settings.
As I said at http://groups.drupal.org/node/9297#comment-37050, Drupal needs this functionality because humans cannot predict the future. However, Drupal core does not need this functionality. Fields in core provide Drupal with capabilities not previously possible, but changing fields in core is fundamentally a development-time operation that can perfectly well depend on contrib, as it does now.
update_field() will handle three kinds of updates:
1. Changes to shareable and cardinality. This is the code that moves columns between per-content-type and per-field storage and adds/removes the 'delta' column. It is the (only) code that would be unnecessary if we declared all field tables to use per-field storage, and it is code that is already written.
2. Intra-field-type changes such as plain-text to formatted-text.
3. Inter-field-type changes such as "text" to "nodereference" (a.k.a. "the DabbleDB magic").
It is important to recognize that #2 and #3 are actually the same thing and, despite what many think, CCK CANNOT CURRENTLY PERFORM EITHER ONE. Yes, the one special case of plain-text to formatted-text works because it is a degenerate case of "add or remove a single column, changing nothing else," and maybe there are some other similar cases. But you can't convert an ISO Date field to a Unix Timestamp Date field, because CCK has no knowledge of the underlying field types and the conversion cannot be done with a simple assignment or typecast.
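A small illustration of the ISO-to-timestamp point, assuming the ISO value is stored as a string such as '2008-03-01T12:00:00' and is meant as UTC (both are assumptions for this sketch, not a statement about how date.module stores data):

```php
<?php
$iso = '2008-03-01T12:00:00';

// A field-agnostic typecast keeps only the leading digits of the string,
// so the result is the year rather than a timestamp.
$cast = (int) $iso; // 2008

// A type-aware conversion has to parse the value as a date.
// UTC is assumed here; a real conversion would need the field's timezone rules.
$date = new DateTime($iso, new DateTimeZone('UTC'));
$timestamp = (int) $date->format('U');
```

Only code that knows the old value is an ISO date and the new one is a Unix timestamp can pick the second form, which is why the conversion has to be delegated to the module that owns the field type.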
The core content_field_update_field() function will work by dispatching these operations to modules via hooks. I am not yet 100% sure of the interface but it will look something like this:
- Shareable and cardinality are "core" field properties and can be changed in a field-agnostic manner; the actual column types, names, and content always remain the same (though some content may be discarded), they just move from one table to another, possibly with or without a delta column. This will be handled by a dedicated hook that is implemented by the CCK UI module. An admin performing this operation will have to have the CCK UI module installed anyway; a module developer wanting to perform this via an update function will just have to depend on the CCK UI module (they do now anyway).
If you try to update a shared field to be non-shareable, it will fail; explicitly delete all but one field instance first. If you try to reduce a field's cardinality, I'm not sure if it should fail or simply silently discard data (which is what it does now, I think).
I can easily imagine this functionality, which is mostly already written, being moved into the core Fields API at some point when we resolve the PHP timeout and race condition issues. If we ever want to change a core field's shareability or cardinality, of course, we'll need the code in core then (as an example, this would be the case if we decided that nodes can only have one term and it lives in the node table). For now, we do not need it in core.
- Changes to field-type specific properties (e.g. "formatted") or field-type changes will be implemented via a hook like:
<?php
function hook_content_field_update_field($old_field, $new_field);
?>
The first module to return TRUE says the update is done; if all return FALSE, the update is not supported. For example:
- If $old is any type and $new is type 'nodereference', presumably nodereference.module will accept the update: create a new node type, SELECT DISTINCT on the old field, create a new node for each value, call content_field_create_field() to add the new field, call content_field_create_instance() for each appropriate content type, UPDATE the _nid column of the new field based on a join of the old field columns to the node table for the new node type, call content_field_delete_field() on the old field, and call content_field_rename_field() to move the new field into place.
- If $old is an ISO Date field and $new is a Unix Timestamp Date field, date.module will accept the update: create a new field, execute "UPDATE new_field.value = TO_UNIXTIME(old_field.value)", delete the old field, and rename the new one into place.
- If $old and $new are both of type 'text', presumably text.module will accept the update: change the formatted property by adding or removing the format column.
Or something like that. :-) CCK can't do this today so I do not feel too bad about not having ironed out all the details yet.
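The "first module to return TRUE wins" dispatch could be sketched roughly as follows. This is a standalone illustration, not the proposed core code: the sketch_* names, the explicit $modules list, and the 'date'/'datestamp' type names are all assumptions, and a real implementation would presumably go through Drupal's normal hook invocation instead of function_exists().

```php
<?php
// Hypothetical hook implementations. Each returns TRUE only if it
// recognizes (and would perform) the requested conversion.
function sketch_text_update_field($old_field, $new_field) {
  return $old_field['type'] == 'text' && $new_field['type'] == 'text';
}

function sketch_date_update_field($old_field, $new_field) {
  return $old_field['type'] == 'date' && $new_field['type'] == 'datestamp';
}

// Offers the update to each module in turn and stops at the first TRUE.
// Returns the name of the module that accepted, or FALSE if none did.
function sketch_update_field($old_field, $new_field, array $modules) {
  foreach ($modules as $module) {
    $function = $module . '_update_field';
    if (function_exists($function) && $function($old_field, $new_field) === TRUE) {
      return $module;
    }
  }
  return FALSE; // No module supports this update.
}

$modules = array('sketch_text', 'sketch_date');
// Accepted by the date sketch.
$handler = sketch_update_field(array('type' => 'date'), array('type' => 'datestamp'), $modules);
// No sketch module handles text to nodereference, so this is unsupported.
$unsupported = sketch_update_field(array('type' => 'text'), array('type' => 'nodereference'), $modules);
```

Returning the accepting module's name rather than a bare TRUE is a choice made only for this sketch; it makes it easy to see which implementation claimed the update.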

Comments
This is a terrific action
This is a terrific action plan. I am so happy that we are recognizing the need to change field instance properties after create time. What about features like fields wanting to do their own multiple handling or their own storage? Is that supported? I guess Views integration will remain in contrib, probably in Views itself. Formatters? Widgets? Is that all UI and therefore outside of this Phase 1?
Thanks to Barry for the writeup, and even more thanks to Barry if code is forthcoming.
The DADS final report
The DADS final report specifies what will be in core vs. contrib; I am planning to stick to those decisions.
One correction: Above I wrote that text.module and nodereference.module would implement hook_content_field_update_field() to perform Field updates (== data conversions). I meant to say that those will be handled in contrib. Perhaps there will be a "data conversion module for core field types" that implements text, nodereference, etc., conversions, and then of course any other contrib module can expand on that capability.