Day 3


Notes: Comments are now enabled on this page.

CCK simplifications for core

The day started with ways we can simplify current CCK functionality, because there is a feeling that what we have is far too fragile and complex for core at this time. Functionality slated for destruction includes:

  • Per-content type field storage, e.g. no more content_type_story tables with columns for multiple single-value fields. All fields are stored in today's equivalent of content_field_fieldname tables.
  • All field tables always have a delta column even if it is a single-value field.
  • Field-level settings, which are defined as those that (at least) affect the schema, cannot be changed. e.g. You cannot convert a text field from plain-text to formatted-text; instead, you must create a new field, copy the data, and drop the old one.

The net effect of these changes is that field tables never need to change dynamically except during version upgrades. No other part of core performs dynamic schema changes outside of update, and we felt it was much simpler to maintain this restriction.
bjaspan: I suppose there is an argument that if we are removing the ability to change field settings dynamically, allowing per-content-type storage and per-field storage does not imply/require the same level of complexity currently in CCK because if you can't change the field settings you can't change the storage method so we can still lose all the code that handles changing the storage method.
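
As a rough illustration of the simplified storage described above, the sketch below builds a hook_schema()-style definition for one per-field table that always carries a delta column. The helper name, the column naming, and the choice of primary key are assumptions made for this example, not decisions from the sprint.

```php
<?php
// Hypothetical helper: returns a hook_schema()-style array for one per-field
// table. Table, column, and key names are illustrative assumptions.
function example_field_table_schema($field_name, array $columns) {
  $schema = array(
    'fields' => array(
      'nid' => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE),
      'vid' => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE),
      // Always present; a single-value field would simply only use delta 0.
      'delta' => array('type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0),
    ),
    'primary key' => array('vid', 'delta'),
  );
  // Data columns come from the field type and never change after creation.
  foreach ($columns as $column => $spec) {
    $schema['fields'][$field_name . '_' . $column] = $spec;
  }
  return array('content_field_' . $field_name => $schema);
}

$tables = example_field_table_schema('eyecolor', array(
  'value' => array('type' => 'varchar', 'length' => 60),
));
```

Because the column list is fixed when the field is created, a definition like this would only ever need to be installed or dropped, never altered.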

Field and Field Instance Definitions:

  • A field type is a set of functionality implemented by a module, e.g. text, nodereference, gmap.
  • A field is a collection of settings of a field type: e.g. a text field, max length 60, plain text (no input format option for the user).
  • A field instance is the binding of a field to a content type with additional settings: e.g. the above text field assigned to the story node type with the label "Five word summary."

The field settings contain aspects of the field that will be the same in all instances. They often include things that affect the database schema or storage and the way the field data will be stored. Because of this, they cannot vary from one instance to another or you would have a field with a different database schema. Examples of field settings are the size of the field and whether it can hold multiple values. (bjaspan: Since all field tables will always have a delta column, I'm not clear on why single/multiple cannot be an instance setting UNLESS we continue to allow per-content-type storage as discussed above, though clearly changing a field from multiple to single will lose data).

The field instance settings contain aspects of the field that can be different from one instance to another. They include things that do not affect the way the data is stored, only the way it is displayed. Examples of field instance settings include the label and weight of the field.

The settings are stored in two tables, one for the field settings (one row for each field) and another for the field instances (one row for each instance).
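
To make the field / field instance split concrete, here is a minimal sketch that models the two tables as PHP arrays and combines a field row with an instance row. The array keys and the lookup function are hypothetical and only meant to show the shape of the data.

```php
<?php
// One row per field: settings shared by every instance (assumed keys).
$fields = array(
  'summary' => array('type' => 'text', 'max_length' => 60, 'text_processing' => 'plain'),
);

// One row per instance: a field bound to a content type, plus settings that
// do not affect storage (label, weight).
$instances = array(
  array('field_name' => 'summary', 'type_name' => 'story', 'label' => 'Five word summary', 'weight' => 0),
);

// Hypothetical lookup: returns the combined settings for one instance, or
// NULL if the field is not attached to the given content type.
function example_field_instance_info(array $fields, array $instances, $type_name, $field_name) {
  foreach ($instances as $instance) {
    if ($instance['type_name'] == $type_name && $instance['field_name'] == $field_name) {
      // Array union keeps the left-hand keys, so an instance row can never
      // override a field-level setting.
      return $fields[$field_name] + $instance;
    }
  }
  return NULL;
}

$info = example_field_instance_info($fields, $instances, 'story', 'summary');
```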

Open questions

Is it necessary to have two separate tables, or could all the data be stored in the instance table, even if the instances then have duplicate field information? (bjaspan: That would be non-normal form and we'd need a good reason for it; is there one?)

Should field settings be locked, once created, so they cannot be changed? They impact the schema and it would greatly simplify core code if that is just not allowed. It would be OK to change instance settings like labels, but not field names, sizes, multiple, and other things that impact the database schema.

Multiple Values

We discussed whether core will need to handle multiple values and concluded that it certainly will. So we examined the way that the D6 version of CCK handles multiple values to start thinking about what will need to be implemented in core.

The D5 and earlier versions of CCK use a separate formatter call for each individual field value, one after the other. That does not always make sense. It should be possible to format all values of a field in a single formatter (for example, a field that contains multiple points on a map, or a slideshow of images). This change has gone into the D6 version of the code.

The D5 and earlier versions of CCK presume that widgets handle multiple-value forms themselves and pass the values back to the Content module. In D6 that is changed: by default a widget creates a form element for a single value, and the Content module combines those elements into the multiple-value form. Some widgets need to handle all values in a single form element (like a select list or checkboxes), so there is also a way for a widget to opt out of that behavior and handle its own multiple values. The optionwidgets module does this.
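
The difference between per-value and all-values formatters can be sketched as follows. The registry layout, the 'multiple' flag, and the function names are assumptions for illustration; they are not the actual CCK formatter API.

```php
<?php
// Hypothetical formatter registry. 'multiple' => TRUE means the callback
// receives every value of the field in one call.
$formatters = array(
  'plain' => array(
    'multiple' => FALSE,
    'callback' => function (array $item) {
      return htmlspecialchars($item['value']);
    },
  ),
  'comma_list' => array(
    'multiple' => TRUE,
    'callback' => function (array $items) {
      $values = array();
      foreach ($items as $item) {
        $values[] = htmlspecialchars($item['value']);
      }
      return implode(', ', $values);
    },
  ),
);

// Returns one rendered string per value, or a single rendered string for the
// whole field when the formatter handles multiple values itself.
function example_format_field(array $items, array $formatter) {
  if ($formatter['multiple']) {
    return array(call_user_func($formatter['callback'], $items));
  }
  $output = array();
  foreach ($items as $delta => $item) {
    $output[$delta] = call_user_func($formatter['callback'], $item);
  }
  return $output;
}

$items = array(array('value' => 'brown'), array('value' => 'green'));
$per_value = example_format_field($items, $formatters['plain']);
$all_at_once = example_format_field($items, $formatters['comma_list']);
```

The same opt-out idea would apply to widgets: a flag on the widget tells the Content module whether to repeat a single-value element or hand over all values at once.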

API

Example code

What will the API look like? An example of what a module developer might write:

// $info describes a content type called 'user_profile'. It contains keys like
// 'name', 'has_body', 'min_words', etc. (except that body probably won't be a
// built-in part of node types anymore; it will be a normal field).
node_type_save('user_profile', $info);

// Field-level settings, shared by every instance of the field.
$settings = array(
  'name' => 'Eye Color',
  'restricted values' => array('brown', 'green', 'blue'),
);

field_create_field('eyecolor', $settings);

// Instance-level settings for the field on this content type.
$settings = array(
  'label' => 'User eye color',
);

field_create_instance('user_profile', 'eyecolor', $settings);

To add this field to another content type:

// $info describes a content type called 'admin_profile'.
node_type_save('admin_profile', $info);

// The field already exists, so only a new instance is needed.
$settings = array(
  'label' => 'Administrator eye color',
);

field_create_instance('admin_profile', 'eyecolor', $settings);

Needed Functionality

  • Create node type / Delete node type
  • Create field / Delete field
  • Add field to node type / Delete field from node type
  • Update field configuration (required, multiple, widgets/formatters, weight)
  • Get table name for field
  • Get types
  • Get fields
  • Get field instances for type

Open questions :

  • The current CCK field CRUD API operates only on 'field instances' and does not allow fields without instances. Does a field without instances make sense? Is it even doable (both Karen and Eaton came to the conclusion that it might be a problem)? Karen did not remember what the problem was; we are thinking that the simplifications we are planning will eliminate it.
  • Other drupal CRUD APIs merge 'create' and 'update' into a 'save' function...

Approaches for letting modules define node types and their fields

  • imperative : sequence of direct field CRUD API calls at install time
  • declarative : cf hook_node_info(), hook_schema()
    -- a hook implemented by modules returns an array describing node types and their fields
    -- this array gets processed and turned into a series of field CRUD API calls
    -- the hook_update_N() mechanism, calling the API functions, is used for version updates

We decided that we need to implement the API in either case so we're starting with that. A declarative approach (hook_node_types() + hook_update_N() for changes) is more consistent with other parts of Drupal and can be added on later.
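
A minimal sketch of how a declarative definition could be layered on top of the imperative API: the array below is turned into an ordered list of CRUD calls. The array structure and both example_* functions are hypothetical; only the three API function names come from the example code earlier on this page, and the calls are recorded rather than executed.

```php
<?php
// Hypothetical declarative definition, in the spirit of hook_node_info().
function example_node_types() {
  return array(
    'user_profile' => array(
      'name' => 'User profile',
      'fields' => array(
        'eyecolor' => array(
          'field' => array('name' => 'Eye Color', 'restricted values' => array('brown', 'green', 'blue')),
          'instance' => array('label' => 'User eye color'),
        ),
      ),
    ),
  );
}

// Turns the declaration into a list of array(function name, arguments) pairs.
function example_build_crud_calls(array $types) {
  $calls = array();
  $created = array();
  foreach ($types as $type_name => $type) {
    $fields = isset($type['fields']) ? $type['fields'] : array();
    unset($type['fields']);
    $calls[] = array('node_type_save', array($type_name, $type));
    foreach ($fields as $field_name => $definition) {
      // A field is created once, then bound to each type that uses it.
      if (!isset($created[$field_name])) {
        $calls[] = array('field_create_field', array($field_name, $definition['field']));
        $created[$field_name] = TRUE;
      }
      $calls[] = array('field_create_instance', array($type_name, $field_name, $definition['instance']));
    }
  }
  return $calls;
}

$calls = example_build_crud_calls(example_node_types());
```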

Workflow

Typical workflow for module authors (Just like what currently exists for Views) :

  • Adjust their fields setup using contrib CCK-UI
  • 'Export' types and fields in a PHP definition (new 'Content Copy')
  • Paste the PHP snippet into their hook_node_info() (or whatever the actual hook is)

Open questions :

UI-defined fields probably need to live in a different namespace (see below), so this workflow probably hits name issues.

Node structure :

$node->fieldname[n]['column']
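
For illustration, a node with one single-value field and one multiple-value field would look roughly like this under that structure. The field names and values are made up for the example.

```php
<?php
$node = new stdClass();
$node->nid = 1;

// Single-value field: still an array of items, using only delta 0.
$node->eyecolor = array(
  0 => array('value' => 'brown'),
);

// Multiple-value field: one item per delta, each keyed by column name.
$node->keywords = array(
  0 => array('value' => 'drupal'),
  1 => array('value' => 'cck'),
);

$first_color = $node->eyecolor[0]['value'];
$keyword_count = count($node->keywords);
```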

Field namespace :
It's the job of the modules not to create fields whose name clashes with other modules' fields.
We need to ensure that module-defined fields don't clash with user-defined fields (with CCK UI in contrib)

What happens when the module that made the field_create_field() call is disabled?
- skip field
- raise exception

Compatibility with existing modules

The 'core fields' won't be able to co-exist with a D7 forward-port of current contrib CCK, if only because there can only be one 'text.module'. Thus, we need to get 'fields in core' right :-)

Node Load

Current node load operations (e.g. loading a poll node):
1) node_load: SELECT FROM node, node_revision
2) hook_load: SELECT FROM poll
3) hook_nodeapi op load: SELECT FROM comment

New :
1) node_load: SELECT FROM node, node_revision and load all fields.
2) hook_load: calls poll_load() so the poll module can do something if it wants but it does not need to load its field-based data
3) hook_nodeapi op load: SELECT FROM comment, for any modules that are not converted to fields (e.g. if comments are fields, SELECT FROM comment is handled automatically in step 1).

If we go to a class Node, loading of all fields in step 1 can actually be lazy-loaded so unused fields are never actually loaded/processed.
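
The lazy-loading idea could be sketched with PHP 5's __get() magic method, as below. The class name, the loader callbacks, and the load counter are all hypothetical and only demonstrate that an unused field is never loaded.

```php
<?php
// Hypothetical Node class: field data is fetched on first access only.
class ExampleNode {
  public $nid;
  public $loadCount = 0;
  private $loaders = array();
  private $loaded = array();

  // $loaders maps a field name to a callback that loads its items.
  public function __construct($nid, array $loaders) {
    $this->nid = $nid;
    $this->loaders = $loaders;
  }

  // Called by PHP when an undeclared property such as $node->eyecolor is read.
  public function __get($name) {
    if (!array_key_exists($name, $this->loaded)) {
      if (!isset($this->loaders[$name])) {
        return NULL;
      }
      $this->loaded[$name] = call_user_func($this->loaders[$name], $this->nid);
      $this->loadCount++;
    }
    return $this->loaded[$name];
  }
}

$node = new ExampleNode(1, array(
  // Stand-ins for real storage queries.
  'eyecolor' => function ($nid) {
    return array(array('value' => 'brown'));
  },
  'summary' => function ($nid) {
    return array(array('value' => 'Fields are coming to core'));
  },
));
```

Reading $node->eyecolor triggers one load; reading it again reuses the cached items, and the summary loader never runs unless that field is accessed.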

Which CCK functions should go to core ?

Go to Core:

  • field storage engine
  • Field CRUD
  • widgets / form gen.
  • Node CRUD
  • formatters
  • add more / drag 'n drop re-ordering
  • Field validation
  • custom multiple value handling

Stay in contrib:

  • field default value
  • field allowed values
  • Field CRUD UI (Manage Fields tab)
  • Display UI (Display Fields tab)
  • Fieldgroups
  • Content Copy
  • 3rd party Drupal integration (Views, Pathauto, Token)

Which CCK fields should go to core?

The current CCK core fields:

field types

  • fieldgroup
  • nodereference
  • number
  • userreference
  • image
  • file
  • computed
  • date
  • text

widget

  • optionwidgets

Things in core that might be fields :

  • body: text field
  • created: date
  • upload: file
  • user pic: image
  • IDs: number+options

Criteria for 'in core' :

  • wide applicability
  • minimum useful functionality

Action plan:
Start with text, number, optionwidgets, image, file. Leave nodereference and userreference in contrib until we need them in core so we can continue to easily add features and do other work on them. Once in core they will be hard to change. Getting a date field into core would be easier if improved date handling gets into core, which is a separate initiative. The complex date field would probably remain in contrib but a simpler date field could be created for core.

Next Steps

Beyond a minimal implementation of CCK fields in core, what will be the next steps?

  • node as class?
    • To get a consistent set of methods for all first-class entity types (node, user, etc.), we need a consistent data format: array or class. If we use a class, we may find it advantageous to use a custom class rather than stdClass to have access to PHP 5 features such as lazy loading.
  • gradual conversion of existing core 'fields'
    • Having implemented a small number of new-style fields in core, we will need gradually to convert much (all?) of the remaining existing 'fields' to the CCK-style system. Doing so will require new methods and functionality. For example, existing CCK cannot fully handle the current node taxonomy implementation.
  • field property can not be edited

Lessons learned for future 'Design sprints' :

  • A facilitator was very useful. We benefited from having a process-oriented facilitator who was a little removed from the issue (Nedjo).
  • The initial scope was too large. We did narrow it, but not in time, so we prepared for things that weren't directly relevant.
  • Sometimes key contributors need financing to attend. In spite of the cost, it would be a worthwhile investment.
  • The commercial community understands the value, but employee time is precious.
  • The final number of six people was a good size.
  • Eight hours a day is good. It's tempting to go longer, but that wouldn't have been a good idea.
  • The nightly wiki summaries were a useful way for participants to analyze and review the results.
  • Taking complete notes is critical. Could it be done by someone else (e.g. a support role)? It would have to be someone who understands the discussion.
  • Fun is good, e.g. dinner out. There is a lot of pressure and intensity and providing some respite helps the process.
  • Three days feels like an appropriate length of time.
  • It is useful to have people on a similar skill/expertise level
    • invite key contributors for task on hand
    • it is also useful to have a mix of people more and less centrally involved in the topic (some use CCK, others do not)
  • Getting everyone face to face in a small group really benefits the process.
  • The facility needs
    • whiteboard
    • wifi
    • sufficient space
  • A nice location (Hawaii) would be nice! :-) Chicago in February, not so good.
  • Put everyone up at the same hotel. It avoids wasting time and provides more chances for unscheduled interaction. Lots of good ideas came up while eating at Panera!

Attachment: plan.ods (13.23 KB)

Comments

Comment by yched:

Field-level settings, which are defined as those that (at least) affect the schema, cannot be changed. e.g. You cannot convert a text field from plain-text to formatted-text; instead, you must create a new field, copy the data, and drop the old one.

This means the result of the current hook_field_settings('columns') needs to be constant per field type, without any logic dependent on $field and its settings. This includes the names and number of columns, as well as column properties ('size'...).

  • Does simplify code
  • Really won't make people happy :-) You can't adjust a text field's max length after the field gets created. I do have a gripe about this.
  • Would probably require moving the definition of columns from the current hook_field_settings($op = 'columns') (where the function does get $field as an argument, making conditional logic possible anyway) to a new hook_columns() (with no argument)
  • Also requires a refactoring of the current UI workflow of creating a new field. Right now, you simply pick a field type and widget, and the field is created. Only after that do you get to change the specific field type settings such as 'formatted / plain text' or 'max length'.
  • Possibly means fields need to differentiate two sorts of settings: the ones that affect data columns and can't be changed after creation, and the 'other ones'. The conceptual question behind this is: are field-level settings only the ones that define db storage, with all other non-storage-related settings being field-instance-level?

Comment by yched:

bjaspan: I suppose there is an argument that if we are removing the ability to change field settings dynamically, allowing per-content-type storage and per-field storage does not imply/require the same level of complexity currently in CCK because if you can't change the field settings you can't change the storage method so we can still lose all the code that handles changing the storage method.

Having only per-field tables is what allows us to say all tables get a 'delta' column, and thus possibly move 'multiple/single' property to the field-instance level.
Going the way you propose means a field cannot be changed from single to multiple after it's created, which will also drive people crazy :-)

Comment by bjaspan:

Very true. So "per-content-type storage" and "single/multiple as a field-instance property" are mutually incompatible.

We need to do some performance testing for the effects of making all fields per-field storage (i.e.: requiring a join for each field). If there is not much impact, great, we'll go with per-field only. If there is, then maybe we decide that single/multiple just can't be a field-instance property because allowing per-content storage is too important.

This should not be too hard to test because today we can change from per-content to per-field storage. We can take a site with many per-content fields, mark them as multiple, and see what happens.

Are there other things we'd like as field-instance properties that are incompatible with per-content storage?

Non-schema changes

Comment by Crell:

I'd say field-level changes that do not require much effort (anything that would change the schema is defined as "too much effort") are OK. That's functionality we don't want to take away from people if we don't have to. Changes that would require a schema change will be "far too much effort", and so should not be in core. e.g., if a change has even the possibility of requiring the batch API in order to complete successfully, leave it out. :-)

The Death of SQL

Comment by metzlerd:

I think the general concept of changing all storage to per-field tables is a step in the wrong direction. It sounds like what you gain is the basic ability to make the schema more dynamic, but at the expense of more complex data queries. I agree with CRUD and schema automation, but not with the elimination of SQL for ad hoc query purposes.

Perhaps we believe that we can invent a better query language than SQL, but I'm dubious with regard to that. For example, consider modules that use status fields, date fields, etc. to do their work. What does the query look like to say I want all of the cases with status X and transaction date Y?

I believe that this step will drive module developers like myself away from the data API and back to building our own tables, because of the inherent complexity of querying the data for fields like status fields, etc. I think you also want to be careful about the impact of this development on the Views module, and other things that leverage the power of query rewrites etc.

In principle, schema flexibility should not be achieved at the expense of query flexibility.
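
To make the query concern above concrete, here is a rough sketch of what a "status X and transaction date Y" lookup could look like when every field lives in its own table: one join per field. The helper, the table and column naming, and the join on vid are assumptions for illustration; the '%s' placeholders follow the style of Drupal's db_query().

```php
<?php
// Hypothetical helper: builds a SELECT with one INNER JOIN per field.
// Field names are assumed to be trusted identifiers, not user input.
function example_field_query(array $conditions) {
  $joins = array();
  $where = array();
  $args = array();
  foreach ($conditions as $field_name => $value) {
    $table = 'content_field_' . $field_name;
    $alias = $field_name;
    $joins[] = sprintf('INNER JOIN {%s} %s ON %s.vid = n.vid', $table, $alias, $alias);
    // '%%s' leaves a literal '%s' placeholder for the database layer.
    $where[] = sprintf("%s.%s_value = '%%s'", $alias, $field_name);
    $args[] = $value;
  }
  $sql = 'SELECT n.nid FROM {node} n ' . implode(' ', $joins)
    . ' WHERE ' . implode(' AND ', $where);
  return array($sql, $args);
}

list($sql, $args) = example_field_query(array(
  'status' => 'open',
  'transaction_date' => '2008-02-15',
));
```

With per-content-type storage the same lookup would be a single-table WHERE clause, which is the trade-off the performance testing mentioned in the earlier comment would need to measure.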
