Review of current code + thoughts

We encourage users to post events happening in the community to the community events group on https://www.drupal.org.
You are viewing a wiki page. You are welcome to join the group and then edit it. Be bold!

Below are the notes I already emailed a few people in the group. Those are my review of the kick-off work Barry did in his SVN, along with thoughts on tasks and open questions. Not all of those have to be adressed / answered before we can move forward, this is more of a reminder list.
I'm posting this here so that it's online somewhere instead of in a few mailboxes.

This page is a wiki, so it can be freely edited. Formatting gave me a hard time, there might still be a few glitches here and there.

The current state of the code can be grabbed in CCK HEAD, and should work with D7 HEAD at the time of this writing.
Quick reminder : in the current scenario,

  • fields.module (field storage engine + CRUD API) is targetted to core
    fields/CRUD_sample_code.txt file contains examples of CRUD code.
  • fields_ui.module (field customization + user-defined fields) stays in contrib
    Currently doesn't 'work' (no form submission), so CRUD API is the only way to test fields.

Concepts

  • No field_update_field(), only instances can be updated :
    • Pros: all 'data migration' handled by contrib UI, can safely use batch API.
    • Pros: 'predictable' db layout : modules can define fields and write direct queries against field tables without wondering 'now where did cck put my data' ?
      (works only if module-defined fields are kept out of any 'field UI / field updater' contrib action)
    • Cons: Main issue IMO: how can a module update its fields from one release to another ?
      or during the development of the module ?
      directly update the fields meta tables + take care of migrating the data if needed ? Scary + reinvent wheels (and bug-prone ones).
      On this regard, we loose flexibility compared to 'old-style' nodeapi additions with db tables the module has full control on.
  • Shared fields: As long as a field is sheareable, we act as if it was actually shared (per-field storage).
    • 'Shareable / Not shareable' might be a confusing choice for modules defining their fields. We'll need to provide guidelines.
    • devil's advocate: 'shareable' is all or nothing : a module might want to add the field in two node types it declares, and yet forbid 'free for all' sharing in fields_ui ?
  • Field converter (contrib) - From Szeged presentation :
    "Convert text field from plain to formatted :
    field_update_field() calls hook_field_convert()
    text module: add new column, return TRUE"
    Scares me - field modules shouldn't have to touch the db (think third party contrib fields...)
  • Do we need some of fields.module in /includes/fields.inc ? (outside of a module)
    At some point I thought 'yes we do because of X', but can't remember X right now :-(
  • actual handling of fields attached to entities (currently nodes : hook_nodeapi*) : should this stay in fields.module, or go in the 'entity' module (node.module, later user.module, etc) ?

"objects" structure

  • terminology: settings / properties ? We could agree on :
    'properties' (or 'attributes' ?) are constant across field/widget type (type, multiple, shareable, label, description)
    'settings' are what varies from one field type to the other
  • => $field, $instance and $instance['widget'] : isolate 'settings' (text_processing, max_length) from 'properties' (type, name, module...) ?
  • weight, label, description, default_value are instance properties, not widget's properties.
  • $instance : it's important to get rid of the instance / widget confusion
    instance settings / widget settings - CCK only has widget settings.
    ex : userref 'reversed_link' is inherently an instance setting, widgets should have nothing to do with that.
    Proposed structure :
    $field:
    - field_name
    - type (field type)
    - module
    - active
    - required
    - multiple
    - shareable
    - columns
    - db_storage
    - settings (type-specific)
    -- text_processing
    -- max_length

    $instance:
    - field_name
    - type (content type)
    - label (currently in widget)
    - description (currently in widget)
    - weight (currently in widget)
    - default_value (currently in widget)
    - settings (currently CCK lacks those, they're all 'widget settings' - nonsense)
    -- userref's 'reversed link' / filefield's 'allowed extensions' ?
    - widget
    -- type('widget type)
    -- module
    -- active
    -- settings
    --- textarea's 'number of rows'
    - display settings
  • Detail: Instances being stand-alone citizens imply API functions with a param named "$instance"
    $field is clear, but $instance lacks context - 'instance of what ?' - Awkward in 3rd party code calling the API. No better suggestion for now...
  • Storage of field definitions :
    • Currently : cross field-type settings (label, weight, description, required, multiple) get their dedicated columns
      [field|widget]-type specific settings are serialized in a single column.
      Do we keep this ? example : dedicated column for 'initial number of empty widgets on node creation form' ?
    • rename field.global_settings to field.settings ?
    • group instance.widget_settings, instance.display_settings ? + columns 'description', 'label,
      Gather all the settings in a single binary serialized array (like Views2 does) ?
  • type / field_name / type_name... A little messy :
                      node type     - field                - instance               - widget
    machine name :    $type['type'] - $field['field_name'] - n/a                    - n/a
    human name :      $type['name'] - n/a                  - $instance['label']     - n/a
    "specilisation" : n/a           - $field['type']       - n/a                    - $widget['type']
    "instanciation" : n/a           - n/a                  - $instance['type_name'] - n/a

    Well, we can probably live with that - we have so far...
  • What should happen when a module defining a field is disabled ? Do not load the field on node_load anymore ?
    (mimicks current behavior of nodeapi additions)
    If so, we need to store which module defined the field (would go in content_node_field table).
    How do we do that ? Rely on the module that defines the field to add $field['base_module'] = 'mymodule' before calling field_create_field() ?
  • Do we want a $field['tablename'] property, alongside $field['columns'] and $field['db_storage'] ?
    maybe grouped in a $field['db_info'] array ? (already discussed a while ago in private mail with Barry and Karen)
  • content_type_info() (should be renamed fields_info() ?)
    - field types
    - widget types
    - fields
    -- field_foo
    -- field_bar
    - content types
    -- article
    --- instances
    ---- field_foo instance
    ---- field_bar instance
    --- tables
    ---- (name of per type table)
    ---- (name of per field table for field_bar)
    --- CCK currently duplicates the regular node_get_types() definition of the $type.
        We probably need to stop doing so - explicitely call node_get_types() when we need a type property.

    We could probably add 'instances' entry to fields, and 'field' entry to instances (duplicating parts of the array by reference to avoid memory bloat). Makes 'get field from instance' and 'get instances of field' more straightforward.
  • How do we make CRUD API and internal structures as much ready as we can for 'fields on non-node entities' ?
    field_get_instance($field_name, $type_name) is very tied to 'fields on nodes'. So is the structure of content_type_info() aka fields_info()
    Problem is: we don't know yet how the idea of 'attach an instance to an entity' generalizes. Users don't have 'types', they have non exclusive roles.
    In fields_info(), ditch 'content types', replace with :
    - entities
    -- node
    --- article
    ---- instances
    ---- tables
    --- page
    ---- instances
    ---- tables
    -- user
    --- (TBD)
    -- (...)

CRUD API

  • field_new() / text_field_new() :
    Shouldn't field_new() be _field_new() ? (problem of external api / field_module api)
  • Sample field creation sequence :
    $field = text_field_new('field_text');
    $field['shareable'] = 1;
    $instance['text_processing'] = TRUE;
    field_create_field($field);

    strange, looks like OO without OO.
    More typical sequence would be :
    $field = field_new($type, $field_name); // calls text_field_new(), then renamed to text_field_defaults() ?
    field_create_field($field);

    Not sure I see the interest of the two separate calls - couldn't we embed one into the other :
    $field = array('type' => 'text', 'name' => 'field_text', 'multiple' => 1); field_create_field($field);
  • With the current shape of the API, you cannot reuse an $instance for multiple field_create_instance() calls.
    You need to duplicate code even if you just want to 'copy' an instance :
    // Add to 'article'
    $instance = text_field_instance_new('field_text', 'article', 'text_textarea');
    $field['shareable'] = FALSE;
    $field['multiple'] = 1;
    $instance['text_processing'] = FALSE; // etc...
    field_create_instance($instance);
    // Add to 'page'
    $instance = text_field_instance_new('field_text', 'page', 'text_textfield');
    $field['shareable'] = FALSE;
    $field['multiple'] = 1;
    $instance['text_processing'] = FALSE; // etc...
    field_create_instance($instance);
  • _field_write_field(): we don't use the return value ?

  • field_get_instance($field_name, $type_name) duplicates current content_fields($field_name, $type_name),
    only gets from db rather than from cached info (_content_type_info()).
    Those 2 distinct levels are legitimate, but how do we avoid the risk of confusion ?

  • field_instance_get_field() looks strange : sets $instance['field'] = $field, but not persitently, only returns $field.
    What is $instance['_field_name'] vs $instance['field_name'] ?
  • field_get_instances() : would it make sense to return an array keyed by type name ?

  • field_delete_instance($field_name, $type_name) is symmetrical with field_get_instance($field_name, $type_name)
    shouldn't it be symmetrical with field_create_instance($instance) instead ?

  • field_delete_instance() : return $instance ?
  • field_delete_instance, field_create_instance call field_get_instances() a lot
    refresh cache, regenerate schema... (also the case in current CCK)

  • Field creation: 'database columns' are collected twice : once in field_new(), and once just before saving the field in field_create_field() in order to account for the latest state of the field settings.
    Do we really need to collect them in field_new() ? Until it gets actually created, a $field is 'in limbo', with no actual db existence. Do we need to predict the columns it doesn't have yet ? Gives a false feeling that you can alter column definitions :

    $field = text_field_new('field_text');
    $field['columns']['value']['default'] = 'hi mom'; // Will be overwritten in field_create_field()
    field_create_field($field);
  • Most probable workflow for modules that want to define a field:

    • define the field as a 'custom field' through fields_ui, adjust its settings
    • when it's ready, 'export', and paste into hook_install()
      But 'export' generates definitions ($field array / $instance array), not stub-code like the one above.
      Means modules would be tempted to bypass *_field_new(), and use field_create_field($field) directly.
  • // Invoke hook_field().
    module_invoke_all('field_create_instance', $instance);

    This is more like hook_content_fieldapi (invoke all modules) than hook_field()
    CCK only calls hook_field() for the module in charge.
  • CCK's hook_content_fieldapi() : $op = create instance, read instance, update instance, delete instance
    • do we need to keep these hooks ? Karen probably has clearer ideas about this.
      AFAICT, it's the only way field-type modules can act on instance deletion (e.g fieldfield : delete file resources)
    • currently through module_invoke_all, which forbids 'by reference' altering / adding to the $field - pure notification.
    • do we want to allow alteration (à la hook_nodeapi) ?

cruD : instance_delete / field_delete (+ node type deletion)

currently absent from content.crud.inc
Field deletion happen mainly on module uninstall.
Instance deletion can happen through regular UI ops : when uploads get disabled for 'page', body removed from 'story'...
(no field_ui in core means every core feature we implement as 'fields' needs to keep its own current UI)

  • field_delete_instance()
    • allows field-type to take action (e.g delete file resources). Can potentially take a while.
      Currently possible through hook_content_fieldapi('delete instance'), though filefield doesn't use it (yet ?)
    • for shared fields, CCK currently doesn't clean the table (remove rows for the node type whose instance is being removed) - neither does taxonomy, btw - didn't check upload.
    • remove db columns if last instance of the field : makes storage consistently 'tied' to instances (created with the 1st one, removed with the last one)
    • do not remove field if last instance of the field : fields can exist without instances (we probably need to keep that in mind in various parts of the code)
  • delete_field() :
    • automatically deletes all instances of the field (thus deleting storage)
      Can potentially take a while^2.
  • node type deletion :
    • automatically delete all instances atached to the type (deleting storage for last instance)
      Can potentially take a while^2.

"Can potentially take a while" means "batch API ?".
But batch API is not well suited for API functions : batch breaks code execution, batch does not work inside cron...

db storage

  • Crazy idea, I don't think I ever saw that mentioned :
    store all shared (single) fields in a single table, filling up with NULL entries ?
    (multiple fields do need their own table no matter what)
  • CCK currently always maintains a per-type table whenever a node type has fields (even if all shared, or all multiple)
    We probably need to stop doing this ? I don't think I remember the actual need, maybe Karen does.
  • Nedjo : request for 'Fields per translation set' (i.e : values shared for all translation) : http://drupal.org/node/340355

db indexes on data columns.

CCK currently has no support for this. Two issues:

  • Preserving indexes when data gets migrated (field shared => unshared, single => multiple...)
    Non-issue for fields.module (no data migration), only an issue for the contrib fields_ui.
  • In most cases, a field type doesn't know if/how a field should be indexed. Always creating indexes ends up in index bloat and db performance issue. Only "actual usage" (queries, Views...) dictates the relevant indexing. See http://drupal.org/node/231453, http://groups.drupal.org/node/7614, http://groups.drupal.org/node/16483.
    Some module-defined fields will need to set indexes (e.g taxo as field).
    User defined fields : admins might need to set indexes on their custom fields (maybe with the help of a Views analyzer / query EXPLAINer... - whole different issue)

We could go two different ways :

  1. Keep it 'manual'
    module defined fields : db_create_index()
    user-defined fields : phpMyAdmin :-)
    Drawback : then those indexes are not in the schema => schema mismatch, + preserving them on data migration requires some db inspection
  2. Allow field definitions (not field types) to specify indexes
    easy for module-defined fields
    Drawback : UI work in fields_ui for user-defined fields : basically replicate phpMyAdmin's index UI (index length, multiple columns, order, multiple indexes...). Really tedious.

'plug-in' entities : field types, widgets, formatters

  • hook_field_info(), hook_widget_info(), hook_formatter_info() :
    Those we ship with core would still need a 'label' (human-readable name) and 'description', even though only fields_ui would consume them.
  • same thing for hook_[field|widget]_settings('form') : only used by fields_ui
    should core field types still ship with this ? If not, we need a mechanism to let UI stuff be added from contrib.
  • Long-missing : validators...
    Do we need them right now ? Meaning : can we spot existing core would-be fields that require specific validation (in a more granular way than 'for all fields of this type', which we can do already) ?
  • For later : sources ?
    Abstract storage into 'source' types : 'local', 'distant_xml', 'distant_json'...
    Sources can be 'read only' or 'read-write'.
    Lets you have a field of type 'text', not stored locally, but still using text widgets ('read-write') and formatters ('read-write', 'read only')
    • Out of the box, drupal sites can read / write each other's fields using the built-in 'source' plugins.
      To integrate with a specific 3rd party provider, write your own 'source' plugin.
    • Works fine for 'self-contained' field types (text, number, date...), more complex for 'reference' field types (noderef, userref, taxonomy, file...)

Changes / enhancments wrt CCK "as we currently know it"

  • Cleanup: content_field_tablename() / content_instance_tablename() can go away.
    (only existed because of update nightmares, non-issue if we live in core)
  • Cleanup: "rename 'columns' (reserved name in PHP 4 ?) to 'db_columns' " / Maybe this is not needed anymore ?
  • table names : field / field_instance + fielddata_[field_foo] / fielddata_type_[bar] ?
  • 'field_' prefix for field names / name clashes:
    "custom" (UI defined) fields need to have a different prefix than module-defined fields to avoid name clash at install-time
    how do we enforce that ?
    "type_" is a forbidden prefix because of tablename clashes
  • default values for all settings (change return value of hook_[field|widget]_settings('save'))
    So that an instance can be 'just created', and still have sound settings.
    Barry : Is this sort of what _field_all_properties() does ?
  • CCK D6 lets 3rd party modules inform about their own render modes ('nodeasblock'), so that they can get their own display settings.
    Should fields.module care (at least wrt storage) ? Or is it strictly a job for the contrib fields UI ?
  • field access : is the current one good enough ?
    use case : file attachments and 'Upload files' / 'View uploaded files' perms

Fields / fields_ui

Current minimal scenario is : fields_ui and 'custom fields' feature stay in contrib
To what extent does fields.module have to pave the way for contrib fields_ui-specific features ?

  • Cohabitation between 'hardcoded' (module-defined) fields and fields_ui module :
    CCK currently 'supports' a $field['locked'] property : simply hides the 'field settings' form, still lets you change weights and display settings.
    Ideally, for module-defined fields :
    • $field level properties are locked
    • $instance level properties like label, description, weight, display settings, are changeable (directly alter the db definition) or maybe 'overrideable' (with 'revert to original' - means some sort of fields_info_alter() hook)
    • other instance settings + widget type and settings : maybe some modules would want to keep those locked for some of their fields ?
      => fine-grained, setting-by-setting 'locked' status for instance-level stuff.
      Can probably be handled by fields_ui altogether, without impacting core fields.module (fields_ui-defined hook)
  • Display settings
    Reminder: 'context' means $node->build_mode : full node, teaser, RSS, search index...)
    CCK currently has :

    • label display (hidden, above, inline)
    • per-context formatter (or 'hidden')
    • per-context 'exclude from $content in node templates'

    IMO fields.module needs to support :

    • label display (per-context should be easy, mainly UI issues)
    • per context formatter + formatter settings (should be easy code-wise, mainly UI issues, but the supporting API needs to be in core)

    Ideally (if we can find a good UI - probably a la Views 2), fields_ui would support additional settings :

    • abstract contexts through 'displays' (set of display settings + field order) : 'full node' uses 'display 1', all other contexts use 'display 2'
    • styles for multiple values : div, ul, ol, comma separated

    field.module probably doesn't have to bother about those, we only need a way for fields_ui to hook-in and provide its own display settings before a node is rendered.

Fields in Core

Group organizers

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds: