No field update

We encourage users to post events happening in the community to the community events group on https://www.drupal.org.
KarenS's picture

During the code sprint it was decided that core would not allow field attributes to be changed, only instance attributes. This was to prevent destructive changes, like trying to change a field type. But not allowing field changes creates another list of issues. Yves and I talked about this some last Friday (the last day of the sprint that got aborted by bad weather in Boston). Here are my (rough, unedited) notes from that discussion:

No field update:

  • What about allowed values list? Must be able to change it any time, must be able to share it (implies it must be a field rather than instance setting or otherwise sharable).

  • What about allowed values php? Must be able to change it? Would be better if it just refers to a PHP function defined somewhere else, but we don't require that now. What about typos and bugs that need to be fixed?

  • What about changes that have no effect on the schema? Why should we prevent them when they won't hurt anything, seems unnecessarily harsh.

  • What about changes that affect the schema but don't change the field type (i.e. change the size of a varchar)?

  • What about changes that change the field type, clearly should be a third party contrib solution.

  • What is the optimized way to store values? Would it be best to always use a 255 length on a varchar even if we don't need it so it can be changed later?

  • Should we have other field types for varchar and various sizes of text? Should we provide a few standard varchar sizes instead of leaving it open?

  • What about number fields? Should we provide a single large container for all sizes or limit it to some preset sizes, or to any size?

Maybe the Field module should provide no field_update at all and only CCK would allow it? CCK could allow non-destructive changes and prevent others after setting messages to explain why not. But this produces unpredictable results, you can't be sure anything will happen until you try. Could test schema before and after changes to be sure. Could ask fields to define a flag for each setting to tell which ones would alter the schema.

Comments

What about taxonomy?

Owen Barton's picture

Looking at this, it occurs to me that we already have a very capable system of handling selection lists - taxonomy module. I wonder if taxonomy (or the basis of it) could somehow be used to ease these problems? Certainly storing the options as their own table (or part of a table) removes the need to update the schema. It would be easy enough to add a "system name" or "stored value" field for each taxonomy term.

Obvously this raises more questions than it answers, but I think that long term (D8 if not D7) there is too much overlap between taxonomy and content multiple selection fields functionality. I think there is a lot to be gained by thinking about how easy per-field storage can solve some of the performance problems with taxonomy, and how the flexibility and power of multiple and multi-parent taxonomies can be brought to fields.

I like this

catch's picture

There's a somewhat stalled patch here: http://drupal.org/node/277209 to add a LOWER(name) column to term_data to avoid the performance issues for autocomplete. One of the reasons it's stalled is because we have no other use for that column in core. However, if we were to default that column to a lower case version of the name, and otherwise allow it to be used as 'system name' for CCK text selection then that could work pretty nicely. It'd also mean we could avoid storage multiple instances of the same string with CCK text select fields and just have the id instead.

By per-field storage, if you mean killing the term_node table as it currently is and moving to field API storage instead then that seems like a good plan as well.

I added my thoughts on that

David Strauss's picture

I added my thoughts on that patch, mostly in the form of setting "won't fix" status on it.

Fields vs. Schema

bjaspan's picture

The idea behind separating Field and Instance was very simple: Instance contains all the things that are just lightweight settings, and Field contains all the things that affect the schema and are thus "heavyweight" to change. You have now identified a third category, however. Field also contains those things that are shared between multiple instances, and some of those things may prove to be lightweight and easily changeable (e.g. allowed values and default values). As you say, we would prefer not to impose arbitrary and unnecessary constraints on what can be updated after-the-fact from core.

As a reminder, the design for field updates is to provide a field_update_field() function which first performs whatever changes it can/knows how to and second delegates the remaining operations to a hook. The set of changes field_update_field() can perform itself might be empty, but I had suggested (back before we decided on per-field only storage) that conversion of shared and cardinality might be in that set. Of course, lightweight field settings could also be in that set. Actually, in this model, the hook part will have to come first---we can't make some lightweight changes, invoke the hook for the heavy changes, and discover that no contrib module can handle the heavyweight change. We'll have to clearly specify exactly who is responsible for what to prevent contrib field_update hooks and field_update_field from stepping on each other.

So, perhaps we need to update our definition: Field contains everything that is sharable between instances (always including all schema-affecting properties), and Instance contains everything that is unique per instance. field_update_field() then exists in core, and defers to a hook function for all heavyweight changes.

Anyway, here are my responses to your specific questions:

  • What about allowed values list? Must be able to change it any time, must be able to share it (implies it must be a field rather than instance setting or otherwise sharable).

Three thoughts:

  1. I agree with "must be able to change it" but "must be able to share it" is not obvious to me. I can imagine a single field with integer values attached to two different bundles that wants either different allowed values or different display labels for the same allowed values per bundle.

  2. We discussed at the code sprint that Value List should probably be its own entity, separate from Field or Instance. Then an instance could either refer to an Allowed Value List or possibly someday also a Recommend Value List (think of a Windows-style Combo dropdown/text box). This gives you change-ability, share-ability, and update-ability within core. No mess, no fuss.

  3. As the previous commenters suggested, a Value List sounds an awful lot like a taxonomy vocabulary. However, taxonomy "values" are all integers, whereas CCK value lists currently also allow strings. In principal (I don't know if this is currently possible), a field type module could support a Value List of some other type in which the List itself would contain a string representation but the database would contain some other native format (e.g. non-timestamp dates). Using taxonomy would preclude this.

For now, the simple thing is to leave it at a Field property and allow core to change it. I have no moral objection to this.

* What about allowed values php? Must be able to change it? Would be better if it just refers to a PHP function defined somewhere else, but we don't require that now. What about typos and bugs that need to be fixed?

Didn't we recently remove the PHP Input Filter from core? I think allowed values PHP is a great thing to banish to a contrib module. Could we get away with this?

* What about changes that have no effect on the schema? Why should we prevent them when they won't hurt anything, seems unnecessarily harsh.

See above.

* What about changes that change the field type, clearly should be a third party contrib solution.

Yes.

* What about changes that affect the schema but don't change the field type (i.e. change the size of a varchar)?

This is splitting hairs. Changing the schema is changing the field type. In this particular case, the field module should just make the varchar be length 255 and impose restrictions in the business logic. The whole point of varchar is that doing it that way imposes no wasted database storage overhead.

* What is the optimized way to store values? Would it be best to always use a 255 length on a varchar even if we don't need it so it can be changed later?

Yes.

* Should we have other field types for varchar and various sizes of text? Should we provide a few standard varchar sizes instead of leaving it open?

In some other post in this group you suggested that a "text" field should not be able to be either a varchar or a text column, that we should provide separate field types for those. The same question would apply for text and text:big, as you say. I think I think that I agree, these should be separate field types. I'm not certain.

* What about number fields? Should we provide a single large container for all sizes or limit it to some preset sizes, or to any size?

If here you are referring to int:tiny vs. int:big, float:normal vs. float:big, then my answer is "whatever we decide for the question about text columns, do that for numbers too." :-)

So if we standardize our

KarenS's picture

So if we standardize our available text and number fields to match the types of db fields we support, and always create a field of the maximum size that type allows, we could allow any change except a change in field type. That makes the update code fairly simple -- see if a change in field type is requested. If so, see if there is a contrib module that will support it and allow or deny the change based on that. Anything other than a change in field type can be handled by core. I like that solution and it seems pretty clean, and it should not be too hard to differentiate between core and contrib.

I previously set up a separate issue on g.d.o. about allowed values (http://groups.drupal.org/node/17757), which is a whole discussion of its own. I'd like to get that discussion all in one place in that other issue. (I would have left it out of my notes in this issue except that it was an example of a field value that needed to be alterable).

OK, stating 'one field type

yched's picture

OK, stating 'one field type per db columns definition' means:
Text field type is split into varchar(255), tinytext, mediumtext, normaltext, longtext (or more probably a reasonable subset of the '*text' ones)
Integer is split into (a reasonable subset of) tinyint, smallint, mediumint, normalint, bigint.
Probably no biggie.

I'm less sure of what 'maximum size' means for float and decimal :
Float : well, we currently just use the 'float' schema column type, without any size option. We could probably keep on with this - if we want to keep a float field type at all ?

Numeric (decimal) has scale and precision.
- Do we settle on a single 'Numeric' field type with 'maximum' (= 'reasonably high') scale and precision - is there such a thing as a one-fits-all
- Do we pick and hardcode a couple values for the (scale, precision) pair to build a couple 'Numeric' field types ?

Unsigned integers?

markus_petrux's picture

Integer is split into (a reasonable subset of) tinyint, smallint, mediumint, normalint, bigint.

It would be nice if it was possible to defined the unsigned attribute for integers.

Also, additional libraries (BCMath/GMP) may be required to support BIGINT in PHP (int precission in PHP is platform dependent).

http://marc.info/?t=90279125900002
http://www.php.net/manual/en/language.types.integer.php
http://www.php.net/manual/en/book.bc.php
http://www.php.net/manual/en/book.gmp.php

So one downside of

KarenS's picture

So one downside of standardizing on field types is that we will have a much longer list of field options and that list will need to be explained to end users who don't know anything about databases -- how do we tell users when to use varchar vs mediumtext vs longtext? That doesn't matter for the API, but will be an important factor in the UI, since we want to make a UI that is easy to use without knowing anything about how data is stored in the database. I guess the UI could be responsible for trying to explain all that, but it does add an extra burden to the UI.

And what do we do for numeric types? There are a lot of options and we have people already using it that may have created fields that use some combo we wouldn't have picked as an option.

I wouldn't mind getting rid of the float type. It's there because it was in the original code and people have float data that has to be supported and there is no good, reliable way to convert that float data to decimal data. We can remove the 'float' option in the UI so no new float values can be created and just support the existing method in the API.

UI

catch's picture

I think we could probably just use the largest available varchar for 'single line' text fields, and then similar for longtext for textareas in the UI - or at least have these as defaults and defer the rest to 'advanced'.

Buried in the original notes

KarenS's picture

Buried in the original notes was an idea that we add a flag to the field settings to indicate which settings affect the schema. One option is to continue to handle fields as we do now and use that flag to identify field attributes that can't be altered. So then our API will check if a requested change alters either the field type or any of those field attributes that affect the schema. If so, see if there is a contrib module that can handle the change. If none of the changes affect the schema, the API can process the change.

We could still add some code in the background to store values in optimal ways, like standardizing on the 255 length for a varchar, but keep that all behind the scenes.

field_update_field vs. field_convert_field

bjaspan's picture

I'm going to try a new approach to answer Karen's set of questions. Basically, she asked: "Why have we disallowed changing field properties that are in fact lightweight along with changing field properties that are heavyweight?"

The new idea is that we do have a field_update_field() in core, but that it is defined only to be able to change those aspects of a field which do not affect the schema (== do not require CREATE/ALTER/DROP TABLE). It simply ignores any attempt to change schema-affecting properties; in fact it does not even look at them. We then implement field_convert_field() in the field_convert contrib module, and that is what knows how to do schema-affecting changes.

So, for Karen's original questions:

* What about allowed values list? Must be able to change it any time, must be able to share it (implies it must be a field rather than instance setting or otherwise sharable).
* What about allowed values php? Must be able to change it? Would be better if it just refers to a PHP function defined somewhere else, but we don't require that now.
  What about typos and bugs that need to be fixed?

I think this is no longer an issue since Allow Values fields are being separated into their own field type (of which I'm a big fan). However, if we were not making a separate field type, then field_update_field() could change the allowed values list.

* What about changes that have no effect on the schema? Why should we prevent them when they won't hurt anything, seems unnecessarily harsh.

field_update_field() would allow them. field_update_field() simply ignores requested changes to schema-affecting properties (ie: it just ignores the type property of the field structure you pass in because it will never change the type of an existing field).

Actually, now that Allowed Values are gone, are there any other lightweight field properties? If not, this whole topic is moot.

* What about changes that affect the schema but don't change the field type (i.e. change the size of a varchar)?

Changes that affect the schema (regardless of whether they change the field type) are handled by field_convert, via the hook method we've previously discussed. This includes changing varchar(100) to varchar(255); if text.module decides to support variable-length varchars, then a contrib module can implement hook_field_convert() to support changing a text field varchar(100) to a text field varchar(255).

* What about changes that change the field type, clearly should be a third party contrib solution.

Yes.

* What is the optimized way to store values? Would it be best to always use a 255 length on a varchar even if we don't need it so it can be changed later?
* Should we have other field types for varchar and various sizes of text? Should we provide a few standard varchar sizes instead of leaving it open?
* What about number fields? Should we provide a single large container for all sizes or limit it to some preset sizes, or to any size?

For varchar, yes, I'd say always use 255.

For the various sizes of int, decimal, and text, I think there are valid reasons for choosing the size. However, I actually do not think we want to expose to the user field types of "Small Text," "Medium Text," and "Large Text," and the same for Number. It's too many field types. I prefer our current approach of a Text field with a "maximum size" option that controls which Schema type is used for the column. This would be a heavyweight field property. field_update_field() would not change it, but field_convert_field() could.

Thoughts?

I generally agree with you

KarenS's picture

I generally agree with you Barry, just a couple clarifications/observations:

  • Other contrib modules have 'field' settings that may not affect the schema. Date module has a 'granularity' setting and I'm sure there are others, so it is well worth while to figure this out to be sure those harmless changes can be made.

  • How will field_update() know if it is being asked to change something that affects the schema? That's where I think we may need a flag in field_info() or something like that to separate field settings that affect the schema from those that don't. Or maybe we break the field settings into two parts: field_schema_settings() and field_settings() for the settings that do and don't affect the schema.

I think I can answer both

bjaspan's picture

I think I can answer both questions at once.

field_update_instance($instance) completely replaces the record for $instance['field_name'],$instance['bundle'] with the contents of $instance. That's fine, because all parts of an instance are lightweight, by definition.

field_update_field() can't work exactly that way. Instead, it will load in the current field, copy the values it is allowed to change from its argument into the current field, and save the modified current field. It simply ignores values it is not allowed to change. In pseudo-code, it will do this:

<?php
/* Update the allowable parts of the field $src['field_name'] with the values in $src */
field_update_field($src) {
 
$dst = field_read_field($src['field_name']);

 
$dst->cardinality = $src->cardinality;
 
$dst->some_other_lightweight_setting = $src->some_other_lightweight_weight;

 
module_invoke($dst->module, 'update_field', $dst, $src);

 
drupal_write_record($field);
}

/* implementation of hook_update_field */
date_update_field($dst, $src) {
 
$dst->granularity = $src->granularity;
}
?>

Does that make sense?

That still requires that

yched's picture

That still requires that field_update_field() can tell lightweight settings from heavyweight settings and does not save the latter (because even f_u_f() includes no code to update the schema anyway, we don't want 'precision = 10' saved for a decimal field while it still has its original 'precision = 5')

Rather than 'flagging properties as heavy- or lightweight', I'd vote for storing them in different buckets. Fields have 'settings' (lightweight) and 'schema settings'. To limit the number of 'expose your settings' hooks, they all should be grouped into hook_[field|widget|formatter]_info() :

<?php
function text_field_info() {
  return array(
   
'text' => array(
     
'label' => t('Text'),
     
'description' => t('This field stores text in the database.'),
     
'default_widget' => 'text_textfield',
     
'default_formatter' => 'text_default',
     
'settings' => array(
       
'foo' => 'bar',
      ),
     
'schema_settings' => array(
       
'size' => 'MEDIUM',
      ),
    ),
  );
}
?>

What worries me with all those considerations, though, is that they're all tightly sql-related. One of the (later) goals is 'Pluggable storage engines', where a field type wouldn't make assumptions on where it is stored, be it a table in a sql db, a non-relational or schema-less backend, etc...
I can imagine that some setting could be 'lightweight' for a given backend and 'heavyweight' for another.

No, it doesn't.

bjaspan's picture

That still requires that field_update_field() can tell lightweight settings from heavyweight settings and does not save the latter.

No, it doesn't. Well, yes, field_update_field() needs to know the difference between light and heavy settings, but the decision is made when the code is written. f_u_f() does not need to be able to "tell" at runtime because the logic is hard-coded in. Consider your example:

even f_u_f() includes no code to update the schema anyway, we don't want 'precision = 10' saved for a decimal field while it still has its original 'precision = 5')

Look at my pseudo-code implementation of field_update_field(). There is no line that says

<?php
$dst
->precision = $src->precision;
?>

In number_update_field, there will also be no line that says that. Thus, the $dst field that gets written back into the database will have the same precision value that it did when it wsa read from the database. Any precision value in the $src object will simply be ignored.

There is no line that

yched's picture

There is no line that says
$field_dst->precision = $field_src->precision;

Right, but there will be a line that says :

<?php
$field_dst
['settings'] = "the part of $field_src['settings'] that does not affect the schema".
?>

I'm saying that what we want to do IMO is :

<?php
$field_dst
['settings'] = "$field_src['settings'];
// What we don't do is
$field_dst['schema_settings'] = $field_src['schema_settings'];
?>

I think the misunderstanding comes from you using the word 'setting' for hardcoded properties like 'cardinality', and me using it for 'field-type specific' settings like text 'size', decimal 'precision' or date 'granularity'.

Wow, re-reading that part of

yched's picture

Wow, re-reading that part of the thread for http://drupal.org/node/367013, it looks like I've been really tired that day. Don't know why I missed it back then, but I do agree that barry's proposal above should work. Sorry for being a dumb*ss. I'll comment on the d.o issue.

I like this solution, too.

KarenS's picture

I like this solution, too. One problem for the CCK UI is that I need to know which values users can change and which they cannot and this would make that possible. Then I can disable schema settings in the UI (Sorry, these values cannot be changed) but leave other field settings available to be altered.

This is what I was trying to communicate when I said that we need a way to tell these settings apart.

What worries me with all

KarenS's picture

What worries me with all those considerations, though, is that they're all tightly sql-related. One of the (later) goals is 'Pluggable storage engines', where a field type wouldn't make assumptions on where it is stored, be it a table in a sql db, a non-relational or schema-less backend, etc...
I can imagine that some setting could be 'lightweight' for a given backend and 'heavyweight' for another.

But if the backend is not a db, it would just ignore schema settings, right, so no problem about not changing them. As to remote relational dbs, I can't think of any schema setting that would be 'heavyweight' in one relational db and not be heavyweight in all relational dbs, local or remote. That leaves non-relational dbs, but I can't think of any examples of these to imagine how they would work.

But, I do think we'll find when we get to this point that each backend will need its own settings, so maybe these are not field settings at all, but backend storage settings. and then we need another component in our API for field_storage and field_storage_settings, which initially will just be local db storage.

And is 'multiple' (or I

KarenS's picture

And is 'multiple' (or I guess it's now 'cardinality') an allowed or non-allowed change now? With per-field storage, it wouldn't affect the schema to change it.

I can't think of a reason

bjaspan's picture

I can't think of a reason that cardinality should not be an instance setting. Actually, I just did. It will make writing a contrib module to provide "per-content storage tables" harder or impossible.

However, I cannot think of a reason that cardinality cannot be changed with field_update_field() as I specified it in my reply to your previous comment.

Note that we'll need a hook to notify all modules of changes to fields, e.g. module_invoke_all('update_field', $new_field). But it can't be called hook_update_field if that name is also used as I recommended in my previous comment. So, we need a suggestion for the field-type update-field hook name and the module_invoke_all update-field hook name.

I can't think of a reason

yched's picture

I can't think of a reason that cardinality should not be an instance setting. Actually, I just did. It will make writing a contrib module to provide "per-content storage tables" harder or impossible

Views integration would also be 'harder or impossible' - single and multiple fields are handled quite differently in D6.

However, I cannot think of a reason that cardinality cannot be changed with field_update_field().

Means marking rows with delta > new_cardinality as 'deleted'. But sure, we should do that.