Combo field/Multigroup to create one-off complex field tables

I've been thinking about the concerns about storing fields in separate tables, and I have another idea to knock around. Storing each simple field in its own table could obviously result in a lot of tables for each content type, but storing a few complex fields instead could significantly reduce the number of tables involved. So another option is to make it easier to create a complex field that matches the needs of a particular system. If that were easier to do, anyone concerned about performance could combine the fields they need into a smaller number of complex fields (or even a single one). The specification of the field would have to be locked down by the time there is data, so that the field schema doesn't change, and someone could create a contrib module that provides a way to make changes to the complex field later. It is possible that once the complex field has its schema locked, it could become available to be shared by more than one content type, which has interesting possibilities of its own.
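To make the "one table per complex field, schema locked once there is data" idea a bit more concrete, here is a minimal sketch in Python with sqlite3. It is not CCK code: the ComboField class, the content_combo_<name> table naming and the price subfields are all hypothetical, chosen only to illustrate the idea.

```python
import sqlite3


class ComboField:
    """Hypothetical combo field: several subfields stored as columns of ONE table."""

    def __init__(self, name, subfields):
        self.name = name
        self.subfields = dict(subfields)  # subfield name -> SQL column type

    @property
    def table(self):
        # Hypothetical naming scheme; identifiers are trusted in this sketch.
        return f"content_combo_{self.name}"

    def create_table_sql(self):
        cols = ", ".join(f"{col} {sqltype}" for col, sqltype in self.subfields.items())
        return (f"CREATE TABLE {self.table} (vid INTEGER NOT NULL, nid INTEGER NOT NULL, "
                f"delta INTEGER NOT NULL, {cols}, PRIMARY KEY (vid, delta))")

    def has_data(self, conn):
        return conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()[0] > 0

    def add_subfield(self, conn, col, sqltype):
        # The schema may only change while the table is empty; after that it is locked.
        if self.has_data(conn):
            raise RuntimeError(f"{self.table} already holds data; schema is locked")
        conn.execute(f"ALTER TABLE {self.table} ADD COLUMN {col} {sqltype}")
        self.subfields[col] = sqltype


conn = sqlite3.connect(":memory:")
price = ComboField("price", [("net_price", "NUMERIC"), ("vat", "NUMERIC"), ("currency", "TEXT")])
conn.execute(price.create_table_sql())
price.add_subfield(conn, "note", "TEXT")  # allowed: no data yet
conn.execute(f"INSERT INTO {price.table} VALUES (1, 1, 0, 10, 2, 'USD', NULL)")

locked_error = None
try:
    price.add_subfield(conn, "discount", "NUMERIC")  # refused: data now exists
except RuntimeError as err:
    locked_error = str(err)
```

Changing the schema after that point would be the job of the separate contrib module mentioned above, e.g. something that migrates existing rows.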

Allie suggested this idea in http://groups.drupal.org/node/9297#comment-29997, and we have work going on to get a 'Multigroup' module working for D6 at http://drupal.org/node/119102 (which has had so much discussion that it is up to two pages with over 300 comments). We have been thinking about Combo/Multigroup as a low priority for D7, but some variation on this idea could be a partial solution to this problem.

There are lots of things that would have to be worked out, but I'm just throwing this idea out for discussion...

Comments

And to be clear, the D6 work

And to be clear, the D6 work is not creating a single table out of the various fields; they are still in separate tables. My idea here would be to create a combo field that stores all of its subfields in its own single table, so it would take a different approach.
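A small sqlite3 sketch of the difference, assuming (as a simplification) that the separate per-field tables are matched up row by row on (vid, delta). The table and column names are hypothetical and do not reproduce CCK's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Approach A: one table per subfield; the rows of one group are matched by (vid, delta).
for field in ("street", "city"):
    conn.execute(f"CREATE TABLE content_field_{field} "
                 "(vid INTEGER, nid INTEGER, delta INTEGER, value TEXT)")
conn.executemany("INSERT INTO content_field_street VALUES (?, ?, ?, ?)",
                 [(6, 5, 0, "1 Main St"), (6, 5, 1, "2 Side St")])
conn.executemany("INSERT INTO content_field_city VALUES (?, ?, ?, ?)",
                 [(6, 5, 0, "Springfield"), (6, 5, 1, "Shelbyville")])

# Reading the group back needs one JOIN per extra subfield.
separate = conn.execute("""
    SELECT s.delta, s.value, c.value
    FROM content_field_street s
    JOIN content_field_city c ON c.vid = s.vid AND c.delta = s.delta
    WHERE s.vid = ? ORDER BY s.delta""", (6,)).fetchall()

# Approach B: a combo field keeps all subfields in a single table, so no JOIN is needed.
conn.execute("CREATE TABLE content_combo_address "
             "(vid INTEGER, nid INTEGER, delta INTEGER, street TEXT, city TEXT)")
conn.executemany("INSERT INTO content_combo_address VALUES (?, ?, ?, ?, ?)",
                 [(6, 5, 0, "1 Main St", "Springfield"), (6, 5, 1, "2 Side St", "Shelbyville")])
combined = conn.execute("SELECT delta, street, city FROM content_combo_address "
                        "WHERE vid = ? ORDER BY delta", (6,)).fetchall()
```

Both queries return the same rows; the difference is how many tables (and joins) it takes to get there.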

Design thoughts

I would like to post some thoughts from a data design point of view, not from an implementation or performance point of view.

The problem

Combo field/Multigroup is about a common data design problem that concerns a specific kind of data, for example:

  • address(street, city, state, postal_code),
  • price(net price, vat, currency),
  • rental(value, currency, amounts of time),
  • weight(value, units),
  • date_time(year, month, day, hour, minute, second),
  • link(link_text, url),
  • email(owner_of_mailbox, url).

This kind of combo data has one specific characteristic: it only makes sense when all of its parts are present together. If just one part is missing, the data become meaningless or worthless, for example:

  • price: 10 (USD? EUR?),
  • rental: 100 USD (per year? per month? per day?)
  • weight: 5 (kg? pounds?),
  • link: Wikipedia (URL?),
  • email: me@example.com (owner of mailbox?).

This kind of data MUST stay together. In these examples, the second part of the information (currency, period, units, URL, mailbox owner) makes a BIG difference.
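The "all parts or nothing" rule is easy to enforce at the value level. A minimal Python sketch, assuming a hypothetical ComboValue base class that rejects any instance with a missing (None or empty) part; Price and Weight mirror two of the examples above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ComboValue:
    """Hypothetical base class: a combo value is only valid when every part is present."""

    def __post_init__(self):
        missing = [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]
        if missing:
            raise ValueError(f"{type(self).__name__} is incomplete, missing: {', '.join(missing)}")


@dataclass(frozen=True)
class Price(ComboValue):
    amount: float
    currency: str


@dataclass(frozen=True)
class Weight(ComboValue):
    value: float
    units: str


ok = Price(10, "USD")  # complete: accepted

error_message = None
try:
    Price(10, None)  # "10 (USD? EUR?)": rejected
except ValueError as err:
    error_message = str(err)
```

Frozen dataclasses also keep the parts from being changed independently after creation, which fits the "MUST stay together" requirement.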

Of these, only the date and time data types are natively implemented in database systems.

The solution

Because this design problem is very common, there should be some convenient solution. And there is! Let's look at a canonical data model: RDF.

RDF

In RDF, our problem can be solved in many ways, but RDF offers a few particularly convenient ones:

1. Blank node

RDF can directly represent only binary relationships. So we can't just say:

Person -- address --> "Street", "City", "State", "Postal code"

In such cases we can use a blank node. A blank node is a node that has no identifier, because it is not expected to be referenced from outside. Here is our example using a blank node:

Person       -- address     --> _blank_node_
_blank_node_ -- street      --> "Street"
_blank_node_ -- city        --> "City"
_blank_node_ -- state       --> "State"
_blank_node_ -- postal_code --> "Postal code"
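The blank node pattern above can be modelled with plain Python tuples rather than an RDF library. In this sketch the BlankNode class is hypothetical: it has no identifier of its own, so the only way to reach the address parts is through the "address" triple, which is exactly the point of a blank node.

```python
class BlankNode:
    """A node with no public identifier; reachable only through triples that mention it."""
    __slots__ = ()


def describe_address(graph, subject):
    """Collect the parts of the address attached to `subject` through its blank node."""
    for s, p, o in graph:
        if s == subject and p == "address" and isinstance(o, BlankNode):
            # Every triple whose subject is this exact blank node is one part of the address.
            return {p2: o2 for s2, p2, o2 in graph if s2 is o}
    return None


addr = BlankNode()
graph = [
    ("Person", "address", addr),
    (addr, "street", "Street"),
    (addr, "city", "City"),
    (addr, "state", "State"),
    (addr, "postal_code", "Postal code"),
]
```

Calling describe_address(graph, "Person") gives back all four parts at once, as a single value.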

2. rdf:value

The rdf:value property is convenient when one part of the combo data is considered the main value, for example:

  • weight(value, units),
  • rental(value, currency, amounts of time).
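A sketch of the rdf:value pattern using plain tuples, assuming hypothetical ex: property names (ex:weight, ex:units, ex:rent, ex:currency, ex:period) chosen for illustration. The main value hangs off rdf:value; everything else on the same blank node qualifies it.

```python
class BlankNode:
    """Node without an identifier (see the blank node pattern)."""
    __slots__ = ()


RDF_VALUE = "rdf:value"


def read_with_main_value(graph, subject, prop):
    """Return (main value, {qualifier: value}) for a property modelled with rdf:value."""
    node = next(o for s, p, o in graph if s == subject and p == prop)
    parts = {p: o for s, p, o in graph if s is node}
    main = parts.pop(RDF_VALUE)  # the main value; the rest are qualifiers
    return main, parts


w, r = BlankNode(), BlankNode()
graph = [
    ("ex:parcel", "ex:weight", w),
    (w, RDF_VALUE, 5),
    (w, "ex:units", "kg"),
    ("ex:flat", "ex:rent", r),
    (r, RDF_VALUE, 100),
    (r, "ex:currency", "USD"),
    (r, "ex:period", "month"),
]
```

A consumer that only wants "the number" can take the main value, but the units, currency and period always travel with it.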

3. Other solutions

From RDF Primer:

There is no need to use rdf:value for these purposes (e.g., a user-defined property name, such as exterms:amount, could have been used instead of rdf:value in Example 21), and RDF does not associate any special meaning with rdf:value. rdf:value is simply provided as a convenience for use in these commonly-occurring situations.

However, even though much existing data in databases and on the Web (and in later Primer examples) takes the form of simple values for properties such as weights, costs, etc., the principle that such simple values are often insufficient to adequately describe these values is an important one. In a global environment such as the Web, it is generally not safe to make the assumption that anyone accessing a property value will understand the units being used (or other contextually-dependent information that may be involved). For example, a U.S. site might give a weight value in pounds, but someone accessing that data from outside the U.S. might assume that weights are given in kilograms. The correct interpretation of data in the Web environment may require that additional information (such as units information) be explicitly recorded. This can be done in many ways, such as using rdf:value, building units into property names (e.g., exterms:weightInKg), defining specialized datatypes that include units information (e.g., extypes:kilograms), or adding additional user-defined properties to specify this information (e.g., exterms:unitOfWeight), either in descriptions of individual items or products, in descriptions of sets of data (e.g., all the data in a catalog or on a site), or in schemas (see Section 5).

CCK perspective

I think that in the CCK world there are many modules that use a design model like Multigroup, for example Link, Email, Money CCK field, Address, Fullname field, and Date. Of course, the Date module makes use of data types built into the database, but the idea stays the same. That's why I'm thinking of a much better integration of Multigroup into the CCK content module, so that other modules could use a common API to store their combo data in a simple, clear and common way.

There is another disadvantage of the current situation, which I believe is easier to understand in light of the rule of least power: at present, every CCK-related module must define its own way of storing combo data. As a result, only that module can understand and process that combo data. It shouldn't work that way.

Data should be shared by all modules and be understandable on its own, without magic processing by the specific module that created it. With this principle in place, Views and every other module would "understand" that some data are combo data and should therefore be read, written, updated and displayed together.
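One way to picture such a common API: each module declares the parts of its combo type once, and any other module (a Views-like lister, for instance) reads and displays the data generically. Everything here (COMBO_TYPES, register_combo_type, pack, render) is hypothetical and only sketches the idea; it is not an existing CCK API.

```python
from typing import Dict, List

# Hypothetical shared registry: combo type name -> ordered list of its parts.
COMBO_TYPES: Dict[str, List[str]] = {}


def register_combo_type(name, parts):
    """Called once by the module that defines the combo type."""
    if name in COMBO_TYPES:
        raise ValueError(f"combo type {name!r} already registered")
    COMBO_TYPES[name] = list(parts)


def pack(name, row):
    """Generic reader: turn a storage row into a combo value for ANY registered type."""
    parts = COMBO_TYPES[name]
    missing = [p for p in parts if p not in row]
    if missing:
        raise ValueError(f"{name}: missing parts {missing}")
    return {p: row[p] for p in parts}


def render(name, value):
    """Generic display: shows all parts together, without module-specific code."""
    return ", ".join(f"{p}={value[p]}" for p in COMBO_TYPES[name])


register_combo_type("link", ["link_text", "url"])
register_combo_type("money", ["amount", "currency"])

money_text = render("money", pack("money", {"amount": 10, "currency": "EUR", "nid": 4}))
```

The design choice is that the description of the data lives in one shared place, so the consumer never needs to know which module created it.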

Special case: multiple value subfield

Flexibility is a valuable feature, so one could think about this kind of structure:

Record album   -- track     -->  _blank_node_
_blank_node_   -- genre     --> "Genre"
_blank_node_   -- artist    --> _blank_node_2_
_blank_node_2_ -- co-artist --> "Artist 1"
_blank_node_2_ -- co-artist --> "Artist 2"
_blank_node_2_ -- co-artist --> "Artist 3"

In this case:

  • artist should be a multiple-value subfield, or
  • _blank_node_ should be a real node (not a blank node), and track should be a node reference.

In this particular example, I think the second solution (a real node) is better, because the same track could appear on many record albums.
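A minimal sketch of the "real node" option, with hypothetical Track and Album classes and made-up titles: the track has its own identity (nid), can carry several artists, and albums only store references to it, so one track can appear on several albums without being copied.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    """A real node: it has its own identity (nid), so it can be referenced."""
    nid: int
    title: str
    genre: str
    artists: List[str] = field(default_factory=list)  # multiple values are fine on a full node


@dataclass
class Album:
    nid: int
    title: str
    track_refs: List[int] = field(default_factory=list)  # node references, not embedded copies


tracks = {1: Track(1, "Song A", "Genre", ["Artist 1", "Artist 2", "Artist 3"])}
albums = [Album(10, "Original album", [1]), Album(11, "Best of", [1])]


def albums_with_track(track_nid):
    """All albums that reference the given track node."""
    return [a.title for a in albums if track_nid in a.track_refs]
```

Editing tracks[1] once is then visible from both albums, which would not be true if each album embedded its own blank-node copy of the track.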

Summary

I think the idea of a blank node fits combo data/Multigroup perfectly. Blank nodes are useful where we need to designate some combo data, but only in a very limited context, because a blank node doesn't have an identifier. That's why these nodes are blank.

The same should apply to Multigroup. If we really need a multiple-value subfield in our combo data, then we should probably use a content type, which is a proper and flexible content container. Multigroup should be a data type like the datetime types implemented in database systems, nothing more. Multigroup should solve one specific problem and solve it well.

As I said, if Multigroup is about combo data that must always be stored and viewed together, then the implementation is much more obvious:

content_multifield_address table:
+-----+-----+-------+--------+------+-------+-------------+
| vid | nid | delta | street | city | state | postal_code |
+-----+-----+-------+--------+------+-------+-------------+
|   6 |   5 |     0 | Street | City | State | Postal code |
|   5 |   4 |     0 | Street | City | State | Postal code |
|   7 |   6 |     0 | Street | City | State | Postal code |
+-----+-----+-------+--------+------+-------+-------------+
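The table above can be tried out directly in sqlite3. In this sketch the PRIMARY KEY (vid, delta) and the NOT NULL constraints are assumptions added to show how the database itself can refuse a half-filled address; the rows are the ones from the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE content_multifield_address (
        vid INTEGER NOT NULL,          -- node revision id
        nid INTEGER NOT NULL,          -- node id
        delta INTEGER NOT NULL,        -- position among multiple addresses
        street TEXT NOT NULL,
        city TEXT NOT NULL,
        state TEXT NOT NULL,
        postal_code TEXT NOT NULL,
        PRIMARY KEY (vid, delta)
    )""")
rows = [
    (6, 5, 0, "Street", "City", "State", "Postal code"),
    (5, 4, 0, "Street", "City", "State", "Postal code"),
    (7, 6, 0, "Street", "City", "State", "Postal code"),
]
conn.executemany("INSERT INTO content_multifield_address VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def load_addresses(conn, vid):
    """Load every address of one node revision; each row is one complete address."""
    cur = conn.execute(
        "SELECT street, city, state, postal_code FROM content_multifield_address "
        "WHERE vid = ? ORDER BY delta", (vid,))
    names = [d[0] for d in cur.description]
    return [dict(zip(names, r)) for r in cur.fetchall()]
```

Because each address is a single row, reading, writing and deleting it are single-row operations, and a missing part is rejected by the NOT NULL constraints.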

So what is your opinion? What is the best blank node/blank content type implementation? :-) If you have a better (or just different) idea of what Multigroup should really be, please share your thoughts. Thank you in advance for your comments!

FlexiField Update and DrupalCon Paris Session

As some of you may know, I've been working on FlexiField as an alternate take on the Content Multigroup problem. This is a complex problem, and I think it's helpful to have some parallel experimentation going on. Today, I released alpha5 of flexifield. A concern with prior versions was that data for the child fields that make up a complex field was all being stored as a serialized string rather than using the child fields' storage. With alpha5, this is fixed, and child field data is stored with the child field and related to the complex field using a "vid" that is not an actual node revision, but allows the data linkage to work without needing to make any database changes to CCK.
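Roughly, the linkage described for alpha5 can be pictured like this: each child field keeps its data in its own storage keyed by vid, and the complex field item stores a synthetic vid that is not a real node revision. This is a hypothetical sketch, not FlexiField's actual code; ChildFieldStore, the id sequence and the starting number are all assumptions.

```python
import itertools
from collections import defaultdict


class ChildFieldStore:
    """Stand-in for a child field's own storage table: rows keyed by vid."""

    def __init__(self):
        self.rows = defaultdict(list)

    def save(self, vid, value):
        self.rows[vid].append(value)

    def load(self, vid):
        return list(self.rows[vid])


# Synthetic ids come from their own sequence. Starting high is only a crude way to
# keep them apart from real revision ids; a real implementation needs a proper guarantee.
synthetic_vids = itertools.count(1_000_000)

children = {"street": ChildFieldStore(), "city": ChildFieldStore()}


def save_complex_item(item):
    """Store each child value with its child field; return the linking vid for the parent."""
    vid = next(synthetic_vids)
    for name, value in item.items():
        children[name].save(vid, value)
    return vid


def load_complex_item(vid):
    """Reassemble a complex field item from the child fields' storage."""
    item = {}
    for name, store in children.items():
        values = store.load(vid)
        if values:
            item[name] = values[0]
    return item


link = save_complex_item({"street": "Street", "city": "City"})
```

The parent only needs to remember the linking vid; the child data stay in the child fields' normal storage instead of in a serialized string.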

Much in FlexiField is still a hack, and there are many modules (like date) that it still doesn't work with properly, but each alpha release is better than the previous. I'm looking forward to creating a much cleaner version for Drupal 7 that can take advantage of the improved Field API.

I haven't kept fully up-to-date with how Content Multigroup has been progressing, but I plan to delve into that soon. I know I have a lot to learn from picking apart Content Multigroup, and perhaps I'll have something valuable to contribute from my experience working on FlexiField. In the long run, I'd love to see this functionality as a standard and stable part of CCK, whether it's by evolving Content Multigroup, evolving FlexiField, or merging the best of both into some new module.

I've submitted a session proposal for DrupalCon Paris to discuss Content Multigroup and FlexiField. If you're attending the conference and are interested in this topic, please vote for it. If you've been heavily involved in Content Multigroup development and would like to co-present this session with me, please send me a message. Thanks!

Content Multigroup is available in CCK3

Just FYI: CCK3 is an experimental branch of CCK in which the Content Multigroup module is available, so you can take a look and see how it works.

References:
- Development snapshot for cck 6.x-3.x-dev.
- State of the Content Multigroup module.

Creating Multigroup Programmatically

I want to use a multigroup field programmatically. Initially I created a content type (using the GUI) and tried to add a multigroup field to it. To create it programmatically, I created node--edit.tpl.php, but I am missing something here. I got stuck on some questions:

  1. When I create them programmatically, the fields are not saved in the database tables.

  2. How do I add the multigroup fieldset and render it in a table format?

If someone could clear up my doubts, it would be a great help for getting started, as I am also new to Drupal.