Per-Bundle Storage module

Posted by bjaspan

Today I mostly goofed off at work and created the Per-Bundle Storage module (pbs.module), a Field API contrib module that stores all values for all limited-value fields in a bundle in a single table row. Per-bundle storage is the exact analog of pre-Field API CCK's per-content-type storage and thus provides the same benefits (namely, loading all such fields in a single query).

The per-bundle tables are not used by field_attach_load() but are available for custom queries and Views. Once the code to connect Field API tables to Views is written, this module will alter the Views tables to reference the bundle tables for limited-value fields.

Bundle tables are stored in addition to, not instead of, per-field tables, so each object save requires one extra insert. Per-Bundle Storage module is useful when a site reads objects with multiple limited-value fields more often than it writes them; in that case the cost of the extra insert is negligible.

Unlike old CCK, per-bundle tables store all limited-value fields, not just single-value fields. If the field 'foo' can have 3 values, the per-bundle table has columns foo_value_0, foo_value_1, and foo_value_2. For simplicity, the same naming scheme is used for single-value fields, e.g. foo_value_0 instead of foo_value. I'm not completely sure yet that this will actually prove useful to Views; time will tell.
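To make the naming scheme concrete, here is a minimal sketch of flattening one object's limited-value field items into the single row that would go into its per-bundle table. The function name pbs_example_bundle_row(), the simplified input shapes (field name => cardinality, and field name => list of items with a 'value' key), and padding unused deltas with NULL are all assumptions for illustration, not pbs.module's actual code.

```php
<?php
/**
 * Hypothetical helper, not pbs.module's code: flattens the limited-value
 * field items of one object into a single per-bundle table row, using
 * FIELDNAME_value_DELTA column names.
 *
 * @param array $ids          Base columns, e.g. array('type' => 'node', 'id' => 7, 'vid' => 9).
 * @param array $cardinality  Field name => maximum number of values.
 * @param array $items        Field name => list of items, each array('value' => ...).
 */
function pbs_example_bundle_row(array $ids, array $cardinality, array $items) {
  $row = $ids;
  foreach ($cardinality as $field_name => $max) {
    $values = isset($items[$field_name]) ? array_values($items[$field_name]) : array();
    for ($delta = 0; $delta < $max; $delta++) {
      // Unused deltas become NULL so every row has the same set of columns.
      $row[$field_name . '_value_' . $delta] = isset($values[$delta]) ? $values[$delta]['value'] : NULL;
    }
  }
  return $row;
}

// 'subtitle' is single-value; 'foo' allows 3 values but only 2 are set.
$row = pbs_example_bundle_row(
  array('type' => 'node', 'id' => 7, 'vid' => 9),
  array('subtitle' => 1, 'foo' => 3),
  array(
    'subtitle' => array(array('value' => 'Hello')),
    'foo' => array(array('value' => 'a'), array('value' => 'b')),
  )
);
```

Because the row has one column per possible delta, a 3-value field always occupies three columns whether or not they are filled; that fixed width is what allows all limited-value fields to come back from one SELECT.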

The module is not done, but it is already far simpler than the single/multiple-handling code in old CCK. The functional basics take 150 lines. There are lots of ways it could be optimized, particularly by bypassing DBTNG and using non-standard SQL for pbs_synchronize_bundle().
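The post does not show pbs_synchronize_bundle(). As a rough in-memory illustration of what rebuilding a bundle table from the per-field tables involves, the hypothetical function below pivots per-field rows (entity id, delta, value) into one row per entity. The function name and input shapes are assumptions; a real implementation would do this pivot in SQL, which is where the non-standard SQL mentioned above would help.

```php
<?php
/**
 * Hypothetical in-memory pivot, not pbs.module's code.
 *
 * @param array $entity_ids     IDs of all objects in the bundle.
 * @param array $cardinality    Field name => maximum number of values.
 * @param array $per_field_rows Field name => list of rows, each with
 *                              'entity_id', 'delta' and 'value' keys.
 * @return array Entity id => flattened per-bundle row.
 */
function pbs_example_pivot(array $entity_ids, array $cardinality, array $per_field_rows) {
  // Start every entity with all columns NULL so rows share one shape.
  $bundle_rows = array();
  foreach ($entity_ids as $id) {
    $bundle_rows[$id] = array('id' => $id);
    foreach ($cardinality as $field_name => $max) {
      for ($delta = 0; $delta < $max; $delta++) {
        $bundle_rows[$id][$field_name . '_value_' . $delta] = NULL;
      }
    }
  }
  // Copy each per-field value into its (field, delta) column.
  foreach ($per_field_rows as $field_name => $rows) {
    if (!isset($cardinality[$field_name])) {
      continue; // Not a limited-value field of this bundle.
    }
    foreach ($rows as $r) {
      if (isset($bundle_rows[$r['entity_id']]) && $r['delta'] < $cardinality[$field_name]) {
        $bundle_rows[$r['entity_id']][$field_name . '_value_' . $r['delta']] = $r['value'];
      }
    }
  }
  return $bundle_rows;
}

$rows = pbs_example_pivot(
  array(1, 2),
  array('subtitle' => 1, 'foo' => 2),
  array(
    'subtitle' => array(array('entity_id' => 1, 'delta' => 0, 'value' => 'One')),
    'foo' => array(
      array('entity_id' => 1, 'delta' => 1, 'value' => 'y'),
      array('entity_id' => 2, 'delta' => 0, 'value' => 'z'),
    ),
  )
);
```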

The code is in the bzr repository in sites/all/modules/pbs.

Comments

Posted by KarenS

I thought about (and thought I proposed at one time) a simple db structure like this. The only hurdle I found was what to do with shared fields. So what have you done with shared fields? Are they duplicated?

Shared fields

Posted by bjaspan

pbs.module ignores whether or not a field is shared, so data for limited-value shared fields are stored in per-bundle tables just like non-shared field data. It turns out that this does not result in any more duplication of data than for non-shared fields: exactly one extra copy of the data is stored.

What does get duplicated multiple times is table columns. Consider an example: A text field called "subtitle" is assigned to two bundles, "page" and "story" (both happen to come from node.module, but that's irrelevant). Also, "story" has an additional field called "pullquote". pbs.module creates two tables, like this:

field_bundle_page: type, id, vid, subtitle_value_0
field_bundle_story: type, id, vid, subtitle_value_0, pullquote_value_0

Both tables have a column subtitle_value_0 because the subtitle field is shared between page and story. So the column is duplicated. However, any individual saved object can be either a page, a story, or neither, but it cannot be both. Thus, there will never be a row in both of these tables storing a subtitle value for the same entity type and id. Hence, exactly one extra copy of the data is stored.
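The column layout in this example can be derived mechanically from the fields each bundle uses. The sketch below is a hypothetical helper, not pbs.module's schema code, that builds the column list for each field_bundle_* table from an assumed map of bundle => (field name => cardinality); the shared subtitle field simply produces a subtitle_value_0 column in both tables.

```php
<?php
/**
 * Hypothetical helper: column names for each per-bundle table.
 *
 * @param array $bundles Bundle name => (field name => cardinality).
 * @return array Table name => list of column names.
 */
function pbs_example_bundle_columns(array $bundles) {
  $tables = array();
  foreach ($bundles as $bundle => $fields) {
    // Base columns identifying the object, as in the example tables.
    $columns = array('type', 'id', 'vid');
    foreach ($fields as $field_name => $cardinality) {
      for ($delta = 0; $delta < $cardinality; $delta++) {
        $columns[] = $field_name . '_value_' . $delta;
      }
    }
    $tables['field_bundle_' . $bundle] = $columns;
  }
  return $tables;
}

$tables = pbs_example_bundle_columns(array(
  'page' => array('subtitle' => 1),
  'story' => array('subtitle' => 1, 'pullquote' => 1),
));
```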

So, is this useful? Well, it helps the goal of loading as many field values as possible in a single query, which is the whole point of the module.

What about Views? We want to be able to create a Views query to load "subtitle" for any object, regardless of its bundle, and we cannot (easily) use the per-bundle tables for that. That's not a problem, because we still have the per-field storage table for subtitle. I said in the original post that this module will modify the Views tables to use the per-bundle fields when appropriate; it can simply leave the Views tables for shared fields unmodified, so Views will query the per-field table for them.
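A minimal sketch of that decision, assuming a field is described by its name, its cardinality, and the list of bundles it is attached to. The function name and the table names it returns are hypothetical, and no actual Views API is involved; unlimited-value fields are also routed to the per-field table, since pbs.module does not store them per bundle.

```php
<?php
// Drupal 7 defines FIELD_CARDINALITY_UNLIMITED as -1; fallback for standalone use.
if (!defined('FIELD_CARDINALITY_UNLIMITED')) {
  define('FIELD_CARDINALITY_UNLIMITED', -1);
}

/**
 * Hypothetical: picks the table a Views query should read a field from.
 */
function pbs_example_views_table($field_name, $cardinality, array $bundles) {
  // Shared or unlimited-value fields stay on the per-field table,
  // which holds values for every bundle.
  if (count($bundles) !== 1 || $cardinality === FIELD_CARDINALITY_UNLIMITED) {
    return 'field_data_' . $field_name;
  }
  // Limited-value field used by exactly one bundle: use its bundle table.
  return 'field_bundle_' . reset($bundles);
}
```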

Using per-bundle tables for field_attach_load()

Posted by bjaspan

I do not think per-field storage is a performance issue for primary object loading (e.g. node_load() calling field_attach_load()) because loaded fields are cached, either within Field API or by the fieldable type itself. Per-bundle storage is intended to solve the "too many joins" problem for the other field-loading use cases: custom queries and Views.

That said, I now realize that it would be pretty simple to use per-bundle storage tables during field_attach_load(). We can just add an "external-load" hook that lets the normal storage system skip fields that have already been loaded externally. A pseudo-code example, using a hypothetical single-object version of field_attach_load():

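<?php
function field_attach_load($obj_type, $object) {
  // Ask modules (e.g. pbs.module) for field values they have already
  // loaded some other way, keyed by field name.
  $externals = module_invoke_all('field_attach_load_external', $obj_type, $object);

  // field_to_load_for() and load_from_per_field_storage() are hypothetical
  // helpers standing in for the real Field API internals.
  foreach (field_to_load_for($object) as $field_name => $field) {
    if (isset($externals[$field_name])) {
      // Already loaded externally: skip the per-field query.
      $object->$field_name = $externals[$field_name];
    }
    else {
      // Fall back to the normal per-field storage table.
      $object->$field_name = load_from_per_field_storage($field);
    }
  }
}
?>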

pbs.module would then implement pbs_field_attach_load_external(), which loads the single row from the object's per-bundle table, massages the results in memory, and returns them. Poof! All limited-value fields are loaded in a single query.
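The post does not spell out the in-memory massaging. Below is one hypothetical way to turn a fetched per-bundle row back into per-field item lists keyed by field name, the shape the field_attach_load() pseudo-code above reads from $externals. The function name, the simplified array('value' => ...) item shape, and treating NULL columns as unused deltas are assumptions; a real implementation would also run the query and handle multi-column fields and languages.

```php
<?php
/**
 * Hypothetical: converts one per-bundle row back into field items.
 *
 * @param array $row         Column name => value, as fetched from the bundle table.
 * @param array $cardinality Field name => maximum number of values.
 * @return array Field name => list of items, each array('value' => ...).
 */
function pbs_example_unflatten(array $row, array $cardinality) {
  $fields = array();
  foreach ($cardinality as $field_name => $max) {
    $items = array();
    for ($delta = 0; $delta < $max; $delta++) {
      $column = $field_name . '_value_' . $delta;
      // NULL columns are treated as unused deltas and skipped.
      if (isset($row[$column])) {
        $items[] = array('value' => $row[$column]);
      }
    }
    $fields[$field_name] = $items;
  }
  return $fields;
}

$fields = pbs_example_unflatten(
  array(
    'type' => 'node', 'id' => 7, 'vid' => 9,
    'subtitle_value_0' => 'Hello',
    'foo_value_0' => 'a', 'foo_value_1' => 'b', 'foo_value_2' => NULL,
  ),
  array('subtitle' => 1, 'foo' => 3)
);
```

Returning an (possibly empty) item list for every limited-value field means the isset() check in the pseudo-code treats all of them as already loaded, so none of them falls back to a per-field query.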

The fact that we can get "the best of both worlds" like this is not surprising. pbs.module makes a simple time vs. space tradeoff. By using twice the space, we gain enormous code simplicity, actually correct functionality (which CCK never truly provided), and efficient operation for use cases that have conflicting storage requirements.

I think I like hook_field_attach_load_external() enough to go implement it.

Can we consider the debate over using per-field storage as the primary, master copy of field data resolved now? Jeff? Earl?

Not all the benefits

Posted by David Strauss

Because storage is per-bundle, even for shared fields, there is no straightforward way to query for "all nodes with value x in shared field y" without quite a few LEFT JOINs.

Use the per-field table

Posted by bjaspan

pbs.module does not replace per-field storage, it provides per-bundle storage in addition. So, to get "all nodes with value x in shared field y", you use the per-field table:

SELECT entity_id FROM field_data_y WHERE entity_type = 'node' AND y_value = 'x'

Am I missing something?

Posted by David Strauss

No, you're not missing anything. I missed that you're preserving all per-field storage still. There's no problem.

Posted by merlinofchaos

Barry: Does that mean that 'unlimited' multiple is not possible? Or does it cause nasty breakage?

Posted by bjaspan

Of course unlimited multiple is possible, and it is stored in a per-field table, just as with old CCK. pbs.module simply ignores FIELD_CARDINALITY_UNLIMITED fields.

Again, I am not replacing per-field storage for anything. All fields are stored in per-field tables. pbs.module provides per-bundle storage in addition.

Posted by sin@drupal.org

Hi!

It was really frustrating to learn from the Fields in Core Sprint Report that per-content-type field storage tables have now officially been dropped, together with any further denormalization of the DB schema :(

I have coded a number of pages and dashboards that required custom SQL SELECTs/UPDATEs on these tables, because Views and API calls sometimes just don't fit: custom look and feel, integrated forms and JS, mass updates, import/export, integration with other systems, and so on. I have used Drupal for more than 3 years and have never needed to change the cardinality of a field on a working site; I have used shared fields once or twice (and do not like them :), and 90% of my fields were 1:1 related to nodes. So D7 will complicate my work with extra JOINs and UPDATEs :(

I'd like to have a choice how to store fields.

+1 to hook_field_attach_load_external()

bjaspan, thank you for your efforts!

Views Plugins

Posted by mfer

@sin - You might want to take a look at Views plugins and altering views queries. You can pretty much get and present anything you want any way you want.

Matt Farina
www.innovatingtomorrow.net
www.geeksandgod.com
www.superaveragepodcast.com
www.mattfarina.com

Posted by denisanokhin

Absolutely agree with @sin. The new field storage approach is terrible for big sites with many fields and custom SQL queries. Many developers of such sites will be forced to move away from Drupal as their base platform.

Thank you @bjaspan for pbs module (http://drupal.org/project/pbs). However it seems that the module is not maintained anymore :(

Installation problem

Posted by prabir123

Hello All,

I am trying to install the module, and it throws the following error:

Notice: Undefined index: field_name in pbs_bundle_schema() (line 109 of /opt/lampp/htdocs/drupal7/sites/all/modules/pbs/pbs.fieldapi.inc).

Notice: Undefined index: field_name in pbs_bundle_schema() (line 109 of /opt/lampp/htdocs/drupal7/sites/all/modules/pbs/pbs.fieldapi.inc).

It's a very flexible module for large-scale databases and for tables with many fields. It would be very helpful if anyone could point us in the right direction.

Thanks in Advance!!!
Prabir

I don't get it?

Posted by emilymoi

After successfully building high-load Drupal 6 sites using CCK, I cannot comprehend how the decision was made to create a table per field. How is this even running properly on active sites? Is this the only alternative available? WOW.

Posted by sinasalek

:( Looks like the module is no longer maintained, and modules like efq_views are also far from complete; for example, efq_views doesn't support joins!

Posted by greggles

If you are looking for Per-Bundle Storage for Drupal 8, you might be interested in this Stack Exchange post discussing how to use base fields instead. It's not a config-oriented solution, but it is available now.

Fields in Core
