Sprint update (via chx)

Posted by catch on December 16, 2008 at 10:51pm
Last updated by Amazon on Wed, 2008-12-24 01:14

Participants: Dries, Karen, Yves, Moshe, Barry, David Strauss, David Rothstein (1 day), Karoly, Florian
Current status from the Fields in Core sprint, via chx in #drupal

Database first:

Storage will be per-field.

What's a field?

We take a type that comes from a module implementing specific APIs (for example, number or text) and add some settings to it, such as the maximum length and whether it's formatted. The combination of the type and these basic settings forms a field.

We store field_name, type, multiple, locked, module, active and settings in field_configuration.
On field creation, we also create a field_{$field_name} table.
This table has three mandatory columns, plus whatever else the module wants to store in it.

We have entity (think node), entity_id (think nid) and delta as the three mandatory columns.
There will be an optimization where entity maps to an int, because storing the string "node" a million times is hellishly inefficient.
entity_id is the nid for nodes, not the vid, since we don't store archived versions in the main table.
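To make the per-field layout concrete, here is a minimal Python/sqlite3 sketch. The table and column names come from the description above; the SQL types, the entity_type lookup table (one guess at what the "entity maps to an int" optimization could look like) and the create_field() helper are hypothetical, not the actual Field API code.

```python
import json
import re
import sqlite3

# Hypothetical sketch: table/column names follow the sprint notes; SQL types,
# the entity_type lookup table and create_field() are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE field_configuration (
    field_name TEXT PRIMARY KEY,
    type       TEXT NOT NULL,
    multiple   INTEGER NOT NULL DEFAULT 0,
    locked     INTEGER NOT NULL DEFAULT 0,
    module     TEXT NOT NULL,
    active     INTEGER NOT NULL DEFAULT 1,
    settings   TEXT NOT NULL            -- serialized, e.g. max length, formatted
);
-- Assumed form of the int optimization: 'node' is stored once here and
-- referenced by a small integer in every field table.
CREATE TABLE entity_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
INSERT INTO entity_type (id, name) VALUES (1, 'node'), (2, 'user');
""")

def create_field(conn, field_name, field_type, module, settings, columns, multiple=False):
    """Record the field, then create field_<name> with the three mandatory
    columns followed by the module-defined ones (columns: name -> SQL type)."""
    # The field name becomes part of a table name, so keep it to a safe charset.
    if not re.fullmatch(r"[a-z0-9_]+", field_name):
        raise ValueError("field names are used in table names; keep them simple")
    conn.execute(
        "INSERT INTO field_configuration (field_name, type, multiple, module, settings)"
        " VALUES (?, ?, ?, ?, ?)",
        (field_name, field_type, int(multiple), module, json.dumps(settings)),
    )
    extra = "".join(f", {col} {sqltype}" for col, sqltype in columns.items())
    conn.execute(
        f"CREATE TABLE field_{field_name} ("
        "entity INTEGER NOT NULL, entity_id INTEGER NOT NULL, delta INTEGER NOT NULL"
        f"{extra}, PRIMARY KEY (entity, entity_id, delta))"
    )

# A text field whose module stores a value and an input format per row.
create_field(conn, "body", "text", "text", {"max_length": 0, "formatted": True},
             {"value": "TEXT", "format": "INTEGER"})
```

Because delta is part of the key, a multiple-value field is just several rows for the same (entity, entity_id); a single-value field always uses delta 0.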

Next up we have field instances.
A field instance takes a field_name (see above) and a bundle.

What's a bundle?
So, bundle. It's an abstraction of content_type -- an example is node_story, but you can also think user_admin if you want.

We have stuff like required, label, description, weight, instance_settings, widget_type, widget_module, widget_active, widget_settings and finally display_settings - all in field_instance_configuration.
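A sketch of field_instance_configuration under the same assumptions: the column list is taken from the paragraph above, while the SQL types, defaults, the (field_name, bundle) primary key, the add_instance() helper and the widget name "text_textarea" are illustrative guesses. It shows one field shared by two bundles, each instance with its own label and settings.

```python
import json
import sqlite3

# Hypothetical sketch: column names come from the sprint notes; SQL types,
# defaults, the primary key and add_instance() are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE field_instance_configuration (
    field_name        TEXT NOT NULL,
    bundle            TEXT NOT NULL,   -- e.g. 'node_story', or 'user_admin'
    required          INTEGER NOT NULL DEFAULT 0,
    label             TEXT NOT NULL,
    description       TEXT NOT NULL DEFAULT '',
    weight            INTEGER NOT NULL DEFAULT 0,
    instance_settings TEXT NOT NULL DEFAULT '{}',
    widget_type       TEXT NOT NULL,
    widget_module     TEXT NOT NULL,
    widget_active     INTEGER NOT NULL DEFAULT 1,
    widget_settings   TEXT NOT NULL DEFAULT '{}',
    display_settings  TEXT NOT NULL DEFAULT '{}',
    PRIMARY KEY (field_name, bundle)
)""")

def add_instance(conn, field_name, bundle, label, widget_type, widget_module,
                 required=False, weight=0, widget_settings=None, display_settings=None):
    """Attach an existing field to a bundle with per-bundle settings."""
    conn.execute(
        "INSERT INTO field_instance_configuration (field_name, bundle, required, label,"
        " weight, widget_type, widget_module, widget_settings, display_settings)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (field_name, bundle, int(required), label, weight, widget_type, widget_module,
         json.dumps(widget_settings or {}), json.dumps(display_settings or {})),
    )

# The same 'body' field attached to two node bundles with different labels
# ('text_textarea' is an illustrative widget name).
add_instance(conn, "body", "node_story", "Story text", "text_textarea", "text", required=True)
add_instance(conn, "body", "node_page", "Page body", "text_textarea", "text")
print(conn.execute(
    "SELECT bundle, label FROM field_instance_configuration "
    "WHERE field_name = 'body' ORDER BY bundle").fetchall())
# -> [('node_page', 'Page body'), ('node_story', 'Story text')]
```

The data still lives only in field_body; instances carry presentation and validation settings, not storage.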

So what happens on node_load_multiple()?
We have the bundles we want to load -- they come from content types with a node_ prefix.

Then you look up field_instance_configuration: "I have node_story, node_page and node_foo; give me the field_names for them."
If you have the field names, you are golden, because the field_name immediately gives you the table name to query. And you already have the entity (node) and the entity_ids.

Getting this initial data will be something like "SELECT DISTINCT field_name FROM field_instance_configuration WHERE bundle IN (my bundles)" - but probably a cache_get().
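The load path described above can be sketched as follows. This is an illustration under assumptions: the trimmed-down schemas, the single "value" column per field table, the integer 1 standing for "node", and the in-process dict used in place of cache_get()/cache_set() are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical, trimmed-down schemas: one 'value' column per field table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE field_instance_configuration (field_name TEXT, bundle TEXT);
INSERT INTO field_instance_configuration VALUES
  ('body', 'node_story'), ('body', 'node_page'), ('tags', 'node_story');
CREATE TABLE field_body (entity INTEGER, entity_id INTEGER, delta INTEGER, value TEXT);
CREATE TABLE field_tags (entity INTEGER, entity_id INTEGER, delta INTEGER, value TEXT);
INSERT INTO field_body VALUES (1, 10, 0, 'Story text'), (1, 11, 0, 'Page text');
INSERT INTO field_tags VALUES (1, 10, 0, 'drupal'), (1, 10, 1, 'fields');
""")

NODE = 1                 # assumed integer id for the 'node' entity type
_field_name_cache = {}   # stand-in for cache_get()/cache_set()

def field_names_for_bundles(conn, bundles):
    """SELECT DISTINCT field_name ... WHERE bundle IN (...), cached per bundle set."""
    key = tuple(sorted(bundles))
    if key not in _field_name_cache:
        marks = ",".join("?" * len(key))
        rows = conn.execute(
            "SELECT DISTINCT field_name FROM field_instance_configuration "
            f"WHERE bundle IN ({marks}) ORDER BY field_name", key)
        _field_name_cache[key] = [r[0] for r in rows]
    return _field_name_cache[key]

def load_fields(conn, nodes):
    """nodes: {nid: node_type}. Returns {nid: {field_name: [values by delta]}}."""
    bundles = {"node_" + t for t in nodes.values()}   # bundles = 'node_' + type
    result = {nid: defaultdict(list) for nid in nodes}
    nids = list(nodes)
    marks = ",".join("?" * len(nids))
    for field_name in field_names_for_bundles(conn, bundles):
        # The field name alone tells us which table to read; one query per field.
        rows = conn.execute(
            f"SELECT entity_id, value FROM field_{field_name} "
            f"WHERE entity = ? AND entity_id IN ({marks}) ORDER BY entity_id, delta",
            [NODE, *nids])
        for nid, value in rows:
            result[nid][field_name].append(value)
    return {nid: dict(fields) for nid, fields in result.items()}

print(load_fields(conn, {10: "story", 11: "page"}))
# -> {10: {'body': ['Story text'], 'tags': ['drupal', 'fields']}, 11: {'body': ['Page text']}}
```

The cost is one query per distinct field rather than one per node, which is what makes a multiple-node load workable with per-field tables.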

How to apply the patches
You can create a patch against HEAD by running "bzr diff --old=bzr://vcs.fourkitchens.com/drupal/7" from your checkout. The patch may not be usable if we haven't merged in the latest HEAD.

To merge in the latest HEAD (from your checkout):
bzr update
bzr commit -m"My uncommitted changes."
bzr merge bzr://vcs.fourkitchens.com/drupal/7
[resolve any conflicts and run tests]
bzr commit -m"Merge in CVS HEAD."

Comments

Uh.

Posted by eaton on December 16, 2008 at 11:14pm

I realize that aggregating fields for a type into a single table is hard and all, but this is basically "making CCK simpler by abandoning the part of the architecture that makes UI-based schema generation somewhat performant and slightly less sucky."

Have the implications of this been thought through? If the answer is 'yes', can I ask again, until the answer is 'no'? ;-)

Lessons from flexinode

Posted by merlinofchaos on December 16, 2008 at 11:19pm

Remember that per-field storage was one of the reasons flexinode was abandoned.

Per field, not per field type?

Posted by recidive on December 16, 2008 at 11:55pm

If I've understood this correctly, storage will be per field, not per field type.

IIRC, flexinode storage was just ONE huge table for all fields, so not really per field nor field type.

Comparing to flexinode

Posted by eaton on December 17, 2008 at 12:51am

This is not as bad as flexinode, in that sense, no.

It's still pretty cataclysmic for performance on high-traffic sites with reasonably complex data, however. Can I ask WHY this change is being proposed?

Performance fix

Posted by chx on December 17, 2008 at 4:30am

Don't you worry: David is working on a materialized view module that will make your site insanely faster, because you won't need a JOIN at all when querying. Sorry for not including this in the report.

Is this slated for core

Posted by catch on December 17, 2008 at 11:47am

Is this slated for core inclusion parallel to the fields API?

Formatter options?

Posted by markus_petrux on December 17, 2008 at 1:54pm

Have you considered making formatters configurable? Currently, in CCK2, you can choose among different formatters, but you cannot configure options for them, which sometimes forces modules to include more formatters than would really be needed if they could be configured.

There will be support for

Posted by yched on December 25, 2008 at 2:25pm

There will be support for formatter settings. Current state: supported in the field data structures (saved and loaded), but we currently don't do anything with them, and they're not tested.

Next challenge is providing a UI.

Formatter settings section in field settings panel?

Posted by markus_petrux on December 25, 2008 at 9:43pm

If we think about CCK2, maybe fields could append a formatter settings fieldset to the field settings panel, one for each formatter of the given field in that particular content type. If it were a collapsed fieldset, the current UI would look just like it does now, but the options would be there when needed. It's just an idea.

per-field...

Posted by jredding on December 19, 2008 at 5:20am

Have the implications for data migration been examined in this decision? I'm looking at this and shaking my head in a way that says
"oh my god, when I have to migrate this data out to another system I'll want to strangle someone."

And yes, I know I can just do a node_load and grab the data to slap into another system, but when you're working with massively large datasets, going from DB to DB without running through a PHP layer is far more efficient.

This is one of the reasons why I like the way CCK is setup now.

Sorry for all the backseat driving on this thread, but this decision has extremely serious consequences for everyone's website(s). I trust you guys are smart and have thought this through, but I'm also very wary...

-Jacob Redding

Why switch to another system

Posted by dixon_ on December 19, 2008 at 10:12pm

Why switch to another system when you're already using Drupal? ;)

What if...

Posted by jredding on December 20, 2008 at 7:49am

Drupal is a frontend to a backend CRM?
Drupal is a frontend to a larger storage system?
You have a massive Oracle DB that contains hundreds of thousands of inventory items, and Drupal is used as the frontend to acquire new items (thus a migration from Drupal back to the Oracle DB)?

We talk about Drupal becoming "enterprise" ready; part of being in the enterprise is integrating and playing nice with data from old, clunky database systems.

-Jacob Redding

There are different aspects

Posted by moshe weitzman on December 20, 2008 at 1:40pm

There are different aspects to the performance question.

As for write performance, Mike Ryan and I are working on the Economist.com data migration into Drupal. In the end, we use node_save() to populate our node and field tables. Our fields are a typical mixture of shared, multiple-value and single-use/single-value fields. We are seeing insert rates of 5000 nodes per minute. I think that puts the write-performance question to rest.
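As a back-of-the-envelope check (plain arithmetic on the 5000 nodes/minute figure, assuming the inserts run one after another; not an additional measurement), that rate corresponds to:

```latex
\[
\frac{5000\ \text{nodes}}{60\ \text{s}} \approx 83\ \text{nodes/s},
\qquad
\frac{60\,000\ \text{ms}}{5000\ \text{nodes}} = 12\ \text{ms per node\_save()}
\]
```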

It seems jredding is concerned about entering into PHP layer. But for special cases like migration, the proposed schema is better. The proposed schema does not change behind your back like current CCK. Fields stay put. Custom scripts get written once and do not break - same for backup.

Materialized views are a perfect answer to the SELECT problems that many sites face during typical use. The materialized views module is real code that will soon be contributed; I've seen it in action. Fields plus materialized views are going to be far faster than D6 CCK, and they will help core Drupal as well, specifically forum topic listings and tracker listings.

I'm not worried...

Posted by jredding on December 20, 2008 at 5:50pm

about the PHP or the Drupal layer. I'm fully confident that with code we can pull anything.

Materialized views are a good answer to my concerns about JOINing 40 tables to get a simple node (for example, a résumé node). They don't help with pulling data out of Drupal, though; I guess the standard will become looping through a bunch of node_loads (or using the wonderful node_load_multiple).

I guess I'm going to have to wait for the code to be posted and check that out.

-Jacob Redding

code

Posted by chx on December 20, 2008 at 7:37pm

bzr://vcs.fourkitchens.com/srv/bzr/repo/drupal/7-fic/

chx posted the bzr+ssh://

Posted by david strauss on December 22, 2008 at 12:41am

chx posted a modified version of the bzr+ssh:// URL he's using, which won't work for plain, anonymous bzr://.

Use this:
bzr://vcs.fourkitchens.com/drupal/7-fic/

Like Subversion, Bazaar has different paths for SSH and anonymous access.

Or you can use your browser:

http://vcs.fourkitchens.com/drupal/7-fic/

DB structure was a blocker for any other work

Posted by karens on December 21, 2008 at 2:00pm

We knew there would be concern, and probably a debate, about the field structure and we need to work through those questions, but I want to point out that there are many many many other things in the code that still need to be created and perfected. We made a decision about the DB structure so that we could unblock that issue and get other things done. If we waited until we had a solution for the DB structure we would be far past the point where we could get anything done for D7. We've been debating that issue for over a year now with no resolution.

We still need to press forward on getting all the rest of the code working. It would be possible (although not painless) to rework the new code later to revert back to the hybrid storage method, so the ultimate fallback would be to do that. But we all are very hopeful that the new method will do everything we need it to do and more and that that won't be necessary.

So I hope everyone will not just sit on the sidelines until they buy into that solution; there are many, many other tasks to do to get this working, and we cannot let that issue block all progress.

I think the ultimate

Posted by david strauss on December 22, 2008 at 12:43am

I think the ultimate fallback would be a module that provides a hard-coded denormalization of the single-valued fields into per-entity type tables. It would be fewer lines of code and less disruptive than moving canonical field storage back to the old method.

This is exactly the fallback

Posted by bjaspan on December 24, 2008 at 5:41pm

This is exactly the fallback solution I have in mind. It can live in contrib, manually maintain the per-content-type tables, and export those tables to Views (perhaps even yanking the original single-value-unshared per-field tables from the Views data structure to avoid confusion). This module will use hook_field_attach_* and, I agree with David, it will be a much cleaner and simpler implementation to factor this logic out of the core Field API. The only downside will be requiring one extra write during field_attach_insert/update.
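The fallback described here can be sketched roughly as follows. The names denorm_node_product, save_fields() and on_field_attach_save() are hypothetical stand-ins (the latter for a hook_field_attach_insert/update implementation), and the sketch assumes every save passes values for all single-valued fields of the type. The canonical per-field tables are written first; the denormalized row is the one extra write, after which listings can filter on several fields without a JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE field_price (entity INTEGER, entity_id INTEGER, delta INTEGER, value REAL,
                          PRIMARY KEY (entity, entity_id, delta));
CREATE TABLE field_sku   (entity INTEGER, entity_id INTEGER, delta INTEGER, value TEXT,
                          PRIMARY KEY (entity, entity_id, delta));
-- Hypothetical per-content-type table: one row per node, one column per
-- single-valued, unshared field of the 'product' type.
CREATE TABLE denorm_node_product (nid INTEGER PRIMARY KEY, price REAL, sku TEXT);
""")

NODE = 1                           # assumed integer id for 'node'
PRODUCT_FIELDS = ["price", "sku"]  # assumed single-valued, unshared fields

def save_fields(conn, nid, values):
    """Canonical write to the per-field tables, then the extra denormalized write."""
    for name, value in values.items():
        conn.execute(f"INSERT OR REPLACE INTO field_{name} VALUES (?, ?, 0, ?)",
                     (NODE, nid, value))
    on_field_attach_save(conn, nid, values)

def on_field_attach_save(conn, nid, values):
    # Stand-in for a hook_field_attach_insert/update implementation in contrib.
    # Assumes `values` holds every field in PRODUCT_FIELDS (missing ones become NULL).
    cols = ", ".join(PRODUCT_FIELDS)
    marks = ", ".join("?" * (len(PRODUCT_FIELDS) + 1))
    conn.execute(f"INSERT OR REPLACE INTO denorm_node_product (nid, {cols}) VALUES ({marks})",
                 [nid, *(values.get(f) for f in PRODUCT_FIELDS)])

save_fields(conn, 42, {"price": 19.99, "sku": "ABC-1"})
save_fields(conn, 43, {"price": 5.0, "sku": "XYZ-9"})
# A listing query that needs no JOIN across field tables:
print(conn.execute("SELECT nid, sku FROM denorm_node_product WHERE price < 10").fetchall())
# -> [(43, 'XYZ-9')]
```

Since the per-field tables stay canonical, the denormalized table can be dropped and rebuilt from them at any time.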

And I also want to point out

Posted by karens on December 21, 2008 at 2:19pm

And I also want to point out that if we were to go back to the hybrid method, it wouldn't be the current hybrid method where the database schema changes, it would be a 'locked' schema that cannot be changed, which means you could not create a single value field and later decide to make it into a multiple value or shared field, nor could you later change anything about the field definition that affects the db structure.

Fields are just per-type-table Flags with widget

Posted by andypost on December 22, 2008 at 6:27am

Looks very promising because:
- a lot of modules require customizable references between objects
- Flexible storage (DB, cache or any backend)
- ACL support
- Cache for rendered pieces

In conjunction with Rules and MaterializedViews, it's a real killer feature!

...and a new wave in architecture, with a proposal to rewrite modules to use a universal approach. Maybe someday a library of snippets or something like that will be stored alongside modules, themes and so on...
