CCK Performance

dewolfe001's picture

I am planning on doing some benchmarking to assess two models of CCK used in Drupal 5.

We have 35+ content-types. To keep a lid on the number of fields getting generated, I used generic fields that could be re-purposed for use in other content-types. The idea: recycle fields so that we have a small number of fields in play. With CCK, if a field is used once for a content-type and not re-used for another content-type, the field will appear in the content_type_[content type name] table along with any others that are unique to that content-type. If I recycle fields and use them in multiple content-types, that means we have many content_field_[field name] tables going on, all brought together by database joins.
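To make the two layouts concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for the real database. The schema is a simplified assumption (real CCK tables carry more columns), and the field names, sample rows, and the two loader functions are hypothetical. Whether CCK issues joins or separate queries, the point is how many tables a single node touches.

```python
import sqlite3

# Simplified, hypothetical schema: real CCK tables carry more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT, title TEXT);

-- Fields used by only one content type share that type's table.
CREATE TABLE content_type_article (
    vid INTEGER PRIMARY KEY, nid INTEGER,
    field_subtitle_value TEXT, field_byline_value TEXT);

-- Each field shared across content types gets a table of its own.
CREATE TABLE content_field_summary (
    vid INTEGER PRIMARY KEY, nid INTEGER, field_summary_value TEXT);
CREATE TABLE content_field_thumb (
    vid INTEGER PRIMARY KEY, nid INTEGER, field_thumb_value TEXT);

INSERT INTO node VALUES (1, 1, 'article', 'Hello');
INSERT INTO content_type_article VALUES (1, 1, 'A subtitle', 'A byline');
INSERT INTO content_field_summary VALUES (1, 1, 'A summary');
INSERT INTO content_field_thumb VALUES (1, 1, 'thumb.png');
""")


def load_unshared(nid):
    """Unshared fields only: one extra table beyond node."""
    sql = """
        SELECT n.title, t.field_subtitle_value, t.field_byline_value
        FROM node n
        JOIN content_type_article t ON t.vid = n.vid
        WHERE n.nid = ?"""
    return conn.execute(sql, (nid,)).fetchone()


def load_with_shared(nid):
    """Each shared field adds one more table to the lookup."""
    sql = """
        SELECT n.title, t.field_subtitle_value,
               s.field_summary_value, th.field_thumb_value
        FROM node n
        JOIN content_type_article t ON t.vid = n.vid
        LEFT JOIN content_field_summary s ON s.vid = n.vid
        LEFT JOIN content_field_thumb th ON th.vid = n.vid
        WHERE n.nid = ?"""
    return conn.execute(sql, (nid,)).fetchone()
```

With two shared fields the lookup touches four tables instead of two, and every additional shared field adds another.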

Is this field re-use/recycling more database intensive? Has anyone done benchmarking on this idea-- recycling a few fields vs. using many fields with almost no recycling? If so, can you share your findings?

Thanks in advance,




catch's picture

If you're using views to display lists of content with shared fields, then you can show for example the 'image' field from ten different content types in a list just by querying the one table - this is a lot more efficient than having a node/teaser view and loading each node individually. However when viewing an individual node, you'll need to do those extra joins. On top of this I think CCK has its own cache for fields per node, so that saves some overhead whichever way you do it.
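A rough sketch of the listing case, again with Python's sqlite3 standing in for the real database. The table layout is a simplified assumption, and the sample rows and the list_images helper are hypothetical. A real view would probably also join the node table for titles and filtering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table holds the shared 'image' field for every content type using it.
CREATE TABLE content_field_image (
    vid INTEGER PRIMARY KEY, nid INTEGER, field_image_value TEXT);

INSERT INTO content_field_image VALUES (1, 1, 'story.png');    -- a 'story' node
INSERT INTO content_field_image VALUES (2, 2, 'event.png');    -- an 'event' node
INSERT INTO content_field_image VALUES (3, 3, 'profile.png');  -- a 'profile' node
""")


def list_images():
    """Fetch the image field for all node types with a single-table query."""
    rows = conn.execute(
        "SELECT nid, field_image_value FROM content_field_image ORDER BY nid")
    return rows.fetchall()
```

One query covers every content type that uses the field, with no per-node load.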

I don't like sharing fields unless I have to.

KingMoore's picture

If you use no shared fields, all the data can be stored in just the content_type_xxx table (like you said). Any time you do a node_load, there should be far fewer queries and/or joins to get the data you need.

If, like catch said, you need to query all instances of a shared field across types and DON'T need the other content data, then sharing fields would make sense.

Overall I would say it is best to not share any fields as a rule, and then make sure you have good reason to share one only as an exception to the rule. This has just been my experience, but I have found that more often than not I want to get multiple pieces of content relating to one node, as opposed to a single piece of content from multiple nodes. Also many contrib modules use node_load in a loop, so keeping the queries per node_load to a minimum is helpful.

DamienMcKenna's picture

From what I've seen, the best practice, and the one I'm moving towards standardizing on, is to only use shared fields when it's something that would be reused on multiple views, e.g. a custom summary or thumbnail field.

it depends

fago's picture

mostly on the kind of views you want to build. If you want to list the fields across several node types like catch has described, share the fields. If you don't need that, it's probably faster not to share the fields, since you avoid extra joins. This is particularly important for tabular views, as CCK caches the node_load().

Anyway, if you create that many fields you will need quite a lot of memory, as each CCK field also generates views fields, which bloats the views definitions. At least in 5.x, views always loads all table definitions into memory... So recycling fields would help here. However, I don't know if views has improved that for 6.x?

This is an interesting conversation

John Hodgins's picture

Because it raises more general questions about how to best use Drupal as a development platform. The decision I've come to is to make a separation between performance and development -- to try to make development as simple and extensible as possible and then work out optimisation and performance at a higher level -- page caching, block caching, custom caching solutions depending on the site, etc. How do other developers tackle this?

I like to use CCK and Views because it makes development so much faster and simpler, and allows less tech-savvy site managers to extend sites in the future. So in this situation (35+ content types) I would definitely use shared fields, because it will be so much easier to keep track of and theme one shared thumbnail field, for example, than 35+ individual thumbnail fields. I would then try to improve performance (if necessary) through caching solutions. Using individual fields might be more database efficient, and creating your own custom tables for your content and writing your own SQL instead of using Views would probably be even more efficient, but it would also increase development time and make it more difficult to extend the site in the future...


joshk's picture

That's definitely how I look at it, but it certainly helps to architect things with an eye towards performance. In my experience, CCK is generally not that problematic for scaling in and of itself, although keeping it clean and tidy certainly helps avoid confusion and crossed wires.

Views, however, is definitely something to be mindful of. It's a fantastic system, and used correctly it works well to a point. However, if you've got very high performance requirements that include a lot of logged-in users, you will have to look at replacing views queries with optimized database tables and SQL. In short, you'll need to denormalize the critical data that you want to query against (e.g. timestamps, categories, groups, etc.) into a single optimized table, so that MySQL doesn't need to create temporary tables or, worse yet, resort to disk-based filesorts when querying them. Check out David Strauss's work on the DNA project for some tools to help w/this.
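As a minimal sketch of that kind of denormalization, assuming simplified node and term_node tables: the listing table, its index name, and the helper functions below are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL, so the planner behaviour is only illustrative. This is not the DNA project's approach, just the general idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the normalized source tables.
CREATE TABLE node (nid INTEGER PRIMARY KEY, created INTEGER, status INTEGER);
CREATE TABLE term_node (nid INTEGER, tid INTEGER);

INSERT INTO node VALUES (1, 100, 1), (2, 200, 1), (3, 300, 0), (4, 400, 1);
INSERT INTO term_node VALUES (1, 10), (2, 10), (3, 10), (4, 20);

-- Hypothetical denormalized table: one row per published node and category.
CREATE TABLE listing (nid INTEGER, tid INTEGER, created INTEGER);
INSERT INTO listing
    SELECT n.nid, t.tid, n.created
    FROM node n JOIN term_node t ON t.nid = n.nid
    WHERE n.status = 1;

-- The composite index matches the filter column, then the sort column.
CREATE INDEX idx_listing_tid_created ON listing (tid, created);
""")

LATEST_SQL = "SELECT nid FROM listing WHERE tid = ? ORDER BY created DESC LIMIT ?"


def latest_for_term(tid, limit=10):
    """Newest published nodes in one category, read from a single table."""
    return [row[0] for row in conn.execute(LATEST_SQL, (tid, limit))]


def query_plan(tid, limit=10):
    """Return the planner's description so index use can be inspected."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LATEST_SQL, (tid, limit))
    return " ".join(str(row[-1]) for row in rows)
```

Because the filter and the sort both come from one index on one table, the query needs no join and no separate sort pass. The cost is keeping the listing table in sync whenever nodes or their categories change.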

When it comes down to it, going to the top in performance requires expertise in the core LAMP stack in addition to Drupal.

Chiming back in

dewolfe001's picture

Can I say that I really appreciate all of the comments that have been coming through. With many fields and many content-types, migration will be a challenge, but I think it will deliver a performance boost and it's a worthwhile move.
