In CCK today, any field that is multiple or shared gets its own separate 'per field' table, and all other fields are grouped together in a 'per content type' table. Querying this data to build a node can be expensive, so the serialized node is cached and the cached data is used during node_load().
That structure requires a lot of data manipulation whenever a field is shared or unshared, or changed from single to multiple or from multiple to single. When those things happen, the schema must be altered and data must be migrated from one table to another. This makes the code quite complex and introduces the potential for data loss and other errors.
Our conversations at the Design Sprint resulted in an initial idea of simplifying this so that all fields are stored in 'per field' tables. This will greatly simplify the code, making it easier to get it into core. The hope was that the caching model would alleviate most of the performance problems that might result. But there is still a performance concern with this model.
Another approach to this would be to go ahead and store the basic data in simple 'per field' tables, as we discussed, but instead of storing a serialized array in the cache, create multiple tables at an intermediate level that can be queried and filtered easily without any need for joining data. Those intermediate tables will contain duplicates of the basic data stored in the 'per field' tables, but the cost of keeping that data up to date is probably less than the cost of complex joined queries.
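To make the trade-off concrete, here is a rough sketch of the idea, not actual CCK code. Python and SQLite are used only so the example runs anywhere, and every table, column, and function name (field_phone, field_tags, type_story, rebuild_intermediate) is made up for illustration. The 'per field' tables stay the canonical store, and the intermediate table is simply re-derived for a node whenever that node is saved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Canonical storage: one simple table per field (hypothetical names).
    CREATE TABLE field_phone (nid INTEGER, delta INTEGER, value TEXT);
    CREATE TABLE field_tags  (nid INTEGER, delta INTEGER, value TEXT);
    -- Intermediate table for a hypothetical 'story' node type.
    -- It duplicates the per-field data so reads need no joins.
    CREATE TABLE type_story (nid INTEGER, delta INTEGER, phone TEXT, tags TEXT);
""")


def rebuild_intermediate(conn, nid):
    """Re-derive the intermediate rows for one node from the per-field tables."""
    phones = dict(conn.execute(
        "SELECT delta, value FROM field_phone WHERE nid = ?", (nid,)))
    tags = dict(conn.execute(
        "SELECT delta, value FROM field_tags WHERE nid = ?", (nid,)))
    # Throw away the stale copy and write one row per delta.
    conn.execute("DELETE FROM type_story WHERE nid = ?", (nid,))
    for delta in sorted(set(phones) | set(tags)):
        conn.execute(
            "INSERT INTO type_story VALUES (?, ?, ?, ?)",
            (nid, delta, phones.get(delta), tags.get(delta)))


# Save a node with one phone value and two tag values, then sync it.
conn.execute("INSERT INTO field_phone VALUES (1, 0, '555-0100')")
conn.executemany("INSERT INTO field_tags VALUES (?, ?, ?)",
                 [(1, 0, "drupal"), (1, 1, "cck")])
rebuild_intermediate(conn, 1)

rows = conn.execute(
    "SELECT delta, phone, tags FROM type_story WHERE nid = 1 ORDER BY delta"
).fetchall()
```

In this sketch the cost of keeping the duplicate up to date is one delete plus a handful of inserts per node save. Whether that really is cheaper than joined reads would depend on how often nodes are written versus read.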
If we did that we would no longer store the serialized data in the cache, so the cache_content table would not be needed. Once fields are moved to core, these intermediate tables would probably also eliminate the need for the revisions table.
The new data structure might look like the following. The idea is that each node type would have a table with all its fields, and each node might have multiple rows in that table to represent all the delta values of all its fields. Any field could potentially contain multiple values, so any node could potentially return multiple rows, and node handling would need to take that into account. But these tables would be easily queryable for any of their values without any joins.
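As one illustration of what "node handling would need to take that into account" could mean, here is a small sketch. It is only an assumption about how this might work, written in Python with SQLite so it can be run as-is. The type_story table, its columns, and the helper functions are hypothetical names, not part of CCK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical intermediate table for a 'story' node type:
# one row per (node, delta), one column per field.
conn.execute(
    "CREATE TABLE type_story (nid INTEGER, delta INTEGER, phone TEXT, tags TEXT)")
conn.executemany("INSERT INTO type_story VALUES (?, ?, ?, ?)", [
    (1, 0, "555-0100", "drupal"),
    (1, 1, None, "cck"),        # node 1 has a second tag but no second phone
    (2, 0, "555-0199", "views"),
])


def nids_with_tag(conn, tag):
    """Filter on a field value using a single table, with no joins."""
    rows = conn.execute(
        "SELECT DISTINCT nid FROM type_story WHERE tags = ? ORDER BY nid",
        (tag,))
    return [nid for (nid,) in rows]


def load_node(conn, nid):
    """Collapse the multiple delta rows of one node into lists of values."""
    node = {"nid": nid, "phone": [], "tags": []}
    rows = conn.execute(
        "SELECT phone, tags FROM type_story WHERE nid = ? ORDER BY delta",
        (nid,))
    for phone, tags in rows:
        # A NULL means this field has no value at this delta.
        if phone is not None:
            node["phone"].append(phone)
        if tags is not None:
            node["tags"].append(tags)
    return node


matching = nids_with_tag(conn, "cck")
node = load_node(conn, 1)
```

The point of the sketch is that a node comes back as several rows, so the loading code has to fold them into per-field lists and skip the empty slots left by fields with fewer values.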
Note that adding an intermediate level of tables is Dries' idea; this implementation of it may or may not be anything like what he was thinking :)