Data Architecture Design Sprint

Several Drupal developers met February 4-6, 2008 in Chicago to work on and re-design Drupal's core data architecture. Topics included data APIs, object modeling, fields in core, and an overlapping swirl of related ideas. Our goal was to have a design proposal ready for presentation at DrupalCon Boston 2008.

Field API, field structure, and data migration

bjaspan - Wed, 2008-05-14 22:08

Much has been said (http://groups.drupal.org/node/9297) about how fields should be structured in D7 core, what aspects of fields can be changed, and how those changes are implemented. It is (past) time to move forward on implementing fields in core and in this post I am proposing an answer.

Disclaimer: I make a lot of declarative statements in this post. Obviously I do not have unilateral authority; this is just a proposal.

Field storage


Remote Field Revision

neoliminal - Wed, 2008-03-05 17:14

This discussion will revolve around Remote Field Revisions: populating fields from single, multiple, and unknown sources; sending data to other sites, and how those sites can use, modify, and potentially change that data at its source; and race conditions, the origin of data, and which sources are trusted for field revision.


Drupalcon: The Future of Fields

kentbye - Tue, 2008-03-04 18:32

Here are my notes from the Future of Fields presentation from this morning.

[Begins showing a picture of the Data Architecture design Sprint including: chx, yched, bjaspan, crell, karens and nedjo]

Drupalcon used to be a developer conference where they would meet and figure out what was going to happen for the next year, but in Barcelona they realized that it was harder to do that.

So they asked who the right people to involve were, and those people all flew into Chicago and did a sprint.


Field Structure

KarenS@drupal.org - Sat, 2008-03-01 12:18

The way that fields are structured in CCK now is that any field that is multiple or shared has its own separate 'per field' table, and all other fields are grouped together in a 'per content type' table. Querying this data to create a node can be expensive, so the serialized node is cached and the cached data is used during node_load().
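The storage rule described above can be sketched in a few lines. This is a minimal Python illustration rather than CCK's actual PHP code: the function name `storage_table` and its parameters are hypothetical, while the table naming follows the `content_type_story` and `content_field_caption` names that appear elsewhere on this page.

```python
def storage_table(field_name, content_type, multiple, shared):
    """Return the table a field's data would live in under the current CCK rule.

    Hypothetical helper for illustration; not a CCK function.
    """
    if multiple or shared:
        # Multiple-value or shared fields get their own 'per field' table.
        return "content_" + field_name
    # Everything else is grouped into the 'per content type' table.
    return "content_type_" + content_type


# A single-value field used by one content type only.
single = storage_table("field_caption", "story", multiple=False, shared=False)
# The same field once it is shared with a second content type.
shared = storage_table("field_caption", "story", multiple=False, shared=True)
```

Because a field's table depends on flags that can change over time, loading a node may touch several tables, which is why the assembled node is cached.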

Final Report

Crell@drupal.org - Wed, 2008-02-27 07:54

OK, so, it took a lot longer than I expected or planned, but here is the final summary report from the design sprint. I've provided it in both OpenDoc and PDF format. Hopefully there aren't too many spelling errors. :-)

The Future of Fields, Drupalcon Boston 2008

bjaspan - Fri, 2008-02-22 01:14

This wiki page is for the development of the Drupalcon session we'll be presenting in Boston. We have 90 minutes on Tuesday morning; see our session page.

Please edit this page only if you attended the Design Sprint and will be participating in the session. Otherwise, please give us your feedback via comments. Thanks!

Session outline


Proposed Web Service Clients module: Preparing the way for external fields

nedjo - Wed, 2008-02-13 20:02

At the recent Data Architecture Design Sprint, participants outlined a development path in which fields can be local or external. A given entity instance (node, user, etc.) can be fully local, fully external, or a combination of the two. For example, a node can have several fields stored and handled in a local SQL database and other fields both loaded from and written to an external SOAP service.
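One way to picture a partly local, partly external entity is a per-field map of storage backends. The sketch below is Python, not Drupal code, and every name in it (`LocalBackend`, `ExternalBackend`, `load_entity`, `save_entity`, the `price` field) is an assumption made for illustration. The external backend is an in-memory stand-in; a real one would call the remote service.

```python
class LocalBackend:
    """Stand-in for fields stored in the local SQL database."""

    def __init__(self):
        self.rows = {}

    def load(self, entity_id, field_name):
        return self.rows.get((entity_id, field_name))

    def save(self, entity_id, field_name, value):
        self.rows[(entity_id, field_name)] = value


class ExternalBackend:
    """Stand-in for a remote service (e.g. SOAP); no network calls are made here."""

    def __init__(self):
        self.remote_store = {}

    def load(self, entity_id, field_name):
        return self.remote_store.get((entity_id, field_name))

    def save(self, entity_id, field_name, value):
        self.remote_store[(entity_id, field_name)] = value


def load_entity(entity_id, field_backends):
    """Assemble an entity by asking each field's own backend for its value."""
    return {name: backend.load(entity_id, name)
            for name, backend in field_backends.items()}


def save_entity(entity_id, values, field_backends):
    """Write each field through the backend that owns it."""
    for name, value in values.items():
        field_backends[name].save(entity_id, name, value)


local, remote = LocalBackend(), ExternalBackend()
backends = {"title": local, "price": remote}  # a combination of local and external
save_entity(13, {"title": "My new node", "price": "9.99"}, backends)
node = load_entity(13, backends)
```

The calling code never needs to know which fields are local; that choice lives entirely in the field-to-backend map.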

Proposed Content Model 2

Crell@drupal.org - Sun, 2008-02-10 22:21

During the Data Architecture Design Sprint (or DADS, as I will henceforth call it), we discussed two general models for reaching our vision of "data anywhere, value-add anything". Those models became dubbed "Model 1" and "Model 2", because that's the order in which we happened to write them down.

Barry has already done a good job of explaining Model 1 in an earlier post. Here, I am going to lay out the structure of Model 2. I suspect that the final system, whatever it looks like, will draw heavily from both models.

Taxonomy as a field

bjaspan - Sun, 2008-02-10 18:12

It's useful to explore how existing core functionality can or cannot be implemented as a field. Doing so may cause us to change how we decide to implement them. In this post I'm talking in terms of nodes but trying to avoid assuming that we are limited to nodes instead of Content objects and that all fields are stored locally in SQL. I'm also mostly just thinking as I go/type; I don't have a specific outcome in mind yet.


Summary of the Data Architecture Design Sprint

bjaspan - Sun, 2008-02-10 17:17

Six Drupal developers met for three days in February 2008 to work on and re-design Drupal's core data architecture. We began with an overly broad list of topics and goals and spoke in very general terms about "Drupal's overall mission" and similarly vague concepts. By the end of the design sprint, we devised a tangible vision for Drupal's future and an understandable method for getting there.


Structure of "fields in core" patch and of CCK project

nedjo - Sat, 2008-02-09 18:11

Following the design sprint, drawing on what we had discussed, Yves, Karen, and Nedjo tried to map out some of the structure of a core patch and the accompanying short and longer term changes to CCK fields.

Specifically, we discussed the following approach:

  • In the immediate term, make small changes as feasible to facilitate longer term changes, e.g., move CCK modules into their own directories in CVS.

Remote content as minimal viable functionality

bjaspan - Thu, 2008-02-07 23:42

We talked a lot about minimum viable functionality (MVF) for D7 core. We also talked briefly about what criteria to use to decide what should be in core vs. contrib. I'd like to revisit that.

Question: Why bother putting fields in core at all?

Being in core has many drawbacks. Much slower development/release cycles, necessity to gain broad consensus for everything, requirement to operate in every possible environment that Drupal supports (e.g. low-memory hosting), etc. We already have CCK and fields in contrib. They work great. Why are we going to incur all this extra pain?


Day 3

chx - Thu, 2008-02-07 01:28

Notes: Comments are now enabled on this page.

CCK simplifications for core

The day started with ways we can simplify current CCK functionality, because there is a feeling that what we have is way too fragile and complex for core at this time. Functionality slated for destruction includes:

  • Per-content type field storage, e.g. no more content_type_story tables with columns for multiple single-value fields. All fields are stored in today's equivalent of content_field_fieldname tables.
  • All field tables always have a delta column even if it is a single-value field.
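The simplified storage described in these two points can be sketched with an in-memory SQLite table. This is a Python illustration under stated assumptions, not the proposed core schema: the column names `vid`, `nid`, and `field_caption_value` and the helpers `save_field` and `load_field` are hypothetical. The point is only that single-value and multiple-value fields share one table shape, distinguished by `delta`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per field; delta is always present, even for single-value fields.
conn.execute("""
    CREATE TABLE content_field_caption (
        vid INTEGER NOT NULL,
        nid INTEGER NOT NULL,
        delta INTEGER NOT NULL DEFAULT 0,
        field_caption_value TEXT,
        PRIMARY KEY (vid, delta)
    )
""")


def save_field(conn, nid, vid, values):
    """Replace all stored values of the field for one revision."""
    conn.execute("DELETE FROM content_field_caption WHERE vid = ?", (vid,))
    conn.executemany(
        "INSERT INTO content_field_caption VALUES (?, ?, ?, ?)",
        [(vid, nid, delta, value) for delta, value in enumerate(values)],
    )


def load_field(conn, vid):
    """Return the field's values for one revision, ordered by delta."""
    rows = conn.execute(
        "SELECT field_caption_value FROM content_field_caption "
        "WHERE vid = ? ORDER BY delta",
        (vid,),
    )
    return [row[0] for row in rows]


save_field(conn, nid=13, vid=27, values=["Only caption"])     # single value, delta 0
save_field(conn, nid=14, vid=28, values=["First", "Second"])  # deltas 0 and 1
```

With this shape, switching a field between single and multiple values needs no data migration between tables, since the rows already live in the same place.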

The new field settings array

KarenS@drupal.org - Wed, 2008-02-06 16:15

We're talking about having fields that can retrieve and store their data either locally or remotely. Here's an idea of what might happen to the field settings array.

Field settings for a local field:

Array
(
    [field_name] => field_caption
    [type] => text
    [type_name] => story
    [module] => text
    [db_storage] => 1
    [db_source] => Array
        (
            [type] => local
            [text_processing] => 0
            [max_length] =>
            [allowed_values] =>
            [allowed_values_php] =>
            [table] => content_field_caption
            [columns] => Array
                (
                    [value] => Array
                        (
                            [type] => text
                            [size] => big
                            [not null] =>
                            [sortable] => 1
                        )

                )

        )
    [display_settings] => Array
        (
            [label] => Array
                (
                    [format] => above
                )

            [teaser] => Array
                (
                    [format] => default
                )

            [full] => Array
                (
                    [format] => default
                )

            [4] => Array
                (
                    [format] => default
                )

        )

    [widget_active] => 1
    [required] => 0
    [multiple] => 0
    [active] => 1
    [widget] => Array
        (
            [rows] => 1
            [default_value] => Array
                (
                    [0] => Array
                        (
                            [value] =>
                        )

                )

            [default_value_php] =>
            [label] => Caption
            [weight] => -2
            [description] =>
            [module] => text
            [type] => text_textfield
        )

)

Field settings for a remote field:

Array
(
    [field_name] => field_caption
    [type] => amazon_text
    [type_name] => story
    [module] => amazon_field
    [db_storage] => 3
    [data_source] => Array
        (
            [type] => remote
            [url] => amazon.com/feed/?item=%s&type=%s
            [replacements] => Array
                (
                    [0] => field_caption[0]['value']
                    [1] => books
                )

        )
    [display_settings] => Array
        (
            [label] => Array
                (
                    [format] => above
                )

            [teaser] => Array
                (
                    [format] => default
                )

            [full] => Array
                (
                    [format] => default
                )

            [4] => Array
                (
                    [format] => default
                )

        )

    [widget_active] => 1
    [required] => 0
    [multiple] => 0
    [active] => 1
    [widget] => Array
        (
            [rows] => 1
            [default_value] => Array
                (
                    [0] => Array
                        (
                            [value] =>
                        )

                )

            [default_value_php] =>
            [label] => Caption
            [weight] => -2
            [description] =>
            [module] => text
            [type] => text_textfield
        )

)
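To make the `url` and `replacements` settings of the remote example concrete, here is one way the request URL might be assembled. This is a Python sketch, not a proposed Drupal API: `resolve` and `build_remote_url` are hypothetical names, and representing the `field_caption[0]['value']` reference as a `(field, delta, column)` tuple is an assumption made so the example can run. The caption value is made up.

```python
from urllib.parse import quote


def resolve(replacement, node):
    """Turn one replacement entry into a concrete value.

    Tuples are treated as (field_name, delta, column) references into the
    node; anything else is used as a literal.
    """
    if isinstance(replacement, tuple):
        field_name, delta, column = replacement
        return node[field_name][delta][column]
    return replacement


def build_remote_url(data_source, node):
    """Fill the %s placeholders of the URL template with URL-encoded values."""
    values = tuple(
        quote(str(resolve(r, node)), safe="")
        for r in data_source["replacements"]
    )
    return data_source["url"] % values


data_source = {
    "type": "remote",
    "url": "amazon.com/feed/?item=%s&type=%s",
    "replacements": [("field_caption", 0, "value"), "books"],
}
node = {"field_caption": [{"value": "Moby Dick"}]}  # illustrative value only

url = build_remote_url(data_source, node)
# url == "amazon.com/feed/?item=Moby%20Dick&type=books"
```

Encoding each value before substitution keeps user-entered text from breaking the query string.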

The new node (or entity) object layout

KarenS@drupal.org - Wed, 2008-02-06 15:25

Here is what our node object looks like now:

stdClass Object
(
    [nid] => 13
    [type] => story
    [language] =>
    [uid] => 1
    [status] => 1
    [created] => 1202300504
    [changed] => 1202300759
    [comment] => 2
    [promote] => 1
    [moderate] => 0
    [sticky] => 0
    [tnid] => 0
    [translate] => 0
    [vid] => 27
    [revision_uid] => 1
    [title] => My new node
    [body] => Blah blah blah
    [teaser] => Blah blah blah
    [log] =>
    [revision_timestamp] => 1202300759
    [format] => 1
    [name] => admin
    [picture] =>
    [data] => a:0:{}
    [last_comment_timestamp] => 1202300504
    [last_comment_name] =>
    [comment_count] => 0
    [taxonomy] => Array
        (
            [1] => stdClass Object
                (
                    [tid] => 1
                    [vid] => 1
                    [name] => Art
                    [description] =>
                    [weight] => 0
                )

            [2] => stdClass Object
                (
                    [tid] => 2
                    [vid] => 1
                    [name] => Drupal
                    [description] =>
                    [weight] => 0
                )

        )

)

We have said we want all the fields on a node to start to prefix their names with the name of the module that put them there.

If we make all fields multiple, storing each in its own table, we should also combine related elements into a single field instead of making every single item on the current node into a separate field.

There is still a question about whether some of these items should become multiple values, but the following is a general representation of what the new object might look like.

Proposed Content Model 1

bjaspan - Wed, 2008-02-06 14:09

Our concept for this Design Sprint is to use "fields in core" as a lens for thinking about Drupal's future data model needs. We will then propose a "Minimum Viable Product" for putting fields into core in a very simple way given our current node-based data model but hopefully do so in a way that provides a path towards supporting our vision of the future data model.

After two days of flaming, I'll summarize our "vision" as I understand it:

  • The future is about web services; if Drupal has no value in a web services world, it is dead.

Drupal Data API Design Sprint, Day 2

nedjo - Wed, 2008-02-06 06:23

On this second day, we:

  • Defined use cases for Drupal as a web development platform as opposed to a CMS
  • Elaborated a set of target outputs for the design sprint
  • Further developed key solution components and their implications for Drupal design

Use cases: Drupal as a web development platform

What should Drupal as a web development platform be able to do? By answering this question, we both clarify the rationale for a renewed data API and bring into clearer focus some key aspects and requirements of that API.

High-level use cases:

  • Stand-alone web site

Drupal Data API Design Sprint, Day 1

nedjo - Tue, 2008-02-05 05:49

Day 1 focused on scoping the challenge of introducing an abstracted set of data APIs into Drupal core.

Participants:

  • Yves Chedemois
  • Larry Garfield
  • Barry Jaspan
  • Karoly Negyesi
  • Nedjo Rogers
  • Karen Stevenson

Regrets:

  • Moshe Weitzman (ill)

The overall challenge was defined as: to fully convert Drupal into a web application development platform in which a powerful CMS is one built-in implementation. CCK fields provide a very promising set of solutions on which to base our work, yet also present several important challenges. In our discussion, we:

An Evening Out by NowPublic

bjaspan - Sat, 2008-02-02 14:15

To show its support for everyone participating in this Drupal Design Sprint and to encourage others to occur, NowPublic has graciously offered to treat the attendees to a fine dinner out while we're in Chicago. We'll go on Tuesday evening, giving us two days to choose a restaurant, make a reservation, and enjoy the anticipation!

So be sure to bring some non-raggy clothes. We can't go anywhere too up-scale, of course, because we'll inevitably talk loudly about Drupal all the way through dinner and we might scare the other customers.

Thank you, NowPublic!


Draft agenda

nedjo - Sat, 2008-02-02 05:40

[Here's a roughed in agenda for the three days. Please edit away.]

Roles:

  • We have two facilitators and five documenters (or else three and four). At any given time one person is facilitating, one documenting. We rotate these roles so everyone gets a chance to participate fully.

Day 1. Broad vision plus introductory discussion of fields in core.

Select roles. Review and refine agenda.

  1. Introductions. A bit about ourselves. [all]

  2. Lightning talks (five minutes). [all]

  3. Broad spec of unified set of data APIs. What, not how. [all]

High-level schema properties

recidive@drupal.org - Fri, 2008-02-01 07:46

Currently, hook_schema only provides the information the Schema API needs to know when creating tables. While keeping this tied to the database makes complete sense, putting some business logic in the schema is worth considering.
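As a rough illustration of the idea, the sketch below mirrors a hook_schema-style table definition as a Python dictionary and adds two higher-level properties to one column. The properties `label` and `required`, the helpers `database_view` and `high_level_view`, and the example columns are all hypothetical; nothing here is an existing or proposed Schema API feature. Only the low-level keys (`type`, `length`, `unsigned`, `not null`, and so on) follow what hook_schema already uses.

```python
# Keys the table-creation code cares about; everything else is "high-level".
DB_KEYS = {"type", "size", "not null", "default", "length", "unsigned",
           "precision", "scale", "description"}

schema = {
    "node": {
        "fields": {
            "nid": {"type": "serial", "unsigned": True, "not null": True},
            "title": {
                "type": "varchar", "length": 255, "not null": True,
                # Hypothetical business-logic properties:
                "label": "Title",
                "required": True,
            },
        },
        "primary key": ["nid"],
    },
}


def database_view(table):
    """Strip high-level properties so table creation sees only database keys."""
    fields = {
        name: {k: v for k, v in spec.items() if k in DB_KEYS}
        for name, spec in table["fields"].items()
    }
    return {**table, "fields": fields}


def high_level_view(table):
    """Collect only the non-database properties, e.g. for forms or validation."""
    extras = {
        name: {k: v for k, v in spec.items() if k not in DB_KEYS}
        for name, spec in table["fields"].items()
    }
    return {name: props for name, props in extras.items() if props}


db_only = database_view(schema["node"])
business = high_level_view(schema["node"])
```

Splitting the two views this way keeps table creation unchanged while letting other subsystems read the extra properties from the same definition.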

Focus on "fields in core"

moshe weitzman - Wed, 2008-01-30 21:44

As we get nearer to our fine meeting, I am getting more worried about scope. Specifically, I fear we will bite off too much and accomplish little of substance.

I strongly propose that we focus on "fields in core" as a primary task. It is true that this goal splinters into many subtasks. But we should only concentrate on those subtasks which are prerequisite for achieving the goal. I think we can ignore node render and web services and PDO and lots of other shiny apples in favor of this one. It is the feature most requested by our user community, IMO.


Thanks to...

bjaspan - Mon, 2008-01-28 16:20

This Data Architecture Sprint is not an official event sanctioned by anybody. It is just a group of Drupal enthusiasts and, for some, their employers or clients who agree that this work is important and will benefit everyone. We'd like to recognize the individuals and companies that are donating their time and resources to make this event possible:

  • Nedjo Rogers
  • Karen Stevenson
  • Moshe Weitzman
  • Yves Chedemois. Publicis Modem France develops Drupal-based community websites for various high-visibility clients, and wishes to support the evolution of the Drupal architecture.
  • Larry Garfield. Palantir.net develops visually and functionally sophisticated sites for corporate, educational, and cultural clients.
  • Barry Jaspan. Acquia provides value-added software products and services for the Drupal web collaboration and publishing platform.
  • Karoly Negyesi. NowPublic is a participatory news network which mobilizes an army of reporters to cover the events that define our world.

Special thanks to Palantir.net for free use of their conference room and to CivicSpace for funding Nedjo Rogers' and Henrique Recidive's work on the Data API whitepapers.


Data APIs: building a spec, defining options

nedjo - Sun, 2008-01-27 01:38

[Here is an attempt to pull together some of the information and discussions on the go in a way that might help focus our discussion. Please edit/add/expand.]

Assume we're starting from scratch and writing a set of APIs for data handling for our main object types (node, user, etc.). What does such a set of APIs need to do?

Currently we have a diverse set of data APIs. A useful first step in designing a new unified set of APIs may be to summarize what we currently have. From there, we can identify key gaps and compare options.

Characteristics of existing data APIs

Purpose and aims

nedjo - Fri, 2008-01-25 19:32

[Let's get started with a statement of why we're meeting and what we aim to achieve. Here's a draft. Please wade in and edit. Let's work this up till we're happy with it.]

Purpose

Drupal is well designed and at the same time needs fundamental renewal.

The core data APIs for handling basic object operations are dated and highly inconsistent with one another. There is a strong need for a basic refactoring of the core data APIs.

Tentative Agenda

webchick - Mon, 2008-01-21 21:48

Re-posting this as its own thing, per Barry. Admins, feel free to edit the crap out of this. ;)

Jan 17 - Jan 20
We get our collective stuff together internally... figure out a rough agenda, logistics about where/when we're meeting, compile resources, etc.

Jan 21
This group becomes the "Data API" group and is opened to the public.


A session at DrupalCon

bjaspan - Fri, 2008-01-18 23:46

We need to give a session at DrupalCon to discuss what we come up with for everyone who did follow this group or our blog posts. It can be one person giving a presentation, a panel with a subset of us (not 8-10!) giving an intro and answering questions, or whatever, but something. I'd say the intended audience is other core and contrib developers; this should be an expert-level session.


Start with lightning talks?

moshe weitzman - Fri, 2008-01-18 16:09

I'd like to start with each person talking for about 5-10 minutes on the topic of their choice. You should research and prepare your talk a bit beforehand. This way, we will learn what pieces each person is interested in, and get a quick pulse on how to proceed.


Conference agenda

bjaspan - Thu, 2008-01-17 04:56

We have Monday, Tuesday, and Wednesday afternoons and evenings in Palantir's conference room. We can schedule things in the morning, too, though I'm thinking we should reserve that time for people to do personal work and/or hack code based on our discussions.

So: Node representation, fields in core, data API, data rendering, database abstraction. Suggest an order of events.


Required reading

bjaspan - Thu, 2008-01-17 04:55

Everyone coming to the conference (or suggesting ideas we should consider) should first read all of the papers, issues, blog posts, etc. that are listed in this post. If you have more to suggest, please do so.

