This document is the (evolving) plan for the Fields in Core Code Sprint at Acquia from December 15-19, 2008. This is a DRAFT. Many sections are incomplete. Comments encouraged!
Mission
Integrate a "CCK-like" Field infrastructure in Drupal 7 core.
Guidelines
This wiki page is the main planning document for the Code Sprint. Please edit it only to improve or clarify the plan. For questions, discussion, or debate, add a comment. To create a new design proposal, make a new page and add a link to it from the Design Discussions section of this document. For discussion about any design proposals, please comment on the separate page.
People
Volunteers wanted! If you are available for any of the roles listed here (or others), please add/update your entry and indicate your availability. A lot of the initial conversion to the Field module/API will be done by the on-site developers but there will be plenty of tasks for remote volunteers as well.
Developers:
- bjaspan: on site all week
- chx: on site all week
- david.strauss: on site all week
- dries: ???
- karens: on site all week
- weitzman: on site all week
- yched: on site all week
Unit test writers (not to imply that the developers won't also write tests):
- flobruit: on site all week
Patch reviewers/committers:
- dries: ???
- webchick: ???
Doc/handbook writers:
- ???
Massage therapists:
- ???
Required reading
A lot has been written about the results of the DADS sprint and how Fields in Core should work. A lot of it is wrong. :-) However, it is important that we all have the common background, so please read (or re-read, if you already have) and be familiar with these documents before the sprint begins:
- http://groups.drupal.org/node/11487 (Field API, field structure, and data migration - Barry's rationale for his CRUD API)
- http://groups.drupal.org/node/8793 (Taxonomy as a field case study)
- http://groups.drupal.org/node/9221 (Larry's DADS final report)
- http://groups.drupal.org/node/8786 (Barry's DADS summary)
- http://groups.drupal.org/node/8694 (Proposed Content Model 1)
- http://groups.drupal.org/node/8796 (Proposed Content Model 2)
NOTE: I think some of these are redundant; Larry's final report is pretty good.
Design discussions
Face it, there is no way this group will magically agree on a complete design right off the bat. However, keep in mind that this is a code sprint. If we do not deliver working code at the end of the week, we suck.
If you have a design proposal, please post a link to it here so we can think about and discuss it ahead of time:
- Field API specification
- New core multiple load API (as of last week) - the current code in CCK HEAD already supports this.
- Multiple node-storage back-ends
User Stories and Tasks
We have at least six experienced Drupal developers for a week. Let's set the bar high!
See the Stories and Tasks sheet (tabs on the bottom) of the planning spreadsheet. The spreadsheet is world-editable (oddly, Google Spreadsheets can be but Google Documents cannot).
- Each Story (light blue row) represents a useful and (relatively) self-contained chunk of functionality. Provide a Description for each story that gives enough detail for everyone to understand it. If the description is too long to fit here, post a link to a g.d.o page.
- Each Task (white row) is an individual step to implement the Story. Provide a Description only if it is not evident from the task name. The more we can break down stories into bite-size Tasks (perhaps 1-4 hours each), the easier it will be to have multiple people working on a story simultaneously. Also, tasking out a story requires thinking through the design and implementation, which is always useful. :-)
- Everyone is encouraged to identify new Stories or Tasks for existing Stories and add them to this sheet.
- The priorities are per-Story and are open to discussion. If a Story has tasks that are much lower priority than others, create a separate Story. It is pretty obvious that we will not accomplish everything on this list.
Infrastructure
We cannot have six+ developers spend a week mailing patches and to-do items to each other, so we need a source code management system and issue tracking.
- For SCM, we will use Bazaar, hosted by Four Kitchens. David Strauss has unknowingly volunteered to give a 30-minute intro to the tool on Monday morning, and then we can bombard him and chx with questions throughout the week if we have any.
- For issue tracking, we'll use drupal.org. Version 7.x-dev, component "field system" (just created).
Acquia is providing wired and wireless network connectivity.
Agenda
Daily: Start at 9AM. Starting Tuesday, we will have a 15-20 minute (i.e., 2-4 minutes per person) around-the-table summary of what each person did yesterday, what they are planning to do today, and any problems they are aware of. This is a quick summary only. We will record these meetings and post them online so the community can keep up to date on our progress.
Day 1:
- 2 hours: Design Warfare. Discuss, debate, and decide on all design issues we are aware of that are required before we can start coding. We will only have a couple hours for this so be sure to be familiar with the required reading and design discussions linked above. Ideally, we can resolve all of this before the sprint starts and hit the ground running.
- 1 hour: Field API specification.
- ...
- 30 minutes: Introduction and Q&A for Bazaar. If someone has a good "Getting Installed and Started" document link, please post it here.
- 2 hours: Planning session. We will have a list of Goals/"user stories" ready before the sprint starts. In this session, we will break down each story into as many of the component tasks as we can jointly identify. This will help us figure out who should do what and how long everything will take.
Day 2 and beyond: Hack. Details TBD, based on our planning session.
Comments
Infrastructure
Four Kitchens also maintains a JIRA+Confluence+Crowd system which I'm happy to provide access to, but I think at least issue tracking should remain on Drupal.org. It might make sense to use Confluence to develop the documentation because it supports better concurrent editing than Drupal or even MediaWiki. Because we run Crowd, our Confluence and JIRA installations are part of a single-sign-on system.
As for version control, I'm a fan of Bazaar. If you use it in the centralized way, the commands are almost identical to Subversion's. (Bazaar's commands are more similar to Subversion's than Subversion's are to CVS's.) Bazaar comes packaged for easy installation on every major platform. The important advantage is that it will support our distributed teamwork better because we can merge in each other's changes any way we want (central to developer or developer to developer) and then merge into the main branch or create diffs against the main branch. The merging capabilities completely outclass Subversion's. I'd be happy to host the repository on a Four Kitchens server. I host all repositories for my company and have hosted some for Chapter Three's work.
Also, both Four Kitchens and Launchpad maintain Bazaar branches synchronized from CVS HEAD. Using these would allow us to "branch" from CVS HEAD and continually merge in changes with no special infrastructure for the sprint. If we use Subversion, we'll have to spend some time implementing a way to keep CVS HEAD synchronized into a sort of vendor branch to prepare for later merging back into CVS.
I have also been using bzr for three years now. There is no simpler utility, really.
Path to semantic web technologies
Oh, and to spark the interest of Dries, modular storage and querying pave the way to an engine supporting a SPARQL-based querying interface with a back-end optimized to handle such queries. We can never efficiently support RDF and SPARQL unless we support storage in alternative engines.
The D7 code currently in CCK HEAD has all the db-storage related code isolated in a separate .inc file. It is not pluggable / modular yet, but it at least gives a clearer vision of the current db-centric dependencies.
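As a rough illustration of what "pluggable" storage could mean here, the following Python sketch puts one small interface in front of two interchangeable back-ends: a SQL one and an in-memory stand-in for a document store. Every name in it (FieldStorage, SqlFieldStorage, DocumentFieldStorage, the field_data table) is hypothetical and is not taken from CCK or from any proposed Field API.

```python
import sqlite3
from abc import ABC, abstractmethod


class FieldStorage(ABC):
    """Hypothetical storage interface: callers never see SQL."""

    @abstractmethod
    def write(self, nid, field_name, values):
        """Replace all values of one field on one node."""

    @abstractmethod
    def read(self, nid, field_name):
        """Return the list of values of one field on one node."""


class SqlFieldStorage(FieldStorage):
    """Relational back-end; all SQL lives in this one class."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS field_data "
            "(nid INTEGER, field_name TEXT, delta INTEGER, value TEXT)"
        )

    def write(self, nid, field_name, values):
        self.conn.execute(
            "DELETE FROM field_data WHERE nid = ? AND field_name = ?",
            (nid, field_name),
        )
        self.conn.executemany(
            "INSERT INTO field_data VALUES (?, ?, ?, ?)",
            [(nid, field_name, delta, v) for delta, v in enumerate(values)],
        )

    def read(self, nid, field_name):
        rows = self.conn.execute(
            "SELECT value FROM field_data "
            "WHERE nid = ? AND field_name = ? ORDER BY delta",
            (nid, field_name),
        )
        return [value for (value,) in rows]


class DocumentFieldStorage(FieldStorage):
    """Stand-in for a non-SQL store: one dict 'document' per node."""

    def __init__(self):
        self.docs = {}

    def write(self, nid, field_name, values):
        self.docs.setdefault(nid, {})[field_name] = list(values)

    def read(self, nid, field_name):
        return list(self.docs.get(nid, {}).get(field_name, []))


# The calling code is identical for both back-ends.
backends = [SqlFieldStorage(sqlite3.connect(":memory:")), DocumentFieldStorage()]
for storage in backends:
    storage.write(1, "phone", ["555-0100", "555-0101"])
    print(type(storage).__name__, storage.read(1, "phone"))
```

The point of the sketch is only the boundary: once the db-specific code sits behind a single interface, a second engine can be added without touching callers.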
SimpleDB
My concern with CouchDB is that while it does sound awesome, the entry level is so very high... SimpleDB does not require deployment and now it has a free tier. So if we can, I'd love to have SimpleDB as our secondary piece. And yes, having a secondary, non-SQL layer is important, as PostgreSQL was important for DB:TNG, and even then it turned out that we needed even more abstraction for SQLite.
SimpleDB is not viable
I researched it, and SimpleDB is not a viable solution. The barrier to entry for CouchDB is not so terrible: see http://jan.prima.de/~jan/plok/archives/142-CouchDBX-Revival.html for Mac OS X, Ubuntu has had a package of it since Intrepid, and a binary Windows installer is in the works (http://wiki.apache.org/couchdb/WindowsBinaryInstaller).
On DB storage
While I've traditionally been a supporter of a hybrid table schema (one table per node type, one table for each multi-valued field, and one table for each field associated with more than one node type), I think the option of having alternative storage engines for large-scale deployments would change my perspective.
Assuming we can also store nodes in something other than the relational database (in addition to the relational database) if we need to, I would go with one table for each field type. It keeps the schema changes isolated to installing or uninstalling field-providing modules. It allows good enforcement of data types within the database.
Then again, we'd probably get a performance edge with table-per-field over table-per-type.
Regardless, I'd like to kill disruptive schema changes from configuring fields. It's just not something we can keep doing if we want to have enterprise credibility.
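To make the "no disruptive schema changes" idea concrete, here is a minimal Python/sqlite3 sketch under assumed names. It reads the proposal as one table per field, keyed by node id and delta, and shows that using the field on more nodes or with more values only adds rows. The field_<name> table layout and the helper functions are illustrative assumptions, not an agreed design.

```python
import sqlite3


def table_names(db):
    """List the tables currently in the schema."""
    rows = db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def create_field_table(db, field_name, sql_type):
    # The only schema change: runs when the field is installed.
    # field_name and sql_type are trusted identifiers in this sketch.
    db.execute(
        f'CREATE TABLE "field_{field_name}" ('
        f"nid INTEGER NOT NULL, delta INTEGER NOT NULL, value {sql_type}, "
        "PRIMARY KEY (nid, delta))"
    )


def write_field(db, field_name, nid, values):
    # Single- and multi-valued fields share one layout: one row per delta.
    db.execute(f'DELETE FROM "field_{field_name}" WHERE nid = ?', (nid,))
    db.executemany(
        f'INSERT INTO "field_{field_name}" (nid, delta, value) VALUES (?, ?, ?)',
        [(nid, delta, value) for delta, value in enumerate(values)],
    )


def read_field(db, field_name, nid):
    rows = db.execute(
        f'SELECT value FROM "field_{field_name}" WHERE nid = ? ORDER BY delta',
        (nid,),
    )
    return [value for (value,) in rows]


db = sqlite3.connect(":memory:")
create_field_table(db, "phone", "TEXT")
schema_after_install = table_names(db)

write_field(db, "phone", 1, ["555-0100"])              # node of one type
write_field(db, "phone", 2, ["555-0101", "555-0102"])  # another type, two values

# Configuring where and how the field is used did not touch the schema.
print(table_names(db) == schema_after_install)
print(read_field(db, "phone", 2))
```

In the hybrid schema described above, the second write could have required moving a column into its own table; here it is just two more rows.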
Abstract the module-defined/admin override code from node types?
Part of the need will be parallel to what we now have with node types. Like node types, we need module-defined fields, admin overrides of those module-defined fields, and custom (admin-defined) fields.
I recently drafted a patch on votingapi based on the node types code, http://drupal.org/node/335668. I ended up using many of the same fields as in the node types table and blocks of virtually identical code. Given that this general pattern is common - module-defined items, admin overrides, admin-defined items - it might make sense to abstract it for use in both node types and fields (and in contrib as needed).
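The general pattern can be sketched independently of node types or fields. The Python below is only an illustration of the three sources being merged; resolve_definitions and the "custom" and "modified" flags are hypothetical names chosen for this sketch, not the code from the votingapi patch.

```python
def resolve_definitions(module_defined, admin_overrides, admin_defined):
    """Merge the three sources into one {name: definition} mapping.

    module_defined:  defaults shipped by modules
    admin_overrides: per-item changes an admin made to module defaults
    admin_defined:   items created from scratch by an admin
    """
    resolved = {}
    for name, defaults in module_defined.items():
        overrides = admin_overrides.get(name, {})
        # Admin values win over module defaults, key by key.
        resolved[name] = {
            **defaults,
            **overrides,
            "custom": False,
            "modified": bool(overrides),
        }
    for name, definition in admin_defined.items():
        if name in resolved:
            raise ValueError(f"{name!r} is already defined by a module")
        resolved[name] = {**definition, "custom": True, "modified": False}
    return resolved


module_defined = {"body": {"label": "Body", "required": False}}
admin_overrides = {"body": {"label": "Article text"}}
admin_defined = {"phone": {"label": "Phone", "required": False}}

fields = resolve_definitions(module_defined, admin_overrides, admin_defined)
for name, definition in fields.items():
    print(name, definition)
```

Keeping the module defaults separate from the overrides means an admin change can be reverted by dropping the override, which is the part that seems worth sharing between node types, fields, and contrib.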
For folks not following core development, Catch's patch to core was recently accepted. It introduces node_load_multiple() which is used by the home page and other node listings. Presumably Views will use it too. So modules get a chance to do one query to pick up their extra bits for 10 nodes whereas we used to do 10 queries. He is now working on a node_load() caching patch which is similar to the cache that CCK already provides.
My point is that we can now deprioritize query speed a little since we will be issuing fewer queries. Other things like code maintainability could be considered more important.
yched has already patched CCK HEAD for the node_load_multiple() feature. Thanks!
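The "one query for 10 nodes instead of 10 queries" effect can be shown with a small Python/sqlite3 sketch. It does not reproduce node_load_multiple() or CCK's code; the field_phone table, the CountingDb wrapper, and both loader functions are assumptions made up for the illustration.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_one_at_a_time(db, nids):
    # Old behaviour: each node load triggers its own field query.
    nodes = {}
    for nid in nids:
        rows = db.run(
            "SELECT value FROM field_phone WHERE nid = ? ORDER BY delta", (nid,)
        )
        nodes[nid] = {"phone": [value for (value,) in rows]}
    return nodes


def load_multiple(db, nids):
    # Batched behaviour: the module gets all node ids and queries once.
    nids = list(nids)
    nodes = {nid: {"phone": []} for nid in nids}
    if not nids:
        return nodes
    marks = ", ".join("?" for _ in nids)
    rows = db.run(
        f"SELECT nid, value FROM field_phone "
        f"WHERE nid IN ({marks}) ORDER BY nid, delta",
        tuple(nids),
    )
    for nid, value in rows:
        nodes[nid]["phone"].append(value)
    return nodes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field_phone (nid INTEGER, delta INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO field_phone VALUES (?, 0, ?)",
    [(nid, f"555-01{nid:02d}") for nid in range(1, 11)],
)

db = CountingDb(conn)
nids = list(range(1, 11))
slow = load_one_at_a_time(db, nids)
slow_queries = db.queries

db.queries = 0
fast = load_multiple(db, nids)
fast_queries = db.queries

print(slow_queries, fast_queries, slow == fast)
```

Both loaders return the same data; only the number of round trips differs, which is why per-query speed matters somewhat less once loading is batched.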