A DITA documentation distribution for Drupal

Introduction

A couple of weeks ago we launched Modulecraft, an awareness and fund-raising project that aims to rally Drupal professionals around a shared effort to create the ultimate toolset for Drupal business. The goal of the first fundraising round is to develop a documentation distribution for Drupal that takes an approach similar to the localization server and that enables a distributed, federated documentation architecture for the Drupal project. As a Drupal user you'll be able to import a set of documentation from the drupal.org docs server into your own site. You will then be able to edit it and build subsets of the documentation for your own projects. You'll also be able to take topics that you edited or created on your own infrastructure and submit them as suggestions to the Drupal documentation server.

The following is a first proposal for the specification of the documentation system we want to build as part of the Modulecraft project. It is by no means complete, and it strongly needs your feedback. This is our first encounter with DITA, and our ideas should really be checked by technical writers who have extensive experience using DITA. It also proposes a somewhat exotic usage of RDFa, where feedback is also very much needed. In the coming days I'll be adding new sections to the specification here. This is a wiki, so be bold! You can add comments either here or at the original posts on the Pronovix blog. I'll be incorporating feedback into this wiki.

Why DITA

Several people from the community have pointed to DITA as the ideal architecture for redesigned Drupal documentation.

The Darwin Information Typing Architecture (DITA) is an open XML standard that was originally developed by IBM and is now maintained by the OASIS consortium. DITA was built to enable single-source documentation: you make one central set of documentation topics that can then be reorganized in so-called DITA maps, each serving a different documentation purpose.

Drupal needs single-sourced documentation: it would remove our current need for one documentation structure that has to serve all purposes and include all topics. We could build DITA maps for different user types, distributions, projects, etc.
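
To make the idea of per-audience maps concrete, here is a minimal sketch in Python. It only illustrates the principle: the topic file names, the audience labels and the build_map helper are all made up for this example and are not part of the proposal or of any existing Drupal module.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic pool: topic file name -> audiences it is relevant for.
TOPICS = {
    "installing-drupal.dita": {"site-builder", "developer"},
    "creating-content.dita": {"site-builder", "editor"},
    "writing-a-module.dita": {"developer"},
}


def build_map(title, audience):
    """Return a DITA <map> element that references every topic tagged for `audience`."""
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = title
    for href in sorted(TOPICS):
        if audience in TOPICS[href]:
            # Each topicref points at a topic in the shared pool; topics are never copied.
            ET.SubElement(ditamap, "topicref", href=href)
    return ditamap


editor_map = build_map("Drupal for editors", "editor")
print(ET.tostring(editor_map, encoding="unicode"))
```

The same pool of topics can feed any number of maps, so adding a new audience means adding a new map and not a new copy of the documentation.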

Existing tools can then convert DITA into a number of output formats.

Store as XML or XHTML

DITA is currently mostly managed in dedicated tools that edit the XML directly. To display the documentation on a website afterwards, it needs to be converted to XHTML. Drupal has some tools for working with XML, but since Drupal is mostly used to publish HTML, these tools (e.g. XML filter) are few and little tested.

Storing everything in one XML-formatted text blob would make the documentation harder to edit: we would have to either edit in XML or periodically convert the changed XHTML back to DITA (IMHO this defeats the whole purpose of single sourcing). Since 90% of the time we'll be working with web content, it's better to store in XHTML and convert to DITA only when needed.

Using fields to simplify the interface

Adding a fixed set of fields to a form makes it easier to enforce and simplify the input of structured data. Using the Drupal (CCK) fields system to build those parts of the DITA structure that are main sections of a topic (e.g. title, description) will make the format easier to use for people with less experience working with DITA-formatted information. Inside these fields we can then use a markup editor (could be WYSIWYG or something else, see later article) to add the 'freestyle' markup that is allowed inside those main containers (only offering valid markup options).
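
As a rough illustration of how field values could map onto the main sections of a topic, the Python sketch below assembles a DITA concept from three fields. The field names (title, shortdesc, body) and the concept_from_fields helper are assumptions made for this example. It also assumes the body field holds well-formed XHTML that uses only elements DITA shares with XHTML, such as p.

```python
import xml.etree.ElementTree as ET


def concept_from_fields(topic_id, fields):
    """Assemble a DITA <concept> from a dict of (hypothetical) field values."""
    concept = ET.Element("concept", id=topic_id)
    ET.SubElement(concept, "title").text = fields["title"]
    ET.SubElement(concept, "shortdesc").text = fields["shortdesc"]
    conbody = ET.SubElement(concept, "conbody")
    # Wrap the stored fragment in a throwaway root so it parses as one document,
    # then move its children into <conbody>.
    fragment = ET.fromstring("<div>" + fields["body"] + "</div>")
    conbody.extend(list(fragment))
    return concept


concept = concept_from_fields("why-dita", {
    "title": "Why DITA",
    "shortdesc": "DITA enables single-source documentation.",
    "body": "<p>Topics are written once.</p><p>Maps reuse them.</p>",
})
print(ET.tostring(concept, encoding="unicode"))
```

Because the fields fix the order of title, shortdesc and body, authors cannot produce a topic with its main sections in the wrong place. Only the markup inside the body field still needs checking.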

Such a system, with CCK and an interface that allows for the creation of new tags, could potentially make it really easy to create specializations (this is what the D (Darwin) in DITA stands for: evolutionary extensibility of the basic topic types). If there is budget and sufficient demand, we could make a system that automatically derives the definition/specification of custom DITA topics from the CCK/markup settings.

Using RDFa to add not always present XML elements

DITA (specialized) topics have elements that are primary branches in the XML tree: if present, they are used once and mostly in the same place. A concept, for example, has a title, a shortdesc, a conbody and related-links. The task element, for example, has a taskbody, which in turn contains the prereq (prerequisites) and steps elements.

The step element, however, has several useful child elements that are not always required and that are more free-form in their use, e.g. info (additional information about the step), stepxmp (an example that illustrates a step), substeps, choices (the user needs to choose one of several actions) and stepresult (the expected outcome of a step).

These in-field elements could be added using an RDFa vocabulary that we derive from the DITA markup. That way we would in the first place create valid XHTML that can be converted to DITA when it needs to be exported. The RDFa markup could be applied using a specialized WYSIWYG editor that is context aware (so that only valid child elements are added).
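
The export step could look roughly like the Python sketch below. It is only a sketch under several assumptions: the RDFa vocabulary does not exist yet, so the dita: prefix and the use of the property attribute are invented here, as is the step_from_xhtml helper. It handles only the text-only children (cmd, info, stepxmp, stepresult) and flattens any inline markup inside them. substeps and choices would need their nested structure converted as well.

```python
import xml.etree.ElementTree as ET

PREFIX = "dita:"  # hypothetical RDFa prefix for the DITA-derived vocabulary
TEXT_CHILDREN = {"cmd", "info", "stepxmp", "stepresult"}


def step_from_xhtml(li_markup):
    """Convert one XHTML list item annotated with RDFa into a DITA <step>."""
    li = ET.fromstring(li_markup)
    step = ET.Element("step")
    for node in li:
        prop = node.get("property", "")
        name = prop[len(PREFIX):] if prop.startswith(PREFIX) else ""
        if name in TEXT_CHILDREN:
            # Keep only the text content; unannotated nodes are skipped.
            ET.SubElement(step, name).text = "".join(node.itertext()).strip()
    return step


xhtml = (
    '<li>'
    '<span property="dita:cmd">Click Save.</span>'
    '<div property="dita:info">Unsaved changes are lost when you leave the page.</div>'
    '<div property="dita:stepresult">The topic is saved.</div>'
    '</li>'
)
print(ET.tostring(step_from_xhtml(xhtml), encoding="unicode"))
```

The stored markup stays ordinary XHTML that browsers can render, and the annotations carry just enough information to rebuild the DITA elements on export.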