DITA - One Success Story to Share

stevebain's picture

I’d like to share a recent project with you that I think may provide a solution to problems you’re facing in publishing tech comm content using Drupal. This is my first post on Drupal.org after registering two days ago, so bear with me as I tell you the [somewhat long] story of how I used Drupal to solve the problem of publishing technical documentation online.

I’ve been working and teaching in tech comm for over 20 years. I recently directed a large (40-person) technical communication team producing software and hardware documents for users and system admins for a telecommunications company. I was hired to lead the team, but also to improve quality and transform the delivery method to HTML. This story focuses only on the HTML delivery portion.

Tackling the Transformation from PDFs to HTML Books

Like any project, this one began by collecting requirements. My team initially provided a long, writer-oriented list, which included things like managing, tracking, and sharing writing and visual components, two-way integration between source and published versions, and one-button publishing. The company requirement was quite simple -- deliver HTML-based Web documents that can be easily found and viewed on any device.

For years, the team had been using Adobe FrameMaker, and to a lesser extent MS Word, to publish documentation as PDFs posted to the company support site for download. While PDFs make desktop printing easy, they often require long download times and are hard to read on small-screen devices. PDFs also prevent us from collecting the search and display analytics that are critical to measuring content effectiveness. I knew switching to responsive HTML-based Web books would solve this.

Although proprietary CMS products were available, they were expensive and often did not offer all of the capabilities we needed. The vision I set was to make HTML the main delivery method for all customer-facing product documentation, using a CMS that provided the freedom to customize and innovate.

To simplify the project, I focused initially on a way to process native Frame and Word files into HTML-based books. I chose Drupal as the Web CMS for its economics and flexibility. The Book module provided a basic way to navigate multi-page documents, and Views provided ways to associate collections of relevant tagged content.

Successful 2013 Launch

The initial requirements took roughly two years to complete and launched in early 2013. On the authoring side, it required developing new doc templates that strictly enforced style formatting (read: structured authoring) across the client applications. It required developing a taxonomy model that could support all the various products, document types, versions, and other supporting documents. And it required developing a user interface that would expose the features and functionality, be easy to use, and be easily brandable for marketing purposes.

On the Drupal side, we ended up with a navigation model based loosely on the Book module. The entire project incorporates roughly 250 new and/or modified Drupal modules. These include scripting modules to clean up source XHTML output from the client applications, as well as others to handle style-mapping, issues management, and content workflow, to name just a few.

This solution is capable of quickly transforming a complex 1,200-page native Word or Frame document into a responsive “Web book” format that’s optimized for low-bandwidth devices. It also inherently supports publishing DITA XML, as the structure is well-suited. Among the first documents published are this admin guide...

http://plcmtechnet.com/documents/en/voice/ucs/5-0-1/administrators-guide...

...which began as a 600-page Word document, and this admin guide...

http://plcmtechnet.com/documents/en/realpresence/realpresence-collaborat...

... which began as a Frame file at twice that length.

Other Technical Wins

Besides delivering HTML books, the solution also includes printing and language support. To maintain print capabilities and offline content viewing, it includes automatically generated PDFs. This was done by integrating Drupal with PDF Reactor, which can be extended to support geographic-specific page sizes, ebooks, and EPUB versions.

For language support, it will also generate PDFs from either vendor-translated or Google Translated language versions which are incorporated into the UI. In this instance, this feature essentially pays for the ongoing annual hosting, maintenance, and resource costs.

My mission now is to expand this solution to support more native file types and help others transition from print/PDF-based delivery models to HTML-based publishing on both the authoring and publishing sides of the fence. I hope I’ve inspired you to make Drupal your tech comm publishing solution, and I’d be happy to try to answer any questions about this.

Cheers,
Steve (steve@definio.ca)

Comments

Hi Steve, I'd love to know on

suzannewang's picture

Hi Steve,

I'd love to know, on the Drupal side: based on the Book module, do you have to create a book page and copy and paste the Word contents into book pages on the Drupal site one by one? If not, how do you deal with a 600-page Word doc? Can you give more detail?

Thanks a lot,
Suzanne.

Importing Word docs

stevebain's picture

Hi Suzanne,

The copy/paste method isn't realistic for a variety of reasons. The simplified answer is that the process works by importing the Word doc HTML content as new books using the heading tags as new book pages. There is a module that specifically handles that.
For example, in this document...

http://plcmtechnet.com/documents/en/voice/ucs/5-0-1/administrators-guide...

...the TOC you see on the 'landing page' represents the top-level headings used to divide the original document into book pages. However, all heading levels in the doc are hyperlinked to enable cross referencing to and from specific points in the document.
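
To make the heading-based split concrete, here is a minimal sketch in Python. It is not the custom import module described in this thread (that module is not public); the function name `split_into_pages`, the regex approach, and the sample markup are all assumptions for illustration. It divides an HTML string into (title, body) pairs at each heading of a chosen level, so lower-level headings stay inside their parent page.

```python
import re
from typing import List, Tuple


def split_into_pages(html: str, level: int = 1) -> List[Tuple[str, str]]:
    """Split HTML into (title, body) pairs at every <hN> of the given level.

    Hypothetical helper: content before the first matching heading is ignored,
    and deeper headings remain inside the body of their parent page.
    """
    heading = re.compile(
        rf"<h{level}\b[^>]*>(.*?)</h{level}>", re.IGNORECASE | re.DOTALL
    )
    matches = list(heading.finditer(html))
    pages = []
    for i, match in enumerate(matches):
        # Strip any inline tags from the heading text to get a plain title.
        title = re.sub(r"<[^>]+>", "", match.group(1)).strip()
        # The page body runs until the next heading of the same level.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        pages.append((title, html[match.end():end].strip()))
    return pages


# Invented sample markup standing in for exported Word HTML.
sample = (
    "<h1>Install</h1><p>Step one.</p>"
    "<h2>Notes</h2><p>Detail.</p>"
    "<h1>Configure</h1><p>Step two.</p>"
)
pages = split_into_pages(sample, level=1)
```

Each resulting pair would then become one book page, with the title feeding the landing-page TOC. A production importer would more likely use a real HTML parser than regular expressions.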

Steve

Links to example HTML tech docs

stevebain's picture

Apologies. I didn't notice the editor stripped out the example links I provided. The post should read:
"Among the first documents published are this admin guide...

http://plcmtechnet.com/documents/en/voice/ucs/5-0-1/administrators-guide...

...which began as a 600-page Word document, and this admin guide...

http://plcmtechnet.com/documents/en/realpresence/realpresence-collaborat...

... which began as a Frame file at twice that length."

Which Module Does the Importing Job

suzannewang's picture

Hi,

Your site looks nice.

May I ask which module you use to import the doc HTML into Drupal?
Your book layout looks great. Does the theme affect the layout and effects? If you don't mind me asking, what theme do you use?

Thanks,
Suzanne.

Re: Which Module Does the Importing Job

stevebain's picture

Thanks Suzanne -- great questions.

We created a custom module that performs clean-up routines on the client HTML. It is still in development, but it essentially runs global and client-specific scripts to get the markup in shape for publishing.
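
As a rough illustration of the "global plus client-specific scripts" idea, here is a minimal Python sketch. It is not the actual module: the cleaner functions, the `CLIENT_CLEANERS` table, and the `"word"` key are hypothetical, and real Word or Frame output needs far more handling than this.

```python
import re
from typing import Callable, Dict, List

Cleaner = Callable[[str], str]


def strip_inline_styles(html: str) -> str:
    """Remove inline style attributes so the site CSS controls presentation."""
    return re.sub(r'\sstyle="[^"]*"', "", html)


def drop_empty_paragraphs(html: str) -> str:
    """Remove paragraphs that contain only whitespace or non-breaking spaces."""
    return re.sub(r"<p>(\s|&nbsp;)*</p>", "", html)


def strip_office_tags(html: str) -> str:
    """Remove <o:p> wrapper tags of the kind found in Word HTML exports."""
    return re.sub(r"</?o:p>", "", html)


# Steps applied to every source, then steps keyed by source application.
GLOBAL_CLEANERS: List[Cleaner] = [strip_inline_styles, drop_empty_paragraphs]
CLIENT_CLEANERS: Dict[str, List[Cleaner]] = {"word": [strip_office_tags]}


def clean(html: str, client: str) -> str:
    """Run the global cleaners, then any cleaners registered for this client."""
    for step in GLOBAL_CLEANERS + CLIENT_CLEANERS.get(client, []):
        html = step(html)
    return html


cleaned = clean('<p style="color:red">Hi<o:p></o:p></p><p>&nbsp;</p>', "word")
```

Keeping each routine as a small function in an ordered list makes it easy to add a new source type without touching the shared steps.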

The theme (style, color, font, etc.) the site uses is CSS-controlled and also custom, and it follows the corporate rebranding requirements set out by the client's Marketing team. The layout and UI are also customized and have evolved gradually since the launch in Jan 2013.

In terms of effects, we incorporated expanding/collapsing procedures and tables, scrolling for oversized elements (oversized images and tables), and lazy image loading to optimize the reading experience regardless of screen size. It works well since this type of content so often includes very large reference pieces.

Steve

Import module available as open source?

Frank Ralf's picture

Hi Steve,

Many thanks for sharing this success story! I have two questions:

  1. AFAIU, you didn't use DITA XML as your source format but your solution is capable of handling DITA XML. Is that right?
  2. Is your import module publicly available as open source (yet)?

Kind regards,
Frank

Re: Import module available as open source?

stevebain's picture

Hi Frank,

More great questions.

1) Correct. DITA XML is currently being incorporated into the import script as a native source type, and it is inherently easier to work with than the source from Word or Frame (as you can imagine).
2) The import module isn't available (yet), since it is still a work in progress (we still need to devise a way to accommodate video clips) and it maps a unique set of template HTML tags from the source files to the CSS on the site. Ideally, the future plan is for the module to handle additional source file types and versions (especially Flare and eventually InDesign).

Steve

Publishing DITA content to Drupal

rjohnson42's picture

Hi,
It's been a while, but I created the Drupal 7 bulkpub module so I could publish DITA content as Drupal 7 books. I used the module and some Python scripts I wrote to do the publishing. My input was XHTML output from the DITA OT, so I was able to handle XHTML from other sources. I no longer maintain or support my solution, but would be glad to make available what I did to others.

The big problem I ran into was Drupal 8. Its internals are completely different and I did not have the bandwidth to learn a completely new CMS after I had just taught myself Drupal 7.

Dick Johnson

Integrating "bulkpub" module

Frank Ralf's picture

Hi Dick,

Many thanks for your contribution. Your "bulkpub" module was already high on my list of available Drupal modules that can import DITA XML. However, I haven't yet had the chance to test it. If you want, you can make me co-maintainer of your module (https://www.drupal.org/project/bulkpub). At the moment I suppose any solution will still be based on Drupal 7, but we definitely should keep an eye on Drupal 8.

Best regards,
Frank

re: Integrating "bulkpub" module

rjohnson42's picture

Hi Frank,

I'll go ahead and make you a co-maintainer of bulkpub. I also have a package of my Python scripts and documentation on the whole solution which I can give you when you are ready.

The criticisms I got about my solution were:

  • It can only publish and delete an entire docset. People wanted to selectively update single topics.
  • When a docset was republished, the comments on the prior topics were lost.
  • Some potential clients asked me for many enhancements that really amounted to them wanting a Drupal-based DITA content management system. Those systems are very complex and expensive to develop. There is a reason why such a thing costs a lot of money!

Regards,
Dick Johnson

Was there a specific reason

kvantomme's picture

Was there a specific reason you didn't extend the feeds module to build the importer?

--

Check out more of my writing on our blog and my Twitter account.

Moved to its own thread

robertnthomas's picture

Moved to its own thread

Bob Thomas
Tagsmiths, LLC

brilliant

WorldFallz's picture

This is brilliant-- what a gorgeously functional documentation site. Well done.

I'm about to tackle something similar, but with somewhat restrictive document formatting requirements. The links provided above appear to be using a single body field on the book content type for the actual page content.

Did you have to deal with any formatting/section restrictions (i.e., document type 1 has to have 'intro, scope, risk, body' sections but document type 2 only requires intro and body sections)? If so, how?

I'm in the process of evaluating a number of options (fields on content types for different document types, fields on content types for sections related by term to a single 'document'), but haven't gotten to the implementation stage yet. I'd be curious to hear your thoughts.

And again, well done-- what an amazing piece of work!

Re: brilliant

stevebain's picture

Thanks - I appreciate your inspiring comments and great questions. :)

The links provided above appear to be using a single body field on the book content type for the actual page content.

Correct - each page of content is a single field comprised of processed HTML that originated from the source file output. Each node of the book structure is a division of the document based on heading level. On importing a document, the heading level split is an option.

Did you have to deal with any formatting/section restrictions...

Absolutely. The source file documents use structured (meaning enforced) styles which are directly mapped to the page content CSS. Any content in the source file that uses arbitrary formatting will not display correctly.
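
A minimal Python sketch of what such a style mapping could look like follows. The style names, CSS class names, and the `map_styles` function are hypothetical, not the mapping used on the site. It rewrites known source styles to site CSS classes and reports anything unmapped, which is the content that would not display correctly.

```python
import re
from typing import List, Tuple

# Hypothetical mapping from enforced template style names to site CSS classes.
STYLE_MAP = {
    "Heading1": "doc-h1",
    "BodyText": "doc-body",
    "Note": "doc-note",
}


def map_styles(html: str) -> Tuple[str, List[str]]:
    """Replace known source style classes and collect the unknown ones."""
    unmapped = set()

    def swap(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in STYLE_MAP:
            return f'class="{STYLE_MAP[name]}"'
        # Arbitrary formatting: leave it untouched but flag it for the writer.
        unmapped.add(name)
        return match.group(0)

    mapped = re.sub(r'class="([^"]+)"', swap, html)
    return mapped, sorted(unmapped)


mapped, problems = map_styles(
    '<p class="BodyText">a</p><p class="MyOddStyle">b</p>'
)
```

Reporting the unmapped names, rather than silently dropping them, gives authors a list of places where the source template was not followed.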

I'm in the process of evaluating a number of options (fields on content types for different document types, fields on content types for sections related by term to a single 'document')...

The important thing about a project like this is that it requires a clear understanding of the overall content architecture before any CMS work starts, so you're wise in planning ahead. The key to relating content is in creating a comprehensive taxonomy and using Views to build related lists. In this case, it needed to be detailed given all of the variables (product family, product, release version, language, document type, etc.).
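
To illustrate the idea outside Drupal, here is a minimal Python sketch of relating documents through shared taxonomy terms, roughly what a Views listing filtered on term fields does. The `Doc` fields mirror the variables listed above; the sample titles and values and the `related` helper are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Doc:
    """One published document tagged with taxonomy terms."""
    title: str
    product_family: str
    product: str
    version: str
    language: str
    doc_type: str


def related(
    docs: Sequence[Doc],
    current: Doc,
    fields: Sequence[str] = ("product", "version", "language"),
) -> List[Doc]:
    """Return the other documents that share every listed term with `current`."""
    return [
        d for d in docs
        if d != current
        and all(getattr(d, f) == getattr(current, f) for f in fields)
    ]


# Hypothetical catalogue entries.
docs = [
    Doc("Administrator's Guide", "voice", "ucs", "5.0.1", "en", "admin-guide"),
    Doc("Release Notes", "voice", "ucs", "5.0.1", "en", "release-notes"),
    Doc("Administrator's Guide", "voice", "ucs", "4.1.0", "en", "admin-guide"),
]
same_release = related(docs, docs[0])
```

Changing the `fields` argument changes the relationship: matching on `product` and `doc_type` alone would instead list the same guide across release versions.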

I'm keen to see this implemented elsewhere and to expand the source file support, so you are welcome to contact me directly at steve@definio.ca.

Steve

_

WorldFallz's picture

The key to relating content is in creating a comprehensive taxonomy and using Views to build related lists.

Yes! And that's precisely where we're at now. So it was quite serendipitous coming across this post ;-)

Our source files are quite all over the place so we basically have carte blanche to define our own standards. But I'll definitely take you up on your offer and contact you! Thanks.

kvantomme's picture

Thank you Steve for sharing the case!

You said you used the Book module, which is also what we are currently using. But the Book module has a few important shortcomings:

  • It only allows a topic to be in a single map
  • The structure is stored together with the content in the node

Was this a problem for you?

A former colleague did a Google Summer of Code project that tried to address this problem. Unfortunately, it never got beyond a sandbox project: Documentation Entity. It implements a context switcher that lets you choose the book/map context (as in which DITA map) you would like to view a topic in.

