DITA vs. DocBook

Eliot Kimber discusses the pros and cons.

Last updated: 2006-08-26

Note that neither DocBook nor DITA is directly useful for production out of the box.

(Editor's note by Karl Johan Kleist: After writing this I've realized that this comes off pretty much like the standard "it depends" consultant answer--unfortunately, without more details about your specific requirements and constraints it's impossible to provide a more definitive answer. Hopefully this provides some helpful guidance.)

A custom document type should, of course, give you the best fit to your specific requirements (although it's easy enough to go wrong there). However, the startup cost is high. While I like nothing more than to do ground-up engineering of document types and the systems that support them, it's still a fact that both DocBook and DITA reflect years of collective wisdom and practice in using SGML and XML for technical documentation, and therefore using them helps to minimize the risk that your system will go horribly wrong. While both have their warts, they're both clearly good enough for a wide range of applications and both are designed to be adapted to local requirements.

DocBook is both too general and very likely missing key features you need (such as process-specific metadata). With DocBook you almost always need to create your own customized variant that eliminates all the things you don't need and adds any missing things you do need. The DocBook DTD is designed to be customized and you should expect to do so. Like any general document type, DocBook out of the box is too big and too loose in its constraints to be very useful for authoring. You have to tighten it up to make it useful.
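As a rough illustration of what "tightening up" means in practice, the sketch below reports which elements in a document fall outside a house subset. Real DocBook customization is done in the DTD or schema layer, not in a script; the `ALLOWED` set, the function name, and the sample document are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical house subset: the only DocBook elements authors may use.
ALLOWED = {"article", "title", "section", "para", "itemizedlist", "listitem"}

def disallowed_elements(xml_text, allowed=ALLOWED):
    """Return the sorted names of elements that fall outside the subset."""
    root = ET.fromstring(xml_text)
    found = {el.tag for el in root.iter()}  # includes the root element itself
    return sorted(found - allowed)

# Made-up sample document that uses two elements the subset excludes.
sample = """<article>
  <title>Pump maintenance</title>
  <section>
    <title>Overview</title>
    <para>Check the seals <emphasis>weekly</emphasis>.</para>
    <sidebar><para>Legacy note.</para></sidebar>
  </section>
</article>"""

print(disallowed_elements(sample))
```

A check like this is only a stopgap. A customized DTD gives authors the same constraint inside the editor, which is where it matters for authoring.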

DITA as defined in the OASIS specification is not intended to be directly usable for authoring--it is a base or framework for more specialized applications. With DITA you need to define the specializations that are appropriate to your specific business requirements, including tightening the base DITA content models that are, by necessity, very loose. You also need to decide which existing sets of domain elements you need and, if necessary, define any new domains that are specific to your business. While people do sometimes use unspecialized DITA elements for authoring, I wouldn't normally recommend it.
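To make the specialization idea concrete: every DITA element carries a `class` attribute that lists its ancestry from the base type to the most specialized type, which is how generic tools can still process specialized content. The sketch below reads that ancestry. The helper names and the `maintTask` specialization are hypothetical; the `"- topic/topic task/task "` value follows the form the DITA specification uses for the task topic type.

```python
def class_ancestry(class_value):
    """Split a DITA class attribute into (module, element) pairs, base first."""
    tokens = class_value.split()
    # The first token is "-" (structural) or "+" (domain); skip it.
    return [tuple(tok.split("/", 1)) for tok in tokens[1:]]

def is_specialization_of(class_value, base):
    """True if `base` (for example "topic/topic") appears in the ancestry."""
    module, element = base.split("/", 1)
    return (module, element) in class_ancestry(class_value)

# Class value for the standard task topic type.
task_class = "- topic/topic task/task "
# Hypothetical business-specific specialization of task.
maintenance_class = "- topic/topic task/task maintTask/maintTask "

print(class_ancestry(maintenance_class))
print(is_specialization_of(maintenance_class, "task/task"))
```

Because the hypothetical `maintTask` still declares `topic/topic` and `task/task` in its ancestry, a processor that only knows about tasks or topics can fall back to handling it as one of those.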

This means that while with both DITA and DocBook you have a solid starting point, both still require a significant amount of effort to adapt to your specific applications. Given the 80/20 rule, unless your requirements are very generic, you will likely end up spending about as much to implement a DITA- or DocBook-based solution as you would with a custom one. That is, the DITA and DocBook bases both provide about 80% of what you need for a typical technical documentation authoring and production system (and provide nothing out of the box for content management, of course). Thus you're still faced with implementing the remaining 20% of the functionality, which will typically cost about 80% of the total.

This is because regardless of which approach you take, you still have to:


 * Carefully analyze your requirements to understand what you need in order to do the appropriate customizations or specializations. This analysis is the same regardless of whether you are starting from scratch or starting from a pre-existing base.
 * Design and implement custom editing features that reflect your specific requirements.
 * Design and implement output production processes that reflect your specific requirements (page designs, work flows, etc.).
 * Design and implement content management features that reflect your specific requirements.
 * Design and implement any legacy support processes or any interchange processes that might be needed.

The advantage of starting with a DocBook or DITA base instead of from scratch is that the initial cost of entry is much lower--you can start doing *something* even if it's not optimal with very little investment. However, to produce a complete production system that is optimized for your specific requirements and business processes, you will eventually spend about as much as you would have with a from-scratch system (remember the 80/20 rule always holds).

As for whether to start with DocBook or DITA, my standard guidance is:


 * If you need to get started immediately, want to initially invest as close to zero as you can, and your primary task is to produce typical technical manuals, use DocBook.
 * If you have any requirement for re-use, specialization, or modular delivery AND you can afford a little up-front investment in analysis, specialization, and tool development, then use DITA.

The trade-off is that DITA is, as currently defined, less well suited to book production out of the box--you can do everything you need but it may require more effort on your part (for example, DITA 1.0 is underspecified for index markup, something the DITA Technical Committee is discussing right now). Also, the current free DITA tool kits are not as functionally complete as the comparable DocBook tool kits are.

Note too that it's not necessarily an all-or-nothing choice: you can migrate over time from DocBook-based information to DITA-based information. The basic structures are similar enough that transforming from DocBook to DITA is fairly easy and low risk. You could also customize your DocBook variant to be as close to DITA as possible (that is, constrain DocBook to only allow structures that map cleanly to DITA).
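As a minimal sketch of why the mapping is low risk, the following converts one flat DocBook `section` into a DITA `topic`. It assumes a deliberately constrained input (a title followed by plain paragraphs, with no nested sections or inline markup), and the function name and sample content are hypothetical. A real migration would more likely use XSLT and would have to handle far more element types.

```python
import xml.etree.ElementTree as ET

def section_to_topic(section, topic_id):
    """Convert one flat DocBook <section> into a DITA <topic>."""
    topic = ET.Element("topic", {"id": topic_id})
    title = ET.SubElement(topic, "title")
    title.text = section.findtext("title", default="")
    body = ET.SubElement(topic, "body")
    for para in section.findall("para"):
        p = ET.SubElement(body, "p")
        p.text = para.text  # inline markup is ignored in this sketch
    return topic

# Made-up DocBook input restricted to structures that map cleanly to DITA.
docbook = ET.fromstring(
    "<section><title>Overview</title>"
    "<para>Check the seals weekly.</para>"
    "<para>Replace worn gaskets.</para></section>"
)

topic = section_to_topic(docbook, "overview")
print(ET.tostring(topic, encoding="unicode"))
```

The tighter the DocBook variant, the closer a real transform stays to this one-to-one shape.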

Also, vendor support for DITA is rapidly coming online, so packaged support for it is growing.

Taking full advantage of DITA requires more effort because it requires you to think more carefully and deeply about how you'll do sophisticated things, such as modular writing, re-use, specialization from core DITA types, and so on.

My general feeling is that unless you know you simply can't afford the start-up effort of using DITA, or you are sure you need to produce nothing but books, you should go with DITA as a base. It provides, I think, a much more solid and flexible approach to managing XML content for technical documentation, one whose value accrues over time, especially through its specialization mechanism.