The Design of the DocBook XSL Stylesheets

Norman Walsh

XML Standards Engineer
Sun Microsystems, Technology Development Center

8 April 2001

XSLT-UK 01
08 Apr - 09 Apr 2001
Keble College, Oxford, England

Version 1.1


Table of Contents

Introduction
Modularity
Organizational Modularity
Code-reuse Modularity
Internationalization Modularity
Parameterization
Web-Based Parameterization
Self-Customizing Stylesheets
“Literate” Programming
Extensions
Conclusion
Other Resources

Introduction

Building stylesheets for a large, rich XML vocabulary is a challenging exercise. This paper explores some of the design issues confronted by the author in designing XSL stylesheets for DocBook, an XML DTD maintained by the DocBook Technical Committee of OASIS. DocBook is particularly well suited to books and papers about computer hardware and software (though it is by no means limited to these applications).

DocBook consists of nearly 400 tags. The HTML and Formatting Object stylesheets each consist of roughly 1000 templates spread over about 30 files.

The design for the DocBook XSL Stylesheets attempts to meet the following goals:

  • Full support for all of DocBook.

  • Full support for both HTML and XSL Formatting Object presentations.

  • Utility for a wide range of users, with varying levels of technical skill.

  • Support across diverse hardware and software platforms.

  • A framework on top of which additional stylesheets can be written for schemas derived from DocBook.

  • Support for internationalization.

  • Support for a wide range of projects (books, articles, online- and print-centric presentations, etc.)

Although not all of these goals have been completely achieved, progress has been made on all of them. Five techniques stand out as important factors in achieving these goals: modularity, parameterization, self-customizing stylesheets, “literate” programming, and extensions. The rest of this paper will discuss these techniques in detail.

Modularity

Most stylesheets develop and mature over time. The authors and subsequent maintainers come back to them repeatedly, making small and large changes to add features, fix bugs, and adapt them to changing requirements.

The extent to which it is easy or hard to make these changes depends on many things. One of the key factors is stylesheet modularity: how easy is it to identify and isolate the parts that have to be changed and change them without breaking everything that used to work.

The DocBook stylesheets have been developed with the following general guidelines in mind.

  • Match templates are more flexible than named templates

    Match templates are very data-driven. Supporting new or rearranged content models in a stylesheet that uses match templates often requires little more than adding or changing a few templates. Named templates are much more procedural. Changing a complex chain of called templates often requires new variables, new conditional statements, and even more complexity.

  • Don't select more than necessary

    Like using named templates, using very specific select expressions makes your code more procedural.

    Consider the difference between two ways of processing an element like this:

    <author><firstname>Norman</firstname><surname>Walsh</surname></author>
    
    Choice 1:

    <xsl:template match="author">
      <xsl:apply-templates select="firstname"/>
      <xsl:apply-templates select="surname"/>
    </xsl:template>

    Choice 2:

    <xsl:template match="author">
      <xsl:apply-templates/>
    </xsl:template>
    

    In both cases, everything works fine at first. But presented with a different document:

    <author><honorific>Mr.</honorific>
    <firstname>Norman</firstname><surname>Walsh</surname></author>
    

    the first stylesheet silently loses content. In fairness, the second stylesheet has drawbacks as well: it always outputs the names in the order they appear in the document. In practice, something more complex than either of these alternatives may be required, but as a general rule, the second choice is better. (At least when the second choice does something wrong, it'll be visible and not silently lost.)

  • Break templates apart where customization is likely

    It's difficult to write whole stylesheets without using any named templates at all. Bending over backwards to do so might be more complicated (and harder to maintain) than using the named templates.

    When you do use named templates, try to break them apart where customization is likely to be needed. This allows future authors to redefine only the parts that are necessary. Consider, for example, the template that processes articles:

    <xsl:template match="article">
      <xsl:variable name="id"><xsl:call-template name="object.id"/></xsl:variable>
      <div id="{$id}" class="{name(.)}">
        <xsl:call-template name="article.titlepage"/>
        <xsl:if test="$generate.article.toc != '0'">
          <xsl:call-template name="component.toc"/>
        </xsl:if>
        <xsl:apply-templates/>
        <xsl:call-template name="process.footnotes"/>
      </div>
    </xsl:template>
    

    The title page, table of contents, and footnote processing are broken into separate templates. Not only does this allow some of them to be reused in other templates, it simplifies the job of customizing the stylesheets. If I want to change the way tables of contents are produced for articles, for example, I can simply redefine the component.toc template; I don't have to change the article template at all.

  • Most “constants” aren't

    Whenever you write a constant value into your stylesheet (the background color of a table cell, the name of an output file, the enumeration style of a list), consider carefully whether or not it's going to be useful in the future to change that constant. If it is, make it a parameter.

    Similarly, if you have followed the preceding guideline and broken your templates into pieces, consider which pieces are likely to be conditional (for example, whether or not a table of contents is generated at all), and add parameters to control them.

  • Avoid xsl:for-each except in limited circumstances

    For loosely structured data, xsl:for-each often results in very complicated templates. It's also difficult to break the templates apart.

    For data that is very strictly structured, tables pulled from a relational database, for example, xsl:for-each may be a very convenient and appropriate approach.

  • Use attribute sets to carry properties

    XSL Formatting Objects and other presentational vocabularies frequently provide many attributes to control the formatting of the result tree. Consider a typical title from the DocBook stylesheets:

    <xsl:template match="title">
      <fo:block font-size="24pt"
                space-before="18pt"
                keep-with-next="always"
                hyphenate="false"
                font-weight="bold"
                font-family="{$title.font.family}"
                text-align="center">
        <xsl:apply-templates/>
      </fo:block>
    </xsl:template>
    

    Repeating this collection of attributes on every title increases the possibility of error and makes changes tedious to manage. Using an attribute set solves these problems:

    <xsl:attribute-set name="title.properties">
      <xsl:attribute name="keep-with-next">always</xsl:attribute>
      <xsl:attribute name="hyphenate">false</xsl:attribute>
      <xsl:attribute name="font-weight">bold</xsl:attribute>
      <xsl:attribute name="font-family">
        <xsl:value-of select="$title.font.family"/>
      </xsl:attribute>
      <xsl:attribute name="text-align">center</xsl:attribute>
    </xsl:attribute-set>
    
    <!-- ... -->
    
    <xsl:template match="title">
      <fo:block font-size="24pt"
                space-before="18pt"
                xsl:use-attribute-sets="title.properties">
        <xsl:apply-templates/>
      </fo:block>
    </xsl:template>
    

    The attribute set makes it easy to share these properties across multiple titles and makes it easy for a customizer to change them. The font size and spacing attributes vary on each title, so placing them in the attribute set would probably be misleading.
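
    As a sketch of what such a change might look like, a customization layer can redefine a single attribute of the set. XSLT 1.0 merges attribute sets that share a name, and an attribute from the definition with higher import precedence wins, so the remaining properties are still taken from the base definition. The href below is a placeholder rather than a path from the distribution, and the example assumes the imported stylesheet defines title.properties as shown above.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

    <!-- Placeholder location for the base Formatting Object stylesheet -->
    <xsl:import href="/path/to/docbook/fo/docbook.xsl"/>

    <!-- Overrides only text-align; keep-with-next, hyphenate, font-weight,
         and font-family still come from the imported definition -->
    <xsl:attribute-set name="title.properties">
      <xsl:attribute name="text-align">left</xsl:attribute>
    </xsl:attribute-set>

    </xsl:stylesheet>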

In addition to these general guidelines, the DocBook XSL Stylesheets exhibit three specific flavors of modularity: simple organizational modularity, code-reuse modularity, and internationalization modularity.

Organizational Modularity

This form of modularity is as simple as it sounds. The templates that comprise each stylesheet are broken into functional units: templates for inlines, templates for blocks, templates for bibliographies, etc., are each stored in a separate file.

Storing templates together in functional units makes them easy to find and helps keep related templates close together.
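
With this organization, the top-level stylesheet does little more than pull the modules together. The following sketch assumes a hypothetical directory layout; apart from lists.xsl, which is mentioned later in this paper, the file names are illustrative only.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<!-- Output-independent templates shared by several stylesheets
     (hypothetical path) -->
<xsl:include href="../common/common.xsl"/>

<!-- One module per functional unit (illustrative names) -->
<xsl:include href="inline.xsl"/>
<xsl:include href="block.xsl"/>
<xsl:include href="lists.xsl"/>
<xsl:include href="biblio.xsl"/>

</xsl:stylesheet>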

Code-reuse Modularity

Modularity for code-reuse is standard software engineering practice. Code that can be shared between several stylesheets can be stored in a single module. Naturally, the extent to which this is applicable to a given project is going to vary.

The DocBook XSL Stylesheets were designed from the beginning with the goal of writing both HTML and Formatting Object stylesheets. It was clear to me that some of the functionality that was necessary to format DocBook documents would not depend on the result tree type.

Some likely candidates for this sort of modularity are:

  • The XSL strip-space and preserve-space elements.

  • Named templates that calculate a position in the source tree hierarchy (section level, for example).

  • Named templates that calculate simple text values (figure numbers, generated cross-reference text, etc.).

Consider the formatting of divisions in a QandASet:

<qandaset>
<qandadiv><title>Some Technical FAQs...</title>
<qandaentry>...</qandaentry>
</qandadiv>
<qandadiv><title>Some Procedural FAQs...</title>
<qandaentry>...</qandaentry>
</qandadiv>
...
</qandaset>

Most authors expect the titles of these divisions to appear in a font size that's relative to the section that contains them. That is, they expect the title to be a size smaller than the nearest surrounding section title.

In order to achieve this, the stylesheet must calculate the relative position of this QandADiv within the nested hierarchy of sections and divisions:

<xsl:template name="qandadiv.section.level">
  <xsl:variable name="section.level">
    <xsl:call-template name="qanda.section.level"/>
  </xsl:variable>
  <xsl:variable name="anc.divs" select="ancestor::qandadiv"/>
  <xsl:value-of select="count($anc.divs) + number($section.level)"/>
</xsl:template>

This section-level information applies equally well to HTML or Formatting Object stylesheets (though it would naturally be used in different ways by the two stylesheets) so it is a good candidate for a “common” module.
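
To make the "different ways" concrete, here is a minimal sketch of how an HTML stylesheet might consume the common template. It is not the distribution's code: the mapping from level to heading element and the div wrapper are assumptions, and the fragment relies on qandadiv.section.level (and the qanda.section.level template it calls) being available from the common module.

<xsl:template match="qandadiv">
  <xsl:variable name="level">
    <xsl:call-template name="qandadiv.section.level"/>
  </xsl:variable>

  <!-- HTML defines only h1 through h6, so deeper levels all use h6 -->
  <xsl:variable name="hlevel">
    <xsl:choose>
      <xsl:when test="$level &gt; 5">6</xsl:when>
      <xsl:otherwise><xsl:value-of select="$level + 1"/></xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <div class="qandadiv">
    <xsl:element name="h{$hlevel}">
      <xsl:apply-templates select="title/node()"/>
    </xsl:element>
    <xsl:apply-templates select="*[not(self::title)]"/>
  </div>
</xsl:template>

A Formatting Object stylesheet would call the same named template but use the result to choose a font size rather than an element name.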

Like all software engineering practices, code-reuse often involves some tradeoffs. One of the tradeoffs here is between maximum code-reuse and stylesheet readability.

These are some of the questions to consider when deciding whether or not to reuse a template. The DocBook XSL Stylesheets contain examples of each of these tradeoffs.

Vocabulary-specific Markup

Is the template specific to a particular result tree vocabulary (is it primarily HTML, or Formatting Objects, or WAP, or something else)? If so, it's probably not a candidate.

The formatting of DocBook VariableLists is very vocabulary dependent: it's a DL in HTML and an fo:list-block with appropriate labels and blocks in Formatting Objects. It makes little sense to try to make the formatting of variable lists common.

Structure-specific Markup

Is the template closely associated with the transformation of a particular structure? If so, then it's probably best to keep it near the other templates that process that structure.

In the formatting for inline SimpleLists, there are two templates that could be reused:

<xsl:template match="simplelist[@type='inline']/member">
  <xsl:apply-templates/>
  <xsl:text>, </xsl:text>
</xsl:template>

<xsl:template match="simplelist[@type='inline']/member[position()=last()]"
              priority="2">
  <xsl:apply-templates/>
</xsl:template>

There is nothing HTML- or Formatting Object-specific about the result tree fragments that these templates produce.

Right now, these templates are in the lists.xsl module for each stylesheet. Moving them into the common module would make them harder to find for a relatively small savings. I'm also aware that it might make sense to put result-specific markup in these templates someday, wrapping each list member in an HTML span, for example, and then they couldn't be common.

Large or Complex Templates

Does the template perform a fairly large or complex calculation? Moving code of this sort into a common module means you only have to debug it once. And you won't forget to update “the other stylesheets” when bugs are fixed.

The templates that select an appropriate media object are large, complex and format-independent:

<xsl:template name="select.mediaobject">
  <xsl:param name="olist"
             select="imageobject|videoobject|audioobject|textobject"/>
  <xsl:param name="count">1</xsl:param>

  <xsl:if test="$count &lt;= count($olist)">
    <xsl:variable name="object" select="$olist[position()=$count]"/>

    <xsl:variable name="useobject">
      <xsl:choose>
        <!-- The phrase is never used -->
        <xsl:when test="name($object)='textobject' and $object/phrase">
          <xsl:text>0</xsl:text>
        </xsl:when>
        <!-- The first textobject is a reasonable fallback -->
        <xsl:when test="name($object)='textobject'">
          <xsl:text>1</xsl:text>
        </xsl:when>
        <!-- If there's only one object, use it -->
        <xsl:when test="$count = 1 and count($olist) = 1">
          <xsl:text>1</xsl:text>
        </xsl:when>
        <!-- Otherwise, see if this one is a useable graphic -->
        <xsl:otherwise>
          <xsl:call-template name="is.acceptable.mediaobject">
            <xsl:with-param name="object" select="$object"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:choose>
      <xsl:when test="$useobject='1'">
        <xsl:apply-templates select="$object"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="select.mediaobject">
          <xsl:with-param name="olist" select="$olist"/>
          <xsl:with-param name="count" select="$count + 1"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:if>
</xsl:template>

You might ask how media object selection can be format independent. Isn't that the whole point of mediaobject, that you can select different objects depending on the output format?

Well, yes; but this code works because the stylesheet-dependent code has been moved into a separate template (“break templates apart where customization is likely”). Each possible media object is evaluated by the is.acceptable.mediaobject template. That template is written specifically for each stylesheet.

The code was factored in this way precisely so the common part could be shared across stylesheets.
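
As an illustration of the stylesheet-specific half, an HTML stylesheet might accept only the formats that browsers commonly display. This is a hypothetical sketch, not the distribution's template (which, as the abridged listing later in this paper shows, starts from the file name). It assumes each candidate object carries an imagedata child with a format attribute, and the list of accepted formats is likewise an assumption.

<xsl:template name="is.acceptable.mediaobject">
  <xsl:param name="object"/>

  <!-- Empty when the object has no imagedata (a videoobject, say),
       in which case every comparison below is false -->
  <xsl:variable name="format" select="$object/imagedata/@format"/>

  <xsl:choose>
    <xsl:when test="$format = 'PNG' or $format = 'GIF' or $format = 'JPEG'">
      <xsl:text>1</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>0</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Because select.mediaobject only ever tests the returned string against '1', a Formatting Object stylesheet can supply its own version with a different list of formats without touching the common code.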

Internationalization Modularity

When a DocBook document is formatted for presentation, a certain amount of text displayed to a reader is not actually present in the DocBook document. Instead, it is inferred from the markup. Consider a DocBook chapter:

<chapter><title>Customizing DocBook</title>
<para>For the applications you have in mind, ...

A common presentation of this document would be:

Figure 1. Generated Text on a Chapter Page

The words “Chapter” and “Table of Contents” (and even the digit “5” along with the punctuation) are examples of generated text.

Localizing Generated Text

It's easy to imagine how this text could be generated:

<xsl:template match="chapter">
  <h1>
    <xsl:text>Chapter </xsl:text>
    <xsl:number from="book" format="1."/>
    <xsl:apply-templates select="title"/>
  </h1>
</xsl:template>

The problem with this approach is that the generated text needs to be internationalized, that is, it should be in German for a German book, Japanese for a Japanese book, etc. DocBook is used around the world. In order to support user communities in a wide range of languages, some mechanism is needed to make generated text locale-sensitive.

DocBook provides a common lang attribute to identify the desired language. If the localization problem were confined to a few places, we could write an enormous[1] xsl:choose statement to deal with it. But the realities are:

  1. The problem is pervasive; there are many elements that introduce generated text. And generated text is not limited to words; there are locale-specific punctuation conventions as well.

  2. Localization is needed not only at the language level but also at the country level (for example, the Portuguese in Portugal is not the same as the Portuguese in Brazil). User communities may wish to provide different translations for different end-users.

  3. The language experts capable of providing translations don't necessarily have a lot of XSLT expertise.

The DocBook XSL Stylesheets address this problem by “factoring out” the generated text. Wherever a bit of generated text is needed, the stylesheets rely on a named template to provide the text. To insert the Brazilian Portuguese name for an Appendix, call:

<xsl:call-template name="gentext">
  <xsl:with-param name="key">appendix</xsl:with-param>
  <xsl:with-param name="lang">pt_br</xsl:with-param>
</xsl:call-template>

In practice, the keys are often generated automatically from the context (for example, gentext.element.name uses the name of the current context node as the key) and the language is almost always inherited from a lang attribute (or the default language).

The translations themselves are loaded from an external file by way of the document() function. In the common area of the stylesheet distribution, there's a file called l10n.xml that contains all the localization data:

<?xml version='1.0'?>
<internationalization>

<localization language="ca">
<gentext key="abstract"                 text="Resum"/>
<gentext key="appendix"                 text="Ap&#x00E8;ndix"/>
<gentext key="article"                  text="Article"/>
<gentext key="bibliography"             text="Bibliografia"/>
<!--...-->
</localization>

<localization language="da">
<gentext key="abstract"                 text="Abstract"/>
<gentext key="appendix"                 text="Appendiks"/>
<gentext key="article"                  text="Article"/>
<gentext key="bibliography"             text="Bibliografi"/>
<!--...-->
</localization>

<!--...-->
</internationalization>

Each localization contains three sorts of keys:

gentext

Gentext elements map a key (most often the name of an element) to its translation.

dingbat

Dingbat elements map the symbolic name of some character or other symbol to its translation. In this case, the translation is usually just a locale-specific character; the “start quote” symbol, for example, is „ in German, “ in English, and « in French.

xref

The xref elements attempt to address the problem of locale-specific forms of cross reference.
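
The lookup described above, loading l10n.xml through document() and selecting a gentext entry by language and key, might be sketched as follows. This is a simplified illustration rather than the distribution's gentext template: the defaults for key and lang, the warning message, and the fallback to the raw key are all assumptions, and l10n.xml is assumed to sit in the same directory as the stylesheet.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:template name="gentext">
  <xsl:param name="key" select="local-name(.)"/>
  <xsl:param name="lang" select="'en'"/>

  <!-- A relative URI in document() is resolved against the stylesheet -->
  <xsl:variable name="l10n"
    select="document('l10n.xml')/internationalization/localization[@language=$lang]"/>

  <xsl:choose>
    <xsl:when test="$l10n/gentext[@key=$key]">
      <xsl:value-of select="$l10n/gentext[@key=$key]/@text"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- Assumed fallback: warn, then emit the key itself -->
      <xsl:message>
        <xsl:text>No localization of "</xsl:text>
        <xsl:value-of select="$key"/>
        <xsl:text>" for language "</xsl:text>
        <xsl:value-of select="$lang"/>
        <xsl:text>".</xsl:text>
      </xsl:message>
      <xsl:value-of select="$key"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Given the excerpt of l10n.xml shown earlier, calling this template with key set to appendix and lang set to ca produces “Apèndix”. Translators only ever touch the data file, never the template.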

Localizing Forms of Cross Reference

Forms of cross reference tend to be locale-specific as well. Where in the United States one might refer to chapter 5 as “Chapter 5, Customizing DocBook”, in other locales it might be simply “Chapter 5” or “5. Customizing DocBook” or even “5 Fejezet”.

As you can see, translating the generated text is necessary but not sufficient. The word order and punctuation also vary.

The DocBook XSL Stylesheets solve this problem with the xref element in the localizations. Each xref associates an element name with a format string:

<xref element="chapter" text="%g %n, %t"/>

Cross-references to elements of that type will use the format string. Within the format string, “%g” is replaced by the element name, “%n” is replaced by its label (usually its number), and “%t” is replaced by its title. All other text and punctuation is passed through unchanged.
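
Since XSLT 1.0 has no string-replacement function, the substitution has to be done with a recursive named template. The following is a minimal sketch of one way to do it; the template and parameter names are hypothetical and are not taken from the distribution.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:template name="subst.xref.text">
  <xsl:param name="format"/>   <!-- e.g. the text attribute of an xref -->
  <xsl:param name="gentext"/>  <!-- replaces %g -->
  <xsl:param name="label"/>    <!-- replaces %n -->
  <xsl:param name="title"/>    <!-- replaces %t -->

  <xsl:choose>
    <xsl:when test="contains($format, '%')">
      <!-- Pass through everything before the next escape -->
      <xsl:value-of select="substring-before($format, '%')"/>
      <xsl:variable name="rest" select="substring-after($format, '%')"/>
      <xsl:variable name="code" select="substring($rest, 1, 1)"/>
      <xsl:choose>
        <xsl:when test="$code = 'g'"><xsl:value-of select="$gentext"/></xsl:when>
        <xsl:when test="$code = 'n'"><xsl:value-of select="$label"/></xsl:when>
        <xsl:when test="$code = 't'"><xsl:value-of select="$title"/></xsl:when>
        <!-- Unknown escapes are copied unchanged -->
        <xsl:otherwise>
          <xsl:text>%</xsl:text>
          <xsl:value-of select="$code"/>
        </xsl:otherwise>
      </xsl:choose>
      <!-- Recurse on whatever follows the escape -->
      <xsl:call-template name="subst.xref.text">
        <xsl:with-param name="format" select="substring($rest, 2)"/>
        <xsl:with-param name="gentext" select="$gentext"/>
        <xsl:with-param name="label" select="$label"/>
        <xsl:with-param name="title" select="$title"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$format"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Called with the format string “%g %n, %t”, a gentext of “Chapter”, a label of “5”, and a title of “Customizing DocBook”, the template produces “Chapter 5, Customizing DocBook”.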

Shortcomings

There are (at least) three areas in which the stylesheets do not yet provide sufficient localization hooks:

The format of element titles themselves

Although cross-references are generated with a format string, the element titles themselves are not. For example, on a chapter title page, the format of the title is always “chapter number” followed by “chapter title”, but in some locales these should be reversed.

Translations more complicated than simple 1-for-1 word translation

Some locales, for example, Japanese, require generated text to precede and follow elements of a title. The present system doesn't provide support for this format.

Localization of numbers

Most generated numbers aren't sufficiently localized. Not only should the numbers for chapters and other components be localized, lists and other enumerations should be as well.

These are all issues that will eventually be addressed. If you work with a language that exhibits any of these features, I'd appreciate your feedback on what currently does and does not work adequately.

Parameterization

The vast majority of stylesheet users aren't XSLT programmers. This means that parameters (simple assignment-level statements) need to be provided wherever possible for the customizations that most users need to perform.

For DocBook, this is a veritable laundry list of features: should chapters, sections, figures, tables, etc. be numbered; should admonitions use graphics; should tables of contents be generated for books, chapters, articles, etc.; what spacing should be used around elements; etc. At last count there were more than 70 parameters each in the HTML and Formatting Object Stylesheets.

This requires slightly more effort on the part of the maintainer, but the rewards are obvious. It is vastly easier to explain to someone that all they need to do to get numbered sections is add section.autolabel=1 to the command that runs their XSLT processor than it would be to explain how to modify the template that generates section labels.
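
On the stylesheet side, honoring such a parameter can take only a few lines. The following is a minimal sketch, not the distribution's template: the default value, the h2 wrapper, and the number format are assumptions chosen for illustration.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<!-- A top-level parameter can be overridden when the processor is run -->
<xsl:param name="section.autolabel" select="0"/>

<xsl:template match="section/title">
  <h2>
    <xsl:if test="$section.autolabel != 0">
      <!-- Counts the enclosing sections at every level: 1.2, 1.2.1, ... -->
      <xsl:number count="section" level="multiple" format="1.1 "/>
    </xsl:if>
    <xsl:apply-templates/>
  </h2>
</xsl:template>

</xsl:stylesheet>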

Even explaining how to set several parameters by writing a custom stylesheet is fairly easy since it's mostly boilerplate:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:import href="http://nwalsh.com/xsl/docbook/html/docbook.xsl"/>

<xsl:param name="section.autolabel" select="1"/>
<xsl:param name="generate.chapter.toc" select="0"/>

</xsl:stylesheet>

But even this becomes cumbersome if the author wishes to change more than a few parameters. The clever solution[2] to this problem is to use a web form (and perhaps one day a simple Java application) to allow authors to choose the parameters that they want to use. Then the customization layer can be built automatically.

Web-Based Parameterization

The form is augmented with documentation, based on a system described later in this paper, and data typing information (parameters that are logically boolean are check boxes, etc.):

Figure 2. Web Form for Customization

By using elements and attributes from a non-XSL namespace, it's possible to associate the additional information necessary to build the form directly with the parameters:

<xsl:param name="author.othername.in.middle" select="1" doc:type='boolean'/>

<doc:param name="author.othername.in.middle" xmlns="">
<refpurpose>Is <sgmltag>othername</sgmltag> in <sgmltag>author</sgmltag> a
middle name?</refpurpose>
<refdescription>
<para>If true (non-zero), the <sgmltag>othername</sgmltag> of an <sgmltag>author</sgmltag>
appears between the <sgmltag>firstname</sgmltag> and
<sgmltag>surname</sgmltag>.  Otherwise, <sgmltag>othername</sgmltag>
is suppressed.
</para>
</refdescription>
</doc:param>
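
Because a stylesheet is itself an XML document, the form can be generated by running a second stylesheet over the first. The sketch below shows the general idea and is not the code behind the form pictured above: the namespace URI bound to the doc prefix must match whatever the stylesheets actually declare and is an assumption here, as are the form's action and the layout of the controls.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:doc="http://nwalsh.com/xsl/documentation/1.0"
                exclude-result-prefixes="doc"
                version="1.0">

<xsl:output method="html"/>

<!-- The input document is the parameterized stylesheet -->
<xsl:template match="/">
  <form method="post" action="customize">
    <xsl:apply-templates select="xsl:stylesheet/xsl:param"/>
    <input type="submit" value="Build customization layer"/>
  </form>
</xsl:template>

<!-- Parameters typed as boolean become check boxes -->
<xsl:template match="xsl:param[@doc:type='boolean']">
  <p>
    <input type="checkbox" name="{@name}" value="1">
      <xsl:if test="@select != '0'">
        <xsl:attribute name="checked">checked</xsl:attribute>
      </xsl:if>
    </input>
    <xsl:call-template name="param.purpose"/>
  </p>
</xsl:template>

<!-- All other parameters become text fields holding the default -->
<xsl:template match="xsl:param">
  <p>
    <input type="text" name="{@name}" value="{@select}"/>
    <xsl:call-template name="param.purpose"/>
  </p>
</xsl:template>

<!-- Label each control with the refpurpose of the matching doc:param -->
<xsl:template name="param.purpose">
  <xsl:variable name="name" select="@name"/>
  <xsl:text> </xsl:text>
  <xsl:value-of
    select="normalize-space(../doc:param[@name=$name]/refpurpose)"/>
</xsl:template>

</xsl:stylesheet>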

Self-Customizing Stylesheets

There are some customizations that simply do not lend themselves to simple parameterization. Title pages are a perfect example.

  1. DocBook provides a large array of elements that could be presented on a title page:

    Abbrev          Collab           Graphic        ModeSpec       ReleaseInfo
    Abstract        ConfGroup        Honorific      OrgName        RevHistory
    Address         ContractNum      ISBN           OtherCredit    SeriesVolNums
    Affiliation     ContractSponsor  ISSN           OtherName      SubjectSet
    ArtPageNums     Contrib          ITermSet       PageNums       Subtitle
    Author          Copyright        IndexTerm      PrintHistory   Surname
    AuthorBlurb     CorpAuthor       InvPartNumber  ProductName    Title
    AuthorGroup     CorpName         IssueNum       ProductNumber  TitleAbbrev
    AuthorInitials  Date             KeywordSet     PubDate        VolumeNum
    BiblioMisc      Edition          LegalNotice    Publisher
    BiblioSet       Editor           Lineage        PublisherName
    CiteTitle       FirstName        MediaObject    PubsNumber

    The odds are fairly small that an author wants all of these elements to appear on the page, even if all of them are present in the meta-data for a component.

  2. There's the question of whether items appear on the title page in a fixed order or in the order that the elements appear in the meta-data.

  3. In addition to the data supplied by the author, additional information (graphics, etc.) may be desirable.

  4. Different organizations and individuals have very specific requirements about how these items are presented.

For all of these reasons, the templates that produce title pages tend to be very complex. Modifying these templates to suit the requirements of a particular document style is likely to be beyond the skill of many users. And even those users with the requisite skill are likely to find it tedious and error-prone.

What is needed is a mechanism that allows authors to provide the specification of their needs without actually having to write the templates. Specification in this case includes not only how each individual element should be formatted, but also which elements to include and what order they should appear in. For example, one might want the title, subtitle, and author (but not the copyright statement or the revision history) on the title page of a book with a 24pt bold book title, a 20pt subtitle, etc.

In keeping with our earlier discussion of parameterization, some of these settings can be handled with an appropriate attribute set:

<fo:block xsl:use-attribute-sets="book.titlepage.recto.style">
...

But that doesn't address the problem of identifying the elements to appear on the title page or how different styles can be applied to each. Because XSLT 1.0 does not include any form of general list data type, it's not easy to parameterize these things into simple variable assignments.

The solution employed by the DocBook stylesheets is: XML.

Using a special “template” vocabulary, authors can describe the title page in a declarative way, as shown in this Formatting Object example:

<t:titlepage element="book" wrapper="fo:block">  <!-- 1, 2, 3 -->
  <t:titlepage-content side="recto">  <!-- 4 -->
    <title predicate="[1]"
           fo:font-size="24pt"
           fo:space-before="18pt"
           fo:font-weight="bold"
           fo:font-family="{$title.font.family}"/>  <!-- 5 -->
    <author fo:font-size="17pt"
            fo:space-before="11pt"
            fo:keep-with-next="always"/>
  </t:titlepage-content>

  <t:titlepage-content side="verso">
    <title predicate="[1]"
           fo:font-size="14pt"
           fo:font-weight="bold"
           fo:font-family="{$title.font.family}"/>
    <corpauthor/>
    <authorgroup/>
    <author/>
    <pubdate fo:space-before="1em"/>
    <copyright/>
    <abstract/>
    <legalnotice fo:font-size="8pt"/>
  </t:titlepage-content>

  <t:titlepage-separator>  <!-- 6 -->
    <fo:block break-after="page"/>
  </t:titlepage-separator>

  <t:titlepage-before side="recto">  <!-- 7 -->
  </t:titlepage-before>

  <t:titlepage-before side="verso">
    <fo:block break-after="page"/>
  </t:titlepage-before>
</t:titlepage>
1

The titlepage element contains all the declarative information about each title page.

2

The element attribute identifies the element for which this is the title page. Most elements that have “info” containers use this mechanism to define their title pages.

3

The wrapper attribute identifies the element that will be used to wrap the title page in the result tree. For XSL Formatting Object stylesheets, this is almost always fo:block; for HTML, it's div.

4

The elements contained in the titlepage-content wrapper will appear on the title page.

5

The elements inside titlepage-content that are not namespace qualified are assumed to be DocBook elements. The mixture of attributes on each of these elements determines how they will be presented. In general, namespace-qualified attributes are passed through to the result; other attributes control aspects of the declarative process. (For example, predicate is used to select a specific title element, in this case, the first.)

6

The separator appears between the title page and the rest of the content.

7

The before elements are inserted into the result tree before the title page.

The next step is to turn this declarative description into something that XSLT can process:

  1. Process the template with XSLT using a special stylesheet. This produces an XSLT stylesheet:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">
    
    <!-- This stylesheet was created by template/titlepage.xsl; do not edit it by hand. -->
    
    <xsl:template name="book.titlepage.recto">
      <xsl:apply-templates mode="book.titlepage.recto.auto.mode"
                           select="(bookinfo/title|title)[1]"/>
      <xsl:apply-templates mode="book.titlepage.recto.auto.mode"
                           select="bookinfo/author"/>
    </xsl:template>
    
    <xsl:template name="book.titlepage.verso">
      <xsl:apply-templates mode="book.titlepage.verso.auto.mode"
                           select="(bookinfo/title|title)[1]"/>
      <xsl:apply-templates mode="book.titlepage.verso.auto.mode" 
                           select="bookinfo/corpauthor"/>
      <xsl:apply-templates mode="book.titlepage.verso.auto.mode"
                           select="bookinfo/authorgroup"/>
    
    ...
    
    <xsl:template match="title" mode="book.titlepage.recto.auto.mode">
    <fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
              xsl:use-attribute-sets="book.titlepage.recto.style"
              font-size="24pt"
              space-before="18pt"
              font-weight="bold"
              font-family="{$title.font.family}">
    <xsl:apply-templates select="." mode="book.titlepage.recto.mode"/>
    </fo:block>
    </xsl:template>
    
    ...
    
  2. This stylesheet isn't complete: it doesn't contain all of the templates needed for DocBook. You have to create a custom stylesheet as described earlier.

  3. In addition to importing the base DocBook stylesheet into your custom stylesheet, include the automatically generated title page templates:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">
    
    <xsl:import href="http://nwalsh.com/xsl/docbook/html/docbook.xsl"/>
    <xsl:include href="/path/to/your/generated/titlepage.xsl"/>
    
    </xsl:stylesheet>
    
  4. Process your source document with this special custom stylesheet to get the new title page formatting.
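
The “special stylesheet” in step 1 is a stylesheet that writes a stylesheet, which is what xsl:namespace-alias exists for. The sketch below shows only the core of the idea and is not template/titlepage.xsl: it ignores the predicate and wrapper attributes, the separator and before elements, and the per-element formatting templates. The namespace URIs bound to the t and axsl prefixes are assumptions, and the select expression simply appends “info” to the element name.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:axsl="http://example.org/xsl-alias"
                xmlns:t="http://nwalsh.com/docbook/xsl/template/1.0"
                exclude-result-prefixes="t"
                version="1.0">

<!-- Elements written with the axsl prefix are emitted in the
     XSLT namespace, so the output is itself a stylesheet -->
<xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
<xsl:output method="xml" indent="yes"/>

<xsl:template match="t:titlepage">
  <axsl:stylesheet version="1.0">
    <xsl:apply-templates select="t:titlepage-content"/>
  </axsl:stylesheet>
</xsl:template>

<!-- One named template per side, for example book.titlepage.recto -->
<xsl:template match="t:titlepage-content">
  <xsl:variable name="element" select="../@element"/>
  <xsl:variable name="mode"
                select="concat($element, '.titlepage.', @side, '.auto.mode')"/>

  <axsl:template name="{$element}.titlepage.{@side}">
    <!-- Unqualified children name the DocBook elements to present,
         in the order they should appear -->
    <xsl:for-each select="*[namespace-uri(.) = '']">
      <axsl:apply-templates mode="{$mode}"
                            select="{$element}info/{local-name(.)}"/>
    </xsl:for-each>
  </axsl:template>
</xsl:template>

</xsl:stylesheet>

The declarative description is strictly structured data, which is one of the limited circumstances in which xsl:for-each is a convenient choice.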

“Literate” Programming

Documentation is an important part of any software development project. XSL Stylesheets are no exception. The DocBook XSL Stylesheets employ a style of documentation that is inspired by literate programming[3].

The stylesheets take advantage of the fact that elements from a non-XSL namespace are allowed at the top level of a stylesheet. Using this technique, each template, parameter, etc. can be documented right next to its definition:

<doc:template name="is.acceptable.mediaobject" xmlns="">
<refpurpose>Returns '1' if the specified media object is recognized.</refpurpose>

<refdescription>
<para>This template examines a media object and returns '1' if the
object is recognized as a graphic.</para>
</refdescription>

<refparameter>
<variablelist>
<varlistentry><term>object</term>
<listitem>
<para>The media object to consider.</para>
</listitem>
</varlistentry>
</variablelist>
</refparameter>

<refreturn>
<para>0 or 1</para>
</refreturn>
</doc:template>

<xsl:template name="is.acceptable.mediaobject">
  <xsl:param name="object"></xsl:param>

  <xsl:variable name="filename">
    <xsl:call-template name="mediaobject.filename">
      <xsl:with-param name="object" select="$object"/>
    </xsl:call-template>
  </xsl:variable>

  <!-- abridged... -->
</xsl:template>

Generally, the attributes on the doc: elements identify the relevant XSL construct. Resetting the default namespace (xmlns="") on the documentation element allows me to use a DocBook customization inside the wrapper. A separate XSLT stylesheet transforms the stylesheet source, together with its embedded documentation, into reference documentation:

Figure 3. Generated Documentation

[Image: documentation screen shot]

Notice that the documentation stylesheet is able to automatically abridge the template, effectively producing useful API documentation.
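
The pairing of documentation and definition can also be sketched outside XSLT. The following Python fragment is only an illustration, not the stylesheet that produced Figure 3: it matches each doc:template with the xsl:template of the same name and reports the stated purpose and the declared parameters. The doc namespace URI, the inline sample stylesheet, and the function name summarize are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Assumed namespace URI for the doc: prefix; the paper does not show it.
DOC_NS = "http://nwalsh.com/xsl/documentation/1.0"

# A cut-down stylesheet in the style of the example above.
STYLESHEET = """\
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:doc="http://nwalsh.com/xsl/documentation/1.0"
                version="1.0">
<doc:template name="is.acceptable.mediaobject" xmlns="">
<refpurpose>Returns '1' if the specified media object is recognized.</refpurpose>
</doc:template>
<xsl:template name="is.acceptable.mediaobject">
  <xsl:param name="object"/>
</xsl:template>
</xsl:stylesheet>
"""


def summarize(stylesheet_text):
    """Return one record per named template: name, purpose, parameters."""
    root = ET.fromstring(stylesheet_text)

    # Collect the purpose text of every top-level doc:template by name.
    purposes = {}
    for doc in root.findall(f"{{{DOC_NS}}}template"):
        purposes[doc.get("name")] = (doc.findtext("refpurpose") or "").strip()

    # Pair each named xsl:template with its documentation, if any.
    summary = []
    for template in root.findall(f"{{{XSL_NS}}}template"):
        name = template.get("name")
        if name is None:
            continue  # match-only templates are skipped in this sketch
        params = [p.get("name") for p in template.findall(f"{{{XSL_NS}}}param")]
        summary.append({"name": name,
                        "purpose": purposes.get(name, ""),
                        "params": params})
    return summary


for entry in summarize(STYLESHEET):
    print(entry["name"], entry["params"], entry["purpose"])
```

Because the template body is never inspected, the result is an abridged, API-style view of the stylesheet.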

Extensions

XSLT includes a provision for extension functions and elements. These allow an implementor to “step outside” XSLT and process documents with a more conventional language. Most XSLT processors today provide support for extension functions written in Java.

There are a few elements in DocBook that seem impractical to transform purely in XSLT:

Text insert

Many authors want to be able to insert the contents of other text files (often program listings) directly into their documentation. DocBook provides a semantic way to indicate this, although it's a bit of a hack: specifying a “linespecific” notation for a graphic.

XSLT 1.0 doesn't have any mechanism for loading a non-XML document, so an extension was developed for this purpose.
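
As a rough illustration of what such an extension has to do (this is not the code shipped with the stylesheets), the Python sketch below reads a plain text file and escapes it so that it can be placed inside a listing. The function name insert_text and the choice of a programlisting wrapper are assumptions.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def insert_text(path, encoding="utf-8"):
    """Read a non-XML file and wrap its escaped content in a listing element."""
    text = Path(path).read_text(encoding=encoding)
    # escape() replaces &, < and > so that the file content cannot be
    # mistaken for markup in the result.
    return "<programlisting>" + escape(text) + "</programlisting>"
```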

Line numbering

Another common style request is numbered program listings. In the absence of other markup, this could be done in XSLT 1.0 with a little recursive template trickery, but additional markup (such as emphasis or line annotation elements) inside a program listing greatly complicates things. An extension function that can adjust the result tree fragment that results from formatting the listing is a relatively easy solution.
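
A minimal sketch of that idea, written in Python rather than as a real extension function: the listing is treated as a small tree, and a line number is inserted after every newline found in the text between the tags, so inline markup such as emphasis is left intact. The function name number_lines, the number width, and the sample listing are assumptions for illustration.

```python
import itertools
import xml.etree.ElementTree as ET


def number_lines(listing, width=3):
    """Prefix every line of a listing element with its line number, in place."""
    counter = itertools.count(1)

    def prefix():
        return f"{next(counter):>{width}} "

    def number(text):
        # Insert a number after each newline; text without newlines is unchanged.
        if not text:
            return text
        first, *rest = text.split("\n")
        return first + "".join("\n" + prefix() + part for part in rest)

    def walk(node):
        # Visit text in document order: own text, then each child and its tail.
        node.text = number(node.text)
        for child in node:
            walk(child)
            child.tail = number(child.tail)

    listing.text = prefix() + (listing.text or "")  # number for the first line
    walk(listing)
    return listing


source = ("<programlisting>my $a;\n<emphasis>my $b;\nmy $c;</emphasis>\n"
          "return;</programlisting>")
numbered = number_lines(ET.fromstring(source))
print(ET.tostring(numbered, encoding="unicode"))
```

Note that a listing ending in a newline would receive a number for the empty final line; a real implementation would need to decide how to treat that case.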

Callouts

One of the mechanisms for providing callouts (those little black reverse-video numbers you saw before) is essentially asynchronous. The position of the callouts is described outside the text of the listing. Source markup like this:

<programlistingco>
<areaspec>
<areaset id="ex.plco.const">
  <area id="ex.plco.c1" coords="4"/>
  <area id="ex.plco.c2" coords="8"/>
</areaset>
<area id="ex.plco.ret" coords="12"/>
<area id="ex.plco.dest" coords="12"/>
</areaspec>
<programlisting>sub do_nothing_useful {
    my($a, $b, $c);

    $a = new A;

    $a->does_nothing_either();

    $b = new B;

    $c = "frog";

    return ($a, $c);
}</programlisting>
</programlistingco>

is rendered with callout bullets inserted automatically into the flow of the listing:

sub do_nothing_useful {
    my($a, $b, $c);

    $a = new A;                       1

    $a->does_nothing_either();

    $b = new B;                       1

    $c = "frog";

    return ($a, $c);                  2 3
}

Properly formatting this result requires the ability to count both lines and columns and to insert additional spaces and markup to add the callout marks.
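
The counting and padding involved can be sketched as follows. This Python fragment is an illustration under simplifying assumptions, not the shipped extension: the listing is plain text with no inline markup, each area is reduced to a (line, label) pair, callouts are shown as "(n)", and the default callout column of 40 is an arbitrary choice.

```python
def insert_callouts(listing, areas, column=40):
    """Append callout marks to the given lines, padded out to a fixed column.

    areas is a sequence of (line_number, label) pairs; lines are 1-based.
    """
    lines = listing.split("\n")
    for line_number, label in areas:
        index = line_number - 1
        if not 0 <= index < len(lines):
            continue  # ignore areas that point outside the listing
        line = lines[index]
        if len(line) < column - 1:
            line = line.ljust(column - 1)  # pad so the mark starts at `column`
        else:
            line += " "  # line already reaches the column: just separate
        lines[index] = line + f"({label})"
    return "\n".join(lines)


LISTING = """\
sub do_nothing_useful {
    my($a, $b, $c);

    $a = new A;

    $a->does_nothing_either();

    $b = new B;

    $c = "frog";

    return ($a, $c);
}"""

# The two areas in the areaset share callout 1; lines are taken from coords.
AREAS = [(4, "1"), (8, "1"), (12, "2"), (12, "3")]
print(insert_callouts(LISTING, AREAS))
```

Handling markup inside the listing, or coords that also specify a column, is what makes the real problem awkward in pure XSLT.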

Table column widths

The specification of CALS table column widths supports both absolute and relative widths. For example, a width of “3*+1in” means three times the nominal width of a column (a “1*” column) plus an additional one inch.

This is supported in XSL Formatting Objects with the proportional-column-width function, but has no equivalent in HTML. An extension function examines all of the column widths and makes appropriate calculations.
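
The kind of calculation involved might look like the sketch below. It is not the extension distributed with the stylesheets; it assumes a known overall table width in inches, a small set of units, and hypothetical function names (parse_colwidth, column_widths, as_percentages). Absolute parts are reserved first, and the remaining width is shared out according to the relative parts.

```python
import re

INCHES_PER_UNIT = {"in": 1.0, "pt": 1 / 72, "pc": 1 / 6,
                   "cm": 1 / 2.54, "mm": 1 / 25.4}
NUM = r"(?:\d+(?:\.\d*)?|\.\d+)"
SPEC = re.compile(
    rf"^(?P<star>(?P<rel>{NUM})?\*)?\+?"
    rf"(?:(?P<amount>{NUM})(?P<unit>in|pt|pc|cm|mm))?$"
)


def parse_colwidth(spec):
    """Split a width such as '3*+1in' into (relative units, absolute inches)."""
    match = SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized column width: {spec!r}")
    if not match["star"] and not match["amount"]:
        return 1.0, 0.0  # an empty width is treated as '1*'
    relative = 0.0
    if match["star"]:
        relative = float(match["rel"]) if match["rel"] else 1.0  # '*' means '1*'
    absolute = 0.0
    if match["amount"]:
        absolute = float(match["amount"]) * INCHES_PER_UNIT[match["unit"]]
    return relative, absolute


def column_widths(specs, table_width):
    """Resolve each column width to inches for a table of the given width."""
    parsed = [parse_colwidth(spec) for spec in specs]
    fixed = sum(absolute for _, absolute in parsed)
    shares = sum(relative for relative, _ in parsed)
    # Width of one relative unit ('1*') after the absolute parts are reserved.
    unit = max(table_width - fixed, 0.0) / shares if shares else 0.0
    return [absolute + relative * unit for relative, absolute in parsed]


def as_percentages(widths):
    """Express resolved widths as percentages, e.g. for HTML col elements."""
    total = sum(widths)
    return [100.0 * width / total for width in widths]


widths = column_widths(["1*", "3*+1in", "2in"], table_width=7.0)
print(widths, as_percentages(widths))
```

In this example three inches are absolute, so the remaining four inches are divided among four relative units and a “1*” column is one inch wide.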

Each of these elements can be processed by an XSLT extension in the DocBook XSL Stylesheets. At the time of this writing, only the Saxon processor is supported by the extensions that are shipped with the DocBook XSL Stylesheet distribution, but support for Xalan is planned.

In the absence of the extension, the stylesheets attempt to do something reasonable.

Conclusion

XSL Transformations and Formatting Objects are a rich platform on which to build stylesheets for large, sophisticated XML vocabularies. Designing stylesheets that will be adaptable and maintainable is an interesting software engineering challenge.

In this paper we've examined five factors that contribute to the successful design of XSL stylesheets: modularity, parameterization, stylesheet generation, documentation, and XSLT extensions.

Other Resources

The following resources provide additional information about DocBook and the DocBook Stylesheets:



[1] The DocBook XSL Stylesheets support 24 languages at the time of this writing.

[2] Sebastian Rahtz's, not mine.

[3] In “real” literate programming, documentation is the primary focus and source code is derived from it.