Repurposing is presenting the same content in different media.
Publish a document on US Letter and A4 paper
Publish a document in print and on the web
Publish a document in print, on the web, and as an EPUB
Publish a document in print and as an “app”
Publish a document as an iPhone app and an Android app
Some reuse involves repurposing, some repurposing involves reuse. These words don't have a strict, technical meaning.
Maximizing reuse requires learning to write differently
Sometimes it requires using new tools
Sometimes it breaks established boundaries of authorship
Writing books becomes writing topics
Sometimes it breaks established boundaries of control
Presentation and formatting are often removed from the author's control
It may be challenging to convince authors that the necessary changes have benefits that justify the costs
Store the components you want to reuse in separate “files”
Write the “main” document so that it references those components
Resolve those references and process the resulting document
In the discussion that follows, we'll mostly be talking about a single composite document. If you can build one, you can build more than one with the same techniques.
Graphics, and other non-XML resources, are the easy case:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>...</title>
</head>
<body>
...
<img src="somegraphic.png" alt="Some graphic" />
...
</body>
</html>
Or, in DocBook:
<mediaobject>
<alt>Some graphic</alt>
<imageobject>
<imagedata fileref="somegraphic.png"/>
</imageobject>
</mediaobject>
XML entities are resolved by the parser. They operate at a much lower level than other techniques:
<!DOCTYPE doc [
<!ENTITY chap2 SYSTEM "chap2body.xml">
]>
<doc>
<chapter>First chapter...</chapter>
<chapter>
&chap2;
</chapter>
</doc>
Where chap2body.xml contains (an extParsedEnt):
<para>paragraph</para>
<para>paragraph</para>
Requires an XInclude processor (or appropriate configuration option)
Is logically a transformation like any other. There's a pre-XIncluded document and a post-XIncluded document.
Operates on two or more distinct, separate documents (well, usually)
May apply validation to either the individual documents, or the composite document, or both.
N.B. DTD validation cannot practically be applied to the composite document
Can address subsections of a file via XPointer
Is recursive: all or nothing
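As a minimal sketch, the entity example can be recast with XInclude. The file name chap2.xml is hypothetical; unlike the external parsed entity, it is assumed to be a complete, well-formed document whose root element is a chapter. The xi:fallback content is used only if the resource cannot be retrieved.

```xml
<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <chapter>First chapter...</chapter>
  <!-- Replaced by the root element of chap2.xml (hypothetical file)
       when an XInclude processor runs. -->
  <xi:include href="chap2.xml">
    <xi:fallback>
      <!-- Used only when chap2.xml cannot be retrieved. -->
      <chapter>Second chapter is not available.</chapter>
    </xi:fallback>
  </xi:include>
</doc>
```

A parser that has not been asked to perform XInclude processing leaves the xi:include element in place, which is why the pre-XIncluded and post-XIncluded documents are best thought of as two different documents.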
Standard schemes:
#foo or id(foo), the element with the ID “foo”
element(/1/2), the second child of the root element
element(foo/2/3), the third child of the second child of the element with the ID “foo”
xmlns(db=http://docbook.org/ns/docbook), defines a namespace for a subsequent expression
A registry of extension schemes is maintained at http://www.w3.org/2005/04/xpointer-schemes/.
There are a bunch...but support is on a per-implementation basis
Of them, xpath is probably the most widely supported
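A short sketch of how the standard schemes might appear in practice. The file chap2.xml and the ID “intro” are hypothetical. Note that XInclude carries the pointer in the xpointer attribute; the href value must not contain a fragment identifier.

```xml
<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- Shorthand pointer: the element whose ID is "intro".
       The ID must be recognizable as one, e.g. via xml:id. -->
  <xi:include href="chap2.xml" xpointer="intro"/>

  <!-- element() scheme: the second child element of the root element. -->
  <xi:include href="chap2.xml" xpointer="element(/1/2)"/>

  <!-- element() scheme starting from an ID: the third child of the
       second child of the element with the ID "intro". -->
  <xi:include href="chap2.xml" xpointer="element(intro/2/3)"/>
</doc>
```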
<assembly xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink">
<resources>
<resource xml:id="xidi.overview"
fileref="xidi-overview.xml"/>
<resource xml:id="scr.book.build"
fileref="scr-book-build.xml"/>
…
</resources>
<structure xml:id="xidi.help.system"
type="helpsystem"
defaultformat="helpsystem">
<output format="pdf" file="xidi-help-system.pdf"/>
<output format="helpsystem ohj"/>
<filterout condition="manual.only"/>
<title>XIDI Help System</title>
<info>
<abstract>
<para>This is the help system…
</para>
</abstract>
</info>
<revhistory>
<revision>
<revnumber>0.1</revnumber>
<date>1 August 2009</date>
</revision>
</revhistory>
<module>
<output file="sys-toc.html"/>
<toc/>
<toc role="procedures"/>
</module>
<module xml:id="help.xidi.overview" >
<output file="overview.html"/>
<title>XIDI Help System Overview</title>
<module resourceref="help.overview.intro"
contentonly="true" omittitles="true"/>
<module resourceref="xidi.overview">
<output file="ovr-xidi.html"/>
</module>
</module>
</structure>
<structure xml:id="user.guide" type="book">
<output renderas="book"/>
<output format="html"
file="xidi-user-guide.html"/>
<output format="pdf"
file="xidi-user-guide.pdf"/>
<title>XIDI User Guide</title>
<toc/>
<toc role="figures"/>
<toc role="tables"/>
<toc role="procedures"/>
<module resourceref="xidi.overview"
renderas="chapter"/>
<module resourceref="xidi.create.intro"
renderas="chapter"/>
</structure>
<relationships>
<relationship linkend="xidi.help.system"
type="path">
<association>New User Introduction</association>
<instance linkend="help.xidi.overview"/>
<instance linkend="help.svn.overview"/>
<instance linkend="help.ex.new.help.sys"/>
</relationship>
<relationship type="collection">
<association>Advanced User Topics</association>
<instance linkend="xidi.parameters.syntax"/>
<instance linkend="svn.properties"/>
</relationship>
</relationships>
<transforms>
<transform grammar="dita"
fileref="dita2docbook.xsl"/>
<transform name="tutorial"
fileref="docbook2tutorial.xsl"/>
</transforms>
</assembly>
In most cases, in order for partners to exchange documents, both partners must understand all of the markup in the exchanged documents.
In other words, I can't usefully exchange DocBook with someone expecting TEI.
Blind interchange describes the situation where partners exchange documents without knowledge of all of the markup in them
It requires adhering to a set of constraints that allow one element to be a “subtype” of another with the guarantee that processing the subtype like its “supertype” will do something useful
It is a feature of DITA
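In DITA the subtype relationship is recorded in the class attribute, which lists an element's ancestry from most general to most specific. The sketch below writes those values out explicitly for a small task topic; in real documents they are normally defaulted by the schema rather than typed by authors, and the topic content here is invented for illustration.

```xml
<task id="install" class="- topic/topic task/task ">
  <title class="- topic/title ">Installing</title>
  <taskbody class="- topic/body task/taskbody ">
    <!-- A processor that knows nothing about tasks can still treat
         steps as an ordered list and step as a list item. -->
    <steps class="- topic/ol task/steps ">
      <step class="- topic/li task/step ">
        <cmd class="- topic/ph task/cmd ">Run the installer.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```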
Most processes, especially in publishing, are transformative: XML to HTML, XML to PDF, XML to EPUB, etc.
Those transformations are written by people who believe they understand the structure of the documents to be transformed
If the structure differs from expectations, the results will be ugly at best, catastrophically misleading at worst
The more complex the process, the more important it is to understand the incoming markup
Validation is the easiest way to catch markup errors
Widely available (supported by almost all tools)
Normatively part of the XML specification
But validation is optional
Not written in XML-document syntax
Poor support for documentation
Not usable in some environments
Supports entities (a text-based macro language)
Not namespace aware
Very limited data type support
<doc xmlns="http://www.xmlsummerschool.com/example/ns"
status="draft">
<head>
<title>A Sample Document</title>
<date>2011-09-22T09:00:00+01:00</date>
<author>Norman Walsh</author>
</head>
<body>
<p>Paragraph. <em>Important</em> paragraph.</p>
<p>Paragraph.<fn><p>Redundant, ain't he?</p>
</fn></p>
</body>
</doc>
What makes one of our documents one of ours and not something else? When is a purchase order not a cocktail recipe?
A doc consists of a head and a body, in that order
A head contains a title, date, and author, in any order
A body only contains p elements
A p contains text, em, or fn elements mixed together
The “rules” about a document exist in a spectrum from simple, structural rules all the way to business process/workflow rules.
Paragraphs in footnotes can't themselves have footnotes
Dates have to be real (ISO 8601) dates
Dates have to be expressed in UTC
Documents can have at most four footnotes
Documents with the status “final” can only be published on Thursdays
Author names have to be in the master author database
Documents can have at most four footnotes per page
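Rules toward the structural end of this spectrum that grammars handle poorly can often be written as assertions. The following is a minimal sketch, assuming ISO Schematron is available (an assumption for this example); it covers only the first and fourth rules in the list. Rules that depend on the calendar, an external database, or pagination are outside what a schema can reasonably check.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <ns prefix="d" uri="http://www.xmlsummerschool.com/example/ns"/>
  <pattern>
    <!-- No footnote may contain another footnote at any depth. -->
    <rule context="d:fn">
      <assert test="not(.//d:fn)">Paragraphs in footnotes can't
        themselves have footnotes.</assert>
    </rule>
    <!-- Counts every footnote in the document. -->
    <rule context="d:doc">
      <assert test="count(.//d:fn) &lt;= 4">Documents can have at
        most four footnotes.</assert>
    </rule>
  </pattern>
</schema>
```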
<!ELEMENT doc (head, body)> <!-- Documentation, what documentation? -->
<!ATTLIST doc xmlns CDATA #FIXED "http://www.xmlsummerschool.com/example/ns" status (draft|final) #IMPLIED>
<!ELEMENT head (title, date, author)> <!-- Requires exactly this order. -->
<!ELEMENT head (title & date & author)> <!-- SGML "and" connector; not legal in an XML DTD. -->
<!ELEMENT head (title | date | author)> <!-- Allows exactly one of the three. -->
<!ELEMENT head (title | date | author)+>
Allows multiple titles, dates, and authors; doesn't require one of each.
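XML DTDs have no “and” connector, so the only way to require each of title, date, and author exactly once in any order is to enumerate the orderings. The sketch below does that and fills in the remaining declarations for the sample document. Two details are assumptions rather than anything stated earlier: that body and fn each require at least one p, and that em has the same content as p.

```xml
<!-- Each of title, date, and author exactly once, in any order.
     The alternatives are factored so that the first element seen
     selects a single branch; DTD content models must be deterministic. -->
<!ELEMENT head ((title,  ((date, author)  | (author, date)))
              | (date,   ((title, author) | (author, title)))
              | (author, ((title, date)   | (date, title))))>
<!ELEMENT title  (#PCDATA)>
<!ELEMENT date   (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!-- Assumption: at least one paragraph is required. -->
<!ELEMENT body   (p+)>
<!-- Mixed content: text, em, and fn in any order and number. -->
<!ELEMENT p      (#PCDATA | em | fn)*>
<!ELEMENT em     (#PCDATA | em | fn)*>
<!ELEMENT fn     (p+)>
```

With three children the enumeration is tolerable; each additional child multiplies the number of orderings, which is why this approach does not scale.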
XML Schemas are XML documents, so they have to have a root element.
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:d="http://www.xmlsummerschool.com/example/ns"
elementFormDefault="qualified"
targetNamespace="http://www.xmlsummerschool.com/example/ns">
<annotation>
<documentation>
<p xmlns="http://www.w3.org/1999/xhtml">
This is documentation.
</p>
</documentation>
</annotation>
<!-- declarations go here -->
</schema>
<element name="p">
<complexType mixed="true">
<choice minOccurs="0" maxOccurs="unbounded">
<element ref="d:em"/>
<element ref="d:fn"/>
</choice>
</complexType>
</element>
<element name="em">
<complexType mixed="true">
<choice minOccurs="0" maxOccurs="unbounded">
<element ref="d:em"/>
<element ref="d:fn"/>
</choice>
</complexType>
</element>
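A sketch of how the remaining structural rules might be declared, meant to sit alongside the p and em declarations inside the schema element shown earlier (so the XML Schema namespace is the default and d: is bound to the target namespace). The choice of string for title and author, and making status optional, are assumptions.

```xml
<element name="doc">
  <complexType>
    <!-- head, then body, in that order. -->
    <sequence>
      <element ref="d:head"/>
      <element ref="d:body"/>
    </sequence>
    <attribute name="status">
      <simpleType>
        <restriction base="token">
          <enumeration value="draft"/>
          <enumeration value="final"/>
        </restriction>
      </simpleType>
    </attribute>
  </complexType>
</element>
<element name="head">
  <complexType>
    <!-- all: each child exactly once, in any order. -->
    <all>
      <element ref="d:title"/>
      <element ref="d:date"/>
      <element ref="d:author"/>
    </all>
  </complexType>
</element>
<element name="title" type="string"/>
<!-- dateTime enforces a real ISO 8601 date and time. -->
<element name="date" type="dateTime"/>
<element name="author" type="string"/>
```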
RELAX NG grammars are XML documents, so they have to have a root element.
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
ns="http://www.xmlsummerschool.com/example/ns"
datatypeLibrary
="http://www.w3.org/2001/XMLSchema-datatypes">
<div>
<p xmlns="http://www.w3.org/1999/xhtml">This
is some documentation.
The div wrapper is just for grouping.
</p>
<start>
<ref name="doc"/>
</start>
</div>
<!-- declarations go here -->
</grammar>
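For comparison, here is a minimal sketch of a complete RELAX NG grammar for the sample document. The pattern names, the optional status attribute, and the requirement of at least one p in body and fn are assumptions made for illustration.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.xmlsummerschool.com/example/ns"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="doc"/>
  </start>
  <define name="doc">
    <element name="doc">
      <optional>
        <attribute name="status">
          <choice><value>draft</value><value>final</value></choice>
        </attribute>
      </optional>
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </define>
  <define name="head">
    <element name="head">
      <!-- interleave: each child exactly once, in any order. -->
      <interleave>
        <element name="title"><text/></element>
        <element name="date"><data type="dateTime"/></element>
        <element name="author"><text/></element>
      </interleave>
    </element>
  </define>
  <define name="body">
    <element name="body">
      <oneOrMore><ref name="p"/></oneOrMore>
    </element>
  </define>
  <define name="p">
    <element name="p"><ref name="inline"/></element>
  </define>
  <!-- Text mixed with any number of em and fn elements. -->
  <define name="inline">
    <mixed>
      <zeroOrMore>
        <choice>
          <element name="em"><ref name="inline"/></element>
          <element name="fn"><oneOrMore><ref name="p"/></oneOrMore></element>
        </choice>
      </zeroOrMore>
    </mixed>
  </define>
</grammar>
```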
Consider this document:
<doc xmlns="http://www.xmlsummerschool.com/example/ns"
xmlns:xi="http://www.w3.org/2001/XInclude"
status="draft">
<head>
<title>A Sample Document</title>
<date>2011-09-22T09:00:00+01:00</date>
<author>Norman Walsh</author>
</head>
<xi:include href="body.xml"/>
</doc>
Is it valid?
Before XInclude processing?
After XInclude processing?
Both before and after?
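The answer depends on what the schema says. As one possible sketch, assuming RELAX NG and assuming head and body patterns are defined elsewhere in the grammar, a schema can accept either form by allowing xi:include wherever body is expected:

```xml
<define name="doc"
        xmlns="http://relaxng.org/ns/structure/1.0"
        xmlns:xi="http://www.w3.org/2001/XInclude">
  <element name="doc" ns="http://www.xmlsummerschool.com/example/ns">
    <optional><attribute name="status"/></optional>
    <ref name="head"/>
    <choice>
      <!-- What the post-XInclude document contains. -->
      <ref name="body"/>
      <!-- What the pre-XInclude document contains instead. -->
      <element name="xi:include">
        <attribute name="href"/>
      </element>
    </choice>
  </element>
</define>
```

A schema without the xi:include branch accepts only the post-XInclude document, and then only if body.xml really does contain a valid body.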
Databases provide a whole new range of capabilities: indexing, searching, etc.
Not generally like a filesystem; may require new practices
Traditional relational databases are not a good fit for XML. Just. Don't. Go. There.
XML and (some) NoSQL databases are a better fit.
MarkLogic, ahem, makes an excellent database for XML.
Java and XML based tool
Filesystem based
Allows authors to build flow graphs
Drives Java or command-line tools; extensible in Java
<project name="example" default="pubdoc" basedir=".">
<description>An example ant file</description>
<property name="build.dir" value="output"/>
<target name="init">
<mkdir dir="${build.dir}"/>
</target>
<target name="pubdoc" depends="init,xinclude">
<xslt in="webtech.inc" style="dbstyle.xsl"
out="${build.dir}/webtech.html"/>
</target>
<target name="xinclude">
<xslt in="webtech.xml" style="xinclude.xsl"
out="webtech.inc"/>
</target>
</project>
XML based, designed for XML processing
Allows authors to write simple, mostly declarative pipelines with a rich, and extensible, vocabulary of steps
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
version="1.0">
<p:xinclude/>
<p:xslt>
<p:input port="stylesheet">
<p:document href="dbstyle.xsl"/>
</p:input>
</p:xslt>
</p:pipeline>
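A pipeline also makes it explicit where validation happens relative to XInclude. The sketch below adds the optional p:validate-with-relax-ng step between inclusion and transformation, so the composite document is what gets validated. The schema path rng/docbook.rng is an assumption, and the step is only available in processors that implement it.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0">
  <!-- Build the composite document first. -->
  <p:xinclude/>
  <!-- Validate the post-XInclude document; by default the pipeline
       fails here if the document is invalid. -->
  <p:validate-with-relax-ng>
    <p:input port="schema">
      <p:document href="rng/docbook.rng"/>
    </p:input>
  </p:validate-with-relax-ng>
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="dbstyle.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```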
Entities and other resources are accessed via URIs
Proxies and resolvers can intercede
Most resolvers use XML Catalogs
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
prefer="public">
<system systemId
="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
uri="/share/doctypes/xhtml1-strict.dtd"/>
<system systemId
="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
uri="/share/doctypes/xhtml1-transitional.dtd"/>
</catalog>
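Catalogs can map more than individual system identifiers. The following sketch shows three other standard entry types; the local paths and the schema URI in the last entry are hypothetical.

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">
  <!-- Match on the public identifier instead of the system identifier. -->
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="/share/doctypes/xhtml1-strict.dtd"/>
  <!-- Redirect every system identifier under a common prefix. -->
  <rewriteSystem
      systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/"
      rewritePrefix="/share/doctypes/"/>
  <!-- Map a URI that is not an entity reference, such as a schema
       or stylesheet location (hypothetical URI). -->
  <uri name="http://example.com/schemas/docbook.rng"
       uri="/share/schemas/docbook.rng"/>
</catalog>
```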
How do you validate documents that use multiple namespaces?
One approach is to include the mixtures in the schema: the DocBook 5.0 schema knows that MathML can occur in equations, for example
NVDL, the Namespace-based Validation Dispatching Language, is another approach
An NVDL document describes how to decompose a mixed document into individual documents that can be validated independently
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"
startMode="docbook">
<mode name="docbook">
<namespace ns="http://docbook.org/ns/docbook">
<validate schema="rng/docbook.rng"
useMode="attach"/>
<validate schema="sch/docbook.sch"
useMode="attach"/>
</namespace>
</mode>
<mode name="attach">
<anyNamespace>
<attach/>
</anyNamespace>
</mode>
</rules>