Running Linux, 5th Edition - Matthias Kalle Dalheimer [371]
If HTML is your target, you are already done at this point. Because HTML is itself XML-conforming, the stylesheet is able to convert your document into HTML that can be directly displayed in your web browser.
If your target is more remote from the XML input (e.g., PDF) you need to go another step. In this case, you do not generate PDF directly from your document, because PDF is not an XML format but rather a mixed binary/text format. Basic XSLT transformation would be very difficult, if not plain impossible. Instead, you use a stylesheet that generates XSL-FO instead, yet another acronym starting with X (eXtended Stylesheet Language Formatting Objects). The XSL-FO document is another XML document, but one where many of the logical instructions in the original document have been turned into physical instructions.
Next, an FO processor such as Apache FOP is run on the XSL-FO document and generates PDF (or other output, if the FO processor supports that).
Now that you have an idea of the general picture, lets look at what you need to set up. First of all, it may be a good idea to run your XML document through a document validator. This does not do any processing, but just checks whether your document conforms to the specified DTD. The advantage of a document validator is that it does this very fast. Because actual processing can be time-consuming, it is good if you can find out first whether your document is ill-formed and bail out quickly.
One such document validator is xmllint. xmllint is a part of libxml2, a library that was originally developed for the GNOME desktop, but is completely independent of it (and actually also used in the KDE desktop). You can find information about xmllint and download it from http://xmlsoft.org.
xmllint is used as follows:
owl$ xmllint myarticle.xml > myarticle-included.xml
The reason that xmllint is writing the file to standard output is that it can also be used to process X-Includes. These are a technique to modularize XML files in an author-friendly way, and xmllint puts the pieces back together. You can find more information about X-Includes at http://www.w3.org/TR/xinclude.
In the next step, the stylesheet needs to be applied. Saxon is a good tool for this. It comes in a Java version and a C++ version. Often, it does not matter much which you install: the C++ one runs faster, but the Java one has a few additional features, such as automatic escaping of special characters in included program listings. You can find information and downloads for Saxon at http://saxon.sourceforge.net.
Of course, you also need a stylesheet (often, this is a huge set of files, but it is still usually referenced in the singular). For DocBook, nothing exceeds the DocBook-XSL package, which is maintained by luminaries of the DocBook world. You can find it at http://docbook.sourceforge.net/projects/xsl.
Assuming that you are using the Java version of Saxon, you would invoke it more or less as follows for generating XSL-FO:
java com.icl.saxon.StyleSheet myarticle-included.xml docbook-xsl/fo/docbook.xsl > \
myarticle.fo
and as follows for generating HTML:
java com.icl.saxon.StyleSheet myarticle-included.xml docbook-xsl/html/docbook.xsl > \
myarticle.html
Notice how only the choice of stylesheet determines the output format.
As was already described, for HTML you are done at this point. For PDF output, you still need to run a FOP processor such as Apache FOP, which you can get from http://xmlgraphics.apache.org/fop. FOP requires a configuration file; see the documentation for how to create one. Often you can just use the userconfig.xml file that ships with FOP. A canonical invocation would look like this, PDF being the standard