The DocBook Deployment, day one

May 06, 2025

I've had the idea kicking around for a few months now, but it's finally come time to put it into action: stories deployed on all three of my websites from a single source file. The issue with nofi/lofi/hifi is that it takes some novel ideas to synchronize content across all three. Even though my stories don't change a whole lot once they're posted, the fact that I need to maintain three different versions of the markup for each (nofi uses no CSS, while lofi and hifi each use different CSS), even with the help of AutoSite, has put me off writing more. That's just a lot of busywork, and of everything on these sites, plain-text stories are exactly the kind of thing that should be on all three.

I initially conceived of a system using PHP and databases, like my album reviews, but what makes sense for two paragraphs and a bunch of little data around them doesn't fit longform writing, where the HTML changes so often. I started scouring around for typesetting and documentation systems that would translate easily to the Web and settled on DocBook, which is XML-based. This means that the conversion to HTML is, ideally, 1:1: everything that can be represented in the output HTML document has a place in the original DocBook file. And because it's XML, any standard XML tool can read it as long as the document is well-formed and its namespaces are in order, so you can use a standard XML processing language like XSLT to turn DocBook into HTML.

If that's a lot of jargon for you, the plain English version is that I'm using a format for the book that I can easily turn into other things. That doesn't just mean website stuff; there are all sorts of tools and converters to turn a DocBook document into eBooks or PDFs or whatever else. It's a good long-term format to bet on for storing my writing, but in the short term, I just want stories on my sites again.

How I'm planning the workflow

Ideally, I want the conversion process to be as hassle-free as possible. I'd like to get the story into the right format, run a script to generate the necessary three versions (one for each flavor of mari.somnol), and then upload the files. Actually, it's more like six versions: one of my requirements is that stories that are segmented (like Kevin and Theo's Multiverse Misadventure, which I've been using as a test case for figuring out the tools) should also be viewable as a single page, if your retro computer or device can handle it.

To make things even simpler, I'd like to convert not to finalized HTML, but to AutoSite input pages that I can easily slipstream into the existing AutoSite projects that nofi, lofi, and hifi are generated from. This means I will, again ideally, never have to toy with the XSLT stylesheets after they've been written, because any changes to the rest of the site layout will be handled in AutoSite, not in XSLT along with AutoSite.

Step zero: generating the DocBook

So, obviously, the story has to be in the right format first. I'm used to copying stuff over from an RTF or a Works document into an HTML document, and XML works the same way. Elements, tags, and attributes all exist in XML just as they do in HTML, with some minor changes to suit XML's goal of being a good interchange format: any XML tool, even if it can't make much sense of a document's meaning, can at least read it successfully so long as it's well-formed (no unclosed tags or similar typos). You know the weird trailing slashes people insist on including in void elements in HTML?

<img src="bubsy.jpg" alt="what could pawsibly go wrong?" />

That's not an HTML thing, that's an XML thing. XML requires every element to be closed, even void elements, so a void element's single tag acts as both the opening and the closing tag. (If you're wondering where the mixup happened, there was an attempt back in the 2000s to promote XHTML, an XML-based version of HTML, over HTML4. The idea was that feature phones could access converted versions of HTML sites in languages better suited to their limited processing power, like WML. Then phones got more powerful and could just read normal websites, so suddenly there was no need, and XHTML, which wasn't very well implemented in browsers anyway, died on the vine. Still, parts of it stick around out of misconception and because nobody but me cares about the difference.)
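To make the difference concrete, here's the same bit of story marked up twice. HTML4 lets you leave the end tag off a <p>, and <br> never needs a slash there, but an XML parser rejects anything that isn't well-formed: every element has to be closed, and void elements use the self-closing form.

```xml
<!-- Accepted by HTML4 parsers, but NOT well-formed XML: the <p> elements
     are never closed and <br> is never closed either.

<p>Musty books.<br>An entire building of them.
<p>They called him Kevin.
-->

<!-- The same markup as well-formed XML (the XHTML way): -->
<div>
    <p>Musty books.<br />An entire building of them.</p>
    <p>They called him Kevin.</p>
</div>
```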

Anyway, if you're wondering what a DocBook file looks like, here's an abridged snippet of my layout:

<?xml version="1.0" encoding="utf-8"?>
<book xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
    <title>KEVIN AND THEO'S MULTIVERSE MISADVENTURE</title>
    <titleabbrev>Kevin and Theo's Multiverse Misadventure</titleabbrev>
    <info>
        <author>
            <personname>
                <othername>mariteaux</othername>
            </personname>
        </author>
        <bibliomisc role="canonical">http://mariteaux.somnolescent.net/writing/stories/pennyverse/kevin-and-theo/</bibliomisc>
    </info>
    <chapter label="1">
        <info>
            <title>PAGE ONE</title>
            <titleabbrev>Page One</titleabbrev>
            <bibliomisc role="canonical">http://mariteaux.somnolescent.net/writing/stories/pennyverse/kevin-and-theo/01/</bibliomisc>
        </info>
        <para>Musty books. An entire building of them, in fact. An abnormally large aardwolf in a baggy maroon jacket teetered on two legs, carrying a stack of the things from chest to snout. Evening had fallen on the Apricot Bay Public Library, another week had passed, and the void of patrons meant he could round up what books had been discarded around the first floor by impatient children and uncaring teenagers. They called him Kevin.</para>
        <!-- ...the rest of page one... -->
    </chapter>
    <!-- ...the remaining pages, one <chapter> each... -->
</book>

Not every piece of information will get used on all three versions of the site. Each page name comes in two versions: an uppercase one in DocBook's <title> and a mixed-case one in <titleabbrev>. nofi uses the uppercase name on the visible page and the mixed-case name for the page title, while lofi and hifi use mixed case everywhere. This will make more sense when you see the XSLT.
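As a rough illustration of how those two names could be split up, here's a hypothetical nofi-style template for a single page. It's written as plain HTML for simplicity (the real output is meant to be AutoSite input pages, whose format isn't shown here), and it assumes the db prefix is bound to the DocBook namespace on the enclosing xsl:stylesheet. A lofi or hifi version would use db:titleabbrev for the heading too.

```xml
<!-- Hypothetical nofi-style template for one page. It goes inside an
     xsl:stylesheet that declares xmlns:db="http://docbook.org/ns/docbook". -->
<xsl:template match="db:chapter">
    <html>
        <head>
            <!-- Page title: the mixed-case name, e.g. "Page One" -->
            <title><xsl:value-of select="db:info/db:titleabbrev"/></title>
        </head>
        <body>
            <!-- Visible heading: the uppercase name, e.g. "PAGE ONE" -->
            <h1><xsl:value-of select="db:info/db:title"/></h1>
            <!-- Only the chapter's paragraphs; its <info> is skipped here -->
            <xsl:apply-templates select="db:para"/>
        </body>
    </html>
</xsl:template>
```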

DocBook has a lot of elements, many of which I don't need and many of which people have proposed cutting from the language entirely. I worked through the documentation to convert Kevin and Theo to DocBook over the course of the week and got it done this morning. Now for turning it into stuff.

Step one: setting up xsltproc

One way to convert XML into other languages or forms is with a language called XSLT. XSLT is itself valid XML, and it drives XSLT processors that turn XML documents into HTML, other XML-based languages, or even plain text. XSLT is often run by the browser, on XML documents served from a Web server, and that's usually how you'll see it discussed and documented online. I want to run the conversion step offline instead, for flexibility and because I'm targeting browsers that have no support for rendering XML themselves, like Netscape.

My XSLT processor of choice is xsltproc, courtesy of the GNOME folks. I run it on Cygwin because it's the most convenient way I have to run Linux terminal stuff with my setup. I made a directory junction so Cygwin stuff could run on files in my sitebuilding folder, and I was ready to roll.

Step two: writing the XSLT stylesheets

This is the part that made me stumble the hardest, partly because of how little documentation I could find on it. XSLT is based on templates: in your XSLT stylesheet, you have the processor match chunks of your XML input, and then you tell it what to convert each chunk into. Problem was, I couldn't get XSLT to match any element. The only thing I could get it to match was /, which matches the root of the document, meaning "the entire DocBook file." Funny as it was to see literally everything I wrote get stuffed into the <title> of the page, that's not really what I want.
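Here's a hypothetical reconstruction of how everything can end up in the <title> (a sketch, not the actual stylesheet from this project). When no template matches an element, XSLT 1.0's built-in rules carry on processing its children and copy every text node to the output. So a root template that calls xsl:apply-templates inside <title> dumps the text of the whole book there. The / pattern still matches because the document root has no name, and therefore no namespace to get wrong.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical stylesheet reproducing the "whole book in the title" bug. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" encoding="utf-8"/>

    <!-- "/" matches the document root, which has no name (and no namespace). -->
    <xsl:template match="/">
        <html>
            <head>
                <!-- No select attribute: process every child of the root.
                     Nothing below matches, so the built-in rules copy out
                     all the text in the book, right here inside <title>. -->
                <title><xsl:apply-templates/></title>
            </head>
            <body></body>
        </html>
    </xsl:template>

    <!-- Unprefixed names only match elements in NO namespace, so this
         never fires on a DocBook 5 file. -->
    <xsl:template match="chapter/para">
        <p><xsl:apply-templates/></p>
    </xsl:template>
</xsl:stylesheet>
```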

So here's what wasn't working:

<xsl:template match="chapter/para">
    <p><xsl:apply-templates /></p>
</xsl:template>

Seems innocent enough, right? Match every <para> element that's a direct child of a <chapter> element. Nope! Doesn't work.

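After some time bumbling around in the dark, I discovered that XSLT needs you to prefix element names with their namespace. A namespace is a URI that identifies which XML language an element belongs to. Technically, you can have any element you want in XML (<name>, <victim>, <objectshoveduparse>), but two different XML languages can use the same element name and collide with each other, so namespaces and their prefixes keep them apart. My DocBook file puts every element in the DocBook namespace (that's what the xmlns="http://docbook.org/ns/docbook" on <book> does), but in an XSLT 1.0 match pattern, a name with no prefix only matches elements that are in no namespace at all. So the stylesheet has to bind a prefix of its own, like db:, to the DocBook namespace and use it on every element name. I only learned this from StackOverflow, regrettably. (Even more regrettable: this adventure has made me reconsider my rocky long-term history with W3Schools, because it's the only place with decent examples of this stuff in action.)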
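Why the stylesheet can use db: even though the DocBook file never writes it: a prefix is only a local nickname for a namespace URI, and a default namespace declaration (xmlns="http://docbook.org/ns/docbook") applies that same URI to unprefixed elements. The two fragments below (shown separately, not as one document) contain exactly the same elements as far as an XML processor is concerned.

```xml
<!-- In the DocBook file, xmlns on <chapter> sets a default namespace, so the
     unprefixed elements below really live in http://docbook.org/ns/docbook. -->
<chapter xmlns="http://docbook.org/ns/docbook">
    <para>Musty books.</para>
</chapter>

<!-- To an XML processor, these are exactly the same two elements. The prefix
     "db" is only a local nickname; the namespace URI is what counts. -->
<db:chapter xmlns:db="http://docbook.org/ns/docbook">
    <db:para>Musty books.</db:para>
</db:chapter>
```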

With prefixes:

<xsl:template match="db:chapter/db:para">
    <p><xsl:apply-templates /></p>
</xsl:template>

That does work.
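For context, here's a minimal complete stylesheet around that template, as a sketch rather than the stylesheet from this project. The key part is xmlns:db on the xsl:stylesheet element: the prefix name db is arbitrary, but the URI has to match the one in the DocBook file exactly. exclude-result-prefixes="db" tells the processor not to copy that namespace declaration onto the HTML elements it outputs.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Minimal sketch, not the stylesheet from this project. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook"
    exclude-result-prefixes="db">

    <xsl:output method="html" encoding="utf-8"/>

    <!-- db is bound to the same URI as the DocBook file's default namespace,
         so db:para matches the file's unprefixed <para> elements. -->
    <xsl:template match="db:chapter/db:para">
        <p><xsl:apply-templates/></p>
    </xsl:template>

    <!-- Everything else falls through to the built-in rules, which output
         its text as-is until it gets a template of its own. -->
</xsl:stylesheet>
```

With hypothetical filenames, `xsltproc story.xsl kevin-and-theo.xml > kevin-and-theo.html` runs it; xsltproc writes to standard output by default, and `-o` names an output file instead.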

So began the process of slowly writing templates to convert everything in the DocBook to HTML. For anything that doesn't get used on the page, I have a template that just outputs nothing for that element.
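An "output nothing" template is just a template with no body: it matches the element, produces nothing, and doesn't process the element's children either, so none of its text leaks out through the built-in rules. A sketch, assuming db is bound to the DocBook namespace on the enclosing xsl:stylesheet; which elements to silence is only an example. Silencing an element doesn't hide its data from the stylesheet, since xsl:value-of can still read it explicitly wherever it's wanted.

```xml
<!-- These go inside xsl:stylesheet, with xmlns:db="http://docbook.org/ns/docbook"
     declared there. Which elements to silence is just an example. -->

<!-- Matches every <info> block (book-level and chapter-level) and outputs
     nothing, so the canonical URLs and author name never reach the page. -->
<xsl:template match="db:info"/>

<!-- "|" joins two patterns: one empty template for the book's own titles. -->
<xsl:template match="db:book/db:title | db:book/db:titleabbrev"/>
```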

[Image: An HTML document generated entirely from DocBook and XSLT]

Here's an example of the end result! It might just look like the story on nofi, but it was generated entirely using tools and scripting, not by me playing with the HTML directly.

This is where I left off for the day, though, because I've hit an additional snag: I don't think I can use xsltproc to output multiple separate files. Remember, I don't just want an all-in-one page; I want the pages split up, with an all-in-one option on top. Seems I'll have to dig into XInclude to set something like this up, which at least doesn't look too hard, and xsltproc does support it.
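A sketch of what the XInclude side could look like, with hypothetical filenames: each page lives in its own DocBook file, and a wrapper book pulls them together with xi:include. xsltproc only expands these when given its --xinclude option, e.g. `xsltproc --xinclude allinone.xsl book.xml > all.html` (stylesheet and file names made up), while each page file can still be run through a per-page stylesheet on its own.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical all-in-one wrapper (book.xml). Each page lives in its own
     file; xi:include pulls them back together into one book. -->
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude">
    <title>KEVIN AND THEO'S MULTIVERSE MISADVENTURE</title>
    <titleabbrev>Kevin and Theo's Multiverse Misadventure</titleabbrev>
    <!-- Placeholder filenames, one included file per page. -->
    <xi:include href="page01.xml"/>
    <xi:include href="page02.xml"/>
</book>
<!-- page01.xml (also hypothetical) declares the DocBook namespace itself, so
     its elements still match db: templates once they're included:
     <chapter xmlns="http://docbook.org/ns/docbook" label="1"> ... </chapter> -->
```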

Isn't it funny how I do so much technical work to ideally not have to do as much technical work when I want to be artsy instead?
