biblatex-archaeology in its environment Ⅴ: TeX4ht for HTML or MS Word conversion

Several projects aim to convert LaTeΧ sources to HTML, XML or MS Word. Only two have attracted broader interest: LaTeΧML and TeΧ4ht (htlatex). The former does not support BibLaTeΧ, so we are stuck with TeΧ4ht. As the name indicates, it was originally a TeΧ to HTML converter, but today it also supports several XML-based formats: ODT, TEI, DocBook, EPUB and XHTML. ODT is of special interest since it is supported by OpenOffice.org and LibreOffice, both of which can easily convert it into the native MS Word formats DOC and DOCX. This allows you to write manuscripts for journal articles in LaTeΧ even if you are obliged to deliver an MS Word file. This works perfectly with biblatex-archaeology; I have done it myself.

There are two things that you should keep in mind when employing TeΧ4ht:

  1. If you plan to convert your source with TeΧ4ht, you are strongly advised to use one of the native UTF-8 LaTeΧ engines (lualatex or xelatex) from the beginning. Do not use pdflatex, because its Unicode support is incomplete. I ran into multiple encoding issues when I tried to prepare my biblatex-archaeology sample file, although it compiled into PDF with pdflatex without trouble.
  2. Never invoke TeΧ4ht (htlatex) directly. Use the batch scripts make4ht and tex4ebook instead, and forget all the outdated solutions like oolatex that Google will deliver. TeΧ4ht has pretty limited capabilities, and since its inventor Eitan Gurari died unexpectedly in 2009, nobody fully understands its code. For example, TeΧ4ht cannot handle Unicode characters by itself, so we need the batch scripts to prepare the LaTeΧ sources before feeding them to TeΧ4ht.

make4ht and tex4ebook read build files with the extension .mk4, which are written in Lua. This is where you should store your workflow. The batch script looks for <jobname>.mk4, or for whatever file you pass with the -e option.

if mode=="draft" then

Since every htlatex run triggers three latex runs, a full build may take some time.
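A build file that separates a quick draft from a full build might look like the following sketch. It is not the original listing. The file name mycfg.mk4 and the exact order of the calls are assumptions made for illustration, and it assumes a make4ht version that ships the built-in Make:biber command (older versions may need the biber call registered by hand).

```lua
-- mycfg.mk4: hypothetical build file, passed to make4ht via -e mycfg.mk4.
-- `Make` and `mode` are globals that make4ht provides to build files;
-- `mode` holds the value of the -m command line option.
if mode == "draft" then
  -- Quick preview: compile once and accept stale citations and references.
  Make:htlatex {}
else
  -- Full build (assumed order): compile, let biber process the
  -- bibliography, then compile twice more so that citations and
  -- cross-references settle.
  Make:htlatex {}
  Make:biber {}
  Make:htlatex {}
  Make:htlatex {}
end
```

With such a file in place, make4ht -e mycfg.mk4 -m draft -l <jobname>.tex would select the draft branch. Without the -m option the full branch runs.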

You can do much more with the mk4 file. This is especially important if you want to process images or mathematical formulas. That is not needed in our scenario, since images are usually delivered as separate files (but you can still use LaTeΧ's referencing system for automatic numbering!).

I recommend switching off code that cannot be converted or brings no benefit, for example by loading it only when the macro \HCode, which TeΧ4ht defines, is absent. One candidate is the time-consuming microtype package, although it technically works.

Now you can convert biblatex-archaeology’s sample file into several formats:

# EPUB (ignore the warning about missing tidy):
tex4ebook -e mycfg.mk4 -l <jobname>.tex
# HTML 5:
make4ht -e mycfg.mk4 -l -f html5 <jobname>.tex "fn-in"
# XHTML:
make4ht -e mycfg.mk4 -l -f xhtml <jobname>.tex "fn-in"
# ODT:
make4ht -e mycfg.mk4 -l -f odt <jobname>.tex
# TEI (fails!):
make4ht -e mycfg.mk4 -l -f tei <jobname>.tex
# DocBook:
make4ht -e mycfg.mk4 -l -f docbook <jobname>.tex

These are the commands for lualatex (the -l parameter). If you want to use xelatex, use -x instead, or omit the parameter altogether if you want to use pdflatex despite my warning. The fn-in parameter for the HTML formats is necessary because HTML has no native support for footnotes or endnotes. Without this parameter, all footnotes go to an HTML page of their own. With it, they appear inside the main document, linked as text anchors.


Ingram Braun

Archaeologist, web developer, proofreader

