FOSI vs. XSL-FO, part I

Eliot Kimber suggests solutions to alleged shortcomings of XSL-FO

Last Updated: 2006-08-23

=Question by N.N=

I've been unable to find convenient ways to do the following in FO:

1. Determine page numbers for links in multi-volume books, short of putting all the other volumes at the end of the PostScript file and then using a Perl script to eliminate them before printing.

2. Smartly insert line breaks in table cells when a cell boundary overflow is about to occur. (Would also like to be able to provide a QA script to alert writers to this issue before they print).

=Eliot Kimber answers=

Re: Point #1
There are a number of ways this could be done, depending on the FO engine you're using and the details of your workflow, but they all require some sort of two-pass process. In no case would the FO standard itself provide what you need, for the simple reason that it's too implementation-specific.

In essence, what you need is a way to save off information about the pagination of a given document or document part so that you can then read that information on a second pass and use it in generating your final output.
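As a rough illustration of that save-then-read idea, here is a minimal Python sketch. It assumes the first pass has somehow produced, for each volume, a mapping from link target IDs to page numbers; how you obtain that mapping depends entirely on your FO engine and is not shown. The JSON file layout and the function names (`save_page_map`, `load_page_maps`, `format_reference`) are hypothetical and not part of any FO product.

```python
import json
from pathlib import Path


def save_page_map(page_map, side_file):
    """Pass 1: record the page on which each link target landed."""
    Path(side_file).write_text(json.dumps(page_map, indent=2), encoding="utf-8")


def load_page_maps(side_files):
    """Pass 2: merge the side files of all volumes into one lookup table.

    side_files maps a volume label to the path of that volume's side file.
    """
    merged = {}
    for volume, side_file in side_files.items():
        page_map = json.loads(Path(side_file).read_text(encoding="utf-8"))
        for target_id, page in page_map.items():
            merged[target_id] = (volume, page)
    return merged


def format_reference(target_id, lookup):
    """Build the text of a cross-volume link from the merged lookup table."""
    volume, page = lookup[target_id]
    return f"{volume}, page {page}"
```

On the second pass, the stylesheet (or a preprocessing step in front of it) would look up each cross-volume link in the merged table and write the resulting text into the FO, instead of relying on the formatter to resolve a target that lives in another volume.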

I've suggested to all the FO implementation vendors that I work with that they provide a way to emit this information into "side files". So far no dice, although both XSL Formatter and XEP let you save off an intermediate file that reflects the pagination. This could be used but these files tend to be huge (which is why I want side files).

Another approach is to examine the PDFs you've generated and get pagination information from there. Not too hard but not always the best thing. Ken Holman uses a trick where he generates pagination information into the first page of his PDF in a form that is then easy to extract using a simple Java PDF manipulation library. He runs two passes--the first creates the PDF with this page, the second extracts the information and generates the final PDF. He uses this for back-of-the-book index generation but the technique could be used for anything.

For one client we implemented a two-pass mechanism for generating lists of effective pages (LOEP). As part of the XSLT that generates the final FO, we used XSL Formatter's Java API to render each page set in order to get the number of pages in that set, from which we could then determine the pages to go into the LOEP. This is essentially a two-pass process, but it is implemented as a single processing step in the tool chain. Because the intermediate FOs are never written out, we minimized processing time by avoiding any file I/O for the first pass.
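The arithmetic behind that last step can be sketched in a few lines of Python. This is only an illustration, not the client implementation: it assumes the page counts have already been obtained from the formatter, and that pages are numbered continuously across page sets, which may not match your numbering scheme. The function name `page_ranges` is made up for this example.

```python
def page_ranges(page_counts, first_page=1):
    """Turn per-page-set page counts into (name, first, last) page ranges.

    page_counts is a list of (page_set_name, number_of_pages) pairs in
    document order. Continuous page numbering is an assumption here.
    """
    ranges = []
    start = first_page
    for name, count in page_counts:
        if count < 1:
            raise ValueError(f"page set {name!r} has no pages")
        end = start + count - 1
        ranges.append((name, start, end))
        start = end + 1
    return ranges
```

For example, `page_ranges([("Front matter", 4), ("Chapter 1", 10)])` gives the first set pages 1 to 4 and the second set pages 5 to 14.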

I don't know if Epic's FO processor provides a similar sort of API.

Re: Point #2
This is a function of the FO implementation's line breaking algorithm. For example, XEP never breaks lines in this case, instead trying to squeeze the text, while XSL Formatter will break lines wherever it hits the cell boundary. You can also do tricks with putting zero-width spaces at places where a break is allowed, which lets the renderer break an otherwise unbroken sequence of characters. This should work with all FO implementations.
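The zero-width space trick is easy to apply in a preprocessing step before the text reaches the FO. The Python sketch below is one possible approach under stated assumptions: the 12-character threshold and the set of characters after which a break is allowed are arbitrary choices for illustration, and both function names are hypothetical. The second function is a crude stand-in for the QA check mentioned in the question; it only flags long runs that have no break opportunity and knows nothing about actual cell widths.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, a permitted line break position


def add_break_opportunities(text, max_run=12, break_after="/_.-"):
    """Insert zero-width spaces into long runs of non-space characters.

    Runs of max_run characters or fewer are left alone. In longer runs a
    zero-width space is added after each character listed in break_after,
    except at the very end of the run.
    """
    pattern = re.compile("([" + re.escape(break_after) + r"])(?=\S)")

    def soften(match):
        token = match.group(0)
        if len(token) <= max_run:
            return token
        return pattern.sub(lambda m: m.group(1) + ZWSP, token)

    return re.sub(r"\S+", soften, text)


def unbreakable_runs(text, max_run=12):
    """QA helper: list long runs that still contain no zero-width space."""
    return [t for t in text.split() if len(t) > max_run and ZWSP not in t]
```

Running `unbreakable_runs` over the text of each table cell after `add_break_opportunities` has been applied gives writers a list of strings that may still overflow and need manual attention.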