Why does Arbortext Editor introduce extra line breaks when formatting XML documents, compared to SGML?

Answered by: Paul Grosso (PTC) Last updated: 2005-05-12

Paul Grosso himself explains why Arbortext does the Right Thing

=The question from Peter Karman=
I'm seeing some strange behavior (well, strange to me) when formatting SGML vs XML docs.

The issue seems to be that XML is more respectful of line breaks inside as-is tagsets than SGML is. This has probably worked to our advantage in the past, when we were doing all SGML, since the FOSI did what we wanted it to do, regardless of what we told it to do. :)

Example:

<bar>
foo</bar>

There's a line break (EOL) right after the open bar tag. In SGML, that line break would not appear in the PostScript output. In XML, we get that "extra" line as prespace.

As opposed to:

<bar>foo</bar>

where there is no line break.

This happens under both Epic 4.2 and 5.1, using the same FOSI for SGML and XML under both versions.

Anyone seen anything like this before? Am I missing a setting somewhere that controls this?

=The authoritative answer from Paul Grosso=

Unfortunately, rules for whitespace are one of the more subtle differences between SGML and XML.

SGML had a complex algorithm for deciding which white space (especially line ends) was "insignificant" and therefore had to be ignored by the parser.

XML decided instead to say that all white space must be passed through by the parser.

In both cases, the standards left no room for parsers to do something different, even by user request. In SGML, the record end after your tag must be ignored by the parser. In XML, that record end must be passed through by the parser.
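The XML half of this rule is easy to observe with any conforming XML parser. The following is a small illustration using Python's standard `xml.etree.ElementTree` module rather than Epic itself; the `bar` element and its `foo` content are taken from the question above.

```python
import xml.etree.ElementTree as ET

# Record end (newline) directly after the start tag.
with_break = ET.fromstring("<bar>\nfoo</bar>")
# No record end after the start tag.
without_break = ET.fromstring("<bar>foo</bar>")

# The parser passes the newline through as part of the character data.
print(repr(with_break.text))     # '\nfoo'
print(repr(without_break.text))  # 'foo'
```

An SGML parser would be required to report the same content for both inputs, which is why the same FOSI produces different output for the two formats.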

Since Epic Editor wants to maintain the integrity of the SGML/XML data, it needs to obey these rules.

I'm assuming that your bar element has mixed content (it doesn't make sense for an element without mixed content to be "asis"). That means the record end following the tag is significant data in XML, though it is insignificant in SGML, which is why you're getting the extra line in your output.

When working in XML, Epic will not introduce line breaks in asis mode where they don't already exist. So your problem must be due to a conversion from SGML to XML (because in SGML, those line breaks were insignificant).
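If the stray line breaks do come from a conversion, one possible cleanup is to remove them from the converted XML before formatting. The sketch below is only an assumption about how that might be done outside Epic, using Python's standard library. The function names `drop_edge_record_ends` and `clean_asis` and the `asis_tags` parameter are hypothetical, and the logic is a rough approximation of the SGML behavior (one record end right after a start tag, one right before an end tag), not the full SGML algorithm.

```python
import xml.etree.ElementTree as ET

def drop_edge_record_ends(elem):
    """Remove one newline right after the start tag and one right before the end tag."""
    # Character data directly after the start tag is stored in elem.text.
    if elem.text and elem.text.startswith("\n"):
        elem.text = elem.text[1:]
    children = list(elem)
    if children:
        # With child elements, the data before the end tag is the last child's tail.
        last = children[-1]
        if last.tail and last.tail.endswith("\n"):
            last.tail = last.tail[:-1]
    elif elem.text and elem.text.endswith("\n"):
        # Without child elements, the data before the end tag is still elem.text.
        elem.text = elem.text[:-1]

def clean_asis(root, asis_tags):
    """Apply the cleanup to every element whose tag is in asis_tags (hypothetical helper)."""
    for elem in root.iter():
        if elem.tag in asis_tags:
            drop_edge_record_ends(elem)
    return root

doc = ET.fromstring("<doc><bar>\nfoo\n</bar></doc>")
clean_asis(doc, {"bar"})
print(ET.tostring(doc, encoding="unicode"))  # <doc><bar>foo</bar></doc>
```

Limiting the cleanup to a named set of as-is elements leaves line breaks elsewhere in the document untouched.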

Unfortunately, this is just one of those conversion issues due to the differences between SGML and XML.