6.8.2 Whitespace handling

A common problem when processing XML is that the whitespace characters represented on the QuarkXPress page are not translated. One solution is to ensure that the meaning of these whitespace characters (carriage returns to separate paragraphs, tabs to separate table cells, etc.) is translated into XML markup (e.g. <Paragraph> and <Table> elements). This is the approach that Easypress have always recommended as the most practical and the most portable between systems running different operating systems.
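To illustrate the idea of carrying whitespace meaning as markup, here is a minimal Python sketch. It is not the product's own conversion logic. The <Paragraph> and <Table> names come from the text above, while the <Story>, <Row> and <Cell> elements and both function names are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET


def split_lines(text):
    """Split on Macintosh (CR), Windows (CRLF) or UNIX (LF) line endings."""
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return [line for line in normalised.split("\n") if line]


def paragraphs_to_xml(text):
    """Wrap each line of text in a <Paragraph> element."""
    story = ET.Element("Story")  # hypothetical wrapper element
    for line in split_lines(text):
        ET.SubElement(story, "Paragraph").text = line
    return ET.tostring(story, encoding="unicode")


def table_to_xml(text):
    """Treat each line as a table row and each tab as a cell boundary."""
    table = ET.Element("Table")
    for line in split_lines(text):
        row = ET.SubElement(table, "Row")  # hypothetical element name
        for cell in line.split("\t"):
            ET.SubElement(row, "Cell").text = cell  # hypothetical element name
    return ET.tostring(table, encoding="unicode")


print(paragraphs_to_xml("First paragraph\rSecond paragraph"))
print(table_to_xml("Name\tPrice\r\nWidget\t9.99"))
```

Once the structure is expressed as elements, the result no longer depends on which line-ending or tab characters survive the transfer between systems.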

The ‘Whitespace’ control on the ‘Output’ tab of the Ruleset Editor allows you to choose which whitespace characters will be included in the XML.

Ignore all whitespace: converts each run of whitespace characters to a single space character (default behaviour for most XML parsers).

Include all whitespace: includes all whitespace characters permitted in the selected encoding. Line endings and tabs will be preserved.

Include tab only: includes only tab characters, not double spaces or line endings.

Convert whitespace to entity: converts all whitespace characters to the equivalent Unicode numeric entity. This option ensures that the characters are correctly transferred when moving XML content between Macintosh, Windows and UNIX systems, all of which use different line-ending characters.
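The effect of the four options can be approximated with the Python sketch below. It is an illustration only and makes several assumptions: the mode names and the function apply_whitespace_mode are hypothetical, runs of whitespace are assumed to collapse to one space, "Include tab only" is assumed to reduce other whitespace to single spaces, and the entity mode is assumed to use decimal references and to leave ordinary spaces unchanged. The actual output of the Ruleset Editor may differ in these details.

```python
import re


def apply_whitespace_mode(text, mode):
    """Approximate the four 'Whitespace' options (assumed behaviour)."""
    if mode == "ignore":
        # Collapse every run of spaces, tabs and line endings to one space.
        return re.sub(r"[ \t\r\n]+", " ", text)
    if mode == "include_all":
        # Leave tabs, line endings and repeated spaces untouched.
        return text
    if mode == "tab_only":
        # Keep tabs; collapse runs of spaces and line endings to one space.
        return re.sub(r"[ \r\n]+", " ", text)
    if mode == "entity":
        # Replace tab, CR and LF with decimal numeric character references.
        return re.sub(r"[\t\r\n]", lambda m: f"&#{ord(m.group())};", text)
    raise ValueError(f"unknown mode: {mode}")


sample = "a\t b\r\nc  d"
for mode in ("ignore", "include_all", "tab_only", "entity"):
    print(mode, repr(apply_whitespace_mode(sample, mode)))
```

One reason the entity form travels well is that an XML parser normalises literal line endings to a single line feed on input, whereas a character written as a numeric reference such as &#13; is passed through to the application as that exact character.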