Atomik Roundtrip 2.1: Working with Roundtrip

Chapter 14 Working with Entities

An entity is an XML representation of a character or a string of text that cannot normally be represented in a basic character set (the first 128 characters of the ASCII set). The most common use is to represent accented characters. This ensures that your XML files retain the same contents when transferred between operating systems that use different character sets.

The entity entry in text is preceded by an ampersand (&) and terminated with a semi-colon (;). For example:

<text>I had a strange feeling of d&eacute;j&agrave; vu</text>

would actually be displayed as “I had a strange feeling of déjà vu”.

However, as with most things in the XML world, these entities aren’t fixed, although some standard lists are in common use.

For the XML to be valid, any entities used within it must be declared in the DTD.

This can be done either by including the entity definitions in the DTD itself, or by including them in the DOCTYPE declaration of the XML file. For example:

<!DOCTYPE Magazine SYSTEM "Easy_Magazine.dtd" [
  <!ENTITY eacute "&#x00E9;">
  <!ENTITY agrave "&#x00E0;">
]>
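If you want to check a declaration like this outside Roundtrip, any XML parser will do. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which is an assumption of this example and not part of Roundtrip. The reference to Easy_Magazine.dtd is left out, and the root element is simply <text>, so that the snippet needs no other files.

```python
import xml.etree.ElementTree as ET

# Entities declared in the internal subset of the DOCTYPE declaration.
# The external DTD reference is omitted so the example is self-contained.
DOCUMENT = """<!DOCTYPE text [
  <!ENTITY eacute "&#x00E9;">
  <!ENTITY agrave "&#x00E0;">
]>
<text>I had a strange feeling of d&eacute;j&agrave; vu</text>"""


def expand_entities(xml_source):
    """Parse the document and return the text of its root element."""
    return ET.fromstring(xml_source).text


# The parser replaces each entity reference with its declared character.
print(expand_entities(DOCUMENT))
```

If one of the two declarations is removed, the parser should reject the document as referring to an undefined entity, which is the same problem Roundtrip would face.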

However, it’s inconvenient to declare every entity you’ve used at the top of the XML or in the DTD. A far more common approach is to use a standard entity file and simply reference it from your DTD. For example:

<!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML" "iso-lat1.ent">
%ISOlat1;

This declaration specifies that the set of entities (ISOlat1) is identified by the public identifier “ISO 8879:1986//ENTITIES Added Latin 1//EN//XML”, and that its definitions can be found in the local file called “iso-lat1.ent”. In standard XML, the declaration must be followed by the parameter entity reference %ISOlat1; for the definitions to be read into the DTD.

The first part of this declaration is a format known as a public identifier: a name that XML software can look up in an XML catalogue file to locate the entity set. This part of the declaration is irrelevant to Roundtrip; it’s only interested in the file name (‘iso-lat1.ent’).

Note that if your XML, or its DTD, refers to an external entity file, you must ensure that the entity file referred to (“iso-lat1.ent” in this case) is in the ‘Default Entity Location’ specified in your preferences.
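Before importing, it can be worth confirming that every entity file your DTD refers to is actually present in that folder. The following Python sketch shows one way to do so. The function name missing_entity_files and the folder path are hypothetical. The regular expression only recognises simple external parameter entity declarations, and the sketch assumes (as described above) that only the file name part of the reference matters.

```python
import re
from pathlib import Path

# Matches external parameter entity declarations such as
#   <!ENTITY % ISOlat1 PUBLIC "public identifier" "iso-lat1.ent">
#   <!ENTITY % Special SYSTEM "QXPSpecial.ent">
EXTERNAL_ENTITY = re.compile(
    r"""<!ENTITY\s+%\s+(?P<name>\S+)\s+
        (?:PUBLIC\s+(?:"[^"]*"|'[^']*')\s+|SYSTEM\s+)
        (?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)')
        \s*>""",
    re.VERBOSE,
)


def missing_entity_files(dtd_text, entity_folder):
    """Return the referenced entity file names not found in entity_folder."""
    folder = Path(entity_folder)
    missing = []
    for match in EXTERNAL_ENTITY.finditer(dtd_text):
        system_id = match.group("dq")
        if system_id is None:
            system_id = match.group("sq")
        # Assumption: only the file name matters, not any leading path.
        file_name = Path(system_id).name
        if not (folder / file_name).is_file():
            missing.append(file_name)
    return missing
```

An empty list means that every referenced file was found. For example, missing_entity_files(dtd_text, "Entities") checks a hypothetical folder called "Entities".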

You’ll notice that the definitions of these entities appear to be entities themselves. These are numeric character references: a notation that allows you to specify a character by its numeric value. The value is always the character’s Unicode (ISO/IEC 10646) code point, regardless of the encoding declared for the XML file. It would be possible to include these numeric values directly in the XML text, but that would make the XML very difficult to read:

<text>I had a strange feeling of d&#x00E9;j&#x00E0; vu</text>
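A numeric reference names a Unicode code point, so the same reference gives the same character whatever encoding the file declares. The Python sketch below illustrates this with the standard xml.etree.ElementTree module (an assumption of this example, not something Roundtrip uses). The two byte strings differ only in their declared encoding.

```python
import xml.etree.ElementTree as ET

# The same text under two different declared encodings. The numeric
# references are identical in both because they name Unicode code points.
ASCII_DOC = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b"<text>d&#x00E9;j&#x00E0; vu</text>"
)
LATIN1_DOC = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<text>d&#x00E9;j&#x00E0; vu</text>"
)


def element_text(xml_bytes):
    """Parse the document and return the text of its root element."""
    return ET.fromstring(xml_bytes).text


# Both documents should yield the same characters.
print(element_text(ASCII_DOC) == element_text(LATIN1_DOC))
```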

If you wish to define your own character entities, you should consult a table of character codes (such as the Unicode code charts) to identify the numeric value for the character. Whether the character displays correctly depends on the typeface you are using, so check the character tables supplied with your fonts.
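If you would rather not look the value up by hand, a few lines of script can produce the declaration for you. This is a minimal Python sketch. The function name entity_declaration and the entity name "euro" are only illustrative, and the sketch assumes the character has a single Unicode code point.

```python
def entity_declaration(name, character):
    """Build an <!ENTITY> declaration mapping `name` to one character."""
    if len(character) != 1:
        raise ValueError("expected exactly one character")
    # ord() gives the Unicode code point; format it as four or more hex digits.
    return '<!ENTITY {} "&#x{:04X};">'.format(name, ord(character))


print(entity_declaration("eacute", "é"))
print(entity_declaration("euro", "€"))
```

The resulting lines can be pasted into your DTD or into the DOCTYPE declaration of the XML file.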

Almost all of the characters you’re ever likely to need are included in widely available public entity definitions. There are a few additional characters specific to QuarkXPress, which Roundtrip allows you to map as XML entities. To do this, either refer to the file ‘QXPSpecial.ent’ (which is in the sample entities folder in the Atomik Roundtrip folder), or place whichever of the following lines are appropriate into your DTD:

<!ENTITY qxpPPN "&#xF126;"> <!-- Previous Page Number -->
<!ENTITY qxpCPN "&#xF127;"> <!-- Current Page Number -->
<!ENTITY qxpNPN "&#xF128;"> <!-- Next Page Number -->
<!ENTITY qxpNL "&#x000A;"> <!-- Soft return -->
<!ENTITY qxpT "&#x0009;"> <!-- Tab -->
<!ENTITY qxpNC "&#xF131;"> <!-- New Column -->
<!ENTITY qxpNB "&#xF132;"> <!-- New Box -->
<!ENTITY qxpP "&#x000D;"> <!-- New Paragraph -->
<!ENTITY qxpFS "&#xF134;"> <!-- Flexible Space -->
<!ENTITY qxpPS "&#xF135;"> <!-- Punctuation Space -->
<!ENTITY qxpDNL "&#xF136;"> <!-- Discretionary New Line -->
<!ENTITY qxpIH "&#xF137;"> <!-- Indent Here Marker -->
<!ENTITY qxpDH "&#xF138;"> <!-- Discretionary Hyphen -->

Another potential use for entities is to define your own text that changes according to context.

For example, if you produce multiple publications, and the same XML will be used in more than one of these, you may wish to insert different text according to which publication the XML will be used in. Whilst this could be done by maintaining multiple XML files, this could be very inconvenient. However, placing your own conditional entity declarations in the DTD allows appropriate text to be inserted into your XML on import, simply by making a slight change to the XML.

For example, if your general DTD contained the text:

<![%easystation_world;[
<!ENTITY copyright "(C) 2005 EasyStation World Magazine. All rights reserved.">
]]>
<![%games_review_monthly;[
<!ENTITY copyright "(C) 2005 Games Review Monthly magazine. All rights reserved">
]]>

to represent two magazine copyright statements, then you should also include the lines:

<!ENTITY % easystation_world "INCLUDE">
<!ENTITY % games_review_monthly "IGNORE">

where the line marked “INCLUDE” identifies the magazine to be used. These lines must be read before the conditional sections that refer to them, so they can appear either earlier in the individual user’s copy of the DTD, or in the DOCTYPE declaration of the XML file (as previously discussed). When the XML is imported into Atomik Roundtrip (or, indeed, any other XML application that reads the DTD), the appropriate text will be inserted every time the &copyright; entity is used in the XML.
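The switching behaviour can be tried outside Roundtrip with a parser that reads the external DTD. The sketch below is one way to do this in Python with the standard xml.parsers.expat module. It is an illustration under stated assumptions, not a description of how Roundtrip works internally. The DTD is held in memory as GENERAL_DTD in place of a real Easy_Magazine.dtd file, and the helper names document_for and expanded_text are hypothetical.

```python
import xml.parsers.expat as expat

# Stand-in for the external DTD, held in memory so the sketch needs no files.
GENERAL_DTD = b"""
<![%easystation_world;[
<!ENTITY copyright "(C) 2005 EasyStation World Magazine. All rights reserved.">
]]>
<![%games_review_monthly;[
<!ENTITY copyright "(C) 2005 Games Review Monthly magazine. All rights reserved">
]]>
"""


def document_for(included, ignored):
    """Build an XML file whose DOCTYPE declaration switches one section on."""
    return (
        '<!DOCTYPE Magazine SYSTEM "Easy_Magazine.dtd" [\n'
        '  <!ENTITY % {} "INCLUDE">\n'
        '  <!ENTITY % {} "IGNORE">\n'
        "]>\n"
        "<Magazine>&copyright;</Magazine>"
    ).format(included, ignored).encode("utf-8")


def expanded_text(xml_bytes):
    """Parse xml_bytes, supplying GENERAL_DTD as the external DTD."""
    parser = expat.ParserCreate()
    # Ask expat to read the external DTD and its parameter entities.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    chunks = []

    def load_external_subset(context, base, system_id, public_id):
        # Assumption: every external reference is answered with GENERAL_DTD.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(GENERAL_DTD, True)
        return 1  # a non-zero value tells expat to carry on

    parser.ExternalEntityRefHandler = load_external_subset
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)
    return "".join(chunks)


print(expanded_text(document_for("easystation_world", "games_review_monthly")))
```

Swapping the two arguments to document_for selects the other copyright statement without any change to the DTD itself, which is the point of the technique.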