Atomik Roundtrip 2.1: Working with Roundtrip > Chapter 20 A brief guide to XML

20.6 Automatically defined XML entities

These entities don’t need to be defined, as they’re recognised by all XML interpreters:
&amp; & Ampersand
&quot; " Double quote mark
&apos; ' Apostrophe
&lt; < ‘Less than’
&gt; > ‘Greater than’
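As a quick check of the list above, here is a minimal Python sketch which parses a one-line document using nothing but the built-in entities. The sample element and the function name decode_predefined are made up for illustration; the parser is the one in Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element; only the five entity references matter here.
SAMPLE = "<Paragraph>&amp; &lt; &gt; &quot; &apos;</Paragraph>"


def decode_predefined(xml_text: str) -> str:
    """Parse a one-element document and return its decoded text content."""
    return ET.fromstring(xml_text).text


decoded = decode_predefined(SAMPLE)
print(decoded)
```

No DTD and no entity declarations are involved, which is the point: the parser already knows what each of these references stands for.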

One solution to this is simply not to declare your entities in the DTD, but to declare just the ones you need in the XML file itself. You’ll remember that you declare the DTD as part of the doctype declaration: you can also add additional lines of DTD syntax to the doctype declaration, enclosed in square brackets, and these lines will be treated as if they were a part of the DTD:

<!DOCTYPE Magazine SYSTEM "Easy_Magazine.dtd" [
<!ENTITY ldquo "&#x201C;">
<!ENTITY rdquo "&#x201D;">
<!ENTITY apos "&#x0027;">
]>
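To see entities declared inside the doctype declaration being expanded, the following Python sketch may help. It is an illustration only: the reference to Easy_Magazine.dtd is left out so that the example is self-contained, and the document content is invented.

```python
import xml.etree.ElementTree as ET

# The external DTD reference is omitted on purpose: this sketch only
# exercises the declarations placed between the square brackets.
DOC = """<!DOCTYPE Magazine [
  <!ENTITY ldquo "&#x201C;">
  <!ENTITY rdquo "&#x201D;">
]>
<Magazine><GameTitle>&ldquo;The Neverland&rdquo;</GameTitle></Magazine>"""

root = ET.fromstring(DOC)
title = root.find("GameTitle").text  # entity references are already expanded
print(title)
```

The parser replaces &ldquo; and &rdquo; with the curly quote characters before your code ever sees the text.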

If you have a whole bunch of character entities which you use all the time, it would be nice not to have to enter them into every DTD or every XML file you make, but instead to refer back to a single repository of all your entity declarations. You can do this with an entity file, which is simply a fragment of a DTD containing the entity declarations.

You can use an entity file in your DTD as follows:

<!ENTITY % EasyEnt SYSTEM "easy_entities.ent">
%EasyEnt;

These lines first define a special kind of entity, a parameter entity. Parameter entities are only valid within DTDs. In the same way that a reference to a standard entity is replaced with the XML which the entity is defined as representing, a parameter entity represents a piece of DTD syntax: every time it appears in a DTD, it is replaced with the full text which it represents. For example:

<!ENTITY % Formatting "(#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*">
<!ELEMENT B %Formatting;>
<!ELEMENT I %Formatting;>
<!ELEMENT U %Formatting;>
<!ELEMENT sup %Formatting;>
<!ELEMENT sub %Formatting;>
<!ELEMENT allcaps %Formatting;>
<!ELEMENT smallcaps %Formatting;>
<!ELEMENT superior %Formatting;>
<!ELEMENT shadow %Formatting;>
<!ELEMENT outline %Formatting;>
<!ELEMENT strikethru %Formatting;>
<!ELEMENT WU %Formatting;>

Written out in full, without the parameter entity, the same declarations would be:

<!ELEMENT B (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT I (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT U (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT sup (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT sub (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT allcaps (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT smallcaps (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT superior (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT shadow (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT outline (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT strikethru (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>
<!ELEMENT WU (#PCDATA | B | I | U | sup | sub | allcaps | smallcaps | superior | shadow | outline | strikethru | WU)*>

You can see from the example above that using a parameter entity makes the DTD much clearer and easier to read, and also easier to update, as you only need to make a change in one place rather than making the same change many times.
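The substitution itself can be pictured as plain text replacement. The Python sketch below is a toy, not a DTD parser: it handles only the simple quoted form of declaration used in this section, the element list is shortened, and the function name expand_parameter_entities is invented for the example.

```python
import re

# Shortened version of the Formatting example, for illustration only.
DTD = '''<!ENTITY % Formatting "(#PCDATA | B | I | U)*">
<!ELEMENT B %Formatting;>
<!ELEMENT I %Formatting;>
<!ELEMENT U %Formatting;>'''

# Matches declarations of the form <!ENTITY % Name "replacement text">
PE_DECL = re.compile(r'<!ENTITY\s+%\s+(\w+)\s+"([^"]*)"\s*>\n?')
# Matches references of the form %Name;
PE_REF = re.compile(r'%(\w+);')


def expand_parameter_entities(dtd_text: str) -> str:
    """Replace each %Name; reference with its declared replacement text."""
    definitions = dict(PE_DECL.findall(dtd_text))
    without_declarations = PE_DECL.sub("", dtd_text)
    return PE_REF.sub(lambda match: definitions[match.group(1)], without_declarations)


expanded = expand_parameter_entities(DTD)
print(expanded)
```

Changing the single Formatting declaration and running the function again changes every element declaration at once, which is the maintenance benefit described above.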

Parameter entities are signified by a percent sign (‘%’) rather than an ampersand when they are referred to (%EasyEnt; for example). They also have a percent sign, set off by spaces, after <!ENTITY in their declaration, to identify them as parameter entities rather than standard entities.

<!ENTITY % EasyEnt SYSTEM "easy_entities.ent">
%EasyEnt;

In this example, the first line defines the entity, and the second line refers to it. When a parameter entity is defined as a reference to a file (or a URL), the entire contents of that file replace the parameter entity reference when the DTD is interpreted. In the example above, the DTD which the XML interpreter would construct would be:

<!ENTITY % EasyEnt SYSTEM "easy_entities.ent">
<!ENTITY ldquo "&#x201C;">
<!ENTITY rdquo "&#x201D;">
<!ENTITY apos "&#x0027;">
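If you want to watch a parser pull in an entity file, the following Python sketch uses the expat binding from the standard library. Several things here are assumptions made for the sake of a self-contained example: the entity file is held in a dictionary (ENTITY_FILES) rather than read from disk, the parameter entity is declared in the doctype declaration rather than in a separate DTD file, and the function name parse_with_entity_file is invented.

```python
import xml.parsers.expat as expat

# Stand-in for the file system: maps a file name to that file's contents.
ENTITY_FILES = {
    "easy_entities.ent": '<!ENTITY ldquo "&#x201C;">\n<!ENTITY rdquo "&#x201D;">\n',
}

DOC = """<!DOCTYPE Magazine [
<!ENTITY % EasyEnt SYSTEM "easy_entities.ent">
%EasyEnt;
]>
<Magazine>&ldquo;The Neverland&rdquo;</Magazine>"""


def parse_with_entity_file(xml_text: str) -> str:
    """Parse xml_text, loading external parameter entities from ENTITY_FILES."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to process parameter entity references instead of skipping them.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.CharacterDataHandler = chunks.append

    def load_external(context, base, system_id, public_id):
        # A child parser reads the entity file as if it sat at the reference.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(ENTITY_FILES[system_id], True)
        return 1

    parser.ExternalEntityRefHandler = load_external
    parser.Parse(xml_text, True)
    return "".join(chunks)


text = parse_with_entity_file(DOC)
print(text)
```

The entities ldquo and rdquo are never declared in the document itself; they become available only because the contents of the entity file are read in where %EasyEnt; appears.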

This functionality of DTDs highlights an important method which is often used to construct DTDs, whereby the DTD can be broken down into chunks or modules, and multiple customised DTDs for different purposes built up from the component parts.

This brings up one important issue: if you’re making up one DTD from a bunch of modules, there’s always a risk that you’ll end up with multiple definitions of the same element. One of the beauties of XML is that it doesn’t restrict you to a fixed vocabulary of element names; you can choose your own. But this freedom carries a risk: if you try to mix together two different types of XML (which isn’t as unlikely as it might sound; think content management systems), there’s a strong chance that elements like ‘Title’, ‘Body’, or ‘Paragraph’ will exist in both of the definitions, even though they are unique within their own environment.

To address this problem, XML has a concept of a namespace: a way of mixing content from multiple DTDs without causing conflict.

Within the XML, references to content which relates to other DTDs (or other DTD modules) are prefixed by a namespace identifier and a colon, which makes each one a qualified name.

<EasyMagazine:GameTitle>The Neverland</EasyMagazine:GameTitle>

In order for this to have any meaning, however, you need to declare that namespace within the XML before you use it, pointing out which DTD file it refers to:

<Review>
<GameTitle>The Neverland</GameTitle>
<Standfirst>The excesses of life are sometimes too much to handle for the Princess.</Standfirst>
<EasyTable:Table
>
<EasyTable:TableTitleRow>
<EasyTable:TableData>Top 10 games this month</EasyTable:TableData>
</EasyTable:TableTitleRow>
<EasyTable:TableHeadRow>
<EasyTable:TableData column="title">Title</EasyTable:TableData>
<EasyTable:TableData column="platform">Platform</EasyTable:TableData>
<EasyTable:TableData column="publisher">Publisher</EasyTable:TableData>
<EasyTable:TableData column="price">Price</EasyTable:TableData>
<EasyTable:TableData column="score">Score</EasyTable:TableData>
</EasyTable:TableHeadRow>
...
</EasyTable:Table>
<ReviewText><Paragraph>Two Jabberwockies laughed, and umpteen fountains perused two bourgeois mats, but bureaux cleverly tastes five obese Macintoshes. </Paragraph>
...
</ReviewText>
</Review>
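In standard XML, a namespace prefix is declared with an xmlns attribute which ties the prefix to an identifying URI. The Python sketch below shows how a parser then tells prefixed and unprefixed elements apart. The URI is a made-up placeholder for this example, not one defined by Roundtrip, and the document is a cut-down version of the review above.

```python
import xml.etree.ElementTree as ET

# Hypothetical identifier for the table vocabulary; any unique URI would do.
TABLE_NS = "http://example.com/ns/EasyTable"
PREFIXES = {"EasyTable": TABLE_NS}

DOC = f"""<Review xmlns:EasyTable="{TABLE_NS}">
  <GameTitle>The Neverland</GameTitle>
  <EasyTable:Table>
    <EasyTable:TableTitleRow>
      <EasyTable:TableData>Top 10 games this month</EasyTable:TableData>
    </EasyTable:TableTitleRow>
  </EasyTable:Table>
</Review>"""

root = ET.fromstring(DOC)
table = root.find("EasyTable:Table", PREFIXES)
cell = table.find(".//EasyTable:TableData", PREFIXES)

print(root.find("GameTitle").tag)  # unprefixed element, no namespace
print(table.tag)                   # the prefix is resolved to the URI
print(cell.text)
```

Because the parser works with the URI rather than the prefix, a ‘Table’ element from another vocabulary would not be confused with this one.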

One of the key reasons that you might want to use entities or numeric character references (such as &#x201C;) is to include characters in the text which would, were they not encoded, be interpreted as part of the XML markup, and consequently change the meaning of the XML.

For example, the following XML would be incorrect:

<Paragraph>Take the title, “The Neverland”, for example. It is preceded in the XML text by an XML tag: <GameTitle>, and is followed by another XML tag </GameTitle>. </Paragraph>

Instead, this should be represented as:

<!ENTITY solidus "&#x002F;">
<Paragraph>Take the title, &quot;The Neverland&quot;, for example. It is preceded in the XML text by an XML tag: &lt;GameTitle&gt;, and is followed by another XML tag &lt;&solidus;GameTitle&gt;. </Paragraph>
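If the text comes from a program rather than being typed by hand, the escaping can be automated. This Python sketch uses the escape function from the standard library; the sample sentence is a shortened version of the one above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RAW = "It is preceded by an XML tag: <GameTitle>, and followed by </GameTitle>."

# escape() replaces &, < and > with &amp;, &lt; and &gt;
escaped = escape(RAW)

# Wrapping the escaped text in an element and parsing it gives back the original.
round_tripped = ET.fromstring(f"<Paragraph>{escaped}</Paragraph>").text

print(escaped)
print(round_tripped)
```

Note that escape leaves the solidus alone: once the ‘<’ in front of it has been escaped, the parser no longer treats what follows as a closing tag.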

Another way to handle special characters within XML is to use CDATA, or character data, markup. Content which is marked as CDATA can contain any of the punctuation marks which are used in XML markup, with the single exception of the sequence ]]> which ends the CDATA section. The data passes through the XML interpreter without any attempt to interpret it, and so is impervious to misinterpretation by the parser.

<Paragraph>
<![CDATA[
Take the title, “The Neverland”, for example. It is preceded in the XML text by an XML tag: <GameTitle>, and is followed by another XML tag </GameTitle>.
]]>
</Paragraph>

This is particularly useful if you wish to create an XML representation of a book about XML!
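A short Python sketch shows what a parser makes of a CDATA section. The sample text is a shortened version of the paragraph above.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<Paragraph><![CDATA[It is preceded by <GameTitle> "
    "and followed by </GameTitle>.]]></Paragraph>"
)

root = ET.fromstring(DOC)

print(root.text)   # the tags arrive as ordinary text
print(len(root))   # number of child elements found inside Paragraph
```

The parser reports no GameTitle element at all: everything between the CDATA markers is handed over as plain text.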

CDATA is referred to as unparsed data: information which is included in the XML but which is not interpreted. Another use of unparsed data is the unparsed entity. This is a method whereby a binary file can be referenced from within the XML:

<!ELEMENT image EMPTY>
<!ATTLIST image imgData ENTITY #REQUIRED>
<!ENTITY companyLogo SYSTEM "LogoGraphic.eps" NDATA eps>
<!NOTATION eps SYSTEM "Illustrator.exe">
<image imgData="companyLogo"/>

In this example, the entity relates to a file containing an image. The XML interpreter does not read the image file itself; it passes the file’s name and its notation on to the application. Of course, it’s up to the device or software reading the XML file to do something useful with this data.
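To see what a parser actually reports for an unparsed entity, here is a Python sketch using the expat binding from the standard library. It is an illustration under assumptions: the declarations are placed inside the doctype declaration so that the example is self-contained, and the function name describe_unparsed_entities is invented. With this parser, at least, only the file name and notation are passed on; the image file is never opened.

```python
import xml.parsers.expat as expat

DOC = """<!DOCTYPE image [
<!ELEMENT image EMPTY>
<!ATTLIST image imgData ENTITY #REQUIRED>
<!NOTATION eps SYSTEM "Illustrator.exe">
<!ENTITY companyLogo SYSTEM "LogoGraphic.eps" NDATA eps>
]>
<image imgData="companyLogo"/>"""


def describe_unparsed_entities(xml_text):
    """Return (entities, notations, attributes) as reported by the parser."""
    entities, notations, attributes = {}, {}, {}

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        # Only unparsed (NDATA) entities carry a notation name.
        if notation is not None:
            entities[name] = (system_id, notation)

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_start(tag, attrs):
        attributes[tag] = dict(attrs)

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.NotationDeclHandler = on_notation
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return entities, notations, attributes


entities, notations, attributes = describe_unparsed_entities(DOC)
print(entities)
print(notations)
print(attributes)
```

The application receives the entity name from the imgData attribute, looks up the file name and notation recorded for it, and decides for itself how to load the file.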

In order for data to be referenced in this way from the XML file, its type must be declared in the DTD as a notation:

<!NOTATION eps SYSTEM "">

It is possible to also specify an application for opening these types of files between the quote marks at the end of this declaration:

<!NOTATION eps SYSTEM "illustrator.exe">

However, this has no bearing on Atomik Roundtrip, as it is effectively part of QuarkXPress, and can therefore only use data types which are supported directly by QuarkXPress (or by other QuarkXPress XTensions currently installed).