Atomik Roundtrip 2.1: Working with Roundtrip > Chapter 20 A brief guide to XML

20.4 Attributes & Metadata

The XML we’ve looked at so far contains only content within element tags. However, it is common for XML to contain another type of data - metadata. This is information stored in the XML file which isn’t actually content, but rather information about that content.

In XML, metadata of this kind is stored in attributes. A very common use for attributes is to distinguish between different types of element with the same name:

<Article section="News">

In the XML, attributes are written within the opening tag. You can specify several attributes for an element, separated from one another by white space. Each attribute consists of its name, followed by ‘=’ and the value contained within quote marks:

<Image imgType="JPEG" imgPath="Macintosh HD:Documents:Images:image_11.jpg" imgName="image_11.jpg" colourmodel="RGB"/>

This shows the other use to which attributes can be put within XML: to store information which isn’t actually content in itself, but describes content (in this case, information about a graphic referred to by the XML). Here is a further example:

<TableDataRow position="odd">
<TableData column="title">The Neverland</TableData>
<TableData column="platform">EasyStation2</TableData>
<TableData column="publisher">Tonki Leisure</TableData>
<TableData column="price">39.99</TableData>
<TableData column="score">45</TableData>
</TableDataRow>

The example above shows a row from a table, in which all the cells of the table are within ‘TableData’ elements. Here the attributes are used to identify the column to which each cell belongs, making the data easier to understand.

Attributes are declared in the DTD in a very similar manner to elements:

<!ELEMENT TableData (#PCDATA)>
<!ATTLIST TableData column CDATA #IMPLIED>

As an attribute is always attached to an element, you must always have an element defined in order to link attributes with it. The attribute declaration starts with <!ATTLIST and is followed by the element name to which this attribute will apply. It’s possible to have the same attribute declared for multiple elements, but to do this you need to have an <!ATTLIST declaration for each one of those elements.
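
As a rough illustration of declaring the same attribute for more than one element, the sketch below uses an internal DTD so that it is self-contained. The element names (Story, Headline, Caption) and the attribute name (language) are hypothetical and are not taken from any Roundtrip DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE Story [
  <!ELEMENT Story (Headline, Caption)>
  <!ELEMENT Headline (#PCDATA)>
  <!ELEMENT Caption (#PCDATA)>
  <!-- The same attribute is wanted on both elements,
       so there is one ATTLIST declaration per element -->
  <!ATTLIST Headline language CDATA #IMPLIED>
  <!ATTLIST Caption language CDATA #IMPLIED>
]>
<Story>
  <Headline language="en">The Neverland reviewed</Headline>
  <Caption language="en">A screenshot from the opening level</Caption>
</Story>
```

Although the two attributes share a name, each declaration is independent, so they could be given different types or defaults if required.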

Next comes the attribute name, which, just like element names, must not contain any spaces and is case sensitive.

There are several different types of attribute. The simplest of these can contain any text. In these instances, the keyword CDATA (short for Character Data) appears after the attribute name, meaning that the attribute can contain any character data.

There may well be circumstances where the metadata associated with an element is vital to the meaning of the XML, and others where it’s less important (or even unknown). If the attribute value is vital to the meaning of the XML, it can be declared as a required value, and will have the text #REQUIRED after it. This means that the XML won’t be valid if this value is not present. If the value is constant, and can’t change, it can be declared as a fixed value:

<!ATTLIST Review FromPublication CDATA #FIXED "Games Review Monthly">

Making a fixed definition means that any other value for this attribute is invalid. If the value is likely to vary, and the possible values which could be assigned to it cannot be predetermined, then the attribute is declared as an implied value, and is followed by the text #IMPLIED. An implied attribute can be omitted from the XML without causing the XML to be invalid.
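
The three kinds of declaration can be compared side by side in a small sketch such as the one below. Only FromPublication comes from the example above; the Title and Reviewer attributes are hypothetical, and the DTD is written inline purely to keep the sample self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE Review [
  <!ELEMENT Review (#PCDATA)>
  <!-- Title (hypothetical) must always be present -->
  <!ATTLIST Review Title CDATA #REQUIRED>
  <!-- Reviewer (hypothetical) may be left out -->
  <!ATTLIST Review Reviewer CDATA #IMPLIED>
  <!-- FromPublication, if present, must have exactly this value -->
  <!ATTLIST Review FromPublication CDATA #FIXED "Games Review Monthly">
]>
<!-- Valid: Title is present and Reviewer is omitted.
     FromPublication is also omitted, so its fixed value is assumed. -->
<Review Title="The Neverland">Review text goes here.</Review>
```

Removing the Title attribute from this Review element, or giving FromPublication any other value, would make the document invalid.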

When constructing the DTD, however, there may be circumstances when you know all the possible values for an attribute, and you can specify that the attribute must match one of these values. This is known as an enumerated value, or alternatively a name token group:

<!ATTLIST Image imgType (JPEG | TIFF | EPS) "JPEG">

You’ll notice that after the list of choices (which is presented like an element choice list, separated by vertical bar characters), a value appears in quotation marks: this is the default value. If no value is specified in the XML, then this value is assumed.
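
To show the effect of the default value, here is a minimal sketch that reuses the imgType declaration inside an inline DTD. The Images wrapper element is hypothetical and is there only so that several Image elements can appear in one file.

```xml
<?xml version="1.0"?>
<!DOCTYPE Images [
  <!ELEMENT Images (Image+)>
  <!ELEMENT Image EMPTY>
  <!ATTLIST Image imgType (JPEG | TIFF | EPS) "JPEG">
]>
<Images>
  <!-- imgType omitted: a parser that reads the DTD treats this as imgType="JPEG" -->
  <Image/>
  <!-- imgType given explicitly, using one of the listed choices -->
  <Image imgType="TIFF"/>
  <!-- <Image imgType="GIF"/> would be invalid, because GIF is not in the list -->
</Images>
```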

As with element declarations, the attribute list declaration must be terminated with a closing angle bracket. We have already mentioned that an element can have multiple attributes defined, and in order to do this, you need simply to add multiple definitions to the <!ATTLIST declaration.

<!ATTLIST Image
imgType (JPEG | TIFF | EPS) "JPEG"
colourmodel (RGB | CMYK | unknown) #IMPLIED>

Note that these declarations do not need to be separated by a carriage return, although it is common practice to do so.

In addition to attributes which can contain text (be that free text, or an enumerated value) there are several other valid data types for attributes:

ENTITY / ENTITIES: The name of an entity declared in the DTD (ENTITIES allows a list of names separated by spaces). You’ll get an introduction to entities in the next section of this chapter.

ID: A unique identifier. This is similar to a row reference in a database, as it’s a value which is unique to a particular instance of an element within an XML file.

IDREF / IDREFS: A reference to a unique identifier (IDREFS allows a list of references separated by spaces). Being able to specify a reference to a unique identifier within an attribute allows one element within the XML to refer to another element within that XML. It allows for a very limited relational database style functionality within an XML file.

NMTOKEN / NMTOKENS: A value restricted to a single name token, that is, a word made up of letters, digits, full stops, hyphens, underscores and colons, with no spaces. NMTOKENS allows a list of such tokens separated by spaces.

NOTATION: A previously declared notation value. Notations will be explained in a following section of this chapter.
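
The ID, IDREF and NMTOKENS types can be seen working together in a sketch like the following. All element and attribute names here (Reviews, Publisher, Game, pubId, publishedBy, platforms) are hypothetical, as is the second platform value, and the DTD is inline only to keep the sample self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE Reviews [
  <!ELEMENT Reviews (Publisher+, Game+)>
  <!ELEMENT Publisher (#PCDATA)>
  <!ELEMENT Game (#PCDATA)>
  <!-- pubId must be unique among all ID values in the file -->
  <!ATTLIST Publisher pubId ID #REQUIRED>
  <!-- publishedBy must match an ID value that exists in the file -->
  <!ATTLIST Game publishedBy IDREF #REQUIRED>
  <!-- platforms holds one or more name tokens separated by spaces -->
  <!ATTLIST Game platforms NMTOKENS #IMPLIED>
]>
<Reviews>
  <Publisher pubId="p1">Tonki Leisure</Publisher>
  <Game publishedBy="p1" platforms="EasyStation2 Handheld">The Neverland</Game>
</Reviews>
```

Note that an ID value must begin with a letter or underscore rather than a digit, which is why the identifier is written as p1 rather than 1.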