XML and XSD (Schema) Primer

February 4, 2004 - written by Steven Moseley

I decided to put this little XML tutorial / primer together for those of you who are interested in learning XML, but don't want to spend weeks doing so. This 15-minute course should give you guys a good basic understanding of how XML works without much time investment. If you find this primer interesting, I suggest you read up on XML (and the associated standards) and learn more. There's a lot more you can do with it than what I'm showing here.

XML (Extensible Markup Language)

There's not much to XML, other than some basic rules. Here is a summary:

  1. XML documents should begin with the XML declaration: <?xml version="1.0" encoding="utf-8"?> with the appropriate version and encoding specified, of course. (The declaration is technically optional in XML 1.0, but including it is good practice.) An encoding of utf-8 indicates that the contents of the XML document are Unicode text encoded as UTF-8.

  2. There can only be one top-level "root" element in an XML document (for instance, the <html> element of an HTML document).

  3. Elements must be named with <> syntax (the same as HTML).

  4. Elements must all be closed. For instance, if you open a <table>, you must close it with a </table>. Elements that do not contain children, such as the <img> element in HTML, can be closed inline by inserting a trailing slash ("/") at the end of the element, like so: <img src="xyz.gif" />

  5. Attribute values must be contained within quotes. For instance, you can't say cellpadding=0 - you must say cellpadding="0" instead.

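  6. You may not use the less than (<) or ampersand (&) characters literally in your text nodes (the text contained within an element or attribute - for instance, <p>This is a text node!</p>). You must use the alternate representations &lt; and &amp;. The greater than (>) and quote (") characters are usually escaped as well, as &gt; and &quot;. A quote must be escaped whenever it appears inside an attribute value delimited by that same quote character.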
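
To tie the rules above together, here's a small sketch of a well-formed document. The element and attribute names (library, cd, note, cover) are made up purely for illustration and aren't part of any standard.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Rule 2: a single root element wraps everything else -->
<library>
  <!-- Rule 5: attribute values are quoted; the ampersand is escaped -->
  <cd title="Rock &amp; Roll Hits" year="2004">
    <!-- Rule 6: special characters in text use their escaped forms -->
    <note>Rated &gt; 4 stars &amp; marked &quot;classic&quot;</note>
    <!-- Rule 4: an element with no children is closed inline -->
    <cover src="xyz.gif" />
  </cd>
</library>
```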

XSD (XML Schema Definition)

XSD is a very simple markup language that is used to define an XML standard (schema). It looks a lot like HTML, but with different elements. Here are some of the basic elements of a schema:

schema

This is the root element of the schema (like the "html" element in an HTML document). The schema must contain an xmlns namespace declaration whose URI identifies the XML Schema language itself (the standard that defines how XSD must be structured). It should also contain a target namespace, which is the URI for the standard that we're creating in this schema.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
targetNamespace="http://www.mydomain.com/mySchema">
</xsd:schema>

The prefix on the elements (xsd) is a reference to the xmlns defined in the schema.

element

An "element" is used to define a tag that will be allowed in the XML standard you are creating. For instance, if I want to create a standard for CDs, I could create an element called "cd" like so:

<xsd:element name="cd" type="xsd:string" />

This definition will allow me to create an element like this:

<cd>Eminem</cd>

attribute

The "attribute" element is used to define an attribute. Attributes must be defined within the context of a complexType (see below). For instance, if I wanted to give my cd element above attributes like "title" and "artist", I could create the following:

<xsd:element name="cd">
<xsd:complexType>
<xsd:attribute name="title" type="xsd:string" />
<xsd:attribute name="artist" type="xsd:string" />
</xsd:complexType>
</xsd:element>

Note that the "type" attribute was removed from the element tag and a new "complexType" element was added within the context of the element, containing the desired attributes. Here's what the output might look like:

<cd title="The Eminem Show" artist="Eminem" />

simpleType

The "simpleType" element is used to define simple types that extend or restrict the basic XSD types. Basic XSD types are string, integer, decimal, dateTime, etc. Here are some simpleType restrictions:

minLength / maxLength - sets the minimum or maximum length of a restricted simpleType. Here's an example that restricts the base string type to allow only strings of exactly 9 characters.

<xsd:simpleType name="ssnType">
<xsd:restriction base="xsd:string">
<xsd:minLength value="9" />
<xsd:maxLength value="9" />
</xsd:restriction>
</xsd:simpleType>

In the case of integers, you can also use the minInclusive / maxInclusive elements, which limit the numeric value contained in the integer.
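
As a quick sketch of that, here's an integer type limited to the range 1 through 100. The type name trackNumberType is just something I made up for illustration.

```xml
<xsd:simpleType name="trackNumberType">
  <xsd:restriction base="xsd:integer">
    <!-- smallest value allowed (inclusive) -->
    <xsd:minInclusive value="1" />
    <!-- largest value allowed (inclusive) -->
    <xsd:maxInclusive value="100" />
  </xsd:restriction>
</xsd:simpleType>
```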

pattern

Patterns allow definition of a regular expression pattern to limit the allowed content of a restricted type. For instance, the following pattern will match any string consisting only of letters and numbers.

<xsd:simpleType name="serialNumberType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[a-zA-Z0-9]*" />
</xsd:restriction>
</xsd:simpleType>

enumeration

Enumerations allow you to define a list of possible values allowed for a simpleType. Here's an example:

<xsd:simpleType name="genreType">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="classical" />
<xsd:enumeration value="country" />
<xsd:enumeration value="gospel" />
<xsd:enumeration value="rap" />
<xsd:enumeration value="rock" />
</xsd:restriction>
</xsd:simpleType>
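
A named simpleType like this is put to work by referencing it from an element (or attribute) definition. Here's a minimal sketch, assuming a "genre" element (my own invention for this example) and a schema whose default namespace matches its target namespace, so that the unprefixed type name resolves correctly:

```xml
<!-- Schema: the genre element may only hold one of the enumerated values -->
<xsd:element name="genre" type="genreType" />

<!-- Instance: accepted, because "rock" is in the list -->
<genre>rock</genre>

<!-- Instance: rejected by a validator, because "polka" is not in the list -->
<genre>polka</genre>
```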

complexType

The "complexType" element is used to define complex types, which are composed of one or more attributes and elements put together in sequences or choice lists. An example of a complexType element would be an HTML <table> type. It contains attributes (cellpadding, etc.) and child elements (<tr>, etc.). Here are a few different ways of composing complexTypes:

sequence

A "sequence" specifies a list of elements that may occur in a complexType. They must all occur (unless specified with a minOccurs value of 0) in the order arranged in the sequence.

<xsd:element name="person" type="personType" />

<xsd:complexType name="personType">
<xsd:sequence>
<xsd:element name="firstName" type="xsd:string" />
<xsd:element name="middleInitial" minOccurs="0">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:pattern value="[a-zA-Z]" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="lastName" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>

The resulting schema would allow something like this:

<person>
<firstName>Steven</firstName>
<middleInitial>G</middleInitial>
<lastName>Moseley</lastName>
</person>

Notice in the XSD example above that I'm referencing the named complexType in the "type" attribute of the definition for the "person" element. This is a good habit to get into for defining data types that will be used by many elements.

all

An "all" element is similar to a "sequence", except that it doesn't require that the child elements be arranged in any specific order. For instance, in the example above, <lastName> could be placed before <firstName> if an <xsd:all> were used in place of the <xsd:sequence>.
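
Here's a rough sketch of the person type rewritten with xsd:all, trimmed down to two children to keep it short, followed by an instance that a validator should accept even though the children are swapped:

```xml
<!-- Schema: both children are required, but in any order -->
<xsd:complexType name="personType">
  <xsd:all>
    <xsd:element name="firstName" type="xsd:string" />
    <xsd:element name="lastName" type="xsd:string" />
  </xsd:all>
</xsd:complexType>

<!-- Instance: lastName comes first, which xsd:all permits -->
<person>
  <lastName>Moseley</lastName>
  <firstName>Steven</firstName>
</person>
```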

any

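An "any" element is a wildcard rather than a grouping like "sequence" or "all": placed inside a sequence or choice, it allows any well-formed element (optionally limited to particular namespaces) to appear at that position. It does not make the other children optional; for that, give each optional child a minOccurs value of 0. For instance, in the example above, the <person> could have only a <firstName> specified if <middleInitial> and <lastName> were both declared with minOccurs="0".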
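
In the XML Schema standard, xsd:any acts as a wildcard slot inside a sequence or choice. Here's a minimal sketch of that usage; the namespace and processContents settings shown are just one reasonable combination I picked for illustration, not the only option.

```xml
<xsd:complexType name="personType">
  <xsd:sequence>
    <xsd:element name="firstName" type="xsd:string" />
    <xsd:element name="lastName" type="xsd:string" />
    <!-- Wildcard: zero or more extra elements from other namespaces,
         validated only if a schema for them happens to be available -->
    <xsd:any namespace="##other" processContents="lax"
             minOccurs="0" maxOccurs="unbounded" />
  </xsd:sequence>
</xsd:complexType>
```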

choice

A "choice" element is similar to a "sequence", except that exactly one of its child elements may be used. For instance, in the example above, the resulting <person> element could only have one of <firstName>, <middleInitial>, or <lastName> specified.
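
A choice tends to make more sense for alternatives than for name parts, so here's a small sketch using a hypothetical contactType (the names contactType, contact, email and phone are mine, not part of the examples above):

```xml
<!-- Schema: a contact holds either an email or a phone, never both -->
<xsd:complexType name="contactType">
  <xsd:choice>
    <xsd:element name="email" type="xsd:string" />
    <xsd:element name="phone" type="xsd:string" />
  </xsd:choice>
</xsd:complexType>

<!-- Instance: exactly one of the alternatives is present -->
<contact>
  <email>someone@mydomain.com</email>
</contact>
```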

attributes

If you're going to define attributes in a complexType, you must place them after the sequence, all, or choice element.
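
As a short sketch of that ordering, here's a hypothetical albumType (the name is made up for this example) with one child element and one attribute:

```xml
<xsd:complexType name="albumType">
  <!-- The content model (child elements) comes first -->
  <xsd:sequence>
    <xsd:element name="artist" type="xsd:string" />
  </xsd:sequence>
  <!-- Attribute definitions follow the content model -->
  <xsd:attribute name="title" type="xsd:string" use="required" />
</xsd:complexType>
```

Adding use="required" makes the attribute mandatory; without it, attributes are optional by default.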

Example

Here's a complete example of XSD and XML in action. I decided to go with the CD example that I started above.

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.mydomain.com/cdSchema"
targetNamespace="http://www.mydomain.com/cdSchema"
elementFormDefault="qualified">
<xsd:element name="cd">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="artist" type="personType" />
<xsd:element name="track" type="trackType" maxOccurs="100" />
</xsd:sequence>
<xsd:attribute name="title" type="xsd:string" />
</xsd:complexType>
</xsd:element>
<xsd:complexType name="personType">
<xsd:sequence>
<xsd:element name="firstName" type="xsd:string" />
<xsd:element name="middleInitial" minOccurs="0">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:pattern value="[a-zA-Z]" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="lastName" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="trackType">
<xsd:attribute name="number" type="xsd:integer" />
<xsd:attribute name="name" type="xsd:string" />
</xsd:complexType>
</xsd:schema>

And here's an example of a resulting XML document:

<?xml version="1.0"?>
<cd title="The Eminem Show" xmlns="http://www.mydomain.com/cdSchema">
<artist>
<firstName>Marshall</firstName>
<lastName>Mathers</lastName>
</artist>
<track number="1" name="Hi, My Name Is" />
<track number="2" name="White America" />
<track number="3" name="Slim Shady" />
</cd>