Three Schemas

Schema Test Cases for the Schema Comparison Panel

Norman Walsh

Staff Engineer
Sun Microsystems, XML Technology Center

04 Oct 2001


This document describes three schemas to be completed by teams participating in the schema review panel for XML 2001.

All of these schemas are contrived. Don't let that bother you. Please let me know if you discover ambiguity in the descriptions. Additional sample documents in each schema will be provided as soon as they are available.

Schema authors are encouraged to balance readability with absolute adherence to the requirements. If, for example, strictly satisfying a requirement that the team considers minor is possible but would take ten times as much effort, it is reasonable to note that the requirement is not satisfied and present the more readable solution. Teams are, of course, invited to provide the more complicated solution as well.

Cross-referencing mechanisms are intentionally vague in this description. Some schema languages, such as DTDs, will have to use ID/IDREF; other teams may choose different language facilities for this purpose. All teams are free to add attributes as necessary to accomplish the required linking. In this regard, the sample documents may be incomplete.

Technical Memorandum

The first schema is for a technical memorandum. It does not have a namespace. Its document element is either memo or techmemo; the two names are synonymous. The document element consists of a head and a body.
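One way to make the two names synonyms is a single element pattern whose name class accepts either name, so both necessarily share one content model. The sketch below uses RELAX NG's XML syntax; the choice of schema language, the define name memo.content, and the placeholder head and body content are assumptions for illustration only:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <!-- One element pattern, two acceptable names -->
    <element>
      <choice>
        <name>memo</name>
        <name>techmemo</name>
      </choice>
      <ref name="memo.content"/>
    </element>
  </start>

  <!-- Placeholder content: head then body, as in the sample memo
       (hypothetical define name; real content models omitted) -->
  <define name="memo.content">
    <element name="head"><text/></element>
    <element name="body"><text/></element>
  </define>
</grammar>
```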

The head contains exactly one each of the following: date, author, and title. It may also include any number of meta elements from the XHTML namespace. These elements may appear in any order.
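An unordered head with exactly one date, author, and title plus any number of XHTML meta elements maps directly onto a RELAX NG interleave. This is a sketch only: the field content is simplified to text, and meta is assumed to take arbitrary attributes and no content. (W3C XML Schema 1.0 cannot express this with xs:all, whose particles may occur at most once.)

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="memo.head"/>
  </start>

  <define name="memo.head">
    <element name="head">
      <!-- interleave: children in any order; each non-repeated
           child must still occur exactly once -->
      <interleave>
        <element name="date"><text/></element>
        <element name="author"><text/></element>
        <element name="title"><text/></element>
        <zeroOrMore>
          <ref name="xhtml.meta"/>
        </zeroOrMore>
      </interleave>
    </element>
  </define>

  <!-- Simplification: any attributes, no content -->
  <define name="xhtml.meta">
    <element name="meta" ns="http://www.w3.org/1999/xhtml">
      <zeroOrMore>
        <attribute><anyName/></attribute>
      </zeroOrMore>
    </element>
  </define>
</grammar>
```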

The date must be a valid date; the other head fields contain only text. What constitutes a valid date is intentionally vague. If you're using a datatype library that supports something that might reasonably be called a “date type”, you can use that. If you prefer to use a regular expression, that's fine too. As long as you describe how you interpreted “valid date” and how you achieved validation of that date, it's up to you.
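As one possible interpretation of “valid date”, the RELAX NG sketch below accepts either an XML Schema date (such as 2001-07-13) or the “Jul 13, 2001” form used in the sample memo, the latter checked with a regular expression. The regular expression checks only the shape of the date, so an impossible date such as “Feb 31, 2001” still passes:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="date">
      <choice>
        <!-- ISO 8601 calendar date, e.g. 2001-07-13 -->
        <data type="date"/>
        <!-- "Mon D, YYYY" as in the samples; shape only, no day-range check -->
        <data type="token">
          <param name="pattern">(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) ([1-9]|[12][0-9]|3[01]), [0-9]{4}</param>
        </data>
      </choice>
    </element>
  </start>
</grammar>
```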

The body contains a mixture of zero or more para and list elements. The emph, footnote, footnoteref, and link (a simple XLink) elements may appear inside para, along with text.

The footnote element contains one or more para elements. However, footnotes may not nest; a footnote may not contain a footnote as a descendant. The footnoteref element is empty; it has a required ref attribute which must point to a footnote. The emph and link elements contain only text.
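Because a RELAX NG grammar can give the same element name different content in different contexts, the no-nesting rule can be enforced with a second kind of para that is used only inside footnotes and omits footnote. In this sketch (define names hypothetical; link and list omitted for brevity), footnote is the only element that declares an ID, so a validator implementing RELAX NG DTD Compatibility ID checking will reject any footnoteref whose ref does not name a footnote:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="body">
      <zeroOrMore><ref name="para"/></zeroOrMore>
    </element>
  </start>

  <!-- Inlines allowed in both kinds of paragraph -->
  <define name="inline.common">
    <choice>
      <text/>
      <element name="emph"><text/></element>
      <element name="footnoteref">
        <attribute name="ref"><data type="IDREF"/></attribute>
        <empty/>
      </element>
    </choice>
  </define>

  <!-- An ordinary paragraph may contain footnotes -->
  <define name="para">
    <element name="para">
      <zeroOrMore>
        <choice>
          <ref name="inline.common"/>
          <ref name="footnote"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>

  <!-- The only ID in the grammar, so every IDREF must resolve to a footnote -->
  <define name="footnote">
    <element name="footnote">
      <optional><attribute name="id"><data type="ID"/></attribute></optional>
      <oneOrMore><ref name="footnote.para"/></oneOrMore>
    </element>
  </define>

  <!-- A paragraph inside a footnote cannot contain a footnote,
       so footnotes cannot nest at any depth -->
  <define name="footnote.para">
    <element name="para">
      <zeroOrMore><ref name="inline.common"/></zeroOrMore>
    </element>
  </define>
</grammar>
```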

A list consists of an optional title followed by two or more item elements. An item may contain text and inlines (emph, footnote, footnoteref, and link), or one or more para elements, but not both. All of the items in a list must have the same kind of content (all text and inlines or all paragraphs).
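The “all items alike” rule can be written as a choice between two runs of items, each run using its own definition of item. The RELAX NG sketch below does this, with inline content trimmed to emph and paragraph content to text for brevity. W3C XML Schema 1.0 cannot express the same thing directly, because its Element Declarations Consistent constraint forbids two differently typed item declarations in one content model.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="list"/>
  </start>

  <define name="list">
    <element name="list">
      <optional>
        <element name="title"><text/></element>
      </optional>
      <choice>
        <!-- Two or more items, all with text and inlines ... -->
        <group>
          <ref name="item.inline"/>
          <oneOrMore><ref name="item.inline"/></oneOrMore>
        </group>
        <!-- ... or two or more items, all with paragraphs -->
        <group>
          <ref name="item.paras"/>
          <oneOrMore><ref name="item.paras"/></oneOrMore>
        </group>
      </choice>
    </element>
  </define>

  <define name="item.inline">
    <element name="item">
      <zeroOrMore>
        <choice>
          <text/>
          <element name="emph"><text/></element>
        </choice>
      </zeroOrMore>
    </element>
  </define>

  <define name="item.paras">
    <element name="item">
      <oneOrMore>
        <element name="para"><text/></element>
      </oneOrMore>
    </element>
  </define>
</grammar>
```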

The following markup shows one possible memo:

<techmemo>
  <head xmlns:h="http://www.w3.org/1999/xhtml">
     <date>Jul 13, 2001</date>
     <author>John Doe</author>
     <title>A Random Memorandum</title>
     <h:meta name="pointless" content="content"/>
  </head>
  <body>
    <para>A <emph>paragraph</emph>.</para>
    <list>
      <item>a list item</item>
      <item>another list item</item>
    </list>
    <para>Another paragraph<footnote><para>A real memo schema would
probably need more than lists and paragraphs.</para></footnote>.</para>
  </body>
</techmemo>

A Whitepaper

The second schema is for a white paper; it also has no namespace. It intentionally shares many structures with the technical memorandum schema. Teams are invited to factor out the common parts, write one schema as a customization of the other, or otherwise take advantage of as much reuse as is practical.

A whitepaper consists of a required head followed by a mixture of para and list elements (which may be absent), followed by zero or more section elements and an optional glossary.

The head must contain exactly one date and one title. It must contain at least one author. It may contain at most one titleabbrev element. It may contain zero or more copyright, keywords and legalnotice elements. It may also contain any number of meta elements from the XHTML namespace. The order of elements in the head is irrelevant.
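The whitepaper head is again a natural fit for RELAX NG interleave, with each child's cardinality stated inside it. In this sketch, XHTML meta elements are assumed to take arbitrary attributes and no content, and copyright, keywords, and legalnotice are simplified placeholders:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="whitepaper.head"/>
  </start>

  <define name="whitepaper.head">
    <element name="head">
      <!-- Any order; the occurrence limits of each child still apply -->
      <interleave>
        <element name="date"><text/></element>
        <element name="title"><text/></element>
        <oneOrMore><element name="author"><text/></element></oneOrMore>
        <optional><element name="titleabbrev"><text/></element></optional>
        <zeroOrMore><ref name="copyright"/></zeroOrMore>
        <zeroOrMore><ref name="keywords"/></zeroOrMore>
        <zeroOrMore><ref name="legalnotice"/></zeroOrMore>
        <zeroOrMore>
          <element name="meta" ns="http://www.w3.org/1999/xhtml">
            <zeroOrMore><attribute><anyName/></attribute></zeroOrMore>
          </element>
        </zeroOrMore>
      </interleave>
    </element>
  </define>

  <!-- Simplified placeholders, not the real content models -->
  <define name="copyright">
    <element name="copyright"><text/></element>
  </define>
  <define name="keywords">
    <element name="keywords">
      <optional><attribute name="vocabulary"/></optional>
      <text/>
    </element>
  </define>
  <define name="legalnotice">
    <element name="legalnotice"><text/></element>
  </define>
</grammar>
```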

The keywords element has an optional vocabulary attribute. If multiple keywords are provided, they must come from different vocabularies.
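Requiring distinct vocabularies compares values across sibling elements, which a grammar cannot do. One possible check is an ISO Schematron rule such as the following; treating a missing vocabulary attribute as its own (empty) vocabulary is an assumption, since the text does not say how such keywords compare:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="head/keywords">
      <!-- Absent vocabulary becomes "", so at most one keywords may omit it -->
      <let name="vocab" value="string(@vocabulary)"/>
      <assert test="not(preceding-sibling::keywords[string(@vocabulary) = $vocab])">
        Each keywords element in a head must use a different vocabulary.
      </assert>
    </rule>
  </pattern>
</schema>
```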

The content of date, author and title elements is as before. The titleabbrev element contains text. The keywords element contains a whitespace-delimited list of one or more tokens (there are no restrictions on the characters in the tokens). The legalnotice contains an optional title followed by one or more para elements. Finally, copyright contains one or more year elements and one or more holder elements, in that order.

Copyright years should be valid years; holders contain only text.
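A RELAX NG sketch of the keywords and copyright content models, using the XML Schema datatype library. Here list splits the keywords content at whitespace and accepts any non-empty sequence of tokens, and xsd:gYear stands in for “valid year” (an interpretation; it also permits, for example, an optional time-zone suffix):

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <choice>
      <ref name="keywords"/>
      <ref name="copyright"/>
    </choice>
  </start>

  <define name="keywords">
    <element name="keywords">
      <optional><attribute name="vocabulary"/></optional>
      <!-- Whitespace-separated pieces, each any token; oneOrMore
           rejects empty or whitespace-only content -->
      <list>
        <oneOrMore><data type="token"/></oneOrMore>
      </list>
    </element>
  </define>

  <define name="copyright">
    <element name="copyright">
      <oneOrMore>
        <element name="year"><data type="gYear"/></element>
      </oneOrMore>
      <oneOrMore>
        <element name="holder"><text/></element>
      </oneOrMore>
    </element>
  </define>
</grammar>
```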

Sections must have a head, but only the title element is required in a section head. After the head, a section contains only paragraphs and lists, optionally followed by nested sections.
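Section content is recursive: paragraphs and lists first, then any number of nested sections, each with the same shape. A minimal RELAX NG sketch, with the section head reduced to its required title and para and list as text placeholders:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="section"/>
  </start>

  <define name="section">
    <element name="section">
      <element name="head">
        <!-- Only title is required; optional head fields omitted here -->
        <element name="title"><text/></element>
      </element>
      <!-- Paragraphs and lists in any mix ... -->
      <zeroOrMore>
        <choice>
          <ref name="para"/>
          <ref name="list"/>
        </choice>
      </zeroOrMore>
      <!-- ... then nested sections; no para or list may follow them -->
      <zeroOrMore>
        <ref name="section"/>
      </zeroOrMore>
    </element>
  </define>

  <!-- Placeholders, simplified to text -->
  <define name="para"><element name="para"><text/></element></define>
  <define name="list"><element name="list"><text/></element></define>
</grammar>
```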

The whitepaper schema adds a new inline to the content of para: glossterm. A glossterm must point to a glossdef. If the glossterm has a ref attribute, that attribute points to the definition, otherwise the body of the glossterm is to be used for the cross reference.
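Whether a glossterm resolves to a glossdef cannot be checked by a grammar alone. The ISO Schematron sketch below assumes, as the sample whitepaper suggests, that both the ref attribute and the content of a glossterm without ref are matched against the text of a glossdef's term rather than against an ID. Within a pattern, a node is checked only by the first rule whose context matches it, so the rule for glossterm elements with ref must come first:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- glossterm elements that carry ref ... -->
    <rule context="glossterm[@ref]">
      <let name="key" value="normalize-space(@ref)"/>
      <assert test="/whitepaper/glossary/glossdef[normalize-space(term) = $key]">
        The ref of a glossterm must match the term of a glossdef.
      </assert>
    </rule>
    <!-- ... all remaining glossterm elements use their own content -->
    <rule context="glossterm">
      <let name="key" value="normalize-space(.)"/>
      <assert test="/whitepaper/glossary/glossdef[normalize-space(term) = $key]">
        A glossterm without ref must match the term of a glossdef.
      </assert>
    </rule>
  </pattern>
</schema>
```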

A glossary consists of an optional head (of the same form as a section head) followed by one or more glossdef elements. Each glossdef consists of a term followed by one or more paragraphs or lists. The terms contain only text.

Here is one valid whitepaper:

<whitepaper>
  <head>
     <date>Aug 23, 2001</date>
     <author>Jane Smith</author>
     <author>John Doe</author>
     <title>Technical Analysis of a Random Memorandum</title>
     <titleabbrev>Analysis of a Memorandum</titleabbrev>
     <keywords>analysis random contrived</keywords>
  </head>

  <para>There are <glossterm>paragraph</glossterm> and
<glossterm>list</glossterm> elements in the
<glossterm ref="memorandum">memo</glossterm> schema.</para>

  <section>
    <head>
      <title>More Stuff</title>
    </head>
    <para>With more words.</para>
  </section>

  <glossary>
    <glossdef><term>list</term>
              <para>Some definition.</para>
    </glossdef>
    <glossdef><term>memorandum</term>
              <para>Some definition.</para>
    </glossdef>
    <glossdef><term>paragraph</term>
              <para>Some definition.</para>
    </glossdef>
  </glossary>
</whitepaper>

Order Form

The order form schema uses addresses for both billing and shipping information. For our purposes, there are two kinds of addresses in the world: US addresses and international addresses. A US address consists of the following fields: one or more street elements followed by city, state (which must be one of the 50 US state postal abbreviations), zip (which must be either a five digit zip code or a nine digit “zip+4” code), and an optional country. If country is specified, it must be “US”.

An international address consists of: one or more street elements followed by city, an optional stateOrProvince, an optional postalcode, and a country.

Either of these forms may be used for the address fields of the order form schema.
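The two address forms can be written as a choice between two RELAX NG patterns, sketched below with the XML Schema datatype library. Accepting the zip+4 form with or without a hyphen (01007-1234 or 010071234) is an assumption, and the remaining fields are left as plain text:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="urn:x-xmlns:example:orderForm"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="billToAddress">
      <choice>
        <ref name="address.us"/>
        <ref name="address.intl"/>
      </choice>
    </element>
  </start>

  <define name="address.us">
    <oneOrMore><element name="street"><text/></element></oneOrMore>
    <element name="city"><text/></element>
    <element name="state">
      <!-- The 50 state postal abbreviations; XSD patterns match the whole value -->
      <data type="token">
        <param name="pattern">AL|AK|AZ|AR|CA|CO|CT|DE|FL|GA|HI|ID|IL|IN|IA|KS|KY|LA|ME|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VT|VA|WA|WV|WI|WY</param>
      </data>
    </element>
    <element name="zip">
      <!-- Five digits, optionally four more (hyphen optional: an assumption) -->
      <data type="token">
        <param name="pattern">[0-9]{5}(-?[0-9]{4})?</param>
      </data>
    </element>
    <optional>
      <element name="country"><value>US</value></element>
    </optional>
  </define>

  <define name="address.intl">
    <oneOrMore><element name="street"><text/></element></oneOrMore>
    <element name="city"><text/></element>
    <optional><element name="stateOrProvince"><text/></element></optional>
    <optional><element name="postalcode"><text/></element></optional>
    <element name="country"><text/></element>
  </define>
</grammar>
```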

The namespace name for elements in the order form schema is “urn:x-xmlns:example:orderForm”. An orderForm contains exactly one of each of the following elements, in this order: billToAddress, order, shippingInfo, and paymentMethod. If the billToAddress is a US address and the state is not one of the following: AK, DE, HI, MT, NO, OR, or WY, then the orderForm must also include a salesTax element immediately after the order.
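The sales-tax rule depends on a value inside billToAddress, so one option is to keep salesTax optional in the grammar and enforce the condition with a rule. The ISO Schematron sketch below treats a billing address that has a state element as a US address, and reads “immediately after the order” as the next order-form element, ignoring any foreign elements in between; both are assumptions. The exempt list is copied verbatim from the text:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <ns prefix="o" uri="urn:x-xmlns:example:orderForm"/>
  <pattern>
    <!-- Only a US address has a state element -->
    <rule context="o:orderForm[o:billToAddress/o:state]">
      <let name="st" value="normalize-space(o:billToAddress/o:state)"/>
      <!-- Padding with spaces makes contains() match whole codes only -->
      <let name="exempt"
           value="string-length($st) = 2 and
                  contains(' AK DE HI MT NO OR WY ', concat(' ', $st, ' '))"/>
      <assert test="$exempt or o:order/following-sibling::o:*[1][self::o:salesTax]">
        A US billing address outside the exempt states requires a
        salesTax element immediately after the order.
      </assert>
    </rule>
  </pattern>
</schema>
```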

The orderForm may additionally contain any element not from the order form namespace, provided that the expanded-name of the element has a non-null namespace URI. Elements not from the order form namespace may not contain elements or attributes from the order form namespace.
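Foreign elements can be allowed anywhere among the required children by interleaving them with the ordered group. In the RELAX NG sketch below, the foreign element's name class excludes both the order form namespace and the null namespace, and its descendants may not use the order form namespace for elements or attributes; the required children are placeholders with text content:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="urn:x-xmlns:example:orderForm">
  <start>
    <element name="orderForm">
      <!-- Foreign elements may occur anywhere among the ordered children -->
      <interleave>
        <zeroOrMore>
          <ref name="foreign.element"/>
        </zeroOrMore>
        <group>
          <!-- Placeholders: the real content models are omitted -->
          <element name="billToAddress"><text/></element>
          <element name="order"><text/></element>
          <optional><element name="salesTax"><text/></element></optional>
          <element name="shippingInfo"><text/></element>
          <element name="paymentMethod"><text/></element>
        </group>
      </interleave>
    </element>
  </start>

  <!-- Any namespaced element outside the order form namespace -->
  <define name="foreign.element">
    <element>
      <anyName>
        <except>
          <nsName ns="urn:x-xmlns:example:orderForm"/>
          <nsName ns=""/>
        </except>
      </anyName>
      <ref name="foreign.content"/>
    </element>
  </define>

  <!-- Attributes, text, and elements, none in the order form namespace
       (unqualified attributes and null-namespace descendants are allowed) -->
  <define name="foreign.content">
    <zeroOrMore>
      <choice>
        <attribute>
          <anyName>
            <except><nsName ns="urn:x-xmlns:example:orderForm"/></except>
          </anyName>
        </attribute>
        <text/>
        <element>
          <anyName>
            <except><nsName ns="urn:x-xmlns:example:orderForm"/></except>
          </anyName>
          <ref name="foreign.content"/>
        </element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>
```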

An order consists of one or more item elements.

Each item begins with an itemNumber. If the item number has the form “CL-” followed by a four digit number, it is a clothing item. If it has the form “NC-” followed by a four digit number, it is a non-clothing item. If it matches neither pattern, it is invalid.

Non-clothing items have the following additional fields: description, quantity, and unitPrice, in that order. The description may contain text and elements from any namespace other than the order form namespace (including elements whose expanded-name has a null namespace URI, and without any restriction on their content). The quantity must be a positive integer. The unitPrice must be a positive decimal number with two digits after the decimal point. The quantity element is optional; if it is not specified, it must default to “1”. The description and unitPrice elements are required. The description must not be empty and may not contain only whitespace.
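A RELAX NG sketch of a non-clothing item using the XML Schema datatype library. RELAX NG does not supply element default values, so the quantity default of 1 is left to the application; the rule that description must not be empty or whitespace-only is also not captured here and would need an additional rule-based check:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="urn:x-xmlns:example:orderForm"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="item.nonClothing"/>
  </start>

  <define name="item.nonClothing">
    <element name="item">
      <element name="itemNumber">
        <data type="token"><param name="pattern">NC-[0-9]{4}</param></data>
      </element>
      <ref name="description"/>
      <optional>
        <!-- Default of 1 must be assumed by the application when absent -->
        <element name="quantity"><data type="positiveInteger"/></element>
      </optional>
      <element name="unitPrice">
        <!-- Greater than zero, exactly two digits after the decimal point -->
        <data type="decimal">
          <param name="minExclusive">0</param>
          <param name="pattern">[0-9]+\.[0-9]{2}</param>
        </data>
      </element>
    </element>
  </define>

  <!-- Text mixed with elements from any namespace except this one -->
  <define name="description">
    <element name="description">
      <zeroOrMore>
        <choice>
          <text/>
          <element>
            <anyName>
              <except><nsName ns="urn:x-xmlns:example:orderForm"/></except>
            </anyName>
            <ref name="anything"/>
          </element>
        </choice>
      </zeroOrMore>
    </element>
  </define>

  <!-- Unrestricted content for those foreign elements -->
  <define name="anything">
    <zeroOrMore>
      <choice>
        <attribute><anyName/></attribute>
        <text/>
        <element><anyName/><ref name="anything"/></element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>
```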

Clothing items must have all of the fields of a non-clothing item, plus the following additional fields (in this order): size (“S”, “M”, “L”, “XL”, “LT”, or “XLT”), color, alternateColor (color and alternate color may not be the same), and an optional monogram which must consist of 1-3 upper-case letters (“A”-“Z”).
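Because the itemNumber pattern determines which fields follow, each kind of item can be a separate branch of a choice; an item number matching neither pattern matches neither branch and is rejected. In this RELAX NG sketch, placing the clothing fields between description and quantity follows the sample order form (the text does not say where they go), and description, color, and the price fields are simplified. The requirement that color and alternateColor differ compares two values, so it is not expressed here and would need a rule-based check such as Schematron:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="urn:x-xmlns:example:orderForm"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="order">
      <oneOrMore>
        <choice>
          <ref name="item.nonClothing"/>
          <ref name="item.clothing"/>
        </choice>
      </oneOrMore>
    </element>
  </start>

  <define name="item.nonClothing">
    <element name="item">
      <element name="itemNumber">
        <data type="token"><param name="pattern">NC-[0-9]{4}</param></data>
      </element>
      <element name="description"><text/></element>
      <ref name="price.fields"/>
    </element>
  </define>

  <define name="item.clothing">
    <element name="item">
      <element name="itemNumber">
        <data type="token"><param name="pattern">CL-[0-9]{4}</param></data>
      </element>
      <element name="description"><text/></element>
      <element name="size">
        <choice>
          <value>S</value><value>M</value><value>L</value>
          <value>XL</value><value>LT</value><value>XLT</value>
        </choice>
      </element>
      <element name="color"><text/></element>
      <element name="alternateColor"><text/></element>
      <optional>
        <!-- One to three upper-case ASCII letters -->
        <element name="monogram">
          <data type="token"><param name="pattern">[A-Z]{1,3}</param></data>
        </element>
      </optional>
      <ref name="price.fields"/>
    </element>
  </define>

  <!-- Shared trailing fields, with simplified datatypes -->
  <define name="price.fields">
    <optional><element name="quantity"><data type="positiveInteger"/></element></optional>
    <element name="unitPrice"><data type="decimal"/></element>
  </define>
</grammar>
```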

The shippingInfo contains a shipToAddress and a shipBy element in that order.

The content of shipBy must be “USPS”, “FedEx”, “UPS”, or “DHL”; like attribute values, this tokenized element content may have leading and trailing whitespace. The shipBy must have a shippingCost attribute and may optionally have a rush attribute containing “none”, “3day”, “2day”, or “overnight”. If unspecified, rush defaults to “none”. Overnight shipping is not available to international addresses.
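Since the overnight restriction depends on the kind of shipToAddress, the two can be paired in a RELAX NG choice: a US address with any rush value, or an international address with every value except overnight. In this sketch the addresses are simplified, shippingCost is left untyped because the text gives no type, and the rush default is declared with the RELAX NG DTD Compatibility a:defaultValue annotation, which only processors supporting that annotation will apply:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         ns="urn:x-xmlns:example:orderForm">
  <start>
    <element name="shippingInfo">
      <choice>
        <!-- US destination: any rush value -->
        <group>
          <element name="shipToAddress"><ref name="address.us"/></element>
          <element name="shipBy">
            <ref name="shipBy.common"/>
            <optional>
              <attribute name="rush" a:defaultValue="none">
                <choice>
                  <value>none</value><value>3day</value>
                  <value>2day</value><value>overnight</value>
                </choice>
              </attribute>
            </optional>
          </element>
        </group>
        <!-- International destination: overnight is not allowed -->
        <group>
          <element name="shipToAddress"><ref name="address.intl"/></element>
          <element name="shipBy">
            <ref name="shipBy.common"/>
            <optional>
              <attribute name="rush" a:defaultValue="none">
                <choice>
                  <value>none</value><value>3day</value><value>2day</value>
                </choice>
              </attribute>
            </optional>
          </element>
        </group>
      </choice>
    </element>
  </start>

  <!-- Required shippingCost plus the carrier name; value compares as a
       whitespace-normalized token, so surrounding whitespace is accepted -->
  <define name="shipBy.common">
    <attribute name="shippingCost"/>
    <choice>
      <value>USPS</value><value>FedEx</value>
      <value>UPS</value><value>DHL</value>
    </choice>
  </define>

  <!-- Simplified address models; only a US address has state and zip -->
  <define name="address.us">
    <oneOrMore><element name="street"><text/></element></oneOrMore>
    <element name="city"><text/></element>
    <element name="state"><text/></element>
    <element name="zip"><text/></element>
    <optional><element name="country"><value>US</value></element></optional>
  </define>

  <define name="address.intl">
    <oneOrMore><element name="street"><text/></element></oneOrMore>
    <element name="city"><text/></element>
    <optional><element name="stateOrProvince"><text/></element></optional>
    <optional><element name="postalcode"><text/></element></optional>
    <element name="country"><text/></element>
  </define>
</grammar>
```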

The paymentMethod consists of either a creditCard or a checkOrMoneyOrder. The amount of the payment is recorded in the amount attribute on the paymentMethod.

The creditCard element must have either a type attribute or a type child; it is an error to have neither or both. In either case, the value must be one of the following: “Amex”, “Visa”, or “Mastercard”. The creditCard must also have a number and an expiration. For “Amex” payments, the number must be 15 digits long; for “Mastercard”, 16 digits; and for “Visa”, either 13 or 16 digits. The expiration must match the pattern “99/99”.
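Both co-occurrence rules for creditCard can be written in RELAX NG, because an attribute and an element may be alternatives in the same choice. In the sketch below, each card type is a branch that also fixes the allowed length of number. Placing a type child before number, and reading “99/99” as two digits, a slash, and two digits, are assumptions:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="urn:x-xmlns:example:orderForm"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="creditCard">
      <choice>
        <!-- Amex: 15-digit number -->
        <group>
          <ref name="type.Amex"/>
          <element name="number">
            <data type="token"><param name="pattern">[0-9]{15}</param></data>
          </element>
        </group>
        <!-- Mastercard: 16-digit number -->
        <group>
          <ref name="type.Mastercard"/>
          <element name="number">
            <data type="token"><param name="pattern">[0-9]{16}</param></data>
          </element>
        </group>
        <!-- Visa: 13- or 16-digit number -->
        <group>
          <ref name="type.Visa"/>
          <element name="number">
            <data type="token"><param name="pattern">[0-9]{13}|[0-9]{16}</param></data>
          </element>
        </group>
      </choice>
      <element name="expiration">
        <data type="token"><param name="pattern">[0-9]{2}/[0-9]{2}</param></data>
      </element>
    </element>
  </start>

  <!-- Each type.* pattern accepts the type as an attribute or as a child
       element, so having neither or both fails to match -->
  <define name="type.Amex">
    <choice>
      <attribute name="type"><value>Amex</value></attribute>
      <element name="type"><value>Amex</value></element>
    </choice>
  </define>
  <define name="type.Mastercard">
    <choice>
      <attribute name="type"><value>Mastercard</value></attribute>
      <element name="type"><value>Mastercard</value></element>
    </choice>
  </define>
  <define name="type.Visa">
    <choice>
      <attribute name="type"><value>Visa</value></attribute>
      <element name="type"><value>Visa</value></element>
    </choice>
  </define>
</grammar>
```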

The checkOrMoneyOrder element is empty.

Finally, salesTax must be a positive decimal number with two digits after the decimal point.

<orderForm xmlns="urn:x-xmlns:example:orderForm">
  <billToAddress>
    <street>John Doe</street>
    <street>123 Anystreet</street>
    <city>East Yahoo</city>
    <state>MA</state>
    <zip>01007</zip>
    <country>US</country>
  </billToAddress>

  <h:p xmlns:h="http://www.w3.org/1999/xhtml">This is a silly place
for a paragraph.</h:p>

  <order>
   <item>
     <itemNumber>NC-1234</itemNumber>  
     <description>Something</description>
     <quantity>4</quantity>
     <unitPrice>12.49</unitPrice>
   </item>
   <item>
     <itemNumber>CL-1234</itemNumber>
     <description>Something else</description>
     <size>S</size>
     <color>red</color>
     <alternateColor>white</alternateColor>
     <monogram>JSR</monogram>
     <quantity>1</quantity>
     <unitPrice>129.99</unitPrice>
   </item>
  </order>

  <salesTax>2.59</salesTax>

  <shippingInfo>
    <shipToAddress>
      <street>Jane Smith</street>
      <street>123 Any Other Street</street>
      <city>North Walsham</city>
      <postalcode>NR28 ODL</postalcode>
      <country>UK</country>
    </shipToAddress>
    <shipBy shippingCost="23.45">USPS</shipBy>
  </shippingInfo>

  <paymentMethod amount="205.99">
    <creditCard type="Visa">
      <number>1234123412341</number>
      <expiration>10/03</expiration>
    </creditCard>
  </paymentMethod>
</orderForm>