Rules and Standards |
CAA V5 XML Coding Rules |
Rules, hints and tips to write XML code |
Technical Article |
Abstract
This article gives several mandatory rules for writing XML code, including but not restricted to XML Schemas and XML documents. The goal of these rules is to help you choose the best design alternative when dealing with XML, and therefore obtain a consistent and easy-to-maintain set of XML documents. This article also provides some tips to make your XML code easier to use with third-party software, such as data exchange software. |
These rules are:
XML documents should be easy to exchange across operating systems and countries. Therefore, no XML instance may be written or generated using an encoding that may not be available on another system.
For example, if a sentence in French appears in an XML instance, it must not be encoded in ISO-8859-1 (one byte per character for Western European languages), because that encoding may not be available on a system in an Asian country. Moreover, since XML document content can be multilingual (for example, mixing German and Japanese), the encoding must cover all possible language combinations. UTF-8 is one of the possible encoding formats for the Unicode character set, and it meets both requirements.
Always use UTF-8 for XML documents, and therefore always begin your XML documents (including XML Schema) with the following line:
<?xml version="1.0" encoding="UTF-8"?>
It is possible to use UTF-16 as an alternative encoding format, but this encoding is less commonly used and tends to produce larger files than UTF-8.
Do not begin any XML document with the following XML declaration:
<?xml version="1.0"?> <!-- DO NOT USE THIS XML SAMPLE IN YOUR CODE !! -->
because the document would then not state its encoding explicitly, leaving tools and readers to guess it.
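As an illustration of the multilingual case described above, the following sketch shows a single UTF-8 instance that mixes French, German and Japanese text. The element names (labels, label) are hypothetical and are not part of any CAA schema; the file itself must of course be saved as UTF-8.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical element names, for illustration only -->
<labels>
  <label xml:lang="fr">Pièce détachée</label>
  <label xml:lang="de">Größe</label>
  <label xml:lang="ja">部品</label>
</labels>
```

The same three strings could not be stored together in an ISO-8859-1 document, since that encoding has no code points for the Japanese characters.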
The valid set of elements, attributes, text, and other constructs in an XML document, and their allowed sequence or nesting, can be defined either by a DTD (Document Type Definition) or by an XML Schema.
An XML Schema can specify any document structure that can be specified by a DTD, and much more: for example, an XML Schema can be used to define the data types allowed in an XML document. Mappings between XML Schema and most programming languages are being specified.
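The difference in expressiveness can be sketched on a single element. In the example below, the element name quantity and the namespace URI are hypothetical; the comment shows what a DTD could say about the same element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A DTD can only declare the content as character data:
     <!ELEMENT quantity (#PCDATA)>
     An XML Schema can restrict the same content to a 32-bit integer. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.com/order">
  <!-- Hypothetical element, for illustration only -->
  <xsd:element name="quantity" type="xsd:int"/>
</xsd:schema>
```

With this schema, a validating processor rejects an instance such as <quantity>twelve</quantity>, which the DTD declaration would accept.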
These rules are:
Public XML declarations must rely on standards. For XML Schema, use at least the "XML Schema Recommendation 1.0", 2 May 2001, from the World Wide Web Consortium (http://www.w3.org). Previous versions of XML Schema (the 24 October 2000 and 7 April 2000 versions) are non-final versions which should no longer be used.
For example, don't use:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <!-- DO NOT USE THIS XML SAMPLE IN YOUR CODE !! -->
  <xsd:complexType name="shippingDateType">
    <xsd:sequence>
      <!-- the xsd:year and xsd:month datatypes are deprecated -->
      <xsd:element name="year" type="xsd:year"/>
      <xsd:element name="month" type="xsd:month"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
Instead, the same XML type should be declared using the following standard syntax:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="shippingDateType">
    <xsd:sequence>
      <xsd:element name="year" type="xsd:gYear"/>
      <xsd:element name="month" type="xsd:gYearMonth"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
Note that not only has the namespace for XML Schema elements changed from http://www.w3.org/2000/10/XMLSchema to http://www.w3.org/2001/XMLSchema, but some datatypes, such as xsd:month and xsd:year, have also been renamed to xsd:gYearMonth and xsd:gYear in the XML Schema Recommendation 1.0 (the 'g' prefix stands for the Gregorian calendar).
Public XML Schemas must have a target namespace: do not use no-namespace schemas.
See [3] for more in-depth discussions related to the use of Namespaces.
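A minimal sketch of a schema that follows this rule is given below. The namespace URI and the ship prefix are hypothetical placeholders, and elementFormDefault="qualified" is one possible design choice rather than something this article prescribes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The namespace URI and the "ship" prefix are hypothetical placeholders -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:ship="http://www.example.com/shipping"
            targetNamespace="http://www.example.com/shipping"
            elementFormDefault="qualified">
  <xsd:complexType name="shippingDateType">
    <xsd:sequence>
      <xsd:element name="year" type="xsd:gYear"/>
      <xsd:element name="month" type="xsd:gYearMonth"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Types defined in the target namespace are referenced through the prefix -->
  <xsd:element name="shippingDate" type="ship:shippingDateType"/>
</xsd:schema>
```

Once a target namespace is declared, references to the schema's own types must be qualified (ship:shippingDateType here), which is the main visible difference from a no-namespace schema.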
The datatypes used in XML Schema declarations must comply with the chosen XML Schema standard (see [2]) and must be selected from the following list of supported datatypes. Types are forbidden either because they are redundant or because they lack a simple mapping to C++ or Java.
Datatype | Description | Supported | Comment |
string | A sequence of zero or more Unicode characters that are allowed in an XML document; essentially the only forbidden characters are most of the C0 controls, surrogates, and the byte-order mark | Yes | Special characters must be escaped. |
normalizedString | A string that does not contain any tabs, carriage returns, or linefeeds | | |
token | A string with no leading or trailing white space, no tabs, no linefeeds, and not more than one consecutive space | | |
boolean | | Yes | |
float | | Yes | |
double | | Yes | |
decimal | | No | Common programming languages do not support arbitrary precision decimal numbers. |
duration | | Partially | Loss of precision may occur; the Year and Month components cannot be supported. |
 | | No | Cannot be represented by any of the intrinsic types in common programming languages. |
base64Binary | Base64 encoding uses an algorithm based on 65 ASCII characters chosen for their ability to pass through almost all gateways, mail relays, and terminal servers intact, as well as their existence with the same code points in ASCII, EBCDIC, and most other common character sets | Yes | Both Base64 and hex encoding are supported, but are not encouraged for performance reasons. Binary data should rather be referred to through XLinks or unparsed entities. |
hexBinary | Hexadecimal binary encodes each byte of the input as two hexadecimal digits | Yes | |
anyURI | | Yes | |
ID | | No | |
IDREF | | No | |
ENTITY | | No | |
NOTATION | | No | |
QName | | No | |
language | | Yes | |
IDREFS | | No | See IDREF. |
ENTITIES | | No | |
NMTOKEN | | No | |
NMTOKENS | | No | |
Name | | No | |
NCName | | No | |
integer | | No | Common programming languages do not support integer numbers of arbitrary size. |
nonPositiveInteger | | No | Common programming languages do not support integer numbers of arbitrary size. |
negativeInteger | | No | Common programming languages do not support integer numbers of arbitrary size. |
long | | No | Many programming languages do not support integer values that exceed the range of a 32-bit signed integer. |
int | | Yes | |
short | | Yes | |
byte | | Yes | |
nonNegativeInteger | | No | Common programming languages do not support integer numbers of arbitrary size. |
unsignedLong | | No | Many programming languages do not support integer values that exceed the range of a 32-bit signed integer. |
unsignedInt | | No | Many programming languages do not support integer values that exceed the range of a 32-bit signed integer. |
unsignedShort | | Yes | |
unsignedByte | | Yes | |
positiveInteger | | No | Common programming languages do not support integer numbers of arbitrary size. |
dateTime | | Yes | |
time | | Yes | |
date | | Yes | |
gDay, gMonth | | Yes | |
gYear | A given year | Yes | |
gYearMonth | | Yes | |
century, month, year, timeDuration, recurringDuration | | No | Deprecated |
The XML Schema Recommendation 1.0 standard does not specifically address how to define array datatypes. Arrays used outside the scope of SOAP and WSDL can be represented by nested <sequence> constructs; otherwise, the SOAP and WSDL definitions for arrays should be used.
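One possible reading of "nested <sequence> constructs" is sketched below: an inner sequence describes one array item, and an outer sequence repeats that item any number of times. All names (pointType, pointArrayType, point, x, y, z) and the namespace URI are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical names and namespace, for illustration only -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:geo="http://www.example.com/geometry"
            targetNamespace="http://www.example.com/geometry"
            elementFormDefault="qualified">
  <!-- One array item: a fixed sequence of three coordinates -->
  <xsd:complexType name="pointType">
    <xsd:sequence>
      <xsd:element name="x" type="xsd:double"/>
      <xsd:element name="y" type="xsd:double"/>
      <xsd:element name="z" type="xsd:double"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- The array itself: a sequence holding any number of items -->
  <xsd:complexType name="pointArrayType">
    <xsd:sequence>
      <xsd:element name="point" type="geo:pointType"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

The array length is not fixed here; replacing maxOccurs="unbounded" with a number would bound it.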
All elements having a complex type must be defined through a named type. For example, do not use the following declaration:
<!-- DO NOT USE THIS XML SAMPLE IN YOUR CODE !! -->
<xsd:element name="shippingDate">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="year" type="xsd:gYear"/>
      <xsd:element name="month" type="xsd:gYearMonth"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Use the following one instead:
<xsd:element name="shippingDate" type="shippingDateType"/>
<xsd:complexType name="shippingDateType">
  <xsd:sequence>
    <xsd:element name="year" type="xsd:gYear"/>
    <xsd:element name="month" type="xsd:gYearMonth"/>
  </xsd:sequence>
</xsd:complexType>
Indeed, types are more reusable than element definitions.
For example, if the structure is intended to be used as an element in instance documents, and it must sometimes be nillable and sometimes not, then it must be defined as a type. Indeed, the following syntax is not supported by XML Schema:
<!-- DO NOT USE THIS XML SAMPLE IN YOUR CODE !! -->
<xsd:element ref="shippingDate" nillable="true"/>
while the following is supported:
<xsd:element name="sendingDate" type="shippingDateType" nillable="true"/>
<xsd:element name="receiptData" type="shippingDateType"/>
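In an instance document, a nillable element is marked as empty with the standard xsi:nil attribute. The sketch below assumes that sendingDate is declared inside a hypothetical order element, in a hypothetical namespace with qualified local elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "order" and the namespace URI are hypothetical, for illustration only -->
<order xmlns="http://www.example.com/shipping"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Allowed only because sendingDate is declared with nillable="true" -->
  <sendingDate xsi:nil="true"/>
</order>
```

An element declared without nillable="true" cannot carry xsi:nil="true" and must always provide its year and month children.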
When defining attributes on a container datatype (in the Object-Oriented sense), two main XML Schema patterns may be used:
The "default CAA" choice for attribute definition in XML Schemas is the latter: "define typed elements" instead of actual attributes. There are at least two reasons for this rule: first, arbitrary text values cannot be stored in attributes of string type (line breaks in particular); second, using elements over attributes is a SOAP rule. See the SOAP chapter in [4] for more in-depth references about "elements over attributes". For example, the following declaration is not the preferred one for a public CAA schema:
<!-- DO NOT USE THIS XML SAMPLE IN YOUR CODE !! -->
<xsd:complexType name="ecType">
  <xsd:attribute name="effectivity" type="xsd:date"/>
</xsd:complexType>
while this declaration does declare attributes in the preferred CAA way:
<xsd:complexType name="ecType2">
  <xsd:sequence>
    <xsd:element name="effectivity" type="xsd:date"/>
  </xsd:sequence>
</xsd:complexType>
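Seen from the instance side, the two declarations lead to the following forms. The element name ec and the namespace URI are hypothetical, and qualified local elements are assumed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Attribute form, as allowed by ecType (not preferred):
     <ec effectivity="2001-09-30"/> -->
<!-- Element form, as allowed by ecType2 (preferred) -->
<ec xmlns="http://www.example.com/ec">
  <effectivity>2001-09-30</effectivity>
</ec>
```

The element form also extends to free text: an XML processor normalizes literal line breaks in attribute values to spaces, whereas element content keeps them.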
Please note that:
We call an XML instance any XML document that has all its elements defined in an XML Schema. These rules are:
The XML Schema associated with an XML instance document must be declared, even if the document is to be processed by non-validating XML processors.
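The standard way to declare the schema in an instance is the xsi:schemaLocation attribute, which pairs a namespace URI with a schema location. In the sketch below, the namespace URI and the file name shipping.xsd are hypothetical, and the schema is assumed to declare shippingDate with qualified local elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Namespace URI and schema file name are hypothetical -->
<shippingDate xmlns="http://www.example.com/shipping"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.example.com/shipping shipping.xsd">
  <year>2001</year>
  <month>2001-09</month>
</shippingDate>
```

A non-validating processor ignores the hint, but the declaration still documents which schema the instance is meant to conform to.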
XML property files are a particular set of XML instance documents which can be used by end users to customize their applications. Most end users do not have an XML editor, so special care must be taken to make such files easy to edit:
[1] "XML Schema Part 1: Structures", 2 May 2001, on the Internet at http://www.w3.org/TR/xmlschema-1/
[2] "XML Schema Part 2: Datatypes", 2 May 2001, on the Internet at http://www.w3.org/TR/xmlschema-2/
[3] "XML Schemas: Best Practices", 16 August 2001 or more recent update, on the Internet at http://www.xfront.com/BestPracticesHomepage.html
[4] "Essential XML - Beyond Markup", Don Box, Aaron Skonnard, and John Lam. Addison-Wesley, 2000
Version: 1 [Sep 2001] | Document created |
Copyright © 2001, Dassault Systèmes. All rights reserved.