3D PLM Enterprise Architecture
Middleware Abstraction
Introduction to XML
Quick overview of XML fundamentals
Technical Article
Abstract: This article explains what XML is. It gives an overview of the standards governing the various aspects of XML. It shows how XML can be used from the developer's point of view.
XML (Extensible Markup Language) is a data description language. Its
strengths, notably its simplicity, strict standardization, and tool
support, account for its great popularity and wide availability.
XML documents are composed of elements, each delimited by a start tag (of
the form <element_name>) and an end tag (of the form </element_name>).
Elements can contain either other elements or free text. Every XML
document has one and only one root element, which contains all the other
elements. The following XML sample is provided to illustrate the anatomy
of an XML document.
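As a minimal sketch of this anatomy, the snippet below parses a small document and walks its single root element and the elements nested inside it. The sample document and the use of Python's standard xml.etree.ElementTree module are assumptions made for illustration; they are not part of the original article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one root element (<car>) containing two nested elements.
DOCUMENT = """<car>
  <part name="engine">low consumption engine</part>
  <part name="wheel"></part>
</car>"""

root = ET.fromstring(DOCUMENT)


def describe(element, depth=0):
    """Return one line per element: tag name, attributes, and free text."""
    text = (element.text or "").strip()
    line = "  " * depth + f"<{element.tag}> attributes={dict(element.attrib)} text={text!r}"
    lines = [line]
    for child in element:  # iterate over the nested elements
        lines.extend(describe(child, depth + 1))
    return lines


anatomy = describe(root)
print("\n".join(anatomy))
```

The root element contains the two part elements; the first part contains free text and the second is empty.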
The following syntactic constructs are the most common in XML documents.
Other syntactic constructs (CDATA sections, processing instructions, etc.) can occasionally appear in XML documents. For a complete description of the XML syntax, please refer to [1].
XML files follow syntactic rules, some of which were described in the previous section: there must be one and only one root element; attribute names must be unique within an element start tag; reserved XML characters such as '&' and '<' must be properly escaped; and so on. An XML document that obeys these syntactic rules is said to be well formed.
<?xml version='1.0' encoding='UTF-8'?>
<car><part name="engine"></car></part>
Not well-formed XML: the tags are not properly nested.
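One way to test well-formedness in practice is to hand the document to a parser and see whether it reports a syntax error. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical and introduced only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the parser accepts the document, False on a syntax error."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


# The sample above: the end tags are closed in the wrong order.
badly_nested = b"<?xml version='1.0' encoding='UTF-8'?><car><part name=\"engine\"></car></part>"
# Same content with properly nested tags.
properly_nested = b"<?xml version='1.0' encoding='UTF-8'?><car><part name=\"engine\"></part></car>"
# A reserved character ('&') that has not been escaped.
unescaped = b"<car>R&D</car>"

print(is_well_formed(badly_nested))
print(is_well_formed(properly_nested))
print(is_well_formed(unescaped))
```

Note that this only checks syntax. It says nothing about whether the document is valid against a grammar.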
The elements that are allowed to appear in an XML document, and the order in which they are allowed to appear, are described by a grammar file, called a DTD or an XSD schema. An XML document that obeys all the rules specified by its associated grammar file is said to be valid.
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE car SYSTEM "automotive.dtd">
<car>
  <part name="engine"></part>
  <aeroplane name="spitfire"/>
</car>
Well-formed but invalid XML: aeroplane is not defined in the automotive DTD.
The W3C (World Wide Web Consortium) is the standards body in charge of XML. The W3C not only takes care of the standardization of the language itself (see [1]); it also offers standardized programming APIs to manipulate XML documents. Aside from the W3C APIs, other programming APIs have become popular to the point of becoming de facto standards. Note that although XML is very stable and upward compatibility will be assured, the standard keeps evolving through revisions of the specifications (XML 1.1, XSD 1.1) and the emergence of newer programming paradigms.
The following list presents the official W3C standards:
Originally invented by David Megginson as a library for the Java programming language, the SAX API has become very popular and can be considered as a de facto standard. Many vendors provide their own implementation of SAX for various languages and the specification has already undergone one major evolution (SAX 2.0). For more information, see [6].
Developers wanting to create, access or manipulate data stored in XML have APIs at their disposal. This section gives an overview of the DOM and the SAX APIs, while the next section discusses the strengths and weaknesses of each API.
The DOM API uses an object-oriented approach to describe XML documents. The DOM API defines interfaces to represent each of the constructs available in the XML language: elements, attributes, documents, characters, entities, comments, etc. These interfaces are related by inheritance (for example, a "comment" is a specialized form of "character data"). The inheritance hierarchy is rooted at the abstract "node" class. The following diagram shows the DOM V5 interface hierarchy.
The DOM API views XML documents as a tree of XML nodes. The root element of the XML document corresponds to the root of the DOM tree. The sub-elements of the root element are the children of this root node. The following sample shows the DOM tree which corresponds to a sample XML document.
<?xml version="1.0"?>
<!DOCTYPE car SYSTEM "automotive.dtd">
<car>
  <!--list of parts for a convertible car-->
  <part name="seat" quantity="2"></part>
  <part name="wheel" quantity="4"/>
  <part name="engine" quantity="1">low consumption engine</part>
  <part name="body" quantity="1">weight must be &lt; 1200 kg</part>
</car>
The DOM API defines methods to parse documents (build the in-memory tree that corresponds to an XML document), manipulate documents (insert elements, edit attribute values, copy or delete sub-trees, etc.), and write documents (generate XML from an in-memory DOM tree).
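The three kinds of operations (parse, manipulate, write) can be sketched with any DOM implementation. The example below assumes Python's standard xml.dom.minidom module, which is not the implementation discussed in this article. The helper text_of and the added "mirror" part are hypothetical, and the DOCTYPE line of the sample is omitted so that the sketch does not depend on an automotive.dtd file.

```python
from xml.dom import minidom

# Sample document from this section, without the DOCTYPE line.
SAMPLE = """<?xml version="1.0"?>
<car>
  <!--list of parts for a convertible car-->
  <part name="seat" quantity="2"></part>
  <part name="wheel" quantity="4"/>
  <part name="engine" quantity="1">low consumption engine</part>
  <part name="body" quantity="1">weight must be &lt; 1200 kg</part>
</car>"""


def text_of(element):
    """Concatenate the text nodes located directly under an element."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)


# 1. Parse: build the in-memory tree.
document = minidom.parseString(SAMPLE)
car = document.documentElement
parts = car.getElementsByTagName("part")
part_names = [part.getAttribute("name") for part in parts]
body_text = text_of(parts[3])  # the parser has already decoded &lt; to '<'

# 2. Manipulate: edit an attribute value, insert an element, delete a sub-tree.
parts[1].setAttribute("quantity", "5")
mirror = document.createElement("part")  # hypothetical extra part
mirror.setAttribute("name", "mirror")
mirror.setAttribute("quantity", "2")
car.appendChild(mirror)
car.removeChild(parts[0])  # removes the "seat" part

# 3. Write: generate XML from the in-memory tree.
output = document.toxml()
print(output)
```

Because the whole document is held in memory as a tree, each part can be reached and modified in any order before the result is written back out.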
The SAX API uses an event-oriented approach to process XML documents. The SAX parser reads an XML document sequentially and emits one typed event for each XML construct it comes across: start of the document, start of an element, end of an element, comment, characters, etc. Programmers register callback functions with the SAX parser for the events they are interested in. Usually, programmers need to store the generated events in a stack in order to keep track of the location of each event in the XML tree. The following list shows the SAX events that are generated for a sample document.
<?xml version="1.0"?>
<!DOCTYPE car SYSTEM "automotive.dtd">
<car>
  <!--list of parts for a convertible car-->
  <part name="seat" quantity="2"></part>
  <part name="wheel" quantity="4"/>
  <part name="engine" quantity="1">low consumption engine</part>
  <part name="body" quantity="1">weight must be &lt; 1200 kg</part>
</car>
1. Start document
2. Start element "car"
3. Start element "part", attributes {"name=seat", "quantity=2"}
4. End element "part"
5. Start element "part", attributes {"name=wheel", "quantity=4"}
6. End element "part"
7. Start element "part", attributes {"name=engine", "quantity=1"}
8. Characters "low consumption engine"
9. End element "part"
10. Start element "part", attributes {"name=body", "quantity=1"}
11. Characters "weight must be < 1200 kg"
12. End element "part"
13. End element "car"
14. End document
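The event sequence above can be reproduced with a small handler. This is a sketch that assumes Python's standard xml.sax module rather than the SAX implementation discussed in this article; the class name EventRecorder is hypothetical, and the DOCTYPE line is omitted so that no automotive.dtd file is needed. A SAX parser may deliver text in several chunks and also reports the whitespace between elements, so the handler buffers character data and records it only when it is not blank.

```python
import xml.sax

# Sample document from this section, without the DOCTYPE line.
SAMPLE = """<?xml version="1.0"?>
<car>
  <!--list of parts for a convertible car-->
  <part name="seat" quantity="2"></part>
  <part name="wheel" quantity="4"/>
  <part name="engine" quantity="1">low consumption engine</part>
  <part name="body" quantity="1">weight must be &lt; 1200 kg</part>
</car>"""


class EventRecorder(xml.sax.ContentHandler):
    """Record SAX events and track the current location with a stack."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.stack = []   # names of the elements currently open
        self.paths = []   # location of each start-element event
        self._text = []   # buffered character data

    def _flush_text(self):
        text = "".join(self._text).strip()
        self._text = []
        if text:  # ignore whitespace-only runs between elements
            self.events.append(("characters", text))

    def startDocument(self):
        self.events.append(("start_document",))

    def endDocument(self):
        self.events.append(("end_document",))

    def startElement(self, name, attrs):
        self._flush_text()
        self.stack.append(name)
        self.paths.append("/" + "/".join(self.stack))
        self.events.append(("start_element", name, dict(attrs.items())))

    def endElement(self, name):
        self._flush_text()
        self.stack.pop()
        self.events.append(("end_element", name))

    def characters(self, content):
        self._text.append(content)


recorder = EventRecorder()
xml.sax.parseString(SAMPLE.encode("utf-8"), recorder)
events = recorder.events
for number, event in enumerate(events, start=1):
    print(number, event)
```

The stack is what gives each event its context: without it, the handler would know that a "part" element started, but not that it sits inside "car".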
XML often provides more than one mechanism to address the same problem. Choosing the right mechanism is a matter of understanding its trade-offs in terms of performance, ease of development, supported features, etc.
The main advantages of DOM are:
The main weaknesses of DOM are:
The main advantages of SAX are:
The main weaknesses of SAX are:
In summary, use DOM if you need to manipulate files no larger than a few megabytes or if your application uses XML itself as the data model. Typical candidates would be storing settings in XML, or manipulating contents in XHTML. Use SAX if you need to manipulate arbitrarily large files or if you need to map your own object model to XML. Typical candidates would be persistence of an object graph in XML, or processing large log files containing events generated by a server.
DTDs and schemas address the same problem: defining tag vocabularies (grammars) for XML documents. They are to XML documents what the description of tables and relationships is to a relational database. The following example shows an XML grammar defined as a DTD. The grammar defines two elements ("car" and "part"). The "part" element can have two attributes ("name" and "quantity"). Several "part" elements can be nested inside a "car" element. "part" elements can contain text.
<!ELEMENT car (part)+>
<!ELEMENT part (#PCDATA)>
<!ATTLIST part
  name ID #REQUIRED
  quantity CDATA #IMPLIED>
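To make the rules expressed by this DTD concrete, the sketch below re-states them as hand-written checks. It is an illustration only, not a DTD validator: it assumes Python's standard xml.etree.ElementTree module, the function name check_car_document is hypothetical, and a real application would rely on a validating parser instead.

```python
import xml.etree.ElementTree as ET


def check_car_document(xml_text):
    """Return a list of rule violations (empty if every check passes).

    Hand-written checks that mirror the DTD above; NOT a DTD validator.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "car":
        return [f"root element must be 'car', found '{root.tag}'"]
    errors = []
    if len(root) == 0:
        errors.append("'car' must contain at least one 'part'")  # (part)+
    seen_names = set()
    for child in root:
        if child.tag != "part":
            errors.append(f"unexpected element '{child.tag}' in 'car'")
            continue
        if len(child) > 0:
            errors.append("'part' may contain only text")  # (#PCDATA)
        for attribute in child.attrib:
            if attribute not in ("name", "quantity"):
                errors.append(f"undeclared attribute '{attribute}' on 'part'")
        name = child.get("name")
        if name is None:
            errors.append("'part' is missing required attribute 'name'")  # #REQUIRED
        elif name in seen_names:
            errors.append(f"duplicate ID value '{name}'")  # ID values must be unique
        else:
            seen_names.add(name)
    return errors


valid_doc = '<car><part name="engine" quantity="1">low consumption engine</part></car>'
invalid_doc = '<car><part name="engine"></part><aeroplane name="spitfire"/></car>'

print(check_car_document(valid_doc))
print(check_car_document(invalid_doc))
```

The second document is the well-formed but invalid sample shown earlier: the parser accepts it, and only the grammar-level check rejects the aeroplane element.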
Here is an equivalent grammar defined as an XSD schema:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified"
            targetNamespace="urn:com:dassault_systemes:automotive"
            xmlns:tns="urn:com:dassault_systemes:automotive">
  <xsd:element name="car">
    <xsd:complexType>
      <xsd:sequence maxOccurs="unbounded">
        <xsd:element name="part" type="tns:partType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="partType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="name" type="xsd:ID" use="required"/>
        <xsd:attribute name="quantity" type="xsd:positiveInteger"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
The main advantages of DTDs are:
The main weaknesses of DTDs are:
The main advantages of XSD schemas are:
The main weaknesses of XSD schemas are:
In summary, your choice between DTDs and schemas will first depend on what is available: make a list of the products you need to integrate and choose a grammar language supported by all the systems. Use DTDs if you are new to XML and want to get started quickly; if you need to develop a prototype and do not want to spend much time on a grammar; if you need to integrate with a system that only supports DTDs. Use schemas if you need a precise definition of your data model; if you have tools or expertise to help you define the schema.
XML is a data description language. Its simplicity, strict standardization, broad availability, and tool support make it a good vehicle for exchanging data among heterogeneous systems.
Version: 1 [Apr 2005] | Document created |
Copyright © 2005, Dassault Systèmes. All rights reserved.