3D PLM Enterprise Architecture |
Middleware Abstraction |
Using XML in V5: Description of the XML infrastructure available in V5 |
Technical Article |
Abstract: This article explains what an XML parser is. It describes the parsers available in V5, lists their capabilities, and discusses how to create them. It also discusses how these V5 parsers integrate existing XML standards. |
The component that processes raw XML content and lets the developer access and manipulate this content through a DOM or SAX API is called an XML parser. Rather than developing its own XML parser, Dassault Systèmes relies on its software partners and integrates existing parsers. All these parsers support the DOM and SAX APIs, but there are differences between them:
To make XML-based solutions easier to develop, the XMLParser framework provides CAA developers with the following functionality:
Each parser component in the XMLParser framework is identified by a GUID. To instantiate a parser, developers invoke a global function, passing it the identifier of the parser they want to use. Since there are two families of APIs (DOM and SAX), there are two global functions. If the developer does not pass an identifier, a default value is used (corresponding to the XML4C3 parser).
The following sample shows how to instantiate a DOM parser backed by the XML4C5 parser component:
#include "CATIXMLDOMDocumentBuilder.h" // To create the DOM objects

CATIXMLDOMDocumentBuilder_var builder;
HRESULT hr = CreateCATIXMLDOMDocumentBuilder(builder, CLSID_XML4C5_DOM);
The following sample shows how to instantiate a SAX parser backed by the XML4C5 parser component:
#include "CATIXMLSAXFactory.h" // To create the SAX objects

CATIXMLSAXFactory_var factory;
HRESULT hr = CreateCATIXMLSAXFactory(factory, CLSID_XML4C5_SAX);
Compatibility between parsers
DOM V5 components depend on SAX V5 components. For instance, a DOM parser needs to be able to fetch XML from various physical sources (an HTTP server, a file, a relational database, etc.). Rather than defining yet another interface for input sources, the V5 DOM parser accepts parsing from a CATISAXInputSource; to create such input sources, you use the V5 SAX component.
V5 parsers are not interoperable: you cannot append a CATIDOMElement created with XML4C3 to a CATIDOMElement created with XML4C5. However, several parsers can coexist in the same process.
The following table gives an overview of the features supported by each parser.
Feature | X4C3 | X4C5 | MSXML3 | MSXML4 | MSXML5 |
DOM level 1 and 2 | Yes | Yes | Yes (1) | Yes (1) | Yes (1) |
DOM traversal | Yes | Yes | Yes (2) | Yes (2) | Yes (2) |
SAX 1 | Yes | Yes | Yes (3)(2) | Yes (4)(2) | Yes (4)(2) |
SAX 2 | Yes (5) | Yes | Yes (5) | Yes (6) | Yes (6) |
DTD validation | Yes | Yes | Yes (7) | Yes (7) | Yes (7) |
XSD schema validation | No | Yes | No | Yes | Yes |
Unix availability | Yes | Yes | No | No | No |
Windows availability | Yes | Yes | Yes | Yes | Yes |
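As an illustration, the "XSD schema validation" and "Unix availability" rows of the table above can be encoded as data and filtered by requirement. This is only a minimal sketch: ParserFeatures, KnownParsers and CandidateParsers are hypothetical names introduced here, they are not part of the XMLParser framework, and the other rows of the table are not modeled.

```cpp
#include <string>
#include <vector>

// One column of the feature table, reduced to two of its rows.
// This type is hypothetical; it does not exist in the XMLParser framework.
struct ParserFeatures {
    std::string name;
    bool xsdValidation;    // row "XSD schema validation"
    bool availableOnUnix;  // row "Unix availability"
};

// Values copied from the feature table.
const std::vector<ParserFeatures>& KnownParsers() {
    static const std::vector<ParserFeatures> parsers = {
        {"X4C3",   false, true},
        {"X4C5",   true,  true},
        {"MSXML3", false, false},
        {"MSXML4", true,  false},
        {"MSXML5", true,  false},
    };
    return parsers;
}

// Returns the names of the parsers that satisfy the given requirements,
// in the same order as the table columns.
std::vector<std::string> CandidateParsers(bool iNeedXsd, bool iNeedUnix) {
    std::vector<std::string> result;
    for (const ParserFeatures& parser : KnownParsers()) {
        if (iNeedXsd && !parser.xsdValidation) continue;
        if (iNeedUnix && !parser.availableOnUnix) continue;
        result.push_back(parser.name);
    }
    return result;
}
```

With these values, CandidateParsers(true, true) keeps only X4C5, the single parser in the table that offers XSD schema validation on Unix.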
In summary, you can use the following rules to choose the parser that best suits your needs:
The XMLParser framework defines two classes of XML APIs: standard APIs and additional APIs created by Dassault Systèmes.
The following two sections give you more information as to how XML standard specifications have been adapted for V5.
The DOM specification uses OMG IDL to define its APIs in an abstract, platform-neutral way. It is then up to each platform to define a binding, that is, a concrete version of the APIs using the language and data types native to the platform. The following table explains how this is done for V5 in C++.
OMG IDL | V5 C++ | Comment |
DOMString | CATUnicodeString | All the strings obtained from parsing an XML document are represented as CATUnicodeStrings: element names, attribute values, characters, entity names, etc. |
DOM exception | HRESULT + CATError | The convention in V5 code is to signal errors using HRESULTs. Additional information about the error can be obtained using the CATError mechanism. See [1] for more information. |
interface XXX | V5 interface handler CATIDOMXXX_var | All DOM interfaces are represented by V5 interface handlers. The V5 naming conventions are respected by prepending the "CATIDOM" prefix to the original DOM name (thus, the Node interface from the specification is mapped to the CATIDOMNode_var interface handler in V5 C++, the Element interface is mapped to the CATIDOMElement_var interface handler, and so on). |
rettype method(arg1, arg2, ..., argN) raises DOMException | HRESULT Method(arg1, arg2, ..., argN, rettype) | Methods bear the same name in V5 as in the specification, with the first letter capitalized to obey the V5 naming convention. If the specification indicates a return value for the method, the corresponding V5 method has an additional out parameter to return this value. The exceptions declared by the method are replaced by an HRESULT. |
boolean | CATBoolean | |
unsigned long | unsigned int |
As a concrete example of how the binding works, consider the abstract definition of the DOMImplementation interface, extracted from the DOM specification.
interface DOMImplementation {
  boolean hasFeature(
    in DOMString feature,
    in DOMString version);

  // Introduced in DOM Level 2:
  DocumentType createDocumentType(
    in DOMString qualifiedName,
    in DOMString publicId,
    in DOMString systemId) raises(DOMException);

  // Introduced in DOM Level 2:
  Document createDocument(
    in DOMString namespaceURI,
    in DOMString qualifiedName,
    in DocumentType doctype) raises(DOMException);
};
In V5, you manipulate the following V5 interface:
class CATIDOMImplementation : public CATBaseUnknown {
public:
  virtual HRESULT HasFeature(
    const CATUnicodeString& iFeature,
    const CATUnicodeString& iVersion,
    CATBoolean& oResult) = 0;

  // Introduced in DOM Level 2:
  virtual HRESULT CreateDocumentType(
    const CATUnicodeString& iQualifiedName,
    const CATUnicodeString& iPublicId,
    const CATUnicodeString& iSystemId,
    CATIDOMDocumentType_var& oDocumentType) = 0;

  // Introduced in DOM Level 2:
  virtual HRESULT CreateDocument(
    const CATUnicodeString& iNamespaceURI,
    const CATUnicodeString& iQualifiedName,
    const CATIDOMDocumentType_var& iDocumentType,
    CATIDOMDocument_var& oDocument) = 0;
};
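The calling pattern that results from this binding can be sketched without any V5 header. The following block is an illustration under stated assumptions, not V5 code: HRESULT, S_OK, E_FAIL and Succeeded are local stand-ins, MockDocument and MockDOMImplementation are hypothetical, std::string replaces CATUnicodeString, and bool replaces CATBoolean. It shows a return value turned into a trailing out parameter, and a declared exception turned into a failing HRESULT.

```cpp
#include <string>

// Local stand-ins so the sketch compiles on its own.
// They are assumptions for illustration, not the real V5 definitions.
typedef long HRESULT;
const HRESULT S_OK = 0;
const HRESULT E_FAIL = -1;  // any negative value denotes failure in this sketch

inline bool Succeeded(HRESULT hr) { return hr >= 0; }

// Hypothetical type standing in for a DOM Document.
struct MockDocument {
    std::string rootName;
};

// Hypothetical class that follows the binding rules of the table above.
class MockDOMImplementation {
public:
    // IDL: boolean hasFeature(in DOMString feature, in DOMString version);
    // The IDL return value becomes the trailing out parameter oResult.
    HRESULT HasFeature(const std::string& iFeature,
                       const std::string& iVersion,
                       bool& oResult) const {
        oResult = (iFeature == "Core" &&
                   (iVersion == "1.0" || iVersion == "2.0"));
        return S_OK;
    }

    // IDL: Document createDocument(...) raises(DOMException);
    // A failing HRESULT replaces the exception; oDocument is left untouched.
    HRESULT CreateDocument(const std::string& iQualifiedName,
                           MockDocument& oDocument) const {
        if (iQualifiedName.empty()) {
            return E_FAIL;
        }
        oDocument.rootName = iQualifiedName;
        return S_OK;
    }
};
```

A caller tests the HRESULT first and reads the out parameter only on success, for instance: if (Succeeded(impl.CreateDocument("root", doc))) { /* use doc */ }.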
The SAX specification uses Java to define its APIs. Platforms that do not use Java as their programming language define a binding for their language, that is, a version of the APIs using the language and data types native to the platform. The following table explains how this is done for V5 in C++.
Java SAX definition | V5 C++ | Comment |
java.lang.String | CATUnicodeString | All the strings obtained from parsing an XML document are represented as CATUnicodeStrings: element names, attribute values, characters, entity names, etc. |
java.io.IOException, org.xml.sax.SAXException | HRESULT + CATError | The convention in V5 code is to signal errors using HRESULTs. Additional information about the error can be obtained using the CATError mechanism. See [1] for more information. |
interface YYY | V5 interface handler CATISAXYYY_var | All SAX interfaces are represented by V5 interface handlers. The V5 naming conventions are respected by prepending the "CATISAX" prefix to the original SAX name (thus, the ErrorHandler interface from the specification is mapped to the CATISAXErrorHandler_var interface handler in V5 C++, the AttributeList interface is mapped to the CATISAXAttributeList_var interface handler, and so on). |
rettype method(arg1, arg2, ..., argN) throws SAXException | HRESULT Method(arg1, arg2, ..., argN, rettype) | Methods bear the same name in V5 as in the specification, with the first letter capitalized to obey the V5 naming convention. If the specification indicates a return value for the method, the corresponding V5 method has an additional out parameter to return this value. The exceptions declared by the method are replaced by an HRESULT. |
org.xml.sax.HandlerBase, org.xml.sax.DefaultHandler, org.xml.sax.DefaultXMLFilter | CATSAXHandlerBase, CATSAXDefaultHandler, CATSAXDefaultXMLFilter | Classes providing a default implementation for SAX interfaces are represented in V5 by a V5 component providing a default implementation for the same SAX interfaces. The V5 naming conventions are respected by prepending the "CATSAX" prefix to the original SAX name. Thus, the HandlerBase Java class, which implements the DocumentHandler, DTDHandler, EntityResolver and ErrorHandler Java SAX interfaces, is mapped to the CATSAXHandlerBase V5 component, which implements the CATISAXDocumentHandler, CATISAXDTDHandler, CATISAXEntityResolver and CATISAXErrorHandler V5 interfaces. |
boolean | CATBoolean | |
int | unsigned int |
As a concrete example of how the binding works, consider the Java definition of the EntityResolver interface, extracted from the SAX specification.
package org.xml.sax;

public interface EntityResolver {
  InputSource resolveEntity(
    String publicId,
    String systemId) throws SAXException, IOException;
}
In V5, you manipulate the following V5 interface:
class CATISAXEntityResolver : public CATBaseUnknown {
public:
  virtual HRESULT ResolveEntity(
    const CATUnicodeString& iPublicId,
    const CATUnicodeString& iSystemId,
    CATISAXInputSource_var& oInputSource) = 0;
};
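To show how an application might implement an interface of this shape, here is a self-contained sketch of a resolver that redirects known public identifiers to local copies. Everything in it is hypothetical: MockInputSource, MockEntityResolver, MockDefaultResolver and MockCatalogResolver are illustration-only names, HRESULT and S_OK are local stand-ins, and std::string replaces CATUnicodeString. How the real V5 default components behave is not described in this article, so the default behavior below is an assumption.

```cpp
#include <map>
#include <string>

// Local stand-ins, not the real V5 definitions.
typedef long HRESULT;
const HRESULT S_OK = 0;

// Hypothetical input source reduced to a system identifier (URI or path).
struct MockInputSource {
    std::string systemId;
};

// Hypothetical counterpart of the resolver interface shown above.
class MockEntityResolver {
public:
    virtual ~MockEntityResolver() {}
    virtual HRESULT ResolveEntity(const std::string& iPublicId,
                                  const std::string& iSystemId,
                                  MockInputSource& oInputSource) = 0;
};

// Default implementation, in the spirit of a handler base class:
// it simply keeps the system identifier requested by the document.
class MockDefaultResolver : public MockEntityResolver {
public:
    virtual HRESULT ResolveEntity(const std::string& /*iPublicId*/,
                                  const std::string& iSystemId,
                                  MockInputSource& oInputSource) {
        oInputSource.systemId = iSystemId;
        return S_OK;
    }
};

// Overrides the default only for public identifiers found in its catalog.
class MockCatalogResolver : public MockDefaultResolver {
public:
    void AddMapping(const std::string& iPublicId,
                    const std::string& iLocalPath) {
        _catalog[iPublicId] = iLocalPath;
    }

    virtual HRESULT ResolveEntity(const std::string& iPublicId,
                                  const std::string& iSystemId,
                                  MockInputSource& oInputSource) {
        std::map<std::string, std::string>::const_iterator it =
            _catalog.find(iPublicId);
        if (it == _catalog.end()) {
            // Unknown identifier: fall back to the default behavior.
            return MockDefaultResolver::ResolveEntity(
                iPublicId, iSystemId, oInputSource);
        }
        oInputSource.systemId = it->second;
        return S_OK;
    }

private:
    std::map<std::string, std::string> _catalog;
};
```

Deriving from a default implementation keeps the override small: only the case of interest is handled, and every other request is delegated to the base class.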
The XML specification defines the XML syntax using the character model defined by the Unicode specification. XML content, however, can be stored as text in any encoding (code page), provided that the underlying parser supports it. To use a given encoding for an XML file, you need to:
<?xml version='1.0' encoding='UTF-8'?>
... content encoded in UTF-8 ...
The XML specification mandates that XML parsers support UTF-8. Therefore, this encoding is universally available. Furthermore, this encoding supports the whole Unicode standard, which guarantees that national characters can be read and written on any machine in the world without loss or corruption, and without the need to install a specific code page configuration file.
When you can choose the encoding, use UTF-8.
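To make concrete why UTF-8 can represent any Unicode character, the following sketch encodes a single code point into its 1 to 4 byte UTF-8 sequence. EncodeUtf8 is a hypothetical helper written for this illustration; it is not part of the XMLParser framework, which handles encodings internally.

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8.
// Returns an empty string for values that are not valid Unicode scalar
// values (surrogates U+D800..U+DFFF, or anything above U+10FFFF).
std::string EncodeUtf8(unsigned long iCodePoint) {
    std::string bytes;
    if (iCodePoint < 0x80) {
        // 1 byte: 0xxxxxxx (plain ASCII)
        bytes += static_cast<char>(iCodePoint);
    } else if (iCodePoint < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        bytes += static_cast<char>(0xC0 | (iCodePoint >> 6));
        bytes += static_cast<char>(0x80 | (iCodePoint & 0x3F));
    } else if (iCodePoint < 0x10000) {
        if (iCodePoint >= 0xD800 && iCodePoint <= 0xDFFF) {
            return bytes;  // surrogates are not encodable
        }
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        bytes += static_cast<char>(0xE0 | (iCodePoint >> 12));
        bytes += static_cast<char>(0x80 | ((iCodePoint >> 6) & 0x3F));
        bytes += static_cast<char>(0x80 | (iCodePoint & 0x3F));
    } else if (iCodePoint <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        bytes += static_cast<char>(0xF0 | (iCodePoint >> 18));
        bytes += static_cast<char>(0x80 | ((iCodePoint >> 12) & 0x3F));
        bytes += static_cast<char>(0x80 | ((iCodePoint >> 6) & 0x3F));
        bytes += static_cast<char>(0x80 | (iCodePoint & 0x3F));
    }
    return bytes;
}
```

ASCII characters keep their single-byte form, so an ASCII-only XML file is already valid UTF-8, while accented letters and other national characters take two or more bytes.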
A few other encodings are also supported by the XMLParser framework.
The XMLParser framework provides several parsers. All these parsers are accessible through the same V5 DOM or SAX APIs. The choice of parser depends on the requirements of the target application.
[1] | Managing Errors Using HRESULT |
Version: 1 [Apr 2005] | Document created |
Copyright © 2005, Dassault Systèmes. All rights reserved.