Whenever I need to validate an XML file against a schema and the XML contains no reference to that schema, I do the following:
- create a parser
- call useCachedGrammarInParse(true) or setFeature(XMLUni::fgXercesUseCachedGrammarInParse, true)
- call loadGrammar(MemBufInputSource, Grammar::SchemaGrammarType, true) (the last argument tells the parser to cache the grammar)
- call setDoValidation(true), setValidationScheme(Val_Always), setDoNamespaces(true)
- call parse(MemBufInputSource) to load the XML
Hope this helps, Alberto
At 17.45 18/05/2005 +0200, Gierschner, Frank wrote:
Hi all.
I have some experience with XML in general but I am new to Xerces, so please provide some hints.
I wish to schema-validate an in-memory XML string, coming in from a CORBA client, against an in-memory schema. But the parser (DOMBuilder) does not seem to associate the XML data with the provided schema. I supply both through MemBufInputSource, and I do not understand what the so-called 'fake system id' (argument 3: bufId) actually does or whether it has something to do with the failure.
I even considered writing the schema temporarily to an xsd file, building a DOM from the incoming XML string, adding a schemaLocation attribute to the still unvalidated DOM, serializing this DOM into a new string, and then reparsing and validating it against the xsd file. But that seems error-prone and cumbersome, and it does not work yet either.
Below are some test results I obtained when working with files, followed by the relevant part of the code.
I would appreciate your help. Thanks in advance. Frank Gierschner
// small excerpt on dealing with files now
To track down the problem, I modified the SCMPrint example around line 335 to read:
<code>
parser->loadGrammar(xsdFile, Grammar::SchemaGrammarType, true);
if (handler.getSawErrors())
{
    handler.resetErrors();
}
else
{
    parsedOneSchemaOkay = true;
}
parser->parse(xmlFile); // NEW
</code>
where xmlFile denotes a file containing an XML document with a schemaLocation attribute pointing to xsdFile. If xmlFile contains an invalid document in this case, the previously installed ErrorHandler sees fatal errors, which is correct. But when I remove the schemaLocation attribute from xmlFile, the same procedure produces errors like 'unknown element ...', which suggests that the XML data and the schema are not being associated. I guess this is the same problem that occurs with the in-memory procedure described above.
// end of excerpt
The interesting part of the code I use for the in-memory variant looks somewhat like the following:
<code>
// the in memory schema
static const char gMySchema[] =
"<?xml version=\"1.0\" encoding=\"utf-8\" ?> \
<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" targetNamespace=\"urn:rtc:tgam:MNCfgTmpl\" \
elementFormDefault=\"qualified\" attributeFormDefault=\"unqualified\"> \
<xs:element name=\"root\"> \
<xs:complexType> \
<xs:sequence> \
<xs:element name=\"FirstElement\" type=\"xs:string\" /> \
<xs:element name=\"SecondElement\" minOccurs=\"0\" type=\"xs:string\" /> \
<xs:element name=\"ShirdElement\" type=\"xs:string\" /> \
</xs:sequence> \
</xs:complexType> \
</xs:element> \
</xs:schema> \
";
... -- omitted other stuff --
static DOMBuilder * getDOMBuilder(MyDOMErrorHandler *& myErrorHandler, const char pSchema2use[], Grammar *& grammar)
{
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(L"LS");
    DOMBuilder* theBuilder = ((DOMImplementationLS*)impl)->createDOMBuilder(DOMImplementationLS::MODE_SYNCHRONOUS, L"http://www.w3.org/2001/XMLSchema");
    if (theBuilder->canSetFeature(XMLUni::fgDOMNamespaces, true)) theBuilder->setFeature(XMLUni::fgDOMNamespaces, true);
    if (theBuilder->canSetFeature(XMLUni::fgDOMValidation, true)) theBuilder->setFeature(XMLUni::fgDOMValidation, true);
    if (theBuilder->canSetFeature(XMLUni::fgDOMDatatypeNormalization, true)) theBuilder->setFeature(XMLUni::fgDOMDatatypeNormalization, true);
    if (theBuilder->canSetFeature(XMLUni::fgXercesSchemaFullChecking, true)) theBuilder->setFeature(XMLUni::fgXercesSchemaFullChecking, true);
    myErrorHandler = new MyDOMErrorHandler();
    theBuilder->setErrorHandler(myErrorHandler);
    if (pSchema2use)
    {
        grammar = NULL;
        MemBufInputSource *myXSDInputSource = new MemBufInputSource((const XMLByte *const)gMySchema, (unsigned int) strlen(gMySchema), L"http://www.w3.org/2001/XMLSchema", false); // unaware about functionality of argument 3
        Wrapper4InputSource wrapper(myXSDInputSource); // will adopt myXSDInputSource
        grammar = theBuilder->loadGrammar(wrapper, Grammar::SchemaGrammarType);
    }
    return theBuilder;
}
... -- omitted other stuff --
static int buildDOMDocument(const XMLByte* const buf, const unsigned int uiSize, DOMDocument* &doc)
{
    DOMBuilder* theBuilder = NULL;
    MyDOMErrorHandler* myErrorHandler = NULL;
    Grammar *grammar = NULL;
    theBuilder = getDOMBuilder(myErrorHandler, gMySchema, grammar); // see above
    MemBufInputSource *myInputSource = new MemBufInputSource(buf, uiSize, L"TEST", false); // unaware about functionality of argument 3
    Wrapper4InputSource wrapper(myInputSource); // will adopt myInputSource
    int retVal = -1;
    try
    {
        DOMDocument* newDoc = theBuilder->parse(wrapper);
        retVal = (myErrorHandler->sawFatals() ? -9 : (myErrorHandler->sawErrors() ? -5 : 0));
        if (newDoc)
            doc = (DOMDocument*) newDoc->cloneNode(true);
    }
    catch (const XMLException& toCatch)
    {
        cout << "Exception message is: \n" << X2c(toCatch.getMessage()) << "\n";
    }
    catch (const DOMException& toCatch)
    {
        cout << "Exception message is: \n" << X2c(toCatch.msg) << "\n";
    }
    catch (...)
    {
        cout << "Unexpected Exception \n" ;
    }
    theBuilder->release();
    delete myErrorHandler;
    return retVal;
}
// the in-memory xml created previously via ...
...
DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(L"Range");
DOMDocumentType* doctyp= 0; // impl->createDocumentType(L"root", NULL, L"TEST");
DOMDocument* doc = impl->createDocument(L"urn:rtc:tgam:MNCfgTmpl", L"root", doctyp);
doc->setEncoding(L"UTF-8");
// doc->setDocumentURI(L"http://www.w3.org/2001/XMLSchema-instance");// xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ???
DOMElement* root = doc->getDocumentElement();
...
// ... and serialized from this DOMDocument via DOMWriter::writeNode() (taken from an example); the result looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?> <root xmlns="urn:rtc:tgam:MNCfgTmpl"> <FirstElement>aTextNode</FirstElement> <ThirdElement>aTextNode2</ThirdElement> </root>
</code>
