Hi, thanks for the email.
My answers to your questions:
1. It is a tradeoff: VTD-XML consumes more memory, but
is easier to use and more powerful. Any XML processing API capable of random access *needs* to at least load the entire hierarchical structure into memory. My take is that compared with SAX, StAX, DOM,
and JDOM, VTD-XML is the one least likely to choke, and the best
at handling peak loads...
2. Agreed, benchmarking against a dummy SAX parser is unfair to VTD-XML;
in a real-life scenario VTD-XML would look even better.
3. Looking at all the vertical-industry XML vocabularies, SOAP,
REST, XML Schema, and the Infoset data model, DTDs seem a bit
deprecated, and VTD-XML doesn't support external entities. Other than that,
VTD-XML is equally capable.
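To put a rough number on the tradeoff in point 1: VTD-XML keeps the raw document in memory plus one 64-bit VTD record per token, so a back-of-envelope sketch looks like this (the token count below is an illustrative assumption, not a measurement):

```java
public class VtdMemoryEstimate {
    // Rough model of VTD-XML's footprint: the raw document stays in
    // memory, plus one 64-bit VTD record per token (per the VTD-XML
    // docs). Actual token counts vary by document shape.
    static long estimateBytes(long docBytes, long tokenCount) {
        return docBytes + 8L * tokenCount;
    }

    public static void main(String[] args) {
        // e.g. a 1 MB document with an assumed ~50k tokens:
        long est = estimateBytes(1_000_000L, 50_000L);
        System.out.println(est + " bytes, roughly 1.4x the document size");
    }
}
```

That multiplier stays well below what a fully extracting tree model pays per node.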
Cheers,
jz



----- Original Message ----- From: "Stefano Mazzocchi" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Sunday, February 19, 2006 8:57 PM
Subject: Re: [ANN] VTD-XML Version 1.5 Released

Hmmmm, I have to admit that I've toyed with this idea myself lately, especially since I'm diving deep into processing large quantities of XML files these days (and when I say 'large', I mean it: so large that 32 bits of address space are not enough).

The idea of non-extracting parsing is nice, but there are a few issues:

1) the memory requirements: still much less than DOM's, but still *way* more than an event-driven model like SAX. Cocoon, for example, would die if we were to move to a parser like this one, especially under load spikes.

2) benchmarking against a dummy SAX content handler is completely meaningless. In order for the API to be of any use, you have to create strings; you can't simply pass pointers to char arrays around. I bet that if the SAX parser could get away without creating strings, it would be just as fast (Xerces, in fact, uses a similar mechanism for the characters() SAX event: the entire document is kept in memory, and start/finish offsets are passed instead of a new array).
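For concreteness, that mechanism is visible in the SAX contract itself: characters() hands the handler a (buffer, start, length) triple, and a String is only allocated if the handler asks for one. A minimal JAXP sketch (class names are made up) contrasting the two styles:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class CharsDemo {
    // Offset-based handler: copies characters straight out of the
    // parser's buffer; no String object is allocated per event.
    static class NoAllocHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // The handler a real application effectively runs: one String
    // allocation per characters() event -- this is the hidden cost
    // a dummy benchmark never pays.
    static class AllocHandler extends DefaultHandler {
        int totalChars = 0;
        @Override
        public void characters(char[] ch, int start, int length) {
            totalChars += new String(ch, start, length).length();
        }
    }

    static String collectText(String xml) throws Exception {
        NoAllocHandler h = new NoAllocHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
        return h.text.toString();
    }
}
```

Swap one handler for the other in a benchmark and you're measuring two very different workloads.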

3) 90% of the slowness comes from 10% of the details in the XML spec, which means that in order to stay fast you need to sacrifice compliance... which is not an option these days, given how cheap silicon is.

But don't get me wrong, I think there is something interesting in what you are doing: I think it would be cool if you could serialize the 'tree index' alongside the document on disk and provide some sort of b-tree indexing for it. It would help me in my multi-GB-of-XML day-to-day struggle.

You claim XPath random access, but what is the algorithmic complexity of that? O(1), O(log n), O(n), O(n log n)? If one were to store the parsed tree index on disk, how many pages would one need to page in before reaching the required XPath target?

--
Stefano.