Re: [racket-users] Re: note about parsing speed of xml vs sxml?

Alex Harsanyi Mon, 29 Jun 2020 18:24:59 -0700


On Tuesday, June 30, 2020 at 7:48:14 AM UTC+8 Neil Van Dyke wrote:


> Is even 2x speedup helpful for your purpose? 


Yes, it is, and for my purposes `read-xml` is fine even without any speed 
improvement.  In the sports field, XML (via the TCX format) is a legacy 
technology.  Typical TCX files are about 1 MB in size; the 14 MB one is 
very large.  Setting `xml-count-bytes` to #t while calling `read-xml` 
gets me a speed improvement for little effort, but it is not worth adding 
another package dependency just to support a legacy technology.
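
For reference, a minimal sketch of that setting, assuming the `xml` library 
that ships with Racket.  The helper name `read-xml/bytes` and the tiny input 
string are made up for illustration; only `xml-count-bytes`, `read-xml`, 
`document-element` and `xml->xexpr` come from the library.

```racket
#lang racket/base
(require xml)

;; read-xml/bytes is a hypothetical helper name, not part of the xml library.
;; It calls read-xml with xml-count-bytes set to #t for the duration of the
;; call, so the parser tracks byte offsets instead of character positions.
(define (read-xml/bytes in)
  (parameterize ([xml-count-bytes #t])
    (read-xml in)))

;; A tiny stand-in document; a real TCX file would be read with
;; call-with-input-file instead of open-input-string.
(define doc
  (read-xml/bytes (open-input-string "<a><b>hi</b></a>")))

;; Convert the root element to an X-expression for easier inspection.
(define root-xexpr (xml->xexpr (document-element doc)))
```

Using `parameterize` keeps the setting local to the one call, so other code 
that relies on character-based source locations is not affected.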

> 3 seconds is one old magic 
> number for user patience in HCI, so I suppose there's still a big 
> difference between 4 seconds and almost 10 seconds? 

I am not sure where you got the 3 seconds from, but even 3 seconds is too 
long to wait on a button callback.  For large files, both read-xml and sxml 
would need to have a progress dialog with a cancel button, or some other 
form of user feedback, if one wants to make a "well behaved" GUI.
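
As a rough sketch of the non-GUI half of that idea: the parse can run on its 
own Racket thread so that the caller stays free to update a progress display 
or react to a cancel request.  The name `start-parse` and the 5 second 
timeout are assumptions for illustration, and the dialog itself is left out.

```racket
#lang racket/base
(require xml)

;; start-parse is a hypothetical helper.  It runs read-xml on a separate
;; Racket thread and returns two values: the worker thread (kill it to
;; cancel the parse) and a channel that receives the parsed document.
(define (start-parse in)
  (define result-ch (make-channel))
  (define worker
    (thread (lambda ()
              (channel-put result-ch (read-xml in)))))
  (values worker result-ch))

(define-values (worker result-ch)
  (start-parse (open-input-string "<a>ok</a>")))

;; Wait up to 5 seconds for the result; sync/timeout returns #f on timeout.
(define doc (sync/timeout 5 result-ch))

;; On timeout (or when the user presses "Cancel"), stop the worker.
(unless doc
  (kill-thread worker))
```

In an actual GUI the button callback would return right after `start-parse` 
and the result would be picked up later (for example from a timer), rather 
than blocking in `sync/timeout` as this sketch does.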
 

> For large (and absolutely massive) XML... SSAX can shine even better 
> than in this comparison, since you can, say, populate a database *while 
> you're parsing, without first constructing the intermediate 
> representation* of xexpr or SXML.  GC-wise, with the database-populating 
> scenario, you'll probably end up with small, little-referencing, local, 
> short-lived allocations.  Besides GC costs, you'll also use less RAM 
> (possibly lower AWS bill), and be less likely to push into swap (which 
> would be bad for performance). 
>

... if you are willing to deal with the complexity of a SAX interface, that 
is.  I have written code for parsing documents (correctly!) using a SAX 
interface, and the resulting code was so complex that I had to use a code 
generator for it, but yes, the resulting code was very fast.   Would I do 
it again? No.

The complexity of SAX parsing is probably why most people use a DOM style 
interface...
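
To illustrate why the DOM style is attractive, here is a small sketch that 
parses a whole document into an X-expression and then pulls values out with 
`se-path*/list` from `xml/path`.  The element names are a simplified, 
made-up stand-in loosely modelled on TCX, not the real schema.

```racket
#lang racket/base
(require xml xml/path)

;; A made-up, simplified stand-in for heart-rate samples in a track.
(define tcx-like
  (string-append
   "<Track>"
   "<Trackpoint><Value>142</Value></Trackpoint>"
   "<Trackpoint><Value>150</Value></Trackpoint>"
   "</Track>"))

;; Parse the whole document up front, then convert the root element to an
;; X-expression.  Empty attribute lists are dropped to keep the tree compact.
(define track
  (parameterize ([xexpr-drop-empty-attributes #t])
    (xml->xexpr
     (document-element (read-xml (open-input-string tcx-like))))))

;; Collect the contents of every Value element, wherever it occurs.
(define heart-rates (se-path*/list '(Value) track))
```

The query is a single line because the whole tree is already in memory; a 
SAX-style handler would instead have to track which element it is currently 
inside as the events arrive.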
 

> In addition to SSAX's current performance characteristics and 
> opportunities... There might also be opportunity to optimize SSAX 
> significantly for Racket. Oleg is a famously capable Scheme programmer, 
> but he was writing SSAX in fairly portable Scheme code, a couple decades 
> ago, when he wrote SSAX.  I did an initial packaging of SSAX for PLT 
> Scheme, Kirill Lisovsky later did many packagings of various SXML-ish 
> tools (including his own), and then John Clements did more work to 
> package Oleg's SXML-ish tools for Racket... But I don't know that anyone 
> has had motivation to try to optimize Racket's SSAX port, using current 
> Racket features, and tuning for current performance characteristics. 
>

> Side note regarding performance comparison... FWIW, SSAX might be doing 
> some things `read-xml` doesn't, such as namespace resolution, entity 
> reference resolution, and some validation. 
>

You used the phrase "might be doing...".  Does that mean that it might not 
do those things?

Alex.

 

