Re: [racket-users] Re: note about parsing speed of xml vs sxml?

Alex Harsanyi Sun, 28 Jun 2020 15:44:21 -0700

I suggested using `string-append` because, in my own performance 
investigations reading 100 MB+ CSV files, constructing short tokens with 
`string-append` was faster than using a string port. Perhaps string ports 
have a fixed overhead that makes `string-append` faster for short strings, 
but I don't know at what string length string ports become faster.


I think using string ports will definitely be faster than using 
`list->string`, but for the difference between `string-append` and string 
ports, some performance measurement might be needed.
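
As a starting point for such a measurement, here is a minimal sketch. It is 
not code from the `xml` library: the sample token, the iteration count, and 
the helper names (`via-list`, `via-string-append`, `via-string-port`, 
`time-it`) are all made up for illustration. It builds the same short token 
three ways and prints the elapsed time for each:

```racket
#lang racket/base
;; Hypothetical micro-benchmark; names and sizes are assumptions,
;; not taken from the `xml` collection or PR 3275.

(define chars (string->list "attribute-name")) ; sample short token
(define iterations 100000)                     ; arbitrary repeat count

;; 1. Accumulate characters in a list, then convert with list->string.
(define (via-list cs)
  (let loop ([cs cs] [acc '()])
    (if (null? cs)
        (list->string (reverse acc))
        (loop (cdr cs) (cons (car cs) acc)))))

;; 2. Grow the token one character at a time with string-append.
(define (via-string-append cs)
  (for/fold ([s ""]) ([c (in-list cs)])
    (string-append s (string c))))

;; 3. Write characters to a string port and extract the result.
(define (via-string-port cs)
  (define out (open-output-string))
  (for ([c (in-list cs)])
    (write-char c out))
  (get-output-string out))

;; Run one strategy `iterations` times and print wall-clock milliseconds.
(define (time-it label build)
  (collect-garbage) ; start each run from a similar heap state
  (define start (current-inexact-milliseconds))
  (for ([i (in-range iterations)])
    (build chars))
  (printf "~a: ~a ms\n" label (- (current-inexact-milliseconds) start)))

(time-it "list->string " via-list)
(time-it "string-append" via-string-append)
(time-it "string port  " via-string-port)
```

Replacing the sample token with a much longer string should show where, if 
anywhere, string ports overtake `string-append` on a given machine.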

Thanks for looking into this,
Alex.

On Monday, June 29, 2020 at 5:30:43 AM UTC+8 [email protected] wrote:

> Thanks Alex for pointing out the use of list->string. I've created a PR (
> https://github.com/racket/racket/pull/3275) that changes that code to use 
> string ports instead (similar to Hendrik's suggestion, but the string port 
> handles resizing automatically). Could someone (John?) with some large XML 
> files lying around try the changes and see if they help?
>
> Ryan
>
>
> On Sun, Jun 28, 2020 at 9:56 PM Neil Van Dyke <[email protected]> 
> wrote:
>
>> If anyone wants to optimize `read-xml` for particular classes of use, 
>> without changing the interface, it might be very helpful to run your 
>> representative tests using the statistical profiler.
>>
>> The profiler text report takes a little while of tracing through 
>> manually to get a feel for how to read and use it, but it can be 
>> tremendously useful, and is worth learning to do if you need performance.
>>
>> After a first pass with that, you might also want to look at how costly 
>> allocations/GC are, and maybe do some controlled experiments around 
>> that.  For example, force a few GC cycles, run your workload under 
>> profiler, check GC time during, and forced time after.  If you're 
>> dealing with very large graphs coming out of the parser, I don't know 
>> whether those are enough to matter with the current GC mechanism, but 
>> maybe also check GC time while you're holding onto large graphs, when 
>> you release them, and after they've been collected.  At some point, GC 
>> gets hard for at least me to reason about, but some things make sense, 
>> and other things you decide when to stop digging. :)  If you record all 
>> your measurements, you can compare empirically how different changes 
>> to the code affect things, hopefully in representative situations.
>>
>> I went through a lot of these exercises to optimize a large system, and 
>> sped up dynamic Web page loads dramatically in the usual case (to the 
>> point we were then mainly limited by PostgreSQL query cost, not much by 
>> the application code in Scheme, nor our request&response network I/O), 
>> and also greatly reduced the pain of intermittent request latency spikes 
>> due to GC.
>>
>> For one of the hotspots, I wrote half a dozen very different 
>> implementations, including a C extension, and found that an old-school 
>> pure Scheme implementation was fastest.  I compared the performance of 
>> the implementations using something like `shootout`, but there might be 
>> better ways now in Racket. https://www.neilvandyke.org/racket/shootout/  
>> I also found we could be much faster if we made a change to what the 
>> algorithm guarantees, since it was more of a consistency check that 
>> turned out to be very expensive and very redundant, due to all the ways 
>> that utility code ended up being used.
>>
>> In addition to contrived experiments, I also rigged up a runtime option 
>> so that the server would save data from the statistical profiler for 
>> each request a Web server handled in production.  That was tremendously 
>> useful, since it gave us real-world examples that were also difficult to 
>> synthesize (e.g., complex dynamic queries), and we could go from Web 
>> logs and user feedback, to exactly what happened.
>>
>> (In that system I optimized, we used Oleg's SXML tools very heavily 
>> throughout the system, plus some bespoke SXML tools for HTML and XML.  
>> There was one case in which someone had accidentally used the `xml` 
>> module, not knowing it was incompatible with the rest of the system, 
>> which caused some strange failures (no static checking) before it was 
>> discovered, and we changed that code to use SXML.)
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/racket-users/68624c9a-df35-14a3-a912-df806799a7e0%40neilvandyke.org
>> .
>>
>

