I suggested using `string-append` because, in my own performance investigations reading 100 MB+ CSV files, constructing short tokens with `string-append` was faster than using a string port. Perhaps string ports have a fixed overhead that makes `string-append` faster for short strings, but I don't know at what string length string ports become faster.
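A rough way to look for that crossover point is a small micro-benchmark along these lines. This is only a sketch: the `build/...` and `time-builder` helpers, the token lengths, and the repetition count are all made up for illustration, and it is not the code from `read-xml` or from the CSV reader. Note that `build/list->string` receives a ready-made list, so it does not pay for the consing and reversing a real parser would do.

```racket
#lang racket/base

;; Three ways to build a token from a sequence of characters.
;; All helper names below are hypothetical, chosen for this sketch.

;; Grow the token by appending one-character strings.
(define (build/string-append chars)
  (for/fold ([acc ""]) ([c (in-list chars)])
    (string-append acc (string c))))

;; Write each character to a string port, then extract the result.
(define (build/string-port chars)
  (define out (open-output-string))
  (for ([c (in-list chars)])
    (write-char c out))
  (get-output-string out))

;; Convert an already-built list of characters in one step.
(define (build/list->string chars)
  (list->string chars))

;; Time `n` repetitions of `build` on a token of `len` characters.
(define (time-builder label build len n)
  (define chars (for/list ([i (in-range len)]) #\a))
  (collect-garbage) ; start each measurement from a similar heap state
  (define-values (results cpu real gc)
    (time-apply (lambda () (for ([i (in-range n)]) (build chars))) '()))
  (printf "~a len=~a: cpu ~a ms, real ~a ms, gc ~a ms\n"
          label len cpu real gc))

(module+ main
  (for ([len (in-list '(4 16 64 256))])
    (time-builder "string-append" build/string-append len 5000)
    (time-builder "string port  " build/string-port len 5000)
    (time-builder "list->string " build/list->string len 5000)))
```

Run with `racket` on the file. The numbers will depend on the Racket version and machine, so they are best read as relative comparisons only.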
I think using string ports will definitely be faster than using `list->string`, but for the difference between `string-append` and string ports, some performance measurement might be needed.

Thanks for looking into this,
Alex.

On Monday, June 29, 2020 at 5:30:43 AM UTC+8 [email protected] wrote:

> Thanks Alex for pointing out the use of list->string. I've created a PR
> (https://github.com/racket/racket/pull/3275) that changes that code to use
> string ports instead (similar to Hendrik's suggestion, but the string port
> handles resizing automatically). Could someone (John?) with some large XML
> files lying around try the changes and see if they help?
>
> Ryan
>
>
> On Sun, Jun 28, 2020 at 9:56 PM Neil Van Dyke <[email protected]>
> wrote:
>
>> If anyone wants to optimize `read-xml` for particular classes of use,
>> without changing the interface, it might be very helpful to run your
>> representative tests using the statistical profiler.
>>
>> The profiler text report takes a little while of tracing through
>> manually to get a feel for how to read and use it, but it can be
>> tremendously useful, and is worth learning to do if you need performance.
>>
>> After a first pass with that, you might also want to look at how costly
>> allocations/GC are, and maybe do some controlled experiments around
>> that. For example, force a few GC cycles, run your workload under
>> profiler, check GC time during, and forced time after. If you're
>> dealing with very large graphs coming out of the parser, I don't know
>> whether those are enough to matter with the current GC mechanism, but
>> maybe also check GC time while you're holding onto large graphs, when
>> you release them, and after they've been collected. At some point, GC
>> gets hard for at least me to reason about, but some things make sense,
>> and other things you decide when to stop digging.
>> :) If you record all your measurements, you can compare empirically how
>> different changes to the code affect things, hopefully in representative
>> situations.
>>
>> I went through a lot of these exercises to optimize a large system, and
>> sped up dynamic Web page loads dramatically in the usual case (to the
>> point we were then mainly limited by PostgreSQL query cost, not much by
>> the application code in Scheme, nor our request&response network I/O),
>> and also greatly reduced the pain of intermittent request latency spikes
>> due to GC.
>>
>> One of the hotspots, I did half a dozen very different implementations,
>> including C extension, and found an old-school pure Scheme
>> implementation was fastest. I compared the performance of the
>> implementation using something like `shootout`, but there might be
>> better ways now in Racket. https://www.neilvandyke.org/racket/shootout/
>> I also found we could be much faster if we made a change to what the
>> algorithm guarantees, since it was more of a consistency check that
>> turned out to be very expensive and very redundant, due to all the ways
>> that utility code ended up being used.
>>
>> In addition to contrived experiments, I also rigged up a runtime option
>> so that the server would save data from the statistical profiler for
>> each request a Web server handled in production. Which was tremendously
>> useful, since it gave us real-world examples that were also difficult to
>> synthesize (e.g., complex dynamic queries), and we could go from Web
>> logs and user feedback, to exactly what happened.
>>
>> (In that system I optimized, we used Oleg's SXML tools very heavily
>> throughout the system, plus some bespoke SXML tools for HTML and XML.
>> There was one case in which someone had accidentally used the `xml`
>> module, not knowing it was incompatible with the rest of the system,
>> which caused some strange failures (no static checking) before it was
>> discovered, and we changed that code to use SXML.)
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/racket-users/68624c9a-df35-14a3-a912-df806799a7e0%40neilvandyke.org
>>
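The measurements Neil describes in the quoted message (force a few GC cycles, run the workload, check GC time during and after, then profile) might look roughly like this in Racket. It is a sketch under assumptions: the synthetic input from `make-xml-string`, the `parse` and `measure` helpers, and the document size are invented here, and a real test should use representative XML files instead.

```racket
#lang racket/base
(require profile xml)

;; Hypothetical workload: a synthetic XML document with `n` small elements.
(define (make-xml-string n)
  (define out (open-output-string))
  (write-string "<root>" out)
  (for ([i (in-range n)])
    (fprintf out "<item id=\"~a\">value ~a</item>" i i))
  (write-string "</root>" out)
  (get-output-string out))

;; Parse a string with `read-xml` from the `xml` library.
(define (parse str)
  (read-xml (open-input-string str)))

;; Run `thunk`, print elapsed wall-clock time and GC time, return its result.
(define (measure label thunk)
  (define gc0 (current-gc-milliseconds))
  (define t0 (current-inexact-milliseconds))
  (define result (thunk))
  (printf "~a: ~a ms real, ~a ms in GC\n"
          label
          (round (- (current-inexact-milliseconds) t0))
          (- (current-gc-milliseconds) gc0))
  result)

(module+ main
  (define input (make-xml-string 20000))
  ;; Force a few collections so earlier garbage does not skew the numbers.
  (for ([i (in-range 3)]) (collect-garbage))
  ;; GC time during the parse; the box keeps the parsed document reachable.
  (define holder (box (measure "parse" (lambda () (parse input)))))
  (printf "root element: ~a\n"
          (element-name (document-element (unbox holder))))
  ;; Forced collections while holding the document, after releasing it,
  ;; and once more after it should already have been collected.
  (measure "forced GC, document held" collect-garbage)
  (set-box! holder #f)
  (measure "forced GC, document released" collect-garbage)
  (measure "forced GC, after collection" collect-garbage)
  ;; Statistical profile of the same workload; prints a text report.
  (profile-thunk (lambda () (parse input)) #:delay 0.001))
```

Recording the printed numbers before and after a change (for example, with and without the string port PR) gives the kind of empirical comparison suggested in the quoted message.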

