On Sun, Jun 28, 2020 at 06:01:27PM -0700, Alex Harsanyi wrote:
> I tested the your string port version and I also wrote a "string-append" 
> version of the xml reader and they are both slower by about 10-15% on my 
> machine, when compared to the current read-xml implementation which uses 
> `list->string`.  It looks like `list->string` is not the bottleneck here.

Odd -- to remove all that storage-allocaation overhead and to find it 
gets slower...

Perhaps the overhead lies in the Scheme interpreter?  Does it allocate 
lots of storage?
Would using chez racket help any?

-- hendrik
> 
> There are some small improvements that can be made from micro 
> optimizations.  For example, I changed `name-char?` to not use 
> `name-start?` but instead check for all chars, and I also changed 
> `lex-name` to construct the list in reverse and use `(list->string (reverse 
> chars))`, plus I reordered the cond condition to check the common case 
> first (that the next character is a name-char? and not a 'special one).  
> However, this resulted in only about 5-10% speed improvement, nowhere near 
> where the 4 time speedup when using `sxml`, as reported by John.
> 
> In the end, it may well be that speeding up `read-xml` can only be done by 
> these types of micro-optimizations.  Another thing I looked into was the 
> "pattern" used for reading: all the `read-xml` code will use the pattern of 
> "peeking" the next character, deciding if it is good, than reading it.  
> This is much slower than just reading the characters directly.  These are 
> the results from just reading in a 14Mb XML file:
> 
>     read-char only:  cpu time: 312 real time: 307 gc time: 0
>     read-char-or-special only:  cpu time: 750 real time: 741 gc time: 0
>     peek-char than read-char:  cpu time: 1234 real time: 1210 gc time: 0
>     peek-char-or-special than read-char-or-special:  cpu time: 1688 real 
> time: 1690 gc time: 0
> 
> Using this code:
> 
> (define file-name "your-test-file-here.xml")
> 
> (printf "read-char only~%")
> (collect-garbage 'major)
> (time
>  (call-with-input-file file-name
>    (lambda (in)
>      (let loop ([c (read-char in)])
>        (if (eof-object? c)
>            (void)
>            (loop (read-char in)))))))
> 
> (printf "read-char-or-special only~%")
> (collect-garbage 'major)
> (time
>  (call-with-input-file file-name
>    (lambda (in)
>      (let loop ([c (read-char-or-special in)])
>        (if (eof-object? c)
>            (void)
>            (loop (read-char-or-special in)))))))
> 
> (printf "peek-char than read-char~%")
> (collect-garbage 'major)
> (time
>  (call-with-input-file file-name
>   (lambda (in)
>     (let loop ([c (peek-char in)])
>       (if (eof-object? c)
>           (void)
>           (begin
>             (void (read-char in))
>             (loop (peek-char in))))))))
> 
> (printf "peek-char-or-special than read-char-or-special~%")
> (collect-garbage 'major)
> (time
>  (call-with-input-file file-name
>   (lambda (in)
>     (let loop ([c (peek-char-or-special in)])
>       (if (eof-object? c)
>           (void)
>           (begin
>             (void (read-char-or-special in))
>             (loop (peek-char-or-special in))))))))
> 
> Alex.
> 
> On Monday, June 29, 2020 at 5:30:43 AM UTC+8 [email protected] wrote:
> 
> > Thanks Alex for pointing out the use of list->string. I've created a PR (
> > https://github.com/racket/racket/pull/3275) that changes that code to use 
> > string ports instead (similar to Hendrik's suggestion, but the string port 
> > handles resizing automatically). Could someone (John?) with some large XML 
> > files lying around try the changes and see if they help?
> >
> > Ryan
> >
> >
> > On Sun, Jun 28, 2020 at 9:56 PM Neil Van Dyke <[email protected]> 
> > wrote:
> >
> >> If anyone wants to optimize `read-xml` for particular classes of use, 
> >> without changing the interface, it might be very helpful to run your 
> >> representative tests using the statistical profiler.
> >>
> >> The profiler text report takes a little while of tracing through 
> >> manually to get a feel for how to read and use it, but it can be 
> >> tremendously useful, and is worth learning to do if you need performance.
> >>
> >> After a first pass with that, you might also want to look at how costly 
> >> allocations/GC are, and maybe do some controlled experiments around 
> >> that.  For example, force a few GC cycles, run your workload under 
> >> profiler, check GC time during, and forced time after.  If you're 
> >> dealing with very large graphs coming out of the parser, I don't know 
> >> whether those are enough to matter with the current GC mechanism, but 
> >> maybe also check GC time while you're holding onto large graphs, when 
> >> you release them, and after they've been collected.  At some point, GC 
> >> gets hard for at least me to reason about, but some things make sense, 
> >> and other things you decide when to stop digging. :)  If you record all 
> >> your measurements, you can compare empirically the how different changes 
> >> to the code affect things, hopefully in representative situations.
> >>
> >> I went through a lot of these exercises to optimize a large system, and 
> >> sped up dynamic Web page loads dramatically in the usual case (to the 
> >> point we were then mainly limited by PostgreSQL query cost, not much by 
> >> the application code in Scheme, nor our request&response network I/O), 
> >> and also greatly reduced the pain of intermittent request latency spikes 
> >> due to GC.
> >>
> >> One of the hotspots, I did half a dozen very different implementations, 
> >> including C extension, and found an old-school pure Scheme 
> >> implementation was fastest.  I compared the performance of the 
> >> implementation using something like `shootout`, but there might be 
> >> better ways now in Racket. https://www.neilvandyke.org/racket/shootout/  
> >> I also found we could be much faster if we made a change to what the 
> >> algorithm guarantees, since it was more of a consistency check that 
> >> turned out to be very expensive and very redundant, due to all the ways 
> >> that utility code ended up being used.
> >>
> >> In addition to contrived experiments, I also rigged up a runtime option 
> >> so that the server would save data from the statistical profiler for 
> >> each request a Web server handled in production.  Which was tremendously 
> >> useful, since it gave us real-world examples that were also difficult to 
> >> synthesize (e.g., complex dynamic queries), and we could go from Web 
> >> logs and user feedback, to exactly what happened.
> >>
> >> (In that system I optimized, we used Oleg's SXML tools very heavily 
> >> throughout the system, plus some bespoke SXML tools for HTML and XML.  
> >> There was one case in which someone had accidentally used the `xml` 
> >> module, not knowing it was incompatible with the rest of the system, 
> >> which caused some strange failures (no static checking) before it was 
> >> discovered, and we changed that code to use SXML.)
> >>
> >> -- 
> >> You received this message because you are subscribed to the Google Groups 
> >> "Racket Users" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an 
> >> email to [email protected].
> >> To view this discussion on the web visit 
> >> https://groups.google.com/d/msgid/racket-users/68624c9a-df35-14a3-a912-df806799a7e0%40neilvandyke.org
> >> .
> >>
> >
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/racket-users/b663b6e8-ac63-4ecd-8212-c0175db5afden%40googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/20200629120946.jh5tk4ospbo3cu3v%40topoi.pooq.com.

Reply via email to