I managed to get it about as fast as Python by making it really imperative and rolling my own hash:
https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571 Sam Tobin-Hochstadt writes: > Here are several variants of the code: > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9 > > The enabled version is about the fastest I can get without using > `unsafe` (which the rules said not to do). It's possible to optimize a > tiny bit more by avoiding sorting, but only a few milliseconds -- it > would be more significant if there were more different words. > > Switching to bytes works correctly for the given task, but wouldn't > always work in the case of general UTF8 input. But those versions > appeared not be faster for me. Also, writing my own string-downcase > didn't help. And using a big buffer and doing my own newline splitting > didn't help either. > > The version using just a regexp matching on a port (suggested by > Robby) turned out not to be faster either, so my suspicion is that the > original slowness is just using regexps for splitting words. > > Sam > > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt > <[email protected]> wrote: >> >> Here's a somewhat-optimized version of the code: >> >> #lang racket/base >> (require racket/string racket/vector racket/port) >> >> (define h (make-hash)) >> >> (time >> (for* ([l (in-lines)] >> [w (in-list (string-split l))] >> [w* (in-value (string-downcase w))]) >> (hash-update! h w* add1 0))) >> >> (define v >> (time >> (for/vector #:length (hash-count h) >> ([(k v) (in-hash h)]) >> (cons k v)))) >> (time (vector-sort! v > #:key cdr)) >> (define p (current-output-port) #;(open-output-nowhere)) >> (time >> (for ([pair (in-vector v)]) >> (write-string (car pair) p) >> (write-string (number->string (cdr pair)) p) >> (newline p))) >> >> It's much more imperative, but also pretty nice and compact. The >> `printf` optimization is significant for that portion of the program, >> but that isn't much of the running time. The overall running time for >> 10 copies of the KJV is about 9 seconds on my laptop. >> >> I think the remaining difference between Racket and other languages is >> likely the `string-split` and `string-downcase` functions, plus the >> relatively-inefficient string representation that Racket uses. >> >> Sam >> >> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski <[email protected]> >> wrote: >> > >> > Hi David, >> > >> > Yes, the 21 seconds includes the interpreter startup time. I have done a >> > simple test to see how long it takes: >> > >> > $ time racket -e '(displayln "Hello, world")' >> > Hello, world >> > >> > real 0m0.479s >> > user 0m0.449s >> > sys 0m0.030s >> > >> > I have also put my code inside a main function and profiled it: >> > >> > Profiling results >> > ----------------- >> > Total cpu time observed: 20910ms (out of 20970ms) >> > Number of samples taken: 382 (once every 55ms) >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden) >> > >> > ============================================================== >> > Caller >> > Idx Total Self Name+src Local% >> > ms(pct) ms(pct) Callee >> > ============================================================== >> > [1] 20910(100.0%) 0(0.0%) [running body] >> > ...word-occurences-profile.rkt":##f >> > profile-thunk [2] 100.0% >> > -------------------------------------------------------------- >> > [running body] [1] 100.0% >> > [2] 20910(100.0%) 0(0.0%) profile-thunk >> > ...ket/pkgs/profile-lib/main.rkt:9:0 >> > run [3] 100.0% >> > -------------------------------------------------------------- >> > profile-thunk [2] 100.0% >> > [3] 20910(100.0%) 0(0.0%) run >> > ...share/racket/pkgs/profile-lib/main.rkt:39:2 >> > main [4] 100.0% >> > -------------------------------------------------------------- >> > run [3] 100.0% >> > [4] 20910(100.0%) 50(0.2%) main >> > ...cket/count-word-occurences-profile.rkt:5:0 >> > read-from-stdin-it [5] 98.5% >> > ??? [6] 0.2% >> > -------------------------------------------------------------- >> > main [4] 100.0% >> > [5] 20606(98.5%) 11796(56.4%) read-from-stdin-it >> > ...-occurences-profile.rkt:19:6 >> > internal-split [7] 42.8% >> > -------------------------------------------------------------- >> > main [4] 100.0% >> > [6] 51(0.2%) 0(0.0%) ??? >> > ...cket/collects/racket/private/sort.rkt:369:3 >> > generic-sort/key [8] 100.0% >> > -------------------------------------------------------------- >> > read-from-stdin-it [5]100.0% >> > [7] 8810(42.1%) 3528(16.9%) internal-split >> > ...collects/racket/string.rkt:117:0 >> > regexp-split [9] 59.9% >> > -------------------------------------------------------------- >> > ??? [6] 100.0% >> > [8] 51(0.2%) 0(0.0%) generic-sort/key >> > .../racket/private/sort.rkt:156:2 >> > copying-mergesort [10]100.0% >> > -------------------------------------------------------------- >> > internal-split [7] 100.0% >> > [9] 5282(25.3%) 2810(13.4%) regexp-split >> > ...ts/racket/private/string.rkt:338:2 >> > loop [11] 46.8% >> > -------------------------------------------------------------- >> > generic-sort/key [8] 10.0% >> > copying-mergesort [10] 90.0% >> > [10] 51(0.2%) 51(0.2%) copying-mergesort >> > ...racket/private/sort.rkt:129:8 >> > copying-mergesort [10] 90.0% >> > -------------------------------------------------------------- >> > regexp-split [9] 100.0% >> > [11] 2471(11.8%) 2471(11.8%) loop >> > ...t/collects/racket/private/string.rkt:169:7 >> > -------------------------------------------------------------- >> > >> > Kind regards, >> > Pawel >> > >> > >> > On Thursday, March 18, 2021 at 2:09:35 PM UTC [email protected] wrote: >> >> >> >> Hi Pawel, >> >> >> >> I'll take a look at the code later, but did that 21 seconds include >> >> startup time for the interpreter? >> >> >> >> On Thu, Mar 18, 2021, 9:24 AM Pawel Mosakowski <[email protected]> >> >> wrote: >> >>> >> >>> Hello, >> >>> >> >>> I am a Racket beginner and I have come across this article: >> >>> >> >>> https://benhoyt.com/writings/count-words/ >> >>> >> >>> This is my attempt at solving the challenge: >> >>> >> >>> https://pastebin.com/kL16w5Hc >> >>> >> >>> However when I have benchmarked it, it takes ~21 seconds to run compared >> >>> to the Python and Ruby versions which take around 3-4 seconds. >> >>> >> >>> I understand that both Ruby and Python probably have the string >> >>> operations and hash tables implemented in optimized C but is there >> >>> anything I can do to improve performance of my program? >> >>> >> >>> Many thanks for all help and suggestions. >> >>> >> >>> -- >> >>> You received this message because you are subscribed to the Google >> >>> Groups "Racket Users" group. >> >>> To unsubscribe from this group and stop receiving emails from it, send >> >>> an email to [email protected]. >> >>> To view this discussion on the web visit >> >>> https://groups.google.com/d/msgid/racket-users/118c1340-66d1-421d-92a4-6b66c56cb88fn%40googlegroups.com. >> > >> > -- >> > You received this message because you are subscribed to the Google Groups >> > "Racket Users" group. >> > To unsubscribe from this group and stop receiving emails from it, send an >> > email to [email protected]. >> > To view this discussion on the web visit >> > https://groups.google.com/d/msgid/racket-users/09c58a34-bd2d-49e7-bfbd-d3253c1e6dd1n%40googlegroups.com. -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/m2wnu42rol.fsf%40192.168.0.142.

