Re: [racket-users] Word Count program/benchmark performance

Bogdan Popa Thu, 18 Mar 2021 12:22:19 -0700

I managed to get it about as fast as Python by making it really
imperative and rolling my own hash:


https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571

Sam Tobin-Hochstadt writes:

> Here are several variants of the code:
> https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>
> The enabled version is about the fastest I can get without using
> `unsafe` (which the rules said not to do). It's possible to optimize a
> tiny bit more by avoiding sorting, but only a few milliseconds -- it
> would be more significant if there were more different words.
>
> Switching to bytes works correctly for the given task, but wouldn't
> always work in the case of general UTF8 input. But those versions
> appeared not be faster for me. Also, writing my own string-downcase
> didn't help. And using a big buffer and doing my own newline splitting
> didn't help either.
>
> The version using just a regexp matching on a port (suggested by
> Robby) turned out not to be faster either, so my suspicion is that the
> original slowness is just using regexps for splitting words.
>
> Sam
>
> On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
> <[email protected]> wrote:
>>
>> Here's a somewhat-optimized version of the code:
>>
>> #lang racket/base
>> (require racket/string racket/vector racket/port)
>>
>> (define h (make-hash))
>>
>> (time
>>  (for* ([l (in-lines)]
>>         [w (in-list (string-split l))]
>>         [w* (in-value (string-downcase w))])
>>    (hash-update! h w* add1 0)))
>>
>> (define v
>>   (time
>>    (for/vector #:length (hash-count h)
>>        ([(k v) (in-hash h)])
>>      (cons k v))))
>> (time (vector-sort! v > #:key cdr))
>> (define p (current-output-port) #;(open-output-nowhere))
>> (time
>>  (for ([pair (in-vector v)])
>>    (write-string (car pair) p)
>>    (write-string (number->string (cdr pair)) p)
>>    (newline p)))
>>
>> It's much more imperative, but also pretty nice and compact. The
>> `printf` optimization is significant for that portion of the program,
>> but that isn't much of the running time. The overall running time for
>> 10 copies of the KJV is about 9 seconds on my laptop.
>>
>> I think the remaining difference between Racket and other languages is
>> likely the `string-split` and `string-downcase` functions, plus the
>> relatively-inefficient string representation that Racket uses.
>>
>> Sam
>>
>>
>> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski <[email protected]> 
>> wrote:
>> >
>> > Hi David,
>> >
>> > Yes, the 21 seconds includes the interpreter startup time. I have done a 
>> > simple test to see how long it takes:
>> >
>> > $ time racket -e '(displayln "Hello, world")'
>> > Hello, world
>> >
>> > real    0m0.479s
>> > user    0m0.449s
>> > sys    0m0.030s
>> >
>> > I have also put my code inside a main function and profiled it:
>> >
>> > Profiling results
>> > -----------------
>> >   Total cpu time observed: 20910ms (out of 20970ms)
>> >   Number of samples taken: 382 (once every 55ms)
>> >   (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
>> >
>> > ==============================================================
>> >                                   Caller
>> >  Idx    Total         Self      Name+src                Local%
>> >         ms(pct)       ms(pct)     Callee
>> > ==============================================================
>> >  [1] 20910(100.0%)     0(0.0%)  [running body] 
>> > ...word-occurences-profile.rkt":##f
>> >                                   profile-thunk [2]     100.0%
>> > --------------------------------------------------------------
>> >                                   [running body] [1]    100.0%
>> >  [2] 20910(100.0%)     0(0.0%)  profile-thunk 
>> > ...ket/pkgs/profile-lib/main.rkt:9:0
>> >                                   run [3]               100.0%
>> > --------------------------------------------------------------
>> >                                   profile-thunk [2]     100.0%
>> >  [3] 20910(100.0%)     0(0.0%)  run 
>> > ...share/racket/pkgs/profile-lib/main.rkt:39:2
>> >                                   main [4]              100.0%
>> > --------------------------------------------------------------
>> >                                   run [3]               100.0%
>> >  [4] 20910(100.0%)    50(0.2%)  main 
>> > ...cket/count-word-occurences-profile.rkt:5:0
>> >                                   read-from-stdin-it [5] 98.5%
>> >                                   ??? [6]                 0.2%
>> > --------------------------------------------------------------
>> >                                   main [4]              100.0%
>> >  [5] 20606(98.5%)  11796(56.4%) read-from-stdin-it 
>> > ...-occurences-profile.rkt:19:6
>> >                                   internal-split [7]     42.8%
>> > --------------------------------------------------------------
>> >                                   main [4]              100.0%
>> >  [6]    51(0.2%)       0(0.0%)  ??? 
>> > ...cket/collects/racket/private/sort.rkt:369:3
>> >                                   generic-sort/key [8]  100.0%
>> > --------------------------------------------------------------
>> >                                   read-from-stdin-it [5]100.0%
>> >  [7]  8810(42.1%)   3528(16.9%) internal-split 
>> > ...collects/racket/string.rkt:117:0
>> >                                   regexp-split [9]       59.9%
>> > --------------------------------------------------------------
>> >                                   ??? [6]               100.0%
>> >  [8]    51(0.2%)       0(0.0%)  generic-sort/key 
>> > .../racket/private/sort.rkt:156:2
>> >                                   copying-mergesort [10]100.0%
>> > --------------------------------------------------------------
>> >                                   internal-split [7]    100.0%
>> >  [9]  5282(25.3%)   2810(13.4%) regexp-split 
>> > ...ts/racket/private/string.rkt:338:2
>> >                                   loop [11]              46.8%
>> > --------------------------------------------------------------
>> >                                   generic-sort/key [8]   10.0%
>> >                                   copying-mergesort [10] 90.0%
>> > [10]    51(0.2%)      51(0.2%)  copying-mergesort 
>> > ...racket/private/sort.rkt:129:8
>> >                                   copying-mergesort [10] 90.0%
>> > --------------------------------------------------------------
>> >                                   regexp-split [9]      100.0%
>> > [11]  2471(11.8%)   2471(11.8%) loop 
>> > ...t/collects/racket/private/string.rkt:169:7
>> > --------------------------------------------------------------
>> >
>> > Kind regards,
>> > Pawel
>> >
>> >
>> > On Thursday, March 18, 2021 at 2:09:35 PM UTC [email protected] wrote:
>> >>
>> >> Hi Pawel,
>> >>
>> >> I'll take a look at the code later, but did that 21 seconds include 
>> >> startup time for the interpreter?
>> >>
>> >> On Thu, Mar 18, 2021, 9:24 AM Pawel Mosakowski <[email protected]> 
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I am a Racket beginner and I have come across this article:
>> >>>
>> >>> https://benhoyt.com/writings/count-words/
>> >>>
>> >>> This is my attempt at solving the challenge:
>> >>>
>> >>> https://pastebin.com/kL16w5Hc
>> >>>
>> >>> However when I have benchmarked it, it takes ~21 seconds to run compared 
>> >>> to the Python and Ruby versions which take around 3-4 seconds.
>> >>>
>> >>> I understand that both Ruby and Python probably have the string 
>> >>> operations and hash tables implemented in optimized C but is there 
>> >>> anything I can do to improve performance of my program?
>> >>>
>> >>> Many thanks for all help and suggestions.
>> >>>
>> >>> --
>> >>> You received this message because you are subscribed to the Google 
>> >>> Groups "Racket Users" group.
>> >>> To unsubscribe from this group and stop receiving emails from it, send 
>> >>> an email to [email protected].
>> >>> To view this discussion on the web visit 
>> >>> https://groups.google.com/d/msgid/racket-users/118c1340-66d1-421d-92a4-6b66c56cb88fn%40googlegroups.com.
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups 
>> > "Racket Users" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an 
>> > email to [email protected].
>> > To view this discussion on the web visit 
>> > https://groups.google.com/d/msgid/racket-users/09c58a34-bd2d-49e7-bfbd-d3253c1e6dd1n%40googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/m2wnu42rol.fsf%40192.168.0.142.

Re: [racket-users] Word Count program/benchmark performance

Reply via email to