Anyone got an implementation of a mutable StringBuilder-like object? I could use it in Rebellion's implementation of `into-string` which currently isn't quadratic, but it definitely has the allocation problem.
On Sunday, June 27, 2021 at 11:36:14 AM UTC-7 bogdan wrote: > While I think the complexity piece is important, I feel like it's worth > pointing out just how much more expensive the allocation -- and its > ramifications, like the resulting GC pressure and CPU cache misses -- is > than one might think. Here's a quadratic version of the code that > avoids allocations: > > #lang racket > > (require racket/flonum) > > (define (xy->string x y) > (string-append > (~r x #:precision 1) "," > (~r y #:precision 1))) > > (define len 0) > (define dst (make-string 5000000)) > (define (string-append! b) > (string-copy! dst 0 dst 0 len) ;; intentionally performing pointless work > to be n^2 > (string-copy! dst len b) > (set! len (+ len (string-length b)))) > > (define (xy-vectors->string x-vec y-vec) > (for ((x (in-flvector x-vec)) > (y (in-flvector y-vec))) > (string-append! (xy->string x y)) > (string-append! " "))) > > (time > (let ([x (make-flvector 100000)] > [y (make-flvector 100000)]) > (xy-vectors->string x y))) > > On my machine (running Racket CS 8.1), this is faster than the > `call-with-output-string` version I posted earlier by about 50ms. > > > Jens Axel Søgaard writes: > > > Den søn. 27. jun. 2021 kl. 18.58 skrev Alessandro Motta < > [email protected] > >>: > > > >> I also like Jens' code for its pure functional elegance. I'm surprised > >> that building up a long list of short strings before joining them is so > >> much faster. After all, isn't the final `string-join` essentially doing > >> what the original code snipped did? (Clearly not! I will have a look at > >> the implementation.) > >> > > > > The following is the same answer as Robby's, but fills in some details. > > I wrote this mainly because I couldn't find a web-page with the > explanation. > > > > Let's say we have two strings a and b of lengths m and n respectively. > > > > How much work must (string-append a b) do? > > > > Conceptually the following steps need to be done: > > 1. Allocate a new string c of length m+n. > > 2. Copy the string a into c. > > 3. Copy the string b into c. > > > > This means that the time (work) needed to append the strings a and b are > > proportional to m+n. > > > > How much work is done here? > > > > (string-append "x" (string-append "y" (string-append "z" ""))) > > > > First (string-append "z" "w") takes time 1+0=1 with the result "z". > > Then (string-append "y" (string-append "z" "")) or (string-append "y" > "z") > > takes time 1+1=2 with the result "yz". > > Then (string-append "x" (string-append "y" (string-append "z" ""))) or > > (string-append "x" "yz") takes time 1+2 = 3 with result "xyz". > > In total we have used time 1+2+3 =6. > > > > We see that if we 3 times use string-append to add strings of length 1, > > then we use time 1+2+3=6. > > In general, if we n times use string-append to add strings of length 1, > > then we use time 1+2+3+...+n. > > The last sum is more or less n^2. See [1]. > > > > The same thing happens in your loop, where you repeatedly use > string-append > > to append a small string. > > The length of the small string is no longer 1, but the same happens - and > > the time used by the > > loop is proportional to n^2. > > > > > > Now suppose we have a list of the strings to append > > (list "x" "y" "z") > > and we need to append them. How much work is there now? > > > > Well, first we can calculate the length of the result 1+1+1 = 3. > > Then we allocate a new string of length 3. Then each individual > > string is copied and that takes time 1+1+1=3. > > > > Here, each string is only copied once. For comparison in the loop version > > the string z is copied multiple times. > > > > > > But wait - why does a loop that uses cons multiple times to build up > > a list not have the same problem? > > > > Because in (cons x xs) the existing list xs isn't copied. > > The steps are > > 1. a new pair with two cells (the car and the cdr) are allocated > > 2. the contents of the car is set to x > > 3. the contents of the cdr is set to xs. > > > > This always takes the same amount of time, no matter how long the list xs > > is. > > This means that the time used by cons is constant (that is, proportional > to > > 1). > > > > Note: This phenomenon that using string-append in a loop is not a Racket > > only problem. > > It is a common pitfall in many languages. > > > > > > > > [1] Remember the famous story of Gauss as a child that calculated > > 1+2+...1000 ? > > https://www.youtube.com/watch?v=Dd81F6-Ar_0 > > > > > > /Jens Axel - https://racket-stories.com > -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/61691331-226f-4708-bd2a-42e5c6e5c2bcn%40googlegroups.com.

