Re: Why does the following Clojure code take 10x the time the C# version does? How to improve the Clojure version?

Steven Yi Fri, 15 May 2015 20:03:07 -0700

Ah, I see.  Well, I think then you can ignore the stuff about warming
up, as this certainly takes a while to run here:


"Elapsed time: 314763.666693 msecs"

I tried profiling with Yourkit and saw a couple of things to change:

;; I think lte with more than two args ends up being slower than
unrolling out here
;; I also tried type-hinting values from paper to ^long to avoid lte
calls with Number
(defn- covered?
  [[^long canvas-x ^long canvas-y] paper]
  (and (<= ^long (:x1 paper) canvas-x ) (<= canvas-x ^long (:x2 paper))
       (<= ^long (:y1 paper) canvas-y ) (<= canvas-y ^long (:y2 paper))))

;; for the reduce function in visible-color-frequencies-arr
;; using aset instead of aset-long, as it looked like aset-long was
using reflection
         (aset colorCounts color (+ 1 (aget colorCounts color)))

That got it down to:

"Elapsed time: 279864.041477 msecs"

I suspect you might get improvement too if you change
visible-color-frequencies-arr to use loop-recur instead of reduce
since you're doing a bit of magic there.

Unfortunately I have to stop at the moment as I have to leave on a
trip early in the morning, but hopefully that's useful.

steven


On Fri, May 15, 2015 at 9:07 PM, Amith George <[email protected]> wrote:
> Hi Steven,
>
> My bad. You need to invoke the code using the command
>
> lein run -m rdp.214-intermediate-arr 1 true
>
> The `1` tells it to select a certain input file, (in this case the biggest)
> and the `true` tells it to use the function that internally uses a java
> array (as opposed to the function that internally uses a transient map.)
>
> The above mentioned command takes around 250secs on my laptop. My apologies
> again, I should have made it clear how to execute the project.
>
> On Saturday, 16 May 2015 05:48:43 UTC+5:30, Steven Yi wrote:
>>
>> Hi Amith,
>>
>> I checked out your project from git and just doing 'lein run' I got a
>> reported:
>>
>> "Elapsed time: 185.651689 msecs"
>>
>> However, if I modify the -main function in 214_intermediate.clj to wrap
>> the time testing with (doseq [_ (range 20)]), to run the test multiple
>> times, the behavior is much better after the first 9 or so runs, and by the
>> end it is down to:
>>
>> "Elapsed time: 35.574945 msecs"
>>
>> I think you might be running into a situation where the VM hasn't run the
>> functions enough times to optimize.
>>
>> Rather than use the time function, you might want to try using
>> criterium[1] to benchmark the code. The site explains all the wonderful
>> things it does to get a good benchmark.
>>
>> Unfortunately, if your data set is small, and you're running a one off
>> calculation like this, I don't know if there's much to improve due to the
>> warmup cost. You could fiddle a bit with VM flags to try to optimize with
>> less calls (I can't recall the flag, but I think the JVM defaults to
>> optimizing after a functions been called 10,000 times).  On the other hand,
>> if you're processing larger datasets, I think it's reassuring that once
>> warmed up, the Clojure code performs pretty well.
>>
>> For reference, this was run on an Macbook Pro 13" early 2011, Core i7
>> 2.7ghz.
>>
>> steven
>>
>> [1] - https://github.com/hugoduncan/criterium/
>>
>>
>> On Friday, May 15, 2015 at 3:59:22 AM UTC-4, Amith George wrote:
>>>
>>> Thanks for the detailed suggestions. Implementing them did bring the
>>> execution time down to around 250secs. Though that value is still much
>>> longer than 45secs. Could you please verify if I have implemented them
>>> correctly?
>>>
>>> Code -
>>> https://github.com/amithgeorge/reddit-dailyprogrammer-clojure/blob/4ec02600e025d257a59d76fee0ad0eb01f4785ff/src/rdp/214_intermediate_arr.clj
>>>
>>> project.clj contains the line
>>>
>>>     :jvm-opts ^:replace ["-Xms1024m" "-Xmx1g" "-server"]
>>>
>>>   > 0) Switch to Clojure 1.7.0-beta3 - it's faster and some things below
>>> are dependent on it for best performance. And use Java 1.8.
>>>
>>> Done.
>>>
>>>   > 1) Parse the lines you're reading directly into longs (Clojure
>>> focuses on 64-bit primitives - longs and doubles)
>>>
>>>     (defn- read-line-nums
>>>       ([] (read-line-nums (read-line) #(Long/parseLong %1)))
>>>       ([input-line parse-fn]
>>>        (if-let [line input-line]
>>>          (->> line
>>>               (#(clojure.string/split %1 #" "))
>>>               (map parse-fn)))))
>>>
>>>   > 2) Put the longs first into a data structure that preserves the
>>> primitive type. The two best options for that here are records (which can
>>> have primitive fields) and arrays. I would create a Canvas defrecord with
>>> ^long width and height and a Paper defrecord with all ^long fields for
>>> example.
>>>
>>>     (defrecord Paper [^long color
>>>                       ^long x1  ^long y1
>>>                       ^long x2 ^long y2])
>>>
>>>     (defrecord Canvas [^long width ^long height])
>>>
>>>     (defn- make-paper
>>>       ([^long w ^long h] (make-paper [0 0 0 w h]))
>>>       ([^longs [color x y w h]]
>>>        (Paper. color x y (+ x w -1) (+ y h -1))))
>>>
>>> I had to make the second arity accept a vector, as it seems impossible to
>>> create a function accepting more than 4 primitive arguments. Though I
>>> suppose I could grouped the args into a `record` of its own?
>>>
>>>   > 3) Store the papers in a vector (using transient to create it)
>>>
>>> The `read-input-file` function which creates and stores the papers, it
>>> finishes within 20ms. So I didn't bother using a transient.
>>>
>>>   > 4) I suspect visible-color and covered? could probably be tightened
>>> up into a reduce over papers or a single loop-recur over papers - can't say
>>> I totally get what's happening there.
>>>
>>> The papers vector represents a sheets of papers that have been stacked on
>>> top of each other. ie, the last element in the vector is the canvas, and the
>>> first element is the topmost sheet. Each sheet is of different dimensions.
>>> Depending on the coordinates where they are placed, they may cover a part of
>>> the canvas. Different sheets may overlap in the areas they cover and thus we
>>> only want to consider the topmost sheet for that area.
>>>
>>> The `visible-color` function accepts a canvas coordinate and the stack
>>> papers and goes through the stack to find the first sheet that covers said
>>> coordinate. That papers color will be visible color for that coordinate.
>>>
>>>   > 5) In visible-color-frequencies, you could use "update" instead of
>>> get + transient assoc! on the acc map, but this is never going to be
>>> terribly fast. Another option here would be to create an array with the max
>>> color (you could track that while reading if it's not a well-known answer)
>>> and bash the array. That can retain int or long counters and will be *way*
>>> faster.
>>>
>>>     (defn- visible-color-frequencies-arr
>>>       [{:keys [colors canvas papers]}]
>>>       (let [colorCounts (long-array (count colors))]
>>>         (reduce
>>>          (fn [_ ^longs coord]
>>>            (if-let [color (visible-color coord papers)]
>>>              (aset-long colorCounts color (+ 1 (aget colorCounts color)))
>>>              _))
>>>          -1
>>>          (for [^long y (range (:height canvas))
>>>                ^long x (range (:width canvas))]
>>>            [x y]))
>>>          (zipmap (range) colorCounts)))
>>>
>>> It's a bit messy with me abusing reduce and totally ignoring the
>>> accumulator, but the version using arrays is atleast 10 seconds faster than
>>> the one using transient maps. That said, even the hashmap version took
>>> around 260-270secs. So the bulk of the time savings is caused by some other
>>> change.
>>>
>>>   > 6) You can use (set! *unchecked-math* :warn-on-boxed) to get faster
>>> math (no overflow checks) and also issue warnings (added in 1.7) if you
>>> happened to use boxed math by accident.
>>>
>>> I added these to both the old code (which didn't use type hints) and the
>>> new one. I ran the `-main` function of both from within the repl and using
>>> lein run. I didn't see any warnings in any of the executions. How am I
>>> supposed to use these?
>>>
>>>     (defn -main
>>>       ([] (-main "0" "false"))
>>>       ([index use-arrays]
>>>        (time
>>>         (binding [*unchecked-math* :warn-on-boxed
>>>                   *warn-on-reflection* true]
>>>           (if-not (Boolean/parseBoolean use-arrays)
>>>             (solve (input-files (Integer/parseInt index)))
>>>             (solve-arr (input-files (Integer/parseInt index))))))))
>>>
>>>   > Use: ^:jvm-opts ^:replace ["-server"]
>>>
>>> In my case there isn't any noticable difference between explicitly using
>>> '-server' and not using it. Which brings me to my final question,
>>>
>>>   > The major problem here is that you are using boxed math for
>>> everything instead of primitives.
>>>
>>> From what I understand of the changes I made, a reduction of almost 250
>>> seconds came about from simply using type hints. Is that normal? The
>>> incrementing of color counts in the original code, was that the boxed math
>>> you were referring to?
>>>
>>> In C#, using generic types consistently meant I never had to worry about
>>> boxing. I tried using `no.disassemble` to verify if the type hints really
>>> changed anything, but I only got more confused.
>>>
>>> Consider this example code where my-val is supposed to be a long and each
>>> of the functions is supposed to accept a long and return a long.
>>>
>>>     (-> my-val
>>>         (func-a)
>>>         (func-b)
>>>         (func-c))
>>>
>>> I will have to add typehints to original value as well to each function's
>>> input and output, right? Can typehints be added to denote a
>>> collection/sequence of longs? What about a collection/sequence of records?
>>>
>>>
>>> Finally, can the performance be improved any more? 250secs still feels
>>> too long compared to the C# version.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "Clojure" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/clojure/JgxFQLP2E34/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Why does the following Clojure code take 10x the time the C# version does? How to improve the Clojure version?

Reply via email to