On Sunday, July 19, 2015 at 10:53:25 AM UTC-5, Stuart Sierra wrote:
>
> Hi Leon,
>
> I think this is an edge case related to how varargs functions are
> implemented in Clojure.
>
> The varargs arity of `max` is implemented with `reduce1`: core.clj line 1088
> <https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L1088>
>
> `reduce1` is a simplified implementation of "reduce" defined early in
> clojure.core before the optimized reduction protocols have been loaded:
> core.clj line 894
> <https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L894>.
>
> `reduce1` is implemented in terms of lazy sequences, with support for
> chunking.
>
> So `apply max` defaults to using chunked lazy sequence operations. `map`
> and `range` both return chunked sequences.
>
> `eduction` returns an Iterable, so when you `apply max` on it, it turns
> the Iterable into a Seq, but it's not a chunked seq. Therefore, it's
> slightly slower than `apply max` on a chunked seq.
>
Seqs on eductions *are* chunked - they will fall into this case during seq:
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L524-L525
which produces a chunked sequence over an Iterable.

> In this case, to ensure you're using the fast-path internal reduce over
> `eduction`, you can use `reduce` directly:
>
>     (reduce max 0 (eduction (map inc) (range 100000)))
>
> You must provide an init value because `eduction` does not assume the
> "init with first element" behavior of sequences.
>
> This version, in my informal benchmarking, is the fastest.
>
> Lots of functions in clojure.core use `reduce1` in their varargs
> implementation. Perhaps they could be changed to use the optimized `reduce`,
> but this might add a lot of repeated definitions as clojure.core is
> bootstrapping itself. I'm not sure.

For various bootstrapping reasons, this is a hard change.

> In general, I would not assume that `eduction` is automatically faster
> than lazy sequences. It will be faster only in the cases where it can use
> the optimized reduction protocols such as InternalReduce. If the optimized
> path isn't available, many operations will fall back to lazy sequences for
> backwards-compatibility.
>
> I would suggest using `eduction` only when you *know* you're going to
> consume the result with `reduce` or `transduce`. As always, test first, and
> profilers are your friend. :)

Use eduction for delayed eager *non-cached* execution. Seqs give you delayed *cached* execution. If you're doing a transformation once, or if the thing you're doing would consume too many resources if cached, then use eduction. If you need to do a transformation once and then use the result multiple times, it's better to use sequence+transducer to get the caching effect and the benefits of reduced allocation during transformation.

Chunked seqs are surprisingly fast, particularly when all of the operations in a nested transformation are chunked.
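As a small check of the chunking behaviour just described, `chunked-seq?` can be used at a REPL. This is only a sketch, assuming Clojure 1.7 or later; the names `nested`, `range-chunked?`, `nested-chunked?`, and `nested-max` are made up for illustration and do not come from the thread.

```clojure
;; Illustration only: all names defined here are hypothetical.
(def nested
  (map inc (filter even? (range 100000))))

;; `range` over longs yields a chunked seq.
(def range-chunked? (chunked-seq? (seq (range 100000))))

;; `filter` and `map` keep chunking when their input is chunked,
;; so the head of the nested transformation is a chunked seq too.
(def nested-chunked? (chunked-seq? (seq nested)))

;; Chunking changes how the work is batched, not the result.
(def nested-max (apply max nested))
```

Here every layer is chunked, which is the favorable case for lazy sequences.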
However, every new layer adds another set of (chunked) sequence allocations. Eduction or anything transducer-based is going to do no seq allocation and execute as a single eager pass. Generally this means that transducer stuff will win more if the collection source is reducible, if the inputs are "large" (more input = more win), or if the number of transformations is >1 (more transformations = more wins).

> -S
>
> On Saturday, July 18, 2015 at 9:11:45 AM UTC-4, Leon Grapenthin wrote:
>>
>> My understanding was that if I pass an eduction to a process using
>> reduce, I can save the computer time and space because the per step
>> overhead of lazy sequences is gone and also the entire sequence does not
>> have to reside in memory at once.
>>
>> When I time the difference between (apply max (map inc (range 100000)))
>> and (apply max (eduction (map inc) (range 100000))), the lazy-seq variant
>> wins.
>>
>> I'd like to understand why, and when eductions should be used instead.
>>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
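To illustrate the cached versus non-cached distinction drawn in this thread, here is a minimal sketch, assuming Clojure 1.7 or later. It counts how often a mapping function runs when the same `eduction` or `sequence` is reduced twice, and then shows the `reduce` and `transduce` forms with an explicit init value. The names `calls`, `counting-inc`, `ed`, `sq`, and the result vars are hypothetical, chosen only for this example.

```clojure
;; Illustration only: all names defined here are hypothetical.
(def calls (atom 0))

;; A transducer that counts how many times its mapping function runs.
(def counting-inc
  (map (fn [x] (swap! calls inc) (inc x))))

;; eduction: nothing is cached, so each reduce re-runs the transformation.
(def ed (eduction counting-inc (range 10)))
(def ed-sums [(reduce + 0 ed) (reduce + 0 ed)])
(def ed-calls @calls)   ; two full passes over 10 elements

;; sequence: results are cached, so the second reduce reuses them.
(reset! calls 0)
(def sq (sequence counting-inc (range 10)))
(def sq-sums [(reduce + 0 sq) (reduce + 0 sq)])
(def sq-calls @calls)   ; a single pass over 10 elements

;; reduce over an eduction and transduce both run as one eager pass;
;; an explicit init value (0 here) is supplied in both cases.
(def max-via-reduce
  (reduce max 0 (eduction (map inc) (range 100000))))
(def max-via-transduce
  (transduce (map inc) max 0 (range 100000)))
```

Both variants produce the same sums; only the number of times the mapping function runs differs (20 for the eduction, 10 for the sequence). Which one is faster in practice depends on the workload, so this shows the caching behaviour only and is not a benchmark.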
