Transducers: sequence versus eduction

Tassilo Horn Wed, 01 Apr 2015 00:13:02 -0700

Hi all,

I've switched many nested filter/map/mapcat applications in my code to
using transducers.  That brought a moderate speedup in certain cases,
and the deeper the nesting was before, the clearer the transducer
code is in comparison, so yay! :-)


However, I'm still quite unsure about the difference between `sequence`
and `eduction`.  From the docs and experimentation, I came to the
assumptions below and I'd be grateful if someone with more knowledge
could verify/falsify/add:

  - Return types differ: sequence returns a standard lazy seq, eduction
    an instance of Eduction.

  - Eductions are reducible/seqable/iterable, i.e., basically I can use
    them wherever a (lazy) seq would also do, so sequence and eduction
    are quite interchangeable except when poking at internals, e.g.,
    (.contains (sequence ...) x) works whereas (.contains (eduction ...)
    x) doesn't.

  - Both compute their contents lazily.

  - Lazy seqs cache their already realized contents, eductions compute
    them over and over again on each iteration.
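
A minimal sketch for probing these assumptions at the REPL, assuming
Clojure 1.7 or later.  The helper name `runs-for` is made up for
illustration; it counts how often the mapping function runs when the
same result is reduced twice.

```clojure
(defn runs-for
  "Reduces (make-coll xf [1 2 3]) twice and returns how often the
  mapping function inside xf was called."
  [make-coll]
  (let [calls (atom 0)
        xf    (map (fn [x] (swap! calls inc) (* x x)))
        coll  (make-coll xf [1 2 3])]
    (reduce + 0 coll)
    (reduce + 0 coll)
    @calls))

(runs-for sequence) ; => 3, the lazy seq caches realized elements
(runs-for eduction) ; => 6, the eduction recomputes on every reduce

;; Return types differ.
(seq? (sequence (map inc) [1 2 3]))                       ; => true
(seq? (eduction (map inc) [1 2 3]))                       ; => false
(instance? clojure.core.Eduction
           (eduction (map inc) [1 2 3]))                  ; => true

;; Poking at internals: the lazy seq is a java.util.List, while the
;; reflective call on an Eduction fails.
(.contains (sequence (map inc) [1 2 3]) 2)                ; => true
(try (.contains (eduction (map inc) [1 2 3]) 2)
     (catch Exception _ :no-contains))                    ; => :no-contains
```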

Because of that, I came to the conclusion that whenever I ask myself if
one of my functions should return a lazy seq or an eduction, I should
use these rules:

  1. If the function is likely to be used like

     (let [xs (seq-producing-fn args)]
       (or (do-stuff-with xs)
           (do-other-stuff-with xs)
           ...))

     that is, the resulting seq is likely to be bound to a variable
     which is then used multiple times (and thus lazy seq caching is
     beneficial), then use sequence.

  2. If it is a private function only used internally and never with the
     usage pattern of point 1, then definitely use eduction.

  3. If it's a public function which usually isn't used with a pattern as
     in point 1, then I'm unsure.  eduction is probably more efficient
     but sequence fits better in the original "almost everything returns
     a lazy seq" design.  Also, the latter has the benefit that users of
     the library don't need to know anything about transducers.
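
A small sketch of how rules 1 and 2 might look in code.  The function
names `even-squares` and `even-squares-ed` are hypothetical and only
serve as illustration; both share one transducer and differ only in
how the result is wrapped.

```clojure
;; The shared transformation.
(def even-squares-xf
  (comp (filter even?)
        (map #(* % %))))

;; Rule 1: public function whose result may be bound once and
;; traversed several times, so it returns a caching lazy seq.
(defn even-squares [coll]
  (sequence even-squares-xf coll))

;; Rule 2: private helper whose result is consumed exactly once,
;; so it returns an eduction.
(defn- even-squares-ed [coll]
  (eduction even-squares-xf coll))

;; The lazy seq is traversed twice but computed once.
(let [xs (even-squares (range 10))]
  [(count xs) (first xs)])                  ; => [5 0]

;; The eduction is reduced a single time.
(reduce + 0 (even-squares-ed (range 10)))   ; => 120
(into [] (even-squares-ed (range 10)))      ; => [0 4 16 36 64]
```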

Is that sensible?  Or am I completely wrong with my assumptions about
sequence and eduction?

On a related note, could someone please clarify the statement from the
transducers docs for `sequence`?

,----[ Docs of sequence at http://clojure.org/transducers ]
| The resulting sequence elements are incrementally computed. These
| sequences will consume input incrementally as needed and fully realize
| intermediate operations.  This behavior differs from the equivalent
| operations on lazy sequences.
`----

I'm curious about the "fully realize intermediate operations" part.
Does it mean that in a "traditional"

    (mapcat #(range %) (range 10000))

the inner range is also evaluated lazily but with

    (sequence (mapcat #(range %)) (range 10000))

it is not?  It seems so.  At least dorun-ning these two expressions
shows that the "traditional" version is more than twice as fast as the
transducer version.  Also, the same seems to hold for

    (eduction (mapcat #(range %)) (range 10000))

which is exactly as fast (or rather slow) as the sequence version.
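
One way to probe the "fully realize intermediate operations" wording is
to count how many inner elements get computed when only the first
result is requested.  This is a sketch, assuming Clojure 1.7 or later;
`realized-for-first` is a made-up helper, and the inner collection is
built from `iterate` so that chunking does not blur the count.

```clojure
(defn realized-for-first
  "Takes only the first element of (make-seq f [100]) and returns how
  many inner elements were computed along the way."
  [make-seq]
  (let [calls (atom 0)
        ;; Unchunked lazy inner seq of n elements that counts each
        ;; element as it is realized.
        inner (fn [n]
                (map (fn [x] (swap! calls inc) x)
                     (take n (iterate inc 0))))]
    (first (make-seq inner [100]))
    @calls))

(def lazy-count
  (realized-for-first (fn [f coll] (mapcat f coll))))

(def sequence-count
  (realized-for-first (fn [f coll] (sequence (mapcat f) coll))))

(def eduction-count
  (realized-for-first (fn [f coll] (eduction (mapcat f) coll))))

;; sequence-count and eduction-count are both 100: the whole inner
;; collection is pushed through before the first element is available.
;; lazy-count is much smaller, since the inner seq stays lazy.
```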

But wouldn't that mean that using transducers with mapcat where the
mapcatted function isn't super-cheap is a bad idea in general, at least
from a performance POV?
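
For anyone who wants to repeat the measurement, here is a rough timing
sketch.  The helpers `elapsed-ms` and `time-variants` are made up for
illustration, and a single wall-clock run like this is only a crude
indication (no JIT warm-up, no repetition), so the numbers should be
taken with a grain of salt.

```clojure
(defn elapsed-ms
  "Runs the zero-argument function thunk and returns the elapsed
  wall-clock time in milliseconds."
  [thunk]
  (let [start (System/nanoTime)]
    (thunk)
    (/ (- (System/nanoTime) start) 1e6)))

(defn time-variants
  "Fully realizes the three mapcat variants over (range n) with dorun
  and returns a map from variant to elapsed milliseconds."
  [n]
  {:lazy-mapcat (elapsed-ms #(dorun (mapcat range (range n))))
   :sequence    (elapsed-ms #(dorun (sequence (mapcat range) (range n))))
   :eduction    (elapsed-ms #(dorun (eduction (mapcat range) (range n))))})

;; Same input size as in the expressions above:
;; (time-variants 10000)
```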

Bye,
Tassilo
