Hi all,
I've switched many nested filter/map/mapcat applications in my code to
transducers. That brought a moderate speedup in certain cases, and the
deeper the nesting was before, the clearer the transducer code is in
comparison, so yay! :-)
However, I'm still quite unsure about the difference between `sequence`
and `eduction`. From the docs and experimentation, I came to the
assumptions below and I'd be grateful if someone with more knowledge
could verify/falsify/add:
- Return types differ: `sequence` returns a standard lazy seq,
  `eduction` an instance of Eduction.
- Eductions are reducible/seqable/iterable, i.e., basically I can use
  them wherever a (lazy) seq would also do, so sequence and eduction
  are quite interchangeable except when poking at internals, e.g.,
  (.contains (sequence ...) x) works whereas (.contains (eduction ...)
  x) doesn't.
- Both compute their contents lazily.
- Lazy seqs cache their already realized contents, eductions compute
  them over and over again on each iteration.
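The caching assumption in the last bullet can be checked at a REPL. The following is only a minimal sketch, assuming Clojure 1.7 or later; the helper name `traversal-count` is made up for illustration. It counts how often the mapping function runs when the same result is traversed twice.

```clojure
;; Hypothetical helper: builds a collection with `make-coll` (either
;; `sequence` or `eduction`), traverses it twice, and returns how many
;; times the mapping function was invoked in total.
(defn traversal-count
  [make-coll]
  (let [counter (atom 0)
        xf      (map (fn [x] (swap! counter inc) (* x x)))
        coll    (make-coll xf [1 2 3])]
    (reduce + 0 coll)   ; first traversal
    (reduce + 0 coll)   ; second traversal
    @counter))

(traversal-count sequence) ;=> 3, the lazy seq caches realized elements
(traversal-count eduction) ;=> 6, the eduction recomputes on every pass

;; The return types differ as well:
(instance? clojure.lang.LazySeq  (sequence (map inc) [1 2 3])) ;=> true
(instance? clojure.core.Eduction (eduction (map inc) [1 2 3])) ;=> true
```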
Because of that, I came to the conclusion that whenever I ask myself if
one of my functions should return a lazy seq or an eduction, I should
use these rules:
1. If the function is likely to be used like
     (let [xs (seq-producing-fn args)]
       (or (do-stuff-with xs)
           (do-other-stuff-with xs)
           ...))
   that is, the resulting seq is likely to be bound to a variable
   which is then used multiple times (and thus lazy seq caching is
   beneficial), then use sequence.
2. If it is a private function only used internally and never with the
   usage pattern of point 1, then definitely use eduction.
3. If it's a public function which usually isn't used with a pattern as
   in point 1, then I'm unsure. eduction is probably more efficient,
   but sequence fits better in the original "almost everything returns
   a lazy seq" design. Also, the latter has the benefit that users of
   the library don't need to know anything about transducers.
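To make the trade-off in these rules concrete, here is a small sketch, assuming Clojure 1.7 or later. The function `even-squares` is a hypothetical example, not taken from any library. It returns an eduction, and a caller who needs the result more than once pays for the caching explicitly by pouring it into a vector.

```clojure
;; Hypothetical function following rule 2: it returns an eduction, so
;; nothing is computed or cached until a caller consumes the result.
(defn even-squares
  [coll]
  (eduction (filter even?) (map #(* % %)) coll))

;; Single use: reduce directly over the eduction, no intermediate seq.
(reduce + 0 (even-squares (range 5)))   ;=> 20

;; Multiple uses (the pattern from rule 1): realize once, then reuse.
(let [xs (into [] (even-squares (range 5)))]
  [(count xs) (first xs) (peek xs)])    ;=> [3 0 16]
```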
Is that sensible? Or am I completely wrong with my assumptions about
sequence and eduction?
On a related note, could someone please clarify the statement from the
transducers docs for `sequence`?
,----[ Docs of sequence at http://clojure.org/transducers ]
| The resulting sequence elements are incrementally computed. These
| sequences will consume input incrementally as needed and fully realize
| intermediate operations. This behavior differs from the equivalent
| operations on lazy sequences.
`----
I'm curious about the "fully realize intermediate operations" part.
Does it mean that in a "traditional"
  (mapcat #(range %) (range 10000))
the inner range is also evaluated lazily, but with
  (sequence (mapcat #(range %)) (range 10000))
it is not? It seems so. At least dorun-ning these two expressions
shows that the "traditional" version is more than twice as fast as the
transducer version. Also, the same seems to hold for
  (eduction (mapcat #(range %)) (range 10000))
which is exactly as fast (or rather slow) as the sequence version.
But wouldn't that mean that using transducers with mapcat, where the
mapcatted function isn't super-cheap, is a bad idea in general, at least
from a performance POV?
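One way to probe the "fully realize intermediate operations" part is to count how many inner elements get realized when only the first result is requested. This is only a sketch, assuming Clojure 1.7 or later; the helper `realized-for-first` is made up for illustration. It says nothing about the dorun timings above, where everything is realized in both versions anyway.

```clojure
;; Hypothetical helper: `build` receives an instrumented range function
;; and an input collection, and returns the collection under test. We
;; take only the first element and report how many inner elements were
;; realized to produce it.
(defn realized-for-first
  [build]
  (let [realized    (atom 0)
        noisy-range (fn [n] (map (fn [i] (swap! realized inc) i) (range n)))]
    (first (build noisy-range [1000]))
    @realized))

;; Lazy mapcat: only the first chunk of the inner seq is realized,
;; so the count stays well below 1000.
(realized-for-first (fn [f coll] (mapcat f coll)))

;; Transducer versions: the whole inner collection is pushed through
;; before the first element becomes available.
(realized-for-first (fn [f coll] (sequence (mapcat f) coll))) ;=> 1000
(realized-for-first (fn [f coll] (eduction (mapcat f) coll))) ;=> 1000
```

If this reading is right, the input collection is still consumed incrementally, but each individual expansion produced by the mapcatted function is realized in full.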
Bye,
Tassilo