I ran into a situation in my application where I wanted to take a set-
difference between a small set and a large set. In this circumstance,
clojure.set/difference is needlessly slow. Because of this, in
general, a sequence of n clojure.set operations may run in O(n^2) time
when O(n) is easily achievable. Here's the basic idea:
(defn difference2 "Like difference, but fast for big s2 and small
s1" [s1 s2]
(reduce (fn [result item] (if (contains? s2 item) (disj result item)
result))
s1 s1))
(defn fast-difference "Like difference, but faster." [s1 s2]
(let [c1 (int (count s1))
c2 (int (count s2))]
(if (< c1 c2)
(difference2 s1 s2)
(clojure.set/difference s1 s2))))
user> (let [small-set (hash-set 1), big-set (set (range 10000))]
(doseq [fn [clojure.set/difference difference2 fast-difference]
[s1 s2] [[small-set big-set] [big-set small-set]]]
(time (dotimes [_ 1000] (fn s1 s2)))))
"Elapsed time: 3995.249 msecs" ; (set/difference small big)
"Elapsed time: 3.358 msecs" ; (set/difference big small)
"Elapsed time: 1.91 msecs" ; (difference2 small big)
"Elapsed time: 4935.183 msecs" ; (difference2 big small)
"Elapsed time: 3.088 msecs" ; (fast-difference small big)
"Elapsed time: 3.401 msecs" ; (fast-difference big small)
For the commutative operations (union and intersection), I suppose you
could argue that the user should know how the operations work and put
the arguments in the right (fast) order -- but sometimes, that
information will only be known at runtime. Overall, the possibility
of a big O improvement for a small constant penalty seems worth it to
me.
Questions or comments? Any objections/support for me adding this as
an issue?
Thanks,
Jason
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Clojure" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---