Let N be the total number of elements in your collection (e.g.
1000,000), and n the number of elements that you want to take out of
this collection (e.g 10).
By sorting the collection of N elements you spend N log N time. By
repeatedly adding an to the small collection and removing the minimum
you spend N log n time (if you have a good sorted collection that
takes log n time to insert).
The algorithm spelled out in imperative pseudocode is:
def top_n(n,coll)
let result = new SortedCollection(take n from coll)
for x in (drop n from coll):
result.add(x)
result.deleteMinimum() // adding 1 element and removing 1 keeps
the result collection at size n
return result
Hope this helps,
Jules
On Jun 23, 10:14 am, Daniel Lyons <[email protected]> wrote:
> On Jun 23, 2009, at 1:47 AM, Christophe Grand wrote:
>
>
>
>
>
> > I don't know if it has an official name but basically it's a
> > modified tree-sort: for each item you insert a value in a sorted
> > coll of size N and remove one item from this sorted coll, both ops
> > are O(sqrt(N)) thus O(n*sqrt(N)) for processing an input whose
> > length is n.
>
> > I'm away from a REPL but I had something like that in mind:
> > (defn top-n [coll n]
> > (let [add #(assoc %1 %2 (inc (%1 %2 0))
> > prune-min #(let [[[min n]] (rseq %)] (if (== 1 n) (dissoc %
> > min) (assoc % (dec n))))
> > seed (reduce add sorted-map (take n coll))]
> > (reduce (comp prune-min add) seed (drop n coll))))
>
> > (top 3 [5 4 3 5 8 2 7]) ; should return {5 1, 7 1, 8 1}
> > ; but
> > (top 3 [5 4 3 5 8 2]) ; should return {5 2, 8 1}
>
> I have to admit I don't really understand your code, so my apologies
> if I've missed something obvious.
>
> I think if you consider each element of N and do an operation that
> costs sqrt(N) with it, you'd arrive at O(N*sqrt(N)), which I think
> would be worse than O(N log N) which is what you'd get from just
> sorting the whole structure and taking the top n items. Is it really
> sqrt(N)? I'm under the impression insertion into a balanced binary
> tree is O(log(N)), but wouldn't you still need to have N of them? This
> algorithm reminds me of heap sort.
>
> —
> Daniel Lyons
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---