[ 
https://issues.apache.org/jira/browse/LUCENE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501358#comment-17501358
 ] 

Greg Miller commented on LUCENE-10302:
--------------------------------------

As I worked on this a little more, it occurs to me that I think we might want 
to implement two separate abstractions here. It feels a bit like overloading 
the idea of a PQ builder to provide sorted results.

I started sketching out the idea of a "top-N collector" (naming isn't great...) 
that abstracts the idea of just "collecting" the top-n provided elements, 
either sorted or unsorted. It encapsulates details about whether-or-not it ever 
uses a heap, and just exposes methods to get the top-n, either sorted or 
unsorted.

I think it's separately still very useful to have an explicit PQ builder for 
cases where the user knows they need a PQ but can build it in bulk up-front.

> PriorityQueue: optimize where we collect then iterate by using O(N) heapify
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-10302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10302
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE_PriorityQueue_Builder_with_heapify.patch
>
>
> Looking at LUCENE-8875 (LargeNumHitsTopDocsCollector.java ) I got to 
> wondering if there was faster-than O(N*log(N)) way of loading a PriorityQueue 
> when we provide a bulk array to initialize the heap/PriorityQueue.  It turns 
> out there is: the JDK's PriorityQueue supports this in its constructors, 
> referring to "This classic algorithm due to Floyd (1964) is known to be 
> O(size)" -- heapify() method.  There's 
> [another|https://www.geeksforgeeks.org/building-heap-from-array/]  that may 
> or may not be the same; I didn't look too closely yet.  I see a number of 
> uses of Lucene's PriorityQueue that first collects values and only after 
> collecting want to do something with the results (typical / unsurprising).  
> This lends itself to a builder pattern that can look similar to 
> LargeNumHitsTopDocsCollector in terms of first having an array used like a 
> list and then move over to the PriorityQueue if/when it gets full (it may 
> not).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to