Re: [I] Stop duplicating per-segment work across segment partitions [lucene]

via GitHub Sun, 21 Dec 2025 21:32:52 -0800


prudhvigodithi commented on issue #13745:
URL: https://github.com/apache/lucene/issues/13745#issuecomment-3680554964


   
   > Also correct me if I'm wrong, but I believe we may be duplicating 
allocations in `TopFieldCollectorManager` as we allocate a queue of size 
`numHits` for every collector, but we may be calling a collector multiple times 
on the same segment right?
   > 
https://github.com/apache/lucene/blob/branch_10_2/lucene/core/src/java/org/apache/lucene/search/TopFieldCollectorManager.java#L137-L173.
 Specifically through `FieldValueHitQueue#create` which mentions "NOTE: The 
instances returned by this method pre-allocate a full array of length numHits."
   
   I went through the `TotalHitCountCollectorManager` which uses putIfAbsent in 
intra-segment mode (https://github.com/apache/lucene/pull/13542) which says 
it’s already been done and skip or reuse the cached result since its pure 
count. But for `TopFieldCollectorManager`  the only duplication is memory (for 
example with 4 partitions 4 queues instead of 1) which is the necessary cost 
for lock free parallelism. Each partition independently finds its local top-K 
then reduce() should merges them, IMO the latency improvement from parallelism 
could outweigh the memory cost, let me conclude this with some tests on my 
local.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Stop duplicating per-segment work across segment partitions [lucene]

Reply via email to