prudhvigodithi commented on issue #13745: URL: https://github.com/apache/lucene/issues/13745#issuecomment-3680554964
> Also correct me if I'm wrong, but I believe we may be duplicating allocations in `TopFieldCollectorManager` as we allocate a queue of size `numHits` for every collector, but we may be calling a collector multiple times on the same segment right? > https://github.com/apache/lucene/blob/branch_10_2/lucene/core/src/java/org/apache/lucene/search/TopFieldCollectorManager.java#L137-L173. Specifically through `FieldValueHitQueue#create` which mentions "NOTE: The instances returned by this method pre-allocate a full array of length numHits." I went through the `TotalHitCountCollectorManager` which uses putIfAbsent in intra-segment mode (https://github.com/apache/lucene/pull/13542) which says it’s already been done and skip or reuse the cached result since its pure count. But for `TopFieldCollectorManager` the only duplication is memory (for example with 4 partitions 4 queues instead of 1) which is the necessary cost for lock free parallelism. Each partition independently finds its local top-K then reduce() should merges them, IMO the latency improvement from parallelism could outweigh the memory cost, let me conclude this with some tests on my local. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
