Re: [I] Improve Lucene's I/O concurrency [lucene]

via GitHub Wed, 29 May 2024 11:44:17 -0700


msfroh commented on issue #13179:
URL: https://github.com/apache/lucene/issues/13179#issuecomment-2138047304


   > This should work, though I'm wary of making it the new way that collectors 
need to interact with doc values if they want to be able to take advantage of 
prefetching. E.g. we also have collectors for top hits sorted by field, where 
collecting all hits ahead of time would kill the benefits of dynamic pruning. I 
wonder if there are approaches that don't require collecting all matches 
up-front? Access to doc values is forward-only, so prefetching the first page 
only and then relying on some form of read ahead would hopefully do what we 
need?
   
   Couldn't we do both with the suggested prefetch operation on 
`DocValuesIterator`? Each collector would just prefetch as much or as little as 
its use case needs.
   
   For the general top doc collector / bulk scorer, the doc value prefetch 
could look "just ahead", prefetching as it goes (maybe we buffer the next few 
doc IDs from the first-phase scorer and prefetch those?). Am I correct in 
understanding that prefetching an already-fetched page is (at least 
approximately) a no-op?
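
   To make the "just ahead" idea concrete, here is a rough sketch. It assumes a hypothetical `PrefetchableValues` interface standing in for doc values with a prefetch hint. This is not an existing Lucene API, and the names `LookAheadPrefetch`, `collect`, and `window` are made up for illustration. It also assumes that `prefetch` is cheap when the page is already loaded, which is exactly the open question above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;

final class LookAheadPrefetch {
  /** Hypothetical stand-in for doc values with a prefetch hint; not a Lucene API. */
  interface PrefetchableValues {
    void prefetch(int docId); // hint only; assumed cheap if the page is already loaded
    long valueOf(int docId);  // forward-only read
  }

  /**
   * Reads the value of every matching doc while keeping up to `window`
   * doc IDs buffered ahead of the read position, each already hinted.
   */
  static List<Long> collect(
      PrimitiveIterator.OfInt matches, PrefetchableValues values, int window) {
    if (window < 1) {
      throw new IllegalArgumentException("window must be >= 1");
    }
    ArrayDeque<Integer> buffered = new ArrayDeque<>();
    List<Long> out = new ArrayList<>();
    while (matches.hasNext() || !buffered.isEmpty()) {
      // Top up the look-ahead buffer, hinting each doc as it enters the buffer.
      while (buffered.size() < window && matches.hasNext()) {
        int doc = matches.nextInt();
        values.prefetch(doc);
        buffered.addLast(doc);
      }
      // Read the oldest buffered doc; its prefetch was issued before the later ones.
      out.add(values.valueOf(buffered.removeFirst()));
    }
    return out;
  }
}
```

   Because only a small window of doc IDs is held back, a collector that prunes dynamically could still stop early and would waste at most `window` prefetches.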
   
   If we want to collect all the doc IDs during the collect phase (as Lucene's 
`FacetsCollector` does) and then prefetch them all at once to compute facet 
counts, that works too.
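
   The collect-everything variant could look roughly like the sketch below. Again, `PrefetchableValues` is a hypothetical interface rather than a Lucene API, and `CollectThenPrefetch` is an illustrative name, not how `FacetsCollector` is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CollectThenPrefetch {
  /** Hypothetical stand-in for doc values with a prefetch hint; not a Lucene API. */
  interface PrefetchableValues {
    void prefetch(int docId);
    long valueOf(int docId);
  }

  private final List<Integer> matches = new ArrayList<>();

  /** Collect phase: only remember the doc ID, without touching doc values. */
  void collect(int docId) {
    matches.add(docId);
  }

  /** Count phase: hint every collected doc first, then read them in collection order. */
  Map<Long, Integer> count(PrefetchableValues values) {
    for (int doc : matches) {
      values.prefetch(doc);
    }
    Map<Long, Integer> counts = new TreeMap<>();
    for (int doc : matches) {
      counts.merge(values.valueOf(doc), 1, Integer::sum);
    }
    return counts;
  }
}
```

   Here all hints are issued before the first read, so the I/O for every page can be in flight at once, at the cost of holding every matching doc ID in memory.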


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

