jpountz commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2149257220
> Then before evaluating if these docs matches TwoPhaseIterator or not, we can perform prefetch on these buffered docs (via some prepareMatches mechanism on TwoPhaseIterator).

This can be done, but I'd note that this would be a significant change to our APIs, since `TwoPhaseIterator` only supports verifying the current document that the approximation is positioned on. It is not possible to buffer matching documents from the approximation and then check them with the `TwoPhaseIterator`. This is similar to the point I was making in a previous comment about buffering documents in collectors: `Scorer#score` only supports scoring the current document that the scorer is positioned on, so it is not possible to buffer several documents and then evaluate their scores in `TopScoreDocCollector` (without API changes).

> via some prepareMatches mechanism on TwoPhaseIterator

FWIW one thing that is on my mind is that both postings and doc values take on the order of 1 or 2 bytes per document. So even a query that matches 0.1% of docs, evenly distributed in the doc ID space, would still end up fetching all pages in practice. So very smart prefetching may only perform better than naive prefetching in the following cases:
 - Queries that are _extremely_ sparse.
 - Queries whose matches are highly clustered in the doc ID space, because of index sorting, recursive graph bisection or early termination.

But then I'd still expect some naive readahead logic to perform OK in such cases. For the extremely sparse case, it would fetch up to X times too many pages, where X is the number of pages that get read ahead. For reasonable values of X, this should be OK.

The other thing that is on my mind is that this sort of approach allows us to do it completely at the OS level, which gives additional efficiency.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
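As a rough sanity check on the "1 or 2 bytes per document" argument, here is a minimal sketch of the arithmetic. The 4 KiB page size, the class and method names, and the independent-random-match model are all assumptions made for illustration; none of them come from the comment or from Lucene's APIs.

```java
// Back-of-the-envelope estimate of how many pages a sparse query touches.
// Hypothetical helper for illustration only; not part of Lucene.
final class PageTouchEstimate {
    // Assumption: 4 KiB OS pages (the comment does not state a page size).
    static final int PAGE_SIZE_BYTES = 4096;

    /** Number of documents whose postings or doc-values data fits in one page. */
    static double docsPerPage(double bytesPerDoc) {
        return PAGE_SIZE_BYTES / bytesPerDoc;
    }

    /**
     * Evenly spaced matches: every page holds at least one match as soon as
     * the gap between consecutive matches is no larger than one page of docs.
     */
    static boolean everyPageTouchedIfEvenlySpaced(double matchRate, double bytesPerDoc) {
        double gapInDocs = 1.0 / matchRate;
        return gapInDocs <= docsPerPage(bytesPerDoc);
    }

    /**
     * Independent random matches: expected fraction of pages that contain
     * at least one matching document.
     */
    static double fractionTouchedIfRandom(double matchRate, double bytesPerDoc) {
        return 1.0 - Math.pow(1.0 - matchRate, docsPerPage(bytesPerDoc));
    }

    public static void main(String[] args) {
        double matchRate = 0.001; // the 0.1% example from the comment
        for (double bytesPerDoc : new double[] {1.0, 2.0}) {
            System.out.printf(
                "%.0f B/doc: evenly spaced touches every page = %b, random touches %.1f%% of pages%n",
                bytesPerDoc,
                everyPageTouchedIfEvenlySpaced(matchRate, bytesPerDoc),
                100.0 * fractionTouchedIfRandom(matchRate, bytesPerDoc));
        }
    }
}
```

Under these assumptions, a 0.1% match rate leaves a gap of 1,000 docs between matches while a page covers 2,048 to 4,096 docs, so evenly spaced matches hit every page, and randomly placed matches still hit roughly 87% to 98% of pages. Only at much lower match rates does the touched fraction drop enough for selective prefetching to have room to help.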
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org