sohami commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2148405028
> @sohami I gave a try at a possible approach at #13450 in case you're curious.

@jpountz Thanks for sharing this. Originally I was thinking of the prefetch optimization only in the collect phase, but I am trying to understand whether it can also be used on the iterator side. To understand this better, I am looking into the `SortedNumericDocValuesRangeQuery` test to follow the flow when different iterators are involved.

So far my general understanding is that all scoring and collection of docs via collectors happens in `DefaultBulkScorer#score`. The lead `scorerIterator` there can be a standalone iterator, a wrapper over multiple iterators, or an `approximation` iterator when the `TwoPhaseIterator` is non-null. These are then passed down to `scoreAll` or `scoreRange` (ignoring the `competitiveIterator` for now). In both `scoreAll` and `scoreRange` we iterate over the lead `scorerIterator` to get candidate docs, then check whether each doc matches the `TwoPhaseIterator` before it becomes eligible for collection via collectors. So I see the following flows/cases:

a) Only the lead `scorerIterator` is present.
b) Both the lead `scorerIterator` and a `TwoPhaseIterator` are present.
c) The collect phase, which runs over the docs that the scorers have found.

Based on the above understanding, I am thinking along the following lines and would love your feedback:

1. For case (a), where only a single iterator is involved, the `readAhead` mechanism can be useful, since a single iterator does not know what the next match is until it advances to the next doc.
2. For case (b), we can potentially combine `readAhead` and `prefetch`. We can use `readAhead` on the lead iterator and buffer some of its matching docs. Then, before evaluating whether these docs match the `TwoPhaseIterator`, we can prefetch data for the buffered docs (via some `prepareMatches` mechanism on `TwoPhaseIterator`).
Here we know exactly which docs will be evaluated against the `TwoPhaseIterator`, so we should be able to prefetch data for those docs. I would like to understand your earlier feedback on this better, since my understanding is that collection comes afterwards:

> maybe we buffer the next few doc IDs from the first-phase scorer and prefetch those

>> FWIW this would break a few things, e.g. we have collectors that only compute the score when needed (e.g. when sorting by field then score). But if we need to buffer docs up-front, then we don't know at this point in time if scores are going to be needed or not, so we need to score more docs. Maybe it's still the right trade-off, I'm mostly pointing out that this would be a bigger trade-off than what we've done for prefetching until now.

3. Before calling the collect phase on collectors, we can first buffer the matching docs, then ask the collectors to trigger an optional `prefetch` for the docs that will be passed to them for collection. These are the docs produced by the scorers, with or without a `TwoPhaseIterator` in the mix.

I think that for scenarios like 2 and 3 above, where we know the exact matching docs, performing prefetch could be more useful than `readAhead`.
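
To make flows 2 and 3 a bit more concrete, here is a rough sketch of the buffer, prefetch, then evaluate loop. Everything in it is an assumption for illustration: `LeadIterator`, `SecondPhase`, `PrefetchingCollector`, and the `prepareMatches` and `prefetch` hooks are hypothetical stand-ins, not existing Lucene APIs, and the real `DefaultBulkScorer` handles more (accepted docs, `competitiveIterator`, scoring) than is shown here. In particular it does not address the concern quoted above about having to compute scores before knowing whether a collector needs them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BufferedPrefetchSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for the lead (approximation) iterator. */
    interface LeadIterator {
        int nextDoc();
    }

    /** Hypothetical second phase; prepareMatches is the proposed hook, not a Lucene method. */
    interface SecondPhase {
        void prepareMatches(int[] docs, int count); // would issue prefetches for these docs
        boolean matches(int doc);
    }

    /** Hypothetical collector with an optional prefetch step before collect. */
    interface PrefetchingCollector {
        void prefetch(int[] docs, int count);
        void collect(int doc);
    }

    static void scoreAll(LeadIterator lead, SecondPhase twoPhase,
                         PrefetchingCollector collector, int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be >= 1");
        }
        int[] candidates = new int[bufferSize];
        int[] confirmed = new int[bufferSize];
        boolean exhausted = false;
        while (!exhausted) {
            // Step 1: buffer a batch of candidate docs from the lead iterator.
            int n = 0;
            while (n < bufferSize) {
                int doc = lead.nextDoc();
                if (doc == NO_MORE_DOCS) {
                    exhausted = true;
                    break;
                }
                candidates[n++] = doc;
            }
            // Step 2 (flow 2): announce the batch to the second phase, then confirm each doc.
            int m = 0;
            if (twoPhase == null) {
                System.arraycopy(candidates, 0, confirmed, 0, n);
                m = n;
            } else if (n > 0) {
                twoPhase.prepareMatches(candidates, n);
                for (int i = 0; i < n; i++) {
                    if (twoPhase.matches(candidates[i])) {
                        confirmed[m++] = candidates[i];
                    }
                }
            }
            // Step 3 (flow 3): let the collector prefetch for confirmed docs, then collect.
            if (m > 0) {
                collector.prefetch(confirmed, m);
                for (int i = 0; i < m; i++) {
                    collector.collect(confirmed[i]);
                }
            }
        }
    }

    /** Tiny driver that records the order of calls for a batch size of 2. */
    static List<String> demo() {
        final List<String> log = new ArrayList<>();
        final int[] docs = {1, 3, 4, 8, 9};
        final int[] pos = {0};
        LeadIterator lead = () -> pos[0] < docs.length ? docs[pos[0]++] : NO_MORE_DOCS;
        SecondPhase evenDocsOnly = new SecondPhase() {
            public void prepareMatches(int[] d, int count) {
                log.add("prepareMatches" + Arrays.toString(Arrays.copyOf(d, count)));
            }
            public boolean matches(int doc) {
                return doc % 2 == 0;
            }
        };
        PrefetchingCollector collector = new PrefetchingCollector() {
            public void prefetch(int[] d, int count) {
                log.add("prefetch" + Arrays.toString(Arrays.copyOf(d, count)));
            }
            public void collect(int doc) {
                log.add("collect " + doc);
            }
        };
        scoreAll(lead, evenDocsOnly, collector, 2);
        return log;
    }
}
```

With the toy data in `demo()`, each batch is announced to the second phase before any `matches` call, and the collector only sees a `prefetch` for docs that were actually confirmed. The batch size is the knob that trades prefetch depth against doing work for docs that may never be needed.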