dsmiley commented on PR #14357: URL: https://github.com/apache/lucene/pull/14357#issuecomment-2729965328
An aside: `org.apache.lucene.search.DisjunctionScorer.TwoPhase#matches` looks kind of sad, in that each `matches()` call builds a priority queue of "unverified matches" (DisiWrapper instances holding a TwoPhaseIterator). It seems strange to populate one for every doc visited instead of maintaining a fixed array pre-sorted by match cost, since we know up front which clauses have TPIs. Each DisiWrapper could carry a TPI index (its position by match cost) into an array of TwoPhaseIterators. The unverified matches selected per `matches()` call could then be noted in a bitmask/bitset that is cheap to set, clear, and iterate over the set bits. Or we could just use an array of DisiWrapper that is cleared & filled. No matchCost comparisons & heap manipulation. Not sure if I'm over-optimizing here. The use-case bringing me here has only one TPI, and its approximation is all docs.
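
To make the bitmask idea concrete, here is a rough sketch. It deliberately does not use Lucene's real classes: `Clause`, `tpiIndex`, `markUnverified` and `SortedTwoPhaseSketch` are hypothetical stand-ins (roughly: DisiWrapper, the proposed index field, and the step that currently adds to the priority queue). It also assumes at most 64 TPI clauses so a single `long` can serve as the mask; a larger disjunction would need a real bitset.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntPredicate;

/** Hypothetical sketch, not Lucene code: fixed cost-sorted array plus a bitmask. */
final class SortedTwoPhaseSketch {

    /** Stand-in for a DisiWrapper whose clause has a TwoPhaseIterator. */
    static final class Clause {
        final float matchCost;       // stand-in for TwoPhaseIterator#matchCost()
        final IntPredicate verifier; // stand-in for TwoPhaseIterator#matches()
        int tpiIndex = -1;           // position in the cost-sorted array, assigned once

        Clause(float matchCost, IntPredicate verifier) {
            this.matchCost = matchCost;
            this.verifier = verifier;
        }
    }

    private final Clause[] byCost; // sorted once, cheapest first
    private long unverified;       // bit i set => byCost[i] still needs verification

    SortedTwoPhaseSketch(Clause... clauses) {
        if (clauses.length > Long.SIZE) {
            throw new IllegalArgumentException("sketch supports at most 64 TPI clauses");
        }
        byCost = clauses.clone();
        Arrays.sort(byCost, Comparator.comparingDouble((Clause c) -> c.matchCost));
        for (int i = 0; i < byCost.length; i++) {
            byCost[i].tpiIndex = i;
        }
    }

    /** Called for each clause whose approximation is positioned on the current doc. */
    void markUnverified(Clause c) {
        unverified |= 1L << c.tpiIndex;
    }

    /** Verifies marked clauses in ascending match cost; no heap, no cost comparisons. */
    boolean matches(int doc) {
        long bits = unverified;
        unverified = 0L; // reset for the next doc
        while (bits != 0L) {
            int i = Long.numberOfTrailingZeros(bits); // lowest set bit = cheapest clause
            if (byCost[i].verifier.test(doc)) {
                return true;
            }
            bits &= bits - 1; // clear the bit just visited
        }
        return false;
    }

    public static void main(String[] args) {
        Clause cheap = new Clause(1f, doc -> doc % 2 == 0);
        Clause costly = new Clause(50f, doc -> doc == 7);
        SortedTwoPhaseSketch scorer = new SortedTwoPhaseSketch(costly, cheap);
        scorer.markUnverified(costly);
        scorer.markUnverified(cheap);
        System.out.println(scorer.matches(7));
    }
}
```

The sort happens once in the constructor, so the per-doc work is a few bit operations plus the verifier calls themselves. Whether that actually beats the existing queue for typical clause counts would need benchmarking.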