gsmiller commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1097623311
##########
lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java:
##########
@@ -258,13 +271,41 @@ public Matches matches(LeafReaderContext context, int doc) throws IOException {
* On the given leaf context, try to either rewrite to a disjunction if there are few matching
* terms, or build a bitset containing matching docs.
*/
- private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
+ private WeightOrDocIdSet rewrite(
+ LeafReaderContext context, long leadCost, boolean isPrimaryKeyField, DocValuesType dvType)
+ throws IOException {
final LeafReader reader = context.reader();
Terms terms = reader.terms(field);
if (terms == null) {
return null;
}
+
+ long costThreshold = Long.MAX_VALUE;
+ if (dvType == DocValuesType.SORTED || dvType == DocValuesType.SORTED_SET) {
+ // Establish a threshold for switching to doc values. Give postings a significant
+ // advantage for the primary-key case, since many of the primary-key terms may not
+ // actually be in this segment. The 8x factor is arbitrary, based on IndexOrDVQuery,
+ // but has performed well in benchmarks:
+ costThreshold = isPrimaryKeyField ? leadCost << 3 : leadCost;
+
+ if (termData.size() > costThreshold) {
+ // If the number of terms is > the number of candidates, DV should perform better.
Review Comment:
I'm not sure, actually. The up-front term seeking you refer to is certainly a
cost, but it's a fixed cost that doesn't scale with the number of lead hits, so
the postings approach can still be cheaper. That said, +1 to the idea of trying
out on-demand term seeking for these situations!
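
To make the trade-off concrete, here is a minimal sketch of the threshold check from the diff together with a toy cost comparison. The class name, the helper methods, and the per-term and per-doc cost constants are hypothetical and only for illustration; they are not part of the PR or of Lucene's API, and the toy model ignores the cost of actually iterating postings.

```java
// Illustrative sketch only: names and cost constants are assumptions, not Lucene code.
final class TermInSetCostSketch {

  // Mirrors the diff: postings get an 8x head start on primary-key fields.
  // Note that leadCost << 3 can overflow for very large leadCost values.
  static long costThreshold(long leadCost, boolean isPrimaryKeyField) {
    return isPrimaryKeyField ? leadCost << 3 : leadCost;
  }

  // Mirrors the diff's check: switch to doc values when the number of terms
  // exceeds the threshold derived from the lead cost.
  static boolean preferDocValues(long termCount, long leadCost, boolean isPrimaryKeyField) {
    return termCount > costThreshold(leadCost, isPrimaryKeyField);
  }

  // Toy model (assumption): up-front seeking is paid once per term and does
  // not depend on how many lead hits there are.
  static long postingsSeekCost(long termCount, long seekCostPerTerm) {
    return termCount * seekCostPerTerm;
  }

  // Toy model (assumption): the doc-values check is paid once per lead hit.
  static long docValuesCost(long leadCost, long checkCostPerDoc) {
    return leadCost * checkCostPerDoc;
  }

  public static void main(String[] args) {
    long termCount = 1000;
    long leadCost = 500;
    System.out.println("preferDocValues: " + preferDocValues(termCount, leadCost, false));
    System.out.println("postings seek cost: " + postingsSeekCost(termCount, 1));
    System.out.println("doc values cost: " + docValuesCost(leadCost, 4));
  }
}
```

With these made-up constants, the check picks doc values because there are more terms than lead hits, yet the fixed seeking cost comes out lower than the per-hit doc-values cost. Whether that happens in practice depends on the real relative costs, which only benchmarks can settle.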