maosuhan opened a new pull request, #11939:
URL: https://github.com/apache/lucene/pull/11939
### Description
When we execute a TermRangeQuery or TermInSetQuery, Lucene uses
DocIdSetBuilder to collect the matching doc IDs. When the doc ID list becomes
large, the builder converts it from an array to a bitset in upgradeToBitSet.
Doc IDs added after that conversion leave the `counter` variable of
DocIdSetBuilder unchanged, so the cost computed in DocIdSetBuilder.build is
incorrect.
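The effect can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Lucene's actual DocIdSetBuilder: the class `SketchDocIdBuilder`, its `countAfterUpgrade` flag, and the threshold of 1024 are all hypothetical and chosen purely for illustration. It buffers IDs in an array, switches to a `java.util.BitSet` once the buffer is full, and optionally stops updating its counter after the switch.

```java
import java.util.BitSet;

/** Simplified, hypothetical model of a doc ID builder; NOT Lucene's DocIdSetBuilder. */
final class SketchDocIdBuilder {
    private final int threshold;             // buffered IDs allowed before upgrading (assumed value)
    private final boolean countAfterUpgrade; // false models the behavior described above
    private int[] buffer;
    private int bufferLen = 0;
    private BitSet bitSet = null;
    private long counter = 0;                // running estimate of how many IDs were added

    SketchDocIdBuilder(int threshold, boolean countAfterUpgrade) {
        this.threshold = threshold;
        this.countAfterUpgrade = countAfterUpgrade;
        this.buffer = new int[threshold];
    }

    void add(int docId) {
        if (bitSet == null && bufferLen == threshold) {
            upgradeToBitSet(); // buffer is full: switch representation
        }
        if (bitSet == null) {
            buffer[bufferLen++] = docId;
            counter++;
        } else {
            bitSet.set(docId);
            if (countAfterUpgrade) {
                counter++; // without this, the counter freezes at the upgrade point
            }
        }
    }

    private void upgradeToBitSet() {
        bitSet = new BitSet();
        for (int i = 0; i < bufferLen; i++) {
            bitSet.set(buffer[i]);
        }
        buffer = null;
    }

    /** Cost estimate derived from the counter, as a build step would report it. */
    long cost() {
        return counter;
    }

    public static void main(String[] args) {
        SketchDocIdBuilder stale = new SketchDocIdBuilder(1024, false);
        SketchDocIdBuilder counted = new SketchDocIdBuilder(1024, true);
        for (int i = 0; i < 100_000; i++) {
            stale.add(i);
            counted.add(i);
        }
        System.out.println("stale counter cost:   " + stale.cost());
        System.out.println("updated counter cost: " + counted.cost());
    }
}
```

In this model the stale builder reports a cost equal to the buffer threshold no matter how many IDs arrive afterwards, while the builder that keeps counting reports the full number of added IDs.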
How to reproduce:
    // Index 200,000 documents whose field "f1" holds the values 100000..299999.
    Directory dir = FSDirectory.open(
        Files.createTempDirectory(null, new FileAttribute[0]));
    IndexWriter w = new IndexWriter(dir, new IndexWriterConfig());
    for (int i = 100000; i < 300000; ++i) {
      Document doc = new Document();
      doc.add(new StringField("f1", i + "", Field.Store.NO));
      w.addDocument(doc);
    }
    w.forceMerge(1); // single segment, so leaves().get(0) covers every document

    IndexReader reader = DirectoryReader.open(w);
    IndexSearcher searcher = new IndexSearcher(reader);
    searcher.setQueryCache(null);

    // The range [200000, 300000] matches the 100,000 documents 200000..299999.
    Query query = new TermRangeQuery(
        "f1", new BytesRef("200000"), new BytesRef("300000"), true, true);
    Weight weight = searcher.createWeight(
        searcher.rewrite(query), ScoreMode.COMPLETE, 1);
    ScorerSupplier scorerSupplier =
        weight.scorerSupplier(searcher.getIndexReader().leaves().get(0));
    System.out.println(scorerSupplier.cost());
The printed cost is 1026, which is wrong; the actual cost should be 100000.
This can cause unexpected performance issues, for example in lead selection
in a boolean query.
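To make the lead selection concern concrete, here is a minimal sketch, assuming a conjunction picks the clause with the lowest estimated cost as its lead iterator. The class `LeadSelectionSketch`, its `Clause` type, and the second clause with cost 5000 are hypothetical; only the numbers 1026 and 100000 come from the report above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of cost-based lead selection; not Lucene's implementation. */
final class LeadSelectionSketch {
    static final class Clause {
        final String name;
        final long estimatedCost;

        Clause(String name, long estimatedCost) {
            this.name = name;
            this.estimatedCost = estimatedCost;
        }
    }

    /** Returns the clause with the smallest estimated cost (first one wins on ties). */
    static Clause pickLead(List<Clause> clauses) {
        Clause lead = clauses.get(0);
        for (Clause c : clauses) {
            if (c.estimatedCost < lead.estimatedCost) {
                lead = c;
            }
        }
        return lead;
    }

    public static void main(String[] args) {
        Clause other = new Clause("other", 5_000);           // assumed second clause
        Clause rangeReported = new Clause("range", 1_026);   // cost reported above
        Clause rangeActual = new Clause("range", 100_000);   // cost it should report

        // With the underestimated cost, the range clause is chosen as lead.
        System.out.println(pickLead(Arrays.asList(rangeReported, other)).name);
        // With the actual cost, the cheaper clause is chosen instead.
        System.out.println(pickLead(Arrays.asList(rangeActual, other)).name);
    }
}
```

Under these assumptions the underestimated cost makes the range clause drive the iteration even though it matches far more documents than the other clause.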