gsmiller commented on PR #12089: URL: https://github.com/apache/lucene/pull/12089#issuecomment-1407307849
> attach it to a github comment

That works! Here's how I benchmarked: [TiSBench.txt](https://github.com/apache/lucene/files/10526083/TiSBench.txt)

One note if you're interested in running this: make sure to shuffle the geonames data before indexing, or you'll get very skewed results. The file is sorted by country code, so if you index the lines in that order, documents sharing a country code get adjacent doc IDs, and that natural ordering heavily favors a postings-based approach.

I'm actually a bit surprised (and impressed) that our existing `IndexOrDocValues` functionality works as well as it does across these queries, given how rough an estimate `#cost()` is on the current `TermInSetQuery`. There are some clear cases where seeking terms and using term-level stats in a heuristic helps make better decisions, but I expected the difference to be more pronounced.
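The shuffling step mentioned above can be done with a small standalone program before indexing. This is a minimal sketch, not part of the attached benchmark: the class name `ShuffleLines`, the command-line arguments, and the fixed seed `42L` are hypothetical choices. It loads the whole file into memory, so a large input may need a bigger heap (`-Xmx`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: shuffles the lines of a text file so they are not indexed in sorted order. */
public class ShuffleLines {

  /** Returns a shuffled copy of {@code lines}; a fixed seed keeps benchmark runs reproducible. */
  static List<String> shuffled(List<String> lines, long seed) {
    List<String> copy = new ArrayList<>(lines);
    Collections.shuffle(copy, new Random(seed));
    return copy;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: ShuffleLines <input> <output>");
      System.exit(1);
    }
    Path in = Path.of(args[0]);
    Path out = Path.of(args[1]);
    // Reads every line into memory, shuffles, then writes them back out one per line.
    List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
    Files.write(out, shuffled(lines, 42L), StandardCharsets.UTF_8);
  }
}
```

Using a fixed seed rather than an unseeded `Random` means every run indexes documents in the same shuffled order, so differences between query strategies aren't confounded by a different doc ID layout each time.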
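To make the idea of "seeking terms and using term-level stats in a heuristic" concrete, here is a hedged sketch of one such decision rule. It is not the heuristic from this PR and not `IndexOrDocValuesQuery`'s actual logic. In Lucene, per-term document frequencies could come from seeking each term with `TermsEnum#seekExact` and reading `TermsEnum#docFreq()`; here they are abstracted as a plain list of longs. The class `TermSetStrategy`, the `Strategy` enum, and the `factor` parameter are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: choose postings vs. doc values for a term-in-set filter from docFreq stats. */
public class TermSetStrategy {

  enum Strategy { POSTINGS, DOC_VALUES }

  /**
   * Upper bound on docs visited when unioning the postings of every term: the sum of the terms'
   * document frequencies (a term missing from the index contributes 0).
   */
  static long postingsCost(List<Long> docFreqs) {
    long sum = 0;
    for (long df : docFreqs) {
      sum += df;
    }
    return sum;
  }

  /**
   * Uses postings when their cost is at most {@code factor} times the lead cost (the number of
   * candidate docs the leading clause of a conjunction will iterate); otherwise checks each
   * candidate against doc values. {@code factor} is an assumed tuning knob, not a Lucene constant.
   */
  static Strategy choose(List<Long> docFreqs, long leadCost, long factor) {
    return postingsCost(docFreqs) <= factor * leadCost ? Strategy.POSTINGS : Strategy.DOC_VALUES;
  }

  public static void main(String[] args) {
    // A few rare terms against a selective lead clause: postings are cheap.
    System.out.println(choose(List.of(120L, 45L, 0L), 1_000L, 8L));
    // A few very common terms against the same lead clause: doc values are cheaper.
    System.out.println(choose(List.of(2_000_000L, 1_500_000L), 1_000L, 8L));
  }
}
```

Running `main` prints `POSTINGS` and then `DOC_VALUES`. The point of seeking terms is that the summed docFreq distinguishes these two cases even when both sets contain a similar number of terms, which a term-count-based estimate alone cannot do.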