easyice commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1792322904
@mikemccand Thanks for the benchmarking. I also indexed 10 million docs of random long values, then used `TermInSetQuery` for benchmarking. Here are the results:

The tip file size is reduced by ~2%.

| | size |
| --- | --- |
| main | 1807149 |
| PR | 1770259 |

Query latency is reduced by up to ~7%. `termsCount` is the number of terms in the `TermInSetQuery`, and `hitRatio` is the percentage of those terms that exist in the index. `tookMs` is the total time for 1000 runs of the same query, and `diff` is `tookMs(PR)` as a percentage of `tookMs(main)`. There is a bit of variance across runs, but the numbers look good overall.

| hitRatio | termsCount | tookMs(main) | tookMs(PR) | diff |
| --- | --- | --- | --- | --- |
| 1% | 64 | 177 | 164 | 92.66% |
| 1% | 512 | 1380 | 1312 | 95.07% |
| 1% | 2048 | 5225 | 5022 | 96.11% |
| 25% | 64 | 222 | 212 | 95.50% |
| 25% | 512 | 1462 | 1391 | 95.14% |
| 25% | 2048 | 5602 | 5533 | 98.77% |
| 50% | 64 | 216 | 204 | 94.44% |
| 50% | 512 | 1600 | 1513 | 94.56% |
| 50% | 2048 | 6193 | 5883 | 94.99% |
| 75% | 64 | 224 | 213 | 95.09% |
| 75% | 512 | 1702 | 1598 | 93.89% |
| 75% | 2048 | 6565 | 6289 | 95.80% |
| 100% | 64 | 233 | 218 | 93.56% |
| 100% | 512 | 1752 | 1736 | 99.09% |
| 100% | 2048 | 7057 | 6621 | 93.82% |

Crude benchmark code:

```
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.CollectionTerminatedException;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.search.Scorable;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.TermInSetQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// FIELD, RANDOM, longs and uniqueLongs are members of the benchmark class (not shown).

// Runs the same query 1000 times and returns the total elapsed time in ms.
public static long doSearch(int termCount, int hitRatio) throws IOException {
  Directory directory = FSDirectory.open(Paths.get("/Volumes/RamDisk/longdata"));
  IndexReader indexReader = DirectoryReader.open(directory);
  IndexSearcher searcher = new IndexSearcher(indexReader);
  // Disable the query cache so every iteration does the term lookups again.
  searcher.setQueryCachingPolicy(
      new QueryCachingPolicy() {
        @Override
        public void onUse(Query query) {}

        @Override
        public boolean shouldCache(Query query) throws IOException {
          return false;
        }
      });
  long total = 0;
  Query query = getQuery(termCount, hitRatio);
  for (int i = 0; i < 1000; i++) {
    long start = System.currentTimeMillis();
    doQuery(searcher, query);
    long end = System.currentTimeMillis();
    total += end - start;
  }
  // System.out.println("term count: " + termCount + ", took(ms): " + total);
  indexReader.close();
  directory.close();
  return total;
}

// Builds a TermInSetQuery in which hitRatio percent of the terms exist in the index.
private static Query getQuery(int termCount, int hitRatio) {
  int hitCount = termCount * hitRatio / 100;
  int notHitCount = termCount - hitCount;
  List<BytesRef> terms = new ArrayList<>();
  // Terms that hit: sampled from the indexed values.
  for (int i = 0; i < hitCount; i++) {
    terms.add(new BytesRef(Long.toString(longs.get(RANDOM.nextInt(longs.size() - 1)))));
  }
  // Terms that miss: random values that are not in the index.
  Random r = new Random();
  for (int i = 0; i < notHitCount; i++) {
    long v = r.nextLong();
    while (uniqueLongs.contains(v)) {
      v = r.nextLong();
    }
    terms.add(new BytesRef(Long.toString(v)));
  }
  return new TermInSetQuery(FIELD, terms);
}

private static void doQuery(IndexSearcher searcher, Query query) throws IOException {
  searcher.search(
      query,
      new Collector() {
        @Override
        public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
          return new LeafCollector() {
            @Override
            public void setScorer(Scorable scorer) throws IOException {}

            @Override
            public void collect(int doc) throws IOException {
              // Stop collecting this segment at the first hit.
              throw new CollectionTerminatedException();
            }
          };
        }

        @Override
        public ScoreMode scoreMode() {
          return ScoreMode.COMPLETE_NO_SCORES;
        }
      });
}
```
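
For anyone re-running this, the `diff` column in the latency table can be reproduced from the two `tookMs` columns. A minimal sketch, assuming `diff` is `tookMs(PR)` divided by `tookMs(main)` expressed as a percentage (which matches the rows above); the class and method names are illustrative and not part of the benchmark:

```java
import java.util.Locale;

public class DiffColumn {
  // Assumption: diff = tookMs(PR) / tookMs(main), formatted as a percentage
  // with two decimals. Values below 100% mean the PR was faster.
  public static String diffPercent(long tookMsMain, long tookMsPr) {
    return String.format(Locale.ROOT, "%.2f%%", 100.0 * tookMsPr / tookMsMain);
  }

  public static void main(String[] args) {
    // Inputs taken from the first and last rows of the table.
    System.out.println(diffPercent(177, 164));   // prints 92.66%
    System.out.println(diffPercent(7057, 6621)); // prints 93.82%
  }
}
```

Reading the column this way, the rows range from roughly 1% to 7% lower total time on the PR.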