Re: [PR] Specialize arc store for continuous label in FST [lucene]

via GitHub Fri, 03 Nov 2023 05:06:15 -0700


easyice commented on PR #12748:
URL: https://github.com/apache/lucene/pull/12748#issuecomment-1792322904


   @mikemccand Thanks for the benchmarking. I also indexed 10 million docs of 
random long values and then benchmarked with `TermInSetQuery`. Here are the 
results:
   
   The size of the tip file is reduced by ~2%:
   
   | | size |
   | --- | --- |
   | main | 1807149 |
   | PR | 1770259 |
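
As a quick check of the ~2% figure, the reduction can be recomputed from the two sizes in the table. This is only a sketch of the arithmetic; the class and method names are made up for illustration and are not part of the PR.

```java
import java.util.Locale;

// Hypothetical helper: recomputes the size reduction from the table above.
final class SizeReduction {
    // Percentage by which `after` is smaller than `before`.
    static double percentReduction(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        double reduction = percentReduction(1_807_149L, 1_770_259L);
        // Prints 2.04%
        System.out.println(String.format(Locale.ROOT, "%.2f%%", reduction));
    }
}
```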
   
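   Query latency dropped by roughly 1% to 7%, about 5% on average across the 
rows below. `termsCount` is the number of terms in the `TermInSetQuery`, 
`hitRatio` is the percentage of those terms that match indexed values, and 
`diff` is `tookMs(PR)` as a percentage of `tookMs(main)`. There is some 
variance across runs, but the results look good overall.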
   
   | hitRatio  | termsCount | tookMs(main) | tookMs(PR) | diff |
   | --- |  --- | --- | --- | --- |
   | 1%  | 64 | 177 | 164 | 92.66% |
   | 1%  | 512 | 1380 | 1312 | 95.07% |
   | 1%  | 2048 | 5225 | 5022 | 96.11% |
   | 25%  | 64 | 222 | 212 | 95.50% |
   | 25%  | 512 | 1462 | 1391 | 95.14% |
   | 25%  | 2048 | 5602 | 5533 | 98.77% |
   | 50%  | 64 | 216 | 204 | 94.44% |
   | 50%  | 512 | 1600 | 1513 | 94.56% |
   | 50%  | 2048 | 6193 | 5883 | 94.99% |
   | 75%  | 64 | 224 | 213 | 95.09% |
   | 75%  | 512 | 1702 | 1598 | 93.89% |
   | 75%  | 2048 | 6565 | 6289 | 95.80% |
   | 100%  | 64 | 233 | 218 | 93.56% |
   | 100%  | 512 | 1752 | 1736 | 99.09% |
   | 100%  | 2048 | 7057 | 6621 | 93.82% |
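
To make the `diff` column easier to audit, here is a small sketch that recomputes it as `tookMs(PR) / tookMs(main)` and averages it over all rows. The numbers are copied from the table above; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: recomputes the diff column of the latency table.
final class LatencyRatios {
    // Each row is {tookMs(main), tookMs(PR)}, in table order.
    static final long[][] ROWS = {
        {177, 164}, {1380, 1312}, {5225, 5022},
        {222, 212}, {1462, 1391}, {5602, 5533},
        {216, 204}, {1600, 1513}, {6193, 5883},
        {224, 213}, {1702, 1598}, {6565, 6289},
        {233, 218}, {1752, 1736}, {7057, 6621},
    };

    // PR time expressed as a percentage of main time.
    static double ratioPercent(long mainMs, long prMs) {
        return 100.0 * prMs / mainMs;
    }

    // Unweighted mean of the per-row ratios.
    static double meanRatioPercent(long[][] rows) {
        double sum = 0;
        for (long[] row : rows) {
            sum += ratioPercent(row[0], row[1]);
        }
        return sum / rows.length;
    }

    public static void main(String[] args) {
        for (long[] row : ROWS) {
            System.out.println(String.format(Locale.ROOT, "%d -> %d: %.2f%%",
                    row[0], row[1], ratioPercent(row[0], row[1])));
        }
        System.out.println(String.format(Locale.ROOT, "mean: %.2f%%",
                meanRatioPercent(ROWS)));
    }
}
```

The mean ratio comes out at about 95.2%, which corresponds to an average latency reduction of a little under 5%; the best single row is 92.66% and the weakest is 99.09%.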
   
   crude benchmark code:
   
   ```java
   public static long doSearch(int termCount, int hitRatio) throws IOException {
           Directory directory = FSDirectory.open(Paths.get("/Volumes/RamDisk/longdata"));
           IndexReader indexReader = DirectoryReader.open(directory);
           IndexSearcher searcher = new IndexSearcher(indexReader);
           searcher.setQueryCachingPolicy(
                   new QueryCachingPolicy() {
                       @Override
                       public void onUse(Query query) {
                       }
   
                       @Override
                       public boolean shouldCache(Query query) throws IOException {
                           return false;
                       }
                   });
   
           long total = 0;
           Query query = getQuery(termCount, hitRatio);
           for (int i = 0; i < 1000; i++) {
               long start = System.currentTimeMillis();
               doQuery(searcher, query);
               long end = System.currentTimeMillis();
               total += end - start;
           }
           //System.out.println("term count: " + termCount + ", took(ms): " + total);
           indexReader.close();
           directory.close();
           return total;
       }
   
       private static Query getQuery(int termCount, int hitRatio) {
           int hitCount = termCount * hitRatio / 100;
           int notHitCount = termCount - hitCount;
           List<BytesRef> terms = new ArrayList<>();
           for (int i = 0; i < hitCount; i++) {
               // nextInt's bound is exclusive, so longs.size() lets every value be picked
               terms.add(new BytesRef(Long.toString(longs.get(RANDOM.nextInt(longs.size())))));
           }
   
           Random r = new Random();
           for (int i = 0; i < notHitCount; i++) {
               long v = r.nextLong();
               while (uniqueLongs.contains(v)) {
                   v = r.nextLong();
               }
               terms.add(new BytesRef(Long.toString(v)));
           }
           return new TermInSetQuery(FIELD, terms);
       }
   
       private static void doQuery(IndexSearcher searcher, Query query) throws IOException {
           searcher.search(
                   query,
                   new Collector() {
                       @Override
                       public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
                           return new LeafCollector() {
                               @Override
                               public void setScorer(Scorable scorer) throws IOException {
                               }
   
                               @Override
                               public void collect(int doc) throws IOException {
                                   throw new CollectionTerminatedException();
                               }
                           };
                       }
   
                       @Override
                       public ScoreMode scoreMode() {
                           return ScoreMode.COMPLETE_NO_SCORES;
                       }
                   });
       }
   ```
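
One detail of `getQuery` worth spelling out is that `termCount * hitRatio / 100` uses integer division, so the real number of hitting terms is truncated. The sketch below reproduces only that arithmetic for the combinations in the table; the class name and the `missCount` helper are hypothetical and not part of the benchmark.

```java
// Hypothetical helper: mirrors the hit/miss split computed in getQuery.
final class TermMix {
    // Same expression as getQuery: integer division truncates toward zero.
    static int hitCount(int termCount, int hitRatio) {
        return termCount * hitRatio / 100;
    }

    // Remaining terms are generated so that they do not match any indexed value.
    static int missCount(int termCount, int hitRatio) {
        return termCount - hitCount(termCount, hitRatio);
    }

    public static void main(String[] args) {
        int[] hitRatios = {1, 25, 50, 75, 100};
        int[] termCounts = {64, 512, 2048};
        for (int hitRatio : hitRatios) {
            for (int termCount : termCounts) {
                System.out.println(hitRatio + "% of " + termCount
                        + " -> hits=" + hitCount(termCount, hitRatio)
                        + ", misses=" + missCount(termCount, hitRatio));
            }
        }
    }
}
```

In particular, the 1% / 64 terms combination yields `64 * 1 / 100 = 0`, so that query contains no hitting terms at all, while 1% of 512 and 2048 terms gives 5 and 20 hits respectively.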

