easyice commented on PR #12842:
URL: https://github.com/apache/lucene/pull/12842#issuecomment-1856293777

   Sorry for the late update! I spent some more time on another PR. I now
encode the positions with group-varint when `storeOffsets` is false and there
are no payloads. As of the last commit, it uses a `long[]` buffer of size 128
to encode/decode. I wrote a simple benchmark to measure `flush()` performance.
It shows no significant improvement, because `readVInt` and `readGroupVInt`
have similar performance in `ByteBuffersDataOutput` on the current branch.
I'll test it with the optimized code from
https://github.com/apache/lucene/pull/12841 tomorrow.
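
For readers unfamiliar with group-varint, here is a minimal, self-contained
sketch of the general idea. It is not the code in this PR: the names
`GroupVarintSketch`, `encodeGroup` and `decodeGroup` are made up for
illustration, and the byte layout (one flag byte holding four 2-bit lengths,
followed by the values in little-endian order) is an assumption that may differ
from Lucene's actual format.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class GroupVarintSketch {

  // Number of bytes (1..4) needed to store v, treating it as unsigned.
  static int numBytes(int v) {
    return 4 - (Integer.numberOfLeadingZeros(v | 1) >>> 3);
  }

  // Encodes exactly four ints: one flag byte, then 1..4 bytes per value.
  static byte[] encodeGroup(int[] values) {
    if (values.length != 4) {
      throw new IllegalArgumentException("a group holds exactly 4 values");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int flag = 0;
    for (int v : values) {
      // Two bits per value: (byte length - 1).
      flag = (flag << 2) | (numBytes(v) - 1);
    }
    out.write(flag);
    for (int v : values) {
      int n = numBytes(v);
      for (int i = 0; i < n; i++) {
        out.write(v >>> (8 * i)); // little-endian, low 8 bits are written
      }
    }
    return out.toByteArray();
  }

  // Decodes one group produced by encodeGroup.
  static int[] decodeGroup(byte[] bytes) {
    int flag = bytes[0] & 0xFF;
    int pos = 1;
    int[] values = new int[4];
    for (int k = 0; k < 4; k++) {
      int n = ((flag >>> (6 - 2 * k)) & 0x3) + 1;
      int v = 0;
      for (int i = 0; i < n; i++) {
        v |= (bytes[pos++] & 0xFF) << (8 * i);
      }
      values[k] = v;
    }
    return values;
  }

  public static void main(String[] args) {
    int[] positions = {3, 300, 70000, 20000000};
    byte[] encoded = encodeGroup(positions);
    System.out.println("encoded bytes: " + encoded.length);
    System.out.println("decoded: " + Arrays.toString(decodeGroup(encoded)));
  }
}
```

The decoder learns all four lengths from a single flag byte, so it avoids the
per-byte continuation-bit check that a plain VInt reader performs.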
   
   Benchmark summary:
   * 200 terms (tokens) per field.
   * Freq per term is set to 100, which means each field has only 2 distinct
terms (the group-varint encoding of the positions does not cross doc
boundaries).
   * 10000 docs in total.
   
   
   
   <details>
   <summary>Benchmark code</summary>
   
   ```java
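   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.Paths;
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import java.util.Random;
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.Field;
   import org.apache.lucene.document.FieldType;
   import org.apache.lucene.document.NumericDocValuesField;
   import org.apache.lucene.document.TextField;
   import org.apache.lucene.index.IndexOptions;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.search.Sort;
   import org.apache.lucene.search.SortField;
   import org.apache.lucene.store.Directory;
   import org.apache.lucene.store.MMapDirectory;
   // Test-framework class; older Lucene versions keep it in org.apache.lucene.util.
   import org.apache.lucene.tests.util.TestUtil;

   public class SortedStringWriteBenchmark {

     static class Benchmark {
       Random rand = new Random(0);

       // Builds one document's text: termsPerField tokens in total, made of
       // (termsPerField / freqPerTerm) distinct terms repeated freqPerTerm times.
       String randomString(int termsPerField, int freqPerTerm) {
         List<String> values = new ArrayList<>();
         for (int i = 0; i < termsPerField; i += freqPerTerm) {
           String s = TestUtil.randomSimpleString(rand, 5, 10);
           for (int j = 0; j < freqPerTerm; j++) {
             values.add(s);
           }
         }
         // Shuffle with the seeded Random so every run indexes the same text.
         Collections.shuffle(values, rand);
         return String.join(" ", values);
       }

       List<String> randomStrings(int max, int termsPerField, int freqPerTerm) {
         List<String> values = new ArrayList<>();
         for (int i = 0; i < max; i++) {
           values.add(randomString(termsPerField, freqPerTerm));
         }
         return values;
       }

       // Indexes all docs into a sorted index and returns the flush() time in ms.
       long write() throws IOException {
         List<String> terms = randomStrings(10000, 200, 100);

         // Assumes a RAM disk is mounted at this path; adjust for your machine.
         Path temp = Files.createTempDirectory(Paths.get("/Volumes/RamDisk"), "tmpDirPrefix");
         Directory dir = MMapDirectory.open(temp);
         IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
         config.setIndexSort(new Sort(new SortField("sort", SortField.Type.LONG)));
         config.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
         IndexWriter w = new IndexWriter(dir, config);
         // Not used below: TextField already indexes docs, freqs and positions.
         FieldType ft = new FieldType();
         ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
         ft.setTokenized(true);
         ft.freeze();
         for (int i = 0; i < terms.size(); ++i) {
           Document doc = new Document();
           doc.add(new NumericDocValuesField("sort", rand.nextInt()));
           doc.add(new TextField("field", terms.get(i), Field.Store.NO));
           w.addDocument(doc);
         }
         // Time only the flush, which is where positions are re-encoded.
         long t0 = System.currentTimeMillis();
         w.flush();
         long took = System.currentTimeMillis() - t0;
         w.close();
         dir.close();
         return took;
       }
     }

     public static void main(final String[] args) throws Exception {
       int iter = 50;
       Benchmark benchmark = new Benchmark();
       List<Long> times = new ArrayList<>();
       for (int i = 0; i < iter; i++) {
         long took = benchmark.write();
         times.add(took);
         System.out.println("iteration " + i + ",took(ms):" + took);
       }
       // Average over the second half only, treating the first half as warm-up.
       double avg =
           times.stream().skip(iter / 2).mapToLong(Number::longValue).average().getAsDouble();
       long min = times.stream().mapToLong(Number::longValue).min().getAsLong();
       System.out.println("best took(ms) avg:" + avg + ", min:" + min);
     }
   }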
   ```
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

