gf2121 commented on issue #12598:
URL: https://github.com/apache/lucene/issues/12598#issuecomment-1738687168

   I ran an experiment: index random BytesRefs and record the byte usage each 
time `BytesStore#finish` is called. The statistics are as follows:
   
   ```
   total sample: 23130
   
   avg: 78.12
   min: 7
   mid: 36
   pct75: 40
   pct90: 40
   pct99: 44
   max: 10790
   ```
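
A summary in this layout can be produced from the recorded sizes with a sketch like the one below. It is not the code used for the experiment: the class `SizeStats` is hypothetical, `mid` is assumed to mean the median, and the nearest-rank percentile definition is an assumption, since the comment does not say how the percentiles were computed.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene): summarizes recorded byte counts. */
public class SizeStats {

  /** Nearest-rank percentile over an ascending-sorted, non-empty array; p in 1..100. */
  public static long percentile(long[] sorted, int p) {
    if (sorted.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    // Integer ceiling of p * n / 100 gives the 1-based rank.
    int rank = (int) (((long) p * sorted.length + 99) / 100);
    return sorted[Math.max(rank, 1) - 1];
  }

  /** Formats count, average, min, median, upper percentiles and max, one per line. */
  public static String summarize(long[] samples) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    double avg = Arrays.stream(sorted).average().getAsDouble();
    return String.join("\n",
        "total sample: " + sorted.length,
        "",
        "avg: " + avg,
        "min: " + sorted[0],
        "mid: " + percentile(sorted, 50),
        "pct75: " + percentile(sorted, 75),
        "pct90: " + percentile(sorted, 90),
        "pct99: " + percentile(sorted, 99),
        "max: " + sorted[sorted.length - 1]);
  }

  public static void main(String[] args) {
    // Made-up values for illustration only; these are not the experiment's data.
    long[] demo = {12, 30, 36, 38, 40, 40, 41, 44, 900, 10000};
    System.out.println(summarize(demo));
  }
}
```

In a real run, the `samples` array would hold one entry per `BytesStore#finish` call.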
   
   Because the BytesRefs are random, they probably share few common prefixes 
or suffixes, so I tried to mock some common prefixes/suffixes for them, like this:
   
   ```
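   // With probability 1/2, overwrite the head of b with part of a shared prefix.
   if (R.nextBoolean()) {
     int prefixLen = R.nextInt(b.length / 2);
     System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
   }
   
   // With probability 1/2, overwrite the tail of b with part of a shared suffix.
   if (R.nextBoolean()) {
     int suffixLen = R.nextInt(b.length / 2);
     System.arraycopy(
         commonSuffix, commonSuffix.length - suffixLen, b, b.length - suffixLen, suffixLen);
   }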
   ```
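
For anyone who wants to reproduce this, the fragment above can be wrapped in a self-contained generator roughly as follows. This is a sketch under assumptions, not the code from the experiment: the class `MockTerms`, the method `nextTerm`, the seed, and the sample prefix/suffix strings are all hypothetical. It also guards two cases the fragment leaves open: `Random.nextInt(bound)` requires a positive bound, and the copied length must not exceed the length of the shared prefix or suffix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

/** Hypothetical generator; names and sizes are illustrative assumptions. */
public class MockTerms {

  /**
   * Returns a random term of the given length. With probability 1/2 each, its head is
   * overwritten by the head of commonPrefix and its tail by the tail of commonSuffix.
   */
  public static byte[] nextTerm(Random r, int length, byte[] commonPrefix, byte[] commonSuffix) {
    byte[] b = new byte[length];
    r.nextBytes(b);
    int half = length / 2;
    // nextInt(bound) needs bound > 0, so terms shorter than 2 bytes stay fully random.
    if (half > 0 && r.nextBoolean()) {
      // Copy fewer than half the term's bytes, and no more than the prefix holds.
      int prefixLen = r.nextInt(Math.min(half, commonPrefix.length + 1));
      System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
    }
    if (half > 0 && r.nextBoolean()) {
      int suffixLen = r.nextInt(Math.min(half, commonSuffix.length + 1));
      System.arraycopy(
          commonSuffix, commonSuffix.length - suffixLen, b, b.length - suffixLen, suffixLen);
    }
    return b;
  }

  public static void main(String[] args) {
    Random r = new Random(42); // fixed seed so runs are repeatable
    byte[] prefix = "common-prefix-".getBytes(StandardCharsets.UTF_8);
    byte[] suffix = "-common-suffix".getBytes(StandardCharsets.UTF_8);
    for (int i = 0; i < 5; i++) {
      byte[] term = nextTerm(r, 1 + r.nextInt(32), prefix, suffix);
      System.out.println(term.length + " bytes");
    }
  }
}
```

Because both copied lengths stay below half the term length, the prefix and suffix regions never overlap. The numbers reported next come from the original run, not from this sketch.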
   
   And here is the result:
   ```
   total sample: 27235
   
   avg: 820.540738020929
   min: 8
   mid: 24
   pct75: 629
   pct90: 3347
   pct99: 5374
   max: 29049
   ```
   
   We allocate a 32 KB block, while 99% of cases need only about 5 KB. These 
results roughly match the allocation profile, in which we rarely need a second 
block in `BytesStore`.
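
To put the mocked-prefix/suffix figures next to the block size, the small sketch below computes how much of one block each statistic would fill. The class `BlockUtilization` is hypothetical, and it assumes that "32kb" means 1 << 15 = 32768 bytes; the byte counts are copied from the second result table above.

```java
import java.util.Locale;

/** Hypothetical helper: how much of a fixed-size block a given usage fills. */
public class BlockUtilization {

  // Assumption: the "32kb" block is 1 << 15 = 32768 bytes.
  static final int BLOCK_SIZE = 1 << 15;

  /** Fraction of one block occupied by usedBytes, capped at 1.0. */
  public static double utilization(double usedBytes) {
    return Math.min(1.0, usedBytes / BLOCK_SIZE);
  }

  public static void main(String[] args) {
    // Figures from the second table (run with mocked common prefixes/suffixes).
    String[] labels = {"avg", "mid", "pct75", "pct90", "pct99", "max"};
    double[] bytes = {820.540738020929, 24, 629, 3347, 5374, 29049};
    for (int i = 0; i < labels.length; i++) {
      System.out.printf(Locale.ROOT, "%-6s %8.0f bytes  %5.1f%% of block%n",
          labels[i], bytes[i], 100 * utilization(bytes[i]));
    }
  }
}
```

Under that assumption the 99th percentile (5374 bytes) fills about 16% of a block, and even the largest sample (29049 bytes) stays below 32768 bytes, so no sample in this run would spill into a second block.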
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

