gf2121 commented on issue #12598: URL: https://github.com/apache/lucene/issues/12598#issuecomment-1738687168
I ran an experiment: index random `BytesRef`s and record the number of bytes used when `BytesStore#finish` is called. The statistics (in bytes):

```
total sample: 23130
avg: 78.12
min: 7
mid: 36
pct75: 40
pct90: 40
pct99: 44
max: 10790
```

Because the `BytesRef`s are random, they probably share few prefixes or suffixes, so I also tried mocking some common prefixes/suffixes for them, like this:

```
// With probability 1/2, overwrite the head of b with part of a shared prefix.
if (R.nextBoolean()) {
  int prefixLen = R.nextInt(b.length / 2);
  System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
}
// With probability 1/2, overwrite the tail of b with part of a shared suffix.
if (R.nextBoolean()) {
  int suffixLen = R.nextInt(b.length / 2);
  System.arraycopy(commonSuffix, commonSuffix.length - suffixLen, b, b.length - suffixLen, suffixLen);
}
```

And here is the result:

```
total sample: 27235
avg: 820.540738020929
min: 8
mid: 24
pct75: 629
pct90: 3347
pct99: 5374
max: 29049
```

We allocate a 32 KB block, while 99% of cases need only about 5 KB. These results are roughly consistent with the allocation profile: we rarely need a second block in `BytesStore`.
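For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the two pieces described above: mocking shared prefixes/suffixes and summarizing the recorded sizes. It does not touch Lucene. The class `BlockUsageSketch` and its method names are hypothetical, the nearest-rank percentile is an assumption (the comment does not say how the percentiles were computed), and the values in `main` are toy numbers, not the measurements above. The guards on `maxPrefix`/`maxSuffix` are added here because `Random.nextInt(bound)` rejects a bound of 0, which would happen for a one-byte term.

```java
import java.util.Arrays;
import java.util.Random;

public class BlockUsageSketch {

  /** Overwrites the head and/or tail of b with part of a shared prefix/suffix. */
  static void mockSharedAffixes(Random r, byte[] b, byte[] commonPrefix, byte[] commonSuffix) {
    // Cap at half the term length and at the affix length so arraycopy stays in bounds.
    int maxPrefix = Math.min(b.length / 2, commonPrefix.length);
    if (maxPrefix > 0 && r.nextBoolean()) {
      int prefixLen = r.nextInt(maxPrefix);
      System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
    }
    int maxSuffix = Math.min(b.length / 2, commonSuffix.length);
    if (maxSuffix > 0 && r.nextBoolean()) {
      int suffixLen = r.nextInt(maxSuffix);
      System.arraycopy(commonSuffix, commonSuffix.length - suffixLen,
          b, b.length - suffixLen, suffixLen);
    }
  }

  /** Nearest-rank percentile over an ascending-sorted, non-empty array; pct in (0, 100]. */
  static long percentile(long[] sorted, double pct) {
    int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  /** Fraction of samples that fit within a single block of blockSize bytes. */
  static double fractionWithin(long[] samples, long blockSize) {
    long fit = Arrays.stream(samples).filter(s -> s <= blockSize).count();
    return (double) fit / samples.length;
  }

  public static void main(String[] args) {
    // Toy sizes in bytes, purely illustrative.
    long[] sizes = {8, 24, 24, 600, 3000, 5000, 40000};
    Arrays.sort(sizes);
    System.out.println("mid: " + percentile(sizes, 50));
    System.out.println("pct99: " + percentile(sizes, 99));
    System.out.println("fits in 32 KB: " + fractionWithin(sizes, 32 * 1024));
  }
}
```

In a real run, `sizes` would hold one entry per `BytesStore#finish` call, and `fractionWithin(sizes, 32 * 1024)` would give the share of cases that never need a second block.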