gf2121 commented on issue #12598:
URL: https://github.com/apache/lucene/issues/12598#issuecomment-1738687168
I did an experiment: index random BytesRefs and record the number of bytes
used each time `BytesStore#finish` is called. Here are the statistics:
```
total sample: 23130
avg: 78.12
min: 7
mid: 36
pct75: 40
pct90: 40
pct99: 44
max: 10790
```
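For anyone who wants to reproduce this kind of summary, here is a minimal sketch. It assumes nearest-rank percentiles and that `mid` is the median; the comment does not say how the numbers were actually computed. The class `UsageStats` and its methods are hypothetical names, not part of Lucene.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Lucene): summarizes byte-usage samples.
final class UsageStats {

  // Nearest-rank percentile of an ascending-sorted, non-empty array; pct in [1, 100].
  static long percentile(long[] sorted, int pct) {
    if (sorted.length == 0 || pct < 1 || pct > 100) {
      throw new IllegalArgumentException("need samples and 1 <= pct <= 100");
    }
    // Integer ceil(pct * n / 100), giving a 1-based rank.
    int rank = (int) (((long) pct * sorted.length + 99) / 100);
    return sorted[rank - 1];
  }

  // Formats the samples in the same layout as the results above.
  static String summarize(long[] samples) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    double avg = Arrays.stream(sorted).average().orElse(0);
    return String.join("\n",
        "total sample: " + sorted.length,
        "avg: " + avg,
        "min: " + sorted[0],
        "mid: " + percentile(sorted, 50),
        "pct75: " + percentile(sorted, 75),
        "pct90: " + percentile(sorted, 90),
        "pct99: " + percentile(sorted, 99),
        "max: " + sorted[sorted.length - 1]);
  }

  public static void main(String[] args) {
    long[] demo = {7, 36, 36, 40, 40, 44, 10790}; // made-up values for illustration
    System.out.println(summarize(demo));
  }
}
```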
Because the BytesRefs are random, they share few common prefixes or suffixes,
so I tried to mock some common prefixes/suffixes for them like this:
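```
// With probability 1/2, overwrite the head of b with part of the shared prefix.
if (R.nextBoolean()) {
  int prefixLen = R.nextInt(b.length / 2);
  System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
}
// With probability 1/2, overwrite the tail of b with the end of the shared suffix.
if (R.nextBoolean()) {
  int suffixLen = R.nextInt(b.length / 2);
  System.arraycopy(commonSuffix, commonSuffix.length - suffixLen, b,
      b.length - suffixLen, suffixLen);
}
```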
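A self-contained variant of the snippet above might look like the following. It is only a sketch: `MockTerms` and `applyCommonAffixes` are hypothetical names, and the extra bound check is an assumption added because `Random#nextInt(int)` rejects a non-positive bound and `System.arraycopy` rejects a length longer than its source.

```java
import java.util.Random;

// Hypothetical helper (not part of Lucene): plants shared affixes in random terms.
final class MockTerms {

  // Overwrites a random-length head and/or tail of b with bytes from the shared arrays.
  static void applyCommonAffixes(Random r, byte[] b, byte[] commonPrefix, byte[] commonSuffix) {
    // Assumption: cap the affix length by both b.length / 2 and the shared arrays.
    int bound = Math.min(b.length / 2, Math.min(commonPrefix.length, commonSuffix.length));
    if (bound <= 0) {
      return; // nothing to copy; nextInt(0) would throw
    }
    if (r.nextBoolean()) {
      int prefixLen = r.nextInt(bound); // 0 .. bound - 1
      System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
    }
    if (r.nextBoolean()) {
      int suffixLen = r.nextInt(bound); // 0 .. bound - 1
      System.arraycopy(commonSuffix, commonSuffix.length - suffixLen, b,
          b.length - suffixLen, suffixLen);
    }
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    byte[] prefix = "commonprefix".getBytes();
    byte[] suffix = "commonsuffix".getBytes();
    byte[] b = new byte[16];
    r.nextBytes(b);
    applyCommonAffixes(r, b, prefix, suffix);
    System.out.println(java.util.Arrays.toString(b));
  }
}
```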
And here is the result:
```
total sample: 27235
avg: 820.540738020929
min: 8
mid: 24
pct75: 629
pct90: 3347
pct99: 5374
max: 29049
```
We allocate a 32 KB block, while 99% of cases need only about 5 KB. These
results roughly match the allocation profile, which shows that we rarely need
a second block in `BytesStore`.
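
To put the second result in terms of block usage, here is a small arithmetic sketch. It assumes a block is exactly 32 * 1024 bytes, as stated above; `BlockUsage` and its methods are hypothetical names, not Lucene APIs.

```java
// Hypothetical arithmetic helper (not part of Lucene).
final class BlockUsage {
  // Assumption: one block is 32 KB, as stated in the comment.
  static final int BLOCK_BYTES = 32 * 1024;

  // Fraction of the first block that is actually used (capped at 1.0).
  static double utilization(long usedBytes) {
    return Math.min(usedBytes, BLOCK_BYTES) / (double) BLOCK_BYTES;
  }

  // Number of blocks needed to hold usedBytes; at least one block is always allocated.
  static long blocksNeeded(long usedBytes) {
    return Math.max(1, (usedBytes + BLOCK_BYTES - 1) / BLOCK_BYTES);
  }

  public static void main(String[] args) {
    // mid, pct75, pct90, pct99 and max from the second result above.
    long[] reported = {24, 629, 3347, 5374, 29049};
    for (long used : reported) {
      System.out.printf("%d bytes -> %.1f%% of the first block, %d block(s)%n",
          used, 100 * utilization(used), blocksNeeded(used));
    }
  }
}
```

Under these assumptions the 99th percentile uses roughly 16% of the first block, and even the largest sample (29049 bytes) still fits in a single block.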