gf2121 opened a new pull request, #12604:
URL: https://github.com/apache/lucene/pull/12604

   ### Description
   
   
https://blunders.io/jfr-demo/indexing-4kb-2023.09.25.18.03.36/allocations-drill-down
   
   The nightly benchmark shows that `FSTCompiler#init` accounts for most of the 
memory allocated during indexing. This is because `FSTCompiler#init` always 
allocates 32k bytes up front, since the `bytesPageBits` parameter defaults to 15 
(1 << 15 = 32768). I recorded the BytesStore usage (`getPosition()` at the time 
`BytesStore#finish` is called) during wikimediumall indexing, and the result 
shows that 99% of FSTs use no more than 1k bytes.
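
   To make the arithmetic concrete, here is a minimal, self-contained sketch of how the page-bits setting translates into reserved bytes. It does not use Lucene classes: `PageBitsSketch`, `blockSize` and `reservedBytes` are hypothetical names, the "first block is allocated eagerly, blocks are allocated whole" model is an assumption based on the description above, and 10 bits is only an illustrative alternative, not necessarily the value this PR picks.

```java
public class PageBitsSketch {
    // Size in bytes of one block for a given number of page bits.
    static int blockSize(int pageBits) {
        return 1 << pageBits;
    }

    // Bytes reserved when blocks are allocated whole and at least one block
    // is always allocated (assumed model of the eager allocation in init).
    static long reservedBytes(long usedBytes, int pageBits) {
        long size = blockSize(pageBits);
        long blocks = Math.max(1, (usedBytes + size - 1) / size); // ceiling division
        return blocks * size;
    }

    public static void main(String[] args) {
        long used = 1024; // an FST that needs 1k bytes
        for (int bits : new int[] {15, 10}) { // 15 = current default, 10 = hypothetical
            System.out.printf("pageBits=%d blockSize=%d reserved=%d for used=%d%n",
                bits, blockSize(bits), reservedBytes(used, bits), used);
        }
    }
}
```

   Under this model an FST that needs 1k bytes still reserves a full 32k block at 15 bits, while a smaller block size reserves far less at the cost of more blocks for the rare large FST.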
   
   ```
   BytesStore#finish called: 1000000 times
   
   min: 1
   mid: 16
   avg: 64.555987
   pct75: 28
   pct90: 57
   pct99: 525
   pct999: 4957
   pct9999: 29124
   max: 631700
   ```
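
   For readers who want to reproduce a summary like the one above from their own measurements, the following is one possible way to compute such figures. The PR does not say how the numbers were derived, so the nearest-rank percentile method, the class name `UsageStatsSketch`, and the sample values are all assumptions for illustration only.

```java
import java.util.Arrays;

public class UsageStatsSketch {
    // Nearest-rank percentile on an ascending-sorted array: the smallest
    // sample such that at least p percent of all samples are <= it.
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    static double average(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        // Made-up usage samples in bytes; real input would be one
        // getPosition() value recorded per finished BytesStore.
        long[] samples = {16, 1, 525, 28, 3, 57, 4957, 16};
        Arrays.sort(samples);
        System.out.println("min: " + samples[0]);
        System.out.println("mid: " + percentile(samples, 50));
        System.out.println("avg: " + average(samples));
        System.out.println("pct75: " + percentile(samples, 75));
        System.out.println("max: " + samples[samples.length - 1]);
    }
}
```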
   
   This PR proposes to reduce the block size of `FST` in 
`Lucene90BlockTreeTermsWriter`.
   
   closes https://github.com/apache/lucene/issues/12598
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

