sarthakaggarwal97 opened a new issue, #13099: URL: https://github.com/apache/lucene/issues/13099
### Description

For duplicated or highly similar data, the change in block size from [60K to 8K](https://github.com/apache/lucene/commit/f1fdd24) was observed to increase the stored fields size by over 50%. This observation comes from OpenSearch: https://github.com/opensearch-project/OpenSearch/issues/3769. I was able to replicate the results for duplicated documents: https://github.com/opensearch-project/OpenSearch/issues/3769#issuecomment-1938506593. That comment also includes a comparison with non-similar data, where the effect of block size is mostly insignificant.

If my understanding is correct, there is currently no clean way to change the block size used by a codec without creating a separate Codec and StoredFieldsFormat (I took a stab at this approach over [here](https://github.com/opensearch-project/OpenSearch/pull/12029)).

I would like the community's feedback on whether we could make the block size configurable, so that users can choose a value that suits their workload.
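To illustrate why block size matters much more for duplicated data than for non-similar data, here is a minimal sketch. It is not Lucene code and does not reproduce the numbers reported above. It assumes `zlib` from the Python standard library as a stand-in for the stored fields compression, synthetic documents made of random letters, and hypothetical helper names (`random_doc`, `compressed_size`, `size_ratio`). Each block is compressed independently, so a repeated document has to be stored in full once per block, and smaller blocks mean more such copies.

```python
import random
import string
import zlib

SMALL_BLOCK = 8 * 1024    # stands in for the 8K block size
LARGE_BLOCK = 60 * 1024   # stands in for the 60K block size


def random_doc(rng, size=1000):
    """Return one synthetic 'document' of random lowercase letters."""
    letters = string.ascii_lowercase
    return "".join(rng.choice(letters) for _ in range(size)).encode("ascii")


def compressed_size(data, block_size):
    """Compress data in independent blocks and return the total compressed size."""
    total = 0
    for start in range(0, len(data), block_size):
        total += len(zlib.compress(data[start:start + block_size]))
    return total


def size_ratio(data):
    """Compressed size with small blocks divided by the size with large blocks."""
    return compressed_size(data, SMALL_BLOCK) / compressed_size(data, LARGE_BLOCK)


rng = random.Random(42)
# The same document repeated 600 times (the "duplicated" workload).
duplicated = random_doc(rng) * 600
# 600 unrelated documents (the "non-similar" workload).
distinct = b"".join(random_doc(rng) for _ in range(600))

if __name__ == "__main__":
    print(f"duplicated documents: {size_ratio(duplicated):.2f}x larger with small blocks")
    print(f"distinct documents:   {size_ratio(distinct):.2f}x larger with small blocks")
```

In this toy setup the duplicated data grows noticeably when compressed in small blocks, while the distinct data is nearly unaffected. The exact ratios depend on the compressor and the data, so they should not be read as a prediction for any real index.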