sarthakaggarwal97 opened a new issue, #13099:
URL: https://github.com/apache/lucene/issues/13099

   ### Description
   
   For duplicated or highly similar data, the change in block size from 
[60K to 8K](https://github.com/apache/lucene/commit/f1fdd24) was observed to 
increase the stored fields size by over 50%.
   
   This observation comes from OpenSearch: 
https://github.com/opensearch-project/OpenSearch/issues/3769. 
   
   I was able to replicate the results for duplicated documents: 
https://github.com/opensearch-project/OpenSearch/issues/3769#issuecomment-1938506593.
 That comment also includes a comparison with non-similar data, where the 
effect of block size is mostly insignificant.
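
   To illustrate why block size matters for near-duplicate data, here is a 
minimal sketch that compresses the same input in independent blocks of two 
different sizes and compares the totals. It is not Lucene's stored fields code 
path: `java.util.zip.Deflater` is used as a stand-in compressor, and the class 
`BlockSizeDemo`, its methods, and the sample documents are all hypothetical. 
The point is only that each independent block pays its own start-up cost before 
repeated content can be referenced, so smaller blocks pay that cost more often.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustrative only: NOT Lucene's implementation. Deflater stands in for the
// codec's compressor, and the documents are made-up near-duplicates.
public class BlockSizeDemo {

    // Builds `count` near-identical documents (only the id differs), concatenated.
    public static byte[] duplicatedDocs(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("{\"id\":").append(i)
              .append(",\"status\":\"OK\",")
              .append("\"message\":\"connection established to backend service\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses `data` in independent blocks of `blockSize` bytes and
    // returns the total number of compressed bytes across all blocks.
    public static long compressedSize(byte[] data, int blockSize) {
        byte[] scratch = new byte[4096]; // output is only counted, not kept
        long total = 0;
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            // A fresh Deflater per block: no history is shared between blocks.
            Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
            deflater.setInput(data, off, len);
            deflater.finish();
            while (!deflater.finished()) {
                total += deflater.deflate(scratch);
            }
            deflater.end();
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = duplicatedDocs(20_000);
        System.out.println("raw bytes:        " + data.length);
        System.out.println("8K blocks total:  " + compressedSize(data, 8 * 1024));
        System.out.println("60K blocks total: " + compressedSize(data, 60 * 1024));
    }
}
```

   The exact ratio depends on the compressor and the data, so the numbers this 
prints should not be read as a reproduction of the 50% figure above.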
   
   Currently, if my understanding is correct, there is no clean way to 
change the block size used by a codec without creating a separate Codec and 
StoredFieldsFormat (I took a stab at this approach 
[here](https://github.com/opensearch-project/OpenSearch/pull/12029)).
   
   I would like the community's feedback on whether we could make the block 
size configurable, so that users can choose a value based on their type of 
workload.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

