[ https://issues.apache.org/jira/browse/LUCENE-10556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536233#comment-17536233 ]

Robert Muir commented on LUCENE-10556:
--------------------------------------

Yes, the original code I wrote for this was really designed to reproduce
some crazy behavior where a user only writes stored fields and flushes all the
time. I'm not sure we should change the merge policy (MP) in the benchmark,
though, since so many users rely on TieredMergePolicy (the default).

> Relax the maximum dirtiness for stored fields and term vectors?
> ---------------------------------------------------------------
>
>                 Key: LUCENE-10556
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10556
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Stored fields and term vectors compress data and have merge-time 
> optimizations to copy compressed data directly instead of decompressing and 
> recompressing over and over again. However, sometimes incomplete blocks get 
> carried over (typically the last block of a flushed segment) and so these 
> file formats keep track of how "dirty" their current blocks are to know 
> whether stored fields / term vectors for a segment should be recompressed.
> Currently the logic is to recompress if more than 1% of the blocks are 
> incomplete, or if the total number of missing documents across incomplete 
> blocks is more than the configured maximum number of documents per block.
> I'd be interested in evaluating what the compression ratio would be if we 
> relaxed these conditions a bit, e.g. by allowing up to 5% dirtiness. My gut 
> feeling is that the compression ratio could be barely worse while index-time 
> CPU usage could be significantly improved. 
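The two recompression conditions in the description can be sketched as a single predicate. This is a minimal Java sketch, not Lucene's actual source: the class `DirtinessCheck`, the method `tooDirty`, and its parameter names are hypothetical and follow the wording of the issue. `maxDirtyPercent` stands in for the current 1% threshold and the 5% value floated as an experiment; the sample numbers in `main` are made up for illustration.

```java
/**
 * Hypothetical sketch of the "too dirty" check described in LUCENE-10556.
 * Names are illustrative and do not come from Lucene's codebase.
 */
public final class DirtinessCheck {

  /**
   * Returns true if a segment's stored fields / term vectors should be
   * recompressed instead of having their compressed blocks copied as-is.
   *
   * @param numChunks       total number of compressed blocks in the segment
   * @param numDirtyChunks  number of incomplete blocks
   * @param numMissingDocs  documents missing across all incomplete blocks
   * @param maxDocsPerChunk configured maximum number of documents per block
   * @param maxDirtyPercent allowed share of incomplete blocks (1 today, 5 proposed)
   */
  public static boolean tooDirty(long numChunks, long numDirtyChunks, long numMissingDocs,
                                 int maxDocsPerChunk, int maxDirtyPercent) {
    // Integer form of numDirtyChunks / numChunks > maxDirtyPercent / 100.
    boolean tooManyDirtyChunks = numDirtyChunks * 100 > numChunks * (long) maxDirtyPercent;
    // The missing documents would add up to more than one full block.
    boolean tooManyMissingDocs = numMissingDocs > maxDocsPerChunk;
    return tooManyDirtyChunks || tooManyMissingDocs;
  }

  public static void main(String[] args) {
    // Illustrative segment: 1,000 blocks, 30 incomplete (3%), 500 missing docs,
    // at most 1,024 docs per block.
    System.out.println(tooDirty(1000, 30, 500, 1024, 1)); // true: 3% > 1%
    System.out.println(tooDirty(1000, 30, 500, 1024, 5)); // false: 3% <= 5%, 500 <= 1024
  }
}
```

Because the two conditions are combined with "or" as the description states, raising the percentage alone does not prevent recompression when the missing documents already exceed one block's worth.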



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
