jpountz commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262327111

   >> I'm considering exposing write amplification separately for flushes (as 
flushedBytes / totalIndexSize), merges (as (totalIndexSize + mergedBytes) / 
totalIndexSize) and temporary files (as (totalIndexSize + tempBytes) / 
totalIndexSize) and pushing the responsibility to users of whether and how they 
would like to combine these various metrics?
   >
   > We could add these in addition to the write amplification factor methods 
that are already present?
   
   I think that one question I was raising was also whether the current way 
we're measuring write amplification is correct. Say you write 10 1MB segments, 
which then get merged into a 9MB segment (smaller than the sum of the sizes of 
the flushed segments, possibly because many terms are no longer duplicated 
across segments). What is the write amplification of merging? Is it `(10 + 9) / 
10 = 1.9`, as in the formula that this PR is currently using, which computes 
write amplification as a function of the total flush size? Or is it `(9 + 9) / 
9 = 2`, computing write amplification as a function of the total index size? 
The latter feels more intuitive to me.
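
   To make the two candidate definitions concrete, here is a small sketch of 
the arithmetic in Python. It is not Lucene code: the function and variable 
names are made up for illustration, and sizes are in MB as in the example 
above.

```python
def amplification_vs_flushed(flushed_mb: float, merged_mb: float) -> float:
    """Merge write amplification relative to the total flushed size."""
    return (flushed_mb + merged_mb) / flushed_mb


def amplification_vs_index_size(index_mb: float, merged_mb: float) -> float:
    """Merge write amplification relative to the final index size."""
    return (index_mb + merged_mb) / index_mb


# Example from the comment: 10 flushed segments of 1MB each,
# merged into a single 9MB segment.
flushed_mb = 10 * 1.0
merged_mb = 9.0
index_mb = merged_mb  # after the merge, the index consists of the 9MB segment

vs_flushed = amplification_vs_flushed(flushed_mb, merged_mb)
vs_index = amplification_vs_index_size(index_mb, merged_mb)

print(f"relative to flushed bytes: {vs_flushed:.2f}")
print(f"relative to index size:    {vs_index:.2f}")
```

   The two definitions only agree when the merged segment is exactly as large 
as the sum of the flushed segments; as soon as merging shrinks the data, the 
flush-based ratio comes out lower than the index-size-based one.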


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

