jpountz commented on PR #11796: URL: https://github.com/apache/lucene/pull/11796#issuecomment-1262327111
> > I'm considering exposing write amplification separately for flushes (as flushedBytes / totalIndexSize), merges (as (totalIndexSize + mergedBytes) / totalIndexSize) and temporary files (as (totalIndexSize + tempBytes) / totalIndexSize) and pushing the responsibility to users of whether and how they would like to combine these various metrics?
>
> We could add these in addition to the write amplification factor methods that are already present?

I think one question I was raising was also whether the current way we're measuring write amplification is correct. Say you write 10 1MB segments, which then get merged into a 9MB segment (smaller than the sum of the sizes of the flushed segments, possibly because many terms are no longer duplicated across segments). What is the write amplification of merging? Is it `(10 + 9) / 10 = 1.9`, like the formula that this PR currently uses, computing write amplification as a function of the total flush size? Or is it `(9 + 9) / 9 = 2`, computing write amplification as a function of the total index size? The latter feels more intuitive to me.
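To make the two candidate definitions concrete, here is a minimal sketch that plugs the 10 x 1MB flush / 9MB merge example into each formula. The class and method names are hypothetical illustrations, not Lucene APIs; only the counters `flushedBytes`, `mergedBytes`, `tempBytes` and `totalIndexSize` come from the quoted proposal, and the flush-normalized formula mirrors the `(10 + 9) / 10` computation described above.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the write-amplification formulas; not part of Lucene. */
public final class WriteAmplification {

    private WriteAmplification() {}

    /** Flush amplification as proposed: flushedBytes / totalIndexSize. */
    public static double flushAmplification(long flushedBytes, long totalIndexSize) {
        return (double) flushedBytes / totalIndexSize;
    }

    /** Merge amplification normalized by index size: (totalIndexSize + mergedBytes) / totalIndexSize. */
    public static double mergeAmplification(long mergedBytes, long totalIndexSize) {
        return (double) (totalIndexSize + mergedBytes) / totalIndexSize;
    }

    /** Temp-file amplification as proposed: (totalIndexSize + tempBytes) / totalIndexSize. */
    public static double tempFileAmplification(long tempBytes, long totalIndexSize) {
        return (double) (totalIndexSize + tempBytes) / totalIndexSize;
    }

    /** Merge amplification normalized by flush size: (flushedBytes + mergedBytes) / flushedBytes. */
    public static double mergeAmplificationByFlushSize(long flushedBytes, long mergedBytes) {
        return (double) (flushedBytes + mergedBytes) / flushedBytes;
    }

    public static void main(String[] args) {
        final long mb = 1L << 20;
        long flushed = 10 * mb;   // ten 1MB flushed segments
        long merged = 9 * mb;     // bytes written by merging them into one segment
        long indexSize = 9 * mb;  // the merged segment is now the whole index
        System.out.printf(Locale.ROOT, "merge, by flush size: %.2f%n",
                mergeAmplificationByFlushSize(flushed, merged));
        System.out.printf(Locale.ROOT, "merge, by index size: %.2f%n",
                mergeAmplification(merged, indexSize));
        System.out.printf(Locale.ROOT, "flush, by index size: %.2f%n",
                flushAmplification(flushed, indexSize));
    }
}
```

Running `main` prints 1.90, 2.00 and 1.11. Note that all three proposed per-source metrics share `totalIndexSize` as their denominator, which is what would let users compare or combine them on a common scale.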