msfroh edited a comment on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-991340704


   > As I look at this, it is kind of strange that we call 
setMaxMergedSegmentMB in findMerges because if a user set that explicitly, we 
would then override it based on the max size as percentage of index, which 
could be surprising. I think this max size as proportion of index is a nice 
feature, but I'm not sure it really has anything to do with finding small 
segments to merge on refresh? So +1 to dropping that feature. Maybe @msfroh 
will remember why it's in here -- is it just for convenience
   
   That's correct. Our Amazon product search merge policy is solving two 
different problems -- finding small segments that could cheaply be merged on 
commit (raising the floor of committed segment sizes), and setting the max 
segment size as a proportion of overall index size (lowering the ceiling).
   
   The variable max segment size was motivated by cases where we would have 
(say) a 5.5GB shard, with a 5GB max segment size. If a merge kicks in and 
produces a 5GB segment, then the search is basically single-threaded, which 
would lead to a large (and sudden) jump in latency.
   
   If the goal of this change is to just provide a reference merge policy for 
merge on commit/refresh, then the variable max segment size should be left out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to