msfroh commented on pull request #446: URL: https://github.com/apache/lucene/pull/446#issuecomment-991340704
> As I look at this, it is kind of strange that we call setMaxMergedSegmentMB in findMerges because if a user set that explicitly, we would then override it based on the max size as percentage of index, which could be surprising. I think this max size as proportion of index is a nice feature, but I'm not sure it really has anything to do with finding small segments to merge on refresh? So +1 to dropping that feature. Maybe @msfroh will remember why it's in here -- is it just for convenience That's correct. Our Amazon product search merge policy is solving two different problems -- finding small segments that could cheaply be merged on commit (raising the floor of committed segment sizes), and setting the max segment size as a proportion of overall segment size (lowering the ceiling). The variable max segment size was motivated by cases where we would have (say) a 5.5GB shard, with a 5GB max segment size. If a merge kicks in and produces a 5GB segment, then the search is basically single-threaded, which would lead to a large (and sudden) jump in latency. If the goal of this change is to just provide a reference merge policy for merge on commit/refresh, then the variable max segment size should be left out. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org