We should be very careful about how many resources we dedicate to rebalancing.
One of our competitors rebalances *much* faster than we do, but in doing so they consume all available resources. At one bank that caused significant loss of market data arriving on a multicast feed, which had a severe adverse effect on the pricing and risk management functions for a period of time. That bank removed the competitor's product, and for several years the chief architect there allowed no distributed caching products at all. Only after he left and a new chief architect was named did they go back to distributed caching. When they DID go back, it pre-dated Geode, so they used GemFire, largely because GemFire does not consume all available resources while rebalancing.

I do think we need to improve our rebalancing so that it iterates until it achieves balance, but not in a way that will consume all available resources.

--
Mike Stolz

On Thu, Mar 8, 2018 at 2:25 PM, Nick Reich <nre...@pivotal.io> wrote:

> Team,
>
> The time required to rebalance a Geode cluster has often been noted by
> users as an area for improvement. Currently, buckets are moved one at a
> time, and we propose that moving buckets in parallel could greatly
> improve the performance of this feature.
>
> Previously, parallelization was implemented for adding redundant copies
> of buckets to restore redundancy. Moving buckets, however, is a more
> complicated matter and requires a different approach, because a member
> could potentially be gaining buckets and giving away buckets at the same
> time. While giving away a bucket, a member still holds all of the data
> for that bucket until the receiving member has fully received it and it
> can safely be removed from the original owner. This means that unless
> the member has enough memory headroom to hold both all of the buckets it
> will receive and all of the buckets it started with, moving buckets in
> parallel could cause it to run out of memory.
>
> For this reason, we propose a system that performs (potentially) several
> rounds of concurrent bucket moves:
> 1) A set of moves that improves balance is calculated, subject to the
> requirement that no member both receives and gives away a bucket (so no
> member carries the memory overhead of both a new bucket and an existing
> bucket it will ultimately remove).
> 2) All calculated bucket moves are conducted in parallel. Parameters to
> throttle this process (to keep it from consuming too many cluster
> resources and impacting performance) should be added, such as only
> allowing each member to send or receive a maximum number of buckets
> concurrently.
> 3) If the cluster is not yet balanced, perform additional iterations of
> calculating and executing bucket moves, until balance is achieved or a
> maximum number of iterations is reached.
> Note: in both the existing and the proposed system, regions are
> rebalanced one at a time.
>
> Please let us know if you have feedback on this approach or additional
> ideas that should be considered.
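For concreteness, below is a minimal Java sketch of the round-based loop described in the quoted proposal. All names here (Move, planRound, MAX_MOVES_PER_MEMBER, the member-to-buckets map) are hypothetical placeholders, not Geode's actual rebalancing internals. The per-member cap stands in for the proposed throttling parameters, and the "no member both gives and receives" rule falls out of splitting members into donors and receivers each round.

import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of the proposed round-based parallel rebalance.
// The map stands in for the metadata of a single partitioned region
// (member id -> buckets currently hosted by that member).
public class ParallelRebalanceSketch {

    static final int MAX_ROUNDS = 10;          // safety cap on iterations
    static final int MAX_MOVES_PER_MEMBER = 4; // throttle: moves a member may take part in per round

    record Move(String bucket, String source, String target) {}

    public static void rebalanceRegion(Map<String, List<String>> bucketsByMember)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8); // bound on simultaneous transfers
        try {
            for (int round = 0; round < MAX_ROUNDS && !isBalanced(bucketsByMember); round++) {
                List<Move> moves = planRound(bucketsByMember);
                if (moves.isEmpty()) {
                    break; // no improving move satisfies the constraints
                }
                // Execute every move planned for this round in parallel, then re-plan.
                List<Callable<Void>> transfers = new ArrayList<>();
                for (Move m : moves) {
                    transfers.add(() -> {
                        // In the real system the expensive part is streaming the bucket's
                        // data to the target before dropping it from the source; here only
                        // the bookkeeping is updated.
                        synchronized (bucketsByMember) {
                            bucketsByMember.get(m.target()).add(m.bucket());
                            bucketsByMember.get(m.source()).remove(m.bucket());
                        }
                        return null;
                    });
                }
                pool.invokeAll(transfers); // blocks until the whole round has finished
            }
        } finally {
            pool.shutdown();
        }
    }

    // Plan one round. Members above the per-member floor only give buckets and
    // members below the ceiling only receive them, so no member does both, and
    // the per-member cap limits how many moves any one member joins in a round.
    static List<Move> planRound(Map<String, List<String>> bucketsByMember) {
        int members = bucketsByMember.size();
        int total = bucketsByMember.values().stream().mapToInt(List::size).sum();
        int floorTarget = total / members;
        int ceilTarget = (total + members - 1) / members;

        List<String> receivers = bucketsByMember.keySet().stream()
                .filter(m -> bucketsByMember.get(m).size() < ceilTarget)
                .toList();

        List<Move> moves = new ArrayList<>();
        Map<String, Integer> pending = new HashMap<>(); // moves already assigned this round

        for (Map.Entry<String, List<String>> donor : bucketsByMember.entrySet()) {
            int surplus = donor.getValue().size() - floorTarget;
            for (int i = 0; i < surplus; i++) {
                if (pending.getOrDefault(donor.getKey(), 0) >= MAX_MOVES_PER_MEMBER) {
                    break; // this donor has hit its cap for the round
                }
                Optional<String> receiver = receivers.stream()
                        .filter(r -> pending.getOrDefault(r, 0) < MAX_MOVES_PER_MEMBER
                                && bucketsByMember.get(r).size()
                                        + pending.getOrDefault(r, 0) < ceilTarget)
                        .findFirst();
                if (receiver.isEmpty()) {
                    return moves; // every receiver is full or capped; stop planning
                }
                moves.add(new Move(donor.getValue().get(i), donor.getKey(), receiver.get()));
                pending.merge(donor.getKey(), 1, Integer::sum);
                pending.merge(receiver.get(), 1, Integer::sum);
            }
        }
        return moves;
    }

    static boolean isBalanced(Map<String, List<String>> bucketsByMember) {
        IntSummaryStatistics s =
                bucketsByMember.values().stream().mapToInt(List::size).summaryStatistics();
        return s.getMax() - s.getMin() <= 1;
    }
}

On a toy layout such as four members holding 8, 4, 2, and 2 buckets, this should converge in a round or two; a real implementation would of course also have to weigh actual bucket sizes, redundancy zones, and colocated regions rather than raw bucket counts, and would tune the throttles to leave headroom for application traffic.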