On Tue, Aug 21, 2012 at 2:54 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> On Mon, Aug 20, 2012 at 4:55 PM, Eric Evans <eev...@acunu.com> wrote:
>> Shuffling the ranges to create a random distribution from contiguous
>> ranges has the potential to move a *lot* of data around (all of it,
>> basically). Doing this in an optimal way would mean never moving a
>> range more than once. Since it is a lot of data, and since presumably
>> we're expecting normal operations to continue in the meantime, it
>> would seem an optimal shuffle would need to maintain state. For
>> example, one machine could serve as the "shuffle coordinator",
>> precalculating and persisting all of the moves, starting new transfers
>> as existing ones finish, and tracking the progress, etc.
>
> Fortunately, we have a distributed storage system.... :)
>
> Seriously though, creating a CF mapping vnode from->to tuples,
> throwing in the list of changes to make once, and deleting them out as
> they complete, would be a pretty simple way to get what we want.
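For concreteness, a rough sketch of what such a CF of vnode from->to
moves might look like. The keyspace, table, and column names (shuffle,
range_relocations, host_id, from_token, to_host) are hypothetical, and a
real implementation would define this as a system table inside Cassandra
rather than creating it through a client driver:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CreateRelocationTable {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Illustration only: a real shuffle would define this as a
            // system table, not create it from a client.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS shuffle WITH replication = " +
                "{'class': 'SimpleStrategy', 'replication_factor': 1}");

            // One row per pending move: the node that must hand off a vnode
            // token, and the node that should receive the range. Deleting a
            // row marks that move as complete, so the table doubles as
            // durable, resumable shuffle state.
            session.execute(
                "CREATE TABLE IF NOT EXISTS shuffle.range_relocations (" +
                "  host_id uuid,     " +  // node whose relocation queue this is
                "  from_token text,  " +  // vnode token being handed off
                "  to_host uuid,     " +  // destination node for the range
                "  PRIMARY KEY (host_id, from_token))");

            cluster.close();
        }
    }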
Yeah, that's exactly what I had in mind to do. Actually, now that I
think about it, I'd probably drop the entire notion of a "coordinator"
and write the relocation entries into a column family in the system
keyspace. Each node could then work through its own queue of
relocations at its own pace (a rough sketch of that loop follows
below). What do you think of this approach?

--
Eric Evans
Acunu | http://www.acunu.com | @acunu
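A minimal sketch of that per-node loop, under the same hypothetical
schema as above; streamRange stands in for the actual data transfer, and
the DataStax Java driver is again used purely for illustration (inside
Cassandra this would run against the local system tables directly):

    import java.util.UUID;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class RelocationWorker {
        public static void main(String[] args) {
            UUID localHostId = UUID.fromString(args[0]); // this node's id
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("shuffle");

            // Drain this node's queue at its own pace, deleting each entry
            // only after the transfer finishes. A crash leaves the remaining
            // rows in place, so the shuffle resumes where it left off.
            for (Row row : session.execute(
                    "SELECT from_token, to_host FROM range_relocations " +
                    "WHERE host_id = ?", localHostId)) {
                streamRange(row.getString("from_token"), row.getUUID("to_host"));
                session.execute(
                    "DELETE FROM range_relocations " +
                    "WHERE host_id = ? AND from_token = ?",
                    localHostId, row.getString("from_token"));
            }

            cluster.close();
        }

        // Placeholder for the actual range transfer between nodes.
        static void streamRange(String fromToken, UUID toHost) {
            // ... stream the data for fromToken to toHost ...
        }
    }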