Sorry, sent early.

To explain further, the scheduler is entirely decentralized in the
proposed design, and no node holds all of the information you're talking
about in heap at once (in fact, no single node would ever hold that
information at all). Each node is responsible only for the token ranges
it is the "primary" replica of. Each of those ranges is then split per
table, and each table range is in turn split into at most a few hundred
subranges at a time (typically one or two; you don't want too many,
otherwise you end up with too many small sstables). All of this amounts
to at most megabytes of data, and I really don't believe it would cause
significant heap pressure, if any. The repairs *themselves* certainly
would create heap pressure, but that happens regardless of the
scheduler.
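
To make that concrete, here is a very rough sketch of the kind of
per-node, per-table state involved (purely illustrative Java, not code
from the proposal; the names and the 256-split cap are made up for the
example). The point is that only one table's primary range, split into a
bounded number of subranges, sits in heap at a time:

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public final class RangeSplitSketch {
    // A token range (start exclusive, end inclusive), as in the ring model.
    static final class TokenRange {
        final BigInteger start;
        final BigInteger end;
        TokenRange(BigInteger start, BigInteger end) {
            this.start = start;
            this.end = end;
        }
    }

    // Split one primary range for one table into at most maxSplits subranges.
    // This small list is the only split state held in heap for that table.
    static List<TokenRange> split(TokenRange primary, int maxSplits) {
        BigInteger width = primary.end.subtract(primary.start);
        // Ceiling division so at most maxSplits subranges cover the whole range.
        BigInteger step = width.add(BigInteger.valueOf(maxSplits - 1))
                               .divide(BigInteger.valueOf(maxSplits))
                               .max(BigInteger.ONE);
        List<TokenRange> splits = new ArrayList<>();
        BigInteger cursor = primary.start;
        while (cursor.compareTo(primary.end) < 0) {
            BigInteger next = cursor.add(step).min(primary.end);
            splits.add(new TokenRange(cursor, next));
            cursor = next;
        }
        return splits;
    }

    public static void main(String[] args) {
        // A few hundred subranges is a few hundred pairs of BigIntegers:
        // kilobytes of heap, versus the actual repair work on the data.
        TokenRange primary = new TokenRange(BigInteger.ZERO, BigInteger.ONE.shiftLeft(40));
        System.out.println("splits held in heap: " + split(primary, 256).size());
    }
}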

-Joey

On Thu, Apr 5, 2018 at 7:25 PM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:

>> I wouldn't trivialize it; scheduling can end up dealing with more than
>> a single repair. If there's 1000 keyspace/tables, with 400 nodes and
>> 256 vnodes on each, that's a lot of repairs to plan out and keep track
>> of, and it can easily cause heap allocation spikes if opted in.
>>
>> Chris
>
> The current proposal never keeps track of more than a few hundred range
> splits for a single table at a time, and nothing ever keeps state for
> the entire 400 node cluster. Compared to the load generated by actually
> repairing the data, I really do think the scheduler's heap pressure is
> trivial.
>
>
> Somewhat beside the point, I wasn't aware there were any 100+ node
> clusters running with vnodes; if my math is correct, they would be
> excessively vulnerable to outages with that many vnodes and that many
> nodes (rough numbers below). Most of the large clusters I've heard of
> (100 nodes plus) are running with a single token, or at most 4 tokens,
> per node.
>
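
A rough back-of-envelope version of the vnode math quoted above (only an
approximation I'm using for illustration, assuming RF=3, quorum
consistency, roughly random token placement, and two nodes down at
once):

public final class VnodeOutageSketch {
    public static void main(String[] args) {
        int nodes = 400;
        int rf = 3;
        for (int vnodes : new int[] {1, 4, 256}) {
            // With random placement, a node shares at least one replica set
            // with roughly min(nodes - 1, 2 * (rf - 1) * vnodes) other nodes.
            double neighbours = Math.min(nodes - 1, 2.0 * (rf - 1) * vnodes);
            // If a second node fails while one is already down, the chance it
            // is one of those neighbours (so some range drops below quorum):
            double pQuorumLoss = neighbours / (nodes - 1);
            System.out.printf("vnodes=%3d -> P(two down nodes lose quorum) ~ %.2f%n",
                              vnodes, pQuorumLoss);
        }
    }
}

With 256 vnodes essentially every pair of nodes shares at least one
range, so any two simultaneous failures lose quorum somewhere; with one
to four tokens per node that probability stays in the low single digit
percentages, which is why the large clusters I mentioned stick to few
tokens.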
