On Fri, Feb 23, 2018 at 12:49 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
> > Is this a realistic case when Cassandra (unless I'm missing something) is
> > limited to adding or removing a single node at a time? I'm sure this
> > can happen under some sort of generic range movement of some
> > sort (how does one initiate such a movement, and why), but will it happen
> > under "normal" conditions of node bootstrap or decommission of a single
> > node?
>
> It's possible to make simultaneous range movements when either
> {{-Dcassandra.consistent.rangemovement=false}} (CASSANDRA-7069) or
> {{-Dcassandra.consistent.simultaneousmoves.allow=true}}
> (CASSANDRA-11005) are specified.
>
> In any case, I'm not saying it's not possible, just that we cannot
> apply this optimization when there are simultaneous range movements in
> the same rack.
>
> > How/when would we have two pending nodes for a single view partition?
>
> Actually I meant if there are multiple range movements going on in the
> same rack, not exactly in the same partition.

But the code we're discussing now, in mutateMV, isn't it about sending
just one mutation, of a single partition in the view table? So don't we
only care which node (just one? can it be more than one?) this
partition will move to?

> > Yes, it seems it will not be trivial. But if this is the common case in
> > common operations such as node addition or removal, it may significantly
> > reduce (from RF*2 to RF+1) the number of view updates being sent around,
> > and avoid MV update performance degradation during the streaming process.
>
> Agreed, we should definitely look into making this optimization, but
> it just was never done before due to other priorities, please open a
> ticket for it.

Ok, I will, though I'm not sure I understood all the caveats you
mentioned, so you may need to edit the ticket later to add them.

> There's a similar optimization that can be done for
> view batchlog replays - right now the view update is sent to all
> replicas during batchlog replay, but we could simplify it and also
> send only to the paired view replicas.
>
> > Is it actually possible to repair *only* a view, not its base table? If
> > you repair a view table which has an inconsistency, namely one view row in
> > one replica and a different view row in another replica, won't the repair
> > just cause both versions to be kept, which is wrong?
>
> ...
>
> When there are permanent inconsistencies though (when the base is
> consistent and the view has extraneous rows), it doesn't really matter
> if the inconsistency is present on a subset or all view replicas,
> since the inconsistency is already visible to clients. The only way to
> fix permanent inconsistencies currently is to drop and re-create the
> view. CASSANDRA-10346 was created to address this.

This is why I asked how the view repair you suggested in the release
notes would help in this case. I'm also worried that the fact that
*each* base replica sends to the pending node (rather than only the
paired replica) makes it more likely that we create these
inconsistencies: if two base replicas have different values, *both*
values will be sent to the pending view replica, creating an
inconsistency there that cannot be fixed.

> If you have more comments about CASSANDRA-14251 would you mind adding
> them to the ticket itself so the discussion is registered on the
> relevant JIRA?

I think most of the issues I raised later are not really part of
CASSANDRA-14251 but separate issues; I'll see which of them I can
express clearly enough to become new JIRA tickets, and submit them.
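P.S. To make the "from RF*2 to RF+1" arithmetic above concrete, here is a
tiny toy sketch. It is plain Java and not actual Cassandra code - the class
and method names are made up, and it only counts messages - comparing what I
understand happens today (every base replica also writes to the pending view
node) with the proposed optimization (only the paired base replica does):

    public class ViewUpdateFanoutSketch {

        // Today (as I understand mutateMV): each of the RF base replicas sends
        // its view update to its paired view replica *and* to every pending
        // view endpoint, so with one pending node that's RF * 2 messages.
        static int currentMessages(int rf, int pendingViewNodes) {
            return rf * (1 + pendingViewNodes);
        }

        // Proposed: only the base replica whose paired view range is actually
        // moving duplicates its update to the pending node; the other RF - 1
        // replicas send a single update each, so RF + 1 messages in total.
        static int proposedMessages(int rf, int pendingViewNodes) {
            return rf + pendingViewNodes;
        }

        public static void main(String[] args) {
            int rf = 3;
            System.out.printf("RF=%d, one pending node: current=%d, proposed=%d%n",
                    rf, currentMessages(rf, 1), proposedMessages(rf, 1));
            // prints: RF=3, one pending node: current=6, proposed=4
        }
    }

If I understood your caveat correctly, simultaneous range movements in the
same rack would correspond here to pendingViewNodes > 1, where it is no
longer obvious which base replica should be considered "paired" with the
moving range - which is presumably why the optimization can't be applied in
that case.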