Thanks Jeff. CASSANDRA-6434 is exactly the issue. Do we have a plan/ticket
to get rid of GCGS (and make only_purge_repaired_tombstones the default)?
Will it be covered in CASSANDRA-14145?
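
In the meantime, the CASSANDRA-6434 behaviour can be enabled per table as a
compaction sub-option (option name as I recall it, please double-check for
your version; it only purges tombstones from sstables marked repaired, so
it assumes incremental repair is actually being run), something like:

ALTER TABLE foo.bar WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',  -- keep the table's current class
    'only_purge_repaired_tombstones': 'true'
};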

I created ticket CASSANDRA-14543 for replaying hints that contain purgeable
tombstones. It doesn't fix the root cause, but it reduces the chance of
hitting this issue. Please comment if you have any suggestions.
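
For context on why the hinted handoff window matters (the numbers below are
just the stock defaults, so treat this as an illustration): with
max_hint_window_in_ms at its default of 10800000 (3 hours) and
gc_grace_seconds at its default of 864000 (10 days), a hinted tombstone is
normally replayed long before it becomes purgeable. The repro further down
in this thread only hits the problem because gc_grace_seconds is dropped to
30 seconds, well below the hint window:

-- defaults: the tombstone stays live for 10 days, far longer than the
-- 3-hour hint window, so a replayed hint still carries a live tombstone
ALTER TABLE foo.bar WITH gc_grace_seconds = 864000;

-- repro setting: the tombstone can already be purgeable by the time the
-- hint for the downed replica is replayed
ALTER TABLE foo.bar WITH gc_grace_seconds = 30;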

On Thu, Jun 21, 2018 at 12:55 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Think he's talking about
> https://issues.apache.org/jira/browse/CASSANDRA-6434
>
> Doesn't solve every problem if you don't run repair at all, but if you're
> not running repairs, you're nearly guaranteed problems with resurrection
> after gcgs anyway.
>
>
>
> On Thu, Jun 21, 2018 at 11:33 AM, Jay Zhuang <z...@uber.com.invalid>
> wrote:
>
> > Yes, I also agree that the user should run (incremental) repair within
> > GCGS to prevent it from happening.
> >
> > @Sankalp, would you please point us to the patch from Marcus that you
> > mentioned? The problem is basically the same as
> > https://issues.apache.org/jira/browse/CASSANDRA-14145
> >
> > CASSANDRA-11427 <https://issues.apache.org/jira/browse/CASSANDRA-11427>
> > is actually the opposite of this problem. Because the purgeable
> > tombstone gets repaired, this un-repairable problem cannot be
> > reproduced. I tried 2.2.5 (before the fix); it is able to repair the
> > purgeable tombstone from node1 to node2, so the data is deleted as
> > expected. But that doesn't mean it's the right behaviour, as it also
> > causes purgeable tombstones to keep bouncing around between the nodes.
> > I think https://issues.apache.org/jira/browse/CASSANDRA-14145 will fix
> > the problem by distinguishing repaired from un-repaired data.
> >
> > How about having hints dispatch deliver/replay purgeable (not live)
> > tombstones? That would reduce the chance of hitting this issue,
> > especially when GCGS < the hinted handoff window.
> >
> > On Wed, Jun 20, 2018 at 9:36 AM sankalp kohli <kohlisank...@gmail.com>
> > wrote:
> >
> > > I agree with Stefan that we should use incremental repair and use the
> > > patches from Marcus to drop tombstones only from repaired data.
> > > Regarding deep repair, you can bump the gc grace and run the repair.
> > > The issue will be that you will stream a lot of data, and your
> > > blocking read repairs will also go up while the gc grace is set to
> > > the higher value.
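> > >
> > > For example (the numbers here are only placeholders; the raised value
> > > needs to exceed the age of the oldest tombstone you want propagated):
> > >
> > > ALTER TABLE foo.bar WITH gc_grace_seconds = 864000; -- raise temporarily
> > > -- then run a full repair of the table, e.g. nodetool repair foo bar
> > > ALTER TABLE foo.bar WITH gc_grace_seconds = 30;     -- restore afterwards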
> > >
> > > On Wed, Jun 20, 2018 at 1:10 AM Stefan Podkowinski <s...@apache.org>
> > > wrote:
> > >
> > > > Sounds like an older issue that I tried to address two years ago:
> > > > https://issues.apache.org/jira/browse/CASSANDRA-11427
> > > >
> > > > As you can see, the result hasn't been as expected and we got some
> > > > unintended side effects from the patch. I'm not sure I'd be willing
> > > > to give this another try, considering the behaviour we'd like to fix
> > > > in the first place is rather harmless and the read repairs shouldn't
> > > > happen at all for users who regularly run repairs within gc_grace.
> > > >
> > > > What I'd suggest is to think more in the direction of a
> > > > post-full-repair world and to fully embrace incremental repairs, as
> > > > fixed by Blake in 4.0. In that case, we should stop doing read
> > > > repairs at all for repaired data, as described in
> > > > https://issues.apache.org/jira/browse/CASSANDRA-13912. RRs are
> > > > certainly useful, but can be very risky if not very, very carefully
> > > > implemented. So I'm wondering if we shouldn't disable RRs for
> > > > everything but unrepaired data. I'd btw also be interested to hear
> > > > any opinions on this in the context of transient replicas.
> > > >
> > > >
> > > > On 20.06.2018 03:07, Jay Zhuang wrote:
> > > > > Hi,
> > > > >
> > > > > We know that deleted data may re-appear if repair is not run
> > > > > within gc_grace_seconds: when the tombstone is not propagated to
> > > > > all nodes, the data will re-appear. But it also causes the
> > > > > following 2 issues before the tombstone is compacted away:
> > > > >
> > > > > a. inconsistent query results
> > > > >
> > > > > With consistency level ONE or QUORUM, a read may or may not return
> > > > > the value.
> > > > >
> > > > > b. lots of read repairs that don't repair anything
> > > > >
> > > > > With consistency level ALL, a read always triggers a read repair.
> > > > > With consistency level QUORUM, it also very likely (2/3 of the
> > > > > time) causes a read repair. But it doesn't repair the data, so a
> > > > > read repair is triggered every time.
> > > > >
> > > > >
> > > > > Here are the steps to reproduce it:
> > > > >
> > > > > 1. Create a 3 nodes cluster
> > > > > 2. Create a table (with small gc_grace_seconds):
> > > > >
> > > > > CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy',
> > > > > 'replication_factor': 3};
> > > > > CREATE TABLE foo.bar (
> > > > >     id int PRIMARY KEY,
> > > > >     name text
> > > > > ) WITH gc_grace_seconds=30;
> > > > >
> > > > > 3. Insert data with consistency all:
> > > > >
> > > > > INSERT INTO foo.bar (id, name) VALUES(1, 'cstar');
> > > > >
> > > > > 4. Stop 1 node:
> > > > >
> > > > > $ ccm node2 stop
> > > > >
> > > > > 5. Delete the data with consistency quorum:
> > > > >
> > > > > DELETE FROM foo.bar WHERE id=1;
> > > > >
> > > > > 6. Wait 30 seconds and then start node2:
> > > > >
> > > > > $ ccm node2 start
> > > > >
> > > > > Now the tombstone is on node1 and node3 but not on node2.
> > > > >
> > > > > With a quorum read, it may or may not return the value, and read
> > > > > repair will send the data from node2 to node1 and node3, but it
> > > > > doesn't repair anything.
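> > > > >
> > > > > To observe it, read the key back a few times in cqlsh with tracing
> > > > > enabled (the exact trace wording varies by version, but a digest
> > > > > mismatch and the resulting read repair should show up on most
> > > > > reads):
> > > > >
> > > > > CONSISTENCY QUORUM
> > > > > TRACING ON
> > > > > SELECT * FROM foo.bar WHERE id=1;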
> > > > >
> > > > > I'd like to discuss a few potential solutions and workarounds:
> > > > >
> > > > > 1. Can hints replay send GCed tombstones?
> > > > >
> > > > > 2. Can we have a "deep repair" which detects such an issue and
> > > > > repairs the GCed tombstone? Or temporarily increase
> > > > > gc_grace_seconds for the repair?
> > > > >
> > > > > What other suggestions do you have if a user is hitting this issue?
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jay
> > > > >
> > > >
> > > >
> > >
> >
>