Hello Sam,

This doesn't answer your question directly, but if you are worried about clock skew, take a look at this great two-part blog post:
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/

Josef Lindman Hörnlund
Chief Data Scientist
AppData
jo...@appdata.biz

> On 16 Jun 2015, at 20:45, Sam Klock <skl...@akamai.com> wrote:
>
> Hi folks,
>
> I have a question about a design choice on how expiring cells are
> reconciled with tombstones. For two cells with the same timestamp, if
> one is expiring and one is a tombstone, Cassandra *always* prefers the
> tombstone. This matches its behavior for normal/non-expiring cells, but
> the folks in my organization worry about what it may imply for nodes
> experiencing clock skew. Specifically, we're concerned about scenarios
> like the following:
>
> 1) An expiring cell is committed via some node with a non-skewed clock.
> 2) Another replica for that cell experiences forward clock skew and
> decides that the cell is expired. It eventually runs a compaction that
> converts the cell to a tombstone.
> 3) The tombstone propagates to other nodes via, e.g., node repair.
> 4) The other nodes all eventually run their own compactions. Because of
> the reconciliation logic, the expiring cell is purged on all of the
> replicas, leaving behind only the tombstone.
>
> If the cell should have still been live at (4), the reconciliation logic
> will result in it being prematurely purged. We have confirmed this
> behavior experimentally.
>
> My organization may be more concerned about clock skew than the larger
> community, so I don't think we're inclined to propose a patch at this
> time. But to account for this kind of scenario we would like to patch
> our internal version of Cassandra to conditionally prefer expiring cells
> to tombstones if the node believes they should still be live; i.e., in
> reconcile() in *ExpiringCell.java, instead of:
>
>     if (cell instanceof DeletedCell)
>         return cell;
>
> use:
>
>     if (cell instanceof DeletedCell)
>         return isLive() ? this : cell;
>
> Before we do so, however, we'd like to understand the rationale for the
> existing behavior and the risks of making changes to it. Why does
> Cassandra consistently prefer tombstones to other kinds of cells? By
> modifying this behavior in this particular case, do we risk hitting
> bizarre corner cases?
>
> Thanks,
> SK
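
To make the two reconcile() variants quoted above concrete, here is a minimal Java sketch. It is a simplified model and not Cassandra's actual code: the class ReconcileSketch, the Cell, Expiring and Tombstone types, the expiresAtSeconds field and the explicit nowSeconds parameter are all hypothetical names chosen for illustration. Only the equal-timestamp tombstone rule described in the quoted message is modeled; the time units are abstract.

```java
// Simplified model for illustration only; these are NOT Cassandra's real classes.
final class ReconcileSketch {

    /** Minimal cell abstraction: only the write timestamp matters here. */
    interface Cell {
        long timestamp();
    }

    /** A deletion marker (stands in for DeletedCell in the quoted snippet). */
    static final class Tombstone implements Cell {
        private final long timestamp;

        Tombstone(long timestamp) { this.timestamp = timestamp; }

        @Override public long timestamp() { return timestamp; }
    }

    /** A cell with a TTL (stands in for an expiring cell). */
    static final class Expiring implements Cell {
        private final long timestamp;
        private final long expiresAtSeconds; // hypothetical: absolute expiry time

        Expiring(long timestamp, long expiresAtSeconds) {
            this.timestamp = timestamp;
            this.expiresAtSeconds = expiresAtSeconds;
        }

        @Override public long timestamp() { return timestamp; }

        /** Liveness as judged by the clock reading the caller passes in. */
        boolean isLive(long nowSeconds) { return nowSeconds < expiresAtSeconds; }
    }

    /** Existing behavior as described: on a timestamp tie the tombstone wins. */
    static Cell reconcileStock(Expiring self, Cell other) {
        if (self.timestamp() != other.timestamp()) {
            return self.timestamp() > other.timestamp() ? self : other;
        }
        if (other instanceof Tombstone) {
            return other;
        }
        return self; // remaining tie-breaks are out of scope for this sketch
    }

    /** Proposed change: on a tie, keep the expiring cell while it looks live locally. */
    static Cell reconcilePatched(Expiring self, Cell other, long nowSeconds) {
        if (self.timestamp() != other.timestamp()) {
            return self.timestamp() > other.timestamp() ? self : other;
        }
        if (other instanceof Tombstone) {
            return self.isLive(nowSeconds) ? self : other;
        }
        return self;
    }

    public static void main(String[] args) {
        Expiring cell = new Expiring(100L, 2000L); // written at 100, expires at 2000
        Tombstone tombstone = new Tombstone(100L); // same write timestamp

        // Node with a correct clock (now = 1500): the cell should still be live.
        System.out.println(reconcileStock(cell, tombstone).getClass().getSimpleName());
        System.out.println(reconcilePatched(cell, tombstone, 1500L).getClass().getSimpleName());

        // Node whose clock has skewed forward (now = 2500).
        System.out.println(reconcilePatched(cell, tombstone, 2500L).getClass().getSimpleName());
    }
}
```

In this sketch the stock variant returns the tombstone regardless of the clock, while the patched variant returns the expiring cell at now = 1500 and the tombstone at now = 2500. The result of the patched variant therefore depends on the clock reading passed in, so two callers with different clocks can pick different winners for the same pair of cells.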