Re: one way to make counter delete work better

2011-06-14 Thread Yang
The patch is in https://issues.apache.org/jira/browse/CASSANDRA-2774. Some of the code is messy and intended for demonstration only; we could refine it after we agree this is a feasible way to go. Thanks Yang On Tue, Jun 14, 2011 at 11:21 AM, Sylvai

Re: one way to make counter delete work better

2011-06-14 Thread Yang
yes, epoch is generated by each node in the replica set, upon a delete operation. epoch is **global** to the replica set, for one counter, in contrast to clock, which is local to a partition. different counters have different epoch numbers, because different counters can be seen as completely diffe

Re: one way to make counter delete work better

2011-06-14 Thread Yang
in "stronger reason", I mean the +3 is already merged up in memtable of node B, you can't find +1 and +2 any more On Tue, Jun 14, 2011 at 7:02 PM, Yang wrote: > I almost got the code done, should release in a bit. > > > > your scenario is not a problem concerned with implementation, but really

Re: one way to make counter delete work better

2011-06-14 Thread Yang
I almost got the code done, should release in a bit. your scenario is not a problem concerned with implementation, but really with definition of "same time". remember that in a distributed system, there is no absolute physical time concept, time is just another way of saying "before or after". i

Re: one way to make counter delete work better

2011-06-14 Thread Milind Parikh
If I understand this correctly, then the epoch integer would be generated by each node. Since time always flows forward, the assumption would be, I suppose, that the epochs would be tagged with the node that generated them and additionally the counter would carry as much history as necessary (and p

Re: one way to make counter delete work better

2011-06-14 Thread Sylvain Lebresne
Who assigns those epoch numbers? You need all nodes to agree on the epoch number somehow to have this work, but then how do you maintain those in a partition-tolerant distributed system? I may have missed some parts of your proposal but let me consider a scenario that we have to be able to handl

Re: one way to make counter delete work better

2011-06-13 Thread Yang
ok, I think it's better to understand it this way, then it is really simple and intuitive: my proposed way of counter update can be simply seen as a combination of regular columns + current counter columns: regular column : [ value: "wipes out every bucket to nil" , clock: epoch number] then w

Re: one way to make counter delete work better

2011-06-13 Thread Yang
I think this approach also works for your scenario: I thought that the issue was only concerned with merging within the same leader, but you pointed out that a similar merging happens between leaders too. Now I see that the same rules on epoch number also apply to inter-leader data merging, speci

Re: one way to make counter delete work better

2011-06-13 Thread Jonathan Ellis
I don't think that's bulletproof either. For instance, what if the two adds go to replica 1 but the delete to replica 2? Bottom line (and this was discussed on the original delete-for-counters ticket, https://issues.apache.org/jira/browse/CASSANDRA-2101), counter deletes are not fully commutative

one way to make counter delete work better

2011-06-13 Thread Yang
as https://issues.apache.org/jira/browse/CASSANDRA-2101 indicates, the problem with counter delete is in scenarios like the following:

add 1, clock 100
delete, clock 200
add 2, clock 300

if the 1st and 3rd operations are merged in SSTable compaction, then we have

delete clock 200
add 3, clo
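
The scenario in the original post can be made concrete with a small sketch. This is not Cassandra code: the `Op` type, `apply_in_clock_order`, and `merge_adds` are hypothetical names introduced only for illustration, and the sketch assumes that merging two adds sums their deltas and keeps the larger clock, so the merged add appears to come after the delete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str       # "add" or "delete"
    clock: int      # logical timestamp of the operation
    delta: int = 0  # increment carried by an "add"; unused for "delete"


def apply_in_clock_order(ops):
    """Replay operations by clock; a delete resets the counter to 0."""
    total = 0
    for op in sorted(ops, key=lambda o: o.clock):
        if op.kind == "delete":
            total = 0
        else:
            total += op.delta
    return total


def merge_adds(a, b):
    """Assumed compaction rule: sum the deltas, keep the larger clock."""
    return Op("add", max(a.clock, b.clock), a.delta + b.delta)


# The three operations from the post.
ops = [Op("add", 100, 1), Op("delete", 200), Op("add", 300, 2)]
intended = apply_in_clock_order(ops)

# Compaction merges the 1st and 3rd operations before the delete is seen.
compacted = [Op("delete", 200), merge_adds(ops[0], ops[2])]
after_compaction = apply_in_clock_order(compacted)

print(intended, after_compaction)
```

Under these assumptions the uncompacted history replays to 2, while the compacted one replays to 3, because the add of 1 that the delete should have wiped out is now folded into an add with a later clock.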