Hi Jools,
 
When using a consistency level other than ALL, there is no way Cassandra
can tell whether a given key currently exists in the cluster. There may be
several concurrent insert or delete operations for the key in progress
which have not yet propagated to the node that tries to determine the
key's presence.  This is a side effect of "eventual consistency", which is
Cassandra's consistency model: eventually everything will be consistent,
if you give the cluster enough time to settle.
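
As a back-of-the-envelope illustration (plain Python, no client code
involved): a read is guaranteed to see the latest write only when the
read and write replica counts overlap, i.e. when R + W > N, which is
what QUORUM on both sides gives you.

    def quorum(n):
        # smallest majority of n replicas
        return n // 2 + 1

    n = 3             # replication factor
    w = quorum(n)     # replicas that must acknowledge a write: 2
    r = quorum(n)     # replicas consulted on a read:           2
    assert r + w > n  # at least one replica sees both

    # with consistency level ONE, r = w = 1 and r + w = 2 <= 3, so a
    # read may hit a replica that the latest write (or delete) has not
    # reached yet -- "does this key exist right now?" has no reliable
    # answer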
 
Martin


________________________________

        From: Jools [mailto:jool...@gmail.com] 
        Sent: Wednesday, June 09, 2010 11:09 AM
        To: user@cassandra.apache.org
        Subject: Re: Inserting new data, where the key points to a tombstone record.
        
        

        Hi Martin,

        Many thanks for the succinct, and clear response. 

        I've got some pointers to move me in the right direction, many thanks.

        However, as a final point of clarification, is there a particular
reason that insert does not raise an exception when trying to insert over
an existing key, or when the key points to a tombstone record?

        --Jools



        On 9 June 2010 09:53, Dr. Martin Grabmüller 
<martin.grabmuel...@eleven.de> wrote:
        

                Hi Jools,
                 
                what happens in Cassandra with your scenario is the following:

                1) insert new record
                   -> the record is added to Cassandra's dataset (with the
                      given timestamp)

                2) delete record
                   -> a tombstone is added to the dataset, with the timestamp
                      of the deletion; this timestamp should be larger than
                      the one from 1), otherwise the delete will be lost

                3) insert new record with the same key as the deleted record
                   -> the record is added as in 1), but its timestamp should
                      be larger than the timestamps from both 1) and 2); see
                      the sketch below
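
                In client terms the sequence looks roughly like this
                (pycassa-style calls; any Thrift client works the same
                way, so treat the exact names and signatures as
                illustrative rather than definitive):

                    import time
                    import pycassa

                    pool = pycassa.ConnectionPool('Keyspace1',
                                                  ['localhost:9160'])
                    cf = pycassa.ColumnFamily(pool, 'Standard1')

                    def usec():
                        # client-supplied timestamp; microseconds since
                        # the epoch is the usual convention, and each
                        # call here must yield a strictly larger value
                        return int(time.time() * 1000000)

                    cf.insert('key1', {'col': 'v1'}, timestamp=usec())  # 1)
                    cf.remove('key1', timestamp=usec())                 # 2)
                    cf.insert('key1', {'col': 'v2'}, timestamp=usec())  # 3)

                    # if the timestamp in 3) were not larger than the
                    # tombstone's from 2), the insert would lose and
                    # the key would still read as deleted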
                 
                When you compact between 2) and 3), the record inserted at 1)
                will be thrown away, but the tombstone from 2) will not be
                thrown away *unless* the tombstone was created more than
                GCGraceSeconds (a configuration option) before the compaction.
                 
                If you do not compact, all records and tombstones will be
                present in Cassandra's dataset, and each read checks which of
                the records has the highest timestamp before returning the
                most current one (or reports the key as not found, if the
                tombstone has the highest timestamp).
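
                The rule the read applies is easy to model in a few
                lines of plain Python (TOMBSTONE is just a marker
                object here, not anything from a client library):

                    # toy model of read reconciliation: among all the
                    # versions a read sees, the highest timestamp wins
                    TOMBSTONE = object()

                    versions = [
                        (100, 'v1'),       # from insert 1)
                        (200, TOMBSTONE),  # from delete 2)
                        (300, 'v2'),       # from insert 3)
                    ]

                    ts, value = max(versions, key=lambda v: v[0])
                    if value is TOMBSTONE:
                        print('not found')       # the delete was newest
                    else:
                        print('value =', value)  # prints: value = v2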
                 
                So whether you compact or not does not make a difference for
                your scenario, as long as all replicas see the tombstone
                before GCGraceSeconds have elapsed.  If that is not the case,
                deleted records can come back to life, because the tombstones
                are garbage-collected before all replicas have had a chance
                to remove the deleted record.
                 
                Your question about concurrently inserting the same key from
                different clients is another beast.  The simple answer is:
                don't do it.
                 
                The longer answer: either you use some external
                synchronisation mechanism (e.g., ZooKeeper), or you make sure
                that all clients use disjoint keys (UUIDs, or keys derived
                from the client's IP address plus a timestamp, that sort of
                thing).
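
                For example, with nothing but Python's standard library:

                    import uuid

                    # version-1 UUIDs embed the host's MAC address and a
                    # timestamp, so clients on different machines get
                    # disjoint keys by construction
                    key_a = str(uuid.uuid1())

                    # version-4 UUIDs are random; collisions are possible
                    # in theory but negligible in practice
                    key_b = str(uuid.uuid4())

                    print(key_a)
                    print(key_b)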
                 
                For keys representing user accounts or something similar, I
                would recommend using an external synchronisation mechanism,
                because for actions like account registration, the latency
                caused by such a mechanism is usually not a problem.
                 
                For data coming in quickly, where the overhead of
                synchronisation is not acceptable, use the UUID variant and
                reconcile the data on read.
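
                A sketch of the reconcile-on-read idea (plain Python;
                how you fetch the candidate rows is up to your client,
                and the column names here are made up):

                    # each writer stores its data under its own UUID key;
                    # the reader gathers all candidates for the logical
                    # entity and merges them itself -- "latest update
                    # wins" here, but the merge rule is yours to choose
                    rows = [
                        ('uuid-key-1', {'updated_at': 100, 'payload': 'a'}),
                        ('uuid-key-2', {'updated_at': 250, 'payload': 'b'}),
                    ]

                    key, cols = max(rows, key=lambda r: r[1]['updated_at'])
                    print(cols['payload'])  # prints: b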
                 
                HTH,
                  Martin


________________________________

                        From: Jools [mailto:jool...@gmail.com] 
                        Sent: Wednesday, June 09, 2010 10:39 AM
                        To: user@cassandra.apache.org
                        Subject: Inserting new data, where the key points to a tombstone record.
                        
                        
                        
                        
                        Hi,
                        
                        
                        I've been developing a system against Cassandra over
                        the last few weeks, and I'd like to ask the community
                        for some advice on the best way to deal with inserting
                        new data where the key currently points to a tombstone
                        record.
                        
                        
                        As with all distributed systems, this is always a
                        tricky thing to deal with, so I thought I'd throw it
                        to a wider audience.
                        
                        
                        1) insert new record.
                        2) delete record.
                        3) insert record with same key as deleted record.
                        
                        
                        Now, I know I can make this work if I flush and
                        compact between 2 and 3. However, I don't want to
                        rely on a flush and compact, and I'd like to code
                        defensively against this scenario. I've ended up
                        checking whether the key exists: if it does, I know
                        I can't insert the data; if it does not, I attempt
                        an insert.
                        
                        
                        Now, here lies the issue: if I have more than one
                        client doing this at the same time, both trying to
                        insert using the same key, one will succeed and one
                        will fail, but neither insert gives me any indication
                        of which one actually succeeded.
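
                        Each of my clients does roughly this (a sketch:
                        'cf', 'get', 'insert' and the exception name all
                        stand in for whatever the client library actually
                        provides):

                            class NotFound(Exception):
                                # stand-in for the client's not-found error
                                pass

                            def insert_if_absent(cf, key, columns):
                                try:
                                    cf.get(key)       # does the key exist?
                                except NotFound:
                                    # another client can insert right here,
                                    # in the gap between our check and our
                                    # write, and neither of us finds out
                                    cf.insert(key, columns)
                                    return True
                                return False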
                        
                        
                        So, should an insert against an existing key, or a
                        deleted key, produce some kind of exception?
                        
                        
                        Cheers,
                        
                        
                        --Jools
                        
                        
                        
                        

