Re: Atomic Compare and Swap

2010-06-23 Thread aaron morton
I've been playing with something like CAS, it's not the same but it  
may be of interest.


I write some data into Cassandra with quorum or better consistency,  
that allows me to assert what it should look like when read back. If  
the assertion holds I can then go ahead.


For example, in a CF with Time uuid ordering the client writes a  
column against the key of the thing we want to update. This write does  
not store the value. Then read back the first ordered column, if it's  
name is my uuid then I can proceed. Otherwise delete the column. If  
you know the uuid of the last update you can read back two columns.  
Then assert your the first and the previous is the second.


Perhaps if you were doing a CAS you could then write then actual value  
you want to update and somehow store the uuid from above with it. Say  
as col in another col family with  name as the uuid and value as the  
value. To read get the first colum from both CFs as a multi get, the  
col names must match from both cols for the value to be correct.


(could just use two diff keys in same CF)

Hope that makes sense.
Aaron






On 23/06/2010, at 4:27 PM, Mike Malone  wrote:

I'd be interested in what the folks who want CAS implementations  
think about
vector clocks. Can you use them to fulfill your use cases? If not,  
why not?


I ask because I have found myself wanting CAS in Cassandra too, but  
I think
that's only because I'm pretty familiar with HTTP. I think vector  
clocks
with client merge give you essentially the same functionality, but  
in a way
that fits much more nicely with the rest of the Cassandra  
architecture. CAS

really exacerbates Cassandra's weaknesses.

Mike

On Tue, Jun 22, 2010 at 4:52 PM, Rishi Bhardwaj  
wrote:





S>: An *atomic* CAS is another beast and I see at least two  
difficulties:


S>: 1) making it atomic locally: Cassandra's implementation is very  
much

multi-threaded. On a given node, while you're

reading-comparing-and-swapping

on some column c, no other thread should be allowed to write c (even

'normal'
write). You would probably need to have specific column families  
where CAS

is
allowed and for which all writes would be slower (since some  
locking would

be
involved). Even then, making such locking efficient and right is  
not easy.

But

in the end, local atomicity is quite probably the easy part.


R: I am curious as to how does Cassandra handle two concurrent  
writes to

the same column right now? Is there any locking on the write path to
serialize two writes to the same column? If there is any locking  
then CAS
can build on that. If there is no such locking then we could  
exclude normal
writes from the synchronization/locking required for CAS. So the  
normal
write path remains the same, and we let the client know that atomic  
CAS
wouldn't work if normal writes are also happening on the same  
column values.
In short a client should not mix normal writes with Atomic CAS for  
writing

some column value. This will hopefully make things simpler.

S:>2) making it atomic cluster-wide: data is replicated and an  
atomic CAS

would
need to apply on the exact same column version in every node.  
Which, with

eventual consistency especially, is pretty hard to accomplish unless

you're

locking the cluster (but that's what Cages/ZK do).


R: For starters it would be great if atomic CAS could work for  
consistency
level Quorum and ALL and not be supported for other consistency  
levels. Even
for other consistency levels what would stop CAS to work? Why would  
one
require cluster wide locking? I might be mistaken here but the  
atomic CAS

operation would happen individually at all the replica nodes (either
directly or through hinted writes) and would succeed or fail  
depending on
the timestamp/version of the column at the replica. If we do Quorum  
reads

and CAS writes then we can also be sure about consistency.

S:>That being said, if you have a neat solution for efficient and
distributed
atomic CAS that doesn't require rewriting 80% of Cassandra, I'm  
sure there

will be interest in that.



R: That sounds great. I am definitely going to look into this and  
report

back if I have a good solution.


Thanks,
Rishi





From: Sylvain Lebresne 
To: dev@cassandra.apache.org
Sent: Tue, June 22, 2010 1:21:51 AM
Subject: Re: Atomic Compare and Swap

On Mon, Jun 21, 2010 at 11:19 PM, Rishi Bhardwaj >

wrote:
I have read the post on cages and it is definitely very  
interesting. But
cages seems to be too coarse grained compared to an Atomic Compare  
and

Swap
on Cassandra column value. Cages would makes sense when one wants  
to do
multiple atomic row, column updates. Also, I am not so sure about  
the

scalability when it comes to using zookeeper for keeping locks on

Cassandra

columns... there would also be performance hit with an added RPC for

every
write. I feel Cages maybe fine for systems when one has few locks  
but I

feel

an atomic CAS in

Re: Atomic Compare and Swap

2010-06-23 Thread Mike Malone
On Wed, Jun 23, 2010 at 4:51 AM, aaron morton wrote:

> I've been playing with something like CAS, it's not the same but it may be
> of interest.
>
> I write some data into Cassandra with quorum or better consistency, that
> allows me to assert what it should look like when read back. If the
> assertion holds I can then go ahead.
>

This may work for your use cases, but fwiw, I would not consider this a
"safe" alternative to CAS. CAS says "I have version 1 of the data, and want
to update it, but only if version 1 is still the latest version."

During failure conditions I don't think there's _any_ way to implement CAS
without making Cassandra potentially fail during writes since you'll have to
read before a write operation.

Mike


Re: Atomic Compare and Swap

2010-06-23 Thread Rishi Bhardwaj
Hmnn... vector clocks seem very interesting, I am not very familiar with the 
work being done in 0.7 for the vector clock support but it sounds like a great 
addition. I was thinking if we can just use vector clocks to implement CAS in 
the following way:

Let's assume that a client will only do one CAS on a column at a time using 
Quorum/ALL consistency level. Also lets assume that each client will have a 
unique vector component in the vector clock associated with the column value to 
be updated. Now in order to do a CAS on a column, the client first reads the 
column value and the associated vector clock. Then it issues a Quorum/ALL write 
on the column with the new vector clock obtained by incrementing its own vector 
component in the old vector clock value. If the write succeeds, the client 
issues a Quorum/ALL read on the column again and checks if the vector clock for 
the column is logically greater than the vector clock it wrote. If the column 
value read has a vector clock greater than what the client wrote then the CAS 
succeeded, otherwise it failed.

Now in order for the above CAS scheme to work we would need couple of 
guarantees from the server side (i.e. the cassandra backend). Basically when 
doing a read repair (at compaction time, or read time, or write time) if we 
find two writes to the same column with logical clock values that can't be 
compared then the "older" column write wins. By "older" I mean whichever write 
got to the server first wins. For example if a cassandra replica node is 
applying a write for a column and if it already finds write for the same column 
in the memtable then it can check the vector clocks for the two writes. If the 
vector clocks are not comparable then the write which is in memtable wins as 
that write got to the cassandra node earlier. Similarly on a read if two 
versions of the same column are found with vector clocks that are not 
comparable then the write that was in the "older" sstable (sstable written 
before) wins.

With the above scheme if two CAS writes are racing against each other, we know 
only one of them will win and the client who issued that write can be sure 
about it.

How does the above scheme sound? 

-Rishi




From: Mike Malone 
To: dev@cassandra.apache.org
Sent: Tue, June 22, 2010 9:27:44 PM
Subject: Re: Atomic Compare and Swap

I'd be interested in what the folks who want CAS implementations think about
vector clocks. Can you use them to fulfill your use cases? If not, why not?

I ask because I have found myself wanting CAS in Cassandra too, but I think
that's only because I'm pretty familiar with HTTP. I think vector clocks
with client merge give you essentially the same functionality, but in a way
that fits much more nicely with the rest of the Cassandra architecture. CAS
really exacerbates Cassandra's weaknesses.

Mike

On Tue, Jun 22, 2010 at 4:52 PM, Rishi Bhardwaj wrote:

>
>
> S>: An *atomic* CAS is another beast and I see at least two difficulties:
>
> S>: 1) making it atomic locally: Cassandra's implementation is very much
> >multi-threaded. On a given node, while you're
> reading-comparing-and-swapping
> >on some column c, no other thread should be allowed to write c (even
> 'normal'
> >write). You would probably need to have specific column families where CAS
> is
> >allowed and for which all writes would be slower (since some locking would
> be
> >involved). Even then, making such locking efficient and right is not easy.
> But
> >in the end, local atomicity is quite probably the easy part.
>
> R: I am curious as to how does Cassandra handle two concurrent writes to
> the same column right now? Is there any locking on the write path to
> serialize two writes to the same column? If there is any locking then CAS
> can build on that. If there is no such locking then we could exclude normal
> writes from the synchronization/locking required for CAS. So the normal
> write path remains the same, and we let the client know that atomic CAS
> wouldn't work if normal writes are also happening on the same column values.
> In short a client should not mix normal writes with Atomic CAS for writing
> some column value. This will hopefully make things simpler.
>
> S:>2) making it atomic cluster-wide: data is replicated and an atomic CAS
> would
> >need to apply on the exact same column version in every node. Which, with
> >eventual consistency especially, is pretty hard to accomplish unless
> you're
> >locking the cluster (but that's what Cages/ZK do).
>
> R: For starters it would be great if atomic CAS could work for consistency
> level Quorum and ALL and not be supported for other consistency levels. Even
> for other consistency levels what would stop CAS to work? Why would one
> require cluster wide locking? I might be mistaken here but the atomic CAS
> operation would happen individually at all the replica nodes (either
> directly or through hinted writes) and would succeed or fail depending on
> t