date:20100622

Re: Atomic Compare and Swap

2010-06-22 Thread Sylvain Lebresne

On Mon, Jun 21, 2010 at 11:19 PM, Rishi Bhardwaj  wrote:
> I have read the post on cages and it is definitely very interesting. But
> cages seems to be too coarse grained compared to an Atomic Compare and Swap
> on Cassandra column value. Cages would makes sense when one wants to do
> multiple atomic row, column updates. Also, I am not so sure about the
> scalability when it comes to using zookeeper for keeping locks on Cassandra
> columns... there would also be performance hit with an added RPC for every
> write. I feel Cages maybe fine for systems when one has few locks but I feel
> an atomic CAS in Cassandra would help us avoid distributed locking systems
> and zookeeper in many other simpler scenarios. For more complicated
> (transaction like) things, using Cages may be fine. Then again doing a read
> before write for CAS in cassandra will make CAS at least as slow as a read,
> which I believe will still be better than taking a single column lock from
> zookeeper.
>
> What do other folks think in this regard? From whatever I have read, I
> believe CAS is feasible in Cassandra without hurting the normal write path
> performance. Only for CAS writes would we have to pay for the read before
> write penalty. I am going to do feasibility study for this and would love
> any pointers from others about this.

Making a (non atomic) CAS is easy (doing it client side is fine, and there
has been some discussion about 'callbacks' that may or may not someday allow
to do that server-side).

An *atomic* CAS is another beast and I see at least two difficulties:

1) making it atomic locally: Cassandra's implementation is very much
multi-threaded. On a given node, while you're reading-comparing-and-swapping
on some column c, no other thread should be allowed to write c (even 'normal'
write). You would probably need to have specific column families where CAS is
allowed and for which all writes would be slower (since some locking would be
involved). Even then, making such locking efficient and right is not easy. But
in the end, local atomicity is quite probably the easy part.

2) making it atomic cluster-wide: data is replicated and an atomic CAS would
need to apply on the exact same column version in every node. Which, with
eventual consistency especially, is pretty hard to accomplish unless you're
locking the cluster (but that's what Cages/ZK do).

That being said, if you have a neat solution for efficient and distributed
atomic CAS that doesn't require rewriting 80% of Cassandra, I'm sure there
will be interest in that.

--
Sylvain

>
> Thanks,
> Rishi
>
>
>
> 
> From: Rauan Maemirov 
> To: dev@cassandra.apache.org
> Sent: Mon, June 21, 2010 11:27:02 AM
> Subject: Re: Atomic Compare and Swap
>
> Have you read this post?
> http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/
> I guess, you will like it.
>
> 2010/6/22 Rishi Bhardwaj 
>
>> I am definitely interested in taking this work up. I believe the CAS
>> functionality would help in a lot of different scenarios and could help
>> avoid use of other external services (like zookeeper) to provide similar
>> functionality. I am new at Cassandra development and would really appreciate
>> pointers from the dev. community about how to approach/start on this
>> project. Also how feasible is the approach mentioned below to implement the
>> CAS functionality? It would be great if we could have a discussion on the
>> pros and cons.
>>
>> Thanks,
>> Rishi
>>
>>
>>
>> 
>> From: Sriram Srinivasan 
>> To: dev@cassandra.apache.org
>> Sent: Sun, June 20, 2010 9:47:37 PM
>> Subject: Re: Atomic Compare and Swap
>>
>>
>> I too am interested in a CAS facility.
>>
>> I like Rishi's proposal. One could simply use a version number as the
>> logical timestamp. If we promote CAS to a consistency level, it would rate
>> higher than a quorum. One pays the price for a more complex write path to
>> obtain the requisite guarantee.
>>
>>
>> On Jun 21, 2010, at 4:03 AM, Rishi Bhardwaj wrote:
>>
>> >
>> > Heres another thought I had, if say the user always wrote with quorum (or
>> to all) nodes then can't we implement CAS (compare and swap) assuming that
>> user employs logical timestamp and Cassandra doesn't allow writes to a
>> column with same or older timestamp. Here's the scenario I am thinking
>> about:
>> > Say we use logical timestamp for a column value and lets assume the
>> current timestamp is t. Now say two clients read this column and generate
>> concurrent CAS (compare and swap) operations on timestamp t and for both the
>> writes the resulting new timestamp would become (t+1). Now if we don't allow
>> writes to a column with same timestamp then only one of these writes would
>> succeed. Of course another assumption is that if a third CAS write with
>> compare on logical timestamp (t - 1) came in, that would be denied as I
>> believe Cassandra doesn't allow "older" writes to win over "ne

Re: PhantomReference in Cassandra

2010-06-22 Thread Brandon Williams

On Tue, Jun 8, 2010 at 9:44 PM, Anty  wrote:

> In theory ,it could do some help.


I tried this and unfortunately there was no noticeable difference.

-Brandon

performance with a "large" number of supercolumns/columns

2010-06-22 Thread Terje Marthinussen

Hi,

I was looking a bit on a case we have with columnfamily which has 439k
supercolumns, each supercolumn with ~3 columns each so a total of some 1.3
million objects in total.

This takes about 9 second to read with thrift on first access, 4-5 second on
second access.

I took a little closer look at this, I noticed that roughly half of this
time was spend in cassandra.

I will look more at this, but I thought I would just ask people here as it
could be that someone already had good explanations...

Most of the time is spent here

public void collectReducedColumns(IColumnContainer container,
Iterator reducedColumns, int gcBefore)
{
int liveColumns = 0;
AbstractType comparator = container.getComparator();

while (reducedColumns.hasNext())
{
if (liveColumns >= count)
break;

IColumn column = reducedColumns.next();
if (logger.isDebugEnabled())
logger.debug(String.format("collecting %s of %s: %s",
   liveColumns, count,
column.getString(comparator)));

if (finish.length > 0
&& ((!reversed && comparator.compare(column.name(),
finish) > 0))
|| (reversed && comparator.compare(column.name(),
finish) < 0))
break;

// only count live columns towards the `count` criteria
if (!column.isMarkedForDelete()
&& (!container.isMarkedForDelete()
|| (ClockRelationship.GREATER_THAN ==
column.mostRecentLiveChangeAt().compare(container.getMarkedForDeleteAt()
{
liveColumns++;
}

// but we need to add all non-gc-able columns to the
result for read repair:
if (QueryFilter.isRelevant(column, container, gcBefore))
container.addColumn(column);
}
}

Adding some time measuring print revealed a few interesting this.
1. First time I request the row (I request the entire row in this case),
collectReducedColumns() is called twice. I have not had time to understand
why yet, but there is one difference between the calls. All columns are
returned both times, but the first call is done with MAXINT as "count" while
the second call has the maxcount actually requested by the client as
"count". The first call takes about 3.7 seconds to process, the second about
1.7 secs. Whatever reason, I think we should be able to remove one of these
calls?

2. Almost half the time is spent in "container.addColumn". This is probably
no big surprise as there is a lot of code hidden "down there". I am however
very curious if it could not be significantly optimized, especially in the
case where you have predicates which cases all columns to be included. That
said, I have not manage to read enough cassandra code to understand
tombstones or all the other things which is going on in that part of the
code.

3. A bit more of a surprise, most of the remaining 50% of the time seems to
occur at while (reducedColumns.hasNext())
That is, save system.nanoTime() at the end of the loop and after hasNext and
sum up and it accounts for almost 50% of the total time. That seems quite
weird to me.

I will dig more, but I was curious if someone had answers already.

Best regards,
Terje

Re: Atomic Compare and Swap

2010-06-22 Thread Rishi Bhardwaj

S>: An *atomic* CAS is another beast and I see at least two difficulties:

S>: 1) making it atomic locally: Cassandra's implementation is very much
>multi-threaded. On a given node, while you're reading-comparing-and-swapping
>on some column c, no other thread should be allowed to write c (even 'normal'
>write). You would probably need to have specific column families where CAS is
>allowed and for which all writes would be slower (since some locking would be
>involved). Even then, making such locking efficient and right is not easy. But
>in the end, local atomicity is quite probably the easy part.

R: I am curious as to how does Cassandra handle two concurrent writes to the 
same column right now? Is there any locking on the write path to serialize two 
writes to the same column? If there is any locking then CAS can build on that. 
If there is no such locking then we could exclude normal writes from the 
synchronization/locking required for CAS. So the normal write path remains the 
same, and we let the client know that atomic CAS wouldn't work if normal writes 
are also happening on the same column values. In short a client should not mix 
normal writes with Atomic CAS for writing some column value. This will 
hopefully make things simpler.

S:>2) making it atomic cluster-wide: data is replicated and an atomic CAS would
>need to apply on the exact same column version in every node. Which, with
>eventual consistency especially, is pretty hard to accomplish unless you're
>locking the cluster (but that's what Cages/ZK do).

R: For starters it would be great if atomic CAS could work for consistency 
level Quorum and ALL and not be supported for other consistency levels. Even 
for other consistency levels what would stop CAS to work? Why would one require 
cluster wide locking? I might be mistaken here but the atomic CAS operation 
would happen individually at all the replica nodes (either directly or through 
hinted writes) and would succeed or fail depending on the timestamp/version of 
the column at the replica. If we do Quorum reads and CAS writes then we can 
also be sure about consistency.

S:>That being said, if you have a neat solution for efficient and distributed
>atomic CAS that doesn't require rewriting 80% of Cassandra, I'm sure there
>will be interest in that.

R: That sounds great. I am definitely going to look into this and report back 
if I have a good solution.

Thanks,
Rishi

From: Sylvain Lebresne 
To: dev@cassandra.apache.org
Sent: Tue, June 22, 2010 1:21:51 AM
Subject: Re: Atomic Compare and Swap

On Mon, Jun 21, 2010 at 11:19 PM, Rishi Bhardwaj  wrote:
> I have read the post on cages and it is definitely very interesting. But
> cages seems to be too coarse grained compared to an Atomic Compare and Swap
> on Cassandra column value. Cages would makes sense when one wants to do
> multiple atomic row, column updates. Also, I am not so sure about the
> scalability when it comes to using zookeeper for keeping locks on Cassandra
> columns... there would also be performance hit with an added RPC for every
> write. I feel Cages maybe fine for systems when one has few locks but I feel
> an atomic CAS in Cassandra would help us avoid distributed locking systems
> and zookeeper in many other simpler scenarios. For more complicated
> (transaction like) things, using Cages may be fine. Then again doing a read
> before write for CAS in cassandra will make CAS at least as slow as a read,
> which I believe will still be better than taking a single column lock from
> zookeeper.
>
> What do other folks think in this regard? From whatever I have read, I
> believe CAS is feasible in Cassandra without hurting the normal write path
> performance. Only for CAS writes would we have to pay for the read before
> write penalty. I am going to do feasibility study for this and would love
> any pointers from others about this.

Making a (non atomic) CAS is easy (doing it client side is fine, and there
has been some discussion about 'callbacks' that may or may not someday allow
to do that server-side).

An *atomic* CAS is another beast and I see at least two difficulties:

1) making it atomic locally: Cassandra's implementation is very much
multi-threaded. On a given node, while you're reading-comparing-and-swapping
on some column c, no other thread should be allowed to write c (even 'normal'
write). You would probably need to have specific column families where CAS is
allowed and for which all writes would be slower (since some locking would be
involved). Even then, making such locking efficient and right is not easy. But
in the end, local atomicity is quite probably the easy part.

2) making it atomic cluster-wide: data is replicated and an atomic CAS would
need to apply on the exact same column version in every node. Which, with
eventual consistency especially, is pretty hard to accomplish unless you're
locking the cluster (but that's what Cages/ZK do).

That being said, if yo

Re: Atomic Compare and Swap

2010-06-22 Thread Mike Malone

I'd be interested in what the folks who want CAS implementations think about
vector clocks. Can you use them to fulfill your use cases? If not, why not?

I ask because I have found myself wanting CAS in Cassandra too, but I think
that's only because I'm pretty familiar with HTTP. I think vector clocks
with client merge give you essentially the same functionality, but in a way
that fits much more nicely with the rest of the Cassandra architecture. CAS
really exacerbates Cassandra's weaknesses.

Mike

On Tue, Jun 22, 2010 at 4:52 PM, Rishi Bhardwaj wrote:

>
>
> S>: An *atomic* CAS is another beast and I see at least two difficulties:
>
> S>: 1) making it atomic locally: Cassandra's implementation is very much
> >multi-threaded. On a given node, while you're
> reading-comparing-and-swapping
> >on some column c, no other thread should be allowed to write c (even
> 'normal'
> >write). You would probably need to have specific column families where CAS
> is
> >allowed and for which all writes would be slower (since some locking would
> be
> >involved). Even then, making such locking efficient and right is not easy.
> But
> >in the end, local atomicity is quite probably the easy part.
>
> R: I am curious as to how does Cassandra handle two concurrent writes to
> the same column right now? Is there any locking on the write path to
> serialize two writes to the same column? If there is any locking then CAS
> can build on that. If there is no such locking then we could exclude normal
> writes from the synchronization/locking required for CAS. So the normal
> write path remains the same, and we let the client know that atomic CAS
> wouldn't work if normal writes are also happening on the same column values.
> In short a client should not mix normal writes with Atomic CAS for writing
> some column value. This will hopefully make things simpler.
>
> S:>2) making it atomic cluster-wide: data is replicated and an atomic CAS
> would
> >need to apply on the exact same column version in every node. Which, with
> >eventual consistency especially, is pretty hard to accomplish unless
> you're
> >locking the cluster (but that's what Cages/ZK do).
>
> R: For starters it would be great if atomic CAS could work for consistency
> level Quorum and ALL and not be supported for other consistency levels. Even
> for other consistency levels what would stop CAS to work? Why would one
> require cluster wide locking? I might be mistaken here but the atomic CAS
> operation would happen individually at all the replica nodes (either
> directly or through hinted writes) and would succeed or fail depending on
> the timestamp/version of the column at the replica. If we do Quorum reads
> and CAS writes then we can also be sure about consistency.
>
> S:>That being said, if you have a neat solution for efficient and
> distributed
> >atomic CAS that doesn't require rewriting 80% of Cassandra, I'm sure there
> >will be interest in that.
>
>
> R: That sounds great. I am definitely going to look into this and report
> back if I have a good solution.
>
>
> Thanks,
> Rishi
>
>
>
>
> 
> From: Sylvain Lebresne 
> To: dev@cassandra.apache.org
> Sent: Tue, June 22, 2010 1:21:51 AM
> Subject: Re: Atomic Compare and Swap
>
> On Mon, Jun 21, 2010 at 11:19 PM, Rishi Bhardwaj 
> wrote:
> > I have read the post on cages and it is definitely very interesting. But
> > cages seems to be too coarse grained compared to an Atomic Compare and
> Swap
> > on Cassandra column value. Cages would makes sense when one wants to do
> > multiple atomic row, column updates. Also, I am not so sure about the
> > scalability when it comes to using zookeeper for keeping locks on
> Cassandra
> > columns... there would also be performance hit with an added RPC for
> every
> > write. I feel Cages maybe fine for systems when one has few locks but I
> feel
> > an atomic CAS in Cassandra would help us avoid distributed locking
> systems
> > and zookeeper in many other simpler scenarios. For more complicated
> > (transaction like) things, using Cages may be fine. Then again doing a
> read
> > before write for CAS in cassandra will make CAS at least as slow as a
> read,
> > which I believe will still be better than taking a single column lock
> from
> > zookeeper.
> >
> > What do other folks think in this regard? From whatever I have read, I
> > believe CAS is feasible in Cassandra without hurting the normal write
> path
> > performance. Only for CAS writes would we have to pay for the read before
> > write penalty. I am going to do feasibility study for this and would love
> > any pointers from others about this.
>
> Making a (non atomic) CAS is easy (doing it client side is fine, and there
> has been some discussion about 'callbacks' that may or may not someday
> allow
> to do that server-side).
>
> An *atomic* CAS is another beast and I see at least two difficulties:
>
> 1) making it atomic locally: Cassandra's implementation is very much
> multi-threaded. O

Re: Atomic Compare and Swap

Re: PhantomReference in Cassandra

performance with a "large" number of supercolumns/columns

Re: Atomic Compare and Swap

Re: Atomic Compare and Swap

5 matches

Site Navigation

Mail list logo

Footer information