Stateful Thrift/Avro API in 0.7 (set_keyspace)

2010-07-06 Thread Thomas Heller
Hey,

I wonder if there is any particular reasoning why the API
(thrift/avro) will become stateful in 0.7? Granted it already is doing
that for "login", but why is the keyspace argument moved to a stateful
level?

I wrote a Ruby client to help me in my app development, and while it
currently just connects to one keyspace, I was planning to divide my
data into several keyspaces, since there are some parts of the data where
I want a higher RF and some where a low RF is just fine.

In preparation for 0.7 I'd now refactor parts of my client to support
stateful keyspace selection, but I wondered why this "stateful" path
was chosen. Will set_keyspace() be an expensive operation?

Cheers,
/thomas

PS:
For the curious, my Client is available at:

http://github.com/thheller/greek_architect

While fully functional, I doubt it would be very useful to anyone else
at this time.


Re: Stateful Thrift/Avro API in 0.7 (set_keyspace)

2010-07-06 Thread Jonathan Ellis
Because stateful keyspace is semantically closer to how people use it:
one keyspace per application.

If Thrift allowed us to make the keyspace-per-method-call parameter
optional, we could go that route, but it does not.
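
For illustration, a minimal Java sketch of the 0.7-style flow against the
Thrift-generated client (the port 9160 and the keyspace name "Keyspace1"
are illustrative assumptions, not anything mandated by the API):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class StatefulKeyspaceExample {
        public static void main(String[] args) throws Exception {
            // Open a connection (9160 is the conventional Thrift port).
            TTransport transport =
                new TFramedTransport(new TSocket("localhost", 9160));
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // 0.7 style: bind the keyspace to the connection once...
            client.set_keyspace("Keyspace1");

            // ...after which reads and writes no longer carry a keyspace
            // argument, whereas in 0.6 every call took it as a parameter.

            transport.close();
        }
    }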

On Tue, Jul 6, 2010 at 10:56 AM, Thomas Heller  wrote:
> I wonder if there is any particular reasoning why the API
> (thrift/avro) will become stateful in 0.7? Granted it already is doing
> that for "login", but why is the keyspace argument moved to a stateful
> level?
> [...]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Cassandra performance and read/write latency

2010-07-06 Thread Peter Fales
Greetings Cassandra Developers!

We've been trying to benchmark Cassandra performance and have 
developed a test client written in C++ that uses multiple threads to 
send out a large number of write and read requests (as fast as the
server can handle them).   

One of the results we're seeing is a bit surprising, and I'm hoping
someone here can help shed some light on the topic - as far as I can
tell, it hasn't been discussed on the mailing list.

Most of the requests return in a reasonable amount of time (tens or
hundreds of milliseconds), but every once in a while the server seems to
just "stop" for up to several seconds. During this time, all the
reads and writes take several seconds to complete, and network traffic
in and out of the system drops off to nearly zero. When plotted on a
graph, these appear as very large spikes every few minutes (though
without any particular pattern to how often they occur). Even though
the average response time is very good (and therefore we get a reasonable
number of requests/sec), these occasional outliers are a showstopper for
our potential applications.
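
To make the measurement concrete, here is a rough, self-contained sketch
(in Java rather than our actual C++ client) of the per-request latency
tracking that exposes these spikes; doRequest() is a hypothetical
placeholder for a real Thrift read or write:

    import java.util.Random;

    public class LatencyProbe {
        // Hypothetical placeholder: in the real client this would be a
        // Thrift read or write against the Cassandra cluster.
        static void doRequest(Random rnd) throws InterruptedException {
            Thread.sleep(1 + rnd.nextInt(5));
        }

        public static void main(String[] args) throws Exception {
            Random rnd = new Random();
            long worstMs = 0;
            for (int i = 0; i < 10000; i++) {
                long start = System.nanoTime();
                doRequest(rnd);
                long elapsedMs = (System.nanoTime() - start) / 1000000;
                worstMs = Math.max(worstMs, elapsedMs);
                // Requests that take orders of magnitude longer than the
                // average are the "spikes" described above.
                if (elapsedMs > 1000) {
                    System.out.println("outlier: request " + i
                        + " took " + elapsedMs + " ms");
                }
            }
            System.out.println("worst-case latency: " + worstMs + " ms");
        }
    }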

We've experimented with a number of different machines of varying
capabilities, including a range of physical machines and clusters of
machines on Amazon's EC2. We've also used different numbers of nodes
in the cluster and different values for ReplicationFactor. All are
qualitatively similar, though the numbers vary as expected (i.e.
faster machines improve both the average and maximum numbers, but the
max values are still on the order of seconds).

I know Cassandra has lots of configuration parameters that can be
tweaked, but we've left most of them at the default values of
Cassandra 0.6.2 or 0.6.3.

Has anyone else seen nodes "hang" for several seconds like this?  I'm
not sure if this is a Java VM issue (e.g. garbage collection) or something
specific to the Cassandra application.   I'll be happy to share more 
details of our experiments either on the mailing list, or with interested
parties offline.  But I thought I'd start with a brief description and 
see how consistent it is with other experiences.   I'm sort of expecting
to see "Well, of course you'll see that kind of behavior because you
didn't change..."

I'm also interested in comparing notes with anyone else who has been doing
read/write throughput benchmarks with Cassandra.

Thanks in advance for any information or suggestions you may have!

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Re: Stateful Thrift/Avro API in 0.7 (set_keyspace)

2010-07-06 Thread Thomas Heller
On Tue, Jul 6, 2010 at 6:00 PM, Jonathan Ellis  wrote:
> Because stateful keyspace is semantically closer to how people use it:

Hmm, no other reason?

Writing a client-side wrapper which turns

    get(key, column_path, clevel)

into

    get(@keyspace, key, column_path, clevel)

is trivial in pretty much any language.
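
For example, a minimal Java sketch of such a wrapper, assuming the
0.6-style generated client where every call takes the keyspace as its
first argument (signatures abbreviated for illustration):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    // Pins a keyspace client-side and prepends it on every call.
    public class KeyspaceScopedClient {
        private final Cassandra.Client raw;
        private final String keyspace;

        public KeyspaceScopedClient(Cassandra.Client raw, String keyspace) {
            this.raw = raw;
            this.keyspace = keyspace;
        }

        public ColumnOrSuperColumn get(String key, ColumnPath path,
                                       ConsistencyLevel level)
                throws Exception {
            return raw.get(keyspace, key, path, level);
        }
    }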

Well, I looked at the code, and switching isn't going to be expensive,
so my concern is answered.

Since set_keyspace resets login, might it be "useful" to combine the
two, e.g. set_keyspace(string, optional:auth)?

Cheers,
/thomas


Re: Cassandra performance and read/write latency

2010-07-06 Thread Peter Schuller
> Has anyone else seen nodes "hang" for several seconds like this?  I'm
> not sure if this is a Java VM issue (e.g. garbage collection) or something

Since garbage collection is logged (if you're running with default
settings etc.), any multi-second GCs should be discoverable in said
log, so for testing that hypothesis I'd check there first. Cassandra
itself logs GCs, but you can also turn on the JVM's GC logging with
e.g. "-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps".

> I'm also interested in comparing notes with anyone  else that has been doing
> read/write throughput benchmarks with Cassandara.

I did some batch write testing to see how it scaled up to about 200
million rows and 200 GB; I had occasional spikes in latency that were
due to disk writes being flushed by the OS. However, it was probably
exacerbated in this case by the fact that this was ZFS/FreeBSD, and
ZFS (in my humble opinion, and at least on FreeBSD) always exhibits
the behavior for me that it flushes writes too late and ends up
blocking applications even if there is left-over bandwidth.

In my case I "eliminated" the issue for the purpose of my test by
having a stupid while loop simply doing "sync" every handful of
seconds, to avoid accumulating too much data in the cache.
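
The loop itself was trivial; an equivalent sketched here in Java (the
original was just a shell loop, and the 5-second interval is arbitrary):

    // Crude workaround: force the OS to flush dirty pages every few
    // seconds so they never pile up into one large, blocking flush.
    public class PeriodicSync {
        public static void main(String[] args) throws Exception {
            while (true) {
                new ProcessBuilder("sync").start().waitFor();
                Thread.sleep(5000); // "every handful of seconds"
            }
        }
    }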

While I expect this to be less of a problem for other setups, it's
possible this is what you're seeing, for example if the operating
system is blocking writes to the commit log (are you running with
periodic fsync or batch-wise fsync?).

-- 
/ Peter Schuller