Stateful Thrift/Avro API in 0.7 (set_keyspace)
Hey,

I wonder if there is any particular reason why the API (thrift/avro) will become stateful in 0.7? Granted, it already is stateful for "login", but why is the keyspace argument moved to connection state?

I wrote a Ruby client to help me in my app development, and while it currently just connects to one keyspace, I was planning to divide my data into several keyspaces, since there are some parts of my data where I want a higher RF and some where a low RF is just fine.

In preparation for 0.7 I'd now refactor parts of my client to support stateful keyspace selection, but I just wondered why this "stateful" path was chosen. Will set_keyspace() be an expensive operation?

Cheers,
/thomas

PS: For the curious, my client is available at:

http://github.com/thheller/greek_architect

While fully functional, I doubt it would be very useful to anyone else at this time.
Re: Stateful Thrift/Avro API in 0.7 (set_keyspace)
Because stateful keyspace is semantically closer to how people use it: one keyspace per application. If Thrift allowed us to make the keyspace-per-method-call argument optional, we could go that route, but it does not.

On Tue, Jul 6, 2010 at 10:56 AM, Thomas Heller wrote:
> In preparation for 0.7 I'd now refactor parts of my client to support
> stateful keyspace selection but I just wondered why this "stateful"
> path was chosen? Will set_keyspace() be an expensive operation?
> [...]

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Cassandra performance and read/write latency
Greetings Cassandra Developers!

We've been trying to benchmark Cassandra performance and have developed a test client written in C++ that uses multiple threads to send out a large number of write and read requests (as fast as the server can handle them).

One of the results we're seeing is a bit surprising, and I'm hoping someone here can help shed some light on the topic - as far as I can tell, it hasn't been discussed on the mailing list. Most of the requests return in a reasonable amount of time (10s or 100s of milliseconds), but every once in a while the server seems to just "stop" for up to several seconds. During this time, all the reads and writes take several seconds to complete, and network traffic in and out of the system drops off to nearly zero. When plotted on a graph, these appear as very large spikes every few minutes (though without any particular pattern to how often those spikes occur). Even though the average response time is very good (and therefore we get a reasonable number of requests/sec), these occasional outliers are a showstopper for our potential applications.

We've experimented with a number of different machines of different capabilities, including a range of physical machines and clusters of machines on Amazon's EC2. We've also used different numbers of nodes in the cluster and different values for ReplicationFactor. All are qualitatively similar, though the numbers vary as expected (i.e. fast machines improve both the average and maximum numbers, but the max values are still on the order of seconds). I know Cassandra has lots of configuration parameters that can be tweaked, but most of the other parameters are left at the default values of Cassandra 0.6.2 or 0.6.3.

Has anyone else seen nodes "hang" for several seconds like this? I'm not sure if this is a Java VM issue (e.g. garbage collection) or something specific to the Cassandra application. I'll be happy to share more details of our experiments either on the mailing list or with interested parties offline, but I thought I'd start with a brief description and see how consistent it is with other experiences. I'm sort of expecting to see "Well, of course you'll see that kind of behavior because you didn't change..."

I'm also interested in comparing notes with anyone else that has been doing read/write throughput benchmarks with Cassandra.

Thanks in advance for any information or suggestions you may have!

--
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane, Room 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031
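The pattern described above - a good average with rare multi-second outliers - only shows up if the benchmark records per-request latencies rather than just aggregate throughput. A minimal sketch of that kind of bookkeeping, in Python for brevity; do_request() is a hypothetical placeholder for whatever read or write call the test client actually issues, not part of any real client library:

    import time

    def timed_requests(do_request, n, slow_threshold_s=1.0):
        """Run n requests, recording each latency; report stalls separately."""
        latencies = []
        outliers = []
        for i in range(n):
            start = time.time()
            do_request()                      # hypothetical: one read or write against the cluster
            elapsed = time.time() - start
            latencies.append(elapsed)
            if elapsed >= slow_threshold_s:   # a "stall": multi-second response
                outliers.append((i, elapsed))
        latencies.sort()
        p99 = latencies[min(len(latencies) - 1, int(len(latencies) * 0.99))]
        print("avg %.3fs  p99 %.3fs  max %.3fs  stalls: %d" % (
            sum(latencies) / len(latencies), p99, latencies[-1], len(outliers)))
        return outliers

Averages and requests/sec hide these events almost completely; the outlier list and the max are what expose them.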
Re: Stateful Thrift/Avro API in 0.7 (set_keyspace)
On Tue, Jul 6, 2010 at 6:00 PM, Jonathan Ellis wrote:
> Because stateful keyspace is semantically closer to how people use it:

Hmm, no other reason? Writing a client-side wrapper which turns get(key, column_path, clevel) into get(@keyspace, key, column_path, clevel) is trivial in pretty much any language.

Well, I looked at the code and switching isn't going to be expensive, so my concern is answered.

Since set_keyspace resets login, it might be "useful" to combine those two: set_keyspace(string, optional:auth)?

Cheers,
/thomas
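A rough sketch of the client-side wrapper idea described above, in Python for brevity. The thrift_client object and the exact get()/set_keyspace() calls here are stand-ins for whatever the generated 0.6/0.7 Thrift bindings expose, not the real signatures:

    class KeyspaceClient(object):
        """Thin wrapper hiding whether keyspace is per-call (0.6) or stateful (0.7)."""

        def __init__(self, thrift_client, keyspace, stateful=True):
            self.client = thrift_client       # hypothetical generated Thrift client
            self.keyspace = keyspace
            self.stateful = stateful
            if stateful:
                # 0.7 style: select the keyspace once per connection
                self.client.set_keyspace(keyspace)

        def get(self, key, column_path, clevel):
            if self.stateful:
                # 0.7 style: keyspace is connection state
                return self.client.get(key, column_path, clevel)
            # 0.6 style: keyspace passed on every call
            return self.client.get(self.keyspace, key, column_path, clevel)

With a wrapper like this, switching keyspaces mid-connection is just another set_keyspace() call (or one pooled connection per keyspace), which is why the cost of set_keyspace() is the relevant question.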
Re: Cassandra performance and read/write latency
> Has anyone else seen nodes "hang" for several seconds like this? I'm
> not sure if this is a Java VM issue (e.g. garbage collection) or something

Since garbage collection is logged (if you're running with default settings etc.), any multi-second GCs should be discoverable in that log, so for testing that hypothesis I'd check there first. Cassandra itself logs GC activity, but you can also turn on the JVM's own GC logging with e.g. "-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps".

> I'm also interested in comparing notes with anyone else that has been doing
> read/write throughput benchmarks with Cassandra.

I did some batch write testing to see how it scaled, up to about 200 million rows and 200 GB. I had occasional spikes in latency that were due to disk writes being flushed by the OS. However, it was probably exacerbated in this case by the fact that this was ZFS/FreeBSD, and ZFS (in my humble opinion, and at least on FreeBSD) always exhibits the behavior for me that it flushes writes too late and ends up blocking applications even when there is left-over I/O bandwidth.

In my case I "eliminated" the issue for the purpose of my test by having a stupid while loop simply doing "sync" every handful of seconds, to avoid accumulating too much data in the cache.

While I expect this to be less of a problem for other setups, it's possible this is what you're seeing - for example, if the operating system is blocking writes to the commit log (are you running with periodic or batch commit log sync?).

--
/ Peter Schuller
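For reference, the "stupid while loop doing sync" workaround mentioned above amounts to something like the following, shown here in Python rather than shell; the 5-second interval is an arbitrary assumption, and this is a blunt testing aid rather than a recommended production fix:

    import subprocess
    import time

    # Periodically force the OS to flush dirty pages so the page cache never
    # accumulates enough unwritten data to stall the application when the OS
    # decides to flush everything at once.
    while True:
        subprocess.call(["sync"])  # equivalent to running `sync` in a shell loop
        time.sleep(5)              # "every handful of seconds" (interval is arbitrary)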