Re: Cassandra read optimization

2012-04-19 Thread aaron morton
> but i'll have to make 5 times as many requests to the database
5 times a small number can be less than 1 big number :) See http://wiki.apache.org/cassandra/HadoopSupport It's also covered in the O'Reilly Cassandra book, although that book is somewhat out of date. Also search for posts from Jere

Re: Cassandra read optimization

2012-04-19 Thread Dan Feldman
We'll try doing multithreaded requests today or tomorrow. As for tuning down the number of super columns per slice, I tried that, but I noticed that the time decreased linearly with the length of the slice. So grabbing 1000 per slice would take 1/5 as long as 5000, but I'll have to make
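
A minimal sketch of the multithreaded approach mentioned above, not code from this thread. It assumes the chunk boundaries are known in advance, and `fetch_chunk` is a hypothetical callable standing in for one slice request to the database; `fake_fetch_chunk` only simulates it so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_chunk, chunk_starts, max_workers=5):
    """Run one fetch_chunk(start) call per chunk on a thread pool.

    fetch_chunk is a hypothetical callable that returns a list of items
    for the chunk beginning at `start`. Results come back in the same
    order as chunk_starts because Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = pool.map(fetch_chunk, chunk_starts)
        return [item for chunk in chunks for item in chunk]


def fake_fetch_chunk(start, size=1000):
    """Stand-in for a real slice request: returns `size` integers."""
    return list(range(start, start + size))


if __name__ == "__main__":
    # Five requests of 1000 items each instead of one request of 5000.
    items = fetch_all(fake_fetch_chunk, [0, 1000, 2000, 3000, 4000])
    print(len(items))
```

Whether this is actually faster depends on where the time goes; if the client spends most of it building objects rather than waiting on the network, threads in one Python process may help little.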

Re: Cassandra read optimization

2012-04-19 Thread aaron morton
Here's a test I did a while ago about creating column objects in Python: http://www.mail-archive.com/user@cassandra.apache.org/msg06729.html As Tyler said, the best approach is to limit the size of the slices. If you are trying to load 125K super columns with 25 columns each, you are asking fo

Re: Cassandra read optimization

2012-04-19 Thread Dan Feldman
Hi Paolo, Thanks for the hint: JNA indeed wasn't installed. However, now that Cassandra is actually using it, there doesn't seem to be any change in speed; it still takes 7 seconds with pycassa. On Thu, Apr 19, 2012 at 12:14 AM, Paolo Bernardi wrote: > Look into your Cassandra's logs to see i

Re: Cassandra read optimization

2012-04-19 Thread Paolo Bernardi
Look in your Cassandra logs to see if JNA is really enabled (it really should be, by default) and, more importantly, if JNA is loaded correctly. You might find a surprising message there. If so, just install JNA with your distro's package manager and, if it still doesn't work,

Re: Cassandra read optimization

2012-04-18 Thread Dan Feldman
Hi Tyler and Aaron, Thanks for your replies. Tyler, fetching scs using your pycassa script on our server takes ~7 s - consistent with the times we've been seeing. Now, we aren't really experts in Cassandra, but it seems that JNA is enabled by default for Cassandra > 1.0 according to Jeremy ( http

Re: Cassandra read optimization

2012-04-18 Thread Tyler Hobbs
I tested this out with a small pycassa script: https://gist.github.com/2418598 On my not-very-impressive laptop, I can read 5000 of the super columns in 3 seconds (cold) or 1.5 (warm). Reading in batches of 1000 super columns at a time gives much better performance; I definitely recommend going w
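
The batching idea can be sketched roughly as follows. This is not the script from the gist; it is an illustration that assumes a hypothetical `fetch(start, count)` callable returning an ordered mapping of at most `count` super columns whose names are >= `start`. With pycassa, something like `cf.get(key, column_start=start, column_count=count)` could play that role, but that wiring is an assumption here. `make_fake_row` only simulates a wide row so the example runs without Cassandra.

```python
from collections import OrderedDict


def iter_slices(fetch, batch_size=1000):
    """Yield (name, value) pairs from one wide row, one page at a time.

    fetch(start, count) is a hypothetical callable returning an ordered
    mapping of up to `count` columns with names >= start ('' means
    "from the beginning"), i.e. an inclusive column slice.
    """
    start = ''
    first_page = True
    while True:
        # The slice start is inclusive, so later pages ask for one extra
        # column and drop the one already seen on the previous page.
        count = batch_size if first_page else batch_size + 1
        page = list(fetch(start, count).items())
        if not first_page and page and page[0][0] == start:
            page = page[1:]
        if not page:
            return
        for name, value in page:
            yield name, value
        if len(page) < batch_size:
            return  # short page: the row is exhausted
        start = page[-1][0]
        first_page = False


def make_fake_row(n):
    """Build a fake fetch() over n sorted column names (illustration only)."""
    names = ['sc%06d' % i for i in range(n)]

    def fetch(start, count):
        selected = [name for name in names if name >= start][:count]
        return OrderedDict((name, {'col': name}) for name in selected)

    return fetch


if __name__ == "__main__":
    total = sum(1 for _ in iter_slices(make_fake_row(5000), batch_size=1000))
    print(total)
```

Each page holds at most `batch_size` super columns in memory, which is the point of reading 1000 at a time rather than 5000.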

Re: Cassandra read optimization

2012-04-18 Thread Aaron Turner
On Wed, Apr 18, 2012 at 5:00 PM, Dan Feldman wrote: > Hi all, > > I'm trying to optimize moving data from Cassandra to HDFS using either a Ruby > or a Python client. Right now, I'm playing around on my staging server, an 8 > GB single-node machine. My data in Cassandra (1.0.8) consists of 2 rows (for >

Cassandra read optimization

2012-04-18 Thread Dan Feldman
Hi all, I'm trying to optimize moving data from Cassandra to HDFS using either a Ruby or a Python client. Right now, I'm playing around on my staging server, an 8 GB single-node machine. My data in Cassandra (1.0.8) consists of 2 rows (for now) with ~150k super columns each (I know, I know - super colu