Re: understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
Awesome! Thank you guys for the really quick answers and the links to the presentations. On Thu, Dec 9, 2010 at 12:06 PM, Sylvain Lebresne wrote: >> This helps a little but unfortunately it's still a bit fuzzy for me. So is it >> not true that each node contains all the data in the cluster? > >

Re: understanding the cassandra storage scaling

2010-12-09 Thread Sylvain Lebresne
> This helps a little but unfortunately it's still a bit fuzzy for me. So is it > not true that each node contains all the data in the cluster? Not at all. Basically each node is responsible for only a part of the data (a range, really). But for each piece of data you can choose how many nodes it is stored on; this
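A rough sketch of the idea above, that each node owns a range and each key is stored on a chosen number of nodes. This is a toy model, not Cassandra's actual partitioner or replication strategy: the Ring class, the token() helper, the node names and the use of MD5 for node positions are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (illustrative MD5 token)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node owns the range ending at its own token."""

    def __init__(self, nodes, replicas):
        if not 1 <= replicas <= len(nodes):
            raise ValueError("replicas must be between 1 and the node count")
        self.replicas = replicas
        # Sort nodes by token so the ring can be walked clockwise.
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def nodes_for(self, key: str):
        """Return the nodes holding `key`: the owner plus the next nodes on the ring."""
        # First node whose token is >= the key's token, wrapping around at the end.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replicas)
        ]


ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"], replicas=3)
print(ring.nodes_for("user:42"))  # three distinct nodes, not all six
```

Any node can run the same lookup, which is one way to picture how a query finds the nodes that hold a given key without every node storing everything.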

Re: understanding the cassandra storage scaling

2010-12-09 Thread Ran Tavory
> > So is it not true that each node contains all the data in the cluster? No, not in the general case, in fact rarely is it the case. Usually R [...] http://wiki.apache.org/cassandra/StorageConfiguration On Thu, Dec 9, 2010 at 12:43 PM, Jonathan Colby wrote: > Thanks Ran. This helps a little but unf

Re: understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
Thanks Ran. This helps a little but unfortunately it's still a bit fuzzy for me. So is it not true that each node contains all the data in the cluster? I haven't come across any information on how clustered data is coordinated in Cassandra. How does my query get directed to the right node? On Th

Re: understanding the cassandra storage scaling

2010-12-09 Thread Ran Tavory
There are two numbers to look at: N, the number of hosts in the ring (cluster), and R, the number of replicas for each data item. R is configurable per column family. Typically for large clusters N >> R. For very small clusters it makes sense for R to be close to N, in which case cassandra is useful s
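To put numbers on N and R, here is a back-of-the-envelope calculation. It assumes data is spread evenly around the ring, which a real cluster only approximates; the function name and the figures are made up for illustration.

```python
def per_node_storage_gb(total_data_gb: float, n_nodes: int, replicas: int) -> float:
    """Approximate storage per node, assuming an evenly balanced ring."""
    if not 1 <= replicas <= n_nodes:
        raise ValueError("replicas (R) must be between 1 and n_nodes (N)")
    # Every item is stored R times, and those copies are spread over N nodes.
    return total_data_gb * replicas / n_nodes


# 1000 GB of unique data, N = 10, R = 3: each node holds about 300 GB.
print(per_node_storage_gb(1000, 10, 3))

# N = 3, R = 3: every node holds the full 1000 GB (the "all nodes identical" case).
print(per_node_storage_gb(1000, 3, 3))
```

With R fixed, adding nodes lowers the share each node has to store, which is why N >> R lets the cluster grow on commodity servers.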

understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
I have a very basic question which I have been unable to find answered in the online documentation on Cassandra. It seems like every node in a Cassandra cluster contains all the data ever stored in the cluster (i.e., all nodes are identical). I don't understand how you can scale this on commodity servers with