Re: Cassandra and disk space

2010-12-09 Thread Bill de hÓra
This is true, but for larger installations I end up needing more servers to hold the disks, more racks to hold the servers, to the point where the overall cost per GB climbs (granted the cost per IOP is probably still good). AIUI, a chunk of that 50% is replicated data such that the truly available s

Re: Cassandra and disk space

2010-12-09 Thread Brandon Williams
On Thu, Dec 9, 2010 at 6:20 PM, Rustam Aliyev wrote: > Also, I noticed that you can specify multiple data file directories located > on different disks. Let's say if I have machine with 4 x 500GB drives, what > would be the difference between following 2 setups: > >1. each drive mounted separ

Re: Cassandra and disk space

2010-12-09 Thread Robert Coli
On Thu, Dec 9, 2010 at 4:20 PM, Rustam Aliyev wrote: > Thanks Tyler, this is really useful. > [ RAID0 vs JBOD question ] > In other words, does splitting data folder into smaller ones bring any > performance or stability advantages? This is getting to be a FAQ, so here's my stock answer : There

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
Thanks Tyler, this is really useful. Also, I noticed that you can specify multiple data file directories located on different disks. Let's say if I have machine with 4 x 500GB drives, what would be the difference between following 2 setups: 1. each drive mounted separately and has data file

Re: Cassandra and disk space

2010-12-09 Thread Nick Bailey
Additionally, cleanup will fail to run when the disk is more than 50% full. Another reason to stay below 50%. On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs wrote: > Yes, that's correct, but I wouldn't push it too far. You'll become much > more sensitive to disk usage changes; in particular, rebal
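
The 50% guideline mentioned above can be checked with a short script. This is a minimal sketch, not anything from the thread: the function names and the default threshold are assumptions for illustration, and the path you pass should be whichever directory holds your data files.

```python
import shutil


def fill_fraction(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of the disk that is in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def is_over_threshold(used_bytes: int, total_bytes: int,
                      threshold: float = 0.5) -> bool:
    """True when usage exceeds the threshold (50% by default, an assumption
    taken from the rule of thumb discussed in this thread)."""
    return fill_fraction(used_bytes, total_bytes) > threshold


def path_over_threshold(path: str, threshold: float = 0.5) -> bool:
    """Check the filesystem that contains `path` using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return is_over_threshold(usage.used, usage.total, threshold)
```

Running `path_over_threshold` from cron or a monitoring agent against each data directory would give an early warning before compaction, cleanup or repair needs the extra room.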

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
Yes, that's correct, but I wouldn't push it too far. You'll become much more sensitive to disk usage changes; in particular, rebalancing your cluster will be particularly difficult, and repair will also become dangerous. Disk performance also tends to drop when a disk nears capacity. There's no reco

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
That depends on your scenario. In the worst case of one big CF, there's not much that can be easily done for the disk usage of compaction and cleanup (which is essentially compaction). If, instead, you have several column families and no single CF makes up the majority of your data, you can

Re: Cassandra and disk space

2010-12-09 Thread Scott Dworkis
i recently finished a practice expansion of 4 nodes to 5 nodes, a series of "nodetool move", "nodetool cleanup" and jmx gc steps. i found that in some of the steps, disk usage actually grew to 2.5x the base data size on one of the nodes. i'm using 0.6.4. -scott On Thu, 9 Dec 2010, Rustam Al

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
That depends on your scenario. In the worst case of one big CF, there's not much that can be easily done for the disk usage of compaction and cleanup (which is essentially compaction). If, instead, you have several column families and no single CF makes up the majority of your data, you can push

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
Are there any plans to improve this in the future? For big data clusters this could be very expensive. Based on your comment, I will need 200TB of storage for 100TB of data to keep Cassandra running. -- Rustam. On 09/12/2010 17:56, Tyler Hobbs wrote: If you are on 0.6, repair is particularly dang
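
The 200TB-for-100TB figure follows directly from the "stay under 50% full" rule. As a rough sketch of that arithmetic (the function name and parameters are hypothetical, and the data size is assumed to already include replication):

```python
def raw_capacity_needed(data_tb: float, max_fill: float = 0.5) -> float:
    """Raw disk capacity needed so that `data_tb` of on-disk data
    occupies no more than `max_fill` of the total.

    max_fill=0.5 mirrors the 50% rule of thumb from this thread;
    it is an assumption, not a hard limit enforced by Cassandra.
    """
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    return data_tb / max_fill


# 100 TB of data under a 50% ceiling needs 200 TB of raw capacity.
print(raw_capacity_needed(100))
```

Raising `max_fill` lowers the hardware bill but shrinks the headroom left for compaction, cleanup and repair, which is the trade-off discussed in the replies.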

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
If you are on 0.6, repair is particularly dangerous with respect to disk space usage. If your replica is sufficiently out of sync, you can triple your disk usage pretty easily. This has been improved in 0.7, so repairs should use about half as much disk space, on average. In general, yes, keep y

Re: Cassandra and disk space

2010-12-09 Thread Peter Schuller
> I recently ran into a problem during a repair operation where my nodes > completely ran out of space and my whole cluster was... well, clusterfucked. > > I want to make sure how to prevent this problem in the future. Depending on which version you're on, you may be seeing this: https://issue

Cassandra and disk space

2010-12-09 Thread Mark
I recently ran into a problem during a repair operation where my nodes completely ran out of space and my whole cluster was... well, clusterfucked. I want to make sure how to prevent this problem in the future. Should I make sure that at all times every node is under 50% of its disk space? Ar