This is true, but for larger installations I end up needing more
servers to hold the disks, and more racks to hold the servers, to the point
where the overall cost per GB climbs (granted, the cost per IOP is
probably still good). AIUI, a chunk of that 50% is replicated data, such
that the truly available s
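(A rough, made-up illustration of that math, assuming RF=3 and the 50% rule on a box like the 4 x 500GB one mentioned later in the thread:

    raw per node:         4 x 500 GB      = 2 TB
    kept under 50%:       2 TB x 0.5      = 1 TB of live data
    unique data at RF=3:  1 TB / 3        ~ 333 GB
    raw : unique data     2 TB : 333 GB   ~ 6 : 1

so the hardware paid for is roughly 6x the data actually stored, before counting the extra servers and racks.)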
On Thu, Dec 9, 2010 at 6:20 PM, Rustam Aliyev wrote:
> Also, I noticed that you can specify multiple data file directories located
> on different disks. Let's say I have a machine with 4 x 500GB drives, what
> would be the difference between the following 2 setups:
>
>1. each drive mounted separ
On Thu, Dec 9, 2010 at 4:20 PM, Rustam Aliyev wrote:
> Thanks Tyler, this is really useful.
> [ RAID0 vs JBOD question ]
> In other words, does splitting data folder into smaller ones bring any
> performance or stability advantages?
This is getting to be a FAQ, so here's my stock answer:
There
Thanks Tyler, this is really useful.
Also, I noticed that you can specify multiple data file directories
located on different disks. Let's say I have a machine with 4 x 500GB
drives; what would be the difference between the following 2 setups:
1. each drive mounted separately and has data file
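For reference, the two layouts roughly correspond to config like the following; the paths here are hypothetical, and the YAML form assumes a 0.7-style cassandra.yaml (0.6 expresses the same thing as <DataFileDirectory> entries inside storage-conf.xml):

    # setup 1: JBOD, one data directory per disk
    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
        - /mnt/disk3/cassandra/data
        - /mnt/disk4/cassandra/data

    # setup 2: the four disks striped into a single RAID0 volume
    data_file_directories:
        - /mnt/raid0/cassandra/data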
Additionally, cleanup will fail to run when the disk is more than 50% full.
Another reason to stay below 50%.
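A small sketch of the sort of pre-flight check that helps here (host and data path are placeholders):

    # confirm the data volume still has headroom before triggering cleanup
    df -h /var/lib/cassandra/data
    # then run cleanup on that node
    nodetool -h localhost cleanup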
On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs wrote:
> Yes, that's correct, but I wouldn't push it too far. You'll become much
> more sensitive to disk usage changes; in particular, rebal
Yes, that's correct, but I wouldn't push it too far. You'll become much
more sensitive to disk usage changes; in particular, rebalancing your
cluster will be particularly difficult, and repair will also become dangerous.
Disk performance also tends to drop when a disk nears capacity.
There's no reco
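The per-node load figures that make rebalancing awkward are visible from nodetool; a hedged sketch of the commands involved (host and token are placeholders):

    nodetool -h <host> ring               # per-node token and on-disk load
    nodetool -h <host> move <new_token>   # rebalancing step; needs temporary disk headroom
    nodetool -h <host> cleanup            # drop ranges the node no longer owns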
That depends on your scenario. In the worst case of one big CF,
there's not much that can be easily done for the disk usage of
compaction and cleanup (which is essentially compaction).
If, instead, you have several column families and no single CF makes
up the majority of your data, you can
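A rough way to picture it (numbers made up, not from the thread): compaction rewrites a column family's SSTables before removing the old ones, so the temporary extra space is on the order of the size of the CF being compacted, not of the whole node.

    one 400 GB CF:                 needs up to ~400 GB free to compact
    four independent 100 GB CFs:   compacted one at a time, ~100 GB of headroom each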
I recently finished a practice expansion from 4 nodes to 5 nodes: a series
of "nodetool move", "nodetool cleanup", and JMX GC steps. I found that in
some of the steps, disk usage actually grew to 2.5x the base data size on
one of the nodes. I'm using 0.6.4.
-scott
On Thu, 9 Dec 2010, Rustam Aliyev wrote:
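For anyone repeating that exercise, a hedged sketch of what one of those per-node steps looks like (host and token are made up; the JMX GC matters on 0.6 because obsolete SSTables are only unlinked from disk after a garbage collection):

    # move the node to its newly calculated token
    nodetool -h node1 move 34028236692093846346337460743176821145
    # drop data for ranges node1 no longer owns
    nodetool -h node1 cleanup
    # on 0.6, force a full GC over JMX (e.g. jconsole -> java.lang:type=Memory -> gc())
    # so the now-obsolete SSTables are actually deleted and the space comes back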
Are there any plans to improve this in the future?
For big data clusters this could be very expensive. Based on your
comment, I will need 200TB of storage for 100TB of data to keep
Cassandra running.
--
Rustam.
On 09/12/2010 17:56, Tyler Hobbs wrote:
If you are on 0.6, repair is particularly dang
If you are on 0.6, repair is particularly dangerous with respect to disk
space usage. If your replica is sufficiently out of sync, you can triple
your disk usage pretty easily. This has been improved in 0.7, so repairs
should use about half as much disk space, on average.
In general, yes, keep y
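Put into concrete, made-up numbers, that works out to roughly:

    0.6 worst case:   node holding 100 GB  ->  up to ~300 GB used during repair (3x)
    0.7, same case:   roughly half the repair overhead, so on the order of 200 GB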
> I recently ran into a problem during a repair operation where my nodes
> completely ran out of space and my whole cluster was... well, clusterfucked.
>
> I want to make sure how to prevent this problem in the future.
Depending on which version you're on, you may be seeing this:
https://issue
I recently ran into a problem during a repair operation where my nodes
completely ran out of space and my whole cluster was... well,
clusterfucked.
I want to make sure how to prevent this problem in the future.
Should I make sure that, at all times, every node is using under 50% of its disk
space? Ar