Re: nodetool repair caused high disk space usage

2011-08-23 Thread Héctor Izquierdo Seliva
On Sat, 20-08-2011 at 01:22 +0200, Peter Schuller wrote: > > Is there any chance that the entire file from source node got streamed to > > destination node even though only small amount of data in the file from > > source node is supposed to be streamed to destination node? > > Yes, but the thi

Re: nodetool repair caused high disk space usage

2011-08-22 Thread Huy Le
After having done so many tries, I am not sure which log entries correspond to what. However, there were many of this type: WARN [CompactionExecutor:14] 2011-08-18 18:47:00,596 CompactionManager.java (line 730) Index file contained a different key or row size; using key from data file And ther

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Philippe
> > Do you have an indication that at least the disk space is in fact > consistent with the amount of data being streamed between the nodes? I > think you had 90 -> ~ 450 gig with RF=3, right? Still sounds like a > lot assuming repairs are not running concurrently (and compactions are > able to run

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Peter Schuller
> In our case they get created exclusively during  repairs. Compactionstats > showed a huge number of sstable build compactions Do you have an indication that at least the disk space is in fact consistent with the amount of data being streamed between the nodes? I think you had 90 -> ~ 450 gig wit

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Philippe
Péter, In our case they get created exclusively during repairs. Compactionstats showed a huge number of sstable build compactions. On Aug 20, 2011 1:23 AM, "Peter Schuller" wrote: >> Is there any chance that the entire file from source node got streamed to >> destination node even though only smal

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
> Is there any chance that the entire file from source node got streamed to > destination node even though only small amount of data in the file from > source node is supposed to be streamed to destination node? Yes, but the thing that's annoying me is that even if so - you should not be seeing a 40

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Huy Le
> > To confirm - are you saying the data directory size is huge, but the > live size as reported by nodetool ring and nodetool info does NOT > reflect this inflated size? > That's correct. > What files *do* you have in the data directory? Any left-over *tmp* > files for example? > > The files th

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
> There were few Compacted files.  I thought that might have been the cause, > but it wasn't it.  We have a CF that is 23GB, and while repair is running, > there are multiple instances of that CF created along with other CFs. To confirm - are you saying the data directory size is huge, but the liv

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Huy Le
There were few Compacted files. I thought that might have been the cause, but it wasn't it. We have a CF that is 23GB, and while repair is running, there are multiple instances of that CF created along with other CFs. I checked the stream directory across cluster of four nodes, but it was empty.

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Huy Le
I wasn't clear on that. What I meant was: would scrub put data in a state that might have caused the repair to consume a lot of disk space? On Thu, Aug 18, 2011 at 6:44 PM, aaron morton wrote: > No, scrub is a local operation only. > > Cheers > > - > Aaron Morton > Freelance Cassa

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
> After upgrading to cass 0.8.4 from cass 0.6.11, I ran scrub. That worked > fine. Then I ran nodetool repair on one of the nodes. The disk usage on > data directory increased from 40GB to 480GB, and it's still growing. If you check your data directory, does it contain a lot of "*Compacted" fi

Re: nodetool repair caused high disk space usage

2011-08-18 Thread aaron morton
No, scrub is a local operation only. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 19/08/2011, at 6:36 AM, Huy Le wrote: > Thanks. I won't try that then. > > So in our environment, after upgrading from 0.6.11 to 0.8.4, we have

Re: nodetool repair caused high disk space usage

2011-08-18 Thread Huy Le
Thanks. I won't try that then. So in our environment, after upgrading from 0.6.11 to 0.8.4, we have to run scrub on all nodes before we can run repair on them. Is there any chance that running scrub on the nodes causes data from all SSTables to be streamed to/from other nodes when running repair?

Re: nodetool repair caused high disk space usage

2011-08-18 Thread Philippe
Unfortunately, repairing one CF at a time didn't help in my case because it still streams all CFs, and that triggers lots of compactions. On Aug 18, 2011 3:48 PM, "Huy Le" wrote:

Re: nodetool repair caused high disk space usage

2011-08-18 Thread Huy Le
Philippe, Besides the system keyspace, we have only one user keyspace. However, tell me that we can also try repairing one CF at a time. We have two concurrent compactors configured. Will change that to one. Huy On Wed, Aug 17, 2011 at 6:10 PM, Philippe wrote: > Huy, > Have you tried repai

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Philippe
Huy, Have you tried repairing one keyspace at a time and then giving it some breathing time to compact? My current observation is that the streams of repairs are triggering massive compactions which are filling up my disks too. Another idea I'd like to try is to limit the number of concurrent comp

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Huy Le
I restarted the cluster and kicked off repair on the same node again. It only made matters worse. It filled up the 830GB partition, and Cassandra crashed on the node the repair ran on. I restarted it, and now I am running compaction to reduce disk usage. Repair after upgrading to 0.8.4 is still

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Huy Le
Sorry for the duplicate thread. I saw the issue referenced at https://issues.apache.org/jira/browse/CASSANDRA-2280. However, I am running version 0.8.4. I saw your comment in one of the threads that the issue is not reproducible, but multiple users have the same issue. Is there anything

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Philippe
Look at my last two or three threads. I've encountered the same thing and got some pointers/answers. On Aug 17, 2011 4:03 PM, "Huy Le" wrote: > Hi, > > After upgrading to cass 0.8.4 from cass 0.6.11, I ran scrub. That worked > fine. Then I ran nodetool repair on one of the nodes. The disk usage on

nodetool repair caused high disk space usage

2011-08-17 Thread Huy Le
Hi, After upgrading to cass 0.8.4 from cass 0.6.11, I ran scrub. That worked fine. Then I ran nodetool repair on one of the nodes. The disk usage on data directory increased from 40GB to 480GB, and it's still growing. The cluster has 4 nodes with replication factor 3. The ring shows: Address