Re: Compaction doubles disk space

2011-03-30 Thread Karl Hiramoto
On 3/30/2011 12:39 PM, aaron morton wrote: Checked the code again, got it a bit wrong. When getting a path to flush a memtable (and to write an incoming stream) via cfs.getFlushPath() the code does not invoke GC if there is not enough space. One reason for not doing this could be that when we

Re: Compaction doubles disk space

2011-03-30 Thread Sheng Chen
It really helps. Thank you very much.
Sheng
2011/3/30 aaron morton
> When a compaction needs to write a file, Cassandra will try to find a place
> to put the new file, based on an estimate of its size. If it cannot find
> enough space it will trigger a GC which will delete any previously compact

Re: Compaction doubles disk space

2011-03-30 Thread aaron morton
Checked the code again, got it a bit wrong. When getting a path to flush a memtable (and to write an incoming stream) via cfs.getFlushPath() the code does not invoke GC if there is not enough space. One reason for not doing this could be that when we do it during compaction we wait for 20 seco

Re: Compaction doubles disk space

2011-03-30 Thread Karl Hiramoto
On 30/03/2011 09:08, aaron morton wrote: Also, as far as I understand, we cannot immediately delete files because other operations (including repair) may be using them. The data in the pre-compacted files is just as correct as the data in the compacted file, it's just more compact. So the easiest

Re: Compaction doubles disk space

2011-03-30 Thread aaron morton
When a compaction needs to write a file, Cassandra will try to find a place to put the new file, based on an estimate of its size. If it cannot find enough space it will trigger a GC which will delete any previously compacted and so unneeded SSTables. The same thing will happen when a new SSTable
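The lookup-then-reclaim behaviour described in this message can be sketched roughly as follows. This is an illustrative Python model, not Cassandra's code: `pick_data_dir`, `dir_for_compaction`, and the `reclaim` callback are hypothetical names, and preferring the directory with the most free space is an assumption made for the example.

```python
from typing import Callable, Dict, Optional


def pick_data_dir(free_bytes: Dict[str, int], estimated_size: int) -> Optional[str]:
    """Return a directory with room for the estimated file size, or None."""
    candidates = {d: free for d, free in free_bytes.items() if free >= estimated_size}
    if not candidates:
        return None
    # Assumption: prefer the directory with the most free space.
    return max(candidates, key=candidates.get)


def dir_for_compaction(
    get_free_bytes: Callable[[], Dict[str, int]],
    estimated_size: int,
    reclaim: Callable[[], None],
) -> Optional[str]:
    """Look for space; if none is found, reclaim obsolete files once and retry."""
    chosen = pick_data_dir(get_free_bytes(), estimated_size)
    if chosen is None:
        # Stand-in for the GC that deletes previously compacted SSTables.
        reclaim()
        chosen = pick_data_dir(get_free_bytes(), estimated_size)
    return chosen
```

The reclaim step runs only when the first lookup fails, which mirrors the order of events in the description: estimate the size, look for space, and fall back to deleting unneeded files.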

Re: Compaction doubles disk space

2011-03-29 Thread Sheng Chen
Yes. I think at least we can remove the tombstones for each SSTable first, and then do the merge.
2011/3/29 Karl Hiramoto
> Would it be possible to improve the current compaction disk space issue by
> compacting only a few SSTables at a time then immediately deleting the
> old one? Looking

Re: Compaction doubles disk space

2011-03-29 Thread Karl Hiramoto
Would it be possible to improve the current compaction disk space issue by compacting only a few SSTables at a time then immediately deleting the old one? Looking at the logs it seems like deletions of old SSTables are taking longer than necessary. -- Karl

Re: Compaction doubles disk space

2011-03-29 Thread Sylvain Lebresne
> BTW, given that compaction requires double disk space, does it mean that I
> should never reach half of my total disk space?
> e.g. if I have 505GB data on 1TB disk, I cannot even delete any data at all.
It is not so black and white. What is true is that in practice reaching half the disk shoul
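The arithmetic behind the question can be sketched as below. This is a rough model under stated assumptions, not a description of Cassandra's actual estimate: the worst case assumes the compacted output is as large as its inputs, 1TB is treated as 1000GB, and `can_compact` and `input_fraction` are hypothetical names introduced for illustration.

```python
def can_compact(data_gb: float, disk_gb: float, input_fraction: float = 1.0) -> bool:
    """Return True if free space covers the worst-case size of the compacted output.

    input_fraction is the share of the existing data taking part in the compaction
    (1.0 models a major compaction over everything).
    """
    free_gb = disk_gb - data_gb
    # Worst case: nothing is dropped, so the output is as large as the inputs.
    needed_gb = data_gb * input_fraction
    return free_gb >= needed_gb


print(can_compact(505, 1000))                      # False: 495GB free < 505GB needed
print(can_compact(505, 1000, input_fraction=0.5))  # True: 495GB free >= 252.5GB needed
```

Under these assumptions only a compaction over all of the data is blocked at 505GB; compactions over a subset of the SSTables need proportionally less headroom.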

Re: Compaction doubles disk space

2011-03-29 Thread Sheng Chen
Following a previous thread on the same topic, I forced a GC and the extra space was released. What about my second question?
2011/3/29 Sheng Chen
> I use the 'nodetool compact' command to start a compaction.
> I can understand that extra disk space is required during the compaction,
> but af

Compaction doubles disk space

2011-03-29 Thread Sheng Chen
I use the 'nodetool compact' command to start a compaction. I can understand that extra disk space is required during the compaction, but after the compaction, the extra space is not released.
Before compaction:
SSTable count: 10
space used (live): 19G
space used (total): 21G
After compaction: ss