Re: 8 million Cassandra data files on disk

2011-08-02 Thread Jonathan Ellis
I stand corrected. There are several dozen reasons to upgrade, AND that one. :) On Tue, Aug 2, 2011 at 4:42 PM, Yiming Sun wrote: > Hi Jonathan, > > Good to know.  We will certainly upgrade to 0.7.8. > > Also, here is the link to that post I came across earlier: > > http://cassandra-user-incubat

Re: 8 million Cassandra data files on disk

2011-08-02 Thread Yiming Sun
Hi Jonathan, Good to know. We will certainly upgrade to 0.7.8. Also, here is the link to that post I came across earlier: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Files-not-deleted-after-compaction-and-GCed-td5960453.html best, -- Y. On Tue, Aug 2, 2011 at 5:36 PM, Jo

Re: 8 million Cassandra data files on disk

2011-08-02 Thread Jonathan Ellis
I don't remember a removing-compacted-files bug in 0.7.0, but you should absolutely upgrade to 0.7.8 for several dozen other fixes, including some severe ones -- see NEWS.txt. On Tue, Aug 2, 2011 at 4:29 PM, Yiming Sun wrote: > Hi Jeremiah, > > Thank you for the information - it certainly is a re

Re: 8 million Cassandra data files on disk

2011-08-02 Thread Yiming Sun
Hi Jeremiah, Thank you for the information - it certainly is a relief. Two questions though: 1. I came across an old thread which seemed to say that Cassandra 0.7.0 has a bug and doesn't remove these compacted files properly. Should we upgrade to a newer version that has this bug fixed? 2. Do w

Re: 8 million Cassandra data files on disk

2011-08-02 Thread Jeremiah Jordan
Connect with jconsole and run garbage collection. All of the data files that have a -Compacted marker file with the same name will be deleted the next time a full garbage collection runs, or when the node is restarted. They have already been combined into new files; the old ones just haven't been deleted yet. On
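
To see how many of the files on disk are only waiting for that cleanup, one could list the -Compacted markers and the files that share their name. The following is a minimal sketch, not a Cassandra tool: it assumes the marker is named `<prefix>-Compacted` and that the obsolete files start with the same `<prefix>-`, as described above. The function name and the sample file names in the note below are hypothetical.

```python
from pathlib import Path

# Assumption: the marker file is "<prefix>-Compacted" and the files it
# marks as obsolete share the same "<prefix>-" (e.g. "<prefix>-Data.db").
MARKER_SUFFIX = "-Compacted"


def find_obsolete_sstable_files(data_dir):
    """Return {prefix: [file names]} for every -Compacted marker in data_dir.

    The listed files have already been merged into newer files and are
    only waiting to be deleted. This function deletes nothing itself.
    """
    data_dir = Path(data_dir)
    obsolete = {}
    for marker in sorted(data_dir.glob("*" + MARKER_SUFFIX)):
        prefix = marker.name[: -len(MARKER_SUFFIX)]
        # The trailing "-" keeps generation 1 from matching generation 10.
        siblings = sorted(
            p.name
            for p in data_dir.iterdir()
            if p.name.startswith(prefix + "-") and p != marker
        )
        obsolete[prefix] = siblings
    return obsolete
```

Called on a column family's data directory, this returns one entry per compacted file set; summing the lengths of the lists gives a rough count of files that should disappear after the next full garbage collection or restart. For example, a directory holding `Files-e-1-Data.db`, `Files-e-1-Index.db` and `Files-e-1-Compacted` would yield a single entry for `Files-e-1`.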

8 million Cassandra data files on disk

2011-08-02 Thread Yiming Sun
Hi, I am new to Cassandra, and am hoping someone could help me understand the large number of small data files that Cassandra generates on disk. We are using Cassandra because we are dealing with thousands to millions of small text files on disk, so we are experimenting with Cassa