It looks like Cassandra calculates free space by summing it across all the data disks, which can result in a single big compaction file that cannot fit on any one disk. Could anyone clarify this, please?
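To make the suspicion concrete, here is a minimal Java sketch (this is NOT the actual Cassandra source, just my guess at the behavior; the paths are my data_file_directories, and the ~3.5T figure is the size of the tmp Data.db shown below). A check against the *summed* free space admits the compaction output, while a per-disk check would reject it:

    import java.io.File;

    public class FreeSpaceCheckSketch {
        // Hypothetical: my six data_file_directories, one per physical disk.
        static final String[] DATA_DIRS = {
            "/data1/cass", "/data2/cass", "/data3/cass",
            "/data4/cass", "/data5/cass", "/data6/cass"
        };

        // Free space summed over all data directories.
        static long totalFree() {
            long total = 0;
            for (String dir : DATA_DIRS) {
                total += new File(dir).getUsableSpace();
            }
            return total;
        }

        // Largest free space available on any single disk.
        static long maxFreeOnOneDisk() {
            long max = 0;
            for (String dir : DATA_DIRS) {
                max = Math.max(max, new File(dir).getUsableSpace());
            }
            return max;
        }

        public static void main(String[] args) {
            // Roughly the size of the tmp Data.db that filled /data6.
            long estimatedOutput = 3_500_000_000_000L; // ~3.5 TB

            // If the check is against the sum, a ~3.5T output passes even
            // though each disk is only 3.6T and already partially used ...
            System.out.println("sum-of-disks check passes: "
                    + (estimatedOutput <= totalFree()));
            // ... while a per-disk check would fail it.
            System.out.println("single-disk check passes:  "
                    + (estimatedOutput <= maxFreeOnOneDisk()));
        }
    }

If the free-space check really works like the first variant, a big compaction could be started even though no single disk can hold its output.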
On Wed, Apr 30, 2014 at 2:09 PM, Yatong Zhang <bluefl...@gmail.com> wrote:

> I am using CQL 3 to create a table to store images, and every image is
> about 200K ~ 500K. I have 6 hard disks per node, and Cassandra is
> configured with 6 data directories:
>
>> data_file_directories:
>>     - /data1/cass
>>     - /data2/cass
>>     - /data3/cass
>>     - /data4/cass
>>     - /data5/cass
>>     - /data6/cass
>
> Every directory is on a standalone disk. But I just found this when the
> error occurred:
>
>> [root@node5 images]# ll -hl
>> total 3.6T
>> drwxr-xr-x 4 root root 4.0K Jan 20 09:44 snapshots
>> -rw-r--r-- 1 root root 456M Apr 30 13:42 mydb-images-tmp-jb-91068-CompressionInfo.db
>> -rw-r--r-- 1 root root 3.5T Apr 30 13:42 mydb-images-tmp-jb-91068-Data.db
>> -rw-r--r-- 1 root root    0 Apr 30 13:42 mydb-images-tmp-jb-91068-Filter.db
>> -rw-r--r-- 1 root root 2.0G Apr 30 13:42 mydb-images-tmp-jb-91068-Index.db
>
> [root@node5 images]# df -hl
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1        49G  7.5G   39G  17% /
> tmpfs           7.8G     0  7.8G   0% /dev/shm
> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
> /dev/sdc1       3.6T  466G  3.0T  14% /data3
> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
> /dev/sdf1       3.6T  3.6T     0 100% /data6
>
> *mydb-images-tmp-jb-91068-Data.db* occupied almost all the disk space (a
> 4T hard disk with 3.6T actual usable size).
>
> After I restarted Cassandra, everything seemed to be fine:
>
>> -rw-r--r-- 1 root root  19K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-CompressionInfo.db
>> -rw-r--r-- 1 root root 145M Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Data.db
>> -rw-r--r-- 1 root root  64K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Index.db
>
> [root@node5 images]# df -hl
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1        49G  7.5G   39G  17% /
> tmpfs           7.8G     0  7.8G   0% /dev/shm
> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
> /dev/sdc1       3.6T  466G  3.0T  14% /data3
> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
> /dev/sdf1       3.6T  662M  3.4T   1% /data6
>
> So my questions are:
>
> 1. I am using CQL 3; is there a size limit for 'tables' created by CQL 3?
> 2. I specified 6 data directories, each on a standalone disk; is that OK?
> 3. Why is the tmp db file so large? Is it normal, or a bug?
>
> Could anyone please help to solve this issue? Any help is greatly
> appreciated. Thanks a lot!
>
>
> On Wed, Apr 30, 2014 at 12:04 PM, Yatong Zhang <bluefl...@gmail.com> wrote:
>
>> Thanks for the response. I've checked the system logs and hard disk
>> smartd info, and no errors were found. Any hints to locate the problem?
>>
>>
>> On Wed, Apr 30, 2014 at 9:26 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
>>
>>> Then you likely need to fix your I/O problem. The most recent error you
>>> posted is an EOFException - the file being read ended unexpectedly,
>>> probably when you ran out of disk space.
>>>
>>> --
>>> Michael
>>>
>>>
>>> On 04/29/2014 07:48 PM, Yatong Zhang wrote:
>>>
>>>> Here is another type of exception; it seems they are all I/O related:
>>>>
>>>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,548 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6956 (447252 bytes)
>>>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,553 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6958 (257 bytes)
>>>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,554 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6957 (257 bytes)
>>>>> INFO [main] 2014-04-29 14:44:35,592 ColumnFamilyStore.java (line 248) Initializing system.batchlog
>>>>> INFO [main] 2014-04-29 14:44:35,596 ColumnFamilyStore.java (line 248) Initializing system.sstable_activity
>>>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,601 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8084 (1562 bytes)
>>>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,604 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8083 (2075 bytes)
>>>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,605 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8085 (1555 bytes)
>>>>> INFO [main] 2014-04-29 14:44:35,687 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-sstable_activity-KeyCache-b.db
>>>>> INFO [main] 2014-04-29 14:44:35,696 ColumnFamilyStore.java (line 248) Initializing system.peer_events
>>>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,697 SSTableReader.java (line 223) Opening /data4/cass/system/peer_events/system-peer_events-jb-181 (12342 bytes)
>>>>> INFO [main] 2014-04-29 14:44:35,717 ColumnFamilyStore.java (line 248) Initializing system.compactions_in_progress
>>>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,718 SSTableReader.java (line 223) Opening /data5/cass/system/compactions_in_progress/system-compactions_in_progress-jb-36448 (167 bytes)
>>>>> ERROR [SSTableBatchOpen:1] 2014-04-29 14:44:35,730 CassandraDaemon.java (line 198) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>>>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>>>>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:110)
>>>>>         at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
>>>>>         at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:458)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:422)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:184)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:264)
>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>> Caused by: java.io.EOFException
>>>>>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>>>         at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>>>         at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>>>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:85)
>>>>>         ... 12 more
>>>>> INFO [main] 2014-04-29 14:44:35,733 ColumnFamilyStore.java (line 248) Initializing system.hints
>>>>> INFO [main] 2014-04-29 14:44:35,734 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-hints-KeyCache-b.db
>>>>> INFO [main] 2014-04-29 14:44:35,737 ColumnFamilyStore.java (line 248) Initializing system.schema_keyspaces
>>>>
>>>>
>>>> On Tue, Apr 29, 2014 at 6:07 PM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>>
>>>>> I am pretty sure the disk has plenty of space. I restarted Cassandra
>>>>> and everything went fine again.
>>>>>
>>>>> It's really weird.
>>>>>
>>>>>
>>>>> On Tue, Apr 29, 2014 at 5:58 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>
>>>>>> The important part of that stack trace is "java.io.IOException: No
>>>>>> space left on device": your disks are full (and it's not really a
>>>>>> bug that Cassandra errors out in that case).
>>>>>>
>>>>>> --
>>>>>> Sylvain
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 29, 2014 at 11:09 AM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> Sorry if this is not the right place to report bugs. I am using
>>>>>>> 2.0.7, and I have a 10-box cluster with about 200TB capacity. I
>>>>>>> just found 3 boxes with error exceptions. With DataStax OpsCenter
>>>>>>> I can see these three nodes lost connections (no response), but
>>>>>>> after I sshed to these servers, Cassandra was still running, and
>>>>>>> 'system.log' still had new entries.
>>>>>>>
>>>>>>> I think this might be a bug, so would anyone kindly help to
>>>>>>> investigate it? Thanks~
>>>>>>>
>>>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,249 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:1,1,main]
>>>>>>>> FSWriteError in /data2/cass/mydb/images/mydb-images-tmp-jb-98219-Filter.db
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
>>>>>>>>         at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
>>>>>>>>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>>>>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>>>>>>>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
>>>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>>>> Caused by: java.io.IOException: No space left on device
>>>>>>>>         at java.io.FileOutputStream.write(Native Method)
>>>>>>>>         at java.io.FileOutputStream.write(FileOutputStream.java:295)
>>>>>>>>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>>>>>>>>         at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
>>>>>>>>         at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
>>>>>>>>         at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
>>>>>>>>         ... 13 more
>>>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 367) Stopping gossiper
>>>>>>>> WARN [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 281) Stopping gossip by operator request
>>>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:15,406 Gossiper.java (line 1271) Announcing shutdown
>>>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,406 StorageService.java (line 372) Stopping RPC server
>>>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,406 ThriftServer.java (line 141) Stop listening to thrift clients
>>>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,417 StorageService.java (line 377) Stopping native transport
>>>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,504 Server.java (line 181) Stop listening for CQL clients