Thanks for the response. I've checked the system logs and the hard disks' smartd info and found no errors. Any hints on how to locate the problem?

On Wed, Apr 30, 2014 at 9:26 AM, Michael Shuler <mich...@pbandjelly.org> wrote:

> Then you likely need to fix your I/O problem. The most recent error you
> posted is an EOFException - the file being read ended unexpectedly.
> Probably when you ran out of disk space.
>
> --
> Michael
>
> On 04/29/2014 07:48 PM, Yatong Zhang wrote:
>
>> Here is another type of exception; they all seem to be I/O related:
>>
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,548 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6956 (447252 bytes)
>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,553 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6958 (257 bytes)
>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,554 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6957 (257 bytes)
>>> INFO [main] 2014-04-29 14:44:35,592 ColumnFamilyStore.java (line 248) Initializing system.batchlog
>>> INFO [main] 2014-04-29 14:44:35,596 ColumnFamilyStore.java (line 248) Initializing system.sstable_activity
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,601 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8084 (1562 bytes)
>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,604 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8083 (2075 bytes)
>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,605 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8085 (1555 bytes)
>>> INFO [main] 2014-04-29 14:44:35,687 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-sstable_activity-KeyCache-b.db
>>> INFO [main] 2014-04-29 14:44:35,696 ColumnFamilyStore.java (line 248) Initializing system.peer_events
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,697 SSTableReader.java (line 223) Opening /data4/cass/system/peer_events/system-peer_events-jb-181 (12342 bytes)
>>> INFO [main] 2014-04-29 14:44:35,717 ColumnFamilyStore.java (line 248) Initializing system.compactions_in_progress
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,718 SSTableReader.java (line 223) Opening /data5/cass/system/compactions_in_progress/system-compactions_in_progress-jb-36448 (167 bytes)
>>> ERROR [SSTableBatchOpen:1] 2014-04-29 14:44:35,730 CassandraDaemon.java (line 198) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:110)
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
>>>     at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:458)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:422)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:184)
>>>     at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:264)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)
>>> Caused by: java.io.EOFException
>>>     at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:85)
>>>     ... 12 more
>>> INFO [main] 2014-04-29 14:44:35,733 ColumnFamilyStore.java (line 248) Initializing system.hints
>>> INFO [main] 2014-04-29 14:44:35,734 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-hints-KeyCache-b.db
>>> INFO [main] 2014-04-29 14:44:35,737 ColumnFamilyStore.java (line 248) Initializing system.schema_keyspaces
>>
>> On Tue, Apr 29, 2014 at 6:07 PM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>
>>> I am pretty sure the disk has plenty of space. I restarted Cassandra
>>> and everything went fine again.
>>>
>>> It's really weird.
>>>
>>> On Tue, Apr 29, 2014 at 5:58 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>
>>>> The important part of that stack trace is "java.io.IOException: No space
>>>> left on device": your disks are full (and it's not really a bug that
>>>> Cassandra errors out in that case).
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>> On Tue, Apr 29, 2014 at 11:09 AM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> Sorry if this is not the right place to report bugs. I am using 2.0.7
>>>>> and I have a 10-box cluster with about 200TB capacity. I just found
>>>>> that 3 boxes had thrown exceptions. With DataStax OpsCenter I can see
>>>>> that these three nodes lost their connections (no response), but after
>>>>> I sshed to these servers, Cassandra was still running, and 'system.log'
>>>>> was still being written.
>>>>>
>>>>> I think this might be a bug, so could anyone kindly help investigate
>>>>> it? Thanks~
>>>>>
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,249 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:1,1,main]
>>>>>> FSWriteError in /data2/cass/mydb/images/mydb-images-tmp-jb-98219-Filter.db
>>>>>>     at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
>>>>>>     at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
>>>>>>     at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
>>>>>>     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
>>>>>>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>>>>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>>>>>     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>>>>>     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>>>>>     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>> Caused by: java.io.IOException: No space left on device
>>>>>>     at java.io.FileOutputStream.write(Native Method)
>>>>>>     at java.io.FileOutputStream.write(FileOutputStream.java:295)
>>>>>>     at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>>>>>>     at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
>>>>>>     at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
>>>>>>     at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
>>>>>>     at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
>>>>>>     ... 13 more
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 367) Stopping gossiper
>>>>>> WARN [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 281) Stopping gossip by operator request
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:15,406 Gossiper.java (line 1271) Announcing shutdown
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,406 StorageService.java (line 372) Stopping RPC server
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,406 ThriftServer.java (line 141) Stop listening to thrift clients
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,417 StorageService.java (line 377) Stopping native transport
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,504 Server.java (line 181) Stop listening for CQL clients
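
For the two errors quoted in this thread ("No space left on device" during compaction, then an EOFException while opening an SSTable), one possible way to narrow things down is to check each data directory for both free space and free inodes, and to look for SSTable component files that were left empty. The sketch below is only an illustration, not a Cassandra tool. It assumes a POSIX system, and the directory list and the "-CompressionInfo.db" suffix are assumptions to adapt to your own layout. Exhausted inodes can produce the same "No space left on device" message even when free bytes remain. A file that is truncated but not completely empty would not be caught by this check.

```python
import os


def filesystem_report(path):
    """Return free space and free inodes for the filesystem holding `path`."""
    st = os.statvfs(path)  # POSIX only
    return {
        "path": path,
        "free_bytes": st.f_bavail * st.f_frsize,
        "total_bytes": st.f_blocks * st.f_frsize,
        "free_inodes": st.f_favail,
        "total_inodes": st.f_files,
    }


def find_empty_components(root, suffix="-CompressionInfo.db"):
    """List zero-length files under `root` whose names end with `suffix`.

    The default suffix is an assumption about how the compression
    metadata component is named; change it to match your files.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                if os.path.getsize(full) == 0:
                    empty.append(full)
    return sorted(empty)


if __name__ == "__main__":
    # Hypothetical list taken from the paths seen in the logs; replace it
    # with the data_file_directories configured in your cassandra.yaml.
    for data_dir in ["/data1", "/data2", "/data4", "/data5"]:
        if not os.path.isdir(data_dir):
            continue
        print(filesystem_report(data_dir))
        for path in find_empty_components(data_dir):
            print("zero-length component:", path)
```

Running this on each of the three affected nodes and comparing the output against a healthy node may show whether one particular disk filled up (in bytes or inodes) while the others still had room.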