It looks like Cassandra calculates free space by summing it across all the data disks, which can result in a single big compaction file that cannot fit on any one disk. Could anyone clarify this, please?
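To make the suspicion concrete, here is a minimal Java sketch (this is NOT the actual Cassandra source, just my guess at the behavior; the paths are my data_file_directories, and the ~3.5T figure is the size of the tmp Data.db shown below). A check against the *summed* free space admits the compaction output, while a per-disk check would reject it:

    import java.io.File;

    public class FreeSpaceCheckSketch {
        // Hypothetical: my six data_file_directories, one per physical disk.
        static final String[] DATA_DIRS = {
            "/data1/cass", "/data2/cass", "/data3/cass",
            "/data4/cass", "/data5/cass", "/data6/cass"
        };

        // Free space summed over all data directories.
        static long totalFree() {
            long total = 0;
            for (String dir : DATA_DIRS) {
                total += new File(dir).getUsableSpace();
            }
            return total;
        }

        // Largest free space available on any single disk.
        static long maxFreeOnOneDisk() {
            long max = 0;
            for (String dir : DATA_DIRS) {
                max = Math.max(max, new File(dir).getUsableSpace());
            }
            return max;
        }

        public static void main(String[] args) {
            // Roughly the size of the tmp Data.db that filled /data6.
            long estimatedOutput = 3_500_000_000_000L; // ~3.5 TB

            // If the check is against the sum, a ~3.5T output passes even
            // though each disk is only 3.6T and already partially used ...
            System.out.println("sum-of-disks check passes: "
                    + (estimatedOutput <= totalFree()));
            // ... while a per-disk check would fail it.
            System.out.println("single-disk check passes:  "
                    + (estimatedOutput <= maxFreeOnOneDisk()));
        }
    }

If the free-space check really works like the first variant, a big compaction could be started even though no single disk can hold its output.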
On Wed, Apr 30, 2014 at 2:09 PM, Yatong Zhang <bluefl...@gmail.com> wrote:

> I am using CQL 3 to create a table to store images, and every image is
> about 200K ~ 500K. I have 6 hard disks per node, and Cassandra is
> configured with 6 data directories:
>
>> data_file_directories:
>>     - /data1/cass
>>     - /data2/cass
>>     - /data3/cass
>>     - /data4/cass
>>     - /data5/cass
>>     - /data6/cass
>
> Every directory is on a standalone disk. But I just found this when the
> error occurred:
>
>> [root@node5 images]# ll -hl
>> total 3.6T
>> drwxr-xr-x 4 root root 4.0K Jan 20 09:44 snapshots
>> -rw-r--r-- 1 root root 456M Apr 30 13:42 mydb-images-tmp-jb-91068-CompressionInfo.db
>> -rw-r--r-- 1 root root 3.5T Apr 30 13:42 mydb-images-tmp-jb-91068-Data.db
>> -rw-r--r-- 1 root root    0 Apr 30 13:42 mydb-images-tmp-jb-91068-Filter.db
>> -rw-r--r-- 1 root root 2.0G Apr 30 13:42 mydb-images-tmp-jb-91068-Index.db
>
> [root@node5 images]# df -hl
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1        49G  7.5G   39G  17% /
> tmpfs           7.8G     0  7.8G   0% /dev/shm
> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
> /dev/sdc1       3.6T  466G  3.0T  14% /data3
> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
> /dev/sdf1       3.6T  3.6T     0 100% /data6
>
> *mydb-images-tmp-jb-91068-Data.db* occupied almost all the disk space (a
> 4T hard disk with 3.6T actual usable size).
>
> After I restarted Cassandra, everything seemed to be fine:
>
>> -rw-r--r-- 1 root root  19K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-CompressionInfo.db
>> -rw-r--r-- 1 root root 145M Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Data.db
>> -rw-r--r-- 1 root root  64K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Index.db
>
> [root@node5 images]# df -hl
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1        49G  7.5G   39G  17% /
> tmpfs           7.8G     0  7.8G   0% /dev/shm
> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
> /dev/sdc1       3.6T  466G  3.0T  14% /data3
> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
> /dev/sdf1       3.6T  662M  3.4T   1% /data6
>
> So my questions are:
>
> 1. I am using CQL 3; is there a size limit for 'tables' created by CQL 3?
> 2. I specified 6 data directories, each on a standalone disk; is that OK?
> 3. Why is the tmp db file so large? Is it normal, or a bug?
>
> Could anyone please help to solve this issue? Any help is greatly
> appreciated. Thanks a lot!
>
>
> On Wed, Apr 30, 2014 at 12:04 PM, Yatong Zhang <bluefl...@gmail.com> wrote:
>
>> Thanks for the response. I've checked the system logs and hard disk
>> smartd info, and no errors were found. Any hints to locate the problem?
>>
>>
>> On Wed, Apr 30, 2014 at 9:26 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
>>
>>> Then you likely need to fix your I/O problem. The most recent error you
>>> posted is an EOFException - the file being read ended unexpectedly,
>>> probably when you ran out of disk space.
>>>
>>> --
>>> Michael
>>>
>>>
>>> On 04/29/2014 07:48 PM, Yatong Zhang wrote:
>>>
>>>> Here is another type of exception; it seems they are all I/O related:
>>>>
>>>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,548 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6956 (447252 bytes)
>>>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,553 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6958 (257 bytes)
>>>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,554 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6957 (257 bytes)
>>>>> INFO [main] 2014-04-29 14:44:35,592 ColumnFamilyStore.java (line 248) Initializing system.batchlog
>>>>> INFO [main] 2014-04-29 14:44:35,596 ColumnFamilyStore.java (line 248) Initializing system.sstable_activity
>>>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,601 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8084 (1562 bytes)
>>>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,604 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8083 (2075 bytes)
>>>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,605 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8085 (1555 bytes)
>>>>> INFO [main] 2014-04-29 14:44:35,687 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-sstable_activity-KeyCache-b.db
>>>>> INFO [main] 2014-04-29 14:44:35,696 ColumnFamilyStore.java (line 248) Initializing system.peer_events
>>>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,697 SSTableReader.java (line 223) Opening /data4/cass/system/peer_events/system-peer_events-jb-181 (12342 bytes)
>>>>> INFO [main] 2014-04-29 14:44:35,717 ColumnFamilyStore.java (line 248) Initializing system.compactions_in_progress
>>>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,718 SSTableReader.java (line 223) Opening /data5/cass/system/compactions_in_progress/system-compactions_in_progress-jb-36448 (167 bytes)
>>>>> ERROR [SSTableBatchOpen:1] 2014-04-29 14:44:35,730 CassandraDaemon.java (line 198) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>>>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>>>>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:110)
>>>>>         at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
>>>>>         at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:458)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:422)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:184)
>>>>>         at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:264)
>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>> Caused by: java.io.EOFException
>>>>>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>>>         at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>>>         at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>>>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:85)
>>>>>         ... 12 more
>>>>> INFO [main] 2014-04-29 14:44:35,733 ColumnFamilyStore.java (line 248) Initializing system.hints
>>>>> INFO [main] 2014-04-29 14:44:35,734 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-hints-KeyCache-b.db
>>>>> INFO [main] 2014-04-29 14:44:35,737 ColumnFamilyStore.java (line 248) Initializing system.schema_keyspaces
>>>>
>>>>
>>>> On Tue, Apr 29, 2014 at 6:07 PM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>>
>>>>> I am pretty sure the disk has plenty of space. I restarted Cassandra
>>>>> and everything went fine again.
>>>>>
>>>>> It's really weird.
>>>>>
>>>>>
>>>>> On Tue, Apr 29, 2014 at 5:58 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>
>>>>>> The important part of that stack trace is "java.io.IOException: No
>>>>>> space left on device": your disks are full (and it's not really a
>>>>>> bug that Cassandra errors out in that case).
>>>>>>
>>>>>> --
>>>>>> Sylvain
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 29, 2014 at 11:09 AM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> Sorry if this is not the right place to report bugs. I am using
>>>>>>> 2.0.7, and I have a 10-box cluster with about 200TB capacity. I
>>>>>>> just found 3 boxes with error exceptions. With DataStax OpsCenter
>>>>>>> I can see these three nodes lost connections (no response), but
>>>>>>> after I sshed to these servers, Cassandra was still running, and
>>>>>>> 'system.log' still had new entries.
>>>>>>>
>>>>>>> I think this might be a bug, so would anyone kindly help to
>>>>>>> investigate it? Thanks~
>>>>>>>
>>>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,249 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:1,1,main]
>>>>>>>> FSWriteError in /data2/cass/mydb/images/mydb-images-tmp-jb-98219-Filter.db
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
>>>>>>>>         at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
>>>>>>>>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>>>>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>>>>>>>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
>>>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>>>> Caused by: java.io.IOException: No space left on device
>>>>>>>>         at java.io.FileOutputStream.write(Native Method)
>>>>>>>>         at java.io.FileOutputStream.write(FileOutputStream.java:295)
>>>>>>>>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>>>>>>>>         at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
>>>>>>>>         at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
>>>>>>>>         at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
>>>>>>>>         ... 13 more
>>>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 367) Stopping gossiper
>>>>>>>> WARN [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 281) Stopping gossip by operator request
>>>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:15,406 Gossiper.java (line 1271) Announcing shutdown
>>>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,406 StorageService.java (line 372) Stopping RPC server
>>>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,406 ThriftServer.java (line 141) Stop listening to thrift clients
>>>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,417 StorageService.java (line 377) Stopping native transport
>>>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,504 Server.java (line 181) Stop listening for CQL clients