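Hi All,

We recently hit an error on Cassandra 1.0.12 that prevented the node from starting up. From the log messages, it looks like the keyspace and its column families were opened before scrubDataDirectories had run. That appears to have created a race between two threads: the FlushWriter thread failed to rename an sstable file (alarm.fmzd_alarm_alarmCode-hd-21-Data.db), while the main thread, inside scrubDataDirectories, failed to delete a tmp file from the same sstable generation (alarm.fmzd_alarm_alarmCode-tmp-hd-21-Statistics.db). Both errors are logged at 02:49:39,916.

Has anyone seen this before, and can anyone offer an explanation or a workaround? The relevant log output is included below.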
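To make the suspected sequence concrete, here is a minimal Java sketch. It is not Cassandra code: the class RenameDeleteRace, its methods, and the file names are made up for illustration, and it collapses the two threads into two sequential calls on a single file (in the log the two failures hit different components, Data.db and Statistics.db). It only shows that whichever of "rename the tmp file" and "delete the tmp file" runs second will fail, which is the kind of conflict we think we are seeing.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Illustration only, not Cassandra code. All names here are hypothetical.
class RenameDeleteRace {

    // Stands in for the flush thread: rename the tmp file to its final name.
    static boolean rename(File tmp, File fin) {
        return tmp.renameTo(fin);
    }

    // Stands in for the startup scrub: delete a leftover tmp file.
    static boolean delete(File tmp) {
        return tmp.delete();
    }

    // Runs both steps against one fresh tmp file; scrubFirst picks the order.
    // Returns {renameSucceeded, deleteSucceeded}.
    static boolean[] run(boolean scrubFirst) {
        try {
            File dir = Files.createTempDirectory("race-demo").toFile();
            File tmp = new File(dir, "cf-tmp-hd-21-Data.db");
            File fin = new File(dir, "cf-hd-21-Data.db");
            if (!tmp.createNewFile()) {
                throw new IOException("could not create " + tmp);
            }
            boolean renamed;
            boolean deleted;
            if (scrubFirst) {
                deleted = delete(tmp);
                renamed = rename(tmp, fin);
            } else {
                renamed = rename(tmp, fin);
                deleted = delete(tmp);
            }
            // Clean up whatever is left of the demo directory.
            fin.delete();
            dir.delete();
            return new boolean[] { renamed, deleted };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] a = run(true);
        System.out.println("scrub first: rename=" + a[0] + " delete=" + a[1]);
        boolean[] b = run(false);
        System.out.println("flush first: rename=" + b[0] + " delete=" + b[1]);
    }
}
```

In either order, the second operation finds the tmp file already gone and returns false. In Cassandra the corresponding helpers (renameWithConfirm and deleteWithConfirm in the traces below) throw an IOException in that case.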
INFO [MemoryMeter:1] 2013-04-09 02:49:39,868 Memtable.java (line 186) CFS(Keyspace='fmzd', ColumnFamily='alarm.fmzd_alarm_category') liveRatio is 3.7553409423470883 (just-counted was 3.1413828689370487). calculation took 2ms for 265 columns
INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,868 SSTableReader.java (line 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-2 (83 bytes)
INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,868 SSTableReader.java (line 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-1 (123 bytes)
INFO [Creating index: alarm.fmzd_alarm_category] 2013-04-09 02:49:39,874 ColumnFamilyStore.java (line 705) Enqueuing flush of Memtable-alarm.fmzd_alarm_category@413535513(14025/65835 serialized/live bytes, 275 ops)
INFO [OptionalTasks:1] 2013-04-09 02:49:39,877 SecondaryIndexManager.java (line 184) Creating new index : ColumnDefinition{name=6d65736853534944, validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, index_name='fmzd_ap_meshSSID'}
INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,895 SSTableReader.java (line 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-1 (122 bytes)
INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,896 SSTableReader.java (line 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-2 (82 bytes)
INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java (line 184) Creating new index : ColumnDefinition{name=6d6f62696c6974795a6f6e6555554944, validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, index_name='fmzd_ap_mobilityZoneUUID'}
ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
java.io.IOError: java.io.IOException: rename failed of /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:375)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:276)
    at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
    at org.apache.cassandra.db.Memtable$4.runMayThrow(Memtable.java:299)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: rename failed of /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
    at org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.java:355)
    at org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:371)
    ... 9 more
INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,917 SSTableReader.java (line 153) Opening /test/db/data/fmzd/ap.fmzd_ap_mobilityZoneUUID-hd-1 (312 bytes)
INFO [FlushWriter:2] 2013-04-09 02:49:39,916 Memtable.java (line 246) Writing Memtable-alarm.fmzd_alarm_alarmCode@402202831(2958/22542 serialized/live bytes, 58 ops)
ERROR [main] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java (line 373) Exception encountered during startup
java.io.IOError: java.io.IOException: Failed to delete /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-tmp-hd-21-Statistics.db
    at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:372)
    at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:415)
    at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:193)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Caused by: java.io.IOException: Failed to delete /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-tmp-hd-21-Statistics.db
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
    at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
    at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:368)
    ... 4 more
INFO [OptionalTasks:1] 2013-04-09 02:49:39,923 SecondaryIndexManager.java (line 184) Creating new index : ColumnDefinition{name=6d6f64656c, validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, index_name='fmzd_ap_model'}

Thanks and Regards,
Boris