Re: gracefully recover from data file corruptions

2011-12-16 Thread Ramesh Natarajan
Thanks Ben and Jeremiah. We are actively working with our third-party vendors to determine the root cause of this issue. Hopefully we will figure something out. This repair procedure is more of a last resort, which I really don't want to use, but it is something to keep in mind if the need arises.

Re: gracefully recover from data file corruptions

2011-12-16 Thread Ben Coverston
Hi Ramesh, Every time I have seen this in the last year it has been caused by bad hardware or bad memory. Usually we find errors in the syslog. Jeremiah is right about running repair when you get your nodes back up. Fortunately, with the addition of checksums in 1.0, I don't think that the corrupt

Re: gracefully recover from data file corruptions

2011-12-16 Thread Jeremiah Jordan
You need to run repair on the node once it is back up (to get back the data you just deleted). If this is happening on more than one node you could have data loss... -Jeremiah On 12/16/2011 07:46 AM, Ramesh Natarajan wrote: We are running a 30 node 1.0.5 cassandra cluster running RHEL 5.6 x

gracefully recover from data file corruptions

2011-12-16 Thread Ramesh Natarajan
We are running a 30-node Cassandra 1.0.5 cluster on RHEL 5.6 x86_64, virtualized on ESXi 5.0. We are seeing a Decorated Key assertion error during compactions, and at this point we suspect anything from OS/ESXi/HBA/iSCSI RAID. Please correct me if I am wrong, once a node gets into this state