Re: Experiencing Timeouts on one node

2015-07-06 Thread Jason Wee
3. How do we rebuild System keyspace?
wipe this node and start it all over.
hth
jason

On Tue, Jul 7, 2015 at 12:16 AM, Shashi Yachavaram wrote:
> When we reboot the problematic node, we see the following errors in
> system.log.
>
> 1. Does this mean hints column family is corrupted?
> 2. Can

Re: Experiencing Timeouts on one node

2015-07-06 Thread Shashi Yachavaram
When we reboot the problematic node, we see the following errors in system.log.
1. Does this mean the hints column family is corrupted?
2. Can we scrub the system column family on the problematic node and its replication partners?
3. How do we rebuild the System keyspace?

Re: Experiencing Timeouts on one node

2015-07-02 Thread Alain RODRIGUEZ
Hi, I am not sure what is happening (I have never seen this error before). Yet from https://github.com/apache/cassandra/blob/cassandra-1.2/CHANGES.txt it looks like some bugs were fixed in later revisions of 1.2.x. I would advise you to upgrade to the latest release, 1.2.19 (It is an old and stable version,

Re: Experiencing Timeouts on one node

2015-07-02 Thread Shashi Yachavaram
Jason, The load was evenly distributed. And regarding network connectivity, our applications were successfully able to connect to the node, but the read and write operations were timing out. Also we were able to ssh to this node. I just pasted "/bin/nodetool -h node version" and "java -version".

Re: Experiencing Timeouts on one node

2015-07-02 Thread Jason Wee
You should check the network connectivity for this node and also its system load average. Is that a typo, or is it literally what it says: cassandra 1.2.15.*1* and java 6 update *85*?

On Thu, Jul 2, 2015 at 12:59 AM, Shashi Yachavaram wrote:
> We have a 28 node cluster, out of which only one node is expe

Experiencing Timeouts on one node

2015-07-01 Thread Shashi Yachavaram
We have a 28-node cluster, out of which only one node is experiencing timeouts. We thought it was the RAID, but there are two other nodes on the same RAID without any problem. Also, the problem goes away if we reboot the node, and then reappears after seven days. The following hinted hand-off timeo