Re: timeout while doing repair

2011-11-24 Thread Jahangir Mohammed
That will give you a snapshot of the thread pools. You should look at ROW-READ-STAGE and check its pending and active counts. If there are many pending tasks, it means the cluster is not able to keep up with the incoming read requests. Thanks, Jahangir Mohammed. On Thu, Nov 24, 2011 at 2:14 PM, Patrik Modesto
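
A rough sketch of the check described above, in case it helps: it scans tpstats-style text and reports the pools that have pending tasks. The column layout (pool name, active, pending, then further counters) and the numbers in SAMPLE are assumptions made up for illustration, not output captured from a real node. The stage name is the one used in the message above and may be shown differently depending on the Cassandra version. The function names are hypothetical as well.

```python
def parse_tpstats(text):
    """Return {pool_name: (active, pending)} from tpstats-style text.

    Assumes each data row looks like: <name> <active> <pending> ...
    """
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, blank lines, and anything without two numeric columns.
        if len(parts) < 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        pools[parts[0]] = (int(parts[1]), int(parts[2]))
    return pools


def backlogged(pools, threshold=0):
    """Return the sorted names of pools whose pending count exceeds threshold."""
    return sorted(name for name, (_, pending) in pools.items() if pending > threshold)


# Made-up sample input (assumption), not real nodetool output.
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ROW-READ-STAGE                   32       417         120345
ROW-MUTATION-STAGE                0         0          98211
"""

if __name__ == "__main__":
    for name in backlogged(parse_tpstats(SAMPLE)):
        print(name, "has pending tasks")
```

In practice the text would be whatever "nodetool tpstats" prints on the node under repair; a pending count that stays high across several snapshots is the signal to watch for.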

Re: timeout while doing repair

2011-11-24 Thread Patrik Modesto
We have our own servers: 16-core CPU, 32GB RAM, 8 1TB disks. I didn't check tpstats, just iotop, where Cassandra used all the I/O capacity when compacting/repairing. I had to completely clean the test cluster, but I'll check tpstats in production. What should I look for? Regards, Patrik D

Re: timeout while doing repair

2011-11-24 Thread Jahangir Mohammed
As far as I know, the timeout is caused by increased load on the node due to repair. Hardware? EC2? Did you check tpstats? On Thu, Nov 24, 2011 at 11:42 AM, Patrik Modesto wrote: > Thanks for the reply. I know I can configure a longer timeout but in our use > case, a reply longer than 1 second is unacceptable

Re: timeout while doing repair

2011-11-24 Thread Patrik Modesto
Thanks for the reply. I know I can configure a longer timeout, but in our use case a reply longer than 1 second is unacceptable. What I don't understand is why I get timeouts while reading a different keyspace than the one the repair is working on. I get timeouts even during compaction. Besides usual access we d

Re: timeout while doing repair

2011-11-24 Thread Jahangir Mohammed
Do you use any client which gives you this timeout? If you don't specify any timeout from the client, look at rpc_timeout_in_ms. Increase it and see if you still suffer from this. Repair is a costly process. Thanks, Jahangir Mohammed. On Thu, Nov 24, 2011 at 2:45 AM, Patrik Modesto wrote: > Hi, > >

timeout while doing repair

2011-11-23 Thread Patrik Modesto
Hi, I have a test cluster of 4 nodes running Debian and Cassandra 0.8.7. There are 3 keyspaces, all with RF=3, and a node has a load of around 40GB. When I run "nodetool repair", after a while all Thrift clients that read with CL.QUORUM get TimeoutException, and even some that use just CL.ONE. I've tried to