Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
Setup: Cassandra 2.0.14. 2 DCs with 3 nodes each connected via 10Gbps VPN. We run repair with -par and -pr option. Problem: Repair Hangs. Merkle Tree Responses are not received from one or more nodes in remote DC. Observations till now: 1. Repair hangs intermittently on one node of DC2. Only on

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
Hi All, I am summarizing the setup, problem & key observations till now: Setup: Cassandra 2.0.14. 2 DCs with 3 nodes each connected via 10Gbps VPN. We run repair with -par and -pr option. Problem: Repair Hangs. Merkle Tree Responses are not received from one or more nodes in remot

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
via its public IP. Thanks Anuj On Tue, 24/11/15, Paulo Motta wrote: Subject: Re: Repair Hangs while requesting Merkle Trees To: "user@cassandra.apache.org" , "Anuj Wadehra" Date: Tuesday, 24 November, 2015, 12:38 AM The is

Re: Repair Hangs while requesting Merkle Trees

2015-11-23 Thread Paulo Motta
k team to capture netstats and tcpdump > too.. > > Thanks > Anuj > > > > On Wed, 18/11/15, Anuj Wadehra wrote: > > Subject: Re: Repair Hangs while requesting Merkle Trees > To: "user@cassandra.apache.org"

Re: Repair Hangs while requesting Merkle Trees

2015-11-23 Thread Anuj Wadehra
: Repair Hangs while requesting Merkle Trees To: "user@cassandra.apache.org" Date: Wednesday, 18 November, 2015, 7:57 AM Thanks Bryan!! Connection is in ESTABLISHED state on one end and completely missing at the other end (in another DC). Yes, we can revisit TCP tuning. But the probl

Re: Repair Hangs while requesting Merkle Trees

2015-11-17 Thread Anuj Wadehra
Cheng" Date: Wed, 18 Nov, 2015 at 2:04 am Subject: Re: Repair Hangs while requesting Merkle Trees Ah OK, might have misunderstood you. Streaming sockets should not be in play during merkle tree generation (validation compaction). They may come into play during merkle tree exchange - that I'm not

Re: Repair Hangs while requesting Merkle Trees

2015-11-17 Thread Bryan Cheng
s > Anuj > -- > *From*: "Bryan Cheng" > *Date*: Tue, 17 Nov, 2015 at 5:54 am > *Subject*: Re: Repair Hangs while requesting Merkle Tr

Re: Repair Hangs while requesting Merkle Trees

2015-11-16 Thread Anuj Wadehra
different? Thanks Anuj From: "Bryan Cheng" Date: Tue, 17 Nov, 2015 at 5:54 am Subject: Re: Repair Hangs while requesting Merkle Trees Hi Anuj, Did you mean streaming_socket_timeout_in_ms? If not, then you definitely want that set. Even the be

Re: Repair Hangs while requesting Merkle Trees

2015-11-16 Thread Bryan Cheng
> Anuj > -- > *From*: "Anuj Wadehra" > *Date*: Sat, 14 Nov, 2015 at 11:59 pm > *Subject*: Re: Repair Hangs while requesting Merkle Trees

Re: Repair Hangs while requesting Merkle Trees

2015-11-14 Thread Anuj Wadehra
From: "Anuj Wadehra" Date: Sat, 14 Nov, 2015 at 11:59 pm Subject: Re: Repair Hangs while requesting Merkle Trees Thanks Daemeon!! I will capture the output of netstats and share in next few days. We were thinking of taking tcp dumps also. If it's a network issue and increasing request

Re: Repair Hangs while requesting Merkle Trees

2015-11-14 Thread Anuj Wadehra
. Is it related somehow? Thanks Anuj From: "daemeon reiydelle" Date: Thu, 12 Nov, 2015 at 10:34 am Subject: Re: Repair Hangs while requesting Merkle Trees Have you checked the network statistics on that machine? (netstat -tas) while attempting to rep

Re: Repair Hangs while requesting Merkle Trees

2015-11-11 Thread daemeon reiydelle
5 nodes. On only > one node in DC2, we are unable to complete repair as it always hangs. Node > sends Merkle Tree requests, but one or more nodes in DC1 (remote) never > show that they sent the merkle tree reply to requesting node. > Repair hangs infinitely. > > After increasing req

Re: Repair Hangs while requesting Merkle Trees

2015-11-11 Thread Anuj Wadehra
show that they sent the merkle tree reply to requesting node. Repair hangs infinitely. After increasing request_timeout_in_ms on affected node, we were able to successfully run repair on one of the two occasions. Any comments, why this is happening on just one node? In

Repair Hangs while requesting Merkle Trees

2015-11-11 Thread Anuj Wadehra
merkle tree reply to requesting node. Repair hangs infinitely. After increasing request_timeout_in_ms on affected node, we were able to successfully run repair on one of the two occasions. Any comments, why this is happening on just one node? In OutboundTcpConnection.java, when isTimeOut

Re: Repair hangs, seems to be stuck somehow

2014-10-21 Thread Alain RODRIGUEZ
> Thanks Robert, I believe this is a good idea but I was doing it already. > > "If you are really overprovisioned and on real hardware and network and > SSD, it might work sometimes." > > I am on AWS and was on m1.small to m1.xlarge from Cassandra 0.8 to 1.2.18, > that's

Re: Repair hangs, seems to be stuck somehow

2014-10-20 Thread Alain RODRIGUEZ
about the most?" Thanks Robert, I believe this is a good idea but I was doing it already. "If you are really overprovisioned and on real hardware and network and SSD, it might work sometimes." I am on AWS and was on m1.small to m1.xlarge from Cassandra 0.8 to 1.2.18, that's the

Re: Repair hangs, seems to be stuck somehow

2014-10-20 Thread Robert Coli
On Mon, Oct 20, 2014 at 5:45 AM, Alain RODRIGUEZ wrote: > I know that 2.1 fixes all this. We are going to migrate to C* 2.0 soon > (asap) and then to 2.1, but we first need to run some tests, which will > take us some time. Is repair officially broken on 1.2.18? Is there any > known workaround or

Re: Repair hangs, seems to be stuck somehow

2014-10-20 Thread Robert Coli
On Mon, Oct 20, 2014 at 5:45 AM, Alain RODRIGUEZ wrote: > Using Cassandra 1.2.18, we are experiencing an issue in our 2 DC > (EC2MultiRegionSnitch) C*1.2.18 cluster. > > We have 2 DCs and I saw some weird* inconsistencies between them. I > tried to run repair on all the nodes of both DCs (We

Repair hangs, seems to be stuck somehow

2014-10-20 Thread Alain RODRIGUEZ
Hi, Using Cassandra 1.2.18, we are experiencing an issue in our 2 DC (EC2MultiRegionSnitch) C*1.2.18 cluster. We have 2 DCs and I saw some weird* inconsistencies between them. I tried to run repair on all the nodes of both DCs (We tried running various repairs at the same time and also in a ro

Re: Repair hangs - Cassandra 1.2.10

2013-12-09 Thread Aaron Morton
> I changed logging to debug level, but still nothing is logged. > Again - any help will be appreciated. Is there nothing at the ERROR level on any machine? Check nodetool compactionstats to see if a validation compaction is running; the repair may be waiting on this. Check nodetool netstats
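
A minimal sketch of the check suggested in the snippet above, assuming `nodetool` is on the PATH of the node being inspected. The helper names (`run_nodetool`, `validation_running`) are hypothetical and not part of Cassandra; the text match only assumes that a validation compaction is listed with the word "validation" in the `nodetool compactionstats` output, as the posts in this thread list describe.

```python
import subprocess


def run_nodetool(*args: str) -> str:
    """Run nodetool with the given arguments and return its stdout.

    Assumes nodetool is installed and on PATH (an assumption of this sketch).
    """
    result = subprocess.run(
        ["nodetool", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def validation_running(compactionstats_output: str) -> bool:
    """Return True if any line of the output mentions a validation compaction."""
    return any(
        "validation" in line.lower()
        for line in compactionstats_output.splitlines()
    )


if __name__ == "__main__":
    # A hung repair may simply be waiting on a validation compaction.
    stats = run_nodetool("compactionstats")
    print("validation compaction running:", validation_running(stats))
    # Print streaming activity as well, for manual inspection.
    print(run_nodetool("netstats"))
```

If no validation compaction is listed and no streams are active while the repair command is still blocked, that would be consistent with the hang described in these threads rather than with slow progress.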

Re: Repair hangs - Cassandra 1.2.10

2013-12-04 Thread Tamar Rosen
Update - I am still experiencing the above issues, but not all the time. I was able to run repair (on this keyspace) from node 2 and from node 4, but now a different keyspace hangs on these nodes, and I am still not able to run repair on node 1. It seems random. I changed logging to debug level, bu

Repair hangs - Cassandra 1.2.10

2013-12-02 Thread Tamar Rosen
Hi, On AWS, we had a 2 node cluster with RF 2. We added 2 more nodes, then changed RF to 3 on all our keyspaces. Next step was to run nodetool repair, node by node. (In the meantime, we found that we must use CL quorum, which is affecting our application's performance). Started with node 1, which

Re: Repair hangs when merkle tree request is not acknowledged

2013-04-06 Thread aaron morton
> If I wait 24 hours, the repair command will return an error saying that the > node died… but the node really didn't die, I watch it the whole time. Can you include the error? It makes it easier to know what's going on. You should see INFO messages on the node you are running repair on that say

Re: Repair hangs when merkle tree request is not acknowledged

2013-04-05 Thread Paul Sudol
> How does it fail? If I wait 24 hours, the repair command will return an error saying that the node died… but the node really didn't die, I watch it the whole time. I have the DEBUG messages on in the log files, when the node I'm repairing sends out a merkle tree request, I will normally see, {C

Re: Repair hangs when merkle tree request is not acknowledged

2013-04-05 Thread aaron morton
> A repair on a certain CF will fail, and I run it again and again, eventually > it will succeed. How does it fail? Can you see the repair start on the other node? If you are getting errors in the log about streaming failing because a node died, and the FailureDetector is in the call stack, ch

Repair hangs when merkle tree request is not acknowledged

2013-04-04 Thread Paul Sudol
Hello, I have a cluster with 4 nodes, 2 nodes in 2 data centers. I had a hardware failure in one DC and had to replace the nodes. I'm running 1.2.3 on all of the nodes now. I was able to run nodetool rebuild on the two replacement nodes, but now I cannot finish a repair on any of them. I have 1

Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Ryan Lowe
Upgrading to 1.2.3 fixed the -pr repair. I'll just use that from now on (which is what I prefer!) Thanks, Ryan On Wed, Mar 27, 2013 at 9:11 AM, Ryan Lowe wrote: > Marco, > > No, there are no errors... the last line I see in my logs related to repair > is: > > [repair #...] Sending completed m

Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Ryan Lowe
Marco, No, there are no errors... the last line I see in my logs related to repair is: [repair #...] Sending completed merkle tree to /[node] for (keyspace1,columnfamily1) Ryan On Wed, Mar 27, 2013 at 8:49 AM, Marco Matarazzo < marco.matara...@hexkeep.com> wrote: > > If I run `nodetool -h lo

Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Marco Matarazzo
> If I run `nodetool -h localhost repair`, then it will repair only the first > Keyspace and then hang... I let it go for a week and nothing. Do the node logs show any errors? > If I run `nodetool -h localhost repair -pr`, then it appears to only repair > the first VNode range, but does do all ke

Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Ryan Lowe
Has anyone else experienced this? After upgrading to VNodes, I am having Repair issues. If I run `nodetool -h localhost repair`, then it will repair only the first Keyspace and then hang... I let it go for a week and nothing. If I run `nodetool -h localhost repair -pr`, then it appears to only r

Re: repair hangs

2013-03-18 Thread aaron morton
> /raid0/cassandra/data/OpsCenter/events_timeline/OpsCenter-events_timeline-hf-1-Data.db > is not compatible with current version ib > -- This can be fixed with a nodetool upgradesstables Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.the

Re: repair hangs

2013-03-14 Thread Dane Miller
On Thu, Mar 14, 2013 at 6:34 AM, aaron morton wrote: >> 1. is this a nodetool bug? is there any way to propagate the >> java.io.IOException back to nodetool? > The repair continues to work even if nodetool fails, it's a server side thing. > >> 2. network problems on EC2, I'm shocked! are there r

Re: repair hangs

2013-03-14 Thread aaron morton
> 1. is this a nodetool bug? is there any way to propagate the > java.io.IOException back to nodetool? The repair continues to work even if nodetool fails, it's a server side thing. > 2. network problems on EC2, I'm shocked! are there recommended > network settings for EC2? Streaming does not p

Re: repair hangs

2013-03-13 Thread Dane Miller
On Wed, Mar 13, 2013 at 12:39 PM, Wei Zhu wrote: > My guess would be there is some exception during the repair and your session > is aborted. > Here is the code of doing repair: > >https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/AntiEntropyService.java > > loo

Re: repair hangs

2013-03-13 Thread Wei Zhu
should give you a rough idea at which stage the repair died. -Wei - Original Message - From: "Dane Miller" To: user@cassandra.apache.org, "Wei Zhu" Sent: Wednesday, March 13, 2013 12:32:20 PM Subject: Re: repair hangs On Wed, Mar 13, 2013 at 11:44 AM, Wei Zhu wrote:

Re: repair hangs

2013-03-13 Thread Dane Miller
On Wed, Mar 13, 2013 at 11:44 AM, Wei Zhu wrote: >Do you see anything related to "merkle" tree in your log? > >Also do a nodetool compactionstats, during merkle tree calculation, you will >see >validation there. The last mention of "merkle" is 2 days old. compactionstats are: $ nodetool compac

Re: repair hangs

2013-03-13 Thread Wei Zhu
10:54:50 AM Subject: repair hangs Hi, On one of my nodes, nodetool repair -pr has been running for 48 hours and appears to be hung, with no output and no AntiEntropy messages in system.log for 40+ hours. Load, cpu, etc are all near zero. There are no other repair jobs running in my cluster. Wh

repair hangs

2013-03-13 Thread Dane Miller
Hi, On one of my nodes, nodetool repair -pr has been running for 48 hours and appears to be hung, with no output and no AntiEntropy messages in system.log for 40+ hours. Load, cpu, etc are all near zero. There are no other repair jobs running in my cluster. What's the recommended way to deal wi