Re: Rebuild to a new DC fails every time

2018-01-11 Thread Martin Mačura
Thanks for the tips, Alan. The cluster is entirely healthy. But the connection between DCs is a VPN, managed by a third party - it is possible it might be flaky. However, I would expect the rebuild job to be able to recover from connection timeout/reset type of errors without a need for manual int

Re: Rebuild to a new DC fails every time

2018-01-10 Thread Alain RODRIGUEZ
Hello Martin. Did you solve your issue? I would say that this exception could indeed be due to 'streaming_socket_timeout_in_ms'. Make sure its value is large enough; upgrading to a newer version that implements keep-alive is also an interesting thing to try. The thing is if you are

Re: Rebuild to a new DC fails every time

2018-01-08 Thread Martin Mačura
None of the files is listed more than once in the logs: java.lang.RuntimeException: Transfer of file /fs3/cassandra/data//event_group-3b5782d08e4411e6842917253f111990/mc-116042-big-Data.db already completed or aborted (perhaps session failed?). java.lang.RuntimeException: Transfer of file /fs0/cas
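A rough way to repeat this check (whether any file shows up more than once in the "Transfer of file ... already completed or aborted" errors) is sketched below. It is only an illustration: the regular expression is derived from the exception text quoted above, and the helper names and the idea of scanning a single log file are assumptions, not anything taken from the thread or from Cassandra tooling.

```python
import re
from collections import Counter

# Matches the exception text quoted in the message above and captures the file path.
TRANSFER_RE = re.compile(
    r"Transfer of file (?P<path>\S+) already completed or aborted"
)


def count_transfer_errors(lines):
    """Count how often each file path appears in 'Transfer of file' errors."""
    counts = Counter()
    for line in lines:
        for match in TRANSFER_RE.finditer(line):
            counts[match.group("path")] += 1
    return counts


def repeated_files(counts):
    """Return the paths that were reported more than once, sorted."""
    return sorted(path for path, n in counts.items() if n > 1)


def scan_log(log_path):
    """Hypothetical helper: scan one log file and return the per-file counts."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        return count_transfer_errors(fh)
```

Calling `repeated_files(scan_log(path))` on each node's log would return an empty list if no file is reported twice. Note that this counts only the error lines themselves, not every mention of a file by the streaming session.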

Re: Rebuild to a new DC fails every time

2018-01-07 Thread kurt greaves
If you're on 3.9 it's likely unrelated as streaming_socket_timeout_in_ms is 48 hours. Appears rebuild is trying to stream the same file twice. Are there other exceptions in the logs related to the file, or can you find out if it's previously been sent by the same session? Search the logs for the fi

Re: Rebuild to a new DC fails every time

2017-12-29 Thread Martin Mačura
Is this something that can be resolved by CASSANDRA-11841?

Thanks,
Martin

On Thu, Dec 21, 2017 at 3:02 PM, Martin Mačura wrote:
> Hi all,
> we are trying to add a new datacenter to the existing cluster, but the
> 'nodetool rebuild' command always fails after a couple of hours.
>
> We're on Cas

Rebuild to a new DC fails every time

2017-12-21 Thread Martin Mačura
Hi all,
we are trying to add a new datacenter to the existing cluster, but the 'nodetool rebuild' command always fails after a couple of hours.

We're on Cassandra 3.9.

Example 1:
172.24.16.169
INFO [STREAM-IN-/172.25.16.125:55735] 2017-12-13 23:55:38,840 StreamResultFuture.java:174 - [Stream #b