Re: some solr replicas down

2018-06-20 Thread Shawn Heisey
On 6/20/2018 6:39 AM, Satya Marivada wrote: > Yes, there are some other errors saying that javabin character 2 was expected but 60, which is "<", was returned. This happens when the response is an error. Error responses are sent in HTML format (so they render properly when viewed in a browser),
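To illustrate the "expected 2, got 60" message: the first byte of the payload is enough to tell the two cases apart. This is only a rough sketch and not Solr's own code; the function name `classify_response` and its return labels are made up for the example.

```python
JAVABIN_MARKER = 2       # first byte the client expects, per the error message
HTML_MARKER = ord("<")   # 60, the first byte of an HTML error page


def classify_response(body: bytes) -> str:
    """Guess the payload type from its first byte (hypothetical helper)."""
    if not body:
        return "empty"
    first = body[0]
    if first == JAVABIN_MARKER:
        return "javabin"
    if first == HTML_MARKER:
        return "html-error"
    return "unknown"


if __name__ == "__main__":
    print(classify_response(b"<html><body>Error</body></html>"))
```

If the result is "html-error", the body itself usually contains the real error text, so it is worth logging rather than discarding.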

Re: some solr replicas down

2018-06-20 Thread Chris Ulicny
Having time drift longer than the TTL would definitely cause these types of problems. In our case, the clusters are time-synchronized and the error is still encountered periodically. On Wed, Jun 20, 2018 at 10:07 AM Erick Erickson wrote: > We've seen this exact issue when the times reported by
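The skew-versus-TTL comparison can be sketched as below. This is an assumption-laden example: the thread does not state the TTL value, so `ttl_s` is a placeholder, and the host names and offsets are invented. In practice the offsets would come from something like ntpstat on each machine.

```python
def max_clock_skew(offsets_s: dict) -> float:
    """Largest difference between any two hosts' clock offsets, in seconds."""
    values = list(offsets_s.values())
    if not values:
        return 0.0
    return max(values) - min(values)


def skew_exceeds_ttl(offsets_s: dict, ttl_s: float) -> bool:
    """True when the worst-case skew between hosts is larger than the TTL."""
    return max_clock_skew(offsets_s) > ttl_s


if __name__ == "__main__":
    # Invented host names and offsets, for illustration only.
    offsets = {"vm1": 0.0, "vm2": 3.2, "vm3": 8.5, "vm4": 9.7}
    # Placeholder TTL; the real value is not given in this thread.
    print(skew_exceeds_ttl(offsets, ttl_s=5.0))
```

As noted above, synchronized clocks do not rule the error out, so a check like this only eliminates one possible cause.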

Re: some solr replicas down

2018-06-20 Thread Erick Erickson
We've seen this exact issue when the times reported by various machines have different wall-clock times, so getting these times coordinated is definitely the very first thing I'd do. It's particularly annoying because if the clocks are drifting apart gradually, your setup can be running fine for d

Re: some solr replicas down

2018-06-20 Thread Satya Marivada
Chris, You are spot on with the timestamps. The date command returns different times on these VMs, and they are not in sync with NTP. ntpstat reports a difference of about 8-10 seconds across the 4 VMs, and that would have caused these synchronization issues and marked the replicas as down. This just happened

Re: some solr replicas down

2018-06-19 Thread Chris Ulicny
Satya, There should be some other log messages that are probably relevant to the issue you are having. Something along the lines of "leader cannot communicate with follower...publishing replica as down." It's likely there also is a message of "expecting json/xml but got html" in another instance's