Perhaps this is SOLR-11660?

On Wed, May 2, 2018 at 4:46 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 5/2/2018 3:52 PM, Michael B. Klein wrote:
>> It works ALMOST perfectly. The restore operation reports success, and if I
>> look at the UI, everything looks great in the Cloud graph view. All green,
>> one leader and two other active instances per collection.
>>
>> But once we start updating, we run into problems. The two NON-leaders in
>> each collection get the updates, but the leader never does. Since the
>> instances are behind a round robin load balancer, every third query hits an
>> out-of-date core, with unfortunate (for our near-real-time indexing
>> dependent app) results.
>
> That is completely backwards from what I would expect in a problem
> report.  The leader coordinates all indexing, so if the two other
> replicas are getting the updates, that means that at least part of the
> functionality of the leader replica *IS* working.
>
> Side FYI: Unless you're using preferLocalShards=true, Solr will actually
> load balance your load balanced requests.  If your external load
> balancer sends queries to replica1, replica1 may forward the request to
> replica3 because of SolrCloud's own internal load balancing.  The
> preferLocalShards parameter will keep that from happening *if* the
> machine receiving the query has the replicas required to satisfy the query.
>
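For illustration, the preferLocalShards parameter mentioned above is passed as an ordinary query parameter. The following is only a minimal sketch: the host, port, and collection name are hypothetical placeholders, and the helper build_select_url is invented for this example. It only builds the request URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, prefer_local=True):
    """Build a /select URL, optionally adding preferLocalShards=true.

    base_url and collection are placeholders; substitute your own values.
    """
    params = {"q": query}
    if prefer_local:
        # Ask the node that receives the query to use its own replicas
        # when it hosts everything needed to answer the query.
        params["preferLocalShards"] = "true"
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = build_select_url("http://localhost:8983/solr", "mycollection", "*:*")
print(url)
```
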
>> Reloading the collection doesn't seem to help, but if I use the Collections
>> API to DELETEREPLICA the leader of each collection and follow it with an
>> ADDREPLICA, everything syncs up (with a new leader) and stays in sync from
>> there on out.
>>
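The DELETEREPLICA / ADDREPLICA workaround described above could be scripted roughly as follows. This is a sketch under stated assumptions: the base URL, collection, shard, and replica names are hypothetical, and the helper functions are invented for this example. It only constructs the two Collections API URLs in the order they would be called; issuing the requests (and waiting for each to finish) is left out.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


def replace_replica_urls(base_url, collection, shard, replica):
    """Return the DELETEREPLICA and ADDREPLICA URLs, in call order."""
    delete_url = collections_api_url(
        base_url, "DELETEREPLICA",
        collection=collection, shard=shard, replica=replica,
    )
    add_url = collections_api_url(
        base_url, "ADDREPLICA",
        collection=collection, shard=shard,
    )
    return [delete_url, add_url]


# Hypothetical names; look up the real leader's replica name in the
# cluster state before running anything like this.
urls = replace_replica_urls(
    "http://localhost:8983/solr", "mycollection", "shard1", "core_node1"
)
for u in urls:
    print(u)
```
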
>> I don't know what to look for in my settings or my logs to diagnose or try
>> to fix this issue. It only affects collections that have been restored from
>> backup. Any suggestions or guidance would be a big help.
>
> I don't know what to look for in the logs either, but the first thing to
> check for is any messages at WARN or ERROR logging levels.  These kinds
> of messages should also show up in the admin UI logging tab, but
> recovering the full text of those messages is much easier in the logfile
> than the admin UI.
>
> Have you tried restarting the Solr instances after restoring the
> collection?  This shouldn't be required, but at this point I'm hoping to
> at least get you limping along, even if it requires steps that are
> obvious indications of a bug.
>
> Since you're running 6.6 and 6.x is in maintenance mode, it's not likely
> that any bugs revealed will be fixed on 6.x, but maybe we can track it
> down and see if it's still a problem in 7.x.  How much pain will it
> cause you to get upgraded?
>
> Also FYI:  Two zookeeper servers is actually LESS fault tolerant than
> only having one, because if either server goes down, quorum is lost.
> You need at least three for fault tolerance.
>
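The ZooKeeper point above follows from majority-quorum arithmetic: an ensemble needs a strict majority of its servers up. A small sketch of that calculation (the function names are made up for this example):

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble with this many servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 6):
    print(f"servers={n} quorum={quorum_size(n)} "
          f"tolerated_failures={tolerated_failures(n)}")
```

With two servers the quorum is two, so no failure is tolerated, and there are now two machines whose loss breaks quorum instead of one. Three servers is the smallest ensemble that survives a single failure.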
> Thanks,
> Shawn
>
