Perhaps this is: SOLR-11660?
On Wed, May 2, 2018 at 4:46 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/2/2018 3:52 PM, Michael B. Klein wrote:
>> It works ALMOST perfectly. The restore operation reports success, and if I
>> look at the UI, everything looks great in the Cloud graph view. All green,
>> one leader and two other active instances per collection.
>>
>> But once we start updating, we run into problems. The two NON-leaders in
>> each collection get the updates, but the leader never does. Since the
>> instances are behind a round robin load balancer, every third query hits an
>> out-of-date core, with unfortunate (for our near-real-time indexing
>> dependent app) results.
>
> That is completely backwards from what I would expect in a problem
> report. The leader coordinates all indexing, so if the two other
> replicas are getting the updates, that means that at least part of the
> functionality of the leader replica *IS* working.
>
> Side FYI: Unless you're using preferLocalShards=true, Solr will actually
> load balance your load balanced requests. If your external load
> balancer sends queries to replica1, replica1 may forward the request to
> replica3 because of SolrCloud's own internal load balancing. The
> preferLocalShards parameter will keep that from happening *if* the
> machine receiving the query has the replicas required to satisfy the query.
>
>> Reloading the collection doesn't seem to help, but if I use the Collections
>> API to DELETEREPLICA the leader of each collection and follow it with an
>> ADDREPLICA, everything syncs up (with a new leader) and stays in sync from
>> there on out.
>>
>> I don't know what to look for in my settings or my logs to diagnose or try
>> to fix this issue. It only affects collections that have been restored from
>> backup. Any suggestions or guidance would be a big help.
>
> I don't know what to look for in the logs either, but the first thing to
> check for is any messages at WARN or ERROR logging levels. These kind
> of messages should also show up in the admin UI logging tab, but
> recovering the full text of those messages is much easier in the logfile
> than the admin UI.
>
> Have you tried restarting the Solr instances after restoring the
> collection? This shouldn't be required, but at this point I'm hoping to
> at least get you limping along, even if it requires steps that are
> obvious indications of a bug.
>
> Since you're running 6.6 and 6.x is in maintenance mode, it's not likely
> that any bugs revealed will be fixed on 6.x, but maybe we can track it
> down and see if it's still a problem in 7.x. How much pain will it
> cause you to get upgraded?
>
> Also FYI: Two zookeeper servers is actually LESS fault tolerant than
> only having one, because if either server goes down, quorum is lost.
> You need at least three for fault tolerance.
>
> Thanks,
> Shawn
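
For what it's worth, the arithmetic behind the ZooKeeper note quoted above can be sketched in a few lines of Python. This is only an illustration of majority quorum, assuming the ensemble needs a strict majority of its servers to stay up; the function names quorum_size and tolerated_failures are made up for this example and are not part of ZooKeeper or Solr.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Hypothetical ensemble sizes, chosen only to illustrate the pattern.
    for n in (1, 2, 3, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With two servers the quorum is two, so the number of tolerated failures is still zero, exactly as with one server. The difference is that there are now two machines whose failure takes the ensemble down, which is why two comes out worse than one and three is the smallest size that survives a single failure.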