I haven't dug into why this is happening but it definitely reproduces. I
removed the local requirements (port mapping and such) from the gist you
posted (very helpful). I confirmed this fails locally and on Travis CI.

https://github.com/risdenk/test-solr-start-stop-replica-consistency

I don't even see the first update getting applied from num 10 -> 20. After
the first update there is no more change.

Kevin Risden


On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith <jas2...@cornell.edu> wrote:

> Thanks Erick, this is 7.5.0.
> ________________________________
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday, October 31, 2018 8:20:18 PM
> To: solr-user
> Subject: Re: SolrCloud Replication Failure
>
> What version of solr? This code was pretty much rewritten in 7.3 IIRC
>
> On Wed, Oct 31, 2018, 10:47 Jeremy Smith <jas2...@cornell.edu> wrote:
>
> > Hi all,
> >
> >      We are currently running a moderately large instance of standalone
> > solr and are preparing to switch to solr cloud to help us scale up.  I
> > have been running a number of tests using docker locally and ran into an
> > issue where replication is consistently failing.  I have pared down the
> > test case as minimally as I could.  Here's a link for the
> > docker-compose.yml (I put it in a directory called solrcloud_simple) and
> > a script to run the test:
> >
> >
> > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> >
> >
> > Here's the basic idea behind the test:
> >
> >
> > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
> > replicas (each node gets a replica).  Just use the default schema,
> > although I've also tried our schema and got the same result.
> >
> >
> > 2) Shut down solr-2
> >
> >
> > 3) Add 100 simple docs, just id and a field called num.
> >
> >
> > 4) Start solr-2 and check that it received the documents.  It did!
> >
> >
> > 5) Update a document, commit, and check that solr-2 received the update.
> > It did!
> >
> >
> > 6) Stop solr-2, update the same document, start solr-2, and make sure
> > that it received the update.  It did!
> >
> >
> > 7) Repeat step 6 with a new value.  This time solr-2 reverts back to what
> > it had in step 5.
> >
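
One way to check steps 4 through 7 per node is to query each core directly
with distrib=false, so the request is answered by that core alone instead of
being routed through the cluster. The following is only a minimal sketch. The
solr-2 base URL and core name are taken from the log line quoted further
down; the solr-1 base URL and core name in the usage note are assumptions,
and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(base_url, core, doc_id):
    """Build a select URL answered by a single core (distrib=false)."""
    params = urlencode({
        "q": f"id:{doc_id}",
        "fl": "id,num,_version_",
        "distrib": "false",  # do not fan the query out to other replicas
        "wt": "json",
    })
    return f"{base_url}/{core}/select?{params}"


def fetch_doc(base_url, core, doc_id):
    """Return the document as stored on one core, or None if it is missing."""
    with urlopen(replica_query_url(base_url, core, doc_id)) as resp:
        docs = json.load(resp)["response"]["docs"]
    return docs[0] if docs else None


def replicas_agree(docs):
    """True when every replica returned an identical document."""
    seen = {json.dumps(d, sort_keys=True) for d in docs}
    return len(seen) == 1
```

Usage would look roughly like
replicas_agree([fetch_doc("http://solr-1:8081/solr", "test_shard1_replica_n1", 1),
fetch_doc("http://solr-2:8082/solr", "test_shard1_replica_n2", 1)]), where the
solr-1 port and core name are guesses that need to match the compose file.
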
> >
> > I believe the main issue comes from this in the logs:
> >
> >
> > solr-2_1  | 2018-10-31 17:04:26.135 INFO
> > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1
> > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
> > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions
> > are newer. ourHighThreshold=1615861330901729280
> > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
> > otherHighest=1615861335081353216
> >
> > PeerSync thinks the versions on solr-2 are newer for some reason, so it
> > doesn't try to sync from solr-1.  In the final state, solr-2 will always
> > have a lower version for the updated doc than solr-1.  I've tried this
> > with different commit strategies, both auto and manual, and it doesn't
> > seem to make any difference.
> >
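
The numbers in that log line can be compared directly. The sketch below only
restates the four values from the log and checks how they relate; it does not
reproduce the actual PeerSync decision logic, which I have not traced. The
timestamp decoding assumes that a _version_ is the wall-clock time in
milliseconds shifted left by 20 bits plus a counter. That is my understanding
of how Solr generates versions, so treat it as an assumption.

```python
from datetime import datetime, timezone

# Values copied from the PeerSync log line quoted above.
OUR_HIGH_THRESHOLD = 1615861330901729280
OTHER_LOW_THRESHOLD = 1615861314086764545
OUR_HIGHEST = 1615861330901729280
OTHER_HIGHEST = 1615861335081353216


def version_to_utc(version):
    """Decode a version into a UTC time.

    Assumption: version = (epoch millis << 20) + counter.
    """
    millis = version >> 20
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def other_peer_is_ahead(our_highest, other_highest):
    """True when the other peer holds a newer version than we do."""
    return other_highest > our_highest


if __name__ == "__main__":
    print("ourHighThreshold > otherLowThreshold:",
          OUR_HIGH_THRESHOLD > OTHER_LOW_THRESHOLD)
    print("other peer is ahead:",
          other_peer_is_ahead(OUR_HIGHEST, OTHER_HIGHEST))
    print("ourHighest   ~", version_to_utc(OUR_HIGHEST))
    print("otherHighest ~", version_to_utc(OTHER_HIGHEST))
```

So although the message says "Our versions are newer", otherHighest is larger
than ourHighest in the same line, which matches the observation that solr-2
ends up with a lower version for the updated doc than solr-1.
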
> > Is this a bug with solr, an issue with using docker, or am I just
> > expecting too much from solr?
> >
> > Thanks for any insights you may have,
> >
> > Jeremy
> >
> >
> >
>
