So I just added PRs for Solr 5.5, 6.6, 7.1, 7.2, 7.3, 7.4, and 7.5. They all
show exactly the same behavior. I don't have much more insight yet, but the
behavior doesn't seem to be correct.

Kevin Risden


On Thu, Nov 1, 2018 at 9:45 AM Kevin Risden <kris...@apache.org> wrote:

> Ahhh your PR triggered an idea. I'll open a few PRs adjusting the Solr
> version from latest back to earlier 7.x versions to see which version the
> problem was introduced in.
>
> Kevin Risden
>
>
> On Thu, Nov 1, 2018 at 9:17 AM Jeremy Smith <jas2...@cornell.edu> wrote:
>
>> Thanks so much for looking into this and cleaning up my code.
>>
>>
>> I added a pull request to show some additional strange behavior.  If we
>> restart solr-1, making solr-2 the leader, the out of date value of [10]
>> gets propagated back to solr-1.  Perhaps this will give a hint as to what
>> is going on.
>>
>> ________________________________
>> From: Kevin Risden <kris...@apache.org>
>> Sent: Wednesday, October 31, 2018 10:24:24 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud Replication Failure
>>
>> I haven't dug into why this is happening but it definitely reproduces. I
>> removed the local requirements (port mapping and such) from the gist you
>> posted (very helpful). I confirmed this fails locally and on Travis CI.
>>
>> https://github.com/risdenk/test-solr-start-stop-replica-consistency
>>
>> I don't even see the first update getting applied from num 10 -> 20. After
>> the first update there is no more change.
>>
>> Kevin Risden
>>
>>
>> On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith <jas2...@cornell.edu> wrote:
>>
>> > Thanks Erick, this is 7.5.0.
>> > ________________________________
>> > From: Erick Erickson <erickerick...@gmail.com>
>> > Sent: Wednesday, October 31, 2018 8:20:18 PM
>> > To: solr-user
>> > Subject: Re: SolrCloud Replication Failure
>> >
>> > What version of solr? This code was pretty much rewritten in 7.3 IIRC.
>> >
>> > On Wed, Oct 31, 2018, 10:47 Jeremy Smith <jas2...@cornell.edu> wrote:
>> >
>> > > Hi all,
>> > >
>> > >      We are currently running a moderately large instance of standalone
>> > > solr and are preparing to switch to solr cloud to help us scale up.  I have
>> > > been running a number of tests using docker locally and ran into an issue
>> > > where replication is consistently failing.  I have pared the test case
>> > > down as far as I could.  Here's a link for the docker-compose.yml (I put
>> > > it in a directory called solrcloud_simple) and a script to run the test:
>> > >
>> > >
>> > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
>> > >
>> > >
>> > > Here's the basic idea behind the test:
>> > >
>> > >
>> > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
>> > > replicas (each node gets a replica).  Just use the default schema, although
>> > > I've also tried our schema and got the same result.
>> > >
>> > >
>> > > 2) Shut down solr-2
>> > >
>> > >
>> > > 3) Add 100 simple docs, just id and a field called num.
>> > >
>> > >
>> > > 4) Start solr-2 and check that it received the documents.  It did!
>> > >
>> > >
>> > > 5) Update a document, commit, and check that solr-2 received the update.
>> > > It did!
>> > >
>> > >
>> > > 6) Stop solr-2, update the same document, start solr-2, and make sure that
>> > > it received the update.  It did!
>> > >
>> > >
>> > > 7) Repeat step 6 with a new value.  This time solr-2 reverts back to what
>> > > it had in step 5.
>> > >
>> > >
>> > > I believe the main issue comes from this in the logs:
>> > >
>> > >
>> > > solr-2_1  | 2018-10-31 17:04:26.135 INFO
>> > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
>> > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1
>> > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
>> > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions are
>> > > newer. ourHighThreshold=1615861330901729280
>> > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
>> > > otherHighest=1615861335081353216
>> > >
>> > > PeerSync thinks the versions on solr-2 are newer for some reason, so it
>> > > doesn't try to sync from solr-1.  In the final state, solr-2 will always
>> > > have a lower version for the updated doc than solr-1.  I've tried this with
>> > > different commit strategies, both auto and manual, and it doesn't seem to
>> > > make any difference.
>> > >
>> > > Is this a bug with solr, an issue with using docker, or am I just
>> > > expecting too much from solr?
>> > >
>> > > Thanks for any insights you may have,
>> > >
>> > > Jeremy
>> > >
>> > >
>> > >
>> >
>>
>
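For anyone reading the PeerSync log line quoted above, here is a minimal
sketch that compares the two "highest" versions it reports. It assumes that
Solr's _version_ values carry a millisecond clock in the bits above the
lowest 20 (that bit layout, and all the names below, are assumptions for
illustration, not something stated in this thread), and it only decodes the
numbers already present in the log.

```python
from datetime import datetime, timezone

# Values copied verbatim from the PeerSync log line quoted in the thread.
OUR_HIGHEST = 1615861330901729280    # highest version on solr-2 (the node logging)
OTHER_HIGHEST = 1615861335081353216  # highest version reported by the peer

# Assumption: the low 20 bits are a counter and the remaining high bits are
# epoch milliseconds. This is a hypothetical constant for this sketch.
CLOCK_SHIFT = 20


def version_to_millis(version: int) -> int:
    """Drop the assumed counter bits, leaving epoch milliseconds."""
    return version >> CLOCK_SHIFT


def version_to_utc(version: int) -> datetime:
    """Convert a version to a UTC timestamp under the same assumption."""
    return datetime.fromtimestamp(version_to_millis(version) / 1000, tz=timezone.utc)


# How far the peer's newest update is ahead of solr-2's newest update.
gap_ms = version_to_millis(OTHER_HIGHEST) - version_to_millis(OUR_HIGHEST)

if __name__ == "__main__":
    print("ourHighest   ~", version_to_utc(OUR_HIGHEST).isoformat())
    print("otherHighest ~", version_to_utc(OTHER_HIGHEST).isoformat())
    print("peer is ahead by", gap_ms, "ms")
```

Under that assumption the peer's highest version is a few seconds newer than
solr-2's own, which matches the observation that solr-2 ends up with the
lower version even though the log says "Our versions are newer".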
