So this seems like it absolutely needs a JIRA....
On Thu, Nov 1, 2018 at 9:39 AM Kevin Risden <kris...@apache.org> wrote:
>
> I pushed 3 branches that modify test.sh to test 5.5, 6.6, and 7.5 locally
> without docker. I still see the same behavior where the latest updates
> aren't on the replicas. I still don't know what is happening but it happens
> without Docker :(
>
> https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
>
> Kevin Risden
>
>
> On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden <kris...@apache.org> wrote:
>
> > Erick - Yeah, that's a fair point. It would be interesting to see if this fails
> > without Docker.
> >
> > Kevin Risden
> >
> >
> > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Kevin:
> >>
> >> You're also using Docker, right? Docker is not "officially" supported
> >> although there's some movement in that direction and if this is only
> >> reproducible in Docker then it's a clue where to look....
> >>
> >> Erick
> >> On Wed, Oct 31, 2018 at 7:24 PM Kevin Risden <kris...@apache.org> wrote:
> >> >
> >> > I haven't dug into why this is happening but it definitely reproduces. I
> >> > removed the local requirements (port mapping and such) from the gist you
> >> > posted (very helpful). I confirmed this fails locally and on Travis CI.
> >> >
> >> > https://github.com/risdenk/test-solr-start-stop-replica-consistency
> >> >
> >> > I don't even see the first update getting applied from num 10 -> 20.
> >> > After the first update there is no more change.
> >> >
> >> > Kevin Risden
> >> >
> >> >
> >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith <jas2...@cornell.edu> wrote:
> >> >
> >> > > Thanks Erick, this is 7.5.0.
> >> > > ________________________________
> >> > > From: Erick Erickson <erickerick...@gmail.com>
> >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> >> > > To: solr-user
> >> > > Subject: Re: SolrCloud Replication Failure
> >> > >
> >> > > What version of Solr? This code was pretty much rewritten in 7.3, IIRC.
> >> > >
> >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith <jas2...@cornell.edu> wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > We are currently running a moderately large instance of standalone
> >> > > > Solr and are preparing to switch to SolrCloud to help us scale up. I
> >> > > > have been running a number of tests using Docker locally and ran into
> >> > > > an issue where replication is consistently failing. I have pared the
> >> > > > test case down as far as I could. Here's a link for the
> >> > > > docker-compose.yml (I put it in a directory called solrcloud_simple)
> >> > > > and a script to run the test:
> >> > > >
> >> > > >
> >> > > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> >> > > >
> >> > > >
> >> > > > Here's the basic idea behind the test:
> >> > > >
> >> > > >
> >> > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
> >> > > > replicas (each node gets a replica). Just use the default schema,
> >> > > > although I've also tried our schema and got the same result.
> >> > > >
> >> > > >
> >> > > > 2) Shut down solr-2
> >> > > >
> >> > > >
> >> > > > 3) Add 100 simple docs, just id and a field called num.
> >> > > >
> >> > > >
> >> > > > 4) Start solr-2 and check that it received the documents. It did!
> >> > > >
> >> > > >
> >> > > > 5) Update a document, commit, and check that solr-2 received the
> >> > > > update. It did!
> >> > > >
> >> > > >
> >> > > > 6) Stop solr-2, update the same document, start solr-2, and make sure
> >> > > > that it received the update. It did!
> >> > > >
> >> > > >
> >> > > > 7) Repeat step 6 with a new value. This time solr-2 reverts back to
> >> > > > what it had in step 5.
> >> > > >
> >> > > >
> >> > > > I believe the main issue comes from this in the logs:
> >> > > >
> >> > > >
> >> > > > solr-2_1 | 2018-10-31 17:04:26.135 INFO
> >> > > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> >> > > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test
> >> > > > s:shard1 r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync
> >> > > > PeerSync: core=test_shard1_replica_n2 url=http://solr-2:8082/solr
> >> > > > Our versions are newer. ourHighThreshold=1615861330901729280
> >> > > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
> >> > > > otherHighest=1615861335081353216
> >> > > >
> >> > > > PeerSync thinks the versions on solr-2 are newer for some reason, so
> >> > > > it doesn't try to sync from solr-1. In the final state, solr-2 will
> >> > > > always have a lower version for the updated doc than solr-1. I've
> >> > > > tried this with different commit strategies, both auto and manual,
> >> > > > and it doesn't seem to make any difference.
> >> > > >
> >> > > > Is this a bug with solr, an issue with using docker, or am I just
> >> > > > expecting too much from solr?
> >> > > >
> >> > > > Thanks for any insights you may have,
> >> > > >
> >> > > > Jeremy
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >>
> >
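
For anyone who wants to sanity-check the numbers in the PeerSync log line
quoted above, here is a minimal Python sketch. It is not Solr's PeerSync
logic, only arithmetic on the four values that were logged. The helper names
(parse_versions, other_is_ahead) are made up for illustration, and it assumes
that "other" in the log refers to the peer solr-2 was syncing against
(presumably solr-1).

```python
import re

# Log message copied from the quoted email, with the mail line wrapping removed.
LOG_LINE = (
    "o.a.s.u.PeerSync PeerSync: core=test_shard1_replica_n2 "
    "url=http://solr-2:8082/solr Our versions are newer. "
    "ourHighThreshold=1615861330901729280 "
    "otherLowThreshold=1615861314086764545 "
    "ourHighest=1615861330901729280 "
    "otherHighest=1615861335081353216"
)


def parse_versions(line):
    """Return every name=<integer> pair in the line as a dict of ints."""
    pairs = re.findall(r"(\w+)=(\d+)(?!\S)", line)
    return {name: int(value) for name, value in pairs}


def other_is_ahead(versions):
    """True if the peer's highest version is greater than our highest.

    Assumption: "other" is the peer this core was syncing against.
    """
    return versions["otherHighest"] > versions["ourHighest"]


if __name__ == "__main__":
    versions = parse_versions(LOG_LINE)
    for name, value in versions.items():
        print(f"{name:>18} = {value}")
    print("peer has a higher version than us:", other_is_ahead(versions))
```

With the values from the log, otherHighest is larger than ourHighest even
though the message says "Our versions are newer", which matches the stale
document seen on solr-2 in step 7.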