So I just added PRs for Solr 5.5, 6.6, 7.1, 7.2, 7.3, 7.4, and 7.5. They all show exactly the same behavior. I don't have much more insight here, but it doesn't seem correct.
Kevin Risden

On Thu, Nov 1, 2018 at 9:45 AM Kevin Risden <kris...@apache.org> wrote:

> Ahhh, your PR triggered an idea. I'll open a few PRs adjusting the Solr
> version from latest back to earlier 7.x versions and see which version
> the problem was introduced in.
>
> Kevin Risden
>
>
> On Thu, Nov 1, 2018 at 9:17 AM Jeremy Smith <jas2...@cornell.edu> wrote:
>
>> Thanks so much for looking into this and cleaning up my code.
>>
>> I added a pull request to show some additional strange behavior. If we
>> restart solr-1, making solr-2 the leader, the out-of-date value of [10]
>> gets propagated back to solr-1. Perhaps this will give a hint as to what
>> is going on.
>>
>> ________________________________
>> From: Kevin Risden <kris...@apache.org>
>> Sent: Wednesday, October 31, 2018 10:24:24 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud Replication Failure
>>
>> I haven't dug into why this is happening, but it definitely reproduces. I
>> removed the local requirements (port mapping and such) from the gist you
>> posted (very helpful). I confirmed this fails locally and on Travis CI.
>>
>> https://github.com/risdenk/test-solr-start-stop-replica-consistency
>>
>> I don't even see the first update getting applied from num 10 -> 20.
>> After the first update there is no more change.
>>
>> Kevin Risden
>>
>>
>> On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith <jas2...@cornell.edu> wrote:
>>
>> > Thanks Erick, this is 7.5.0.
>> > ________________________________
>> > From: Erick Erickson <erickerick...@gmail.com>
>> > Sent: Wednesday, October 31, 2018 8:20:18 PM
>> > To: solr-user
>> > Subject: Re: SolrCloud Replication Failure
>> >
>> > What version of Solr?
>> > This code was pretty much rewritten in 7.3, IIRC.
>> >
>> > On Wed, Oct 31, 2018, 10:47 Jeremy Smith <jas2...@cornell.edu> wrote:
>> >
>> > > Hi all,
>> > >
>> > > We are currently running a moderately large instance of standalone
>> > > Solr and are preparing to switch to SolrCloud to help us scale up. I
>> > > have been running a number of tests locally using Docker and ran into
>> > > an issue where replication consistently fails. I have pared the test
>> > > case down as far as I could. Here's a link to the docker-compose.yml
>> > > (I put it in a directory called solrcloud_simple) and a script to run
>> > > the test:
>> > >
>> > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
>> > >
>> > > Here's the basic idea behind the test:
>> > >
>> > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
>> > > replicas (each node gets a replica). Use the default schema, although
>> > > I've also tried our own schema and got the same result.
>> > >
>> > > 2) Shut down solr-2.
>> > >
>> > > 3) Add 100 simple docs, with just an id and a field called num.
>> > >
>> > > 4) Start solr-2 and check that it received the documents. It did!
>> > >
>> > > 5) Update a document, commit, and check that solr-2 received the
>> > > update. It did!
>> > >
>> > > 6) Stop solr-2, update the same document, start solr-2, and make sure
>> > > that it received the update. It did!
>> > >
>> > > 7) Repeat step 6 with a new value. This time solr-2 reverts to what it
>> > > had in step 5.
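The seven steps can be sketched as a standard-library Python script for rerunning the scenario. This is a hypothetical reconstruction and not the script from the gist. The following are all assumptions:

- solr-1's host port (8081);
- the docker-compose service names;
- the initial `num` value of 10;
- the fixed recovery wait.

Only port 8082 and the core name `test_shard1_replica_n2` come from the PeerSync log in this thread. Run it from the directory containing docker-compose.yml. With the default (schemaless) configset, `num` is likely stored as a multivalued field, which would explain why values appear as lists such as `[10]`.

```python
import json
import subprocess
import time
import urllib.parse
import urllib.request

# All of these are assumptions; adjust them to your docker-compose.yml.
SOLR1_URL = "http://localhost:8081/solr"   # hypothetical host port for solr-1
SOLR2_URL = "http://localhost:8082/solr"   # solr-2 listens on 8082 in the log
COLLECTION = "test"
SOLR2_CORE = "test_shard1_replica_n2"      # core name seen in the PeerSync log
RECOVERY_WAIT_S = 30                       # hypothetical fixed wait for recovery


def create_collection_url(base: str) -> str:
    """Step 1: Collections API CREATE with 1 shard and 2 replicas."""
    params = urllib.parse.urlencode({"action": "CREATE", "name": COLLECTION,
                                     "numShards": 1, "replicationFactor": 2})
    return f"{base}/admin/collections?{params}"


def make_docs(n: int, num: int = 10) -> list:
    """Step 3: n simple docs with just an id and a "num" field."""
    return [{"id": str(i), "num": num} for i in range(n)]


def http(url: str, docs=None) -> bytes:
    """GET url, or POST docs as JSON when docs is given."""
    data = None if docs is None else json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def update(docs: list) -> None:
    """Send docs through solr-1 and commit explicitly."""
    http(f"{SOLR1_URL}/{COLLECTION}/update?commit=true", docs)


def num_on_solr2(doc_id: str):
    """Read one doc from solr-2's local core only (distrib=false)."""
    params = urllib.parse.urlencode(
        {"q": f"id:{doc_id}", "distrib": "false", "wt": "json"})
    body = json.loads(http(f"{SOLR2_URL}/{SOLR2_CORE}/select?{params}"))
    docs = body["response"]["docs"]
    return docs[0].get("num") if docs else None


def compose(action: str, service: str = "solr-2") -> None:
    """Stop or start a docker-compose service (service name assumed)."""
    subprocess.run(["docker-compose", action, service], check=True)


def main() -> None:
    http(create_collection_url(SOLR1_URL))        # step 1
    compose("stop")                                # step 2
    update(make_docs(100))                         # step 3
    compose("start")                               # step 4
    time.sleep(RECOVERY_WAIT_S)
    print("step 4:", num_on_solr2("0"), "expected [10]")
    # Steps 5-7: (step number, new value, restart solr-2 around the update?)
    for step, value, restart in [(5, 20, False), (6, 30, True), (7, 40, True)]:
        if restart:
            compose("stop")
        update([{"id": "0", "num": value}])
        if restart:
            compose("start")
            time.sleep(RECOVERY_WAIT_S)
        print(f"step {step}:", num_on_solr2("0"), f"expected [{value}]")


if __name__ == "__main__":
    main()
```

The fixed sleep keeps the sketch short. Polling the replica state until solr-2 reports active would be more robust.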
>> > >
>> > > I believe the main issue comes from this in the logs:
>> > >
>> > > solr-2_1 | 2018-10-31 17:04:26.135 INFO
>> > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
>> > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1
>> > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
>> > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr Our versions are
>> > > newer. ourHighThreshold=1615861330901729280
>> > > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
>> > > otherHighest=1615861335081353216
>> > >
>> > > PeerSync thinks the versions on solr-2 are newer for some reason, so it
>> > > doesn't try to sync from solr-1. In the final state, solr-2 always has
>> > > a lower version for the updated doc than solr-1. I've tried this with
>> > > different commit strategies, both auto and manual, and it doesn't seem
>> > > to make any difference.
>> > >
>> > > Is this a bug in Solr, an issue with using Docker, or am I just
>> > > expecting too much from Solr?
>> > >
>> > > Thanks for any insights you may have,
>> > >
>> > > Jeremy
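The four numbers in the PeerSync log line can be checked directly. The sketch below assumes that a Solr `_version_` is the epoch time in milliseconds shifted left by 20 bits, with a small tie-breaking counter in the low bits. That is a reading of Solr's version clock and is not stated anywhere in this thread. Under that assumption, the script decodes each value to a UTC timestamp and compares solr-2's ("our") highest version with solr-1's ("other") highest.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: a Solr _version_ is (epoch millis << 20) plus a counter in the
# low 20 bits that only breaks ties between updates in the same millisecond.
COUNTER_BITS = 20
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Values copied verbatim from the PeerSync log line on solr-2.
LOG = {
    "ourHighThreshold": 1615861330901729280,
    "otherLowThreshold": 1615861314086764545,
    "ourHighest": 1615861330901729280,
    "otherHighest": 1615861335081353216,
}


def decode(version: int) -> Tuple[int, int]:
    """Split a version into (epoch milliseconds, low-bit counter)."""
    return version >> COUNTER_BITS, version & ((1 << COUNTER_BITS) - 1)


def to_utc(version: int) -> datetime:
    """Wall-clock time at which the version was assigned, in UTC."""
    millis, _ = decode(version)
    return EPOCH + timedelta(milliseconds=millis)


if __name__ == "__main__":
    for name, version in LOG.items():
        _, counter = decode(version)
        print(f"{name:<18} {to_utc(version).isoformat()} counter={counter}")
    lag_ms = decode(LOG["otherHighest"])[0] - decode(LOG["ourHighest"])[0]
    print(f"solr-2's highest version is {lag_ms} ms older than solr-1's")
    print("windows overlap:", LOG["ourHighThreshold"] >= LOG["otherLowThreshold"])
```

On the logged values, ourHighest decodes to 2018-10-31T17:04:10.155000+00:00 and otherHighest to 2018-10-31T17:04:14.141000+00:00. That is a 3986 ms gap in solr-1's favor. Both timestamps fall shortly before the 17:04:26 log entry, which lends some support to the bit-layout assumption. By plain numeric comparison, solr-1 holds the newer update, which matches Jeremy's observation that solr-2 ends up with the lower version despite the "Our versions are newer" message.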