[ https://issues.apache.org/jira/browse/SOLR-14262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040026#comment-17040026 ]
Michael edited comment on SOLR-14262 at 2/19/20 1:26 PM:
---------------------------------------------------------

We tested on Solr Cloud 7.7.1 and 8.4.1, but I am sure that all versions are affected in the same way.

Some possibly related issues:

Open:
https://issues.apache.org/jira/browse/SOLR-3888 - "need beter handling of external add/commit requests during tlog recovery"

Closed:
https://issues.apache.org/jira/browse/SOLR-12011
https://issues.apache.org/jira/browse/SOLR-9366

was (Author: michaelf):
Some possibly related issues:

Open:
https://issues.apache.org/jira/browse/SOLR-3888 - "need beter handling of external add/commit requests during tlog recovery"

Closed:
https://issues.apache.org/jira/browse/SOLR-12011
https://issues.apache.org/jira/browse/SOLR-9366

> local commit is (silently - no rf support) ignored during replay
> ----------------------------------------------------------------
>
>                 Key: SOLR-14262
>                 URL: https://issues.apache.org/jira/browse/SOLR-14262
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> Summarizing an issue discovered by Michael Frank and reported to the
> solr-user mailing list in this thread...
> [https://lists.apache.org/thread.html/%3ccaggv7soucsbhm4+cnhvvtrjxtzbbvpnaxsy-7vsksfpar_a...@mail.gmail.com%3E]
>
> Situation:
> * chaos testing of add+commit while randomly bringing nodes up/down
> * test client checks rf of every add
> ** commit does not support rf
> * after adding a doc (and confirming the expected rf) + committing, it's
> possible to issue a search that gets back a "stale" version of the doc
>
> Analysis by Michael...
> {quote}
> We traced the problem down to DistributedUpdateProcessor.doLocalCommit(),
> which *silently* drops all commits while the replica is inactive and
> replaying: it returns immediately and still reports status=0.
> ...
> The issue we have is the "silent" part. If, upon receiving a commit
> request, the replica
> * would either wait to become healthy and then commit and return,
> honoring waitSearcher=true (which is what we expected from reading the
> documentation)
> * or at least behave the same way as all other UpdateRequests and report
> back the achieved replication factor with the "rf" response parameter
>
> we could easily detect the degraded cluster state in the client and keep
> retrying the commit until "rf" matches the number of replicas.
> {quote}
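
For illustration only, the client-side retry described in the quote could look roughly like the sketch below. It assumes the second proposal were implemented, i.e. that a commit response carried an achieved "rf"; per the report above, it currently does not. `send_commit`, `commit_until_replicated`, and all parameters are hypothetical names, not part of Solr or SolrJ.

```python
import time
from typing import Callable, Optional


def commit_until_replicated(
    send_commit: Callable[[], Optional[int]],
    expected_rf: int,
    max_attempts: int = 5,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a commit until the reported rf reaches expected_rf.

    send_commit is a hypothetical callable that issues one commit and
    returns the achieved replication factor, or None if the response
    carried no "rf" value (which is the behaviour reported in this issue).
    """
    for attempt in range(1, max_attempts + 1):
        achieved = send_commit()
        # Treat a missing rf as "unknown", i.e. not yet good enough.
        if achieved is not None and achieved >= expected_rf:
            return True
        if attempt < max_attempts:
            sleep(delay_s)
    # Cluster still degraded after max_attempts; the caller decides what to do.
    return False
```

A test client would call this after each add whose own rf check passed, and only run the verifying search once it returns True.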