I wouldn't rely on the "current" flag in the admin UI as an indicator. As long as your numDocs and the like match, I'd say it's a UI issue.
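Comparing "numDocs and the like" per replica means querying each replica directly rather than letting SolrCloud fan the request out. A minimal sketch of building such queries (the hosts and core names below are hypothetical; substitute your own nodes):

```python
from urllib.parse import urlencode

def replica_query_url(base, core, params=None):
    """Build a query URL that targets one replica only (distrib=false)."""
    q = {"q": "*:*", "rows": 0, "wt": "json", "distrib": "false"}
    q.update(params or {})
    return f"{base}/solr/{core}/select?{urlencode(q)}"

# Hypothetical replica locations -- substitute your own nodes/cores.
replicas = [
    ("http://solr1:8983", "bb-catalog-material_shard1_replica1"),
    ("http://solr2:8983", "bb-catalog-material_shard1_replica2"),
]

urls = [replica_query_url(base, core) for base, core in replicas]

# Fetch each URL (e.g. with urllib.request) and compare
# response["response"]["numFound"] across the replicas; a mismatch means
# the replicas really are out of sync, not just a stale UI flag.
for u in urls:
    print(u)
```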
Best,
Erick

On Wed, May 24, 2017 at 8:15 AM, Webster Homer <webster.ho...@sial.com> wrote:
> We see data in the target clusters. CDCR replication is working. We first
> noticed the current=false flag on the target replicas, but since I started
> looking I see it on the source too.
>
> I have removed the IgnoreCommitOptimizeUpdateProcessorFactory from our
> update processor chain, and I did two data loads to different collections.
> These collections are part of our development system; they are not
> configured to use CDCR and are loaded directly by our data load. The ETL
> to our Solr instances uses the /update/json request handler and does not
> send commits. These collections mirror our production collections and have
> 2 shards with 2 replicas each. I see the situation where the replicas are
> marked current=false, which should not happen if autoCommit were working
> correctly. The last load was yesterday at 5pm, and I didn't check until
> this morning, when I found bb-catalog-material_shard1_replica1 (the
> leader) was not current, but the other replica was. The last modified date
> on the leader was 2017-05-23T22:44:54.618Z.
>
> My modified autoCommit:
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
> </autoSoftCommit>
>
> The last indexed record from a search matches up with the above time. For
> this test, the numDocs are the same between the two replicas. I think the
> soft commit is working. Why wouldn't both replicas be current after so
> many hours?
> We are using Solr 6.2, FYI. I expect to upgrade to Solr 6.6 when it
> becomes available.
>
> Thanks,
> Webster
>
> On Tue, May 23, 2017 at 12:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> This is all quite strange. Optimize (BTW, it's rarely
>> necessary/desirable on an index that changes, despite its name)
>> shouldn't matter here.
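Since neither system property is set, the `${...}` placeholders in the autoCommit settings above fall back to their defaults (600000 ms hard, 60000 ms soft). A minimal re-implementation of that placeholder lookup, for illustration only (this is not Solr's actual code):

```python
import re

def resolve(placeholder, props):
    """Resolve a ${name:default}-style placeholder the way solrconfig.xml
    values behave: use the system property if set, else the default."""
    m = re.fullmatch(r"\$\{([^:}]+)(?::([^}]*))?\}", placeholder)
    name, default = m.group(1), m.group(2)
    return props.get(name, default)

# No properties set, so the defaults in the config apply:
props = {}
hard = int(resolve("${solr.autoCommit.maxTime:600000}", props))     # 10 minutes
soft = int(resolve("${solr.autoSoftCommit.maxTime:60000}", props))  # 1 minute

# A property passed at startup (e.g. -Dsolr.autoCommit.maxTime=300000) wins:
override = resolve("${solr.autoCommit.maxTime:600000}",
                   {"solr.autoCommit.maxTime": "300000"})
```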
>> CDCR forwards the raw documents to the target cluster.
>>
>> Ample time indeed. With a soft commit of 15 seconds, that's your window
>> (with some slop for how long CDCR takes).
>>
>> If you do a search and sort by your timestamp descending, what do you
>> see on the target cluster? And when you are indexing and CDCR is
>> running, your target cluster's Solr logs should show updates coming in.
>> Mostly checking whether the data is even getting to the target cluster
>> here.
>>
>> Also check the tlogs on the source cluster. By "check" here I just mean
>> "are they a reasonable size", and "reasonable" should be very small. The
>> tlogs are the "queue" that CDCR uses to store docs before forwarding
>> them to the target cluster, so this is just a sanity check. If they're
>> huge, then CDCR is not forwarding anything to the target cluster.
>>
>> It's also vaguely possible that
>> IgnoreCommitOptimizeUpdateProcessorFactory is interfering; if so, it's
>> a bug and should be reported as a JIRA. If you remove that on the
>> target cluster, does the behavior change?
>>
>> I'm mystified here, as you can tell.
>>
>> Best,
>> Erick
>>
>> On Tue, May 23, 2017 at 10:12 AM, Webster Homer <webster.ho...@sial.com> wrote:
>> > We see a pretty consistent issue where the replicas show in the admin
>> > console as not current, indicating that our auto commit isn't
>> > committing. In one case we loaded the data to the source, CDCR
>> > replicated it to the targets, and we see both the source and the
>> > target as having current=false. It is searchable, so the soft commits
>> > are happening. We turned off data loading to investigate this issue,
>> > and the replicas are still not current after 3 days, so there should
>> > have been ample time to catch up.
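The tlog sanity check above is easy to script: sum the transaction-log file sizes for a core and flag a total that is clearly not "very small". A sketch, assuming the usual layout where tlogs live under the core's data/tlog directory (the path shown is illustrative):

```python
import os

def tlog_bytes(tlog_dir):
    """Total size of transaction-log files in a core's tlog directory.
    With CDCR healthy this should stay small; a huge total suggests the
    source is queueing docs it cannot forward to the target cluster."""
    total = 0
    for entry in os.scandir(tlog_dir):
        if entry.is_file():
            total += entry.stat().st_size
    return total

# Illustrative path -- adjust to your install layout:
# size = tlog_bytes("/var/solr/data/bb-catalog-material_shard1_replica1/data/tlog")
# print(f"tlog total: {size / 1024 / 1024:.1f} MB")
```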
>> > This is our autoCommit:
>> > <autoCommit>
>> >   <maxDocs>25000</maxDocs>
>> >   <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
>> >   <openSearcher>false</openSearcher>
>> > </autoCommit>
>> >
>> > This is our autoSoftCommit:
>> > <autoSoftCommit>
>> >   <maxTime>${solr.autoSoftCommit.maxTime:15000}</maxTime>
>> > </autoSoftCommit>
>> >
>> > Neither property, solr.autoCommit.maxTime nor
>> > solr.autoSoftCommit.maxTime, is set.
>> >
>> > We also have an updateChain that calls
>> > solr.IgnoreCommitOptimizeUpdateProcessorFactory to ignore client
>> > commits. Could that be the cause of our problem?
>> > <updateRequestProcessorChain name="cleanup">
>> >   <!-- Ignore commits from clients, telling them all's OK -->
>> >   <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
>> >     <int name="statusCode">200</int>
>> >   </processor>
>> >
>> >   <processor class="TrimFieldUpdateProcessorFactory" />
>> >   <processor class="RemoveBlankFieldUpdateProcessorFactory" />
>> >
>> >   <processor class="solr.LogUpdateProcessorFactory" />
>> >   <processor class="solr.RunUpdateProcessorFactory" />
>> > </updateRequestProcessorChain>
>> >
>> > We did add a date field to all our collections that defaults to NOW,
>> > so I can see that no new data was added, but the replicas don't seem
>> > to get the commit. I assume this is something in our configuration
>> > (see above).
>> >
>> > Is there a way to determine when the last commit occurred?
>> >
>> > I believe the one replica got out of sync due to an admin running an
>> > optimize while CDCR was still running.
>> > That was one collection, but it looks like we are missing commits on
>> > most of our collections.
>> >
>> > Any help would be greatly appreciated!
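On "when did the last commit occur": one way to approximate this is the Luke request handler, which reports per-core index details including a last-modified time for the index. A sketch (the host and core name are hypothetical, and the response shape below is abbreviated; verify the exact fields against your Solr version's output):

```python
import json

# Per-replica Luke request -- host and core name are illustrative:
url = ("http://solr1:8983/solr/bb-catalog-material_shard1_replica1"
       "/admin/luke?show=index&wt=json")

# Abbreviated example of the JSON shape such a request returns; the
# "lastModified" value reflects the last time a (hard) commit updated
# the index on disk:
sample = json.loads("""
{
  "index": {
    "numDocs": 123456,
    "version": 42,
    "lastModified": "2017-05-23T22:44:54.618Z"
  }
}
""")
last_commit = sample["index"]["lastModified"]
```

Comparing that value across the two replicas of a shard would show directly whether one of them is missing commits.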
>> >
>> > Thanks,
>> > Webster Homer
>> >
>> > On Mon, May 22, 2017 at 4:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> >> You can ping individual replicas by addressing a specific replica
>> >> and setting distrib=false, something like:
>> >>
>> >> http://SOLR_NODE:port/solr/collection1_shard1_replica1/query?distrib=false&q=......
>> >>
>> >> But one thing to check first is that you've committed. I'd:
>> >> 1> turn off indexing on the source cluster.
>> >> 2> wait until CDCR has caught up (if necessary).
>> >> 3> issue a hard commit on the target.
>> >> 4> _then_ see if the counts are what is expected.
>> >>
>> >> Because autocommit can fire at different wall-clock times even for
>> >> replicas of the same shard, this makes it easier to tell whether an
>> >> inconsistency is just transient. The other thing I've seen people do
>> >> is set a timestamp on the docs to NOW (there's an update processor
>> >> that can do this). Then when you check for consistency you can use
>> >> fq=timestamp:[* TO NOW-(some interval significantly longer than your
>> >> autocommit interval)].
>> >>
>> >> bq: Is there a way to recover when a shard has inconsistent replicas?
>> >> If I use the delete replica API call to delete one of them and then
>> >> use add replica to create it from scratch, will it auto-populate from
>> >> the other replica in the shard?
>> >>
>> >> Yes. Whenever you ADDREPLICA, the new replica catches itself up from
>> >> the leader before becoming active. It has to copy the _entire_ index
>> >> from the leader, so you'll see network traffic spike.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, May 22, 2017 at 1:41 PM, Webster Homer <webster.ho...@sial.com> wrote:
>> >> > I have a SolrCloud collection with 2 shards and 4 replicas. The
>> >> > replicas for shard 1 have different numbers of records, so
>> >> > different queries will return different numbers of records.
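The timestamp-window check and the replica rebuild above can both be sketched as URLs. The `timestamp` field name, collection, shard, and replica names are all hypothetical here; DELETEREPLICA takes the replica's core_node name from the cluster state:

```python
from urllib.parse import urlencode

def consistency_params(interval="1HOUR"):
    """Query params to count only docs older than the autocommit window
    on a single replica. Excluding recently indexed docs (NOW-<interval>)
    avoids flagging differences caused merely by commits firing at
    different wall-clock times on different replicas."""
    return urlencode({
        "q": "*:*",
        "fq": f"timestamp:[* TO NOW-{interval}]",  # 'timestamp' field assumed
        "rows": 0,
        "distrib": "false",
    })

params = consistency_params()

# Rebuilding an inconsistent replica via the Collections API (names
# illustrative). The new replica copies the full index from the shard
# leader before it becomes active:
base = "http://solr1:8983/solr/admin/collections"
delete_url = (f"{base}?action=DELETEREPLICA"
              "&collection=mycoll&shard=shard1&replica=core_node3")
add_url = f"{base}?action=ADDREPLICA&collection=mycoll&shard=shard1"
```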
>> >> >
>> >> > I am not certain how this occurred; it happened in a collection
>> >> > that was a CDCR target.
>> >> >
>> >> > Is there a way to limit a search to a specific replica of a shard?
>> >> > We want to understand the differences.
>> >> >
>> >> > Is there a way to recover when a shard has inconsistent replicas?
>> >> > If I use the delete replica API call to delete one of them and then
>> >> > use add replica to create it from scratch, will it auto-populate
>> >> > from the other replica in the shard?
>> >> >
>> >> > Thanks,
>> >> > Webster
>> >> >
>> >> > --
>> >> >
>> >> > This message and any attachment are confidential and may be
>> >> > privileged or otherwise protected from disclosure. If you are not
>> >> > the intended recipient, you must not copy this message or
>> >> > attachment or disclose the contents to any other person. If you
>> >> > have received this transmission in error, please notify the sender
>> >> > immediately and delete the message and any attachment from your
>> >> > system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries
>> >> > do not accept liability for any omissions or errors in this message
>> >> > which may arise as a result of E-Mail-transmission or for damages
>> >> > resulting from any unauthorized changes of the content of this
>> >> > message and any attachment thereto. Merck KGaA, Darmstadt, Germany
>> >> > and any of its subsidiaries do not guarantee that this message is
>> >> > free of viruses and does not accept liability for any damages
>> >> > caused by any virus transmitted therewith.
>> >> >
>> >> > Click http://www.emdgroup.com/disclaimer to access the German,
>> >> > French, Spanish and Portuguese versions of this disclaimer.