Re: solrcloud Auto-commit doesn't seem reliable

Amrit Sarkar Fri, 23 Mar 2018 06:45:59 -0700

Elaine,

When you say commits are not working, do you mean the Solr logs are not
printing "commit" messages, or that documents are not appearing when you
search?
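
If it is the first case, one rough way to check is to count the commit lines
in solr.log. The sketch below is Python and only an illustration: it assumes
the update handler logs each commit as a line containing "start commit{...}"
with a softCommit flag, which should be verified against your own logs. The
function names are made up for this example.

```python
import re

# Assumption: each commit is logged as "... start commit{,optimize=false,
# openSearcher=false,...,softCommit=false,...}". Check your solr.log first.
COMMIT_RE = re.compile(r"start commit\{(?P<flags>[^}]*)\}")


def parse_commit_flags(line):
    """Return the commit flags found on a log line as a dict, or None."""
    match = COMMIT_RE.search(line)
    if match is None:
        return None
    flags = {}
    for pair in match.group("flags").split(","):
        # The flag list may start with a comma, so skip empty entries.
        if "=" in pair:
            key, value = pair.split("=", 1)
            flags[key.strip()] = value.strip()
    return flags


def count_commits(lines):
    """Count hard and soft commits in an iterable of log lines."""
    counts = {"hard": 0, "soft": 0}
    for line in lines:
        flags = parse_commit_flags(line)
        if flags is None:
            continue
        kind = "soft" if flags.get("softCommit") == "true" else "hard"
        counts[kind] += 1
    return counts
```

Running count_commits over the lines of solr.log from a period when indexing
was active shows whether hard commits fire roughly every autoCommit maxTime.
If they do and documents are still missing from search, the question becomes
one of visibility (openSearcher, soft commits) rather than of commits not
happening.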


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Thu, Mar 22, 2018 at 4:05 AM, Elaine Cario <etca...@gmail.com> wrote:

> I'm just catching up on reading solr emails, so forgive me for being late
> to this dance....
>
> I've just gone through a project to enable CDCR on our Solr, and I also
> experienced a small period of time where the commits on the source server
> just seemed to stop.  This was during a period of intense experimentation
> where I was mucking around with configurations, turning CDCR on/off, etc.
> At some point the commits stopped occurring, and it drove me nuts for a
> couple of days - tried everything - restarting Solr, reloading, turned
> buffering on, turned buffering off, etc.  I finally threw up my hands and
> rebooted the server out of desperation (it was a physical Linux box).
> Commits worked fine after that.  I don't know what caused the commits to
> stop, and why re-booting (and not just restarting Solr) caused them to work
> fine.
>
> Wondering if you ever found a solution to your situation?
>
>
>
> On Fri, Feb 16, 2018 at 2:44 PM, Webster Homer <webster.ho...@sial.com>
> wrote:
>
> > I meant to get back to this sooner.
> >
> > When I say I issued a commit I do issue it as
> > collection/update?commit=true
> >
> > The soft commit interval is set to 3000, but I don't have a problem with
> > soft commits ( I think). I was responding
> >
> > I am concerned that some hard commits don't seem to happen, but I think
> > many commits do occur. I'd like suggestions on how to diagnose this, and
> > perhaps an idea of where to look. Typically I believe that issues like
> > this are from our configuration.
> >
> > Our indexing job is pretty simple: we send blocks of JSON to
> > <collection>/update/json. We either re-index the whole collection or
> > just apply updates. Typically we reindex the data once a week and delete
> > any records that are older than the last full index. This does lead to a
> > fair number of deleted records in the index, especially if commits fail.
> > Most of our collections are not large, between 2 and 3 million records.
> >
> > The collections are hosted in google cloud
> >
> > On Mon, Feb 12, 2018 at 5:00 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > bq: But if 3 seconds is aggressive what would be a  good value for soft
> > > commit?
> > >
> > > The usual answer is "as long as you can stand". All top-level caches
> > > are invalidated, autowarming is done etc. on each soft commit. That can
> > > be a lot of work and if your users are comfortable with docs not showing
> > > up for, say, 10 minutes then use 10 minutes. As always "it depends"
> > > here, the point is not to do unnecessary work if possible.
> > >
> > > bq: If a commit doesn't happen how would there ever be an index merge
> > > that would remove the deleted documents.
> > >
> > > Right, it wouldn't. It's a little more subtle than that though.
> > > Segments on various replicas will contain different docs, thus the
> > > term/doc statistics can be a bit different between multiple replicas.
> > > None of the stats will change until the commit though. You might try
> > > turning on distributed doc/term stats though.
> > >
> > > Your comments about PULL or TLOG replicas are well taken. However, even
> > > those won't be absolutely in sync since they'll replicate from the
> > > master at slightly different times and _could_ get slightly different
> > > segments _if_ there's indexing going on. But let's say you stop
> > > indexing. After the next poll interval all the replicas will have
> > > identical characteristics and will score the docs the same.
> > >
> > > I don't have any significant wisdom to offer here, except this is really
> > > the first time I've heard of this behavior. About all I can imagine is
> > > that _somehow_ the soft commit interval is -1. When you say you "issue a
> > > commit" I'm assuming it's via ....collection/update?commit=true or some
> > > such which issues a hard commit with openSearcher=true. And it's on a
> > > _collection_ basis, right?
> > >
> > > Sorry I can't be more help
> > > Erick
> > >
> > >
> > >
> > >
> > > On Mon, Feb 12, 2018 at 10:44 AM, Webster Homer <webster.ho...@sial.com>
> > > wrote:
> > > > Erick, I am aware of the CDCR buffering problem causing tlog
> > > > retention, we always turn buffering off in our cdcr configurations.
> > > >
> > > > My post was precipitated by seeing that we had uncommitted data in
> > > > collections > 24 hours after it was loaded. The collections I was
> > > > looking at are in our development environment, where we do not use
> > > > CDCR. However I'm pretty sure that I've seen situations in production
> > > > where commits were also long overdue.
> > > >
> > > > the "autoSoftcommit" was a typo. The soft commit logic seems to be
> > > > fine, I don't see an issue with data visibility. But if 3 seconds is
> > > > aggressive what would be a good value for soft commit? We have a couple
> > > > of collections that are updated every minute although most of them are
> > > > updated much less frequently.
> > > >
> > > > My reason for raising this commit issue is that we see problems with
> > > > the relevancy of solrcloud searches, and the NRT replica type.
> > > > Sometimes the results flip where the best hit varies by what replica
> > > > serviced the search. This is hard to explain to management. Doing an
> > > > optimize does address the problem for a while. I try to avoid
> > > > optimizing for the reasons you and Sean list. If a commit doesn't
> > > > happen how would there ever be an index merge that would remove the
> > > > deleted documents.
> > > >
> > > > The problems with deletes and relevancy don't seem to occur when we
> > > > use TLOG replicas, probably because they don't do their own indexing
> > > > but get copies from their leader. We are testing them now; eventually
> > > > we may abandon the use of NRT replicas for most of our collections.
> > > >
> > > > I am quite concerned about this commit issue. What kinds of things
> > > > would influence whether a commit occurs? One commonality for our
> > > > systems is that they are hosted in a Google cloud. We have a number of
> > > > collections that share configurations, but others that do not. I think
> > > > commits do happen, but I don't trust that autoCommit is reliable. What
> > > > can we do to make it reliable?
> > > >
> > > > Most of our collections are reindexed weekly with partial updates
> > > > applied daily, that at least is what happens in production, our
> > > > development clouds are not as regular.
> > > >
> > > > Our solr startup script sets the following values:
> > > > -Dsolr.autoCommit.maxDocs=35000
> > > > -Dsolr.autoCommit.maxTime=60000
> > > > -Dsolr.autoSoftCommit.maxTime=3000
> > > >
> > > > I don't think we reference solr.autoCommit.maxDocs in our
> > > > solrconfig.xml files.
> > > >
> > > > here are our settings for autoCommit and autoSoftCommit
> > > >
> > > > We had a lot of issues with missing commits when we didn't set
> > > > solr.autoCommit.maxTime
> > > >      <autoCommit>
> > > >        <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> > > >        <openSearcher>false</openSearcher>
> > > >     </autoCommit>
> > > >
> > > >      <autoSoftCommit>
> > > >        <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
> > > >      </autoSoftCommit>
> > > >
> > > >
> > > >
> > > > On Fri, Feb 9, 2018 at 3:49 PM, Shawn Heisey <apa...@elyograg.org>
> > > > wrote:
> > > >
> > > >> On 2/9/2018 9:29 AM, Webster Homer wrote:
> > > >>
> > > >>> A little more background. Our production Solrclouds are populated
> > > >>> via CDCR. CDCR does not replicate commits; commits to the target
> > > >>> clouds happen via autoCommit settings.
> > > >>>
> > > >>> We see relevancy scores get inconsistent when there are too many
> > > >>> deletes, which seems to happen when hard commits don't happen.
> > > >>>
> > > >>> On Fri, Feb 9, 2018 at 10:25 AM, Webster Homer
> > > >>> <webster.ho...@sial.com> wrote:
> > > >>>
> > > >>>> We do have autoSoftcommit set to 3 seconds. It is NOT the
> > > >>>> visibility of the records that is my primary concern. What I am
> > > >>>> concerned about is the accumulation of uncommitted tlog files and
> > > >>>> the larger number of deleted documents.
> > > >>>>
> > > >>>
> > > >> For the deleted documents:  Have you ever done an optimize on the
> > > >> collection?  If so, you're going to need to re-do the optimize
> > > >> regularly to keep deleted documents from growing out of control.  See
> > > >> this issue for a very technical discussion about it:
> > > >>
> > > >> https://issues.apache.org/jira/browse/LUCENE-7976
> > > >>
> > > >> Deleted documents probably aren't really related to what we've been
> > > >> discussing.  That shouldn't really be strongly affected by commit
> > > >> settings.
> > > >>
> > > >> -----
> > > >>
> > > >> A 3 second autoSoftCommit is VERY aggressive.  If your soft commits
> > > >> are taking longer than 3 seconds to complete, which is often what
> > > >> happens, then that will lead to problems.  I wouldn't expect it to
> > > >> cause the kinds of problems you describe, though.  It would manifest
> > > >> as Solr working too hard, logging warnings or errors, and changes
> > > >> taking too long to show up.
> > > >>
> > > >> Assuming that the config for autoSoftCommit doesn't have the typo
> > > >> that Erick mentioned.
> > > >>
> > > >> ----
> > > >>
> > > >> I have never used CDCR, so I know very little about it.  But I have
> > > >> seen reports on this mailing list saying that transaction logs never
> > > >> get deleted when CDCR is configured.
> > > >>
> > > >> Below is a link to a mailing list discussion related to CDCR not
> > > >> deleting transaction logs.  Looks like for it to work right a buffer
> > > >> needs to be disabled, and there may also be problems caused by not
> > > >> having a complete zkHost string in the CDCR config:
> > > >>
> > > >> http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-td4345062.html
> > > >>
> > > >> Erick also mentioned this.
> > > >>
> > > >> Thanks,
> > > >> Shawn
> > > >>
> > > >
> > > > --
> > > >
> > > >
> > > > This message and any attachment are confidential and may be
> privileged
> > or
> > > > otherwise protected from disclosure. If you are not the intended
> > > recipient,
> > > > you must not copy this message or attachment or disclose the contents
> > to
> > > > any other person. If you have received this transmission in error,
> > please
> > > > notify the sender immediately and delete the message and any
> attachment
> > > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > > > subsidiaries do not accept liability for any omissions or errors in
> > this
> > > > message which may arise as a result of E-Mail-transmission or for
> > damages
> > > > resulting from any unauthorized changes of the content of this
> message
> > > and
> > > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > > > subsidiaries do not guarantee that this message is free of viruses
> and
> > > does
> > > > not accept liability for any damages caused by any virus transmitted
> > > > therewith.
> > > >
> > > > Click http://www.emdgroup.com/disclaimer to access the German,
> French,
> > > > Spanish and Portuguese versions of this disclaimer.
> > >
> >
> >
>
