Re: Indexing hangs when more than 1 server in a cluster

Kevin Osborn Wed, 14 Aug 2013 09:52:50 -0700

Thanks so much for your help and for the explanations. Eventually, we will
be doing several batches in parallel. But at least now I know where to look
and can do some testing on various scenarios.


Since we may be doing a lot of heavy uploading (while still doing a lot of
queries), having an autoCommit interval shorter than the autoSoftCommit
interval does sound interesting and I will test it out. I would then just
disable softCommit on my batch uploads.
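
A minimal sketch of the batch-upload side of that plan, written against
Solr's plain HTTP /update handler rather than SolrJ. The host, port, and
helper names (SOLR_BASE, chunked, update_url, plan_uploads) are assumptions
made for illustration; only Collection3 and the chunk size of 1000 come from
this thread. Intermediate chunks request no commit and leave durability to
the server-side autoCommit; only the last chunk asks for an explicit hard
commit. The sketch builds the requests but does not send them.

```python
from itertools import islice
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host and port
COLLECTION = "Collection3"                # collection name used in this thread


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def update_url(final_chunk):
    """Build the /update URL for one chunk.

    Intermediate chunks request neither a hard nor a soft commit.
    The final chunk requests an explicit hard commit, which also
    opens a new searcher so the batch becomes visible.
    """
    if final_chunk:
        params = {"commit": "true"}
    else:
        params = {"commit": "false", "softCommit": "false"}
    return f"{SOLR_BASE}/{COLLECTION}/update?{urlencode(params)}"


def plan_uploads(header, rows, chunk_size=1000):
    """Return a list of (url, csv_body) pairs, one per chunk.

    `rows` are already-formatted CSV lines; the header line is
    repeated at the top of every chunk.
    """
    chunks = list(chunked(rows, chunk_size))
    plan = []
    for i, chunk in enumerate(chunks):
        body = "\n".join([header] + chunk) + "\n"
        plan.append((update_url(i == len(chunks) - 1), body))
    return plan
```

Each pair would be POSTed with a CSV content type. Waiting until the end of
the batch for the visible commit only fits if queries can tolerate not
seeing the new documents until the batch is done.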

Either way, I at least know where to focus my efforts.

-Kevin



On Wed, Aug 14, 2013 at 6:27 AM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Kevin,
>
> I wouldn't have considered using softCommits at all based on what I
> understand from your use case.  You appear to be loading in large batches,
> and softCommits are better aligned to NRT search where there is a steady
> stream of smaller updates that need to be available immediately.
>
> As Erick pointed out, soft commits are all about avoiding constant
> reopening of the index searcher…where by constant we mean every few
> seconds.  Provided you can wait until your batch is completed, and that
> frequency is roughly a minute or more, you likely will find an
> old-fashioned hard commit (with openSearcher="true") will work just fine
> (YMMV).
>
> Jason
>
>
>
> On Aug 14, 2013, at 4:51 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > right, SOLR-5081 is possible but somewhat unlikely
> > given the fact that you actually don't have very many
> > nodes in your cluster.
> >
> > soft commits aren't relevant to the tlog, but here's
> > the thing. Your tlogs may get replayed
> > when you restart solr. If they're large, this may take
> > a long time. When you said you restarted Solr after
> > killing it, you might have triggered this.
> >
> > The way to keep tlogs small is to hard commit more
> > frequently (you should look at their size before
> > worrying about it though!). If you set openSearcher=false,
> > this is pretty inexpensive, all it really does is close
> > the current segment files, open new ones, and start a new
> > tlog file. It does _not_ invalidate caches, do autowarming,
> > all that expensive stuff.
> >
> > Your soft commit does _not_ improve performance! It is
> > just "less expensive" than a hard commit with
> > openSearcher=true. It _does_ invalidate caches, fire
> > off autowarming, etc. So it does "improve performance"
> > over doing hard commits with openSearcher=true
> > with the same frequency, but it still isn't free. It's still
> > good to have the soft commit interval as long as you
> > can tolerate.
> >
> > It's perfectly reasonable to have a hard commit interval
> > that's much shorter than your soft commit interval. As
> > Yonik explained once, "soft commits are about visibility
> > but hard commits are about durability".
> >
> > Best
> > Erick
> >
> >
> > On Wed, Aug 14, 2013 at 2:20 AM, Kevin Osborn <kevin.osb...@cbsi.com>
> > wrote:
> >
> >> Interesting, that did work. Do you or anyone else have any ideas on what
> >> I should look at? While soft commit is not a requirement in my project,
> >> my understanding is that it should help performance. On the same index,
> >> I will be doing both a large number of queries as well as updates.
> >>
> >> If I have to disable autoCommit, should I increase the chunk size?
> >>
> >> Of course, I will have to run a larger-scale test tomorrow, but I saw
> >> this problem fairly consistently in my smaller test.
> >>
> >> In a previous experiment, I applied the SOLR-4816 patch that someone
> >> indicated might help. I also reduced the CSV upload chunk size to 500.
> >> It seemed like things got a little better, but still eventually hung.
> >>
> >> I also see SOLR-5081, but I don't know if that is my issue or not. At
> >> least in my test, the index writes are not parallel as in the ticket.
> >>
> >> -Kevin
> >>
> >>
> >> On Tue, Aug 13, 2013 at 8:40 PM, Jason Hellman <
> >> jhell...@innoventsolutions.com> wrote:
> >>
> >>> While I don't have a past history of this issue to use as reference,
> >>> if I were in your shoes I would consider trying your updates with
> >>> softCommit disabled.  My suspicion is you're experiencing some issue
> >>> with the transaction logging and how it's managed when your hard commit
> >>> occurs.
> >>>
> >>> If you can give that a try and let us know how that fares we might have
> >>> some further input to share.
> >>>
> >>>
> >>> On Aug 13, 2013, at 11:54 AM, Kevin Osborn <kevin.osb...@cbsi.com>
> >>> wrote:
> >>>
> >>>> I am using Solr Cloud 4.4. It is pretty much a base configuration. We
> >>>> have 2 servers and 3 collections. Collection1 is 1 shard and
> >>>> Collection2 and Collection3 both have 2 shards. Both servers are
> >>>> identical.
> >>>>
> >>>> So, here is my process. I do a lot of queries on Collection1 and
> >>>> Collection2. I then do a bunch of inserts into Collection3. I am doing
> >>>> CSV uploads. I am also doing custom shard routing. All the products in
> >>>> a single upload will have the same shard key. All Solr interaction is
> >>>> through SolrJ with full Zookeeper awareness. My uploads are also using
> >>>> soft commits.
> >>>>
> >>>> I tried this on a record set of 936 products. Everything worked fine.
> >>>> I then sent over a record set of 300k products. The upload into
> >>>> Collection3 is chunked. I tried chunk sizes of both 1000 and 200,000
> >>>> with similar results. The first upload to Solr would just hang. There
> >>>> would simply be no response from Solr. A few of the products from this
> >>>> request would make it into the index, but not many.
> >>>>
> >>>> In this state, queries continued to work, but deletes did not.
> >>>>
> >>>> My only solution was to kill each Solr process.
> >>>>
> >>>> As an experiment, I did the large catalog first. First, I reset
> >>>> everything. With a chunk size of 1000, about 110,000 out of 300,000
> >>>> records made it into Solr before the process hung. Again, queries
> >>>> worked, but deletes did not and I had to kill Solr. It hung after about
> >>>> 30 seconds. Timing-wise, this is at about the second autocommit cycle,
> >>>> given the default autocommit of 15 seconds. I am not sure if this is
> >>>> related or not.
> >>>>
> >>>> As an additional experiment, I ran the entire test with just a single
> >>>> node in the cluster. This time, everything ran fine.
> >>>>
> >>>> Does anyone have any ideas? Everything is pretty default. These servers
> >>>> are Azure VMs, although I have seen similar behavior running two Solr
> >>>> instances on a single internal server as well.
> >>>>
> >>>> I had also noticed similar behavior before with Solr 4.3. It definitely
> >>>> has something to do with the clustering, but I am not sure what. And I
> >>>> don't see any error message (or really anything else) in the Solr logs.
> >>>>
> >>>> Thanks.
> >>>>
> >>>> --
> >>>> *KEVIN OSBORN*
> >>>> LEAD SOFTWARE ENGINEER
> >>>> CNET Content Solutions
> >>>> OFFICE 949.399.8714
> >>>> CELL 949.310.4677      SKYPE osbornk
> >>>> 5 Park Plaza, Suite 600, Irvine, CA 92614
> >>>
> >>>
> >>
> >>
> >>
>
>


