Right, SOLR-5081 is possible but somewhat unlikely,
given that you don't have very many nodes in your
cluster.

Soft commits aren't relevant to the tlog, but here's
the thing: your tlogs may get replayed when you
restart Solr. If they're large, this may take a long
time. When you said you restarted Solr after killing
it, you might have triggered this replay.
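
In case it helps to see where the tlog comes from: it's enabled by
the updateLog section of solrconfig.xml, inside <updateHandler>. The
stock 4.x example config has something like this (quoting from
memory, check your own config):

  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>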

The way to keep tlogs small is to hard commit more
frequently (though you should look at their size before
worrying about it!). If you set openSearcher=false,
this is pretty inexpensive: all it really does is close
the current segment files, open new ones, and start a new
tlog file. It does _not_ invalidate caches, kick off
autowarming, or any of that expensive stuff.
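
For concreteness, that's the autoCommit section of solrconfig.xml;
something like the following, where the 15000 ms is just an
illustration, tune it to your own needs:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>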

Your soft commit does _not_ improve performance! It is
just "less expensive" than a hard commit with
openSearcher=true. It _does_ invalidate caches, fire
off autowarming, etc. So it does "improve performance"
over doing hard commits with openSearcher=true at
the same frequency, but it still isn't free. It's still
a good idea to make the soft commit interval as long as
you can tolerate.

It's perfectly reasonable to have a hard commit interval
that's much shorter than your soft commit interval. As
Yonik explained once, "soft commits are about visibility
but hard commits are about durability".
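
Putting those together, a sketch of what I mean (the intervals here
are made up, pick whatever your application can tolerate):

  <autoCommit>
    <maxTime>15000</maxTime>   <!-- durability: hard commit, rolls the tlog -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>60000</maxTime>   <!-- visibility: opens a new searcher -->
  </autoSoftCommit>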

Best
Erick


On Wed, Aug 14, 2013 at 2:20 AM, Kevin Osborn <kevin.osb...@cbsi.com> wrote:

> Interesting, that did work. Do you or anyone else have any ideas on what I
> should look at? While soft commit is not a requirement in my project, my
> understanding was that it should help performance. On the same index, I will
> be doing both a large number of queries and updates.
>
> If I have to disable autoCommit, should I increase the chunk size?
>
> Of course, I will have to run a larger-scale test tomorrow, but I saw
> this problem fairly consistently in my smaller test.
>
> In a previous experiment, I applied the SOLR-4816 patch that someone
> indicated might help. I also reduced the CSV upload chunk size to 500. It
> seemed like things got a little better, but it still eventually hung.
>
> I also see SOLR-5081, but I don't know if that is my issue or not. At least
> in my test, the index writes are not parallel as in the ticket.
>
> -Kevin
>
>
> On Tue, Aug 13, 2013 at 8:40 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:
>
> > While I don't have past history with this issue to use as a reference,
> > if I were in your shoes I would consider trying your updates with
> > softCommit disabled. My suspicion is you're experiencing some issue with
> > the transaction logging and how it's managed when your hard commit occurs.
> >
> > If you can give that a try and let us know how that fares we might have
> > some further input to share.
> >
> >
> > On Aug 13, 2013, at 11:54 AM, Kevin Osborn <kevin.osb...@cbsi.com> wrote:
> >
> > > I am using SolrCloud 4.4. It is pretty much a base configuration. We
> > > have 2 servers and 3 collections. Collection1 has 1 shard, and
> > > Collection2 and Collection3 each have 2 shards. Both servers are
> > > identical.
> > >
> > > So, here is my process: I do a lot of queries on Collection1 and
> > > Collection2, then do a bunch of inserts into Collection3 via CSV
> > > uploads. I am also doing custom shard routing; all the products in a
> > > single upload have the same shard key. All Solr interaction is through
> > > SolrJ with full ZooKeeper awareness. My uploads also use soft commits.
> > >
> > > I tried this on a record set of 936 products and everything worked
> > > fine. I then sent over a record set of 300k products. The upload into
> > > Collection3 is chunked; I tried chunk sizes of both 1,000 and 200,000
> > > with similar results. The first upload to Solr would just hang; there
> > > would simply be no response from Solr. A few of the products from this
> > > request would make it into the index, but not many.
> > >
> > > In this state, queries continued to work, but deletes did not.
> > >
> > > My only solution was to kill each Solr process.
> > >
> > > As an experiment, I did the large catalog first, after resetting
> > > everything. With a chunk size of 1,000, about 110,000 out of 300,000
> > > records made it into Solr before the process hung. Again, queries
> > > worked, but deletes did not, and I had to kill Solr. It hung after
> > > about 30 seconds. Timing-wise, this is at about the second autocommit
> > > cycle, given the default autocommit of 15 seconds. I am not sure if
> > > this is related or not.
> > >
> > > As an additional experiment, I ran the entire test with just a single
> > > node in the cluster. This time, everything ran fine.
> > >
> > > Does anyone have any ideas? Everything is pretty default. These
> > > servers are Azure VMs, although I have seen similar behavior running
> > > two Solr instances on a single internal server as well.
> > >
> > > I had also noticed similar behavior before with Solr 4.3. It
> > > definitely has something to do with the clustering, but I am not sure
> > > what. And I don't see any error messages (or really anything else) in
> > > the Solr logs.
> > >
> > > Thanks.
> > >
> > > --
> > > *KEVIN OSBORN*
> > > LEAD SOFTWARE ENGINEER
> > > CNET Content Solutions
> > > OFFICE 949.399.8714
> > > CELL 949.310.4677      SKYPE osbornk
> > > 5 Park Plaza, Suite 600, Irvine, CA 92614
> >
> >
>
>
> --
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677      SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
>
