Thank you, Erick! Will look at all these suggestions.
-Vinay

On Wed, Jun 26, 2013 at 6:37 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Right, unfortunately this is a gremlin lurking in the weeds, see:
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock
>
> There are a couple of ways to deal with this:
>
> 1> Go ahead and up the limit and re-compile; if you look at
> SolrCmdDistributor, the semaphore is defined there.
>
> 2> https://issues.apache.org/jira/browse/SOLR-4816 should address this as
> well as improve indexing throughput. I'm sure Joel (the guy working on
> this) would be thrilled if you were able to verify this; I'd ask him (on
> the JIRA) whether he thinks it's ready to test.
>
> 3> Reduce the number of threads you're indexing with.
>
> 4> Index docs in small packets, perhaps even one at a time, and just rack
> together a zillion threads to get throughput.
>
> FWIW,
> Erick
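Option 4> in SolrJ terms would look something like the sketch below (assuming SolrJ 4.x and CloudSolrServer; the ZooKeeper addresses, collection name, id field, thread count, and batch size are all illustrative, not tuned values):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SmallBatchIndexer {
        public static void main(String[] args) throws Exception {
            // Assumed ZooKeeper ensemble and collection name; substitute your own.
            final CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            solr.setDefaultCollection("collection1");

            final int threads = 16;    // "a zillion" scaled down for the sketch
            final int batchSize = 5;   // small packets instead of batches of 20
            ExecutorService pool = Executors.newFixedThreadPool(threads);

            for (int t = 0; t < threads; t++) {
                final int offset = t;
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
                            // Each thread takes every Nth document id.
                            for (int i = offset; i < 10000000; i += threads) {
                                SolrInputDocument doc = new SolrInputDocument();
                                doc.addField("id", "doc-" + i);   // hypothetical schema
                                batch.add(doc);
                                if (batch.size() >= batchSize) {
                                    solr.add(batch);   // no explicit commit; autoCommit handles it
                                    batch.clear();
                                }
                            }
                            if (!batch.isEmpty()) solr.add(batch);
                        } catch (Exception e) {
                            e.printStackTrace();   // real code should retry or back off here
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            solr.shutdown();
        }
    }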
> On Tue, Jun 25, 2013 at 8:55 AM, Vinay Pothnis <poth...@gmail.com> wrote:
> > Jason and Scott,
> >
> > Thanks for the replies and pointers!
> > Yes, I will consider the 'maxDocs' value as well. How do I monitor the
> > transaction logs during the interval between commits?
> >
> > Thanks
> > Vinay
> >
> > On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman
> > <jhell...@innoventsolutions.com> wrote:
> >
> >> Scott,
> >>
> >> My comment was meant to be a bit tongue-in-cheek, but my intent in the
> >> statement was to represent hard failure along the lines Vinay is
> >> seeing. We're talking about OutOfMemoryException conditions, total
> >> cluster paralysis requiring restart, or other similar and disastrous
> >> conditions.
> >>
> >> Where that line is, is impossible to define generically, but trivial
> >> to find. What any of us running Solr has to achieve is a realistic
> >> simulation of our desired production load (probably well above peak)
> >> and to see what limits are reached. Armed with that information, we
> >> tweak. In this case, we look at finding the point where data ingestion
> >> reaches a natural limit. For some that may be JVM GC, for others the
> >> memory buffer size on the client load, and for yet others it may be
> >> I/O limits on multithreaded reads from a database or file system.
> >>
> >> In old Solr days we had a little less to worry about. We might play
> >> with a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate
> >> partial commits and rollback recoveries. But with 4.x we now have more
> >> durable write options and NRT to consider, and SolrCloud begs to use
> >> this. So we have to consider transaction logs, the file handles they
> >> leave open until commit operations occur, and how we want to manage
> >> writing to all cores simultaneously instead of a narrower master/slave
> >> relationship.
> >>
> >> It's all manageable, all predictable (with some load testing), and all
> >> filled with many possibilities to meet our specific needs. Considering
> >> that each person's data model, ingestion pipeline, request processors,
> >> and field analysis steps will be different, "5 threads of input" at
> >> face value doesn't really describe the whole problem. We have to
> >> measure our actual data against our expectations and find where the
> >> weak chain links are, to strengthen them. The symptoms aren't
> >> necessarily predictable in advance of this testing, but they're likely
> >> addressable and not difficult to decipher.
> >>
> >> For what it's worth, SolrCloud is new enough that we're still
> >> experiencing some "uncharted territory with unknown ramifications",
> >> but with continued dialog through channels like these there are fewer
> >> territories without good cartography :)
> >>
> >> Hope that's of use!
> >>
> >> Jason
> >>
> >> On Jun 24, 2013, at 7:12 PM, Scott Lundgren
> >> <scott.lundg...@carbonblack.com> wrote:
> >>
> >> > Jason,
> >> >
> >> > Regarding your statement "push you over the edge": what does that
> >> > mean? Does it mean "uncharted territory with unknown ramifications",
> >> > or something more like specific, known symptoms?
> >> >
> >> > I ask because our use is similar to Vinay's in some respects, and we
> >> > want to be able to push the capabilities of write performance, but
> >> > not over the edge! In particular, I am interested in knowing the
> >> > symptoms of failure, to help us troubleshoot the underlying problems
> >> > if and when they arise.
> >> >
> >> > Thanks,
> >> >
> >> > Scott
> >> >
> >> > On Monday, June 24, 2013, Jason Hellman wrote:
> >> >
> >> >> Vinay,
> >> >>
> >> >> You may wish to pay attention to how many transaction logs are
> >> >> being created along the way to your hard autoCommit, which should
> >> >> truncate the open handles for those files. I might suggest setting
> >> >> a maxDocs value in parallel with your maxTime value (you can use
> >> >> both) to ensure the commit occurs at either breakpoint. 30 seconds
> >> >> is plenty of time for 5 parallel processes of 20-document
> >> >> submissions to push you over the edge.
> >> >>
> >> >> Jason
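For reference, Jason's maxDocs-plus-maxTime suggestion maps to solrconfig.xml roughly as follows. This is a sketch: the maxTime values match the 30-second and 1-second settings discussed below, while the maxDocs figure is purely illustrative and should be tuned to the ingest rate:

    <!-- solrconfig.xml: a hard commit truncates the transaction log and
         releases its file handles; openSearcher=false keeps it cheap. -->
    <autoCommit>
      <maxTime>30000</maxTime>          <!-- 30 seconds, as in Vinay's setup -->
      <maxDocs>10000</maxDocs>          <!-- illustrative; commit fires at whichever limit hits first -->
      <openSearcher>false</openSearcher>
    </autoCommit>

    <!-- A soft commit controls visibility only; it does not truncate the tlog. -->
    <autoSoftCommit>
      <maxTime>1000</maxTime>           <!-- 1 second, as in Vinay's setup -->
    </autoSoftCommit>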
> >> >> On Jun 24, 2013, at 2:21 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> >> >>
> >> >>> I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30
> >> >>> seconds.
> >> >>>
> >> >>> On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman
> >> >>> <jhell...@innoventsolutions.com> wrote:
> >> >>>
> >> >>>> Vinay,
> >> >>>>
> >> >>>> What autoCommit settings do you have for your indexing process?
> >> >>>>
> >> >>>> Jason
> >> >>>>
> >> >>>> On Jun 24, 2013, at 1:28 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> >> >>>>
> >> >>>>> Here is the ulimit -a output:
> >> >>>>>
> >> >>>>> core file size          (blocks, -c) 0
> >> >>>>> data seg size           (kbytes, -d) unlimited
> >> >>>>> scheduling priority             (-e) 0
> >> >>>>> file size               (blocks, -f) unlimited
> >> >>>>> pending signals                 (-i) 179963
> >> >>>>> max locked memory       (kbytes, -l) 64
> >> >>>>> max memory size         (kbytes, -m) unlimited
> >> >>>>> open files                      (-n) 32769
> >> >>>>> pipe size            (512 bytes, -p) 8
> >> >>>>> POSIX message queues     (bytes, -q) 819200
> >> >>>>> real-time priority              (-r) 0
> >> >>>>> stack size              (kbytes, -s) 10240
> >> >>>>> cpu time               (seconds, -t) unlimited
> >> >>>>> max user processes              (-u) 140000
> >> >>>>> virtual memory          (kbytes, -v) unlimited
> >> >>>>> file locks                      (-x) unlimited
> >> >>>>>
> >> >>>>> On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro
> >> >>>>> <yago.rive...@gmail.com> wrote:
> >> >>>>>
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> I have the same issue too, and my deployment is almost exactly
> >> >>>>>> like the one here:
> >> >>>>>> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862
> >> >>>>>>
> >> >>>>>> With some concurrency and batches of 10, Solr apparently hits
> >> >>>>>> some deadlock distributing updates.
> >> >>>>>>
> >> >>>>>> Can you dump the ulimit configuration on your servers? Some
> >> >>>>>> people had the same issues because they were reaching the
> >> >>>>>> ulimit maximums defined for file descriptors and processes.
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Yago Riveiro
> >> >>>>>> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >> >>>>>>
> >> >>>>>> On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote:
> >> >>>>>>
> >> >>>>>>> Hello All,
> >> >>>>>>>
> >> >>>>>>> I have the following setup of solr cloud:
> >> >>>>>>>
> >> >>>>>>> * solr version 4.3.1
> >> >>>>>>> * 3-node solr cloud + replication factor 2
> >> >>>>>>> * 3 zookeepers
> >> >>>>>>> * load balancer in front of the 3 solr nodes
> >> >>>>>>>
> >> >>>>>>> I am seeing this strange behavior when I am indexing a large
> >> >>>>>>> number of documents (10 mil). When I have more than 3-5
> >> >>>>>>> threads sending documents (in batches of 20) to solr,
> >> >>>>>>> sometimes solr goes into a hung state. After this, all the
> >> >>>>>>> update requests get timed out. What we see via AppDynamics (a
> >> >>>>>>> performance monitoring tool) is that there are a number of
> >> >>>>>>> threads that are stalled. The stack trace for one of the
> >> >>>>>>> threads is shown below.
> >> >>>>>>>
> >> >>>>>>> The cluster has to be restarted to recover from this. When I
> >> >>>>>>> reduce the
> >> >
> >> > --
> >> > Scott Lundgren
> >> > Director of Engineering
> >> > Carbon Black, Inc.
> >> > (210) 204-0483 | scott.lundg...@carbonblack.com
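For anyone hitting the ulimit ceilings Yago describes above: those per-user limits are typically raised in /etc/security/limits.conf for the account that runs Solr. A sketch, assuming a "solr" username and illustrative values (they take effect on the next login/session):

    # /etc/security/limits.conf -- per-user limits for the account running
    # Solr ("solr" here is an assumed username; values are illustrative).
    solr  soft  nofile  65536
    solr  hard  nofile  65536
    solr  soft  nproc   65536
    solr  hard  nproc   65536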