Jason and Scott,

Thanks for the replies and pointers! Yes, I will consider the 'maxDocs' value as well. How do I monitor the transaction logs during the interval between commits?
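
So far the only thing I can think of is to poll the tlog directory on each core and watch the file count and total size, along the lines of the rough sketch below (the data directory path is just a placeholder for my setup):

import java.io.File;

// Rough sketch: poll a core's tlog directory and report how many
// transaction log files are on disk and how much space they use.
// The path is a placeholder for the actual data dir on each node.
public class TlogMonitor {
    public static void main(String[] args) throws InterruptedException {
        File tlogDir = new File("/var/solr/collection1/data/tlog");
        while (true) {
            File[] logs = tlogDir.listFiles();
            long bytes = 0;
            int count = 0;
            if (logs != null) {
                for (File f : logs) {
                    bytes += f.length();
                    count++;
                }
            }
            System.out.printf("%d tlog file(s), %.2f MB%n",
                    count, bytes / (1024.0 * 1024.0));
            Thread.sleep(5000); // poll every 5 seconds
        }
    }
}

I would run this on each node, since each replica keeps its own tlog directory.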
Thanks,
Vinay

On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:

> Scott,
>
> My comment was meant to be a bit tongue-in-cheek, but my intent in the statement was to represent hard failure along the lines Vinay is seeing. We're talking about OutOfMemoryException conditions, total cluster paralysis requiring restart, or other similar and disastrous conditions.
>
> Where that line is, is impossible to define generically, but trivial to reach. What any of us running Solr has to achieve is a realistic simulation of our desired production load (probably well above peak) and to see what limits are reached. Armed with that information we tweak. In this case, we look at finding the point where data ingestion reaches a natural limit. For some that may be JVM GC, for others memory buffer size on the client load, and for yet others it may be I/O limits on multithreaded reads from a database or file system.
>
> In old Solr days we had a little less to worry about. We might play with a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate partial commits and rollback recoveries. But with 4.x we now have more durable write options and NRT to consider, and SolrCloud begs to use this. So we have to consider transaction logs, the file handles they leave open until commit operations occur, and how we want to manage writing to all cores simultaneously instead of a narrower master/slave relationship.
>
> It's all manageable, all predictable (with some load testing), and all filled with many possibilities to meet our specific needs. Considering that each person's data model, ingestion pipeline, request processors, and field analysis steps will be different, 5 threads of input at face value doesn't really capture the whole problem. We have to measure our actual data against our expectations and find where the weak chain links are so we can strengthen them. The symptoms aren't necessarily predictable in advance of this testing, but they're likely addressable and not difficult to decipher.
>
> For what it's worth, SolrCloud is new enough that we're still experiencing some "uncharted territory with unknown ramifications," but with continued dialog through channels like these there are fewer territories without good cartography :)
>
> Hope that's of use!
>
> Jason
>
>
> On Jun 24, 2013, at 7:12 PM, Scott Lundgren <scott.lundg...@carbonblack.com> wrote:
>
> > Jason,
> >
> > Regarding your statement "push you over the edge" - what does that mean? Does it mean "uncharted territory with unknown ramifications" or something more like specific, known symptoms?
> >
> > I ask because our use is similar to Vinay's in some respects, and we want to be able to push the capabilities of write perf - but not over the edge! In particular, I am interested in knowing the symptoms of failure, to help us troubleshoot the underlying problems if and when they arise.
> >
> > Thanks,
> >
> > Scott
> >
> > On Monday, June 24, 2013, Jason Hellman wrote:
> >
> >> Vinay,
> >>
> >> You may wish to pay attention to how many transaction logs are being created along the way to your hard autoCommit, which should truncate the open handles for those files. I might suggest setting a maxDocs value in parallel with your maxTime value (you can use both) to ensure the commit occurs at either breakpoint.
> >> 30 seconds is plenty of time for 5 parallel processes of 20 document submissions to push you over the edge.
> >>
> >> Jason
> >>
> >> On Jun 24, 2013, at 2:21 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> >>
> >>> I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds.
> >>>
> >>> On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:
> >>>
> >>>> Vinay,
> >>>>
> >>>> What autoCommit settings do you have for your indexing process?
> >>>>
> >>>> Jason
> >>>>
> >>>> On Jun 24, 2013, at 1:28 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> >>>>
> >>>>> Here is the ulimit -a output:
> >>>>>
> >>>>> core file size          (blocks, -c) 0
> >>>>> data seg size           (kbytes, -d) unlimited
> >>>>> scheduling priority             (-e) 0
> >>>>> file size               (blocks, -f) unlimited
> >>>>> pending signals                 (-i) 179963
> >>>>> max locked memory       (kbytes, -l) 64
> >>>>> max memory size         (kbytes, -m) unlimited
> >>>>> open files                      (-n) 32769
> >>>>> pipe size            (512 bytes, -p) 8
> >>>>> POSIX message queues     (bytes, -q) 819200
> >>>>> real-time priority              (-r) 0
> >>>>> stack size              (kbytes, -s) 10240
> >>>>> cpu time               (seconds, -t) unlimited
> >>>>> max user processes              (-u) 140000
> >>>>> virtual memory          (kbytes, -v) unlimited
> >>>>> file locks                      (-x) unlimited
> >>>>>
> >>>>> On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have the same issue too, and my deployment is almost exactly like yours:
> >>>>>> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862
> >>>>>>
> >>>>>> With some concurrency and batches of 10, Solr apparently hits some kind of deadlock while distributing updates.
> >>>>>>
> >>>>>> Can you dump the ulimit configuration on your servers? Some people have had the same issue because they reached the ulimit maximums defined for file descriptors and processes.
> >>>>>>
> >>>>>> --
> >>>>>> Yago Riveiro
> >>>>>> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >>>>>>
> >>>>>>
> >>>>>> On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote:
> >>>>>>
> >>>>>>> Hello All,
> >>>>>>>
> >>>>>>> I have the following setup of Solr Cloud:
> >>>>>>>
> >>>>>>> * Solr version 4.3.1
> >>>>>>> * 3-node Solr Cloud + replication factor 2
> >>>>>>> * 3 ZooKeepers
> >>>>>>> * load balancer in front of the 3 Solr nodes
> >>>>>>>
> >>>>>>> I am seeing this strange behavior when I am indexing a large number of documents (10 mil). When I have more than 3-5 threads sending documents (in batches of 20) to Solr, sometimes Solr goes into a hung state. After this, all the update requests get timed out. What we see via AppDynamics (a performance monitoring tool) is that there are a number of threads that are stalled. The stack trace for one of the threads is shown below.
> >>>>>>>
> >>>>>>> The cluster has to be restarted to recover from this. When I reduce the
> >
> >
> > --
> > Scott Lundgren
> > Director of Engineering
> > Carbon Black, Inc.
> > (210) 204-0483 | scott.lundg...@carbonblack.com
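
P.S. For context, the indexing load is roughly the shape of the rough SolrJ sketch below: several threads, each posting batches of 20 documents through the load balancer, with no explicit commits (commits are left to autoCommit/softAutoCommit on the servers). The URL, counts, and field names are placeholders; the real client differs in its details.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Rough sketch of the write pattern: THREADS workers, each sending batches
// of BATCH_SIZE documents via the load balancer. Placeholder URL and fields.
public class IndexLoadSketch {
    private static final String SOLR_URL = "http://load-balancer:8983/solr/collection1";
    private static final int THREADS = 5;
    private static final int BATCH_SIZE = 20;
    private static final int BATCHES_PER_THREAD = 100000; // 5 x 100000 x 20 = 10M docs

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            final int threadId = t;
            pool.submit(new Runnable() {
                public void run() {
                    SolrServer server = new HttpSolrServer(SOLR_URL);
                    try {
                        for (int b = 0; b < BATCHES_PER_THREAD; b++) {
                            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);
                            for (int i = 0; i < BATCH_SIZE; i++) {
                                SolrInputDocument doc = new SolrInputDocument();
                                doc.addField("id", threadId + "-" + b + "-" + i);
                                doc.addField("body_t", "sample document text");
                                batch.add(doc);
                            }
                            server.add(batch); // no explicit commit here
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        server.shutdown();
                    }
                }
            });
        }
        pool.shutdown();
    }
}

With hardAutoCommit at 30 seconds, that is the window during which the transaction logs grow, which is what I want to monitor.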