Jason and Scott,

Thanks for the replies and pointers! Yes, I will consider the 'maxDocs' value as well. How do I monitor the transaction logs during the interval between commits?
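
So far the only thing I can think of is to poll the tlog directory on each core and watch the file count and total size, along the lines of the rough sketch below (the data directory path is just a placeholder for my setup):

import java.io.File;

// Rough sketch: poll a core's tlog directory and report how many
// transaction log files are on disk and how much space they use.
// The path is a placeholder for the actual data dir on each node.
public class TlogMonitor {
    public static void main(String[] args) throws InterruptedException {
        File tlogDir = new File("/var/solr/collection1/data/tlog");
        while (true) {
            File[] logs = tlogDir.listFiles();
            long bytes = 0;
            int count = 0;
            if (logs != null) {
                for (File f : logs) {
                    bytes += f.length();
                    count++;
                }
            }
            System.out.printf("%d tlog file(s), %.2f MB%n",
                    count, bytes / (1024.0 * 1024.0));
            Thread.sleep(5000); // poll every 5 seconds
        }
    }
}

I would run this on each node, since each replica keeps its own tlog directory.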
Thanks,
Vinay

On Mon, Jun 24, 2013 at 8:48 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:

> Scott,
>
> My comment was meant to be a bit tongue-in-cheek, but my intent in the statement was to represent hard failure along the lines Vinay is seeing. We're talking about OutOfMemoryException conditions, total cluster paralysis requiring restart, or other similar and disastrous conditions.
>
> Where that line is, is impossible to define generically, but trivial to reach. What any of us running Solr has to achieve is a realistic simulation of our desired production load (probably well above peak) and to see what limits are reached. Armed with that information we tweak. In this case, we look at finding the point where data ingestion reaches a natural limit. For some that may be JVM GC, for others memory buffer size on the client load, and for yet others it may be I/O limits on multithreaded reads from a database or file system.
>
> In old Solr days we had a little less to worry about. We might play with a commitWithin parameter, ramBufferSizeMB tweaks, or contemplate partial commits and rollback recoveries. But with 4.x we now have more durable write options and NRT to consider, and SolrCloud begs to use this. So we have to consider transaction logs, the file handles they leave open until commit operations occur, and how we want to manage writing to all cores simultaneously instead of a narrower master/slave relationship.
>
> It's all manageable, all predictable (with some load testing), and all filled with many possibilities to meet our specific needs. Considering that each person's data model, ingestion pipeline, request processors, and field analysis steps will be different, 5 threads of input at face value doesn't really capture the whole problem. We have to measure our actual data against our expectations and find where the weak chain links are so we can strengthen them. The symptoms aren't necessarily predictable in advance of this testing, but they're likely addressable and not difficult to decipher.
>
> For what it's worth, SolrCloud is new enough that we're still experiencing some "uncharted territory with unknown ramifications," but with continued dialog through channels like these there are fewer territories without good cartography :)
>
> Hope that's of use!
>
> Jason
>
>
> On Jun 24, 2013, at 7:12 PM, Scott Lundgren <scott.lundg...@carbonblack.com> wrote:
>
> > Jason,
> >
> > Regarding your statement "push you over the edge" - what does that mean? Does it mean "uncharted territory with unknown ramifications" or something more like specific, known symptoms?
> >
> > I ask because our use is similar to Vinay's in some respects, and we want to be able to push the capabilities of write perf - but not over the edge! In particular, I am interested in knowing the symptoms of failure, to help us troubleshoot the underlying problems if and when they arise.
> >
> > Thanks,
> >
> > Scott
> >
> > On Monday, June 24, 2013, Jason Hellman wrote:
> >
> >> Vinay,
> >>
> >> You may wish to pay attention to how many transaction logs are being created along the way to your hard autoCommit, which should truncate the open handles for those files. I might suggest setting a maxDocs value in parallel with your maxTime value (you can use both) to ensure the commit occurs at either breakpoint.
> >> 30 seconds is plenty of time for 5 parallel processes of 20 document submissions to push you over the edge.
> >>
> >> Jason
> >>
> >> On Jun 24, 2013, at 2:21 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> >>
> >>> I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds.
> >>>
> >>> On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:
> >>>
> >>>> Vinay,
> >>>>
> >>>> What autoCommit settings do you have for your indexing process?
> >>>>
> >>>> Jason
> >>>>
> >>>> On Jun 24, 2013, at 1:28 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> >>>>
> >>>>> Here is the ulimit -a output:
> >>>>>
> >>>>> core file size          (blocks, -c) 0
> >>>>> data seg size           (kbytes, -d) unlimited
> >>>>> scheduling priority             (-e) 0
> >>>>> file size               (blocks, -f) unlimited
> >>>>> pending signals                 (-i) 179963
> >>>>> max locked memory       (kbytes, -l) 64
> >>>>> max memory size         (kbytes, -m) unlimited
> >>>>> open files                      (-n) 32769
> >>>>> pipe size            (512 bytes, -p) 8
> >>>>> POSIX message queues     (bytes, -q) 819200
> >>>>> real-time priority              (-r) 0
> >>>>> stack size              (kbytes, -s) 10240
> >>>>> cpu time               (seconds, -t) unlimited
> >>>>> max user processes              (-u) 140000
> >>>>> virtual memory          (kbytes, -v) unlimited
> >>>>> file locks                      (-x) unlimited
> >>>>>
> >>>>> On Mon, Jun 24, 2013 at 12:47 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have the same issue too, and my deployment is almost exactly like yours:
> >>>>>> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862
> >>>>>>
> >>>>>> With some concurrency and batches of 10, Solr apparently hits some kind of deadlock while distributing updates.
> >>>>>>
> >>>>>> Can you dump the ulimit configuration on your servers? Some people have had the same issue because they reached the ulimit maximums defined for file descriptors and processes.
> >>>>>>
> >>>>>> --
> >>>>>> Yago Riveiro
> >>>>>> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >>>>>>
> >>>>>>
> >>>>>> On Monday, June 24, 2013 at 7:49 PM, Vinay Pothnis wrote:
> >>>>>>
> >>>>>>> Hello All,
> >>>>>>>
> >>>>>>> I have the following setup of Solr Cloud:
> >>>>>>>
> >>>>>>> * Solr version 4.3.1
> >>>>>>> * 3-node Solr Cloud + replication factor 2
> >>>>>>> * 3 ZooKeepers
> >>>>>>> * load balancer in front of the 3 Solr nodes
> >>>>>>>
> >>>>>>> I am seeing this strange behavior when I am indexing a large number of documents (10 mil). When I have more than 3-5 threads sending documents (in batches of 20) to Solr, sometimes Solr goes into a hung state. After this, all the update requests get timed out. What we see via AppDynamics (a performance monitoring tool) is that there are a number of threads that are stalled. The stack trace for one of the threads is shown below.
> >>>>>>>
> >>>>>>> The cluster has to be restarted to recover from this. When I reduce the
> >
> >
> > --
> > Scott Lundgren
> > Director of Engineering
> > Carbon Black, Inc.
> > (210) 204-0483 | scott.lundg...@carbonblack.com
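
P.S. For context, the indexing load is roughly the shape of the rough SolrJ sketch below: several threads, each posting batches of 20 documents through the load balancer, with no explicit commits (commits are left to autoCommit/softAutoCommit on the servers). The URL, counts, and field names are placeholders; the real client differs in its details.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Rough sketch of the write pattern: THREADS workers, each sending batches
// of BATCH_SIZE documents via the load balancer. Placeholder URL and fields.
public class IndexLoadSketch {
    private static final String SOLR_URL = "http://load-balancer:8983/solr/collection1";
    private static final int THREADS = 5;
    private static final int BATCH_SIZE = 20;
    private static final int BATCHES_PER_THREAD = 100000; // 5 x 100000 x 20 = 10M docs

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            final int threadId = t;
            pool.submit(new Runnable() {
                public void run() {
                    SolrServer server = new HttpSolrServer(SOLR_URL);
                    try {
                        for (int b = 0; b < BATCHES_PER_THREAD; b++) {
                            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);
                            for (int i = 0; i < BATCH_SIZE; i++) {
                                SolrInputDocument doc = new SolrInputDocument();
                                doc.addField("id", threadId + "-" + b + "-" + i);
                                doc.addField("body_t", "sample document text");
                                batch.add(doc);
                            }
                            server.add(batch); // no explicit commit here
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        server.shutdown();
                    }
                }
            });
        }
        pool.shutdown();
    }
}

With hardAutoCommit at 30 seconds, that is the window during which the transaction logs grow, which is what I want to monitor.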