I am having this issue as well. I applied the patch, but unfortunately it
did not resolve the issue in my case.

On Wed, Sep 4, 2013 at 7:01 AM, Greg Walters
<gwalt...@sherpaanalytics.com> wrote:

> Tim,
>
> Take a look at
> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html
> and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue
> that you're reporting for a while; then I applied the patch from SOLR-4816
> to my clients and the problems went away. If you don't feel like applying
> the patch yourself, it looks like the fix will be included in the 4.5
> release. Also note that the problem happens more frequently when the
> replication factor is greater than 1.
>
> Thanks,
> Greg
>
> -----Original Message-----
> From: Tim Vaillancourt [mailto:t...@elementspace.com]
> Sent: Tuesday, September 03, 2013 6:31 PM
> To: solr-user@lucene.apache.org
> Subject: SolrCloud 4.x hangs under high update volume
>
> Hey guys,
>
> I am looking into an issue we've been having with SolrCloud since the
> beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
> yet). I've noticed other users with this same issue, so I'd really like to
> get to the bottom of it.
>
> Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
> see stalled transactions that snowball to consume all Jetty threads in the
> JVM. This eventually causes the JVM to hang, with most threads waiting on
> the condition/stack provided at the bottom of this message. At this point
> SolrCloud instances start to see their neighbors (whose threads are all
> hung as well) as down with "Connection Refused", and the shards are marked
> "down" in the cluster state. Sometimes a node or two survives and just
> returns 503 "no server hosting shard" errors.
>
> As a workaround/experiment, we have tuned the number of threads sending
> updates to Solr, as well as the batch size (we batch updates from client ->
> Solr) and the soft/hard autoCommit intervals, all to no avail. We also
> tried turning off client-to-Solr batching (1 update = 1 call to Solr),
> which did not help either. Certain combinations of update threads and
> batch sizes seem to mask the problem, but not resolve it entirely.
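
(For context, the client -> Solr batching being tuned above looks roughly
like the following. This is a minimal SolrJ 4.x sketch, not the reporter's
actual code; the ZooKeeper host string, collection name, field names, and
BATCH_SIZE are placeholder values.)

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchedUpdater {
    // Placeholder batch size; this is one of the knobs being tuned above.
    private static final int BATCH_SIZE = 100;

    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("body_t", "example payload " + i);
            batch.add(doc);
            if (batch.size() >= BATCH_SIZE) {
                server.add(batch); // one HTTP request per batch; BATCH_SIZE = 1
                batch.clear();     // is equivalent to turning batching off
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch); // flush the final partial batch
        }
        server.shutdown();
    }
}

(Commits are left to the server-side soft/hard autoCommit settings
mentioned above rather than issued from the client.)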
>
> Our current environment is the following:
> - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> - 3 x Zookeeper instances, external Java 7 JVM.
> - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
> a replica of 1 shard).
> - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
> day.
> - 5000 max Jetty threads (well above what we use when we are healthy);
> the Linux user-threads ulimit is 6000.
> - Occurs under Jetty 8 or 9 (many versions).
> - Occurs under Java 1.6 or 1.7 (several minor versions).
> - Occurs under several JVM tunings.
> - Everything seems to point to Solr itself, and not a Jetty or Java
> version (I hope I'm wrong).
>
> The stack trace below is what all of my held-up Jetty QTP threads show;
> they seem to be waiting on a lock that I would very much like to
> understand further:
>
> "java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00000007216e68d8> (a
> java.util.concurrent.Semaphore$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>     at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>     at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>     at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
>     at
>
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
>     at
>
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
>     at
>
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
>     at
>
> org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
>     at
>
> org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
>     at
>
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
>     at
>
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
>     at
>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
>     at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
>     at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>     at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>     at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>     at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
>     at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
>     at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
>     at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
>     at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
>     at
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
>     at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
>     at
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
>     at
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030)
>     at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
>     at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:201)
>     at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
>     at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:445)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268)
>     at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
>     at
>
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
>     at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
>     at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
>     at java.lang.Thread.run(Thread.java:724)"
>
> Some questions I had were:
> 1) What exclusive locks does SolrCloud take when performing an update?
> 2) Keeping in mind I do not read or write Java (sorry :D), could someone
> help me understand "what" Solr is locking in this case at
> "org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)"
> when performing an update? That will help me understand where to look next
> (see the sketch just after these questions).
> 3) All threads in this state seem to be waiting for "0x00000007216e68d8";
> is there a way to tell what "0x00000007216e68d8" is?
> 4) Is there a limit to how many updates you can do in SolrCloud?
> 5) Wild-ass theory: would more shards provide more locks (whatever they
> are) on update, and thus more update throughput?
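
(Regarding questions 2 and 3: the following is not Solr's actual
implementation, just a sketch of the pattern the trace suggests.
AdjustableSemaphore appears to wrap a plain java.util.concurrent.Semaphore
that caps how many update requests may be in flight at once, and
"0x00000007216e68d8" is simply the heap address of that semaphore's
internal sync object, so every thread parked on the same address is waiting
on the same lock instance. The permit count of 16 below is illustrative.)

import java.util.concurrent.Semaphore;

// Hypothetical sketch of a bounded distributor, NOT Solr's code: a fixed
// pool of permits limits concurrent outbound update requests.
public class BoundedDistributor {
    private final Semaphore permits = new Semaphore(16); // illustrative cap

    public void submit(Runnable forwardToReplica) throws InterruptedException {
        // Threads park here once all permits are taken, producing the
        // "WAITING (parking)" frames on Semaphore$NonfairSync in the trace.
        permits.acquire();
        try {
            // If the replica never responds because its own threads are all
            // parked in acquire() the same way, this call blocks forever...
            forwardToReplica.run();
        } finally {
            // ...and this release is never reached, so the permit is lost.
            // Once every node's permits are gone, the whole cluster hangs.
            permits.release();
        }
    }
}

(That pattern would also explain why replication factor > 1 makes the hang
more likely, as noted earlier in the thread: with replicas, nodes forward
updates to each other and can exhaust each other's permits.)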
>
> To those interested, I've provided a stack trace from 1 of the 3 nodes, in
> gzipped form, at this URL:
> https://s3.amazonaws.com/timvaillancourt.com/tmp/solr-jstack-2013-08-23.gz
>
> Any help/suggestions/ideas on this issue, big or small, would be much
> appreciated.
>
> Thanks so much all!
>
> Tim Vaillancourt
>



-- 
Kevin Osborn
Lead Software Engineer
CNET Content Solutions
Office: 949.399.8714
Cell: 949.310.4677 | Skype: osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
