Hi Frederik, I haven't directly run into this issue with Solr, but I have experienced similar issues in a related context.
In my case, I had a custom webapp that made SolrJ requests and then generated some aggregated/analyzed results. During load testing, we ran into a few different issues...

1. The load test software itself had an issue with scaling - I'm assuming that's not the case for you, but I've seen it happen more than once. E.g. there's a limit to max parallel connections in the client being used to talk to Solr.

2. We needed to tune up the SolrJ settings for the HttpConnectionManager.

Under heavy load, this was running out of free connections. Given you've got 20 shards, each request is going to spawn 20 HTTP connections. I don't know off the top of my head how solr.SearchHandler manages connections (and whether it's possible to tune this), but from the stack trace below it sure looks like you're blocked on getting free HTTP connections.

3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc. There are lots of knobs to twiddle here, for better or worse.

-- Ken

On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:

> I just had a look at the thread-dump, pasting 3 examples here:
>
> 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
>     at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
>     at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
>     at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
>     at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
>     at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
>     at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
>     at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>     at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
>     at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> 'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
>     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>     at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
>     at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> and
>
> 'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>     at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
>     at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>     at java.lang.Thread.run(Thread.java:662)
>
> On Wednesday, 28 September 2011 at 13:53, Frederik Kraus wrote:
>
>> On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote:
>>
>>> Hi Fred,
>>>
>>> ok, it's a strange behavior with same queries.
>>> Another questions:
>>> -which solr version?
>>
>> 3.3 (might the NIOFSDirectory from 3.4 help?)
>>
>>> -do you indexing during your load test? (because of index rebuilt)
>> nope
>>
>>> -do you replicate your index?
>>
>> nope
>>>
>>> Regards
>>> Vadim
>>>
>>> 2011/9/28 Frederik Kraus <frederik.kr...@gmail.com>
>>>
>>>> Hi Vladim,
>>>>
>>>> the thing is, that those exact same queries, that take longer during a load
>>>> test, perform just fine when executed at a slower request rate and are also
>>>> random, i.e. there is no pattern in bad/slow queries.
>>>>
>>>> My first thought was some kind of contention and/or connection starvation
>>>> for the internal shard communication?
>>>>
>>>> Fred.
>>>>
>>>> On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:
>>>>
>>>>> Hi Fred,
>>>>> analyze the queries which take longer.
>>>>> We observe our queries and see the problems with q-time with queries which
>>>>> are complex, with phrase queries or queries which contains numbers or
>>>>> special characters.
>>>>> if you don't know it:
>>>>> http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
>>>>> Regards
>>>>> Vadim
>>>>>
>>>>> 2011/9/28 Frederik Kraus <frederik.kr...@gmail.com>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am experiencing a strange issue doing some load tests. Our setup:
>>>>>>
>>>>>> - 2 server with each 24 cpu cores, 130GB of RAM
>>>>>> - 10 shards per server (needed for response times) running in a single
>>>>>> tomcat instance
>>>>>> - each query queries all 20 shards (distributed search)
>>>>>>
>>>>>> - each shard holds about 1.5 mio documents (small shards are needed due to
>>>>>> rather complex queries)
>>>>>> - all caches are warmed / high cache hit rates (99%) etc.
>>>>>>
>>>>>> Now for some reason we cannot seem to fully utilize all CPU power (no disk
>>>>>> IO), ie. increasing concurrent users doesn't increase CPU-Load at a point,
>>>>>> decreases throughput and increases the response times of the individual
>>>>>> queries.
>>>>>>
>>>>>> Also 1-2% of the queries take significantly longer: avg somewhere at 100ms
>>>>>> while 1-2% take 1.5s or longer.
>>>>>>
>>>>>> Any ideas are greatly appreciated :)
>>>>>>
>>>>>> Fred.

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
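
P.S. To put rough numbers on point 2 above, here is a back-of-envelope sketch in plain Java. The shard counts (20 total, 10 per server) come from Fred's setup. Everything else is an assumption for illustration only: the concurrency of 100 queries, the pool limits of 400 total and 100 per host, and the class and method names are all made up, and none of them are Solr or HttpClient defaults. It also assumes every shard request goes over HTTP and that the pool's per-host limit is keyed by host and port, so the 10 shards in one Tomcat share a single per-host budget.

```java
// Back-of-envelope sizing for the HTTP connection pool used for shard fan-out.
// Shard counts come from the thread; the concurrency and pool limits are
// hypothetical illustration values, not Solr or HttpClient defaults.
final class ShardConnectionEstimate {

    /** Connections in use if every in-flight query talks to all of its shards at once. */
    static int peakConnections(int concurrentQueries, int shards) {
        return concurrentQueries * shards;
    }

    /**
     * Largest number of concurrent queries that never have to wait for a
     * connection, given a pool-wide limit and a per-host limit.
     */
    static int queriesWithoutWaiting(int maxTotal, int maxPerHost,
                                     int shards, int shardsPerHost) {
        int byTotal = maxTotal / shards;          // bound from the pool-wide limit
        int byHost = maxPerHost / shardsPerHost;  // bound from the per-host limit
        return Math.min(byTotal, byHost);
    }

    public static void main(String[] args) {
        int shards = 20;             // from the thread: 2 servers x 10 shards
        int shardsPerHost = 10;      // from the thread: 10 shards in one Tomcat per server
        int concurrentQueries = 100; // assumption: hypothetical load-test concurrency
        int maxTotal = 400;          // assumption: hypothetical pool-wide limit
        int maxPerHost = 100;        // assumption: hypothetical per-host limit

        System.out.println("peak connections overall:  "
                + peakConnections(concurrentQueries, shards));
        System.out.println("peak connections per host: "
                + peakConnections(concurrentQueries, shardsPerHost));
        System.out.println("queries served without waiting: "
                + queriesWithoutWaiting(maxTotal, maxPerHost, shards, shardsPerHost));
    }
}
```

With those made-up limits, the per-host bound is the tighter one, so it is worth checking both numbers rather than only the total. For a SolrJ client on commons-httpclient 3.x, the corresponding knobs are setMaxTotalConnections and setDefaultMaxConnectionsPerHost (on HttpConnectionManagerParams, or directly on CommonsHttpSolrServer); whether the SearchHandler's internal pool can be tuned the same way is something I haven't checked.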