Hi Ken,  

The HttpConnectionManager was actually the first thing I looked at. I bumped
the Solr default of 20 up to 50, 100, 400 and 10000 (which should be more or
less unlimited ;) ). Unfortunately, that didn't really solve anything. I don't
know whether the "static" HttpClient is a problem here, as it means the same
HttpConnectionManager is used for all shards …
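
A rough sketch of the connection arithmetic behind those limits. It rests on two assumptions that are not confirmed anywhere in this thread: that the limit of 20 applies per host:port, and that the 10 shards inside one Tomcat count as the same host. The class and method names are made up for illustration; only the shard counts and the limits tried come from this thread.

```java
public class ShardConnectionMath {
    // Assumption: a distributed query opens one connection per shard,
    // so the connections it needs to one host equal the shards on that host.
    static int connectionsPerQueryPerHost(int shardsPerHost) {
        return shardsPerHost;
    }

    // Number of distributed queries that can hold all their connections
    // to a single host at the same time under a given per-host limit.
    static int concurrentQueriesPerHost(int maxConnectionsPerHost, int shardsPerHost) {
        return maxConnectionsPerHost / connectionsPerQueryPerHost(shardsPerHost);
    }

    public static void main(String[] args) {
        int shardsPerHost = 10;                      // from the setup: 10 shards per server
        int[] limits = {20, 50, 100, 400, 10000};    // the limits tried so far
        for (int limit : limits) {
            System.out.println("limit " + limit + " -> "
                + concurrentQueriesPerHost(limit, shardsPerHost)
                + " concurrent distributed queries per host");
        }
    }
}
```

Under these assumptions the default of 20 would leave room for only 2 distributed queries per host, while 10000 leaves room for 1000. If raising the limit that far changes nothing, the number of connections is probably not the limiting factor.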

One obvious way of validating this would be to spawn 20 Tomcat (or Jetty)
instances, one per shard and 10 per server. Hopefully there is an easier
way ;)

By the way: Ubuntu, GC, etc. are all tuned and shouldn't be a bottleneck
here. GC only accounts for about 50-100 ms over a 10-minute load test, and
there is never a full GC.

Going through a jstack dump again, it looks like threads are actually blocked
waiting for a lock inside the HttpConnectionManager …

"pool-31-thread-15776" prio=10 tid=0x00007ef544249000 nid=0x50be waiting for monitor entry [0x00007ef4d38fc000]
 java.lang.Thread.State: BLOCKED (on object monitor)
 at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
 - waiting to lock <0x00007f07dd6bfa70> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
 at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
 at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
 at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
 at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
….
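
To see how many threads are queued behind the same monitor, a dump can be summarised with a few lines of plain Java. This is only a sketch: the class name BlockedLockCounter and the method countWaiters are invented for illustration, and it assumes the HotSpot jstack text format shown above, where each blocked thread has a "- waiting to lock <address>" line.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockedLockCounter {
    // Matches the monitor address in lines such as:
    //   - waiting to lock <0x00007f07dd6bfa70> (a some.Class$Inner)
    private static final Pattern WAITING =
        Pattern.compile("waiting to lock\\s+<(0x[0-9a-fA-F]+)>");

    // Returns, per monitor address, the number of threads waiting to lock it.
    static Map<String, Integer> countWaiters(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = WAITING.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java BlockedLockCounter <jstack-dump-file>");
            return;
        }
        // Read the whole dump file and print one line per contended monitor.
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        countWaiters(dump).forEach((lock, n) ->
            System.out.println(lock + " waited on by " + n + " thread(s)"));
    }
}
```

If most of the pool-31 threads turn out to be waiting on the single ConnectionPool address, that would point to contention on the pool's monitor rather than to a shortage of connections.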

Fred.  


On Wednesday, 28 September 2011 at 17:48, Ken Krugler wrote:

> Hi Frederik,
>  
> I haven't directly run into this issue with Solr, but I have experienced 
> similar issues in a related context.
>  
> In my case, I had a custom webapp that made SolrJ requests and then generated 
> some aggregated/analyzed results.
>  
> During load testing, we ran into a few different issues...
>  
> 1. The load test software itself had an issue with scaling - I'm assuming 
> that's not the case for you, but I've seen it happen more than once.
>  
> E.g. there's a limit to max parallel connections in the client being used to 
> talk to Solr.
>  
> 2. We needed to tune up the SolrJ settings for the HttpConnectionManager
>  
> Under heavy load, this was running out of free connections.
>  
> Given you've got 20 shards, each request is going to spawn 20 HTTP 
> connections.
>  
> I don't know off the top of my head how solr.SearchHandler manages 
> connections (and whether it's possible to tune this), but from the stack 
> trace below it sure looks like you're blocked on getting free HTTP 
> connections.
>  
> 3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc.
>  
> There are lots of knobs to twiddle here, for better or worse.
>  
> -- Ken
>  
> On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:
>  
> > I just had a look at the thread-dump, pasting 3 examples here:
> >  
> >  
> > 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> > at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
> > at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
> > at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
> > at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
> > at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
> > at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
> > at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
> > at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
> > at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
> > at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
> > at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
> > at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> > at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
> > at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >  
> > 'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> > at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
> > at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
> > at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
> > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> > at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> > at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
> > at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> > at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
> > at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> > and  
> >  
> > 'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms
> > at sun.misc.Unsafe.park(Native Method)
> > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> > at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
> > at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
> > at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271)
> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
> > at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> > at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
> > at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> > at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> > at java.lang.Thread.run(Thread.java:662)
> >
> > On Wednesday, 28 September 2011 at 13:53, Frederik Kraus wrote:
> >  
> > >  
> > >  
> > > On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote:
> > >  
> > > > Hi Fred,
> > > >  
> > > > OK, that is strange behavior for the same queries.
> > > > Some more questions:
> > > > - Which Solr version?
> > >  
> > > 3.3 (might the NIOFSDirectory from 3.4 help?)
> > >  
> > > > - Are you indexing during your load test? (because of index rebuilds)
> > > nope
> > >  
> > > > - Do you replicate your index?
> > >  
> > > nope  
> > > >  
> > > > Regards
> > > > Vadim
> > > >  
> > > >  
> > > >  
> > > > 2011/9/28 Frederik Kraus <frederik.kr...@gmail.com>
> > > >  
> > > > > Hi Vadim,
> > > > >
> > > > > The thing is that the exact same queries that take longer during a
> > > > > load test perform just fine when executed at a lower request rate.
> > > > > The slow ones are also random, i.e. there is no pattern in the
> > > > > bad/slow queries.
> > > > >
> > > > > My first thought was some kind of contention and/or connection
> > > > > starvation in the internal shard communication?
> > > > >  
> > > > > Fred.
> > > > >  
> > > > >  
> > > > > On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:
> > > > >  
> > > > > > Hi Fred,
> > > > > > analyze the queries which take longer.
> > > > > > We monitor our queries and see QTime problems with queries that
> > > > > > are complex, with phrase queries, and with queries that contain
> > > > > > numbers or special characters.
> > > > > > In case you don't know it:
> > > > > > http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
> > > > > > Regards
> > > > > > Vadim
> > > > > >  
> > > > > >  
> > > > > > 2011/9/28 Frederik Kraus <frederik.kr...@gmail.com>
> > > > > >  
> > > > > > > Hi,
> > > > > > >  
> > > > > > >  
> > > > > > > I am experiencing a strange issue while running some load tests.
> > > > > > > Our setup:
> > > > > > >
> > > > > > > - 2 servers, each with 24 CPU cores and 130 GB of RAM
> > > > > > > - 10 shards per server (needed for response times), running in a
> > > > > > > single Tomcat instance
> > > > > > > - each query queries all 20 shards (distributed search)
> > > > > > >
> > > > > > > - each shard holds about 1.5 million documents (small shards are
> > > > > > > needed due to rather complex queries)
> > > > > > > - all caches are warmed / high cache hit rates (99%) etc.
> > > > > > >
> > > > > > >
> > > > > > > Now, for some reason, we cannot seem to fully utilize all the CPU
> > > > > > > power (there is no disk I/O). Beyond a certain point, increasing
> > > > > > > the number of concurrent users no longer increases CPU load;
> > > > > > > instead it decreases throughput and increases the response times
> > > > > > > of the individual queries.
> > > > > > >
> > > > > > > Also, 1-2% of the queries take significantly longer: the average
> > > > > > > is around 100 ms, while those 1-2% take 1.5 s or longer.
> > > > > > >  
> > > > > > > Any ideas are greatly appreciated :)
> > > > > > >  
> > > > > > > Fred.
>  
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr

