[I posted this yesterday on the lucene-user mailing list and was advised to post it here instead. Apologies for the cross-post.]
Hi,

I'm currently involved in a project migrating from Lucene 2.9.1 to Solr 1.4.0. During stress testing I ran into the following performance problem: while the actual search times in our shards (which are now running Solr) have not changed, the total time a query takes has increased dramatically. We do not modify the indexes during this performance test.

Our application sends Solr select queries concurrently to the 8 shards, using CommonsHttpSolrServer. I added some timing debug messages and found that line 416 of CommonsHttpSolrServer.java accounts for about 95% of the application's total search time:

    int statusCode = _httpClient.executeMethod(method);

Just to clarify: according to the access logs of the Solr shards, TTLB (time to last byte) for a query might be around 5 ms on all shards, but httpClient.executeMethod() for the same query can take much longer, say 50 ms. On average, queries take about 12 ms under light load and about 22 ms under heavy load.

Another route we tried was adding the "shards=shard1,shard2,…" parameter to the query instead of fanning out to the shards ourselves, but this doesn't seem to work because of an NPE in QueryComponent.returnFields(), line 553:

    if (returnScores && sdoc.score != null) {

where sdoc is null. I saw there is a null check on trunk, but since we're currently using Solr 1.4.0's ready-made WAR file, I didn't see an easy way around this. Note: we're using a custom query component which extends QueryComponent, but while debugging I saw nothing wrong with the results at this point in the code.

Our previous code used HTTP in a different manner: for each request, we created a new sun.net.www.protocol.http.HttpURLConnection and called its getInputStream() method. Under the same load as the new application, the old application does not encounter the delays mentioned above.
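A minimal sketch of that older per-request pattern, for illustration only. It is not the original application's code: the class and method names (PerRequestFetch, fetch, readFully, Timed) are hypothetical, and the timing wrapper is an assumption about how such a call could be measured from the client side.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: one HttpURLConnection object per request, timed end to end.
final class PerRequestFetch {

    /** Response body plus the wall-clock time the whole request took. */
    static final class Timed {
        final byte[] body;
        final long elapsedMillis;

        Timed(byte[] body, long elapsedMillis) {
            this.body = body;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Drains the stream to the end and returns everything that was read. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Opens a fresh connection object for the URL, reads the body, and times it. */
    static Timed fetch(String url) throws IOException {
        long start = System.nanoTime();
        // For http:// URLs the JDK returns an HttpURLConnection implementation.
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] body = readFully(in);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            return new Timed(body, elapsedMillis);
        }
    }
}
```

A call such as PerRequestFetch.fetch(shardSelectUrl), where shardSelectUrl is a placeholder for a shard's select URL, returns the body and the elapsed milliseconds as seen by the client. Note that the JDK's built-in HTTP handler keeps connections alive by default (system property http.keepAlive), so creating a new connection object per request does not necessarily open a new socket each time, provided the response is read to the end.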
Our current code initializes CommonsHttpSolrServer for each shard this way:

    MultiThreadedHttpConnectionManager httpConnectionManager = new MultiThreadedHttpConnectionManager();
    httpConnectionManager.getParams().setTcpNoDelay(true);
    httpConnectionManager.getParams().setMaxTotalConnections(1024);
    httpConnectionManager.getParams().setStaleCheckingEnabled(false);

    HttpClient httpClient = new HttpClient();
    HttpClientParams params = new HttpClientParams();
    params.setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
    params.setAuthenticationPreemptive(false);
    params.setContentCharset(StringConstants.UTF8);
    httpClient.setParams(params);
    httpClient.setHttpConnectionManager(httpConnectionManager);

and passes the new HttpClient to the Solr server:

    solrServer = new CommonsHttpSolrServer(coreUrl, httpClient);

We tried two different setups: one with a single MultiThreadedHttpConnectionManager and HttpClient shared by all the SolrServer instances, and the other with a new MultiThreadedHttpConnectionManager and HttpClient for each SolrServer. Both yielded similar performance results. We also tried giving setMaxTotalConnections() a much higher number of connections (1,000,000), which had no effect.

One last thing, to answer Lance's question (in the lucene-user thread) about whether this is an "apples to apples" comparison: yes. Our main goal in this project is to keep things as close to the previous version as possible. This way we can verify that behavior (both quality and performance) remains similar, release this version, and then move forward to improve things. Of course, there are some changes, but I believe we are indeed measuring the complete flow in both apps, and that both apps return the same fields via HTTP.

Would love to hear what you think about this.

TIA,
Ophir