Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

Ophir Adiv Wed, 04 Aug 2010 00:12:04 -0700

[I posted this yesterday on the lucene-user mailing list and was advised to
post it here instead. Apologies for the cross-post.]


Hi,

I'm currently working on a migration from Lucene 2.9.1 to Solr
1.4.0.
During stress testing, I ran into the following performance problem:
While actual search times in our shards (which are now running Solr) have
not changed, the total time it takes for a query has increased dramatically.
During this performance test, we of course do not modify the indexes.
Our application is sending Solr select queries concurrently to the 8 shards,
using CommonsHttpSolrServer.
I added some timing debug messages, and found that
CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
total search time:
int statusCode = _httpClient.executeMethod(method);

Just to clarify: according to the access logs of the Solr shards, TTLB (time
to last byte) for a query might be around 5 ms on all shards, while
httpClient.executeMethod() for the same query can take much longer, say 50 ms.
On average, queries take 12 ms under light load and around 22 ms under heavy
load.
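
To make the measurement concrete, here is a minimal sketch of how a single call can be timed on the client side with a monotonic clock. CallTimer and Timed are hypothetical names, not part of SolrJ or HttpClient, and the sleeping lambda only stands in for _httpClient.executeMethod(method).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of SolrJ): times one call with System.nanoTime(). */
class CallTimer {

    /** The value a call returned plus the wall-clock time it took. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the call once and records how long it took, in milliseconds. */
    static <T> Timed<T> time(Callable<T> call) {
        long start = System.nanoTime();
        try {
            T value = call.call();
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            return new Timed<T>(value, elapsed);
        } catch (Exception e) {
            // Wrapped so callers do not have to declare checked exceptions.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for _httpClient.executeMethod(method): sleeps, then returns a status code.
        Timed<Integer> t = time(() -> { Thread.sleep(20); return 200; });
        System.out.println("status=" + t.value + " elapsed=" + t.elapsedMillis + " ms");
    }
}
```

Comparing this client-side figure with the TTLB in the shard's access log shows how much time is spent outside the shard itself.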

Another route we tried was adding the "shards=shard1,shard2,…" parameter to
the query instead of fanning out to the shards ourselves, but this fails with
an NPE thrown at QueryComponent.returnFields(), line
553:
if (returnScores && sdoc.score != null) {

where sdoc is null. I saw there is a null check on trunk, but since we're
currently using Solr 1.4.0's ready-made WAR file, I didn't see an easy way
around this.
Note: we're using a custom query component which extends QueryComponent, but
debugging this, I saw nothing wrong with the results at this point in the
code.

Our previous code used HTTP in a different manner:
For each request, we created a new
sun.net.www.protocol.http.HttpURLConnection, and called its getInputStream()
method.
Under the same load as the new application, the old application does not
encounter the delays mentioned above.
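
For illustration, the per-request pattern described above might look roughly like the sketch below. It is not the old application's actual code: it uses the public java.net.HttpURLConnection API, and the class name PerRequestFetch, the local stub server, and the /select path are assumptions made only so the example runs on its own.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: one new HttpURLConnection object per request. */
class PerRequestFetch {

    /** Opens a fresh connection object for this request and reads the whole body. */
    static String fetch(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /** Starts a throwaway local server that stands in for a shard (assumption for the demo). */
    static HttpServer startStubShard(String body) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/select", exchange -> {
                byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, bytes.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(bytes);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer shard = startStubShard("<response/>");
        try {
            String url = "http://127.0.0.1:" + shard.getAddress().getPort() + "/select?q=test";
            System.out.println(fetch(url));
        } finally {
            shard.stop(0);
        }
    }
}
```

Note that even with a new connection object per request, the JDK may reuse the underlying socket through its built-in HTTP keep-alive support, so this pattern does not necessarily open a new TCP connection each time.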

Our current code is initializing CommonsHttpSolrServer for each shard this
way:
    MultiThreadedHttpConnectionManager httpConnectionManager = new MultiThreadedHttpConnectionManager();
    httpConnectionManager.getParams().setTcpNoDelay(true);
    httpConnectionManager.getParams().setMaxTotalConnections(1024);
    httpConnectionManager.getParams().setStaleCheckingEnabled(false);
    HttpClient httpClient = new HttpClient();
    HttpClientParams params = new HttpClientParams();
    params.setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
    params.setAuthenticationPreemptive(false);
    params.setContentCharset(StringConstants.UTF8);
    httpClient.setParams(params);
    httpClient.setHttpConnectionManager(httpConnectionManager);

and passing the new HttpClient to the Solr Server:
solrServer = new CommonsHttpSolrServer(coreUrl, httpClient);

We tried two different setups: one with a single
MultiThreadedHttpConnectionManager and HttpClient shared by all the SolrServer
instances, and the other with a new MultiThreadedHttpConnectionManager and
HttpClient for each SolrServer.
Both setups yielded similar performance results.
We also tried giving setMaxTotalConnections() a much larger value
(1,000,000), which had no effect.
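
For reference, Commons HttpClient 3.x enforces a per-host connection limit in addition to the total limit. It is set with HttpConnectionManagerParams.setDefaultMaxConnectionsPerHost(int) and defaults to 2, and the configuration above does not change it. Whether that explains the delays seen here has not been verified. The sketch below does not use HttpClient at all; it only models a capped pool with a Semaphore to show how a small cap makes concurrent callers queue. The class name and all numbers are assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a connection pool with a fixed cap; not HttpClient code. */
class PerHostLimitSketch {

    /**
     * Fires `requests` simulated calls at once through a pool of `maxConnections`
     * permits and returns the total wall-clock time in milliseconds.
     */
    static long run(int maxConnections, int requests, long requestMillis) {
        Semaphore pool = new Semaphore(maxConnections);
        ExecutorService executor = Executors.newFixedThreadPool(requests);
        CountDownLatch done = new CountDownLatch(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            executor.execute(() -> {
                try {
                    pool.acquire();                   // wait for a free "connection"
                    try {
                        Thread.sleep(requestMillis);  // simulated request time
                    } finally {
                        pool.release();               // hand the "connection" back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdown();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // 8 concurrent callers, 20 ms per simulated request.
        System.out.println("cap of 2: " + run(2, 8, 20) + " ms");
        System.out.println("cap of 8: " + run(8, 8, 20) + " ms");
    }
}
```

With a cap of 2, the eight callers have to go through in four rounds, so each caller's observed time includes waiting for a free slot even though every individual request is equally fast.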

One last thing - to answer Lance's question (in the lucene-user thread) about
whether this is an "apples to apples" comparison - yes, our main goal in this
project is to do things as close to the previous version as possible.
This way we can monitor that behavior (both quality and performance) remains
similar, release this version, and then move forward to improve things.
Of course, there are some changes, but I believe we are indeed measuring the
complete flow on both apps, and that both apps are returning the same fields
via HTTP.

Would love to hear what you think about this. TIA,
Ophir
