I've been testing Solr Cloud 6.1.0 with two servers, and getting somewhat disappointing query latency. I'm comparing the latency with the same tests, running DSE in place of Solr Cloud. It's surprising, because running the test just on my laptop (running a single instance of Solr), I get significantly better latency with Solr than with DSE.
Here's an overview of the test:

- Machine 1: ZooKeeper server.
- Machine 2: test driver, sending requests to the two test machines at a rate of 200/sec total (so each test machine is processing 100/sec).
- Machine 3:
  - 1 Solr Cloud instance, run with the <machine 1 address>:2181 arg.
  - Java app - for each request received:
    - Creates a SolrQuery object from the request data.
    - Uses SolrClient.query(<collection name>, <query>) to do a Solr search.
    - Creates SolrInputDocument and UpdateRequest objects, adds the doc to the update request, and calls UpdateRequest.process(<solr client>, <collection name>).
- Machine 4: duplicate of machine 3.

After seeing that Solr wasn't doing so well with two nodes compared to DSE, but that it had been faster in single-node tests on my laptop, I tried running the test with just machine 3: no Solr instance running on machine 4, and machine 2 sending all 200 reqs/sec to machine 3. The latency in this test was far better than a test with both machines using DSE, which in turn was better than a test with just one machine using DSE. This gives me hope.

To summarize the results, the latency I'm getting, in order from fastest to slowest:

1) 1 node, Solr
2) 2 nodes, DSE
3) 1 node, DSE
4) 2 nodes, Solr

In theory, shouldn't 2 nodes running Solr be the fastest? What could make adding a second node cause performance to decrease instead of increase as I'd expect?

Relevant info: I'm using a single Solr collection. When running Solr with just one node, I create the collection with 1 shard; when running with both nodes, I create it with 2 shards. In both cases, I use replication factor = 1.

I'm using SolrJ 5.4.1, because it's the latest version of the library that I've gotten working with both DSE and Solr Cloud, and, assuming I can get Cloud to perform at least as well as DSE, I'll eventually need to talk to both at once from within a single Java app.
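In SolrJ terms, each request is handled roughly like this (the collection name and field names here are made-up placeholders, not the real schema):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class PerRequestFlow {
    static final String COLLECTION = "testcoll"; // placeholder collection name

    // Build the search query from the request data
    static SolrQuery buildQuery(String searchString, int filterValue) {
        SolrQuery q = new SolrQuery(searchString);
        q.addFilterQuery("someIntField:(" + filterValue + ")"); // placeholder field
        return q;
    }

    // Build the update for the same request
    static UpdateRequest buildUpdate(String id, String text) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("text", text); // placeholder field
        UpdateRequest update = new UpdateRequest();
        update.add(doc);
        return update;
    }

    // Per request: one search followed by one document indexed.
    static void handleRequest(SolrClient client, String searchString, int filterValue)
            throws Exception {
        QueryResponse rsp = client.query(COLLECTION, buildQuery(searchString, filterValue));
        buildUpdate(java.util.UUID.randomUUID().toString(), searchString)
            .process(client, COLLECTION); // no explicit commit; autoSoftCommit handles visibility
    }
}
```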
With DSE, I use HttpSolrClient, giving it a URL of localhost so each Java app only talks to the DSE instance running on the same machine. But with newer versions of SolrJ, I'd get strange "String cannot be cast to <Map or Long>" errors on the server side when a string field value was of one or more seemingly arbitrary lengths... I could literally add a blank space to the end of a 542-char value and the error would go away... but that's neither here nor there, just background on why I'm not using the latest SolrJ lib.

With Solr Cloud, I started out using CloudSolrClient for the SolrClient objects, but have since switched to HttpSolrClient to force each Java app to talk directly to its local Solr instance. That produced a minor latency improvement, but nothing significant. If I can get it so adding nodes actually improves performance, I'll go back to CloudSolrClient, because the convenience of having built-in support for handling failed nodes is pretty great.

Here is the code to create the HttpSolrClient:

PoolingHttpClientConnectionManager connectionManager =
    new PoolingHttpClientConnectionManager();
connectionManager.setDefaultMaxPerRoute(200);
connectionManager.setMaxTotal(5000);
RequestConfig reqConfig = RequestConfig.custom()
    .setConnectTimeout(1000)
    .setSocketTimeout(1000)
    .build();
HttpClient httpClient = HttpClients.custom()
    .setDefaultRequestConfig(reqConfig)
    .setConnectionManager(connectionManager)
    .build();
HttpSolrClient writeClient =
    new HttpSolrClient("http://localhost:8983/solr", httpClient);

And here is the code I was using to create the CloudSolrClient:

PoolingHttpClientConnectionManager connectionManager =
    new PoolingHttpClientConnectionManager();
connectionManager.setDefaultMaxPerRoute(200);
connectionManager.setMaxTotal(5000);
RequestConfig reqConfig = RequestConfig.custom()
    .setConnectTimeout(1000)
    .setSocketTimeout(1000)
    .build();
HttpClient httpClient = HttpClients.custom()
    .setDefaultRequestConfig(reqConfig)
    .setConnectionManager(connectionManager)
    .build();
CloudSolrClient writeClient =
    new CloudSolrClient("localhost:2181/solr", httpClient);

I only have a single client instance in the Java app, shared amongst a request handling threadpool, because I'm assuming it's threadsafe. Is that correct? It's worked fine for DSE, so perhaps that's a dumb question.

The schema in both DSE and Solr tests is identical, and the solrconfig is as close as I can get them given a small number of different settings available.

Here's my complete solrconfig.xml for the Solr Cloud collection:

------------------------------------------------------------
<config>
  <luceneMatchVersion>6.1.0</luceneMatchVersion>
  <dataDir>${solr.data.dir:}</dataDir>
  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
  </directoryFactory>
  <codecFactory class="solr.SchemaCodecFactory"/>
  <indexConfig>
    <lockType>${solr.lock.type:native}</lockType>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>2048</ramBufferSizeMB>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream>false</infoStream>
  </indexConfig>
  <jmx/>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="256" autowarmCount="256"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="256" autowarmCount="256"/>
    <documentCache class="solr.LRUCache" size="1024" initialSize="1024" autowarmCount="0"/>
    <cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0"
           autowarmCount="10" regenerator="solr.NoOpRegenerator"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>20</queryResultWindowSize>
    <queryResultMaxDocsCached>30</queryResultMaxDocsCached>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries"/>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries"/>
    </listener>
    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="true"
                    multipartUploadLimitInKB="2048000"
                    formdataUploadLimitInKB="2048"
                    addHttpRequestToContext="false"/>
    <httpCaching never304="true"/>
  </requestDispatcher>
  <requestHandler class="solr.SearchHandler" default="true" name="search">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">explicit</str>
      <int name="rows">3</int>
      <int name="timeAllowed">750</int>
    </lst>
  </requestHandler>
  <requestHandler class="solr.UpdateRequestHandler" name="/update"/>
  <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell">
    <lst name="defaults">
      <str name="df">text</str>
    </lst>
  </initParams>
</config>
------------------------------------------------------------

The Solr queries have the following params:

rows=30&qt=search&dismax=true&mm=<computed for each req>&fl=<field names>&fq=<field name>:(<integer>)&q=<search string>

There are no explicit commits happening.

Any help is greatly appreciated, and let me know if there's any additional info that would be helpful.

-Brent

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Higher-search-latency-with-two-nodes-vs-one-node-tp4295894.html
Sent from the Solr - User mailing list archive at Nabble.com.
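P.S. For reference, the query param list above is what I get from building the SolrQuery roughly like this (the mm value, field names, and filter value shown here are made-up placeholders):

```java
import org.apache.solr.client.solrj.SolrQuery;

public class QueryParams {
    // Builds a query matching the param list in the post; the mm value,
    // field names, and filter value are placeholders, not the real schema.
    static SolrQuery build(String searchString, String mm, int filterValue) {
        SolrQuery q = new SolrQuery(searchString);              // q=<search string>
        q.setRows(30);                                          // rows=30
        q.set("qt", "search");                                  // qt=search (the default SearchHandler)
        q.set("dismax", "true");                                // dismax=true
        q.set("mm", mm);                                        // mm=<computed for each req>
        q.setFields("fieldA", "fieldB");                        // fl=<field names>
        q.addFilterQuery("someIntField:(" + filterValue + ")"); // fq=<field name>:(<integer>)
        return q;
    }
}
```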