I've been testing Solr Cloud 6.1.0 with two servers, and getting somewhat disappointing query latency. I'm comparing the latency with the same tests, running DSE in place of Solr Cloud. It's surprising, because running the test just on my laptop (running a single instance of Solr), I get significantly better latency with Solr than with DSE.
Here's an overview of the test:

- Machine 1: ZooKeeper server.
- Machine 2: test driver, sending requests to the two test machines at a rate of 200/sec total (so each test machine is processing 100/sec).
- Machine 3:
  - 1 Solr Cloud instance, run with the <machine 1 address>:2181 arg.
  - Java app - for each request received:
    - Creates a SolrQuery object from the request data.
    - Uses SolrClient.query(<collection name>, <query>) to do a Solr search.
    - Creates SolrInputDocument and UpdateRequest objects, adds the doc to the update request, and calls UpdateRequest.process(<solr client>, <collection name>).
- Machine 4: duplicate of machine 3.

After seeing that Solr wasn't doing so well with two nodes compared to DSE, but that it had been faster in single-node tests on my laptop, I tried running the test with just machine 3: no Solr instance running on machine 4, and machine 2 sending all 200 reqs/sec to machine 3. The latency in this test was far better than a test with both machines using DSE, which in turn was better than a test with just one machine using DSE. This gives me hope.

To summarize the results, the latency I'm getting, in order from fastest to slowest:

1) 1 node, Solr
2) 2 nodes, DSE
3) 1 node, DSE
4) 2 nodes, Solr

In theory, shouldn't 2 nodes running Solr be the fastest? What could make adding a second node cause performance to decrease instead of increase as I'd expect?

Relevant info: I'm using a single Solr collection. When running Solr with just one node, I create the collection with 1 shard; when running with both nodes, I create it with 2 shards. In both cases, I use replication factor = 1.

I'm using SolrJ 5.4.1, because it's the latest version of the library that I've gotten working with both DSE and Solr Cloud, and, assuming I can get Cloud to perform at least as well as DSE, I'll eventually need to talk to both at once from within a single Java app.
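In SolrJ terms, each request is handled roughly like this (the collection name and field names here are made-up placeholders, not the real schema):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class PerRequestFlow {
    static final String COLLECTION = "testcoll"; // placeholder collection name

    // Build the search query from the request data
    static SolrQuery buildQuery(String searchString, int filterValue) {
        SolrQuery q = new SolrQuery(searchString);
        q.addFilterQuery("someIntField:(" + filterValue + ")"); // placeholder field
        return q;
    }

    // Build the update for the same request
    static UpdateRequest buildUpdate(String id, String text) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("text", text); // placeholder field
        UpdateRequest update = new UpdateRequest();
        update.add(doc);
        return update;
    }

    // Per request: one search followed by one document indexed.
    static void handleRequest(SolrClient client, String searchString, int filterValue)
            throws Exception {
        QueryResponse rsp = client.query(COLLECTION, buildQuery(searchString, filterValue));
        buildUpdate(java.util.UUID.randomUUID().toString(), searchString)
            .process(client, COLLECTION); // no explicit commit; autoSoftCommit handles visibility
    }
}
```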
With DSE, I use HttpSolrClient, giving it a URL of localhost so each Java app only talks to the DSE instance running on the same machine. But with newer versions of SolrJ, I'd get strange "String cannot be cast to <Map or Long>" errors on the server side when a string field value was of one or more seemingly arbitrary lengths... I could literally add a blank space to the end of a 542-char value and the error would go away... but that's neither here nor there, just background on why I'm not using the latest SolrJ lib.

With Solr Cloud, I started out using CloudSolrClient for the SolrClient objects, but have since switched to HttpSolrClient to force each Java app to talk directly to its local Solr instance. That produced a minor latency improvement, but nothing significant. If I can get it so adding nodes actually improves performance, I'll go back to CloudSolrClient, because the convenience of having built-in support for handling failed nodes is pretty great.

Here is the code to create the HttpSolrClient:

PoolingHttpClientConnectionManager connectionManager =
    new PoolingHttpClientConnectionManager();
connectionManager.setDefaultMaxPerRoute(200);
connectionManager.setMaxTotal(5000);
RequestConfig reqConfig = RequestConfig.custom()
    .setConnectTimeout(1000)
    .setSocketTimeout(1000)
    .build();
HttpClient httpClient = HttpClients.custom()
    .setDefaultRequestConfig(reqConfig)
    .setConnectionManager(connectionManager)
    .build();
HttpSolrClient writeClient =
    new HttpSolrClient("http://localhost:8983/solr", httpClient);

And here is the code I was using to create the CloudSolrClient:

PoolingHttpClientConnectionManager connectionManager =
    new PoolingHttpClientConnectionManager();
connectionManager.setDefaultMaxPerRoute(200);
connectionManager.setMaxTotal(5000);
RequestConfig reqConfig = RequestConfig.custom()
    .setConnectTimeout(1000)
    .setSocketTimeout(1000)
    .build();
HttpClient httpClient = HttpClients.custom()
    .setDefaultRequestConfig(reqConfig)
    .setConnectionManager(connectionManager)
    .build();
CloudSolrClient writeClient =
    new CloudSolrClient("localhost:2181/solr", httpClient);

I only have a single client instance in the Java app, shared amongst a request handling threadpool, because I'm assuming it's threadsafe. Is that correct? It's worked fine for DSE, so perhaps that's a dumb question.

The schema in both DSE and Solr tests is identical, and the solrconfig is as close as I can get them given a small number of different settings available.

Here's my complete solrconfig.xml for the Solr Cloud collection:

------------------------------------------------------------
<config>
  <luceneMatchVersion>6.1.0</luceneMatchVersion>
  <dataDir>${solr.data.dir:}</dataDir>
  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
  </directoryFactory>
  <codecFactory class="solr.SchemaCodecFactory"/>
  <indexConfig>
    <lockType>${solr.lock.type:native}</lockType>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>2048</ramBufferSizeMB>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream>false</infoStream>
  </indexConfig>
  <jmx/>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="256" autowarmCount="256"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="256" autowarmCount="256"/>
    <documentCache class="solr.LRUCache" size="1024" initialSize="1024" autowarmCount="0"/>
    <cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0"
           autowarmCount="10" regenerator="solr.NoOpRegenerator"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>20</queryResultWindowSize>
    <queryResultMaxDocsCached>30</queryResultMaxDocsCached>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries"/>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries"/>
    </listener>
    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="true"
                    multipartUploadLimitInKB="2048000"
                    formdataUploadLimitInKB="2048"
                    addHttpRequestToContext="false"/>
    <httpCaching never304="true"/>
  </requestDispatcher>
  <requestHandler class="solr.SearchHandler" default="true" name="search">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">explicit</str>
      <int name="rows">3</int>
      <int name="timeAllowed">750</int>
    </lst>
  </requestHandler>
  <requestHandler class="solr.UpdateRequestHandler" name="/update"/>
  <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell">
    <lst name="defaults">
      <str name="df">text</str>
    </lst>
  </initParams>
</config>
------------------------------------------------------------

The Solr queries have the following params:

rows=30&qt=search&dismax=true&mm=<computed for each req>&fl=<field names>&fq=<field name>:(<integer>)&q=<search string>

There are no explicit commits happening.

Any help is greatly appreciated, and let me know if there's any additional info that would be helpful.

-Brent

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Higher-search-latency-with-two-nodes-vs-one-node-tp4295894.html
Sent from the Solr - User mailing list archive at Nabble.com.
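P.S. For reference, the query param list above is what I get from building the SolrQuery roughly like this (the mm value, field names, and filter value shown here are made-up placeholders):

```java
import org.apache.solr.client.solrj.SolrQuery;

public class QueryParams {
    // Builds a query matching the param list in the post; the mm value,
    // field names, and filter value are placeholders, not the real schema.
    static SolrQuery build(String searchString, String mm, int filterValue) {
        SolrQuery q = new SolrQuery(searchString);              // q=<search string>
        q.setRows(30);                                          // rows=30
        q.set("qt", "search");                                  // qt=search (the default SearchHandler)
        q.set("dismax", "true");                                // dismax=true
        q.set("mm", mm);                                        // mm=<computed for each req>
        q.setFields("fieldA", "fieldB");                        // fl=<field names>
        q.addFilterQuery("someIntField:(" + filterValue + ")"); // fq=<field name>:(<integer>)
        return q;
    }
}
```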