Hi Solr-Team,

I am familiarizing myself with SolrCloud and am trying out and comparing different processing setups.

Short story: a terms query run against a single shard returns higher counts than the same query run against the complete index. I wonder why.

Long story: I grabbed version 7.2.1 of Solr, created a 4-node setup with replication factor 2 on Windows using [1], restarted the setup with 2GB for each node [2], indexed the HTML docs from the German Wikipedia archive [3], and obtained the top 10 terms for the whole collection vs. one specific shard:

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":5287},
  "terms":{
    "text":[
      "8",670564,
      "application",670564,
      "articles",670564,
      "charset",670564,
      "de",670564,
      "f",670564,
      "utf",670564,
      "wiki",670564,
      "xhtml",670564,
      "xml",670564]}}

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json&shards=localhost:9999/solr/de_wiki_all_shard1_replica_n1&shards.qt=de_wiki_all_shard1_replica_n1

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":20274},
  "terms":{
    "text":{
      "8":671396,
      "application":671396,
      "articles":671396,
      "charset":671396,
      "de":671396,
      "f":671396,
      "utf":671396,
      "wiki":671396,
      "xhtml":671396,
      "xml":671396}}}

This reveals:
(1) querying one shard takes 20 secs vs. 5 secs for the whole index;
(2) the counts for one shard are higher than for the whole index;
(3) the F: drive is a Samsung SSD 850 EVO 4TB (CrystalDiskMark shows ~500MB/s sequential and ~300MB/s random read/write), CPU: i7-6400 @3.4GHz. During the 20 secs the query runs, the Java process pushes neither the CPU nor the SSD to its limits. What is the bottleneck in this computation?
(4) the output format is slightly different (compare ',' vs ':' and a flat array vs. a map). I wonder why.

The findings are a bit counterintuitive. Could you comment on those?

Cheers,
Arturas

References:

[1] Create cluster

F:\solr_server\solr-7.2.1>bin\solr.cmd start -e cloud

Welcome to the SolrCloud example!

This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster?
(specify 1-4 nodes) [2]: 4
Ok, let's start up 4 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]: 9999
Please enter the port for node2 [7574]: 9998
Please enter the port for node3 [8984]: 9997
Please enter the port for node4 [7575]: 9996
Creating Solr home directory F:\solr_server\solr-7.2.1\example\cloud\node1\solr
Cloning F:\solr_server\solr-7.2.1\example\cloud\node1 into
   F:\solr_server\solr-7.2.1\example\cloud\node2
Cloning F:\solr_server\solr-7.2.1\example\cloud\node1 into
   F:\solr_server\solr-7.2.1\example\cloud\node3
Cloning F:\solr_server\solr-7.2.1\example\cloud\node1 into
   F:\solr_server\solr-7.2.1\example\cloud\node4

Starting up Solr on port 9999 using command:
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -cloud -p 9999 -s "F:\solr_server\solr-7.2.1\example\cloud\node1\solr"

Waiting up to 30 to see Solr running on port 9999
Started Solr server on port 9999. Happy searching!

Starting up Solr on port 9998 using command:
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -cloud -p 9998 -s "F:\solr_server\solr-7.2.1\example\cloud\node2\solr" -z localhost:10999

Waiting up to 30 to see Solr running on port 9998

Starting up Solr on port 9997 using command:
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -cloud -p 9997 -s "F:\solr_server\solr-7.2.1\example\cloud\node3\solr" -z localhost:10999

Started Solr server on port 9998. Happy searching!
Waiting up to 30 to see Solr running on port 9997

Starting up Solr on port 9996 using command:
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -cloud -p 9996 -s "F:\solr_server\solr-7.2.1\example\cloud\node4\solr" -z localhost:10999

Started Solr server on port 9997. Happy searching!
Waiting up to 30 to see Solr running on port 9996
Started Solr server on port 9996. Happy searching!
INFO  - 2018-06-21 15:38:16.239; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:10999 ready

Now let's create a new collection for indexing documents in your 4-node cluster.
Please provide a name for your new collection: [gettingstarted]
de_wiki_all
How many shards would you like to split de_wiki_all into? [2]
4
How many replicas per shard would you like to create? [2]
2
Please choose a configuration for the de_wiki_all collection, available options are:
_default or sample_techproducts_configs [_default]
sample_techproducts_configs
Created collection 'de_wiki_all' with 4 shard(s), 2 replica(s) with config-set 'de_wiki_all'

Enabling auto soft-commits with maxTime 3 secs using the Config API

POSTing request to Config API: http://localhost:9999/solr/de_wiki_all/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000

SolrCloud example running, please visit: http://localhost:9999/solr

F:\solr_server\solr-7.2.1>

[2] Restart with 2GB

"F:\solr_server\solr-7.2.1\bin\solr.cmd" stop -all
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 2g -cloud -p 9999 -s "F:\solr_server\solr-7.2.1\example\cloud\node1\solr"
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 2g -cloud -p 9998 -s "F:\solr_server\solr-7.2.1\example\cloud\node2\solr" -z localhost:10999
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 2g -cloud -p 9997 -s "F:\solr_server\solr-7.2.1\example\cloud\node3\solr" -z localhost:10999
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 2g -cloud -p 9996 -s "F:\solr_server\solr-7.2.1\example\cloud\node4\solr" -z localhost:10999

[3] Insert wikipedia files

java -jar -Durl=http://localhost:9999/solr/de_wiki_all/update -Dauto -Drecursive example\exampledocs\post.jar f:\wiki\de\articles\*

2681612 files indexed.
COMMITting Solr index changes to http://localhost:9999/solr/de_wiki_all/update...
Time spent: 12:32:53.843
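
To look at findings (2) and (4) side by side, here is a minimal Python sketch that reads both response shapes from the terms queries in this post (the flat array from the whole-collection query and the map from the single-shard query) into the same dictionary form and subtracts the counts. The helper names `term_counts` and `count_differences` are hypothetical and not part of Solr. The sample payloads are cut down to two terms from the responses quoted in this post, and the sketch assumes those two shapes are the only ones that occur.

```python
import json


def term_counts(response, field):
    """Return {term: count} for one field of a /terms JSON response.

    Handles both shapes seen in this post: a flat array
    [term, count, term, count, ...] and a {term: count} map.
    """
    raw = response["terms"][field]
    if isinstance(raw, dict):
        return dict(raw)
    # Flat array: even positions hold terms, odd positions hold counts.
    return dict(zip(raw[0::2], raw[1::2]))


def count_differences(whole, shard):
    """Per-term difference: single-shard count minus whole-collection count."""
    return {term: shard[term] - whole[term] for term in whole if term in shard}


# Payloads trimmed from the two responses quoted in this post.
whole_json = '{"terms": {"text": ["8", 670564, "de", 670564]}}'
shard_json = '{"terms": {"text": {"8": 671396, "de": 671396}}}'

whole = term_counts(json.loads(whole_json), "text")
shard = term_counts(json.loads(shard_json), "text")
print(count_differences(whole, shard))  # {'8': 832, 'de': 832}
```

With the numbers above, every listed term differs by the same amount, 832, between the two queries.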