Hi Solr-Team,

I am familiarizing myself with SolrCloud and am trying out and comparing
different processing setups. Short story: a terms query run against a single
shard returns higher counts (and takes longer) than the same query against
the complete index. I wonder why.



Long story:

I grabbed the 7.2.1 version of Solr, created a 4-node setup with 4 shards and
replication factor 2 on Windows using [1], restarted the setup with a 2 GB
heap per node [2], inserted the HTML docs from the German Wikipedia
archive [3], and obtained the top 10 terms for the whole collection vs. one
specific shard:

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":5287},
  "terms":{
    "text":[
      "8",670564,
      "application",670564,
      "articles",670564,
      "charset",670564,
      "de",670564,
      "f",670564,
      "utf",670564,
      "wiki",670564,
      "xhtml",670564,
      "xml",670564]}}

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json&shards=localhost:9999/solr/de_wiki_all_shard1_replica_n1&shards.qt=de_wiki_all_shard1_replica_n1

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":20274},
  "terms":{
    "text":{
      "8":671396,
      "application":671396,
      "articles":671396,
      "charset":671396,
      "de":671396,
      "f":671396,
      "utf":671396,
      "wiki":671396,
      "xhtml":671396,
      "xml":671396}}}

reveals:
(1) querying one shard takes ~20 secs vs ~5 secs for the whole index

(2) the counts for one shard are higher than for the whole index (a
direct-core cross-check is sketched further below)

(3) the F: drive is a Samsung SSD 850 EVO 4 TB (CrystalDiskMark shows
~500 MB/s sequential and ~300 MB/s random reads/writes); CPU: i7-6400 @ 3.4 GHz.
During the 20-second query the Java process is pushed to its limits neither
on the CPU nor on the SSD side. What is the bottleneck in this computation?

(4) the output format differs slightly: the whole-collection response is a
flat JSON array of alternating terms and counts, while the shard response is
an object mapping terms to counts. I wonder why (a guess follows below).
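
My guess for (4): the flat ["term",count,...] array looks like the default
JSON rendering of Solr's NamedList, which the json.nl parameter of the JSON
response writer controls. If that is the cause (untested guess on my side),
adding json.nl=map to the whole-collection query should render it as a
term-to-count object like the shard response:

http://localhost:9999/solr/de_wiki_all/terms?terms.limit=10&terms.fl=text&wt=json&json.nl=map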

The findings are a bit counterintuitive. Could you comment on them?
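
To help narrow down (2), a cross-check I can still run is to query one core
directly via its own path with distrib=false, a standard parameter that
should keep the request from fanning out to other shards (this assumes the
shard1 replica from the second query above lives on the 9999 node; otherwise
the owning node's port from the Admin UI would be needed):

http://localhost:9999/solr/de_wiki_all_shard1_replica_n1/terms?terms.limit=10&terms.fl=text&wt=json&distrib=false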

Cheers,
Arturas



References:

[1] Create cluster

F:\solr_server\solr-7.2.1>bin\solr.cmd start -e cloud

Welcome to the SolrCloud example!

This interactive session will help you launch a SolrCloud cluster on
your local workstation.
To begin, how many Solr nodes would you like to run in your local
cluster? (specify 1-4 nodes) [2]:
4
Ok, let's start up 4 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
9999
Please enter the port for node2 [7574]:
9998
Please enter the port for node3 [8984]:
9997
Please enter the port for node4 [7575]:
9996
Creating Solr home directory F:\solr_server\solr-7.2.1\example\cloud\node1\solr
Cloning F:\solr_server\solr-7.2.1\example\cloud\node1 into
   F:\solr_server\solr-7.2.1\example\cloud\node2
Cloning F:\solr_server\solr-7.2.1\example\cloud\node1 into
   F:\solr_server\solr-7.2.1\example\cloud\node3
Cloning F:\solr_server\solr-7.2.1\example\cloud\node1 into
   F:\solr_server\solr-7.2.1\example\cloud\node4

Starting up Solr on port 9999 using command:
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -cloud -p 9999 -s
"F:\solr_server\solr-7.2.1\example\cloud\node1\solr"

Waiting up to 30 to see Solr running on port 9999
Started Solr server on port 9999. Happy searching!

Starting up Solr on port 9998 using command:
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -cloud -p 9998 -s
"F:\solr_server\solr-7.2.1\example\cloud\node2\solr" -z
localhost:10999

Waiting up to 30 to see Solr running on port 9998

Starting up Solr on port 9997 using command:
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -cloud -p 9997 -s
"F:\solr_server\solr-7.2.1\example\cloud\node3\solr" -z
localhost:10999

Started Solr server on port 9998. Happy searching!
Waiting up to 30 to see Solr running on port 9997

Starting up Solr on port 9996 using command:
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -cloud -p 9996 -s
"F:\solr_server\solr-7.2.1\example\cloud\node4\solr" -z
localhost:10999

Started Solr server on port 9997. Happy searching!
Waiting up to 30 to see Solr running on port 9996
Started Solr server on port 9996. Happy searching!
INFO  - 2018-06-21 15:38:16.239;
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider;
Cluster at localhost:10999 ready

Now let's create a new collection for indexing documents in your 4-node cluster.
Please provide a name for your new collection: [gettingstarted]
de_wiki_all
How many shards would you like to split de_wiki_all into? [2]
4
How many replicas per shard would you like to create? [2]
2
Please choose a configuration for the de_wiki_all collection,
available options are:
_default or sample_techproducts_configs [_default]
sample_techproducts_configs
Created collection 'de_wiki_all' with 4 shard(s), 2 replica(s) with
config-set 'de_wiki_all'

Enabling auto soft-commits with maxTime 3 secs using the Config API

POSTing request to Config API: http://localhost:9999/solr/de_wiki_all/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000


SolrCloud example running, please visit: http://localhost:9999/solr


F:\solr_server\solr-7.2.1>
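
Side note: the same soft-commit setting can presumably also be applied by
hand via the Config API; the payload below is exactly what the script POSTed
above (assuming curl is available on the box):

curl http://localhost:9999/solr/de_wiki_all/config -H "Content-Type: application/json" -d "{\"set-property\":{\"updateHandler.autoSoftCommit.maxTime\":\"3000\"}}"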





[2] Restart with 2 GB heap

"F:\solr_server\solr-7.2.1\bin\solr.cmd" stop -all

"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 2g -cloud -p 9999 -s
"F:\solr_server\solr-7.2.1\example\cloud\node1\solr"
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 2g -cloud -p 9998 -s
"F:\solr_server\solr-7.2.1\example\cloud\node2\solr" -z
localhost:10999
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 2g -cloud -p 9997 -s
"F:\solr_server\solr-7.2.1\example\cloud\node3\solr" -z
localhost:10999
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 2g -cloud -p 9996 -s
"F:\solr_server\solr-7.2.1\example\cloud\node4\solr" -z
localhost:10999
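
To verify that all four nodes actually came back up with the 2 GB heap, the
status command prints JVM memory usage per local node (exact output may vary):

"F:\solr_server\solr-7.2.1\bin\solr.cmd" status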




[3] Insert Wikipedia files

java -Durl=http://localhost:9999/solr/de_wiki_all/update -Dauto
-Drecursive -jar example\exampledocs\post.jar f:\wiki\de\articles\*

2681612 files indexed.
COMMITting Solr index changes to
http://localhost:9999/solr/de_wiki_all/update...
Time spent: 12:32:53.843
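
As a sanity check that the docs landed in the collection, numFound from a
rows=0 match-all query should roughly agree with the file count above:

http://localhost:9999/solr/de_wiki_all/select?q=*:*&rows=0&wt=json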
