Hi,

I'm building out a multi-shard Solr collection, as the index size is likely to grow fast. I was testing the setup with 2 shards on 2 nodes using test data, and indexed a few documents with "id" as the unique key.

Collection create command:

/solr/admin/collections?action=CREATE&name=multishard&numShards=2

Command used to upload:

curl 'http://server/solr/multishard/update/json?commitWithin=2000' --data-binary @data.json -H 'Content-type:application/json'

data.json:

[ { "id": "10000000000161200" }, { "id": "10000000000161384" } ]

When I query one of the nodes with an id constraint, I see the query executed on both shards, which looks inefficient: QTime increased to double digits. I would have guessed that Solr knows, based on the id, which shard the data went to.

I have a few questions around this, as I could not find pertinent information on the user lists or in the documentation.

- The query is hitting all shards and replicas. If I have 3 shards and 5 replicas, how would performance be impacted, given that even this very simple case increased QTime to double digits?
- Could id lookup queries just go to one shard automatically?

Query and debug output:

/solr/multishard/select?q=id%3A10000000000161200&wt=json&indent=true&debugQuery=true

"QTime":13,
"debug":{
  "track":{
    "rid":"-multishard_shard1_replica1-1411605234897-171",
    "EXECUTE_QUERY":[
      "http://server1/solr/multishard_shard1_replica1/",[
        "QTime","1",
        "ElapsedTime","4",
        "RequestPurpose","GET_TOP_IDS",
        "NumFound","1",
        "Response","some resp"],
      "http://server2/solr/multishard_shard2_replica1/",[
        "QTime","1",
        "ElapsedTime","6",
        "RequestPurpose","GET_TOP_IDS",
        "NumFound","0",
        "Response","some"]],
    "GET_FIELDS":[
      "http://server1/solr/multishard_shard1_replica1/",[
        "QTime","0",
        "ElapsedTime","4",
        "RequestPurpose","GET_FIELDS,GET_DEBUG",
        "NumFound","1",

Thanks,
Susmit
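
For the second question, one thing that might be worth trying is the `_route_` request parameter, which SolrCloud accepts as a routing hint on queries. The sketch below only builds the query URL and sends nothing. It assumes the collection uses the default compositeId router and that passing the bare document id as `_route_` narrows the request to the shard that id hashes to; that behaviour may depend on the Solr version, so it should be confirmed by checking the `track` section of the debugQuery output. The helper name `build_routed_query` is hypothetical, and the host and collection are the ones from the post above.

```python
from urllib.parse import urlencode


def build_routed_query(base_url, collection, doc_id):
    """Build a /select URL that looks up one id and passes it as a routing hint."""
    params = {
        "q": f"id:{doc_id}",
        # Assumption: with the compositeId router, Solr hashes this value
        # to choose which shard should receive the query.
        "_route_": doc_id,
        "wt": "json",
        "indent": "true",
        # Keep debugQuery on so the "track" section shows which shards were hit.
        "debugQuery": "true",
    }
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


url = build_routed_query("http://server", "multishard", "10000000000161200")
print(url)
```

If the hint is honoured, the EXECUTE_QUERY list in the debug output should contain a single shard URL instead of two.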