Well, you've picked the absolute worst case for comparison. The "increase to double digits" is a constant overhead. In other words, say your query went from 5 ms to 20 ms: that 15 ms is pretty much the additional overhead no matter what the query is. This particular query just happens to be very fast in the first place.
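To put rough numbers on that, here is a quick back-of-the-envelope sketch. The 15 ms figure comes from the 5 ms to 20 ms example above; the 50 ms and 500 ms base times are made-up values for comparison, not measurements.

```python
# Illustrative only: a fixed distributed-query overhead added to queries
# of different base cost. OVERHEAD_MS is an assumption taken from the
# "5 ms -> 20 ms" example; the base times are hypothetical.
OVERHEAD_MS = 15

def slowdown(base_ms, overhead_ms=OVERHEAD_MS):
    """Return how many times slower a query gets once the overhead is added."""
    return (base_ms + overhead_ms) / base_ms

for base in (5, 50, 500):
    print(f"{base} ms -> {base + OVERHEAD_MS} ms ({slowdown(base):.2f}x)")
```

The same 15 ms looks like a 4x slowdown on a 5 ms id lookup but is barely noticeable on a query that already takes hundreds of milliseconds.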
As for queries going out to all the shards: well, they have to. The query processing cannot know ahead of time (except in this _very_ special case) which shards will generate hits. So the request is sent to one replica in each shard, which responds with its top N. The originating node then merges the sub-query results to get the IDs of the final top N, then sends a request to each shard hosting one of those top N for the data associated with the documents.

If you really need super-efficiency here, you could probably look at CloudSolrServer to get an idea of how to translate from ID to shard and just do direct requests with distrib=false.

Best,
Erick

On Wed, Sep 24, 2014 at 5:44 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote:

> Hi,
>
> I'm building out a multi shard solr collection as the index size is likely
> to grow fast.
> I was testing out the setup with 2 shards on 2 nodes with test data.
> Indexed few documents with "id" as the unique key.
> collection create command -
> /solr/admin/collections?action=CREATE&name=multishard&numShards=2
>
> used this command to upload -
> curl http://server/solr/multishard/update/json?commitWithin=2000 --data-binary @data.json -H 'Content-type:application/json'
>
> data.json -
> [
>   {
>     "id": "10000000000161200"
>   },
>   {
>     "id": "10000000000161384"
>   }
> ]
>
> when I query on one of the nodes with an id constraint, I see the query
> executed on both shards which looks inefficient - Qtime increased to double
> digits. I guess solr would know based on id which shard data went to.
>
> I have a few questions around this as I could not find pertinent
> information on user lists or documentation.
> - query is hitting all shards and replicas - if I have 3 shards and 5
> replicas, how would the performance be impacted since for the very simple
> case it increased to double digits?
> - Could id lookup queries just go to one shard automatically?
>
> /solr/multishard/select?q=id%3A10000000000161200&wt=json&indent=true&debugQuery=true
>
> "QTime":13,
>
> "debug":{
>   "track":{
>     "rid":"-multishard_shard1_replica1-1411605234897-171",
>     "EXECUTE_QUERY":[
>       "http://server1/solr/multishard_shard1_replica1/",[
>         "QTime","1",
>         "ElapsedTime","4",
>         "RequestPurpose","GET_TOP_IDS",
>         "NumFound","1",
>         "Response","some resp"],
>       "http://server2/solr/multishard_shard2_replica1/",[
>         "QTime","1",
>         "ElapsedTime","6",
>         "RequestPurpose","GET_TOP_IDS",
>         "NumFound","0",
>         "Response","some"]],
>     "GET_FIELDS":[
>       "http://server1/solr/multishard_shard1_replica1/",[
>         "QTime","0",
>         "ElapsedTime","4",
>         "RequestPurpose","GET_FIELDS,GET_DEBUG",
>         "NumFound","1",
>
>
> Thanks,
> Susmit
>
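
For reference, the two phases visible in the debug output (GET_TOP_IDS, then GET_FIELDS) can be mimicked with a toy model. This is only a sketch of the scatter-gather idea, not how Solr implements it: the shards are plain dicts, the function names and scores are invented for illustration, and placing the second id on shard2 is an assumption (the trace only shows that shard1 holds the first one).

```python
# Toy model of a two-phase distributed query. Shards are plain dicts,
# not real Solr cores; names, scores and doc placement are hypothetical.
import heapq

def get_top_ids(shard, predicate, n):
    """Phase 1 (GET_TOP_IDS): a shard returns its own top-n (score, id) pairs."""
    hits = [(doc["score"], doc_id)
            for doc_id, doc in shard.items() if predicate(doc_id, doc)]
    return heapq.nlargest(n, hits)

def distributed_query(shards, predicate, n):
    # Scatter: every shard is asked, because the coordinating node cannot
    # know in advance which shards will produce hits.
    candidates = []
    for name, shard in shards.items():
        for score, doc_id in get_top_ids(shard, predicate, n):
            candidates.append((score, doc_id, name))
    # Gather: merge the per-shard lists into the global top n.
    top = heapq.nlargest(n, candidates)
    # Phase 2 (GET_FIELDS): fetch stored fields only from shards owning a winner.
    return [dict(shards[name][doc_id], id=doc_id, shard=name)
            for _, doc_id, name in top]

shards = {
    "shard1": {"10000000000161200": {"score": 1.0}},
    "shard2": {"10000000000161384": {"score": 1.0}},  # placement assumed
}
result = distributed_query(
    shards, lambda doc_id, doc: doc_id == "10000000000161200", n=10)
print(result)
```

Both shards get the phase-1 request even though only shard1 has a hit, which matches the NumFound 1 / NumFound 0 pair in the trace. Skipping that fan-out is what a direct request with distrib=false against the owning core would buy you, provided you already know which shard holds the id.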