Re: Solr Cloud Default Document Routing

Erick Erickson Thu, 25 Sep 2014 07:29:05 -0700

Well, you've picked the absolute worst case for comparison. The
"increase to double digits" is a constant overhead. IOW, let's
say your query went from 5ms to 20ms. That 15ms is pretty much
the additional overhead no matter what the query is. This particular
query just happens to be very fast in the first place.
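To make the "constant overhead" point concrete, here is a toy model. It assumes, as the hypothetical above does, a fixed 15ms of distribution cost (fan-out, merge, second round trip) added on top of whatever the query costs on a single core. The numbers are illustrative, not measurements.

```python
# Illustrative only: 15 ms is the hypothetical figure from the reply, not a benchmark.
DISTRIB_OVERHEAD_MS = 15.0


def distributed_time(single_node_ms: float, overhead_ms: float = DISTRIB_OVERHEAD_MS) -> float:
    """Model a distributed query as single-core cost plus a fixed fan-out/merge cost."""
    return single_node_ms + overhead_ms


def slowdown(single_node_ms: float, overhead_ms: float = DISTRIB_OVERHEAD_MS) -> float:
    """Relative slowdown caused by the fixed overhead."""
    return distributed_time(single_node_ms, overhead_ms) / single_node_ms


for base in (5.0, 50.0, 500.0):
    print(f"{base:>6.0f} ms -> {distributed_time(base):>6.0f} ms ({slowdown(base):.2f}x)")
```

The same 15ms that quadruples a 5ms id lookup adds only about 3% to a 500ms query, which is why a trivial lookup is the worst case for comparison.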


As for queries going out to all the shards: well, they have to.
The query processing cannot know ahead of time (except in this
_very_ special case) which shards will generate hits. So the request
is sent to one replica in each shard, which responds with its
top N. The originating node then merges the sub-query results to get
the IDs of the final top N, then sends a request to each shard
hosting one of those top N documents to fetch their stored fields.
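The two-phase flow described above (it appears as GET_TOP_IDS and GET_FIELDS in the debug output quoted in this thread) can be sketched as a toy in-memory simulation. The shard dictionaries and function names are hypothetical, and Solr's real implementation differs in detail (sorting, paging, replica selection, failure handling).

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory shard: matching doc id -> (score, stored fields).
Shard = Dict[str, Tuple[float, dict]]


def get_top_ids(shard: Shard, n: int) -> List[Tuple[float, str]]:
    """Phase 1 (GET_TOP_IDS): a shard returns only (score, id) for its local top n."""
    return heapq.nlargest(n, ((score, doc_id) for doc_id, (score, _) in shard.items()))


def distributed_search(shards: Dict[str, Shard], n: int):
    """Return (global top-n results, number of shard requests issued)."""
    # Phase 1: ask one replica of every shard, since any shard might have hits.
    candidates = []
    for name, shard in shards.items():
        for score, doc_id in get_top_ids(shard, n):
            candidates.append((score, doc_id, name))
    top = heapq.nlargest(n, candidates)  # merge into the global top n

    # Phase 2 (GET_FIELDS): fetch stored fields only from shards owning a winner.
    owners: Dict[str, List[str]] = {}
    for _, doc_id, name in top:
        owners.setdefault(name, []).append(doc_id)
    fields = {doc_id: shards[name][doc_id][1]
              for name, ids in owners.items() for doc_id in ids}

    results = [(doc_id, score, fields[doc_id]) for score, doc_id, _ in top]
    return results, len(shards) + len(owners)


# Usage: q=id:10000000000161200 matches one doc on shard1 and nothing on shard2.
shards = {
    "shard1": {"10000000000161200": (1.0, {"id": "10000000000161200"})},
    "shard2": {},
}
results, requests = distributed_search(shards, n=10)
print(requests, results)  # requests == 3: two GET_TOP_IDS plus one GET_FIELDS
```

In this model the request count is the number of shards plus the number of shards holding a winner; adding replicas does not increase it, because only one replica per shard is asked.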

If you really need super-efficiency here, you could probably
look at CloudSolrServer (SolrJ) to get an idea of how to translate
from ID to shard, and just send requests directly to that shard with
distrib=false.
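Here is a rough sketch of that ID-to-shard translation. It assumes the default compositeId router with plain ids (no "!" prefix), which, to my understanding, hashes the UTF-8 bytes of the id with MurmurHash3 x86 32-bit (seed 0) and picks the shard whose hash range contains the result. The even split of the signed 32-bit hash space in `shard_ranges` is an assumption matching the usual layout of a freshly created collection; a real client should read each shard's range from the cluster state. `CORES` and all helper names are hypothetical.

```python
from urllib.parse import urlencode

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as a signed int (like a Java int)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]
    if tail:  # remaining 1-3 bytes
        k = _rotl32((int.from_bytes(tail, "little") * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    h ^= len(data)  # finalization mix (fmix32)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def shard_ranges(num_shards: int):
    """ASSUMPTION: equal contiguous ranges over the signed 32-bit hash space."""
    lo, hi = -2**31, 2**31 - 1
    step = (hi - lo) // num_shards
    ranges, start = [], lo
    for i in range(num_shards):
        end = hi if i == num_shards - 1 else start + step
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_for_id(doc_id: str, num_shards: int) -> str:
    h = murmur3_x86_32(doc_id.encode("utf-8"))
    for i, (start, end) in enumerate(shard_ranges(num_shards), start=1):
        if start <= h <= end:
            return f"shard{i}"
    raise ValueError("hash outside all ranges")


def direct_select_url(core_url: str, doc_id: str) -> str:
    """Non-distributed request against a single core (distrib=false)."""
    params = urlencode({"q": f"id:{doc_id}", "distrib": "false", "wt": "json"})
    return f"{core_url}/select?{params}"


# Hypothetical shard -> core mapping, taken from the core names in this thread's debug output.
CORES = {
    "shard1": "http://server1/solr/multishard_shard1_replica1",
    "shard2": "http://server2/solr/multishard_shard2_replica1",
}
doc_id = "10000000000161200"
print(direct_select_url(CORES[shard_for_id(doc_id, 2)], doc_id))
```

For anything beyond an experiment, letting CloudSolrServer do the routing is safer than hand-rolling it, since shard splits or a different router change the ranges.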

Best,
Erick


On Wed, Sep 24, 2014 at 5:44 PM, Susmit Shukla <shukla.sus...@gmail.com>
wrote:

> Hi,
>
> I'm building out a multi-shard Solr collection, as the index size is likely
> to grow fast.
> I was testing out the setup with 2 shards on 2 nodes with test data.
> Indexed a few documents with "id" as the unique key.
> Collection create command:
> /solr/admin/collections?action=CREATE&name=multishard&numShards=2
>
> Used this command to upload:
> curl 'http://server/solr/multishard/update/json?commitWithin=2000' \
>   --data-binary @data.json -H 'Content-type:application/json'
>
> data.json:
> [
>   {
>     "id": "10000000000161200"
>   },
>   {
>     "id": "10000000000161384"
>   }
> ]
>
> When I query one of the nodes with an id constraint, I see the query
> executed on both shards, which looks inefficient: QTime increased to double
> digits. I guess Solr would know, based on the id, which shard the data went to.
>
> I have a few questions around this, as I could not find pertinent
> information on the user lists or in the documentation.
> - The query is hitting all shards and replicas. If I have 3 shards and 5
> replicas, how would performance be impacted, since for this very simple
> case QTime increased to double digits?
> - Could id lookup queries just go to one shard automatically?
>
>
> /solr/multishard/select?q=id%3A10000000000161200&wt=json&indent=true&debugQuery=true
>
> "QTime":13,
>
>   "debug":{
>     "track":{
>       "rid":"-multishard_shard1_replica1-1411605234897-171",
>       "EXECUTE_QUERY":[
>         "http://server1/solr/multishard_shard1_replica1/",[
>           "QTime","1",
>           "ElapsedTime","4",
>           "RequestPurpose","GET_TOP_IDS",
>           "NumFound","1",
>           "Response","some resp"],
>         "http://server2/solr/multishard_shard2_replica1/",[
>           "QTime","1",
>           "ElapsedTime","6",
>           "RequestPurpose","GET_TOP_IDS",
>           "NumFound","0",
>           "Response","some"]],
>       "GET_FIELDS":[
>         "http://server1/solr/multishard_shard1_replica1/",[
>           "QTime","0",
>           "ElapsedTime","4",
>           "RequestPurpose","GET_FIELDS,GET_DEBUG",
>           "NumFound","1",
>
>
> Thanks,
> Susmit
>
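The `track` section of the debugQuery output above renders each named list as a flat `[key, value, key, value, ...]` array. A small helper (hypothetical, not part of any Solr client) can turn the EXECUTE_QUERY stage into a per-shard summary, which makes it easy to spot shards that were queried but returned NumFound 0. The sample is the trace from this thread with the truncated "Response" entries dropped.

```python
import json
from typing import Any, Dict, List


def flat_to_dict(flat: List[Any]) -> Dict[str, Any]:
    """Convert a flat NamedList rendering [k1, v1, k2, v2, ...] into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def summarize_stage(track: Dict[str, Any], stage: str) -> Dict[str, Dict[str, Any]]:
    """Per-shard-URL summary for one stage of the debug 'track' section."""
    summary = {}
    for url, stats in flat_to_dict(track.get(stage, [])).items():
        s = flat_to_dict(stats)
        summary[url] = {
            "purpose": s.get("RequestPurpose"),
            "numFound": int(s.get("NumFound", "0")),
            "elapsedMs": int(s.get("ElapsedTime", "0")),
        }
    return summary


# The EXECUTE_QUERY trace from this thread ("Response" entries dropped).
debug = json.loads("""
{"track": {"EXECUTE_QUERY": [
  "http://server1/solr/multishard_shard1_replica1/",
  ["QTime", "1", "ElapsedTime", "4", "RequestPurpose", "GET_TOP_IDS", "NumFound", "1"],
  "http://server2/solr/multishard_shard2_replica1/",
  ["QTime", "1", "ElapsedTime", "6", "RequestPurpose", "GET_TOP_IDS", "NumFound", "0"]
]}}
""")

for url, info in summarize_stage(debug["track"], "EXECUTE_QUERY").items():
    note = " (no hits)" if info["numFound"] == 0 else ""
    print(f"{url}: {info['purpose']}, {info['elapsedMs']} ms{note}")
```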