Hi,

I'm building out a multi-shard Solr collection because the index size is
likely to grow fast.
I was testing the setup with 2 shards on 2 nodes using test data.
I indexed a few documents with "id" as the unique key.
Collection create command -
/solr/admin/collections?action=CREATE&name=multishard&numShards=2

I used this command to upload -
curl 'http://server/solr/multishard/update/json?commitWithin=2000' --data-binary @data.json -H 'Content-type:application/json'

data.json -
[
  {
    "id": "10000000000161200"
  },
  {
    "id": "10000000000161384"
  }
]

When I query one of the nodes with an id constraint, I see the query
executed on both shards, which looks inefficient: QTime increased to double
digits. I assumed Solr would know, based on the id, which shard the document
went to.

I have a few questions around this, as I could not find pertinent
information on the user lists or in the documentation.
- The query is hitting all shards and replicas. If I have 3 shards and 5
replicas, how would performance be impacted, given that QTime already went
to double digits for this very simple case?
- Could id lookup queries go to just one shard automatically?
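
For comparison with the second question, a query can be pinned to one shard by hand with Solr's `shards` request parameter. Below is a minimal sketch that only builds such a URL. The host `server`, the helper name `build_select_url` and the choice of `shard1` are placeholders, and it assumes I already know which shard holds the document, which is exactly the part I would like Solr to work out on its own.

```python
from urllib.parse import urlencode

# Placeholder base URL; "server" and "multishard" are taken from the commands above.
BASE_URL = "http://server/solr/multishard/select"


def build_select_url(doc_id, shard=None):
    """Build a select URL for an id lookup, optionally limited to one logical shard."""
    params = {"q": "id:" + doc_id, "wt": "json"}
    if shard is not None:
        # "shards" limits the distributed request to the named shard(s).
        params["shards"] = shard
    return BASE_URL + "?" + urlencode(params)


all_shards_url = build_select_url("10000000000161200")
one_shard_url = build_select_url("10000000000161200", shard="shard1")
print(all_shards_url)
print(one_shard_url)
```

This is manual routing rather than the automatic behaviour I am asking about: if the document lived on shard2, the restricted query would simply return nothing.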

/solr/multishard/select?q=id%3A10000000000161200&wt=json&indent=true&debugQuery=true

"QTime":13,

  "debug":{
    "track":{
      "rid":"-multishard_shard1_replica1-1411605234897-171",
      "EXECUTE_QUERY":[
        "http://server1/solr/multishard_shard1_replica1/",[
          "QTime","1",
          "ElapsedTime","4",
          "RequestPurpose","GET_TOP_IDS",
          "NumFound","1",
          "Response","some resp"],
        "http://server2/solr/multishard_shard2_replica1/",[
          "QTime","1",
          "ElapsedTime","6",
          "RequestPurpose","GET_TOP_IDS",
          "NumFound","0",
          "Response","some"]],
      "GET_FIELDS":[
        "http://server1/solr/multishard_shard1_replica1/",[
          "QTime","0",
          "ElapsedTime","4",
          "RequestPurpose","GET_FIELDS,GET_DEBUG",
          "NumFound","1",
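
To make the fan-out in the track output above easier to read, it can be flattened into one row per shard request. This is a rough sketch, assuming the flat name/value list layout shown in the debug output. The function names and `SAMPLE_TRACK` are my own, and the sample holds only the EXECUTE_QUERY entries, since the GET_FIELDS part is cut off above.

```python
# Sample copied from the EXECUTE_QUERY part of the debug output above.
SAMPLE_TRACK = {
    "rid": "-multishard_shard1_replica1-1411605234897-171",
    "EXECUTE_QUERY": [
        "http://server1/solr/multishard_shard1_replica1/",
        ["QTime", "1", "ElapsedTime", "4",
         "RequestPurpose", "GET_TOP_IDS", "NumFound", "1",
         "Response", "some resp"],
        "http://server2/solr/multishard_shard2_replica1/",
        ["QTime", "1", "ElapsedTime", "6",
         "RequestPurpose", "GET_TOP_IDS", "NumFound", "0",
         "Response", "some"],
    ],
}


def pairs(flat):
    """Yield (name, value) tuples from a flat [name, value, name, value, ...] list."""
    return zip(flat[0::2], flat[1::2])


def summarize_track(track):
    """Return (phase, shard_url, qtime, elapsed, num_found) for every shard request."""
    rows = []
    for phase, entries in track.items():
        if not isinstance(entries, list):
            continue  # skip scalar entries such as "rid"
        for shard_url, stats in pairs(entries):
            info = dict(pairs(stats))
            rows.append((phase, shard_url, int(info["QTime"]),
                         int(info["ElapsedTime"]), int(info["NumFound"])))
    return rows


for row in summarize_track(SAMPLE_TRACK):
    print(row)
```

Each row shows a shard that received the request, so a shard with NumFound 0 (shard2 here) was queried even though it does not hold the document.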


Thanks,
Susmit
