Re: Slow qTime for distributed search

Shawn Heisey Mon, 08 Apr 2013 12:19:35 -0700

On 4/8/2013 12:19 PM, Manuel Le Normand wrote:

It seems that splitting my collection into many shards slowed queries down
unreasonably, and I'm trying to investigate why.


First, I created "collection1", a collection of 4 shards with
replicationFactor=1 on 2 servers. Second, I created "collection2", a
collection of 48 shards with replicationFactor=2 on 24 servers, keeping the
same config and the same number of documents per shard.

The primary reason to use shards is for index size, when your index is so big that a single index cannot give you reasonable performance. There are also sometimes performance gains when you break a smaller index into shards, but there is a limit.

Going from 2 shards to 3 shards will have more of an impact than going from 8 shards to 9 shards. At some point, adding shards makes things slower, not faster, because of the extra work required for combining multiple queries into one result response. There is no reasonable way to predict when that will happen.

Observations showed the following:

    1. Total qTime for the same query set is 5 times higher in collection2
    (150 ms -> 700 ms)
    2. Adding to collection2 the *shards.info=true* param in the query shows
    that each shard is much slower than each shard was in collection1 (about 4
    times slower)
    3. Querying only specific shards on collection2 (by adding the
    shards=shard1,shard2...shard12 param) gave me much better qTime per shard
    (only 2 times higher than in collection1)
    4. I have a low qps rate, thus I don't suspect the replication factor
    of being the major cause of this.
    5. The avg. CPU load on servers during querying was much higher in
    collection1 than in collection2 and I didn't catch any other bottleneck.
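
For anyone repeating the measurements in points 2 and 3, here is a minimal sketch of building such a request URL. The host, port and helper name are made up for illustration; only the shards and shards.info parameters come from the discussion above.

```python
from urllib.parse import urlencode


def build_query_url(base_url, query, shards=None, shards_info=False):
    """Build a /select URL for a collection (hypothetical helper).

    base_url    -- e.g. "http://localhost:8983/solr/collection2" (assumed)
    shards      -- optional list of shard names to restrict the query to
    shards_info -- if True, ask for per-shard details in the response
    """
    params = {"q": query}
    if shards:
        # Restrict the distributed query to the named shards only.
        params["shards"] = ",".join(shards)
    if shards_info:
        params["shards.info"] = "true"
    return base_url + "/select?" + urlencode(params)


# Query only the first twelve shards and request per-shard timing.
url = build_query_url(
    "http://localhost:8983/solr/collection2",
    "foo",
    shards=["shard%d" % i for i in range(1, 13)],
    shards_info=True,
)
```

Comparing the per-shard times reported with and without the shards restriction is what separates the cost of each shard from the cost of combining them.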

A distributed query actually consists of up to two queries per shard. The first query just requests the uniqueKey field, not the entire document. If you are sorting the results, then the sort field(s) are also requested, otherwise the only additional information requested is the relevance score. The results are compiled into a set of unique keys, then a second query is sent to the proper shards requesting specific documents.
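
The two phases can be sketched roughly as follows. This is a toy in-memory model sorted by relevance score, not Solr's actual code; the shard contents and function names are invented for illustration.

```python
import heapq

# Hypothetical shards: uniqueKey -> (relevance score, stored document).
SHARDS = {
    "shard1": {"a": (0.9, {"id": "a"}), "b": (0.2, {"id": "b"})},
    "shard2": {"c": (0.7, {"id": "c"}), "d": (0.5, {"id": "d"})},
    "shard3": {"e": (0.1, {"id": "e"})},
}


def phase_one(shard, rows):
    """First query: return only (score, uniqueKey) for the shard's top hits."""
    hits = [(score, key) for key, (score, _doc) in shard.items()]
    return heapq.nlargest(rows, hits)


def distributed_query(shards, rows):
    # Phase 1: gather keys and scores from every shard, then merge them.
    candidates = []
    for name, shard in shards.items():
        for score, key in phase_one(shard, rows):
            candidates.append((score, key, name))
    winners = heapq.nlargest(rows, candidates)

    # Phase 2: fetch full documents, but only from shards that own a winner.
    wanted = {}
    for _score, key, name in winners:
        wanted.setdefault(name, []).append(key)
    fetched = {}
    for name, keys in wanted.items():
        for key in keys:
            fetched[key] = shards[name][key][1]
    return [fetched[key] for _score, key, _name in winners]
```

With rows=2 this asks all three shards for keys and scores, but only shard1 and shard2 receive the second query, which is why it is "up to" two queries per shard.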

Q:
1. Why does the number of shards affect the qTime of each shard?
2. How can I bring the qTime of each shard back down?

With more shards, it takes longer for the first phase to compile the results, so the second phase (document retrieval) gets delayed, and the QTime goes up.
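
A back-of-the-envelope sketch of why the first phase grows with shard count. It assumes each shard returns its top start+rows entries and that the second phase cannot begin until the slowest shard has answered; both are simplifying assumptions, and the function names and timings are illustrative only.

```python
def first_phase_candidates(num_shards, rows, start=0):
    """Entries the coordinating node must merge before phase two can begin."""
    return num_shards * (start + rows)


def first_phase_wait_ms(shard_times_ms):
    """Phase one finishes only when the slowest shard has responded."""
    return max(shard_times_ms)


# Same rows=10 request against the two layouts from this thread.
small = first_phase_candidates(4, 10)    # 4 shards
large = first_phase_candidates(48, 10)   # 48 shards
```

Under these assumptions the 48-shard layout merges 480 entries where the 4-shard layout merges 40, and with more shards there are more chances for one slow response to hold up the whole first phase.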


One way to reduce the total time is to reduce the number of shards.

You haven't said anything about how complex your queries are, your index size(s), or how much RAM you have on each server and how it is allocated. Can you provide this information?

Getting good performance out of Solr requires plenty of RAM in your OS disk cache. Query times of 150 to 700 milliseconds seem very high, which could be due to query complexity or a lack of server resources (especially RAM), or possibly both.


Thanks,
Shawn
