Re: querying on shards

Shawn Heisey Tue, 20 Mar 2012 07:37:41 -0700

On 3/19/2012 11:55 PM, Ankita Patil wrote:

Hi,


I wanted to know whether it is feasible to query all the shards even if
the query yields data from only a few of them and not all. Or is it better
to explicitly list only the shards that we get the data from and query
only those?

For example:
I have 4 shards. Now I have a query which yields data from only 2 shards.
Should I select only those 2 shards and query them, or is it OK to
query all the shards? Will that affect the performance in any way?

I use a sharded index, but I am not a seasoned Java/Solr/Lucene developer. My clients do not use the shards parameter themselves; they talk to a load balancer, which in turn talks to a special core that has the shards in its request handler config and has no index of its own. I call it a broker, because that is what our previous search product (EasyAsk) called it.
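For comparison, a client that does not go through a broker has to name the shards itself on every request, using Solr's `shards` parameter (a comma-separated list of host:port/path entries). A minimal sketch, assuming hypothetical host and core names; it only builds the request URL and sends nothing:

```python
from urllib.parse import urlencode

# Hypothetical shard addresses (not from the original setup).
# Solr's shards parameter takes host:port/path entries without "http://".
SHARDS = [
    "idx1.example.com:8983/solr/s0",
    "idx2.example.com:8983/solr/s1",
    "idx3.example.com:8983/solr/s2",
    "idx4.example.com:8983/solr/s3",
]

def build_query_url(base_url, q, shards):
    """Build a /select URL that asks the receiving core to fan `q` out to `shards`."""
    params = {"q": q, "shards": ",".join(shards), "wt": "json"}
    # urlencode percent-escapes ':', '/' and ',' inside the parameter values.
    return f"{base_url}/select?{urlencode(params)}"

url = build_query_url("http://idx1.example.com:8983/solr/s0", "title:solr", SHARDS)
print(url)
```

A broker core does the same thing, except the shard list lives in its request handler config instead of in every client.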

As I understand things, the performance of your slowest shard, whether that is because of index size on that shard or the underlying hardware, will be a large factor in the performance of the entire index. A distributed query sends an identical query to all the shards it is configured for. It gathers all those results in parallel and builds a final result to send to the client.
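The scatter-gather pattern described above can be sketched roughly as follows. This is a simplified in-memory model, not Solr's actual implementation: the shards are hypothetical dicts of doc id to score, and details such as how Solr fetches stored fields for the final page are ignored. It shows why the slowest shard matters: the merge cannot start until every shard has answered.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each maps doc id -> score for one fixed query.
SHARD_DATA = [
    {"a1": 2.5, "a2": 0.7},
    {"b1": 3.1},
    {},                      # a shard with no matching documents
    {"d1": 1.2, "d2": 2.9},
]

def search_shard(shard, rows):
    """Return this shard's top `rows` hits as (score, doc_id), best first."""
    return heapq.nlargest(rows, ((score, doc) for doc, score in shard.items()))

def distributed_search(shards, rows):
    # Send the identical request to every shard in parallel...
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, rows), shards))
    # ...then merge only after all of them have answered, so the
    # slowest shard bounds the latency of the whole request.
    return heapq.nlargest(rows, (hit for hits in per_shard for hit in hits))

print(distributed_search(SHARD_DATA, rows=3))
```

Note that the empty shard still has to be asked and still has to answer before the merge can finish.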

You MIGHT get better performance by not including the other shards. If the "no results" shard query returns super-fast, it probably won't really make any difference. If it takes a long time to get the answer that there are no results, then removing them would make things go faster. That requires intelligence on the client to know where the data is. If the client does not know where the data is, it is safer to simply include all the shards.
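The client-side "intelligence" mentioned above could look something like this minimal sketch. It assumes a hypothetical layout in which each shard holds a known range of years; the shard names and the year field are illustrative only. When the client cannot tell where the data is, or the routing table rules out every shard, it falls back to querying all of them.

```python
# Hypothetical routing table: which years of data each shard holds.
SHARD_YEARS = {
    "idx1.example.com:8983/solr/s0": range(2000, 2003),
    "idx2.example.com:8983/solr/s1": range(2003, 2006),
    "idx3.example.com:8983/solr/s2": range(2006, 2009),
    "idx4.example.com:8983/solr/s3": range(2009, 2012),
}

def shards_for(years=None):
    """Pick only the shards that can hold matches; otherwise use all shards."""
    all_shards = list(SHARD_YEARS)
    if years is None:
        # The client does not know where the data lives: ask every shard.
        return all_shards
    chosen = [s for s, held in SHARD_YEARS.items() if any(y in held for y in years)]
    # If the table rules out every shard, play it safe and ask them all.
    return chosen or all_shards

# The result would become the value of the shards parameter.
print(",".join(shards_for([2004])))
```

Falling back to every shard trades some speed for never missing data when the routing table is wrong or incomplete.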


Thanks,
Shawn
