On 8/11/2010 3:27 PM, JohnRodey wrote:
1) Is there any information on preferred maximum sizes for a single solr
index. I've read some people say 10 million, some say 80 million, etc...
Is there any official recommendation or has anyone experimented with large
datasets into the tens of billions?
2) Is there any down side to running multiple solr shard instances on a
single machine rather than one shard instance with a larger index per
machine? I would think that having 5 instances with 1/5 the index would
return results approx 5 times faster.
3) Say you have a solr configuration with multiple shards. If you attempt
to query while one of the shards is down, you will receive an HTTP 500 on
the client due to a connection refused on the server. Is there a way to
tell the server to ignore this and return as many results as possible? In
other words, if you have 100 shards, it is possible that occasionally a
process may die, but I would still like to return results from the active
shards.
1) It highly depends on what's in your index. I'll let someone more
qualified address this question in more detail.
2) Distributed search adds overhead. It has to query the individual
shards, send additional requests to gather the matching records, and
then assemble the results. That said, if you create enough shards that
all (or most) of each index fits in whatever RAM is left for the OS disk
cache, you'll see a VERY significant boost in search speed by using
shards.
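The scatter-gather work described above can be sketched like this. The shard names, scores, and document IDs are made up for illustration; in a real deployment each lookup is an HTTP request to a shard, and Solr also makes a second round-trip to fetch stored fields, which is omitted here:

```python
import heapq

# Fake per-shard top-3 results as (score, doc_id) pairs, best first.
# In real Solr, each entry here would be an HTTP query to one shard.
SHARD_RESULTS = {
    "shard1": [(9.1, "d3"), (7.2, "d9"), (4.0, "d1")],
    "shard2": [(8.5, "d7"), (6.6, "d2"), (5.9, "d8")],
    "shard3": [(9.9, "d4"), (3.3, "d5"), (1.2, "d6")],
}

def distributed_search(rows=3):
    """Phase 1: query every shard. Phase 2: merge the per-shard hits
    into one globally ranked list and keep only the top `rows`."""
    gathered = []
    for shard, hits in SHARD_RESULTS.items():  # one request per shard
        gathered.extend(hits)
    return [doc for score, doc in heapq.nlargest(rows, gathered)]

print(distributed_search())  # -> ['d4', 'd3', 'd7']
```

Every extra shard adds one more request to phase 1, which is why the overhead only pays off once each shard's index is small enough to stay in the OS disk cache.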
3) There are a couple of patches that address this, but in the end,
you'll be better served by setting up a replicated pair and using a load
balancer. I've got a distributed index with two machines per shard, the
master and the slave. The load balancer checks the ping status URL
every 5 seconds to see whether each machine is up. If one goes down, it
is removed from the load balancer and everything keeps working.
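The load-balancer behavior described above amounts to a periodic health check that drops unresponsive servers from rotation. Here is a minimal sketch; the hostnames and the `ping` callable are illustrative assumptions, standing in for an HTTP GET against each server's ping handler:

```python
# Sketch of what the load balancer does: poll each server's ping
# status URL (e.g. every 5 seconds) and keep only healthy servers
# in the rotation.
SERVERS = ["master:8983", "slave:8983"]

def check_pool(servers, ping):
    """Return only the servers whose ping check succeeds.
    `ping` would normally issue an HTTP GET to the ping handler
    and treat any non-200 response or timeout as a failure."""
    return [s for s in servers if ping(s)]

# Simulated outage: the master is down, the slave still answers,
# so queries keep flowing to the surviving machine.
alive = check_pool(SERVERS, ping=lambda s: s.startswith("slave"))
print(alive)  # -> ['slave:8983']
```

With a master/slave pair behind the check, either machine can die and the other keeps serving that shard, which sidesteps the partial-results problem entirely.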
Each of my shards is about 12.5GB in size and the VMs that access the
data have 9GB total RAM. I wish I had more memory!