Re: Best practice to index and query SolrCloud

Shawn Heisey Tue, 24 Sep 2013 14:38:15 -0700

On 9/24/2013 2:46 PM, Shamik Bandopadhyay wrote:

Now, I'm using the SolrJ client (CloudSolrServer) to send documents for
indexing. Based on SolrCloud fundamentals, I can send a document to any
of the four servers or to a specific shard id. Is it advisable to put the
server information directly into the client? If that specific node
goes down, indexing will fail. Is it recommended to have a load
balancer (HAProxy, ELB in Amazon) for indexing purposes?

CloudSolrServer contains a zookeeper client. When you create an instance, you don't give it the URL for Solr, you tell it about your zookeeper ensemble, using the same zkHost info you give to Solr itself. It is always aware of the clusterstate and uses that information to decide where the actual Solr requests go.
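
To make the zkHost part concrete, here is a rough sketch of how that string is usually put together: a comma-separated list of host:port pairs, optionally followed by a chroot path. The ZkHost class, the host names and the /solr chroot are all made up for illustration; none of them are part of SolrJ.

```java
// Hypothetical helper, NOT part of SolrJ: builds a zkHost connection string
// of the form "host1:2181,host2:2181,host3:2181/chroot".
final class ZkHost {
    private ZkHost() {}

    // chroot may be null or empty when Solr's data lives at the ZooKeeper root.
    static String build(String chroot, String... hostPorts) {
        if (hostPorts.length == 0) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        // Every ensemble member goes into the list, separated by commas.
        String hosts = String.join(",", hostPorts);
        if (chroot == null || chroot.isEmpty()) {
            return hosts;
        }
        // The chroot is appended once, after the last host.
        return chroot.startsWith("/") ? hosts + chroot : hosts + "/" + chroot;
    }

    public static void main(String[] args) {
        // Placeholder host names (assumption); substitute your own ensemble.
        System.out.println(build("/solr", "zk1:2181", "zk2:2181", "zk3:2181"));
        // prints "zk1:2181,zk2:2181,zk3:2181/solr"
    }
}
```

In SolrJ 4.x the resulting string is what you would hand to the CloudSolrServer(String zkHost) constructor, typically followed by setDefaultCollection() with your collection name. No Solr URLs appear anywhere in the client.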

When SolrJ 4.5 comes out (which is going to be very soon), it will know how to route updates to the correct shard leader, so indexing will be even more efficient.

You will only need a load balancer if you use Solr URLs directly or use a programming API that is unaware of zookeeper.

The same applies at query time. I know we can add a query parameter that
includes all four servers' information. But then any change in the server
configuration will have an impact. Any help will be appreciated.

What I said above for indexing applies equally to queries. CloudSolrServer will load balance queries across all operational servers automatically.


Thanks,
Shawn
