On 9/24/2013 2:46 PM, Shamik Bandopadhyay wrote:
Now, I'm using the SolrJ client (CloudSolrServer) to send documents for indexing. Based on SolrCloud fundamentals, I can send a document to any of the four servers or to a specific shard ID. Is it advisable to put the server information directly into the client? If that specific node goes down, indexing will fail. Is it recommended to use a load balancer (HAProxy, or ELB on Amazon) for indexing?
CloudSolrServer contains a ZooKeeper client. When you create an instance, you don't give it a Solr URL. Instead, you tell it about your ZooKeeper ensemble, using the same zkHost info you give to Solr itself. It is always aware of the clusterstate and uses that information to decide where the actual Solr requests go.
When SolrJ 4.5 comes out (which is going to be very soon), it will know how to route updates to the correct shard leader, so indexing will be even more efficient.
You will only need a load balancer if you use Solr URLs directly or use a programming API that is unaware of ZooKeeper.
The same applies at query time. I know we can add a query parameter that lists all four servers, but then any change in the server configuration will have an impact. Any help will be appreciated.
What I said above for indexing applies equally to queries. CloudSolrServer will load balance queries across all operational servers automatically.
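To make the idea concrete, here is a toy model in plain Java of what "cluster-aware" means. It is not SolrJ code and not how CloudSolrServer is actually implemented: the class name, the in-memory map standing in for the ZooKeeper clusterstate, the node URLs, and the round-robin selection are all assumptions made for illustration. The only point is that a client which tracks which nodes are live can skip a dead node on its own, which is why no external load balancer is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a cluster-aware client. Hypothetical; NOT the SolrJ implementation. */
final class ClusterAwareClient {
    // Node URL -> live flag. Stands in for the clusterstate a real client reads from ZooKeeper.
    private final Map<String, Boolean> clusterState = new LinkedHashMap<>();
    // Round-robin cursor. Round-robin is an assumption chosen for simplicity.
    private int next = 0;

    /** Adds a node and marks it live. */
    void register(String nodeUrl) {
        clusterState.put(nodeUrl, true);
    }

    /** Marks a known node as down; in a real cluster this change would arrive via ZooKeeper. */
    void markDown(String nodeUrl) {
        if (clusterState.containsKey(nodeUrl)) {
            clusterState.put(nodeUrl, false);
        }
    }

    /** Returns the next live node, skipping any node currently marked down. */
    String pickNode() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : clusterState.entrySet()) {
            if (entry.getValue()) {
                live.add(entry.getKey());
            }
        }
        if (live.isEmpty()) {
            throw new IllegalStateException("no live nodes");
        }
        String chosen = live.get(next % live.size());
        next = (next + 1) % live.size();
        return chosen;
    }

    public static void main(String[] args) {
        ClusterAwareClient client = new ClusterAwareClient();
        // Hypothetical node URLs for a four-server cluster.
        client.register("http://solr1:8983/solr");
        client.register("http://solr2:8983/solr");
        client.register("http://solr3:8983/solr");
        client.register("http://solr4:8983/solr");

        client.markDown("http://solr2:8983/solr");
        // Requests keep flowing to the three remaining nodes.
        for (int i = 0; i < 6; i++) {
            System.out.println(client.pickNode());
        }
    }
}
```

With a hard-coded Solr URL, the equivalent of markDown would simply make every request fail, which is the situation where an external load balancer becomes necessary.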
Thanks, Shawn