SolrCloud AutoSharding? In enterprise environment?
Hi all,

I have looked at the post below and it helped me a lot:
http://lucene.472066.n3.nabble.com/SolrCloud-AutoSharding-td4011834.html

But I have a few questions. We are exploring SolrCloud to index millions of product details. We are planning to use a dedicated pool of 15 physical machines for SolrCloud. Clients will hit the VIP URL of the SolrCloud pool instead of individual machine names or IP addresses. My questions are:

1. While indexing the product details, do I need to implement a custom sharding strategy, or do I just specify the number of shards as 15 (the total number of boxes in the pool) and let SolrCloud take care of sharding internally?

2. If SolrCloud takes care of sharding and the shards live on 15 different boxes, does SolrCloud internally do a full scan of all these boxes when a client queries through the VIP URL (not the individual machine names)? Or does SolrCloud apply some extra logic to avoid a full scan?

3. I couldn't find proper documentation about what SolrCloud does internally in terms of sharding, or about what a developer has to do to optimize queries.

Any response on this is greatly appreciated.

Thanks,
Joseph

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-AutoSharding-In-enterprise-environment-tp4017036.html Sent from the Solr - User mailing list archive at Nabble.com.
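Question 1 comes down to how documents get assigned to shards. Below is a minimal sketch, assuming routing by a hash of the unique key: the 32-bit hash space is cut into 15 contiguous ranges, and each document goes to the shard whose range contains its hash. SolrCloud's own router, when numShards is set, also hashes the unique key into per-shard hash ranges, but it uses its own hash function; the MD5-based hash and the names doc_hash, shard_ranges and route_to_shard here are hypothetical stand-ins, not Solr APIs.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 15  # one shard per box in the 15-machine pool
HASH_SPACE = 2 ** 32  # size of the 32-bit hash space being partitioned


def doc_hash(doc_id: str) -> int:
    """Hypothetical stand-in hash: first 4 bytes of MD5 as an unsigned 32-bit int."""
    return int.from_bytes(hashlib.md5(doc_id.encode("utf-8")).digest()[:4], "big")


def shard_ranges(num_shards: int = NUM_SHARDS) -> list[tuple[int, int]]:
    """Split the hash space into contiguous, near-equal [lo, hi] ranges, one per shard."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * step
        # The last shard absorbs the remainder so every hash value is covered.
        hi = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((lo, hi))
    return ranges


def route_to_shard(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Return the index of the shard whose hash range contains doc_hash(doc_id)."""
    step = HASH_SPACE // num_shards
    # min() clamps hashes in the remainder at the top of the space onto the last shard.
    return min(doc_hash(doc_id) // step, num_shards - 1)


if __name__ == "__main__":
    counts = Counter(route_to_shard(f"product-{n}") for n in range(100_000))
    for shard in sorted(counts):
        print(f"shard{shard + 1}: {counts[shard]} docs")
```

Running it should print roughly equal counts for all 15 shards, and the same id always lands on the same shard, so no custom strategy is needed just to spread the load. The trade-off is that hashing gives no locality: documents sharing a field value (a category, a brand) are scattered across all shards.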
Re: SolrCloud AutoSharding? In enterprise environment?
Thanks, Otis, for the response.

1. Is there any performance impact if the client calls the Solr index through the VIP URL instead of the individual shard URLs? If Solr's default sharding is based on uniqueId.hashcode % numServers, how does Solr identify which shard to get the data from when a client queries by some name/value field of a document (i.e. the unique ID is not passed in the URL)? Is ZooKeeper doing the work of finding out which shard to go to for the data? Sorry to ask for more detail, but I would like to know.

2. I followed the steps at http://wiki.apache.org/solr/SolrCloud#Getting_Started, set up multiple shards on my local box, and indexed some documents. But I am still not clear on the index and document file system structure; that is, how would I verify that the data is really distributed? Can you please point me to some good documentation on the folder structure, i.e. where the index files are created for each shard? When I indexed a few documents by pointing to one shard, I saw a few files being created under apache-solr-4.0.0\example\solr\mycollection\data\index. Is this the complete location of the index files?

Thanks,
Jaino
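When the query does not contain the unique key, hash-based routing cannot tell which shard holds the matches, so a distributed search has to ask every shard and merge the results (scatter/gather). ZooKeeper's role is to hold the cluster state (which shards and replicas exist and where); the node that receives the request uses that state to fan out, rather than ZooKeeper locating documents itself. The sketch below is a toy in-memory model of that fan-out, assuming each shard returns its own top rows by score; Doc, search_shard and distributed_search are hypothetical names, and Solr's real distributed search is more involved (for example, it fetches stored fields in a separate phase).

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Doc:
    id: str
    name: str
    score: float  # stand-in for a relevance score computed by the shard


def search_shard(shard: list[Doc], term: str, rows: int) -> list[Doc]:
    """Local search on one shard: match on the name field, return its top `rows` by score."""
    hits = [d for d in shard if term in d.name]
    return heapq.nlargest(rows, hits, key=lambda d: d.score)  # sorted, best first


def distributed_search(shards: list[list[Doc]], term: str, rows: int = 10) -> list[Doc]:
    """Scatter the query to every shard, then gather and merge the per-shard top lists.

    With no unique key in the query, there is no way to skip any shard."""
    per_shard = [search_shard(s, term, rows) for s in shards]  # scatter
    # Each per-shard list is already sorted best-first, so a k-way merge suffices.
    merged = heapq.merge(*per_shard, key=lambda d: d.score, reverse=True)  # gather
    return list(islice(merged, rows))


if __name__ == "__main__":
    shards = [
        [Doc("p1", "red shoe", 0.9), Doc("p2", "blue hat", 0.4)],
        [Doc("p3", "red shirt", 0.7)],
        [Doc("p4", "red shoe lace", 0.95), Doc("p5", "green shoe", 0.5)],
    ]
    for d in distributed_search(shards, "red", rows=2):
        print(d.id, d.score)  # prints p4 0.95, then p1 0.9
```

Each shard only ships back its own top rows, so the per-query cost grows with the number of shards rather than with a full scan of every document. To check distribution on a real cluster, querying each core directly with q=*:* and distrib=false should return only that core's documents, so comparing numFound across cores is a quick sanity check.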
Re: SolrCloud AutoSharding? In enterprise environment?
Thanks Eric for the explanation. It helps me a lot :)