Re: SOLR cloud sharding

2016-06-03 Thread Susheel Kumar
Also, I'm not sure about your domain, but you may want to double-check whether you really need 350 fields for searching & storing. Often, when you weigh this against the higher cost of hardware, you may be able to reduce the number of searchable / stored fields. Thanks, Susheel On Thu, Jun 2, 2016 at 9:21 AM

Re: SOLR cloud sharding

2016-06-02 Thread Shawn Heisey
On 6/2/2016 1:28 AM, Selvam wrote: > We need to run a heavy SOLR with 300 million documents, with each > document having around 350 fields. The average length of the fields > will be around 100 characters, it may have date and integers fields as > well. Now we are not sure whether to have single se

Re: SOLR cloud sharding

2016-06-02 Thread Selvam
Hi, On a side note, we also need all 350 fields to be stored and indexed. On Thu, Jun 2, 2016 at 12:58 PM, Selvam wrote: > Hello all, > > We need to run a heavy SOLR with 300 million documents, with each > document having around 350 fields. The average length of the fields will be > around 100 cha
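For a rough sense of scale, the figures quoted in this thread (300 million documents, about 350 fields each, about 100 characters per field) can be multiplied out. The sketch below is only back-of-envelope arithmetic: it assumes one byte per character and counts raw field text only, so it says nothing about the actual on-disk index size, which depends on analysis, stored-field compression, and field types. The function name is hypothetical.

```python
def raw_text_bytes(docs: int, fields_per_doc: int, avg_chars_per_field: int,
                   bytes_per_char: int = 1) -> int:
    """Estimate the raw size of all field values, before any indexing overhead.

    bytes_per_char=1 is an assumption (roughly ASCII text); multi-byte
    encodings would increase the result.
    """
    return docs * fields_per_doc * avg_chars_per_field * bytes_per_char


# Figures taken from the original post in this thread.
total = raw_text_bytes(docs=300_000_000, fields_per_doc=350, avg_chars_per_field=100)
print(f"raw field text: {total / 1e12:.1f} TB")
```

With these inputs the raw text alone is on the order of 10 TB, which is why the replies question whether all 350 fields really need to be both stored and indexed.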

Re: Solr Cloud sharding strategy

2016-03-07 Thread Erick Erickson
What do you mean "the rest of the cluster"? The routing is based on the key provided. All of the "enu" prefixes will go to one of your shards. All the "deu" docs will appear on one shard. All the "esp" will be on one shard. All the "chs" docs will be on one shard. Which shard will each go to? Good
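The routing behaviour described above can be illustrated with a simplified sketch. This is not Solr's implementation: Solr's compositeId router uses MurmurHash3 and maps hash ranges to shards, whereas this example uses zlib.crc32 and a modulo purely as stand-ins. The shard count and document IDs are hypothetical. The only point it demonstrates is that the shard is chosen from the prefix before the "!" separator, so every document sharing a prefix lands on the same shard.

```python
import zlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the routing prefix (the part before '!').

    Stand-in hash only: real SolrCloud uses MurmurHash3 and hash ranges.
    IDs without a prefix are hashed as a whole.
    """
    prefix, sep, _rest = doc_id.partition("!")
    key = prefix if sep else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Hypothetical document IDs using language codes as prefixes.
ids = ["enu!doc1", "enu!doc2", "deu!doc1", "esp!doc7", "chs!doc3", "enu!doc99"]
for doc_id in ids:
    print(doc_id, "-> shard", shard_for(doc_id))
```

Which shard a given prefix maps to is decided by the hash, not by the user, and two prefixes can share a shard; with only a handful of distinct prefixes the distribution across shards can be quite uneven.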

Re: Solr Cloud sharding strategy

2016-03-07 Thread shamik
Thanks Erick and Walter, this is extremely insightful. One last follow-up question on composite routing. I'm trying to get a better understanding of index distribution. If I use language as a prefix, SolrCloud guarantees that same-language content will be routed to the same shard. What I'm curious t

Re: Solr Cloud sharding strategy

2016-03-07 Thread Walter Underwood
Excellent advice, and I’d like to reinforce a few things. * Solr indexing is CPU intensive and generates lots of disk IO. Faster CPUs and faster disks matter a lot. * Realistic user query logs are super important. We measure 95th percentile latency and that is dominated by rare and malformed que

Re: Solr Cloud sharding strategy

2016-03-07 Thread Erick Erickson
Still, 50M is not excessive for a single shard although it's getting into the range that I'd like proof that my hardware etc. is adequate before committing to it. I've seen up to 300M docs on a single machine, admittedly they were tweets. YMMV based on hardware and index complexity of course. Here'

Re: Solr Cloud sharding strategy

2016-03-07 Thread shamik
Thanks a lot, Erick. You are right, it's a tad small at around 20 million documents, but the growth projection is around 50 million in the next 6-8 months. It'll continue to grow, but maybe not at the same rate. From the index size point of view, the size can grow up to half a TB from its current state.

Re: Solr Cloud sharding strategy

2016-03-07 Thread Erick Erickson
20M docs is actually a very small collection by the "usual" Solr standards unless they're _really_ large documents, i.e. large books. Actually, I wouldn't even shard to begin with, it's unlikely that it's necessary and it adds inevitable overhead. If you _must_ shard, just go with <1>, but again I