Also, I'm not sure about your domain, but you may want to double-check
whether you really need all 350 fields for searching & storing. Often,
when you weigh that requirement against the higher cost of hardware, you
find you can reduce the number of searchable/stored fields.
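For example, a field you search on but never display can be indexed-only.
A rough SolrJ Schema API sketch (assumes a managed schema; the field name
and type below are placeholders):

    import java.util.Map;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.schema.SchemaRequest;

    public class AddIndexOnlyField {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build()) {
          // indexed=true, stored=false: searchable, but not kept for
          // retrieval, which shrinks the stored-field portion of the index.
          new SchemaRequest.AddField(Map.of(
              "name", "body_text",       // placeholder field name
              "type", "text_general",
              "indexed", true,
              "stored", false))
            .process(client);
        }
      }
    }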
Thanks,
Susheel
On 6/2/2016 1:28 AM, Selvam wrote:
> We need to run a heavy Solr with 300 million documents, with each
> document having around 350 fields. The average length of the fields
> will be around 100 characters; there may be date and integer fields as
> well. Now we are not sure whether to have single se
Hi,
Note that we also need all 350 fields to be stored and indexed.
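Back-of-envelope, at ~100 bytes per field value (our fields average ~100
characters), that is 300,000,000 docs x 350 fields x ~100 bytes/field
≈ 1.05e13 bytes, i.e. roughly 10 TB of raw field data before any indexing
overhead.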
On Thu, Jun 2, 2016 at 12:58 PM, Selvam wrote:
> Hello all,
>
> We need to run a heavy Solr with 300 million documents, with each
> document having around 350 fields. The average length of the fields will be
> around 100 characters
What do you mean by "the rest of the cluster"? The routing is based on
the key provided. All of the "enu"-prefixed docs will go to one of your
shards. All the "deu" docs will appear on one shard. All the "esp" docs
will be on one shard. All the "chs" docs will be on one shard.
Which shard will each go to? Good
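To make the prefix mechanics concrete, here's a minimal SolrJ sketch
(ZooKeeper address, collection name, and IDs are invented; the builder
signature varies a bit by SolrJ version):

    import java.util.List;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CompositeRouting {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                List.of("zk1:2181"), Optional.empty()).build()) {
          // compositeId routing hashes the part before '!', so all docs
          // sharing a prefix land on the same shard.
          for (String id : new String[] {"enu!1", "enu!2", "deu!7", "esp!9"}) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            client.add("mycollection", doc);
          }
          client.commit("mycollection");
        }
      }
    }

Note that two different prefixes can still hash to the same shard; the
guarantee is only that a given prefix never splits across shards.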
Thanks Erick and Walter, this is extremely insightful. One last follow-up
question on composite routing. I'm trying to get a better understanding of
index distribution. If I use language as a prefix, SolrCloud guarantees that
same-language content will be routed to the same shard. What I'm curious t
Excellent advice, and I’d like to reinforce a few things.
* Solr indexing is CPU-intensive and generates lots of disk IO. Faster CPUs and
faster disks matter a lot.
* Realistic user query logs are super important. We measure 95th-percentile
latency, and that is dominated by rare and malformed queries (sketch below).
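A rough sketch of that kind of measurement: replay a saved query log
against SolrJ and report the 95th percentile (the log file name, URL, and
collection here are placeholders):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class P95Bench {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build()) {
          List<Long> latencies = new ArrayList<>();
          // One raw user query per line, captured from production logs.
          for (String q : Files.readAllLines(Path.of("queries.log"))) {
            long start = System.nanoTime();
            try {
              client.query(new SolrQuery(q));
            } catch (Exception e) {
              // malformed queries error out, but they still cost latency
            }
            latencies.add((System.nanoTime() - start) / 1_000_000); // ms
          }
          Collections.sort(latencies);
          // 95th percentile: the value below which 95% of samples fall.
          long p95 = latencies.get(
              (int) Math.ceil(0.95 * latencies.size()) - 1);
          System.out.println("p95 latency: " + p95 + " ms");
        }
      }
    }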
Still, 50M is not excessive for a single shard, although it's getting
into the range where I'd want proof that my hardware etc. is adequate
before committing to it. I've seen up to 300M docs on a single
machine, though admittedly they were tweets. YMMV based on hardware and
index complexity, of course. Here'
Thanks a lot, Erick. You are right, it's a tad small at around 20 million
documents, but the growth projection is around 50 million in the next 6-8
months. It'll continue to grow, but maybe not at the same rate. From an
index-size point of view, it can grow to half a TB from its current state.
20M docs is actually a very small collection by the "usual" Solr
standards unless they're _really_ large documents, e.g.
large books.
Actually, I wouldn't even shard to begin with; it's unlikely to be
necessary, and it adds unavoidable overhead. If you _must_ shard,
just go with <1>, but again I
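(For reference, an unsharded collection is just numShards=1; a SolrJ
sketch, with the collection and config names invented:)

    import java.util.List;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateSingleShard {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                List.of("zk1:2181"), Optional.empty()).build()) {
          // One shard, two replicas: no distributed-query overhead,
          // while replicas still give redundancy and query throughput.
          CollectionAdminRequest
              .createCollection("mycollection", "myconfig", 1, 2)
              .process(client);
        }
      }
    }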