Re: Scoping SolrCloud setup

2018-03-14 Thread Scott Prentice
Walter... Thanks for the additional data points. Clearly we're a long way from needing anything too complex. Cheers! ...scott On 3/14/18 1:12 PM, Walter Underwood wrote: That would be my recommendation for a first setup. One Solr instance per host, one shard per collection. We run 5 millio

Re: Scoping SolrCloud setup

2018-03-14 Thread Walter Underwood
That would be my recommendation for a first setup. One Solr instance per host, one shard per collection. We run 5 million document cores with 8 GB of heap for the JVM. We size the RAM so that all the indexes fit in OS filesystem buffers. Our big cluster is 32 hosts, 21 million documents in four

Re: Scoping SolrCloud setup

2018-03-14 Thread Scott Prentice
Erick... Thanks. Yes. I think we were just going shard-happy without really understanding the purpose. I think we'll start by keeping things simple .. no shards, fewer replicas, maybe a bit more RAM. Then we can assess the performance and make adjustments as needed. Yes, that's the main reas

Re: Scoping SolrCloud setup

2018-03-14 Thread Erick Erickson
Scott: Eventually you'll hit the limit of your hardware, regardless of VMs. I've seen multiple VMs help a lot when you have really beefy hardware, as in 32 cores, 128G memory and the like. Otherwise it's iffier. re: sharding or not. As others wrote, sharding is only useful when a single collectio

Re: Scoping SolrCloud setup

2018-03-14 Thread Scott Prentice
Emir... Thanks for the input. Our larger collections are localized content, so it may make sense to shard those so we can target the specific index. I'll need to confirm how it's being used, if queries are always within a language or if they are cross-language. Thanks also for the link .. ve

Re: Scoping SolrCloud setup

2018-03-14 Thread Scott Prentice
Greg... Thanks. That's very helpful, and is in line with what I've been seeing. So, to be clear, you're saying that the size of all collections on a server should be less than the available RAM. It looks like we've got about 13GB of documents in all (and growing), so, if we're restricted to 16
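
The sizing rule restated above (all collections on a server should fit in available RAM, with the JVM heap counted separately so the OS filesystem cache can hold the indexes) can be sketched as a quick arithmetic check. This is only a rough illustration, not anything from Solr itself: the function names, the 1 GB OS reserve, and the host RAM figures are hypothetical, and treating the 13 GB of documents as the on-disk index size is an assumption. The 8 GB heap is the figure Walter mentions elsewhere in the thread.

```python
def cache_headroom_gb(host_ram_gb, jvm_heap_gb, index_gb, os_reserve_gb=1.0):
    """RAM left over (in GB) once the JVM heap, a small OS reserve
    (hypothetical default), and a fully cached index are accounted for.
    A negative result means the index cannot be fully cached."""
    return host_ram_gb - jvm_heap_gb - os_reserve_gb - index_gb


def index_fits_in_cache(host_ram_gb, jvm_heap_gb, index_gb, os_reserve_gb=1.0):
    """True if the whole index could sit in the OS filesystem cache."""
    return cache_headroom_gb(host_ram_gb, jvm_heap_gb, index_gb, os_reserve_gb) >= 0


if __name__ == "__main__":
    index_gb = 13.0  # assumption: index size roughly equals the 13 GB of documents
    heap_gb = 8.0    # heap size quoted elsewhere in the thread
    for ram_gb in (16.0, 32.0):  # hypothetical host sizes
        fits = index_fits_in_cache(ram_gb, heap_gb, index_gb)
        headroom = cache_headroom_gb(ram_gb, heap_gb, index_gb)
        print(f"{ram_gb:.0f} GB host: fits={fits}, headroom={headroom:.1f} GB")
```

Real index size can differ a lot from raw document size depending on the schema, so the actual on-disk size of the data directories is the better input once a test index exists.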

Re: Scoping SolrCloud setup

2018-03-14 Thread Emir Arnautović
Hi Scott, There is no definite answer - it depends on your documents and query patterns. Sharding does come with an overhead but also allows Solr to parallelise search. Query latency is usually something that tells you if you need to split a collection into multiple shards or not. If you are

Re: Scoping SolrCloud setup

2018-03-13 Thread Greg Roodt
A single shard is much simpler conceptually and also cheaper to query. I would say that even your 1.2M collection can be a single shard. I'm running a single shard setup 4X that size. You can still have replicas of this shard for redundancy / availability purposes. I'm not an expert, but I think o