Greg...
Thanks. That's very helpful, and is in line with what I've been seeing.
So, to be clear, you're saying that the size of all collections on a
server should be less than the available RAM. It looks like we've got
about 13GB of documents in all (and growing), so, if we're restricted to
16GB on each VM I'm thinking that it probably makes sense to split the
collections over multiple VMs rather than having them all on one.
Perhaps instead of all indexes replicated on 3 VMs, we should split
things up over 4 VMs and go down to just 2 replicas. We can add 2 more
VMs to go up to 3 replicas if that seems necessary at some point.
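
A rough back-of-the-envelope check of these layouts, as a sketch only. It treats the 13GB figure as the total index size, assumes replicas are spread evenly across the VMs, and assumes a 4GB JVM heap plus 1GB reserved for the OS. Those last two numbers are placeholders, not measurements from our setup.

```python
def index_gb_per_vm(total_index_gb, replicas, vms):
    """Average index data hosted per VM, assuming replicas are spread evenly."""
    return total_index_gb * replicas / vms


def fits_in_os_cache(index_gb, ram_gb=16.0, jvm_heap_gb=4.0, os_reserve_gb=1.0):
    """True if the hosted index fits in the RAM left over for the OS page cache.

    jvm_heap_gb and os_reserve_gb are assumed values; substitute real ones.
    """
    return index_gb <= ram_gb - jvm_heap_gb - os_reserve_gb


layouts = {
    "3 VMs x 3 replicas": index_gb_per_vm(13, replicas=3, vms=3),
    "4 VMs x 2 replicas": index_gb_per_vm(13, replicas=2, vms=4),
    "6 VMs x 3 replicas": index_gb_per_vm(13, replicas=3, vms=6),
}

for name, gb in layouts.items():
    print(f"{name}: {gb:.1f} GB per VM, fits in cache: {fits_in_os_cache(gb)}")
```

Under those assumptions the current 3-VM layout puts all 13GB on every VM, which is more than the RAM left for caching, while both the 4-VM/2-replica and 6-VM/3-replica layouts come out at about 6.5GB per VM.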
Thanks,
...scott
On 3/13/18 6:15 PM, Greg Roodt wrote:
A single shard is much simpler conceptually and also cheaper to query. I
would say that even your 1.2M collection can be a single shard. I'm running
a single shard setup 4X that size. You can still have replicas of this
shard for redundancy / availability purposes.
I'm not an expert, but I think one of the deciding factors is if your index
can fit into RAM (not JVM Heap, but OS cache). What are the sizes of your
indexes?
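
One way to answer the size question is to add up the files on disk for each core. This is only a sketch: it assumes each core lives in its own subdirectory of a single Solr data directory, and a path such as /var/solr/data is a hypothetical example, not something taken from this thread.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so the same file is not counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def index_sizes_gb(solr_data_dir):
    """Map each immediate subdirectory (assumed: one per core) to its size in GB."""
    sizes = {}
    for entry in sorted(os.listdir(solr_data_dir)):
        core_dir = os.path.join(solr_data_dir, entry)
        if os.path.isdir(core_dir):
            sizes[entry] = dir_size_bytes(core_dir) / 1024 ** 3
    return sizes
```

Calling index_sizes_gb on the data directory of each VM and summing the values gives the figure to compare against the RAM that is left over once the JVM heap has been set aside.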
On 14 March 2018 at 11:01, Scott Prentice <s...@leximation.com> wrote:
We're in the process of moving from 12 single-core collections (non-cloud
Solr) on 3 VMs to a SolrCloud setup. Our collections aren't huge, ranging
in size from 50K to 150K documents with one at 1.2M docs. Our max query
frequency is rather low, probably no more than 10-20/min. We do update
frequently, maybe 10-100 documents every 10 mins.
Our prototype setup is using 3 VMs (4 core, 16GB RAM each), and we've got
each collection split into 2 shards with 3 replicas (one per VM). Also,
Zookeeper is running on each VM. I understand that it's best to have each
ZK server on a separate machine, but hoping this will work for now.
This all seemed like a good place to start, but after reading lots of
articles and posts, I'm thinking that maybe our smaller collections (under
100K docs) should just be one shard each, and maybe the 1.2M collection
should be more like 6 shards. How do you decide how many shards is right?
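
For reference, shard and replica counts are set per collection when it is created through the Collections API (action=CREATE with numShards and replicationFactor). The sketch below only builds the request URLs for the two layouts being considered. The host, port, and collection names are made up for illustration.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL for the given shard/replica layout."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


BASE = "http://localhost:8983/solr"  # assumed host and port

# Hypothetical collection names: one small collection, one for the 1.2M docs.
print(create_collection_url(BASE, "small_docs", num_shards=1, replication_factor=3))
print(create_collection_url(BASE, "big_docs", num_shards=6, replication_factor=3))
```

These URLs show only the two parameters under discussion. Depending on the Solr version and setup, a real request may need more, for example a configset name, or maxShardsPerNode when several replicas of one collection land on the same node.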
Also, our current live system is separated into dev/stage/prod tiers; now,
all of these tiers are together on each of the cloud VMs. This bothers some
people, thinking that it may make our production environment less stable. I
know that in an ideal world, we'd have them all on separate systems, but
with the replication, it seems like we're going to make the overall system
more stable. Is this a correct understanding?
I'm just wondering if anyone has opinions on whether we're going in a
reasonable direction or not. Are there any articles that discuss these
initial sizing/scoping issues?
Thanks!
...scott