Hi,

Is there a recommendation on the size of index that one should host per core? The idea is to come up with an *initial* shard/replica setting for a load test, and then arrive at a good cluster size based on that testing.
*Example:*
Num documents: 100 million
Average document size: 1 KB
So total storage required: 100 GB
Indexable fields per document: 5 strings, average field size: 100 chars
So total index space required for all docs: 50 GB (assuming all unique words)

*Rough estimates for an initial size:*
A 50 GB index is best served if all of it fits in memory, and JVMs perform best when their max heap is between 15-20 GB.
So a starting point for num-shards: 50 GB / 20 GB ~ 3.
Now, if the whole index is in memory on each core, replicas can serve queries with much higher throughput, so we can begin with 2 replicas per shard.
(A quick sketch of this arithmetic is in the P.S. below.)

*Questions:*
Are there any other factors that we can consider *initially* to make these calculations more precise? Note that the goal of the exercise is not to get rid of load testing, only to start with a close-enough cluster setting so that the load testing can finish faster.

Thanks
SG
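P.S. In case it helps to plug in different numbers, here is a small Python sketch of the same back-of-the-envelope arithmetic. All constants are just the assumptions from the example above (decimal GB, "all unique words" index estimate, 20 GB max heap, 2 replicas); it is only meant as a starting estimate, not a sizing formula.

    import math

    # Assumptions taken from the example above.
    num_docs = 100_000_000           # 100 million documents
    avg_doc_size_kb = 1              # average stored document size
    indexed_fields_per_doc = 5       # string fields per document
    avg_field_size_chars = 100       # average chars per indexed field

    # Raw storage for the stored documents (decimal GB).
    total_storage_gb = num_docs * avg_doc_size_kb / 1e6

    # Crude index-size estimate: every indexed char counts,
    # i.e. the "all unique words" assumption.
    index_size_gb = num_docs * indexed_fields_per_doc * avg_field_size_chars / 1e9

    # Keep each shard's slice of the index within a comfortable JVM heap.
    max_heap_gb = 20
    num_shards = math.ceil(index_size_gb / max_heap_gb)

    replicas_per_shard = 2           # starting point for throughput testing

    print(f"total storage  : ~{total_storage_gb:.0f} GB")   # ~100 GB
    print(f"index size     : ~{index_size_gb:.0f} GB")      # ~50 GB
    print(f"shards         : {num_shards}")                 # 3
    print(f"replicas/shard : {replicas_per_shard}")          # 2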