Hi,

Is there a recommendation on the size of index that one should host per
core?
The idea is to come up with an *initial* shard/replica setting for a load test,
and then arrive at a good cluster size based on that testing.


*Example:*

Num documents: 100 million
Average document size: 1 KB
So total space required: ~100 GB

Indexable fields per document: 5 strings, average field size: 100 chars
So total index space required for all docs: ~50 GB (assuming all unique words)
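
For what it's worth, the arithmetic above can be written out as a rough
sketch (Python; it assumes ~1 byte per character and ignores any per-field
indexing overhead):

    num_docs = 100_000_000
    avg_doc_size_bytes = 1024              # ~1 KB per document
    raw_storage_gb = num_docs * avg_doc_size_bytes / 1024**3
    print(round(raw_storage_gb))           # ~95, i.e. the ~100 GB above

    indexed_fields = 5
    avg_field_chars = 100                  # assuming ~1 byte per char
    index_size_gb = num_docs * indexed_fields * avg_field_chars / 1024**3
    print(round(index_size_gb))            # ~47, i.e. the ~50 GB worst case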


*Rough estimates for an initial size:*

A 50 GB index is best served if all of it fits in memory.
And JVMs perform best when their max heap is between 15 and 20 GB.
So a starting point for num-shards: 50 GB / 20 GB ≈ 3

Now, if the entire index per core fits in memory, replicas can serve queries
with much higher throughput.
So we can begin with 2 replicas per shard.
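
Put together, the starting point works out to something like this (again just
a sketch: the 20 GB heap ceiling is the assumption above, and it treats a
replica as an extra copy in addition to the leader):

    import math

    index_size_gb = 50
    max_heap_gb = 20                       # assumed comfortable JVM heap ceiling
    num_shards = math.ceil(index_size_gb / max_heap_gb)        # -> 3

    replicas_per_shard = 2                 # starting guess, purely for query throughput
    # assuming "replica" means an extra copy besides the leader
    total_cores = num_shards * (1 + replicas_per_shard)        # -> 9 cores to provision
    print(num_shards, total_cores)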

*Questions:*

Are there any other factors that we can consider *initially* to make our
calculations more precise?
Note that the goal of the exercise is not to get rid of load testing, only
to start with a close-enough cluster setting so that load testing can
finish faster.

Thanks
SG
