In my world your index size is common. The optimal index size depends on what you are optimizing for: query speed? Hardware utilization? Optimizing the index is something I never do; we live with about 28% deletes. You should check the merge policy in your configuration. I run 120 shards, and I am currently redesigning for 256 shards. Increased sharding has helped reduce query response time, but surely there is a point where the collation of results becomes the bottleneck. I run the 120 shards on 90 r4.4xlarge instances with a replication factor of 3.
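Since we tolerate a high percentage of deleted documents rather than optimizing, the merge policy is the knob to look at. A minimal sketch of what that tuning might look like in solrconfig.xml, assuming a newer Solr (7.5+, where TieredMergePolicy exposes a deleted-docs ceiling; the values here are illustrative defaults, not a recommendation):

```xml
<!-- solrconfig.xml, inside <indexConfig>: illustrative merge-policy tuning.
     deletesPctAllowed requires Solr 7.5+; 33.0 is the default ceiling. -->
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <double name="deletesPctAllowed">33.0</double>
</mergePolicyFactory>
```

With a lower ceiling, background merges reclaim deleted space more aggressively and a weekly forced optimize becomes unnecessary.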
The things missing are:

- What does your schema look like? I index around 120 fields per document.
- What do your queries look like? Mine are so varied that caching never helps; the same query rarely comes through twice.
- My system takes continuous updates; yours does not.

It is really up to you to experiment. If you follow the development pattern of Design By Use (DBU), the first thing you do for Solr, and even for SQL, is to come up with your queries first. Then design the schema. Then figure out how to distribute it for performance.

Oh, another thing: are you concerned about availability? Do you have a replication factor > 1? Do you run those replicas in a different region for safety? How many ZooKeepers are you running, and where are they? Lots of questions.

Regards

> On May 20, 2020, at 11:43 AM, Modassar Ather <modather1...@gmail.com> wrote:
>
> Hi,
>
> Currently we have an index of size 3.5 TB. This index is distributed across
> 12 shards under two cores. The size of the index on each shard is almost
> equal.
> We do a delta indexing every week and optimise the index.
>
> The server configuration is as follows.
>
> - Solr Version : 6.5.1
> - AWS instance type : r5a.16xlarge
> - CPU(s) : 64
> - RAM : 512GB
> - EBS size : 7 TB (For indexing as well as index optimisation.)
> - IOPs : 30000 (For faster index optimisation)
>
> Can you please help me with the following few questions?
>
> - What is the ideal index size per shard?
> - The optimisation takes a lot of time and IOPs to complete. Will
>   increasing the number of shards help in reducing the optimisation time
>   and IOPs?
> - We are planning to reduce each shard index size to 30GB, so the entire
>   3.5 TB index will be distributed across more shards, in this case 70+.
>   Will this help?
> - Will adding so many new shards increase the search response time, and
>   if so, by how much?
> - If we have to increase the shards, should we do it on a single larger
>   server or on multiple small servers?
>
> Kindly share your thoughts on how best we can use Solr with such a large
> index size.
>
> Best,
> Modassar
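On the resharding questions in the quoted mail: rather than guessing a shard count, create a small test collection and benchmark several counts. A sketch using the Collections API CREATE call on a Solr 6/7-era SolrCloud cluster (the host, collection name, and all counts below are illustrative assumptions, not recommendations):

```shell
# Build a Collections API CREATE call for a test collection.
# Every value here is an illustrative assumption.
SOLR_HOST="localhost:8983"        # assumption: a local SolrCloud node
COLLECTION="reshard_test"         # hypothetical test collection name
NUM_SHARDS=24                     # try several values and benchmark each
REPLICATION_FACTOR=2              # >1 if availability matters

URL="http://${SOLR_HOST}/solr/admin/collections?action=CREATE"
URL="${URL}&name=${COLLECTION}&numShards=${NUM_SHARDS}"
URL="${URL}&replicationFactor=${REPLICATION_FACTOR}&maxShardsPerNode=4"

echo "${URL}"
# curl "${URL}"   # uncomment to run against a live cluster
```

Repeat with different NUM_SHARDS values, replay a sample of real queries against each collection, and measure response times; that answers "will this help?" for your data better than any rule of thumb.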