On 8/12/2019 1:42 PM, Erie Data Systems wrote:
I am starting the planning stages of moving from a single instance of
Solr 8 to a SolrCloud implementation.

Currently I have a 148GB index on a single dedicated server with 96GB
RAM, 16 cores @ 2.4GHz each, and SSD storage. Search is fast, but
obviously the index size is greater than physical memory, which to my
understanding is not a good thing.

An *IDEAL* setup would have enough memory available (not assigned to programs) to be able to fit the entire index in the disk cache.
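As a rough back-of-the-envelope check, you can compare the on-disk index size against the RAM left over for the OS page cache once the Solr heap is carved out. This is only a sketch; the paths in the comments and the 8GB heap figure are assumptions, not your actual numbers:

```shell
#!/bin/sh
# Rough page-cache sizing check. The numbers below are assumptions --
# substitute your own, e.g. from `du -s --block-size=1G /var/solr/data`
# for the index and `free -g` for total RAM.
INDEX_GB=148      # on-disk index size
TOTAL_RAM_GB=96   # physical RAM in the machine
SOLR_HEAP_GB=8    # hypothetical -Xmx given to the Solr JVM

# Whatever the JVM doesn't claim is what the OS can use for disk cache.
CACHE_GB=$((TOTAL_RAM_GB - SOLR_HEAP_GB))

if [ "$INDEX_GB" -le "$CACHE_GB" ]; then
    echo "index (${INDEX_GB}GB) can be fully cached in ${CACHE_GB}GB"
else
    echo "index (${INDEX_GB}GB) exceeds available cache (${CACHE_GB}GB) by $((INDEX_GB - CACHE_GB))GB"
fi
```

With those example numbers the index exceeds the available cache by 60GB, which matches the situation described above — and, as noted below, that alone does not mean performance will be unacceptable.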

Lots of people run systems that aren't ideal and have perfectly acceptable performance. I did that for several years. I would have loved to have more memory, but the budget wasn't there, and the machines I was using were already maxed out at 64GB.

If performance is already acceptable, then not being able to fit the entire index into available memory is not, by itself, enough reason to make significant changes. Switching to SolrCloud could require significant development time, both for your other software and for the systems that keep Solr operational.

My issue is that I'm not sure where to go to learn how to set this up:
how many shards, how many replicas, etc. I would rather hire somebody
or find something (a detailed video or document) to guide me through
the process and the decisions along the way. For example, I think a
shard is a piece of the index, but I don't even know what replicas are
or how to decide how many to use.

There are no standardized rules for making these decisions. Typically you have to make an educated guess and try it to see whether it works.

https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

If it's done in the typical way, telling a SolrCloud setup to create a collection with 3 shards and 2 replicas will create six individual indexes that make up the whole collection. The index will be split into three pieces (shards), and each of those pieces will have two copies (replicas). For each shard, an election will choose one of its replicas as leader.
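As a concrete sketch, the three-shard, two-replica layout described above is created through the Collections API. The host, port, and collection name here are placeholders for your own cluster:

```shell
# Create a collection split into 3 shards, with replicationFactor=2
# (two copies of each shard, six cores total across the cluster).
# Host, port, and "mycollection" are placeholders.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2"

# Roughly equivalent using the bin/solr script that ships with Solr:
#   bin/solr create -c mycollection -shards 3 -replicationFactor 2

# Afterward, CLUSTERSTATUS shows which node hosts each replica and
# which replica was elected leader for each shard:
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection"
```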

Sharding adds overhead. In some cases with extremely large indexes, the overhead is less than the performance gained by splitting the index onto separate machines and letting those machines work in parallel. In other cases, the overhead may result in things actually getting slower.

Thanks,
Shawn
