Hi All - we've been experimenting with Solr Cloud 5.5.0 on a 27-shard
cluster (no replication - each shard runs on its own physical host) on
top of HDFS. It has just crossed 3 billion documents indexed, with an
index size of 16.1TBytes. In HDFS with 3x replication this takes up
48.2TBytes.
Each shard is then hosting about 610GBytes of index. The HDFS cache
size is very low at about 8GBytes. Suffice it to say, performance isn't
very good, but again, this is for experimentation.
If we were to redo this, would it be better to create many more shards -
maybe 200 with 3 replicas each (600 in all) - with the goal of
withstanding a server going down and allowing for future expansion as
more hardware is added? I know this is a very general question. Thanks
very much in advance!
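
For anyone who wants to compare the two layouts, here is a rough sketch of the sizing arithmetic in Python. It assumes "TBytes" means 1024 GBytes (which matches the roughly 610GBytes per shard above) and that documents and index size are spread evenly across shards. The function name and layout labels are made up for illustration, and the numbers ignore HDFS block replication.

```python
# Back-of-the-envelope shard sizing for the two layouts in this post.
# Assumption: 1 TByte = 1024 GBytes, and documents are evenly distributed.
GB_PER_TB = 1024


def per_shard(total_docs: float, index_tb: float, shards: int) -> tuple[float, float]:
    """Return (documents per shard, index GBytes per shard)."""
    return total_docs / shards, index_tb * GB_PER_TB / shards


TOTAL_DOCS = 3_000_000_000  # "just crossed 3 billion documents"
INDEX_TB = 16.1             # index size before HDFS replication

# (label, shard count, Solr replicas per shard) - labels are illustrative only
layouts = [("current", 27, 1), ("proposed", 200, 3)]

for label, shards, replicas in layouts:
    docs, gb = per_shard(TOTAL_DOCS, INDEX_TB, shards)
    cores = shards * replicas
    print(f"{label}: {shards} shards x {replicas} replica(s) = {cores} cores, "
          f"{docs / 1e6:.1f}M docs and {gb:.1f} GBytes per shard")
```

On these assumptions the 200-shard layout comes out to about 15 million documents and a bit over 80GBytes of index per shard, versus about 111 million documents and 610GBytes today.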
-Joe
- Large index recommendation Joe Obernberger