Hi All - we've been experimenting with SolrCloud 5.5.0 with a 27-shard cluster (no replication; each shard runs on its own physical host) on top of HDFS. It has just crossed 3 billion documents indexed, with an index size of 16.1 TB. With HDFS 3x replication, that takes up 48.2 TB.

Each shard is therefore hosting about 610 GB of index, and the HDFS cache is very small at about 8 GB. Suffice it to say, performance isn't very good, but again, this is just for experimentation.

If we were to redo this, would it be better to create many more shards - say 200, each with 3 replicas (600 cores in total) - with the goals of surviving the loss of a server and leaving room to grow as more hardware is added? I know this is a very general question. Thanks very much in advance!
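
For concreteness, creating a collection along those lines through SolrJ would look roughly like the sketch below. The collection name, config name, and ZooKeeper addresses are placeholders, and the exact SolrJ calls vary a bit between 5.x and later releases, so treat it as an illustration rather than a recipe:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateShardedCollection {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble; substitute the real zkHost string.
            try (CloudSolrClient client =
                     new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr")) {
                CollectionAdminRequest.Create create = new CollectionAdminRequest.Create();
                create.setCollectionName("bigindex");     // placeholder collection name
                create.setConfigName("bigindex_config");  // configset already uploaded to ZooKeeper
                create.setNumShards(200);                 // many small shards instead of 27 big ones
                create.setReplicationFactor(3);           // survive losing a host
                create.setMaxShardsPerNode(25);           // 600 cores / 27 hosts ~ 23, plus headroom
                create.process(client);                   // issues the Collections API CREATE call
            }
        }
    }

The same CREATE call can also be issued directly over HTTP via the Collections API; the point is just that numShards, replicationFactor, and maxShardsPerNode are all set when the collection is created.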

-Joe
