On 7/24/2016 8:04 PM, forest_soup wrote:
> We have a 5-node SolrCloud.  When a Solr node's disk had an issue
> and the RAID5 array became degraded, a recovery on the node was
> triggered.  But then a hang occurs, and the node disappears from the
> live_nodes list.

In my opinion, RAID5 (and RAID6) are bad ways to handle storage.  Cost
per usable gigabyte is the only real advantage, but the performance
problems are not worth that advantage.  If you care more about capacity
than performance, then it might be OK.

Under normal circumstances (no failed disk), if you're writing to the
array at all, all I/O (both read and write) is slow.  RAID5 can have
awesome read performance, but *only* if the array is healthy and there is
no writing happening at the same time.
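
The reason writes hurt is the read-modify-write cycle: a small logical
write turns into two reads and two writes so the parity stays
consistent.  Here's a rough sketch of the idea in Python -- the
function names and arguments are just mine for illustration, not
anything from a real RAID implementation:

    # Rough sketch of a RAID5 small-write (read-modify-write) cycle.
    # Block contents here are hypothetical, for illustration only.

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def raid5_small_write(old_data: bytes, old_parity: bytes,
                          new_data: bytes):
        # 1. Read the old data block    (one disk read)
        # 2. Read the old parity block  (one disk read)
        # 3. New parity = old_parity XOR old_data XOR new_data
        new_parity = xor(xor(old_parity, old_data), new_data)
        # 4. Write the new data block   (one disk write)
        # 5. Write the new parity block (one disk write)
        return new_data, new_parity  # four I/Os for one logical write

That 4x I/O amplification on every small write is why a busy RAID5
array drags down reads that would otherwise be fast.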

If you lose a disk, the parity reads required to reconstruct the missing
data cause REALLY bad performance.
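
With a disk missing, every read that would have hit that disk instead
has to read all the other blocks in the stripe and XOR them back
together.  Roughly like this -- again just an illustration, not real
controller code:

    # Rough sketch of rebuilding a lost block in a degraded RAID5
    # stripe.  Every surviving block (data plus parity) must be read
    # and XORed, so one logical read fans out into N-1 physical reads.

    from functools import reduce

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def reconstruct_missing_block(surviving_blocks):
        # surviving_blocks: the readable blocks from the same stripe
        return reduce(xor, surviving_blocks)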

When you replace the failed disk and it is rebuilding, performance is
even worse.  The additional load is often enough to cause a second disk
to fail, which for RAID5 means the entire array is lost.

These I/O performance issues cause really big problems for Solr and
ZooKeeper.  Each Solr node holds an ephemeral entry under live_nodes
in ZooKeeper; if the node gets stuck on I/O long enough to miss its
ZooKeeper session timeout, that entry goes away, which is most likely
why your node disappeared from the live_nodes list.  There's no
surprise to me that a degraded RAID5 array has issues like you
describe.

Thanks,
Shawn
