On Thu, Oct 11, 2018 at 08:54:42AM +0100, Adam Weremczuk wrote:
> Hi Dan,
>
> Yes, I tried tweaking config following that link but for some reason the
> sync progress is not showing any more.
> I guess I need to fiddle with it more.
>
> I have 16 x 500 GB disks in each server and my layout is as below:
>
> 1-4: VD0: RAID10: 2 spans of 2 disks -> 1TB for Proxmox containers and VMs
> 5-14: VD1: RAID50: 2 spans of 5 disks -> 4TB for storage (which I'm trying
> to sync for redundancy using DRBD)
> 15-16: global hot spares
>
> It appears to provide the best performance, resiliency and space utilisation
> balance.
> I've been referring to this chart:
> https://www.datarecovery.net/articles/raid-level-comparison.aspx
>
> Is there anything fundamentally wrong with my architecture?
It depends on what you value most. Is it random I/O performance? Sequential read? Sequential write? Resiliency against disk loss? Uptime? Data safety?

For your VD0, it's a choice between RAID10 and RAID6 (assuming that's available to you). With RAID10, you get excellent random I/O performance, good streaming write performance and excellent read performance, and you can lose up to two disks, but only if they're the right ones. Call them A1, A2, B1, B2, where A and B are the mirrored pairs: if you lose one from A and one from B you're fine, but if you lose both As or both Bs, that's data loss and a dead filesystem. With RAID6 on the same four disks you get the same 1TB usable and give up some read and write speed, but you can survive the loss of any 2 disks.

For your VD1, you can get excellent sequential read speeds, but write speed will be comparatively terrible: for small random writes, roughly 25% of the disks' combined write performance, because each write costs a read and a write of both the data and the parity. With two 5-disk RAID5 spans, your RAID50 can survive the loss of 2 disks, but only if they're in different spans: losing two disks from the same span kills the array. RAID50 is also very slow to recover from a disk loss, and it is unfortunately common for the rebuild to kill more disks that were already on the edge.

DRBD slows down everything in synchronous mode (protocol C), because a write isn't complete until the peer has it on disk; you'll be limited by your available network bandwidth and latency. Do you really have a requirement for all your storage to be instantly available on the second server at all times? You didn't mention a clustering filesystem, and you did mention failover scenarios, so it sounds like you're planning on restarting VMs from the second server with, hopefully, up-to-the-second recency. But if writes are frequent, you might overwhelm the DRBD link in synchronous mode, and you'll face a choice between low performance all the time and a higher risk of data loss.

The problem that I see is that, assuming dual or triple power supplies for each server, your most likely threats are things that will affect both servers: a power outage, a switch failure, a routing problem, environmental issues, other things beyond your control.
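To make the "only if they're the right ones" caveats concrete, here is a small brute-force sketch in Python (not part of the original reply) that enumerates every simultaneous k-disk failure for the layouts discussed and counts how many each array survives. The disk labels (A1..B2 for VD0, D1..D10 for VD1) are hypothetical, and the model assumes simultaneous failures only: it ignores the hot spares and rebuild time.

```python
from itertools import combinations


def raid10_survives(failed, mirrors):
    # RAID10 stripes over mirrored pairs: dead if any pair loses both members.
    return all(not set(pair) <= failed for pair in mirrors)


def raid6_survives(failed, disks):
    # RAID6 has double parity: survives any 2 failed disks in the set.
    return len(failed & set(disks)) <= 2


def raid50_survives(failed, spans):
    # RAID50 stripes over RAID5 spans: dead if any span loses 2 or more disks.
    return all(len(failed & set(span)) <= 1 for span in spans)


def survival_table(disks, survives, k):
    # Return (surviving combinations, total combinations) for k failed disks.
    combos = list(combinations(disks, k))
    ok = sum(1 for combo in combos if survives(set(combo)))
    return ok, len(combos)


if __name__ == "__main__":
    # VD0: hypothetical labels, A and B are the two mirrored pairs.
    vd0 = ["A1", "A2", "B1", "B2"]
    mirrors = [("A1", "A2"), ("B1", "B2")]
    print(survival_table(vd0, lambda f: raid10_survives(f, mirrors), 2))  # (4, 6)
    print(survival_table(vd0, lambda f: raid6_survives(f, vd0), 2))       # (6, 6)

    # VD1: hypothetical labels, two RAID5 spans of 5 disks each.
    vd1 = [f"D{i}" for i in range(1, 11)]
    spans = [vd1[:5], vd1[5:]]
    print(survival_table(vd1, lambda f: raid50_survives(f, spans), 2))    # (25, 45)
    print(survival_table(vd1, lambda f: raid50_survives(f, spans), 3))    # (0, 120)
```

So for the four-disk VD0, RAID10 survives 4 of the 6 possible two-disk failures while RAID6 survives all 6; the two-span RAID50 survives 25 of 45 two-disk failures and none of the three-disk ones.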
If you don't actually have a realtime sync requirement, you might be happier with something like hourly ZFS send/recv jobs. I don't know your exact scenario.

-dsr-
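For reference, an hourly send/recv cycle amounts to taking a new snapshot and sending only the changes since the previous one. The Python sketch below (an illustration, not something from the original thread) just builds the shell commands for one cycle rather than running them. The dataset `tank/storage`, the remote host `backup`, and the `hourly-` naming scheme are hypothetical, and it assumes the storage lives on a ZFS dataset and that something like cron runs it each hour.

```python
import shlex
from datetime import datetime, timedelta


def snapshot_name(dataset, when):
    # Hypothetical naming scheme: tank/storage@hourly-2018-10-11T09
    return f"{dataset}@hourly-{when:%Y-%m-%dT%H}"


def replication_commands(dataset, prev, now, remote_host, remote_dataset):
    """Build the shell commands for one replication cycle.

    prev is the time of the last replicated snapshot, or None for the
    first run, which needs a full (non-incremental) send.
    """
    curr_snap = snapshot_name(dataset, now)
    commands = [shlex.join(["zfs", "snapshot", curr_snap])]
    if prev is None:
        send = ["zfs", "send", curr_snap]
    else:
        # Incremental stream: only blocks changed since the previous snapshot.
        send = ["zfs", "send", "-i", snapshot_name(dataset, prev), curr_snap]
    # -F lets the receiver roll back to the last common snapshot if needed.
    recv = ["ssh", remote_host, "zfs", "receive", "-F", remote_dataset]
    commands.append(f"{shlex.join(send)} | {shlex.join(recv)}")
    return commands


if __name__ == "__main__":
    now = datetime(2018, 10, 11, 9)
    for cmd in replication_commands(
        "tank/storage", now - timedelta(hours=1), now, "backup", "tank/storage"
    ):
        print(cmd)
```

The trade-off against DRBD protocol C is explicit here: writes never wait on the network, at the cost of losing up to an hour of changes if the primary dies.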