On 2/14/24 18:54, The Wanderer wrote:
TL;DR: It worked! I'm back up and running, with what appears to be all
my data safely recovered from the failing storage stack!


That is good to hear.  :-)


On 2024-01-09 at 14:22, The Wanderer wrote:

On 2024-01-09 at 14:01, Michael Kjörling wrote:

On 9 Jan 2024 13:25 -0500, from wande...@fastmail.fm (The
Wanderer):

I've ordered a 22TB external drive


Make?  Model?  How is it interfaced to your computer?


for the purpose of creating
such a backup. Fingers crossed that things last long enough for
it to get here and get the backup created.

I suggest selecting, installing and configuring (as much as
possible) whatever software you will use to actually perform the
backup while you wait for the drive to arrive. It might save you a
little time later. Opinions differ but I like rsnapshot myself;
it's really just a front-end for rsync, so the copy is simply
files, making partial or full restoration easy without any special
tools.
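
For anyone setting this up while waiting on hardware, a minimal
rsnapshot configuration might look something like the sketch below. The
snapshot_root path and retention counts are just placeholders, and note
that rsnapshot.conf requires tabs, not spaces, between fields:

  # /etc/rsnapshot.conf (excerpt) -- fields must be tab-separated
  snapshot_root     /mnt/external/rsnapshot/
  retain            daily     7
  retain            weekly    4
  # -H preserves hard links, if you need that (see further down-thread)
  rsync_short_args  -aHAX
  backup            /home/    localhost/
  backup            /opt/     localhost/

Then run 'rsnapshot daily' from cron once per day; older rsnapshot
versions spell 'retain' as 'interval'.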

My intention was to shut down everything that normally runs, log out
as the user who normally runs it, log in as root (whose home
directory, like the main installed system, is on a different RAID
array with different backing drives), and use rsync from that point.
My understanding is that in that arrangement, the only thing
accessing the RAID-6 array should be the rsync process itself.
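
A couple of hedged sanity checks before starting a copy like that, to
confirm nothing else still has the filesystems open (paths are just
examples):

  # list any processes still holding files open on the source filesystems
  fuser -vm /home /opt
  # or, more verbosely, every open file on those filesystems
  lsof /home /opt

Nothing but the upcoming rsync should show up before the copy starts.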

For additional clarity: the RAID-6 array is backing a pair of
logical volumes, which are backing the /home and /opt partitions. The
entire rest of the system is on a series of other logical volumes
which are backed by a RAID-1 array, which is based on entirely
different drives (different model, different form factor, different
capacity, I think even different connection technology) and which has
not seen any warnings arise.
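
For readers trying to picture that stack, something like the following
(standard tools, nothing specific to this system) shows which physical
drives back which arrays, volume groups, and filesystems:

  lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT   # whole block-device tree
  cat /proc/mdstat                            # md arrays and their member drives
  pvs; vgs; lvs                               # LVM physical volumes, VGs, and LVs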

dmesg does have what appears to be an error entry for each of
the events reported in the alert mails, correlated with the
devices in question. I can provide a sample of one of those, if
desired.

As long as the drive is being honest about failures and is
reporting failures rapidly, the RAID array can do its work. What
you absolutely don't want to see is I/O errors relating to the RAID
array device (for example, with mdraid, /dev/md*), because that
would presumably mean that the redundancy was insufficient to
correct for the failure. If that happens, you are falling off a
proverbial cliff.

Yeah, *that* would be indicative of current catastrophic failure. I
have not seen any messages related to the RAID array itself.
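
A few hedged commands for keeping an eye on exactly that distinction
(the array name below is a placeholder; use whatever /proc/mdstat
reports):

  cat /proc/mdstat                         # overall array state, [UU_UUU]-style member status
  mdadm --detail /dev/md1                  # per-member state, failed/spare counts
  dmesg | grep -i -E 'md[0-9]|i/o error'   # member-drive errors vs. errors on the md device itself
  smartctl -a /dev/sda                     # SMART data for an individual member (smartmontools)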

In the time since then, I continued mostly-normal but somewhat-curtailed
use of the system, and saw few messages about these matters beyond those
arising from attempts to back up the data for later recovery purposes.


Migrating large amounts of data from one storage configuration to another is non-trivial. Anticipating problems and preparing for them ahead of time (e.g. backups) makes it even less trivial. The last time I lost data was during a migration when I had barely enough hardware; since then I have made a conscious decision to always keep a surplus of hardware.


(For awareness: this is all a source of considerable psychological
stress to me, to an extent that is leaving me on the edge of
physically ill, and I am managing to remain on the good side of that
line only by minimizing my mental engagement with the issue as much
as possible. I am currently able to read and respond to these mails
without pressing that line, but that may change at any moment, and if
so I will stop replying without notice until things change again.)

This need to stop reading wound up happening almost immediately after I
sent the message to which I am replying.


I remember reading your comment and then noticing you went silent. I apologize if I pushed your buttons.


I now, however, have good news to report back: after more than a month,
at least one change of plans, nearly $2200 in replacement hard drives,


Ouch.


If you have a processor, memory, PCIe slot, and HBA to match those SSDs, their performance should be very nice.


much nervous stress, several days of running data copies to and from a
20+-terabyte mechanical hard drive over USB, and a complete manual
removal of my old 8-drive RAID-6 array and build of a new 6-drive RAID-6
array (and of the LVM structure on top of it), I now appear to have
complete success.

I am now running on a restored copy of the data on the affected
partitions, taken from a nearly-fully-shut-down system state, which is
sitting on a new RAID-6 array built on what I understand to be
data-center-class SSDs (which should, therefore, be more suitable to the
24/7-uptime read-mostly workload I expect of my storage). The current
filesystems involved are roughly the same size as the ones previously in
use, but the underlying drives are nearly 2x the size; I decided to
leave the extra capacity for later allocation via LVM, if and when I may
need it.


When I was thinking about building md RAID, and then ZFS, I worried about having enough capacity for my data. Now I worry about zfs-auto-snapshot(8), daily backups, monthly archives, monthly images, etc., clogging my ZFS pools.


The key concept is "data lifetime". (Or alternatively, "destruction policy".)
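
In ZFS terms that mostly comes down to deciding which datasets get
snapshotted and how long snapshots live; a hedged sketch (pool, dataset,
and snapshot names below are made up):

  # exclude a scratch dataset from zfs-auto-snapshot entirely
  zfs set com.sun:auto-snapshot=false tank/scratch
  # or exclude it only from the monthly run
  zfs set com.sun:auto-snapshot:monthly=false tank/scratch
  # see what is accumulating, largest last, and prune by hand when needed
  zfs list -t snapshot -o name,used -s used
  zfs destroy tank/home@zfs-auto-snap_daily-2024-01-01-0000   # illustrative name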


I did my initial data backup to the external drive, from a
still-up-and-running system, via rsnapshot. Attempting to do a second
rsnapshot, however, failed at the 'cp -al' stage with "too many
hardlinks" errors. It turns out that there is a hard limit of 65000
hardlinks per on-disk file;


65,000 hard links seems to be an ext4 limit:

https://www.linuxquestions.org/questions/linux-kernel-70/max-hard-link-per-file-on-ext4-4175454538/#post4914624


I believe ZFS can do more hard links. (Much more? Limited by available storage space?)


I had so many files already hardlinked
together on the source filesystem that creating, for each one, as many
new hardlinked names as it already had ran into that limit.

(The default rsnapshot configuration doesn't preserve hardlinks,
possibly in order to avoid this exact problem - but that isn't viable
for the case I had at hand, because in some cases I *need* to preserve
the hardlink status, and because without that deduplication there
wouldn't have been enough space on the drive for more than the single
copy, in which case there'd be very little point in using rsnapshot
rather than just rsync.)
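
For anyone curious how close their own data is to that ceiling, GNU find
can report link counts directly (the threshold here is arbitrary):

  # regular files on /home with more than 1000 names pointing at the same inode
  find /home -xdev -type f -links +1000 -printf '%n\t%i\t%p\n' | sort -rn | head
  # %n = link count, %i = inode number, %p = path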


ZFS provides similarly useful results with built-in compression and de-duplication.
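
Roughly, and with the caveat that dedup in ZFS is famously
memory-hungry, that looks something like this (pool/dataset names are
placeholders):

  zfs set compression=lz4 tank/backup   # cheap, almost always worth enabling
  zfs set dedup=on tank/backup          # needs plenty of RAM for the dedup table
  zfs get compressratio tank/backup     # how much compression is actually saving
  zpool get dedupratio tank             # effective dedup ratio for the whole pool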


In the end, after several flailing-around attempts to minimize or
mitigate that problem, I wound up moving the initial external copy of
the biggest hardlink-deduplicated tree (which is essentially 100%
read-only at this point; it's backup copies of an old system state,
preserved since one of those copies has corrupted data and I haven't yet
been able to confirm that all of the files in my current copy of that
data were taken from the non-corrupt version)


That sounds like an N-way merge problem -- old file system, multiple old backups, and current file system as inputs, all merged into an updated current file system as output. LVM snapshots, jdupes(1), and your favorite scripting language come to mind. Take good notes and be prepared to roll back at any step.


out of the way, shutting
down all parts of the system that might be writing to the affected
filesystems, and manually copying out the final state of the *other*
parts of those filesystems via rsync, bypassing rsnapshot. That was on
Saturday the 10th.
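
Presumably that manual copy was something along these lines (the
destination paths are guesses at wherever the external drive is
mounted; -H matters here because of the hard links discussed above):

  rsync -aHAX --numeric-ids --delete /home/ /mnt/external/final/home/
  rsync -aHAX --numeric-ids --delete /opt/  /mnt/external/final/opt/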

Then I grabbed copies of various metadata about the filesystems, the
LVM, and the mdraid config; modified /etc/fstab to not mount them;
deactivated the mdraid, and commented it out of /etc/mdadm/mdadm.conf;
updated the initramfs; shut down; pulled all eight Samsung 870 EVO
drives; installed six brand-new Intel data-center-class (or so I gather)
SSDs;


Which model?  What size?


booted up; partitioned the new drives based on the data I had
about what config the Debian installer put in place when creating the
mdraid config on the old ones; created a new mdraid RAID-6 array on
them, based on the copied metadata; created a new LVM stack on top of
that, based on *that* copied metadata; created new filesystems on top of
that, based on *that* copied metadata; rsync'ed the data in from the
manually-created external backup; adjusted /etc/fstab and
/etc/mdadm/mdadm.conf to reflect the new UUID and names of the new
storage configuration; updated the initramfs; and rebooted. Given delay
times for the drives to arrive and for various data-validation and
plan-double-checking steps to complete, the end of that process happened
this afternoon.
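
For anyone who ever has to repeat a rebuild like that, a heavily hedged
sketch of the command sequence; every device name, VG/LV name, size, and
filesystem type below is a placeholder, with the real values coming from
the metadata captured beforehand:

  # capture the old layout before tearing anything down
  mdadm --detail /dev/md1 > md1.txt
  sfdisk -d /dev/sda > sda.parttable
  pvdisplay > pv.txt; vgdisplay > vg.txt; lvdisplay > lv.txt; blkid > blkid.txt

  # partition the new drives and build the new array
  sfdisk /dev/sdb < new.parttable          # repeat per drive
  mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[b-g]1

  # LVM and filesystems on top
  pvcreate /dev/md1
  vgcreate vg_data /dev/md1
  lvcreate -L 2T -n home vg_data
  lvcreate -L 500G -n opt vg_data
  mkfs.ext4 /dev/vg_data/home
  mkfs.ext4 /dev/vg_data/opt

  # record the new array and rebuild the initramfs
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  update-initramfs -u

  # restore the data; /etc/fstab gets the new UUIDs that blkid reports
  mount /dev/vg_data/home /home
  rsync -aHAX --numeric-ids /mnt/external/final/home/ /home/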

And it appears to Just Work. I haven't examined all the data to validate
that it's in good condition, obviously (since there's nearly 3TB of it),
but the parts I use on a day-to-day basis are all looking exactly the
way they should be. It appears that the cross-drive redundancy of the
RAID-6 array was enough to avoid data loss from the
scattered read failures of the underlying drives before I could get the
data out.


Data integrity validation is tough without a mechanism. Adding a postexec step to rsnapshot(1) that drops an MD5SUMS (or similar) file into the root of each backup tree could meet that need, but it could also waste a lot of time and energy checksumming files that have not changed.
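
As a hedged example of that mechanism (paths are placeholders),
generating a manifest at backup time and verifying it after the restore
looks roughly like:

  # at backup time, from the root of the backup tree
  cd /mnt/external/final/home
  find . -type f -print0 | xargs -0 md5sum > ../MD5SUMS.home

  # after the restore, from the root of the restored filesystem
  cd /home
  md5sum --quiet -c /mnt/external/final/MD5SUMS.home   # prints only mismatches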


One of the reasons I switched to ZFS was that it has built-in data and metadata integrity checking (and repair, depending upon redundancy).
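
The periodic check on that side is a scrub (the pool name is a
placeholder):

  zpool scrub tank       # walk every block, verify checksums, repair where redundancy allows
  zpool status -v tank   # progress, plus any files with unrecoverable errors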


(This does leave me without having restored the read-only backup data
from the old system state. I care less about that; I'll want it
eventually, but it isn't important enough to warrant postponing getting
the system back in working order.)


I do still want/need to figure out what to do about an *actual* backup
system, to external storage, since the rsnapshot thing apparently isn't
going to be viable for my circumstance and use case. There is, however,
now *time* to work on doing that, without living under the shadow of a
known immediate/imminent data-loss hardware failure.


rsync(1) should be able to copy backups onto an external HDD.


If your chassis has an available 5.25" half-height external drive bay and a free SATA 6 Gbps port, mobile racks are a more reliable connection than USB for 3.5" HDDs because there are no cables to bump or power adapters to fail or unplug:

https://www.startech.com/en-us/hdd/drw150satbk


I also do mean to read the rest of the replies in this thread, now that
doing so is unlikely to aggravate my stress heartburn...


Okay.


David
