On 2021-01-02 03:24, Andrei POPESCU wrote:
http://www.unixsheikh.com/articles/battle-testing-data-integrity-verification-with-zfs-btrfs-and-mdadm-dm-integrity.html
That looks interesting. Thanks for the link. :-)
On 2021-01-02 08:08, Richard Hector wrote:
On 3/01/21 12:24 am, Andrei POPESCU wrote:
In case of data corruption (system crash, power outage, user error,
or even just a HDD "hiccup") plain md without the dm-integrity
layer won't even be able to tell which is the good data and will
overwrite your good data with bad data. Silently.
I've had crashes and power outages and never noticed any problems,
not that that means they won't happen (or even that they haven't
happened). Does a journalling filesystem on top not cover that?
AIUI a journaling filesystem provides atomic writes of multiple
sectors to disk -- e.g. a process wants to put some data into a block
here (say, a file), a block there (say, a directory), etc., and
consistency of the on-disk data structures must be preserved. The
journal makes this a two-step process: everything is first written to
the journal, then everything is written to its final location on disk.
If either step is interrupted, the filesystem driver will detect the
failure at the next mount and either replay or discard the journal.
When done, either all of the blocks have been updated on disk or none
of the blocks on disk have been changed.
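To make the two steps concrete, here is a minimal Python sketch of the
idea. It is a toy in-memory model under my own assumptions, not how any
real filesystem driver is implemented; the names ToyJournaledStore,
write_atomic, recover, and the crash_after parameter are all made up
for illustration:

```python
class ToyJournaledStore:
    """Toy model: a dict stands in for the disk, an attribute for the journal."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes (the "on-disk" data)
        self.journal = None  # at most one pending transaction

    def write_atomic(self, updates, crash_after=None):
        """Write several blocks as one transaction.

        crash_after is a hypothetical test hook that simulates a crash:
        "journal_partial" = crash before the journal entry is committed,
        "apply_partial"   = crash while copying blocks to their final place.
        """
        # Step 1: write everything to the journal, then mark it committed.
        record = {"updates": dict(updates), "committed": False}
        self.journal = record
        if crash_after == "journal_partial":
            return  # crash: journal entry exists but was never committed
        record["committed"] = True

        # Step 2: write everything to its final location.
        if crash_after == "apply_partial":
            self._apply(record, stop_after_first=True)
            return  # crash: only some blocks reached their final location
        self._apply(record)
        self.journal = None  # transaction complete, journal entry retired

    def _apply(self, record, stop_after_first=False):
        for i, (block_no, data) in enumerate(record["updates"].items()):
            if stop_after_first and i == 1:
                return
            self.blocks[block_no] = data

    def recover(self):
        """Run at 'mount' time: replay a committed entry, discard an uncommitted one."""
        if self.journal is None:
            return
        if self.journal["committed"]:
            self._apply(self.journal)  # redo: all blocks end up updated
        # An uncommitted entry is dropped: no block was touched yet.
        self.journal = None
```

After a simulated crash, calling recover() leaves the blocks either all
old (crash before the commit marker) or all new (crash after it), which
is the all-or-nothing property described above.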
Integrity checking addresses different failure modes by applying
checksums to data blocks and metadata blocks. If the contents of a
block become corrupt, either in memory, in transit, on disk, etc., the
driver will detect the failure and respond. If redundant data is
available, such as via RAID, the driver will correct the data and
operations continue. If no redundant data is available, the driver will
generate an error. The device mapper framework in the Linux kernel
allows you to add the dm-integrity layer into a storage stack as
desired:
https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-integrity.html
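The detect-then-repair behavior described above can be sketched in a
few lines of Python. This is only an illustration under my own
assumptions: CRC32 from Python's zlib module stands in for whatever
checksum a real implementation uses, a list of copies stands in for a
RAID mirror, and the names store_block, is_valid, read_block, and
ChecksumError are hypothetical:

```python
import zlib


class ChecksumError(Exception):
    """Raised when no copy of a block passes its checksum."""


def store_block(data: bytes):
    """Return the block together with a checksum computed at write time."""
    return (data, zlib.crc32(data))


def is_valid(stored) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    data, checksum = stored
    return zlib.crc32(data) == checksum


def read_block(copies: list) -> bytes:
    """Read a block from a list of redundant copies (a toy 'mirror').

    Returns the data from the first copy that verifies and overwrites any
    corrupt copies with it. With no valid copy left, raises ChecksumError
    instead of returning bad data.
    """
    good = next((c for c in copies if is_valid(c)), None)
    if good is None:
        raise ChecksumError("all copies failed verification")
    for i, copy in enumerate(copies):
        if not is_valid(copy):
            copies[i] = good  # repair from the known-good copy
    return good[0]
```

With two copies and one corrupted, read_block returns the good data
and repairs the bad copy. With a single corrupted copy it raises
ChecksumError, which corresponds to the "no redundant data is
available" case. Without the checksum there would be no way to tell
which of two differing copies is the good one.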
On a related note, it is wise to have ECC memory to protect against data
corruption in memory:
http://www.openoid.net/will-zfs-and-non-ecc-ram-kill-your-data/
More failure modes exist (potentially, an infinite number). It's a
question of what failure modes and effects concern you, and how much
time and money you want to spend to mitigate risks.
David