On 7/1/20 2:50 PM, Josef Bacik wrote:
> On 7/1/20 2:24 PM, Matthew Miller wrote:
>> On Wed, Jul 01, 2020 at 06:54:02AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
>>> Making btrfs opt-in for F33 and (assuming the results go well) opt-out for
>>> F34 could be a good option. I know technically it is already opt-in, but
>>> it's not very visible or popular. We could make the btrfs option more
>>> prominent and ask people to pick it if they are ready to handle potential
>>> fallout.
>>
>> I'm leaning towards recommending this as well. I feel like we don't have
>> good data to make a decision on -- the work that Red Hat did previously when
>> making a decision was 1) years ago and 2) server-focused, and the Facebook
>> production usage is encouraging but also not the same use case. I'm
>> particularly concerned about metadata corruption fragility as noted in the
>> Usenix paper. (It'd be nice if we could do something about that!)
>>
>
> There's only so much we can do about this. I've sent up patches to ignore
> failed global trees so that users can more easily recover data when a global
> tree is corrupted, but as they say, if only 1 bit is off
> in a node, we throw the whole node away. And throwing a node away means you
> lose access to any of its children, which could be a large chunk of the file
> system.
>
> This sounds like a "wtf, why are you doing this btrfs?" sort of thing, but
> this is just the reality of using checksums. It's a checksum, not ECC. We
> don't know _which_ bits are fucked, we just know something's fucked, so we
> throw it all away. If you have RAID or DUP then we go read the other copy,
> and fix the broken copy if we find a good copy. If we don't, well then
> there's nothing really we can do.
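
For reference, the behavior described above (detect damage with a checksum, fall back to a second copy, repair the bad copy from the good one) can be sketched roughly as follows. This is not btrfs code: struct node, NODE_SIZE, toy_csum() and read_node() are hypothetical names made up for illustration, and the Adler-32-style sum only stands in for a real checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_SIZE 64 /* hypothetical size, chosen only for the example */

struct node {
	uint32_t csum;               /* checksum over payload[] */
	uint8_t  payload[NODE_SIZE];
};

/* Adler-32-style toy checksum; a stand-in, not what btrfs uses. */
static uint32_t toy_csum(const uint8_t *p, size_t n)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < n; i++) {
		a = (a + p[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/* A mismatch says the node is damaged, but not which bits are wrong. */
static int node_ok(const struct node *n)
{
	return n->csum == toy_csum(n->payload, NODE_SIZE);
}

/*
 * Return the first copy whose checksum verifies and overwrite every
 * failing copy with it. Returns 0 on success, -1 if no copy verifies.
 */
int read_node(struct node *copies, size_t ncopies, struct node *out)
{
	assert(copies != NULL && out != NULL);

	for (size_t i = 0; i < ncopies; i++) {
		if (!node_ok(&copies[i]))
			continue;
		*out = copies[i];
		for (size_t j = 0; j < ncopies; j++)
			if (!node_ok(&copies[j]))
				copies[j] = *out; /* repair from the good copy */
		return 0;
	}
	return -1; /* every copy failed: nothing left to read */
}
```

With a single copy (ncopies == 1), any checksum failure ends in the -1 case, which is the situation the rest of this mail is about.
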
There is often a path forward when a bad metadata checksum is detected,
e.g. in e2fsck:
scan_extent_node() {
	...
	/* Failed csum but passes checks? Ask to fix checksum. */
	if (failed_csum &&
	    fix_problem(ctx, PR_1_EXTENT_ONLY_CSUM_INVALID, pctx)) {
		pb->inode_modified = 1;
		pctx->errcode = ext2fs_extent_replace(ehandle, 0, &extent);
		if (pctx->errcode)
			return;
	}
It does the same for many types of metadata:
/* inode passes checks, but checksum does not match inode */
#define PR_1_INODE_ONLY_CSUM_INVALID 0x010068
--
/* Inode extent block passes checks, but checksum does not match extent */
#define PR_1_EXTENT_ONLY_CSUM_INVALID 0x01006A
--
/* Inode extended attribute block passes checks, but checksum does not
* match block. */
#define PR_1_EA_BLOCK_ONLY_CSUM_INVALID 0x01006C
--
/* dir leaf node passes checks, but fails checksum */
#define PR_2_LEAF_NODE_ONLY_CSUM_INVALID 0x02004D
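
The "passes checks, but fails checksum" pattern in the excerpts above can be sketched in isolation as follows. This is a simplified illustration, not the e2fsck implementation: struct extent_block, BLK_MAGIC, toy_csum(), fields_plausible() and check_block() are all hypothetical, and the sketch rewrites the checksum unconditionally where e2fsck asks first via fix_problem().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK_MAGIC   0xF30Au /* hypothetical magic number */
#define MAX_EXTENTS 4       /* hypothetical block capacity */

struct extent_block {
	uint32_t magic;
	uint16_t entries;            /* extents in use */
	uint16_t max_entries;        /* capacity recorded in the block */
	uint32_t start[MAX_EXTENTS]; /* first block of each extent */
	uint32_t len[MAX_EXTENTS];   /* length of each extent */
	uint32_t csum;               /* covers every field above */
};

enum verdict { BLOCK_OK, BLOCK_CSUM_FIXED, BLOCK_BAD };

/* Adler-32-style toy checksum; a stand-in for a real CRC. */
static uint32_t toy_csum(const void *buf, size_t n)
{
	const uint8_t *p = buf;
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < n; i++) {
		a = (a + p[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

static uint32_t block_csum(const struct extent_block *b)
{
	return toy_csum(b, offsetof(struct extent_block, csum));
}

/* Structural checks that do not depend on the checksum at all. */
static int fields_plausible(const struct extent_block *b)
{
	uint64_t prev_end = 0;

	if (b->magic != BLK_MAGIC)
		return 0;
	if (b->max_entries > MAX_EXTENTS || b->entries > b->max_entries)
		return 0;
	for (uint16_t i = 0; i < b->entries; i++) {
		/* extents must be non-empty, sorted and non-overlapping */
		if (b->len[i] == 0 || b->start[i] < prev_end)
			return 0;
		prev_end = (uint64_t)b->start[i] + b->len[i];
	}
	return 1;
}

/* Failed csum but passes checks? Keep the block and fix the checksum. */
enum verdict check_block(struct extent_block *b)
{
	assert(b != NULL);

	if (!fields_plausible(b))
		return BLOCK_BAD;
	if (b->csum != block_csum(b)) {
		b->csum = block_csum(b);
		return BLOCK_CSUM_FIXED;
	}
	return BLOCK_OK;
}
```

Passing the field checks is a weaker guarantee than a matching checksum: a flipped bit can still leave every field plausible, so this trades certainty for the chance to keep the block.
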
Does btrfsck really never attempt to salvage a metadata block with a bad CRC by
validating its fields?
-Eric