On 04/15/2010 03:55 AM, ceg wrote:
> Upon disappearance, a real failure, mdadm --fail or running an array
> degraded: mdadm -E shows *missing* disks marked as "removed". (What
> you probably referred to all the time.) Even though nobody actually
> issued "mdadm --removed" on them. (What I referred to.)

Exactly.  When an array is started in degraded mode with missing disks,
mdadm marks the missing disks as removed.  It probably should only mark
them as faulty, or as something else less severe than removed.

> All would be clearer if * mdadm -E would report "missing" instead of
> removed (which sounds like it really got "mdadm --removed")

There already exists a faulty state.  It might be appropriate to use
that.

> That is a good point! If conflicting changes can be detected by
> this, why does mdadm not use this conflicting information (when parts
> of an array are claiming each other to be failed) to just report
> "conflicting changes" and refuse to --add without --force? (You see I
> am back asking to report and require --force to make it clear to
> users/admin that it is not only some bug/hiccup in the hot-plug
> mechanism that made it fail, but --add is a manual operation that
> implies real data-loss in this case, not as in others when it will
> only sync an older copy instead of a diverged one.)

That seems to be the heart of the bug.  If BOTH disks show the second
disk as removed, then mdadm will not use the second disk.  But when the
metadata on the second disk says disk 2 is fine and it's disk 1 that
has been removed, mdadm happily adds the disk.  It should not trust the
conflicting metadata on the second disk; it should refuse to use that
disk unless it can safely coerce it into agreement with the active
metadata in the array taken from the first disk.

If the second disk says both disks are fine, then the array state of 
disk 2 can be changed to active/needs sync, and the metadata on both 
disks can be updated to match and the resync started.

If the second disk says that the first disk has been
removed/failed/missing, then you cannot reconcile them: failing the
first disk would fail the array, and activating the second disk could
destroy data.  In this case the second disk should be marked as removed
and its metadata updated.  This makes sure that if you reboot and the
second disk is detected first, it will not be activated.  In other
words, as soon as a boot sees both disks after they have been
independently degraded and modified, ONE of them is chosen as the
victor and used from then on.  The other stays removed until the admin
has a chance to investigate and decides to manually add it back, which
destroys any changes made on that disk during the boot in which it was
the only disk available.
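
To make the three cases above concrete, here is a rough sketch in
Python.  It is not mdadm code and does not read real superblocks; the
names State, Action and reconcile are made up for illustration, and
"removed", "failed" and "missing" are lumped into one GONE state as an
assumption.  It only covers the situation discussed here, where the
first disk is already running the array and records the second disk as
gone.

```python
from enum import Enum


class State(Enum):
    """Hypothetical, simplified view of what a superblock says about a disk."""
    ACTIVE = "active"
    GONE = "removed/failed/missing"


class Action(Enum):
    """Hypothetical outcomes for the second (candidate) disk."""
    SKIP = "both sides agree the disk is gone; do not use it"
    RESYNC = "mark active/needs sync, update both superblocks, resync"
    MARK_REMOVED = "mark removed in its own metadata; wait for the admin"


def reconcile(second_says_self: State, second_says_first: State) -> Action:
    """Decide what to do with the second disk.

    Precondition (assumed): the active metadata taken from the first
    disk already records the second disk as GONE.
    """
    if second_says_self is State.GONE:
        # Both superblocks agree: the second disk stays out.
        return Action.SKIP
    if second_says_first is State.ACTIVE:
        # The second disk thinks both disks are fine, so it can be
        # coerced into agreement and resynced from the first disk.
        return Action.RESYNC
    # The second disk claims the first disk is gone: the two histories
    # have diverged and cannot be reconciled automatically.
    return Action.MARK_REMOVED


if __name__ == "__main__":
    for self_view in State:
        for first_view in State:
            action = reconcile(self_view, first_view)
            print(self_view.name, first_view.name, "->", action.name)
```

The point of the last branch is that the decision is written back to
the second disk, so a later boot that happens to see that disk first
does not start the array from it.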

-- 
array with conflicting changes is assembled with data corruption/silent loss
https://bugs.launchpad.net/bugs/557429
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
