On 04/15/2010 03:55 AM, ceg wrote:
> Upon disappearance, a real failure, mdadm --fail, or running an array
> degraded: mdadm -E shows *missing* disks marked as "removed". (What
> you probably referred to all the time.) Even though nobody actually
> issued "mdadm --remove" on them. (What I referred to.)
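
(For reference, the state ceg describes can be observed with something
like the following, assuming /dev/md0 is an existing two-disk RAID1 on
/dev/sdb1 and /dev/sdc1 and that /dev/sdc1 has gone missing; the device
names are illustrative, not taken from this bug.)

  # Assemble the array degraded, with only the surviving member present:
  mdadm --assemble --run /dev/md0 /dev/sdb1

  # Examine the surviving member's superblock: the absent disk appears in
  # the device table as "removed", even though nobody ran "mdadm --remove":
  mdadm --examine /dev/sdb1

  # The running array reports the same state for the missing slot:
  mdadm --detail /dev/md0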
Exactly, when assembling an array in degraded mode with missing disks,
mdadm marks the missing disks as removed. It probably should only mark
them as faulty or something less severe than removed.

> All would be clearer if
> * mdadm -E would report "missing" instead of removed (which sounds
>   like it really got "mdadm --remove")

There is already a faulty state; it might be appropriate to use that.

> That is a good point! If conflicting changes can be detected by
> this, why does mdadm not use this conflicting information (when parts
> of an array are claiming each other to be failed) to just report
> "conflicting changes" and refuse to --add without --force? (You see I
> am back asking to report and require --force to make it clear to
> users/admins that it is not only some bug/hiccup in the hot-plug
> mechanism that made it fail, but that --add is a manual operation that
> implies real data loss in this case, not as in other cases where it
> will only sync an older copy instead of a diverged one.)

That seems to be the heart of the bug. If BOTH disks show the second
disk as removed, then mdadm will not use the second disk; but when the
metadata on the second disk says disk 2 is fine and it is disk 1 that
has been removed, mdadm happily adds the disk. It should not trust the
wrong metadata on the second disk, and should refuse to use it unless it
can safely coerce it into agreement with the active metadata in the
array taken from the first disk.

If the second disk says both disks are fine, then the array state of
disk 2 can be changed to active/needs-sync, the metadata on both disks
can be updated to match, and the resync started. If the second disk says
that the first disk has been removed/failed/missing, then the two cannot
be reconciled: failing the first disk would fail the array, and
activating the second disk could destroy data. In this case the second
disk should be marked as removed and its metadata updated. This makes
sure that if you reboot and the second disk is detected first, it will
not be activated.

In other words, as soon as you have a boot that does see both disks
after they have been independently degraded and modified, ONE of them
will be chosen as the victor and used from then on, and the other will
be removed until the admin has a chance to investigate and decide to
manually add it back, thus destroying any changes made on that disk
during the boot when only it was available.
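
Until something like that lands in mdadm, the manual version of that
decision looks roughly like the sketch below. It assumes a two-disk
RAID1 (/dev/md0 built from /dev/sdb1 and /dev/sdc1) whose halves were
each run degraded at some point; the device names are illustrative, not
taken from this bug.

  # Compare the superblocks on both halves; the event counter and update
  # time indicate which copy carries the history you want to keep:
  mdadm --examine /dev/sdb1
  mdadm --examine /dev/sdc1

  # Assemble the array from the chosen victor only:
  mdadm --assemble --run /dev/md0 /dev/sdb1

  # Only once you have decided that the diverged data on the other half
  # can be discarded, wipe its stale superblock and add it back as a
  # fresh member; the resync overwrites whatever was written to it while
  # it ran alone:
  mdadm --zero-superblock /dev/sdc1
  mdadm --add /dev/md0 /dev/sdc1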