Bug#637087: Please set FSCKFIX=yes by default

Philip Hands Thu, 07 Jun 2012 06:33:19 -0700

Hi Roger,

Thanks for the lightning-quick response :-)

Roger Leigh <rle...@codelibre.net> writes:
> On Thu, Jun 07, 2012 at 09:13:35AM +0100, Philip Hands wrote:
>> Having upgraded to wheezy recently, I note that you're changing defaults
>> in rcS (as exhaustively discussed elsewhere).
>> 
>> That being the case, there is no additional cost to fixing this bug,
>> as people are going to be prompted on upgrade about rcS anyway, so this
>> would seem to be a very good time to do it.
>
> No defaults in rcS have been changed (from a squeeze to wheezy upgrade
> POV).  We did add some new ones transiently in testing before moving
> them to /etc/default/tmpfs, and removed UTC, but the defaults present
> in squeeze are unchanged.

Ah, my mistake -- I noticed this when upgrading a couple of weeks ago,
and have only now got round to looking into whether I needed to report
a new wishlist bug, or add info to an existing one.

> However, this change has permitted rcS to become a regular conffile,
> so now at least we have the /possibility/ of updating it to change the
> defaults sanely.

That's good.

>> The only disadvantage I can think of with FSCKFIX=yes is the reduced
>> likelihood of noticing (if you happen to understand the messages)
>> that inodes are being dropped into lost+found -- I would think
>> that that could be handled by something that checks whether lost+found
>> is empty and mails root about it, but that is hardly a blocker.
>
> That would be useful.  I think the main issue is Henrique's point that
> it's not always safe to run fsck when using lvm/md on unreconstructed
> arrays/volumes.  That could result in dataloss.  That said, such things
> should normally be detectable: normally isn't there an md superblock
> which should make the partition detectable as a part of a RAID set,
> rather than a fsck-able filesystem?  Likewise with LVM PVs.
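For illustration only, the signature check described above might look
roughly like the Python sketch below.  It assumes md metadata in the
v1.1 or v1.2 format (superblock magic 0xa92b4efc, stored little-endian
at byte 0 or byte 4096 respectively) and an LVM2 physical-volume label
("LABELONE", with "LVM2 001" 24 bytes further on) in one of the first
four 512-byte sectors.  It does not look for md 0.90 or 1.0 superblocks,
which live near the end of the device, and the function names are
hypothetical; in practice blkid or mdadm --examine would be the tools
to ask.

```python
import struct
from typing import Optional

# Magic number at the start of an md v1.x superblock, stored little-endian.
MD_SB_MAGIC = 0xA92B4EFC
# md v1.1 puts its superblock at byte 0, v1.2 at byte 4096.  v0.90 and
# v1.0 superblocks sit near the END of the device and are not checked here.
MD_V1_OFFSETS = (0, 4096)
SECTOR = 512


def classify_header(head: bytes) -> Optional[str]:
    """Return "md", "lvm" or None, given the first 8 KiB of a block device."""
    for off in MD_V1_OFFSETS:
        if len(head) >= off + 4:
            (magic,) = struct.unpack_from("<I", head, off)
            if magic == MD_SB_MAGIC:
                return "md"
    # An LVM2 PV label lives in one of the first four 512-byte sectors:
    # "LABELONE" at the start of that sector, "LVM2 001" at byte 24 of it.
    for sector in range(4):
        base = sector * SECTOR
        if (head[base:base + 8] == b"LABELONE"
                and head[base + 24:base + 32] == b"LVM2 001"):
            return "lvm"
    return None


def classify_device(path: str) -> Optional[str]:
    """Read the start of a device (needs root) and classify it."""
    with open(path, "rb") as dev:
        return classify_header(dev.read(8192))
```

A result of "md" or "lvm" from something like classify_device() would
be the cue to leave that device alone rather than run fsck -y on it.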

Hmm, perhaps.  How does this result in data loss?  If you're running one
disk down on a RAID5 then surely the data is _supposed_ to be the same
as what would be on the full set.

OK, so I've had disks with duff blocks on them, and I've also had drives
drop out of a RAID such that the only valid copy of the data was on the
drive that had dropped out; if I'd blithely added it back in, that would
have overwritten the good data with the bad.  But I've no idea how you'd
expect to detect that automatically, and I think there's very little
chance of the vast majority of users surviving that with their data
intact (in my case it was the result of a pretty odd sequence of
events).  So if people suffer from that sort of thing, they've probably
been doing something overly ambitious with their disk shuffling, and
they can expect to restore from backups when it goes wrong.

> I guess it's a balance between whether we get more dataloss if FSCKFIX=
> yes or no.

Could this be a candidate for a debconf question, with a default of yes
and a low priority unless the right combination of md/lvm exists, in
which case its priority could be promoted?
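As a rough sketch of how such a priority decision could be made, the
Python below parses /proc/mdstat text and picks a debconf priority.
The policy is hypothetical and only looks at md, not LVM: "low" when
there are no md arrays, "medium" when every array is complete, and
"high" when any array reports fewer active devices than it should have
(as in "[2/1] [U_]").

```python
import re
from typing import Dict, Optional

# The "[total/active]" device count on an array's status line, e.g.
# "[2/1]" for a two-disk mirror running with one disk missing.
COUNT_RE = re.compile(r"\[(\d+)/(\d+)\]")


def parse_mdstat(text: str) -> Dict[str, bool]:
    """Map each array named in /proc/mdstat text to True if it is degraded."""
    arrays: Dict[str, bool] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("md") and " : " in line:
            # Header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]".
            current = line.split(" : ", 1)[0].strip()
            arrays[current] = False
        elif current is not None:
            m = COUNT_RE.search(line)
            if m:
                total, active = int(m.group(1)), int(m.group(2))
                arrays[current] = active < total
                current = None
    return arrays


def fsckfix_question_priority(mdstat_text: str) -> str:
    """Hypothetical policy for the debconf priority of an FSCKFIX question."""
    arrays = parse_mdstat(mdstat_text)
    if not arrays:
        return "low"  # no md arrays: normally not shown, default applies
    if any(arrays.values()):
        return "high"  # a degraded array is present: ask the admin
    return "medium"
```

On a real system the input would be the contents of /proc/mdstat, and
the result would be passed as the priority argument to db_input in the
package's debconf config script.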

Many of our users are going to be presented with a prompt that they are
quite likely not to be able to see (if it's happening on a server in a
co-lo), at which point their server is down until they get access (via a
remote console or in person).  That can be a serious cost, and I would
think it happens orders of magnitude more often than avoidable data loss
due to running fsck when it shouldn't be run, but perhaps I just don't
use lvm/md in a way that would provoke that.

Of those that do see the prompt, and understand it, I doubt many do
anything other than fsck -y on the relevant device anyway.

The remaining people who see it have a strong tendency (in my
experience) to hit Ctrl-D and, when that works, tell everyone in the
office that that's the new way of getting the server to boot, until it
eventually breaks horribly.

Of course, the fsck failure can be an early warning of disk failure,
which setting FSCKFIX=yes may mask until it breaks properly.  On the
other hand, these days smartmon probably covers that, along with mdadm
sending emails when RAID arrays degrade, and other such tools.
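A minimal sketch of the lost+found check mentioned earlier, which would
win back some of the visibility that FSCKFIX=yes gives up: it lists
non-empty lost+found directories under a given set of mount points and
mails a summary to root.  The mount-point list, the root@localhost
address and the assumption of an MTA accepting SMTP on localhost port 25
are all illustrative, and the function names are hypothetical.

```python
import os
import smtplib
from email.message import EmailMessage
from typing import Dict, List


def nonempty_lost_found(mountpoints: List[str]) -> Dict[str, List[str]]:
    """Return {mountpoint: entries} for each lost+found that is not empty."""
    found: Dict[str, List[str]] = {}
    for mnt in mountpoints:
        path = os.path.join(mnt, "lost+found")
        try:
            entries = sorted(os.listdir(path))
        except OSError:
            continue  # missing, not a directory, or not readable by us
        if entries:
            found[mnt] = entries
    return found


def build_report(found: Dict[str, List[str]]) -> EmailMessage:
    """Build a plain-text mail listing what fsck left in lost+found."""
    msg = EmailMessage()
    msg["From"] = "root@localhost"
    msg["To"] = "root@localhost"
    msg["Subject"] = "fsck moved files into lost+found"
    lines = []
    for mnt, entries in found.items():
        lines.append(f"{mnt}/lost+found: {len(entries)} entries")
        # Show at most 20 names per filesystem to keep the mail short.
        lines.extend(f"  {name}" for name in entries[:20])
    msg.set_content("\n".join(lines) + "\n")
    return msg


def mail_root_if_needed(mountpoints: List[str]) -> bool:
    """Send the report if anything was found; return True if mail was sent."""
    found = nonempty_lost_found(mountpoints)
    if not found:
        return False
    # Assumes a local MTA accepting SMTP on localhost:25.
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(build_report(found))
    return True
```

Something like mail_root_if_needed(["/", "/home", "/var"]) could be run
once after boot; it returns False and sends nothing when every
lost+found is empty or absent.  Since lost+found is normally readable
only by root, the check would need to run as root.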

Cheers, Phil.
-- 
|)|  Philip Hands [+44 (0)20 8530 9560]    http://www.hands.com/
|-|  HANDS.COM Ltd.                    http://www.uk.debian.org/
|(|  10 Onslow Gardens, South Woodford, London  E18 1NE  ENGLAND
