Hi,

I haven't seen the issue come up on the btrfs mailing list recently,
but in the year I've followed the list it has been fairly well
established that the kernel's default command timeout is too short for
desktop-class drives, which can spend longer than that retrying a bad
sector before reporting the error.  I haven't personally run into
issues, because my drives have a 7 sec SCT ERC, and the default kernel
timeout is 30 sec.

In addition to linking to this bug, I referenced the following thread
on our wiki.d.o/btrfs page:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg53249.html

In my opinion, smartmontools should not attempt to modify a drive's
SCT ERC by default.  Instead, it should query the drive's capabilities
and modify only the kernel timeout, and only if the query fails
(assume a desktop drive) or returns a large value (proof of a desktop
drive).  From what I gather, that would be sufficient to close this
bug.
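A minimal Python sketch of that decision rule, assuming the caller has
already queried the drive (e.g. with smartctl -l scterc /dev/sdX, which
reports the ERC limits in tenths of a second) and passes in the read
limit, or None if the query failed or ERC is disabled.  The function
name, the 10 sec margin and the 180 sec fallback are hypothetical
choices for illustration, not smartmontools behaviour.  Note that it
never changes the drive's ERC setting; it only decides whether the
kernel timeout needs raising.

```python
from typing import Optional

# Hypothetical values for illustration only; not smartmontools defaults.
DEFAULT_KERNEL_TIMEOUT_S = 30   # the kernel's default command timeout
DESKTOP_KERNEL_TIMEOUT_S = 180  # assumed timeout for drives that may retry for a long time
MARGIN_S = 10                   # assumed headroom between drive ERC and kernel timeout


def choose_kernel_timeout(read_erc_ds: Optional[int]) -> Optional[int]:
    """Return a new kernel timeout in seconds, or None to leave it unchanged.

    read_erc_ds is the drive's SCT ERC read limit in deciseconds, or None
    if the query failed or ERC is unsupported/disabled.
    """
    if read_erc_ds is None:
        # Query failed: assume a desktop drive that may retry for a long time.
        return DESKTOP_KERNEL_TIMEOUT_S
    erc_s = -(-read_erc_ds // 10)  # round deciseconds up to whole seconds
    if erc_s + MARGIN_S <= DEFAULT_KERNEL_TIMEOUT_S:
        # Short ERC (e.g. 7 sec): the default kernel timeout already suffices.
        return None
    # Large ERC value: the drive could outlast the kernel, so raise the timeout.
    return max(DESKTOP_KERNEL_TIMEOUT_S, erc_s + MARGIN_S)
```

A caller would then write any non-None result to
/sys/block/sdX/device/timeout, the sysfs file that holds the kernel's
per-device command timeout.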

The second issue seems to need a new (normal priority) bug.  That bug
might be a request for a debconf interface to configure custom drive
SCT ERC and kernel timeout values.  I imagine a three-column interface
with device, drive_SCT, and kernel_SCT columns would do the trick.
This would be useful for the following two cases:
  - tuning for greater performance (e.g., reading from a redundant copy
    instead of waiting 7 sec, for more consistent IOPS)
  - allowing drives with firmware preconfigured for RAID to be used in
    single-disk volumes.  E.g., if a user buys a "NAS" drive with a
    7 sec SCT ERC but wants the maximum chance of recovering from read
    errors in a single-drive configuration, then the drive's ERC
    timeout should be reconfigured to something large like 120 sec and
    the kernel timeout bumped to 180 sec; this is the expected
    behaviour for a single-disk configuration.
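To make the three-column idea concrete, here is a rough Python sketch
of what such a table might look like and how it could be turned into
the two settings per device.  The file format (whitespace-separated
"device drive_SCT kernel_SCT", both in seconds, "#" for comments) and
the names parse_table and plan are hypothetical; nothing is executed,
the commands are only described.  It rejects entries where the drive
ERC is not shorter than the kernel timeout, since the kernel must wait
longer than the drive retries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DriveTimeouts:
    device: str       # e.g. "sdb"
    drive_erc_s: int  # desired SCT ERC (read and write), in seconds
    kernel_s: int     # desired kernel command timeout, in seconds


def parse_table(text: str) -> List[DriveTimeouts]:
    """Parse hypothetical 'device drive_SCT kernel_SCT' lines; '#' starts a comment."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        device, drive_s, kernel_s = fields[0], int(fields[1]), int(fields[2])
        if drive_s >= kernel_s:
            # Otherwise the kernel gives up and resets before the drive reports the error.
            raise ValueError(f"line {lineno}: drive ERC must be shorter than kernel timeout")
        entries.append(DriveTimeouts(device, drive_s, kernel_s))
    return entries


def plan(entry: DriveTimeouts) -> List[str]:
    """Describe the two settings to apply for one device (not executed here)."""
    ds = entry.drive_erc_s * 10  # smartctl -l scterc takes times in deciseconds
    return [
        f"smartctl -l scterc,{ds},{ds} /dev/{entry.device}",
        f"echo {entry.kernel_s} > /sys/block/{entry.device}/device/timeout",
    ]
```

For the single-disk "NAS" case above, the line "sdb 120 180" would
yield smartctl -l scterc,1200,1200 /dev/sdb plus a 180 sec kernel
timeout.  Whether a particular drive accepts a 120 sec ERC is
firmware-dependent.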

A long time ago I read that kernel timeout values that are too low are
also a problem for ZFS, though of course the users hitting these
problems are using "desktop" drives whose firmware is crippled for
RAID use.

As I see it, this is an opportunity for Debian to set an example,
because other distributions have not yet addressed these emerging
issues.  Of course, those "in the know" have already configured their
systems correctly using rc.local...

Sincerely,
Nicholas
