I/O Error Test 5
================

commit "bcache: stop bcache device when backing device is offline"

Problem: bcache is unaware that its backing device has gone offline and keeps accepting writes.

Original kernel: bcache doesn't realize the backing device is offline.
Modified kernel: the bcache device is removed after the backing device goes offline.

Original
--------

# uname -rv
4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019

# apt install -y linux-modules-extra-$(uname -r)=4.15.0-55.60

# modprobe scsi_debug dev_size_mb=1024

# dmesg | tail
[ 17.339336] scsi host2: scsi_debug: version 1.86 [20160430] dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
[ 17.339667] scsi 2:0:0:0: Direct-Access Linux scsi_debug 0186 PQ: 0 ANSI: 7
[ 17.339989] sd 2:0:0:0: Power-on or device reset occurred
[ 17.340184] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 17.348436] sd 2:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 17.352496] sd 2:0:0:0: [sda] Write Protect is off
[ 17.352501] sd 2:0:0:0: [sda] Mode Sense: 73 00 10 08
[ 17.360686] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 17.451956] sd 2:0:0:0: [sda] Attached SCSI disk

# ./setup-sda.sh >/dev/null 2>&1
[ 37.515474] bcache: run_cache_set() invalidating existing data
[ 37.527275] bcache: register_cache() registered cache device dm-0
[ 37.625160] bcache: register_bdev() registered backing device sda
[ 38.466997] bcache: bch_cached_dev_attach() Caching sda as bcache0 on set 0e23e266-6032-4700-a4c9-464964ed1349

# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
sda              8:0    0    1G  0 disk
└─bcache0      251:0    0 1024M  0 disk

# echo writeback > /sys/block/sda/bcache/cache_mode

# echo 1 > /sys/block/sda/device/delete
[ 77.878624] sd 2:0:0:0: [sda] Synchronizing SCSI cache

# sleep 10
#
# dd if=/dev/zero of=/dev/bcache0 bs=4k
[ 120.268072] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[ 120.271978] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[ 120.274554] Buffer I/O error on dev bcache0, logical block 2, lost async page write
[ 120.276671] Buffer I/O error on dev bcache0, logical block 3, lost async page write
[ 120.278760] Buffer I/O error on dev bcache0, logical block 4, lost async page write
[ 120.288042] Buffer I/O error on dev bcache0, logical block 5, lost async page write
[ 120.290178] Buffer I/O error on dev bcache0, logical block 6, lost async page write
[ 120.296043] Buffer I/O error on dev bcache0, logical block 7, lost async page write
[ 120.298167] Buffer I/O error on dev bcache0, logical block 8, lost async page write
[ 120.300220] Buffer I/O error on dev bcache0, logical block 3260, lost async page write
dd: error writing '/dev/bcache0': No space left on device
262143+0 records in
262142+0 records out
1073733632 bytes (1.1 GB, 1.0 GiB) copied, 3.49705 s, 307 MB/s

# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk

Modified
--------

# uname -rv
4.15.0-55-generic #60+test20190703build1bcache1-Ubuntu SMP Wed Jul 3 21:41:37 UTC

# modprobe scsi_debug dev_size_mb=1024

# dmesg | tail
[ 27.855163] scsi host2: scsi_debug: version 1.86 [20160430] dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
[ 27.855497] scsi 2:0:0:0: Direct-Access Linux scsi_debug 0186 PQ: 0 ANSI: 7
[ 27.855829] sd 2:0:0:0: Power-on or device reset occurred
[ 27.855883] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 27.863979] sd 2:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 27.868028] sd 2:0:0:0: [sda] Write Protect is off
[ 27.868031] sd 2:0:0:0: [sda] Mode Sense: 73 00 10 08
[ 27.876114] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 27.966877] sd 2:0:0:0: [sda] Attached SCSI disk
#
# ./setup-sda.sh >/dev/null 2>&1
[ 46.294586] bcache: run_cache_set() invalidating existing data
[ 46.306769] bcache: register_cache() registered cache device dm-0
[ 46.400510] bcache: register_bdev() registered backing device sda
[ 47.247662] bcache: bch_cached_dev_attach() Caching sda as bcache0 on set d3d7816b-6bd9-4dea-bf0c-4bde76b45f91

# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
sda              8:0    0    1G  0 disk
└─bcache0      251:0    0 1024M  0 disk

# echo writeback > /sys/block/sda/bcache/cache_mode

# echo 1 > /sys/block/sda/device/delete
[ 85.713826] sd 2:0:0:0: [sda] Synchronizing SCSI cache

# sleep 10
[ 90.428125] bcache: cached_dev_status_update() sda: device offline for 5 seconds
[ 90.431495] bcache: cached_dev_status_update() bdev0: disable I/O request due to backing device offline
[ 90.435591] bcache: bcache_device_free() bcache0 stopped
#
# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm

# ls /dev/bcache0
ls: cannot access '/dev/bcache0': No such file or directory

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  In Progress

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle I/O errors
     in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices, especially
     in writeback mode, can lead to data loss and/or to the application
     not being notified about failed I/O requests.

   * The bcache device might remain available for I/O requests even if
     the backing device is offline, so the outcome of writes is undefined.

  [Test Case]

   * Detailed test cases/steps for the behavior of almost every patch
     with code logic changes are provided in bug comments.

   * The patchset has been tested for regressions on each cache mode
     (writethrough, writeback, writearound, none) with the xfstests test
     suite (on ext4), fio (random read-write) and iozone (several
     read/write tests).

  [Regression Potential]

   * The patchset is relatively large and touches several areas of the
     bcache code; however, synthetic testing of the patches has been
     performed, and extensive regression/stress tests were run (as
     mentioned in the Test Case section).

   * Many patches in the patchset are 'Fixes' patches to other patches,
     and no further 'Fixes' patches for them currently exist upstream.

  [Other Info]

   * Canonical Field Eng. deploys bcache+writeback extensively (e.g.,
     BootStack, UA cloud), except in rare all-flash cases.

  [Original Bug Description]

  This is a request for a backport of the following upstream patch from
  4.18:

  "bcache: stop bcache device when backing device is offline"
  https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

  Field engineering uses bcache quite extensively and it would be good
  to have this in the GA/bionic kernel.
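
The pass/fail criterion of I/O Error Test 5 is whether /dev/bcache0 disappears shortly after the backing device is deleted through sysfs (the modified kernel logged "device offline for 5 seconds" and stopped bcache0 roughly 5 seconds after the delete). That check could be scripted; below is a minimal POSIX sh sketch, assuming a simple once-per-second polling loop is adequate. The function name wait_for_removal and its default 10-second timeout are hypothetical choices for illustration, not part of the original test.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original test): poll until the
# device node DEVNODE disappears or TIMEOUT seconds have elapsed.
# Usage: wait_for_removal DEVNODE [TIMEOUT]
# Prints PASS/FAIL and returns 0 if the node is gone in time, 1 otherwise.
# Note: variables are global in POSIX sh, so dev/timeout/elapsed leak out.
wait_for_removal() {
    dev=$1
    timeout=${2:-10}   # assumed default, about twice the 5 s seen in the log
    elapsed=0
    while [ -e "$dev" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "FAIL: $dev still present after ${timeout}s"
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "PASS: $dev removed after ${elapsed}s"
    return 0
}
```

On the test machine it would run right after the sysfs delete, for example: wait_for_removal /dev/bcache0 10. Going by the logs above, the original kernel would be expected to print FAIL (bcache0 was still listed after sleep 10) and the modified kernel PASS. Polling, rather than a single fixed sleep, lets the check finish as soon as the device is gone.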