I/O Error Test 5
================

commit "bcache: stop bcache device when backing device is offline"

Problem: bcache does not notice when a backing device goes offline,
and keeps accepting writes.

Original kernel: bcache does not detect that the backing device is offline.
Modified kernel: the bcache device is removed after the backing device goes offline.


Original
--------

# uname -rv
4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019

# apt install -y linux-modules-extra-$(uname -r)=4.15.0-55.60

# modprobe scsi_debug dev_size_mb=1024
# dmesg | tail
[   17.339336] scsi host2: scsi_debug: version 1.86 [20160430]
                 dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
[   17.339667] scsi 2:0:0:0: Direct-Access     Linux    scsi_debug       0186 PQ: 0 ANSI: 7
[   17.339989] sd 2:0:0:0: Power-on or device reset occurred
[   17.340184] sd 2:0:0:0: Attached scsi generic sg0 type 0
[   17.348436] sd 2:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[   17.352496] sd 2:0:0:0: [sda] Write Protect is off
[   17.352501] sd 2:0:0:0: [sda] Mode Sense: 73 00 10 08
[   17.360686] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   17.451956] sd 2:0:0:0: [sda] Attached SCSI disk

# ./setup-sda.sh >/dev/null 2>&1
[   37.515474] bcache: run_cache_set() invalidating existing data
[   37.527275] bcache: register_cache() registered cache device dm-0
[   37.625160] bcache: register_bdev() registered backing device sda
[   38.466997] bcache: bch_cached_dev_attach() Caching sda as bcache0 on set 0e23e266-6032-4700-a4c9-464964ed1349

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
sda            8:0    0    1G  0 disk 
└─bcache0    251:0    0 1024M  0 disk 


# echo writeback > /sys/block/sda/bcache/cache_mode

# echo 1 > /sys/block/sda/device/delete
[   77.878624] sd 2:0:0:0: [sda] Synchronizing SCSI cache

# sleep 10
#

# dd if=/dev/zero of=/dev/bcache0 bs=4k
[  120.268072] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[  120.271978] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[  120.274554] Buffer I/O error on dev bcache0, logical block 2, lost async page write
[  120.276671] Buffer I/O error on dev bcache0, logical block 3, lost async page write
[  120.278760] Buffer I/O error on dev bcache0, logical block 4, lost async page write
[  120.288042] Buffer I/O error on dev bcache0, logical block 5, lost async page write
[  120.290178] Buffer I/O error on dev bcache0, logical block 6, lost async page write
[  120.296043] Buffer I/O error on dev bcache0, logical block 7, lost async page write
[  120.298167] Buffer I/O error on dev bcache0, logical block 8, lost async page write
[  120.300220] Buffer I/O error on dev bcache0, logical block 3260, lost async page write
dd: error writing '/dev/bcache0': No space left on device
262143+0 records in
262142+0 records out
1073733632 bytes (1.1 GB, 1.0 GiB) copied, 3.49705 s, 307 MB/s
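
The byte count in the dd summary can be cross-checked from the record
count and block size. The sketch below only redoes that arithmetic; the
variable names are illustrative and not part of the original test. The
8 KiB shortfall against a full 1 GiB would be consistent with bcache
reserving a small data offset on the backing device, but that reading is
an assumption, not something the log states.

```shell
#!/bin/sh
# Values copied from the dd summary above.
records_out=262142      # "262142+0 records out"
block_size=4096         # bs=4k

bytes_copied=$((records_out * block_size))
echo "bytes copied: $bytes_copied"

# Compare with the 1 GiB scsi_debug disk (dev_size_mb=1024).
disk_bytes=$((1024 * 1024 * 1024))
shortfall=$((disk_bytes - bytes_copied))
echo "shortfall vs. 1 GiB: $shortfall bytes"
```

Note that dd ran to the end of the device at 307 MB/s even though sda had
already been deleted; the only sign of trouble is the "lost async page
write" messages in dmesg.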

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 


Modified
--------

# uname -rv
4.15.0-55-generic #60+test20190703build1bcache1-Ubuntu SMP Wed Jul 3 21:41:37 UTC

# modprobe scsi_debug dev_size_mb=1024
# dmesg | tail
[   27.855163] scsi host2: scsi_debug: version 1.86 [20160430]
                 dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
[   27.855497] scsi 2:0:0:0: Direct-Access     Linux    scsi_debug       0186 PQ: 0 ANSI: 7
[   27.855829] sd 2:0:0:0: Power-on or device reset occurred
[   27.855883] sd 2:0:0:0: Attached scsi generic sg0 type 0
[   27.863979] sd 2:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[   27.868028] sd 2:0:0:0: [sda] Write Protect is off
[   27.868031] sd 2:0:0:0: [sda] Mode Sense: 73 00 10 08
[   27.876114] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   27.966877] sd 2:0:0:0: [sda] Attached SCSI disk
# 

# ./setup-sda.sh >/dev/null 2>&1
[   46.294586] bcache: run_cache_set() invalidating existing data
[   46.306769] bcache: register_cache() registered cache device dm-0
[   46.400510] bcache: register_bdev() registered backing device sda
[   47.247662] bcache: bch_cached_dev_attach() Caching sda as bcache0 on set d3d7816b-6bd9-4dea-bf0c-4bde76b45f91

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
sda            8:0    0    1G  0 disk 
└─bcache0    251:0    0 1024M  0 disk 

# echo writeback > /sys/block/sda/bcache/cache_mode

# echo 1 > /sys/block/sda/device/delete
[   85.713826] sd 2:0:0:0: [sda] Synchronizing SCSI cache

# sleep 10
[   90.428125] bcache: cached_dev_status_update() sda: device offline for 5 seconds
[   90.431495] bcache: cached_dev_status_update() bdev0: disable I/O request due to backing device offline
[   90.435591] bcache: bcache_device_free() bcache0 stopped
#
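
The messages above suggest a periodic status check that counts how long
the backing device has been offline and stops the bcache device once the
count reaches 5 seconds (the delete at 85.71 and the stop at 90.43 are
roughly that far apart). The following is a rough shell simulation of
such counting logic, not the kernel code: the once-per-second polling,
the reset when the device comes back, and all names are assumptions made
for illustration.

```shell
#!/bin/sh
# Simulation only: models a per-second poll of the backing-device state.
OFFLINE_TIMEOUT=5       # threshold taken from the "offline for 5 seconds" message

# Arguments: one word per simulated second, "online" or "offline".
# Prints the second at which the device would be stopped, or "running".
simulate_status_update() {
    offline_seconds=0
    tick=0
    for state in "$@"; do
        tick=$((tick + 1))
        if [ "$state" = "offline" ]; then
            offline_seconds=$((offline_seconds + 1))
        else
            offline_seconds=0   # assumed: counter resets if the device returns
        fi
        if [ "$offline_seconds" -ge "$OFFLINE_TIMEOUT" ]; then
            echo "stopped at second $tick"
            return 0
        fi
    done
    echo "running"
}

# Five consecutive offline seconds trigger the stop.
simulate_status_update online offline offline offline offline offline
# A brief outage that recovers does not.
simulate_status_update offline offline online offline offline
```

Under these assumptions a short glitch on the backing device would not
tear down bcache0; only a sustained outage would.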

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   

# ls /dev/bcache0
ls: cannot access '/dev/bcache0': No such file or directory
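
The fixed "sleep 10" followed by a manual ls could also be scripted as a
poll with a timeout, which makes the pass/fail result explicit. A minimal
sketch follows; wait_for_removal is a hypothetical helper, not part of
the original test procedure.

```shell
#!/bin/sh
# Hypothetical helper: poll until a device node disappears or the
# timeout (in seconds) expires.
wait_for_removal() {
    dev=$1
    timeout=$2
    elapsed=0
    while [ -e "$dev" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1            # node still present after the timeout
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0                    # node is gone
}

# Example use after deleting the backing device:
#   if wait_for_removal /dev/bcache0 10; then
#       echo "bcache0 removed"
#   else
#       echo "bcache0 still present"
#   fi
```

On the original kernel the same call would be expected to time out, since
bcache0 stays in the lsblk output after sda is deleted.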

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  In Progress

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle
     I/O errors in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices,
     especially in writeback mode, can lead to data loss and/or
     leave the application unnotified of failed I/O requests.

   * The bcache device might remain available for I/O requests
     even if the backing device is offline, so the result of
     writes is undefined.

  [Test Case]

   * Detailed test cases/steps for the behavior of almost every
     patch with code logic changes are provided in bug comments.

   * The patchset has been tested for regressions on each cache
     mode (writethrough, writeback, writearound, none) with the
     xfstests test suite (on ext4), fio (random read-write) and
     iozone (several read/write tests).

  [Regression Potential]

   * The patchset is relatively large and touches several areas
     in bcache code; however, synthetic testing of the patches
     has been performed, and extensive regression/stress tests
     were run (as mentioned in the Test Case section).

   * Many patches in the patchset are 'Fixes' patches to other
     patches, and no further 'Fixes' currently exist upstream.

  [Other Info]

   * Canonical Field Eng. deploys bcache+writeback extensively
     (e.g., BootStack, UA cloud, except rare all-flash cases).

  
  [Original Bug Description]

  This is a request for a backport of the following upstream patch from
  4.18:

  "bcache: stop bcache device when backing device is offline"
  
  https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

  Field engineering uses bcache quite extensively and it would be good
  to have this in the GA/bionic kernel.
