I/O Error Test 2
================

commit "bcache: add stop_when_cache_set_failed option to backing device"

Problem: upon too many io errors on cache device, bcache code retires cache set,
but it doesn't stop the bcache device.

(in writeback mode, if it continue to write, writes go to backing device 
directly;
next time bcache re-registers, it writes dirty cached data again, overwriting 
data)

Original kernel: bcache device remains available for writes.
Modified kernel: bcache device is not available anymore.


Original
--------

# uname -rv
4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019


# ./setup.sh >/dev/null 2>&1
[   27.427543] bcache: register_bdev() registered backing device dm-0
[   27.434844] bcache: run_cache_set() invalidating existing data
[   27.445652] bcache: register_cache() registered cache device dm-1
[   29.435245] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 
68e969a1-0f60-4978-a109-b48559970472

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 

# echo writeback > /sys/block/bcache0/bcache/cache_mode

# dd if=/dev/zero of=/dev/bcache0 bs=4k
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 5.14109 s, 209 MB/s
                                                                          
# ./dm_fake_dev.sh /dev/loop1 bad                                               
    
[  135.316335] Buffer I/O error on dev dm-1, logical block 262128, async page 
read                      
[  135.330181] Buffer I/O error on dev dm-1, logical block 262128, async page 
read                      
[  135.337019] bcache: register_bcache() error /dev/dm-1: device already 
registered                     

# dd if=/dev/zero of=/dev/bcache0 bs=4k                                         
    
[  142.754766] bcache: error on 68e969a1-0f60-4978-a109-b48559970472:           
                        
[  142.754769] journal io error                                                 
                        
[  142.757583] , disabling caching                                              
                        
dd: error writing '/dev/bcache0': No space left on device                       
                        
[  143.299557] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[  143.305560] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[  143.308463] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
<looping>

# ./dm_fake_dev.sh /dev/loop1 good
[  171.220172] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[  171.426360] bcache: register_bcache() error /dev/dm-1: device already 
registered

262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 32.066 s, 33.5 MB/s
dd: closing input file '/dev/zero': Bad file descriptor
# 
# [  174.589309] bcache: cached_dev_detach_finish() Caching disabled for dm-0
[  174.599062] bcache: cache_set_free() Cache set 
68e969a1-0f60-4978-a109-b48559970472 unregistered


# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm



Modified
--------

# uname -rv
4.15.0-55-generic #60+test20190703build1bcache1-Ubuntu SMP Wed Jul 3 21:41:37 
UTC


# ./setup.sh >/dev/null 2>&1
[   39.981825] bcache: register_bdev() registered backing device dm-0
[   39.987359] bcache: run_cache_set() invalidating existing data
[   40.000743] bcache: register_cache() registered cache device dm-1
[   41.985275] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 
9b509864-84be-428e-ad8a-293d546ec2e9

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 

# echo writeback > /sys/block/bcache0/bcache/cache_mode

# dd if=/dev/zero of=/dev/bcache0 bs=4k
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 5.96102 s, 180 MB/s

# ./dm_fake_dev.sh /dev/loop1 bad                                               
    
[   81.798129] Buffer I/O error on dev dm-1, logical block 262128, async page 
read                  
[   81.804533] Buffer I/O error on dev dm-1, logical block 262128, async page 
read                      
[   81.810630] bcache: register_bcache() error /dev/dm-1: device already 
registered                 
[   87.004373] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[   87.011977] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   87.044118] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[   87.047501] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   87.084123] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[   87.087723] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   87.091177] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[   87.094804] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   87.180123] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[   87.183770] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   87.220126] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[   87.223760] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
...
[   90.070935] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   90.075850] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[   90.080099] bcache: bch_cached_dev_error() stop bcache0: too many IO errors 
on backing device dm-0
[   90.080099]
[   90.086639] bcache: bch_count_io_errors() dm-1: IO error on reading dirty 
data from cache, recovering
[   90.091152] bcache: bcache_device_free() bcache0 stopped

dd: error writing '/dev/bcache0': No space left on device
1015982+0 records in
1015981+0 records out
4161458176 bytes (4.2 GB, 3.9 GiB) copied, 4.86963 s, 855 MB/s

bcache device removed:

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
loop1          7:1    0    1G  0 loop 
fake-loop1   253:1    0    1G  0 dm

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  In Progress

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle
     I/O errors in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices,
     specially in writeback mode, can lead to data loss and/or
     the application is not notified about failed I/O requests.

   * The bcache device might remain available for I/O requests
     even if backing device is offline, so writes are undefined.

  [Test Case]

   * Detailed test cases/steps for the behavior of almost every
     patch with code logic changes are provided in bug comments.

   * The patchset has been tested for regressions on each cache
     mode (writethrough, writeback, writearound, none) with the
     xfstests test suite (on ext4), fio (random read-write) and
     iozone (several read/write tests).

  [Regression Potential]

   * The patchset is relatively large and touches several areas
     in bcache code, however, synthetic testing of the patches
     has been performed, and extensive regression/stress tests
     were run (as mentioned in Test Case section).

   * Many patches in the patchset are 'Fixes' patches to other
     patches, and no further 'Fixes' currently exist upstream.

  [Other Info]

   * Canonical Field Eng. deploys bcache+writeback extensively
     (e.g., BootStack, UA cloud, except rare all-flash cases).

  
  [Original Bug Description]

  This is a request for a backport of the following upstream patch from
  4.18:

  "bcache: stop bcache device when backing device is offline"
  
https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

  Field engineering uses bcache quite extensively and it would be good
  to have this in the GA/bionic kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1829563/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to