So I took the time to re-test this.
My z/VM guest has 4 CPUs (with SMT on) and 4 DASD FBA devices that split a
64GB zFCP/SCSI LUN equally into four 16GB FBA chunks.

In comment #8 I tested with 2GB RAM, where things worked and I wasn't able
to recreate the error situation.
I then moved to 6GB RAM and things still worked for me.
Then 8GB, where everything was still fine.
And finally 10GB, where I still don't see the issue.

$ grep -i 'error\|crash\|crit\|panic\|I\/O\|erp\|sense\|fba' /var/log/syslog
Jul 28 10:05:23 hwe0005 systemd[1]: Stopping LSB: automatic crash report 
generation...
Jul 28 10:05:23 hwe0005 systemd[1]: Stopping Configure dump on panic for System 
z...
Jul 28 10:07:36 hwe0005 systemd-udevd[514]: dasd-fba: 
/etc/udev/rules.d/41-generic-ccw-0.0.0009.rules:7 Failed to write 
ATTR{/sys/devices/css0/0.0.0007/0.0.0009/online}, ignoring: Invalid argument
Jul 28 10:07:36 hwe0005 systemd-udevd[511]: 0.0.0102: 
/etc/udev/rules.d/41-dasd-fba-0.0.0102.rules:7 Failed to write 
ATTR{/sys/devices/css0/0.0.0001/0.0.0102/online}, ignoring: Invalid argument
Jul 28 10:07:36 hwe0005 systemd-udevd[522]: 0.0.0101: 
/etc/udev/rules.d/41-dasd-fba-0.0.0101.rules:7 Failed to write 
ATTR{/sys/devices/css0/0.0.0000/0.0.0101/online}, ignoring: Invalid argument
Jul 28 10:07:36 hwe0005 systemd-udevd[522]: 0.0.0103: 
/etc/udev/rules.d/41-dasd-fba-0.0.0103.rules:7 Failed to write 
ATTR{/sys/devices/css0/0.0.0002/0.0.0103/online}, ignoring: Invalid argument
Jul 28 10:07:36 hwe0005 systemd-udevd[505]: 0.0.0104: 
/etc/udev/rules.d/41-dasd-fba-0.0.0104.rules:7 Failed to write 
ATTR{/sys/devices/css0/0.0.0003/0.0.0104/online}, ignoring: Invalid argument
Jul 28 10:07:36 hwe0005 kernel: [    4.983272] dasd-fba.f36f2f: 0.0.0101: New 
FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
Jul 28 10:07:36 hwe0005 kernel: [    4.988020] dasd-fba.f36f2f: 0.0.0102: New 
FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
Jul 28 10:07:36 hwe0005 kernel: [    4.990317] dasd-fba.f36f2f: 0.0.0103: New 
FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
Jul 28 10:07:36 hwe0005 kernel: [    4.992370] dasd-fba.f36f2f: 0.0.0104: New 
FBA DASD 9336/10 (CU 6310/80) with 16384 MB and 512 B/blk
Jul 28 10:07:36 hwe0005 systemd[1]: Condition check resulted in Process error 
reports when automatic reporting is enabled (file watch) being skipped.
Jul 28 10:07:36 hwe0005 systemd[1]: Condition check resulted in Unix socket for 
apport crash forwarding being skipped.
Jul 28 10:07:36 hwe0005 systemd[1]: Starting LSB: automatic crash report 
generation...
Jul 28 10:07:36 hwe0005 systemd[1]: Starting Configure dump on panic for System 
z...
Jul 28 10:07:36 hwe0005 apport[764]:  * Starting automatic crash report 
generation: apport
Jul 28 10:07:36 hwe0005 dumpconf[770]: stop on panic configured.
Jul 28 10:07:36 hwe0005 systemd[1]: Finished Configure dump on panic for System 
z.
Jul 28 10:07:36 hwe0005 systemd[1]: Started LSB: automatic crash report 
generation.

I'm wondering a bit about the systemd messages and the sysfs device tree.
But other than that, no ERP, sense, or panic messages so far ...

$ dmesg | grep -i 'error\|fail\|crash\|warn\|crit\|panic\|erp\|fba'
[    4.983272] dasd-fba.f36f2f: 0.0.0101: New FBA DASD 9336/10 (CU 6310/80) 
with 16383 MB and 512 B/blk
[    4.988020] dasd-fba.f36f2f: 0.0.0102: New FBA DASD 9336/10 (CU 6310/80) 
with 16383 MB and 512 B/blk
[    4.990317] dasd-fba.f36f2f: 0.0.0103: New FBA DASD 9336/10 (CU 6310/80) 
with 16383 MB and 512 B/blk
[    4.992370] dasd-fba.f36f2f: 0.0.0104: New FBA DASD 9336/10 (CU 6310/80) 
with 16384 MB and 512 B/blk
[    5.075981] random: 7 urandom warning(s) missed due to ratelimiting
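
To repeat this check after each run, a small helper along the following lines might be enough. It is only a sketch: the function name is made up, and the two patterns are taken from the wording of the bug title rather than from the driver source, so the pattern list is an assumption and may need extending.

```shell
# Count log lines that match the failure signatures quoted in the bug title.
# Assumption: the patterns below mirror the bug title wording; they were not
# checked against the messages the dasd driver actually emits.
count_dasd_failures() {
    # $1: path to a log file, for example /var/log/syslog
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c -i -e 'failing ccw' -e 'erp has run out of retries' -- "$1"
}
```

Calling `count_dasd_failures /var/log/syslog` prints the number of matching lines, so anything other than 0 means the failure was hit.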

I always did a quick check of the partition data:

ubuntu@hwe0005:~$ sudo fdisk -l /dev/dasde1
Disk /dev/dasde1: 15.102 GiB, 17178902528 bytes, 33552544 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

And then created an ext3 file system using -F on all 4 FBA devices, one
after the other:

ubuntu@hwe0005:~$ sudo mkfs.ext3 -F /dev/dasde1
mke2fs 1.45.5 (07-Jan-2020)
/dev/dasde1 contains a ext3 file system
        created on Tue Jul 28 09:45:37 2020
Discarding device blocks: done                            
Creating filesystem with 4194068 4k blocks and 1048576 inodes
Filesystem UUID: c34e7583-1dc9-4b8a-8494-7a100338a7e6
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done   
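
For running the same mkfs over all four devices in sequence, a loop like the one below could be used. This is a sketch under assumptions: the function name and the DRY_RUN switch are hypothetical, and the device paths are passed in by the caller because only /dev/dasde1 appears in the output above.

```shell
# Create an ext3 file system on each given block device, one after the other.
# DRY_RUN is a hypothetical safety switch: unless it is set to 0, the
# commands are only printed and nothing is written to the devices.
format_fba_devices() {
    for dev in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf '%s\n' "mkfs.ext3 -F $dev"
        else
            # Stop at the first failure so the failing device is obvious
            sudo mkfs.ext3 -F "$dev" || return 1
        fi
    done
}
```

With the default dry run, `format_fba_devices /dev/dasde1` only prints the command; `DRY_RUN=0 format_fba_devices /dev/dasde1` actually formats the partition and destroys its contents.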

Does this depend on a certain z/VM version?

I'm running this z/VM version:
00: CP Q CPLEVEL
00: z/VM Version 6 Release 4.0, service level 1901 (64-bit)
00: Generated at 2019-06-14 14:15:49 UTC

Do the FBA devices always have to be re-enabled before retrying?
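
To see whether the devices are online before a retry, the sysfs "online" attribute can be read for each bus ID. The helper below is a minimal sketch: the function name and the CCW_SYSFS override are hypothetical (the override exists only to make the function testable), and the default path is the usual sysfs location for CCW devices.

```shell
# Print the online state (1 = online, 0 = offline) of each CCW bus ID given.
# CCW_SYSFS is a hypothetical override for testing; by default the standard
# sysfs directory for CCW devices is used.
show_ccw_online() {
    root="${CCW_SYSFS:-/sys/bus/ccw/devices}"
    for busid in "$@"; do
        if [ -r "$root/$busid/online" ]; then
            printf '%s %s\n' "$busid" "$(cat "$root/$busid/online")"
        else
            # No such device directory, or attribute not readable
            printf '%s missing\n' "$busid"
        fi
    done
}
```

For the guest above this would be `show_ccw_online 0.0.0101 0.0.0102 0.0.0103 0.0.0104`. A device that shows 0 can be set online again with `chccwdev -e <busid>` from s390-tools.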

Right now I'm a bit lost on how to re-create this.

@Jan, what did your system and FBAs look like? And which z/VM version
are you using?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1879707

Title:
  [UBUNTU 20.04] mke2fs dasd(fba),Failing CCW,default ERP has run out of
  retries and failed

Status in Ubuntu on IBM z Systems:
  Incomplete
Status in linux package in Ubuntu:
  New

Bug description:
  mke2fs,dasd(fba) guest edevices FBA,default ERP has run out of retries and 
failed,Failing CCW
   
  ---uname output---
  xxxxxx -  5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:27:18 UTC 2020 s390x 
s390x s390x GNU/Linux
   
  Machine Type = IBM 3906 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   mke2fs to dasd(fba) devices
   
  Stack trace output:
   no
   
  Oops output:
   no
   
  System Dump Info:
    The system is not configured to capture a system dump.
   
  -Post a private note with access information to the machine that the bug is 
occurring on. 
  -Attach sysctl -a output to the bug.

  dasd(fba),Failing CCW,default ERP has run out of retries and failed between 
the following syslog events,
  mke2fs running, before mounting and starting IO to dasd(fba) devices

  May 14 14:33:32 ilabg13 root: ILAB_IO_FROM_MSDI_START
  May 14 14:48:34 ilabg13 root: ILAB_IO_FROM_MSDI_RUNNING
