------- Comment From y...@cn.ibm.com 2016-11-14 06:39 EDT-------
(In reply to comment #7)
> > (In reply to comment #1)
> > The installation was on FCP SCSI SAN volumes, each with two active paths.
> > Multipath was involved.  The system IPLed fine up to the point that we
> > expanded the /root filesystem to span volumes.  At boot time, the system
> > was unable to locate the second segment of the /root filesystem.  The error
> > message indicated this was due to lvmetad not being active.
> For the zfcp case, did you use the chzdev tool to activate the paths of your
> new additional LVM physical volume (PV)?

Initially, the paths to the second luns were brought online manually
with "echo 0x4000400f00000000 > unit_add".  Then I followed up by
running the "chzdev zfcp-lun -e --online" command and verified they were
online and persistent with the lszdev command.

Below are the rules files for the 0.0.e100 and 0.0.e300 paths.   Below
that is the output of the lszdev command. The date on these files is
10/26/2016.  The output from the lszdev command is also from 10/26/2016.

cat 41-zfcp-lun-0.0.e100.rules
# Generated by chzdev
ACTION=="add", SUBSYSTEMS=="ccw", KERNELS=="0.0.e100", GOTO="start_zfcp_lun_0.0.e100"
GOTO="end_zfcp_lun_0.0.e100"

LABEL="start_zfcp_lun_0.0.e100"
SUBSYSTEM=="fc_remote_ports", ATTR{port_name}=="0x5005076306135700", GOTO="cfg_fc_0.0.e100_0x5005076306135700"
SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074675712", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x5005076306135700", GOTO="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400e00000000"
SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074741248", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x5005076306135700", GOTO="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400f00000000"
GOTO="end_zfcp_lun_0.0.e100"

LABEL="cfg_fc_0.0.e100_0x5005076306135700"
ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400e00000000"
ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400f00000000"
ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000401200000000"
ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4001400d00000000"
ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4001401100000000"
GOTO="end_zfcp_lun_0.0.e100"

LABEL="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400e00000000"
ATTR{queue_depth}="32"
GOTO="end_zfcp_lun_0.0.e100"

LABEL="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400f00000000"
ATTR{queue_depth}="32"
GOTO="end_zfcp_lun_0.0.e100"

LABEL="end_zfcp_lun_0.0.e100"

----------------------------------------------------------------------------------------------------------------------------------------------------

cat 41-zfcp-lun-0.0.e300.rules
# Generated by chzdev
ACTION=="add", SUBSYSTEMS=="ccw", KERNELS=="0.0.e300", GOTO="start_zfcp_lun_0.0.e300"
GOTO="end_zfcp_lun_0.0.e300"

LABEL="start_zfcp_lun_0.0.e300"
SUBSYSTEM=="fc_remote_ports", ATTR{port_name}=="0x500507630618d700", GOTO="cfg_fc_0.0.e300_0x500507630618d700"
SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074675712", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x500507630618d700", GOTO="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400e00000000"
SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074741248", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x500507630618d700", GOTO="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400f00000000"
GOTO="end_zfcp_lun_0.0.e300"

LABEL="cfg_fc_0.0.e300_0x500507630618d700"
ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000400e00000000"
ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000400f00000000"
ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000401200000000"
ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4001400d00000000"
ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4001401100000000"
GOTO="end_zfcp_lun_0.0.e300"

LABEL="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400e00000000"
ATTR{queue_depth}="32"
GOTO="end_zfcp_lun_0.0.e300"

LABEL="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400f00000000"
ATTR{queue_depth}="32"
GOTO="end_zfcp_lun_0.0.e300"

LABEL="end_zfcp_lun_0.0.e300"
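To compare what the rules files configure against what actually comes online, the unit_add assignments can be pulled out of a rules file. This is only a rough Python sketch: the sample lines are copied from 41-zfcp-lun-0.0.e100.rules above, and the helper name `configured_luns` and the regular expression are my own, not anything chzdev provides.

```python
import re

# Lines copied from 41-zfcp-lun-0.0.e100.rules above.
RULES = '''\
ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400e00000000"
ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400f00000000"
ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000401200000000"
'''

# Hypothetical pattern: FCP device bus-ID, target WWPN, and the LUN written to unit_add.
UNIT_ADD = re.compile(
    r'ATTR\{\[ccw/([0-9a-f.]+)\](0x[0-9a-f]{16})/unit_add\}="(0x[0-9a-f]{16})"'
)

def configured_luns(rules_text):
    """Return (FCP device, WWPN, LUN) for every unit_add assignment found."""
    return [m.groups() for m in UNIT_ADD.finditer(rules_text)]

if __name__ == "__main__":
    for fcp_device, wwpn, lun in configured_luns(RULES):
        print(f"{fcp_device}:{wwpn}:{lun}")
```

The printed triples use the same device:WWPN:LUN notation as the lszdev listing, so the two can be compared line by line.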

----------------------------------------------------------------------------------------------------------------------------------

Output from lszdev
zfcp-lun 0.0.e100:0x5005076306135700:0x4000400e00000000 yes yes sda sg0
zfcp-lun 0.0.e100:0x5005076306135700:0x4000400f00000000 yes yes sdc sg2
zfcp-lun 0.0.e300:0x500507630618d700:0x4000400e00000000 yes yes sdb sg1
zfcp-lun 0.0.e300:0x500507630618d700:0x4000400f00000000 yes yes sdd sg3
zfcp-lun 0.0.e500:0x500507630633d700:0x4000401300000000 yes yes sde sg4
zfcp-lun 0.0.e500:0x500507630633d700:0x4000401400000000 yes yes sdf sg5
zfcp-lun 0.0.e500:0x500507630633d700:0x4000401500000000 yes yes sdg sg6
zfcp-lun 0.0.e500:0x500507630633d700:0x4000401600000000 yes yes sdh sg7
zfcp-lun 0.0.e500:0x500507630633d700:0x4000401700000000 yes yes sdi sg8
zfcp-lun 0.0.e500:0x500507630633d700:0x4000401800000000 yes yes sdj sg9
zfcp-lun 0.0.e500:0x500507630633d700:0x4000401900000000 yes yes sdk sg10
zfcp-lun 0.0.e500:0x500507630633d700:0x4000401a00000000 yes yes sdl sg11
zfcp-lun 0.0.e500:0x500507630633d700:0x4001401200000000 yes yes sdm sg12
zfcp-lun 0.0.e500:0x500507630633d700:0x4001401300000000 yes yes sdn sg13
zfcp-lun 0.0.e500:0x500507630633d700:0x4001401400000000 yes yes sdo sg14
zfcp-lun 0.0.e500:0x500507630633d700:0x4001401500000000 yes yes sdp sg15
zfcp-lun 0.0.e500:0x500507630633d700:0x4001401600000000 yes yes sdq sg16
zfcp-lun 0.0.e500:0x500507630633d700:0x4001401700000000 yes yes sdr sg17
zfcp-lun 0.0.e700:0x5005076306389700:0x4000401300000000 yes yes sds sg18
zfcp-lun 0.0.e700:0x5005076306389700:0x4000401400000000 yes yes sdt sg19
zfcp-lun 0.0.e700:0x5005076306389700:0x4000401500000000 yes yes sdu sg20
zfcp-lun 0.0.e700:0x5005076306389700:0x4000401600000000 yes yes sdv sg21
zfcp-lun 0.0.e700:0x5005076306389700:0x4000401700000000 yes yes sdw sg22
zfcp-lun 0.0.e700:0x5005076306389700:0x4000401800000000 yes yes sdx sg23
zfcp-lun 0.0.e700:0x5005076306389700:0x4000401900000000 yes yes sdy sg24
zfcp-lun 0.0.e700:0x5005076306389700:0x4000401a00000000 yes yes sdz sg25
zfcp-lun 0.0.e700:0x5005076306389700:0x4001401200000000 yes yes sdaa sg26
zfcp-lun 0.0.e700:0x5005076306389700:0x4001401300000000 yes yes sdab sg27
zfcp-lun 0.0.e700:0x5005076306389700:0x4001401400000000 yes yes sdac sg28
zfcp-lun 0.0.e700:0x5005076306389700:0x4001401500000000 yes yes sdad sg29
zfcp-lun 0.0.e700:0x5005076306389700:0x4001401600000000 yes yes sdae sg30
zfcp-lun 0.0.e700:0x5005076306389700:0x4001401700000000 yes yes sdaf sg31
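
As a quick cross-check of the listing above, here is a minimal Python sketch that groups the paths by LUN and shows which FCP devices reach each one. It assumes the column order shown above (type, ID, online, persistent, names); the sample lines are copied from the output, and the helper name `paths_per_lun` is made up for illustration.

```python
from collections import defaultdict

# The first four lines of the lszdev output above.
SAMPLE = """\
zfcp-lun 0.0.e100:0x5005076306135700:0x4000400e00000000 yes yes sda sg0
zfcp-lun 0.0.e100:0x5005076306135700:0x4000400f00000000 yes yes sdc sg2
zfcp-lun 0.0.e300:0x500507630618d700:0x4000400e00000000 yes yes sdb sg1
zfcp-lun 0.0.e300:0x500507630618d700:0x4000400f00000000 yes yes sdd sg3
"""

def paths_per_lun(text):
    """Map each FCP LUN to the FCP devices on which it is online and persistent."""
    paths = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "zfcp-lun":
            continue  # skip headers, blank lines, and other device types
        fcp_device, _wwpn, lun = fields[1].split(":")
        if fields[2] == "yes" and fields[3] == "yes":
            paths[lun].append(fcp_device)
    return dict(paths)

if __name__ == "__main__":
    for lun, devices in sorted(paths_per_lun(SAMPLE).items()):
        print(lun, "->", ", ".join(devices))
```

For the sample lines, both LUNs come back with two paths each, one through 0.0.e100 and one through 0.0.e300.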

> This is the only supported post-install method to (dynamically and)
> persistently activate zfcp-attached FCP LUNs. See also
> http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_addu.html.
>
> > PV Volume information:
> > physical_volumes {
> >
> >                pv0 {
> >                        device = "/dev/sdb5"        # Hint only
>
> >                pv1 {
> >                        device = "/dev/sda"        # Hint only
>
> This does not look very good, having single path scsi disk devices mentioned
> by LVM. With zfcp-attached SCSI disks, LVM must be on top of multipathing.
> Could you please double check if your installation with LVM and multipathing
> does the correct layering? If not, this would be an independent bug. See
> also [1, slide 28 "Multipathing for Disks - LVM on Top"].
>
> > Additional testing has been done with CKD volumes and we see the same
> > behavior.
> > Because of this behavior, I do not
> > believe the problem is related to SAN disk or multipath.   I think it is due
> > to the system not being able to read the UUID on any PV in the VG other than
> > the IPL disk.
>
> For any disk device type, the initrd must contain all information how to
> enable/activate all paths of the entire block device dependency tree
> required to mount the root file system. An example for a dependency tree is
> in [1, slide 37] and such example is independent of any particular Linux
> distribution.
> I don't know how much automatic dependency tracking Ubuntu does for the
> user, especially regarding additional z-specific device activation steps
> ("setting online" as for DASD or zFCP). Potentially the user must take care
> of the dependency tree himself and ensure the necessary information lands in
> the initrd.
>
> Once the dependency tree of the root-fs has changed (such as adding a PV to
> an LVM containing the root-fs as in your case), you must re-create the
> initrd with the following command before any reboot:
> $ update-initramfs -u

The "update-initramfs -u" command was never explicitly run after the system was
built.
The second PV was added to the VG on 10/26/2016.  However, it was not until
early November that the root FS was extended.

Between 10/16/2016 and the date the root fs was extended, the second PV was
always online and active in a VG and LV display after every reboot.
I have a note in my runlog with the following from 10/26/2016:
>>>Rebooted the system and all is working. Both disks are there and everything
>>>is online.
lsscsi
[0:0:0:1074675712]disk IBM 2107900 1.69 /dev/sdb         <-----  This would be 0x400E4000
[0:0:0:1074741248]disk IBM 2107900 1.69 /dev/sdd         <-----  This would be 0x400F4000
[1:0:0:1074675712]disk IBM 2107900 1.69 /dev/sda
[1:0:0:1074741248]disk IBM 2107900 1.69 /dev/sdc
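
The hex values in my annotations can be tied back to the 64-bit FCP LUNs used in the rules files. The following is a small Python sketch of that arithmetic; it assumes the decimal LUN that lsscsi prints holds the 16-bit words of the 8-byte LUN in reverse order (the encoding the kernel's scsilun_to_int() helper uses), and the function name is my own.

```python
def scsi_lun_to_fcp_lun(scsi_lun):
    """Convert the decimal LUN shown by lsscsi into 64-bit FCP LUN notation."""
    # Assumption: the integer stores the 16-bit words of the 8-byte LUN in
    # reverse order, so the least significant word is emitted first.
    words = [(scsi_lun >> (16 * i)) & 0xFFFF for i in range(4)]
    return "0x" + "".join(f"{w:04x}" for w in words)

for lun in (1074675712, 1074741248):
    print(lun, hex(lun), scsi_lun_to_fcp_lun(lun))
```

With these two inputs the sketch gives 0x400e4000 / 0x4000400e00000000 and 0x400f4000 / 0x4000400f00000000, which matches the unit_add values in the rules files.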

>
> On z Systems, this also contains the necessary step to re-write the boot
> record (using the zipl bootloader management tool) so it correctly points to
> the new initrd.
> See also
> http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_on.html.
>
>
> In your case on reboot, it only activated 2 paths to FCP LUN
> 0x4000400e00000000 (I cannot determine the target port WWPN(s) from below
> output because it does not convey this info) from two different FCP devices
> 0.0.e300 and 0.0.e100.
> From attachment 113696 [details]:
> [    6.666977] scsi host0: zfcp
> [    6.671670] random: nonblocking pool is initialized
> [    6.672622] qdio: 0.0.e300 ZFCP on SC 2cc5 using AI:1 QEBSM:0 PRI:1 TDD:1
> SIGA: W AP
> [    6.722312] scsi host1: zfcp
> [    6.724547] scsi 0:0:0:1074675712: Direct-Access     IBM      2107900
> 1.69 PQ: 0 ANSI: 5
> [    6.725159] sd 0:0:0:1074675712: alua: supports implicit TPGS
> [    6.725164] sd 0:0:0:1074675712: alua: device
> naa.6005076306ffd700000000000000000e port group 0 rel port 303
> [    6.725287] sd 0:0:0:1074675712: Attached scsi generic sg0 type 0
> [    6.728234] qdio: 0.0.e100 ZFCP on SC 2c85 using AI:1 QEBSM:0 PRI:1 TDD:1
> SIGA: W AP
> [    6.747662] sd 0:0:0:1074675712: alua: transition timeout set to 60
> seconds
> [    6.747667] sd 0:0:0:1074675712: alua: port group 00 state A preferred
> supports tolusnA
> [    6.747801] sd 0:0:0:1074675712: [sda] 209715200 512-byte logical blocks:
> (107 GB/100 GiB)
> [    6.748652] sd 0:0:0:1074675712: [sda] Write Protect is off
> [    6.749024] sd 0:0:0:1074675712: [sda] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [    6.752076]  sda: sda1 sda2 < sda5 >
> [    6.754107] sd 0:0:0:1074675712: [sda] Attached SCSI disk
> [    6.760935] scsi 1:0:0:1074675712: Direct-Access     IBM      2107900
> 1.69 PQ: 0 ANSI: 5
> [    6.761444] sd 1:0:0:1074675712: alua: supports implicit TPGS
> [    6.761448] sd 1:0:0:1074675712: alua: device
> naa.6005076306ffd700000000000000000e port group 0 rel port 231
> [    6.761514] sd 1:0:0:1074675712: Attached scsi generic sg1 type 0
> [    6.787710] sd 1:0:0:1074675712: [sdb] 209715200 512-byte logical blocks:
> (107 GB/100 GiB)
> [    6.787770] sd 1:0:0:1074675712: alua: port group 00 state A preferred
> supports tolusnA
> [    6.788464] sd 1:0:0:1074675712: [sdb] Write Protect is off
> [    6.788728] sd 1:0:0:1074675712: [sdb] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [    6.790829]  sdb: sdb1 sdb2 < sdb5 >
> [    6.792535] sd 1:0:0:1074675712: [sdb] Attached SCSI disk

I see what you are saying: only 1074675712 (0x400E4000) is coming
online at boot.  1074741248 (0x400F4000) does not come online at boot.
The second device must be coming online after boot has completed, and
that is why lsscsi shows it online.  And since the boot partition is on
the first segment, the system can read the initrd and start the boot.  But
when it goes to mount root, it is not aware of the second segment.  Do
I have this right?
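
The same observation can be checked mechanically against a saved boot log. This is a minimal Python sketch, assuming the "Attached SCSI disk" message format seen in the quoted log; the two sample lines are copied from it (without the quote markers), and the helper name and the EXPECTED set are mine.

```python
import re

# Lines copied from the boot log quoted above.
BOOT_LOG = """\
[    6.754107] sd 0:0:0:1074675712: [sda] Attached SCSI disk
[    6.792535] sd 1:0:0:1074675712: [sdb] Attached SCSI disk
"""

ATTACHED = re.compile(r"sd \d+:\d+:\d+:(\d+): \[(sd[a-z]+)\] Attached SCSI disk")

# The two LUNs that back the volume group, in lsscsi's decimal notation.
EXPECTED = {1074675712, 1074741248}

def luns_attached_at_boot(log):
    """Return {decimal LUN: [disk names]} for every 'Attached SCSI disk' line."""
    found = {}
    for match in ATTACHED.finditer(log):
        found.setdefault(int(match.group(1)), []).append(match.group(2))
    return found

if __name__ == "__main__":
    attached = luns_attached_at_boot(BOOT_LOG)
    print("attached:", attached)
    print("missing:", sorted(EXPECTED - set(attached)))
```

Run against the sample lines, it reports 1074675712 on sda and sdb and lists 1074741248 as missing.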

If so, that brings me to the next question: do
you have a procedure where I could bring up a rescue system, bring
volumes 1074675712 (0x400E4000) and 1074741248 (0x400F4000) online, chroot,
and then update the initrd with the second volume?  Or do I need to
rebuild the system from scratch?

>
>
> REFERENCE
>
> [1]
> http://www-05.ibm.com/de/events/linux-on-z/pdf/day2/4_Steffen_Maier_zfcp-best-practices-2015.pdf

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1641078

Title:
  System cannot be booted up when root filesystem is on an LVM on two
  disks

Status in Ubuntu on IBM z Systems:
  New
Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  A root file system on LVM that spans multiple disks cannot be booted.
    
  ---uname output---
  Linux ntc170 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:47:15 UTC 2016 s390x s390x s390x GNU/Linux
   
  ---Patches Installed---
  n/a
   
  Machine Type = z13 
   
  ---System Hang---
   cannot boot up the system after shutdown or reboot
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   Created the root file system on an LVM volume group that spans two disks.
   After a shutdown or reboot, the system cannot come back up.
   
  Stack trace output:
   no
   
  Oops output:
   no
   
  System Dump Info:
    The system is not configured to capture a system dump.
   
  Device driver error code:
   Begin: Mounting root file system ... Begin: Running /scripts/local-top ...
   lvmetad is not active yet, using direct activation during sysinit
   Couldn't find device with uuid 7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V.
   
  -Attach sysctl -a output to the bug.

  More detailed installation description:

  The installation was on FCP SCSI SAN volumes, each with two active
  paths.  Multipath was involved.  The system IPLed fine up to the point
  that we expanded the /root filesystem to span volumes.  At boot time,
  the system was unable to locate the second segment of the /root
  filesystem.  The error message indicated this was due to lvmetad not
  being active.

  Error message:   
         Begin: Running /scripts/local-block ...
         lvmetad is not active yet, using direct activation during sysinit
         Couldn't find device with uuid 7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V
         Failed to find logical volume "ub01-vg/root"
          
  PV Volume information: 
  physical_volumes { 

                 pv0 { 
                         id = "L2qixM-SKkF-rQsp-ddao-gagl-LwKV-7Bw1Dz" 
                         device = "/dev/sdb5"        # Hint only 

                         status = ["ALLOCATABLE"] 
                         flags = [] 
                         dev_size = 208713728        # 99.5225 Gigabytes 
                         pe_start = 2048 
                         pe_count = 25477        # 99.5195 Gigabytes 
                 } 

                 pv1 { 
                         id = "7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V" 
                         device = "/dev/sda"        # Hint only 

                         status = ["ALLOCATABLE"] 
                         flags = [] 
                         dev_size = 209715200        # 100 Gigabytes 
                         pe_start = 2048 
                         pe_count = 25599        # 99.9961 Gigabytes 

  
  LV Volume Information: 
  logical_volumes { 

                 root { 
                         id = "qWuZeJ-Libv-DrEs-9b1a-p0QF-2Fj0-qgGsL8" 
                         status = ["READ", "WRITE", "VISIBLE"] 
                         flags = [] 
                         creation_host = "ub01" 
                         creation_time = 1477515033        # 2016-10-26 16:50:33 -0400
                         segment_count = 2 

                         segment1 { 
                                 start_extent = 0 
                                 extent_count = 921        # 3.59766 Gigabytes 

                                 type = "striped" 
                                 stripe_count = 1        # linear 

                                 stripes = [ 
                                         "pv0", 0 
                                 ] 
                         } 
                         segment2 { 
                                 start_extent = 921 
                                 extent_count = 25344        # 99 Gigabytes 

                                 type = "striped" 
                                 stripe_count = 1        # linear 

                                 stripes = [ 
                                         "pv1", 0 
                                 ] 
                         } 
                 } 

  
  Additional testing has been done with CKD volumes and we see the same
  behavior.  Only the UUID of the first volume in the VG can be located at
  boot, and the same messages are displayed for CKD disks, just with a
  different UUID:
    lvmetad is not active yet, using direct activation during sysinit
    Couldn't find device with uuid xxxxxxxxxxxxxxxxx
  If the /root file system only has one segment on the first volume, on CKD
  or SCSI volumes, the system will IPL.  Because of this behavior, I do not
  believe the problem is related to SAN disk or multipath.  I think it is due
  to the system not being able to read the UUID on any PV in the VG other
  than the IPL disk.
