On 6/4/24 11:03, Joost Roeleveld wrote:
Yes, I have dev-mapper multipath support in the kernel:
# zcat /proc/config.gz | grep -i multipath
# CONFIG_NVME_MULTIPATH is not set
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_MULTIPATH_HST=m
# CONFIG_DM_MULTIPATH_IOA is not set

% zgrep "CONFIG_DM_M" /proc/config.gz
# CONFIG_DM_MIRROR is not set
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=y
CONFIG_DM_MULTIPATH_ST=y
CONFIG_DM_MULTIPATH_HST=y
CONFIG_DM_MULTIPATH_IOA=y
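
To compare the two kernel configs side by side, something along these lines might help. It is only a sketch: `dm_multipath_opts` is a made-up helper name, and it assumes the usual `CONFIG_X=y|m` / `# CONFIG_X is not set` layout shown above.

```shell
# Print each CONFIG_DM_MULTIPATH* option with its state: y, m, or n (not set).
# Reads a kernel config on stdin. The function name is hypothetical.
dm_multipath_opts() {
    awk '
        # Enabled options look like CONFIG_FOO=y or CONFIG_FOO=m
        /^CONFIG_DM_MULTIPATH[A-Z_]*=/ { split($0, kv, "="); print kv[1], kv[2] }
        # Disabled options are written as a comment
        /^# CONFIG_DM_MULTIPATH[A-Z_]* is not set/ { print $2, "n" }
    '
}

# Usage on a running system:
#   zcat /proc/config.gz | dm_multipath_opts
```

Run on both machines, the only differences in the listings above would be built-in (y) versus module (m) for the path selectors, plus IOA being unset on one side.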

I suspect that having these built into the kernel (=y) rather than as modules (=m) is probably okay.

I installed multipath:
# eix -I multipath
[U] sys-fs/multipath-tools
  Available versions: 0.9.7^t 0.9.7-r1^t{tbz2} 0.9.8^t{tbz2} {systemd test}
 Installed versions: 0.9.7-r1^t{tbz2}(04:27:38 PM 04/10/2024)(-systemd -test)
  Homepage: http://christophe.varoqui.free.fr/
  Description: Device mapper target autoconfig

I've got sys-fs/multipath-tools-0.9.8 installed.

I added 'multipath' and 'multipathd' to the default runlevel:
# rc-status | grep multipath
  multipath [ started ]
  multipathd [ started ]

% rc-status | grep multipath
 multipath    ... [  started  ]
 multipathd   ... [  started  ]

The config files:

# cat /etc/multipath.conf
defaults {
  path_grouping_policy multibus
  path_selector "queue-length 0"
  rr_min_io_rq 100
}

% cat /etc/multipath.conf
defaults {
        path_grouping_policy multibus
        path_selector "queue-length 0"
        rr_min_io_rq 100
}

# ls /etc/multipath
bindings wwids
san1 ~ # cat /etc/multipath/bindings
# Multipath bindings, Version : 1.0
# NOTE: this file is automatically maintained by the multipath program.
# You should not need to edit this file in normal circumstances.
#
# Format:
# alias wwid
#
san1 ~ # cat /etc/multipath/wwids
# Multipath wwids, Version : 1.0
# NOTE: This file is automatically maintained by multipath and multipathd.
# You should not need to edit this file in normal circumstances.
#
# Valid WWIDs:

Same.
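
Since the wwids file is empty apart from its header on both systems, a quick way to test whether a given WWID has been recorded there could look like the sketch below. The helper name `wwid_is_known` is hypothetical, and it assumes (as far as I understand the file format) that each recorded WWID sits on its own line wrapped in slashes, e.g. `/3600.../`.

```shell
# Return success if the given WWID is listed in the multipath wwids file.
# Assumption: entries are stored one per line as /<wwid>/.
# $1 = WWID as multipath reports it, $2 = wwids file (defaults to the
# path shown above).
wwid_is_known() {
    wwid="$1"
    file="${2:-/etc/multipath/wwids}"
    # -x: match the whole line, -F: fixed string, -q: quiet
    grep -qxF "/$wwid/" "$file"
}

# Usage:
#   wwid_is_known 3600601603b1023004677868ab21aef11 && echo listed || echo missing
```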

With all this, I got multipath working:

# multipath -l
35000cca0c444c380 dm-11 HGST,HUS726T4TAL5204
size=3.6T features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
  |- 0:0:10:0 sdk 8:160 active undef running
  |- 0:0:35:0 sdai 66:32 active undef running
  |- 1:0:10:0 sdbg 67:160 active undef running
  `- 1:0:35:0 sdce 69:32 active undef running

% multipath -l
%

All the config I have is listed above, hope it helps

I was hoping so too.

There is one thing that might cause issues. If multipath doesn't detect that "sdd" and "sdg" are the same physical disk, it won't automagically link them. On my system, it identifies them as the same disk because they report the same serial number. For Fibre Channel to something else, you might need to configure this.

N.B. the devices have changed name since rebooting:

% lsblk | grep 0G
sda           8:0    0    10G  0 disk
sdb           8:16   0   100G  0 disk
sde           8:64   0    10G  0 disk
sdf           8:80   0   100G  0 disk

The system boots off of an NVMe. I have no idea why other SATA disks showed up between sda/sdb and sde/sdf.

Can you check the output of the following commands:
cat /sys/block/sdd/device/wwid

% cat /sys/block/sda/device/wwid
naa.600601603b1023004677868ab21aef11

% cat /sys/block/sde/device/wwid
naa.600601603b1023004677868ab21aef11

% cat /sys/block/sdb/device/wwid
naa.600601603b1023008c83299ab21aef11

% cat /sys/block/sdf/device/wwid
naa.600601603b1023008c83299ab21aef11

% fgrep 600601603b1023004677868ab21aef11 /sys/block/sd?/device/wwid
/sys/block/sda/device/wwid:naa.600601603b1023004677868ab21aef11
/sys/block/sde/device/wwid:naa.600601603b1023004677868ab21aef11

% fgrep 600601603b1023008c83299ab21aef11 /sys/block/sd?/device/wwid
/sys/block/sdb/device/wwid:naa.600601603b1023008c83299ab21aef11
/sys/block/sdf/device/wwid:naa.600601603b1023008c83299ab21aef11
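
The per-device checks above can be folded into one pass that groups every sd device by the WWID sysfs reports. This is a rough sketch rather than anything multipath itself does; `group_by_wwid` is a hypothetical name, and the optional argument exists only so the function can be pointed at a copy of the sysfs tree.

```shell
# Group sd* block devices by the WWID found in <root>/<dev>/device/wwid.
# $1 = sysfs block directory (defaults to /sys/block).
group_by_wwid() {
    root="${1:-/sys/block}"
    for f in "$root"/sd*/device/wwid; do
        [ -r "$f" ] || continue          # skip unmatched globs and unreadable files
        dev="${f#"$root"/}"              # drop the leading directory
        dev="${dev%%/*}"                 # keep only the device name, e.g. sda
        # Tab-separated, because some WWIDs contain spaces
        printf '%s\t%s\n' "$dev" "$(cat "$f")"
    done | awk -F '\t' '
        { n[$2]++; devs[$2] = (n[$2] > 1) ? devs[$2] " " $1 : $1 }
        END { for (w in n) printf "%s paths=%d: %s\n", w, n[w], devs[w] }
    ' | sort
}

# Usage:
#   group_by_wwid
```

Any WWID listed with paths=2 or more is a candidate for a multipath map; on this system that should be the sda/sde pair and the sdb/sdf pair.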

It could. Check the entries in the /sys/... filesystem referenced above. That might show a possible cause.

I'm not seeing anything in /sys.

Check my config above and we might be able to figure this out.

I was genuinely hoping that I missed something. But I /think/ I covered the things that you mentioned.

It's been running stable for me for over 7 years now with most disks being that age as well. :)

:-)

Here's some diagnostic output that I get from multipath when I crank the verbosity up to 5 (-v5).

% multipath -d -v5 |& grep sda
858065.188852 | Discover device /sys/devices/pci0000:00/0000:00:1d.0/0000:02:00.1/host1/rport-1:0-0/target1:0:0/1:0:0:0/block/sda
858065.189045 | sda: mask = 0x3f
858065.189061 | sda: dev_t = 8:0
858065.189064 | open '/sys/devices/pci0000:00/0000:00:1d.0/0000:02:00.1/host1/rport-1:0-0/target1:0:0/1:0:0:0/block/sda/size'
858065.189093 | sda: size = 20971520
858065.189306 | sda: vendor = DGC
858065.189340 | sda: product = RAID 10
858065.189361 | sda: rev = 0429
858065.190057 | sda: h:b:t:l = 1:0:0:0
858065.191262 | sda: tgt_node_name = 0x50060160bce01897
858065.191266 | sda: uid_attribute = ID_SERIAL (setting: multipath internal)
858065.191285 | sda: recheck_wwid = 1 (setting: multipath.conf defaults/devices section)
858065.191488 | sda: udev property ID_WWN whitelisted
858065.191568 | sda: path state = running
858065.191851 | sda: 10240 cyl, 64 heads, 32 sectors/track, start at 0
858065.191856 | sda: vpd_vendor_id = 0 "undef" (setting: multipath internal)
858065.191899 | sda: serial = APM00085002348
858065.191902 | sda: detect_checker = no (setting: storage device configuration)
858065.191952 | sda checker timeout = 30 s (setting: kernel sysfs)
858065.191958 | sda: path_checker = emc_clariion (setting: storage device configuration)
858065.192088 | sda: emc_clariion state = down
858065.192112 | sda: emc_clariion checker: Path not correctly configured for failover
858065.192117 | sda: uid = 3600601603b1023004677868ab21aef11 (udev)
858065.192122 | sda: detect_prio = yes (setting: multipath internal)
858065.192391 | sda: prio = alua (setting: storage device autodetected)
858065.192395 | sda: prio args = "" (setting: storage device autodetected)
858065.192413 | sda: reported target port group is 1
858065.192571 | sda: aas = 01 [active/non-optimized]
858065.192575 | sda: alua prio = 10
858065.202314 | sda: udev property ID_WWN whitelisted
858065.202319 | checking if sda should be multipathed
858065.202345 | wwid 3600601603b1023004677868ab21aef11 not in wwids file, skipping sda
858065.202349 | sda: orphan path, only one path
3600601603b1023004677868ab21aef11 1:0:0:0 sda 8:0 10 undef undef DGC,RAID 10 unknown

% multipath -d -v5 |& grep sde
858083.764969 | Discover device /sys/devices/pci0000:00/0000:00:1d.0/0000:02:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:0/block/sde
858083.765119 | sde: mask = 0x3f
858083.765123 | sde: dev_t = 8:64
858083.765127 | open '/sys/devices/pci0000:00/0000:00:1d.0/0000:02:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:0/block/sde/size'
858083.765144 | sde: size = 20971520
858083.765344 | sde: vendor = DGC
858083.765368 | sde: product = RAID 10
858083.765392 | sde: rev = 0429
858083.765947 | sde: h:b:t:l = 0:0:0:0
858083.766697 | sde: tgt_node_name = 0x50060160bce01897
858083.766702 | sde: uid_attribute = ID_SERIAL (setting: multipath internal)
858083.766706 | sde: recheck_wwid = 1 (setting: multipath.conf defaults/devices section)
858083.766839 | sde: udev property ID_WWN whitelisted
858083.766927 | sde: path state = running
858083.767178 | sde: 10240 cyl, 64 heads, 32 sectors/track, start at 0
858083.767186 | sde: vpd_vendor_id = 0 "undef" (setting: multipath internal)
858083.767227 | sde: serial = APM00085002348
858083.767234 | sde: detect_checker = no (setting: storage device configuration)
858083.767286 | sde checker timeout = 30 s (setting: kernel sysfs)
858083.767449 | sde: path_checker = emc_clariion (setting: storage device configuration)
858083.767557 | sde: emc_clariion state = down
858083.767566 | sde: emc_clariion checker: Path not correctly configured for failover
858083.767572 | sde: uid = 3600601603b1023004677868ab21aef11 (udev)
858083.767577 | sde: detect_prio = yes (setting: multipath internal)
858083.768138 | sde: prio = alua (setting: storage device autodetected)
858083.768144 | sde: prio args = "" (setting: storage device autodetected)
858083.768175 | sde: reported target port group is 2
858083.768312 | sde: aas = 80 [active/optimized] [preferred]
858083.768320 | sde: alua prio = 50
858083.782976 | sde: udev property ID_WWN whitelisted
858083.782996 | checking if sde should be multipathed
858083.783049 | wwid 3600601603b1023004677868ab21aef11 not in wwids file, skipping sde
858083.783052 | sde: orphan path, only one path
3600601603b1023004677868ab21aef11 0:0:0:0 sde 8:64 50 undef undef DGC,RAID 10 unknown



% multipath -d -v5 |& grep sdb
858087.190831 | Discover device /sys/devices/pci0000:00/0000:00:1d.0/0000:02:00.1/host1/rport-1:0-0/target1:0:0/1:0:0:1/block/sdb
858087.190989 | sdb: mask = 0x3f
858087.190992 | sdb: dev_t = 8:16
858087.190995 | open '/sys/devices/pci0000:00/0000:00:1d.0/0000:02:00.1/host1/rport-1:0-0/target1:0:0/1:0:0:1/block/sdb/size'
858087.191011 | sdb: size = 209715200
858087.191198 | sdb: vendor = DGC
858087.191221 | sdb: product = VRAID
858087.191242 | sdb: rev = 0429
858087.191789 | sdb: h:b:t:l = 1:0:0:1
858087.192443 | sdb: tgt_node_name = 0x50060160bce01897
858087.192448 | sdb: uid_attribute = ID_SERIAL (setting: multipath internal)
858087.192452 | sdb: recheck_wwid = 1 (setting: multipath.conf defaults/devices section)
858087.192575 | sdb: udev property ID_WWN whitelisted
858087.192619 | sdb: path state = running
858087.192781 | sdb: 13054 cyl, 255 heads, 63 sectors/track, start at 0
858087.192789 | sdb: vpd_vendor_id = 0 "undef" (setting: multipath internal)
858087.192818 | sdb: serial = APM00085002348
858087.192824 | sdb: detect_checker = no (setting: storage device configuration)
858087.192861 | sdb checker timeout = 30 s (setting: kernel sysfs)
858087.192865 | sdb: path_checker = emc_clariion (setting: storage device configuration)
858087.192945 | sdb: emc_clariion state = down
858087.192949 | sdb: emc_clariion checker: Path not correctly configured for failover
858087.192954 | sdb: uid = 3600601603b1023008c83299ab21aef11 (udev)
858087.192958 | sdb: detect_prio = yes (setting: multipath internal)
858087.193089 | sdb: prio = alua (setting: storage device autodetected)
858087.193095 | sdb: prio args = "" (setting: storage device autodetected)
858087.193127 | sdb: reported target port group is 1
858087.193198 | sdb: aas = 80 [active/optimized] [preferred]
858087.193203 | sdb: alua prio = 50
858087.201206 | sdb: udev property ID_WWN whitelisted
858087.201210 | checking if sdb should be multipathed
858087.201250 | wwid 3600601603b1023008c83299ab21aef11 not in wwids file, skipping sdb
858087.201253 | sdb: orphan path, only one path
3600601603b1023008c83299ab21aef11 1:0:0:1 sdb 8:16 50 undef undef DGC,VRAID unknown

% multipath -d -v5 |& grep sdf
858088.602109 | Discover device /sys/devices/pci0000:00/0000:00:1d.0/0000:02:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:1/block/sdf
858088.602263 | sdf: mask = 0x3f
858088.602267 | sdf: dev_t = 8:80
858088.602271 | open '/sys/devices/pci0000:00/0000:00:1d.0/0000:02:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:1/block/sdf/size'
858088.602286 | sdf: size = 209715200
858088.602472 | sdf: vendor = DGC
858088.602494 | sdf: product = VRAID
858088.602515 | sdf: rev = 0429
858088.603042 | sdf: h:b:t:l = 0:0:0:1
858088.603683 | sdf: tgt_node_name = 0x50060160bce01897
858088.603688 | sdf: uid_attribute = ID_SERIAL (setting: multipath internal)
858088.603691 | sdf: recheck_wwid = 1 (setting: multipath.conf defaults/devices section)
858088.603805 | sdf: udev property ID_WWN whitelisted
858088.603842 | sdf: path state = running
858088.604249 | sdf: 13054 cyl, 255 heads, 63 sectors/track, start at 0
858088.604260 | sdf: vpd_vendor_id = 0 "undef" (setting: multipath internal)
858088.604325 | sdf: serial = APM00085002348
858088.604331 | sdf: detect_checker = no (setting: storage device configuration)
858088.604373 | sdf checker timeout = 30 s (setting: kernel sysfs)
858088.604377 | sdf: path_checker = emc_clariion (setting: storage device configuration)
858088.604468 | sdf: emc_clariion state = down
858088.604473 | sdf: emc_clariion checker: Path not correctly configured for failover
858088.604478 | sdf: uid = 3600601603b1023008c83299ab21aef11 (udev)
858088.604482 | sdf: detect_prio = yes (setting: multipath internal)
858088.604705 | sdf: prio = alua (setting: storage device autodetected)
858088.604709 | sdf: prio args = "" (setting: storage device autodetected)
858088.604733 | sdf: reported target port group is 2
858088.604899 | sdf: aas = 01 [active/non-optimized]
858088.604906 | sdf: alua prio = 10
858088.618617 | sdf: udev property ID_WWN whitelisted
858088.618622 | checking if sdf should be multipathed
858088.618649 | wwid 3600601603b1023008c83299ab21aef11 not in wwids file, skipping sdf
858088.618653 | sdf: orphan path, only one path
3600601603b1023008c83299ab21aef11 0:0:0:1 sdf 8:80 10 undef undef DGC,VRAID unknown
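
The four dumps above are long, so here is a sketch of a filter that boils the -v5 output down to one line per path: the uid, the reported target port group, the ALUA priority, the checker state, and whether the path was skipped for not being in the wwids file. `summarize_paths` is a hypothetical helper, and it relies on the exact message wording seen in this log, which may differ in other multipath-tools versions.

```shell
# Condense `multipath -d -v5` output (on stdin) to one summary line per path.
summarize_paths() {
    awk '
        $2 != "|" { next }                      # only timestamped log lines
        { d = $3; sub(/:$/, "", d) }            # "sda:" -> "sda"
        $4 == "uid" && $5 == "="           { uid[d]  = $6  }
        $4 == "reported" && $5 == "target" { tpg[d]  = $NF }
        $4 == "alua" && $5 == "prio"       { prio[d] = $NF }
        $4 != "path" && $5 == "state"      { chk[d]  = $NF }   # checker state, not path state
        $3 == "wwid" && /not in wwids file/ { skipped[$NF] = 1 }
        END {
            for (d in uid)
                printf "%s wwid=%s tpg=%s prio=%s checker=%s skipped=%s\n",
                       d, uid[d], tpg[d], prio[d], chk[d],
                       (d in skipped) ? "yes" : "no"
        }
    ' | sort
}

# Usage:
#   multipath -d -v5 2>&1 | summarize_paths
```

For the output above this should make the pattern easier to see: all four paths have checker=down and skipped=yes, and each LUN has one prio=50 path and one prio=10 path.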


The thing that makes me think that this might be EMC CLARiiON specific is the following output.

% multipath -d -v5 |& grep emc_clariion
858194.922991 | loading /lib64/multipath/libcheckemc_clariion.so checker
858194.923088 | checker emc_clariion: message table size = 9
858194.923093 | sde: path_checker = emc_clariion (setting: storage device configuration)
858194.923175 | sde: emc_clariion state = down
858194.923179 | sde: emc_clariion checker: Path not correctly configured for failover
858194.925776 | sdf: path_checker = emc_clariion (setting: storage device configuration)
858194.925848 | sdf: emc_clariion state = down
858194.925852 | sdf: emc_clariion checker: Path not correctly configured for failover
858194.928303 | sda: path_checker = emc_clariion (setting: storage device configuration)
858194.928386 | sda: emc_clariion state = down
858194.928390 | sda: emc_clariion checker: Path not correctly configured for failover
858194.930674 | sdb: path_checker = emc_clariion (setting: storage device configuration)
858194.930755 | sdb: emc_clariion state = down
858194.930761 | sdb: emc_clariion checker: Path not correctly configured for failover
858194.937861 | emc_clariion checker refcount 4
858194.937989 | emc_clariion checker refcount 3
858194.938070 | emc_clariion checker refcount 2
858194.938125 | emc_clariion checker refcount 1
858194.938305 | unloading emc_clariion checker



--
Grant. . . .
