Package: multipath-tools
Version: 0.5.0-6+deb8u1
Severity: critical
Tags: patch

Configuration:

I have the following setup: Dell PowerEdge M620 + QLogic ISP2532-based 8Gb
Fibre Channel to PCI Express HBA, attached to our SAN with multipath.
The OS is Debian Jessie 8.1. The server's root file system resides on an
LVM logical volume. The packages multipath-tools and multipath-tools-boot
are installed.

Symptom:

Approximately 50% of the time the server does not boot correctly, depending
on the outcome of the race condition between udev and multipathd (see
below). Instead, the password prompt for entering single user mode (or
rescue.target) appears.

Problem:

The problem seems to be the same one that Will Aoki already reported against
upgrade-reports in bug report 788295. He was using open-iscsi, while I'm
using an FC HBA with the qla2xxx module. I'm guessing other combinations are
affected too. Bug 788295 has a very detailed analysis of the problem, and
the logs provided there correlate with mine. Since 788295 was filed against
upgrade-reports, it will probably not get fixed there, hence this report.

Further Information:

Existing Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788295

Ubuntu fixed the issue. See
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1431650

Ubuntu package with the fix:
http://packages.ubuntu.com/trusty-updates/multipath-tools

See also the commit message of the patch taken from Ubuntu for more
technical details.

Solution:

The following patch, taken from the Ubuntu package, solved the problem for
both me and Will Aoki. Could you please add this patch to the official
Debian package and, if possible, get the fixed package into jessie-updates
and the next jessie point release?

------------------- START OF PATCH -----------------
From 841977fc9c3432702c296d6239e4a54291a6007a Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <h...@suse.de>
Date: Tue, 24 Jun 2014 08:49:15 +0200
Subject: [PATCH] libmultipath: use a shared lock to co-operate with udev

udev since v214 is placing a shared lock on the device node
whenever it's processing the event.
This introduces a race condition with multipathd, as multipathd
is processing the event for the block device at the same time
as udev is processing the events for the partitions.
And a lock on the partitions will also be visible on the
block device itself, hence multipathd won't be able to
lock the device.
When multipath manages to take a lock on the device,
udev will fail, and consequently ignore this entire event.
Which in turn might cause the system to malfunction as it
might have been a crucial event like 'remove' or 'link down'.

So we should better use LOCK_SH here; with that the flock
call in multipathd _and_ udev will succeed and the events
can be processed.

References: bnc#883878

Signed-off-by: Hannes Reinecke <h...@suse.de>
---
 libmultipath/configure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index 0ddd3d5..dc2ebf0 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -529,7 +529,7 @@ lock_multipath (struct multipath * mpp, int lock)
 		if (!pgp->paths)
 			continue;
 		vector_foreach_slot(pgp->paths, pp, j) {
-			if (lock && flock(pp->fd, LOCK_EX | LOCK_NB) &&
+			if (lock && flock(pp->fd, LOCK_SH | LOCK_NB) &&
 			    errno == EWOULDBLOCK)
 				goto fail;
 			else if (!lock)
------------------- END OF PATCH -----------------

Additional comments:

Why I rated this critical:
(1) The Ubuntu bug is rated critical.
(2) I think the "makes unrelated software on the system (or the whole
    system) break" clause applies when a system no longer boots reliably.

I can provide journal entries of a failed boot attempt if necessary. Since
such logs already exist in bug 788295 and a tested patch exists, I thought
it wasn't necessary to attach them.

Kind Regards
Niels Baumgartner
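
P.S. The locking behaviour that the patch description relies on can be
reproduced without a SAN. The following is only a minimal sketch under some
assumptions: it uses a regular scratch file instead of a block device node,
it simulates udev and multipathd with two independent open() calls inside a
single process, and the function names second_lock_result() and
make_scratch_file() are my own (hypothetical) helpers, not anything from
multipath-tools or udev.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: take `first_op` on one open file description of
 * `path`, then try `second_op | LOCK_NB` on a second, independently opened
 * file description of the same file.
 *
 * Returns 0 if the second lock was granted, the errno value set by flock()
 * if it was refused, or -1 if the setup (open or first lock) failed.
 */
int second_lock_result(const char *path, int first_op, int second_op)
{
    int holder, contender, result = 0;

    assert(path != NULL);

    holder = open(path, O_RDONLY);     /* stands in for udev */
    contender = open(path, O_RDONLY);  /* stands in for multipathd */
    if (holder < 0 || contender < 0) {
        if (holder >= 0)
            close(holder);
        if (contender >= 0)
            close(contender);
        return -1;
    }

    if (flock(holder, first_op) != 0)
        result = -1;
    else if (flock(contender, second_op | LOCK_NB) != 0)
        result = errno;

    /* Closing the descriptors also releases both locks. */
    close(contender);
    close(holder);
    return result;
}

/*
 * Hypothetical helper: create an empty scratch file from a mkstemp()
 * template such as "/tmp/flock-demo-XXXXXX". Returns 0 on success.
 */
int make_scratch_file(char *path_template)
{
    int fd = mkstemp(path_template);

    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}
```

On Linux, second_lock_result(path, LOCK_SH, LOCK_EX) returns EWOULDBLOCK,
which corresponds to the unpatched multipathd asking for LOCK_EX | LOCK_NB
while a shared lock is already held. second_lock_result(path, LOCK_SH,
LOCK_SH) returns 0, which corresponds to the patched call: both shared locks
are granted and neither side has to give up on the event.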