On Wed, 2026-02-04 at 20:03 -0500, Benjamin Marzinski wrote:
> On Thu, Feb 05, 2026 at 12:57:38AM +0100, Hannes Reinecke wrote:
> > On 2/4/26 19:32, Stefan Hajnoczi wrote:
> > > On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> > > > Hi Stefan,
> > > > 
> > > > On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
> > [ .. ]
> > > > > It can be generic. The messages will contain the block device
> > > > > major:minor as well as information to describe <linux/pr.h>
> > > > > requests.
> > > > 
> > > > So the ioctls will pass through qemu into the kernel, to be
> > > > intercepted
> > > > by the dm-mpath driver, which will use an upcall to have them
> > > > handled
> > > > by mpathpersistd (for the actual command) and multipathd (for
> > > > the path
> > > > registrations).
> > > > 
> > > > I don't fully understand the advantage, security and
> > > > complexity-wise,
> > > > of this concept, compared to intercepting them in qemu and using a
> > > > socket
> > > > to talk to mpathpersistd directly. If we did this, we could
> > > > even
> > > > support both generic and SCSI PR commands.
> > > 
> > > Hi Martin,
> > > The simplification and security benefits are on the application
> > > side,
> > > not on the DM-Multipath side, so I can see what you're getting
> > > at. From
> > > the DM-Multipath perspective things get a little more complex.
> > > 
> > > From an application perspective, a single API that works across
> > > block
> > > device types (SCSI, NVMe, DM-Multipath) and requires no
> > > privileges or
> > > sockets (they are a pain in container environments) is the most
> > > convenient. The <linux/pr.h> ioctl API offers exactly this.
> > > 
> > > Unfortunately, DM-Multipath currently does not fully support
> > > <linux/pr.h>. It sends PR operations down each path, but that is
> > > only a
> > > subset of libmpathpersist's logic and multipathd is not kept in
> > > sync.
> > > 
> > > My impression is that libmpathpersist and multipathd logic cannot
> > > be
> > > easily moved into the kernel. This is where the upcall idea comes
> > > from.
> > > Let's notify multipath-tools from DM-Multipath so it can do its
> > > work in
> > > userspace.
> > > 
> > It _might_ be possible by extending the current path-switching
> > code in the kernel to keep track of PRs. Then we could move the
> > registration upon path switching, and (ideally) could do away
> > with upcalls.
> > Not sure, though, how targets react when having to deal with a
> > flood of PR commands ...
> > But maybe worth a try.
> 
> Making a multipath device pretend to be a single Persistently
> Reservable
> device involves a lot of ugly workarounds that I'm not really excited
> to
> see in the kernel.
> 
> For instance, every time a new path appears or a path that was down
> when
> the device was registered comes up, multipath needs to register that
> path. But a preempt could come in while it is doing this (or indeed
> any
> time after multipath registered the other paths). So it has to check
> that the registrations are still there on the other paths before
> registering the new path, and then check again afterwards to make
> sure
> that there wasn't a preempt during the registration. 
> 
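
To make the check / register / re-check sequence above concrete, here is a
toy Python model. It is only a sketch of the ordering Ben describes, not
multipath-tools or kernel code: the Target class, register_new_path() and
the race_hook parameter are hypothetical names, and undoing the new
registration on failure is an assumption of the sketch.

```python
class Target:
    """Toy model of a target's registration table (hypothetical)."""

    def __init__(self):
        self.registrations = {}  # path name -> registered key

    def register(self, path, key):
        self.registrations[path] = key

    def preempt(self, victim_key):
        # A preempt from another host removes every registration
        # that carries the victim's key.
        self.registrations = {
            p: k for p, k in self.registrations.items() if k != victim_key
        }

    def registered_paths(self, key):
        return {p for p, k in self.registrations.items() if k == key}


def register_new_path(target, key, existing_paths, new_path, race_hook=None):
    """Register new_path; return True only if no preempt was observed."""
    expected = set(existing_paths)

    # Check 1: the other paths must still be registered.
    if not expected <= target.registered_paths(key):
        return False

    if race_hook is not None:
        race_hook()  # test-only hook: lets a preempt slip in right here

    target.register(new_path, key)

    # Check 2: a preempt may have arrived during the registration.
    if not expected <= target.registered_paths(key):
        # Assumption of this sketch: drop the now-stale registration.
        target.registrations.pop(new_path, None)
        return False
    return True
```

Without the second check, a preempt landing between check 1 and the
registration would leave the new path registered while all other paths
have lost their registrations.
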
> Worse, you can't release a reservation from a path that is down. If
> multipath needs to release its reservation, and the path that is
> holding
> it is down, the only solution I could come up with is to suspend the
> device so no IO happens. Preempt the reservation to move it to an
> active
> path, which wipes the registrations off all the other paths. Then
> reregister all the active paths again, and unsuspend the device.
> The failed paths will get reregistered as they come back up.
> 
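
The suspend / preempt / reregister / resume sequence above can be sketched
the same way. Again this is a toy Python model with hypothetical names
(ToyMultipath, move_reservation_to_active_path); it only mirrors the order
of the steps in Ben's description and sends no real PR commands.

```python
class ToyMultipath:
    """Toy model of one multipath device and the PR state for its key."""

    def __init__(self, path_up, holder):
        self.path_up = dict(path_up)    # path name -> True if usable
        self.registered = set(path_up)  # paths registered with our key
        self.holder = holder            # path holding the reservation
        self.suspended = False

    def active_paths(self):
        return sorted(p for p, up in self.path_up.items() if up)

    def preempt_via(self, path):
        # Sent down an active path: the reservation moves to that path
        # and the registrations on all other paths are wiped.
        assert self.suspended and self.path_up[path]
        self.holder = path
        self.registered = {path}

    def register(self, path):
        assert self.path_up[path], "cannot send commands down a failed path"
        self.registered.add(path)


def move_reservation_to_active_path(dev):
    """Move the reservation off a failed holder path; return the holder."""
    if dev.path_up[dev.holder]:
        return dev.holder  # holder path is up, nothing to do

    active = dev.active_paths()
    if not active:
        raise RuntimeError("no active path to move the reservation to")

    dev.suspended = True  # no I/O while registrations are missing
    try:
        new_holder = active[0]
        dev.preempt_via(new_holder)
        for path in active:
            dev.register(path)  # reregister every active path
    finally:
        dev.suspended = False  # always resume the device
    return new_holder
```

Once this has run, the reservation sits on an active path, from which it
could then be released. Failed paths stay unregistered until they come
back up.
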
> And there's more cases like these. They are, of course, just as
> doable
> in the kernel as in userspace, but it's a lot of persistent
> reservation
> code to put into the multipath target.

Having reviewed (or tried to do so) Ben's code for handling the various
corner cases, I agree that we don't want to start all over with this.

Martin
