On Fri, 4 Jan 2008, Matthew Wilcox wrote:
>
> Hi Bruce,
>
> The current implementation of vfs_setlease/generic_setlease/etc is a
> bit quirky. I've been thinking it over for the past couple of days,
> and I think we need to refactor it to work sensibly.
>
> As you may have noticed, I've been mulling over getting rid of the
> BKL in fs/locks.c and the lease interface is particularly problematic.
> Here's one reason why:
>
> fcntl_setlease
> lock_kernel
> vfs_setlease
> lock_kernel
> generic_setlease
> unlock_kernel
> fasync_helper
> unlock_kernel
>
> This is perfectly legitimate for the BKL of course, but other spinlocks
> can't be acquired recursively in this way. At first I thought I'd just
> push the call to generic_setlease into __vfs_setlease and have two ways
> of calling it:
>
> fcntl_setlease
> lock_kernel
> __vfs_setlease
> fasync_helper
> unlock_kernel
>
> vfs_setlease
> lock_kernel
> __vfs_setlease
> unlock_kernel
>
> But generic_setlease allocates memory (GFP_KERNEL) under the lock, so
> that's bad for converting to spnlocks later. Then I thought I'd move
> the spinlock acquisition into generic_setlease(). But I can't do that
> either as fcntl_setlease() needs to hold the spinlock around the call
> to generic_setlease() and fasync_helper().
>
> Then I started to wonder about the current split of functionality between
> fcntl_setlease, vfs_setlease and generic_setlease. The check for no
> other process having this file open should be common to all filesystems.
> And it should be honoured by NFS (it shouldn't hand out a delegation if
> a local user has the file open), so it needs to be in vfs_setlease().
>
> Now I'm worrying about the mechanics of calling out to a filesystem to
> perform a lock. Obviously, a network filesystem is going to have to
> sleep waiting for a response from the fileserver, so that precludes the
> use of a spinlock held over the call to f_op->setlease. I'm not keen on
> the idea of EXPORT_SYMBOL_GPL() a spinlock -- verifying locking is quite
> hard enough without worrying what a filesystem might be doing with it.
>
> So I think we need to refactor the interface, and I'd like to hear your
> thoughts on my ideas of how to handle this.
>
> First, have clients of vfs_setlease call lease_alloc() and pass it in,
> rather than allocate it on the stack and have this horrible interface
> where we may pass back an allocated lock. This allows us to not allocate
> memory (hence sleep) during generic_setlease().
>
> Second, move some of the functionality of generic_setlease() to
> vfs_setlease(), as mentioned above.
>
> Third, change the purpose of f_op->setlease. We can rename it if you
> like to catch any out of tree users. I'd propose using it like this:
fwiw, i've done some work on extending the lease subsystem to help
support the full range of requirements for NFSv4 file and directory
delegations (e.g., breaking a lease when unlinking a file) and we ended up
actually doing most of what you've just suggested here, which i take to be
a good sign.
most of my refactoring came out of trying to simplify locking and
avoid holding locks too long (rather than specifically focusing on
cluster-oriented stuff, but the goals dovetail) and your work on getting
the BKL out of locks.c entirely is something i really like and look
forward to.
thanks,
d
.
> vfs_setlease()
> if (f_op->setlease())
> res = f_op->setlease()
> if (res)
> return res;
> lock_kernel()
> generic_setlease()
> unlock_kernel()
>
> fcntl_setlease
> if (f_op->setlease())
> res = f_op->setlease()
> if (res)
> return res;
> lock_kernel
> generic_setlease()
> ... fasync ...
> unlock_kernel
>
> So 'f_op->setlease' now means "Try to get a lease from the fileserver".
> We can optimise this a bit to not even call setlease if, say, we already
> have a read lease and we're trying to get a second read lease. But we
> have to record our local leases (that way they'll show up in /proc/locks).
>
> I think the combination of these three ideas gives us a sane interface
> to the various setlease functions, and let us convert from lock_kernel()
> to spin_lock() later. Have I missed something? I don't often think
> about the needs of cluster filesystems, so I may misunderstand how they
> need this operation to work.
>
> At some point, we need to revisit the logic for 'is this file open
> by another process' -- it's clearly buggy since it doesn't hold the
> inode_lock while checking i_count, so it could race against an __iget().
>
> --
> Intel are signing my paycheques ... these opinions are still mine
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours. We can't possibly take such
> a retrograde step."
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html