Control: tags -1 + moreinfo

Hi Christoph,

On Wed, Aug 16, 2023 at 02:26:43PM +0200, Christoph Anton Mitterer wrote:
> Package: src:linux
> Version: 6.1.38-2
> Severity: normal
> 
> 
> Hey.
> 
> I'm seeing the following problem since upgrading from Debian bullseye
> to bookworm:
> 
> We run a Tier-2 for the LHC Computing Grid, where dCache is used as
> storage software.
> dCache in turn provides a NFS 4.1 / pNFS server.
> This means in specific, that there is one NFS "door" server and pool
> servers (which contain the actual data). The NFS client (from the Linux
> kernel) connects to the door, but when actual files are read/written the
> connection goes to one of the pools.
> 
> Now what fails is, when I try to mv a file on the NFS mountpoint.
> In specific:
> - the mv process seems to simply freeze
> - while it's frozen, when listing the directory, the file has still the
>   old name
> - when I then Ctrl-C the mv it exits
> - when now listing the directory, the file has the new name
> 
> This worked properly with at least up to the 5.10.179-3 kernel from
> bullseye.
> 
> I should also note, that in the case where it fails, the server (the
> door) runs on the same host from where I also run the client... and that
> any loopback traffic is generally whitelisted for netfilter. Further, the
> pools are in the same subnet, and again any traffic within that subnet is
> whitelisted on all servers.
> 
> /etc/exports (which dCache uses as well) has:
>   / localhost(rw,no_root_squash,secure)
>   /pnfs localhost(rw,no_root_squash,secure)
> (with the /pnfs mountpoint being the one that's used)
> 
> 
> Next I tried the same from my laptop's Debian sid (kernel 6.4.4-3) from
> outside the subnet (but allowing NFS for my particular IP):
> There it also works.
> 
> 
> When mounting, kernel log shows:
> [Aug12 15:15] FS-Cache: Loaded
> [  +0,033084] RPC: Registered named UNIX socket transport module.
> [  +0,000005] RPC: Registered udp transport module.
> [  +0,000001] RPC: Registered tcp transport module.
> [  +0,000000] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [  +0,136237] Key type dns_resolver registered
> [  +0,113564] NFS: Registering the id_resolver key type
> [  +0,000012] Key type id_resolver registered
> [  +0,000001] Key type id_legacy registered
> [Aug12 15:17] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
> 
> but that's the same on both nodes (apart from times of course).
> 
> 
> Any ideas?

While looking at some NFS-related bugs I noticed this one, which was
unanswered but was reported against an old 6.1.y version.

I'm closing the bug for BTS housekeeping purposes, but please do the
following: if you are able to reproduce the problem with a current
6.1.y version, then please reopen the bug. Please also remove the
moreinfo tag if you have a reproducer that is independent of dCache, in
which case it might be considered an upstream problem. Otherwise I
would suggest you first approach the dCache developers (it could still
be a kernel problem, as dCache, from a quick look, consists purely of
userspace components?).
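
As a possible starting point for such a check, here is a minimal Python
sketch; it is not taken from the report. It performs the rename in a
child process with a timeout, so that a hang like the frozen mv is
reported instead of blocking the shell. The function names, the file
names and the 30-second timeout are all hypothetical choices. It would
only be independent of dCache when pointed at an NFS 4.1 mount served by
something else.

```python
import os
import subprocess
import sys


def timed_rename(src, dst, timeout=30.0):
    """Rename src to dst in a child process.

    Returns "ok", "hung" (no completion within timeout) or "error".
    Assumption: the hung child can still be killed, as the report says
    Ctrl-C ends the frozen mv. If it could not be killed, this call
    would block as well.
    """
    cmd = [
        sys.executable,
        "-c",
        "import os, sys; os.rename(sys.argv[1], sys.argv[2])",
        src,
        dst,
    ]
    try:
        proc = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if proc.returncode == 0 else "error"


def check_directory(directory, timeout=30.0):
    """Create a small file in directory, rename it, and report the result.

    Returns (status, old_name_exists, new_name_exists) so that the
    directory state after a hang is recorded too.
    """
    src = os.path.join(directory, "rename-test.old")  # hypothetical name
    dst = os.path.join(directory, "rename-test.new")  # hypothetical name
    with open(src, "w") as handle:
        handle.write("test\n")
    status = timed_rename(src, dst, timeout=timeout)
    return status, os.path.exists(src), os.path.exists(dst)
```

Calling check_directory() on a writable directory below the NFS
mountpoint and printing the returned tuple would show whether the rename
completed, and which of the two names is visible afterwards.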

Please do attach the full boot log, after having triggered the
problem.

If you can, please also try a more recent version, ideally the one
from unstable, to verify whether the problem is still present there.

Thanks for your understanding,

Regards,
Salvatore
