Control: tags -1 + moreinfo

Hi Christoph,

On Wed, Aug 16, 2023 at 02:26:43PM +0200, Christoph Anton Mitterer wrote:
> Package: src:linux
> Version: 6.1.38-2
> Severity: normal
>
>
> Hey.
>
> I'm seeing the following problem since upgrading from Debian bullseye to
> bookworm:
>
> We run a Tier-2 for the LHC Computing Grid, where dCache is used as storage
> software.
> dCache in turn provides a NFS 4.1 / pNFS server.
> This means in specific, that there is one NFS "door" server and pool servers
> (which contain the actual data). The NFS client (from the Linux kernel) connects
> to the door, but when actual files are read/written the connection goes to one
> of the pools.
>
> Now what fails is, when I try to mv a file on the NFS mountpoint.
> In specific:
> - the mv process seems to simply freeze
> - while it's frozen, when listing the directory, the file has still the old name
> - when I then Ctrl-C the mv it exits
> - when now listing the directory, the file has the new name
>
> This worked properly with at least up to the 5.10.179-3 kerne from bullseye.
>
> I should also note, that in the case where it fails, the server (the door) runs
> on the same host from where I also run the client... and that any loopback
> traffic is generally whitelisted for netfilter. Further, the pools are in the
> same subnet, and again any traffic within that subnet is whitelisted on all
> servers.
>
> /etc/exports (which dCache uses as well) has:
> / localhost(rw,no_root_squash,secure)
> /pnfs localhost(rw,no_root_squash,secure)
> (with the /pnfs mountpoint being the one that's used)
>
>
> Next I tried the same from my laptop's Debian sid (kernel 6.4.4-3) from outside
> the subnet (but allowing NFS for my particular IP):
> There it also works.
>
>
> When mounting, kernel log shows:
> [Aug12 15:15] FS-Cache: Loaded
> [ +0,033084] RPC: Registered named UNIX socket transport module.
> [ +0,000005] RPC: Registered udp transport module.
> [ +0,000001] RPC: Registered tcp transport module.
> [ +0,000000] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [ +0,136237] Key type dns_resolver registered
> [ +0,113564] NFS: Registering the id_resolver key type
> [ +0,000012] Key type id_resolver registered
> [ +0,000001] Key type id_legacy registered
> [Aug12 15:17] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
>
> but that's the same on both nodes (apart from times of course).
>
>
> Any ideas?

While looking at some NFS-related bugs I noticed this one, which was
unanswered but reported against an old 6.1.y version.

I'm closing the bug in the sense of BTS housekeeping, but please do
the following: if you are able to reproduce the problem with a current
6.1.y version, then please reopen the bug, and remove the moreinfo tag
if you have a reproducer that is independent of dCache, in which case
it might be considered an upstream problem. Otherwise I would suggest
you first approach the dCache developers (it could still be a kernel
problem, as dCache, from a quick look, consists purely of userspace
components?).

Please attach the full boot log, captured after having triggered the
problem.

If you can, please also try a more recent version, ideally the one
from unstable, to verify that the problem is still present there.

Thanks for your understanding,

Regards,
Salvatore