Control: tags -1 + unreproducible moreinfo

On Wed, Jan 22, 2025 at 12:29:12AM +0100, Francesco Poli (wintermute) wrote:
> Package: nfs-kernel-server
> Version: 1:2.8.2-1+b1
> Severity: grave
> Justification: causes non-serious data loss
> X-Debbugs-Cc: invernom...@paranoici.org
> 
> 
> Dear maintainers,
> I encountered a big issue, while upgrading package 'nfs-kernel-server'
> on the box where the NFS server runs (the clients run on the compute
> nodes of an HPC cluster).
> 
> The upgrade:
> 
>   [UPGRADE] nfs-kernel-server:amd64 1:2.8.2-1 -> 1:2.8.2-1+b1
> 
> got stuck at
> 
>   [...]
>   Setting up nfs-kernel-server (1:2.8.2-1+b1) ...
> 
> 
> 
> It looks like it was stuck at the restart of the systemd service:
> 
> # systemctl status nfs-kernel-server.service
> ● nfs-server.service - NFS server and services
>      Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; prese>
>     Drop-In: /run/systemd/generator/nfs-server.service.d
>              └─order-with-mounts.conf
>      Active: activating (start-pre) since Tue 2025-01-21 12:40:52 CET; 10min ago
>         Job: 97667
>  Invocation: ced460d410fe4059b9e8781b35340d70
>        Docs: man:rpc.nfsd(8)
>              man:exportfs(8)
>   Cntrl PID: 249039 (exportfs)
>       Tasks: 3 (limit: 154102)
>      Memory: 680K (peak: 2.5M)
>         CPU: 10ms
>      CGroup: /system.slice/nfs-server.service
>              ├─239857 /usr/sbin/nfsdctl threads 0
>              ├─239918 /usr/sbin/exportfs -au
>              └─249039 /usr/sbin/exportfs -r
> 
> There was a 'nfsdctl' process in uninterruptible sleep (D):
> 
> $  ps -eldaf | grep nf[s]
> 4 D root      239857       1  0  80   0 -   847 -      12:07 ?        00:00:00 /usr/sbin/nfsdctl threads 0
> 5 S root      247511       1  0  80   0 -  1375 -      12:35 ?        00:00:00 /usr/sbin/nfsdcld
> 
> After about 30 min, since trying to kill PID 239857 obviously had no effect,
> and I could not find any other strategy to restart nfs-kernel-server.service,
> I had to reboot the box, thus causing many problems to all the NFS clients.
> 
> After reboot, I could issue:
> 
>   # aptitude --purge-unused safe-upgrade
> 
> which finally completed the upgrade (fixing the nfs-kernel-server package,
> which was left in a partially configured state).
> 
> 
> I have never seen anything like this before, and I have upgraded
> nfs-kernel-server and related packages on Debian machines for quite
> a long time.
> Anyway, this should *not* happen during a system upgrade with
> aptitude or apt!
> 
> I don't know whether bug [#992661] is related or not.
> 
> [#992661]: <https://bugs.debian.org/992661>
> 
> By looking at /var/log/kern.log , I see that a kernel BUG was traced
> at the time when the 'nfsdctl' process got stuck in D state.
> See the attached kern.log snippet.
> 
> Please investigate and fix the issue as soon as possible.
> I really hope we can prevent this from happening again!
> 
> Thanks for your time and dedication.

So I'm not able to reproduce this on a current Debian unstable system
by mimicking the upgrade. *But* it is possible that we have some races
somewhere, as recently discussed at our regular kernel team meeting.

In any case, we first need to find a way to trigger the issue.
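
When trying to reproduce, it may help to check quickly whether any
process has ended up in uninterruptible sleep, as 'nfsdctl' did in the
report. What follows is only a minimal sketch, not an established tool:
the helper names parse_stat and blocked_tasks are made up for
illustration, and it assumes a Linux /proc where the third field of
/proc/<pid>/stat is the process state.

```python
#!/usr/bin/env python3
"""List processes in uninterruptible sleep (state D) by reading /proc."""
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    head, _, tail = stat_text.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were scanning
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Run in a loop while restarting nfs-server.service, this would show the
moment a process gets stuck. For any PID it reports, reading
/proc/<pid>/stack as root (where the kernel provides it) should show
where in the kernel the task is blocked.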

Regards,
Salvatore
