Package: nfs-kernel-server
Version: 1:2.8.2-1+b1
Severity: grave
Justification: causes non-serious data loss
X-Debbugs-Cc: invernom...@paranoici.org


Dear maintainers,
I ran into a serious problem while upgrading the 'nfs-kernel-server' package
on the box where the NFS server runs (the clients run on the compute
nodes of an HPC cluster).

The upgrade:

  [UPGRADE] nfs-kernel-server:amd64 1:2.8.2-1 -> 1:2.8.2-1+b1

got stuck at

  [...]
  Setting up nfs-kernel-server (1:2.8.2-1+b1) ...



The upgrade appears to have hung while restarting the systemd service:

# systemctl status nfs-kernel-server.service
● nfs-server.service - NFS server and services
     Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; prese>
    Drop-In: /run/systemd/generator/nfs-server.service.d
             └─order-with-mounts.conf
     Active: activating (start-pre) since Tue 2025-01-21 12:40:52 CET; 10min ago
        Job: 97667
 Invocation: ced460d410fe4059b9e8781b35340d70
       Docs: man:rpc.nfsd(8)
             man:exportfs(8)
  Cntrl PID: 249039 (exportfs)
      Tasks: 3 (limit: 154102)
     Memory: 680K (peak: 2.5M)
        CPU: 10ms
     CGroup: /system.slice/nfs-server.service
             ├─239857 /usr/sbin/nfsdctl threads 0
             ├─239918 /usr/sbin/exportfs -au
             └─249039 /usr/sbin/exportfs -r

There was an 'nfsdctl' process in uninterruptible sleep (D):

$  ps -eldaf | grep nf[s]
4 D root      239857       1  0  80   0 -   847 -      12:07 ?        00:00:00 /usr/sbin/nfsdctl threads 0
5 S root      247511       1  0  80   0 -  1375 -      12:35 ?        00:00:00 /usr/sbin/nfsdcld
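
For anyone trying to reproduce this, a minimal sketch of a more targeted check for tasks in D state. The helper name filter_d_state is hypothetical. It assumes its input comes from `ps -eo pid,stat,wchan:32,args`, so that the state is in the second column:

```shell
#!/bin/sh
# Hypothetical helper: keep the header line plus every row whose STAT
# column (field 2) starts with "D" (uninterruptible sleep).
# Assumes input in the column order of: ps -eo pid,stat,wchan:32,args
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

It could be used as `ps -eo pid,stat,wchan:32,args | filter_d_state`. Where the kernel exposes it, reading /proc/PID/stack as root may additionally show where in the kernel the task is blocked.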

After about 30 minutes I had to reboot the box, which caused many problems
for all the NFS clients: killing PID 239857 had no effect (as expected for a
process in D state), and I could not find any other way to restart
nfs-kernel-server.service.

After reboot, I could issue:

  # aptitude --purge-unused safe-upgrade

which finally completed the upgrade (fixing the nfs-kernel-server package,
which was left in a partially configured state).
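
A minimal sketch of how the partially configured state could be spotted before re-running the upgrade. The helper name filter_unconfigured is hypothetical. It assumes the usual `dpkg -l` layout, where the second letter of the first column is the package status (U = unpacked, F = half-configured, H = half-installed):

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print
# "package state" for every package that is unpacked (U),
# half-configured (F) or half-installed (H).
filter_unconfigured() {
    awk '$1 ~ /^[uirph][UFH]/ { print $2, $1 }'
}
```

It could be used as `dpkg -l | filter_unconfigured`. `dpkg --audit` reports similar information, and `dpkg --configure -a` is the usual way to finish configuring such packages, although I did not try either of them here.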


I have never seen anything like this before, even though I have been
upgrading nfs-kernel-server and related packages on Debian machines for
quite a long time.
Anyway, this should *not* happen during a system upgrade with
aptitude or apt!

I don't know whether bug [#992661] is related or not.

[#992661]: <https://bugs.debian.org/992661>

Looking at /var/log/kern.log, I see that a kernel BUG was logged
at the time the 'nfsdctl' process got stuck in D state.
See the attached kern.log snippet.
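
A minimal sketch of how such entries might be located in the log. The helper name scan_kernel_log is hypothetical, and the patterns are an assumption about typical kernel message wording (BUG, Oops and hung-task reports), not a complete list:

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, the lines of a log file
# that look like kernel BUG, Oops or hung-task reports.
# $1: log file to scan (defaults to /var/log/kern.log).
scan_kernel_log() {
    grep -n -E 'kernel BUG at|BUG: |Oops|blocked for more than [0-9]+ seconds' \
        "${1:-/var/log/kern.log}"
}
```

Running `scan_kernel_log` as root, or `scan_kernel_log /path/to/file` on a copy of the log, prints the matching lines so that the surrounding trace can then be read in context.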

Please investigate and fix the issue as soon as possible.
I really hope we can prevent this from happening again!

Thanks for your time and dedication.



-- Package-specific info:
-- rpcinfo --
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100011    1   udp  64737  rquotad
    100011    2   udp  64737  rquotad
    100011    1   tcp  55614  rquotad
    100011    2   tcp  55614  rquotad
    100024    1   udp  41792  status
    100024    1   tcp  50467  status
    100005    1   udp  46127  mountd
    100005    1   tcp  39579  mountd
    100005    2   udp  49119  mountd
    100005    2   tcp  40039  mountd
    100005    3   udp  33530  mountd
    100005    3   tcp  55283  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
    100021    1   udp  38915  nlockmgr
    100021    3   udp  38915  nlockmgr
    100021    4   udp  38915  nlockmgr
    100021    1   tcp  33105  nlockmgr
    100021    3   tcp  33105  nlockmgr
    100021    4   tcp  33105  nlockmgr
-- /etc/default/nfs-kernel-server --
RPCNFSDPRIORITY=0
NEED_SVCGSSD=""
-- /etc/nfs.conf --
[general]
pipefs-directory=/run/rpc_pipefs
[nfsrahead]
[exports]
[exportfs]
[gssd]
[lockd]
[exportd]
[mountd]
manage-gids=y
[nfsdcld]
[nfsdcltrack]
[nfsd]
rdma=y
rdma-port=20049
[statd]
[sm-notify]
[svcgssd]
-- /etc/nfs.conf.d/*.conf --

-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.9-amd64 (SMP w/16 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages nfs-kernel-server depends on:
ii  keyutils                1.6.3-4
ii  libblkid1               2.40.4-1
ii  libc6                   2.40-5
ii  libcap2                 1:2.66-5+b1
ii  libevent-core-2.1-7t64  2.1.12-stable-10+b1
ii  libnl-3-200             3.7.0-0.3+b1
ii  libnl-genl-3-200        3.7.0-0.3+b1
ii  libreadline8t64         8.2-6
ii  libsqlite3-0            3.46.1-1
ii  libtirpc3t64            1.3.4+ds-1.3+b1
ii  libuuid1                2.40.4-1
ii  libwrap0                7.6.q-35
ii  libxml2                 2.12.7+dfsg+really2.9.14-0.2+b1
ii  netbase                 6.4
ii  nfs-common              1:2.8.2-1+b1
ii  ucf                     3.0048

Versions of packages nfs-kernel-server recommends:
ii  python3       3.12.8-1
ii  python3-yaml  6.0.2-1+b1

Versions of packages nfs-kernel-server suggests:
ii  procps  2:4.0.4-6

-- no debconf information

Attachment: kern_log_snippet.log.gz
Description: application/gzip
