Package: src:linux
Version: 6.1.90-1
Severity: normal
X-Debbugs-Cc: richard+debian+bugrep...@kojedz.in

Dear Maintainer,

I am running Kubernetes on Debian, and the pods mount multiple NFS shares. I run dovecot processes in pods; they receive mail from the internet and also serve as the IMAP server for clients. I monitor the mail system by sending mails periodically (every 15 seconds) and downloading them via IMAP.

A few times I have found a dovecot process stuck in D state, and a reboot was always needed to recover from that state. Unfortunately, I was not able to trigger the bug quickly, and I do not know which operations dovecot issues, or in what order, to trigger this behavior. So until I get closer, I have set up a similar but smaller environment with just a single dovecot process doing the same work: delivering only test mails locally and serving them via IMAP to the monitoring client, with everything stored on NFS. Fortunately, this also triggers the bug: after a few hours, one of the dovecot processes is stuck in D state.

The kernel also reports the blocked task:

May 19 12:16:49 k8s-node07 kernel: INFO: task lmtp:665683 blocked for more than 120 seconds.
May 19 12:16:49 k8s-node07 kernel: Not tainted 6.1.0-21-arm64 #1 Debian 6.1.90-1
May 19 12:16:49 k8s-node07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 19 12:16:49 k8s-node07 kernel: task:lmtp state:D stack:0 pid:665683 ppid:2881 flags:0x00000000
May 19 12:16:49 k8s-node07 kernel: Call trace:
May 19 12:16:49 k8s-node07 kernel: __switch_to+0xf0/0x170
May 19 12:16:49 k8s-node07 kernel: __schedule+0x340/0x940
May 19 12:16:49 k8s-node07 kernel: schedule+0x58/0xf0
May 19 12:16:49 k8s-node07 kernel: __nfs_lookup_revalidate+0x118/0x160 [nfs]
May 19 12:16:49 k8s-node07 kernel: nfs4_lookup_revalidate+0x20/0x30 [nfs]
May 19 12:16:49 k8s-node07 kernel: lookup_fast+0x138/0x150
May 19 12:16:49 k8s-node07 kernel: walk_component+0x30/0x1a0
May 19 12:16:49 k8s-node07 kernel: path_lookupat+0x80/0x1a4
May 19 12:16:49 k8s-node07 kernel: filename_lookup+0xb4/0x1b0
May 19 12:16:49 k8s-node07 kernel: vfs_statx+0x94/0x19c
May 19 12:16:49 k8s-node07 kernel: vfs_fstatat+0x68/0x90
May 19 12:16:49 k8s-node07 kernel: __do_sys_newfstatat+0x58/0xa0
May 19 12:16:49 k8s-node07 kernel: __arm64_sys_newfstatat+0x28/0x34
May 19 12:16:49 k8s-node07 kernel: invoke_syscall+0x78/0x100
May 19 12:16:49 k8s-node07 kernel: el0_svc_common.constprop.0+0x4c/0xf4
May 19 12:16:49 k8s-node07 kernel: do_el0_svc+0x34/0xd0
May 19 12:16:49 k8s-node07 kernel: el0_svc+0x34/0xd4
May 19 12:16:49 k8s-node07 kernel: el0t_64_sync_handler+0xf4/0x120
May 19 12:16:49 k8s-node07 kernel: el0t_64_sync+0x18c/0x190

Or, for another process:

May 20 04:50:01 k8s-node07 kernel: INFO: task imap:8337 blocked for more than 120 seconds.
May 20 04:50:01 k8s-node07 kernel: Not tainted 6.1.0-21-arm64 #1 Debian 6.1.90-1
May 20 04:50:01 k8s-node07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 04:50:01 k8s-node07 kernel: task:imap state:D stack:0 pid:8337 ppid:3164 flags:0x00000000
May 20 04:50:01 k8s-node07 kernel: Call trace:
May 20 04:50:01 k8s-node07 kernel: __switch_to+0xf0/0x170
May 20 04:50:01 k8s-node07 kernel: __schedule+0x340/0x940
May 20 04:50:01 k8s-node07 kernel: schedule+0x58/0xf0
May 20 04:50:01 k8s-node07 kernel: __nfs_lookup_revalidate+0x118/0x160 [nfs]
May 20 04:50:01 k8s-node07 kernel: nfs4_lookup_revalidate+0x20/0x30 [nfs]
May 20 04:50:01 k8s-node07 kernel: lookup_fast+0x138/0x150
May 20 04:50:01 k8s-node07 kernel: walk_component+0x30/0x1a0
May 20 04:50:01 k8s-node07 kernel: path_lookupat+0x80/0x1a4
May 20 04:50:01 k8s-node07 kernel: filename_lookup+0xb4/0x1b0
May 20 04:50:01 k8s-node07 kernel: vfs_statx+0x94/0x19c
May 20 04:50:01 k8s-node07 kernel: vfs_fstatat+0x68/0x90
May 20 04:50:01 k8s-node07 kernel: __do_sys_newfstatat+0x58/0xa0
May 20 04:50:01 k8s-node07 kernel: __arm64_sys_newfstatat+0x28/0x34
May 20 04:50:01 k8s-node07 kernel: invoke_syscall+0x78/0x100
May 20 04:50:01 k8s-node07 kernel: el0_svc_common.constprop.0+0x4c/0xf4
May 20 04:50:01 k8s-node07 kernel: do_el0_svc+0x34/0xd0
May 20 04:50:01 k8s-node07 kernel: el0_svc+0x34/0xd4
May 20 04:50:01 k8s-node07 kernel: el0t_64_sync_handler+0xf4/0x120
May 20 04:50:01 k8s-node07 kernel: el0t_64_sync+0x18c/0x190

Both tasks are blocked in __nfs_lookup_revalidate, reached from a newfstatat() call.

Of course the NFS server is running, and other NFS mounts on the node are still working. This started to happen with Debian's kernel. Before that, I was compiling my own upstream kernel, version 5.15, and with that I never experienced such a lockup.

Unfortunately, I do not know how to go further or how to collect more relevant debugging information. I expect that dovecot, being just an application, should not be able to cause any kernel-side lockups. In my test lab, this specific NFS mount is mounted on only one machine, which suggests to me a Linux NFS client-side issue rather than a cache coherency problem between multiple clients.
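
A minimal sketch of how tasks stuck in D state could be spotted without waiting for the hung-task message, assuming the usual Linux /proc layout where /proc/<pid>/stat reads "pid (comm) state ...". The function names are illustrative only and are not part of dovecot or any kernel tooling. It scans processes, not individual threads.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'sys' or 'net'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the line was not parseable
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

Run periodically on the node, this would list the lmtp or imap processes as soon as they enter D state. A process that is only briefly in D state during normal I/O will also show up, so repeated sightings of the same PID are the interesting signal.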

-- Package-specific info:
** Version:
Linux version 6.1.0-21-arm64 (debian-ker...@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP Debian 6.1.90-1 (2024-05-03)

** Command line:
net.ifnames=0 console=ttyS2,1500000 console=tty1 root=UUID=b4ff4167-1fe9-4fd6-9b9c-c3c68d98108b rw rootwait panic=10

** Not tainted

** Kernel log:
May 20 04:52:02 k8s-node07 kernel: INFO: task imap:8337 blocked for more than 241 seconds.
May 20 04:52:02 k8s-node07 kernel: Not tainted 6.1.0-21-arm64 #1 Debian 6.1.90-1
May 20 04:52:02 k8s-node07 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 20 04:52:02 k8s-node07 kernel: task:imap state:D stack:0 pid:8337 ppid:3164 flags:0x00000000
May 20 04:52:02 k8s-node07 kernel: Call trace:
May 20 04:52:02 k8s-node07 kernel: __switch_to+0xf0/0x170
May 20 04:52:02 k8s-node07 kernel: __schedule+0x340/0x940
May 20 04:52:02 k8s-node07 kernel: schedule+0x58/0xf0
May 20 04:52:02 k8s-node07 kernel: __nfs_lookup_revalidate+0x118/0x160 [nfs]
May 20 04:52:02 k8s-node07 kernel: nfs4_lookup_revalidate+0x20/0x30 [nfs]
May 20 04:52:02 k8s-node07 kernel: lookup_fast+0x138/0x150
May 20 04:52:02 k8s-node07 kernel: walk_component+0x30/0x1a0
May 20 04:52:02 k8s-node07 kernel: path_lookupat+0x80/0x1a4
May 20 04:52:02 k8s-node07 kernel: filename_lookup+0xb4/0x1b0
May 20 04:52:02 k8s-node07 kernel: vfs_statx+0x94/0x19c
May 20 04:52:02 k8s-node07 kernel: vfs_fstatat+0x68/0x90
May 20 04:52:02 k8s-node07 kernel: __do_sys_newfstatat+0x58/0xa0
May 20 04:52:02 k8s-node07 kernel: __arm64_sys_newfstatat+0x28/0x34
May 20 04:52:02 k8s-node07 kernel: invoke_syscall+0x78/0x100
May 20 04:52:02 k8s-node07 kernel: el0_svc_common.constprop.0+0x4c/0xf4
May 20 04:52:02 k8s-node07 kernel: do_el0_svc+0x34/0xd0
May 20 04:52:02 k8s-node07 kernel: el0_svc+0x34/0xd4
May 20 04:52:02 k8s-node07 kernel: el0t_64_sync_handler+0xf4/0x120
May 20 04:52:02 k8s-node07 kernel: el0t_64_sync+0x18c/0x190

** Model information

** Loaded modules:
sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic crc64 sg iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod scsi_common nf_conntrack_netlink rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs nft_log nft_limit xt_limit xt_NFLOG nfnetlink_log xt_physdev xt_TCPMSS xt_tcpudp xt_mark xt_multiport xt_addrtype dummy ipt_REJECT nf_reject_ipv4 ip_set_hash_ipport nft_chain_nat xt_nat xt_MASQUERADE xt_ipvs nf_nat xt_set ip_set_hash_ip ip_set_hash_net ip_set veth xt_conntrack xt_comment nft_compat nf_tables nfnetlink overlay sunrpc binfmt_misc evdev aes_ce_blk snd_soc_rk817 aes_ce_cipher polyval_ce snd_soc_core polyval_generic snd_pcm_dmaengine ext4 ghash_ce gf128mul sha2_ce leds_gpio snd_pcm sha256_arm64 sha1_ce rockchip_thermal crc16 mbcache snd_timer jbd2 snd dw_wdt soundcore rk817_charger rk805_pwrkey cpufreq_dt br_netfilter bridge stp llc ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 drm loop fuse efi_pstore dm_mod dax configfs ip_tables x_tables autofs4 xfs libcrc32c crc32c_generic realtek rk808_regulator fan53555 dwmac_rk stmmac_platform stmmac pcs_xpcs spi_rockchip phylink dw_mmc_rockchip dw_mmc_pltfm of_mdio dw_mmc fixed crct10dif_ce crct10dif_common fixed_phy fwnode_mdio pl330 i2c_rk3x io_domain libphy

** PCI devices:
not available

** USB devices:
not available

-- System Information:
Debian Release: 12.5
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: arm64 (aarch64)

Kernel: Linux 6.1.0-21-arm64 (SMP w/4 CPU threads)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages linux-image-6.1.0-21-arm64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.142
ii  kmod                                    30+20221128-1
ii  linux-base                              4.9

Versions of packages linux-image-6.1.0-21-arm64 recommends:
ii  apparmor             3.0.8-3
ii  firmware-linux-free  20200122-1

Versions of packages linux-image-6.1.0-21-arm64 suggests:
pn  debian-kernel-handbook  <none>
pn  linux-doc-6.1           <none>

Versions of packages linux-image-6.1.0-21-arm64 is related to:
pn  firmware-amd-graphics     <none>
pn  firmware-atheros          <none>
pn  firmware-bnx2             <none>
pn  firmware-bnx2x            <none>
pn  firmware-brcm80211        <none>
pn  firmware-cavium           <none>
pn  firmware-intel-sound      <none>
pn  firmware-intelwimax       <none>
pn  firmware-ipw2x00          <none>
pn  firmware-ivtv             <none>
pn  firmware-iwlwifi          <none>
pn  firmware-libertas         <none>
pn  firmware-linux-nonfree    <none>
pn  firmware-misc-nonfree     <none>
pn  firmware-myricom          <none>
pn  firmware-netxen           <none>
pn  firmware-qlogic           <none>
pn  firmware-realtek          <none>
pn  firmware-samsung          <none>
pn  firmware-siano            <none>
pn  firmware-ti-connectivity  <none>
pn  xen-hypervisor            <none>

-- no debconf information