The delta between 5.4.0.107.121 and 5.4.0.90.101 is about 2000 commits. More than 20 of them are NFS related, and some touch the VFS.
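One way to reproduce such a count is `git rev-list --count` between the two release tags, restricted to the relevant directories. The following is a minimal sketch, assuming a local clone of the Ubuntu focal kernel tree and tags named Ubuntu-5.4.0-90.101 and Ubuntu-5.4.0-107.121. The clone path, the tag names, and the helper names are assumptions, not taken from this report.

```python
import subprocess


def rev_list_cmd(repo, old_tag, new_tag, paths=()):
    """Build a 'git rev-list --count' command for the range old_tag..new_tag.

    If paths is non-empty, only commits touching those paths are counted.
    """
    cmd = ["git", "-C", repo, "rev-list", "--count", "--no-merges",
           f"{old_tag}..{new_tag}"]
    if paths:
        cmd += ["--", *paths]
    return cmd


def count_commits(repo, old_tag, new_tag, paths=()):
    """Run the command and return the commit count as an integer."""
    result = subprocess.run(rev_list_cmd(repo, old_tag, new_tag, paths),
                            check=True, capture_output=True, text=True)
    return int(result.stdout.strip())


# Hypothetical usage (repo path and tag names are assumptions):
#   count_commits("/src/focal", "Ubuntu-5.4.0-90.101", "Ubuntu-5.4.0-107.121")
#   count_commits("/src/focal", "Ubuntu-5.4.0-90.101", "Ubuntu-5.4.0-107.121",
#                 ["fs/nfs", "net/sunrpc"])
```

Restricting the second call to fs/nfs and net/sunrpc narrows the list to the code paths that appear in the call trace.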
------- Comment From stefan.am...@de.ibm.com 2022-04-07 09:14 EDT-------
Thanks for starting to look into this issue!

We have a large number of LPARs running client workload. This issue happens occasionally, each time impacting client SLAs and costing several hours to re-install the LPARs and integrate them into our cloud environment.

We need to understand the root cause of the issue. While we will update our systems soon, we cannot afford to wait for the updates to see whether the issue disappears. So I hope we can identify the root cause! If it is already fixed, all the better :-)

Regarding the kdump: I'm currently clarifying whether we can send it, since it is from a production machine and I want to ensure we are GDPR compliant. Do you happen to know if there is a GDPR compliant process established on your side? As far as I know, a box folder is not an option, but the ftp transfer may be. Or would it be possible for you to specify the commands to run? We could then ensure the data is clean and send it over.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968096

Title:
  [UBUNTU 20.04] Null Pointer issue in nfs code running Ubuntu on IBM Z

Status in Ubuntu on IBM z Systems:
  Incomplete
Status in linux package in Ubuntu:
  New

Bug description:
  State the component where the Bug is occurring: kernel

  Indicate the nature of the problem by answering the below questions:
  - Is this problem reproducible? No
    No, steps unknown, but we have seen these before
  - Is the system sitting at a debugger (kdb, or xmon)? No
  - Is the system hung? No
    No, dumped and rebooted
  - Are there any custom patches installed? Yes
    On base system level (CloudAppliance) we are still running with the zfpc_proc module loaded.
    But there have been no recent changes in the module, and it is running absolutely stable in HA (same kernel and userspace, Ubuntu 20.04 LTS)
  - Is there any special hardware that may be relevant to this problem? Yes
    We are running with mlx (cloud network adapters) installed.
  - Is access information for the machine the problem was found on available? Yes
  - Is the bug occurring in a userspace application? No
  - Was a stack trace produced? Yes
    This is what is mentioned in the first comment by @Boris Barth
  - Did the system produce an Oops message on the console? Yes

  [556585.270902] illegal operation: 0001 ilc:1 [#10] SMP
  [556585.270905] Modules linked in: vhost_net macvtap macvlan tap rpcsec_gss_krb5 auth_rpcgss nfsv3 nfs_acl nfs lockd grace fscache veth xt_statistic ipt_REJECT nf_reject_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_mark sunrpc nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_set ip_set_hash_net ip_set_hash_ip ip_set tcp_diag inet_diag xt_comment xt_nat cls_cgroup sch_htb act_gact sch_multiq act_mirred act_pedit act_tunnel_key cls_flower act_police cls_u32 vxlan ip6_udp_tunnel udp_tunnel dummy nf_tables ebtable_filter ebtables xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key sch_ingress mlx5_ib ib_uverbs ib_core mlx5_core tls mlxfw ptp pps_core dm_integrity async_xor async_tx dm_bufio bonding xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat br_netfilter bridge vhost_vsock vmw_vsock_virtio_transport_common vhost vsock 8021q garp mrp stp llc xt_multiport xt_tcpudp qeth_l2 lcs ctcm fsm dasd_fba_mod aufs overlay scsi_dh_rdac
  [556585.270923] scsi_dh_emc s390_trng xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter sch_fq_codel zFPC_proc(OE) zFPC_diag(OE) vfio_ap vfio_mdev drm vfio_iommu_type1 drm_panel_orientation_quirks i2c_core ip_tables x_tables scsi_dh_alua pkey zcrypt ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common chsc_sch qeth ccwgroup eadm_sch vfio_ccw mdev vfio btrfs libcrc32c crc32_vx_s390 xor zstd_compress raid6_pq dm_crypt virtio_blk dm_service_time dm_multipath zfcp scsi_transport_fc qdio dasd_eckd_mod dasd_mod zlib_deflate [last unloaded: tls]
  [556585.270945] CPU: 28 PID: 217741 Comm: worker Kdump: loaded Tainted: G D OE 5.4.0-90-generic #101-Ubuntu
  [556585.270947] Hardware name: IBM 8562 GT2 A00 (LPAR)
  [556585.270948] Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
  [556585.270951] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
  [556585.270953] Krnl GPRS: 0000000000000000 0000000000000000 000003e010ebbcf8 00000071c45e1ec0
  [556585.270954]            0000000000000000 0000002816f7b18c 00000078dd36a4a0 000000713a62f718
  [556585.270955]            0000000000000000 000003e010ebbcf8 0000000000000068 00000071c45e1ec0
  [556585.270957]            0000006090a12200 0000000000000c40 000003ff80d6fb54 000003e010ebbbf0
  [556585.270959] Krnl Code:#0000000000000000: 0000 illegal
                            >0000000000000002: 0000 illegal
                             0000000000000004: 0000 illegal
                             0000000000000006: 0000 illegal
                             0000000000000008: 0000 illegal
                             000000000000000a: 0000 illegal
                             000000000000000c: 0000 illegal
                             000000000000000e: 0000 illegal
  [556585.270967] Call Trace:
  [556585.270982] ([<000003ff80d6fb1a>] rpcauth_lookup_credcache+0x5a/0x300 [sunrpc])
  [556585.270993] [<000003ff80e1182c>] nfs_ctx_key_to_expire+0xec/0x130 [nfs]
  [556585.271004] [<000003ff80e1189c>] nfs_key_timeout_notify+0x2c/0x70 [nfs]
  [556585.271014] [<000003ff80dfdf7e>] nfs_file_write+0x3e/0x320 [nfs]
  [556585.271016] [<00000028165944a8>] new_sync_write+0x118/0x1b0
  [556585.271017] [<0000002816594ee0>] vfs_write+0xb0/0x1b0
  [556585.271019] [<0000002816596a1e>] ksys_pwrite64+0x7e/0xc0
  [556585.271021] [<0000002816bb26b2>] system_call+0x2a6/0x2c8

  - Was a system dump produced ie kdump, netdumpmp, or LKCD? Yes
    That is the kdump the stack trace comes from.
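As a reading aid for the Oops above, the following sketch does the address arithmetic by hand. It assumes the usual s390 conventions: for an operation exception the PSW address points past the failing instruction, ilc gives that instruction's length in halfwords, and r14 holds the return address of the most recent call. The constants are copied from the Oops; the helper names are made up for illustration.

```python
# Values copied from the Oops message above.
PSW_ADDR = 0x0000000000000002    # Krnl PSW address
ILC = 1                          # "ilc:1", instruction length in halfwords
R14 = 0x000003ff80d6fb54         # third value in the last Krnl GPRS row (r12..r15)
TRACE_ADDR = 0x000003ff80d6fb1a  # rpcauth_lookup_credcache+0x5a/0x300
TRACE_OFFSET = 0x5a
FUNC_SIZE = 0x300


def faulting_address(psw_addr, ilc):
    """Address of the failing instruction (assumes PSW points past it)."""
    return psw_addr - 2 * ilc


def offset_in_function(addr, sym_addr, sym_offset):
    """Offset of addr from the start of the function named in a trace entry."""
    func_start = sym_addr - sym_offset
    return addr - func_start


fault = faulting_address(PSW_ADDR, ILC)
r14_off = offset_in_function(R14, TRACE_ADDR, TRACE_OFFSET)

print(hex(fault))            # 0x0
print(hex(r14_off))          # 0x94
print(r14_off < FUNC_SIZE)   # True
```

If those assumptions hold, the failing instruction fetch was at address 0x0 and the return address in r14 lies inside rpcauth_lookup_credcache (offset 0x94 of 0x300). That would be consistent with an indirect call through a NULL function pointer, but it is only a reading of the registers, not a confirmed root cause.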
  Enter data below to accurately describe the problem:
  - Problem description: Null Pointer issue in nfs code running Ubuntu 18.04 with HWE kernel 5.4 on IBM Z
  - Enter uname -a output:
    @lon1-qz1-sr4-rk101-s04> uname -a
    Linux lon1-qz1-sr4-rk101-s04 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 19:59:45 UTC 2021 s390x s390x s390x GNU/Linux
  - Enter failing machine type and model (ie p520 9111-520 lpar, x336 47U-8637):
    Manufacturer: IBM
    Type: 8562
    Model: A00 GT2
    Model Capacity: A00 00000000
    Capacity Adj. Ind.: 100
    LPAR CPUs Total: 16
    LPAR CPUs Configured: 16
    LPAR CPUs Standby: 0
    LPAR CPUs Reserved: 0
    LPAR CPUs Dedicated: 0
    LPAR CPUs Shared: 16
    LPAR CPUs G-MTID: 0
    LPAR CPUs S-MTID: 1
    LPAR CPUs PS-MTID: 1
  - Enter primary and backup contact information (name/email):
    Prabhat Ranjan pranj...@in.ibm.com
    Christoph Schlameu? schlame...@de.ibm.com
  - Detail the configuration of the additional hardware
  - Enter common userspace tool name: N/A
  - Enter name of userspace RPM: N/A
  - If failing tool is obtained from project website vs RPM install, what is the version/release/mod. If from the project's CVS, what is the branch tag and date of checkout (put "na" if not applicable)? N/A
  - Is the failing userspace tool 32-bit, 64-bit, or both? N/A
  - Describe how unresponsive the system is. What steps have you taken to reclaim the system:
    kernel oops was detected and automatically dumped and restarted
  - Is a debugger configured (xmon or kdb enabled)?
    No
  - Enter Oops message from console:
    (identical to the Oops message quoted above under "Did the system produce an Oops message on the console?")
  - Detail the steps to reproduce this problem: unknown
  - Was the system configured to capture a system dump? Yes

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1968096/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp