Hey Salvatore!

On Wed, Jun 24, 2020 at 05:57:42PM +0200, Salvatore Bonaccorso wrote:
>Control: found -1 4.19.118-1
>Control: tags -1 + upstream
>
>Hi Steve,
>
>On Mon, Jun 22, 2020 at 12:58:35PM +0100, Steve McIntyre wrote:
>> Source: linux
>> Version: 4.19.118-2+deb10u1
>> Severity: serious
>> 
>> Hi folks,
>> 
>> Trying to reproduce #963462 on my Thinkpad T470, I'm repeatedly
>> getting a hard lockup running the strace testsuite. I've done this 4
>> times to be sure. Each time it seems to have failed in a slightly
>> different place in the testsuite (suggesting it's not one particular
>> syscall test that's triggering the failure). Only one of the 4 lockups
>> left any evidence in the logs (reproduced below), so I'm not sure if
>> the error here is actually the root of the problem or not. :-/
>> 
>> The machine is not noticeably running hot or anything while doing
>> these tests.
>> 
>> Rebooting back to 4.19.0-8-amd64 (aka 4.19.98-1+deb10u1), I've just run
>> the same testsuite twice in a row and it ran to completion with no
>> lockup.
>> 
>> Here's the one bit of log that I did get, in case it's useful.
>> 
>> Jun 22 11:36:49 tack kernel: [  318.195906] futex_wake_op: futex tries to 
>> shift op by -849; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.195910] futex_wake_op: futex tries to 
>> shift op by -849; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.195971] futex_wake_op: futex tries to 
>> shift op by -518; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.195974] futex_wake_op: futex tries to 
>> shift op by -518; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.195977] futex_wake_op: futex tries to 
>> shift op by -1; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.195979] futex_wake_op: futex tries to 
>> shift op by -1; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.199661] futex_wake_op: futex tries to 
>> shift op by -849; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.199674] futex_wake_op: futex tries to 
>> shift op by -849; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.199917] futex_wake_op: futex tries to 
>> shift op by -518; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.199982] futex_wake_op: futex tries to 
>> shift op by -518; fix this program
>> Jun 22 11:36:49 tack kernel: [  318.398755] L1TF CPU bug present and SMT on, 
>> data leak possible. See CVE-2018-3646 and 
>> https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for
>> details.
>> Jun 22 11:36:49 tack kernel: [  318.587324] WARNING: CPU: 2 PID: 32174 at 
>> mm/page_alloc.c:4385 __alloc_pages_nodemask+0x241/0x2b0
>> Jun 22 11:36:49 tack kernel: [  318.587326] Modules linked in: acpi_call(OE) 
>> ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_counter 
>> nft_chain_nat_ipv4 nf_nat_ipv4 xt_addrtype nft_compat xt_conntrack nf_nat 
>> nf_conntrack 
>> nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp overlay devlink 
>> cpufreq_userspace cpufreq_conservative cpufreq_powersave ipmi_devintf 
>> ipmi_msghandler nf_tables nfnetlink 
>> appletalk psnap llc ax25 pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) 
>> vboxdrv(OE) cmac 
>> bnep fuse binfmt_misc pktcdvd snd_hda_codec_hdmi btusb btrtl btbcm btintel 
>> bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 
>> videobuf2_common snd_hda_codec_realtek snd_hda_codec_generic videodev media 
>> drbg ansi_cprng 
>> ecdh_generic nls_ascii nls_cp437 vfat fat arc4 intel_rapl snd_soc_skl msr 
>> snd_soc_skl_ipc snd_soc_sst_ipc
>> Jun 22 11:36:49 tack kernel: [  318.587344]  snd_soc_sst_dsp 
>> x86_pkg_temp_thermal intel_powerclamp snd_hda_ext_core coretemp 
>> snd_soc_acpi_intel_match iwlmvm snd_soc_acpi kvm_intel snd_soc_core 
>> mac80211 snd_compress wmi_bmof i915 snd_hda_intel iwlwifi kvm snd_hda_codec 
>> snd_hda_core irqbypass snd_hwdep evdev intel_cstate joydev snd_pcm_oss 
>> drm_kms_helper intel_uncore serio_raw intel_rapl_perf mei_me snd_mixer_oss 
>> cfg80211 sg efi_pstore drm pcspkr mei efivars snd_pcm ucsi_acpi snd_timer 
>> typec_ucsi i2c_algo_bit iTCO_wdt iTCO_vendor_support intel_pch_thermal typec 
>> thinkpad_acpi tpm_crb wmi nvram snd soundcore tpm_tis rfkill video 
>> tpm_tis_core battery pcc_cpufreq ac tpm rng_core acpi_pad button parport_pc 
>> nfsd ppdev auth_rpcgss lp nfs_acl lockd grace sunrpc parport efivarfs 
>> ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic
>> Jun 22 11:36:49 tack kernel: [  318.587362]  fscrypto ecb btrfs 
>> zstd_decompress zstd_compress xxhash algif_skcipher af_alg sr_mod cdrom uas 
>> usb_storage dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy 
>> async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath 
>> linear md_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel 
>> ghash_clmulni_intel pcbc ahci aesni_intel libahci aes_x86_64 crypto_simd 
>> libata cryptd xhci_pci glue_helper xhci_hcd e1000e psmouse scsi_mod usbcore 
>> i2c_i801 thermal usb_common
>> Jun 22 11:36:49 tack kernel: [  318.587375] CPU: 2 PID: 32174 Comm: keyctl 
>> Tainted: G           OE     4.19.0-9-amd64 #1 Debian 4.19.118-2+deb10u1
>> Jun 22 11:36:49 tack kernel: [  318.587376] Hardware name: LENOVO 
>> 20HD000EUK/20HD000EUK, BIOS N1QET67W (1.42 ) 10/03/2017
>> Jun 22 11:36:49 tack kernel: [  318.587377] RIP: 
>> 0010:__alloc_pages_nodemask+0x241/0x2b0
>> Jun 22 11:36:49 tack kernel: [  318.587379] Code: 89 f7 89 ee 45 31 f6 e8 4d 
>> d5 ff ff e9 fb fe ff ff e8 73 ae 01 00 e9 cb fe ff ff 45 31 f6 81 e7 00 02 
>> 00 00 0f 85 e7 fe ff ff <0f> 0b e9 e0 fe ff ff 31 c0 e9 6a fe ff ff 65 48 8b 
>> 04 25 40 5c 01
>> Jun 22 11:36:49 tack kernel: [  318.587379] RSP: 0018:ffffb7e68ac2fe70 
>> EFLAGS: 00010246
>> Jun 22 11:36:49 tack kernel: [  318.587381] RAX: 0000000000000000 RBX: 
>> fffffffffffffff4 RCX: 0000000000000000
>> Jun 22 11:36:49 tack kernel: [  318.587381] RDX: 0000000000000000 RSI: 
>> 0000000000000034 RDI: 0000000000000000
>> Jun 22 11:36:49 tack kernel: [  318.587382] RBP: fffffffffffffffc R08: 
>> 0000000000000000 R09: 0000000000000000
>> Jun 22 11:36:49 tack kernel: [  318.587383] R10: 0000000000000000 R11: 
>> 0000000000000000 R12: 00000000006000c0
>> Jun 22 11:36:49 tack kernel: [  318.587383] R13: 00007efdd17ecf00 R14: 
>> 0000000000000000 R15: 0000000000000000
>> Jun 22 11:36:49 tack kernel: [  318.587385] FS:  00007efdd19fc580(0000) 
>> GS:ffff9dc6a2500000(0000) knlGS:0000000000000000
>> Jun 22 11:36:49 tack kernel: [  318.587386] CS:  0010 DS: 0000 ES: 0000 CR0: 
>> 0000000080050033
>> Jun 22 11:36:49 tack kernel: [  318.587386] CR2: 00007efdd1829000 CR3: 
>> 0000000816ce8001 CR4: 00000000003606e0
>> Jun 22 11:36:49 tack kernel: [  318.587387] Call Trace:
>> Jun 22 11:36:49 tack kernel: [  318.587391]  kmalloc_order+0x14/0x30
>> Jun 22 11:36:49 tack kernel: [  318.587393]  kmalloc_order_trace+0x1d/0xa0
>> Jun 22 11:36:49 tack kernel: [  318.587395]  keyctl_read_key+0xb3/0x130
>> Jun 22 11:36:49 tack kernel: [  318.587397]  do_syscall_64+0x53/0x110
>> Jun 22 11:36:49 tack kernel: [  318.587399]  
>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> Jun 22 11:36:49 tack kernel: [  318.587401] RIP: 0033:0x7efdd192ff79
>> Jun 22 11:36:49 tack kernel: [  318.587402] Code: 00 c3 66 2e 0f 1f 84 00 00 
>> 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 
>> c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 3e 0c 00 f7 
>> d8 64 89 01 48
>> Jun 22 11:36:49 tack kernel: [  318.587403] RSP: 002b:00007ffe112d1798 
>> EFLAGS: 00000246 ORIG_RAX: 00000000000000fa
>> Jun 22 11:36:49 tack kernel: [  318.587404] RAX: ffffffffffffffda RBX: 
>> 00007ffe112d17f0 RCX: 00007efdd192ff79
>> Jun 22 11:36:49 tack kernel: [  318.587404] RDX: 00007efdd17ecf00 RSI: 
>> 00000000ffffffff RDI: 000000000000000b
>> Jun 22 11:36:49 tack kernel: [  318.587405] RBP: 00007ffe112d17d0 R08: 
>> deadfee4badc0de8 R09: 00005612c7c8c141
>> Jun 22 11:36:49 tack kernel: [  318.587406] R10: fffffffffffffffc R11: 
>> 0000000000000246 R12: 0000000000000003
>> Jun 22 11:36:49 tack kernel: [  318.587407] R13: 00007ffe112d1830 R14: 
>> 0000000000000001 R15: 00005612c7c8c116
>> Jun 22 11:36:49 tack kernel: [  318.587408] ---[ end trace e512ea2af2666eea 
>> ]---
>> 
>> As I can reproduce this quite easily, I'm happy to help with whatever
>> debugging might be useful.
>
>This sounds similar to
>https://lore.kernel.org/stable/7231ea1a-70b2-c156-1724-2357ed10b...@intel.com/
>
>d3ec10aa9581 ("KEYS: Don't write out to userspace while holding key
>semaphore") was backported to v4.19.y in 4.19.118. The issue seems to
>be fixed with 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error
>on key read") which was backported to 4.19.119.
>
>Can you check whether cherry-picking that fix
>(e4a281c7daa07814748179ee8b4b483124bb94ea in the linux-4.19.y branch)
>resolves the issue?

Sure, building it now to test.

-- 
Steve McIntyre, Cambridge, UK.                                st...@einval.com
"Because heaters aren't purple!" -- Catherine Pitt
