** Changed in: linux (Ubuntu)
Status: Incomplete => Invalid
** Changed in: ubuntu-power-systems
Status: Incomplete => Invalid
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1676678
Title:
ISST-LTE:dotg6:Kernel access of bad area, sig: 11 - during stress
tests
Status in The Ubuntu-power-systems project:
Invalid
Status in linux package in Ubuntu:
Invalid
Bug description:
---Problem Description---
After running stress tests (IO, TCP, BASE) for a few hours, Ubuntu 17.04 KVM
guest dotg6 crashed, produced a kdump, and rebooted.
---uname output---
Linux dotg6 4.10.0-13-generic #15-Ubuntu SMP Thu Mar 9 20:27:28 UTC 2017
ppc64le ppc64le ppc64le GNU/Linux
Machine Type = KVM guest on an 8247-22L (host also running Ubuntu 17.04)
Stack trace output:
[ 1909.621800] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1909.621870] SMP NR_CPUS=2048
[ 1909.621871] NUMA
[ 1909.621925] pSeries
[ 1909.622016] Modules linked in: minix nls_iso8859_1 rpcsec_gss_krb5
auth_rpcgss nfsv4 nfs lockd grace fscache binfmt_misc xfs libcrc32c vmx_crypto
sunrpc ip_tables x_tables autofs4 btrfs xor raid6_pq dm_service_time
crc32c_vpmsum virtio_scsi virtio_net scsi_dh_emc scsi_dh_rdac scsi_dh_alua
dm_multipath
[ 1909.622401] CPU: 2 PID: 27704 Comm: ppc64_cpu Not tainted
4.10.0-13-generic #15-Ubuntu
[ 1909.622536] task: c000000042a64200 task.stack: c00000003423c000
[ 1909.622627] NIP: d0000000016a14f4 LR: d0000000016a14a0 CTR:
c000000000609d00
[ 1909.622737] REGS: c00000003423f7f0 TRAP: 0380 Not tainted
(4.10.0-13-generic)
[ 1909.622850] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 1909.622860] CR: 24002428 XER: 20000000
[ 1909.623016] CFAR: c00000000061a238 SOFTE: 1
[ 1909.623016] GPR00: d0000000016a14a0 c00000003423fa70 d0000000016ab8cc
c000000170fd5000
[ 1909.623016] GPR04: ffffffffffffffff 0000000000000000 0000000000000000
0000000000007530
[ 1909.623016] GPR08: c00000000146c700 757465736d642f6e c00000000146dbe0
d0000000016a2ef8
[ 1909.623016] GPR12: c000000000609d00 c000000001b81200 0000000000000008
0000000000000001
[ 1909.623016] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000046f05f80
[ 1909.623016] GPR20: 0000000046f061f8 0000000000000000 0000000046f05f58
c00000017fd4a808
[ 1909.623016] GPR24: 0000000000000001 c000000170fc7a30 c000000001326eb0
d0000000016a1648
[ 1909.623016] GPR28: c000000001471c28 c000000170fc7860 0000000071ae4a20
0000000000000058
[ 1909.623990] NIP [d0000000016a14f4] __virtscsi_set_affinity+0xac/0x200
[virtio_scsi]
[ 1909.624114] LR [d0000000016a14a0] __virtscsi_set_affinity+0x58/0x200
[virtio_scsi]
[ 1909.624235] Call Trace:
[ 1909.624278] [c00000003423fa70] [d0000000016a14a0]
__virtscsi_set_affinity+0x58/0x200 [virtio_scsi] (unreliable)
[ 1909.624445] [c00000003423fac0] [d0000000016a1678]
virtscsi_cpu_online+0x30/0x70 [virtio_scsi]
[ 1909.624746] [c00000003423fae0] [c0000000000db73c]
cpuhp_invoke_callback+0x3ec/0x5a0
[ 1909.624887] [c00000003423fb50] [c0000000000dba88]
cpuhp_down_callbacks+0x78/0xf0
[ 1909.625037] [c00000003423fba0] [c000000000268bb0] _cpu_down+0x150/0x1b0
[ 1909.625174] [c00000003423fc00] [c0000000000de1b4] do_cpu_down+0x64/0xb0
[ 1909.625330] [c00000003423fc40] [c00000000074b834]
cpu_subsys_offline+0x24/0x40
[ 1909.625485] [c00000003423fc60] [c000000000743284] device_offline+0xf4/0x130
[ 1909.625610] [c00000003423fca0] [c000000000743434] online_store+0x64/0xb0
[ 1909.625736] [c00000003423fce0] [c00000000073e37c] dev_attr_store+0x3c/0x60
[ 1909.625862] [c00000003423fd00] [c0000000003faa18] sysfs_kf_write+0x68/0xa0
[ 1909.625984] [c00000003423fd20] [c0000000003f98bc]
kernfs_fop_write+0x17c/0x250
[ 1909.626132] [c00000003423fd70] [c00000000033c98c] __vfs_write+0x3c/0x70
[ 1909.626253] [c00000003423fd90] [c00000000033e414] vfs_write+0xd4/0x240
[ 1909.626374] [c00000003423fde0] [c00000000033ffc8] SyS_write+0x68/0x110
[ 1909.626501] [c00000003423fe30] [c00000000000b184] system_call+0x38/0xe0
[ 1909.626624] Instruction dump:
[ 1909.626691] 2f890000 419e0064 3be00000 393f0021 3880ffff 792926e4 7d3d4a14
e9290010
[ 1909.626835] 2fa90000 7d234b78 419e002c e9290020 <e9290330> e9290058
2fa90000 7d2c4b78
[ 1909.627003] ---[ end trace ecc8a323beb021a2 ]---
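The word in angle brackets in the instruction dump above is the instruction at the NIP. As a reading aid, here is a minimal sketch that decodes that word by hand, assuming it is a DS-form load under Power ISA primary opcode 58 (ld, ldu, lwa). The helper name decode_ds_form is made up for illustration and is not part of crash or any other tool.

```python
def decode_ds_form(word):
    """Decode a 32-bit Power ISA DS-form load (primary opcode 58)."""
    opcode = (word >> 26) & 0x3F
    if opcode != 58:
        raise ValueError("not a primary-opcode-58 DS-form instruction")
    rt = (word >> 21) & 0x1F   # target register
    ra = (word >> 16) & 0x1F   # base register
    xo = word & 0x3            # extended opcode selects ld/ldu/lwa
    disp = word & 0xFFFC       # DS field already shifted left by 2
    if disp & 0x8000:          # sign-extend the 16-bit displacement
        disp -= 0x10000
    mnemonics = {0: "ld", 1: "ldu", 2: "lwa"}
    if xo not in mnemonics:
        raise ValueError("reserved extended opcode")
    return mnemonics[xo], rt, ra, disp


# The word marked <e9290330> in the instruction dump.
mnemonic, rt, ra, disp = decode_ds_form(0xE9290330)
print(f"{mnemonic} r{rt}, {disp:#x}(r{ra})")  # prints "ld r9, 0x330(r9)"
```

If that decoding is right, the faulting access is a load through the pointer held in GPR09, which matches the non-kernel-looking value in that register.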
crash> bt
PID: 27704 TASK: c000000042a64200 CPU: 2 COMMAND: "ppc64_cpu"
#0 [c00000003423f630] crash_kexec at c0000000001a04c4
#1 [c00000003423f670] oops_end at c000000000024da8
#2 [c00000003423f6f0] bad_page_fault at c0000000000627b0
#3 [c00000003423f760] slb_miss_bad_addr at c000000000026828
#4 [c00000003423f780] bad_addr_slb at c000000000008acc
Data SLB Access [380] exception frame:
R0: d0000000016a14a0 R1: c00000003423fa70 R2: d0000000016ab8cc
R3: c000000170fd5000 R4: ffffffffffffffff R5: 0000000000000000
R6: 0000000000000000 R7: 0000000000007530 R8: c00000000146c700
R9: 757465736d642f6e R10: c00000000146dbe0 R11: d0000000016a2ef8
R12: c000000000609d00 R13: c000000001b81200 R14: 0000000000000008
R15: 0000000000000001 R16: 0000000000000000 R17: 0000000000000000
R18: 0000000000000000 R19: 0000000046f05f80 R20: 0000000046f061f8
R21: 0000000000000000 R22: 0000000046f05f58 R23: c00000017fd4a808
R24: 0000000000000001 R25: c000000170fc7a30 R26: c000000001326eb0
R27: d0000000016a1648 R28: c000000001471c28 R29: c000000170fc7860
R30: 0000000071ae4a20 R31: 0000000000000058
NIP: d0000000016a14f4 MSR: 800000000280b033 OR3: c00000000061a238
CTR: c000000000609d00 LR: d0000000016a14a0 XER: 0000000020000000
CCR: 0000000024002428 MQ: 0000000000000001 DAR: 757465736d64329e
DSISR: c00000000001b910 Syscall Result: 0000000000000000
#5 [c00000003423fa70] __virtscsi_set_affinity at d0000000016a14f4
[virtio_scsi]
[Link Register] [c00000003423fa70] __virtscsi_set_affinity at
d0000000016a14a0 (unreliable)
#6 [c00000003423fac0] virtscsi_cpu_online at d0000000016a1678 [virtio_scsi]
#7 [c00000003423fae0] cpuhp_invoke_callback at c0000000000db73c
#8 [c00000003423fb50] cpuhp_down_callbacks at c0000000000dba88
#9 [c00000003423fba0] _cpu_down at c000000000268bb0
#10 [c00000003423fc00] do_cpu_down at c0000000000de1b4
#11 [c00000003423fc40] cpu_subsys_offline at c00000000074b834
#12 [c00000003423fc60] device_offline at c000000000743284
#13 [c00000003423fca0] online_store at c000000000743434
#14 [c00000003423fce0] dev_attr_store at c00000000073e37c
#15 [c00000003423fd00] sysfs_kf_write at c0000000003faa18
#16 [c00000003423fd20] kernfs_fop_write at c0000000003f98bc
#17 [c00000003423fd70] __vfs_write at c00000000033c98c
#18 [c00000003423fd90] vfs_write at c00000000033e414
#19 [c00000003423fde0] sys_write at c00000000033ffc8
#20 [c00000003423fe30] system_call at c00000000000b184
System Call [c01] exception frame:
R0: 0000000000000004 R1: 00003ffff823a5c0 R2: 00003fff7bf57f00
R3: 0000000000000008 R4: 0000010029020080 R5: 0000000000000001
R6: 00003fff7bee0d2c R7: 0000010029020010 R8: 0000000000000000
R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000000 R13: 00003fff7bfed060
NIP: 00003fff7bf350cc MSR: 800000000280f033 OR3: 0000000000000008
CTR: 0000000000000000 LR: 0000000046f01e0c XER: 0000000000000000
CCR: 0000000048000484 MQ: 0000000000000001 DAR: 00003fff7bd7e2c8
DSISR: 0000000040000000 Syscall Result: 0000000000000008
Initial output from invoking the crash tool on the vmcore:
KERNEL: /usr/lib/debug/boot/vmlinux-4.10.0-13-generic
DUMPFILE: /var/crash/201703221034/dump.201703221034 [PARTIAL DUMP]
CPUS: 7
DATE: Wed Mar 22 10:34:11 2017
UPTIME: 00:11:42
LOAD AVERAGE: 35.29, 25.73, 15.42
TASKS: 704
NODENAME: dotg6
RELEASE: 4.10.0-13-generic
VERSION: #15-Ubuntu SMP Thu Mar 9 20:27:28 UTC 2017
MACHINE: ppc64le (3425 Mhz)
MEMORY: 6 GB
PANIC: "Unable to handle kernel paging request for data at address
0x757465736d64329e"
PID: 27704
COMMAND: "ppc64_cpu"
TASK: c000000042a64200 [THREAD_INFO: c00000003423c000]
CPU: 2
STATE: TASK_RUNNING (PANIC)
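The faulting address does not look like a kernel pointer. The sketch below compares it with R9 from the exception frame and prints R9 as little-endian ASCII; the values are copied from the output above, and the helper name as_ascii is made up for illustration.

```python
# Values copied from the exception frame and the PANIC line.
R9 = 0x757465736D642F6E
DAR = 0x757465736D64329E


def as_ascii(value, byteorder="little"):
    """Render a 64-bit value as printable ASCII, '.' for other bytes."""
    raw = value.to_bytes(8, byteorder)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(hex(DAR - R9))   # prints "0x330"
print(as_ascii(R9))    # prints "n/dmsetu"
```

So the bad address is R9 plus a small offset, and R9 itself reads as text rather than an address. That would be consistent with a pointer that was overwritten by string data (for example after a use-after-free), but this is only a guess from the register contents and the dump alone does not prove it.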
> Can this problem be reproduced with some certainty ? If so, I could probably
> provide a debug patch to the guest kernel and collect some information when
> this happens.
This guest has now crashed twice with this error and the same
backtrace, so it will likely occur again, but there is no predictable
interval between crashes.
A test running on this guest periodically turns SMT on and off, and
that test may be what triggers the crash: the backtrace shows
ppc64_cpu taking a CPU offline through sysfs. Running the SMT test
more frequently may make the crash reproduce more consistently.
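For anyone trying to reproduce this, the following is a rough sketch of an SMT toggle loop, not the actual test used on dotg6. It assumes that toggling SMT amounts to writing 0 or 1 to the per-CPU sysfs "online" files, which is the path the backtrace shows, and that the guest has 8 threads per core. The function names and defaults are made up for illustration.

```python
import time
from pathlib import Path

SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def secondary_threads(root, threads_per_core):
    """Return CPU numbers that are not the first thread of a core."""
    cpus = []
    for entry in root.glob("cpu[0-9]*"):
        index = int(entry.name[3:])
        if index % threads_per_core != 0 and (entry / "online").exists():
            cpus.append(index)
    return sorted(cpus)


def set_online(root, cpu, online):
    """Write 1 or 0 to the CPU's sysfs online file."""
    (root / f"cpu{cpu}" / "online").write_text("1" if online else "0")


def toggle_smt(root=SYSFS_CPU_ROOT, threads_per_core=8, cycles=100, pause=1.0):
    """Repeatedly offline, then online, every secondary thread."""
    cpus = secondary_threads(root, threads_per_core)
    for _ in range(cycles):
        for state in (False, True):
            for cpu in cpus:
                set_online(root, cpu, state)
            time.sleep(pause)
    return cpus
```

Calling toggle_smt() as root on a disposable guest, with the IO stress load running, should exercise the same CPU offline path more often than the existing test does. Lowering pause increases the toggle rate.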
Mirroring to Canonical for their awareness while IBM continues
investigation...
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1676678/+subscriptions