------- Comment From ppaid...@in.ibm.com 2018-05-13 00:08 EDT-------
(In reply to comment #32)
> I built a test kernel with commit c0f7f5b6c69107ca92909512533e70258ee19188.
> The test kernel can be downloaded from:
> http://kernel.ubuntu.com/~jsalisbury/lp1768898
>
> Can you test this kernel and see if it resolves this bug?
>
> Note about installing test kernels:
> ? If the test kernel is prior to 4.15(Bionic) you need to install the
> linux-image and linux-image-extra .deb packages.
> ? If the test kernel is 4.15(Bionic) or newer, you need to install the
> linux-image-unsigned, linux-modules and linux-modules-extra .deb packages.
>
> Thanks in advance!

System is having bionic kernel, while installing the above packages getting an 
error.
root@ltc-boston125:~/test_bug# ls
linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb  
linux-modules-extra-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb
linux-modules-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb
root@ltc-boston125:~/test_bug# dpkg -i *.deb
Selecting previously unselected package linux-image-unsigned-4.15.0-20-generic.
dpkg: regarding 
linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb 
containing linux-image-unsigned-4.15.0-20-generic:
linux-image-unsigned-4.15.0-20-generic conflicts with 
linux-image-4.15.0-20-generic
linux-image-4.15.0-20-generic (version 4.15.0-20.21) is present and installed.

dpkg: error processing archive 
linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb 
(--install):
conflicting packages - not installing linux-image-unsigned-4.15.0-20-generic
dpkg: warning: downgrading linux-modules-4.15.0-20-generic from 4.15.0-20.21 to 
4.15.0-20.21~lp1768898
(Reading database ... 67065 files and directories currently installed.)
Preparing to unpack 
linux-modules-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb ...
Unpacking linux-modules-4.15.0-20-generic (4.15.0-20.21~lp1768898) over 
(4.15.0-20.21) ...
dpkg: warning: downgrading linux-modules-extra-4.15.0-20-generic from 
4.15.0-20.21 to 4.15.0-20.21~lp1768898
Preparing to unpack 
linux-modules-extra-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb ...
Unpacking linux-modules-extra-4.15.0-20-generic (4.15.0-20.21~lp1768898) over 
(4.15.0-20.21) ...
Setting up linux-modules-4.15.0-20-generic (4.15.0-20.21~lp1768898) ...
Setting up linux-modules-extra-4.15.0-20-generic (4.15.0-20.21~lp1768898) ...
Processing triggers for linux-image-4.15.0-20-generic (4.15.0-20.21) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-20-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.15.0-20-generic
Found initrd image: /boot/initrd.img-4.15.0-20-generic
done
Errors were encountered while processing:
linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb

uname  -a
Linux ltc-boston125 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 
2018 ppc64le ppc64le ppc64le GNU/Linux

Can you please let me know why it is failing to install.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1768898

Title:
  smp_call_function_single/many core hangs with stop4 alone

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged

Bug description:
  Recently we discovered this bug occurs just alone with stop4 which
  results in soft lockups/rcu stalls.

  ```
  root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: 
Processes still around after final SIGKILL. Entering failed mode.
  [15523.619508] systemd[1]: systemd-journald.service: Failed with result 
'timeout'.
  [15523.619769] systemd[1]: Failed to start Journal Service.
  [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off 
time, scheduling restart.
  [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, 
restart counter is at 21.
  [15523.621462] systemd[1]: Stopped Journal Service.
  [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 
1561 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.621756] systemd[1]: This usually indicates unclean termination of a 
previous run, or service implementation deficiencies.
  [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 
69060 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched 
self-detected stall on CPU
  [15541.629958]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=999069 
  [15541.630046]         (t=2415546 jiffies g=184827 c=184826 q=57111)
  [15541.630101] NMI backtrace for cpu 60
  [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G             L   
4.15.0-15-generic #16-Ubuntu
  [15541.630207] Call Trace:
  [15541.630232] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 
(unreliable)
  [15541.630298] [c000201a1da96b40] [c000000000cf4d48] 
nmi_cpu_backtrace+0x1f8/0x200
  [15541.630363] [c000201a1da96bd0] [c000000000cf4ee8] 
nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15541.630429] [c000201a1da96c60] [c00000000002f2d8] 
arch_trigger_cpumask_backtrace+0x28/0x40
  [15541.630495] [c000201a1da96c80] [c0000000001a913c] 
rcu_dump_cpu_stacks+0xf4/0x158
  [15541.630560] [c000201a1da96cd0] [c0000000001a81e8] 
rcu_check_callbacks+0x8e8/0xb40
  [15541.630625] [c000201a1da96e00] [c0000000001b64a8] 
update_process_times+0x48/0x90
  [15541.630689] [c000201a1da96e30] [c0000000001ce1f4] 
tick_sched_handle.isra.5+0x34/0xd0
  [15541.630753] [c000201a1da96e60] [c0000000001ce2f0] 
tick_sched_timer+0x60/0xe0
  [15541.630818] [c000201a1da96ea0] [c0000000001b7054] 
__hrtimer_run_queues+0x144/0x370
  [15541.630883] [c000201a1da96f20] [c0000000001b7fac] 
hrtimer_interrupt+0xfc/0x350
  [15541.630948] [c000201a1da96ff0] [c0000000000248f0] 
__timer_interrupt+0x90/0x260
  [15541.631013] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
  [15541.631069] [c000201a1da97070] [c000000000009014] 
decrementer_common+0x114/0x120
  [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
  [15541.631135]     LR = smp_call_function_single+0x110/0x180
  [15541.631230] [c000201a1da973d0] [c0000000001d55e0] 
smp_call_function_any+0x180/0x250
  [15541.631294] [c000201a1da97430] [c000000000acd3e8] 
gpstate_timer_handler+0x1e8/0x580
  [15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
  [15541.631433] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
  [15541.631488] [c000201a1da975d0] [c0000000001b4bf8] 
run_timer_softirq+0x1e8/0x270
  [15541.631553] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
  [15541.631608] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
  [15541.631663] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
  [15541.631718] [c000201a1da977a0] [c000000000009014] 
decrementer_common+0x114/0x120
  [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15541.631784]     LR = smp_call_function_many+0x324/0x450
  [15541.631879] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
  [15541.631935] [c000201a1da97b30] [c0000000003a1120] 
change_huge_pmd+0xe0/0x270
  [15541.632000] [c000201a1da97ba0] [c000000000349278] 
change_protection_range+0xb88/0xe40
  [15541.632065] [c000201a1da97cf0] [c0000000003496c0] 
mprotect_fixup+0x140/0x340
  [15541.632129] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
  [15541.632185] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
  [15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263]
  [15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter 
ip6_tables iptable_filter devlink input_leds joydev mac_hid idt_89hpesx ofpart 
cmdlinepart powernv_flash ipmi_powernv ipmi_devintf opal_prd mtd 
ipmi_msghandler ibmpowernv at24 uio_pdrv_genirq uio vmx_crypto kvm_hv kvm 
sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast
  [15579.002363]  i2c_algo_bit hid_generic ttm drm_kms_helper mpt3sas 
syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid raid_class 
crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid scsi_transport_sas
  [15579.002524] CPU: 52 PID: 69263 Comm: grep Tainted: G             L   
4.15.0-15-generic #16-Ubuntu
  [15579.002598] NIP:  c0000000001d5368 LR: c0000000001d5340 CTR: 
c000000000acc7f0
  [15579.002664] REGS: c000003e84eff7e0 TRAP: 0901   Tainted: G             L   
 (4.15.0-15-generic)
  [15579.002735] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48044222 
 XER: 00000000
  [15579.002810] CFAR: c01721ed8 
  [15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960 
0000000000000000 
  [15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00 
  [15579.003084] NIP [c0000000001d5368] smp_call_function_single+0x138/0x180
  [15579.003139] LR [c0000000001d5340] smp_call_function_single+0x110/0x180
  [15579.003191] Call Trace:
  [15579.003217] [c000003e84effa60] [c0000000001d5340] 
smp_call_function_single+0x110/0x180 (unreliable)
  [15579.003298] [c000003e84effad0] [c0000000001d55e0] 
smp_call_function_any+0x180/0x250
  [15579.003381] [c000003e84effb30] [c000000000acc840] 
powernv_cpufreq_get+0x50/0x70
  [15579.003447] [c000003e84effb60] [c000000000ac2b8c] __cpufreq_get+0x5c/0x140
  [15579.003503] [c000003e84effba0] [c000000000ac2d18] cpufreq_get+0xa8/0xb0
  [15579.003560] [c000003e84effbe0] [c00000000009da50] 
pnv_get_proc_freq+0x20/0x50
  [15579.003625] [c000003e84effc00] [c0000000000283bc] show_cpuinfo+0x11c/0x400
  [15579.003680] [c000003e84effca0] [c00000000040c738] seq_read+0x138/0x610
  [15579.003737] [c000003e84effd40] [c00000000047fa38] proc_reg_read+0x88/0xd0
  [15579.003794] [c000003e84effd70] [c0000000003d293c] __vfs_read+0x3c/0x70
  [15579.003849] [c000003e84effd90] [c0000000003d2a2c] vfs_read+0xbc/0x1b0
  [15579.003905] [c000003e84effde0] [c0000000003d3028] SyS_read+0x68/0x110
  [15579.003962] [c000003e84effe30] [c00000000000b184] system_call+0x58/0x6c
  [15579.004016] Instruction dump:
  [15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000 
60000000 
  [15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001 
4082fff0 7c2004ac 
  [15604.648202] INFO: rcu_sched self-detected stall on CPU
  [15604.648260]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=1005652 
  [15604.648332]         (t=2431300 jiffies g=184827 c=184826 q=57308)
  [15604.648385] NMI backtrace for cpu 60
  [15604.648419] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G             L   
4.15.0-15-generic #16-Ubuntu
  [15604.648491] Call Trace:
  [15604.648515] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 
(unreliable)
  [15604.648581] [c000201a1da96b40] [c000000000cf4d48] 
nmi_cpu_backtrace+0x1f8/0x200
  [15604.648647] [c000201a1da96bd0] [c000000000cf4ee8] 
nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15604.648728] [c000201a1da96c60] [c00000000002f2d8] 
arch_trigger_cpumask_backtrace+0x28/0x40
  [15604.648793] [c000201a1da96c80] [c0000000001a913c] 
rcu_dump_cpu_stacks+0xf4/0x158
  [15604.648858] [c000201a1da96cd0] [c0000000001a81e8] 
rcu_check_callbacks+0x8e8/0xb40
  [15604.648924] [c000201a1da96e00] [c0000000001b64a8] 
update_process_times+0x48/0x90
  [15604.648988] [c000201a1da96e30] [c0000000001ce1f4] 
tick_sched_handle.isra.5+0x34/0xd0
  [15604.649052] [c000201a1da96e60] [c0000000001ce2f0] 
tick_sched_timer+0x60/0xe0
  [15604.649118] [c000201a1da96ea0] [c0000000001b7054] 
__hrtimer_run_queues+0x144/0x370
  [15604.649183] [c000201a1da96f20] [c0000000001b7fac] 
hrtimer_interrupt+0xfc/0x350
  [15604.649248] [c000201a1da96ff0] [c0000000000248f0] 
__timer_interrupt+0x90/0x260
  [15604.649313] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
  [15604.649369] [c000201a1da97070] [c000000000009014] 
decrementer_common+0x114/0x120
  [15604.649435] --- interrupt: 901 at smp_call_function_single+0x138/0x180
  [15604.649435]     LR = smp_call_function_single+0x110/0x180
  [15604.649530] [c000201a1da973d0] [c0000000001d55e0] 
smp_call_function_any+0x180/0x250
  [15604.649595] [c000201a1da97430] [c000000000acd3e8] 
gpstate_timer_handler+0x1e8/0x580
  [15604.649660] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
  [15604.649715] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
  [15604.649770] [c000201a1da975d0] [c0000000001b4bf8] 
run_timer_softirq+0x1e8/0x270
  [15604.649835] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
  [15604.649891] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
  [15604.649946] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
  [15604.650002] [c000201a1da977a0] [c000000000009014] 
decrementer_common+0x114/0x120
  [15604.650084] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15604.650084]     LR = smp_call_function_many+0x324/0x450
  [15604.650179] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
  [15604.650235] [c000201a1da97b30] [c0000000003a1120] 
change_huge_pmd+0xe0/0x270
  [15604.650301] [c000201a1da97ba0] [c000000000349278] 
change_protection_range+0xb88/0xe40
  [15604.650366] [c000201a1da97cf0] [c0000000003496c0] 
mprotect_fixup+0x140/0x340
  [15604.650430] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
  [15604.650486] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
  [15667.666494] INFO: rcu_sched self-detected stall on CPU
  [15667.666550]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=1012258 
  [15667.666622]         (t=2447054 jiffies g=184827 c=184826 q=57457)
  [15667.666675] NMI backtrace for cpu 60
  [15667.666709] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G             L   
4.15.0-15-generic #16-Ubuntu
  [15667.666781] Call Trace:
  [15667.666805] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 
(unreliable)
  [15667.666871] [c000201a1da96b40] [c000000000cf4d48] 
nmi_cpu_backtrace+0x1f8/0x200
  [15667.666937] [c000201a1da96bd0] [c000000000cf4ee8] 
nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15667.667002] [c000201a1da96c60] [c00000000002f2d8] 
arch_trigger_cpumask_backtrace+0x28/0x40
  [15667.667086] [c000201a1da96c80] [c0000000001a913c] 
rcu_dump_cpu_stacks+0xf4/0x158
  [15667.667151] [c000201a1da96cd0] [c0000000001a81e8] 
rcu_check_callbacks+0x8e8/0xb40
  [15667.667216] [c000201a1da96e00] [c0000000001b64a8] 
update_process_times+0x48/0x90
  [15667.667280] [c000201a1da96e30] [c0000000001ce1f4] 
tick_sched_handle.isra.5+0x34/0xd0
  [15667.667344] [c000201a1da96e60] [c0000000001ce2f0] 
tick_sched_timer+0x60/0xe0
  [15667.667409] [c000201a1da96ea0] [c0000000001b7054] 
__hrtimer_run_queues+0x144/0x370
  [15667.667474] [c000201a1da96f20] [c0000000001b7fac] 
hrtimer_interrupt+0xfc/0x350
  [15667.667539] [c000201a1da96ff0] [c0000000000248f0] 
__timer_interrupt+0x90/0x260
  [15667.667604] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
  [15667.667660] [c000201a1da97070] [c000000000009014] 
decrementer_common+0x114/0x120
  [15667.667727] --- interrupt: 901 at smp_call_function_single+0x130/0x180
  [15667.667727]     LR = smp_call_function_single+0x110/0x180
  [15667.667821] [c000201a1da973d0] [c0000000001d55e0] 
smp_call_function_any+0x180/0x250
  [15667.667886] [c000201a1da97430] [c000000000acd3e8] 
gpstate_timer_handler+0x1e8/0x580
  [15667.667951] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
  [15667.668006] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
  [15667.668061] [c000201a1da975d0] [c0000000001b4bf8] 
run_timer_softirq+0x1e8/0x270
  [15667.668126] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
  [15667.668181] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
  [15667.668236] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
  [15667.668292] [c000201a1da977a0] [c000000000009014] 
decrementer_common+0x114/0x120
  [15667.668358] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15667.668358]     LR = smp_call_function_many+0x324/0x450
  [15667.668469] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
  [15667.668524] [c000201a1da97b30] [c0000000003a1120] 
change_huge_pmd+0xe0/0x270
  [15667.668589] [c000201a1da97ba0] [c000000000349278] 
change_protection_range+0xb88/0xe40
  [15667.668654] [c000201a1da97cf0] [c0000000003496c0] 
mprotect_fixup+0x140/0x340
  [15667.668719] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
  [15667.668775] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c

  ```

  Per feedback from Vaidy, this currently appears to NOT be a firmware
  problem. This seems to be a kernel synchronization issue leading to a
  dead lock.

  -------
  Fix identified by Shilpa as per Nick Piggin's recommendation.  Kernel fix is 
currently being tested.

   -------
  Fix upstream in 4.17-rc3

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=c0f7f5b6c69107ca92909512533e70258ee19188
  cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer 
interrupt

  Posted to stable as well.

  Mirroring to Launchpad for Canonical to pull in commit.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1768898/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to