I hadn't used sshfs on Power machines before, so there is no known-good
earlier version to compare against.


Mainline v4.13 - jenkins-ppc64


steps on host:

virsh start {guest}

sshfs ozlabs@jenkins-ppc64:/home/ozlabs jenkins-ppc64/ -o reconnect,idmap=user


cd jenkins-ppc64/linux

make -j 400  # or anything else that does a lot of reads over sshfs

# in another terminal

virsh suspend jenkins-ppc64

killall -9 cc  # or whichever processes are reading


# resume guest

root@p87:~# virsh resume jenkins-ppc64
Domain jenkins-ppc64 resumed

root@p87:~# virsh list
 Id    Name                           State
----------------------------------------------------
 1     jenkins                        running
 2     jenkins-ppc64                  running

root@p87:~# virsh list
 Id    Name                           State
----------------------------------------------------
 1     jenkins                        running
 2     jenkins-ppc64                  running

root@p87:~# virsh list
 Id    Name                           State
----------------------------------------------------
 1     jenkins                        running
 2     jenkins-ppc64                  running

root@p87:~# virsh list
 Id    Name                           State
----------------------------------------------------
 1     jenkins                        running
 2     jenkins-ppc64                  running

root@p87:~# virsh console jenkins-ppc64
Connected to domain jenkins-ppc64
Escape character is ^]
Shared connection to p87 closed.

No further network connections to p87 could be established.
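
The host-side steps above can be collected into a small script. This is only
a sketch under stated assumptions: the function names, the GUEST, REMOTE and
MOUNTPOINT variables and the DRY_RUN switch are hypothetical additions, the
guest started in the first step is assumed to be jenkins-ppc64, and
"make -C" stands in for the separate "cd" and "make" commands. By default it
only prints the commands, because running them for real may lock up the host.

```shell
#!/bin/sh
# Sketch of the reproduction steps; names below are illustrative only.
GUEST="${GUEST:-jenkins-ppc64}"                        # assumed guest name
REMOTE="${REMOTE:-ozlabs@jenkins-ppc64:/home/ozlabs}"  # sshfs source
MOUNTPOINT="${MOUNTPOINT:-jenkins-ppc64}"              # local mount directory
DRY_RUN="${DRY_RUN:-1}"                                # hypothetical: 1 = print only

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# First terminal: start the guest, mount it, start a read-heavy build.
terminal_one() {
    run virsh start "$GUEST"
    run sshfs "$REMOTE" "$MOUNTPOINT/" -o reconnect,idmap=user
    run make -C "$MOUNTPOINT/linux" -j 400
}

# Second terminal: suspend the guest, kill the readers, resume the guest.
terminal_two() {
    run virsh suspend "$GUEST"
    run killall -9 cc
    run virsh resume "$GUEST"
}

# Select which half to run; with no argument nothing is executed.
case "${1:-}" in
    one) terminal_one ;;
    two) terminal_two ;;
esac
```

Saved under a hypothetical name such as repro.sh, "sh repro.sh one" and
"sh repro.sh two" print the two command sequences; setting DRY_RUN=0 runs
them for real.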


I happened to have an FSP IPMI SOL console (ipmi sol activate) open:

(Large amounts of tg3 0005:09:00.0 enP5p9s0f0 output removed; apologies if
I missed backtraces from specific CPUs.)


root@p87:~# Watchdog CPU:160 detected Hard LOCKUP other CPUS:40
watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [ksmd:1695]
watchdog: BUG: soft lockup - CPU#160 stuck for 22s! [qemu-system-ppc:13057]
INFO: rcu_sched self-detected stall on CPU
        64-...: (2600 ticks this GP) idle=8b2/140000000000001/0 
softirq=40596/40596 fqs=1148 
         (t=2601 jiffies g=18364 c=18363 q=1057)
INFO: rcu_sched detected stalls on CPUs/tasks:
Watchdog CPU:24 detected Hard LOCKUP other CPUS:160
Watchdog CPU:160 Hard LOCKUP
        40-...: (1 GPs behind) idle=4a2/140000000000000/0 softirq=40283/40285 
fqs=1149 
        64-...: (2601 ticks this GP) idle=8b2/140000000000002/0 
softirq=40596/40596 fqs=1149 
Watchdog CPU:8 Hard LOCKUP

Watchdog CPU:8 became unstuck
Watchdog CPU:56 detected Hard LOCKUP other CPUS:112
Watchdog CPU:24 Hard LOCKUP
Watchdog CPU:24 became unstuck
rcu_sched kthread starved for 1074 jiffies! g18364 c18363 f0x0 
RCU_GP_DOING_FQS(4) ->state=0x0
Watchdog CPU:112 became unstuck
Watchdog CPU:8 detected Hard LOCKUP other CPUS:64
Watchdog CPU:64 Hard LOCKUP
watchdog: BUG: soft lockup - CPU#64 stuck for 22s! [ksmd:1695]
watchdog: BUG: soft lockup - CPU#144 stuck for 22s! [qemu-system-ppc:21857]
INFO: rcu_sched self-detected stall on CPU
        64-...: (9303 ticks this GP) idle=8b2/140000000000001/0 
softirq=40596/40596 fqs=3608
         (t=10406 jiffies g=18364 c=18363 q=5303)
INFO: rcu_sched detected stalls on CPUs/tasks:
        40-...: (1 GPs behind) idle=4a2/140000000000000/0 softirq=40283/40285 
fqs=3608 
        64-...: (9303 ticks this GP) idle=8b2/140000000000001/0 
softirq=40596/40596 fqs=3608 
        (detected by 8, t=11509 jiffies, g=18364, c=18363, q=5786)
Watchdog CPU:24 detected Hard LOCKUP other CPUS:8
rcu_sched kthread starved for 2206 jiffies! g18364 c18363 f0x0 
RCU_GP_WAIT_FQS(3) ->state=0x1
Watchdog CPU:8 became unstuck
sd 0:2:1:0: [sdb] tag#13 Resetting device
watchdog: BUG: soft lockup - CPU#64 stuck for 24s! [ksmd:1695]
ipr 0001:08:00.0: Timed out waiting for aborted commands
sd 0:2:3:0: [sdd] tag#5 Resetting device
INFO: rcu_sched self-detected stall on CPU
        64-...: (16006 ticks this GP) idle=8b2/140000000000001/0 
softirq=40596/40596 fqs=6187 
         (t=18211 jiffies g=18364 c=18363 q=9258)
INFO: rcu_sched detected stalls on CPUs/tasks:
ipr 0001:08:00.0: Timed out waiting for aborted commands
sd 0:2:5:0: [sdf] tag#11 Resetting device
tg3 0005:09:00.0 enP5p9s0f0: transmit timed out, resetting
tg3 0005:09:00.0 enP5p9s0f0: 0x00000000: 0x165714e4, 0x00100546, 0x02000001, 
0x00800000
tg3 0005:09:00.0 enP5p9s0f0: 0x00000010: 0x0000000c, 0x00002501, 0x0001000c, 
0x00002501



Sending NMI from CPU 152 to CPUs 64:
Watchdog CPU:32 Hard LOCKUP
Modules linked in: fuse vhost_net vhost tap iptable_mangle ipt_REJECT 
nf_reject_ipv4 xt_tcpudp tun ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user 
xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack overlay 
bridge stp llc binfmt_misc kvm_hv kvm vmx_crypto powernv_op_panel powernv_rng 
rng_core leds_powernv led_class autofs4 xfs btrfs lzo_compress raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c multipath mlx4_en raid10 crc32c_vpmsum lpfc be2net crc_t10dif 
crct10dif_generic crct10dif_common mlx4_core
irq event stamp: 3244340
hardirqs last  enabled at (3244339): [<c000000000b23120>] 
_raw_spin_unlock_irqrestore+0x50/0xd0
hardirqs last disabled at (3244340): [<c000000000b1a6c8>] 
__schedule+0x128/0x1050
softirqs last  enabled at (3148130): [<c0000000000ff518>] 
__do_softirq+0x4e8/0x710
softirqs last disabled at (3148101): [<c0000000000ffc38>] irq_exit+0x108/0x150
CPU: 32 PID: 9 Comm: rcu_sched Tainted: G        W    L  4.13.0 #1
task: c000001ff2411700 task.stack: c000001ff2498000
NIP: c0000000001bdae0 LR: c000000000154aa0 CTR: c000000000924a40
REGS: c00000003fe7fd80 TRAP: 0900   Tainted: G        W    L   (4.13.0)
MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
  CR: 28002242  XER: 20000000
CFAR: c000000000154a9c SOFTE: 0 
GPR00: c000000000154ce8 c000001ff249b510 c000000001042100 c000001fffb26a70 
GPR04: c000001fee4464a8 c000001fffb21d60 0000000292308e16 c000000001116680 
GPR08: 0000000000000000 c000000001116680 c000001ff2498000 00000000000047bc 
GPR12: c000000000924a40 c00000000fd8a000 00000000000047bc 0000000000000000 
GPR16: 0000000000000000 0000000000000003 0000000000000000 0000000000000001 
GPR20: c000001fffb26000 c000001ff2411700 000000000000ba7e c000000000f8eb20 
GPR24: 000000000000ba7e 0000000000000001 0000000000000000 c000001fffb260a0 
GPR28: c000001fffb26000 0000000000000001 c000001ff2411800 c000001fffb260a0 
NIP [c0000000001bdae0] hrtimer_active+0x0/0x90
LR [c000000000154aa0] task_tick_fair+0x350/0x6d0
Call Trace:
[c000001ff249b510] [c000000000154ce8] task_tick_fair+0x598/0x6d0 (unreliable)
[c000001ff249b5f0] [c000000000140c4c] scheduler_tick+0xac/0x1b0
[c000001ff249b650] [c0000000001bd60c] update_process_times+0x5c/0x90
[c000001ff249b680] [c0000000001d732c] tick_sched_handle.isra.5+0x2c/0xc0
[c000001ff249b6b0] [c0000000001d7418] tick_sched_timer+0x58/0xd0
[c000001ff249b6f0] [c0000000001be674] __hrtimer_run_queues+0x154/0x790
[c000001ff249b780] [c0000000001bf9c0] hrtimer_interrupt+0xe0/0x330
[c000001ff249b850] [c000000000026e50] __timer_interrupt+0xb0/0x520
[c000001ff249b8b0] [c0000000000277b0] timer_interrupt+0x90/0xe0
[c000001ff249b8e0] [c000000000009280] decrementer_common+0x160/0x170
--- interrupt: 901 at .L142+0x0/0x4
    LR = arch_local_irq_restore.part.5+0xa8/0xc0
[c000001ff249bbd0] [0000000000000001] 0x1 (unreliable)
[c000001ff249bbf0] [c000000000b2312c] _raw_spin_unlock_irqrestore+0x5c/0xd0
[c000001ff249bc20] [c0000000001afb8c] force_qs_rnp+0x21c/0x240
[c000001ff249bca0] [c0000000001b0488] rcu_gp_kthread+0x8d8/0x1bc0
[c000001ff249bdc0] [c00000000012ca24] kthread+0x1b4/0x1c0
[c000001ff249be30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
3920ffff 79290040 7d234b78 4e800020 3c4c00e8 38424640 3d22ff18 f8830040 
3929cbb0 f9230028 4e800020 60420000 <e9430030> e92a0000 81090038 710a0001 
Watchdog CPU:32 became unstuck
NMI backtrace for cpu 64
CPU: 64 PID: 1695 Comm: ksmd Tainted: G        W    L  4.13.0 #1
task: c000001feaf8ba00 task.stack: c000001fe9010000
NIP: c0000000001deb08 LR: c0000000001deac4 CTR: c00000000008ebd0
REGS: c000001fe90134b0 TRAP: 0501   Tainted: G        W    L   (4.13.0)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
  CR: 44444424  XER: 00000000
CFAR: c0000000001deb10 SOFTE: 1 
GPR00: c0000000001dea98 c000001fe9013730 c000000001042100 0000000000000028 
GPR04: 0000000000000028 0000000000000028 0000000000000000 0000000000000004 
GPR08: c000000001083bf8 0000000000000001 c000001fffd29d98 c000003fff727080 
GPR12: c00000000008ebd0 c00000000fd94000 
NIP [c0000000001deb08] smp_call_function_many+0x398/0x460
LR [c0000000001deac4] smp_call_function_many+0x354/0x460
Call Trace:
[c000001fe9013730] [c0000000001dea98] smp_call_function_many+0x328/0x460 
(unreliable)
[c000001fe90137a0] [c0000000001dec1c] smp_call_function+0x4c/0x70
[c000001fe90137d0] [c0000000000711c4] pmdp_invalidate+0x74/0xb0
[c000001fe9013800] [c0000000003717f0] __split_huge_pmd+0x6f0/0xcc0
[c000001fe90138c0] [c000000000329b24] try_to_unmap_one+0x6d4/0x830
[c000001fe90139a0] [c0000000003282f4] rmap_walk_anon+0x164/0x3b0
[c000001fe9013a10] [c00000000032b444] try_to_unmap+0xa4/0x160
[c000001fe9013a70] [c000000000373a4c] split_huge_page_to_list+0x18c/0xbb0
[c000001fe9013b30] [c000000000352b2c] try_to_merge_one_page+0x2ac/0xa70
[c000001fe9013c40] [c00000000035335c] try_to_merge_with_ksm_page+0x6c/0xf0
[c000001fe9013c90] [c000000000354a70] ksm_scan_thread+0x9c0/0x1af0
[c000001fe9013dc0] [c00000000012ca24] kthread+0x1b4/0x1c0
[c000001fe9013e30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
3d020004 39081af8 78691f24 e95e0000 7d28482a 7d4a4a14 812a0018 71290001 
4182001c 60420000 7c210b78 7c421378 <812a0018> 71290001 4082fff0 7c2004ac 
rcu_sched kthread starved for 1093 jiffies! g18364 c18363 f0x0 
RCU_GP_DOING_FQS(4) ->state=0x0
rcu_sched       S 8960     9      2 0x00000800
Call Trace:
[c000001ff249b850] [c000001ff249b930] 0xc000001ff249b930 (unreliable)
[c000001ff249ba20] [c00000000001f258] __switch_to+0x278/0x4a0
[c000001ff249ba80] [c000000000b1a97c] __schedule+0x3dc/0x1050
[c000001ff249bb60] [c000000000b1b63c] schedule+0x4c/0xe0
[c000001ff249bb90] [c000000000b219a8] schedule_timeout+0xa8/0x5e0
[c000001ff249bca0] [c0000000001b06c8] rcu_gp_kthread+0xb18/0x1bc0
[c000001ff249bdc0] [c00000000012ca24] kthread+0x1b4/0x1c0
[c000001ff249be30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70
Watchdog CPU:152 became unstuck


...


tg3 0005:09:00.0 enP5p9s0f0: 3: NAPI info 
[0000003e:0000003e:(0000:0000:01ff):0ab0:(02b0:02b0:0000:0000)]
tg3 0005:09:00.0 enP5p9s0f0: 4: Host status block 
[00000001:000000b1:(0000:0000:0528):(0000:0000)]
tg3 0005:09:00.0 enP5p9s0f0: 4: NAPI info 
[000000a5:000000a5:(0000:0000:01ff):051c:(051c:051c:0000:0000)]
systemd[1]: systemd-journald.service: Processes still around after SIGKILL. 
Ignoring.
INFO: rcu_sched self-detected stall on CPU
        64-...: (62926 ticks this GP) idle=8b2/140000000000001/0 
softirq=40596/40596 fqs=22240 
         (t=72846 jiffies g=18364 c=18363 q=44508)
Sending NMI from CPU 64 to CPUs 40:
INFO: rcu_sched detected stalls on CPUs/tasks:
systemd[1]: systemd-udevd.service: Processes still around after SIGKILL. 
Ignoring.
Watchdog CPU:56 Hard LOCKUP
Modules linked in: fuse vhost_net vhost tap iptable_mangle ipt_REJECT 
nf_reject_ipv4 xt_tcpudp tun ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user 
xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack overlay 
bridge stp llc binfmt_misc kvm_hv kvm vmx_crypto powernv_op_panel powernv_rng 
rng_core leds_powernv led_class autofs4 xfs btrfs lzo_compress raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c multipath mlx4_en raid10 crc32c_vpmsum lpfc be2net crc_t10dif 
crct10dif_generic crct10dif_common mlx4_core
irq event stamp: 24447578
hardirqs last  enabled at (24447578): [<c000000000b23120>] 
_raw_spin_unlock_irqrestore+0x50/0xd0
hardirqs last disabled at (24447577): [<c000000000b22e64>] 
_raw_spin_lock_irqsave+0x34/0xa0
softirqs last  enabled at (24442107): [<c0000000000ff518>] 
__do_softirq+0x4e8/0x710
softirqs last disabled at (24442100): [<c0000000000ffc38>] irq_exit+0x108/0x150
CPU: 56 PID: 13007 Comm: qemu-system-ppc Tainted: G        W    L  4.13.0 #1
task: c000003efc71a280 task.stack: c000003efc7a4000
NIP: c0000000001831f4 LR: c000000000b22ea4 CTR: c0000000007d25a0
REGS: c00000003fd5fd80 TRAP: 0900   Tainted: G        W    L   (4.13.0)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
  CR: 28022224  XER: 20000000
CFAR: c00000000018320c SOFTE: 0 
GPR00: c000000000b22e98 c000003efc7a70d0 c000000001042100 c000000000ef6b80 
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 
GPR08: 0000000000000000 0000000080000040 0000000080000038 9000000000001003 
GPR12: 0000000000002200 c00000000fd91800 0000000000000000 c000000000d1b8c0 
GPR16: c000000000d1b8f0 c000000000d1b928 c000000000ef6f80 c000000000ef5f80 
GPR20: c000000000ced068 c000000000d1b890 c000000000ebee48 c000000000f35f80 
GPR24: c000000001087d50 c000000001088210 c000000001088208 0000003ffe650000 
GPR28: c000000000f38080 c000000000ed6b80 0000000000000000 c000000000ef6b80 
NIP [c0000000001831f4] do_raw_spin_lock+0x94/0x210
LR [c000000000b22ea4] _raw_spin_lock_irqsave+0x74/0xa0
Call Trace:
[c000003efc7a70d0] [c000000000f38080] rcu_struct_flavors+0x0/0x10 (unreliable)
[c000003efc7a7100] [c000000000b22e98] _raw_spin_lock_irqsave+0x68/0xa0
[c000003efc7a7140] [c0000000001b2f38] rcu_check_callbacks+0xa58/0xd30
[c000003efc7a7280] [c0000000001bd5ec] update_process_times+0x3c/0x90
[c000003efc7a72b0] [c0000000001d732c] tick_sched_handle.isra.5+0x2c/0xc0
[c000003efc7a72e0] [c0000000001d7418] tick_sched_timer+0x58/0xd0
[c000003efc7a7320] [c0000000001be674] __hrtimer_run_queues+0x154/0x790
[c000003efc7a73b0] [c0000000001bf9c0] hrtimer_interrupt+0xe0/0x330
[c000003efc7a7480] [c000000000026e50] __timer_interrupt+0xb0/0x520
[c000003efc7a74e0] [c0000000000277b0] timer_interrupt+0x90/0xe0
[c000003efc7a7510] [c000000000009280] decrementer_common+0x160/0x170
--- interrupt: 901 at .L142+0x0/0x4
    LR = arch_local_irq_restore.part.5+0xa8/0xc0
[c000003efc7a7800] [0000000000000038] 0x38 (unreliable)
[c000003efc7a7820] [d0000000182980e0] kvmppc_run_core+0x1028/0x2140 [kvm_hv]
[c000003efc7a79d0] [d00000001829a028] kvmppc_vcpu_run_hv+0x3a0/0x1ea0 [kvm_hv]
[c000003efc7a7b10] [d000000017fa62d4] kvmppc_vcpu_run+0x2c/0x48 [kvm]
[c000003efc7a7b30] [d000000017fa2a30] kvm_arch_vcpu_ioctl_run+0x108/0x320 [kvm]
[c000003efc7a7bd0] [d000000017f953bc] kvm_vcpu_ioctl+0x414/0x8f8 [kvm]
[c000003efc7a7d40] [c0000000003b922c] do_vfs_ioctl+0xcc/0xa80
[c000003efc7a7de0] [c0000000003b9c40] SyS_ioctl+0x60/0x100
[c000003efc7a7e30] [c00000000000b96c] system_call+0x58/0x6c
Instruction dump:
40c2fff0 7c2004ac 2fa90000 409e0020 a12d0008 913f0008 e92d0250 f93f0010 
38210030 ebe1fff8 4e800020 7c210b78 <e92d0000> 89290009 71290002 408200f0 
NMI backtrace for cpu 64
        40-...: (1 GPs behind) idle=4a2/140000000000000/0 softirq=40283/40285 
fqs=22241 

Watchdog CPU:0 Hard LOCKUP
Modules linked in: fuse vhost_net vhost tap iptable_mangle ipt_REJECT 
nf_reject_ipv4 xt_tcpudp tun ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user 
xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack overlay 
bridge stp llc binfmt_misc kvm_hv kvm vmx_crypto powernv_op_panel powernv_rng 
rng_core leds_powernv led_class autofs4 xfs btrfs lzo_compress raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c multipath mlx4_en raid10 crc32c_vpmsum lpfc be2net crc_t10dif 
crct10dif_generic crct10dif_common mlx4_core
irq event stamp: 3244340
hardirqs last  enabled at (3244339): [<c000000000b23120>] 
_raw_spin_unlock_irqrestore+0x50/0xd0
hardirqs last disabled at (3244340): [<c000000000b1a6c8>] 
__schedule+0x128/0x1050
softirqs last  enabled at (3148130): [<c0000000000ff518>] 
__do_softirq+0x4e8/0x710
softirqs last disabled at (3148101): [<c0000000000ffc38>] irq_exit+0x108/0x150
CPU: 0 PID: 9 Comm: rcu_sched Tainted: G        W    L  4.13.0 #1
task: c000001ff2411700 task.stack: c000001ff2498000
NIP: c00000000014f680 LR: c000000000140c58 CTR: c000000000924a40
REGS: c00000003ffffd80 TRAP: 0900   Tainted: G        W    L   (4.13.0)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
  CR: 28000242  XER: 20000000
CFAR: c000000000155348 SOFTE: 0 
GPR00: c000000000140c58 c000001ff249b5f0 c000000001042100 c000001fff326000 
GPR04: 0000000000000400 0000000000000001 00000002a5d78cbc c000000001116680 
GPR08: c000000001083bf8 000000000002d348 0000001ffe450000 00000000000047bc 
GPR12: 0000000000000000 c00000000fd80000 00000000000047bc 0000000000000000 
GPR16: 0000000000000000 0000000000000003 0000000000000004 0000000000000000 
GPR20: c000000000d1b1a8 c000001fff311140 0000000000000001 00000848af56c7bb 
GPR24: c000001fff311098 0000000000000000 c000001ff2411700 c000001fff326018 
GPR28: c000000001083bf8 0000000000000000 c000000000ed6000 c000001fff326000 
NIP [c00000000014f680] cpu_load_update+0x10/0x200
LR [c000000000140c58] scheduler_tick+0xb8/0x1b0
Call Trace:
[c000001ff249b5f0] [c000000000140c58] scheduler_tick+0xb8/0x1b0 (unreliable)
[c000001ff249b650] [c0000000001bd60c] update_process_times+0x5c/0x90
[c000001ff249b680] [c0000000001d732c] tick_sched_handle.isra.5+0x2c/0xc0
[c000001ff249b6b0] [c0000000001d7418] tick_sched_timer+0x58/0xd0
[c000001ff249b6f0] [c0000000001be674] __hrtimer_run_queues+0x154/0x790
[c000001ff249b780] [c0000000001bf9c0] hrtimer_interrupt+0xe0/0x330
[c000001ff249b850] [c000000000026e50] __timer_interrupt+0xb0/0x520
[c000001ff249b8b0] [c0000000000277b0] timer_interrupt+0x90/0xe0
[c000001ff249b8e0] [c000000000009280] decrementer_common+0x160/0x170
--- interrupt: 901 at .L142+0x0/0x4
    LR = arch_local_irq_restore.part.5+0xa8/0xc0
[c000001ff249bbd0] [0000000000000001] 0x1 (unreliable)
[c000001ff249bbf0] [c000000000b2312c] _raw_spin_unlock_irqrestore+0x5c/0xd0
[c000001ff249bc20] [c0000000001afb8c] force_qs_rnp+0x21c/0x240
[c000001ff249bca0] [c0000000001b0488] rcu_gp_kthread+0x8d8/0x1bc0
[c000001ff249bdc0] [c00000000012ca24] kthread+0x1b4/0x1c0
[c000001ff249be30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
7ce041ad 40c2fff4 4e800020 60420000 3c4c00ef 38422aa0 4bff77b0 60420000 
3c4c00ef 38422a90 7c0802a6 e9230090 <7c6c1b78> fb41ffd0 fb61ffd8 fb81ffe0 
tg3 0005:09:00.0 enP5p9s0f0: transmit timed out, resetting
NMI backtrace for cpu 64
CPU: 64 PID: 1695 Comm: ksmd Tainted: G        W    L  4.13.0 #1
task: c000001feaf8ba00 task.stack: c000001fe9010000
NIP: c0000000001deb08 LR: c0000000001deac4 CTR: c00000000008ebd0
REGS: c000001fe90134b0 TRAP: 0501   Tainted: G        W    L   (4.13.0)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
  CR: 44444424  XER: 00000000
CFAR: c0000000001deb10 SOFTE: 1 
GPR00: c0000000001dea98 c000001fe9013730 c000000001042100 0000000000000028 
GPR04: 0000000000000028 0000000000000028 0000000000000000 0000000000000004 
GPR08: c000000001083bf8 0000000000000001 c000001fffd29d98 c000003fff727080 
GPR12: c00000000008ebd0 c00000000fd94000 
NIP [c0000000001deb08] smp_call_function_many+0x398/0x460
LR [c0000000001deac4] smp_call_function_many+0x354/0x460
Call Trace:
[c000001fe9013730] [c0000000001dea98] smp_call_function_many+0x328/0x460 
(unreliable)
[c000001fe90137a0] [c0000000001dec1c] smp_call_function+0x4c/0x70
[c000001fe90137d0] [c0000000000711c4] pmdp_invalidate+0x74/0xb0
[c000001fe9013800] [c0000000003717f0] __split_huge_pmd+0x6f0/0xcc0
[c000001fe90138c0] [c000000000329b24] try_to_unmap_one+0x6d4/0x830
[c000001fe90139a0] [c0000000003282f4] rmap_walk_anon+0x164/0x3b0
[c000001fe9013a10] [c00000000032b444] try_to_unmap+0xa4/0x160
[c000001fe9013a70] [c000000000373a4c] split_huge_page_to_list+0x18c/0xbb0
[c000001fe9013b30] [c000000000352b2c] try_to_merge_one_page+0x2ac/0xa70
[c000001fe9013c40] [c00000000035335c] try_to_merge_with_ksm_page+0x6c/0xf0
[c000001fe9013c90] [c000000000354a70] ksm_scan_thread+0x9c0/0x1af0
[c000001fe9013dc0] [c00000000012ca24] kthread+0x1b4/0x1c0
[c000001fe9013e30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
3d020004 39081af8 78691f24 e95e0000 7d28482a 7d4a4a14 812a0018 71290001 
4182001c 60420000 7c210b78 7c421378 <812a0018> 71290001 4082fff0 7c2004ac 
rcu_sched kthread starved for 1095 jiffies! g18364 c18363 f0x0 
RCU_GP_DOING_FQS(4) ->state=0x0
rcu_sched       R  running task     8960     9      2 0x00000804
Call Trace:
Watchdog CPU:16 became unstuck
systemd[1]: systemd-logind.service: Processes still around after SIGKILL. 
Ignoring.
tg3 0005:09:00.0 enP5p9s0f0: 0x00000000: 0x165714e4, 0x00100546, 0x02000001, 
0x00800000
tg3 0005:09:00.0 enP5p9s0f0: 0x00000010: 0x0000000c, 0x00002501, 0x0001000c, 
0x00002501
tg3 0005:09:00.0 enP5p9s0f0: 0x00000020: 0x0002000c, 0x00002501, 0x00000000, 
0x04201014
tg3 0005:09:00.0 enP5p9s0f0: 0x00000030: 0x00000000, 0x00000048, 0x00000000, 
0x00000100
tg3 0005:09:00.0 enP5p9s0f0: 0x00000040: 0x00000000, 0xc5000000, 0xc8035001, 
0x64002008



.....

tg3 0005:09:00.0 enP5p9s0f0: 3: Host status block 
[00000001:00000050:(0000:0000:0000):(0000:0000)]
tg3 0005:09:00.0 enP5p9s0f0: 3: NAPI info 
[0000003e:0000003e:(0000:0000:01ff):0ab0:(02b0:02b0:0000:0000)]
tg3 0005:09:00.0 enP5p9s0f0: 4: Host status block 
[00000001:000000b1:(0000:0000:0528):(0000:0000)]
tg3 0005:09:00.0 enP5p9s0f0: 4: NAPI info 
[000000a5:000000a5:(0000:0000:01ff):051c:(051c:051c:0000:0000)]
tg3 0005:09:00.0 enP5p9s0f0: transmit timed out, resetting
watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [ksmd:1695]
Modules linked in: fuse vhost_net vhost tap iptable_mangle ipt_REJECT 
nf_reject_ipv4 xt_tcpudp tun ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user 
xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack overlay 
bridge stp llc binfmt_misc kvm_hv kvm vmx_crypto powernv_op_panel powernv_rng 
rng_core leds_powernv led_class autofs4 xfs btrfs lzo_compress raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c multipath mlx4_en raid10 crc32c_vpmsum lpfc be2net crc_t10dif 
crct10dif_generic crct10dif_common mlx4_core
irq event stamp: 822355
hardirqs last  enabled at (822355): [<c00000000000bfc8>] 
fast_exc_return_irq+0x28/0x34
hardirqs last disabled at (822354): [<c000000000009268>] 
decrementer_common+0x148/0x170
softirqs last  enabled at (779544): [<c0000000000ff518>] 
__do_softirq+0x4e8/0x710
softirqs last disabled at (779529): [<c0000000000ffc38>] irq_exit+0x108/0x150
CPU: 64 PID: 1695 Comm: ksmd Tainted: G        W    L  4.13.0 #1
task: c000001feaf8ba00 task.stack: c000001fe9010000
NIP: c0000000001deb08 LR: c0000000001deac4 CTR: c00000000008ebd0
REGS: c000001fe90134b0 TRAP: 0901   Tainted: G        W    L   (4.13.0)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
  CR: 44444424  XER: 00000000
CFAR: c0000000001deb10 SOFTE: 1 
GPR00: c0000000001dea98 c000001fe9013730 c000000001042100 0000000000000028 
GPR04: 0000000000000028 0000000000000028 0000000000000000 0000000000000004 
GPR08: c000000001083bf8 0000000000000001 c000001fffd29d98 c000003fff727080 
GPR12: c00000000008ebd0 c00000000fd94000 
NIP [c0000000001deb08] smp_call_function_many+0x398/0x460
LR [c0000000001deac4] smp_call_function_many+0x354/0x460
Call Trace:
[c000001fe9013730] [c0000000001dea98] smp_call_function_many+0x328/0x460 
(unreliable)
[c000001fe90137a0] [c0000000001dec1c] smp_call_function+0x4c/0x70
[c000001fe90137d0] [c0000000000711c4] pmdp_invalidate+0x74/0xb0
[c000001fe9013800] [c0000000003717f0] __split_huge_pmd+0x6f0/0xcc0
[c000001fe90138c0] [c000000000329b24] try_to_unmap_one+0x6d4/0x830
[c000001fe90139a0] [c0000000003282f4] rmap_walk_anon+0x164/0x3b0
[c000001fe9013a10] [c00000000032b444] try_to_unmap+0xa4/0x160
[c000001fe9013a70] [c000000000373a4c] split_huge_page_to_list+0x18c/0xbb0
[c000001fe9013b30] [c000000000352b2c] try_to_merge_one_page+0x2ac/0xa70
[c000001fe9013c40] [c00000000035335c] try_to_merge_with_ksm_page+0x6c/0xf0
[c000001fe9013c90] [c000000000354a70] ksm_scan_thread+0x9c0/0x1af0
[c000001fe9013dc0] [c00000000012ca24] kthread+0x1b4/0x1c0
[c000001fe9013e30] [c00000000000bcec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
3d020004 39081af8 78691f24 e95e0000 7d28482a 7d4a4a14 812a0018 71290001 
4182001c 60420000 7c210b78 7c421378 <812a0018> 71290001 4082fff0 7c2004ac 
tg3 0005:09:00.0 enP5p9s0f0: 0x00000000: 0x165714e4, 0x00100546, 0x02000001, 
0x00800000
tg3 0005:09:00.0 enP5p9s0f0: 0x00000010: 0x0000000c, 0x00002501, 0x0001000c, 
0x00002501
tg3 0005:09:00.0 enP5p9s0f0: 0x00000020: 0x0002000c, 0x00002501, 0x00000000, 
0x04201014


** Changed in: linux (Ubuntu Xenial)
       Status: Incomplete => Confirmed

** Tags added: kernel-bug-exists-upstream

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1714859

Title:
  oops in 4.4.0-62-generic (ppc64le)

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Xenial:
  Confirmed

Bug description:
  Sep  1 14:23:30 p87 kernel: [17274563.423972] device vnet2 entered 
promiscuous mode
  Sep  1 14:23:30 p87 kernel: [17274563.436101] br0: port 4(vnet2) entered 
forwarding state
  Sep  1 14:23:30 p87 kernel: [17274563.436113] br0: port 4(vnet2) entered 
forwarding state
  Sep  1 14:23:31 p87 kernel: [17274564.005034] audit: type=1400 
audit(1504239811.140:793): apparmor="STATUS" operation="profile_replace" 
profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5" 
pid=18952 comm="apparmor_parser"
  Sep  1 14:23:31 p87 kernel: [17274564.019911] audit: type=1400 
audit(1504239811.156:794): apparmor="STATUS" operation="profile_replace" 
profile="unconfined" 
name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5//qemu_bridge_helper" 
pid=18952 comm="apparmor_parser"
  Sep  1 14:23:31 p87 kernel: [17274564.324291] KVM guest htab at 
c0000079a1000000 (order 29), LPID 3
  Sep  1 14:23:45 p87 kernel: [17274578.483572] br0: port 4(vnet2) entered 
forwarding state
  Sep  1 14:26:53 p87 kernel: [17274766.572284] br0: port 4(vnet2) entered 
disabled state
  Sep  1 14:26:53 p87 kernel: [17274766.579930] device vnet2 left promiscuous 
mode
  Sep  1 14:26:53 p87 kernel: [17274766.579932] br0: port 4(vnet2) entered 
disabled state
  Sep  1 14:26:54 p87 kernel: [17274767.303109] audit: type=1400 
audit(1504240014.441:795): apparmor="STATUS" operation="profile_remove" 
profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5" 
pid=22668 comm="apparmor_parser"
  Sep  1 14:26:56 p87 kernel: [17274768.917394] audit: type=1400 
audit(1504240016.053:796): apparmor="STATUS" operation="profile_load" 
profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5" 
pid=22672 comm="apparmor_parser"
  Sep  1 14:26:56 p87 kernel: [17274768.917673] audit: type=1400 
audit(1504240016.053:797): apparmor="STATUS" operation="profile_load" 
profile="unconfined" 
name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5//qemu_bridge_helper" 
pid=22672 comm="apparmor_parser"
  Sep  1 14:26:56 p87 kernel: [17274768.982783] device vnet2 entered 
promiscuous mode
  Sep  1 14:26:56 p87 kernel: [17274768.994891] br0: port 4(vnet2) entered 
forwarding state
  Sep  1 14:26:56 p87 kernel: [17274768.994902] br0: port 4(vnet2) entered 
forwarding state
  Sep  1 14:26:56 p87 kernel: [17274769.535982] audit: type=1400 
audit(1504240016.673:798): apparmor="STATUS" operation="profile_replace" 
profile="unconfined" name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5" 
pid=22702 comm="apparmor_parser"
  Sep  1 14:26:56 p87 kernel: [17274769.546719] audit: type=1400 
audit(1504240016.685:799): apparmor="STATUS" operation="profile_replace" 
profile="unconfined" 
name="libvirt-435e8619-e460-4c31-98b9-01dc8eab0cc5//qemu_bridge_helper" 
pid=22702 comm="apparmor_parser"
  Sep  1 14:26:57 p87 kernel: [17274769.861666] KVM guest htab at 
c0000079a1000000 (order 29), LPID 3
  Sep  1 14:27:11 p87 kernel: [17274784.050382] br0: port 4(vnet2) entered 
forwarding state
  Sep  1 14:28:30 p87 kernel: [17274863.605174] Unable to handle kernel paging 
request for data at address 0x00000008
  Sep  1 14:28:30 p87 kernel: [17274863.605188] Faulting instruction address: 
0xc00000000044edcc
  Sep  1 14:28:30 p87 kernel: [17274863.605195] Oops: Kernel access of bad 
area, sig: 11 [#1]
  Sep  1 14:28:30 p87 kernel: [17274863.605199] SMP NR_CPUS=2048 NUMA PowerNV
  Sep  1 14:28:30 p87 kernel: [17274863.605206] Modules linked in: tcp_diag 
inet_diag ebtable_filter ebtables binfmt_misc veth xfrm_user xfrm_algo 
vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_REJECT 
nf_reject_ipv4 xt_tcpudp ip6table_filter ip6_tables ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack 
br_netfilter overlay bridge stp llc kvm_hv uio_pdrv_genirq powernv_rng 
ibmpowernv vmx_crypto leds_powernv uio ipmi_powernv ipmi_msghandler kvm_pr kvm 
autofs4 xfs btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq libcrc32c raid0 multipath linear raid10 raid1 ses 
enclosure mlx4_en be2net lpfc mlx4_core vxlan ip6_udp_tunnel scsi_transport_fc 
udp_tunnel ipr [last unloaded: ebtables]
  Sep  1 14:28:30 p87 kernel: [17274863.605294] CPU: 24 PID: 19018 Comm: sshfs 
Not tainted 4.4.0-62-generic #83-Ubuntu
  Sep  1 14:28:30 p87 kernel: [17274863.605300] task: c000001fc9c7c8e0 ti: 
c000001fe4c10000 task.ti: c000001fe4c10000
  Sep  1 14:28:30 p87 kernel: [17274863.605306] NIP: c00000000044edcc LR: 
c00000000044ede0 CTR: c000000000108b20
  Sep  1 14:28:30 p87 kernel: [17274863.605311] REGS: c000001fe4c139a0 TRAP: 
0300   Not tainted  (4.4.0-62-generic)
  Sep  1 14:28:30 p87 kernel: [17274863.605315] MSR: 9000000000009033 
<SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24000424  XER: 00000000
  Sep  1 14:28:30 p87 kernel: [17274863.605330] CFAR: c000000000008468 DAR: 
0000000000000008 DSISR: 42000000 SOFTE: 1
  Sep  1 14:28:30 p87 kernel: [17274863.605330] GPR00: c00000000044ede0 
c000001fe4c13c20 c0000000015f7600 c000001fda681000
  Sep  1 14:28:30 p87 kernel: [17274863.605330] GPR04: c00000006377fa00 
0000000000000001 0000000000000000 0000000000000000
  Sep  1 14:28:30 p87 kernel: [17274863.605330] GPR08: 0000000000000000 
c00000006377fa00 c000001fe4c13c90 0000000000000c01
  Sep  1 14:28:30 p87 kernel: [17274863.605330] GPR12: c000000000108b20 
c00000000fb4e400 0000000000000000 0000000000000000
  Sep  1 14:28:30 p87 kernel: [17274863.605330] GPR16: 0000000000000000 
0000000000000000 c00000003e213560 0000000000000000
  Sep  1 14:28:30 p87 kernel: [17274863.605330] GPR20: c000001fe4c13c90 
c000001fda6815b0 c000001fe4c13ca0 c000001fda681000
  Sep  1 14:28:30 p87 kernel: [17274863.605330] GPR24: ffffffffffffff99 
0000000000000400 0000000000000000 c000001fda681000
  Sep  1 14:28:30 p87 kernel: [17274863.605330] GPR28: ffffffffffffff99 
0000000000000080 0000000000000100 c000001fe4c13c90
  Sep  1 14:28:30 p87 kernel: [17274863.605393] NIP [c00000000044edcc] 
end_requests+0x8c/0xd0
  Sep  1 14:28:30 p87 kernel: [17274863.605398] LR [c00000000044ede0] 
end_requests+0xa0/0xd0
  Sep  1 14:28:30 p87 kernel: [17274863.605401] Call Trace:
  Sep  1 14:28:30 p87 kernel: [17274863.605405] [c000001fe4c13c20] 
[c00000000044ede0] end_requests+0xa0/0xd0 (unreliable)
  Sep  1 14:28:30 p87 kernel: [17274863.605412] [c000001fe4c13c70] 
[c00000000044fb90] fuse_abort_conn+0x3a0/0x490
  Sep  1 14:28:30 p87 kernel: [17274863.605418] [c000001fe4c13d20] 
[c00000000044fd48] fuse_dev_release+0xc8/0xd0
  Sep  1 14:28:30 p87 kernel: [17274863.605426] [c000001fe4c13d50] 
[c0000000002e65c0] __fput+0xe0/0x310
  Sep  1 14:28:30 p87 kernel: [17274863.605433] [c000001fe4c13db0] 
[c0000000000e3e50] task_work_run+0xf0/0x130
  Sep  1 14:28:30 p87 kernel: [17274863.605440] [c000001fe4c13e00] 
[c0000000000178d4] do_notify_resume+0xc4/0xd0
  Sep  1 14:28:30 p87 kernel: [17274863.605446] [c000001fe4c13e30] 
[c000000000009838] ret_from_except_lite+0x64/0x68
  Sep  1 14:28:30 p87 kernel: [17274863.605451] Instruction dump:
  Sep  1 14:28:30 p87 kernel: [17274863.605455] 7d08e878 7d0051ad 40c2fff4 
60420000 7d0050a8 7d08f078 7d0051ad 40c2fff4
  Sep  1 14:28:30 p87 kernel: [17274863.605464] e9490008 e9090000 7d244b78 
7f63db78 <f9480008> f90a0000 f9290000 f9290008
  Sep  1 14:28:30 p87 kernel: [17274863.605479] ---[ end trace c395103c4016f0f2 
]---
  Sep  1 14:28:30 p87 kernel: [17274863.634036]

  
  make -j 800
  on an sshfs-mounted source directory (the source lived on a KVM guest running
  on the local machine). The guest (a BE guest) was largely stalled at the time
  and I ended up hard powering it off. The oops occurred at about the same time.

