Hi Colin,

That looks like progress, but it still takes time to reproduce,
so I will use your new approach to reproduce it.

While you are doing that, could you dump /proc/$(pidof irqbalance)/maps
so that we can see where the faulting addresses are in the process's
VM space?
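
Something along these lines might help with the lookup once the file is
saved. It is only a rough sketch: the helper name find_mapping and the
path /tmp/irqbalance.maps are my own placeholders, and it assumes the
usual "start-end perms ..." layout of a maps file with user-space
addresses below 2^63.

```bash
#!/bin/bash
# Sketch only: report which mapping in a saved maps file contains an address.
#
# Hypothetical usage (path and helper name are placeholders):
#   cp /proc/$(pidof irqbalance)/maps /tmp/irqbalance.maps
#   find_mapping /tmp/irqbalance.maps 0x002d6da4

# find_mapping MAPS_FILE ADDR
# Prints the maps line whose [start, end) range contains ADDR and returns 0,
# or prints "unmapped" and returns 1 when no range matches.
find_mapping() {
    local maps_file=$1
    local addr=$(( $2 ))            # accepts a 0x-prefixed hex address
    local range rest start end

    while read -r range rest; do
        start=$(( 16#${range%-*} )) # text before the '-' is the start
        end=$(( 16#${range#*-} ))   # text after the '-' is the end (exclusive)
        if (( addr >= start && addr < end )); then
            echo "$range $rest"
            return 0
        fi
    done < "$maps_file"

    echo "unmapped"
    return 1
}
```

If the faulting address comes back as "unmapped", that would at least tell
us it is not inside any region the process had mapped when the file was
captured.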

thanks,


On Sat, Jul 4, 2015 at 4:10 AM, Colin Ian King
<1469...@bugs.launchpad.net> wrote:
> Running the following:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu 
> crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork 
> futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf 
> longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap 
> msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit 
> seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice 
> stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp 
> udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr 
> zero zombie"
>
> for t in $tests
> do
>         echo $t
>         echo $t > /dev/kmsg
>         ./stress-ng --$t 0 -v -t 60
> done
>
> eventually tripped the translation fault in irqbalance.  I ran this
> after a clean reboot.
>
> [ 4901.799846] timerfd
> [ 4961.807050] tsearch
> [ 5021.884456] udp
> [ 5081.895058] udp-flood
> [ 5141.674365] irqbalance[827]: unhandled level 2 translation fault (11) at 0x002d6da4, esr 0x92000006
> [ 5141.674376] pgd = ffffffcfb51a0000
> [ 5141.715215] [002d6da4] *pgd=0000004fb677e003, *pud=0000004fb677e003, *pmd=0000000000000000
>
> [ 5141.816183] CPU: 0 PID: 827 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
> [ 5141.816185] Hardware name: HP ProLiant m400 Server Cartridge (DT)
> [ 5141.816188] task: ffffffcfac088000 ti: ffffffcfab710000 task.ti: ffffffcfab710000
> [ 5141.816206] PC is at 0x7f88287834
> [ 5141.816208] LR is at 0x7f882877f4
> [ 5141.816210] pc : [<0000007f88287834>] lr : [<0000007f882877f4>] pstate: 80000000
> [ 5141.816212] sp : 0000007ff2e46b30
> [ 5141.816214] x29: 0000007ff2e46b30 x28: 00000000004095a0
> [ 5141.816217] x27: 0000000000409548 x26: 000000000041a000
> [ 5141.816220] x25: 0000000000000001 x24: 0000000000000010
> [ 5141.816222] x23: 000000002d6c98a0 x22: 000000002d6c9880
> [ 5141.816225] x21: 0000000000000018 x20: 0000007f88323000
> [ 5141.816228] x19: 0000000000000002 x18: 0000000000000000
> [ 5141.816230] x17: 0000007f87f8d8ec x16: 0000007f883222e0
> [ 5141.816233] x15: 0000000000000020 x14: 0000000000000001
> [ 5141.816235] x13: 0000000000000000 x12: 0000000000000000
> [ 5141.816237] x11: 0000007ff2e446a0 x10: 0000000000000010
> [ 5141.816240] x9 : 00000000000000a0 x8 : 0000000000000007
> [ 5141.816242] x7 : 0000000000000033 x6 : 000000002d6c9c80
> [ 5141.816245] x5 : 0000000000000001 x4 : 0000007f87fa62a0
> [ 5141.816247] x3 : 000000002d6c9880 x2 : 0000000000000001
> [ 5141.816250] x1 : 00000000000003fa x0 : 00000000002d6d9c
>
> [ 5141.907792] urandom
> [ 5201.928712] utime
> [ 5261.934534] vecmath
> [ 5321.940302] vfork
> [ 5381.947904] vm
> [ 5441.991784] vm-rw
> [ 5502.017614] vm-splice
> [ 5562.023334] wcs
> [ 5622.037054] wait
> [ 5682.043302] yield
> [ 5742.056595] xattr
> [ 5802.075772] zero
> [ 5862.087396] zombie
>
> --
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https://bugs.launchpad.net/bugs/1469214
>
> Title:
>   HP ProLiant m400 Server crashes with unhandled level 3 translation
>   fault
>
> Status in linux package in Ubuntu:
>   Triaged
>
> Bug description:
>   Running stress-ng on an HP ProLiant m400 server can cause unhandled
>   level 3 translation faults:
>
>   use stress-ng from git://kernel.ubuntu.com/cking/stress-ng
>
>   ./stress-ng --seq 0 -t 60 -v
>
>   and after some time this trips the following:
>
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922560] systemd-timesyn[481]: unhandled level 3 translation fault (7) at 0x7fa8ea6008, esr 0x92000007
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922561] pgd = ffffffcfb563f000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922563] [7fa8ea6008] *pgd=0000004fb4f28003, *pud=0000004fb4f28003, *pmd=0000004fb4f38003, *pte=000000001d151c00
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922566]
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922569] CPU: 6 PID: 481 Comm: systemd-timesyn Not tainted 3.19.0-21-generic #21-Ubuntu
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922571] Hardware name: HP ProLiant m400 Server Cartridge (DT)
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922573] task: ffffffcfb4e3b100 ti: ffffffcfb4d2c000 task.ti: ffffffcfb4d2c000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922588] PC is at 0x7fa8d81824
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922589] LR is at 0x7fa8e3b3e4
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922590] pc : [<0000007fa8d81824>] lr : [<0000007fa8e3b3e4>] pstate: 80000000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922591] sp : 0000007ff120d660
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922592] x29: 0000007ff120d660 x28: 0000007fa8f1c000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922594] x27: 0000007fa8f32084 x26: 0000007fa8f32000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922595] x25: 0000007fa8f1d788 x24: 0000007fa8f1d888
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922597] x23: 0000000000000001 x22: 0000007fa8f1faa0
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922599] x21: 0000007ff120d7f0 x20: 0000007ff120d7d0
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922600] x19: 0000007fa8f31000 x18: 0000007fa8f1e000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922602] x17: 0000007fa8e3b3b8 x16: 0000007fa8ea6000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922603] x15: 003b9aca00000000 x14: 00219bbdd0000000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922605] x13: ffffffffaa751223 x12: 0000000000000000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922607] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922609] x9 : 37333c43484f5e46 x8 : 0000007ff120d818
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922610] x7 : 0000007ff120d8f0 x6 : 0000007ff120d828
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922612] x5 : ffffff80ffffffd0 x4 : 0000007ff120d8c0
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922613] x3 : 0000007ff120d7d0 x2 : 0000007fa8f1faa0
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922615] x1 : 0000000000000001 x0 : 0000000000000064
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922616]
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1469214/+subscriptions

