Hi Colin,

That looks like progress, but it still takes time to reproduce, so I will use your new approach to reproduce it.
When you do that, could you dump /proc/$(pidof irqbalance)/maps so that we can see where the faulting address lies in the process's VM space?

Thanks,

On Sat, Jul 4, 2015 at 4:10 AM, Colin Ian King <1469...@bugs.launchpad.net> wrote:
> Running the following:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu
> crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork
> futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf
> longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap
> msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit
> seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice
> stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp
> udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr
> zero zombie"
>
> for t in $tests
> do
> echo $t
> echo $t > /dev/kmsg
> ./stress-ng --$t 0 -v -t 60
> done
>
> eventually tripped the translation fault in irqbalance. I ran this
> after a clean reboot.
>
> [ 4901.799846] timerfd
> [ 4961.807050] tsearch
> [ 5021.884456] udp
> [ 5081.895058] udp-flood
> [ 5141.674365] irqbalance[827]: unhandled level 2 translation fault (11) at
> 0x002d6da4, esr 0x92000006
> [ 5141.674376] pgd = ffffffcfb51a0000
> [ 5141.715215] [002d6da4] *pgd=0000004fb677e003, *pud=0000004fb677e003,
> *pmd=0000000000000000
>
> [ 5141.816183] CPU: 0 PID: 827 Comm: irqbalance Not tainted 3.19.0-21-generic
> #21-Ubuntu
> [ 5141.816185] Hardware name: HP ProLiant m400 Server Cartridge (DT)
> [ 5141.816188] task: ffffffcfac088000 ti: ffffffcfab710000 task.ti:
> ffffffcfab710000
> [ 5141.816206] PC is at 0x7f88287834
> [ 5141.816208] LR is at 0x7f882877f4
> [ 5141.816210] pc : [<0000007f88287834>] lr : [<0000007f882877f4>] pstate:
> 80000000
> [ 5141.816212] sp : 0000007ff2e46b30
> [ 5141.816214] x29: 0000007ff2e46b30 x28: 00000000004095a0
> [ 5141.816217] x27: 0000000000409548 x26: 000000000041a000
> [ 5141.816220] x25: 0000000000000001 x24: 0000000000000010
> [ 5141.816222] x23: 000000002d6c98a0 x22: 000000002d6c9880
> [ 5141.816225] x21: 0000000000000018 x20: 0000007f88323000
> [ 5141.816228] x19: 0000000000000002 x18: 0000000000000000
> [ 5141.816230] x17: 0000007f87f8d8ec x16: 0000007f883222e0
> [ 5141.816233] x15: 0000000000000020 x14: 0000000000000001
> [ 5141.816235] x13: 0000000000000000 x12: 0000000000000000
> [ 5141.816237] x11: 0000007ff2e446a0 x10: 0000000000000010
> [ 5141.816240] x9 : 00000000000000a0 x8 : 0000000000000007
> [ 5141.816242] x7 : 0000000000000033 x6 : 000000002d6c9c80
> [ 5141.816245] x5 : 0000000000000001 x4 : 0000007f87fa62a0
> [ 5141.816247] x3 : 000000002d6c9880 x2 : 0000000000000001
> [ 5141.816250] x1 : 00000000000003fa x0 : 00000000002d6d9c
>
> [ 5141.907792] urandom
> [ 5201.928712] utime
> [ 5261.934534] vecmath
> [ 5321.940302] vfork
> [ 5381.947904] vm
> [ 5441.991784] vm-rw
> [ 5502.017614] vm-splice
> [ 5562.023334] wcs
> [ 5622.037054] wait
> [ 5682.043302] yield
> [ 5742.056595] xattr
> [ 5802.075772] zero
> [ 5862.087396] zombie
>
> --
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https://bugs.launchpad.net/bugs/1469214
>
> Title:
> HP ProLiant m400 Server crashes with unhandled level 3 translation
> fault
>
> Status in linux package in Ubuntu:
> Triaged
>
> Bug description:
> Running stress-ng on a HP ProLiant m400 server can cause unhandled
> level 3 translations faults:
>
> use stress-ng from git://kernel.ubuntu.com/cking/stress-ng
>
> ./stress-ng --seq 0 -t 60 -v
>
> and after some time this trips the following:
>
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922560]
> systemd-timesyn[481]: unhandled level 3 translation fault (7) at
> 0x7fa8ea6008, esr 0x92000007
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922561] pgd =
> ffffffcfb563f000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922563] [7fa8ea6008]
> *pgd=0000004fb4f28003, *pud=0000004fb4f28003, *pmd=0000004fb4f38003,
> *pte=000000001d151c00
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922566]
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922569] CPU: 6 PID: 481
> Comm: systemd-timesyn Not tainted 3.19.0-21-generic #21-Ubuntu
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922571] Hardware name: HP
> ProLiant m400 Server Cartridge (DT)
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922573] task:
> ffffffcfb4e3b100 ti: ffffffcfb4d2c000 task.ti: ffffffcfb4d2c000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922588] PC is at
> 0x7fa8d81824
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922589] LR is at
> 0x7fa8e3b3e4
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922590] pc :
> [<0000007fa8d81824>] lr : [<0000007fa8e3b3e4>] pstate: 80000000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922591] sp :
> 0000007ff120d660
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922592] x29:
> 0000007ff120d660 x28: 0000007fa8f1c000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922594] x27:
> 0000007fa8f32084 x26: 0000007fa8f32000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922595] x25:
> 0000007fa8f1d788 x24: 0000007fa8f1d888
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922597] x23:
> 0000000000000001 x22: 0000007fa8f1faa0
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922599] x21:
> 0000007ff120d7f0 x20: 0000007ff120d7d0
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922600] x19:
> 0000007fa8f31000 x18: 0000007fa8f1e000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922602] x17:
> 0000007fa8e3b3b8 x16: 0000007fa8ea6000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922603] x15:
> 003b9aca00000000 x14: 00219bbdd0000000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922605] x13:
> ffffffffaa751223 x12: 0000000000000000
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922607] x11:
> 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922609] x9 :
> 37333c43484f5e46 x8 : 0000007ff120d818
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922610] x7 :
> 0000007ff120d8f0 x6 : 0000007ff120d828
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922612] x5 :
> ffffff80ffffffd0 x4 : 0000007ff120d8c0
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922613] x3 :
> 0000007ff120d7d0 x2 : 0000007fa8f1faa0
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922615] x1 :
> 0000000000000001 x0 : 0000000000000064
> Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922616]
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1469214/+subscriptions

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1469214
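
For the maps dump requested at the top of this message, something along these lines might be enough. It is only a sketch: the function names `dump_maps` and `find_mapping` and the snapshot file name are made up for illustration, and it assumes bash plus a `pidof` that supports `-s`. The idea is to snapshot /proc/<pid>/maps while the stress run is in progress, then check which mapping (if any) covers the address reported in the fault line.

```shell
#!/bin/bash

# Hypothetical helper: save the current memory map of a named process.
# Writes to <name>-maps.<pid> in the current directory (arbitrary naming).
dump_maps() {
    local pid
    pid=$(pidof -s "$1") || return 1
    cat "/proc/$pid/maps" > "$1-maps.$pid"
}

# Hypothetical helper: print the maps line whose range contains an address.
# Usage: find_mapping <0x-prefixed hex address> <maps file>
find_mapping() {
    local addr=$(( $1 ))
    local maps=$2
    local range rest start end
    while read -r range rest; do
        # Each maps line starts with "start-end" in hex, without a 0x prefix.
        start=$(( 16#${range%-*} ))
        end=$(( 16#${range#*-} ))
        if (( addr >= start && addr < end )); then
            echo "$range $rest"
            return 0
        fi
    done < "$maps"
    echo "no mapping contains $1" >&2
    return 1
}
```

With the values from the log above this would be run as `dump_maps irqbalance` before the fault and `find_mapping 0x002d6da4 irqbalance-maps.827` afterwards. A non-zero exit status means the address was not inside any mapping at the time of the snapshot.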