On Mon, Jul 6, 2015 at 9:28 PM, Colin Ian King <1469...@bugs.launchpad.net> wrote:
> I re-ran this today with the following script as a non-root user:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu
> crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork
> futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf
> longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap
> msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit
> seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice
> stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp
> udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr
> zero zombie"
>
> for t in $tests
> do
> echo $t
> echo $t | sudo tee /dev/kmsg
> ./stress-ng --$t 0 -v -t 60
> done
>
> and hit this issue:
>
> [14098.848615] urandom
> [14111.696335] irqbalance[828]: unhandled level 2 translation fault (11) at
> 0x00004f64, esr 0x92000006
> [14111.696341] pgd = ffffffcfef71b000
> [14111.737149] [00004f64] *pgd=0000004fef1f3003, *pud=0000004fef1f3003,
> *pmd=0000000000000000
>
As I suggested, it would be helpful to provide /proc/$(pidof irqbalance)/maps;
otherwise we can't tell where the faulting address and the PC are.

I have finally figured out a simple way to reproduce the issue:

1) Apply the attached debug patch to stress-ng.

2) Run the following script:

sudo cat /proc/$(pidof irqbalance)/maps
/home/ubuntu/git/stress-ng/stress-ng --sequential 0 --seq-start 80 --seq-end 84 -t 60 --syslog --metrics --times -v

The above command runs just the following four stressors, taking four
minutes in total:

stress-ng: info: [1067] dispatching hogs: 8 tsearch, 8 udp, 8 udp-flood, 8 urandom

3) The above triggers the following fault from irqbalance with a
probability of about 3/4. The faulting address is in the heap and the PC
points into the code of libglib-2.0.so, so this looks like a use-after-free
in irqbalance or libglib? Nothing indicates that it is related to the
kernel; the four stressors are also quite simple and shouldn't cause
trouble for the kernel.

# irqbalance memory maps
00400000-0040a000 r-xp 00000000 08:02 10496929 /usr/sbin/irqbalance
00419000-0041a000 r-xp 00009000 08:02 10496929 /usr/sbin/irqbalance
0041a000-0041b000 rwxp 0000a000 08:02 10496929 /usr/sbin/irqbalance
16294000-162b5000 rwxp 00000000 00:00 0 [heap]
162b5000-162ce000 rwxp 00000000 00:00 0 [heap]
7f8fbf9000-7f8fbfb000 rwxp 00000000 00:00 0
7f8fbfb000-7f8fc11000 r-xp 00000000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc11000-7f8fc20000 ---p 00016000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc20000-7f8fc21000 r-xp 00015000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc21000-7f8fc22000 rwxp 00016000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc22000-7f8fc26000 rwxp 00000000 00:00 0
7f8fc26000-7f8fc7f000 r-xp 00000000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc7f000-7f8fc8f000 ---p 00059000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc8f000-7f8fc90000 r-xp 00059000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc90000-7f8fc91000 rwxp 0005a000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc91000-7f8fdc1000 r-xp 00000000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdc1000-7f8fdd0000 ---p 00130000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd0000-7f8fdd4000 r-xp 0012f000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd4000-7f8fdd6000 rwxp 00133000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd6000-7f8fdda000 rwxp 00000000 00:00 0
7f8fdda000-7f8fde3000 r-xp 00000000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fde3000-7f8fdf2000 ---p 00009000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf2000-7f8fdf3000 r-xp 00008000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf3000-7f8fdf4000 rwxp 00009000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf4000-7f8fdf8000 rwxp 00000000 00:00 0
7f8fdf8000-7f8fe89000 r-xp 00000000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe89000-7f8fe98000 ---p 00091000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe98000-7f8fe99000 r-xp 00090000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe99000-7f8fe9a000 rwxp 00091000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe9a000-7f8ff8c000 r-xp 00000000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff8c000-7f8ff9c000 ---p 000f2000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9c000-7f8ff9d000 r-xp 000f2000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9d000-7f8ff9e000 rwxp 000f3000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9e000-7f8ff9f000 rwxp 00000000 00:00 0
7f8ff9f000-7f8ffa3000 r-xp 00000000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffa3000-7f8ffb2000 ---p 00004000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb2000-7f8ffb3000 r-xp 00003000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb3000-7f8ffb4000 rwxp 00004000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb4000-7f8ffd0000 r-xp 00000000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7f8ffd0000-7f8ffd3000 rwxp 00000000 00:00 0
7f8ffdc000-7f8ffde000 rwxp 00000000 00:00 0
7f8ffde000-7f8ffdf000 r--p 00000000 00:00 0 [vvar]
7f8ffdf000-7f8ffe0000 r-xp 00000000 00:00 0 [vdso]
7f8ffe0000-7f8ffe1000 r-xp 0001c000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7f8ffe1000-7f8ffe3000 rwxp 0001d000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7fecdb1000-7fecdd2000 rw-p 00000000 00:00 0 [stack]

[ 250.276095] irqbalance[779]: unhandled level 2 translation fault (11) at 0x00162a54, esr 0x92000006
[ 250.276103] pgd = ffffffc0ff812000
[ 250.316917] [00162a54] *pgd=00000040ffa6b003, *pud=00000040ffa6b003, *pmd=0000000000000000
[ 250.416447] CPU: 5 PID: 779 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
[ 250.416450] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[ 250.416452] task: ffffffcfb46cc980 ti: ffffffc0feba0000 task.ti: ffffffc0feba0000
[ 250.416464] PC is at 0x7f8ff02834
[ 250.416467] LR is at 0x7f8ff027f4
[ 250.416469] pc : [<0000007f8ff02834>] lr : [<0000007f8ff027f4>] pstate: 80000000
[ 250.416471] sp : 0000007fecdd1480
[ 250.416472] x29: 0000007fecdd1480 x28: 000000000041a000
[ 250.416476] x27: 000000000041a000 x26: 00000000004094e0
[ 250.416478] x25: 0000000000000001 x24: 0000000000000010
[ 250.416481] x23: 00000000162948a0 x22: 0000000016294880
[ 250.416484] x21: 0000000000000018 x20: 0000007f8ff9e000
[ 250.416486] x19: 0000000000000002 x18: 0000000000000000
[ 250.416489] x17: 0000007f8fc088ec x16: 0000007f8ff9d2e0
[ 250.416491] x15: 0000000000000020 x14: 0000000000000000
[ 250.416494] x13: 0000000000000000 x12: 0000000000000000
[ 250.416496] x11: 0000007fecdceff0 x10: 0000000000000010
[ 250.416499] x9 : 00000000000000a0 x8 : 0000000000000007
[ 250.416501] x7 : 0000000000000033 x6 : 0000000016294c80
[ 250.416504] x5 : 0000000000000001 x4 : 0000007f8fc212a0
[ 250.416506] x3 : 0000000016294880 x2 : 0000000000000001
[ 250.416509] x1 : 00000000000003fa x0 : 0000000000162a4c

** Patch added: "0001-stress-ng-support-sequential-range.patch"
   https://bugs.launchpad.net/bugs/1469214/+attachment/4425151/+files/0001-stress-ng-support-sequential-range.patch

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1469214

Title:
  HP ProLiant m400 Server crashes with unhandled level 3 translation fault

Status in linux package in Ubuntu:
  Triaged

Bug description:
  Running stress-ng on an HP ProLiant m400 server can cause unhandled
  level 3 translation faults:

  use stress-ng from git://kernel.ubuntu.com/cking/stress-ng

  ./stress-ng --seq 0 -t 60 -v

  and after some time this trips the following:

  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922560] systemd-timesyn[481]: unhandled level 3 translation fault (7) at 0x7fa8ea6008, esr 0x92000007
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922561] pgd = ffffffcfb563f000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922563] [7fa8ea6008] *pgd=0000004fb4f28003, *pud=0000004fb4f28003, *pmd=0000004fb4f38003, *pte=000000001d151c00
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922566]
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922569] CPU: 6 PID: 481 Comm: systemd-timesyn Not tainted 3.19.0-21-generic #21-Ubuntu
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922571] Hardware name: HP ProLiant m400 Server Cartridge (DT)
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922573] task: ffffffcfb4e3b100 ti: ffffffcfb4d2c000 task.ti: ffffffcfb4d2c000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922588] PC is at 0x7fa8d81824
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922589] LR is at 0x7fa8e3b3e4
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922590] pc : [<0000007fa8d81824>] lr : [<0000007fa8e3b3e4>] pstate: 80000000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922591] sp : 0000007ff120d660
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922592] x29: 0000007ff120d660 x28: 0000007fa8f1c000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922594] x27: 0000007fa8f32084 x26: 0000007fa8f32000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922595] x25: 0000007fa8f1d788 x24: 0000007fa8f1d888
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922597] x23: 0000000000000001 x22: 0000007fa8f1faa0
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922599] x21: 0000007ff120d7f0 x20: 0000007ff120d7d0
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922600] x19: 0000007fa8f31000 x18: 0000007fa8f1e000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922602] x17: 0000007fa8e3b3b8 x16: 0000007fa8ea6000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922603] x15: 003b9aca00000000 x14: 00219bbdd0000000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922605] x13: ffffffffaa751223 x12: 0000000000000000
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922607] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922609] x9 : 37333c43484f5e46 x8 : 0000007ff120d818
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922610] x7 : 0000007ff120d8f0 x6 : 0000007ff120d828
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922612] x5 : ffffff80ffffffd0 x4 : 0000007ff120d8c0
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922613] x3 : 0000007ff120d7d0 x2 : 0000007fa8f1faa0
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922615] x1 : 0000000000000001 x0 : 0000000000000064
  Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922616]
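
For anyone wanting to repeat the address lookup on the irqbalance report earlier in this message, here is a minimal sketch of how a PC or register value can be matched against /proc/<pid>/maps output. The helper names (parse_maps, resolve, describe) and the five-line MAPS_EXCERPT are my own illustration, not part of stress-ng or irqbalance; the excerpt just repeats a few of the map entries from the reply, and the addresses are the pc, x22 and fault values from the irqbalance fault log.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int    # first address of the range (inclusive)
    end: int      # end address of the range (exclusive)
    perms: str    # e.g. "r-xp"
    offset: int   # offset of the range within the backing file
    path: str     # backing file or pseudo-name such as "[heap]"; "" if anonymous


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<pid>/maps style text, skipping lines that do not fit."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5 or "-" not in fields[0]:
            continue
        try:
            start, end = (int(x, 16) for x in fields[0].split("-"))
            offset = int(fields[2], 16)
        except ValueError:
            continue  # not a maps entry (comment, log line, ...)
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], offset, path))
    return mappings


def resolve(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if there is none."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


def describe(mappings: List[Mapping], name: str, addr: int) -> str:
    m = resolve(mappings, addr)
    if m is None:
        return f"{name} {addr:#x}: not in any listed mapping"
    off = addr - m.start + m.offset
    return f"{name} {addr:#x}: {m.path or '[anon]'} ({m.perms}), offset {off:#x}"


# Assumption: a hand-picked subset of the irqbalance maps from this message.
MAPS_EXCERPT = """\
00400000-0040a000 r-xp 00000000 08:02 10496929 /usr/sbin/irqbalance
16294000-162b5000 rwxp 00000000 00:00 0 [heap]
162b5000-162ce000 rwxp 00000000 00:00 0 [heap]
7f8fe9a000-7f8ff8c000 r-xp 00000000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7fecdb1000-7fecdd2000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    maps = parse_maps(MAPS_EXCERPT)
    for name, addr in [("pc", 0x7f8ff02834), ("x22", 0x16294880), ("fault", 0x00162a54)]:
        print(describe(maps, name, addr))
```

With these values the PC lands in the r-xp mapping of libglib-2.0.so at offset 0x68834, and x22 lands in [heap]. The faulting address 0x00162a54 (x0 + 8) is not inside any listed range, although it resembles the heap pointers held in x3 and x22. If debug symbols for libglib are installed, the offset could presumably be passed to addr2line to find the source line.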