I have gone ahead and backported the fixes for xenial's 4.4 kernel. This patch series is for ubuntu-xenial 4.4: https://paste.ubuntu.com/p/Kj43J6H3Hm/
This patch series is for vanilla upstream 4.4: https://paste.ubuntu.com/p/MzdjcHCqbz/ Both patch and compile and fix the problem. ** Also affects: linux (Ubuntu Xenial) Importance: Undecided Status: New ** Changed in: linux (Ubuntu Xenial) Importance: Undecided => Medium ** Changed in: linux (Ubuntu Xenial) Status: New => In Progress ** Changed in: linux (Ubuntu Xenial) Assignee: (unassigned) => Matthew Ruffell (mruffell) ** Tags added: sts ** Description changed: - The "fanotify07" and "fanotify08" from the LTP syscall tests has failed - on a testing node with Trusty kernel installed. + BugLink: https://bugs.launchpad.net/bugs/1775165 + + [Impact] + + When userspace tasks which are processing fanotify permission events act + incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes + the whole notification subsystem to hang. + + This has been seen in production, and it can also be seen when running the + Linux Test Project testsuite, specifically fanotify07. + + [Fix] + + Instead of holding the SRCU lock while waiting for userspace to respond, + which may never happen, or not in the order we are expecting, we drop the + fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then + reacquire the lock again when userspace responds. + + The fixes are from a series of upstream commits: + + 05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick) + 9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport) + abc77577a669f424c5d0c185b9994f2621c52aa4 (backport) + + The following are upstream commits necessary for the fixes to function: + + 35e481761cdc688dbee0ef552a13f49af8eba6cc (backport) + 0918f1c309b86301605650c836ddd2021d311ae2 (cherry-pick) + + [Testcase] + + You can reproduce the problem pretty quickly with the Linux Test + Project: Steps (with root): - 1. sudo apt-get install git xfsprogs -y - 2. git clone --depth=1 https://github.com/linux-test-project/ltp.git - 3. cd ltp - 4. make autotools - 5. ./configure - 6. make; make install - 7. cd /opt/ltp - 8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs - 9. ./runltp -f /tmp/jobs + 1. sudo apt-get install git xfsprogs -y + 2. git clone --depth=1 https://github.com/linux-test-project/ltp.git + 3. cd ltp + 4. make autotools + 5. ./configure + 6. make; make install + 7. cd /opt/ltp + 8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs + 9. ./runltp -f /tmp/jobs + + On a stock Xenial kernel, the system will hang, and the testcase will look like: <<<test_start>>> - tag=fanotify07 stime=1528197132 - cmdline="fanotify07" + tag=fanotify07 stime=1554326200 + cmdline="fanotify07 " contacts="" analysis=exit <<<test_output>>> - incrementing stop - tst_test.c:1015: INFO: Timeout per run is 0h 05m 00s + tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Cannot kill test processes! Congratulation, likely test hit a kernel bug. Exitting uncleanly... <<<execution_status>>> initiation_status="ok" duration=350 termination_type=exited termination_id=1 corefile=no cutime=0 cstime=0 <<<test_end>>> - INFO: ltp-pan reported some tests FAIL - LTP Version: 20180515 - [ 841.063676] INFO: task fanotify07:3660 blocked for more than 120 seconds. - [ 841.063692] Not tainted 3.13.0-149-generic #199-Ubuntu - [ 841.063705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. - [ 841.063723] fanotify07 D ffff8804584742f0 0 3660 3652 0x00000000 - [ 841.063724] ffff880459e9bd00 0000000000000086 ffff88045556b000 0000000000013b00 - [ 841.063726] ffff880459e9bfd8 0000000000013b00 ffff88045556b000 ffff880459b4e690 - [ 841.063728] 0000000000000000 ffff880458474200 ffff8804584742f0 0000000000020000 - [ 841.063730] Call Trace: - [ 841.063731] [<ffffffff8173bae9>] schedule+0x29/0x70 - [ 841.063733] [<ffffffff8120bfb0>] fanotify_handle_event+0x110/0x1d0 - [ 841.063735] [<ffffffff810b0c30>] ? prepare_to_wait_event+0x100/0x100 - [ 841.063737] [<ffffffff81207c36>] send_to_group+0x166/0x240 - [ 841.063738] [<ffffffff811e0731>] ? touch_atime+0x71/0x140 - [ 841.063740] [<ffffffff81207ff5>] fsnotify+0x2e5/0x320 - [ 841.063742] [<ffffffff812e1d84>] security_file_permission+0x94/0xb0 - [ 841.063743] [<ffffffff811c5192>] rw_verify_area+0x52/0xd0 - [ 841.063745] [<ffffffff811c527a>] vfs_read+0x6a/0x160 - [ 841.063746] [<ffffffff811c5db9>] SyS_read+0x49/0xa0 - [ 841.063748] [<ffffffff81748830>] system_call_fastpath+0x1a/0x1f - [ 1304.848642] ltp-pan[3809]: segfault at 0 ip 00007f07c8aafdfa sp 00007ffc1da92078 error 4 in libc-2.19.so[7f07c8a27000+1be000] + Looking at dmesg, we see the following call stack - dmesg: http://paste.ubuntu.com/p/FRZnV5smGh/ + [ 790.772792] LTP: starting fanotify07 (fanotify07 ) + [ 960.140455] INFO: task fsnotify_mark:36 blocked for more than 120 seconds. + [ 960.140867] Not tainted 4.4.0-142-generic #168-Ubuntu + [ 960.141185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. + [ 960.141498] fsnotify_mark D ffff8800b6703c98 0 36 2 0x00000000 + [ 960.141516] ffff8800b6703c98 ffff88013a558a00 ffff8800b7797000 ffff8800b66f8000 + [ 960.141524] ffff8800b6704000 7fffffffffffffff ffff8800b6703de0 ffff8800b66f8000 + [ 960.141528] 0000000000000000 ffff8800b6703cb0 ffffffff8185cb45 ffff8800b6703de8 + [ 960.141532] Call Trace: + [ 960.141580] [<ffffffff8185cb45>] schedule+0x35/0x80 + [ 960.141588] [<ffffffff818600f4>] schedule_timeout+0x1b4/0x270 + [ 960.141617] [<ffffffff810f57ac>] ? mod_timer+0x10c/0x240 + [ 960.141621] [<ffffffff8185c60d>] ? __schedule+0x30d/0x810 + [ 960.141625] [<ffffffff8185d652>] wait_for_completion+0xb2/0x190 + [ 960.141636] [<ffffffff810b1f10>] ? wake_up_q+0x70/0x70 + [ 960.141641] [<ffffffff810eb140>] __synchronize_srcu+0x100/0x1a0 + [ 960.141645] [<ffffffff810ea400>] ? trace_raw_output_rcu_utilization+0x60/0x60 + [ 960.141664] [<ffffffff81260870>] ? fsnotify_put_mark+0x40/0x40 + [ 960.141669] [<ffffffff810eb204>] synchronize_srcu+0x24/0x30 + [ 960.141672] [<ffffffff812608f4>] fsnotify_mark_destroy+0x84/0x130 + [ 960.141680] [<ffffffff810ca000>] ? wake_atomic_t_function+0x60/0x60 + [ 960.141691] [<ffffffff810a6227>] kthread+0xe7/0x100 + [ 960.141694] [<ffffffff8185c601>] ? __schedule+0x301/0x810 + [ 960.141699] [<ffffffff810a6140>] ? kthread_create_on_node+0x1e0/0x1e0 + [ 960.141703] [<ffffffff818618e5>] ret_from_fork+0x55/0x80 + [ 960.141706] [<ffffffff810a6140>] ? kthread_create_on_node+0x1e0/0x1e0 - ProblemType: Bug - DistroRelease: Ubuntu 14.04 - Package: linux-image-3.13.0-149-generic 3.13.0-149.199 - ProcVersionSignature: User Name 3.13.0-149.199-generic 3.13.11-ckt39 - Uname: Linux 3.13.0-149-generic x86_64 - AlsaDevices: - total 0 - crw-rw---- 1 root audio 116, 1 Jun 5 11:01 seq - crw-rw---- 1 root audio 116, 33 Jun 5 11:01 timer - AplayDevices: Error: [Errno 2] No such file or directory: 'aplay' - ApportVersion: 2.14.1-0ubuntu3.27 - Architecture: amd64 - ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord' - AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: - CurrentDmesg: [ 3.553366] init: plymouth-upstart-bridge main process ended, respawning - Date: Tue Jun 5 11:06:45 2018 - IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig' - MachineType: Intel Corporation S1200RP - PciMultimedia: + The vanilla 4.4 kernel also shows the same call stack. - ProcEnviron: - TERM=xterm-256color - PATH=(custom, no user) - XDG_RUNTIME_DIR=<set> - LANG=en_US.UTF-8 - SHELL=/bin/bash - ProcFB: 0 inteldrmfb - ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-149-generic root=UUID=b0d2ae4e-12dd-423e-acea-272ee8b2a893 ro - RelatedPackageVersions: - linux-restricted-modules-3.13.0-149-generic N/A - linux-backports-modules-3.13.0-149-generic N/A - linux-firmware 1.127.24 - RfKill: Error: [Errno 2] No such file or directory: 'rfkill' - SourcePackage: linux - UpgradeStatus: No upgrade log present (probably fresh install) - dmi.bios.date: 07/01/2015 - dmi.bios.vendor: Intel Corp. - dmi.bios.version: S1200RP.86B.03.02.0003.070120151022 - dmi.board.asset.tag: .................... - dmi.board.name: S1200RP - dmi.board.vendor: Intel Corporation - dmi.board.version: G62254-407 - dmi.chassis.asset.tag: .................... - dmi.chassis.type: 17 - dmi.chassis.vendor: .............................. - dmi.chassis.version: .................. - dmi.modalias: dmi:bvnIntelCorp.:bvrS1200RP.86B.03.02.0003.070120151022:bd07/01/2015:svnIntelCorporation:pnS1200RP:pvr....................:rvnIntelCorporation:rnS1200RP:rvrG62254-407:cvn..............................:ct17:cvr..................: - dmi.product.name: S1200RP - dmi.product.version: .................... - dmi.sys.vendor: Intel Corporation + On a patched kernel, the test will pass successfully, and there will be no + messages in dmesg. + + [Regression Potential] + + This makes modifications to how locking is performed in fsnotify / fanotify and + there may be some cause for regression. Running all fanotify Linux Test Project + tests shows that there are no extra failures caused by the patches, and instead + fewer failures are seen due to the bugfix. + + Running the entire Linux Test Project testsuite actually works and runs to + completion, somewhich doesn't happen in a unpatched kernel since it will hang + on the fanotify07 test. + + The patches are taken from upstream, and all necessary commits have been taken + into account, so I am happy with the potential risks and that testing has been + completed. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1775165 Title: fanotify07/fanotify08 in LTP syscall test generates kernel trace with T/X/X-AWS kernel To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1775165/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs