Public bug reported:

SRU Justification
-----------------

[Impact]
A kernel BUG is sometimes observed when using fscache:

[4740718.880898] FS-Cache:
[4740718.880920] FS-Cache: Assertion failed
[4740718.880934] FS-Cache: 0 > 0 is false
[4740718.881001] ------------[ cut here ]------------
[4740718.881017] kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
[4740718.881040] invalid opcode: 0000 [#1] SMP

[4740718.892659] Call Trace:
[4740718.893506]  [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
[4740718.894374]  [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
[4740718.895180]  [<ffffffff81096da0>] process_one_work+0x150/0x3f0
[4740718.895966]  [<ffffffff8109751a>] worker_thread+0x11a/0x470
[4740718.896753]  [<ffffffff81808e59>] ? __schedule+0x359/0x980
[4740718.897783]  [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
[4740718.898581]  [<ffffffff8109cdd6>] kthread+0xd6/0xf0
[4740718.899469]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
[4740718.900477]  [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
[4740718.901514]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60

[Problem]
In include/fscache-cache.h, fscache_retrieval_complete reads, in part:

    atomic_sub(n_pages, &op->n_pages);
    if (atomic_read(&op->n_pages) <= 0)
            fscache_op_complete(&op->op, true);

The code uses atomic_sub followed by a separate atomic_read. Two threads
that each decrement the page count can race: both subtractions may land
before either read, so both threads see op->n_pages <= 0 at the same
time and both call fscache_op_complete, leading to the OOPS.

[Fix]
The fix is trivial: use atomic_sub_return instead of two calls.

[Testcase]
The user has tested the patch successfully on their fscache/cachefiles setup.

[Regression Potential]
Limited to fscache. Small, comprehensible change.
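The race described under [Problem] and the change described under [Fix]
can be illustrated outside the kernel. The following is a minimal
userspace sketch, not the kernel patch itself: it assumes C11
<stdatomic.h> in place of the kernel's atomic_t API, and struct
retrieval, sub_pages, read_says_done, sub_return_says_done,
completions_split and completions_single are hypothetical names
introduced only for this illustration. Instead of real threads, it
replays the problematic interleaving by hand so the outcome is
deterministic.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct fscache_retrieval: only the counter. */
struct retrieval {
    atomic_int n_pages;
};

/* Split pattern, step 1: the subtraction on its own (like atomic_sub). */
static void sub_pages(struct retrieval *op, int n)
{
    assert(n > 0);
    atomic_fetch_sub(&op->n_pages, n);
}

/* Split pattern, step 2: a separate read (like atomic_read). By the time
 * it runs, other threads' subtractions may already be visible. */
static int read_says_done(struct retrieval *op)
{
    return atomic_load(&op->n_pages) <= 0;
}

/* Combined pattern (like atomic_sub_return): one read-modify-write.
 * atomic_fetch_sub returns the old value, so old - n is the new value
 * produced by this caller's own subtraction. */
static int sub_return_says_done(struct retrieval *op, int n)
{
    assert(n > 0);
    return atomic_fetch_sub(&op->n_pages, n) - n <= 0;
}

/* Replays the interleaving A:sub, B:sub, A:read, B:read with two pages
 * outstanding. Returns how many "threads" would call the completion. */
static int completions_split(void)
{
    struct retrieval op;
    atomic_init(&op.n_pages, 2);

    sub_pages(&op, 1);               /* thread A: 2 -> 1 */
    sub_pages(&op, 1);               /* thread B: 1 -> 0 */
    int a = read_says_done(&op);     /* thread A reads 0 */
    int b = read_says_done(&op);     /* thread B reads 0 */
    return a + b;
}

/* Same workload with the combined operation: each thread decides based
 * on the value its own subtraction produced. */
static int completions_single(void)
{
    struct retrieval op;
    atomic_init(&op.n_pages, 2);

    int a = sub_return_says_done(&op, 1);  /* thread A: 2 -> 1 */
    int b = sub_return_says_done(&op, 1);  /* thread B: 1 -> 0 */
    return a + b;
}
```

In this replay, completions_split() counts two completions for a single
operation, which corresponds to the double call of fscache_op_complete,
while completions_single() counts exactly one. The sketch only models
the counter; it does not reproduce the fscache state machine or the
assertion in operation.c.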
** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

** Description changed:

  SRU Justification
  -----------------

  [Impact]
  A kernel BUG is sometimes observed when using fscache:
+ [4740718.880898] FS-Cache:
+ [4740718.880920] FS-Cache: Assertion failed
+ [4740718.880934] FS-Cache: 0 > 0 is false
+ [4740718.881001] ------------[ cut here ]------------
+ [4740718.881017] kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
+ [4740718.881040] invalid opcode: 0000 [#1] SMP
+
+ [4740718.892659] Call Trace:
+ [4740718.893506]  [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
+ [4740718.894374]  [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
+ [4740718.895180]  [<ffffffff81096da0>] process_one_work+0x150/0x3f0
+ [4740718.895966]  [<ffffffff8109751a>] worker_thread+0x11a/0x470
+ [4740718.896753]  [<ffffffff81808e59>] ? __schedule+0x359/0x980
+ [4740718.897783]  [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
+ [4740718.898581]  [<ffffffff8109cdd6>] kthread+0xd6/0xf0
+ [4740718.899469]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
+ [4740718.900477]  [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
+ [4740718.901514]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60

- Jun 25 11:32:08 kernel: [4740718.880898] FS-Cache:
- Jun 25 11:32:08 kernel: [4740718.880920] FS-Cache: Assertion failed
- Jun 25 11:32:08 kernel: [4740718.880934] FS-Cache: 0 > 0 is false
- Jun 25 11:32:08 kernel: [4740718.881001] ------------[ cut here ]------------
- Jun 25 11:32:08 kernel: [4740718.881017] kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
- Jun 25 11:32:08 kernel: [4740718.881040] invalid opcode: 0000 [#1] SMP
- ...
- Jun 25 11:32:08 kernel: [4740718.892659] Call Trace:
- Jun 25 11:32:08 kernel: [4740718.893506]  [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
- Jun 25 11:32:08 kernel: [4740718.894374]  [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
- Jun 25 11:32:08 kernel: [4740718.895180]  [<ffffffff81096da0>] process_one_work+0x150/0x3f0
- Jun 25 11:32:08 kernel: [4740718.895966]  [<ffffffff8109751a>] worker_thread+0x11a/0x470
- Jun 25 11:32:08 kernel: [4740718.896753]  [<ffffffff81808e59>] ? __schedule+0x359/0x980
- Jun 25 11:32:08 kernel: [4740718.897783]  [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
- Jun 25 11:32:08 kernel: [4740718.898581]  [<ffffffff8109cdd6>] kthread+0xd6/0xf0
- Jun 25 11:32:08 kernel: [4740718.899469]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
- Jun 25 11:32:08 kernel: [4740718.900477]  [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
- Jun 25 11:32:08 kernel: [4740718.901514]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
-
  [Problem]
  In include/fscache-cache.h, fscache_retrieval_complete reads, in part:

-     atomic_sub(n_pages, &op->n_pages);
-     if (atomic_read(&op->n_pages) <= 0)
-             fscache_op_complete(&op->op, true);
-
- The code is using atomic_sub followed by an atomic_read. This causes two threads doing a decrement of pages to race with each other seeing the op->refcount <= 0 at same time,
- and end up calling fscache_op_complete in both the threads leading to the OOPS.
-
+     atomic_sub(n_pages, &op->n_pages);
+     if (atomic_read(&op->n_pages) <= 0)
+             fscache_op_complete(&op->op, true);
+
+ The code is using atomic_sub followed by an atomic_read. This causes two
+ threads doing a decrement of pages to race with each other seeing the
+ op->refcount <= 0 at same time, and end up calling fscache_op_complete
+ in both the threads leading to the OOPS.
+
  [Fix]
  The fix is trivial to use atomic_sub_return instead of two calls.

  [Testcase]
  The user has tested the patch successfully on their fscache/cachefiles setup.

  [Regression Potential]
  Limited to fscache. Small, comprehensible change.

-- 
https://bugs.launchpad.net/bugs/1797314

Title:
  fscache: bad refcounting in fscache_op_complete leads to OOPS

Status in linux package in Ubuntu:
  Incomplete