On 12/11/20 8:30 AM, Jonathan Lemon wrote:
On Wed, Dec 09, 2020 at 11:02:54AM -0800, Yonghong Song wrote:
Maybe you can post v3 of the patch with the above information in the
commit description so people can better understand what the problem
you are trying to solve here?
Also, could you also send to b...@vger.kernel.org?
Sure, I can do that.
If unable to obtain the file structure for the current task,
proceed to the next task number after the one returned from
task_seq_get_next(), instead of the next task number from the
original iterator.
This seems a correct change. The current code should still work
but it may do some redundant/unnecessary work in kernel.
This only happens when a task does not have any file,
no sure whether this is the culprit for the problem this
patch tries to address.
Use thread_group_leader() instead of comparing tgid vs pid, which
might may be racy.
I see
static inline bool thread_group_leader(struct task_struct *p)
{
return p->exit_signal >= 0;
}
I am not sure whether thread_group_leader(task) is equivalent
to task->tgid == task->pid or not. Any documentation or explanation?
Could you explain why task->tgid != task->pid in the original
code could be racy?
My understanding is that anything which uses pid_t for comparision
in the kernel is incorrect. Looking at de_thread(), there is a
section which swaps the pid and tids around, but doesn't seem to
change tgid directly.
There's also this comment in linux/pid.h:
/*
* Both old and new leaders may be attached to
* the same pid in the middle of de_thread().
*/
So the safest thing to do is use the explicit thread_group_leader()
macro rather than trying to open code things.
I did some limited experiments and did not trigger a case where
task->tgid != task->pid not agreeing with !thread_group_leader().
Will need more tests in the environment to reproduce the warning
to confirm whether this is the culprit or not.
Perhaps, but on the other hand, the splats disappear with this
patch, so it's doing something right. If your debug code hasn't
detected any cases where thread_group_leader() isn't making a
difference, then there shouldn't be any objections in making the
replacement, right? It does make the code easier to understand
and matches the rest of the kernel.
Agree. Let me do a little more experiments to double check we
did not miss anything with this particular change and will
report back later.