On 11/17, Bernd Edlinger wrote:
>
> On 11/11/25 10:21, Christian Brauner wrote:
> > On Wed, Nov 05, 2025 at 03:32:10PM +0100, Oleg Nesterov wrote:
>
> >> But this is minor. Why do we need "bool unsafe_execve_in_progress" ?
> >> If this patch is correct, de_thread() can drop/reacquire cred_guard_mutex
> >> unconditionally.
> >>
>
> I would not like to drop the mutex when no absolutely necessary for 
> performance reasons.

OK, I won't insist... But I don't really understand how this can help to
improve the performance. If nothing else, this adds another for_other_threads()
loop.

And again, the unsafe_execve_in_progress == T case is unlikely. I'm afraid this
case (de_thread() without cred_guard_mutex) won't have enough testing.

In any case, why you dislike the suggestion to add this 
unsafe_execve_in_progress
logic in a separate patch?

> >>> + if (unlikely(unsafe_execve_in_progress)) {
> >>> +         spin_unlock_irq(lock);
> >>> +         sig->exec_bprm = bprm;
> >>> +         mutex_unlock(&sig->cred_guard_mutex);
> >>> +         spin_lock_irq(lock);
> >>
> >> I don't think spin_unlock_irq() + spin_lock_irq() makes any sense...
> >>
>
> Since the spin lock was acquired while holding the mutex, both should be
> unlocked in reverse sequence and the spin lock re-acquired after releasing
> the mutex.

Why?

> I'd expect the scheduler to do a task switch after the cred_guard_mutex is
> unlocked, at least in the RT-linux variant, while the spin lock is not yet
> unlocked.

I must have missed something, but I still don't understand why this would
be wrong...

> >>> @@ -1114,13 +1139,31 @@ int begin_new_exec(struct linux_binprm * bprm)
> >>>    */
> >>>   trace_sched_prepare_exec(current, bprm);
> >>>
> >>> + /* If the binary is not readable then enforce mm->dumpable=0 */
> >>> + would_dump(bprm, bprm->file);
> >>> + if (bprm->have_execfd)
> >>> +         would_dump(bprm, bprm->executable);
> >>> +
> >>> + /*
> >>> +  * Figure out dumpability. Note that this checking only of current
> >>> +  * is wrong, but userspace depends on it. This should be testing
> >>> +  * bprm->secureexec instead.
> >>> +  */
> >>> + if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
> >>> +     is_dumpability_changed(current_cred(), bprm->cred) ||
> >>> +     !(uid_eq(current_euid(), current_uid()) &&
> >>> +       gid_eq(current_egid(), current_gid())))
> >>> +         set_dumpable(bprm->mm, suid_dumpable);
> >>> + else
> >>> +         set_dumpable(bprm->mm, SUID_DUMP_USER);
> >>> +
> >>
> >> OK, we need to do this before de_thread() drops cred_guard_mutex.
> >> But imo this too should be done in a separate patch, the changelog should
> >> explain this change.
> >>
>
> The dumpability need to be determined before de_thread, because 
> ptrace_may_access
> needs this information to determine if the tracer is allowed to ptrace. That 
> is
> part of the core of the patch, it would not work without that.

Yes,

> I will add more comments to make that more easy to understand.

But again, why this change can't come in a separate patch? Before the patch 
which
drops cred_guard_mutex in de_thread().

> >>    int lock_current_cgm(void)
> >>    {
> >>            if 
> >> (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
> >>                    return -ERESTARTNOINTR;
> >>
> >>            if (!current->signal->group_exec_task)
> >>                    return 0;
> >>
> >>            WARN_ON(!fatal_signal_pending(current));
> >>            mutex_unlock(&current->signal->cred_guard_mutex);
> >>            return -ERESTARTNOINTR;
> >>    }
> >>
> >> ?
> >>
>
> Some use mutex_lock_interruptible and some use mutex_lock_killable here,
> so it wont work for all of them.  I would not consider this a new kind
> of dead-lock free mutex, but just an open-coded state machine, handling
> the state that the tasks have whild de_thread is running.

OK. and we don't have mutex_lock_state(). I think that all users could
use mutex_lock_killable(), but you are right anyway, and this is minor.

> >> Note that it checks ->group_exec_task, not ->exec_bprm. So this change can
> >> come in a separate patch too, but I won't insist.

Yes. Although this is minor too ;)

> >> This is the most problematic change which I can't review...
> >>
> >> Firstly, it changes task->mm/real_cred for __ptrace_may_access() and this
> >> looks dangerous to me.
> >
> > Yeah, that is not ok. This is effectively override_creds for real_cred
> > and that is not a pattern I want to see us establish at all! Temporary
> > credential overrides for the subjective credentials is already terrible
> > but at least we have the explicit split between real_cred and cred
> > expressely for that. So no, that's not an acceptable solution.
> >
>
> Okay I understand your point.
> I did this originally just to avoid to have to change the interface to all
> the security engines, but instead I could add a flag PTRACE_MODE_BPRMCREDS to
> the ptrace_may_access which must be handled in all security engines, to use
> child->signal->exec_bprm->creds instead of __task_cred(child).

Can't comment... I don't understand your idea, but this is my fault. I guess
this needs more changes, in particular __ptrace_may_access_mm_cred(), but
most probably I misunderstood your idea.

>
> >> Or. check_unsafe_exec() sets LSM_UNSAFE_PTRACE if ptrace. Is it safe to
> >> ptrace the execing task after that? I have no idea what the security hooks
> >> can do...
>
> That means the tracee is already ptraced before the execve, and SUID-bits
> do not work as usual, and are more or less ignored.  But in this patch
> the tracee is not yet ptraced.

Well. I meant that if LSM_UNSAFE_PTRACE is not set, then currently (say)
security_bprm_committing_creds() has all rights to assume that the execing
task is not ptraced. Yes, I don't see any potential problem right now, but
still.

And just in case... Lets look at this code

        +                               rcu_assign_pointer(task->real_cred, 
bprm->cred);
        +                               task->mm = bprm->mm;
        +                               retval = __ptrace_may_access(task, 
PTRACE_MODE_ATTACH_REALCREDS);
        +                               rcu_assign_pointer(task->real_cred, 
old_cred);
        +                               task->mm = old_mm;

again.

This is mostly theoretical, but what if begin_new_exec() fails after de_thread()
and before exec_mmap() and/or commit_creds(bprm->cred) ? In this case the 
execing
thread will report SIGSEGV to debugger which can (say) read old_mm.

No?

I am starting to think that ptrace_attach() should simply fail with -EWOULDBLOCK
if it detects "unsafe_execve_in_progress" ... And perhaps this is what you 
already
tried to do in the past, I can't recall :/

Oleg.


Reply via email to