> Date: Sun, 2 Jun 2019 15:39:59 -0300
> From: Martin Pieuchot <m...@openbsd.org>
> Cc: tech@openbsd.org
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: inline
>
> On 02/06/19(Sun) 15:50, Visa Hankala wrote:
> > On Sun, Jun 02, 2019 at 10:29:01AM -0300, Martin Pieuchot wrote:
> > > On 02/06/19(Sun) 04:30, Visa Hankala wrote:
> > > > On Sat, Jun 01, 2019 at 07:04:23PM -0300, Martin Pieuchot wrote:
> > > > > On 01/06/19(Sat) 23:22, Mark Kettenis wrote:
> > > > > > > Date: Sat, 1 Jun 2019 17:32:52 -0300
> > > > > > > From: Martin Pieuchot <m...@openbsd.org>
> > > > > > >
> > > > > > > Currently it isn't safe to call mtx_enter_try(9) if you're
> > > > > > > already holding the mutex.  That means it isn't safe to call
> > > > > > > that function in hardclock(9), like with `windup_mtx'.  That's
> > > > > > > why the mutex needs to be initialized as IPL_CLOCK.
> > > > > > >
> > > > > > > I'm working on removing the SCHED_LOCK() from inside
> > > > > > > hardclock(9).  That leads me to wonder if I should initialize
> > > > > > > all mutexes to IPL_SCHED, possibly blocking clock interrupts,
> > > > > > > or if we should change the mutex API to allow mtx_enter_try(9)
> > > > > > > to deal with recursion.
> > > > > > >
> > > > > > > The diff below removes the recursion check for mtx_enter_try(9).
> > > > > > >
> > > > > > > Comments?  Oks?
> > > > > >
> > > > > > My initial reaction is that if you're trying to lock when you
> > > > > > already have the lock, there is something wrong with your locking
> > > > > > strategy and that this is something we don't want.
> > > > >
> > > > > Could you elaborate?  Are you saying that preventing hardclock(9)
> > > > > from running is the way to move forward to unlock its internals?
> > > > > Why isn't that strategy wrong?
> > > > >
> > > > > In the `windup_mtx' case, does it matter if the mutex is taken by
> > > > > another CPU or by myself?
> > > > > What's the problem when CPU0 is the one holding the lock?
> > > >
> > > > mutex(9) is not and should not become recursive.  Recursive locking
> > > > works when it is voluntary.  If recursion were allowed with
> > > > interrupts, the CPU could re-enter the critical section at any
> > > > moment, possibly seeing inconsistent state or breaking assumptions
> > > > made by the original entry.
> > >
> > > I don't understand how your answer is related to my diff :(
> > >
> > > Why is it wrong for mtx_enter_try(9) to fail, rather than panic(9),
> > > when the CPU executing the code already owns the mutex?
> > >
> > > Is that what you call recursion?  Not entering the critical section
> > > in the interrupt instead of blocking the interrupt?
> > >
> > > How is that different from not entering the critical section because
> > > another CPU is holding the mutex?
> >
> > To me, it looks very suspicious if a CPU tries to lock a mutex that it
> > already holds, because it indicates that there is a potential
> > recursion.  My stance is that such locking behaviour is a design error
> > in general and that the current way of checking recursion in
> > mtx_enter_try() is sound.
>
> But why?  Nobody is answering my question :o)  You're saying recursion
> is bad because there's recursion.
I think recursion makes it harder to reason about lock ordering...

> How is that different from using mtx_enter_try(9) to protect from
> another CPU being in the critical section?  Why do you call that a
> design error?

To me, the use of mtx_enter_try(9) is suspicious to start with.  It
either means there is some magic going on to avoid a lock ordering
problem, or the code is opportunistically trying to do some work that
doesn't necessarily need to be done at this point.  Ideally, I'd like
to see every mtx_enter_try(9) call annotated with an explanation of why
it isn't a simple mtx_enter(9).

> I don't mind putting all mutexes at IPL_SCHED.  I would like to
> understand why it matters *who* is holding the mutex.  Since
> mtx_enter_try(9) is here to not enter the critical section for
> $reasons, setting an ipl of IPL_CLOCK sounds just like a special
> corner case.

I wonder what $reasons is in the case of tc_ticktock().  I think it is
opportunistically trying to avoid spinning on the lock, which is fine
since presumably some other thread is already busy doing the work.  But
I'd argue that the code would still be correct if tc_ticktock() simply
called mtx_enter().  At which point the IPL_CLOCK would be mandatory.

> You might understand why I'm asking.  We are trying to get more stuff
> done in parallel.  That implies separating parts of the kernel
> logically.  Blocking IPL_CLOCK is a big hammer; it blocks a lot of
> stuff.  So yes, it is easy to use, but my feeling is that this
> simplicity comes at a cost, latency-wise.

I'm not so sure this really matters.  In principle you're just delaying
the work while you're busy doing some other work, so you're not really
doing less work in parallel.

You're right about latency, though.  But we shouldn't be holding an
IPL_CLOCK/IPL_SCHED mutex for long.  Paradoxically, by blocking
interrupts you're making sure you hold the mutex for the shortest time
possible, since the work won't be interrupted!