On 02/06/19(Sun) 15:50, Visa Hankala wrote: > On Sun, Jun 02, 2019 at 10:29:01AM -0300, Martin Pieuchot wrote: > > On 02/06/19(Sun) 04:30, Visa Hankala wrote: > > > On Sat, Jun 01, 2019 at 07:04:23PM -0300, Martin Pieuchot wrote: > > > > On 01/06/19(Sat) 23:22, Mark Kettenis wrote: > > > > > > Date: Sat, 1 Jun 2019 17:32:52 -0300 > > > > > > From: Martin Pieuchot <m...@openbsd.org> > > > > > > > > > > > > Currently it isn't safe to call mtx_enter_try(9) if you're already > > > > > > holding the mutex. That means it isn't safe to call that function > > > > > > in hardclock(9), like with `windup_mtx'. That's why the mutex needs > > > > > > to be initialized as IPL_CLOCK. > > > > > > > > > > > > I'm working on removing the SCHED_LOCK() from inside hardclock(9). > > > > > > That leads me to wonder if I should initialize all mutexes to > > > > > > IPL_SCHED, > > > > > > possibly blocking clock interrupts, or if we should change the > > > > > > mutex API > > > > > > to allow mtx_enter_try(9) to deal with recursion. > > > > > > > > > > > > The diff below removes the recursion check for mtx_enter_try(9). > > > > > > > > > > > > Comments? Oks? > > > > > > > > > > My initial reaction is that if you're trying to lock when you already > > > > > have the lock, there is something wrong with your locking strategy and > > > > > that this is something we don't want. > > > > > > > > Could you elaborate? Are you saying that preventing hardclock(9) to run > > > > is the way to move forward to unlock its internals? Why isn't that > > > > strategy wrong? > > > > > > > > In the `windup_mtx' case, does it matter if the mutex is taken by > > > > another CPU or by myself? What's the problem when CPU0 is one holding > > > > the lock? > > > > > > mutex(9) is not and should not become recursive. Recursive locking > > > works when it is voluntary. If recursion was allowed with interrupts, > > > the CPU could re-enter the critical section at any moment, possibly > > > seeing inconsistent state or breaking assumptions made by the original > > > entry. > > > > I don't understand how your answer is related to my diff :( > > > > Why allowing mtx_enter_try(9) to fail and not to panic(9) when the CPU > > executing the code already owns the mutex is wrong? > > > > Is that what you call recursion? Not entering the critical section in > > the interrupt instead of blocking the interrupt? > > > > How is that different from not entering the critical section because > > another CPU is holding the mutex? > > To me, it looks very suspicious if a CPU tries to lock a mutex that it > already holds because it indicates that there is a potential recursion. > My stance is that such a locking behaviour is a design error in general > and the current way of checking recursion in mtx_enter_try() is sound.
But why? Nobody is answering my question :o) You're saying recursion is bad because there's recursion. How is that different from using mtx_enter_try(9) to protect from another CPU being in the critical section? Why do you call that a design error? I don't mind putting all mutexes at IPL_SCHED. I would like to understand why it matters *who* is holding the mutex. Since mtx_enter_try(9) is here to not enter the critical section for $reasons, setting an ipl of IPL_CLOCK sounds just like a special corner case. You might understand why I'm asking. We are trying to get more stuff done in parallel. That implies separating parts of the kernel logically. Blocking IPL_CLOCK is a big hammer, it block a lot of stuff. So yes it is easy to use, but my feeling is that this simplicity comes at a cost, latency wise. Anyway I'll drop this diff. I'll try to thing about your point :o)