We found cases where a few of the per-arch *_cpu_exec_interrupt functions call the CPU class's cc->do_interrupt function pointer (for example, arm_cpu_exec_interrupt does this).
This is an issue because *_cpu_exec_interrupt will hold the BQL across the call to cc->do_interrupt, and *_do_interrupt will also try to acquire the BQL. Most of the arches do not have this issue because they call their arch's *_do_interrupt function directly, and in those cases we will change them to call *_do_interrupt_locked instead.

We see a few possible solutions to this:

1) We could go back to the option of conditionalizing the BQL inside these few
   *_do_interrupt functions, only acquiring the BQL if it is not already held.
   I counted 3 different arches that directly use the ->do_interrupt function
   pointer from their *_cpu_exec_interrupt.

2) Another, perhaps cleaner, option is to add a new CPU class function,
   ->do_interrupt_locked. This lets callers like *_cpu_exec_interrupt call
   ->do_interrupt_locked with the lock held, and solves the issue without
   resorting to conditional locking.

Another benefit we could gain from this approach is to simplify our solution overall by adding a common do_interrupt function:

void cpu_common_do_interrupt(CPUState *cpu)
{
    CPUClass *cc = CPU_GET_CLASS(cpu);

    qemu_mutex_lock_iothread();
    cc->do_interrupt_locked(cpu);
    qemu_mutex_unlock_iothread();
}

cc->do_interrupt would be set to cpu_common_do_interrupt by default in cpu_class_init. In other words, the base CPU class would handle holding the BQL for us, and we would not need to implement a new *_do_interrupt function for each arch.

We are thinking that 2) would be a good option. What are the opinions on these possible solutions? Or are there other solutions that we should consider here?

Thanks & Regards,
-Rob

On Thu, 6 Aug 2020 at 16:04, Robert Foley <robert.fo...@linaro.org> wrote:
>
> The comment around documenting the cpu_mutex fields and critical sections
> got us thinking and revisiting our locking assumptions in
> cpu_handle_interrupt.
>
> Initially we were thinking that removing the BQL from cpu_handle_interrupt
> meant that we needed to replace it with the cpu mutex to protect the per cpu
> data that is accessed like interrupt_request. We are reconsidering this and
> now thinking that the cpu mutex might not be needed here.
>
> BQL is clearly needed to protect the critical section around the call to
> ->cpu_exec_interrupt. What else is the BQL protecting in cpu_handle_interrupt
> that we need to consider? Are we missing anything here?
>
> It's also worth mentioning that we did give this approach a try.
> We tried out changes to cpu_handle_interrupt(),
> dropping the BQL from all but around ->cpu_exec_interrupt, and not using the
> cpu_mutex at all. It seemed to be functional, passing all the tests that
> we tried (so far). :)
>
> Thanks,
> -Rob
>
> On Thu, 6 Aug 2020 at 12:11, Robert Foley <robert.fo...@linaro.org> wrote:
> >
> > On Thu, 6 Aug 2020 at 05:22, Paolo Bonzini <pbonz...@redhat.com> wrote:
> > >
> > > On 05/08/20 21:18, Richard Henderson wrote:
> > > > On 8/5/20 11:12 AM, Robert Foley wrote:
> > > >> This change removes the implied BQL from the cpu_handle_interrupt,
> > > >> and cpu_handle_exception paths. This BQL acquire is being pushed
> > > >> down into the per arch implementation.
> > > >>
> > > >> Signed-off-by: Robert Foley <robert.fo...@linaro.org>
> > > >> ---
> > > >>  accel/tcg/cpu-exec.c | 19 +++++++++++--------
> > > >>  1 file changed, 11 insertions(+), 8 deletions(-)
> > > >>
> > > >> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > > >> index 80d0e649b2..8e2bfd97a1 100644
> > > >> --- a/accel/tcg/cpu-exec.c
> > > >> +++ b/accel/tcg/cpu-exec.c
> > > >> @@ -517,9 +517,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
> > > >>  #else
> > > >>      if (replay_exception()) {
> > > >>          CPUClass *cc = CPU_GET_CLASS(cpu);
> > > >> -        qemu_mutex_lock_iothread();
> > > >>          cc->do_interrupt(cpu);
> > > >> -        qemu_mutex_unlock_iothread();
> > > >>          cpu->exception_index = -1;
> > > >>
> > > >
> > > > This patch is not bisectable.  The removal of the lock here needs to
> > > > happen at the end, or something.
> > >
> > > Indeed the series should be structured like this:
> > >
> > > 1) rename all *_do_interrupt functions to *_do_interrupt_locked
> > >
> > > 2) add back *_do_interrupt that takes the BQL and calls
> > > *_do_interrupt_locked, point ->do_interrupt to it, remove the BQL from
> > > cpu-exec.c
> > >
> > > 3) modify the cpu_mutex and BQL critical sections around
> > > ->cpu_exec_interrupt, so that the BQL critical section covers just the
> > > call to ->cpu_exec_interrupt.  Document which fields are now covered by
> > > cpu_mutex.
> > >
> > > 4/5) same as 1/2 for ->cpu_exec_interrupt
> > >
> > > Patches 1/2 would be pretty large, but they're trivial to review just by
> > > grepping for "->do_interrupt\s*=", and likewise for 4/5.
> > >
> >
> > Thanks for the details !
> >
> > It seems like we will have 3 separate patches for this series, 1/2, 3, and 4/5.
> >
> > We will go in this direction.
> >
> > Thanks,
> > -Rob
> >
> > > Thanks,
> > >
> > > Paolo
> > >