On Wed, Jun 28, 2017 at 05:04:12PM +0800, Qiao Zhou wrote:
> In the current die(), irqs are disabled only for the __die() handling,
> not for the possible panic() handling. Since the logging in __die()
> can take several hundred ms, a new irq might arrive and interrupt
> the current die().
> 
> If the process calling die() holds some critical resource, and some
> other process scheduled later also needs it, then the system would
> deadlock and the first panic would never be executed.
> 
> So disable irqs for the whole flow of die().

Could you give an example of this going wrong, please?

> 
> Signed-off-by: Qiao Zhou <[email protected]>
> ---
>  arch/arm64/kernel/traps.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 0805b44..b12bf0f 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -274,10 +274,13 @@ static DEFINE_RAW_SPINLOCK(die_lock);
>  void die(const char *str, struct pt_regs *regs, int err)
>  {
>       int ret;
> +     unsigned long flags;
> +
> +     local_irq_save(flags);
>  
>       oops_enter();
>  
> -     raw_spin_lock_irq(&die_lock);
> +     raw_spin_lock(&die_lock);

Can we instead move the taking of the die_lock before oops_enter, or does
that break something else?

>       console_verbose();
>       bust_spinlocks(1);
>       ret = __die(str, err, regs);
> @@ -287,13 +290,16 @@ void die(const char *str, struct pt_regs *regs, int err)
>  
>       bust_spinlocks(0);
>       add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
> -     raw_spin_unlock_irq(&die_lock);
> +     raw_spin_unlock(&die_lock);
>       oops_exit();
>  
>       if (in_interrupt())
>               panic("Fatal exception in interrupt");
>       if (panic_on_oops)
>               panic("Fatal exception");
> +
> +     local_irq_restore(flags);

We could also move the raw_spin_unlock_irq() down here.
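
To make the two suggestions concrete, here is a rough user-space sketch of
the ordering they would give. It is not kernel code: every function below
is a hypothetical stub that only records its name, report_oops() stands in
for the console_verbose()/bust_spinlocks()/__die() sequence, and die_lock is
a plain int rather than a raw spinlock. The real panic() does not return,
so the final unlock is only reached when neither panic() call fires.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel helpers: each one only logs. */
static char trace[256];

static void record(const char *name)
{
	strcat(trace, name);
	strcat(trace, ";");
}

static int die_lock;		/* placeholder for the real raw spinlock */
static int panic_on_oops;	/* left at 0 so the flow reaches the unlock */

static void raw_spin_lock_irq(int *lock)
{
	assert(!*lock);		/* the stub cannot spin, so catch re-entry */
	*lock = 1;
	record("lock_irq");
}

static void raw_spin_unlock_irq(int *lock)
{
	*lock = 0;
	record("unlock_irq");
}

static void oops_enter(void)   { record("oops_enter"); }
static void oops_exit(void)    { record("oops_exit"); }
static void report_oops(void)  { record("report_oops"); }
static int  in_interrupt(void) { return 0; }

/* The real panic() never returns; this stub just logs the call. */
static void panic(const char *msg)
{
	(void)msg;
	record("panic");
}

/* Ordering with both suggestions applied. */
static void die_sketch(void)
{
	raw_spin_lock_irq(&die_lock);	/* taken before oops_enter() */

	oops_enter();
	report_oops();
	oops_exit();

	if (in_interrupt())
		panic("Fatal exception in interrupt");
	if (panic_on_oops)
		panic("Fatal exception");

	raw_spin_unlock_irq(&die_lock);	/* released after the panic checks */
}
```

With this ordering irqs stay off from before oops_enter() until after the
panic() checks, without needing the separate local_irq_save()/restore pair.
Whether holding die_lock across oops_enter()/oops_exit() is safe is exactly
the open question above.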

Will
