Bug#591362: linux-image-2.6.26-2-xen-686: domU hang and are unresponsive (was #534880)

Ben Hutchings Sun, 08 Aug 2010 17:21:22 -0700

On Wed, 2010-08-04 at 15:05 +0200, Zdenek Salvet wrote:
> On Wed, Aug 04, 2010 at 03:23:52AM +0100, Ben Hutchings wrote:
> > > I found root cause of the problem; after I added following fix to lenny 
> > > xen kernel, none of 56 domU froze again in one week of testing:
> > [...]
> > 
> > That sounds promising.  Is this patch based on a change made by the
> > upstream Xen or Linux developers?
> 
> No, it is my own fix and I have not reported it anywhere else yet.


OK.  It is clearly not applicable to the pvops version of Linux-for-Xen,
so it probably doesn't make sense to send it upstream.

> The deadlock it fixes is very similar to that fixed by
> bugfix/all/printk-robustify-printk.patch .

So you reckon xtime_lock is lower in the lock hierarchy than run-queue
locks?  I can't see any place where xtime_lock is obtained after a
run-queue lock, but this change nevertheless looks reasonable and safe.

Ian, any comment on this?

> --- source_amd64_xen/arch/x86/kernel/time_32-xen.c      2010-07-24 07:28:32.162719094 +0200
> +++ source_amd64_xen.new/arch/x86/kernel/time_32-xen.c  2010-07-24 07:26:32.416076711 +0200
> @@ -466,6 +466,7 @@
>  {
>         s64 delta, delta_cpu, stolen, blocked;
>         unsigned int i, cpu = smp_processor_id();
> +       int schedule_clock_was_set_work = 0;
>         struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu);
>         struct vcpu_runstate_info runstate;
>  
> @@ -525,12 +526,13 @@
>  
>         if (shadow_tv_version != HYPERVISOR_shared_info->wc_version) {
>                 update_wallclock();
> -               if (keventd_up())
> -                       schedule_work(&clock_was_set_work);
> +               schedule_clock_was_set_work = 1;
>         }
>  
>         write_sequnlock(&xtime_lock);
>  
> +       if (schedule_clock_was_set_work && keventd_up())
> +               schedule_work(&clock_was_set_work);
>         /*
>          * Account stolen ticks.
>          * HACK: Passing NULL to account_steal_time()
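
The pattern in the patch (record the decision while holding the lock, act
on it only after dropping the lock) can be sketched in userspace as
follows.  This is only an illustration, not kernel code: pthread mutexes
stand in for the kernel locks, and time_lock, queue_lock,
queue_clock_was_set(), set_wallclock(), timer_tick() and demo() are
hypothetical names.  It assumes, as the discussion above suggests, that
scheduling the work may take a lock that should not be nested inside
xtime_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: time_lock plays the role of xtime_lock,
 * queue_lock the role of whatever lock scheduling the work may take. */
static pthread_mutex_t time_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static int wallclock_version;	/* protected by time_lock */
static int seen_version;	/* protected by time_lock */
static int queued_work;		/* protected by queue_lock */

/* Stand-in for schedule_work(): takes its own lock internally. */
void queue_clock_was_set(void)
{
	pthread_mutex_lock(&queue_lock);
	queued_work++;
	pthread_mutex_unlock(&queue_lock);
}

/* Stand-in for the hypervisor changing the wallclock. */
void set_wallclock(int version)
{
	pthread_mutex_lock(&time_lock);
	wallclock_version = version;
	pthread_mutex_unlock(&time_lock);
}

/* Stand-in for the timer interrupt handler. */
void timer_tick(void)
{
	bool need_work = false;

	pthread_mutex_lock(&time_lock);
	if (seen_version != wallclock_version) {
		seen_version = wallclock_version;
		need_work = true;	/* only record the decision here */
	}
	pthread_mutex_unlock(&time_lock);

	/* time_lock is no longer held, so queue_lock is never nested
	 * inside it and the two cannot form a lock-order cycle. */
	if (need_work)
		queue_clock_was_set();
}

/* Usage: work is queued exactly once per wallclock change. */
void demo(void)
{
	timer_tick();
	assert(queued_work == 0);
	set_wallclock(1);
	timer_tick();
	assert(queued_work == 1);
	timer_tick();
	assert(queued_work == 1);
}
```

The flag costs one local variable and keeps the critical section free of
calls into other subsystems, which is the same trade the patch makes.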

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
