This appears to occur because of two separate issues:

1. When coming out of resume, the Time Stamp Counter (TSC) [1] seems to
have its top 32 bits set to 0xffffffff, making a very large 64-bit TSC
value.

2. This TSC value is converted to nanosecond resolution (this conversion
is accurate and the calculation is correct). get_timestamp() then shifts
it right by 30 bits to get approximately 1-second resolution (2^30 ns is
about 1.07 s) and returns the time in seconds as a 32-bit value. The
softlockup handler uses this 32-bit seconds value to check for CPU task
lockups.
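A minimal user-space sketch of the conversion in point 2 follows.
sketch_get_timestamp() is a hypothetical stand-in, not the kernel
function: it assumes its input is already in nanoseconds and uses
uint32_t to model the 32-bit unsigned long of a 32-bit kernel.

```c
#include <stdint.h>

/* Hypothetical stand-in for get_timestamp(): shift a nanosecond value
 * right by 30 (units of 2^30 ns, about 1.07 s) and truncate the result
 * to 32 bits, as happens when it is stored in a 32-bit unsigned long. */
static uint32_t sketch_get_timestamp(uint64_t ns)
{
    return (uint32_t)(ns >> 30);
}
```

For example, sketch_get_timestamp(10000000000ULL) (10 s) gives 9, and an
input of 0x3ffffffc00000000 ns gives 0xfffffff0, right at the top of the
32-bit range.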

However, when the TSC is 0xffffffff00000000 or more, get_timestamp()
returns values in the range 0xfffffff0 to 0xffffffff, which overflow the
32-bit additions in the following statements in softlockup_tick() (see
kernel/softlockup.c):

        if (now > touch_ts + softlockup_thresh/2)
                wake_up_process(per_cpu(softlockup_watchdog, this_cpu));

        /* Warn about unreasonable delays: */
        if (now <= (touch_ts + softlockup_thresh))
                return;
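To see the wrap-around concretely, here is a user-space model of the two
tests quoted above. The function names are hypothetical, uint32_t stands
in for the 32-bit unsigned long, and a 60-second threshold is assumed
for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "now > touch_ts + softlockup_thresh/2" with 32-bit
 * arithmetic: the sum wraps modulo 2^32 as it would on a 32-bit kernel.
 * Returns true if the watchdog thread would be woken. */
static bool wakes_watchdog_32(uint32_t now, uint32_t touch_ts,
                              uint32_t thresh)
{
    return now > (uint32_t)(touch_ts + thresh / 2);
}

/* Model of the "now <= touch_ts + softlockup_thresh" early return.
 * Returns true if the code would fall through and warn. */
static bool warns_lockup_32(uint32_t now, uint32_t touch_ts,
                            uint32_t thresh)
{
    return !(now <= (uint32_t)(touch_ts + thresh));
}
```

With now == touch_ts == 0xfffffff0 and a threshold of 60, touch_ts + 30
wraps to 0xe and touch_ts + 60 wraps to 0x2c, so both tests fire: the
watchdog is woken and a lockup is reported although 0 seconds have
elapsed, consistent with the "stuck for 0s" message in the bug title.
With small values such as now == touch_ts == 100, neither fires.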
 
These overflows can be avoided by casting the unsigned longs to unsigned
long longs before the additions. This works around the absurdly large TSC
value after resume. I will discuss my patch with Ingo Molnar and raise
the TSC issue with Intel, since both need to be fixed.
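A user-space sketch of the casting workaround follows. It illustrates
the idea only and is not the patch itself; the helper names are
hypothetical and uint32_t models the 32-bit unsigned long.

```c
#include <stdbool.h>
#include <stdint.h>

/* Widen both operands to unsigned long long (at least 64 bits) before
 * adding, so touch_ts + thresh cannot wrap for 32-bit timestamps.
 * Returns true if the watchdog thread would be woken. */
static bool wakes_watchdog_64(uint32_t now, uint32_t touch_ts,
                              uint32_t thresh)
{
    return (unsigned long long)now >
           (unsigned long long)touch_ts + thresh / 2;
}

/* Returns true if the lockup warning would be printed. */
static bool warns_lockup_64(uint32_t now, uint32_t touch_ts,
                            uint32_t thresh)
{
    return !((unsigned long long)now <=
             (unsigned long long)touch_ts + thresh);
}
```

Because the addition is done in 64 bits, 0xfffffff0 + 60 evaluates to
0x10000002c instead of wrapping to 0x2c, so a timestamp near the top of
the 32-bit range no longer triggers a spurious lockup report, while a
genuine delay (for example now = 100, touch_ts = 30, threshold 60) is
still reported.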

I believe only 32-bit kernels see this bug, since unsigned long is 64
bits wide on 64-bit kernels. Even then, it will only occur when the TSC
is very large, which only happens if the TSC is broken (as in our
Arrandale resume state) or a machine has been running for well over a
century.
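For a clock that is behaving normally, the uptime at which the shifted
seconds value reaches 2^32 can be estimated directly: that happens when
the nanosecond count reaches 2^32 << 30 = 2^62. The helper below is
hypothetical and assumes a 365.25-day year.

```c
#include <stdint.h>

/* Uptime, in years, at which a nanosecond clock shifted right by 30
 * reaches 2^32 and no longer fits in a 32-bit unsigned long. */
static double years_until_32bit_wrap(void)
{
    const double ns = (double)(UINT64_C(1) << 62);  /* 2^32 << 30 */
    const double secs_per_year = 365.25 * 24.0 * 60.0 * 60.0;
    return ns / 1e9 / secs_per_year;
}
```

This comes out at roughly 146 years.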

References:

[1] TSC - http://en.wikipedia.org/wiki/Time_Stamp_Counter

-- 
[1266874888.891025] BUG: soft lockup - CPU#3 stuck for 0s! [status:2072]
https://bugs.launchpad.net/bugs/526564

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
