Hi, the regression tracker here. What's the status of this issue? Was the problem fixed? It seems nothing happened for more than 10 days -- or did the discussion move somewhere else? Ciao, Thorsten
On 20.09.2017 02:30, Chuck Ebbert wrote:
> On Tue, 19 Sep 2017 16:51:06 +0100
> Marc Zyngier <[email protected]> wrote:
>
>> On 19/09/17 16:40, Yanko Kaneti wrote:
>>> On Tue, 2017-09-19 at 16:33 +0100, Marc Zyngier wrote:
>>>> On 19/09/17 16:12, Yanko Kaneti wrote:
>>>>> Hello,
>>>>>
>>>>> Fedora rawhide config here.
>>>>> AMD FX-8370E
>>>>>
>>>>> Bisected a problem to:
>>>>> 74def747bcd0 (genirq: Restrict effective affinity to interrupts
>>>>> actually using it)
>>>>>
>>>>> It seems to be causing stalls, short-lived or long-lived lockups
>>>>> very shortly after boot. Everything becomes jerky.
>>>>>
>>>>> The only visible indication in the log is something like:
>>>>> ....
>>>>> [   59.802129] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
>>>>> [   59.802134] clocksource:                       'hpet' wd_now: 3326e7aa wd_last: 329956f8 mask: ffffffff
>>>>> [   59.802137] clocksource:                       'tsc' cs_now: 423662bc6f cs_last: 41dfc91650 mask: ffffffffffffffff
>>>>> [   59.802140] tsc: Marking TSC unstable due to clocksource watchdog
>>>>> [   59.802158] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
>>>>> [   59.802161] sched_clock: Marking unstable (59802142067, 15510)<-(59920871789, -118714277)
>>>>> [   60.015604] clocksource: Switched to clocksource hpet
>>>>> [   89.015994] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 209.660 msecs
>>>>> [   89.016003] perf: interrupt took too long (1638003 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
>>>>> ....
>>>>>
>>>>> Just reverting that commit on top of Linus' mainline cures all the
>>>>> symptoms.
>>>>
>>>> Interesting. Do you still get HPET interrupts?
>>>
>>> Sorry, I might need some basic help here (i.e. where do I count
>>> them...)
>>
>> /proc/interrupts should display them.
>>
>>> After the watchdog switches the clocksource to hpet the system is
>>> still somewhat alive, so I'll guess some clock is still
>>> ticking....
>>
>> Probably, but I suspect they're not hitting the right CPU, hence the
>> lockups.
>>
>> Unfortunately, my x86-foo is pretty minimal, and I'm about to drop off
>> the net for a few days.
>>
>> Thomas, any insight?
>
> Looking at flat_cpu_mask_to_apicid(), I don't see how 74def747bcd0
> can be correct:
>
>	struct cpumask *effmsk = irq_data_get_effective_affinity_mask(irqdata);
>	unsigned long cpu_mask = cpumask_bits(mask)[0] & APIC_ALL_CPUS;
>
>	if (!cpu_mask)
>		return -EINVAL;
>	*apicid = (unsigned int)cpu_mask;
>	cpumask_bits(effmsk)[0] = cpu_mask;
>
> Before that patch, this function wrote to the effective mask
> unconditionally. After, it only writes to effective_mask if it is
> already non-zero.
>
> http://news.gmane.org/find-root.php?message_id=20170919203044.560cb9f1%40gmail.com
> http://mid.gmane.org/20170919203044.560cb9f1%40gmail.com
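For anyone following the thread without the x86 APIC code in their head, below is a rough userspace-only sketch of the computation in the snippet Chuck quotes. It is an illustration under assumptions, not the kernel code: only APIC_ALL_CPUS (0xFF, one destination bit per CPU in flat logical mode) and the overall shape are taken from the quoted snippet; the struct cpumask / struct irq_data plumbing is replaced by plain 64-bit words, and flat_mask_to_apicid() is a made-up stand-in name.

	/*
	 * Userspace model of the quoted snippet, for illustration only --
	 * NOT the kernel implementation.  A plain 64-bit word stands in
	 * for cpumask_bits(mask)[0] and for the effective-affinity mask.
	 */
	#include <errno.h>
	#include <stdint.h>
	#include <stdio.h>

	#define APIC_ALL_CPUS 0xFFu  /* flat logical mode addresses CPUs 0-7 only */

	static int flat_mask_to_apicid(uint64_t affinity_word,
				       uint64_t *effective_word,
				       unsigned int *apicid)
	{
		uint64_t cpu_mask = affinity_word & APIC_ALL_CPUS;

		if (!cpu_mask)
			return -EINVAL;           /* no addressable CPU in the mask */

		*apicid = (unsigned int)cpu_mask; /* logical destination bitmap */
		*effective_word = cpu_mask;       /* effective affinity := low 8 bits */
		return 0;
	}

	int main(void)
	{
		uint64_t eff;
		unsigned int apicid;

		/* affinity allows CPUs 0 and 3 -> logical destination 0x09 */
		if (!flat_mask_to_apicid(0x09, &eff, &apicid))
			printf("apicid=%#x effective=%#llx\n",
			       apicid, (unsigned long long)eff);

		/* affinity only names CPU 12 -> flat mode cannot target it */
		if (flat_mask_to_apicid(1ULL << 12, &eff, &apicid) == -EINVAL)
			printf("CPUs above 7 are not addressable in flat mode\n");

		return 0;
	}

The part that matters for the regression is the last assignment (the write to the effective-affinity mask), which is what Chuck's before/after comparison is about and what Marc suspects above: the HPET interrupts may still fire but no longer hit the CPU the kernel thinks they do.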

