On Tue, 2023-08-15 at 09:08 -0400, Paul Gortmaker wrote:
> [Dilemma on changes - merge or not to merge (e.g. 6.4)] On 14/08/2023 (Mon 10:54) Richard Purdie wrote:
> 
> > I'm becoming a little weary/wary of some of the changes that are coming
> > in. The challenge is that once they merge, issues become the problem of
> > a very small number of people.
> > 
> > My current dilemma is the 6.4 kernel. People would like it, we'd really
> > ideally use it for the next release but there are issues.
> > 
> > I've worked through a few, at least pinning down where the issues were
> > then resolving them with the help of others (thanks Bruce, Jon, Ross).
> > 
> > Remaining are:
> >   * an error upon boot on preempt-rt on qemux86-64
> >      (e.g. https://autobuilder.yoctoproject.org/typhoon/#/builders/72/builds/7616/steps/36/logs/stdio)
> >      We'll probably just have to ignore it in parselogs as it has been
> >      around for a while and nobody seems interested in fixing it upstream.
> 
> Just back from vacation and I see an internal report of 10-ish at boot
> 
>   NOHZ tick-stop error: local softirq work is pending, handler #80!!!
> 
> ..on the 6.1.43-rt10-yocto-preempt-rt kernel, on real hardware.  So it
> seems we can't blame that one entirely on v6.4 kernel (or qemu).

That lets us rule out qemu and maybe look at "stable" series updates?
Any idea if it is there in early 6.1.x or just appeared?

> We used to get (late 3.x and 4.x era) pretty common "NOHZ: local softirq
> pending" messages even on common/popular distro kernels.  But I haven't
> seen those for a long time and they didn't scream "error" or have the
> alarmist three exclamation marks either.

When I was looking around I did see a commit which "clarified" the
message adding the "error" keyword...

> I'll see if I can dig into that further.  This instance is new to me, so
> any additional context or information I might not turn up myself would
> be useful.

Thanks. I don't really have any at this point, I've just been
collecting the failures. Bruce may have more. I have a few too many
issues going on at once atm.

> >   * some random hangs:
> >      
> > https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/349/steps/12/logs/stdio
> >      
> > https://autobuilder.yoctoproject.org/typhoon/#/builders/148/builds/354/steps/12/logs/stdio
> > 
> > The latter are rare and intermittent, mainly taking out CI test builds.
> > Most people aren't affected by them, find them hard to reproduce let
> > alone fix and will ignore them. That will leave me/Bruce/PaulG holding
> > the pieces.
> 
> Ugh.  The RCU one is ugly and the Silent Boot Death one is no better.
> Nobody likes SBD cases.  They suck.

They do indeed.

> > I know Bruce spends a ton of time debugging weird things just to get
> > the kernel to the point we can even consider merging and nobody ever
> > really sees or appreciates that work :(.
> 
> Well, not "nobody".  There are at least two people who have a good idea
> of what Bruce does.  :-P

Too few would have been more accurate I guess :)

Cheers,

Richard