I thought I'd summarise where things stand with the 6.5 kernel.

We've fixed:
* the ARM LTP OOM lockup (kernel patch)
* the locale ARM selftest failure, which was an OOM caused by silly buffer
  allocations in 6.5 (kernel command line)
* the ARM jitterentropy errors (kernel patch)
* the cryptodev build failures (recipe updated)

We've also:
* disabled the strace tests that fail with 6.5
* made sure the number of serial ports matches the number of gettys
* added ttyrun, which wraps serial consoles so we can avoid hacks
* made qemurunner save the logs from all serial ports
* made qemurunner write the binary data it receives verbatim
* made sure qemu's tcp serial ports use nodelay
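As a rough illustration of the last few items (logging every port, writing received bytes verbatim, and nodelay), here is a minimal Python sketch of a reader that connects to one QEMU TCP serial port and appends whatever arrives to a log file untouched. This is not the actual qemurunner code; `log_serial_port` and its parameters are hypothetical. QEMU's own nodelay option applies to QEMU's end of the connection; setting `TCP_NODELAY` on the reader's socket, as done here, is the analogous setting on the other end and is an assumption for illustration.

```python
import socket


def log_serial_port(host, port, logpath, timeout=5.0):
    """Hypothetical helper: copy raw bytes from a TCP serial port into a log.

    A sketch only, not the qemurunner implementation. Returns the number
    of bytes written.
    """
    total = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Disable Nagle's algorithm on our side so small writes aren't batched.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        with open(logpath, "ab") as log:
            while True:
                try:
                    data = sock.recv(4096)
                except socket.timeout:
                    break  # nothing arrived within `timeout` seconds
                if not data:
                    break  # the other end closed the connection
                # Write the bytes exactly as received: no decoding, no filtering.
                log.write(data)
                log.flush()
                total += len(data)
    return total
```

Keeping the log in binary mode means CR, LF and any non-UTF-8 bytes survive exactly as the guest sent them.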

This leaves an annoying serial console problem where the getty login
prompt never appears on ttyS1.

What we know:

* Most recently (yesterday/today) we've only seen this on x86, but we
saw it on ARM in the days before that.

* It affects both sysvinit and systemd images.

* When the failure occurs, systemd does print that it started a getty
on both ttyS0 and ttyS1.

* According to "ps", a getty is running when the failure occurs.

* In the failure case, ttyS1 only ever receives one or three characters
(0x0d and 0x0a, i.e. CR and LF).

* It can't be any kind of UTF-8 conversion issue, since the login
prompt isn't visible in the binary log.

* The kernel boot logs do show the serial port being created with the
same ioport and irq on x86.
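To check a saved binary log against the failure signature described above, a small script can report whether the raw bytes contain a login prompt and whether everything received is CR/LF. This is a hypothetical helper, not something in master-next; the `b"login:"` marker is an assumption about what getty prints. Because it searches the undecoded bytes, a UTF-8 decoding problem can't hide the prompt from it.

```python
import sys


def classify_serial_log(data: bytes, prompt: bytes = b"login:") -> str:
    """Hypothetical triage of raw bytes captured from a serial port."""
    if prompt in data:
        return "prompt"
    if not data:
        return "empty"
    # The failure case: only CR (0x0d) and/or LF (0x0a) bytes arrived.
    if all(b in (0x0d, 0x0a) for b in data):
        return "crlf-only"
    return "other"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            raw = f.read()
        print(f"{path}: {len(raw)} bytes, {classify_serial_log(raw)}, {raw[:16]!r}")
```

For example, running it (under a hypothetical name such as `classify_log.py`) over a downloaded `qemu_boot_log.20231007084853.2` would print the byte count, the classification and the first few bytes.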

Previously we saw some logs with timing issues on the ttyS0 port, but
the nodelay parameter may have helped with that.

There are debug patches against qemurunner in master-next which use
ttyS0 to poke around and gather more debug information when things fail.

The best failure log we have is now this one:

https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/5874/steps/14/logs/stdio

and I've saved the corresponding logs at:

https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/j/qemu_boot_log.20231007084853
and
https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/j/qemu_boot_log.20231007084853.2

You can see that ttyS1 times out after 1000 seconds with only a single
byte received on the port (in the .2 file). The other log contains ps
output showing the getty running for ttyS1.

Ideas on where to go from here are welcome.

I've tweaked master-next to keep reading the ttyS1 port after we poke
it from ttyS0, to see whether that reveals anything the next time it
fails (a build is running).
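Continuing to read ttyS1 after poking it could look roughly like the sketch below: a loop that keeps reading from an already-connected socket until a prompt appears or a deadline passes, and returns every byte seen either way. `read_until_prompt`, its parameters and the `b"login:"` marker are hypothetical; this is not how master-next actually implements it.

```python
import select
import time


def read_until_prompt(sock, prompt=b"login:", timeout=10.0):
    """Hypothetical: read from `sock` until `prompt` appears or `timeout` expires.

    Returns (found, data), where data is every byte received, so partial
    output is kept for the log even when the prompt never shows up.
    """
    deadline = time.monotonic() + timeout
    data = bytearray()
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False, bytes(data)
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            continue  # select timed out; the loop re-checks the deadline
        chunk = sock.recv(4096)
        if not chunk:
            return False, bytes(data)  # connection closed
        data += chunk
        if prompt in data:
            return True, bytes(data)
```

Returning the partial data on timeout matters here: in the failure case, the lone CR/LF bytes would still reach the log instead of being discarded.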

Cheers,

Richard
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#188800): 
https://lists.openembedded.org/g/openembedded-core/message/188800
-=-=-=-=-=-=-=-=-=-=-=-
