On 11/23/24 02:28, Stafford Horne wrote:
Just a guess, but given the alignment change, I suspect it's barfing on the
statically linked initramfs? That seems the most likely step to go off the
rails given the failing patch is a symbol alignment change in the flattened
device tree plumbing, and I think the initramfs extractor parses device
trees very early on to find stuff (I forget why). Moving "where the data
lives" without a corresponding change to the "where to look for the data"
code seems a bit strange, but it's not my area...
OK, and the broken earlycon may be masking what is going on, as we should at
least see some console output before things fail. The earlycon fix is in 6.13,
not 6.12.
I was able to test your or1k.tgz image and figure out what is wrong. Your
run-qemu.sh script has 'console=FIXME'. The kernel picks up this command line
argument and then cannot find the console during boot.
Changing it to 'console=ttyS0' allows me to see the output.
Ha, so it STARTED parsing console= and broke. Oops. (It was there so I'd
notice...)
I put a branch with the qemu patches I have here:
- https://github.com/stffrdhrn/qemu/tree/or1k-9.2.0-fixes-1
Here's the miniconfig I built 6.12 with (90% of which is generic to all the
architectures I'm testing, the sections are labeled. The console="FIXME" bit
is because I can't get qemu-system-or1k -append "blah" to go through to
linux, so I stuck FIXME in that field for the or1k target and it wound up in
the output):
The kernel command line is injected by qemu into the qemu generated
devicetree. I notice when I boot your kernel with the reverted FDT alignment
fix the console prints:
Kernel command line: earlycon
This means that the qemu devicetree is not being used, hence the command line
args are not working. The qemu device tree not being used is not good, but that
is why reverting the alignment fix 'seems' to fix the issue. To me the revert
looks to be breaking the qemu devicetree allowing us to fall back to the kernel
supplied devicetree.
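
A rough way to catch this in a regression test is to compare the arguments
handed to qemu against the "Kernel command line:" line in the boot log. The
sketch below is only an illustration: the function names are made up, and it
assumes the boot log has been captured as text and that the arguments were
passed as a single space-separated string.

```python
import re


def kernel_command_line(boot_log):
    """Return the text after 'Kernel command line:' in a boot log, or None."""
    match = re.search(r"^.*Kernel command line:[ \t]*(.*)$", boot_log, re.MULTILINE)
    return match.group(1).strip() if match else None


def qemu_devicetree_used(boot_log, appended_args):
    """Heuristic: True if every argument given to qemu reached the kernel.

    If the kernel fell back to its builtin devicetree, the arguments
    injected by qemu will be missing from the reported command line.
    """
    cmdline = kernel_command_line(boot_log)
    if cmdline is None:
        return False
    seen = cmdline.split()
    return all(arg in seen for arg in appended_args.split())
```

With the log line quoted above ("Kernel command line: earlycon") and
console=ttyS0 passed to qemu, qemu_devicetree_used() returns False, which
flags the fallback instead of silently passing.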
I'm happy to do it the "right" way if I know what that is. I just
stumbled around and got it to work.
Also, looking at that, I'm using a builtin DTB and you might be passing one
in via -dtb? Another thing the alignment change might break...
Thanks for the steps. I was just using the or1k.tgz you provided earlier. The
above will help if I want to try some kernel fixes on my own.
I'm attempting to regression test as many targets as I can to get
consistent basic behavior out of:
https://landley.net/bin/mkroot/0.8.11/
I'm trying to get a new release out with the 6.12 kernel which is why
I'm revisiting this now.
I've even got a test script that runs all the targets under qemu
(booting them in parallel even) and checks that A) they boot and run
userspace, B) they can talk to an emulated hard disk, C) they can talk
to an emulated network, D) the clock gets set reasonably, E) it knows
how to exit the emulator. You'd be surprised how many regressions there
are in just that...
Speaking of which, is there a way to get or1k to exit the emulator? I
told the kernel to reboot but it says "reboot failed, system halted" and
hangs instead of exiting qemu. (My testroot runs qemu under "timeout -i
10" to kill it after 10 seconds of inactivity, I.E. nothing written to
stdout, but it still counts as a failure on one of the criteria.)
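
For anyone without an inactivity timeout handy, the idea can be approximated
in a few lines of Python. This is a minimal sketch, not what the test script
actually does: the function name is hypothetical, it relies on select() over
a pipe (so POSIX only), and it merges stderr into stdout for simplicity.

```python
import os
import select
import subprocess
import sys


def run_with_inactivity_timeout(argv, idle_seconds):
    """Run argv and kill it once it has written nothing for idle_seconds.

    Returns (output_bytes, timed_out).
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    chunks = []
    timed_out = False
    while True:
        # Wait for output; an empty result means the child stayed silent.
        ready, _, _ = select.select([fd], [], [], idle_seconds)
        if not ready:
            timed_out = True
            proc.kill()
            break
        data = os.read(fd, 4096)
        if not data:  # EOF: the child closed its output, normally by exiting
            break
        chunks.append(data)
    proc.wait()
    proc.stdout.close()
    return b"".join(chunks), timed_out
```

A run that ends with timed_out set to True corresponds to the "reboot failed,
system halted" hang: the output may look fine, but the emulator never exited
on its own.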
Note, I did find some issues with the kernel not properly handling stdout-path.
It seems that if there are multiple uarts the first one will always be used as
the default uart. Only the console= command line argument can be used to
override that.
I've never managed to get console= to go through to linux in
qemu-system-or1k. The above tries but is ignored.
As I mentioned above this is a good clue and explains why the alignment "fix"
fixes your issue.
Happy to do it properly. Almost all the other targets can do console=,
the FIXME was there to highlight the fact it didn't work right.
(Silently working for the WRONG REASON is still bad when regression
testing.)
It's also doing a statically linked initramfs because -initrd didn't work on
this target. Happy to update if it's been fixed, the other targets are
almost all using -initrd to feed in an external cpio.gz
Using -initrd should work. But also using the statically linked initramfs
should be fine too. The setup I use for testing uses virt with a virtio block
driver.
Most of the other targets _don't_ use builtin initramfs, so you can swap
them out "aftermarket" as it were. When it's separate you can examine
and edit the contents without rebuilding the kernel...
When using qemu with -initrd, qemu will pack the kernel, initrd and fdt one after
the other into memory as follows:
[ kernel ] - Loads from 0x100 (based on elf layout)
[ initrd ] - page aligned
[ fdt ] - page aligned devicetree (revert moved to 4 bytes aligned)
The fdt address gets placed into r3 which the kernel uses to find the qemu FDT.
Finding the FDT is one of the first steps of the boot process.
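
The layout described above can be sketched as simple address arithmetic. This
is only an illustration under stated assumptions: the 8 KiB page size, the
image sizes in the usage note, and the function names are mine, not taken
from qemu, and the kernel size is assumed to be measured from its 0x100 load
address.

```python
PAGE_SIZE = 8192      # assumption: 8 KiB pages on or1k
KERNEL_START = 0x100  # kernel load address from the layout above


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment


def layout(kernel_size, initrd_size, fdt_align=PAGE_SIZE):
    """Return start addresses for kernel, initrd and fdt packed back to back."""
    kernel_end = KERNEL_START + kernel_size
    initrd_start = align_up(kernel_end, PAGE_SIZE)
    initrd_end = initrd_start + initrd_size
    fdt_start = align_up(initrd_end, fdt_align)  # this address goes into r3
    return {"kernel": KERNEL_START, "initrd": initrd_start, "fdt": fdt_start}
```

For example, layout(0x500000, 0x123456) puts the fdt at 0x626000, while
layout(0x500000, 0x123456, fdt_align=4) models the revert and puts it at
0x625458, which is no longer page aligned.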
I updated my mkroot config:
https://github.com/landley/toybox/commit/fb3ca98e2faa
I.E. changed the FIXME to ttyS0, removed BUILTIN=1 so it's no longer
statically linking the initramfs image, and yanked the builtin DTB, and
the result works with v9.2.0-rc1.
Still doesn't know how to exit qemu, though. (Is there a kernel symbol I
can add to 6.12, or does qemu still not have an exit mechanism for this
board yet?)
(FYI: be2csv is a shell function to convert bash's brace expansion
syntax to a comma separated value list, and then csv2cfg is another
shell function that turns the CSV into https://lwn.net/Articles/160497/
. The CSV is shipped as docs/linux-microconfig in the tarball if you're
curious. That's how a 400 line bash script can build a Linux system that
boots to a shell prompt for a dozen architectures. The or1k config is
now 2 lines, for example. 3 with the "if or1k" check. The variables it
assigns to are documented around line 190.)
If you provide command line args console=ttyS0 things will work.
Also console=ttyS0 is not needed at all, as it should be the default in QEMU.
I specify it explicitly to be consistent across architectures.
It looks like the root cause of the issue was the 'console=FIXME'.
I hope it helps.
Yup, I just had to remove workarounds for old qemu that are no longer
needed. Thanks for the help. (If you do teach qemu to exit at some
point, please let me know...)
-Stafford
Thanks,
Rob