I believe I've found the bug.
When I noticed the program was failing to access the data, I suspected it
might be due to missing initialization. So I tried the alternative
loader for RISC-V compilers from 9legacy (thanks to Richard Miller), and
with a small adjustment (a frame size of $-4 on the TEXT lines), I managed
to get the code running.
*— miller.s —*
#define EBREAK WORD $(0x73 | 1<<20)
TEXT start(SB), $-4
/* set static base */
MOVW $setSB(SB), R3
/* set stack pointer */
MOVW $(512*1024-16),R2
/* clear bss */
MOVW $edata(SB), R1
MOVW $end(SB), R2
MOVW R0, 0(R1)
ADD $4, R1
BLT R2, R1, -2(PC)
/* call main */
JAL R1, main(SB)
TEXT abort(SB), $-4
EBREAK
RET
However, instead of printing the expected string "Hello world.....", only a
single character "v" appeared. Upon debugging, I discovered that in the
printstr function, saving R1/ra (which held the return address 0x80000076)
on the stack overwrote the start of the string. The stack slot sits right
at the end of the program, where the string is stored, so the string's
first bytes become 0x76 (i.e., 'v') followed by 0x00. printstr therefore
prints "v" and stops at the zero byte, exactly as its loop is written.
Looking into the loader code, I saw that the stack pointer (SP, R2) should
have been set by MOVW $(512*1024-16), R2. However, according to the gdb
disassembly (0x80000014: addi sp, gp, -2032), it ends up at gp-2032, right
at the end of the program, which causes the overwrite.
I’ve attached:
- The assembly generated after linking
*— hqmiller.asm —*
term% il -l -a -H 1 -T0x80000000 -R4 -o hqmiller.bin miller.i helloq.i
80000000: (3) TEXT start+0(SB),$-4
80000000: 800011b7 87818193(5) MOV $setSB+0(SB),R3
80000008: 00080137 ff010113(8) MOV $524272,R2
80000010: 81018093 (11) MOV $edata+0(SB),R1
80000014: 81018113 (12) MOV $end+0(SB),R2
80000018: 0000a023 (13) MOVW R0,0(R1)
8000001c: 0091 (15) ADD $4,R1
8000001e: fe20cde3 (15) BLT R2,R1,80000018(BRANCH)
80000022: 20a9 (18) JAL ,R1,main+8000006c(BRANCH)
80000024: (20) TEXT abort+0(SB),$-4
80000024: 00100073 (21) WORD ,$1048691
80000028: 8082 (22) JMP ,0(R1)
8000002a: 0001 (0) ADD $0,R0,R0
8000002c: (74) TEXT uartputc+0(SB),R0,$-4
8000002c: 10000637 (74) MOV $268435456,R12
80000030: 01841593 4185d593(74) MOVB R8,R11
80000038: 01859513 41855513(76) MOVB R11,R10
80000040: c208 (76) MOVW R10,0(R12)
80000042: 8082 (76) JMP ,0(R1)
80000044: (80) TEXT printstr+0(SB),R0,$4
80000044: 1161 (80) ADD $-8,R2
80000046: c006 (80) MOVW R1,0(R2)
80000048: 84a2 (80) MOV R8,R9
8000004a: 00048583 (82) MOVB 0(R9),R11
8000004e: c999 (82) BEQ R11,80000064(BRANCH)
80000050: 00148613 (83) ADD $1,R9,R12
80000054: c632 (83) MOVW R12,s+0(FP)
80000056: 00048403 (83) MOVB 0(R9),R8
8000005a: 3fc9 (83) JAL ,uartputc+8000002c(BRANCH)
8000005c: 44b2 (83) MOVW s+0(FP),R9
8000005e: 00048583 (82) MOVB 0(R9),R11
80000062: f5fd (82) BNE R11,80000050(BRANCH)
80000064: 4082 (83) MOVW 0(R2),R1
80000066: 0121 (83) ADD $8,R2
80000068: 8082 (83) JMP ,0(R1)
8000006a: 0001 (0) ADD $0,R0,R0
8000006c: (87) TEXT main+0(SB),R0,$4
8000006c: 1161 (87) ADD $-8,R2
8000006e: c006 (87) MOVW R1,0(R2)
80000070: 80018413 (89) MOV $.string<>+0(SB),R8
80000074: 3fc1 (89) JAL ,printstr+80000044(BRANCH)
80000076: a001 (90) JMP ,38(APC)
- The corresponding gdb disassembly
*— hqmiller.dump —*
Dump of assembler code from 0x80000000 to 0x80000078:
=> 0x80000000: lui gp,0x80001
0x80000004: addi gp,gp,-1928 # 0x80000878
0x80000008: lui sp,0x80
0x8000000c: addi sp,sp,-16 # 0x7fff0
0x80000010: addi ra,gp,-2032
0x80000014: addi sp,gp,-2032
0x80000018: sw zero,0(ra)
0x8000001c: addi ra,ra,4
0x8000001e: blt ra,sp,0x80000018
0x80000022: jal 0x8000006c
0x80000024: ebreak
0x80000028: ret
0x8000002a: nop
0x8000002c: lui a2,0x10000
0x80000030: slli a1,s0,0x18
0x80000034: srai a1,a1,0x18
0x80000038: slli a0,a1,0x18
0x8000003c: srai a0,a0,0x18
0x80000040: sw a0,0(a2)
0x80000042: ret
0x80000044: addi sp,sp,-8
0x80000046: sw ra,0(sp)
0x80000048: mv s1,s0
0x8000004a: lb a1,0(s1)
0x8000004e: beqz a1,0x80000064
0x80000050: addi a2,s1,1
0x80000054: sw a2,12(sp)
0x80000056: lb s0,0(s1)
0x8000005a: jal 0x8000002c
0x8000005c: lw s1,12(sp)
0x8000005e: lb a1,0(s1)
0x80000062: bnez a1,0x80000050
0x80000064: lw ra,0(sp)
0x80000066: addi sp,sp,8
0x80000068: ret
0x8000006a: nop
0x8000006c: addi sp,sp,-8
0x8000006e: sw ra,0(sp)
0x80000070: addi s0,gp,-2048
0x80000074: jal 0x80000044
0x80000076: j 0x80000076
So, while I’ve located the root of the issue, I still don’t understand how
a value defined in the loader ends up being something different after
compilation. Any ideas or suggestions?
Thanks a lot,
José J.
P.S.: Thanks Ron, I hope to learn and enjoy playing with Plan 9.
On Fri, Aug 29, 2025 at 22:51, ron minnich (<[email protected]>)
wrote:
> Thank you for your interest in Plan 9. I hope you will continue to study
> it. There are many valuable lessons in the code, which was created by the
> group that invented C, and, of course, Unix.
>
> On Fri, Aug 29, 2025 at 9:39 AM José J. Cabezas Castillo <
> [email protected]> wrote:
>
>> Hi,
>>
>> My name is José J. Many years ago, in a college operating systems class,
>> the professor mentioned Plan 9 as a curiosity — an OS where "everything is
>> a file". Recently, I’ve had more time, so I installed 9front from the ISO
>> on a few machines to explore and learn.
>>
>> To get started with RISC-V, I tried creating a simple "Hello World" for
>> TinyEMU. I managed to output some characters using HTIF:
>>
>> ---h.c---
>> #include <u.h>
>>
>> #define HTIFADDR_BASE 0x40008000
>>
>> void
>> f(void)
>> {
>> *((volatile uvlong *) HTIFADDR_BASE) = 0x0101000000000031ull;
>> *((volatile uvlong *) HTIFADDR_BASE) = 0x0101000000000032ull;
>> *((volatile uvlong *) HTIFADDR_BASE) = 0x0101000000000033ull;
>> for(;;);
>> }
>> ---
>> Compiled with:
>> %ic -FVw h.c
>> %il -l -H 1 -T0x80000000 -o h32.bin h.i
>>
>> Then I wanted to improve it to print "Hello World", following this video
>> and GitHub repo:
>>
>> - https://www.youtube.com/watch?v=HC7b1SVXoKM
>> - https://github.com/chuckb/riscv-helloworld-c/
>>
>> The example is for Linux, but I tried to adapt it for Plan 9. Using QEMU
>> in Linux, I was able to print characters like before (with just h.c and
>> no functions). However, when I switched to using functions (helloq.c)
>> and a loader (chuck.s), it stopped working.
>>
>> Here is the code:
>>
>> ---helloq.c---
>>
>> #include <u.h>
>>
>> #define UART_BASE 0x10000000
>>
>> void
>> uartputc(char c)
>> {
>> *((volatile ulong *) UART_BASE) = c;
>> }
>>
>> void
>> printstr(char *s)
>> {
>> while (*s)
>> uartputc(*s++);
>> }
>>
>> void
>> main(void)
>> {
>> printstr("Hello world\n");
>> for (;;);
>> }
>> ---
>> ---chuck.s---
>> TEXT start(SB), $0
>> /* set stack pointer */
>> MOVW $0x80020000, R2
>> /* set frame pointer */
>> ADD R0, R2, R8
>> /* call main */
>> JAL R1, main(SB)
>> ---
>> Compiled with:
>> ic -FVSw helloq.c
>> il -l -a -H 1 -T0x80000000 -R4 -o helloq.bin chuck.i helloq.i
>>
>> The assembler generated is:
>> 80000000: (1) TEXT start+0(SB),$4
>>
>> 80000000: 1161 (1) ADD $-8,R2
>> 80000002: c006 (1) MOVW R1,0(R2)
>> 80000004: 80020137 (3) MOV $-2147352576,R2
>> 80000008: 00010433 (6) ADD R0,R2,R8
>> 8000000c: 2091 (9) JAL ,R1,main+80000050(BRANCH)
>> 8000000e: 0001 (0) ADD $0,R0,R0
>> 80000010: (74) TEXT uartputc+0(SB),R0,$-4
>> 80000010: 10000637 (74) MOV $268435456,R12
>> 80000014: 01841593 4185d593(74) MOVB R8,R11
>> 8000001c: 01859513 41855513(76) MOVB R11,R10
>> 80000024: c208 (76) MOVW R10,0(R12)
>> 80000026: 8082 (76) JMP ,0(R1)
>> 80000028: (80) TEXT printstr+0(SB),R0,$4
>> 80000028: 1161 (80) ADD $-8,R2
>> 8000002a: c006 (80) MOVW R1,0(R2)
>> 8000002c: 84a2 (80) MOV R8,R9
>> 8000002e: 00048583 (82) MOVB 0(R9),R11
>> 80000032: c999 (82) BEQ R11,80000048(BRANCH)
>> 80000034: 00148613 (83) ADD $1,R9,R12
>> 80000038: c632 (83) MOVW R12,s+0(FP)
>> 8000003a: 00048403 (83) MOVB 0(R9),R8
>> 8000003e: 3fc9 (83) JAL ,uartputc+80000010(BRANCH)
>> 80000040: 44b2 (83) MOVW s+0(FP),R9
>> 80000042: 00048583 (82) MOVB 0(R9),R11
>> 80000046: f5fd (82) BNE R11,80000034(BRANCH)
>> 80000048: 4082 (83) MOVW 0(R2),R1
>> 8000004a: 0121 (83) ADD $8,R2
>> 8000004c: 8082 (83) JMP ,0(R1)
>> 8000004e: 0001 (0) ADD $0,R0,R0
>> 80000050: (87) TEXT main+0(SB),R0,$4
>> 80000050: 1161 (87) ADD $-8,R2
>> 80000052: c006 (87) MOVW R1,0(R2)
>> 80000054: 80018413 (89) MOV $.string<>+0(SB),R8
>> 80000058: 3fc1 (89) JAL ,printstr+80000028(BRANCH)
>> 8000005a: a001 (90) JMP ,30(APC)
>> ----
>>
>> However, when debugging with GDB in QEMU, I found that the instruction at
>> address *0x80000002* causes the PC (program counter) to reset, and
>> execution does not continue. I believe these extra instructions are added
>> by the loader automatically, but I don’t know how to prevent this.
>>
>> I also tried using l.s from the 9legacy compiler sources, but had the
>> same result. I’ve been reading through start.s from the RISC-V kernel and
>> looking at the mkfile, suspecting I might need to pass specific options to
>> compile correctly, but there are too many and I don’t fully understand them
>> yet.
>>
>> Can someone explain how to compile without these extra instructions, or
>> why the PC is being reset and how to avoid it?
>>
>> Thanks in advance, and apologies for my English and the length of this
>> email.
>>
>> Best regards,
>> José J.
>>
--
José J. Cabezas
------------------------------------------
9fans: 9fans
Permalink:
https://9fans.topicbox.com/groups/9fans/T3f252d4d7c5389ee-Mf1a4a51e93e914a5b9276778
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription