https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118916
Bug ID: 118916
Summary: AARCH64: rtl-cse2 Option in O3 Level Optimization
Ignores "volatile", Causing 'Invalid Instruction
Syndrome'
Product: gcc
Version: 14.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: changyp6 at gmail dot com
Target Milestone: ---
Created attachment 60520
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60520&action=edit
rct-cse2 test case
I have a program that sets registers in AArch64 EL2 mode:
1 #define RCT_BASE 0xFFED080000
2 #define RCT_REG(x) (RCT_BASE + (x))
3
4
5 #define REF_CLK_FREQ 24000000UL
6
7 #define RCT_TIMER2_REG 0xFFED080494
8 #define RCT_TIMER2_CTRL_REG 0xFFED080498
9
10 #define __raw_writel(v, a) (*(volatile unsigned int *)(unsigned long)(a) = (v))
11 #define __raw_readl(a) (*(volatile unsigned int *)(unsigned long)(a))
12
13 #define writel(p, v) __raw_writel(v, p)
14 #define readl(p) __raw_readl(p)
15
16 typedef unsigned int u32;
17
18 u32 rct_timer2_tick2ms(u32 s_tck, u32 e_tck)
19 {
20 return (e_tck - s_tck) / (REF_CLK_FREQ / 1000);
21 }
22
23 void rct_timer2_reset_count()
24 {
25 /* reset timer */
26 writel(RCT_TIMER2_CTRL_REG, 0x1);
27 /* enable timer */
28 writel(RCT_TIMER2_CTRL_REG, 0x0);
29 }
30
31 u32 rct_timer2_get_count()
32 {
33 return rct_timer2_tick2ms(0x00000000, readl(RCT_TIMER2_REG));
34 }
35
36 void rct_timer2_dly_ms(u32 dly_tim)
37 {
38 u32 cur_tim;
39
40 rct_timer2_reset_count();
41 while (1) {
42 cur_tim = rct_timer2_get_count();
43 if (cur_tim >= dly_tim)
44 break;
45 }
46 }
When building this program with the toolchain from ARM:
https://developer.arm.com/-/media/Files/downloads/gnu/14.2.rel1/binrel/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz
With -O2, or with -O3 -fdisable-rtl-cse2, the program runs OK.
With -O3, the program reports 'Invalid Instruction Syndrome'.
After debugging, I found that this is caused by the 'rtl-cse2' optimization.
The disassembly shows that the 'Invalid Instruction Syndrome' happens at the line
'84: b81fc45f str wzr, [x2], #-4'
This instruction runs OK in EL1 mode. However, according to the ARM Architecture
Reference Manual, the ISV bit in ESR_EL2 is 0 when the instruction performs
register writeback, indicating 'Invalid Instruction Syndrome'
(Refer to:
https://developer.arm.com/documentation/ddi0601/2024-12/AArch64-Registers/ESR-EL2--Exception-Syndrome-Register--EL2-)
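To make the failing instruction concrete, here is a small sketch in plain C of what the post-indexed store at offset 84 does. It stores zero to the address held in x2 (the control register) and then writes x2 - 4 back into x2, which is the address of RCT_TIMER2_REG. The struct and function names are hypothetical and only model the register state for illustration; they are not part of the test case or of GCC.

```c
#include <stdint.h>

#define RCT_TIMER2_REG      0xFFED080494ULL
#define RCT_TIMER2_CTRL_REG 0xFFED080498ULL

/* Hypothetical model of the effect of "str wzr, [x2], #-4". */
struct post_index_result {
    uint64_t store_addr; /* address that receives the zero */
    uint64_t new_base;   /* value written back into x2 */
};

static struct post_index_result model_str_post_index(uint64_t x2, int64_t imm)
{
    struct post_index_result r;
    r.store_addr = x2;               /* the store uses the old base */
    r.new_base = x2 + (uint64_t)imm; /* writeback happens afterwards */
    return r;
}
```

Calling model_str_post_index(RCT_TIMER2_CTRL_REG, -4) gives a store address of 0xFFED080498 and a new base of 0xFFED080494, so the single instruction performs the second control-register write and also sets up the pointer used by the polling load at 88.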
-O3 compiled:
0000000000000064 <rct_timer2_dly_ms>:
64: d2809301 mov x1, #0x498 // #1176
68: 52800024 mov w4, #0x1 // #1
6c: f2bda101 movk x1, #0xed08, lsl #16
70: 52833e23 mov w3, #0x19f1 // #6641
74: f2c01fe1 movk x1, #0xff, lsl #32
78: 72a0aec3 movk w3, #0x576, lsl #16
7c: aa0103e2 mov x2, x1
80: b9000024 str w4, [x1]
84: b81fc45f str wzr, [x2], #-4
88: b9400041 ldr w1, [x2]
8c: 9ba37c21 umull x1, w1, w3
90: d369fc21 lsr x1, x1, #41
94: 6b01001f cmp w0, w1
98: 54ffff88 b.hi 88 <rct_timer2_dly_ms+0x24> // b.pmore
9c: d65f03c0 ret
The following is compiled with '-O3 -fdisable-rtl-cse2', which runs successfully:
0000000000000064 <rct_timer2_dly_ms>:
64: d2809301 mov x1, #0x498 // #1176
68: d2809283 mov x3, #0x494 // #1172
6c: f2bda101 movk x1, #0xed08, lsl #16
70: 52800024 mov w4, #0x1 // #1
74: f2c01fe1 movk x1, #0xff, lsl #32
78: f2bda103 movk x3, #0xed08, lsl #16
7c: 52833e22 mov w2, #0x19f1 // #6641
80: f2c01fe3 movk x3, #0xff, lsl #32
84: 72a0aec2 movk w2, #0x576, lsl #16
88: b9000024 str w4, [x1]
8c: b900003f str wzr, [x1]
90: b9400061 ldr w1, [x3]
94: 9ba27c21 umull x1, w1, w2
98: d369fc21 lsr x1, x1, #41
9c: 6b01001f cmp w0, w1
a0: 54ffff88 b.hi 90 <rct_timer2_dly_ms+0x2c> // b.pmore
a4: d65f03c0 ret
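As a side note for reading the two listings: both contain the same mov/movk, umull and lsr #41 sequence, which is the division by (REF_CLK_FREQ / 1000) = 24000 done as a multiply by a reciprocal constant. It is not part of the difference between the builds. The sketch below reproduces that arithmetic in C; the function names and the RECIP_24000 macro are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Constant built by "mov #0x19f1" followed by "movk #0x576, lsl #16". */
#define RECIP_24000 0x057619F1u

/* Mirrors the umull + lsr #41 pair from the listings. */
static uint32_t ticks_to_ms_mul(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * RECIP_24000) >> 41);
}

/* Plain form, as in rct_timer2_tick2ms with s_tck == 0. */
static uint32_t ticks_to_ms_div(uint32_t ticks)
{
    return (uint32_t)(ticks / (24000000UL / 1000));
}
```

Both functions return the same value for any 32-bit tick count, so the only relevant difference between the failing and the working listing is the addressing mode of the second store.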
In my program, on lines 10 and 11, the register address is cast to 'volatile
unsigned int *', which tells the compiler NOT TO OPTIMIZE the access.
Attached is the test case for this issue.
1. Download the ARM toolchain from
https://developer.arm.com/-/media/Files/downloads/gnu/14.2.rel1/binrel/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz
and extract it with the command `sudo tar Jxvf
arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz -C /usr/local`
2. tar Jxvf rct-cse2-opt-issue.tar.xz
cd rct-cse2-opt-issue
make
vimdiff rct-cse2-opt-issue.obj-failed.S rct-cse2-opt-issue.obj-success.S
Please help take a look at this issue, or give me some advice on how to work
around it in my program while still using -O3 optimization.
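One possible workaround, offered as a sketch and not as a confirmed fix for this report, is to issue the MMIO accesses through inline assembly so that the addressing mode is fixed in the source and the compiler cannot select a writeback form. This is similar in spirit to how the Linux kernel's arm64 I/O accessors are written. The names mmio_write32 and mmio_read32 are hypothetical replacements for the __raw_writel / __raw_readl macros, and the non-AArch64 branch exists only so the sketch builds on other hosts.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical replacement for __raw_writel(v, a). */
static inline void mmio_write32(uintptr_t addr, u32 val)
{
#if defined(__aarch64__)
    /* Plain base-register store: the instruction text is fixed here,
     * so no pre/post-index (writeback) form can be chosen. */
    __asm__ volatile("str %w0, [%1]" : : "rZ"(val), "r"(addr) : "memory");
#else
    *(volatile u32 *)addr = val; /* fallback for non-AArch64 builds */
#endif
}

/* Hypothetical replacement for __raw_readl(a). */
static inline u32 mmio_read32(uintptr_t addr)
{
#if defined(__aarch64__)
    u32 val;
    __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
    return val;
#else
    return *(volatile u32 *)addr; /* fallback for non-AArch64 builds */
#endif
}
```

With these helpers, writel(p, v) would become mmio_write32(p, v) and readl(p) would become mmio_read32(p). The address is still materialized by the compiler, but the store and load instructions themselves always use the simple [base] form.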