https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118916
Bug ID: 118916
Summary: AARCH64: rtl-cse2 Option in O3 Level Optimization Ignores "volatile", Causing 'Invalid Instruction Syndrome'
Product: gcc
Version: 14.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: changyp6 at gmail dot com
Target Milestone: ---

Created attachment 60520
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60520&action=edit
rct-cse2 test case

I have a program that sets registers in AArch64 EL2 mode:

 1 #define RCT_BASE 0xFFED080000
 2 #define RCT_REG(x) (RCT_BASE + (x))
 3
 4
 5 #define REF_CLK_FREQ 24000000UL
 6
 7 #define RCT_TIMER2_REG 0xFFED080494
 8 #define RCT_TIMER2_CTRL_REG 0xFFED080498
 9
10 #define __raw_writel(v, a) (*(volatile unsigned int *)(unsigned long)(a) = (v))
11 #define __raw_readl(a) (*(volatile unsigned int *)(unsigned long)(a))
12
13 #define writel(p, v) __raw_writel(v, p)
14 #define readl(p) __raw_readl(p)
15
16 typedef unsigned int u32;
17
18 u32 rct_timer2_tick2ms(u32 s_tck, u32 e_tck)
19 {
20     return (e_tck - s_tck) / (REF_CLK_FREQ / 1000);
21 }
22
23 void rct_timer2_reset_count()
24 {
25     /* reset timer */
26     writel(RCT_TIMER2_CTRL_REG, 0x1);
27     /* enable timer */
28     writel(RCT_TIMER2_CTRL_REG, 0x0);
29 }
30
31 u32 rct_timer2_get_count()
32 {
33     return rct_timer2_tick2ms(0x00000000, readl(RCT_TIMER2_REG));
34 }
35
36 void rct_timer2_dly_ms(u32 dly_tim)
37 {
38     u32 cur_tim;
39
40     rct_timer2_reset_count();
41     while (1) {
42         cur_tim = rct_timer2_get_count();
43         if (cur_tim >= dly_tim)
44             break;
45     }
46 }

I build this program with the toolchain from ARM:
https://developer.arm.com/-/media/Files/downloads/gnu/14.2.rel1/binrel/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz

With -O2, or with -O3 -fdisable-rtl-cse2, the program runs OK.
With -O3, the program reports 'Invalid Instruction Syndrome'.

After debugging, I found that this is caused by the 'rtl-cse2' optimization.
The disassembly shows that the 'Invalid Instruction Syndrome' happens on this line:

  84: b81fc45f str wzr, [x2], #-4

This instruction runs OK in EL1 mode. However, according to the ARM Architecture Reference Manual, the ISV bit in ESR_EL2 is 0 when the instruction performs register writeback, which indicates 'Invalid Instruction Syndrome'.
(Refer to: https://developer.arm.com/documentation/ddi0601/2024-12/AArch64-Registers/ESR-EL2--Exception-Syndrome-Register--EL2-)

Compiled with '-O3':

0000000000000064 <rct_timer2_dly_ms>:
  64:   d2809301    mov   x1, #0x498                  // #1176
  68:   52800024    mov   w4, #0x1                    // #1
  6c:   f2bda101    movk  x1, #0xed08, lsl #16
  70:   52833e23    mov   w3, #0x19f1                 // #6641
  74:   f2c01fe1    movk  x1, #0xff, lsl #32
  78:   72a0aec3    movk  w3, #0x576, lsl #16
  7c:   aa0103e2    mov   x2, x1
  80:   b9000024    str   w4, [x1]
  84:   b81fc45f    str   wzr, [x2], #-4
  88:   b9400041    ldr   w1, [x2]
  8c:   9ba37c21    umull x1, w1, w3
  90:   d369fc21    lsr   x1, x1, #41
  94:   6b01001f    cmp   w0, w1
  98:   54ffff88    b.hi  88 <rct_timer2_dly_ms+0x24>  // b.pmore
  9c:   d65f03c0    ret

Compiled with '-O3 -fdisable-rtl-cse2', which runs successfully:

0000000000000064 <rct_timer2_dly_ms>:
  64:   d2809301    mov   x1, #0x498                  // #1176
  68:   d2809283    mov   x3, #0x494                  // #1172
  6c:   f2bda101    movk  x1, #0xed08, lsl #16
  70:   52800024    mov   w4, #0x1                    // #1
  74:   f2c01fe1    movk  x1, #0xff, lsl #32
  78:   f2bda103    movk  x3, #0xed08, lsl #16
  7c:   52833e22    mov   w2, #0x19f1                 // #6641
  80:   f2c01fe3    movk  x3, #0xff, lsl #32
  84:   72a0aec2    movk  w2, #0x576, lsl #16
  88:   b9000024    str   w4, [x1]
  8c:   b900003f    str   wzr, [x1]
  90:   b9400061    ldr   w1, [x3]
  94:   9ba27c21    umull x1, w1, w2
  98:   d369fc21    lsr   x1, x1, #41
  9c:   6b01001f    cmp   w0, w1
  a0:   54ffff88    b.hi  90 <rct_timer2_dly_ms+0x2c>  // b.pmore
  a4:   d65f03c0    ret

On lines 10 and 11 of my program, the register address is cast to 'volatile unsigned int *', which tells the compiler NOT TO OPTIMIZE the access.

Attached is the test case for this issue. To reproduce:

1. Download the ARM toolchain from
   https://developer.arm.com/-/media/Files/downloads/gnu/14.2.rel1/binrel/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz
   and extract it with the command
   `sudo tar Jxvf arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz -C /usr/local`

2. Extract the test case, build it, and compare the two listings:
   tar Jxvf rct-cse2-opt-issue.tar.xz
   cd rtc-cse2-opt-issue
   make
   vimdiff rct-cse2-opt-issue.obj-failed.S rct-cse2-opt-issue.obj-success.S

Please help take a look at this issue, or give me some advice on how to work around it in my program while still using -O3 optimization.
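
One possible workaround, offered as an untested sketch rather than a confirmed fix, is to emit the MMIO load and store from inline asm with a plain base-register addressing mode. That way the compiler cannot choose a pre- or post-indexed (writeback) form for the access, whatever the optimization level. The accessor names mmio_write32 and mmio_read32 are hypothetical and not part of the attached test case. The non-AArch64 fallback exists only so the sketch also builds on a host machine. The Linux kernel's arm64 __raw_writel/__raw_readl take a similar inline-asm approach.

```c
#include <stdint.h>

/*
 * Hypothetical accessors (not from the attached test case).
 * On AArch64 the access is a fixed "str/ldr Wt, [Xn]" with no writeback.
 * On other targets, fall back to a plain volatile access.
 */
static inline void mmio_write32(volatile void *addr, uint32_t val)
{
#if defined(__aarch64__)
    /* "rZ" lets the compiler use wzr when val is the constant 0. */
    __asm__ volatile("str %w0, [%1]" : : "rZ"(val), "r"(addr) : "memory");
#else
    *(volatile uint32_t *)addr = val;
#endif
}

static inline uint32_t mmio_read32(const volatile void *addr)
{
    uint32_t val;
#if defined(__aarch64__)
    __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
#else
    val = *(const volatile uint32_t *)addr;
#endif
    return val;
}

/* Possible replacements for the macros on lines 10-14 of the test case. */
#define writel(p, v) mmio_write32((volatile void *)(uintptr_t)(p), (v))
#define readl(p)     mmio_read32((const volatile void *)(uintptr_t)(p))
```

With these definitions the rest of the test case (rct_timer2_reset_count and friends) should compile unchanged. Whether this avoids the EL2 fault on the affected hardware is an assumption that would need to be checked against the generated disassembly.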