https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118916

            Bug ID: 118916
           Summary: AARCH64: rtl-cse2 Option in O3 Level Optimization
                    Ignores "volatile", Causing 'Invalid Instruction
                    Syndrome'
           Product: gcc
           Version: 14.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: changyp6 at gmail dot com
  Target Milestone: ---

Created attachment 60520
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60520&action=edit
rct-cse2 test case

I have a program that sets registers in AArch64 EL2 mode:


   1 #define RCT_BASE                0xFFED080000
   2 #define RCT_REG(x)              (RCT_BASE + (x))
   3 
   4 
   5 #define REF_CLK_FREQ            24000000UL
   6 
   7 #define RCT_TIMER2_REG          0xFFED080494
   8 #define RCT_TIMER2_CTRL_REG     0xFFED080498
   9 
  10 #define __raw_writel(v, a)      (*(volatile unsigned int *)(unsigned long)(a) = (v))
  11 #define __raw_readl(a)          (*(volatile unsigned int *)(unsigned long)(a))
  12 
  13 #define writel(p, v)            __raw_writel(v, p)
  14 #define readl(p)                __raw_readl(p)
  15 
  16 typedef unsigned int u32;
  17 
  18 u32 rct_timer2_tick2ms(u32 s_tck, u32 e_tck)
  19 {
  20     return (e_tck - s_tck) / (REF_CLK_FREQ / 1000);
  21 }
  22 
  23 void rct_timer2_reset_count()
  24 {
  25     /* reset timer */
  26     writel(RCT_TIMER2_CTRL_REG, 0x1);
  27     /* enable timer */
  28     writel(RCT_TIMER2_CTRL_REG, 0x0);
  29 }
  30 
  31 u32 rct_timer2_get_count()
  32 {
  33     return rct_timer2_tick2ms(0x00000000, readl(RCT_TIMER2_REG));
  34 }
  35 
  36 void rct_timer2_dly_ms(u32 dly_tim)
  37 {
  38     u32 cur_tim;
  39 
  40     rct_timer2_reset_count();
  41     while (1) {
  42         cur_tim = rct_timer2_get_count();
  43         if (cur_tim >= dly_tim)
  44             break;
  45     }
  46 }



When this program is built with the toolchain from ARM:
https://developer.arm.com/-/media/Files/downloads/gnu/14.2.rel1/binrel/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz

With -O2, or with -O3 -fdisable-rtl-cse2, the program runs OK.
With plain -O3, the program reports 'Invalid Instruction Syndrome'.

After debugging, I found that this is caused by 'rtl-cse2' optimization.

The disassembly shows that the 'Invalid Instruction Syndrome' is raised by the
instruction
'84:   b81fc45f    str wzr, [x2], #-4'
This instruction runs OK in EL1 mode. However, according to the ARM
Architecture Reference Manual, the ISV bit in ESR_EL2 is 0 when the instruction
performs register writeback, which indicates 'Invalid Instruction Syndrome'.
(Refer to:
https://developer.arm.com/documentation/ddi0601/2024-12/AArch64-Registers/ESR-EL2--Exception-Syndrome-Register--EL2-)


-O3 compiled:
   0000000000000064 <rct_timer2_dly_ms>:
     64:   d2809301    mov x1, #0x498                  // #1176
     68:   52800024    mov w4, #0x1                    // #1
     6c:   f2bda101    movk    x1, #0xed08, lsl #16
     70:   52833e23    mov w3, #0x19f1                 // #6641
     74:   f2c01fe1    movk    x1, #0xff, lsl #32
     78:   72a0aec3    movk    w3, #0x576, lsl #16
     7c:   aa0103e2    mov x2, x1
     80:   b9000024    str w4, [x1]
     84:   b81fc45f    str wzr, [x2], #-4
     88:   b9400041    ldr w1, [x2]
     8c:   9ba37c21    umull   x1, w1, w3
     90:   d369fc21    lsr x1, x1, #41
     94:   6b01001f    cmp w0, w1
     98:   54ffff88    b.hi    88 <rct_timer2_dly_ms+0x24>  // b.pmore
     9c:   d65f03c0    ret


The following is compiled with '-O3 -fdisable-rtl-cse2' and runs successfully:
0000000000000064 <rct_timer2_dly_ms>:
  64:   d2809301        mov     x1, #0x498                      // #1176
  68:   d2809283        mov     x3, #0x494                      // #1172
  6c:   f2bda101        movk    x1, #0xed08, lsl #16
  70:   52800024        mov     w4, #0x1                        // #1
  74:   f2c01fe1        movk    x1, #0xff, lsl #32
  78:   f2bda103        movk    x3, #0xed08, lsl #16
  7c:   52833e22        mov     w2, #0x19f1                     // #6641
  80:   f2c01fe3        movk    x3, #0xff, lsl #32
  84:   72a0aec2        movk    w2, #0x576, lsl #16
  88:   b9000024        str     w4, [x1]
  8c:   b900003f        str     wzr, [x1]
  90:   b9400061        ldr     w1, [x3]
  94:   9ba27c21        umull   x1, w1, w2
  98:   d369fc21        lsr     x1, x1, #41
  9c:   6b01001f        cmp     w0, w1
  a0:   54ffff88        b.hi    90 <rct_timer2_dly_ms+0x2c>  // b.pmore
  a4:   d65f03c0        ret
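
A side note for anyone comparing the two listings: the umull/lsr pair that
appears in both looks like the usual multiply-by-reciprocal form of the
division by (REF_CLK_FREQ / 1000) = 24000 in rct_timer2_tick2ms(), so it is
not part of the difference between the two builds. A minimal sketch of that
arithmetic follows; the names DIV24000_MAGIC and div24000_by_multiply are
made up for illustration and do not appear in the test case.

```c
#include <stdint.h>

/* Constant assembled by "mov w3, #0x19f1" + "movk w3, #0x576, lsl #16"
 * in the -O3 listing (w2 in the -fdisable-rtl-cse2 listing).
 * It equals ceil(2^41 / 24000). */
#define DIV24000_MAGIC 0x057619F1u

/* Hypothetical helper mirroring "umull x1, w1, w3" followed by
 * "lsr x1, x1, #41": a 32x32->64 bit multiply, then a right shift. */
static uint32_t div24000_by_multiply(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * DIV24000_MAGIC) >> 41);
}
```

For 32-bit inputs this gives the same result as ticks / 24000, which is why
only the address setup and the second store differ between the listings.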



In my program, on lines 10 and 11, the register address is cast to 'volatile
unsigned int *', which tells the compiler NOT TO OPTIMIZE the access.

Attached is the test case for this issue.

1. Download the ARM toolchain from
https://developer.arm.com/-/media/Files/downloads/gnu/14.2.rel1/binrel/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz
and extract it with the command `sudo tar Jxvf
arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz -C /usr/local`

2. tar Jxvf rct-cse2-opt-issue.tar.xz
   cd rct-cse2-opt-issue
   make
   vimdiff rct-cse2-opt-issue.obj-failed.S rct-cse2-opt-issue.obj-success.S

Please take a look at this issue, or give me some advice on how to work
around it in my program while still using -O3 optimization.
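
One possible workaround, sketched here under assumptions and not tested on the
EL2 setup described above: 'volatile' fixes which accesses happen and in what
order, but not which addressing mode the compiler picks for them. Writing the
load and store as inline assembly with a plain [Xn] operand leaves no room for
a post-index (writeback) form. The names raw_writel_nowb and raw_readl_nowb
are hypothetical; the non-AArch64 branch exists only so the sketch builds on
other hosts.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical replacement for __raw_writel(v, a). */
static inline void raw_writel_nowb(u32 val, uintptr_t addr)
{
#if defined(__aarch64__)
    /* Base-register addressing only: the compiler cannot turn this
     * into "str w, [x], #imm" because the instruction text is fixed. */
    __asm__ volatile("str %w0, [%1]" : : "r"(val), "r"(addr) : "memory");
#else
    /* Fallback for other targets: ordinary volatile store. */
    *(volatile u32 *)addr = val;
#endif
}

/* Hypothetical replacement for __raw_readl(a). */
static inline u32 raw_readl_nowb(uintptr_t addr)
{
#if defined(__aarch64__)
    u32 val;
    /* Same idea for the load: plain [Xn], no writeback. */
    __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
    return val;
#else
    /* Fallback for other targets: ordinary volatile load. */
    return *(volatile u32 *)addr;
#endif
}
```

With these in place, writel(p, v) and readl(p) could be redefined as
raw_writel_nowb((v), (p)) and raw_readl_nowb(p). The "memory" clobber is a
conservative choice that also keeps the compiler from moving other memory
accesses across the register access.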
