Issue 144845
Summary __arm_rsr64 treated as CSE'able on arm32
Labels backend:ARM
Assignees
Reporter frobtech
Consider this code:
```
#include <arm_acle.h>
#include <stdint.h>

#ifdef __aarch64__
#define REG "cntvct_el0"
#else
#define REG "cp15:1:c14"
#endif

uint64_t get_cntvct_xor() {
  uint64_t v1 = __arm_rsr64(REG);
  uint64_t v2 = __arm_rsr64(REG);
  return v1 ^ v2;
}
```

When compiled for aarch64, it produces two `mrs` instructions as expected.
For example, `clang++ --target=aarch64-fuchsia -S -o - -O2 rsr.cc` produces (trimmed):
```
_Z14get_cntvct_xorv:                    // @_Z14get_cntvct_xorv
        .cfi_startproc
// %bb.0:
        mrs     x8, CNTVCT_EL0
        mrs     x9, CNTVCT_EL0
        eor     x0, x9, x8
        ret
```

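However, when compiled for aarch32, the compiler acts as if the intrinsic has "non-volatile" semantics and can be presumed to return the same value when called twice. The two reads are merged, `v1 ^ v2` folds to zero, and no register read is emitted at all.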
For example, `clang++ --target=armv7-linux-gnueabihf -S -o - -O2 rsr.cc` produces (trimmed):
```
_Z14get_cntvct_xorv: @ @_Z14get_cntvct_xorv                  
        .fnstart 
@ %bb.0: 
        mov     r0, #0 
        mov     r1, #0 
        bx      lr 
```
(It's similar with `-mthumb` added.)

The ARM ACLE spec does not say whether the `__arm_rsr64` lowering should have "volatile" (non-CSE'able) or "non-volatile" (CSE'able) semantics.  But for aarch64, both LLVM and GCC agree that it has the "volatile" semantics, and users now rely on that.
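To illustrate the kind of code that relies on the "volatile" behavior, here is a minimal sketch of a tick-delta measurement. `elapsed_ticks` is a hypothetical helper, not something from ACLE or LLVM, and `read_counter` stands in for a call such as `__arm_rsr64(REG)`. If the two reads were CSE'd, the result would always be zero.

```cpp
#include <cstdint>

// Hypothetical helper: returns how many counter ticks elapsed while `work` ran.
// `read_counter` is an assumption standing in for e.g. __arm_rsr64(REG).
template <typename ReadCounter, typename Work>
uint64_t elapsed_ticks(ReadCounter read_counter, Work work) {
  const uint64_t start = read_counter();  // first register read
  work();
  const uint64_t end = read_counter();    // second read; must not be merged with the first
  return end - start;
}
```

With a real counter, a caller would pass something like `[] { return __arm_rsr64(REG); }` as `read_counter`.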

This example is exercising the aarch64 and aarch32 spellings of the exact same hardware access.  IMHO they should definitely be treated consistently between the two backends.  (GCC does not support the same intrinsics for aarch32 targets as for aarch64, so we don't have that precedent to refer to here.)

That seems to be the intent of the LLVM code too.  To wit, in both cases above with `-emit-llvm` added, the IR is basically the same:
```
define dso_local noundef i64 @_Z14get_cntvct_xorv() local_unnamed_addr #0 {     
  %1 = tail call i64 @llvm.read_volatile_register.i64(metadata !5)              
  %2 = tail call i64 @llvm.read_volatile_register.i64(metadata !5)              
  %3 = xor i64 %2, %1                                                           
  ret i64 %3 
}
```

It certainly seems wrong that `llvm.read_volatile_register.i64` is being lowered on aarch32 as CSE'able.  The "volatile" in the name really suggests the contrary.
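Until the lowering is fixed, one possible workaround is to bypass the intrinsic with `volatile` inline asm, which the compiler may not merge or delete. The sketch below is an assumption-laden illustration, not a proposed fix. `read_cntvct_volatile` and `get_cntvct_xor_workaround` are hypothetical names. The aarch32 branch assumes `cp15:1:c14` corresponds to `mrrc p15, 1, <Rt>, <Rt2>, c14` and that the counter is readable at the current privilege level. The `#else` branch exists only so the sketch builds on non-ARM hosts and does not read the ARM counter.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of ACLE): read the virtual counter with
// inline asm marked volatile, so each call emits its own register read.
inline uint64_t read_cntvct_volatile() {
#if defined(__aarch64__)
  uint64_t value;
  __asm__ volatile("mrs %0, cntvct_el0" : "=r"(value));
  return value;
#elif defined(__arm__)
  uint32_t lo, hi;
  // Assumed equivalent of "cp15:1:c14": MRRC p15, opc1=1, CRm=c14.
  // The first operand receives the low word, the second the high word.
  __asm__ volatile("mrrc p15, 1, %0, %1, c14" : "=r"(lo), "=r"(hi));
  return (static_cast<uint64_t>(hi) << 32) | lo;
#else
  // Assumption: fall back to a monotonic clock on other hosts so the
  // sketch still compiles; this is not the ARM counter.
  return static_cast<uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Same shape as get_cntvct_xor() above, but using the asm-based reads.
uint64_t get_cntvct_xor_workaround() {
  const uint64_t v1 = read_cntvct_volatile();
  const uint64_t v2 = read_cntvct_volatile();
  return v1 ^ v2;
}
```

With this form, both reads should survive `-O2` on either target, at the cost of hiding the access from the compiler's register-name checking.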
