Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect

2022-04-01 Thread Mark Rutland via Gcc
Hi Jeremy,

Thanks for raising this.

On Fri, Apr 01, 2022 at 11:44:06AM -0500, Jeremy Linton wrote:
> The relaxed variants of read/write macros are only declared
> as `asm volatile()` which forces the compiler to generate the
> instruction in the code path as intended. The only problem
> is that it doesn't also tell the compiler that there may
> be memory side effects. Meaning that if a function is comprised
> entirely of relaxed io operations, the compiler may think that
> it only has register side effects and doesn't need to be called.

As I mentioned in a private mail, I don't think the reasoning above is
correct, and I think this is a miscompilation (i.e. a compiler bug).

The important thing is that any `asm volatile` may have side effects
generally outside of memory or GPRs, and whether the assembly contains a memory
load/store is immaterial. We should not need to add a memory clobber in order
to retain the volatile semantics.

See:

  https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile

... and consider the x86 example that reads rdtsc, or an arm64 sequence like:

| void do_sysreg_thing(void)
| {
|   unsigned long tmp;
|   
|   tmp = read_sysreg(some_reg);
|   tmp |= SOME_BIT;
|   write_sysreg(tmp, some_reg);
| }

... where there's no memory that we should need to hazard against.
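
For reference, read_sysreg() and write_sysreg() expand to something like the
following (a simplified sketch of the kernel macros, with stringification
details elided); note that neither has a "memory" clobber, and per the
documentation above neither should need one for the asm to be retained:

| #define read_sysreg(r) ({                                   \
|         unsigned long __val;                                \
|         asm volatile("mrs %0, " #r : "=r" (__val));         \
|         __val;                                              \
| })
|
| #define write_sysreg(v, r) do {                             \
|         unsigned long __val = (unsigned long)(v);           \
|         asm volatile("msr " #r ", %x0" : : "rZ" (__val));   \
| } while (0)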

This patch might work around the issue, but I don't believe it is a correct fix.

> For an example function look at bcmgenet_enable_dma(), before the
> relaxed variants were removed. When built with gcc12 the code
> contains the asm blocks as expected, but then the function is
> never called.
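
For illustration, a function of that shape looks roughly like this (a
hypothetical sketch, not the actual bcmgenet code; DMA_CTRL and DMA_EN are
made-up names):

| static void enable_dma(void __iomem *base)
| {
|         u32 reg;
|
|         /* nothing here but relaxed MMIO accesses */
|         reg = readl_relaxed(base + DMA_CTRL);
|         reg |= DMA_EN;
|         writel_relaxed(reg, base + DMA_CTRL);
| }

Per the report, GCC 12 emits the asm in the function body but then elides the
call to it from the caller.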

So it sounds like this is a regression in GCC 12, which IIUC isn't released yet
per:

  https://gcc.gnu.org/gcc-12/changes.html

... which says:

| Note: GCC 12 has not been released yet

Surely we can fix it prior to release?

Thanks,
Mark.

> 
> Signed-off-by: Jeremy Linton 
> ---
>  arch/arm64/include/asm/io.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 7fd836bea7eb..3cceda7948a0 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -24,25 +24,25 @@
>  #define __raw_writeb __raw_writeb
>  static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
>  {
> - asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
> + asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr) : "memory");
>  }
>  
>  #define __raw_writew __raw_writew
>  static inline void __raw_writew(u16 val, volatile void __iomem *addr)
>  {
> - asm volatile("strh %w0, [%1]" : : "rZ" (val), "r" (addr));
> + asm volatile("strh %w0, [%1]" : : "rZ" (val), "r" (addr) : "memory");
>  }
>  
>  #define __raw_writel __raw_writel
>  static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
>  {
> - asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr));
> + asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr) : "memory");
>  }
>  
>  #define __raw_writeq __raw_writeq
>  static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>  {
> - asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
> + asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr) : "memory");
>  }
>  
>  #define __raw_readb __raw_readb
> -- 
> 2.35.1
> 


GCC 12 miscompilation of volatile asm (was: Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect)

2022-04-05 Thread Mark Rutland via Gcc
Hi all,

[adding kernel folk who work on asm stuff]

As a heads-up, GCC 12 (not yet released) appears to erroneously optimize away
calls to functions with volatile asm. Szabolcs has raised an issue on the GCC
bugzilla:  

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160

... which is a P1 release blocker, and is currently being investigated.

Jeremy originally reported this as an issue with {readl,writel}_relaxed(), but
the underlying problem doesn't have anything to do with those specifically.

I'm dumping a bunch of info here largely for posterity / archival, and to find
out who (from the kernel side) is willing and able to test proposed compiler
fixes, once those are available.

I'm happy to do so for aarch64; Peter, I assume you'd be happy to look at the
x86 side?

This is a generic issue, and not specific to arm64 or to the MMIO accessors.

I wrote test cases for aarch64 and x86_64. Those are inline later in this mail,
and currently you can see them on compiler explorer:

  aarch64: https://godbolt.org/z/vMczqjYvs

  x86_64: https://godbolt.org/z/cveff9hq5



My aarch64 test case is:

| #define sysreg_read(regname)  \
| ({\
|   unsigned long __sr_val; \
|   asm volatile(   \
|   "mrs %0, " #regname "\n"\
|   : "=r" (__sr_val)); \
|   \
|   __sr_val;   \
| })
| 
| #define sysreg_write(regname, __sw_val)   \
| do {  \
|   asm volatile(   \
|   "msr " #regname ", %0\n"\
|   :   \
|   : "r" (__sw_val));  \
| } while (0)
| 
| #define isb() \
| do {  \
|   asm volatile(   \
|   "isb"   \
|   :   \
|   :   \
|   : "memory");\
| } while (0)
| 
| static unsigned long sctlr_read(void)
| {
|   return sysreg_read(sctlr_el1);
| }
| 
| static void sctlr_write(unsigned long val)
| {
|   sysreg_write(sctlr_el1, val);
| }
| 
| static void sctlr_rmw(void)
| {
|   unsigned long val;
| 
|   val = sctlr_read();
|   val |= 1UL << 7;
|   sctlr_write(val);
| }
| 
| void sctlr_read_multiple(void)
| {
|   sctlr_read();
|   sctlr_read();
|   sctlr_read();
|   sctlr_read();
| }
| 
| void sctlr_write_multiple(void)
| {
|   sctlr_write(0);
|   sctlr_write(0);
|   sctlr_write(0);
|   sctlr_write(0);
|   sctlr_write(0);
| }
| 
| void sctlr_rmw_multiple(void)
| {
|   sctlr_rmw();
|   sctlr_rmw();
|   sctlr_rmw();
|   sctlr_rmw();
| }
| 
| void function(void)
| {
|   sctlr_read_multiple();
|   sctlr_write_multiple();
|   sctlr_rmw_multiple();
| 
|   isb();
| }

Per compiler explorer (https://godbolt.org/z/vMczqjYvs) GCC trunk currently
compiles this as:

| sctlr_rmw:
| mrs x0, sctlr_el1
| orr x0, x0, 128
| msr sctlr_el1, x0
| ret
| sctlr_read_multiple:
| mrs x0, sctlr_el1
| mrs x0, sctlr_el1
| mrs x0, sctlr_el1
| mrs x0, sctlr_el1
| ret
| sctlr_write_multiple:
| mov x0, 0
| msr sctlr_el1, x0
| msr sctlr_el1, x0
| msr sctlr_el1, x0
| msr sctlr_el1, x0
| msr sctlr_el1, x0
| ret
| sctlr_rmw_multiple:
| ret
| function:
| isb
| ret

Whereas GCC 11.2 compiles this as:

| sctlr_rmw:
| mrs x0, sctlr_el1
| orr x0, x0, 128
| msr sctlr_el1, x0
| ret
| sctlr_read_multiple:
| mrs x0, sctlr_el1
| mrs x0, sctlr_el1
| mrs x0, sctlr_el1
| mrs x0, sctlr_el1
| ret
| sctlr_write_multiple:
| mov x0, 0
| msr sctlr_el1, x0
| msr sctlr_el1, x0
| msr sctlr_el1, x0
| msr sctlr_el1, x0
| msr sctlr_el1, x0
| ret
| sctlr_rmw_multiple:
| stp x29, x30, [sp, -16]!
| mov x29, sp
| bl  sctlr_rmw
| bl  sctlr_rmw
| bl  sctlr_rmw
| bl  sctlr_rmw
| ldp x29, x30, [sp], 16
| ret
| function:
| stp x29, x30, [sp, -16]!
| mov x29, sp
| bl  sctlr_read_multiple
| bl  sctlr_write_multiple
| bl  sctlr_rmw_multiple
| isb
| ldp x29, x30, [sp], 16
| ret



My x86_64 test case is:

| unsigned long rdmsr(unsigned long reg)
| {
| unsigned int lo, hi;
| 
| asm volatile(
| "rdmsr"
| : "=d" (hi), "=a" (lo)
| : "c" (reg)
| );
| 
| return ((unsigned long)hi << 32) | lo;
| }
| 
| void wrmsr(unsigned long reg, unsigned long val)
| {
| unsigned int lo = val;
| unsigned int hi = val >> 32;
| 
| asm volatile(
| "wrmsr"
| :
| : "d" (hi), "a" (lo), "c" (reg)
| );
|

Re: GCC 12 miscompilation of volatile asm (was: Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect)

2022-04-05 Thread Mark Rutland via Gcc
Sorry, I copied the wrong version of the x86_64 assembly (as generated by
GCC 11.2.0). Updated below.

On Tue, Apr 05, 2022 at 01:51:30PM +0100, Mark Rutland wrote:
> My x86_64 test case is:
> 
> | unsigned long rdmsr(unsigned long reg)
> | {
> | unsigned int lo, hi;
> | 
> | asm volatile(
> | "rdmsr"
> | : "=d" (hi), "=a" (lo)
> | : "c" (reg)
> | );
> | 
> | return ((unsigned long)hi << 32) | lo;
> | }
> | 
> | void wrmsr(unsigned long reg, unsigned long val)
> | {
> | unsigned int lo = val;
> | unsigned int hi = val >> 32;
> | 
> | asm volatile(
> | "wrmsr"
> | :
> | : "d" (hi), "a" (lo), "c" (reg)
> | );
> | }
> | 
> | void msr_rmw_set_bits(unsigned long reg, unsigned long bits)
> | {
> | unsigned long val;
> | 
> | val = rdmsr(reg);
> | val |= bits;
> | wrmsr(reg, val);
> | }
> | 
> | void func_with_msr_side_effects(unsigned long reg)
> | {
> | msr_rmw_set_bits(reg, 1UL << 0);
> | msr_rmw_set_bits(reg, 1UL << 1);
> | msr_rmw_set_bits(reg, 1UL << 2);
> | msr_rmw_set_bits(reg, 1UL << 3);
> | }
> 
> Per compiler explorer (https://godbolt.org/z/cveff9hq5) GCC trunk currently
> compiles this as:
> 
> | msr_rmw_set_bits:
> | mov rcx, rdi
> | rdmsr
> | sal rdx, 32
> | mov eax, eax
> | or  rax, rsi
> | or  rax, rdx
> | mov rdx, rax
> | shr rdx, 32
> | wrmsr
> | ret
> | func_with_msr_side_effects:
> | ret
> 

GCC 11.2 compiles that as:

| rdmsr:
| mov rcx, rdi
| rdmsr
| sal rdx, 32
| mov eax, eax
| or  rax, rdx
| ret
| wrmsr:
| mov rax, rsi
| mov rdx, rsi
| shr rdx, 32
| mov rcx, rdi
| wrmsr
| ret
| msr_rmw_set_bits:
| mov rcx, rdi
| rdmsr
| sal rdx, 32
| mov eax, eax
| or  rax, rsi
| or  rax, rdx
| mov rdx, rax
| shr rdx, 32
| wrmsr
| ret
| func_with_msr_side_effects:
| push rbx
| mov rbx, rdi
| mov esi, 1
| call msr_rmw_set_bits
| mov esi, 2
| mov rdi, rbx
| call msr_rmw_set_bits
| mov esi, 4
| mov rdi, rbx
| call msr_rmw_set_bits
| mov esi, 8
| mov rdi, rbx
| call msr_rmw_set_bits
| pop rbx
| ret

Thanks,
Mark.


Re: GCC 12 miscompilation of volatile asm (was: Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect)

2022-04-11 Thread Mark Rutland via Gcc
On Tue, Apr 05, 2022 at 04:05:22PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 05, 2022 at 01:51:30PM +0100, Mark Rutland wrote:
> > Hi all,
> > 
> > [adding kernel folk who work on asm stuff]
> > 
> > As a heads-up, GCC 12 (not yet released) appears to erroneously optimize 
> > away
> > calls to functions with volatile asm. Szabolcs has raised an issue on the 
> > GCC
> > bugzilla:  
> > 
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
> > 
> > ... which is a P1 release blocker, and is currently being investigated.
> > 
> > Jeremy originally reported this as an issue with {readl,writel}_relaxed(), 
> > but
> > the underlying problem doesn't have anything to do with those specifically.
> > 
> > I'm dumping a bunch of info here largely for posterity / archival, and to 
> > find
> > out who (from the kernel side) is willing and able to test proposed compiler
> > fixes, once those are available.
> > 
> > I'm happy to do so for aarch64; Peter, I assume you'd be happy to look at 
> > the
> > x86 side?
> 
> Sure..

FWIW, compiler explorer now has a trunk build with the fix, and my x86_64
example now gets compiled to something which looks correct:

  https://godbolt.org/z/cveff9hq5

Thanks,
Mark.


Re: GCC 12 miscompilation of volatile asm (was: Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect)

2022-04-11 Thread Mark Rutland via Gcc
On Tue, Apr 05, 2022 at 01:51:30PM +0100, Mark Rutland wrote:
> Hi all,
> 
> [adding kernel folk who work on asm stuff]
> 
> As a heads-up, GCC 12 (not yet released) appears to erroneously optimize away
> calls to functions with volatile asm. Szabolcs has raised an issue on the GCC
> bugzilla:  
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
> 
> ... which is a P1 release blocker, and is currently being investigated.

Jan Hubicka fixed this in GCC commit:

  aabb9a261ef060cf ("Propagate nondeterministic and side_effects flags in modref summary after inlining")

... and all my local tests look good with that applied.

Compiler explorer's trunk build now has that fix, so the examples from before
now look good:

  aarch64: https://godbolt.org/z/vMczqjYvs

  x86_64: https://godbolt.org/z/cveff9hq5

Jeremy, now that the real issue has been identified and fixed, I assume you'll
send a revert for commit:

  8d3ea3d402db94b6 ("net: bcmgenet: Use stronger register read/writes to assure ordering")

... ?

Thanks,
Mark.


Re: GCC used to store pointers in FP registers on aarch64

2025-02-24 Thread Mark Rutland via Gcc
On Mon, Feb 24, 2025 at 10:46:42AM +0100, Attila Szegedi wrote:
> Hi folks,

Hi,

I've been pointed at this thread due to the reference to my Linux patch
series fixing some KVM FPSIMD/SVE/SME issues.

> I'm looking for a bit of a historic context for a fun GCC behavior we
> stumbled across. For... reasons we build some of our binaries using an
> older version of GCC (8.3.1, yes, we'll be upgrading soon, and no, this
> message is not about helping with an ancient version :-) )
> 
> We noticed that this version of GCC compiling on aarch64 will happily use
> FP registers to temporarily store/load pointers, so there'd be "fmov d9,
> x1" to store a pointer, and then later when it's used as a parameter to a
> function call we'll see "fmov x1, d9" etc. We noticed this while
> investigating some crashes that seemed to always occur in functions called
> with parameters loaded through this mechanism, on certain specific models
> of aarch64 CPUs.

Hmmm... IIUC d9 specifically should be preserved by callees per AAPCS64;
do you see this with specific registers? e.g. v8 to v15?
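
For reference, per AAPCS64 the low 64 bits of v8-v15 are callee-saved, so
stashing a pointer in d9 across a call is valid as long as every callee
preserves them, e.g. (a hypothetical sketch, not taken from the reported
binaries):

| stash_in_d9:
|         stp  x29, x30, [sp, -32]!
|         str  d9, [sp, 16]         // preserve the low 64 bits of v9
|         fmov d9, x1               // stash the pointer across the call
|         bl   some_callee          // callee must preserve d8-d15 (low 64 bits)
|         fmov x1, d9               // recover the pointer
|         ldr  d9, [sp, 16]
|         ldp  x29, x30, [sp], 32
|         ret

(That's also why knowing whether SVE and/or SME are present is useful, per the
questions below.)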

Are you able to share any more information about the configuration(s)
that you see this with, e.g.

* Which CPU(s)?

  If you're not able to say which CPU(s) specifically, knowing whether
  SVE and/or SME are present would be helpful.

* Which kernel version(s), assuming this is with Linux?

  If virtualization is involved, knowing the guest and host kernel
  versions would be helpful.

Thanks,
Mark.