[Bug target/88469] [7/8 regression] AAPCS/AAPCS64 - Struct with 64-bit bitfield (128-bit on AArch64) may be passed in wrong registers

2019-11-14 Thread stefanrin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88469

Stefan Ring changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #16 from Stefan Ring  ---
This was already fixed by the commits in comments #8-#11.

[Bug target/88469] New: Unaligned stack access on arm (in particular armv5)

2018-12-12 Thread stefanrin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88469

            Bug ID: 88469
           Summary: Unaligned stack access on arm (in particular armv5)
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: stefanrin at gmail dot com
  Target Milestone: ---

The compiler generates unaligned stack accesses for its own code, which causes
it to trap on armv5. The disassembly of the offending function looks like this:

(configure arguments: --build=armv5tel-unknown-linux-gnueabi
--prefix=$HOME/gcc8 --enable-languages=c,c++ --with-arch=armv5te
--with-mode=arm --disable-nls)

00398e64 <_ZN11cgraph_node11create_edgeEPS_P5gcall13profile_count>:
  398e64:   e24dd008    sub     sp, sp, #8
  398e68:   e92d41f0    push    {r4, r5, r6, r7, r8, lr}
  398e6c:   e24dd018    sub     sp, sp, #24
  398e70:   e58d3034    str     r3, [sp, #52]   ; 0x34
  398e74:   e1cd63d4    ldrd    r6, [sp, #52]   ; 0x34
  398e78:   e28d3018    add     r3, sp, #24
  398e7c:   e1a08000    mov     r8, r0
  398e80:   e1a05001    mov     r5, r1
  398e84:   e59fe068    ldr     lr, [pc, #104]  ; 398ef4 <_ZN11cgraph_node11create_edgeEPS_P5gcall13profile_count+0x90>
  398e88:   e1cd61f0    strd    r6, [sp, #16]
  398e8c:   e9130003    ldmdb   r3, {r0, r1}
  398e90:   e3a0c000    mov     ip, #0
  398e94:   e1a03002    mov     r3, r2
  398e98:   e88d0003    stm     sp, {r0, r1}
  398e9c:   e1a02005    mov     r2, r5
  398ea0:   e59e        ldr     r0, [lr]
  398ea4:   e1a01008    mov     r1, r8
  398ea8:   e58dc008    str     ip, [sp, #8]
  398eac:   eb3c        bl      398ba4 <_ZN12symbol_table11create_edgeEP11cgraph_nodeS1_P5gcall13profile_countb>
  398eb0:   e1a04000    mov     r4, r0
  398eb4:   eb0709d8    bl      55b61c <_Z24initialize_inline_failedP11cgraph_edge>
  398eb8:   e5953044    ldr     r3, [r5, #68]   ; 0x44
  398ebc:   e5843014    str     r3, [r4, #20]
  398ec0:   e353        cmp     r3, #0
  398ec4:   15834010    strne   r4, [r3, #16]
  398ec8:   e5983040    ldr     r3, [r8, #64]   ; 0x40
  398ecc:   e1a4        mov     r0, r4
  398ed0:   e353        cmp     r3, #0
  398ed4:   e584301c    str     r3, [r4, #28]
  398ed8:   15834018    strne   r4, [r3, #24]
  398edc:   e5884040    str     r4, [r8, #64]   ; 0x40
  398ee0:   e5854044    str     r4, [r5, #68]   ; 0x44
  398ee4:   e28dd018    add     sp, sp, #24
  398ee8:   e8bd41f0    pop     {r4, r5, r6, r7, r8, lr}
  398eec:   e28dd008    add     sp, sp, #8
  398ef0:   e12fff1e    bx      lr
  398ef4:   012f41a0    teqeq   pc, r0, lsr #3

The ldrd at 398e74 is the problem. To be honest, I don't fully understand
this code. profile_count seems to be a struct with a 64-bit value as its
first element. From my understanding of AAPCS, this should not be passed in
r3, because it is not an even register number. But be that as it may, this
seems to store the first part of the 64-bit counter onto the stack so that it
can then be loaded into r6/r7 together with its upper part. This can never be
properly aligned.

For comparison, the same function in an armv7 hardfloat build looks like this:

(configure arguments: --build=arm-linux-gnueabihf --prefix=$HOME/gcc8
--enable-languages=c,c++ --with-arch=armv7-a --with-fpu=vfpv3-d16
--with-mode=arm --with-float=hard --disable-nls --enable-multilib)

003c21e0 <_ZN11cgraph_node11create_edgeEPS_P5gcall13profile_count>:
  3c21e0:   e24dd008    sub     sp, sp, #8
  3c21e4:   e309c130    movw    ip, #37168      ; 0x9130
  3c21e8:   e340c133    movt    ip, #307        ; 0x133
  3c21ec:   e92d4370    push    {r4, r5, r6, r8, r9, lr}
  3c21f0:   e24dd018    sub     sp, sp, #24
  3c21f4:   e1a05001    mov     r5, r1
  3c21f8:   e1a06000    mov     r6, r0
  3c21fc:   e58d3034    str     r3, [sp, #52]   ; 0x34
  3c2200:   e1a03002    mov     r3, r2
  3c2204:   e1cd83d4    ldrd    r8, [sp, #52]   ; 0x34
  3c2208:   e1a02001    mov     r2, r1
  3c220c:   e28d1018    add     r1, sp, #24
  3c2210:   e3a0e000    mov     lr, #0
  3c2214:   e1cd81f0    strd    r8, [sp, #16]
  3c2218:   e9110003    ldmdb   r1, {r0, r1}
  3c221c:   e58de008    str     lr, [sp, #8]
  3c2220:   e88d0003    stm     sp, {r0, r1}
  3c2224:   e1a01006    mov     r1, r6
  3c2228:   e59c        ldr     r0, [ip]
  3c222c:   eb43        bl      3c1f40 <_ZN12symbol_table11create_edgeEP11cgraph_nodeS1_P5gcall13profile_countb>
  3c2

[Bug target/88469] Unaligned stack access on arm (in particular armv5)

2018-12-12 Thread stefanrin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88469

--- Comment #2 from Stefan Ring  ---
Created attachment 45222
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45222&action=edit
Preprocessed sample

g++ -c -O2 -x c++ prep

produces the shown code.

$ g++ -v
Using built-in specs.
COLLECT_GCC=/home/sr/gcc8/bin/g++
COLLECT_LTO_WRAPPER=/home/sr/gcc8/libexec/gcc/armv5tel-unknown-linux-gnueabi/8.2.0/lto-wrapper
Target: armv5tel-unknown-linux-gnueabi
Configured with: ../gcc-8.2.0/configure --build=armv5tel-unknown-linux-gnueabi
--prefix=/home/sr/gcc8 --enable-languages=c,c++ --with-arch=armv5te
--with-mode=arm --disable-nls
Thread model: posix
gcc version 8.2.0 (GCC)

[Bug target/88469] [7/8 regression] AAPCS/AAPCS64 - Struct with 64-bit bitfield (128-bit on AArch64) may be passed in wrong registers

2019-02-24 Thread stefanrin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88469

--- Comment #12 from Stefan Ring  ---
Unfortunately my armv5 device has died in the meantime, so I cannot verify my
original use case. The behavior is indeed different on armv7: it does not trap,
even for the original misaligned code. And unlike on x86, where the alignment
check flag can be changed from user space, changing it is a privileged
operation on arm, so I cannot even selectively enable it.

[Bug target/88469] [7/8 regression] AAPCS/AAPCS64 - Struct with 64-bit bitfield (128-bit on AArch64) may be passed in wrong registers

2019-02-26 Thread stefanrin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88469

--- Comment #14 from Stefan Ring  ---
(In reply to Richard Earnshaw from comment #13)
> Note that if you have root access on your board you can modify the kernel's
> behaviour for various unaligned accesses by changing /proc/cpu/alignment
> (see Documentation/arm/mem_alignment in the kernel sources).  You might want
> to try setting this to 3 to get the kernel to report (but fix up) any
> misaligned accesses).

I know that, but armv7 does not trap at all on a misaligned ldrd at the
hardware level. It does trap if the address is not even 32-bit aligned, but
that’s a different matter.

[Bug target/99531] New: Performance regression since gcc 9 (argument passing / register allocation)

2021-03-10 Thread stefanrin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99531

            Bug ID: 99531
           Summary: Performance regression since gcc 9 (argument passing /
                    register allocation)
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: stefanrin at gmail dot com
  Target Milestone: ---
            Target: x86_64

For this source:

int func(int, int, int, int, int, int);
int caller(int a, int b, int c, int d, int e) { return func(0, a, b, c, d, e); }

the code generated for caller is:

pushq %r12
movl %r8d, %r9d
popq %r12
movl %ecx, %r8d
movl %edx, %ecx
movl %esi, %edx
movl %edi, %esi
xorl %edi, %edi
jmp func

gcc 9 started producing the useless push/pop pair.

Mailing list link:
https://gcc.gnu.org/pipermail/gcc-help/2021-February/139885.html