[Bug tree-optimization/110589] New: Missed optimization with call-clobbered restrict qualified references
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110589

            Bug ID: 110589
           Summary: Missed optimization with call-clobbered restrict
                    qualified references
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: javier.martinez.bugzilla at gmail dot com
  Target Milestone: ---

I see this on GCC trunk and do not think it is a regression; the optimization
is taken by Clang. https://godbolt.org/z/G8qrzaKn9

extern void foo(void);

int test_clobber_by_call (int * restrict val_ptr)
{
    *val_ptr = 1;
    foo();
    return *val_ptr;
}

GCC 14.0 -O3 produces:

test_clobber_by_call:
        push    rbx
        mov     rbx, rdi
        mov     DWORD PTR [rdi], 1
        call    foo
        mov     eax, DWORD PTR [rbx]    ; <--- Not expected
        pop     rbx
        ret

I would expect restrict to be a guarantee that foo() will not alias val_ptr,
producing:

test_clobber_by_call:
        mov     DWORD PTR [rdi], 1
        call    foo
        mov     eax, 1
        ret

This is indeed the output when the compiler recognizes that foo does not
alias, as in:

__attribute((noinline)) void foo(int *a) {
    *(a+10) = 0;
}

int test_clobber_by_call (int * val_ptr)
{
    *val_ptr = 1;
    foo(val_ptr);
    return *val_ptr;
}

It looks to me as if call_may_clobber_ref_p in tree-ssa-alias.c is not
considering a restrict qualified reference. The following patch produces
optimized code for the above example.
I do not claim that it is correct, but it does reflect what I expected to see:

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index c3f43dc..277a21e 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -3037,6 +3037,16 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
       && SSA_NAME_POINTS_TO_READONLY_MEMORY (TREE_OPERAND (base, 0)))
     return false;
 
+  /* Perhaps this should be moved further up.  */
+  if ((TREE_CODE (base) == MEM_REF
+       || TREE_CODE (base) == TARGET_MEM_REF)
+      && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
+    {
+      struct ptr_info_def *pi = SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0));
+      if (pi && pi->pt.vars_contains_restrict)
+        return false;
+    }
+
   if (int res = check_fnspec (call, ref, true))
     {
       if (res == 1)

With the patch, the example compiles to:

test_clobber_by_call:
        sub     rsp, 8
        mov     DWORD PTR [rdi], 1
        call    may_alias
        mov     eax, 1          ; <- deref gone
        add     rsp, 8
        ret
[Bug rtl-optimization/71923] return instruction emitted twice with branch target inbetween
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71923

Javier Martinez changed:
           What    |Removed |Added
           CC      |        |javier.martinez.bugzilla@gmail.com

--- Comment #2 from Javier Martinez ---
Also reproducible with:

extern void s1(void);
extern void s2(void);

void foo(int i) {
    switch (i) {
        case 1: return s1();
        case 2: return s1();
        case 3: return s2();
    }
}

On trunk with -O2 or higher:

foo(int):
        cmp     edi, 2
        jg      .L2
        test    edi, edi
        jle     .L7
        jmp     s1      # tailcall
.LVL1:
        .p2align 4,,10
        .p2align 3
.L2:
        cmp     edi, 3
        jne     .L8
        jmp     s2      # tailcall
.LVL2:
        .p2align 4,,10
        .p2align 3
.L7:
        ret     # <--- ret
        .p2align 4,,10
        .p2align 3
.L8:
        ret     # <--- ret
[Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

            Bug ID: 110724
           Summary: Unnecessary alignment on branch to unconditional branch
                    targets
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: javier.martinez.bugzilla at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/f7qMxxfMj

void duff(int * __restrict to, const int * __restrict from, const int count)
{
    int n = (count+7) / 8;
    switch (count%8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    [[likely]] case 1: *to++ = *from++;
            } while (--n > 0);
    }
}

Trunk with -O3:

        jle     .L1
        [...]
        lea     rax, [rax+4]
        jmp     .L5             # <-- no fall-through to ret
        .p2align 4,,7           # <-- unnecessary alignment
        .p2align 3
.L1:
        ret

I believe this 16-byte alignment is done to place the branch target at the
beginning of a front-end instruction-fetch block. That seems unnecessary,
however, when the branch target is itself an unconditional branch, as the
instructions that follow it will not retire. In this example the cost is only
code size / instruction cache footprint, since there is no possible
fall-through into .L1 that would cause nops to be executed. Changing the C++
attribute to [[unlikely]] introduces a fall-through, and GCC then removes the
padding, which is great.
[Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #3 from Javier Martinez ---
The generic tuning of 16:11:8 looks reasonable to me; I do not argue against
it.

From Agner Fog's Optimizing subroutines in assembly language:

> Most microprocessors fetch code in aligned 16-byte or 32-byte blocks.
> If an important subroutine entry or jump label happens to be near the
> end of a 16-byte block then the microprocessor will only get a few
> useful bytes of code when fetching that block of code. It may have
> to fetch the next 16 bytes too before it can decode the first instructions
> after the label. This can be avoided by aligning important subroutine
> entries and loop entries by 16. Aligning by 8 will assure that at least 8
> bytes of code can be loaded with the first instruction fetch, which may
> be sufficient if the instructions are small.

This looks like the reason behind the alignment. That section of the book goes
on to explain the drawback (execution of nops) of aligning labels that are
reachable by means other than branching - which I presume led to the :m and
:m2 tuning values, the distinction between -falign-labels and -falign-jumps,
and the reason the padding is removed when my label becomes reachable by
fall-through with [[unlikely]]. All of this is fine.

My thesis is that this alignment strategy is always unnecessary in one
specific circumstance - when the branch target is itself an unconditional
branch of size 1, as in:

.L1:
        ret

because the ret instruction can never cross a fetch-block boundary, and the
instructions following the ret must not execute, so there is no front-end
stall to avoid.
[Bug target/110724] Unnecessary alignment on branch to unconditional branch targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #7 from Javier Martinez ---
Another case where it might be interesting to remove the padding (or lower the
:m threshold) is when the path is known to be cold. I can see trunk padding
labels inside [clone .cold] sections, and with __attribute__((cold)) and
__builtin_expect hints.