[Bug tree-optimization/110589] New: Missed optimization with call-clobbered restrict qualified references

2023-07-07 Thread javier.martinez.bugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110589

Bug ID: 110589
   Summary: Missed optimization with call-clobbered restrict
qualified references
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: javier.martinez.bugzilla at gmail dot com
  Target Milestone: ---

I see this on GCC trunk and do not think it is a regression; Clang performs the
optimization.
https://godbolt.org/z/G8qrzaKn9


extern void foo(void);

int test_clobber_by_call (int * restrict val_ptr) {
    *val_ptr = 1;

    foo();

    return *val_ptr;
}

GCC 14.0 -O3 produces:
test_clobber_by_call:
        push    rbx
        mov     rbx, rdi
        mov     BYTE PTR [rdi], 1
        call    foo
        movzx   eax, BYTE PTR [rbx]  ; <--- Not expected
        pop     rbx
        ret


I would expect restrict to guarantee that foo() cannot access the object
val_ptr points to, producing:
test_clobber_by_call:
        mov     DWORD PTR [rdi], 1
        call    foo
        mov     eax, 1
        ret

This is indeed the output when the compiler can see that foo does not clobber
*val_ptr, as in:
__attribute__((noinline)) void foo(int *a) {
    *(a+10) = 0;
}


int test_clobber_by_call (int * val_ptr) {
    *val_ptr = 1;

    foo(val_ptr);

    return *val_ptr;
}
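
To make the reasoning concrete, here is a hedged, self-contained sketch (the
names g, opaque and use/caller are illustrative, not from the report) of why
the reload in the first example is redundant: any program in which the opaque
callee can reach the restrict-pointed object already has undefined behaviour,
so the compiler may assume that it cannot happen.

int g;

void opaque(void) {
    g = 42;                    /* modifies g directly, not through p */
}

int use(int * restrict p) {
    *p = 1;
    opaque();
    return *p;                 /* may be folded to 1: if p pointed to g, the
                                  restrict contract would already be broken */
}

int caller(void) {
    return use(&g);            /* undefined behaviour (C11 6.7.3.1): g is
                                  modified both through the restrict pointer
                                  p and directly inside opaque() */
}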


-

It looks to me as if call_may_clobber_ref_p_1 in tree-ssa-alias.c is not
considering restrict-qualified references.

The following patch produces optimized code for the above example. I do not
claim that it is correct, but it does reflect what I expected to see:


diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index c3f43dc..277a21e 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -3037,6 +3037,16 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
       && SSA_NAME_POINTS_TO_READONLY_MEMORY (TREE_OPERAND (base, 0)))
     return false;
 
+  /* Perhaps this should be moved further up.  */
+  if ((TREE_CODE (base) == MEM_REF
+       || TREE_CODE (base) == TARGET_MEM_REF)
+      && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
+    {
+      struct ptr_info_def *pi = SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0));
+      if (pi && pi->pt.vars_contains_restrict)
+        return false;
+    }
+
   if (int res = check_fnspec (call, ref, true))
     {
       if (res == 1)


test_clobber_by_call:
        sub     rsp, 8
        mov     BYTE PTR [rdi], 1
        call    may_alias
        mov     eax, 1               ; <- deref gone
        add     rsp, 8
        ret
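
For completeness, a sketch of how this could be covered in the testsuite (this
is illustrative only - the file name, dg- directives and scan pattern are my
assumptions, not part of the report or of any committed test):

/* { dg-do compile } */
/* { dg-options "-O3 -fdump-tree-optimized" } */

extern void foo (void);

int
test_clobber_by_call (int *__restrict val_ptr)
{
  *val_ptr = 1;
  foo ();
  return *val_ptr;
}

/* If the call is treated as not clobbering *val_ptr, the reload is folded
   away and only the store dereferences val_ptr in the optimized dump.  */
/* { dg-final { scan-tree-dump-times "\\*val_ptr" 1 "optimized" } } */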

[Bug rtl-optimization/71923] return instruction emitted twice with branch target in between

2023-07-18 Thread javier.martinez.bugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71923

Javier Martinez changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |javier.martinez.bugzilla@gmail.com

--- Comment #2 from Javier Martinez ---
Also reproducible with:

extern void s1(void);
extern void s2(void);

void foo(int i) {
    switch (i) {
    case 1:  return s1();
    case 2:  return s1();
    case 3:  return s2();
    }
}


On Trunk and with -O2 or higher:

foo(int):
  cmp edi, 2
  jg .L2
  test edi, edi
  jle .L7
  jmp s1 #tailcall
.LVL1:
  .p2align 4,,10
  .p2align 3
.L2:
  cmp edi, 3
  jne .L8
  jmp s2 #tailcall
.LVL2:
  .p2align 4,,10
  .p2align 3
.L7:
  ret    # <--- ret
  .p2align 4,,10
  .p2align 3
.L8:
  ret    # <--- ret
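
For reference, since s1() and s2() return void, "return s1();" is just a call
followed by an empty return, so the testcase is equivalent to the hedged
rewrite below (foo_equiv is an illustrative name); the two ret blocks above
both implement the single semantic exit taken for i outside 1..3:

void foo_equiv(int i) {
    switch (i) {
    case 1: s1(); return;
    case 2: s1(); return;
    case 3: s2(); return;
    default: return;           /* one exit for i < 1 or i > 3, yet two ret
                                  blocks (.L7 and .L8) are emitted above */
    }
}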

[Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets

2023-07-18 Thread javier.martinez.bugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

Bug ID: 110724
   Summary: Unnecessary alignment on branch to unconditional
branch targets
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: javier.martinez.bugzilla at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/f7qMxxfMj

void duff(int * __restrict to, const int * __restrict from, const int count)
{
    int n = (count + 7) / 8;
    switch (count % 8)
    {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    [[likely]] case 1: *to++ = *from++;
            } while (--n > 0);
    }
}

Trunk with -O3:
        jle     .L1
        [...]
        lea     rax, [rax+4]
        jmp     .L5        # <-- no fall-through to ret
        .p2align 4,,7      # <-- unnecessary alignment
        .p2align 3
.L1:
        ret


I believe this 16-byte alignment is done to place the branch target at the
beginning of a front-end instruction fetch block. That, however, seems
unnecessary when the branch target is itself an unconditional branch, since the
instructions that follow it will never retire.

In this example the cost is only code size / instruction caching, as there is
no possible fall-through to .L1 that would cause nops to be executed. Changing
the C++ attribute from [[likely]] to [[unlikely]] introduces fall-through, and
GCC then seems to remove the padding, which is great.

[Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets

2023-07-18 Thread javier.martinez.bugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #3 from Javier Martinez ---
The generic tuning of 16:11:8 (align to 16 bytes when at most 10 padding bytes
are needed, otherwise to 8, matching the .p2align 4,,10 / .p2align 3 pairs in
the listings above) looks reasonable to me; I do not argue against it.



From Agner Fog's Optimizing subroutines in assembly language:

> Most microprocessors fetch code in aligned 16-byte or 32-byte blocks.
> If an important subroutine entry or jump label happens to be near the
> end of a 16-byte block then the microprocessor will only get a few 
> useful bytes of code when fetching that block of code. It may have
> to fetch the next 16 bytes too before it can decode the first instructions
> after the label. This can be avoided by aligning important subroutine
> entries and loop entries by 16. Aligning by 8 will assure that at least 8
> bytes of code can be loaded with the first instruction fetch, which may
> be sufficient if the instructions are small.



This looks like the reason behind the alignment. That section of the book goes
on to explain the downside (execution of nops) of aligning labels that are
reachable by means other than branching - which I presume led to the :m and
:m2 tuning values, to the distinction between -falign-labels and -falign-jumps,
and to the padding being removed when my label becomes reachable by
fall-through with [[unlikely]].



All this is fine. 

My thesis is that this alignment is always unnecessary in one specific
circumstance: when the branch target is itself an unconditional branch of size
1, as in:

.L1:
        ret

A one-byte ret can never straddle a fetch-block boundary, and the instructions
that follow it must not execute, so there is no front-end stall to avoid.

[Bug target/110724] Unnecessary alignment on branch to unconditional branch targets

2023-07-19 Thread javier.martinez.bugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #7 from Javier Martinez ---
Another case where it might be worthwhile to remove the padding (or lower the
:m threshold) is when the path is known to be cold. I can see Trunk padding
labels inside [clone .cold] sections, and also with __attribute__((cold)) and
__builtin_expect hints.
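
For illustration, a hedged sketch of the kind of input meant here (function
names are hypothetical, and whether this exact snippet gets split is not
something I have verified): the __builtin_expect hint marks the error branch
as unlikely, so GCC may move it into a separate handle.cold clone, and it is
the labels inside such clones that I see being padded.

extern void log_error(void);
extern void do_work(void);

int handle(int status) {
    if (__builtin_expect(status != 0, 0)) {   /* statically predicted cold */
        log_error();
        return -1;
    }
    do_work();
    return 0;
}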