[Bug bootstrap/98338] [10/11 Regression] profiledbootstrap failure on x86_64-linux

2021-03-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98338

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #25 from Jan Hubicka  ---
Fixed. Sorry for the delay - next time I should not commit into a private
branch :(

[Bug tree-optimization/99101] optimization bug with -ffinite-loops

2021-03-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99101

--- Comment #24 from Jan Hubicka  ---
I do not think there is a problem with pdom for cyclic WRT acyclic paths. Both
notions are equivalent here.

If you have an instruction I in, say, the header of a loop and you determine it
live, then the condition controlling the loopback is in a control-dependent
block, and that will bring it live, which transitively brings in everything else.

The thing is that post-dominance assumes that every path must progress to the
exit node (as promised by -ffinite-loops).

volatile int xx;
int main()
{
  int jobs_ = 1;
  int at_eof_ = 0;
  while (1)
    {
      for (int i = 0; i < jobs_; i++)
        {
          if (at_eof_)
            continue;
          at_eof_ = 1;
          __builtin_printf ("1\n");
          if (xx)
            return 1;
        }
      jobs_ = 0;
    }
  return 0;
}

has an infinite loop that is sort of equivalent to

volatile int xx;
int main()
{
  int jobs_ = 1;
  int at_eof_ = 0;
  while (1)
    {
      if (at_eof_)
        continue;
      at_eof_ = 1;
      __builtin_printf ("1\n");
      if (xx)
        return 1;
      jobs_ = 0;
      while (jobs_ == 0);
    }
  return 0;
}
and we manage to "shortcut" "while (jobs_ == 0);" rather than forcing the
original loop to be finite. Since the difference is not visible across any path
that must progress to the exit node, both are valid in this sense.

With -fno-finite-loops pdoms still do not consider infinite paths, but since we
make sure that every BB has a path to exit, every infinite path can be
approximated by a sequence of finite paths. Since we keep all the finite paths
consistent, the only problem may be that we will optimize out the condition
deciding on the back edge, but we don't do that because we mark them necessary...

[Bug middle-end/99394] New: s254 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99394

Bug ID: 99394
   Summary: s254 benchmark of TSVC is vectorized by clang and not
by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Clang is vectorizing s254 loop with -mtune=archive on znver2 leading to about
758% speedup. Loop is:

real_t s254(struct args_t * func_args)
{

//scalar and array expansion
//carry around variable

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    real_t x;
    for (int nl = 0; nl < 4*iterations; nl++) {
        x = b[LEN_1D-1];
        for (int i = 0; i < LEN_1D; i++) {
            a[i] = (b[i] + x) * (real_t).5;
            x = b[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}

and clang produces:
00407d30 :
  407d30:   41 56   push   %r14
  407d32:   53  push   %rbx
  407d33:   48 83 ec 28 sub$0x28,%rsp
  407d37:   49 89 femov%rdi,%r14
  407d3a:   bf 6b e2 42 00  mov$0x42e26b,%edi
  407d3f:   e8 cc f8 00 00  call   417610 
  407d44:   31 db   xor%ebx,%ebx
  407d46:   4c 89 f7mov%r14,%rdi
  407d49:   31 f6   xor%esi,%esi
  407d4b:   e8 10 93 ff ff  call   401060 
  407d50:   c4 62 7d 18 05 af 62vbroadcastss 0x262af(%rip),%ymm8   
# 42e008 <_IO_stdin_used+0x8>
  407d57:   02 00 
  407d59:   c5 7c 11 04 24  vmovups %ymm8,(%rsp)
  407d5e:   66 90   xchg   %ax,%ax
  407d60:   48 c7 c0 00 0c fe ffmov$0xfffe0c00,%rax
  407d67:   c4 e2 7d 18 05 8c a7vbroadcastss 0x4a78c(%rip),%ymm0   
# 4524fc 
  407d6e:   04 00 
  407d70:   c5 fc 28 88 00 25 45vmovaps 0x452500(%rax),%ymm1
  407d77:   00 
  407d78:   c5 fc 28 90 20 25 45vmovaps 0x452520(%rax),%ymm2
  407d7f:   00 
  407d80:   c5 fc 28 98 40 25 45vmovaps 0x452540(%rax),%ymm3
  407d87:   00 
  407d88:   c4 e3 7d 06 c1 21   vperm2f128 $0x21,%ymm1,%ymm0,%ymm0
  407d8e:   c5 fc 28 a0 60 25 45vmovaps 0x452560(%rax),%ymm4
  407d95:   00 
  407d96:   c5 fc c6 c1 03  vshufps $0x3,%ymm1,%ymm0,%ymm0
  407d9b:   c5 fc c6 c1 98  vshufps $0x98,%ymm1,%ymm0,%ymm0
  407da0:   c4 e3 75 06 ea 21   vperm2f128 $0x21,%ymm2,%ymm1,%ymm5
  407da6:   c5 d4 c6 ea 03  vshufps $0x3,%ymm2,%ymm5,%ymm5
  407dab:   c5 d4 c6 ea 98  vshufps $0x98,%ymm2,%ymm5,%ymm5
  407db0:   c4 e3 6d 06 f3 21   vperm2f128 $0x21,%ymm3,%ymm2,%ymm6
  407db6:   c5 cc c6 f3 03  vshufps $0x3,%ymm3,%ymm6,%ymm6
  407dbb:   c5 cc c6 f3 98  vshufps $0x98,%ymm3,%ymm6,%ymm6
  407dc0:   c4 e3 65 06 fc 21   vperm2f128 $0x21,%ymm4,%ymm3,%ymm7
  407dc6:   c5 c4 c6 fc 03  vshufps $0x3,%ymm4,%ymm7,%ymm7
  407dcb:   c5 c4 c6 fc 98  vshufps $0x98,%ymm4,%ymm7,%ymm7
  407dd0:   c5 f4 58 c0 vaddps %ymm0,%ymm1,%ymm0
  407dd4:   c5 ec 58 cd vaddps %ymm5,%ymm2,%ymm1
  407dd8:   c5 e4 58 d6 vaddps %ymm6,%ymm3,%ymm2
  407ddc:   c5 dc 58 df vaddps %ymm7,%ymm4,%ymm3
  407de0:   c5 bc 59 c0 vmulps %ymm0,%ymm8,%ymm0
  407de4:   c5 bc 59 c9 vmulps %ymm1,%ymm8,%ymm1
  407de8:   c5 bc 59 d2 vmulps %ymm2,%ymm8,%ymm2
  407dec:   c5 bc 59 db vmulps %ymm3,%ymm8,%ymm3
  407df0:   c5 fc 29 80 00 19 47vmovaps %ymm0,0x471900(%rax)
  407df7:   00 
  407df8:   c5 fc 29 88 20 19 47vmovaps %ymm1,0x471920(%rax)
  407dff:   00 
  407e00:   c5 fc 29 90 40 19 47vmovaps %ymm2,0x471940(%rax)
  407e07:   00 
  407e08:   c5 fc 29 98 60 19 47vmovaps %ymm3,0x471960(%rax)
  407e0f:   00 
  407e10:   c5 fc 28 c4 vmovaps %ymm4,%ymm0
  407e14:   48 83 e8 80 sub$0xff80,%rax
  407e18:   0f 85 52 ff ff ff   jne407d70 
  407e1e:   bf 00 25 45 00  mov$0x452500,%edi
  407e23:   be 00 31 43 00  mov$0x433100,%esi
  407e28:   ba 00 19 47 00  mov$0x471900,%edx
  407e2d:   b9 00 0d 49 00  mov$0x490d00,%ecx
  407e32:   41 b8 00 01 4b 00   mov$0x4b0100,%r8d
  407e38:   41 b9 00 f5 4c 00   mov$0x4cf500,%r9d
  407e3e:   c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0
  407e42:   68 00 f5 54 00  push   $0x54f500
  407e47:   68 00 f5 50 00  push   $0x50f500
  407e4c:   c5 f8 77v

[Bug middle-end/99395] New: s116 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

Bug ID: 99395
   Summary: s116 benchmark of TSVC is vectorized by clang and not
by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

s116 loop is:

real_t s116(struct args_t * func_args)
{

//linear dependence testing

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations*10; nl++) {
        for (int i = 0; i < LEN_1D - 5; i += 5) {
            a[i] = a[i + 1] * a[i];
            a[i + 1] = a[i + 2] * a[i + 1];
            a[i + 2] = a[i + 3] * a[i + 2];
            a[i + 3] = a[i + 4] * a[i + 3];
            a[i + 4] = a[i + 5] * a[i + 4];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}

and the vectorized code produced by clang11 is about 2 times faster on a zen3 machine:

00401d00 :
  401d00:   41 56   push   %r14
  401d02:   53  push   %rbx
  401d03:   50  push   %rax
  401d04:   49 89 femov%rdi,%r14
  401d07:   bf 66 e1 42 00  mov$0x42e166,%edi
  401d0c:   e8 ff 58 01 00  call   417610 
  401d11:   31 db   xor%ebx,%ebx
  401d13:   4c 89 f7mov%r14,%rdi
  401d16:   31 f6   xor%esi,%esi
  401d18:   e8 43 f3 ff ff  call   401060 
  401d1d:   eb 47   jmp401d66 
  401d1f:   90  nop
  401d20:   bf 00 25 45 00  mov$0x452500,%edi
  401d25:   be 00 31 43 00  mov$0x433100,%esi
  401d2a:   ba 00 19 47 00  mov$0x471900,%edx
  401d2f:   b9 00 0d 49 00  mov$0x490d00,%ecx
  401d34:   41 b8 00 01 4b 00   mov$0x4b0100,%r8d
  401d3a:   41 b9 00 f5 4c 00   mov$0x4cf500,%r9d
  401d40:   c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0
  401d44:   68 00 f5 54 00  push   $0x54f500
  401d49:   68 00 f5 50 00  push   $0x50f500
  401d4e:   e8 6d 3c 01 00  call   4159c0 
  401d53:   48 83 c4 10 add$0x10,%rsp
  401d57:   83 c3 01add$0x1,%ebx
  401d5a:   81 fb 40 42 0f 00   cmp$0xf4240,%ebx
  401d60:   0f 84 9a 00 00 00   je 401e00 
  401d66:   c5 fa 10 05 92 07 05vmovss 0x50792(%rip),%xmm0#
452500 
  401d6d:   00 
  401d6e:   31 c0   xor%eax,%eax
  401d70:   c5 fa 10 0c 85 04 25vmovss 0x452504(,%rax,4),%xmm1
  401d77:   45 00 
  401d79:   c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
  401d7d:   c5 fa 11 04 85 00 25vmovss %xmm0,0x452500(,%rax,4)
  401d84:   45 00 
  401d86:   c5 f8 10 04 85 08 25vmovups 0x452508(,%rax,4),%xmm0
  401d8d:   45 00 
  401d8f:   c5 f0 c6 c8 00  vshufps $0x0,%xmm0,%xmm1,%xmm1
  401d94:   c5 f0 c6 c8 98  vshufps $0x98,%xmm0,%xmm1,%xmm1
  401d99:   c5 f8 59 c9 vmulps %xmm1,%xmm0,%xmm1
  401d9d:   c5 f8 11 0c 85 04 25vmovups %xmm1,0x452504(,%rax,4)
  401da4:   45 00 
  401da6:   48 3d f5 7c 00 00   cmp$0x7cf5,%rax
  401dac:   0f 87 6e ff ff ff   ja 401d20 
  401db2:   c4 e3 79 04 c0 e7   vpermilps $0xe7,%xmm0,%xmm0
  401db8:   c5 fa 10 0c 85 18 25vmovss 0x452518(,%rax,4),%xmm1
  401dbf:   45 00 
  401dc1:   c5 fa 59 c1 vmulss %xmm1,%xmm0,%xmm0
  401dc5:   c5 fa 11 04 85 14 25vmovss %xmm0,0x452514(,%rax,4)
  401dcc:   45 00 
  401dce:   c5 f8 10 04 85 1c 25vmovups 0x45251c(,%rax,4),%xmm0
  401dd5:   45 00 
  401dd7:   c5 f0 c6 c8 00  vshufps $0x0,%xmm0,%xmm1,%xmm1
  401ddc:   c5 f0 c6 c8 98  vshufps $0x98,%xmm0,%xmm1,%xmm1
  401de1:   c5 f8 59 c9 vmulps %xmm1,%xmm0,%xmm1
  401de5:   c5 fa 10 04 85 28 25vmovss 0x452528(,%rax,4),%xmm0
  401dec:   45 00 
  401dee:   c5 f8 11 0c 85 18 25vmovups %xmm1,0x452518(,%rax,4)
  401df5:   45 00 
  401df7:   48 83 c0 0a add$0xa,%rax
  401dfb:   e9 70 ff ff ff  jmp401d70 
  401e00:   49 83 c6 10 add$0x10,%r14
  401e04:   4c 89 f7mov%r14,%rdi
  401e07:   31 f6   xor%esi,%esi
  401e09:   e8 52 f2 ff ff  call   401060 
  401e0e:   bf 66 e1 42 00  mov$0x42e166,%edi
  401e13:   48 83 c4 08 add$0x8,%rsp
  401e17:   5b  pop%rbx
  401e18:   41 5e   pop%r14
  401e1a:   e9 e1 51 

[Bug middle-end/99397] New: s152 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99397

Bug ID: 99397
   Summary: s152 benchmark of TSVC is vectorized by clang and not
by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

s152 is:
void s152s(real_t a[LEN_1D], real_t b[LEN_1D], real_t c[LEN_1D], int i)
{
    a[i] += b[i] * c[i];
}

real_t s152(struct args_t * func_args)
{

//interprocedural data flow analysis
//collecting information from a subroutine

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D; i++) {
            b[i] = d[i] * e[i];
            s152s(a, b, c, i);
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}


and clang11 vectorizes it as:
004048b0 :
  4048b0:   41 56   push   %r14
  4048b2:   53  push   %rbx
  4048b3:   50  push   %rax
  4048b4:   49 89 femov%rdi,%r14
  4048b7:   bf b7 e1 42 00  mov$0x42e1b7,%edi
  4048bc:   e8 4f 2d 01 00  call   417610 
  4048c1:   31 db   xor%ebx,%ebx
  4048c3:   4c 89 f7mov%r14,%rdi
  4048c6:   31 f6   xor%esi,%esi
  4048c8:   e8 93 c7 ff ff  call   401060 
  4048cd:   0f 1f 00nopl   (%rax)
  4048d0:   31 c0   xor%eax,%eax
  4048d2:   66 2e 0f 1f 84 00 00cs nopw 0x0(%rax,%rax,1)
  4048d9:   00 00 00
  4048dc:   0f 1f 40 00 nopl   0x0(%rax)
  4048e0:   c5 fc 28 80 00 01 4bvmovaps 0x4b0100(%rax),%ymm0
  4048e7:   00
  4048e8:   c5 fc 28 88 20 01 4bvmovaps 0x4b0120(%rax),%ymm1
  4048ef:   00
  4048f0:   c5 fc 59 80 00 0d 49vmulps 0x490d00(%rax),%ymm0,%ymm0
  4048f7:   00
  4048f8:   c5 f4 59 88 20 0d 49vmulps 0x490d20(%rax),%ymm1,%ymm1
  4048ff:   00
  404900:   c5 fc 29 80 00 31 43vmovaps %ymm0,0x433100(%rax)
  404907:   00
  404908:   c5 fc 29 88 20 31 43vmovaps %ymm1,0x433120(%rax)
  40490f:   00
  404910:   c5 fc 28 90 00 19 47vmovaps 0x471900(%rax),%ymm2
  404917:   00
  404918:   c5 fc 28 98 20 19 47vmovaps 0x471920(%rax),%ymm3
  40491f:   00
  404920:   c4 e2 7d a8 90 00 25vfmadd213ps 0x452500(%rax),%ymm0,%ymm2
  404927:   45 00
  404929:   c4 e2 75 a8 98 20 25vfmadd213ps 0x452520(%rax),%ymm1,%ymm3
  404930:   45 00
  404932:   c5 fc 29 90 00 25 45vmovaps %ymm2,0x452500(%rax)
  404939:   00
  40493a:   c5 fc 29 98 20 25 45vmovaps %ymm3,0x452520(%rax)
  404941:   00
  404942:   48 83 c0 40 add$0x40,%rax
  404946:   48 3d 00 f4 01 00   cmp$0x1f400,%rax
  40494c:   75 92   jne4048e0 
  40494e:   bf 00 25 45 00  mov$0x452500,%edi
  404953:   be 00 31 43 00  mov$0x433100,%esi
  404958:   ba 00 19 47 00  mov$0x471900,%edx
  40495d:   b9 00 0d 49 00  mov$0x490d00,%ecx
  404962:   41 b8 00 01 4b 00   mov$0x4b0100,%r8d
  404968:   41 b9 00 f5 4c 00   mov$0x4cf500,%r9d
  40496e:   c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0
  404972:   68 00 f5 54 00  push   $0x54f500
  404977:   68 00 f5 50 00  push   $0x50f500
  40497c:   c5 f8 77vzeroupper 
  40497f:   e8 3c 10 01 00  call   4159c0 
  404984:   48 83 c4 10 add$0x10,%rsp
  404988:   83 c3 01add$0x1,%ebx
  40498b:   81 fb a0 86 01 00   cmp$0x186a0,%ebx
  404991:   0f 85 39 ff ff ff   jne4048d0 
  404997:   49 83 c6 10 add$0x10,%r14
  40499b:   4c 89 f7mov%r14,%rdi
  40499e:   31 f6   xor%esi,%esi
  4049a0:   e8 bb c6 ff ff  call   401060 
  4049a5:   bf b7 e1 42 00  mov$0x42e1b7,%edi
  4049aa:   48 83 c4 08 add$0x8,%rsp
  4049ae:   5b  pop%rbx
  4049af:   41 5e   pop%r14
  4049b1:   e9 4a 26 02 00  jmp427000 
  4049b6:   66 2e 0f 1f 84 00 00cs nopw 0x0(%rax,%rax,1)
  4049bd:   00 00 00 


We get:
real_t s152 (struct args_t * func_args)
{
  int i;
  int nl;
  static const char __func__[5] = "s152";
  struct timeval * _1;
  float _2;
  float _3;
  float _4;
  struct timeval * _5;
  real_t _16;
  long unsigned int _21;
  long unsigned int _22;
  real_t * _23;
  float _24;
  real_t * _25;
  float

[Bug middle-end/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

--- Comment #1 from Jan Hubicka  ---
Loop is:

real_t s116 (struct args_t * func_args)
{
  int i;
  int nl;
  static const char __func__[5] = "s116";
  struct timeval * _1;
  int _2;
  float _3;
  float _4;
  float _5;
  int _6;
  float _7;
  float _8;
  float _9;
  int _10;
  float _11;
  float _12;
  float _13;
  int _14;
  float _15;
  float _16;
  float _17;
  int _18;
  float _19;
  float _20;
  float _21;
  struct timeval * _22;
  real_t _33;
  unsigned int ivtmp_43;
  unsigned int ivtmp_44;
  unsigned int ivtmp_45;
  unsigned int ivtmp_46;

   [local count: 108459]:
  initialise_arrays (&__func__);
  _1 = &func_args_29(D)->t1;
  gettimeofday (_1, 0B);
  goto ; [100.00%]

   [local count: 1052266996]:

   [local count: 1063004409]:
  # i_48 = PHI <_18(8), 0(5)>
  # ivtmp_46 = PHI 
  _2 = i_48 + 1;
  _3 = a[_2];
  _4 = a[i_48];
  _5 = _3 * _4;
  a[i_48] = _5;
  _6 = i_48 + 2;
  _7 = a[_6];
  _8 = a[_2];
  _9 = _7 * _8;
  a[_2] = _9;
  _10 = i_48 + 3;
  _11 = a[_10];
  _12 = a[_6];
  _13 = _11 * _12;
  a[_6] = _13;
  _14 = i_48 + 4;
  _15 = a[_14];
  _16 = a[_10];
  _17 = _15 * _16;
  a[_10] = _17;
  _18 = i_48 + 5;
  _19 = a[_18];
  _20 = a[_14];
  _21 = _19 * _20;
  a[_14] = _21;
  ivtmp_45 = ivtmp_46 - 1;
  if (ivtmp_45 != 0)
goto ; [98.99%]
  else
goto ; [1.01%]


tsvc.c:275:18: missed:   not vectorized, possible dependence between data-refs
a[i_48] and a[_18]
tsvc.c:274:27: missed:  bad data dependence.

_18 = i_48 + 5 and stride is 5...
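
A minimal C sketch of why the a[i_48] / a[_18] pair need not be a blocker, assuming the only cross-iteration conflict is that one iteration reads a[i + 5] before the following iteration (which starts at i + 5) overwrites it. Every statement in the body reads values that have not yet been written, so loading all six inputs first and storing afterwards gives the same result. The function names and test data are hypothetical and not part of TSVC, and the loads-first variant is only an illustration, not the code clang or GCC emits.

```c
#include <string.h>

#define LEN_1D 32000
typedef float real_t;

/* Inner loop of s116 as written in the report. */
static void s116_scalar(real_t *a)
{
    for (int i = 0; i < LEN_1D - 5; i += 5) {
        a[i]     = a[i + 1] * a[i];
        a[i + 1] = a[i + 2] * a[i + 1];
        a[i + 2] = a[i + 3] * a[i + 2];
        a[i + 3] = a[i + 4] * a[i + 3];
        a[i + 4] = a[i + 5] * a[i + 4];
    }
}

/* Hypothetical restructuring: read a[i] .. a[i + 5] up front, then store.
   a[i + 5] is only read here; it is written by the next iteration. */
static void s116_loads_first(real_t *a)
{
    for (int i = 0; i < LEN_1D - 5; i += 5) {
        real_t v[6];
        for (int k = 0; k < 6; k++)
            v[k] = a[i + k];
        for (int k = 0; k < 5; k++)
            a[i + k] = v[k + 1] * v[k];
    }
}

/* Returns 1 when both variants produce bit-identical arrays.
   Small integer values keep every product exact in float. */
int s116_variants_agree(void)
{
    static real_t x[LEN_1D], y[LEN_1D];
    for (int i = 0; i < LEN_1D; i++)
        x[i] = y[i] = (real_t)(1 + i % 3);
    s116_scalar(x);
    s116_loads_first(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling s116_variants_agree() from a small driver is expected to return 1, which is consistent with the dependence at distance 5 being a read-before-write that a vectorizer can honour by keeping loads ahead of stores.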

[Bug middle-end/99394] s254 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99394

--- Comment #1 from Jan Hubicka  ---
Here we fail with:
tsvc.c:1526:27: note:   vect_is_simple_use: operand x_30 = PHI <_2(8),
x_18(3)>, type of def: unknown
tsvc.c:1526:27: missed:   Unsupported pattern.
tsvc.c:1527:26: missed:   not vectorized: unsupported use in stmt.
tsvc.c:1526:27: missed:  unexpected pattern.


{
  int i;
  int nl;
  real_t x;
  static const char __func__[5] = "s254";
  struct timeval * _1;
  float _2;
  float _3;
  float _4;
  struct timeval * _5;
  real_t _17;
  unsigned int ivtmp_27;
  unsigned int ivtmp_28;
  unsigned int ivtmp_29;
  unsigned int ivtmp_35;

   [local count: 108459]:
  initialise_arrays (&__func__);
  _1 = &func_args_13(D)->t1;
  gettimeofday (_1, 0B);

   [local count: 10737416]:
  # nl_31 = PHI 
  # ivtmp_28 = PHI 
  x_18 = b[31999];

   [local count: 1063004409]:
  # x_30 = PHI <_2(8), x_18(3)>
  # i_32 = PHI 
  # ivtmp_35 = PHI 
  _2 = b[i_32];
  _3 = _2 + x_30;
  _4 = _3 * 5.0e-1;
  a[i_32] = _4;
  i_22 = i_32 + 1;
  ivtmp_29 = ivtmp_35 - 1;
  if (ivtmp_29 != 0)
goto ; [98.99%]
  else
goto ; [1.01%]

   [local count: 1052266996]:
  goto ; [100.00%]



[Bug tree-optimization/99394] s254 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99394

--- Comment #3 from Jan Hubicka  ---
testcase is:

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions
real_t flat_2d_array[LEN_2D*LEN_2D];

real_t x[LEN_1D];

real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int indx[LEN_1D];

real_t* __restrict__ xx;
real_t* yy;

// %2.5

real_t s254(void)
{

//scalar and array expansion
//carry around variable

    real_t x;
    for (int nl = 0; nl < 4*iterations; nl++) {
        x = b[LEN_1D-1];
        for (int i = 0; i < LEN_1D; i++) {
            a[i] = (b[i] + x) * (real_t).5;
            x = b[i];
        }
    }

}
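
For illustration, the carry-around variable x in this testcase always holds b[i - 1] (or b[LEN_1D - 1] on the first iteration), so the x_30 PHI that the vectorizer rejects can be removed by hand by peeling the first iteration. The sketch below is only a hand-written rewrite under the assumption that a and b do not overlap; the function names and test data are hypothetical, and this is not a description of what clang does internally.

```c
#include <string.h>

#define LEN_1D 32000
typedef float real_t;

/* Inner loop of s254 with the carry-around scalar x. */
static void s254_scalar(real_t *a, const real_t *b)
{
    real_t x = b[LEN_1D - 1];
    for (int i = 0; i < LEN_1D; i++) {
        a[i] = (b[i] + x) * (real_t).5;
        x = b[i];
    }
}

/* Hypothetical rewrite: peel i == 0, then read b[i - 1] directly,
   which leaves no loop-carried scalar. */
static void s254_peeled(real_t *a, const real_t *b)
{
    a[0] = (b[0] + b[LEN_1D - 1]) * (real_t).5;
    for (int i = 1; i < LEN_1D; i++)
        a[i] = (b[i] + b[i - 1]) * (real_t).5;
}

/* Returns 1 when both variants produce bit-identical results.
   Small integer inputs keep the additions and the halving exact. */
int s254_variants_agree(void)
{
    static real_t a1[LEN_1D], a2[LEN_1D], b[LEN_1D];
    for (int i = 0; i < LEN_1D; i++)
        b[i] = (real_t)(i % 5);
    s254_scalar(a1, b);
    s254_peeled(a2, b);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

In the peeled form each a[i] depends only on two adjacent elements of b, which is the shape a vectorizer can handle with one shifted load or a shuffle.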

[Bug middle-end/99407] New: s243 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

Bug ID: 99407
   Summary: s243 benchmark of TSVC is vectorized by clang and not
by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

This testcase (from TSVC) is about 4 times faster on zen3 when built with
clang.

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions
real_t flat_2d_array[LEN_2D*LEN_2D];

real_t x[LEN_1D];

real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int indx[LEN_1D];

real_t* __restrict__ xx;
real_t* yy;
real_t s243(void)
{

//node splitting
//false dependence cycle breaking

    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D-1; i++) {
            a[i] = b[i] + c[i  ] * d[i];
            b[i] = a[i] + d[i  ] * e[i];
            a[i] = b[i] + a[i+1] * d[i];
        }
    }
}

internal loop from clang is:
.LBB0_2:                                #   Parent Loop BB0_1 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        vmovups c(%rcx), %ymm12
        vmovups c+32(%rcx), %ymm14
        vmovups d(%rcx), %ymm0
        vmovups d+32(%rcx), %ymm7
        vfmadd213ps b(%rcx), %ymm0, %ymm12      # ymm12 = (ymm0 * ymm12) + mem
        vfmadd213ps b+32(%rcx), %ymm7, %ymm14   # ymm14 = (ymm7 * ymm14) + mem
        vfmadd231ps e(%rcx), %ymm0, %ymm12      # ymm12 = (ymm0 * mem) + ymm12
        vfmadd231ps e+32(%rcx), %ymm7, %ymm14   # ymm14 = (ymm7 * mem) + ymm14
        vmovups %ymm12, b(%rcx)
        vmovups %ymm14, b+32(%rcx)
        vfmadd231ps a+4(%rcx), %ymm0, %ymm12    # ymm12 = (ymm0 * mem) + ymm12
        vfmadd231ps a+36(%rcx), %ymm7, %ymm14   # ymm14 = (ymm7 * mem) + ymm14
        vmovups %ymm12, a(%rcx)
        vmovups %ymm14, a+32(%rcx)
        addq    $64, %rcx
        cmpq    $127936, %rcx           # imm = 0x1F3C0
        jne     .LBB0_2

[Bug middle-end/99407] s243 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

--- Comment #1 from Jan Hubicka  ---
Here we get:
s243.c:27:18: missed:   not vectorized, possible dependence between data-refs
a[i_29] and a[_9]
s243.c:26:27: missed:  bad data dependence.
s243.c:26:27: note:  * Analysis failed with vector mode V8QI

   [local count: 1052266997]:

   [local count: 1063004410]:
  # i_29 = PHI <_9(6), 0(4)>
  # ivtmp_43 = PHI 
  _1 = b[i_29];
  _2 = c[i_29];
  _3 = d[i_29];
  _4 = _2 * _3;
  _5 = _1 + _4;
  a[i_29] = _5;
  _6 = e[i_29];
  _7 = _3 * _6;
  _8 = _5 + _7;
  b[i_29] = _8;
  _9 = i_29 + 1;
  _10 = a[_9];
  _11 = _3 * _10;
  _12 = _8 + _11;
  a[i_29] = _12;
  ivtmp_42 = ivtmp_43 - 1;
  if (ivtmp_42 != 0)
goto ; [98.99%]
  else
goto ; [1.01%]
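
A small C sketch of the scalar forwarding already visible in the dump above (_5 and _8 are reused instead of reloaded): the first store to a[i] is overwritten later in the same iteration, so it can be dropped, and the only remaining cross-iteration conflict is that a[i + 1] is read before a later iteration writes it. This is a hand-written illustration assuming the five arrays do not overlap; the function names and test data are hypothetical.

```c
#include <string.h>

#define LEN_1D 32000
typedef float real_t;

/* Inner loop of s243 as written in the testcase. */
static void s243_scalar(real_t *a, real_t *b, const real_t *c,
                        const real_t *d, const real_t *e)
{
    for (int i = 0; i < LEN_1D - 1; i++) {
        a[i] = b[i] + c[i] * d[i];
        b[i] = a[i] + d[i] * e[i];
        a[i] = b[i] + a[i + 1] * d[i];
    }
}

/* Hypothetical rewrite: keep the intermediate values in scalars so that
   a[] is stored once per iteration, after a[i + 1] has been read. */
static void s243_forwarded(real_t *a, real_t *b, const real_t *c,
                           const real_t *d, const real_t *e)
{
    for (int i = 0; i < LEN_1D - 1; i++) {
        real_t t = b[i] + c[i] * d[i];  /* was the first store to a[i] */
        real_t u = t + d[i] * e[i];
        b[i] = u;
        a[i] = u + a[i + 1] * d[i];
    }
}

/* Returns 1 when both variants leave a[] and b[] bit-identical.
   Small integer inputs keep all sums and products exact in float. */
int s243_variants_agree(void)
{
    static real_t a1[LEN_1D], b1[LEN_1D], a2[LEN_1D], b2[LEN_1D];
    static real_t c[LEN_1D], d[LEN_1D], e[LEN_1D];
    for (int i = 0; i < LEN_1D; i++) {
        a1[i] = a2[i] = (real_t)(1 + i % 3);
        b1[i] = b2[i] = (real_t)(1 + (i + 1) % 3);
        c[i] = (real_t)(1 + (i + 2) % 3);
        d[i] = (real_t)(1 + i % 2);
        e[i] = (real_t)(1 + (i + 1) % 2);
    }
    s243_scalar(a1, b1, c, d, e);
    s243_forwarded(a2, b2, c, d, e);
    return memcmp(a1, a2, sizeof a1) == 0 && memcmp(b1, b2, sizeof b1) == 0;
}
```

The single store to a per vector in the clang listing appears consistent with this shape, though the listing alone does not show how clang arrived at it.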

[Bug middle-end/99408] New: s3251 benchmark of TSVC vectorized by clang runs about 7 times faster compared to gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408

Bug ID: 99408
   Summary: s3251 benchmark of TSVC vectorized by clang runs about
7 times faster compared to gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
void
main(void)
{
    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D-1; i++){
            a[i+1] = b[i]+c[i];
            b[i]   = c[i]*e[i];
            d[i]   = a[i]*e[i];
        }
    }
}

Built with -march=znver2 -Ofast I get:
main:
.LFB0:
.cfi_startproc
vmovaps c+127968(%rip), %xmm5
vmovaps e+127968(%rip), %xmm4
movl    $10, %edx
vmovq   c+127984(%rip), %xmm9
vmovq   e+127984(%rip), %xmm10
vmovss  c+127992(%rip), %xmm7
vmovss  e+127992(%rip), %xmm3
vmovss  c+127984(%rip), %xmm13
vmulps  %xmm4, %xmm5, %xmm6
vmulps  %xmm9, %xmm10, %xmm12
vmulss  %xmm3, %xmm7, %xmm11
.p2align 4
.p2align 3
.L2:
xorl    %eax, %eax
.p2align 4
.p2align 3
.L4:
vmovaps c(%rax), %ymm2
addq    $32, %rax
vaddps  b-32(%rax), %ymm2, %ymm0
vmovups %ymm0, a-28(%rax)
vmulps  e-32(%rax), %ymm2, %ymm0
vmovaps e-32(%rax), %ymm2
vmovaps %ymm0, b-32(%rax)
vmulps  a-32(%rax), %ymm2, %ymm0
vmovaps %ymm0, d-32(%rax)
cmpq    $127968, %rax
jne .L4
vaddps  b+127968(%rip), %xmm5, %xmm1
vaddss  b+127984(%rip), %xmm13, %xmm2
decl    %edx
vmovaps %xmm6, b+127968(%rip)
vmovq   b+127984(%rip), %xmm0
vmovlps %xmm12, b+127984(%rip)
vaddps  %xmm0, %xmm9, %xmm0
vmovups %xmm1, a+127972(%rip)
vshufps $255, %xmm1, %xmm1, %xmm1
vmulps  a+127968(%rip), %xmm4, %xmm8
vunpcklps   %xmm2, %xmm1, %xmm1
vaddss  b+127992(%rip), %xmm7, %xmm2
vmovss  %xmm11, b+127992(%rip)
vmulps  %xmm10, %xmm1, %xmm1
vmovlps %xmm0, a+127988(%rip)
vmovshdup   %xmm0, %xmm0
vmulss  %xmm3, %xmm0, %xmm0
vmovss  %xmm2, a+127996(%rip)
jne .L2
vmovaps %xmm8, d+127968(%rip)
vmovlps %xmm1, d+127984(%rip)
vmovss  %xmm0, d+127992(%rip)
vzeroupper
ret


Clang does:

main:   # @main
.cfi_startproc
# %bb.0:
vbroadcastss    a(%rip), %ymm0
vmovss  e+127968(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
vmovss  e+127980(%rip), %xmm2   # xmm2 = mem[0],zero,zero,zero
vmovss  c+127984(%rip), %xmm4   # xmm4 = mem[0],zero,zero,zero
vmovss  e+127984(%rip), %xmm5   # xmm5 = mem[0],zero,zero,zero
vmovss  c+127988(%rip), %xmm8   # xmm8 = mem[0],zero,zero,zero
vmovss  e+127988(%rip), %xmm9   # xmm9 = mem[0],zero,zero,zero
vmovss  c+127992(%rip), %xmm11  # xmm11 = mem[0],zero,zero,zero
vmovss  e+127992(%rip), %xmm12  # xmm12 = mem[0],zero,zero,zero
xorl    %eax, %eax
vmovups %ymm0, -56(%rsp)# 32-byte Spill
vmovss  c+127968(%rip), %xmm0   # xmm0 = mem[0],zero,zero,zero
vmovss  %xmm1, -64(%rsp)# 4-byte Spill
vmulss  %xmm4, %xmm5, %xmm3
vmulss  %xmm8, %xmm9, %xmm10
vmulss  %xmm11, %xmm12, %xmm13
vmovss  %xmm0, -60(%rsp)# 4-byte Spill
vmulss  %xmm0, %xmm1, %xmm0
vmovss  e+127972(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
vmovss  %xmm0, -68(%rsp)# 4-byte Spill
vmovss  c+127972(%rip), %xmm0   # xmm0 = mem[0],zero,zero,zero
vmovss  %xmm1, -76(%rsp)# 4-byte Spill
vmovss  %xmm0, -72(%rsp)# 4-byte Spill
vmulss  %xmm0, %xmm1, %xmm0
vmovss  e+127976(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
vmovss  %xmm0, -80(%rsp)# 4-byte Spill
vmovss  c+127976(%rip), %xmm0   # xmm0 = mem[0],zero,zero,zero
vmovss  %xmm1, -88(%rsp)# 4-byte Spill
vmovss  %xmm0, -84(%rsp)# 4-byte Spill
vmulss  %xmm0, %xmm1, %xmm0
vmovss  c+127980(%rip), %xmm1   # xmm1 = mem[0],zero,zero,zero
vmovss  %xmm0, -92(%rsp)# 4-byte Spill
vmulss  %xmm1, %xmm2, %xmm0
   vmovss  %xmm0, -96(%rsp)# 4-byte Spill
.p2align        4, 0x90
.LBB0_1:# =>This Loop Header: 

[Bug middle-end/99409] New: s252 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99409

Bug ID: 99409
   Summary: s252 benchmark of TSVC is vectorized by clang and not
by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];

void main()
{

//scalar and array expansion
//loop with ambiguous scalar temporary

    real_t t, s;
    for (int nl = 0; nl < iterations; nl++) {
        t = (real_t) 0.;
        for (int i = 0; i < LEN_1D; i++) {
            s = b[i] * c[i];
            a[i] = s + t;
            t = s;
        }
    }

}

clang does:
main:                                   # @main
        .cfi_startproc
# %bb.0:
        xorl    %eax, %eax
        .p2align        4, 0x90
.LBB0_1:                                # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_2 Depth 2
        vxorps  %xmm0, %xmm0, %xmm0
        movq    $-128000, %rcx          # imm = 0xFFFE0C00
        .p2align        4, 0x90
.LBB0_2:                                #   Parent Loop BB0_1 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        vmovups c+128000(%rcx), %ymm1
        vmovups c+128032(%rcx), %ymm2
        vmovups c+128064(%rcx), %ymm3
        vmovups c+128096(%rcx), %ymm4
        vmulps  b+128000(%rcx), %ymm1, %ymm1
        vmulps  b+128032(%rcx), %ymm2, %ymm2
        vmulps  b+128064(%rcx), %ymm3, %ymm3
        vmulps  b+128096(%rcx), %ymm4, %ymm4
        vperm2f128      $33, %ymm1, %ymm0, %ymm0 # ymm0 = ymm0[2,3],ymm1[0,1]
        vperm2f128      $33, %ymm2, %ymm1, %ymm5 # ymm5 = ymm1[2,3],ymm2[0,1]
        vperm2f128      $33, %ymm3, %ymm2, %ymm6 # ymm6 = ymm2[2,3],ymm3[0,1]
        vperm2f128      $33, %ymm4, %ymm3, %ymm7 # ymm7 = ymm3[2,3],ymm4[0,1]
        vshufps $3, %ymm1, %ymm0, %ymm0 # ymm0 = ymm0[3,0],ymm1[0,0],ymm0[7,4],ymm1[4,4]
        vshufps $3, %ymm2, %ymm5, %ymm5 # ymm5 = ymm5[3,0],ymm2[0,0],ymm5[7,4],ymm2[4,4]
        vshufps $3, %ymm3, %ymm6, %ymm6 # ymm6 = ymm6[3,0],ymm3[0,0],ymm6[7,4],ymm3[4,4]
        vshufps $3, %ymm4, %ymm7, %ymm7 # ymm7 = ymm7[3,0],ymm4[0,0],ymm7[7,4],ymm4[4,4]
        vshufps $152, %ymm1, %ymm0, %ymm0 # ymm0 = ymm0[0,2],ymm1[1,2],ymm0[4,6],ymm1[5,6]
        vshufps $152, %ymm2, %ymm5, %ymm5 # ymm5 = ymm5[0,2],ymm2[1,2],ymm5[4,6],ymm2[5,6]
        vshufps $152, %ymm3, %ymm6, %ymm6 # ymm6 = ymm6[0,2],ymm3[1,2],ymm6[4,6],ymm3[5,6]
        vshufps $152, %ymm4, %ymm7, %ymm7 # ymm7 = ymm7[0,2],ymm4[1,2],ymm7[4,6],ymm4[5,6]
        vaddps  %ymm0, %ymm1, %ymm0
        vaddps  %ymm5, %ymm2, %ymm1
        vaddps  %ymm6, %ymm3, %ymm2
        vaddps  %ymm7, %ymm4, %ymm3
        vmovups %ymm0, a+128000(%rcx)
        vmovups %ymm1, a+128032(%rcx)
        vmovups %ymm2, a+128064(%rcx)
        vmovups %ymm3, a+128096(%rcx)
        subq    $-128, %rcx
        vmovaps %ymm4, %ymm0
        jne     .LBB0_2
# %bb.3:                                #   in Loop: Header=BB0_1 Depth=1
        incl    %eax
        cmpl    $10, %eax               # imm = 0x186A0
        jne     .LBB0_1
# %bb.4:
        vzeroupper
        retq

s252.c:18:27: note:   worklist: examine stmt: _3 = s_11 + t_21;
s252.c:18:27: note:   vect_is_simple_use: operand _1 * _2, type of def:
internal
s252.c:18:27: note:   mark relevant 5, live 0: s_11 = _1 * _2;
s252.c:18:27: note:   vect_is_simple_use: operand t_21 = PHI ,
type of def: unknown
s252.c:18:27: missed:   Unsupported pattern.
s252.c:20:22: missed:   not vectorized: unsupported use in stmt.
s252.c:18:27: missed:  unexpected pattern.

   [local count: 1052266996]:

   [local count: 1063004409]:
  # t_21 = PHI 
  # i_23 = PHI 
  # ivtmp_20 = PHI 
  _1 = b[i_23];
  _2 = c[i_23];
  s_11 = _1 * _2;
  _3 = s_11 + t_21;
  a[i_23] = _3;
  i_13 = i_23 + 1;
  ivtmp_19 = ivtmp_20 - 1;
  if (ivtmp_19 != 0)
goto ; [98.99%]
  else
goto ; [1.01%]

[Bug middle-end/99411] New: s311 benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Bug ID: 99411
   Summary: s311 benchmark of TSVC is vectorized by clang better
than by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];

int main()
{

//reductions
//sum reduction

    real_t sum;
    for (int nl = 0; nl < iterations*10; nl++) {
        sum = (real_t)0.;
        for (int i = 0; i < LEN_1D; i++) {
            sum += a[i];
        }
    }
    return sum > 4;
}

We produce with -O2 -march=znver2

.L2:
        movl    $a, %eax
        vxorps  %xmm0, %xmm0, %xmm0
        .p2align 4
        .p2align 3
.L3:
        vaddps  (%rax), %ymm0, %ymm0
        addq    $32, %rax
        cmpq    $a+128000, %rax
        jne     .L3
        vextractf128    $0x1, %ymm0, %xmm1
        decl    %edx
        vaddps  %xmm0, %xmm1, %xmm1
        vmovhlps        %xmm1, %xmm1, %xmm0
        vaddps  %xmm1, %xmm0, %xmm0
        vshufps $85, %xmm0, %xmm0, %xmm1
        vaddps  %xmm0, %xmm1, %xmm0
        jne     .L2
        xorl    %eax, %eax
        vcomiss .LC0(%rip), %xmm0
        seta    %al
        vzeroupper
        ret
        .cfi_endproc


clang does:
main:                                   # @main
        .cfi_startproc
# %bb.0:
        xorl    %eax, %eax
        .p2align        4, 0x90
.LBB0_1:                                # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_2 Depth 2
        vxorps  %xmm0, %xmm0, %xmm0
        movq    $-128000, %rcx          # imm = 0xFFFE0C00
        vxorps  %xmm1, %xmm1, %xmm1
        vxorps  %xmm2, %xmm2, %xmm2
        vxorps  %xmm3, %xmm3, %xmm3
        .p2align        4, 0x90
.LBB0_2:                                #   Parent Loop BB0_1 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        vaddps  a+128000(%rcx), %ymm0, %ymm0
        vaddps  a+128032(%rcx), %ymm1, %ymm1
        vaddps  a+128064(%rcx), %ymm2, %ymm2
        vaddps  a+128096(%rcx), %ymm3, %ymm3
        subq    $-128, %rcx
        jne     .LBB0_2
# %bb.3:                                #   in Loop: Header=BB0_1 Depth=1
        incl    %eax
        cmpl    $100, %eax              # imm = 0xF4240
        jne     .LBB0_1
# %bb.4:
        vaddps  %ymm0, %ymm1, %ymm0
        xorl    %eax, %eax
        vaddps  %ymm0, %ymm2, %ymm0
        vaddps  %ymm0, %ymm3, %ymm0
        vextractf128    $1, %ymm0, %xmm1
        vaddps  %xmm1, %xmm0, %xmm0
        vpermilpd       $1, %xmm0, %xmm1 # xmm1 = xmm0[1,0]
        vaddps  %xmm1, %xmm0, %xmm0
        vmovshdup       %xmm0, %xmm1    # xmm1 = xmm0[1,1,3,3]
        vaddss  %xmm1, %xmm0, %xmm0
        vucomiss        .LCPI0_0(%rip), %xmm0
        seta    %al
        vzeroupper
        retq

On zen3 hardware the gcc version runs in 2.4s, while clang's runs in 0.8s.
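
The visible difference between the two listings is that the GCC inner loop accumulates into a single register (ymm0), while the clang inner loop keeps four independent accumulators (ymm0 to ymm3) and combines them after the loop. A scalar C analogue of that idea is sketched below; the function names and test data are hypothetical. Reordering float additions can change rounding in general, so the sketch uses integer-valued data where every partial sum is exact.

```c
#define LEN_1D 32000
typedef float real_t;

/* One accumulator: every addition waits for the previous one. */
static real_t sum_one_accumulator(const real_t *a, int n)
{
    real_t sum = (real_t)0.;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Four accumulators: four independent dependency chains that are
   combined once at the end. Assumes n is a multiple of 4, as LEN_1D is. */
static real_t sum_four_accumulators(const real_t *a, int n)
{
    real_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}

/* Fills a[] with 0,1,2,3,0,1,2,3,... and sums it with either variant. */
real_t s311_demo_sum(int use_four)
{
    static real_t a[LEN_1D];
    for (int i = 0; i < LEN_1D; i++)
        a[i] = (real_t)(i % 4);
    return use_four ? sum_four_accumulators(a, LEN_1D)
                    : sum_one_accumulator(a, LEN_1D);
}
```

With this input both s311_demo_sum(0) and s311_demo_sum(1) return 48000 (8000 groups summing to 6 each). In the clang listing each accumulator is itself an 8-lane vector; the scalar version only illustrates the shorter dependency chain.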

[Bug middle-end/99411] s311 and s31111 benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|s311 benchmark of TSVC is   |s311 and s3 benchmark
                   |vectorized by clang better  |of TSVC is vectorized by
                   |than by gcc                 |clang better than by gcc

--- Comment #1 from Jan Hubicka  ---
I think this is the same case:

typedef float real_t;

#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];
real_t test(real_t* A){
  real_t s = (real_t)0.0;
  for (int i = 0; i < 4; i++)
    s += A[i];
  return s;
}

int main()
{

//reductions
//sum reduction
    real_t sum;
    for (int nl = 0; nl < 2000*iterations; nl++) {
        sum = (real_t)0.;
        sum += test(a);
        sum += test(&a[4]);
        sum += test(&a[8]);
        sum += test(&a[12]);
        sum += test(&a[16]);
        sum += test(&a[20]);
        sum += test(&a[24]);
        sum += test(&a[28]);
    }
    return sum>4;
}

[Bug middle-end/99411] s311, s312 and s31111 benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|s311 and s3 benchmark       |s311, s312 and s3
                   |of TSVC is vectorized by    |benchmark of TSVC is
                   |clang better than by gcc    |vectorized by clang better
                   |                            |than by gcc

--- Comment #2 from Jan Hubicka  ---
another one:
// %3.1
typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];

int main ()
{

//reductions
//product reduction

real_t prod;
for (int nl = 0; nl < 10*iterations; nl++) {
prod = (real_t)1.;
for (int i = 0; i < LEN_1D; i++) {
prod *= a[i];
}
}
return prod > 0;
}

[Bug middle-end/99411] s311, s312, s31111 and s31111 benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Jan Hubicka  changed:

   What|Removed |Added

Summary|s311, s312 and s3   |s311, s312, s3 and
   |benchmark of TSVC is|s3 benchmark of TSVC is
   |vectorized by clang better  |vectorized by clang better
   |than by gcc |than by gcc

--- Comment #3 from Jan Hubicka  ---
and yet another one
typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];
int main()
{

//reductions
//conditional sum reduction

real_t sum;
for (int nl = 0; nl < iterations/2; nl++) {
sum = 0.;
for (int i = 0; i < LEN_1D; i++) {
if (a[i] > (real_t)0.) {
sum += a[i];
}
}
}
   return sum > 4;
}
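
For reference, the conditional sum can be rewritten so that the branch becomes a select, which is the form a vectorizer typically wants to see after if-conversion. This is only a sketch with made-up function names; it is not taken from either compiler's output.

```c
#include <stddef.h>

/* Conditional sum written with a branch, as in the testcase. */
float cond_sum_branch(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0.0f)
            sum += x[i];
    }
    return sum;
}

/* Same reduction with the condition turned into a select:
   every iteration adds something, either x[i] or 0.0f. */
float cond_sum_select(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += (x[i] > 0.0f) ? x[i] : 0.0f;
    return sum;
}
```

Both functions add the same values in the same order, so they return the same result; the second form simply has no control flow inside the loop body.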

[Bug middle-end/99411] s311, s312, s31111 and s31111, s3110 benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Jan Hubicka  changed:

   What|Removed |Added

Summary|s311, s312, s3 and  |s311, s312, s3 and
   |s3 benchmark of TSVC is |s3, s3110 benchmark of
   |vectorized by clang better  |TSVC is vectorized by clang
   |than by gcc |better than by gcc

--- Comment #4 from Jan Hubicka  ---
typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];
real_t aa[LEN_2D][LEN_2D];
int main()
{

//reductions
//if to max with index reduction 2 dimensions
//similar to S315

int xindex, yindex;
real_t max, chksum;
for (int nl = 0; nl < 100*(iterations/(LEN_2D)); nl++) {
max = aa[(0)][0];
xindex = 0;
yindex = 0;
for (int i = 0; i < LEN_2D; i++) {
for (int j = 0; j < LEN_2D; j++) {
if (aa[i][j] > max) {
max = aa[i][j];
xindex = i;
yindex = j;
}
}
}
chksum = max + (real_t) xindex + (real_t) yindex;
}
return max + xindex+1 + yindex+1;
}

[Bug middle-end/99412] New: s352 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99412

Bug ID: 99412
   Summary: s352 benchmark of TSVC is vectorized by clang and not
by gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256

real_t a[LEN_1D],b[LEN_1D];
int main ()
{

//loop rerolling
//unrolled dot product

real_t dot;
for (int nl = 0; nl < 8*iterations; nl++) {
dot = (real_t)0.;
for (int i = 0; i < LEN_1D; i += 5) {
dot = dot + a[i] * b[i] + a[i + 1] * b[i + 1] + a[i + 2]
* b[i + 2] + a[i + 3] * b[i + 3] + a[i + 4] * b[i + 4];
}
}

return dot;
}


clang does:
main:   # @main
.cfi_startproc
# %bb.0:
xorl%eax, %eax
.p2align4, 0x90
.LBB0_1:# =>This Loop Header: Depth=1
# Child Loop BB0_2 Depth 2
vxorps  %xmm0, %xmm0, %xmm0
movq$-5, %rcx
.p2align4, 0x90
.LBB0_2:#   Parent Loop BB0_1 Depth=1
# =>  This Inner Loop Header: Depth=2
vmovups b+20(,%rcx,4), %xmm1
vmovss  b+36(,%rcx,4), %xmm2# xmm2 = mem[0],zero,zero,zero
vmulps  a+20(,%rcx,4), %xmm1, %xmm1
vpermilpd   $1, %xmm1, %xmm3# xmm3 = xmm1[1,0]
vaddps  %xmm3, %xmm1, %xmm1
vmovshdup   %xmm1, %xmm3# xmm3 = xmm1[1,1,3,3]
vaddss  %xmm3, %xmm1, %xmm1
vfmadd231ss a+36(,%rcx,4), %xmm2, %xmm1 # xmm1 = (xmm2 * mem) + xmm1
addq$5, %rcx
vaddss  %xmm0, %xmm1, %xmm0
cmpq$31995, %rcx# imm = 0x7CFB
jb  .LBB0_2
# %bb.3:#   in Loop: Header=BB0_1 Depth=1
incl%eax
cmpl$80, %eax   # imm = 0xC3500
jne .LBB0_1
# %bb.4:
vcvttss2si  %xmm0, %eax
retq

[Bug middle-end/99411] s311, s312, s31111, s31111, s3110, vsumr benchmark of TSVC is vectorized by clang better than by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

Jan Hubicka  changed:

   What|Removed |Added

Summary|s311, s312, s3 and  |s311, s312, s3, s3,
   |s3, s3110 benchmark of  |s3110, vsumr benchmark of
   |TSVC is vectorized by clang |TSVC is vectorized by clang
   |better than by gcc  |better than by gcc

--- Comment #5 from Jan Hubicka  ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D];
int main()
{

//control loops
//vector sum reduction

real_t sum;
for (int nl = 0; nl < iterations*10; nl++) {
sum = 0.;
for (int i = 0; i < LEN_1D; i++) {
sum += a[i];
}
}

return sum;
}

[Bug middle-end/99414] New: s235 benchmark of TSVC is vectorized better by icc than gcc (loop interchange)

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414

Bug ID: 99414
   Summary: s235 benchmark of TSVC is vectorized better by icc
than gcc (loop interchange)
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
// %2.3
real_t main(struct args_t * func_args)
{

//loop interchanging
//imperfectly nested loops

for (int nl = 0; nl < 200*(iterations/LEN_2D); nl++) {
for (int i = 0; i < LEN_2D; i++) {
a[i] += b[i] * c[i];
for (int j = 1; j < LEN_2D; j++) {
aa[j][i] = aa[j-1][i] + bb[j][i] * a[i];
}
}
}
}


This runs about 10 times faster on zen3 when built with icc -O3 -ip -Ofast -g
-march=core-avx2 -mtune=core-avx2 -vec s235.c:

main:
# parameter 1: %rdi
..B1.1: # Preds ..B1.0
# Execution count [1.77e+00]
.cfi_startproc
..___tag_value_main.1:
..L2:
  #9.1
pushq %rbp  #9.1
.cfi_def_cfa_offset 16
movq  %rsp, %rbp#9.1
.cfi_def_cfa 6, 16
.cfi_offset 6, -16
andq  $-128, %rsp   #9.1
subq  $128, %rsp#9.1
movl  $3, %edi  #9.1
xorl  %esi, %esi#9.1
call  __intel_new_feature_proc_init #9.1
# LOE rbx r12 r13 r14 r15
..B1.12:# Preds ..B1.1
# Execution count [1.77e+00]
vstmxcsr  (%rsp)#9.1
xorl  %eax, %eax#14.5
orl   $32832, (%rsp)#9.1
vldmxcsr  (%rsp)#9.1
# LOE rbx r12 r13 r14 r15 eax
..B1.2: # Preds ..B1.8 ..B1.12
# Execution count [7.83e+04]
xorl  %edx, %edx#15.9
# LOE rdx rbx r12 r13 r14 r15 eax
..B1.3: # Preds ..B1.3 ..B1.2
# Execution count [2.00e+07]
vmovups   b(,%rdx,4), %ymm1 #16.21
lea   (,%rdx,4), %rcx   #16.13
vmovups   32+b(,%rdx,4), %ymm3  #16.21
vmovups   64+b(,%rdx,4), %ymm5  #16.21
vmovups   96+b(,%rdx,4), %ymm7  #16.21
vmovups   128+b(,%rdx,4), %ymm9 #16.21
vmovups   160+b(,%rdx,4), %ymm11#16.21
vmovups   192+b(,%rdx,4), %ymm13#16.21
vmovups   224+b(,%rdx,4), %ymm15#16.21
vmovups   c(,%rdx,4), %ymm0 #16.28
vmovups   32+c(,%rdx,4), %ymm2  #16.28
vmovups   64+c(,%rdx,4), %ymm4  #16.28
vmovups   96+c(,%rdx,4), %ymm6  #16.28
vmovups   128+c(,%rdx,4), %ymm8 #16.28
vmovups   160+c(,%rdx,4), %ymm10#16.28
vmovups   192+c(,%rdx,4), %ymm12#16.28
vmovups   224+c(,%rdx,4), %ymm14#16.28
vfmadd213ps a(,%rdx,4), %ymm0, %ymm1#16.13
vfmadd213ps 32+a(,%rdx,4), %ymm2, %ymm3 #16.13
vfmadd213ps 64+a(,%rdx,4), %ymm4, %ymm5 #16.13
vfmadd213ps 96+a(,%rdx,4), %ymm6, %ymm7 #16.13
vfmadd213ps 128+a(,%rdx,4), %ymm8, %ymm9#16.13
vfmadd213ps 160+a(,%rdx,4), %ymm10, %ymm11  #16.13
vfmadd213ps 192+a(,%rdx,4), %ymm12, %ymm13  #16.13
vfmadd213ps 224+a(,%rdx,4), %ymm14, %ymm15  #16.13
vmovups   %ymm1, a(%rcx)#16.13
vmovups   %ymm3, 32+a(%rcx) #16.13
vmovups   %ymm5, 64+a(%rcx) #16.13
vmovups   %ymm7, 96+a(%rcx) #16.13
vmovups   %ymm9, 128+a(%rcx)  
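
The visible part of the icc listing is the a[i] += b[i] * c[i] statement as a loop of its own, which suggests the imperfect nest was distributed first. A sketch of the s235 kernel after distribution and interchange is below. The function names and the small N (standing in for LEN_2D) are assumptions for illustration; the rest of the icc code is not shown above, so this is not a claim about what it does exactly.

```c
#define N 8  /* small stand-in for LEN_2D */

/* Loop nest as written in the testcase (one outer repetition). */
void s235_original(float *a, const float *b, const float *c,
                   float aa[N][N], float bb[N][N])
{
    for (int i = 0; i < N; i++) {
        a[i] += b[i] * c[i];
        for (int j = 1; j < N; j++)
            aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
    }
}

/* Distributed and interchanged form. */
void s235_interchanged(float *a, const float *b, const float *c,
                       float aa[N][N], float bb[N][N])
{
    /* Distribution: the a[] update becomes its own loop. */
    for (int i = 0; i < N; i++)
        a[i] += b[i] * c[i];

    /* Interchange: the recurrence on j is now carried by the outer
       loop, and the inner loop over i walks memory with stride 1. */
    for (int j = 1; j < N; j++)
        for (int i = 0; i < N; i++)
            aa[j][i] = aa[j - 1][i] + bb[j][i] * a[i];
}
```

The interchange is legal because the only dependence in the nest is aa[j][i] on aa[j-1][i], which stays a forward dependence when j becomes the outer loop.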

[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

--- Comment #3 from Jan Hubicka  ---
The ICC version seems to run faster:
0040a050 :
  40a050:   55  push   %rbp
  40a051:   48 89 e5mov%rsp,%rbp
  40a054:   48 83 e4 e0 and$0xffe0,%rsp
  40a058:   41 57   push   %r15
  40a05a:   53  push   %rbx
  40a05b:   48 83 ec 10 sub$0x10,%rsp
  40a05f:   48 89 fbmov%rdi,%rbx
  40a062:   bf 74 f5 42 00  mov$0x42f574,%edi
  40a067:   e8 14 cc 00 00  call   416c80 
  40a06c:   48 89 dfmov%rbx,%rdi
  40a06f:   33 f6   xor%esi,%esi
  40a071:   e8 4a 70 ff ff  call   4010c0 
  40a076:   33 c0   xor%eax,%eax
  40a078:   41 89 c7mov%eax,%r15d
  40a07b:   33 d2   xor%edx,%edx
  40a07d:   0f 1f 00nopl   (%rax)
  40a080:   c5 fc 10 04 95 04 9dvmovups 0x579d04(,%rdx,4),%ymm0
  40a087:   57 00 
  40a089:   c5 fc 10 14 95 24 9dvmovups 0x579d24(,%rdx,4),%ymm2
  40a090:   57 00 
  40a092:   c5 fc 10 24 95 44 9dvmovups 0x579d44(,%rdx,4),%ymm4
  40a099:   57 00 
  40a09b:   c5 fc 10 34 95 64 9dvmovups 0x579d64(,%rdx,4),%ymm6
  40a0a2:   57 00 
  40a0a4:   c5 fc 59 0c 95 00 9dvmulps 0x579d00(,%rdx,4),%ymm0,%ymm1
  40a0ab:   57 00 
  40a0ad:   c5 ec 59 1c 95 20 9dvmulps 0x579d20(,%rdx,4),%ymm2,%ymm3
  40a0b4:   57 00 
  40a0b6:   c5 dc 59 2c 95 40 9dvmulps 0x579d40(,%rdx,4),%ymm4,%ymm5
  40a0bd:   57 00 
  40a0bf:   c5 cc 59 3c 95 60 9dvmulps 0x579d60(,%rdx,4),%ymm6,%ymm7
  40a0c6:   57 00 
  40a0c8:   c5 fc 11 0c 95 00 9dvmovups %ymm1,0x579d00(,%rdx,4)
  40a0cf:   57 00 
  40a0d1:   c5 fc 11 1c 95 20 9dvmovups %ymm3,0x579d20(,%rdx,4)
  40a0d8:   57 00 
  40a0da:   c5 fc 11 2c 95 40 9dvmovups %ymm5,0x579d40(,%rdx,4)
  40a0e1:   57 00 
  40a0e3:   c5 fc 11 3c 95 60 9dvmovups %ymm7,0x579d60(,%rdx,4)
  40a0ea:   57 00 
  40a0ec:   48 83 c2 20 add$0x20,%rdx
  40a0f0:   48 81 fa e0 7c 00 00cmp$0x7ce0,%rdx
  40a0f7:   72 87   jb 40a080 
  40a0f9:   33 c9   xor%ecx,%ecx
  40a0fb:   ba e1 7c 00 00  mov$0x7ce1,%edx
  40a100:   c5 fc 10 04 95 00 9dvmovups 0x579d00(,%rdx,4),%ymm0
  40a107:   57 00 
  40a109:   48 83 c2 08 add$0x8,%rdx
  40a10d:   c5 fc 59 0c 8d 80 90vmulps 0x599080(,%rcx,4),%ymm0,%ymm1
  40a114:   59 00 
  40a116:   c5 fc 11 0c 8d 80 90vmovups %ymm1,0x599080(,%rcx,4)
  40a11d:   59 00 
  40a11f:   48 83 c1 08 add$0x8,%rcx
  40a123:   48 83 f9 18 cmp$0x18,%rcx
  40a127:   72 d7   jb 40a100 
  40a129:   c5 fa 10 0d b3 ef 18vmovss 0x18efb3(%rip),%xmm1#
5990e4 
  40a130:   00 
  40a131:   bf 00 9d 57 00  mov$0x579d00,%edi
  40a136:   c5 fa 10 1d aa ef 18vmovss 0x18efaa(%rip),%xmm3#
5990e8 
  40a13d:   00 
  40a13e:   be 80 d8 45 00  mov$0x45d880,%esi
  40a143:   c5 f2 59 05 95 ef 18vmulss 0x18ef95(%rip),%xmm1,%xmm0  
 # 5990e0 
  40a14a:   00 
  40a14b:   ba 00 a9 55 00  mov$0x55a900,%edx
  40a150:   c5 e2 59 25 94 ef 18vmulss 0x18ef94(%rip),%xmm3,%xmm4  
 # 5990ec 
  40a157:   00 
  40a158:   c5 f2 59 d3 vmulss %xmm3,%xmm1,%xmm2
  40a15c:   c5 fa 11 05 7c ef 18vmovss %xmm0,0x18ef7c(%rip)#
5990e0 
  40a163:   00 
  40a164:   b9 80 e4 43 00  mov$0x43e480,%ecx
  40a169:   c5 fa 11 15 73 ef 18vmovss %xmm2,0x18ef73(%rip)#
5990e4 
  40a170:   00 
  40a171:   41 b8 00 b5 53 00   mov$0x53b500,%r8d
  40a177:   c5 fa 11 25 69 ef 18vmovss %xmm4,0x18ef69(%rip)#
5990e8 
  40a17e:   00 
  40a17f:   41 b9 c0 b4 4b 00   mov$0x4bb4c0,%r9d
  40a185:   68 00 91 59 00  push   $0x599100
  40a18a:   68 00 b5 4f 00  push   $0x4fb500
  40a18f:   c5 f8 77vzeroupper 
  40a192:   c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0
  40a196:   e8 d5 92 00 00  call   413470 
  40a19b:   48 83 c4 10 add$0x10,%rsp
  40a19f:   41 ff c7inc%r15d
  40a1a2:   41 81 ff 40 42 0f 00cmp$0xf4240,%r15d
  40a1a9:   0f 82 cc fe ff ff   jb 40a07b 
  40a1af:   48 83 c3 10 add$0x10,%rbx
  40a1b3:   33 f6   xor%esi,%esi
  40a1b5:   48 89 dfmov%rbx,%rdi
  40a1b8:   e8 03 6f ff ff  call   4010c0 
  40a1bd:   bf 74 f5 42 00  mov$0x42f57

[Bug middle-end/99415] New: s115 benchmark of TSVC is vectorized by icc and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99415

Bug ID: 99415
   Summary: s115 benchmark of TSVC is vectorized by icc and not by
gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256

real_t a[LEN_1D],aa[LEN_2D][LEN_2D];
void main()
{

for (int nl = 0; nl < 1000*(iterations/LEN_2D); nl++) {
for (int j = 0; j < LEN_2D; j++) {
for (int i = j+1; i < LEN_2D; i++) {
a[i] -= aa[j][i] * a[j];
}
}
}

}

is built as:
main:
..B1.1: # Preds ..B1.0
# Execution count [1.17e-01]
.cfi_startproc
..___tag_value_main.1:
..L2:
  #9.1
pushq %rbp  #9.1
.cfi_def_cfa_offset 16
movq  %rsp, %rbp#9.1
.cfi_def_cfa 6, 16
.cfi_offset 6, -16
andq  $-128, %rsp   #9.1
pushq %r14  #9.1
pushq %r15  #9.1
pushq %rbx  #9.1
subq  $104, %rsp#9.1
movl  $3, %edi  #9.1
xorl  %esi, %esi#9.1
call  __intel_new_feature_proc_init #9.1
.cfi_escape 0x10, 0x03, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff,
0x1a, 0x0d, 0xe8, 0xff, 0xff, 0xff, 0x22
.cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff,
0x1a, 0x0d, 0xf8, 0xff, 0xff, 0xff, 0x22
.cfi_escape 0x10, 0x0f, 0x0e, 0x38, 0x1c, 0x0d, 0x80, 0xff, 0xff, 0xff,
0x1a, 0x0d, 0xf0, 0xff, 0xff, 0xff, 0x22
# LOE rbx r12 r13 r14 r15
..B1.29:# Preds ..B1.1
# Execution count [1.17e-01]
vstmxcsr  (%rsp)#9.1
xorl  %eax, %eax#11.5
orl   $32832, (%rsp)#9.1
vldmxcsr  (%rsp)#9.1
# LOE r12 r13 eax
..B1.2: # Preds ..B1.22 ..B1.29
# Execution count [4.50e+04]
xorl  %r11d, %r11d  #12.9
xorl  %edi, %edi#12.9
xorl  %ebx, %ebx#12.9
xorl  %r9d, %r9d#12.9
xorl  %esi, %esi#12.9
# LOE rbx rsi r11 r12 r13 eax edi r9d
..B1.3: # Preds ..B1.21 ..B1.2
# Execution count [1.15e+07]
incl  %edi  #13.28
decl  %r9d  #13.28
cmpl  $256, %edi#13.35
jge   ..B1.21   # Prob 50%  #13.35
# LOE rbx rsi r11 r12 r13 eax edi r9d
..B1.4: # Preds ..B1.3
# Execution count [1.04e+07]
lea   256(%r9), %r10d   #13.35
cmpl  $16, %r10d#13.13
jl..B1.25   # Prob 10%  #13.13
# LOE rbx rsi r11 r12 r13 eax edi r9d r10d
..B1.5: # Preds ..B1.4
# Execution count [1.04e+07]
lea   4+aa(%rsi,%rbx), %r8  #14.25
andq  $31, %r8  #13.13
lea   (%rsi,%rbx), %r14 #14.25
movl  %r8d, %edx#13.13
negl  %edx  #13.13
addl  $32, %edx #13.13
shrl  $2, %edx  #13.13
testl %r8d, %r8d#13.13
cmovne%edx, %r8d#13.13
lea   16(%r8), %ecx #13.13
cmpl  %ecx, %r10d   #13.13

[Bug middle-end/99416] New: s211 benchmark of TSVC is vectorized by icc and not by gcc

2021-03-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99416

Bug ID: 99416
   Summary: s211 benchmark of TSVC is vectorized by icc and not by
gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
void main()
{

for (int nl = 0; nl < iterations; nl++) {
for (int i = 1; i < LEN_1D-1; i++) {
a[i] = b[i - 1] + c[i] * d[i];
b[i] = b[i + 1] - e[i] * d[i];
}
}
}


Icc produces:
ain:
..B1.1: # Preds ..B1.0
# Execution count [0.00e+00]
.cfi_startproc
..___tag_value_ain.1:
..L2:
  #9.1
subq  $136, %rsp#9.1
.cfi_def_cfa_offset 144
xorl  %edx, %edx#11.5
lea   12+d(%rip), %r8   #14.38
vmovss(%r8), %xmm0  #14.38
movl  $7, %edi  #13.38
lea   12+e(%rip), %r9   #14.38
vmulss(%r9), %xmm0, %xmm12  #14.38
xorl  %esi, %esi#13.38
lea   12+c(%rip), %r10  #13.38
vmulss(%r10), %xmm0, %xmm0  #13.38
vmovss16(%r8), %xmm4#14.38
movl  $31977, %ecx  #12.9
vmulss16(%r9), %xmm4, %xmm14#14.38
movl  $31975, %eax  #12.9
lea   24+b(%rip), %r11  #14.20
vmovss(%r11), %xmm11#14.20
vmovss4(%r8), %xmm6 #14.38
vmovss%xmm12, 104(%rsp) #14.38[spill]
vmovss%xmm11, 8(%rsp)   #14.20[spill]
vmulss4(%r9), %xmm6, %xmm12 #14.38
vmulss4(%r10), %xmm6, %xmm11#13.38
vmovss127984+d(%rip), %xmm6 #14.38
vmovss8(%r8), %xmm13#14.38
vmovss%xmm14, 96(%rsp)  #14.38[spill]
vmulss127984+e(%rip), %xmm6, %xmm14 #14.38
vmulss8(%r9), %xmm13, %xmm1 #14.38
vmovss%xmm14, 112(%rsp) #14.38[spill]
vmovss127988+d(%rip), %xmm14#14.38
vmovss%xmm1, 16(%rsp)   #14.38[spill]
vmulss8(%r10), %xmm13, %xmm1#13.38
vmulss16(%r10), %xmm4, %xmm13   #13.38
vmulss127988+e(%rip), %xmm14, %xmm4 #14.38
vmovss%xmm4, 120(%rsp)  #14.38[spill]
   vmulss127988+c(%rip), %xmm14, %xmm4 #13.38
vmovss-4(%r11), %xmm5   #14.20
vmovss-8(%r8), %xmm2#14.38
vmovss12(%r8), %xmm15   #14.38
vmovss%xmm4, 24(%rsp)   #13.38[spill]
vmovss127992+d(%rip), %xmm4 #14.38
vmovss%xmm5, (%rsp) #14.20[spill]
vmulss-8(%r9), %xmm2, %xmm3 #14.38
vmulss-8(%r10), %xmm2, %xmm5#13.38
vmulss12(%r9), %xmm15, %xmm2#14.38
vmulss12(%r10), %xmm15, %xmm15  #13.38
vmulss127992+e(%rip), %xmm4, %xmm14 #14.38
vmulss127992+c(%rip), %xmm4, %xmm4  #13.38
vmovss-4(%r8), %xmm10   #14.38
vmulss-4(%r9), %xmm10, %xmm7#14.38
vmulss-4(%r10), %xmm10, %xmm10  #13.38
vmovss%xmm7, 88(%rsp)   #14.38[spill]
vmovss%xmm4, 32(%rsp)   #13.38[spill]
vmovss%xmm15, 56(%rsp)  #13.31[spill]
vmovss%xmm14, 40(%rsp)  #13.31[spill]
vmovss%xmm3, 80(%rsp)   #
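
The problem in s211 is that a[i] reads b[i-1], which the second statement wrote one iteration earlier. One textbook way to get rid of that loop-carried dependence is to peel and shift the first statement by one iteration, as sketched below. The function names and the small N (standing in for LEN_1D) are assumptions, and the truncated icc listing above does not confirm that icc uses exactly this transformation.

```c
#define N 16  /* small stand-in for LEN_1D */

/* Loop as written in the testcase (one outer repetition). */
void s211_original(float *a, float *b, const float *c,
                   const float *d, const float *e)
{
    for (int i = 1; i < N - 1; i++) {
        a[i] = b[i - 1] + c[i] * d[i];
        b[i] = b[i + 1] - e[i] * d[i];
    }
}

/* Same computation with the a[] statement shifted by one iteration. */
void s211_realigned(float *a, float *b, const float *c,
                    const float *d, const float *e)
{
    /* Peeled first a[] statement: b[0] is never written by the loop. */
    a[1] = b[0] + c[1] * d[1];

    for (int i = 1; i < N - 2; i++) {
        /* b[i + 1] still holds its old value here. */
        b[i] = b[i + 1] - e[i] * d[i];
        /* The b[i] needed by a[i + 1] was produced in this iteration. */
        a[i + 1] = b[i] + c[i + 1] * d[i + 1];
    }

    /* Peeled last b[] statement. */
    b[N - 2] = b[N - 1] - e[N - 2] * d[N - 2];
}
```

After the shift, each iteration only reads b[] values that are either old or produced in the same iteration, so the loop body can be vectorized without a cross-iteration dependence.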

[Bug middle-end/99633] New: s1113 benchmark of TSVC is unrolled by icc and not by gcc and runs faster on znver3

2021-03-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99633

Bug ID: 99633
   Summary: s1113 benchmark of TSVC is unrolled by icc and not by
gcc and runs faster on znver3
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions

real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];

int main(struct args_t * func_args)
{

//linear dependence testing
//one iteration dependency on a(LEN_1D/2) but still vectorizable

//initialise_arrays(__func__);
//gettimeofday(&func_args->t1, NULL);

for (int nl = 0; nl < 2*iterations; nl++) {
for (int i = 0; i < LEN_1D; i++) {
a[i] = a[LEN_1D/2] + b[i];
}
//dummy(a, b, c, d, e, aa, bb, cc, 0.);
}

return a[10];
}

This is unrolled twice by icc and runs in 1.5s, instead of 2.6s when built with gcc.
-funroll-loops fixes the issue, but it suggests we may want to unroll by
default on znver3.
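
To make the transformation concrete, here is the s1113 loop unrolled by two by hand. This is only a source-level sketch with made-up function names and a small N in place of LEN_1D; the compilers unroll the vectorized loop, not the scalar one.

```c
#define N 16  /* small, even stand-in for LEN_1D */

/* Loop as written in the testcase (one outer repetition). */
void s1113_rolled(float *a, const float *b)
{
    for (int i = 0; i < N; i++)
        a[i] = a[N / 2] + b[i];
}

/* Same loop with two iterations per trip. N is even, so no
   remainder loop is needed. a[N / 2] is re-read for every element
   because the loop itself overwrites it halfway through. */
void s1113_unrolled2(float *a, const float *b)
{
    for (int i = 0; i < N; i += 2) {
        a[i] = a[N / 2] + b[i];
        a[i + 1] = a[N / 2] + b[i + 1];
    }
}
```

The unrolled form does the same stores in the same order, but pays for the loop counter update and branch only once per two elements.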

[Bug tree-optimization/99414] s235 and s233 benchmarks of TSVC is vectorized better by icc than gcc (loop interchange)

2021-03-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414

Jan Hubicka  changed:

   What|Removed |Added

Summary|s235 benchmark of TSVC is   |s235 and s233 benchmarks of
   |vectorized better by icc|TSVC is vectorized better
   |than gcc (loop interchange) |by icc than gcc (loop
   ||interchange)

--- Comment #2 from Jan Hubicka  ---
another testcase

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions

real_t
aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int main(struct args_t * func_args)
{
//loop interchange
//interchanging with one of two inner loops

for (int nl = 0; nl < 100*(iterations/LEN_2D); nl++) {
for (int i = 1; i < LEN_2D; i++) {
for (int j = 1; j < LEN_2D; j++) {
aa[j][i] = aa[j-1][i] + cc[j][i];
}
for (int j = 1; j < LEN_2D; j++) {
bb[j][i] = bb[j][i-1] + cc[j][i];
}
}
dummy();
}

   return aa[0][0];
}

[Bug tree-optimization/99414] s235, s2233 and s233 benchmarks of TSVC is vectorized better by icc than gcc (loop interchange)

2021-03-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414

Jan Hubicka  changed:

   What|Removed |Added

Summary|s235 and s233 benchmarks of |s235, s2233 and s233
   |TSVC is vectorized better   |benchmarks of TSVC is
   |by icc than gcc (loop   |vectorized better by icc
   |interchange)|than gcc (loop interchange)

--- Comment #3 from Jan Hubicka  ---
this one is 7s with gcc and 0.4s with icc.

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions

real_t
aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int main(struct args_t * func_args)
{
//loop interchange
//interchanging with one of two inner loops

for (int nl = 0; nl < 100*(iterations/LEN_2D); nl++) {
for (int i = 1; i < LEN_2D; i++) {
for (int j = 1; j < LEN_2D; j++) {
aa[j][i] = aa[j-1][i] + cc[j][i];
}
for (int j = 1; j < LEN_2D; j++) {
bb[i][j] = bb[i-1][j] + cc[i][j];
}
}
dummy();
}

   return aa[0][0];
}

[Bug tree-optimization/99414] s235, s2233, s275 and s233 benchmarks of TSVC is vectorized better by icc than gcc (loop interchange)

2021-03-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414

Jan Hubicka  changed:

   What|Removed |Added

Summary|s235, s2233 and s233|s235, s2233, s275 and s233
   |benchmarks of TSVC is   |benchmarks of TSVC is
   |vectorized better by icc|vectorized better by icc
   |than gcc (loop interchange) |than gcc (loop interchange)

--- Comment #4 from Jan Hubicka  ---
s275:
typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions

real_t
a[LEN_2D],d[LEN_2D],aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int main(struct args_t * func_args)
{
//control flow
//if around inner loop, interchanging needed

for (int i = 0; i < LEN_2D; i++) 
aa[0][i]=1;

for (int nl = 0; nl < 10*(iterations/LEN_2D); nl++) {
for (int i = 0; i < LEN_2D; i++) {
if (aa[0][i] > (real_t)0.) {
for (int j = 1; j < LEN_2D; j++) {
aa[j][i] = aa[j-1][i] + bb[j][i] * cc[j][i];
}
}
}
dummy();
}
   return aa[0][0];
}
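
For s275 the interchange has to carry the condition along. A sketch of what the interchanged nest could look like is below; function names and the small N (standing in for LEN_2D) are assumptions for illustration.

```c
#define N 8  /* small stand-in for LEN_2D */

/* Loop nest as written in the testcase (one outer repetition). */
void s275_original(float aa[N][N], float bb[N][N], float cc[N][N])
{
    for (int i = 0; i < N; i++) {
        if (aa[0][i] > 0.0f) {
            for (int j = 1; j < N; j++)
                aa[j][i] = aa[j - 1][i] + bb[j][i] * cc[j][i];
        }
    }
}

/* Interchanged form: row 0 of aa is never written (j starts at 1),
   so the test on aa[0][i] can be re-evaluated inside the nest. */
void s275_interchanged(float aa[N][N], float bb[N][N], float cc[N][N])
{
    for (int j = 1; j < N; j++)
        for (int i = 0; i < N; i++)
            if (aa[0][i] > 0.0f)
                aa[j][i] = aa[j - 1][i] + bb[j][i] * cc[j][i];
}
```

With i as the inner loop the accesses are stride-1 and the condition becomes a per-lane mask, which is a form the vectorizer can handle.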

[Bug tree-optimization/99414] s235, s2233, s275, s2275 and s233 benchmarks of TSVC is vectorized better by icc than gcc (loop interchange)

2021-03-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99414

Jan Hubicka  changed:

   What|Removed |Added

Summary|s235, s2233, s275 and s233  |s235, s2233, s275, s2275
   |benchmarks of TSVC is   |and s233 benchmarks of TSVC
   |vectorized better by icc|is vectorized better by icc
   |than gcc (loop interchange) |than gcc (loop interchange)

--- Comment #5 from Jan Hubicka  ---
s2275:
typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions

real_t
a[LEN_2D],b[LEN_2D],c[LEN_2D],d[LEN_2D],aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int main(struct args_t * func_args)
{
//loop distribution is needed to be able to interchange

for (int nl = 0; nl < 100*(iterations/LEN_2D); nl++) {
for (int i = 0; i < LEN_2D; i++) {
for (int j = 0; j < LEN_2D; j++) {
aa[j][i] = aa[j][i] + bb[j][i] * cc[j][i];
}
a[i] = b[i] + c[i] * d[i];
}
dummy();
}
   return aa[0][0];
}
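
As the comment in the testcase says, the a[i] statement has to be split off before the nest can be interchanged. A sketch of the distributed and interchanged form follows; function names and the small N (standing in for LEN_2D) are assumptions for illustration.

```c
#define N 8  /* small stand-in for LEN_2D */

/* Loop nest as written in the testcase (one outer repetition). */
void s2275_original(float *a, const float *b, const float *c, const float *d,
                    float aa[N][N], float bb[N][N], float cc[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            aa[j][i] = aa[j][i] + bb[j][i] * cc[j][i];
        a[i] = b[i] + c[i] * d[i];
    }
}

/* Distributed, then interchanged. */
void s2275_distributed(float *a, const float *b, const float *c, const float *d,
                       float aa[N][N], float bb[N][N], float cc[N][N])
{
    /* The 2D update is now a perfect nest, so j can move outward
       and the inner loop over i is stride-1. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            aa[j][i] = aa[j][i] + bb[j][i] * cc[j][i];

    /* The 1D statement runs as a separate loop. */
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i] * d[i];
}
```

The two statements touch disjoint arrays, so splitting them changes nothing about the result.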

[Bug middle-end/99634] New: s2102 benchmarks of TSVC is vectorized better by icc than gcc

2021-03-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99634

Bug ID: 99634
   Summary: s2102 benchmarks of TSVC is vectorized better by icc
than gcc
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
// array definitions

real_t
a[LEN_2D],d[LEN_2D],aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];


int main(struct args_t * func_args)
{
//diagonals
//identity matrix, best results vectorize both inner and outer loops

for (int nl = 0; nl < 100*(iterations/LEN_2D); nl++) {
for (int i = 0; i < LEN_2D; i++) {
for (int j = 0; j < LEN_2D; j++) {
aa[j][i] = (real_t)0.;
}
aa[i][i] = (real_t)1.;
}
dummy();
}
   return aa[0][0];
}

is vectorized by icc as:
min:
# parameter 1: %rdi
..B1.1: # Preds ..B1.0
# Execution count [5.00e-03]
.cfi_startproc
..___tag_value_min.1:
..L2:
  #36.1
pushq %rbp  #36.1
.cfi_def_cfa_offset 16
movq  %rsp, %rbp#36.1
.cfi_def_cfa 6, 16
.cfi_offset 6, -16
andq  $-32, %rsp#36.1
movl  $aa, %edi #38.13
xorl  %esi, %esi#38.13
movl  $262144, %edx #38.13
call  _intel_fast_memset#38.13
# LOE rbx r12 r13 r14 r15
..B1.2: # Preds ..B1.1
# Execution count [1.00e+00]
vmovups   .L_2il0floatpacket.0(%rip), %ymm1 #41.24
xorl  %edx, %edx#37.9
xorl  %eax, %eax#37.9
vextractf128 $1, %ymm1, %xmm0   #41.13
# LOE rax rdx rbx r12 r13 r14 r15 xmm0 xmm1
..B1.3: # Preds ..B1.3 ..B1.2
# Execution count [2.56e+02]
vextractps $3, %xmm1, 44204+aa(%rax,%rdx,4) #41.13
lea   (%rax,%rdx,4), %rcx   #41.13
vmovss%xmm0, 45232+aa(%rax,%rdx,4)  #41.13
vextractps $1, %xmm0, 46260+aa(%rax,%rdx,4) #41.13
vextractps $2, %xmm0, 47288+aa(%rax,%rdx,4) #41.13
vextractps $3, %xmm0, 48316+aa(%rax,%rdx,4) #41.13
vmovss%xmm1, 49344+aa(%rax,%rdx,4)  #41.13
vextractps $1, %xmm1, 50372+aa(%rax,%rdx,4) #41.13
vextractps $2, %xmm1, 51400+aa(%rax,%rdx,4) #41.13
vextractps $3, %xmm1, 52428+aa(%rax,%rdx,4) #41.13
vmovss%xmm0, 53456+aa(%rax,%rdx,4)  #41.13
vextractps $1, %xmm0, 54484+aa(%rax,%rdx,4) #41.13
vextractps $2, %xmm0, 55512+aa(%rax,%rdx,4) #41.13
vextractps $3, %xmm0, 56540+aa(%rax,%rdx,4) #41.13
vmovss%xmm1, 57568+aa(%rax,%rdx,4)  #41.13
vextractps $1, %xmm1, 58596+aa(%rax,%rdx,4) #41.13
vextractps $2, %xmm1, 59624+aa(%rax,%rdx,4) #41.13
vextractps $3, %xmm1, 60652+aa(%rax,%rdx,4) #41.13
vmovss%xmm0, 61680+aa(%rax,%rdx,4)  #41.13
vextractps $1, %xmm0, 62708+aa(%rax,%rdx,4) #41.13
vextractps $2, %xmm0, 63736+aa(%rax,%rdx,4) #41.13
vextractps $3, %xmm0, 64764+aa(%rax,%rdx,4) #41.13
vmovss%xmm1, 65792+aa(%rax,%rdx,4)  #41.13
vextractps $1, %xmm1, 66820+aa(%rax,%rdx,4) #41.13
vextractps $2, %xmm1, 67848+aa(%rax,%rdx,4) #41.13
vextractps $3, %xmm1, 68876+aa(%rax,%rdx,4) #41.13
vmovss%xmm0, 69904+aa(%rax,%rdx,4)  #41.13
vextractps $1, %xmm0, 70932+aa(%rax,%rdx,4) #41.13
vextractps $2, %xmm0, 71960+aa(%rax,%rdx,4) #41.13
vextractps $3, %xmm0, 72988+aa(%rax,%rdx,4) #41.13
vmovss%xmm1, 74016+aa(%rax,%rdx,4)  #41.13
vextractps $1, %xmm1, 75044+aa(%rax,%rdx,4) #41.13
vextractps $2, %xmm1, 76072+aa(%rax,%rdx,4) #41.13
vextractps $3, %xmm1, 77100+aa(%rax,%rdx,4) 
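
Reading the icc listing: the call to _intel_fast_memset with length 262144 (256 * 256 * sizeof(float)) zeroes the whole matrix, and the stores that follow are 1028 bytes apart ((LEN_2D + 1) * sizeof(float)), i.e. they walk the diagonal. In source form that corresponds roughly to the sketch below; function names and the small N (standing in for LEN_2D) are assumptions for illustration.

```c
#include <string.h>

#define N 8  /* small stand-in for LEN_2D */

/* Loop nest as written in the testcase (one outer repetition). */
void identity_original(float aa[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            aa[j][i] = 0.0f;
        aa[i][i] = 1.0f;
    }
}

/* Zero everything in one call, then set the diagonal.
   All-zero bytes represent 0.0f on IEEE 754 targets. */
void identity_memset(float aa[N][N])
{
    memset(aa, 0, sizeof(float) * N * N);
    for (int i = 0; i < N; i++)
        aa[i][i] = 1.0f;
}
```

The original nest writes each column with a stride of a full row, while the memset form touches memory linearly and leaves only N scattered stores.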

[Bug middle-end/99638] New: s132 benchmarks of TSVC on zen3 benefits from -mno-fma

2021-03-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99638

Bug ID: 99638
   Summary: s132 benchmarks of TSVC on zen3 benefits from -mno-fma
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
// array definitions
real_t flat_2d_array[LEN_2D*LEN_2D];

real_t x[LEN_1D];

real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int indx[LEN_1D];

real_t* __restrict__ xx;
real_t* yy;


// %2.5

void main()
{
//global data flow analysis
//loop with multiple dimension ambiguous subscripts

int m = 0;
int j = m;
int k = m+1;
for (int nl = 0; nl < 400*iterations; nl++) {
for (int i= 1; i < LEN_2D; i++) {
aa[j][i] = aa[k][i-1] + b[i] * c[1];
}
dummy();
}
}

Compiled with -Ofast -march=native this runs in 4.4s, compared to 4.2s with -Ofast
-march=native -mno-fma.

[Bug middle-end/99638] s132 and s281 benchmarks of TSVC on zen3 benefits from -mno-fma

2021-03-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99638

Jan Hubicka  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
Summary|s132 benchmarks of TSVC on  |s132 and s281 benchmarks of
   |zen3 benefits from -mno-fma |TSVC on zen3 benefits from
   ||-mno-fma

--- Comment #1 from Jan Hubicka  ---
s281 benchmark:

typedef float real_t;

#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
// array definitions
real_t flat_2d_array[LEN_2D*LEN_2D];

real_t x[LEN_1D];

real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
aa[LEN_2D][LEN_2D],bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];

int indx[LEN_1D];

real_t* __restrict__ xx;
real_t* yy;

// %2.5

void main()
{
//crossing thresholds
//index set splitting
//reverse data access

real_t x;
for (int nl = 0; nl < iterations; nl++) {
for (int i = 0; i < LEN_1D; i++) {
x = a[LEN_1D-i-1] + b[i] * c[i];
a[i] = x-(real_t)1.0;
b[i] = x;
}
dummy();
}
}


With FMA this runs in 18s, and without it in 14s.

[Bug middle-end/99646] New: s111 benchmark of TSVC preffers -mprefer-avx128 on zen3

2021-03-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99646

Bug ID: 99646
   Summary: s111 benchmark of TSVC preffers -mprefer-avx128 on
zen3
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

typedef float real_t;

#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256

real_t a[LEN_1D],b[LEN_1D],aa[LEN_2D][LEN_2D];
void main()
{
//linear dependence testing
//no dependence - vectorizable

for (int nl = 0; nl < 2*iterations; nl++) {
for (int i = 1; i < LEN_1D; i += 2) {
a[i] = a[i - 1] + b[i];
}
dummy();
}

}

This takes 0.73s with -march=native -Ofast -mprefer-avx128 and 0.81s with
-march=native -Ofast.

128bit version is:
main:
.LFB0:
.cfi_startproc
pushq   %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movl$20, %ebx
.L2:
xorl%eax, %eax
.p2align 4
.p2align 3
.L4:
vmovaps a(%rax), %xmm2
vmovups b+4(%rax), %xmm3
addq$32, %rax
vshufps $136, a-16(%rax), %xmm2, %xmm0
vshufps $136, b-12(%rax), %xmm3, %xmm1
vaddps  %xmm1, %xmm0, %xmm0
vmovss  %xmm0, a-28(%rax)
vextractps  $1, %xmm0, a-20(%rax)
vextractps  $2, %xmm0, a-12(%rax)
vextractps  $3, %xmm0, a-4(%rax)
cmpq$127968, %rax
jne .L4
vmovss  b+127972(%rip), %xmm0
xorl%eax, %eax
vaddss  a+127968(%rip), %xmm0, %xmm0
vmovss  %xmm0, a+127972(%rip)
vmovss  a+127976(%rip), %xmm0
vaddss  b+127980(%rip), %xmm0, %xmm0
vmovss  %xmm0, a+127980(%rip)
vmovss  a+127984(%rip), %xmm0
vaddss  b+127988(%rip), %xmm0, %xmm0
vmovss  %xmm0, a+127988(%rip)
vmovss  a+127992(%rip), %xmm0
vaddss  b+127996(%rip), %xmm0, %xmm0
vmovss  %xmm0, a+127996(%rip)
calldummy

256bit version is:


main:
.LFB0:
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
pushq   %rbx
.cfi_offset 3, -24
movl$20, %ebx
andq$-32, %rsp
.p2align 4
.p2align 3
.L2:
xorl%eax, %eax
.p2align 4
.p2align 3
.L4:
vmovaps a(%rax), %ymm4
vmovups b+4(%rax), %ymm5
addq$64, %rax
vshufps $136, a-32(%rax), %ymm4, %ymm1
vperm2f128  $3, %ymm1, %ymm1, %ymm2
vshufps $68, %ymm2, %ymm1, %ymm0
vshufps $238, %ymm2, %ymm1, %ymm2
vshufps $136, b-28(%rax), %ymm5, %ymm1
vinsertf128 $1, %xmm2, %ymm0, %ymm0
vperm2f128  $3, %ymm1, %ymm1, %ymm2
vshufps $68, %ymm2, %ymm1, %ymm3
vshufps $238, %ymm2, %ymm1, %ymm2
vinsertf128 $1, %xmm2, %ymm3, %ymm1
vaddps  %ymm1, %ymm0, %ymm0
vmovss  %xmm0, a-60(%rax)
vextractps  $1, %xmm0, a-52(%rax)
vextractps  $2, %xmm0, a-44(%rax)
vextractps  $3, %xmm0, a-36(%rax)
vextractf128$0x1, %ymm0, %xmm0
vmovss  %xmm0, a-28(%rax)
vextractps  $1, %xmm0, a-20(%rax)
vextractps  $2, %xmm0, a-12(%rax)
vextractps  $3, %xmm0, a-4(%rax)
cmpq$127936, %rax
jne .L4
vmovaps a+127936(%rip), %xmm6
vmovups b+127940(%rip), %xmm7
xorl%eax, %eax
vshufps $136, a+127952(%rip), %xmm6, %xmm0
vshufps $136, b+127956(%rip), %xmm7, %xmm1
vaddps  %xmm1, %xmm0, %xmm0
vmovss  %xmm0, a+127940(%rip)
vextractps  $1, %xmm0, a+127948(%rip)
vextractps  $2, %xmm0, a+127956(%rip)
vextractps  $3, %xmm0, a+127964(%rip)
vmovss  b+127972(%rip), %xmm0
vaddss  a+127968(%rip), %xmm0, %xmm0
vmovss  %xmm0, a+127972(%rip)
vmovss  b+127980(%rip), %xmm0
vaddss  a+127976(%rip), %xmm0, %xmm0
vmovss  %xmm0, a+127980(%rip)
vmovss  b+127988(%rip), %xmm0
vaddss  a+127984(%rip), %xmm0, %xmm0
vmovss  %xmm0, a+127988(%rip)
vmovss  a+127992(%rip), %xmm0
vaddss  b+127996(%rip), %xmm0, %xmm0
vmovss  %xmm0, a+127996(%rip)
vzeroupper
call dummy

[Bug ipa/99785] Awful lot of time spent building gl.cc in Firefox

2021-03-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99785

--- Comment #15 from Jan Hubicka  ---
We run into the size estimate with always inlines because after inlining we
update the size of caller (because that does matter when inlining normal
functions).

We already have a special-purpose always inliner to avoid some of the issues, so
I guess we keep running into this during the late IPA inlining?

Honza

[Bug ipa/99785] Awful lot of time spent building gl.cc in Firefox

2021-03-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99785

--- Comment #16 from Jan Hubicka  ---
OK, we seem to handle all relevant always_inlines in early passes and then we
produce a large function with many non-always_inline calls that we spend a lot
of time inlining.  This is because we have relative function growth bounds that
are quite high and we manage to get a lot of inlining done.
I guess clang hits the cap on those earlier. I will check if I can save some
compile time.

Honza

[Bug ipa/99751] [11 Regression] wrong code at -O1

2021-03-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99751

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org

--- Comment #7 from Jan Hubicka  ---
mine.

[Bug rtl-optimization/97836] wrong code at -O1 on x86_64-pc-linux-gnu by r11-5029

2021-03-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97836

--- Comment #8 from Jan Hubicka  ---
Indeed, I think for GCC 11 we want to make a return mark the value as used, and
for next stage1 we want to design the EAF flags a bit more carefully...

[Bug ipa/99751] [11 Regression] wrong code at -O1

2021-03-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99751

--- Comment #8 from Jan Hubicka  ---
So we wrongly identify nodirectescape in store_to_c; this is due to an early
exit in analyze_call that does not account for a const call possibly returning
its parameter. (An early confusion in the EAF tracking logic, before I settled
on the fact that returns are not escapes for local PTA.)  I am looking into a
fix. It is odd that this did not show up earlier.

[Bug ipa/99751] [11 Regression] wrong code at -O1

2021-03-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99751

--- Comment #9 from Jan Hubicka  ---
OK, so actually there is logic to handle return values (even for consts) but it
has the wrong if.  I am testing the attached fix.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 7aaf53be8f4..5f33bb5b410 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1545,9 +1545,9 @@ merge_call_lhs_flags (gcall *call, int arg, int index, bool deref,
   tree lhs = gimple_call_lhs (call);
   analyze_ssa_name_flags (lhs, lattice, depth + 1, ipa);
   if (deref)
-   lattice[index].merge (lattice[SSA_NAME_VERSION (lhs)]);
-  else
lattice[index].merge_deref (lattice[SSA_NAME_VERSION (lhs)], false);
+  else
+   lattice[index].merge (lattice[SSA_NAME_VERSION (lhs)]);
 }
   /* In the case of memory store we can do nothing.  */
   else
@@ -1621,7 +1621,7 @@ analyze_ssa_name_flags (tree name, vec
&lattice, int depth,
   else if (gcall *call = dyn_cast <gcall *> (use_stmt))
{
  tree callee = gimple_call_fndecl (call);
- /* Return slot optiomization would require bit of propagation;
+ /* Return slot optimization would require bit of propagation;
 give up for now.  */
  if (gimple_call_return_slot_opt_p (call)
  && gimple_call_lhs (call) != NULL_TREE

[Bug ipa/99751] [11 Regression] wrong code at -O1

2021-03-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99751

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #10 from Jan Hubicka  ---
Forgot PR marker in changelog, but it is fixed by
g:dd64aaafe6916ac11ccae3182b4550c8b8f5e066

[Bug ipa/99447] [11 Regression] ICE (segfault) in lookup_page_table_entry

2021-03-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99447

--- Comment #15 from Jan Hubicka  ---
I also tried to reproduce this locally w/o luck.

Looking at the backtrace in detail, there is no DEF_STMT involved.  It walks
from dwarf dies, to RTL constant pool address that points to tree which has
abstract origin that points to symtab node which points to callgraph edge which
points to dead basic block.

The pointer from the cgraph node to the edge is what should be removed.
I can add code to clear SSA_NAME->def_stmt pointers, but there is no def stmt
in the backtrace, so it would not help here.
W/o a reproducer it seems hard to tell what is/was the real cause of this
issue...

Honza

[Bug ipa/99447] [11 Regression] ICE (segfault) in lookup_page_table_entry

2021-03-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99447

--- Comment #16 from Jan Hubicka  ---
I was trying to reproduce some kind of ICE for a while, trying to also rebuild
with ggc forced on every ggc_collect call, but no luck.

I wonder if you happen to know specific gcc regression that was failing and if
it was patched or not...

[Bug ipa/99447] [11 Regression] ICE (segfault) in lookup_page_table_entry

2021-03-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99447

--- Comment #18 from Jan Hubicka  ---
> Looking around the only place (we don't know whether this was WPA or LTRANS)
> we'd have a cgraph with edges is during clone materialization which pointed
> me at cgraph_node::release_body which frees the body but fails to eventually
> zap ->call_stmt references

This I agree with, but during our last discussion I went through all
release_body calls and found none which would match this scenario - they are
all on paths where we zap cgraph edges too (it only makes sense this way,
since we are supposed to keep cgraph edges in sync with the actual body and
after freeing the body this would leave the cgraph in an inconsistent state).

I will try to move tree to 20210306 and see if that helps.

I can simply add cgraph edge removal to release_body to make the code a bit
more robust - while most uses erase edges earlier, it is almost free to check
the pointer for being NULL twice.  Still it is weird that the bug does not
reproduce with always collect.

[Bug ipa/99447] [11 Regression] ICE (segfault) in lookup_page_table_entry

2021-03-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99447

--- Comment #19 from Jan Hubicka  ---
Created attachment 50485
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50485&action=edit
small refactoring

This patch moves the removal to release_body and removes the calls on those
paths where removal is done just after the call to it (as opposed to being done
earlier or via a reset call).

But still there is no code path where it should make a difference. Perhaps the
assert will catch something interesting. Tests are running.

[Bug ipa/99447] [11 Regression] ICE (segfault) in lookup_page_table_entry

2021-03-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99447

Jan Hubicka  changed:

   What|Removed |Added

 Status|ASSIGNED|WAITING

--- Comment #20 from Jan Hubicka  ---
I re-tried with g:0ad6a2e2f0c667f9916cfcdb81f41f6055f1d0b3
and it builds all fine even with --param ggc-min-expand=0 --param
ggc-min-heapsize=0. It seems that --enable-checking=gcac is now noop.

@doko: perhaps using --param ggc-min-expand=0 --param ggc-min-heapsize=0 on
your setup may trigger the problem again.  There is some chance that e.g. the
qt headers are the cause, but I am tempted to close the bug as WORKSFORME after
committing the refactoring patch.

[Bug ipa/99447] [11 Regression] ICE (segfault) in lookup_page_table_entry

2021-03-31 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99447

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #27 from Jan Hubicka  ---
Even with pie and fat LTO the compilation works well. In addition I committed
a patch that should make it clear that we do not keep stale pointers.

Without a reproducer I am not sure what more we can do, so perhaps we can
resolve it as WORKSFORME.

[Bug ipa/99309] [10/11 Regression] Segmentation fault with __builtin_constant_p usage at -O2

2021-03-31 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99309

--- Comment #5 from Jan Hubicka  ---
As discussed, I can prepare a patch to make the inliner redirect
__builtin_constant_p to __builtin_true whenever the inliner detects that the
expression is a compile time constant.  This will avoid us eventually hitting
unreachable when late optimizations forget to make the transformation.

I was worried about this idea since this will still lead to some inconsistency,
since uses guarded by the __builtin_constant_p may or may not be constant
propagated and it seems logical to assume that in the block guarded by
builtin_constant_p the expression will indeed evaluate to a compile time
constant.

However we can get similar inconsistencies with alias oracle walking limits as
well, so these constructions are generally fragile (but seem increasingly
common in C++ codebases).

It would still be nice to have fre5 constant propagate this. The IPA analyses
are very simplistic.
Richi, any idea on this?
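
To illustrate the fragility discussed above, here is a minimal sketch of a guarded construction that stays correct whichever way the builtin folds. The names popcount_sketch, generic_popcount and check_popcount_sketch are hypothetical and not taken from the testcase in this PR; the only assumption is a compiler that provides __builtin_constant_p and __builtin_popcount (GCC and clang do).

```c
#include <assert.h>

/* Hypothetical out-of-line fallback; correct for every input.  */
static unsigned
generic_popcount (unsigned x)
{
  unsigned n = 0;
  while (x)
    {
      n += x & 1u;
      x >>= 1;
    }
  return n;
}

/* Both branches compute the same value, so the result does not depend on
   whether the compiler folds __builtin_constant_p to 1 or to 0.  */
static inline unsigned
popcount_sketch (unsigned x)
{
  if (__builtin_constant_p (x))
    return (unsigned) __builtin_popcount (x);
  return generic_popcount (x);
}

/* Small self-check: the answer is the same on either path.  */
void
check_popcount_sketch (void)
{
  assert (popcount_sketch (0xF0u) == 4);
  assert (popcount_sketch (0u) == 0);
}
```

Code that instead treats the guarded block as reachable only with a propagated constant (for example by ending the other path in unreachable) is the kind that can break when IPA and late passes disagree.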

[Bug ipa/98265] [10/11 Regression] gcc-10 has significantly worse code generated with -O2 compared to -O1 (or gcc-9 -O2) when using the Eigen C++ library

2021-04-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98265

Jan Hubicka  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Jan Hubicka  ---
Trunk now generates on the unreduced testcase:
.file   "test.cpp"
.text
.p2align 4
.globl  _Z1f
.type   _Z1f, @function
_Z1f:
.LFB6287:
.cfi_startproc
mulss   %xmm3, %xmm0
movq %rdi, %rax
mulss   %xmm3, %xmm1
mulss   %xmm3, %xmm2
movss   %xmm0, (%rdi)
movss   %xmm1, 4(%rdi)
movss   %xmm2, 8(%rdi)
ret
.cfi_endproc
.LFE6287:
.size   _Z1f, .-_Z1f
.ident  "GCC: (GNU) 11.0.1 20210331 (experimental)"
.section .note.GNU-stack,"",@progbits

[Bug middle-end/99857] [11 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) by r11-7926

2021-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99857

--- Comment #6 from Jan Hubicka  ---
Thanks for a testcase, it makes things easier to debug indeed :)
The problem is that openmp uses declare_variant_alt on symbols to make them
special definitions, but the definition flag is not set.  That makes
free_lang_data call release_body and since the code depends on references
things get out of sync.

I am testing.

diff --git a/gcc/tree.c b/gcc/tree.c
index 7c44c226a33..e4e74ac8afc 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -5849,7 +5849,7 @@ free_lang_data_in_decl (tree decl, class free_lang_data_d *fld)
   if (!(node = cgraph_node::get (decl))
  || (!node->definition && !node->clones))
{
- if (node)
+ if (node && !node->declare_variant_alt)
node->release_body ();
  else
{

For next stage1 I think we want to set the definition bit for them and remove
all the special cases of declare_variant_alt that make them behave as
definitions. We also want to add checking that !definition symbols are external
symbols, which is missing in the verifier.

[Bug lto/100010] [8/9/10/11 Regression] ICE in lto_output_node, at lto-cgraph.c:447 (-fdevirtualize-at-ltrans) since r6-6384-gceda2c69d5219719

2021-04-12 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100010

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org

--- Comment #2 from Jan Hubicka  ---
mine.

[Bug ipa/92535] [10 regression] ICF is relatively expensive and became less effective

2021-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92535

Jan Hubicka  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
Summary|[10/11 regression] ICF is   |[10 regression] ICF is
   |relatively expensive and|relatively expensive and
   |became less effective   |became less effective

--- Comment #17 from Jan Hubicka  ---
For GCC 11 we now get faster build times with ICF than without on cc1plus,
Firefox and clang LTO builds.  So I think we can consider it no longer a
regression, while ICF can always be improved (and I have some changes queued
for next stage1).

I have no plan to backport this to gcc10, so unassigning.

[Bug ipa/99309] [10/11 Regression] Segmentation fault with __builtin_constant_p usage at -O2

2021-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99309

Jan Hubicka  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #9 from Jan Hubicka  ---
Have WIP patch to attach predicates to builtin_constant_p and redirect to true
if the inliner works out that it is a constant (still relying on late passes to
optimize the if branch well).

From all the options I can think of this seems best, even though it may end up
in relatively rare cases that we do the (very simple) propagation at IPA time
and late optimizations won't.  Without explicitly disabling passes (where I
think this is fine to happen) all testcases we have seen so far were of the
form that the constant was eventually propagated but only after we folded
builtin_constant_p to false.

Overall it is not possible to assure that builtin_constant_p on memory will
fold to true only if all uses of the memory later in the if branch will fold to
a constant, since AO has walking limits.

[Bug ipa/80726] [8/9/10/11 Regression] Destructor not inlined anymore (regression)

2021-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80726

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Jan Hubicka  ---
This is a dup that is fixed on mainline.

*** This bug has been marked as a duplicate of bug 98265 ***

[Bug ipa/98265] [10 Regression] gcc-10 has significantly worse code generated with -O2 compared to -O1 (or gcc-9 -O2) when using the Eigen C++ library

2021-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98265

Jan Hubicka  changed:

   What|Removed |Added

 CC||cuzdav at gmail dot com

--- Comment #12 from Jan Hubicka  ---
*** Bug 80726 has been marked as a duplicate of this bug. ***

[Bug ipa/92394] operand_equal_p should compare as base+offset when comparing addresses

2021-04-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92394

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Jan Hubicka  ---
This is fixed now since we have a way to overload operand_equal_p in ICF.

[Bug ipa/97389] [11 Regression] Segfault in tramp3d since r11-3825-g71dbabccbfb295c8

2020-10-12 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97389

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org

--- Comment #1 from Jan Hubicka  ---
Mine.

[Bug ipa/97389] [11 Regression] Segfault in tramp3d since r11-3825-g71dbabccbfb295c8

2020-10-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97389

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Jan Hubicka  ---
Fixed.

[Bug bootstrap/97350] [11 Regression] Ada bootstrap fails with: self_referential_size, at stor-layout.c:172

2020-10-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97350

--- Comment #7 from Jan Hubicka  ---
Interesting, I get a different ICE
during GIMPLE pass: slp
../../gcc/ada/libgnat/s-os_lib.adb: In function
‘system__os_lib__normalize_pathname__missed_drive_letter’:
../../gcc/ada/libgnat/s-os_lib.adb:2133:7: internal compiler error: in
vect_init_pattern_stmt, at tree-vect-patterns.c:115
 2133 |   function Missed_Drive_Letter (Name : String) return Boolean is
  |   ^
0x6534d9 vect_init_pattern_stmt
../../gcc/tree-vect-patterns.c:115
0x13e2913 vect_set_pattern_stmt
../../gcc/tree-vect-patterns.c:133
0x13e2913 vect_mark_pattern_stmts
../../gcc/tree-vect-patterns.c:5287
0x13e2913 vect_pattern_recog_1
../../gcc/tree-vect-patterns.c:5403
0x13ef3a1 vect_pattern_recog(vec_info*)
../../gcc/tree-vect-patterns.c:5543
0xcda2ce vect_slp_analyze_bb_1
../../gcc/tree-vect-slp.c:3819
0xcda2ce vect_slp_region
../../gcc/tree-vect-slp.c:3918
0xcda2ce vect_slp_bbs
../../gcc/tree-vect-slp.c:4074
0xcdb9d8 vect_slp_function(function*)
../../gcc/tree-vect-slp.c:4125
0xcdd085 execute
../../gcc/tree-vectorizer.c:1432

[Bug ipa/97403] New: Ancestor jump function should be generalized

2020-10-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97403

Bug ID: 97403
   Summary: Ancestor jump function should be generalized
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

In the following we should be able to propagate through test in ipa-cp, but we
are not.

struct foo {int bar;};
__attribute__ ((noinline))
int test2(int *p)
{
  return *p;
}
__attribute__ ((noinline))
int test (struct foo *array)
{
  return test2 (&array[4].bar);
}
int main()
{
  const struct foo array[5]={{1},{2},{3},{4},{5}};
  test(array);
}
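
The value passed on to test2 is the incoming pointer plus a byte offset known at compile time, which is presumably the shape a generalized ancestor jump function would have to describe. A minimal sketch of that arithmetic follows; member_offset and same_address are hypothetical helper names for illustration, not GCC internals.

```c
#include <assert.h>
#include <stddef.h>

struct foo { int bar; };

/* Byte offset that turns `array' into `&array[idx].bar'.  */
static size_t
member_offset (size_t idx)
{
  return idx * sizeof (struct foo) + offsetof (struct foo, bar);
}

/* Returns 1 when the member address equals base + constant offset.  */
static int
same_address (struct foo *array, size_t idx)
{
  return (char *) &array[idx].bar == (char *) array + member_offset (idx);
}

/* Small self-check using the index from the testcase.  */
void
check_member_offset (void)
{
  struct foo array[5] = {{1}, {2}, {3}, {4}, {5}};
  assert (same_address (array, 4));
  assert (member_offset (4) == 4 * sizeof (struct foo));
}
```

For the testcase above the relevant offset is member_offset (4), so knowing the contents of array in main is enough to know what test2 reads.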

[Bug bootstrap/97350] [11 Regression] Ada bootstrap fails with: self_referential_size, at stor-layout.c:172

2020-10-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97350

--- Comment #10 from Jan Hubicka  ---
OK, I was poking a bit about the problem and indeed the bootstrapped gnat with
-O3 and PGO ices, while gnat built normally does not.
We fail:

#2  0x019b7dcb in _Z13variable_sizeP9tree_node (size=0x77448900) at
../../gcc/stor-layout.c:172
172   gcc_assert (self_refs.length () > 0);
(gdb) l
167   if (TREE_CODE (t) == CALL_EXPR || self_referential_component_ref_p
(t))
168 return size;
169
170   /* Collect the list of self-references in the expression.  */
171   find_placeholder_in_expr (size, &self_refs);
172   gcc_assert (self_refs.length () > 0);
173
174   /* Obtain a private copy of the expression.  */
175   t = size;

here the gcc_assert fires. Sadly self_refs has no debug info.
Size is:
 
unit-size 
align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7745f0a8 precision:128 min  max
>
readonly
arg:0 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7746ae70 precision:8 min  max  context 
RM size  RM max >
readonly visited
arg:0 
readonly visited
arg:0 
readonly nothrow visited
arg:0 
visited
arg:0 
visited> arg:1 >>
arg:1 >
arg:1 
readonly visited arg:0  arg:1
>>
arg:1 
readonly
arg:0 
readonly
arg:0 
readonly
arg:0 
readonly
arg:0 
readonly visited arg:0 >
arg:1 
readonly visited arg:0 >>
arg:1 >> arg:1 >
arg:2  constant 0>>

.P_BOUNDS->UB0 >= .P_BOUNDS->LB0 ? (bitsizetype) (((sizetype) .P_BOUNDS->UB0 - (sizetype) .P_BOUNDS->LB0) + 1) * 8
: 0;

I am not an expert on Ada type sizes but it seems like a well-formed expression.

and backtrace is:
#0  _Z14internal_errorPKcz (gmsgid=0xac ) at ../../gcc/diagnostic.c:1752
#1  0x010ba114 in _Z11fancy_abortPKciS0_ (file=0x23a38a8 "in %s, at
%s:%d", line=172, function=0x1e507bb "self_referential_size") at
../../gcc/diagnostic.c:1824
#2  0x019b7dcb in _Z13variable_sizeP9tree_node (size=0x77448900) at
../../gcc/stor-layout.c:172
#3  _Z13variable_sizeP9tree_node (size=0x77448900) at
../../gcc/stor-layout.c:67
#4  0x0128f4e0 in finalize_type_size (type=0x7746c3f0) at
../../gcc/stor-layout.c:1967
#5  0x0128df40 in _Z11layout_typeP9tree_node (type=0x23a38a8) at
../../gcc/stor-layout.c:2625
#6  0x0190e307 in _ZL18build_array_type_1P9tree_nodeS0_bbb.lto_priv.0
(elt_type=0x7745f3f0, index_type=0x7746c348, typeless_storage=59,
shared=172, set_canonical=59)
at ../../gcc/tree.c:8194
#7  0x01567bcc in _Z18gnat_to_gnu_entityiP9tree_nodeb
(gnat_entity=37370024, gnu_expr=0x1e507bb, definition=59) at
../../gcc/ada/gcc-interface/decl.c:2366
#8  0x015618f5 in _Z16gnat_to_gnu_typei (gnat_entity=37370024) at
../../gcc/ada/gcc-interface/decl.c:4887
#9  0x015687a9 in _Z18gnat_to_gnu_entityiP9tree_nodeb
(gnat_entity=37370024, gnu_expr=0x1e507bb, definition=59) at
../../gcc/ada/gcc-interface/decl.c:4814
#10 0x015618f5 in _Z16gnat_to_gnu_typei (gnat_entity=37370024) at
../../gcc/ada/gcc-interface/decl.c:4887
#11 0x019ea47c in gigi (gnat_root=37370024, max_gnat_node=31786939,
number_name=30016059, nodes_ptr=0xac, flags_ptr=0x1ca023b, next_node_ptr=0x73,
prev_node_ptr=0x0, 
elists_ptr=0x0, elmts_ptr=0x0, strings_ptr=0x0, string_chars_ptr=0x0,
list_headers_ptr=0x0, number_file=12, file_info_ptr=0x7fffe3c0,
standard_boolean=16, standard_integer=37, 
standard_character=107, standard_long_long_float=100,
standard_exception_type=1704, gigi_operating_mode=0) at
../../gcc/ada/gcc-interface/trans.c:463
#12 0x019e406d in back_end__call_back_end (mode=(unknown: 1704)) at
../../gcc/ada/back_end.adb:155
#13 0x01928eed in _ada_gnat1drv () at ../../gcc/ada/gnat1drv.adb:1608
#14 0x01910a4b in _ZL15gnat_parse_filev.lto_priv.0 () at
../../gcc/ada/gcc-interface/misc.c:118
#15 0x019107f4 in _ZL12compile_filev.lto_priv.0 () at
../../gcc/toplev.c:460
#16 0x018f3296 in _ZN6toplev4mainEiPPc (this=0x7fffe63e, argc=21,
argv=0x7fffe728) at ../../gcc/toplev.c:2321
#17 0x018f26ec in main (argc=30016059, argv=0x1ca023b) at
../../gcc/main.c:39

Breakpointing on 171 works and the vector seems to be filled in. However the
disassembly shows:

   0x019b7db2 <+98>:    callq  0x1a1c050 <_Z24find_placeholder_in_exprP9tree_nodeP3vecIS0_7va_heap6vl_ptrE>
=> 0x019b7db7 <+103>:   mov    $0x1e507bb,%edx
   0x019b7dbc <+108>:   mov    $0xac,%esi
   0x019b7dc1 <+113>:   mov    $0x1ca0231,%edi
   0x019b7dc6 <+118>:   callq  0x10ba0f0 <_Z11fancy_abortPKciS0_>

so it

[Bug bootstrap/97350] [11 Regression] Ada bootstrap fails with: self_referential_size, at stor-layout.c:172

2020-10-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97350

--- Comment #11 from Jan Hubicka  ---
In WPA we seem to see the store to vector:
Propagated modref for push_without_duplicates/1089577   
  loads:
Limits: 32 bases, 16 refs   
Every base  
  stores:   
Limits: 32 bases, 16 refs   
  Base 0:struct vec (alias set 544) 
Ref 0:unsigned int (alias set 3)
  Every access  
  Base 1:union tree_node * (alias set 21)   
Ref 0:union tree_node * (alias set 21)  
  Every access  


Propagated modref for find_placeholder_in_expr/1089578  
  loads:
Limits: 32 bases, 16 refs   
Every base  
  stores:   
Limits: 32 bases, 16 refs   
  Base 0:struct vec (alias set 544) 
Ref 0:unsigned int (alias set 3)
  Every access  
  Base 1:union tree_node * (alias set 21)   
Ref 0:union tree_node * (alias set 21)  
  Every access  

I guess base 0, ref 0 is the length adjustment (m_num is unsigned int).
What seems interesting is that find_placeholder_in_expr lives in a different
partition than variable_size.
It is read as:

Read modref for find_placeholder_in_expr/1089578
  loads:
Limits: 32 bases, 16 refs   
Every base  
  stores:   
Limits: 32 bases, 16 refs   
  Base 0: alias set 17  
Ref 0: alias set 3  
  Every access  
  Base 1: alias set 16  
Ref 0: alias set 16 
  Every access  

so alias set 17 and 3 are vec and unsigned_int.
However in fre3 we get:

ipa-modref: call stmt find_placeholder_in_expr (size_8(D), &self_refs); 
ipa-modref: call to find_placeholder_in_expr/1089578 does not clobber ref:
self_refs.m_vec alias sets: 11->12

This seems odd: alias sets 11 and 12 seem quite different from 17 and 3.
Moreover 3 is the usual alias set for a builtin type (unsigned int).

[Bug bootstrap/97350] [11 Regression] Ada bootstrap fails with: self_referential_size, at stor-layout.c:172

2020-10-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97350

--- Comment #12 from Jan Hubicka  ---
Aha, the code in question is:
  # USE = nonlocal null { D.8330 D.22051 D.22054 D.22059 D.22060 } (nonlocal,
escaped, interposable)
  # CLB = nonlocal null { D.8330 D.22051 D.22054 D.22059 D.22060 } (nonlocal,
escaped, interposable)
  find_placeholder_in_expr (size_8(D), &self_refs); 
  # PT = nonlocal escaped null  
  _30 = self_refs.m_vec;
  if (_30 != 0B)
goto ; [100.00%] 
  else  
goto ; [0.00%]   

   [count: 7690]:
  _31 = MEM[(const struct vec *)_30].m_vecpfx.m_num;
  if (_31 == 0) 
goto ; [0.00%]   
  else  
goto ; [100.00%] 

What we seem to optimize out is the to m_vec, here alias set 12 makes more
sense.
and indeed it seems that this is missing in the summary. Smells like a bug in
ipa_merge_modref_summary_after_inlining since the function is split and
re-merged by inliner.

[Bug bootstrap/97350] [11 Regression] Ada bootstrap fails with: self_referential_size, at stor-layout.c:172

2020-10-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97350

--- Comment #13 from Jan Hubicka  ---
Bug in SCC discovery.  I am testing:
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 4f86b9ccea1..771a0a88f9a 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1603,6 +1603,11 @@ make_pass_ipa_modref (gcc::context *ctxt)
 static bool
 ignore_edge (struct cgraph_edge *e)
 {
+  /* We merge summaries of inline clones into summaries of functions they
+ are inlined to.  For that reason the complete function bodies must
+ act as unit.  */
+  if (!e->inline_failed)
+return false;
   enum availability avail;
   cgraph_node *callee = e->callee->function_or_virtual_thunk_symbol
  (&avail, e->caller);

[Bug c/97172] [11 Regression] ICE: tree code ‘ssa_name’ is not supported in LTO streams since r11-3303-g6450f07388f9fe57

2020-10-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172

--- Comment #8 from Jan Hubicka  ---
Generally LTO is organized into a global stream containing types, decls etc.
and local streams containing function bodies and initializers.
The global stream thus can not contain references that are local to function
bodies, like SSA_NAME, because these are not instantiated at WPA stage and thus
have no meaning.

The ICE is about an SSA_NAME being referred to by something that is in the
global stream. Judging from the testcase there is probably a reference to a
variadic type and the variadic type now has an SSA_NAME in its TYPE_SIZE or so,
which should not happen.
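
As a rough illustration, and assuming "variadic type" here means a variably modified (variable-length array) type, the following sketch shows a type whose size is a value local to one function body. The names row_bytes and check_row_bytes are hypothetical and this is not the testcase from the PR.

```c
#include <assert.h>
#include <stddef.h>

/* The size of row_t is computed from the function-local value n, so the
   type cannot be described without referring to this function body.  */
static size_t
row_bytes (int n)
{
  typedef int row_t[n];
  return sizeof (row_t);
}

/* Small self-check of the run-time size.  */
void
check_row_bytes (void)
{
  assert (row_bytes (4) == 4 * sizeof (int));
}
```

A type like row_t is where a size expression could end up pointing at function-local state, which is the kind of reference the global stream has no way to represent.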

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #19 from Jan Hubicka  ---
get_order unwinds to:

   [local count: 1073741824]:
  _1 = __builtin_constant_p (size_68(D));
  if (_1 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 536870913]:
  if (size_68(D) == 0)
goto ; [21.72%]
  else
goto ; [78.28%]

   [local count: 420262548]:
  if (size_68(D) <= 4095)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 210131274]:
  _2 = size_68(D) + 18446744073709551615;
  _3 = __builtin_constant_p (_2);
  if (_3 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 105065637]:
  _4 = (signed long) _2;
  if (_4 >= 0)
goto ; [59.00%]
  else
goto ; [41.00%]

... [very long code]

   [local count: 105065637]:
  __asm__("bsrq %1,%q0" : "=r" bitpos_75 : "rm" _2, "0" -1);
  iftmp.1_73 = bitpos_75 + -11;

   [local count: 210131274]:
  # iftmp.1_67 = PHI <52(6), iftmp.1_73(69), 51(7), 50(8), 49(9), 48(10),
47(11), 46(12), 45(13), 44(14), 43(15), 42(16), 41(17), 40(18), 39(19), 38(20),
37(21), 36(22), 35(23), 34(24), 33(25), 32(26), 31(27), 30(28), 29(29), 28(30),
27(31), 26(32), 25(33), 24(34), 23(35), 22(36), 21(37), 20(38), 19(39), 18(40),
17(41), 16(42), 15(43), 14(44), 13(45), 12(46), 11(47), 10(48), 9(49), 8(50),
7(51), 6(52), 5(53), 4(54), 3(55), 2(56), 1(57), 0(58), -1(59), -2(60), -3(61),
-4(62), -5(63), -6(64), -7(65), -8(66), -10(68), -9(67)>
  goto ; [100.00%]

   [local count: 536870913]:
  size_69 = size_68(D) + 18446744073709551615;
  size_70 = size_69 >> 12;
  __asm__("bsrq %1,%q0" : "=r" bitpos_72 : "rm" size_70, "0" -1);
  _74 = bitpos_72 + 1;

   [local count: 1073741824]:
  # _66 = PHI <52(3), 0(4), iftmp.1_67(70), _74(71)>
  return _66;
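
For orientation, the non-constant path at the end of the dump above (subtract one, shift right by 12, bsrq, add one) can be sketched in portable C as follows. This is a reading of the dump rather than the kernel source; fls64_sketch, get_order_sketch and PAGE_SHIFT_SKETCH are hypothetical names, and __builtin_clzll stands in for the inline asm.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, matching the ">> 12" in the dump.  */
#define PAGE_SHIFT_SKETCH 12

/* Index of the highest set bit plus one; 0 for x == 0.  This mirrors
   "bsrq" preloaded with -1 followed by "+ 1".  */
static int
fls64_sketch (unsigned long long x)
{
  return x ? 64 - __builtin_clzll (x) : 0;
}

/* Smallest order such that (1 << order) pages cover size bytes.  */
static int
get_order_sketch (unsigned long long size)
{
  size -= 1;                     /* wraps to all ones for size == 0 */
  size >>= PAGE_SHIFT_SKETCH;
  return fls64_sketch (size);
}

/* Small self-check around the page-size boundaries.  */
void
check_get_order_sketch (void)
{
  assert (get_order_sketch (4096) == 0);
  assert (get_order_sketch (4097) == 1);
}
```

The long chain of tests in the constant path computes the same function bit by bit, which is why the size estimate is so large unless the argument is known.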

We get summary:

IPA function summary for get_order/303 inlinable
  global time: 8.716289 
  self size:   201  
  global size: 201  
  min size:   4 
  self stack:  0
  global stack:0
size:4.00, time:3.00
size:3.00, time:2.00,  executed if:(not inlined)
size:4.00, time:2.00,  executed if:(op0 not constant)   
size:2.00, time:0.782800,  executed if:(op0 != 0)   
size:3.00, time:0.391400,  executed if:(op0 > 4095) && (op0 != 0)   
size:2.00, time:0.195700,  executed if:(op0 > 4095) && (op0 != 0) &&
(op0 not constant)
size:3.00, time:0.173194,  executed if:(op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.00, time:0.086597,  executed if:(op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.00, time:0.043299,  executed if:(op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.00, time:0.021649,  executed if:(op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:3.00, time:0.010825,  executed if:(op0,(# +
18446744073709551615),(# & 576460752303423488) == 0) && (op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
size:168.00, time:0.010825,  executed if:(op0,(# +
18446744073709551615),(# & 288230376151711744) == 0) && (op0,(# +
18446744073709551615),(# & 576460752303423488) == 0) && (op0,(# +
18446744073709551615),(# & 1152921504606846976) == 0) && (op0,(# +
18446744073709551615),(# & 2305843009213693952) == 0) && (op0,(# +
18446744073709551615),(# & 4611686018427387904) == 0) && (op0,(# +
18446744073709551615),((signed long) #) >= 0) && (op0 > 4095) && (op0 != 0)
  calls:
__builtin_constant_p/4546 function body not available   
  freq:0.20 loop depth: 0 size: 0 time:  0 predicate: (op0 > 4095) && (op0
!= 0)
   op0 points to local or readonly memory   
__builtin_constant_p/4546 func

[Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

Jan Hubicka  changed:

   What|Removed |Added

  Component|c   |ipa

--- Comment #48 from Jan Hubicka  ---
Changing component to IPA.

Concerning comment #37 about summaries not being updated after ipa-cp, I was
actually wrong there: they are updated and the behaviour is quite sane. We work
out that kmalloc has a constant argument and produce a specialized clone for it.
Because the clone is estimated to be quite large, it is not inlined.  When
ipa-cp is disabled, on the other hand, we work out that inlining will simplify
the body a lot and bump up the limits.

Jakub, concerning
 asm volatile ("movl $-1, %eax") 
that was of course a hack.  I was confused about the bsr instruction - for some
time I thought it stores only an 8-bit value until I re-read the manual.

Honza

[Bug tree-optimization/97519] New: builtin_constant_p (x + cst) should be optimized to builtin_constant_p (x)

2020-10-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97519

Bug ID: 97519
   Summary: builtin_constant_p (x + cst) should be optimized to
builtin_constant_p (x)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

As discussed in PR97445 we should optimize builtin_constant_p (var+cst) and
similar cases.
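
To make the request concrete, here is a minimal GNU C sketch. All helper names
(runtime_value, literal_is_constant, plain_is_constant, sum_is_constant,
check_examples) are made up for illustration and do not come from the PR. The
idea is that x + cst is known at compile time whenever x is, so ideally both
queries would be folded the same way; whether that equivalence is valid for
every "op" is exactly the open question, so treat it as an assumption.

```c
#include <assert.h>

/* Sketch only: helper names are hypothetical, not taken from the PR.  */

/* A value the compiler cannot know at compile time.  */
static volatile unsigned long opaque_input = 4096;

static unsigned long runtime_value (void)
{
  return opaque_input;
}

/* A literal is always a compile-time constant, so this folds to 1.  */
static int literal_is_constant (void)
{
  return __builtin_constant_p (42);
}

/* The question asked about x itself.  */
static int plain_is_constant (unsigned long x)
{
  return __builtin_constant_p (x);
}

/* The same question asked about x + cst.  Assumption: the answer should
   match plain_is_constant, without the addition having to be kept around.  */
static int sum_is_constant (unsigned long x)
{
  return __builtin_constant_p (x + 16);
}

/* Checks that hold at any optimization level: a volatile load is never
   a compile-time constant, with or without the added constant.  */
static void check_examples (void)
{
  assert (literal_is_constant () == 1);
  assert (plain_is_constant (runtime_value ()) == 0);
  assert (sum_is_constant (runtime_value ()) == 0);
}
```

With a constant argument the two queries may or may not agree, depending on
optimization level and inlining, which is why the sketch only checks the
runtime-value case.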

[Bug ipa/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

Jan Hubicka  changed:

   What|Removed |Added

 Depends on||97519, 97503

--- Comment #49 from Jan Hubicka  ---
Patch posted for the inline heuristics change
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556685.html

Also opened a separate PR on builtin_constant_p folding. I am not sure how to
implement that correctly (what are the conditions that make this valid -
perhaps for all "i op cst" after all?)

Martin, how does the if chain conversion behave on the example?


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503
[Bug 97503] Suboptimal use of cntlzw and cntlzd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97519
[Bug 97519] builtin_constant_p (x + cst) should be optimized to
builtin_constant_p (x)

[Bug ipa/97576] [11 Regression] ICE: verify_cgraph_node failed (error: reference to dead statement)

2020-10-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97576

--- Comment #2 from Jan Hubicka  ---
The problem here is that clone materialization invalidates statement pointers
in refs.  We clean these at the beginning of late optimization; I guess it
should be done on demand during materialization (they are not used past that
point, but we do not have a convenient place to clear them).

[Bug c/97578] ice during IPA pass: inline

2020-10-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97578

Jan Hubicka  changed:

   What|Removed |Added

 CC||jakub at redhat dot com,
   ||mjambor at suse dot cz
  Component|ipa |c
Summary|[11 Regression] ice during  |ice during IPA pass: inline
   |IPA pass: inline|

--- Comment #3 from Jan Hubicka  ---
What hits us here is the hack I needed to introduce to
ipa_param_adjustments::modify_call which triggers materialization to make debug
info code working.  In this case redirection happens from tree-inline and
materialization gets us back to tree-inline. Inliner is however not intended to
be recursive (it uses bb->aux pointers and in this case it will use it twice).

Martin, Jambor,
it would be really great if we did not need to materialize.  I do not see how
attaching debug info to decls can work if caller is in one partition and callee
in another.

We could also just add a loop walking all such calls and trigger
materialization before going to tree-inline to avoid the recursion problem, but
IMO debug info will still go missing on the partitioning boundary. Another
option is to avoid the (ab)use of bb->aux and replace it with a vector here.

[Bug ipa/97576] [11 Regression] ICE: verify_cgraph_node failed (error: reference to dead statement)

2020-10-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97576

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Jan Hubicka  ---
Fixed.

[Bug ipa/97593] [11 Regression] ICE in gt_pch_nx, at symbol-summary.h:290 since r11-4329-g67f3791f7d133214

2020-10-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97593

--- Comment #2 from Jan Hubicka  ---
Hmm, this is annoying: we cannot store the summary to PCH. I guess we want to
collect thunks to a vector and annotate them to callgraph at finalization time
:(

[Bug fortran/97652] New: New pdt14 failure after g:617695cdc2b3d950f1e4deb5ea85d5cc302943f4

2020-10-31 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97652

Bug ID: 97652
   Summary: New pdt14 failure after
g:617695cdc2b3d950f1e4deb5ea85d5cc302943f4
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

pdt14 is miscompiled with -fipa-modref.  This is triggered by handling fnspec,
but it seems to only trigger latent problem.

The only disambiguations are:
ipa-modref: call stmt push_8 (&root, &C.4105);
ipa-modref: call to push_8/6 does not clobber ref:
__vtab_link_module_Pdtlink_8._deallocate alias sets: 12->11
ipa-modref: call stmt push_8 (&root, &C.4104);
ipa-modref: call to push_8/6 does not clobber ref:
__vtab_link_module_Pdtlink_8._deallocate alias sets: 12->11
ipa-modref: call stmt push_8 (&root, &C.4103);
ipa-modref: call to push_8/6 does not clobber ref:
__vtab_link_module_Pdtlink_8._deallocate alias sets: 12->11
ipa-modref: call stmt push_8 (&root, &C.4105);
ipa-modref: call to push_8/6 does not clobber ref:
__vtab_link_module_Pdtlink_8._deallocate alias sets: 12->11
ipa-modref: call stmt push_8 (&root, &C.4104);
ipa-modref: call to push_8/6 does not clobber ref:
__vtab_link_module_Pdtlink_8._deallocate alias sets: 12->11
ipa-modref: call stmt push_8 (&root, &C.4103);
ipa-modref: call to push_8/6 does not clobber ref:
__vtab_link_module_Pdtlink_8._deallocate alias sets: 12->11

These ought to be safe since __vtab_link_module_Pdtlink_8 is readonly in the
testcase. With LTO we detect that variable as such (and the testcase still
works without modref and fails differently with modref).

fre3 does quite a lot of additional changes and I am not sure what goes wrong
here:

 __attribute__((externally_visible))
 main (integer(kind=4) argc, character(kind=1) * * argv)
 {
+  struct array01_unknown cdesc.10;
+  struct array01_unknown cdesc.9;
+  real(kind=8) res;
+  struct Pdtlink_8 * previous;
+  struct Pdtlink_8 * current;
+  real(kind=8) res;
   struct pdtlink_8 * root;
   static integer(kind=4) options.11[7] = {2150, 4095, 1, 1, 1, 0, 31};
-  real(kind=8) _7;
-  integer(kind=4) _8;
-  real(kind=8) _9;
-  integer(kind=4) _10;
-  real(kind=8) _11;
-  integer(kind=4) _12;
-  real(kind=8) _13;
-  integer(kind=4) _14;
+  struct Pdtlink_8 * _15;
+  struct Pdtlink_8 * _17;
+  struct Pdtlink_8 * _21;
+  struct Pdtlink_8 * _22;
+  void (*) () _23;
+  struct Pdtlink_8 * _25;
+  void (*) () _26;

[local count: 1073741824]:
   _gfortran_set_args (argc_2(D), argv_3(D));
@@ -1972,52 +2120,75 @@
   push_8 (&root, &C.4103);
   push_8 (&root, &C.4104);
   push_8 (&root, &C.4105);
-  _7 = pop_8 (&root);
-  _8 = (integer(kind=4)) _7;
-  if (_8 != 3)
-goto ; [0.04%]
+  _15 = MEM[(struct Pdtlink_8 * &)&root];
+  if (_15 != 0B)
+goto ; [70.00%]
   else
-goto ; [99.96%]
+goto ; [30.00%]

-   [local count: 429496]:
-  _gfortran_stop_numeric (1, 0);
-
-   [local count: 1073312329]:
-  _9 = pop_8 (&root);
-  _10 = (integer(kind=4)) _9;
-  if (_10 != 2)
-goto ; [0.04%]
+   [local count: 75913541732]:
+  # current_16 = PHI <_15(2), _17(3)>
+  # previous_29 = PHI <_15(2), current_16(3)>
+  _17 = current_16->next;
+  if (_17 == 0B)
+goto ; [0.00%]
   else
-goto ; [99.96%]
-
-   [local count: 429324]:
-  _gfortran_stop_numeric (2, 0);
+goto ; [100.00%]

-   [local count: 1072883005]:
-  _11 = pop_8 (&root);
-  _12 = (integer(kind=4)) _11;
-  if (_12 != 1)
-goto ; [0.04%]
+   [count: 0]:
+  res_19 = current_16->n;
+  _21 = previous_29->next;
+  if (_21 == 0B)
+goto ; [30.00%]
   else
-goto ; [99.96%]
+goto ; [70.00%]

-   [local count: 429152]:
-  _gfortran_stop_numeric (3, 0);
+   [count: 0]:
+  _22 = _15->next;
+  if (_22 != 0B)
+goto ; [70.00%]
+  else
+goto ; [30.00%]

-   [local count: 1072453853]:
-  _13 = pop_8 (&root);
-  _14 = (integer(kind=4)) _13;
-  if (_14 != 0)
-goto ; [0.04%]
+   [count: 0]:
+  MEM  [(struct dtype_type *)&cdesc.9 + 24B] = {};
+  cdesc.9.dtype.elem_len = 24;
+  cdesc.9.dtype.rank = 1;
+  cdesc.9.dtype.type = 11;
+  cdesc.9.dim[0].lbound = 1;
+  cdesc.9.dim[0].stride = 1;
+  cdesc.9.dim[0].ubound = 1;
+  cdesc.9.data = _22;
+  _23 = __vtab_link_module_Pdtlink_8._deallocate;
+  __builtin_unreachable ();
+
+   [count: 0]:
+  __builtin_unreachable ();
+
+   [count: 0]:
+  _25 = _21->next;
+  if (_25 != 0B)
+goto ; [70.00%]
   else
-goto ; [99.96%]
+goto ; [30.00%]
+
+   [count: 0]:
+  MEM  [(struct dtype_type *)&cdesc.10 + 24B] = {};
+  cdesc.10.dtype.elem_len = 24;
+  cdesc.10.dtype.rank = 1;
+  cdesc.10.dtype.type = 11;
+  cdesc.10.dim[0].lbound = 1;
+  cdesc.10.dim[0].stride = 1;
+  cdesc.10.dim[0].ubound = 1;
+  cdesc.10.data = _25;
+  _26 = __vtab_link_module_Pdtlink_8._deallocate;
+  __builtin_unreachable ();

-   [local count: 428981]:
-  _gfortran_stop_numeric (4, 0);
+ 

[Bug fortran/97652] New pdt14 failure after g:617695cdc2b3d950f1e4deb5ea85d5cc302943f4

2020-10-31 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97652

--- Comment #1 from Jan Hubicka  ---
Actually there is another propagation happening in ipa-cp analysis:

--- aa/pdt_14.f03.077i.cp   2020-10-31 09:00:52.809726530 +0100
+++ pdt_14.f03.077i.cp  2020-10-31 09:10:35.204755828 +0100
@@ -10,6 +10,8 @@
   Starting walk at: push_8 (&root, &C.4104);
   instance pointer: &root  Outer instance pointer: root offset: 0 (bits) vtbl
reference: 
   Function call may change dynamic type:push_8 (&root, &C.4103);
+ipa-modref: call stmt push_8 (&root, &C.4103);
+ipa-modref: call to push_8/6 does not clobber ref: root alias sets: 14->14
 Determining dynamic type for call: push_8 (&root, &C.4104);
   Starting walk at: push_8 (&root, &C.4104);
   instance pointer: &C.4104  Outer instance pointer: C.4104 offset: 0 (bits)
vtbl reference: 
@@ -19,6 +21,10 @@
   instance pointer: &root  Outer instance pointer: root offset: 0 (bits) vtbl
reference: 
   Function call may change dynamic type:push_8 (&root, &C.4104);
   Function call may change dynamic type:push_8 (&root, &C.4103);
+ipa-modref: call stmt push_8 (&root, &C.4104);
+ipa-modref: call to push_8/6 does not clobber ref: root alias sets: 14->14
+ipa-modref: call stmt push_8 (&root, &C.4103);
+ipa-modref: call to push_8/6 does not clobber ref: root alias sets: 14->14
 Determining dynamic type for call: push_8 (&root, &C.4105);
   Starting walk at: push_8 (&root, &C.4105);
   instance pointer: &C.4105  Outer instance pointer: C.4105 offset: 0 (bits)
vtbl reference: 
@@ -30,6 +36,12 @@
   Function call may change dynamic type:push_8 (&root, &C.4105);
   Function call may change dynamic type:push_8 (&root, &C.4104);
   Function call may change dynamic type:push_8 (&root, &C.4103);
+ipa-modref: call stmt push_8 (&root, &C.4105);
+ipa-modref: call to push_8/6 does not clobber ref: root alias sets: 14->14
+ipa-modref: call stmt push_8 (&root, &C.4104);
+ipa-modref: call to push_8/6 does not clobber ref: root alias sets: 14->14
+ipa-modref: call stmt push_8 (&root, &C.4103);
+ipa-modref: call to push_8/6 does not clobber ref: root alias sets: 14->14
 Determining dynamic type for call: _3 = pop_8 (&root);
   Starting walk at: _3 = pop_8 (&root);
   instance pointer: &root  Outer instance pointer: root offset: 0 (bits) vtbl
reference: 
@@ -129,10 +141,14 @@
no arg info
 callsite  ch2701/7 -> pop_8/5 : 
param 0: UNKNOWN
+ Aggregate passed by reference:
+   offset: 0, type: struct pdtlink_8 *, CONST: 0B
  value: 0x0, mask: 0xfff8
  VR  [1, -1]
 callsite  ch2701/7 -> push_8/6 : 
param 0: UNKNOWN
+ Aggregate passed by reference:
+   offset: 0, type: struct pdtlink_8 *, CONST: 0B
  value: 0x0, mask: 0xfff8
  VR  [1, -1]
param 1: CONST: &C.4105 -> 3.0e+0
@@ -140,6 +156,8 @@
  Unknown VR
 callsite  ch2701/7 -> push_8/6 : 
param 0: UNKNOWN
+ Aggregate passed by reference:
+   offset: 0, type: struct pdtlink_8 *, CONST: 0B
  value: 0x0, mask: 0xfff8
  VR  [1, -1]
param 1: CONST: &C.4104 -> 2.0e+0

The jump function is not used for cloning, only triggers inline, but the
conclusion seems wrong.  push_8 can make root non-0.  Root is of type pdtlink_8
so perhaps Frontend produces multiple copies of these.

push_8 store is:
 - Analyzing store: *self_34(D) 
   - Recording base_set=8 ref_set=8 parm=0  
so indeed a different alias set than 14 used by ch2701

[Bug middle-end/97672] [11 Regression] gfortran.dg/pdt_14.f03 – runtime: timeout with -O2 (and higher)

2020-11-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97672

Jan Hubicka  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Jan Hubicka  ---
Duplicate. I added some analysis to the other PR. It is apparently a TBAA issue
in the frontend.

*** This bug has been marked as a duplicate of bug 97652 ***

[Bug fortran/97652] New pdt14 failure after g:617695cdc2b3d950f1e4deb5ea85d5cc302943f4

2020-11-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97652

Jan Hubicka  changed:

   What|Removed |Added

 CC||burnus at gcc dot gnu.org

--- Comment #2 from Jan Hubicka  ---
*** Bug 97672 has been marked as a duplicate of this bug. ***

[Bug c/97578] ice during IPA pass: inline

2020-11-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97578

--- Comment #8 from Jan Hubicka  ---
OK, I committed the patch as is and we could see if any memory can be conserved
by being more precise.  I still think the debug info should not need decls here.
Honza

[Bug ipa/97698] [11 Regression] ICE: Segmentation fault (in duplicate_thunk_for_node) since r11-4587-gae7a23a3fab74

2020-11-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97698

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #3 from Jan Hubicka  ---
Fixed.

[Bug ipa/97673] [11 Regression] ICE in remap_gimple_stmt, at tree-inline.c:1922 since r11-4267-g0e590b68fa374365

2020-11-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97673

--- Comment #2 from Jan Hubicka  ---
This should be dup of PR97578

[Bug ipa/97593] [11 Regression] ICE in gt_pch_nx, at symbol-summary.h:290 since r11-4329-g67f3791f7d133214

2020-11-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97593

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #6 from Jan Hubicka  ---
Fixed.

[Bug ipa/97300] [11 regression] several test cases fail after r11-3308

2020-11-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97300

Jan Hubicka  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #2 from Jan Hubicka  ---
Assumed type failures are fixed now by the Fortran array descriptor TBAA fix. 
g:40cb3f8ac875c6cf6610a5f93da571cfdd2a1513

If there are other failures, let's open an independent PR for them.

[Bug ipa/97735] New: ipa-prop should handle simple casts

2020-11-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97735

Bug ID: 97735
   Summary: ipa-prop should handle simple casts
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Compiling:
test (int *a, int size)
{
 __builtin_memset (a, 0, size);
}

gets:
Jump functions:
  Jump functions of caller  __builtin_memset/1:
  Jump functions of caller  test/0:
callsite  test/0 -> __builtin_memset/1 : 
   param 0: PASS THROUGH: 0, op nop_expr, agg_preserved
 value: 0x0, mask: 0x
 Unknown VR
   param 1: CONST: 0
 value: 0x0, mask: 0x0
 Unknown VR
   param 2: UNKNOWN
 value: 0x0, mask: 0x
 VR  ~[2147483648, -2147483649]
I think we should be able to represent that SIZE is passthrough with a
conversion.
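
A small sketch of the conversion in question. The function names below
(clear_implicit, clear_explicit, both_clear_the_same) are hypothetical; the
first one has the same shape as the testcase, the second only spells out the
int -> size_t conversion that the call performs implicitly and that the jump
function for param 2 would ideally record as "pass-through with a conversion".

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the testcase: size is an int, but memset takes a size_t,
   so the call contains an implicit int -> size_t conversion.  */
static void clear_implicit (int *a, int size)
{
  __builtin_memset (a, 0, size);
}

/* The same call with the conversion written out.  */
static void clear_explicit (int *a, int size)
{
  size_t n = (size_t) size;   /* the "simple cast" */
  __builtin_memset (a, 0, n);
}

/* Hypothetical helper: returns 1 if both variants zero a small buffer.  */
static int both_clear_the_same (void)
{
  int x[4] = {1, 2, 3, 4};
  int y[4] = {1, 2, 3, 4};

  clear_implicit (x, (int) sizeof x);
  clear_explicit (y, (int) sizeof y);
  for (int i = 0; i < 4; i++)
    if (x[i] != 0 || y[i] != 0)
      return 0;
  return 1;
}
```

Both functions do the same thing at run time; the point is only that the value
reaching memset is the caller's SIZE after a conversion, not an unrelated value.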

[Bug c++/93008] Need a way to make inlining heuristics ignore whether a function is inline

2020-11-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93008

--- Comment #6 from Jan Hubicka  ---
I just noticed this PR and wonder if there is anything to do on inliner side. 
It uses DECL_DECLARED_INLINE that was invented to distinguish between implicit
inlines and explicit ones. So even if it would be bit misnamed it should mean
"this is an inline hint for inliner", so I guess frontend needs to distinguish
between constexpr and normal places where inline hint still means "inline
more"?

Inliner is really not on level to be able to completely ignore used inline
hints without regressing various code.

I made inline weaker for -O2 in GCC10 but for -O3 we still take it very
seriously and I do not see way out of that: in many cases it is very hard to
predict how much optimization will happen after inlining and a lot of code is
carefully crafted under assumption that some specific inline happens (and a lot
of such code is in C++)

[Bug lto/80379] Redundant note: code may be misoptimized unless -fno-strict-aliasing is used

2020-11-08 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80379

--- Comment #3 from Jan Hubicka  ---
The problem here is that the hint is output at decl merging and
-fno-strict-aliasing is a function local flag. At that time we do not even know
what functions there will be since units are not streamed in yet.  This means
that we do not know if some unit has a function that is -fno-strict-aliasing.
So suppressing the warning does not fit the implementation very easily :(

[Bug ipa/97757] [11 Regression] fortran save_6.f90 fails with a segv for -flto -O >= 2

2020-11-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97757

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org

--- Comment #1 from Jan Hubicka  ---
Indeed, this is obviously garbage collected, which is weird because everything
should be reachable via the modref summary (where THIS pointer is taken). I
will try a cross build.

[Bug ipa/97766] ipa/modref-2.c fails on 32 bits targets

2020-11-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97766

Jan Hubicka  changed:

   What|Removed |Added

   Last reconfirmed||2020-11-09
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Jan Hubicka  ---
That value is sizeof(double)*8.
I picked double since we have a builtin that writes it, and I assumed it is 64
bits on all targets. I forgot that it can be 32 bits.

We could change it to float. Is float of same size everywhere? If not we could
restrict test only to targets where size is known.

[Bug middle-end/97775] New: Wrong code with bitfield

2020-11-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97775

Bug ID: 97775
   Summary: Wrong code with bitfield
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

The following testcase is reduced from ssd/t2.c:

#include <stdio.h>

void dump (void *p, unsigned int len)
{
  const char digits[17] = "0123456789abcdef";
  unsigned char *a = (unsigned char *)p;
  int i;

  for (i = 0; i < len; i++)
{
  putchar (' ');
  putchar (digits[a[i] / 16]);
  putchar (digits[a[i] % 16]);
}
}

void put (const char s[])
{
  int i;
  for (i = 0; s[i]; i++)
putchar (s[i]);
}

void new_line (void)
{
  putchar ('\n');
}
struct __attribute__((scalar_storage_order("little-endian"), packed)) R1
{
  unsigned S1 : 2;
  unsigned I  : 32;
  unsigned S2 : 2;
  unsigned A1 : 9;
  unsigned A2 : 9;
  unsigned A3 : 9;
  unsigned B  : 1;
};

struct R1 My_R1 = { 2, 0x12345678, 1, 0xAB, 0xCD, 0xEF, 1 };

int main (void)
{
  struct R1 Local_R1;


  Local_R1.B  = 1;

#ifdef BAD
  new_line ();
#endif
  /* { dg-output "Local_R1 : e2 59 d1 48 b4 aa d9 bb.*\n" } */

  Local_R1.S1 = 0;
  Local_R1.I  = 0;
  Local_R1.S2 = 0;
  Local_R1.A1 = 0;
  Local_R1.A2 = 0;
  Local_R1.A3 = 0;
  Local_R1.B  = !Local_R1.B;

  put ("Local_R1 :");
  dump (&Local_R1, sizeof (struct R1));
  new_line ();
  /* { dg-output "Local_R1 : e5 59 d1 48 b0 a0 c1 03.*\n" } */

  new_line ();
  return 0;
}

Defining BAD changes the output:
< Local_R1 : 00 00 00 00 00 00 00 00
---
> Local_R1 : 00 00 00 00 00 00 00 80

Difference is already in fre1:
-Value numbering store Local_R1.B to _3
+Value numbering store Local_R1.B to 1

-RPO tracked 17 values available at 3 locations and 17 lattice elements
+RPO tracked 17 values available at 0 locations and 17 lattice elements
+Replaced BIT_FIELD_REF  with 0 in all uses of _1 =
BIT_FIELD_REF ;
+Replaced (signed char) _1 with 0 in all uses of _2 = (signed char) _1;
+Replaced _2 >= 0 with 1 in all uses of _3 = _2 >= 0;
+Deleted redundant store Local_R1.B = _3;
+Removing dead stmt Local_R1.B = _3;
+Removing dead stmt _3 = _2 >= 0;
+Removing dead stmt _2 = (signed char) _1;
+Removing dead stmt _1 = BIT_FIELD_REF ;
 main ()
 {
   struct R1 Local_R1;
-  unsigned char _1;
-  signed char _2;
-  _Bool _3;

:
   Local_R1.B = 1;
@@ -533,10 +540,6 @@
   Local_R1.A1 = 0;
   Local_R1.A2 = 0;
   Local_R1.A3 = 0;
-  _1 = BIT_FIELD_REF ;
-  _2 = (signed char) _1;
-  _3 = _2 >= 0;
-  Local_R1.B = _3;
   put ("Local_R1 :");
   dump (&Local_R1, 8);
   new_line ();

Clearly B should be 0 and not 1.

[Bug middle-end/97775] Wrong code with bitfield

2020-11-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97775

--- Comment #1 from Jan Hubicka  ---
Forgot to say, flags to reproduce are: -Os t2.c -fno-tree-sra -fno-ipa-modref

[Bug rtl-optimization/97836] [11 Regression] wrong code at -O1 on x86_64-pc-linux-gnu by r11-5029

2020-11-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97836

Jan Hubicka  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||hubicka at gcc dot gnu.org

--- Comment #3 from Jan Hubicka  ---
Confirmed.
The wrong code happens already in fre1 where we do:

main ()
{
  int f;
  int * _1;

   :
  _1 = d (&f);
  __builtin_abort ();

}

Modref summary for d is:
  loads:
Limits: 32 bases, 16 refs
  Base 0: alias set 1
Ref 0: alias set 1
  Every access
  stores:
Limits: 32 bases, 16 refs
  Base 0: alias set 1
Ref 0: alias set 1
  Every access
  parm 0 flags: direct noclobber noescape unused

for body:

d (int * e)
{
  int D.1973;
  int a.0_1;

   :
  a.0_1 = a;
  if (a.0_1 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  a = 0;

   :
  return e_10(D);

}

direct noclobber noescape looks correct to me: value is only returned and
noescape values are allowed to escape to return value (per IRC discussion we
had with Richi).

I think the problem is with unused, which makes tree-ssa-structalias completely
skip the parameter rather than adding it to the return value alias set.

I guess we want to specify what unused really means. Indeed the current comment
is "Nonzero if the argument is not used by the function." and in this case we
would need to have a separate EAF_NOREAD so the current EAF_UNUSED would be
EAF_NOCLOBBER | EAF_NOREAD, or track that internally in ipa-modref.

A quick fix is to make return statement clear EAF_UNUSED flag.
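
For illustration, a minimal sketch of the pattern. The function d mirrors the
dumped body above; the caller store_through_result and the noinline attribute
are assumptions added here and are not taken from the original testcase. The
parameter is never dereferenced inside d, yet it is "used" in the sense that
the returned pointer aliases it, so a store through the result must be visible
in the caller's local.

```c
#include <assert.h>

static int a;

/* Mirrors the dumped body of d: e is never dereferenced, only returned.
   noinline keeps the call visible (an assumption, not from the PR).  */
__attribute__ ((noinline))
static int *d (int *e)
{
  if (a)
    a = 0;
  return e;
}

/* Hypothetical caller: stores through the returned pointer and then
   reads the local directly.  */
static int store_through_result (void)
{
  int f = 0;
  int *p = d (&f);

  assert (p == &f);   /* the return value aliases the argument */
  *p = 1;
  return f;           /* must observe the store above, i.e. 1 */
}
```

If points-to analysis skips the parameter entirely, p is assumed not to point
to f, and the final read of f may be wrongly replaced by 0.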

[Bug middle-end/97840] [11 regression] Bogus -Wmaybe-uninitialized

2020-11-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97840

Jan Hubicka  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-11-15
 Status|UNCONFIRMED |NEW
 CC||hubicka at gcc dot gnu.org

--- Comment #1 from Jan Hubicka  ---
Confirmed. Reproduces on aarch64 cross for me, not on x86-64 native.

Warning is on:
#1  0x01343ad5 in maybe_warn_pass_by_reference (stmt=0x732ec558,
wlims=...) at ../../gcc/tree-ssa-uninit.c:530
530   tree argbase = maybe_warn_operand (ref, stmt, NULL_TREE, arg,
wlims);
(gdb) down
#0  maybe_warn_operand (ref=..., stmt=0x732ec558, lhs=0x0,
rhs=0x755b93f0, wlims=...) at ../../gcc/tree-ssa-uninit.c:434
434 warned = warning_at (location, OPT_Wmaybe_uninitialized,
(gdb) p debug_generic_stmt (rhs)
D.89878

std::filesystem::__cxx11::recursive_directory_iterator::pop (struct
recursive_directory_iterator * const this)
{   
  struct error_code ec; 
  struct allocator D.89878; 

  std::__cxx11::basic_string::basic_string<> (&D.89879, iftmp.99_1,
&D.89878);

  D.89878 ={v} {CLOBBER};   


and is otherwise unused.

Function looks identical with -fno-ipa-modref.

std::__cxx11::basic_string::basic_string<> is defined locally and the
last parameter (__a) is unused.

modref determines flags 

parm 2 flags: direct noclobber noescape unused

That seems all OK to me, so it seems that somehow uninit pass gets more active
because of different alias info.

[Bug middle-end/97840] [11 regression] Bogus -Wmaybe-uninitialized

2020-11-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97840

--- Comment #2 from Jan Hubicka  ---
Ok, so the warning triggers when uninitialized memory is passed to a function
argument declared as const.  This happens here but is a false positive since
the parameter is not used at all.  This may have become worse with EAF analysis
since we now optimize out the dead code initializing unused parameters in case
kill analysis triggers.
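
A reduced C sketch of the situation, not the libstdc++ code itself. The names
struct tag, make_value, caller and check_example are hypothetical. The callee
takes a pointer to const but never reads through it, much like the unused
allocator argument (__a) in the report, and the caller passes the address of an
object it never initialized. Depending on compiler version and flags, GCC may
or may not emit -Wmaybe-uninitialized for the call.

```c
#include <assert.h>

struct tag { char unused_byte; };

/* Takes a pointer to const but never reads through it.  */
__attribute__ ((noinline))
static int make_value (int seed, const struct tag *t)
{
  (void) t;
  return seed + 1;
}

/* Passes the address of an object that was never written.  Only the
   address is used, so the program is well defined.  */
static int caller (void)
{
  struct tag t;   /* deliberately left uninitialized */
  return make_value (6, &t);
}

static void check_example (void)
{
  assert (caller () == 7);
}
```

Since make_value never reads *t, a warning here is a false positive.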

Following patch fixes it but I do not understand why this does not trigger on
x86-64 for me.

diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index f23514395e0..1e074793b02 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -443,7 +443,7 @@ maybe_warn_operand (ao_ref &ref, gimple *stmt, tree lhs,
tree rhs,
access implying read access to those objects.  */

 static void
-maybe_warn_pass_by_reference (gimple *stmt, wlimits &wlims)
+maybe_warn_pass_by_reference (gcall *stmt, wlimits &wlims)
 {
   if (!wlims.wmaybe_uninit)
 return;
@@ -501,6 +501,10 @@ maybe_warn_pass_by_reference (gimple *stmt, wlimits
&wlims)
  && !TYPE_READONLY (TREE_TYPE (argtype)))
continue;

+ /* Ignore args we are not going to read from.  */
+ if (gimple_call_arg_flags (stmt, argno - 1) & EAF_UNUSED)
+   continue;
+
  if (save_always_executed && access->mode == access_read_only)
/* Attribute read_only arguments imply read access.  */
wlims.always_executed = true;
@@ -639,8 +643,8 @@ warn_uninitialized_vars (bool wmaybe_uninit)
  if (gimple_vdef (stmt))
wlims.vdef_cnt++;

- if (is_gimple_call (stmt))
-   maybe_warn_pass_by_reference (stmt, wlims);
+ if (gcall *call = dyn_cast <gcall *> (stmt))
+   maybe_warn_pass_by_reference (call, wlims);
  else if (gimple_assign_load_p (stmt)
   && gimple_has_location (stmt))
{

[Bug middle-end/97840] [11 regression] Bogus -Wmaybe-uninitialized

2020-11-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97840

Jan Hubicka  changed:

   What|Removed |Added

 CC||msebor at gcc dot gnu.org

--- Comment #3 from Jan Hubicka  ---
OK, on x86_64 the corresponding warning does not trigger since TYPE_EMPTY_P is
true.

x86_64 compiler I get:
(gdb) p debug_tree (rhstype)
  constant 8>
unit-size  constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set 76 canonical-type
0x77624498
fields  unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set 77 canonical-type
0x7684a888 fields  context

full-name "class __gnu_cxx::new_allocator"
needs-constructor needs-destructor X() X(constX&) this=(X&)
n_parents=0 use_template=1 interface-unknown
pointer_to_this  reference_to_this
 chain >
ignored decl_6 BLK
/opt/gcc/test/Build/aarch64-suse-linux/libstdc++-v3/include/bits/allocator.h:116:11
size  unit-size 
align:8 warn_if_not_align:0 offset_align 8 offset 
bit-offset  context 
chain 
ignored decl_1 VOID
/opt/gcc/test/Build/aarch64-suse-linux/libstdc++-v3/include/bits/allocator.h:129:9
align:1 warn_if_not_align:0 context 
parms 
value 
length:1
elt:0 >>>
full-name "template struct
std::allocator::rebind" chain >>
context 
full-name "class std::allocator"
needs-constructor needs-destructor X() X(constX&) this=(X&) n_parents=1
use_template=3 interface-only
pointer_to_this  reference_to_this
 chain >
$50 = void
(gdb) p rhstype->type_common.empty_flag
$51 = 1


while on aarch64 I get:

(gdb) p debug_tree (rhstype)
  constant 8>
unit-size  constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set 76 canonical-type
0x771ff3f0
fields  unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set 77 canonical-type
0x766297e0 fields  context

full-name "class __gnu_cxx::new_allocator"
needs-constructor needs-destructor X() X(constX&) this=(X&)
n_parents=0 use_template=1 interface-unknown
pointer_to_this  reference_to_this
 chain >
ignored decl_6 BLK
/opt/gcc/test/Build/aarch64-suse-linux/libstdc++-v3/include/bits/allocator.h:116:11
size  unit-size 
align:8 warn_if_not_align:0 offset_align 8 offset 
bit-offset  context 
chain 
ignored decl_1 VOID
/opt/gcc/test/Build/aarch64-suse-linux/libstdc++-v3/include/bits/allocator.h:129:9
align:1 warn_if_not_align:0 context 
parms 
value 
length:1
elt:0 >>>
full-name "template struct
std::allocator::rebind" chain >>
context 
full-name "class std::allocator"
needs-constructor needs-destructor X() X(constX&) this=(X&) n_parents=1
use_template=3 interface-only
pointer_to_this  reference_to_this
 chain >
$21 = void
(gdb) p rhstype->type_common.empty_flag
$22 = 0

that is set by
1972  /* Handle empty records as per the x86-64 psABI.  */
1973  TYPE_EMPTY_P (type) = targetm.calls.empty_record_p (type);

So I suppose relying on TYPE_EMPTY_P to silence false positives on empty
structures is not very portable.

[Bug middle-end/97840] [11 regression] Bogus -Wmaybe-uninitialized

2020-11-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97840

--- Comment #4 from Jan Hubicka  ---
And to explain why warning does not trigger without modref, it is because we
are not able to disambiguate the variable with another function call (since we
think it escapes)

(gdb) p debug_gimple_stmt (def_stmt)
# .MEM_7 = VDEF <.MEM_5>
_8 = __cxa_allocate_exception (48);

Martin, I think this is much more your area than mine.
I will post the patch on silencing the warning on unused args, but I think we
should resolve the empty field issue.

[Bug middle-end/97840] [11 regression] Bogus -Wmaybe-uninitialized

2020-11-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97840

--- Comment #6 from Jan Hubicka  ---
I remember that first_field was returning non-NULL (perhaps it is derived from
empty base)?

My patch touched nothing on the condition: it just improved the alias analysis.
 So while previously we thought that the variable could be initialized by the
function call

_8 = __cxa_allocate_exception (48);

now we are able to track and figure out that it is non-escaping and thus can
not be touched by it.
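
A minimal sketch of the situation described above, under stated assumptions: `payload`, `opaque_alloc` and `read_after_call` are hypothetical names, and `opaque_alloc` merely stands in for a call like `__cxa_allocate_exception`. The local variable never has its address passed anywhere, so once alias analysis proves it does not escape, the call can no longer be assumed to initialize it.

```cpp
#include <cassert>
#include <cstdlib>

struct payload { int value; };

// Stand-in for an allocation call such as __cxa_allocate_exception:
// it never receives the address of any caller-local variable.
inline void *opaque_alloc(std::size_t size) { return std::malloc(size); }

inline int read_after_call(bool initialize)
{
  payload p;                     // address is never taken, so p does not escape
  if (initialize)
    p.value = 42;                // the only store to p

  void *mem = opaque_alloc(48);  // cannot write p: it has no way to reach it
  assert(mem != nullptr);
  std::free(mem);

  // Read p only on the path where it was stored, so the sketch stays
  // well-defined; a use on the other path would be uninitialized.
  return initialize ? p.value : 0;
}
```

With weaker alias analysis the call has to be treated as a possible store to `p`, which hides any uninitialized use behind it. With the improved analysis the call is known not to touch `p`, so such uses become visible to the warning.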

[Bug rtl-optimization/97836] [11 Regression] wrong code at -O1 on x86_64-pc-linux-gnu by r11-5029

2020-11-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97836

--- Comment #5 from Jan Hubicka  ---
I forgot to attach the PR number, but I committed the quick fix (to prevent
wrong code) as g:26285af40f98dfdb809b98b08386073c63b65db1

I will discuss the EAF_UNUSED flag today after teaching.

[Bug ipa/97757] [11 Regression] fortran save_6.f90 fails with a segv for -flto -O >= 2

2020-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97757

--- Comment #3 from Jan Hubicka  ---
This is a problem with propagate_in_scc sometimes freeing the summary and
losing track of it.  It is fixed in
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559116.html

[Bug objc/97854] New: [11 regression] ODR violation in stub-objc.c

2020-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97854

Bug ID: 97854
   Summary: [11 regression] ODR violation in stub-objc.c
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: objc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

stub-objc.c provides a dummy rid enum, which causes an ODR violation.
This produces warnings with lto-bootstrap:

../../gcc/c-family/c-common.h:63: warning: type ‘rid’ violates the C++ One
Definition Rule [-Wodr]
   63 | enum rid
  |
../../gcc/c-family/stub-objc.c:30: note: an enum with different value name is
defined in another translation unit
   30 | enum rid { DUMMY };
  |
../../gcc/c-family/c-common.h:67: note: name ‘RID_STATIC’ differs from name
‘DUMMY’ defined in another translation unit
   67 |   RID_STATIC = 0,
  |
../../gcc/c-family/stub-objc.c:30: note: mismatching definition
   30 | enum rid { DUMMY };
  |
../../gcc/c-family/c-common.h:63: warning: type ‘rid’ violates the C++ One
Definition Rule [-Wodr]
   63 | enum rid
  |
../../gcc/c-family/stub-objc.c:30: note: an enum with different value name is
defined in another translation unit
   30 | enum rid { DUMMY };
  |
../../gcc/c-family/c-common.h:67: note: name ‘RID_STATIC’ differs from name
‘DUMMY’ defined in another translation unit
   67 |   RID_STATIC = 0,
  |
../../gcc/c-family/stub-objc.c:30: note: mismatching definition
   30 | enum rid { DUMMY };
  |

I think this was introduced in g:9a34a5cce6b50fc3527e7c7ab356808ed435883c
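
The mismatch can be sketched in a single file, with two namespaces standing in for the two translation units. This is an illustration only. In the real case both definitions are named `rid` in the global namespace, which is what makes it an ODR violation. `RID_OTHER_HYPOTHETICAL` and `first_values_match` are invented for the example.

```cpp
#include <cassert>

// Stand-in for the definition in c-common.h (only the first enumerator is
// taken from the warning text; the second is hypothetical).
namespace common_tu {
enum rid { RID_STATIC = 0, RID_OTHER_HYPOTHETICAL };
}

// Stand-in for the definition in stub-objc.c.
namespace stub_tu {
enum rid { DUMMY };
}

// Both first enumerators have value 0, yet their names differ.
inline bool first_values_match()
{
  return static_cast<int>(common_tu::RID_STATIC)
         == static_cast<int>(stub_tu::DUMMY);
}
```

Here `first_values_match()` returns true, so the two definitions agree on the value of the first enumerator. Going by the notes quoted above, -Wodr still flags the pair because the enumerator names differ.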

[Bug middle-end/97855] New: [11 regression] Bogus warning locations during lto-bootstrap

2020-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97855

Bug ID: 97855
   Summary: [11 regression] Bogus warning locations during
lto-bootstrap
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

For a while now we have been getting odd-looking locations:

D.5677.coeffs[0]’../../gcc/calls.c: In function ‘expand_call’:
../../gcc/dojump.c:118:28: warning:  may be used uninitialized in this function
[-Wmaybe-uninitialized]
  118 |   pending_stack_adjust = save->x_pending_stack_adjust;
  |^
D.5677.coeffs[0]’../../gcc/calls.c:4082:34: note:  was declared here
 4082 |   saved_pending_stack_adjust save;
  |  ^
D.5677.coeffs[0]’../../gcc/dojump.c:119:27: warning:  may be used uninitialized
in this function [-Wmaybe-uninitialized]
  119 |   stack_pointer_delta = save->x_stack_pointer_delta;
  |   ^
D.5677.coeffs[0]’../../gcc/calls.c:4082:34: note:  was declared here
 4082 |   saved_pending_stack_adjust save;
  |  ^

This is not due to parallel writes; it seems that the location code somehow
decides to output the additional D.5677.coeffs[0]’

[Bug middle-end/97840] [11 regression] Bogus -Wmaybe-uninitialized

2020-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97840

--- Comment #9 from Jan Hubicka  ---
Created attachment 49571
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49571&action=edit
Warnings building cc1plus with LTO

This is the current set of warnings from building cc1plus with LTO. There are
66 maybe-uninitialized warnings.

I always wondered if we want to print warnings exposing GCC internals like:
../gmp/mpz/../../../gmp/mpz/swap.c:38:3: warning: ‘MEM[(struct __mpz_struct
*)&cst]._mp_size’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
../gmp/mpz/../../../gmp/mpz/swap.c:37:3: warning: ‘MEM[(struct __mpz_struct
*)&cst]._mp_alloc’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
../isl/../../isl/isl_tab.c:2940:29: warning: ‘var’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
../gmp/mpz/../../../gmp/mpz/swap.c:39:3: warning: ‘MEM[(struct __mpz_struct
*)&cst]._mp_d’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
../gmp/mpz/../../../gmp/mpz/swap.c:38:3: warning: ‘MEM[(struct __mpz_struct
*)&cst]._mp_size’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
../gmp/mpz/../../../gmp/mpz/swap.c:37:3: warning: ‘MEM[(struct __mpz_struct
*)&cst]._mp_alloc’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
../../gcc/machmode.h:546:49: warning: ‘MEM[(struct scalar_int_mode
*)&int_mode]’ may be used uninitialized in this function
[-Wmaybe-uninitialized]

A lot of warnings are about remainder_len in wide-int.  There is a loop
initializing it and it seems we do not work out that it has a non-zero number
of iterations.
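
The general shape of that pattern can be sketched as follows. This is not the wide-int code: `last_block_len` is a hypothetical helper, and it is only meant for small inputs where `done += block` cannot wrap. The variable is assigned only inside the loop, so its use afterwards is safe only if the loop body runs at least once.

```cpp
#include <cassert>

// Hypothetical helper, not the wide-int code: returns the length of the
// last block when `total` items are split into blocks of `block` items.
inline unsigned last_block_len(unsigned total, unsigned block)
{
  assert(block > 0);
  if (total == 0)
    return 0;              // makes "at least one iteration" explicit below

  unsigned remainder_len;  // deliberately assigned only inside the loop
  for (unsigned done = 0; done < total; done += block)
    remainder_len = (total - done < block) ? total - done : block;
  return remainder_len;
}
```

The early return is what guarantees a non-zero trip count in this sketch. Whether a given GCC version then stays quiet depends on how well it propagates that fact, which the sketch does not claim to demonstrate.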
../../gcc/analyzer/store.cc:647:13: warning: ‘MEM[(long int *)&sval_bit_size +
8B]’ may be used uninitialized [-Wmaybe-uninitialized]
../../gcc/analyzer/store.cc:647:13: warning: ‘MEM[(long int *)&sval_bit_size +
16B]’ may be used uninitialized [-Wmaybe-uninitialized]

The MEM_REF syntax is not very pretty.

[Bug bootstrap/97857] New: profiledbootstrap broken freeing speculative call summary

2020-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97857

Bug ID: 97857
   Summary: profiledbootstrap broken freeing speculative call
summary
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Configuring with 
../configure --with-build-config=bootstrap-lto --enable-checking=release
--disable-plugin

leads to an ICE building stagefeedback libstdc++. This is already with an
optimized cc1plus, so it may be a miscompile of cc1plus.

0x8fcd5a crash_signal
../../gcc/toplev.c:330
0x7789c83f ???
/build/glibc-vjB4T1/glibc-2.28/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x11fcf44 vec::release()
../../gcc/vec.h:1813
0x11fcf2e auto_vec::~auto_vec()
../../gcc/vec.h:1542
0x11fcf2e speculative_call_summary::~speculative_call_summary()
../../gcc/ipa-profile.c:178
0x11fcf2e
object_allocator::remove(speculative_call_summary*)
../../gcc/alloc-pool.h:522
0x11fcf2e
call_summary_base::release(speculative_call_summary*)
../../gcc/symbol-summary.h:625
0xd03fbe call_summary::~call_summary()
../../gcc/symbol-summary.h:771
0x11e106f ipa_profile_call_summaries::~ipa_profile_call_summaries()
../../gcc/ipa-profile.c:192
0x11e106f ipa_profile_call_summaries::~ipa_profile_call_summaries()
../../gcc/ipa-profile.c:192
0x11e0cff ipa_profile
../../gcc/ipa-profile.c:1031
0x11e0cff execute
../../gcc/ipa-profile.c:1070

[Bug middle-end/97858] New: [11 regression] Bogus warnings about va_list during profiledbootstrap

2020-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97858

Bug ID: 97858
   Summary: [11 regression] Bogus warnings about va_list during
profiledbootstrap
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

During profiledbootstrap we get the following warnings:
../libcpp/../../libcpp/mkdeps.c: In function ‘munge.constprop’:
../libcpp/../../libcpp/mkdeps.c:176:13: warning: ‘MEM[(struct 
*)&args].reg_save_area’ may be used uninitialized [-Wmaybe-uninitialized]
  176 | str = va_arg (args, const char *);
  | ^
../libcpp/../../libcpp/mkdeps.c:120:11: note: ‘MEM[(struct 
*)&args].reg_save_area’ was declared here
  120 |   va_list args;
  |   ^
../libcpp/../../libcpp/mkdeps.c:176:13: warning: ‘MEM[(struct 
*)&args].overflow_arg_area’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
  176 | str = va_arg (args, const char *);
  | ^
../libcpp/../../libcpp/mkdeps.c:120:11: note: ‘MEM[(struct 
*)&args].overflow_arg_area’ was declared here
  120 |   va_list args;
  |   ^
../libcpp/../../libcpp/mkdeps.c:176:13: warning: ‘MEM[(struct 
*)&args].gp_offset’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
  176 | str = va_arg (args, const char *);
  | ^
../libcpp/../../libcpp/mkdeps.c:120:11: note: ‘MEM[(struct  *)&args].gp_offset’
was declared here
  120 |   va_list args;
  |   ^

This seems to be due to conditional initialization of va_list:
static const char *
munge (const char *str, const char *trail = NULL, ...)
{
  static unsigned alloc;
  static char *buf;
  unsigned dst = 0;
  va_list args;
  if (trail)
    va_start (args, trail);

but it does not make much sense to me to warn about the internals of the
va_arg implementation in the first place.  They are not user visible.
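
A self-contained sketch of the same conditional-initialization pattern, assuming a NULL-terminated argument list. `concat_all` is a hypothetical function written for illustration, not the mkdeps.c code. Every `va_arg` and `va_end` is reached only on paths where `va_start` ran, so the code is well-defined even though `args` is uninitialized on the other path.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Concatenates `first`, `next`, and any further const char * arguments up
// to a terminating null pointer.  With the default `next`, the va_list is
// never started and never read.
inline std::string concat_all(const char *first, const char *next = nullptr, ...)
{
  assert(first != nullptr);
  std::string out(first);

  va_list args;
  const bool have_varargs = next != nullptr;
  if (have_varargs)
    va_start(args, next);    // conditional initialization, as in munge

  while (next) {             // only entered when va_start has run
    out += next;
    next = va_arg(args, const char *);
  }

  if (have_varargs)
    va_end(args);
  return out;
}
```

For example, `concat_all("a")` never touches `args`, while `concat_all("a", "b", "c", static_cast<const char *>(nullptr))` walks the list until the sentinel. A warning about fields such as `gp_offset` on the first path would refer to storage the function never reads.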
