[Bug debug/103975] DWARF .debug_frame incorrect for ISRs on AVR; pushing SREG creates off-by-one error

2022-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103975

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-debug

--- Comment #2 from Andrew Pinski  ---
Can you try a newer version of GCC, since 7.x is no longer supported? Maybe
11.2.0 or even 10.3.0?

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #3 from Richard Biener  ---
Yes, the oracle assumes that for MEM[(struct ovs_list *)pos_32 + 64B] pos_32
needs to still point to some valid object (even if it's not of type ovs_list)
and a pointer to 'start' cannot be constructed from this without invoking
UB (and the pointer offsetting rules are not part of TBAA).

The MEMs appear in forwprop:

:
   _15 = &member_59->elem;
   _16 = &pos_32->elem;
-  _48 = _16->prev;
-  _15->prev = _48;
-  _15->next = _16;
+  _48 = MEM[(struct ovs_list *)pos_32 + 64B].prev;
+  MEM[(struct ovs_list *)member_59 + 64B].prev = _48;
+  MEM[(struct ovs_list *)member_59 + 64B].next = _16;

but there it's already pointer arithmetic.  In .original we have

pos = 0B;, pos = (struct member *) ((long unsigned int) start.next +
18446744073709551552);;
goto ;
:;
if (member->order > pos->order)
  {
goto ;
  }
pos = (struct member *) ((long unsigned int) pos->elem.next +
18446744073709551552);
:;
if (&pos->elem != &start) goto ; else goto ;
:;
ovs_list_insert (&pos->elem, &member->elem);

where I think passing &pos->elem and &member->elem to ovs_list_insert is
already wrong since 'pos' doesn't point to a valid object if the
ultimate written destination is 'start'.

Doing the 'pos' initialization with uintptr_t isn't enough - you need to
do this all the way up to the &pos->elem computation as you say:

// TESTED: This works:
//ovs_list_insert((void *)((uintptr_t)pos + __builtin_offsetof(struct
member,elem)), &member->elem);

so yes, it's UB.  And UB that's not sanctioned with -fno-strict-aliasing.
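
One way to stay within the pointer rules is to iterate over the list nodes themselves and only convert a node to its containing object once it is known not to be the list head. The following is a minimal sketch under assumptions, not the OVS code: the `struct member` layout, `member_of`, `list_init`, `list_insert` and `insert_sorted` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct ovs_list { struct ovs_list *prev, *next; };

/* Hypothetical container type; the real layout differs. */
struct member { int order; struct ovs_list elem; };

static void list_init(struct ovs_list *head) { head->prev = head->next = head; }

/* Link 'elem' in front of 'before'. */
static void list_insert(struct ovs_list *before, struct ovs_list *elem)
{
    elem->prev = before->prev;
    elem->next = before;
    before->prev->next = elem;
    before->prev = elem;
}

/* Only valid for nodes that really are embedded in a struct member. */
static struct member *member_of(struct ovs_list *node)
{
    return (struct member *)((char *)node - offsetof(struct member, elem));
}

/* Insert 'm' in ascending 'order'.  The loop variable is the list node,
   so no struct member pointer is ever formed from the bare head 'start'. */
static void insert_sorted(struct ovs_list *start, struct member *m)
{
    struct ovs_list *node;

    assert(start != NULL && m != NULL);
    for (node = start->next; node != start; node = node->next) {
        if (member_of(node)->order > m->order)
            break;
    }
    list_insert(node, &m->elem);
}
```

The design choice is that `node` may legitimately equal `start`, while a `struct member *` derived from `start` would not point to a valid object.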

[Bug bootstrap/103820] [12 Regression] i686 failed to bootstrap with ada by r12-6077

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103820

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Xiong Hu Luo :

https://gcc.gnu.org/g:0552605b7b27dc6beed62e71bd05bc1efd191c0d

commit r12-6430-g0552605b7b27dc6beed62e71bd05bc1efd191c0d
Author: Xionghu Luo 
Date:   Mon Jan 10 20:05:56 2022 -0600

testsuite: Fix regression on m32 by r12-6087 [PR103820]

r12-6087 will avoid move cold bb out of hot loop, while the original
intent of this testcase is to hoist divides out of loop and CSE them to
only one divide.  So increase the loop count to turn the cold bb to hot
bb again.  Then the 3 divides could be rewritten with same reciptmp.

Tested pass on Power-Linux {32,64}, x86 {64,32} and i686-linux.

gcc/testsuite/ChangeLog:

PR testsuite/103820
* gcc.dg/tree-ssa/recip-3.c: Adjust.

[Bug bootstrap/103820] [12 Regression] i686 failed to bootstrap with ada by r12-6077

2022-01-11 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103820

luoxhu at gcc dot gnu.org changed:

   What|Removed |Added

 CC||luoxhu at gcc dot gnu.org

--- Comment #7 from luoxhu at gcc dot gnu.org ---
(In reply to CVS Commits from comment #6)
> The master branch has been updated by Xiong Hu Luo :
> 
> https://gcc.gnu.org/g:0552605b7b27dc6beed62e71bd05bc1efd191c0d
> 
> commit r12-6430-g0552605b7b27dc6beed62e71bd05bc1efd191c0d
> Author: Xionghu Luo 
> Date:   Mon Jan 10 20:05:56 2022 -0600
> 
> testsuite: Fix regression on m32 by r12-6087 [PR103820]
> 
> r12-6087 will avoid move cold bb out of hot loop, while the original
> intent of this testcase is to hoist divides out of loop and CSE them to
> only one divide.  So increase the loop count to turn the cold bb to hot
> bb again.  Then the 3 divides could be rewritten with same reciptmp.
> 
> Tested pass on Power-Linux {32,64}, x86 {64,32} and i686-linux.
> 
> gcc/testsuite/ChangeLog:
> 
> PR testsuite/103820
> * gcc.dg/tree-ssa/recip-3.c: Adjust.

Typo: this should be PR103802.

[Bug tree-optimization/103802] [12 regression] recip-3.c fails after r12-6087 on Power m32

2022-01-11 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103802

luoxhu at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from luoxhu at gcc dot gnu.org ---
Fixed by

The master branch has been updated by Xiong Hu Luo :

https://gcc.gnu.org/g:0552605b7b27dc6beed62e71bd05bc1efd191c0d

commit r12-6430-g0552605b7b27dc6beed62e71bd05bc1efd191c0d
Author: Xionghu Luo 
Date:   Mon Jan 10 20:05:56 2022 -0600

testsuite: Fix regression on m32 by r12-6087 [PR103820]

r12-6087 will avoid move cold bb out of hot loop, while the original
intent of this testcase is to hoist divides out of loop and CSE them to
only one divide.  So increase the loop count to turn the cold bb to hot
bb again.  Then the 3 divides could be rewritten with same reciptmp.

Tested pass on Power-Linux {32,64}, x86 {64,32} and i686-linux.

gcc/testsuite/ChangeLog:

PR testsuite/103820
* gcc.dg/tree-ssa/recip-3.c: Adjust.

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Richard Biener  ---
And for the curious it's indeed

static bool
indirect_ref_may_alias_decl_p (tree ref1...
...
  /* If only one reference is based on a variable, they cannot alias if
 the pointer access is beyond the extent of the variable access.
 (the pointer base cannot validly point to an offset less than zero
 of the variable).
 ???  IVOPTs creates bases that do not honor this restriction,
 so do not apply this optimization for TARGET_MEM_REFs.  */
  if (TREE_CODE (base1) != TARGET_MEM_REF
  && !ranges_maybe_overlap_p (offset1 + moff, -1, offset2, max_size2))
return false;

The IVOPTs reference is likely there because, while IVOPTs uses uintptrs to
create the base pointer, the arithmetic contained in the TARGET_MEM_REF itself
is still considered pointer arithmetic (as is the embedded MEM_REF pointer
offsetting here), and the base "pointer" cannot be a non-pointer to disable
that behavior.

Not a bug btw.  There's no flag to make violating the C pointer offsetting
rules well-defined.
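
The effect of the range check quoted above can be modelled with a small stand-alone function. This is a simplified sketch, not GCC's implementation: `ranges_may_overlap` is a hypothetical name, plain `long` replaces GCC's offset types, and a negative size is assumed to stand for an unknown (unbounded) extent. The offset 64 comes from the dump in comment #3; the 16-byte size for 'start' assumes two 8-byte pointers.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open ranges [pos, pos + size); size < 0 means "extends without bound". */
static bool ranges_may_overlap(long pos1, long size1, long pos2, long size2)
{
    bool first_ends_before_second = size1 >= 0 && pos1 + size1 <= pos2;
    bool second_ends_before_first = size2 >= 0 && pos2 + size2 <= pos1;

    return !(first_ends_before_second || second_ends_before_first);
}

/* The situation from this bug: a pointer-based access at offset 64 with
   unknown extent versus a 16-byte variable occupying offsets 0..15. */
static bool access_may_touch_start(void)
{
    bool overlap = ranges_may_overlap(64, -1, 0, 16);

    assert(!overlap);   /* the variable ends before the access begins */
    return overlap;
}
```

Because the pointer base is assumed to point at or after the start of a valid object, an access 64 bytes beyond it cannot reach a 16-byte variable, which is why the oracle reports no alias.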

[Bug c++/103968] [11/12 Regression] ICE and segfault when instantiating template with lvalue ref argument and nested template type

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103968

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug tree-optimization/103971] [12 regression] build fails after r12-6420, ICE at libgfortran/generated/matmul_i1.c:2450

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103971

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug target/103973] x86: 4-way comparison of floats/doubles with spaceship operator possibly suboptimal

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103973

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Last reconfirmed||2022-01-11
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 Target||x86_64-*-* i?86-*-*

--- Comment #5 from Richard Biener  ---
We are lowering this to

return  = TARGET_EXPR  != SAVE_EXPR  ?
SAVE_EXPR  < SAVE_EXPR  ? less : SAVE_EXPR  < SAVE_EXPR  ? greater
: unordered : equivalent>>>;

I don't think we can elide the first ucomisd, but we should be able to CSE the
last (from the original quoted assembler); I am not sure whether it ultimately
results in better code, though.
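
The shape of that lowering, written out in plain C, looks roughly like the sketch below. The enum and the function names are hypothetical and only mirror the less/greater/unordered/equivalent outcomes named in the dump; the sketch says nothing about what code the x86 backend actually emits.

```c
#include <assert.h>

enum ordering { ORD_LESS, ORD_GREATER, ORD_EQUIVALENT, ORD_UNORDERED };

/* a != b ? (a < b ? less : (b < a ? greater : unordered)) : equivalent */
static enum ordering four_way_compare(double a, double b)
{
    if (a != b) {
        if (a < b)
            return ORD_LESS;
        if (b < a)
            return ORD_GREATER;
        return ORD_UNORDERED;   /* neither ordering holds: a NaN is involved */
    }
    return ORD_EQUIVALENT;
}

/* Usage sketch covering all four outcomes. */
static void four_way_compare_examples(void)
{
    double zero = 0.0;
    double not_a_number = zero / zero;

    assert(four_way_compare(1.0, 2.0) == ORD_LESS);
    assert(four_way_compare(2.0, 1.0) == ORD_GREATER);
    assert(four_way_compare(1.0, 1.0) == ORD_EQUIVALENT);
    assert(four_way_compare(not_a_number, 1.0) == ORD_UNORDERED);
}
```

Written this way, the source performs up to three floating-point comparisons on the same operands, which is where the CSE opportunity mentioned above comes from.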

[Bug bootstrap/103974] [12 Regression] ICE in ira_flattening building libstdc++ with r12-6415-g01f3e6a40e72

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103974

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug tree-optimization/103971] [12 regression] build fails after r12-6420, ICE at libgfortran/generated/matmul_i1.c:2450

2022-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103971

Martin Liška  changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2022-01-11
 Status|UNCONFIRMED |NEW

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

Martin Liška  changed:

   What|Removed |Added

 CC||siddhesh at gcc dot gnu.org
Summary|[12 Regression] gcc-12  |[12 Regression] gcc-12
   |apparently miscompiles  |apparently miscompiles
   |libcap's cap_to_text()  |libcap's cap_to_text()
   |function|function since
   ||r12-6030-g422f9eb7011b76c1
   Keywords|needs-bisection |

--- Comment #11 from Martin Liška  ---
Started with r12-6030-g422f9eb7011b76c1.

[Bug libgomp/103976] New: Very large overhead for if(false) openmp pragmas

2022-01-11 Thread nickpapior at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103976

Bug ID: 103976
   Summary: Very large overhead for if(false) openmp pragmas
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nickpapior at gmail dot com
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

In OpenMP it may be beneficial to not use OpenMP in a region given that the
calculated workload is very small.

Something like this:

test.c:

#include <stdio.h>

int main(int narg, char *args[]) {
  float sum = 0.;
  int i, j;

  for ( i = 1 ; i<=1 ; i++) {
#ifdef _OPENMP
#pragma omp parallel for private(j) reduction(+:sum) if(0)
#else
    // Just to have something that resembles the omp-if
    if ( sum >= -1. ) {
#endif
      for (j = 1 ; j<= 10 ; j++)
        sum += 1./j;
#ifndef _OPENMP
    }
#endif
  }

  printf("%15.7e\n", sum);
  return 0;
}


Never mind the faulty results, the idea is the culprit here.

I get the following timings:

gcc -O3 -o a.out test.c  && time ./a.out
./a.out  1.60s user 0.00s system 99% cpu 1.604 total


gcc -O3 -o a.out test.c  -fopenmp && time ./a.out   
./a.out  9.13s user 4.62s system 99% cpu 13.743 total


Fortran has the same behaviour.
test.F90:

program main

  real :: sum
  integer :: i, j

  sum = 0.

  do i = 1, 1
#ifdef _OPENMP
!$OMP parallel do private(j) reduction(+:sum) if(.false.)
#else
! Just to have something that resembles the omp-if
if ( sum >= -1. ) then
#endif
  do j = 1, 10
sum = sum + 1./j
  end do
#ifdef _OPENMP
  !$OMP end parallel do
#else
end if
#endif
  end do

  print *, sum
end program main


The explicit if's were added only to emulate the if statement that the OpenMP
code inserts. I know that OpenMP has a lot more boilerplate code, but this
still seems awfully slow to me.

This came up in a sparse matrix solution library (MUMPS) which uses the #pragma
omp ... if(workload_huge) to decide on paths.
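
For comparison, the same decision can be written as an ordinary branch around the parallel region, so that the small-workload path never enters an OpenMP construct at all. This is only a sketch under assumptions: `sum_inverse` and its `workload_huge` parameter are hypothetical names, and no timings were taken to show that it avoids the overhead reported here.

```c
#include <assert.h>

/* Sum 1/j for j = 1..n, using the parallel loop only for large workloads. */
static float sum_inverse(int n, int workload_huge)
{
    float sum = 0.0f;
    int j;

    assert(n >= 0);
    if (workload_huge) {
        /* Without -fopenmp this pragma is ignored and the loop runs serially. */
#pragma omp parallel for reduction(+:sum)
        for (j = 1; j <= n; j++)
            sum += 1.0f / (float)j;
    } else {
        /* Plain serial loop: no OpenMP construct is encountered on this path. */
        for (j = 1; j <= n; j++)
            sum += 1.0f / (float)j;
    }
    return sum;
}
```

The cost is a duplicated loop body, which is exactly what the `if()` clause is meant to spare the programmer.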

Let me know if anything else is necessary.

[Bug target/103975] DWARF .debug_frame incorrect for ISRs on AVR; pushing SREG creates off-by-one error

2022-01-11 Thread kimballa at apache dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103975

--- Comment #3 from Aaron Kimball  ---
(In reply to Andrew Pinski from comment #2)
> Can you try a newer version of GCC, since 7.x is no longer supported? Maybe
> 11.2.0 or even 10.3.0?

Hi Andrew,

I tried again and figured out the issue in my `PATH` that was preventing my
cross-compiler from working correctly. I now have gcc 11.2.0, binutils 2.37,
and avr-gcc 2.0.0 built from source, working together.

The issue remains. (Actually, if anything it's worse.)

I had to modify the test case as the original very-simple test now succeeds by
virtue of gcc being smarter about eliminating unnecessary register saves. The
issue can still be reproduced by increasing register pressure in the ISR. I
defined a larger number of `volatile unsigned char` global variables and added
them together with:

```
ISR(TIMER1_COMPA_vect) {
  x++;
  x += a + b + c + d + e + f + g + h + i + j + k + m + n + p;
}
```

This yields the following disassembly:
```
ISR(TIMER1_COMPA_vect) {
  ce:   1f 92   push  r1
  d0:   1f b6   in    r1, 0x3f  ; 63  # read SREG port
  d2:   1f 92   push  r1        # push SREG
  d4:   11 24   eor   r1, r1
  d6:   1f 93   push  r17
  d8:   2f 93   push  r18
  da:   3f 93   push  r19
  dc:   4f 93   push  r20
# ... prologue continues ... (snip)
  f2:   ff 93   push  r31       # last line of prologue
# remainder of disasm snipped

As a bonus w/ this toolchain, binutils 2.37 has fixed the bug in objdump that
prevented it from reading .debug_frame entries correctly. 

Thus, `avr-objdump -Wframe ~/isr.elf` shows the following:
```
 0010  CIE
  Version:   1
  Augmentation:  ""
  Code alignment factor: 2
  Data alignment factor: -1
  Return address column: 36

  DW_CFA_def_cfa: r32 ofs 2
  DW_CFA_offset: r36 at cfa-1
  DW_CFA_nop
  DW_CFA_nop

0014 0054  FDE cie= pc=00ce..00ce
  DW_CFA_restore: r14
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
 # ... no records for $PC=0xd0, 0xd4, 0xd8 ?
  DW_CFA_advance_loc: 12 to 00da
  DW_CFA_def_cfa_offset: 3
  DW_CFA_offset: r18 at cfa-2# <-- r18 marked as 1'st reg saved. 
 # (cfa-0 & cfa-1 implicitly consumed by r36/PC
 # per the CIE)
 # ... but what happened to r1, SREG, and r17?
 # even if ABI says r1/r17 are disposable (??),
 # r18 should be at CFA-5 based on the pushes.
  DW_CFA_advance_loc: 2 to 00dc
  DW_CFA_def_cfa_offset: 4
  DW_CFA_offset: r19 at cfa-3
  DW_CFA_advance_loc: 2 to 00de
  DW_CFA_def_cfa_offset: 5
  DW_CFA_offset: r20 at cfa-4
# ... remainder snipped...
```

At 0xf4 (after prologue is complete) the FDE should have CFA at offset r32 +
19. However, it's missing 3 bytes and has:

```
  DW_CFA_advance_loc: 2 to 00f4
  DW_CFA_def_cfa_offset: 16   # Should be 19: 2 for return pc + 17 push ops
  DW_CFA_offset: r31 at cfa-15
```
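
The arithmetic behind the expected value can be written down as a small sketch. The function names are hypothetical; the inputs (2 bytes for the return PC, 17 push operations, a recorded offset of 16) are the figures quoted above.

```c
#include <assert.h>

/* CFA offset from the stack pointer once the prologue has run:
   the offset valid at function entry plus one byte per push. */
static unsigned cfa_offset_after_prologue(unsigned entry_offset, unsigned pushes)
{
    return entry_offset + pushes;   /* each push counted here stores one byte */
}

/* Number of stack bytes the FDE fails to account for. */
static unsigned missing_bytes(unsigned expected, unsigned recorded)
{
    assert(expected >= recorded);
    return expected - recorded;
}
```

With an entry offset of 2 and 17 pushes the expected offset is 19, so a recorded offset of 16 leaves 3 bytes unaccounted for, matching the three pushes (r1, SREG, r17) that have no FDE records.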


Compilation command was:
```
$ avr/bin/avr-gcc  -Os -g  -Wl,--relax,--gc-section -DARCH_AVR \
  -mmcu=atmega32u4 -DF_CPU=1600 -fno-exceptions ~/isr.c -o ~/isr.elf
```

Compiler build configuration is:
```
aaron@ubuntu:~/share/scratch/gcc11avr$ avr/bin/avr-gcc -v
Using built-in specs.
Reading specs from
/home/aaron/share/scratch/gcc11avr/avr/lib/gcc/avr/11.2.0/device-specs/specs-avr2
COLLECT_GCC=avr/bin/avr-gcc
COLLECT_LTO_WRAPPER=/home/aaron/share/scratch/gcc11avr/avr/libexec/gcc/avr/11.2.0/lto-wrapper
Target: avr
Configured with: ../gcc-11.2.0/configure
--prefix=/home/aaron/share/scratch/gcc11avr
--exec-prefix=/home/aaron/share/scratch/gcc11avr/avr
--with-local-prefix=/home/aaron/share/scratch/gcc11avr/local
--enable-fixed-point --enable-languages=c,c++ --disable-nls --disable-libssp
--disable-libada --disable-shared --disable-threads --with-avrlibc
--with-dwarf2 --disable-doc --target=avr
--with-toolexeclibdir=/home/aaron/share/scratch/gcc11avr/avr/lib
--with-sysroot=/home/aaron/share/scratch/gcc11avr/avr --with-avrlibc
--with-double=32
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 11.2.0 (GCC)
```

Log when building with `-v -save-temps` is:
```
aaron@ubuntu:~/share/scratch/gcc11avr$ avr/bin/avr-gcc -v -save-temps  -Os -g 
-Wl,--relax,--gc-section -DARCH_AVR -mmcu=atmega32u4 -DF_CPU=1600 
-fno-exceptions ~/isr.c -o ~/isr.elf
Using built-in specs.
Reading specs from
/home/aaron/share/scratch/gcc11avr/avr/lib/gcc/avr/11.2.0/device-specs/specs-atmega32u4
COLLECT_GCC=avr/bin/avr-gcc
COLLECT_LTO_WRAPPER=/home/aaron/share/scratch/gcc11avr/avr/libexec/gcc/avr/11.2.0/lto-wrapper
Target: avr
Configured with: ../gcc-11.2.0/configure
--prefix=/home/aaron/share/scratch/gcc11avr
--exec-prefix=/home/aaron/share/scratch/gcc11avr/avr
--with-local-prefix=/home/aaron/share/scratch/gcc11avr/local
--enable-fixed-po

[Bug target/103975] DWARF .debug_frame incorrect for ISRs on AVR; pushing SREG creates off-by-one error

2022-01-11 Thread kimballa at apache dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103975

--- Comment #4 from Aaron Kimball  ---
Created attachment 52161
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52161&action=edit
Test case for gcc 11.2.0 demonstrating issue

Attaching updated test case for gcc 11.2.0

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

--- Comment #12 from Martin Liška  ---
Self-contained test-case:

extern __inline __attribute__((__gnu_inline__)) int sprintf(
char *__restrict __s, const char *__restrict __fmt, ...) {
  return __builtin___sprintf_chk(__s, 2 - 1, __builtin_object_size(__s, 2 > 1),
 __fmt, __builtin_va_arg_pack());
}

int main() {
  char buf[16];
  char *p = buf;

  for (int t = 0; t < 1; t++) {
    for (int n = 0; n < 1; n++) p += sprintf(p, "a,");

    p--;
    sprintf(p, "+");
  }

  __builtin_printf("buf: %s\n", buf);
  if (buf[0] != 'a') __builtin_abort();

  return 0;
}

[Bug target/102239] powerpc suboptimal boolean test of contiguous bits

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102239

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Xiong Hu Luo :

https://gcc.gnu.org/g:19d81fda48f30c4fc11c8912749351acd9159c17

commit r12-6433-g19d81fda48f30c4fc11c8912749351acd9159c17
Author: Xionghu Luo 
Date:   Sun Dec 12 23:17:13 2021 -0600

rs6000: powerpc suboptimal boolean test of contiguous bits [PR102239]

Add specialized version to combine two instructions from

 9: {r123:CC=cmp(r124:DI&0x6,0);clobber scratch;}
   REG_DEAD r124:DI
 10: pc={(r123:CC==0)?L15:pc}
  REG_DEAD r123:CC

to:

 10: {pc={(r123:DI&0x6==0)?L15:pc};clobber scratch;clobber %0:CC;}

then split2 will split it to one rotate dot instruction (to save one
rotate back instruction) as shifted result doesn't matter when comparing
to 0 in CCEQmode.

Bootstrapped and regression tested pass on Power 8/9/10.

gcc/ChangeLog:

PR target/102239
* config/rs6000/rs6000-protos.h (rs6000_is_valid_rotate_dot_mask):
New
declare.
* config/rs6000/rs6000.c (rs6000_is_valid_rotate_dot_mask): New
function.
* config/rs6000/rs6000.md (*branch_anddi3_dot): New.

gcc/testsuite/ChangeLog:

PR target/102239
* gcc.target/powerpc/pr102239.c: New test.

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

--- Comment #13 from Jakub Jelinek  ---
Testcase with nicer formatting:

extern inline __attribute__ ((__gnu_inline__)) int
sprintf (char *restrict s, const char *restrict fmt, ...)
{
  return __builtin___sprintf_chk (s, 1, __builtin_object_size (s, 2 > 1),
  fmt, __builtin_va_arg_pack ());
}

void
cap_to_text (int c)
{
  char buf[1572];
  char *p;
  int n, t;
  p = 20 + buf;
  for (t = 8; t--; )
    {
      for (n = 0; n < c; n++)
        p += sprintf (p, "a,");
      p--;
      sprintf (p, "+");
    }
}

Indeed, early_objsz already inserts the bogus:
  p_16 = p_3 + 18446744073709551615;
  _17 = __builtin_object_size (p_16, 1);
  _24 = MIN_EXPR <_17, 0>;
  _25 = __builtin___sprintf_chk (p_16, 1, _24, "+");
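
For reference, stepping a pointer back by one inside a buffer should increase the remaining object size by one, not collapse it to zero. The sketch below is a hypothetical stand-alone illustration (the buffer size and offsets are made up, and `remaining_after_step_back` is not part of the testcase). Depending on the compiler and optimization level, the builtin may also report "unknown" as `(size_t) -1`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes the compiler believes remain in 'buf' after p has stepped back. */
static size_t remaining_after_step_back(void)
{
    char buf[16];
    char *p = buf + 4;

    memset(buf, 0, sizeof buf);
    p--;                                  /* p now equals buf + 3 */
    assert(p >= buf);
    return __builtin_object_size(p, 1);   /* 13 when known, else (size_t) -1 */
}
```

A result of 0, as in the `MIN_EXPR <_17, 0>` above, would make any checked write through `p` look like an overflow.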

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

Siddhesh Poyarekar  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |siddhesh at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #14 from Siddhesh Poyarekar  ---
Looking into it.

[Bug tree-optimization/100315] missed optimization for dead code elimination at -O3, -O2 (vs. -O1, -Os)

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100315

Richard Biener  changed:

   What|Removed |Added

Version|unknown |12.0
   Keywords|wrong-code  |
 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Richard Biener  ---
I think that's OK - the outer while (1) loop invokes UB if a is zero (e
overflows), so we can assume that path isn't taken (CD-DCE produces this
by noting that the outer loop is finite).  It's a bit "unconscious" in how it
arrives at this, but well ...

Marking useful stmt: foo ();

Found loop 1 to be finite: upper bound found.   ()

Processing worklist:
processing: foo ();


Eliminating unnecessary statements:
Deleting : e_7 = e_2 + 1;

Deleting : if (a.0_1 != 0)

Deleting : a.0_1 = a;

Deleting : e_2 = PHI <0(2), _7(6)>

Removing basic block 6
;; basic block 6, loop depth 1
;;  pred:
goto ; [INV]

() that causes the loop 'control' to be elided which for endless loops
where only UB makes it "finite" causes some random transform.

At -O1 we do not perform CD-DCE, instead IPA analyzes 'a' to be not modified
and thus zero so we can fold if (c) to if (0).  With -O3 we have already
eliminated this control based on UB (but in the other direction).

So in the end the testcase invokes UB and thus is quite meaningless.

Making 'e' unsigned int produces the same result from -O3 as -O1 and the
call to foo is eliminated.
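
The difference between the two counter types can be shown in isolation. This is a generic sketch with hypothetical helper names, not the testcase from this report: unsigned increment wraps around with defined behaviour, whereas signed increment past INT_MAX is undefined and has to be guarded if the program is to stay meaningful.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Unsigned arithmetic wraps modulo UINT_MAX + 1; this is well-defined. */
static unsigned int bump_unsigned(unsigned int e)
{
    return e + 1u;
}

/* Signed overflow is UB, so refuse to increment INT_MAX.
   Returns 1 and stores e + 1 in *out on success, 0 otherwise. */
static int bump_signed_checked(int e, int *out)
{
    assert(out != NULL);
    if (e == INT_MAX)
        return 0;
    *out = e + 1;
    return 1;
}
```

With a signed counter the compiler may assume the overflowing path is never reached; with an unsigned one it has to account for the wrap-around.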

[Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences)

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

Richard Biener  changed:

   What|Removed |Added

Version|unknown |12.0
   Last reconfirmed|2021-09-25 00:00:00 |2022-1-11
 CC||hubicka at gcc dot gnu.org,
   ||jamborm at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
the key is to notice that 'a' is not modified and thus zero and propagate that
to 'd'.  With -O3 IPA figures out 'a' is not modified but that isn't taken into
account by IPA CP which sees

  a.0_1 = a;
  d (a.0_1);

with -O1 we happen to inline everything during IPA and nothing early while
with -O3 'e' is early inlined into 'd' (but not d into main) and IPA
refuses to inline d into main:

t.c:18:3: missed:   not inlinable: main/7 -> d/6, --param
large-stack-frame-growth limit reached

Something is off with the stack limit, I guess; maybe it is not accumulated
correctly at -O1.

[Bug ipa/100308] IPA CP ipcp_modif_dom_walker removes calls w/o updating the cgraph

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100308

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
  Known to work||11.1.1
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Richard Biener  ---
But not further.  Fixed.

[Bug target/32593] Missed optimization of 'y = constant - x' operation

2022-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32593

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |4.8.0
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Andrew Pinski  ---
For the original testcase, starting with GCC 4.8.0 we now produce:

movl    $7, %ecx
leal    -1(%edi), %eax
xorl    %edi, %eax
sarl    $15, %eax
subb    ff_h264_norm_shift(%eax), %cl
sall    %cl, %esi

Which is better and still one less register as you wanted.

So the original issue was fixed in GCC 4.8.

[Bug ipa/100220] missed optimization for dead code elimination at -O3 (vs. -O1, -Os, -O2) (inlining differences)

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100220

--- Comment #2 from Richard Biener  ---
Indeed similar interaction between inlining, static var const promotion and IPA
CP / inline heuristics

[Bug target/32735] i686 sse2 generates more movdqa than necessary

2022-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||ra
   Severity|normal  |enhancement

--- Comment #7 from Andrew Pinski  ---
Even with AVX we still have an extra move:
.L2:
        vpslldq $4, %xmm1, %xmm0
        vpaddd  %xmm1, %xmm0, %xmm0
        vpslldq $8, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vmovdqa %xmm0, %xmm1
        subl    $1, %eax
        jne     .L2

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2022-01-11 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #42 from hubicka at kam dot mff.cuni.cz ---
On Zen 2 and Zen 3 with -flto the speedup seems to be about 12% for both -O2
and -Ofast -march=native, which is very nice!
Zen 1 for some reason sees less improvement, about 6%.
With PGO it is 3.8%.

Overall it seems a win, but there are a few noteworthy issues.

I also see a 6.69% regression on x64 with -Ofast -march=native -flto
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.377.0
and perhaps 3-5% on sphinx
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=476.280.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=227.280.0

For non-SPEC benchmarks there is a regression on nbench
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=26.645.1
There are also large changes in tsvc
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report
It may be noise since the kernels are tiny, but for example x293 reproduces
on both Kaby Lake and Zen as a regression of about 80-90%, which may be easy
to track (the kernel is included in the testsuite). The same regression is not
seen on Zen 3, so it may be ISA-specific or so.

Finally, there seem to be relatively large code size savings on the Polyhedron
benchmarks today (8% on capacita, 

Thanks a lot!

[Bug tree-optimization/100221] Takes two passes at DSE to remove some dead stores

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100221

Richard Biener  changed:

   What|Removed |Added

Version|unknown |12.0

--- Comment #4 from Richard Biener  ---
For the reduced testcase the issue is simply that DSE doesn't walk across
multiple paths and we have

 :
# .MEM_9 = VDEF 
g = &c;
a.1_2 = a;
if (a.1_2 != 0)
  goto ; [INV]
else
  goto ; [INV]

   :
  b = 0;

   :
  .MEM_5 = PHI <.MEM_9(4), .MEM_12(5)>
  b.2_3 = b;
  if (b.2_3 != 0)
goto ; [INV]
  else
goto ; [INV]

 :
.MEM_6 = PHI <.MEM_9(4), .MEM_5(6)>
g ={v} {CLOBBER};
return 0;

with the walking gathered the .MEM_5 and .MEM_6 defs when following the
.MEM_9 uses and

  /* In addition to kills we can remove defs whose only use
 is another def in defs.  That can only ever be PHIs of which
 we track two for simplicity reasons, the first and last in
 {first,last}_phi_def (we fail for multiple PHIs anyways).
 We can also ignore defs that feed only into
 already visited PHIs.  */
  else if (single_imm_use (vdef, &use_p, &use_stmt)
           && (use_stmt == first_phi_def
               || use_stmt == last_phi_def
               || (gimple_code (use_stmt) == GIMPLE_PHI
                   && bitmap_bit_p (visited,
                                    SSA_NAME_VERSION
                                      (PHI_RESULT (use_stmt))))))
    defs.unordered_remove (i);

does not trigger to remove either PHI def from consideration (but in
principle we could elide .MEM_6 and continue processing .MEM_5 which
eventually will lead us to .MEM_6 anyway).  I suppose the key would be
realizing that one of the PHI defs is a PHI argument of the PHI we can
postpone in this round.

[Bug target/34719] N_GSYM stabs warning with common blocks on Mac OS X Leopard

2022-01-11 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34719

Iain Sandoe  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||iains at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #6 from Iain Sandoe  ---
As of GCC-12 (!!)

$ /opt/iains/i686-apple-darwin9/gcc-12-0-0/bin/gfortran test.f -o t -gstabs
-fcommon
f951: Warning: STABS debugging information is obsolete and not supported
anymore

So the bug was fixed somewhere along the line - but stabs is now obsolete so
closing.

[Bug target/70763] Use SSE for DImode load/store

2022-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70763

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |8.0
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Andrew Pinski  ---
x5 and x6 don't use sse but that is ok as it is an instruction to create the
0/-1 and then one store. Right now it is just two stores.

So closing as fixed for GCC 8.

[Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences)

2022-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

--- Comment #3 from Jan Hubicka  ---
With -O1 I get:
IPA function summary for main/7 inlinable
  global time: 72.936364
  self size:   6
  global size: 19
  min size:   16
  self stack:  0
  global stack:44
size:15.00, time:67.636364
size:3.00, time:2.00,  executed if:(not inlined)
  calls:
d/6 inlined
  freq:1.00
  Stack frame offset 0, callee self size 0
  e/5 inlined
freq:1.00
Stack frame offset 0, callee self size 44
  foo/8 function body not available
freq:0.33 loop depth: 0 size: 1 time: 10

So we get 44 bytes of stack use.

However with -O3 we see:
IPA function summary for main/7 inlinable
  global time: 14.00
  self size:   6
  global size: 6
  min size:   3
  self stack:  0
  global stack:0
size:1.00, time:1.00
size:3.00, time:2.00,  executed if:(not inlined)
  calls:
d/6 function not considered for inlining
  freq:1.00 loop depth: 0 size: 2 time: 11 callee size:11 stack:284

IPA function summary for d/6 inlinable
  global time: 87.936364
  self size:   23
  global size: 23
  min size:   17
  self stack:  284
  global stack:284
size:17.00, time:80.636364
size:3.00, time:2.00,  executed if:(not inlined)
size:2.00, time:2.00,  nonconst if:(op0 changed)
  calls:
foo/8 function body not available
  freq:0.33 loop depth: 0 size: 1 time: 10 predicate: (op0 != 0)

So now the stack is 284 bytes for d which hits the limits.
It seems to be array h[30] (240 bytes) and l[5] (40 bytes).

With -O1 array h is optimized out completely; however, with -O3 I see:
void d (int m)  
{   
  int k;
  int * l[5];   
  int * * h[30];
  int b.1_8;
  int _9;   
  int g.2_10;   

   [local count: 118111600]:  
  MEM  [(int * *[30] *)&h + 8B] = {};
  h[0] = &c;
  if (m_5(D) != 0)  
goto ; [33.00%]   
  else  
goto ; [67.00%]   

   [local count: 38976828]:   
  foo ();   

   [local count: 118111600]:  
  l[0] = &k;
  l[1] = &k;
  l[2] = &k;
  l[3] = &k;
  l[4] = &k;
  goto ; [100.00%]

   [local count: 955630225]:  
  j = &l[0];
  b.1_8 = b;
  _9 = b.1_8 + 1;   
  b = _9;   

   [local count: 1073741824]: 
  g.2_10 = g;   
  if (g.2_10 != 0)  
goto ; [89.00%]   
  else  
goto ; [11.00%]   

   [local count: 118111600]:  
  k ={v} {CLOBBER}; 
  l ={v} {CLOBBER}; 
  h ={v} {CLOBBER}; 
  return;   

}   

So h i

[Bug ipa/100314] missed optimization for dead code elimination at -O3 (vs. -O1) (inlining differences)

2022-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314

--- Comment #4 from Jan Hubicka  ---
However, i is also dead at DSE time:
void d (int m)  
{   
  int k;
  int * l[5];   
  int * * * i[1];   
  int * * h[30];
  int b.1_11;   
  int _12;  
  int g.2_13;   

   :  
  MEM  [(int * *[30] *)&h + 8B] = {};
  h[0] = &c;
  i[0] = &h[3]; 
  if (m_6(D) != 0)  
goto ; [INV]  
  else  
goto ; [INV]  

   :  
  foo ();   

   :  
  l[0] = &k;
  l[1] = &k;
  l[2] = &k;
  l[3] = &k;
  l[4] = &k;
  goto ; [100.00%]

   :  
  j = &l[0];
  b.1_11 = b;   
  _12 = b.1_11 + 1; 
  b = _12;  

   :  
  g.2_13 = g;   
  if (g.2_13 != 0)  
goto ; [89.00%]   
  else  
goto ; [11.00%]   

   :  
  k ={v} {CLOBBER}; 
  l ={v} {CLOBBER}; 
  h ={v} {CLOBBER}; 
  i ={v} {CLOBBER}; 
  return;   

}   

Not sure why DSE is not optimizing this out.

[Bug ipa/100220] missed optimization for dead code elimination at -O3 (vs. -O1, -Os, -O2) (inlining differences)

2022-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100220

Jan Hubicka  changed:

   What|Removed |Added

 Depends on||100314

--- Comment #3 from Jan Hubicka  ---
Here the stack frame size of i is estimated at 244 bytes:
void i ()
{
  int p[8];
  int n[27];
  int k[24];
  int l;
  int j;
  int _1;
  int _2;
  int _3;
  int c.2_4;
  int _5;

   [local count: 236223200]:
  l = 0;
  k = {};
  h = &n;
  _1 = n[0];
  _2 = _1 & 1;
  if (_2 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 118111600]:
  goto ; [100.00%]

   [local count: 955630225]:
  _3 = c.2_4 + -1;
  c = _3;

   [local count: 1073741824]:
  c.2_4 = c;
  if (c.2_4 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 236223200]:
  h = &p;
  _5 = p[0];
  if (_5 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 118111600]:
  h = &j;

   [local count: 236223200]:
  e = 0;
  j ={v} {CLOBBER};
  l ={v} {CLOBBER};
  k ={v} {CLOBBER};
  n ={v} {CLOBBER};
  p ={v} {CLOBBER};
  return;

}

So it indeed has larger arrays.  k is initialized but never used (so it is a
missed DSE).  n is used in a stupid way:

  h = &n;   
  _1 = n[0];

where h is a write-only static variable, but we do not know that during early
opts (we could try our luck and schedule one extra write-only detection before
the early optimization passes, but I am not sure it is worth it).

I would say that the main issue is also the missed DSE.
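
The pattern described above can be sketched in C as follows. This is a hypothetical reduction written for illustration, not the testcase from the bug; the names only mirror the dump, and the arrays are zero-initialized here so the sketch has defined behavior.

```c
/* Hypothetical illustration, not the original testcase. */
static int *h;              /* write-only static: assigned, never read */

int low_bit_of_first(void)
{
    int k[24] = {0};        /* initialized but never read: a dead store */
    int n[27] = {0};        /* only n[0] is ever read */

    h = n;                  /* the address of n escapes into h, so n must
                               stay in memory until h is proven write-only */
    (void)k;
    return n[0] & 1;
}
```

Once the compiler knows h is never read, the store to h and both arrays can go away, which is what shrinks the estimated stack frame.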


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100314
[Bug 100314] missed optimization for dead code elimination at -O3 (vs. -O1) 
(inlining differences due to missed dse)

[Bug tree-optimization/100221] Takes two passes at DSE to remove some dead stores

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100221

--- Comment #5 from Richard Biener  ---
Note handling this special case doesn't resolve the issue in the original
testcase since the virtual operand setup is different there.  As said, DSE
is not set up to follow multiple paths to uses (sth akin to maybe_skip_until
for the CSE case), but we only try somewhat hard to reduce multiple paths
to one.

[Bug tree-optimization/100221] Takes two passes at DSE to remove some dead stores

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100221

--- Comment #6 from Richard Biener  ---
Created attachment 52162
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52162&action=edit
untested patch fixing the reduced testcase

[Bug tree-optimization/100221] Takes two passes at DSE to remove some dead stores

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100221

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug bootstrap/99920] [10 regression] ICE building gcc 10 on power 7 BE with GCC 4.3.4 (SUSE) host compiler

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99920

Richard Biener  changed:

   What|Removed |Added

Summary|[10 regression] ICE |[10 regression] ICE
   |building gcc 10 on power 7  |building gcc 10 on power 7
   |BE  |BE with GCC 4.3.4 (SUSE)
   ||host compiler
 Status|REOPENED|RESOLVED
 Resolution|--- |WONTFIX

--- Comment #19 from Richard Biener  ---
I suppose we should close the bug here and open one on the SUSE side if we/IBM
cares enough to debug the 4.3.4 PPC BE compiler.  Clarifying the summary at
least.  So far the issue has not been reproduced with a clean FSF tree.

I doubt we're going to work around this issue in the GCC 10 tree so WONTFIX
on the GCC 10 side and of course GCC 4.3 is no longer maintained here.

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread i.maximets at ovn dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #5 from Ilya Maximets  ---
(In reply to Richard Biener from comment #4)
> the IVOPTs reference is likely due to the fact that while IVOPTs uses
> uintptrs to create the base pointer the TARGET_MEM_REF contained arithmetic
> itself is still considered pointer arithmetic (like also here the embedded
> MEM_REF pointer offsetting) and the base "pointer" cannot be a non-pointer
> to disable that behavior.

Does this mean that this is UB and that GCC itself relies on a particular
result of it?

If so, shouldn't this not be UB in the end, so that other users can rely on
that behavior too, as it is beneficial for certain code patterns?
Or at least be configurable?

Is it correct that dynamically allocated memory is never subject to
this kind of pointer check, i.e. we can safely offset pointers to
dynamically allocated memory?  If so, can we be sure that this will
always hold for dynamically allocated memory, so we can avoid unnecessary
fixes for these use cases?
(Allocating ovs_list dynamically indeed fixes our test case.)

We have a bit less than a million lines of code heavily using the discussed
data structures, and without automated ways to find the problematic places
it will not be an easy job to fix (UBSan doesn't flag anything and I doubt
someone will actually sit and re-check the whole code base manually).

(In reply to Richard Biener from comment #3)
> where I think passing &pos->elem and &member->elem to ovs_list_insert is
> already wrong since 'pos' doesn't point to a valid object if the
> ultimate written destination is 'start'.

> Doing the 'pos' initialization with uintptr_t isn't enough - you need to
> do this all the way up to the &pos->elem computation

Maybe there is a way to not treat &pos->elem as pointer arithmetic?
Maybe there should be one?  I mean, compilers allow users to perform
computations with offsets of structure fields where the base pointer
is NULL, and NULL obviously doesn't point to any valid object.  I'm not
sure if it's UB or not, but constructions like '&((struct s *)NULL)->field'
are very common.

[Bug middle-end/98865] Missed transform of (a >> 63) * b

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98865

Richard Biener  changed:

   What|Removed |Added

   Severity|enhancement |normal
   Last reconfirmed|2021-03-07 00:00:00 |2022-1-11

[Bug libstdc++/103726] --disable-hosted-libstdcxx (freestanding C++) does not provide <coroutine> as what standard requires

2022-01-11 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103726

--- Comment #11 from Jonathan Wakely  ---
I already reported https://cplusplus.github.io/LWG/issue3653 for the std::hash
use in <coroutine>. Removing exceptions, typeinfo and coroutines is unnecessary
and irrelevant to this bug report. Stop using GCC bugzilla for your off-topic
rants about WG21.

You have been warned several times now.

[Bug target/98167] [x86] Failure to optimize operation on indentically shuffled operands into a shuffle of the result of the operation

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167

--- Comment #20 from Richard Biener  ---
-fno-trapping-math tells us we are not concerned about FP exception flags (so
say a spurious FE_INEXACT is OK); -fno-signaling-nans is needed as well, I guess.

Oh, and in practice performing the multiplication for elements that are
NaN or denormal might trigger very slow paths in the CPU, which means
the optimization could be a pessimization runtime-wise.  Possibly
zeroing the unused lanes in one(?) of the operands is enough to avoid that
(for denormals; I guess 0 * NaN is still NaN).
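
A minimal scalar sketch of that last point, assuming IEEE 754 doubles: zeroing one operand turns a denormal product into an exact zero, but it does not hide a NaN in the other operand. The helper names are made up for this example, and the sketch says nothing about how fast either multiplication is on a given CPU.

```c
#include <float.h>
#include <math.h>

/* Nonzero if x is NaN; written as a self-comparison so it needs no libm. */
int is_nan(double x)
{
    return x != x;
}

/* Hypothetical helper: the product for an unused lane after that lane has
   been zeroed in one operand.  volatile only keeps the compiler from
   folding the multiplication away at compile time. */
double zeroed_lane_product(double other_operand)
{
    volatile double zeroed = 0.0;
    return zeroed * other_operand;
}
```

So zeroing a single operand would cover the denormal case, while a NaN in the other operand still propagates.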

[Bug target/97194] optimize vector element set/extract at variable position

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #26 from Richard Biener  ---
Let's declare this generic bug fixed.  Specific unhandled cases on x86 should
get a new bug with a specific testcase.

[Bug target/97147] GCC uses vhaddpd which is bad for latency

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97147

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
  Known to work||12.0

--- Comment #8 from Richard Biener  ---
Fixed.

[Bug c/103977] New: ice in try_vectorize_loop_1

2022-01-11 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977

Bug ID: 103977
   Summary: ice in try_vectorize_loop_1
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

Created attachment 52163
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52163&action=edit
gzipped C source code

The attached C code does this:

$ /home/dcb/gcc/results/bin/gcc  -c -w   -std=gnu89  -mno-sse -mno-mmx   -O3 bug785.c
during GIMPLE pass: vect
mm/slab_common.c: In function ‘cache_random_seq_create’:
mm/slab_common.c:992:5: internal compiler error: in operator[], at vec.h:889
0x10367ab vect_analyze_loop(loop*, vec_info_shared*)
../../trunk.git/gcc/tree-vect-loop.c:0
0x1082487 try_vectorize_loop_1(hash_table*&,
unsigned int*, loop*, gimple*, gimple*, function*)
../../trunk.git/gcc/tree-vectorizer.c:1047
0x1082487 try_vectorize_loop(hash_table*&,
unsigned int*, loop*, function*)
../../trunk.git/gcc/tree-vectorizer.c:1162

Code derived from the Linux kernel.

I will have a go at reducing the code.

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #6 from Jakub Jelinek  ---
(In reply to Ilya Maximets from comment #5)
> (In reply to Richard Biener from comment #4)
> > the IVOPTs reference is likely due to the fact that while IVOPTs uses
> > uintptrs to create the base pointer the TARGET_MEM_REF contained arithmetic
> > itself is still considered pointer arithmetic (like also here the embedded
> > MEM_REF pointer offsetting) and the base "pointer" cannot be a non-pointer
> > to disable that behavior.
> 
> Does this mean that this is UB and the GCC itself relies on a certain result
> of it?

If GCC, through optimizations, introduces UB that wasn't there in the
user's code, then that would be a GCC bug and something the compiler needs to
fix.

> Maybe there is a way to not treat a &pos->elem as a pointer arithmetic?
> Maybe there should be one?  I mean, compilers allows users to perform
> computations with offsets of structure fields where the base pointer
> is NULL, and NULL obviously doesn't point to any valid object.  I'm not
> sure if it's an UB or not, but constructions like
> '&((struct s *)NULL)->field' are very common.

&((struct s *)NULL)->field is not valid in C or C++, but for many years the
offsetof macro, which is valid in both, has been defined like that, and various
projects occasionally still use the above, so GCC supports it as an
extension (poor man's offsetof).  See e.g. spots with comments like
Cope with user tricks that amount to offsetof.
etc. in GCC sources.  That doesn't change anything about this case: the poor
man's offsetof is folded into a constant very early (well, with variable
offsets in array refs in there it could also fold into an expression, but still
an integral expression; the pointer arithmetic is gone from there).
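
For illustration, here is the standard macro next to the "poor man's offsetof" form. The struct layout is made up for this example; the second function relies on the GCC extension described above and is not valid ISO C.

```c
#include <stddef.h>

struct s {                  /* hypothetical struct for this sketch */
    char tag;
    long field;
};

/* Portable: offsetof from <stddef.h>. */
size_t standard_offset(void)
{
    return offsetof(struct s, field);
}

/* Poor man's offsetof: accepted by GCC as an extension and folded to a
   constant early, but not valid ISO C. */
size_t poor_mans_offset(void)
{
    return (size_t)&((struct s *)NULL)->field;
}
```

With GCC both functions should return the same constant; new code is better off with the standard macro.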

What is the reason why OVS (and the kernel) don't use 2 variables, one for the
iterator that is a pointer to the prev/next structure only and one assigned
e.g. in the condition from the iterator that is used only when it isn't the
start?
At least if targeting C99 and above (or C++) one can declare one of those
iterators in the for loop init expression...
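
A rough sketch of the two-variable approach, under assumed definitions: struct ovs_list, struct member and the helpers below are simplified stand-ins written for this example, not the actual OVS code. The list iterator is the only pointer that can ever equal the list head; the containing-struct pointer is computed only inside the loop, where the iterator is known to point into a real member.

```c
#include <stddef.h>

/* Simplified stand-ins for the OVS types (assumed layout). */
struct ovs_list { struct ovs_list *prev, *next; };
struct member   { int order; struct ovs_list elem; };

void list_init(struct ovs_list *l)
{
    l->prev = l->next = l;
}

/* Link elem into the list immediately before 'before'. */
void list_insert_before(struct ovs_list *before, struct ovs_list *elem)
{
    elem->prev = before->prev;
    elem->next = before;
    before->prev->next = elem;
    before->prev = elem;
}

void insert_sorted(struct ovs_list *start, struct member *m)
{
    struct ovs_list *it;            /* iterator: may end up equal to start */

    for (it = start->next; it != start; it = it->next) {
        /* Here it != start, so it points into a real struct member and
           stepping back to the container stays inside that object. */
        struct member *pos =
            (struct member *)((char *)it - offsetof(struct member, elem));
        if (m->order > pos->order)
            break;
    }
    /* No struct member pointer is ever formed from the list head. */
    list_insert_before(it, &m->elem);
}
```

The cost is that code after the loop works with the iterator rather than with pos.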

[Bug rtl-optimization/97071] Fails to CSE / inherit constant pool load

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97071

--- Comment #7 from Richard Biener  ---
Another possibility would be to do this on GIMPLE, creating parts of the
constant pool early with CONST_DECLs and loads from them for constants that are
never legitimate (immediate) in instructions.

[Bug tree-optimization/97064] BB vectorization behaves sub-optimal

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97064

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-* i?86-*-*
 CC||crazylht at gmail dot com

--- Comment #1 from Richard Biener  ---
Also partly because we are not evaluating costing of multiple vector sizes on
x86.

[Bug tree-optimization/103977] ice in try_vectorize_loop_1

2022-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977

Jakub Jelinek  changed:

   What|Removed |Added

  Component|c   |tree-optimization
   Target Milestone|--- |12.0
 Ever confirmed|0   |1
 CC||avieira at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Priority|P3  |P1
   Last reconfirmed||2022-01-11

--- Comment #1 from Jakub Jelinek  ---
Started with r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

[Bug tree-optimization/103977] [12 Regression] ice in try_vectorize_loop_1 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

2022-01-11 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977

--- Comment #2 from David Binderman  ---
Reduced C source code is

int *freelist_randomize_list;
int cache_random_seq_create_count_i;
void cache_random_seq_create_count() {
  for (; cache_random_seq_create_count_i; cache_random_seq_create_count_i++)
freelist_randomize_list[cache_random_seq_create_count_i] =
cache_random_seq_create_count_i;
}

[Bug c++/95349] Using std::launder(p) produces unexpected behavior where (p) produces expected behavior

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95349

--- Comment #42 from Richard Biener  ---
See PR101641 for an interesting case where eliding a round-trip causes
wrong-code generation.  It's union related so might not apply 1:1 to C++.

[Bug c++/95349] Using std::launder(p) produces unexpected behavior where (p) produces expected behavior

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95349

--- Comment #43 from Richard Biener  ---
(In reply to Andrew Downing from comment #41)
> > Thus for types without a non-trivial ctor/dtor you do not need to use
> > placement new.  So take your example and remove the placement new.
> > Does that change its semantics?
> 
> These are C++17 rules.
> 
> 4.5/1) An object is created by a definition, by a new-expression, when
> implicitly changing the active member of a union, or when a temporary object
> is created.
> 
> 6.8/1) The lifetime of an object of type T begins when: storage with the
> proper alignment and size for type T is obtained, and if the object has
> non-vacuous initialization, its initialization is complete.
> 
> double d;
> 
> My interpretation of the above rules would be that only a double object is
> created in the storage for d because T in 6.8/1 is set to double by the
> definition of d. According to these rules the only way to change the dynamic
> type of the object in d's storage would be with placement new (pre C++20).
> memcpy only overwrites the object representation. It doesn't affect it's
> type or lifetime.

What would

  *(long *)&d = 1;

do?  My reading of earlier standards says it starts the lifetime of a new object
of type long (the storage of 'd' gets reused).  Following that stmt a read
like

  foo (d);

invokes undefined behavior (it accesses the storage of effective type long
via an effective type of double).  The same example with placement new
would be

  *(new (&d) long) = 1;

and I'm arguing the placement new is not required to start the lifetime
of an object of type long in the storage of 'd'.
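
For contrast, and not as an answer to the lifetime question above, the representation-only copy mentioned in the quoted comment can be sketched with memcpy. This example is written in C rather than C++, uses made-up helper names, and assumes double and uint64_t have the same size.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "sketch assumes a 64-bit double");

/* Copy the object representation of a double into an integer.  Nothing is
   accessed through an lvalue of the wrong type. */
uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* The reverse direction: build a double from a bit pattern. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Unlike the store through a cast pointer, this form leaves the declared type of d alone and only moves bytes.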

[Bug rtl-optimization/97071] Fails to CSE / inherit constant pool load

2022-01-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97071

--- Comment #8 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #7)
> Another possibility would be to do this on GIMPLE, creating parts of the
> constant pool early with CONST_DECLs and loads from them for constants that
> are never legitimate (immediate) in instructions.

How can Gimple know this though?  Gimple does not know what instructions will
be generated.

The constant pools are a very machine-specific concept, poorly suited to
Gimple.

What abstraction does Gimple have for immediates currently?

[Bug tree-optimization/103977] [12 Regression] ice in try_vectorize_loop_1 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103977

--- Comment #3 from Richard Biener  ---
  /* For epilogues start the analysis from the first mode.  The motivation
 behind starting from the beginning comes from cases where the VECTOR_MODES
 array may contain length-agnostic and length-specific modes.  Their
 ordering is not guaranteed, so we could end up picking a mode for the main
 loop that is after the epilogue's optimal mode.  */
  mode_i = 1;

that's only valid if vector_modes.length () > 1 but for the testcase it only
contains the artificial VOIDmode aka autodetect mode.  I wonder why
we don't start from 0?
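
The condition described here amounts to a bounds check on the starting index. The helper below is a hypothetical illustration in plain C, not GCC code and not a proposed fix: index 1 is only usable when the array holds more than one mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick the index at which epilogue analysis would
   start, given the number of entries in the vector-modes array. */
size_t epilogue_start_index(size_t n_modes)
{
    assert(n_modes > 0);            /* at least the autodetect entry */
    return n_modes > 1 ? 1 : 0;     /* index 1 exists only if length > 1 */
}
```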

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug libgomp/103976] Very large overhead for if(false) openmp pragmas

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103976

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2022-01-11
 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.  The kernel is outlined even in the if (0) path but
eventually executed serially (which is faster than with using threads).
The only difference with using if (1) is

--- a-t.c.244t.optimized0   2022-01-11 14:07:52.152665056 +0100
+++ a-t.c.244t.optimized2022-01-11 14:07:58.696751625 +0100
@@ -121,7 +121,7 @@
   # sum_17 = PHI 
   # ivtmp_4 = PHI 
   .omp_data_o.1.sum = sum_17;
-  __builtin_GOMP_parallel (main._omp_fn.0, &.omp_data_o.1, 1, 0);
+  __builtin_GOMP_parallel (main._omp_fn.0, &.omp_data_o.1, 0, 0);
   sum_10 = .omp_data_o.1.sum;
   .omp_data_o.1 ={v} {CLOBBER};
   ivtmp_3 = ivtmp_4 + 4294967295;

the loop kernel still executes workload computation and reduction
commoning with atomics.  Without -fopenmp we unroll the kernel
and constant evaluate all 1./j

[Bug rtl-optimization/97071] Fails to CSE / inherit constant pool load

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97071

--- Comment #9 from Richard Biener  ---
(In reply to Segher Boessenkool from comment #8)
> (In reply to Richard Biener from comment #7)
> > Another possibility would be to do this on GIMPLE, creating parts of the
> > constant pool early with CONST_DECLs and loads from them for constants that
> > are never legitimate (immediate) in instructions.
> 
> How can Gimple know this though?  Gimple does not know what instructions will
> be generated.
> 
> The constant pools are a very machine-specific concept, poorly suited to
> Gimple.

Sure.

> What abstraction does Gimple have for immediates currently?

There's no "abstraction" for immediates in GIMPLE, likewise for symbolic
addresses.

[Bug libstdc++/103891] clang-13 fails to compile libstdc++'s std::variant>: error: attempt to use a deleted function

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103891

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:5b417b35824fb5c15e3ee958cb86436b3409ebea

commit r12-6439-g5b417b35824fb5c15e3ee958cb86436b3409ebea
Author: Jonathan Wakely 
Date:   Mon Jan 10 17:28:19 2022 +

libstdc++: Make std::variant work with Clang in C++20 mode [PR103891]

Clang has some bugs with destructors that use constraints to be
conditionally trivial, so disable the P2231R1 constexpr changes to
std::variant unless the compiler is GCC 12 or later.

If/when P2493R0 gets accepted and implemented by G++ we can remove the
__GNUC__ check and use __cpp_concepts >= 202002 instead.

libstdc++-v3/ChangeLog:

PR libstdc++/103891
* include/bits/c++config
(_GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS):
Define.
* include/std/variant (__cpp_lib_variant): Only define C++20
value when the compiler is known to support conditionally
trivial destructors.
* include/std/version (__cpp_lib_variant): Likewise.

[Bug ada/79724] GNAT tools do not respect --program-suffix and --program-prefix

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79724

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Pierre-Marie de Rodat:

https://gcc.gnu.org/g:7aa3800216ea991050ec904a28c628cd7799021b

commit r12-6459-g7aa3800216ea991050ec904a28c628cd7799021b
Author: Arnaud Charlet 
Date:   Mon Jan 3 10:51:07 2022 +

[Ada] PR ada/79724

gcc/ada/

PR ada/79724
* osint.adb (Program_Name): Fix handling of suffixes.

[Bug libstdc++/103726] --disable-hosted-libstdcxx (freestanding C++) does not provide <coroutine> as what standard requires

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103726

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:265d3e1a4e3d6c71d354f859302f023dc2d33f62

commit r12-6474-g265d3e1a4e3d6c71d354f859302f023dc2d33f62
Author: Jonathan Wakely 
Date:   Mon Jan 10 20:48:53 2022 +

libstdc++: Install <coroutine> header for freestanding [PR103726]

The standard says that <coroutine> should be present for freestanding.
That was intentionally left out of the initial implementation, but can
be done without much trouble. The header should be moved to libsupc++ at
some point in stage 1.

The standard also says that <coroutine> defines a std::hash
specialization, which was missing from our implementation. That's a
problem for freestanding (see LWG 3653) so only do that for hosted.

We can use concepts to constrain the __coroutine_traits_impl base class
when compiled with concepts enabled. In a pure C++20 implementation we
would not need that base class at all and could just use a constrained
partial specialization of coroutine_traits. But the absence of the
__coroutine_traits_impl base would create an ABI difference
between the non-standard C++14/C++17 support for coroutines and the same
code compiled as C++20. If we drop support for <coroutine> pre-C++20 we
should revisit this.

libstdc++-v3/ChangeLog:

PR libstdc++/103726
* include/Makefile.am: Install <coroutine> for freestanding.
* include/Makefile.in: Regenerate.
* include/std/coroutine: Adjust headers and preprocessor
conditions.
(__coroutine_traits_impl): Use concepts when available.
[_GLIBCXX_HOSTED] (hash): Define.

[Bug ada/79724] GNAT tools do not respect --program-suffix and --program-prefix

2022-01-11 Thread charlet at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79724

Arnaud Charlet  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #9 from Arnaud Charlet  ---
Should now be fully fixed!

[Bug libstdc++/103726] --disable-hosted-libstdcxx (freestanding C++) does not provide <coroutine> as what standard requires

2022-01-11 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103726

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
   Target Milestone|--- |12.0

--- Comment #13 from Jonathan Wakely  ---
Fixed on trunk.

[Bug libstdc++/103891] clang-13 fails to compile libstdc++'s std::variant>: error: attempt to use a deleted function

2022-01-11 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103891

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|MOVED   |FIXED
   Target Milestone|--- |12.0

--- Comment #7 from Jonathan Wakely  ---
Fixed for trunk with a workaround to disable the new constexpr stuff in
std::variant when not using GCC.

[Bug libgomp/103976] Very large overhead for if(false) openmp pragmas

2022-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103976

--- Comment #2 from Jakub Jelinek  ---
Even with if (0) it has to do that.  if (0) doesn't say act as if the construct
isn't there, it says that it should act as if num_threads is forced to 1.  It
still creates an inactive parallel region and needs to do various actions to
the ICVs etc. that can be observed through various OpenMP APIs.
What we perhaps could do if we see if (0) or num_threads (1) is to not use atomics
for the reductions if there aren't task reductions, as with a guaranteed single
thread they aren't needed.  But it would be an optimization for something very
rare: when if is used, it typically is not a compile-time value, and when we
don't know if it is true or false, we can't optimize away those atomics.
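
As a rough illustration of the construct under discussion (the testcase from the report is not shown here, so this loop is a made-up stand-in): with a false if clause the region is still a parallel construct, just an inactive one executed by a single thread.

```c
/* Hypothetical stand-in for the reported kernel.  Built with -fopenmp the
   pragma creates a parallel region; with run_parallel == 0 the region is
   inactive (one thread) but is still outlined and still goes through the
   runtime.  Built without -fopenmp the pragma is ignored. */
double harmonic_sum(int n, int run_parallel)
{
    double sum = 0.0;

    #pragma omp parallel for if(run_parallel) reduction(+:sum)
    for (int j = 1; j <= n; j++)
        sum += 1.0 / j;

    return sum;
}
```

Because the value of the if clause is usually only known at run time, as with run_parallel here, the compiler cannot in general drop the reduction machinery.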

Note, this is unlike the task directive, which with the mergeable clause can (but
doesn't have to, and GCC doesn't implement that right now) avoid the data
environment changes in certain conditions, so
  #pragma omp task mergeable other_clauses
  body;
could be implemented either as something like:
  GOMP_task (outlined_body, ...);
as GCC does now, but also as say:
  if (GOMP_taskWHATEVER (outlined_body, ... GOMP_MERGEABLE, ...)
body;
where the library would determine if it is an undeferred or included task (and
does any needed waiting for depend clauses etc.), but instead of creating a new
task in that case it would signal to the caller that the outlined_body won't be
called and that it should invoke body instead.  But probably only if there
aren't any depend clauses, otherwise variables changed during the waiting for
other task could get different values from what they contained originally.

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread i.maximets at ovn dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #7 from Ilya Maximets  ---
(In reply to Jakub Jelinek from comment #6)
> What is the reason why OVS (and kernel) doesn't use 2 variables, one for the
> iterator that is a pointer to the prev/next structure only and one assigned
> e.g. in the condition from the iterator that is used only when it isn't the
> start?
> At least if targetting C99 and above (or C++) one can declare one of those
> iterators in the for loop init expression...

The problem is that we need 2 variables and one of them needs to be accessible
outside of the loop.  And I don't think it is possible to declare one variable
and only initialize another one.  So, we'll have to ask users of the FOR_EACH
macro to declare an extra dummy variable, AFAICT.  And we'll also need to have
an extra check outside of the loop before calling ovs_list_insert in our
example.
So, basically, both variables have to be accessible outside of the loop.
Or do you know how to avoid this?

One thing that is not clear to me is whether the following code has UB or not:

struct member* pos;
struct ovs_list start;

pos = (struct member *)(void*)((uintptr_t)(&start) - 64);
ovs_list_insert((void *)((uintptr_t)pos + 64), &member->elem);

?

This code works fine.  Basically, the question is: can we cast and store
the random (aligned) integer to a pointer type, if we're not going to
perform any kind of pointer arithmetic (using the integer arithmetic for
the ovs_list_insert) nor dereference it, unless it points to the actual
valid object?

If we can do that, then we can maybe avoid introduction of dummy variables
by re-working the loop to only use uintptr_t operations.  Inside the loop
the pointer is always valid, so that should not be a problem (will it?).

Outside of the loop we'll have to use uintptr_t arithmetic if it's
possible to have a pointer to a non-object (e.g. 'start' with an offset),
but the programmer should know that.  We will still have to manually re-check
a lot of existing code.  For now I don't see a solution that would allow
us to avoid manual re-checking, unfortunately.

[Bug bootstrap/103974] [12 Regression] ICE in ira_flattening building libstdc++ with r12-6415-g01f3e6a40e72

2022-01-11 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103974

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #6 from Jeffrey A. Law  ---
Richard -- if you need an alternate testcase, reach out -- I've seen what is
likely the same failure on lm32-elf building libgcc.  I have the un-reduced .i
if it'd be useful.

[Bug c++/102670] Erroneous "missing template arguments" message for wrapper of ADL function template

2022-01-11 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102670

Patrick Palka  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Patrick Palka  ---
Fixed for GCC 12, thanks for the bug report.

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #8 from Richard Biener  ---
(In reply to Ilya Maximets from comment #7)
> One thing that is not clear to me is if the following code has an UB or not:
> 
> struct member* pos;
> struct ovs_list start;
> 
> pos = (struct member *)(void*)((uintptr_t)(&start) - 64);
> ovs_list_insert((void *)((uintptr_t)pos + 64), &member->elem);
> 
> ?
>
> This code works fine.  Basically, the question is: can we cast and store
> the random (aligned) integer to a pointer type, if we're not going to
> perform any kind of pointer arithmetic (using the integer arithmetic for
> the ovs_list_insert) nor dereference it, unless it points to the actual
> valid object?

That should be OK, yes.

[Bug tree-optimization/103603] [11 Regression] stack overflow on ranger for huge program, but OK for legacy

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103603

--- Comment #7 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Andrew Macleod:

https://gcc.gnu.org/g:3760d9d7b5410f16236ed15d02ec1d8a7d16fddb

commit r11-9452-g3760d9d7b5410f16236ed15d02ec1d8a7d16fddb
Author: Andrew MacLeod 
Date:   Tue Dec 7 12:09:33 2021 -0500

Directly resolve range_of_stmt dependencies. (Port of PR 103231/103464)

All ranger API entries eventually call range_of_stmt to ensure there is an
initial global value to work with.  This can cause very deep call chains
when satisfied via the normal API.  Instead, push any dependencies onto a
stack and evaluate them in a depth first manner, mirroring what would have
happened via the normal API calls.

PR tree-optimization/103603
gcc/
* gimple-range.cc (gimple_ranger::gimple_ranger): Create stmt stack.
(gimple_ranger::~gimple_ranger): New.
(gimple_ranger::range_of_stmt): Process dependencies if they have no
global cache entry.
(gimple_ranger::prefill_name): New.
(gimple_ranger::prefill_stmt_dependencies): New.
* gimple-range.h (class gimple_ranger): Add prototypes.
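
As a rough illustration of the technique this commit message describes
(replacing a deep recursive call chain with an explicit stack that is
processed depth first), here is a self-contained sketch.  It is not the
ranger code: struct dep_node, first_pending(), evaluate() and resolve() are
hypothetical names, the evaluation rule is a toy, and the graph is assumed
to be acyclic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical dependency node: up to two dependencies, -1 when unused. */
struct dep_node {
    int deps[2];
    bool cached;
    long value;
};

/* Index of the first dependency of 'n' without a cached value, or -1. */
static int first_pending(const struct dep_node *nodes,
                         const struct dep_node *n)
{
    for (int i = 0; i < 2; i++) {
        int d = n->deps[i];

        if (d >= 0 && !nodes[d].cached)
            return d;
    }
    return -1;
}

/* Toy evaluation rule: one more than the sum of the dependency values. */
static long evaluate(const struct dep_node *nodes, const struct dep_node *n)
{
    long v = 1;

    for (int i = 0; i < 2; i++)
        if (n->deps[i] >= 0)
            v += nodes[n->deps[i]].value;
    return v;
}

/* Resolve 'target' without recursion.  Assumes the graph is acyclic, so at
 * most 'count' entries are ever on the stack.  Returns -1 if the work stack
 * cannot be allocated. */
static long resolve(struct dep_node *nodes, size_t count, int target)
{
    size_t depth = 0;
    int *stack;

    assert(target >= 0 && (size_t)target < count);
    stack = malloc(count * sizeof *stack);
    if (stack == NULL)
        return -1;

    stack[depth++] = target;
    while (depth > 0) {
        struct dep_node *n = &nodes[stack[depth - 1]];
        int pending = n->cached ? -1 : first_pending(nodes, n);

        if (pending >= 0) {
            stack[depth++] = pending;   /* handle the dependency first */
            continue;
        }
        if (!n->cached) {
            n->value = evaluate(nodes, n);
            n->cached = true;
        }
        depth--;
    }
    free(stack);
    return nodes[target].value;
}
```

A chain of 100000 nodes that each depend on their predecessor is resolved
with constant call depth, because the pending work lives on the heap rather
than on the machine stack.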

[Bug tree-optimization/103464] [12 Regression] ICE on valid code at -O1 (with -ftree-vrp and ulimit -s 512) on x86_64-linux-gnu: Segmentation fault since r12-5522-g661c02e54ea72fb5

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103464

--- Comment #19 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Andrew Macleod:

https://gcc.gnu.org/g:3760d9d7b5410f16236ed15d02ec1d8a7d16fddb

commit r11-9452-g3760d9d7b5410f16236ed15d02ec1d8a7d16fddb
Author: Andrew MacLeod 
Date:   Tue Dec 7 12:09:33 2021 -0500

Directly resolve range_of_stmt dependencies. (Port of PR 103231/103464)

All ranger API entries eventually call range_of_stmt to ensure there is an
initial global value to work with.  This can cause very deep call chains
when satisfied via the normal API.  Instead, push any dependencies onto a
stack and evaluate them in a depth first manner, mirroring what would have
happened via the normal API calls.

PR tree-optimization/103603
gcc/
* gimple-range.cc (gimple_ranger::gimple_ranger): Create stmt stack.
(gimple_ranger::~gimple_ranger): New.
(gimple_ranger::range_of_stmt): Process dependencies if they have no
global cache entry.
(gimple_ranger::prefill_name): New.
(gimple_ranger::prefill_stmt_dependencies): New.
* gimple-range.h (class gimple_ranger): Add prototypes.

[Bug bootstrap/103974] [12 Regression] ICE in ira_flattening building libstdc++ with r12-6415-g01f3e6a40e72

2022-01-11 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103974

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #7 from rsandifo at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #6)
> Richard -- if you need an alternate testcase, reach out -- I've seen what is
> likely the same failure on lm32-elf building libgcc.  I have the un-reduced
> .i if it'd be useful.
It's OK, thanks.  It's a simple “don't do this for old reload”
thing, but first I needed a few attempts to get a full testsuite
run for cris-elf, and then I changed my mind about where exactly
the shortcut should go.  Hope to post a patch later today.

Re one of the things I hit getting the cris-elf testsuite run:
it would be good to make the port rtl-checking clean if possible.
At the moment, building with --enable-checking=yes,extra,rtl
(my usual flags) falls over on cris.c:395:

insn = as_a  XVECEXP (pat, 0, 1);

(missing parens).

[Bug analyzer/102692] -Wanalyzer-null-dereference false alarm with (!p || q || !p->next)

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102692

--- Comment #3 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:4f34f8cc1d064bfaaed723312c71e92f495d429b

commit r12-6476-g4f34f8cc1d064bfaaed723312c71e92f495d429b
Author: David Malcolm 
Date:   Fri Jan 7 17:44:23 2022 -0500

analyzer: fix false +ve on bitwise binops (PR analyzer/102692)

PR analyzer/102692 reports a false positive at -O2 from
-Wanalyzer-null-dereference on:
  if (!p || q || !p->next)

At the gimple level, -O2 has converted the first || into bitwise or
controlling a jump:
  _4 = _2 | _3;
  if (_4 != 0)
and a recursive call has been converted to iteration.  The analyzer hits
the symbolic value complexity limit whilst analyzing the iteration and
hits a case where _2 is (_Bool)1 (i.e. true) and _3 (i.e. q) is
UNKNOWN(_Bool).

There are two issues leading to the false positive; fixing either issue
fixes the false positive; this patch fixes both for good measure:

(a) The analyzer erroneously treats bitwise ops on UNKNOWN(_Bool) as
UNKNOWN, even for cases like (1 | UNKNOWN) where we know the result,
leading to bogus edges in the exploded graph.  The patch fixes these cases.

(b) The state-handling code creates "UNKNOWN" symbolic values, as a way
to give up when symbolic expressions get too complicated, and in various
other cases.  This makes sense when building the exploded graph, since
we want the analysis to terminate, but doesn't make sense when checking
the feasibility along a specific path, where we want precision.  The patch
prevents all use of "unknown" symbolic values when performing feasibility
checking of a path (before it merely stopped doing complexity-checking),
by creating a unique placeholder_svalue instead.

This fixes the -Wanalyzer-null-dereference false positive.

Unfortunately, with GCC 12 there's also a false positive from
-Wanalyzer-use-of-uninitialized-value on this code, which is a separate
issue that the patch does not fix.

gcc/analyzer/ChangeLog:
PR analyzer/102692
* diagnostic-manager.cc
(class auto_disable_complexity_checks): Rename to...
(class auto_checking_feasibility): ...this, updating
the calls accordingly.
(epath_finder::explore_feasible_paths): Update for renaming.
* region-model-manager.cc
(region_model_manager::region_model_manager): Update for change from
m_check_complexity to m_checking_feasibility.
(region_model_manager::reject_if_too_complex): Likewise.
(region_model_manager::get_or_create_unknown_svalue): Handle
m_checking_feasibility.
(region_model_manager::create_unique_svalue): New.
(region_model_manager::maybe_fold_binop): Handle BIT_AND_EXPR and
BIT_IOR_EXPRs on booleans where we know the result.
* region-model.cc (test_binop_svalue_folding): Add test coverage
for the above.
* region-model.h (region_model_manager::create_unique_svalue): New
decl.
(region_model_manager::enable_complexity_check): Replace with...
(region_model_manager::begin_checking_feasibility): ...this.
(region_model_manager::disable_complexity_check): Replace with...
(region_model_manager::end_checking_feasibility): ...this.
(region_model_manager::m_check_complexity): Replace with...
(region_model_manager::m_checking_feasibility): ...this.
(region_model_manager::m_managed_dynamic_svalues): New field.

gcc/testsuite/ChangeLog:
PR analyzer/102692
* gcc.dg/analyzer/pr102692.c: New test.

Signed-off-by: David Malcolm 
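
To make point (a) of the commit message above concrete, here is a small
sketch of folding bitwise ops on booleans where one operand is unknown.  It
is only a model of the idea, not the analyzer's code: enum tristate,
fold_bit_ior() and fold_bit_and() are hypothetical names.

```c
/* Three-valued model of a _Bool whose value may not be known. */
enum tristate { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN };

/* a | b: a known true operand decides the result on its own. */
static enum tristate fold_bit_ior(enum tristate a, enum tristate b)
{
    if (a == TRI_TRUE || b == TRI_TRUE)
        return TRI_TRUE;            /* (1 | UNKNOWN) is 1 */
    if (a == TRI_FALSE && b == TRI_FALSE)
        return TRI_FALSE;
    return TRI_UNKNOWN;
}

/* a & b: a known false operand decides the result on its own. */
static enum tristate fold_bit_and(enum tristate a, enum tristate b)
{
    if (a == TRI_FALSE || b == TRI_FALSE)
        return TRI_FALSE;           /* (0 & UNKNOWN) is 0 */
    if (a == TRI_TRUE && b == TRI_TRUE)
        return TRI_TRUE;
    return TRI_UNKNOWN;
}
```

With such a fold, the condition _4 = _2 | _3 with _2 known to be true no
longer yields an unknown result, so the "false" edge out of the jump is not
considered.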

[Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2022-01-11 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #43 from rsandifo at gcc dot gnu.org ---
(In reply to hubicka from comment #42)
> I also see a 6.69% regression on x64 with -Ofast -march=native -flto
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.377.0
I can reproduce this with -Ofast -flto -march=znver3 (but not running
on a Zen 3).  It looks like it's due to g:d3ff7420e94 instead though
(sorry Andre!).  With a 3-iteration run, I see a 6.2% regression after
that revision compared with before it.

It would be great if someone more familiar than me with x86
could confirm the bisection though.

> and perhaps 3-5% on sphinx
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=476.280.0
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=227.280.0
I'll look at these next.

> For non-spec benchmarks spec there is a regression on nbench
> https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=26.645.1
> There are also large changes in tsvc
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report
> it may be noise since kernels are tiny, but for example x293 reproduces
> both on kabylake and zen by about 80-90% regression that may be easy to
> track (the kernel is included in the testsuite). Same regression is not
> seen on zen3, so may be an ISA specific or so.
To summarise what we discussed on irc (for the record): it looks like
the s293 regression is in the noise, like you say.  I can't convince
GCC to generate different code before and after the IRA patches for that.
I haven't looked at the other tsvc tests yet.

[Bug bootstrap/103974] [12 Regression] ICE in ira_flattening building libstdc++ with r12-6415-g01f3e6a40e72

2022-01-11 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103974

--- Comment #8 from Jeffrey A. Law  ---
ACK.  I wandered through the tester this morning; the vast majority of the
current failures are the ira_flattening ICE, though I think there's likely one
other ICE in IRA (frv-elf, ICE in check_allocation).

I'll restart everything once you've got your patch ready and file fresh bugs
for anything that's still problematical after the ira_flattening ICE is
resolved.

[Bug target/103804] ICE: 'global_options' are modified in local context

2022-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103804

Martin Liška  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #2 from Martin Liška  ---
Fixed on master, can't reproduce.

--- Comment #3 from Martin Liška  ---
Fixed on master, can't reproduce.

[Bug tree-optimization/103821] [12 Regression] huge compile time (jump threading) at -O3 for simple code since r12-4790-g4b3a325f07acebf47e82de227ce1d5ba62f5bcae

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103821

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:71b72132011a47a4b39950d95718f18d1218978c

commit r12-6477-g71b72132011a47a4b39950d95718f18d1218978c
Author: Andrew MacLeod 
Date:   Mon Jan 10 13:33:44 2022 -0500

Prevent exponential range calculations.

Produce a summary result for any operation involving too many subranges.

PR tree-optimization/103821
* range-op.cc (range_operator::fold_range): Only do precise ranges
when there are not too many subranges.
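
The idea in this commit (fall back to a summary result when the number of
subrange pairs gets too large) can be sketched as follows.  This is not the
range-op.cc code: struct multirange, fold_plus() and both limits are
assumptions chosen for illustration, subranges are assumed to be sorted in
ascending order, and overflow is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBRANGES 16   /* capacity of a multi-range (assumed) */
#define PRECISE_LIMIT 12   /* pair budget before summarising (assumed) */

struct subrange {
    long lo;               /* inclusive lower bound */
    long hi;               /* inclusive upper bound */
};

struct multirange {
    size_t count;          /* 1..MAX_SUBRANGES, sorted ascending */
    struct subrange sub[MAX_SUBRANGES];
};

/* Range of x + y for x in 'a' and y in 'b'. */
static struct multirange fold_plus(const struct multirange *a,
                                   const struct multirange *b)
{
    struct multirange r = { 0, { { 0, 0 } } };

    assert(a->count >= 1 && a->count <= MAX_SUBRANGES);
    assert(b->count >= 1 && b->count <= MAX_SUBRANGES);

    if (a->count * b->count > PRECISE_LIMIT) {
        /* Too many pairs: return one conservative range covering all. */
        r.sub[0].lo = a->sub[0].lo + b->sub[0].lo;
        r.sub[0].hi = a->sub[a->count - 1].hi + b->sub[b->count - 1].hi;
        r.count = 1;
        return r;
    }

    /* Few enough pairs: one result per pair (left unmerged and unsorted
     * to keep the sketch short). */
    for (size_t i = 0; i < a->count; i++) {
        for (size_t j = 0; j < b->count; j++) {
            r.sub[r.count].lo = a->sub[i].lo + b->sub[j].lo;
            r.sub[r.count].hi = a->sub[i].hi + b->sub[j].hi;
            r.count++;
        }
    }
    return r;
}
```

The summary result is less precise but still correct, and it keeps the work
per operation bounded instead of growing with the product of the operand
sizes.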

[Bug tree-optimization/103821] [12 Regression] huge compile time (jump threading) at -O3 for simple code since r12-4790-g4b3a325f07acebf47e82de227ce1d5ba62f5bcae

2022-01-11 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103821

Andrew Macleod  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Andrew Macleod  ---
fixed.

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #9 from Jakub Jelinek  ---
(In reply to Ilya Maximets from comment #7)
> (In reply to Jakub Jelinek from comment #6)
> > What is the reason why OVS (and kernel) doesn't use 2 variables, one for the
> > iterator that is a pointer to the prev/next structure only and one assigned
> > e.g. in the condition from the iterator that is used only when it isn't the
> > start?
> > At least if targeting C99 and above (or C++) one can declare one of those
> > iterators in the for loop init expression...
> 
> The problem is that we need 2 variables and one of them needs to be
> accessible outside of the loop.  And I don't think it is possible to
> declare one variable and only initialize another one.

If you only need to use one variable outside of the loop and not the other one,
that should be doable.  If you use one in some spots and another in other spots
but never both, you could have different macros for those 2 versions.

E.g. the one where the list iterator is used inside of the loop only and the
var pointing to the containing objects could look like:
for (struct ovs_list *iter = (pos = NULL, &start); iter != (&start)
 && (((pos) = ((typeof(pos))(void*)((uintptr_t)((iter->next) -
   __builtin_offsetof(struct member,
  elem), 1);
 iter = iter->next, pos = NULL)
which would get you after the loop pos = NULL if break; wasn't done and
pos non-NULL otherwise.
But in your testcase you actually need to use the other var after the loop,
so it would need to be done the other way around, but then the user variable
would need to be struct ovs_list * and the macro would need to be told in a
different way what type the var declared in the loop should have.
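
Written out as a plain function rather than a macro, the two-variable
scheme could look roughly like this.  It is a sketch under assumptions, not
a drop-in replacement for the OVS macros: the struct layouts and the names
member_of() and find_by_order() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the types discussed in this bug. */
struct ovs_list {
    struct ovs_list *prev;
    struct ovs_list *next;
};

struct member {
    int order;
    struct ovs_list elem;
};

/* The enclosing member of a node known to be a real 'elem' field. */
static struct member *member_of(struct ovs_list *node)
{
    return (struct member *)(void *)((char *)node
                                     - offsetof(struct member, elem));
}

/* Returns the first member whose 'order' equals 'order', or NULL. */
static struct member *find_by_order(struct ovs_list *start, int order)
{
    struct member *pos = NULL;

    assert(start != NULL);
    /* 'iter' walks the list nodes; 'pos' is only derived from 'iter'
     * after the iter != start check, so it never refers to the head. */
    for (struct ovs_list *iter = start->next; iter != start;
         iter = iter->next, pos = NULL) {
        pos = member_of(iter);
        if (pos->order == order)
            break;
    }
    return pos;   /* NULL if the loop ran to completion */
}
```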

> One thing that is not clear to me is if the following code has an UB or not:
> 
> struct member* pos;
> struct ovs_list start;
> 
> pos = (struct member *)(void*)((uintptr_t)(&start) - 64);
> ovs_list_insert((void *)((uintptr_t)pos + 64), &member->elem);

It still creates a pointer out of something that doesn't point to such an
object or points to some unrelated one, doesn't necessarily have sufficient
alignment etc., so I think it is UB too, but perhaps the compiler at least now
will handle it the way you expect.  A lot of programs including e.g. POSIX rely
on at least (void *) -1 and similar pointer constants to be usable in equality
comparisons (e.g. MAP_FAILED macro).
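
A small example of the kind of equality comparison meant here, using the
POSIX mmap() interface.  The wrapper name map_readonly() is hypothetical;
the sentinel is only ever compared, never dereferenced or offset.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Map 'len' bytes of 'fd' read-only.  Returns NULL instead of MAP_FAILED
 * when the mapping cannot be created. */
static void *map_readonly(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

    /* MAP_FAILED is a pointer constant (commonly (void *) -1) that does
     * not point to any object; testing it for equality is the only use. */
    return p == MAP_FAILED ? NULL : p;
}
```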

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

--- Comment #15 from CVS Commits  ---
The master branch has been updated by Siddhesh Poyarekar:

https://gcc.gnu.org/g:026d44cbbd42653908f9faf6b80773f03e1bb1a0

commit r12-6478-g026d44cbbd42653908f9faf6b80773f03e1bb1a0
Author: Siddhesh Poyarekar 
Date:   Tue Jan 11 16:07:29 2022 +0530

tree-optimization/103961: Never compute offset for -1 size

Never try to compute size for offset when the object size is -1, which
is either unknown maximum or uninitialized minimum irrespective of the
osi->pass number.

gcc/ChangeLog:

PR tree-optimization/103961
* tree-object-size.c (plus_stmt_object_size): Always avoid
computing offset for -1 size.

gcc/testsuite/ChangeLog:

PR tree-optimization/103961
* gcc.dg/pr103961.c: New test case.

Co-authored-by: Jakub Jelinek 
Signed-off-by: Siddhesh Poyarekar 

[Bug middle-end/70090] add non-constant variant of __builtin_object_size for _FORTIFY_SOURCE and -fsanitize=object-size

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70090

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Siddhesh Poyarekar:

https://gcc.gnu.org/g:404c787e2bfe8cae666b075ed903990ea452220e

commit r12-6479-g404c787e2bfe8cae666b075ed903990ea452220e
Author: Siddhesh Poyarekar 
Date:   Tue Jan 11 05:27:51 2022 +0530

tree-object-size: Support dynamic sizes in conditions

Handle GIMPLE_PHI and conditionals specially for dynamic objects,
returning PHI/conditional expressions instead of just a MIN/MAX
estimate.

This makes the returned object size variable for loops and conditionals,
so tests need to be adjusted to look for precise size in some cases.
builtin-dynamic-object-size-5.c had to be modified to only look for
success in maximum object size case and skip over the minimum object
size tests because the result is no longer a compile time constant.

I also added some simple tests to exercise conditionals with dynamic
object sizes.

gcc/ChangeLog:

PR middle-end/70090
* builtins.c (fold_builtin_object_size): Adjust for dynamic size
expressions.
* tree-object-size.c: Include gimplify-me.h.
(struct object_size_info): New member UNKNOWNS.
(size_initval_p, size_usable_p, object_sizes_get_raw): New
functions.
(object_sizes_get): Return suitable gimple variable for
object size.
(bundle_sizes): New function.
(object_sizes_set): Use it and handle dynamic object size
expressions.
(object_sizes_set_temp): New function.
(size_for_offset): Adjust for dynamic size expressions.
(emit_phi_nodes, propagate_unknowns, gimplify_size_expressions):
New functions.
(compute_builtin_object_size): Call gimplify_size_expressions
for OST_DYNAMIC.
(dynamic_object_size): New function.
(cond_expr_object_size): Use it.
(phi_dynamic_object_size): New function.
(collect_object_sizes_for): Call it for OST_DYNAMIC.  Adjust to
accommodate dynamic object sizes.

gcc/testsuite/ChangeLog:

PR middle-end/70090
* gcc.dg/builtin-dynamic-object-size-0.c: New tests.
* gcc.dg/builtin-dynamic-object-size-10.c: Add comment.
* gcc.dg/builtin-dynamic-object-size-5-main.c: New file.
* gcc.dg/builtin-dynamic-object-size-5.c: Use it and change test
to dg-do run.
* gcc.dg/builtin-object-size-5.c [!N]: Define N.
(test1, test2, test3, test4) [__builtin_object_size]: Expect
exact result for __builtin_dynamic_object_size.
* gcc.dg/builtin-object-size-1.c [__builtin_object_size]: Expect
exact size expressions for __builtin_dynamic_object_size.
* gcc.dg/builtin-object-size-2.c [__builtin_object_size]:
Likewise.
* gcc.dg/builtin-object-size-3.c [__builtin_object_size]:
Likewise.
* gcc.dg/builtin-object-size-4.c [__builtin_object_size]:
Likewise.

Signed-off-by: Siddhesh Poyarekar 
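
A short sketch of the kind of conditional this change is about.  The
OBJECT_SIZE wrapper, the two buffers and selected_size() are hypothetical;
the fallback to __builtin_object_size is only there so the example also
builds with compilers that lack __builtin_dynamic_object_size.

```c
#include <stddef.h>

/* Prefer the dynamic variant when the compiler provides it. */
#if defined(__has_builtin)
# if __has_builtin(__builtin_dynamic_object_size)
#  define OBJECT_SIZE(p) __builtin_dynamic_object_size((p), 0)
# endif
#endif
#ifndef OBJECT_SIZE
# define OBJECT_SIZE(p) __builtin_object_size((p), 0)
#endif

static char small_buf[8];
static char large_buf[32];

/* Size the compiler can see for a pointer chosen at run time. */
static size_t selected_size(int use_large)
{
    char *p = use_large ? large_buf : small_buf;

    return OBJECT_SIZE(p);
}
```

With the dynamic variant, the result is expected to follow the branch that
was actually taken, whereas the static builtin can at best report the
larger of the two sizes.  Either builtin may still return (size_t) -1 when
the size cannot be determined, for example without optimization.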

[Bug middle-end/70090] add non-constant variant of __builtin_object_size for _FORTIFY_SOURCE and -fsanitize=object-size

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70090

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Siddhesh Poyarekar:

https://gcc.gnu.org/g:ea19c8f33a3a8d2b52f89f1fade0a21e3c779190

commit r12-6480-gea19c8f33a3a8d2b52f89f1fade0a21e3c779190
Author: Siddhesh Poyarekar 
Date:   Tue Jan 11 19:51:37 2022 +0530

tree-object-size: Handle function parameters

Handle hints provided by __attribute__ ((access (...))) to compute
dynamic sizes for objects.

gcc/ChangeLog:

PR middle-end/70090
* tree-object-size.c: Include tree-dfa.h.
(parm_object_size): New function.
(collect_object_sizes_for): Call it.

gcc/testsuite/ChangeLog:

PR middle-end/70090
* gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple,
test_parmsz_scaled, test_parmsz_unknown): New functions.
(main): Call them.  Add new arguments argc and argv.

Signed-off-by: Siddhesh Poyarekar 
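
For illustration, a parameter annotated with the access attribute might be
queried as in the sketch below.  The macro names and visible_bytes() are
hypothetical, and the guards only exist so that the example still builds
where the attribute or the dynamic builtin is unavailable.

```c
#include <stddef.h>

/* Prefer the dynamic variant when the compiler provides it. */
#if defined(__has_builtin)
# if __has_builtin(__builtin_dynamic_object_size)
#  define OBJECT_SIZE(p) __builtin_dynamic_object_size((p), 0)
# endif
#endif
#ifndef OBJECT_SIZE
# define OBJECT_SIZE(p) __builtin_object_size((p), 0)
#endif

/* Expands to nothing when the access attribute is not supported. */
#if defined(__has_attribute)
# if __has_attribute(access)
#  define ACCESS_READ_ONLY(ptr_idx, size_idx) \
    __attribute__((access(read_only, ptr_idx, size_idx)))
# endif
#endif
#ifndef ACCESS_READ_ONLY
# define ACCESS_READ_ONLY(ptr_idx, size_idx)
#endif

/* 'count' is the number of elements 'data' points to. */
ACCESS_READ_ONLY(1, 2)
static size_t visible_bytes(const int *data, size_t count)
{
    (void)count;   /* only referenced through the attribute */
    return OBJECT_SIZE(data);
}
```

Where the hint is honoured, the reported size is expected to be count
scaled by the element size; otherwise the builtin may return (size_t) -1
for "unknown".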

[Bug middle-end/70090] add non-constant variant of __builtin_object_size for _FORTIFY_SOURCE and -fsanitize=object-size

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70090

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Siddhesh Poyarekar:

https://gcc.gnu.org/g:1f07810659616221c3bf4177c1fc2ca3607f7728

commit r12-6481-g1f07810659616221c3bf4177c1fc2ca3607f7728
Author: Siddhesh Poyarekar 
Date:   Tue Jan 11 19:52:43 2022 +0530

tree-object-size: Handle GIMPLE_CALL

Handle non-constant expressions in GIMPLE_CALL arguments.  Also handle
alloca.

gcc/ChangeLog:

PR middle-end/70090
* tree-object-size.c (alloc_object_size): Make and return
non-constant size expression.
(call_object_size): Return expression or unknown based on
whether dynamic object size is requested.

gcc/testsuite/ChangeLog:

PR middle-end/70090
* gcc.dg/builtin-dynamic-object-size-0.c: Add new tests.
* gcc.dg/builtin-object-size-1.c (test1)
[__builtin_object_size]: Alter expected result for dynamic
object size.
* gcc.dg/builtin-object-size-2.c (test1)
[__builtin_object_size]: Likewise.
* gcc.dg/builtin-object-size-3.c (test1)
[__builtin_object_size]: Likewise.
* gcc.dg/builtin-object-size-4.c (test1)
[__builtin_object_size]: Likewise.

Signed-off-by: Siddhesh Poyarekar 

[Bug middle-end/70090] add non-constant variant of __builtin_object_size for _FORTIFY_SOURCE and -fsanitize=object-size

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70090

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Siddhesh Poyarekar:

https://gcc.gnu.org/g:06bc1b0c539e3a60692d7432d15e701c38610f80

commit r12-6482-g06bc1b0c539e3a60692d7432d15e701c38610f80
Author: Siddhesh Poyarekar 
Date:   Tue Jan 11 08:08:27 2022 +0530

tree-object-size: Dynamic sizes for ADDR_EXPR

Allow returning dynamic expressions from ADDR_EXPR for
__builtin_dynamic_object_size and also allow offsets to be dynamic.

gcc/ChangeLog:

PR middle-end/70090
* tree-object-size.c (size_valid_p): New function.
(size_for_offset): Remove OFFSET constness assertion.
(addr_object_size): Build dynamic expressions for object
sizes and use size_valid_p to decide if it is valid for the
given OBJECT_SIZE_TYPE.
(compute_builtin_object_size): Allow dynamic offsets when
computing size at O0.
(call_object_size): Call size_valid_p.
(plus_stmt_object_size): Allow non-constant offset and use
size_valid_p to decide if it is valid for the given
OBJECT_SIZE_TYPE.

gcc/testsuite/ChangeLog:

PR middle-end/70090
* gcc.dg/builtin-dynamic-object-size-0.c: Add new tests.
* gcc.dg/builtin-object-size-1.c (test1)
[__builtin_object_size]: Adjust expected output for dynamic
object sizes.
* gcc.dg/builtin-object-size-2.c (test1)
[__builtin_object_size]: Likewise.
* gcc.dg/builtin-object-size-3.c (test1)
[__builtin_object_size]: Likewise.
* gcc.dg/builtin-object-size-4.c (test1)
[__builtin_object_size]: Likewise.

Signed-off-by: Siddhesh Poyarekar 

[Bug other/103617] Debugging gcc: can't use 'pp' command for irange

2022-01-11 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103617

--- Comment #6 from Andrew Macleod  ---
Is this actually a bug?  I don't believe wide_int works either?  PP doesn't
work with class instances...


(gdb) p lh.lower_bound(0)
$3 = { = {val = {-2147483648, 18992502, 140737232043872,
140737233406440, 140737232152040, 18986914, 140737233670232, 140737488249944,
140737233631992}, len = 1, precision = 32}, 
  static is_sign_extended = true}
(gdb) pp lh.lower_bound(0)
Attempt to take address of value not located in memory.

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

--- Comment #16 from Siddhesh Poyarekar  ---
Should be fixed with that patch.  May I close this or wait for confirmation
from the reporter?

[Bug middle-end/77608] missing protection on trivially detectable runtime buffer overflow

2022-01-11 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77608

--- Comment #8 from Siddhesh Poyarekar  ---
The test case for pr 103961 exposed a flaw in my patch, where assuming
wholesize isn't always safe or at least would need more careful consideration. 
I need to think this through some more.

[Bug sanitizer/103978] New: AddressSanitizer CHECK failed ../../../../src/libsanitizer/asan/asan_thread.cpp:367 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)

2022-01-11 Thread contino at epigenesys dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103978

Bug ID: 103978
   Summary: AddressSanitizer CHECK failed
../../../../src/libsanitizer/asan/asan_thread.cpp:367
"((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0,
0x0)
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: contino at epigenesys dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 52164
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52164&action=edit
This is the source file

On Debian Bookworm x86-64 with gcc 11.2.0.
Compiled with: gcc -o test test.c -fsanitize=address -pthread

This bug is triggered by pthread_join() if nanosleep() is called from a
function.  With -O3 there is no error.
The full code is in the attachment.

I get the following error:

=
==98391==AddressSanitizer CHECK failed:
../../../../src/libsanitizer/asan/asan_thread.cpp:367 "((ptr[0] ==
kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)
#0 0x7feb0a48fe6b in AsanCheckFailed
../../../../src/libsanitizer/asan/asan_rtl.cpp:74
#1 0x7feb0a4ae84e in __sanitizer::CheckFailed(char const*, int, char
const*, unsigned long long, unsigned long long)
../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:78
#2 0x7feb0a494864 in __asan::AsanThread::GetStackFrameAccessByAddr(unsigned
long, __asan::AsanThread::StackFrameAccess*)
../../../../src/libsanitizer/asan/asan_thread.cpp:367
#3 0x7feb0a406bdb in __asan::GetStackAddressInformation(unsigned long,
unsigned long, __asan::StackAddressDescription*)
../../../../src/libsanitizer/asan/asan_descriptions.cpp:203
#4 0x7feb0a407e98 in
__asan::AddressDescription::AddressDescription(unsigned long, unsigned long,
bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:455
#5 0x7feb0a407e98 in
__asan::AddressDescription::AddressDescription(unsigned long, unsigned long,
bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:439
#6 0x7feb0a40a3b4 in __asan::ErrorGeneric::ErrorGeneric(unsigned int,
unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned
long) ../../../../src/libsanitizer/asan/asan_errors.cpp:389
#7 0x7feb0a48f4c6 in __asan::ReportGenericError(unsigned long, unsigned
long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool)
../../../../src/libsanitizer/asan/asan_report.cpp:476
#8 0x7feb0a42b35b in __interceptor_sigaltstack
../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:9986
#9 0x7feb0a4a35cd in __sanitizer::UnsetAlternateSignalStack()
../../../../src/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:195
#10 0x7feb0a493dbc in __asan::AsanThread::Destroy()
../../../../src/libsanitizer/asan/asan_thread.cpp:104
#11 0x7feb0a3bff10 in __nptl_deallocate_tsd.part.0
(/lib/x86_64-linux-gnu/libpthread.so.0+0x7f10)
#12 0x7feb0a3c0da0 in start_thread
(/lib/x86_64-linux-gnu/libpthread.so.0+0x8da0)
#13 0x7feb0a2ebb6e in clone (/lib/x86_64-linux-gnu/libc.so.6+0xfcb6e)

[Bug sanitizer/103978] AddressSanitizer CHECK failed ../../../../src/libsanitizer/asan/asan_thread.cpp:367 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)

2022-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103978

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2022-01-11
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Martin Liška  ---
Confirmed. Also GCC 7 crashes:

gcc-7 pr103978.C -fsanitize=address -g && ./a.out
==2184==ERROR: AddressSanitizer failed to allocate 0x0 (0) bytes of
SetAlternateSignalStack (error code: 22)
==2184==Process memory map follows:

Note clang is fine.

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread manuel.lauss at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

--- Comment #17 from Manuel Lauss  ---
(In reply to Siddhesh Poyarekar from comment #16)
> Should be fixed with that patch.  May I close this or wait for confirmation
> from the reporter?

I can no longer reproduce the original issue.

[Bug tree-optimization/103961] [12 Regression] gcc-12 apparently miscompiles libcap's cap_to_text() function since r12-6030-g422f9eb7011b76c1

2022-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103961

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 CC||jakub at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #18 from Jakub Jelinek  ---
Fixed then.

[Bug libstdc++/103891] clang-13 fails to compile libstdc++'s std::variant>: error: attempt to use a deleted function

2022-01-11 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103891

--- Comment #8 from Sergei Trofimovich  ---
That allowed me to build mold-1.0.0 with clang-13 + libstdc++. Thank you!

[Bug c/103979] New: asm goto is not considered volatile

2022-01-11 Thread gareth.webb+gccbugzilla at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979

Bug ID: 103979
   Summary: asm goto is not considered volatile
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gareth.webb+gccbugzilla at outlook dot com
  Target Milestone: ---

Created attachment 52165
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52165&action=edit
Minimal reproduction of bug

I have discovered a bug with GCC's 'asm goto'.  The documentation says "note
that an asm goto statement is always implicitly considered volatile", but in
practice it is not.  GCC applies the same heuristic as for non-goto asm (no
outputs => volatile; outputs => not volatile, unless explicitly specified).

This can result in the optimizer removing the asm block completely.

Confirmed the bug is present on Ubuntu 20.10's gcc package, as well as my own
build from source (tag: releases/gcc-11.2.0).

A minimal pre-processed reproduction is attached.  Compile with gcc -O2 -c
asmgoto.c -o asmgoto.o and observe that the disassembled output contains
nothing but endbr64/ret, with no jmp.

I also received an ICE at one point when trying to create a minimal
reproduction, but could not reproduce it.  It is included below, but since I
can't reproduce it I don't know how useful it will be.

Ubuntu package:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.2.0-7ubuntu2'
--with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (Ubuntu 11.2.0-7ubuntu2) 

My own build:

Using built-in specs.
COLLECT_GCC=x86_64-elf-gcc
COLLECT_LTO_WRAPPER=/home/gwebb/opt/cross/libexec/gcc/x86_64-elf/11.2.0/lto-wrapper
Target: x86_64-elf
Configured with: ../gcc/configure --target=x86_64-elf
--prefix=/home/gwebb/opt/cross --disable-nls --enable-languages=c
--without-headers
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 11.2.0 (GCC)

Possibly related ICE?

during RTL pass: fwprop1
gdt.c: In function 'flush_gdt':
gdt.c:66:1: internal compiler error: in purge_dead_edges, at cfgrtl.c:3346
  66 | }
 | ^
0x5d253d purge_dead_edges(basic_block_def*)
   ../../gcc/gcc/cfgrtl.c:3346
0x143dc1f delete_trivially_dead_insns(rtx_insn*, int)
   ../../gcc/gcc/cse.c:7178
0x145baa9 fwprop_done
   ../../gcc/gcc/fwprop.c:917
0x145baa9 fwprop
   ../../gcc/gcc/fwprop.c:1001
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug fortran/97390] [OpenACC] 'async' clause on 'data' construct

2022-01-11 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97390

Thomas Schwinge  changed:

           What    |Removed                     |Added

             Status|SUSPENDED                   |NEW
           Assignee|tschwinge at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #9 from Thomas Schwinge  ---
OpenACC 3.2, 1.16 "Changes from Version 3.1 to 3.2": "The 'async', 'wait' [...]
clauses may be specified on 'data' constructs."  (Not yet implemented in GCC.)

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread i.maximets at ovn dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #10 from Ilya Maximets  ---
(In reply to Jakub Jelinek from comment #9)
> (In reply to Ilya Maximets from comment #7)
> > (In reply to Jakub Jelinek from comment #6)
> > > What is the reason why OVS (and kernel) doesn't use 2 variables, one
> > > for the iterator that is a pointer to the prev/next structure only and
> > > one assigned e.g. in the condition from the iterator that is used only
> > > when it isn't the start?
> > > At least if targeting C99 and above (or C++) one can declare one of
> > > those iterators in the for loop init expression...
> > 
> > The problem is that we need 2 variables and one of them needs to be
> > accessible outside of the loop.  And I don't think it is possible to
> > declare one variable and only initialize another one.
> 
> If you only need to use one variable outside of the loop and not the other
> one, that should be doable.  If you use one in some spots and another in
> other spots but never both, you could have different macros for those 2
> versions.
> 
> E.g. the one where the list iterator is used inside of the loop only and the
> var pointing to the containing objects could look like:
> for (struct ovs_list *iter = (pos = NULL, &start); iter != (&start)
>  && (((pos) = ((typeof(pos))(void*)((uintptr_t)((iter->next) -
>__builtin_offsetof(struct member,
>   elem), 1);
>  iter = iter->next, pos = NULL)
> which would get you after the loop pos = NULL if break; wasn't done and
> pos non-NULL otherwise.

Hmm.  I missed the possibility of a comma trick in the initialization part.
I think, we can try to re-write our macros this way.  Thanks!

> But in your testcase you actually need to use the other var after the loop,
> so it would need to be done the other way around, but then the user variable
> would need to be struct ovs_list * and the macro would need to be told in a
> different way what type the var declared in the loop should have.

Doesn't sound very intuitive.  I guess, it's easier to just add a NULL pointer
check after the loop, i.e. ovs_list_insert(pos ? &pos->elem : &start, ...).
Assuming our unit tests will catch all NULL-pointer dereferences (they will
likely not, but still), that could be a semi-automated way to find all the
problematic code parts.
Not very friendly for dependent projects though.
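
The two ideas in this exchange, the comma trick in the loop initializer and
the NULL check after the loop, can be sketched together as follows. This is a
variant written for illustration, not the OVS macros: the helper names
(MEMBER_FROM_ELEM, list_init, list_insert_before, find_first_greater,
insert_sorted) are hypothetical, and the iterator starts at start->next so
that 'pos' is only ever computed from a real member.

```c
#include <stddef.h>
#include <stdint.h>

struct ovs_list { struct ovs_list *prev, *next; };
struct member { int order; struct ovs_list elem; };

/* container_of-style helper (hypothetical name); only applied to list
 * nodes that are embedded in a struct member, never to the list head. */
#define MEMBER_FROM_ELEM(p) \
    ((struct member *)(void *)((uintptr_t)(p) - offsetof(struct member, elem)))

static void list_init(struct ovs_list *l) { l->prev = l->next = l; }

static void list_insert_before(struct ovs_list *before, struct ovs_list *elem)
{
    elem->prev = before->prev;
    elem->next = before;
    before->prev->next = elem;
    before->prev = elem;
}

/* Returns the first member whose order exceeds 'order', or NULL if the
 * loop ran to completion without a break. */
static struct member *find_first_greater(struct ovs_list *start, int order)
{
    struct member *pos;

    /* 'iter' walks the list nodes; 'pos' is derived from it only after
     * the iter != start test has passed. */
    for (struct ovs_list *iter = (pos = NULL, start->next);
         iter != start && ((pos = MEMBER_FROM_ELEM(iter)), 1);
         iter = iter->next, pos = NULL) {
        if (pos->order > order) {
            break;
        }
    }
    return pos;
}

/* NULL check after the loop: fall back to the list head itself. */
static void insert_sorted(struct ovs_list *start, struct member *m)
{
    struct member *pos = find_first_greater(start, m->order);

    list_insert_before(pos ? &pos->elem : start, &m->elem);
}
```

With this shape, &pos->elem is never formed for a 'pos' that was computed from
the list head, which is the construct the earlier comments identify as the
problem.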

> 
> > One thing that is not clear to me is if the following code has an UB or not:
> > 
> > struct member* pos;
> > struct ovs_list start;
> > 
> > pos = (struct member *)(void*)((uintptr_t)(&start) - 64);
> > ovs_list_insert((void *)((uintptr_t)pos + 64), &member->elem);
> 
> It still creates a pointer out of something that doesn't point to such an
> object or points to some unrelated one, doesn't necessarily have sufficient
> alignment etc., so I think it is UB too, but perhaps the compiler at least
> now will handle it the way you expect.  A lot of programs including e.g.
> POSIX rely on at least
> (void *) -1 and similar pointer constants to be usable in equality
> comparisons (e.g. MAP_FAILED macro).

We also have this kind of stuff:

  static const struct ovs_list OVS_LIST_POISON =
  { (struct ovs_list *) (UINTPTR_MAX / 0xf * 0xc),
(struct ovs_list *) (UINTPTR_MAX / 0xf * 0xc) };
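
For context, a minimal sketch of how such a poison value is typically used:
the pointers are stored and compared for equality but never dereferenced,
much like MAP_FAILED. The helper names mark_removed and is_marked_removed are
hypothetical and chosen for illustration; whether the standard blesses the
integer-to-pointer conversion is the open question of this thread, so this
only shows the usage pattern.

```c
#include <stdbool.h>
#include <stdint.h>

struct ovs_list { struct ovs_list *prev, *next; };

/* Same constant as quoted above: 0xcccc... scaled to the pointer width. */
static const struct ovs_list OVS_LIST_POISON =
    { (struct ovs_list *) (UINTPTR_MAX / 0xf * 0xc),
      (struct ovs_list *) (UINTPTR_MAX / 0xf * 0xc) };

/* Hypothetical helper: overwrite both links with the poison value. */
static void mark_removed(struct ovs_list *node)
{
    *node = OVS_LIST_POISON;
}

/* Hypothetical helper: equality comparison only, no dereference. */
static bool is_marked_removed(const struct ovs_list *node)
{
    return node->next == OVS_LIST_POISON.next
        && node->prev == OVS_LIST_POISON.prev;
}
```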

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #11 from Jakub Jelinek  ---
(In reply to Ilya Maximets from comment #10)
> Doesn't sound very intuitive.  I guess, it's easier to just add a NULL
> pointer check after the loop, i.e.
> ovs_list_insert(pos ? &pos->elem : &start, ...).

That would be certainly cleaner.

[Bug tree-optimization/103964] [9/10/11/12 Regression] OVS miscompilation since r0-92313-g5006671f1aaa63cd

2022-01-11 Thread i.maximets at ovn dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103964

--- Comment #12 from Ilya Maximets  ---
> (In reply to Ilya Maximets from comment #7)
> > One thing that is not clear to me is if the following code has an UB or not:
> > 
> > struct member* pos;
> > struct ovs_list start;
> > 
> > pos = (struct member *)(void*)((uintptr_t)(&start) - 64);
> > ovs_list_insert((void *)((uintptr_t)pos + 64), &member->elem);
> > 
> > ?
> >
> > This code works fine.  Basically, the question is: can we cast and store
> > the random (aligned) integer to a pointer type, if we're not going to
> > perform any kind of pointer arithmetic (using the integer arithmetic for
> > the ovs_list_insert) nor dereference it, unless it points to the actual
> > valid object?
> 

(In reply to Richard Biener from comment #8)
> That should be OK, yes.

(In reply to Jakub Jelinek from comment #9)
> It still creates a pointer out of something that doesn't point to such an
> object or points to some unrelated one, doesn't necessarily have sufficient
> alignment etc., so I think it is UB too, but perhaps the compiler at least
> now will handle it the way you expect.

I think we'll try to rewrite the loops to have 2 distinct variables, but in
case we also need the uintptr_t solution as a backup plan, I have one more
question.  How bad is this:

struct member* pos;
struct ovs_list start;
pos = (struct member *)(void*)((uintptr_t)(&start) - 64);

struct ovs_list *list;
list = (typeof(&pos->elem))(void *)((uintptr_t)pos + 64);

?

Basically, how much undefined is the 'typeof(&pos->elem)' construction?
(I just tried to prototype the part of the solution and I found that I need
a correct cast inside the macro, so the code will not look way too ugly.)

In general, I think, typeof() should not care about the actual value
of a pointer, but as we concluded the '&pos->elem' evaluation itself can
be harmful, so I don't know.
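
A small sketch that may help with the run-time half of this question. The GCC
documentation says the operand of typeof is evaluated only when it has a
variably modified type, so an expression such as &pos->elem inside typeof
should contribute a type and nothing else. The counter and the helper names
counted and elem_via_integers are hypothetical, added only to make the
non-evaluation observable; this does not settle what the standard says about
the construct.

```c
#include <stddef.h>
#include <stdint.h>

struct ovs_list { struct ovs_list *prev, *next; };
struct member { int order; struct ovs_list elem; };

static int evaluations;   /* counts run-time calls to counted() */

/* Hypothetical helper with a visible side effect. */
static struct member *counted(struct member *p)
{
    evaluations++;
    return p;
}

/* Computes the address of pos->elem with integer arithmetic only.  The
 * expressions inside __typeof__ supply the type 'struct ovs_list *' but
 * are not evaluated, so counted() is never called here. */
static struct ovs_list *elem_via_integers(struct member *pos)
{
    __typeof__(&counted(pos)->elem) list =
        (__typeof__(&counted(pos)->elem))(void *)
            ((uintptr_t)pos + offsetof(struct member, elem));
    return list;
}
```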

[Bug libstdc++/103726] --disable-hosted-libstdcxx (freestanding C++) does not provide as what standard requires

2022-01-11 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103726

--- Comment #14 from cqwrteur  ---
(In reply to Jonathan Wakely from comment #13)
> Fixed on trunk.

What about source_location?

--- Comment #15 from cqwrteur  ---
(In reply to Jonathan Wakely from comment #13)
> Fixed on trunk.

What about source_location?

[Bug libstdc++/103726] --disable-hosted-libstdcxx (freestanding C++) does not provide as what standard requires

2022-01-11 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103726

--- Comment #16 from Jonathan Wakely  ---
This is the first time anybody has pointed out it's missing, that's why it's
still missing.

[Bug libstdc++/103726] --disable-hosted-libstdcxx (freestanding C++) does not provide as what standard requires

2022-01-11 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103726

--- Comment #17 from cqwrteur  ---
(In reply to Jonathan Wakely from comment #16)
> This is the first time anybody has pointed out it's missing, that's why it's
> still missing.

https://en.cppreference.com/w/cpp/freestanding
Btw
ciso646, cstdalign and cstdbool were removed in C++20.

These headers serve no purpose, because the features are built into the C++
language.
However, iso646.h, stdalign.h and stdbool.h are still there for C compatibility.

I suggest we remove them from freestanding.
libstdc++'s freestanding mode was never truly usable, so I think removing them
would cause no issues and would conform more closely to the standard.
