[Bug target/120941] [16 Regression] 10-40% slowdown of 519.lbm_r on Zen{2,3} since r16-1644-gaba3b9d3a48a07

2025-07-04 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

--- Comment #5 from Haochen Jiang  ---
I have tried this on a Zen3 client machine with -Ofast -flto
-march=native/znver2 and see no regression with either option before and after
the commit.

[Bug tree-optimization/120358] [15/16 regression] qtbase-6.9.0 miscompiled since r15-580-gf3e5f4c58591f5

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120358

--- Comment #26 from Sam James  ---
I'm going to try to clean it up with my poor C++, as I can't follow it at all
right now.

[Bug target/120943] [16 Regression] 5% slowdown of 527.cam4_r on Zen{4,5} since r16-1643-gd073bb6cfc219d

2025-07-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120943

H.J. Lu  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from H.J. Lu  ---


*** This bug has been marked as a duplicate of bug 120683 ***

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2025-07-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 120943, which changed state.

Bug 120943 Summary: [16 Regression] 5% slowdown of 527.cam4_r on Zen{4,5} since r16-1643-gd073bb6cfc219d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120943

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |DUPLICATE

[Bug target/120683] vector_loop/unrolled_loop generates poor codes on memset/memcpy

2025-07-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120683

H.J. Lu  changed:

   What|Removed |Added

 CC||pheeck at gcc dot gnu.org

--- Comment #4 from H.J. Lu  ---
*** Bug 120943 has been marked as a duplicate of this bug. ***

[Bug ipa/104457] ipa-cp with autofdo: internal compiler error in update_specialized_profile, at ipa-cp.c:4422

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457

Jan Hubicka  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED
 CC||hubicka at gcc dot gnu.org

--- Comment #3 from Jan Hubicka  ---
I believe update_specialized_profile should now be safe WRT ICE on
contradicting profiles. I can build SPEC on x86 reliably (and we now run daily
testing at LNT https://lnt.opensuse.org/db_default/v4/SPEC/recent_activity as
autofdo). I have no AARCH64 setup but hope at IPA level they are similar
enough.

Last fix was for PR119924

[Bug tree-optimization/120867] [metabug] AutoFDO issues

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120867
Bug 120867 depends on bug 104457, which changed state.

Bug 104457 Summary: ipa-cp with autofdo: internal compiler error in update_specialized_profile, at ipa-cp.c:4422
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104457

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

[Bug sanitizer/120950] New: Incorrect codegen for ASan when -mpreferred-stack-boundary < 3

2025-07-04 Thread yshuiv7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120950

Bug ID: 120950
   Summary: Incorrect codegen for ASan when
-mpreferred-stack-boundary < 3
   Product: gcc
   Version: 15.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yshuiv7 at gmail dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org
  Target Milestone: ---

The generated ASan instrumentation always assumes the stack pointer is aligned
to at least 8 bytes, and the shadow address calculation is wrong if the stack
is only aligned to 4 bytes.

This is normally masked by the use of fake stacks, which, because of how they
are allocated, have at least 64-byte alignment. But this can be disabled with
ASAN_OPTIONS=detect_stack_use_after_return=0
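
To make the alignment requirement concrete, here is a minimal sketch of the
usual ASan address-to-shadow mapping. It is not taken from the report: the
8-byte granule (shift of 3) and the 0x20000000 shadow offset are assumptions
based on the common 32-bit x86 Linux configuration, and the function names are
made up for illustration.

```cpp
#include <cstdint>

// Assumed for illustration: 8-byte shadow granules and the shadow offset
// commonly used by ASan on 32-bit x86 Linux.
constexpr std::uint32_t kShadowShift = 3;
constexpr std::uint32_t kShadowOffset = 0x20000000u;

// Address of the shadow byte that describes application address `addr`.
constexpr std::uint32_t shadow_address(std::uint32_t addr) {
  return (addr >> kShadowShift) + kShadowOffset;
}

// Offset of `addr` inside its 8-byte granule; 0 means granule-aligned.
constexpr std::uint32_t granule_offset(std::uint32_t addr) {
  return addr & ((1u << kShadowShift) - 1u);
}
```

With these assumptions, a frame base that is only 4-byte aligned (say
0xffffd004) has a non-zero granule_offset and shares its shadow byte with the
preceding 4 bytes, so redzones and frame magic placed relative to that base no
longer line up with shadow granules.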

Here is a repro:

#include <stdint.h>
struct T {
  uint16_t a;
  uint16_t b;
  uint32_t c;
};

int __attribute__((noinline)) g(struct T *__attribute__((unused)) value) {
  return 0;
}

int __attribute__((noinline)) f(int a) {
  struct T value = {a, a * 2, sizeof(a)};
  return g(&value);
}
int main() {
  volatile char pad[4]; // for me this pushes sp off 8 bytes alignment,
                        // but this can be tricky. maybe i can switch
                        // this to a little bit of inline assembly if
                        // this has trouble repro-ing
  f(10);
}

-- Build with:

i686-unknown-linux-gnu-gcc -mpreferred-stack-boundary=2 -fsanitize=address a.c 

-- Run with:

ASAN_OPTIONS=detect_stack_use_after_return=0 ./a.out

-- Expected behavior:

Program runs fine.

-- Actual behavior:

Initialization of `struct T value` trips ASan, and it calls
__asan_report_store4, which should not be called at all. Additionally, the
stack frame magic is stored at address misaligned with the shadow, so the asan
runtime can't find it and aborts.

-- Additional information:

Tried similar setup on clang. Looks like clang will always realign the stack
pointer to 32 bytes alignment in prologue.

[Bug c++/120878] [12/13/14/15/16 Regression] ICE: in adjust_temp_type, at cp/constexpr.cc:1791

2025-07-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120878

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P4
   Target Milestone|--- |12.5

[Bug libstdc++/120949] New: [16 regression] rejected with clang-20.1.7

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
Supported LTO compression algorithms: zlib zstd
gcc version 16.0.0 20250704 (experimental)
d6ed12ed5ebac8e50da9defea6af832039782cbf (Gentoo Hardened 16.0. p, commit
9fdf5a30ded9c691d9fcdb787e72f8dd0f111f8a)
```

[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949

Sam James  changed:

   What|Removed |Added

 CC||redi at gcc dot gnu.org
   Target Milestone|--- |16.0

--- Comment #1 from Sam James  ---
r16-1911-g6596f5ab746533

[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949

--- Comment #2 from Sam James  ---
Created attachment 61798
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61798&action=edit
readable.ii.xz

[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949

--- Comment #3 from Sam James  ---
Not sure if missing something but I see a bunch of the changes in that commit
do it right (?) (as mentioned in the commit message), just some don't follow
that pattern.

[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7

2025-07-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
The old code was using
_GLIBCXX_NODISCARD __attribute__((__always_inline__))
ordering while the new one
__attribute__((__always_inline__)) _GLIBCXX_NODISCARD
Perhaps only the former works with both compilers?

Though
__attribute__((always_inline)) [[nodiscard]] constexpr int foo () { return 42;
}
[[nodiscard]] __attribute__((always_inline)) constexpr int bar () { return 43;
}
struct S {
  template 
  __attribute__((always_inline)) [[nodiscard]] friend inline int baz () {
return 42; }
  template 
  [[nodiscard]] __attribute__((always_inline)) friend inline int qux () {
return 43; }
};
is compiled by both gcc 15.1 and clang trunk.

[Bug testsuite/120859] FAIL: gcc.dg/tree-prof/afdo-crossmodule-1b.c compilation

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859

Jan Hubicka  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #4 from Jan Hubicka  ---
Sorry, I meant to look into this - I originally used
RUNTESTFLAGS=tree-prof.exp=afdo-crossmodule-1.c since that is intended to be
the testcase and missed the part that the other file will be handled as
independent test.

I wanted to find a way to build it just once and still get the dump files
matched, but do not know how to do that. Your fix is fine. Thanks!
Honza

[Bug target/120900] C++ passes user aligned struct differently from C

2025-07-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120900

--- Comment #11 from H.J. Lu  ---
(In reply to H.J. Lu from comment #10)
> This makes C similar to C++:
> 
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index 8bbd6ebc66a..0da6c65fc6a 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -5706,8 +5706,17 @@ start_decl (struct c_declarator *declarator, struct
> c_declspecs *declspecs,
>&& !flag_no_common)
>  DECL_COMMON (decl) = 1;
>  
> +  /* Similar to C++, apply any attributes directly to the record or
> + union type.  */
> +  int flags;
> +  if (TREE_CODE (decl) == TYPE_DECL
> +  && RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl)))
> +flags = ATTR_FLAG_TYPE_IN_PLACE;
> +  else
> +flags = 0;
> +
>/* Set attributes here so if duplicate decl, will have proper attributes.
> */
> -  c_decl_attributes (&decl, attributes, 0);
> +  c_decl_attributes (&decl, attributes, flags);
>  
>/* Handle gnu_inline attribute.  */
>if (declspecs->inline_p

I got the following regressions with the change above:

FAIL: c-c++-common/builtin-has-attribute-7.c  -Wc++-compat  (test for excess
errors)
FAIL: gcc.dg/sso-4.c  (test for errors, line 18)
FAIL: gcc.dg/sso-4.c  (test for errors, line 19)

[Bug tree-optimization/120358] [15/16 regression] qtbase-6.9.0 miscompiled since r15-580-gf3e5f4c58591f5

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120358

--- Comment #25 from Sam James  ---
(In reply to Sam James from comment #24)
> @@ -1882,7 +1881,6 @@ IPA function summary for void ar::bc::operator++()
> [with aq = QStringView]/7
>calls:

Adding __attribute__((noipa)) on that template works.

[Bug tree-optimization/120948] Cannot detect potential division-by-zero when numerator is 1 and denominator is variable

2025-07-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120948

Richard Biener  changed:

   What|Removed |Added

  Component|c   |tree-optimization
   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
So you say that we fail to optimize

int foo (unsigned x)
{
  unsigned tem = 1/x;
  if (x == 0)
return 5;
  return tem;
}

because we turn 1/x into x == 1?

A phase ordering issue, obviously.  Relevant in practice?  I'm not sure.
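
For readers unfamiliar with the fold mentioned above, a small sketch of why
`1/x` can become `x == 1` for unsigned operands. The function names are
hypothetical. As I read the comment, once the division has been rewritten this
way the fact that `x` cannot be zero on that path is no longer visible to later
passes.

```cpp
// For unsigned x != 0, 1 / x is 1 when x == 1 and 0 for every larger x.
// Calling this with x == 0 is undefined behavior.
unsigned one_over(unsigned x) { return 1u / x; }

// The comparison form the division is folded into.
unsigned is_one(unsigned x) { return x == 1u ? 1u : 0u; }
```

The two functions agree for every non-zero argument; they differ only in that
the second is also defined for zero.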

[Bug tree-optimization/120358] [15/16 regression] qtbase-6.9.0 miscompiled since r15-580-gf3e5f4c58591f5

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120358

--- Comment #24 from Sam James  ---
(In reply to Sam James from comment #11)
> Created attachment 61749 [details]
> small.cxx

OK, on this, with a small adjustment to change the two ""s as args to char*
str1, str2:

--- a/small.cxx.057t.local-fnsummary2
+++ b/small.cxx.057t.local-fnsummary2
@@ -1795,7 +1795,6 @@ IPA function summary for ar::as ar::aw(ao) [with
aq = QStringView]/104 i
   calls:
 long long int QtPrivate::ck(QStringView, long long int)/27 function not
considered for inlining
   freq:0.41 loop depth: 0 size: 4 time: 13 callee size:12 stack: 0
predicate: (op1[offset: 32] >= 0)
-   op1 points to local or readonly memory

 struct as ar::aw (struct ar * const this, struct ao o)
 {
@@ -1882,7 +1881,6 @@ IPA function summary for void ar::bc::operator++()
[with aq = QStringView]/7
   calls:
 long long int QtPrivate::ck(QStringView, long long int)/27 function not
considered for inlining
   freq:0.41 loop depth: 0 size: 4 time: 13 callee size:12 stack: 0
predicate: (op0[ref offset: 256] >= 0)
-   op1 points to local or readonly memory

 void ar::bc::operator++ (struct bc * const this)
 {
diff --git a/small.cxx.088i.fnsummary b/small.cxx.088i.fnsummary
index 3b673e5..09d0885 100644
--- a/small.cxx.088i.fnsummary
+++ b/small.cxx.088i.fnsummary
@@ -233,7 +233,6 @@ IPA function summary for void ar::bc::operator++()
[with aq = QStringView]/7
   calls:
 long long int QtPrivate::ck(QStringView, long long int)/27 function not
considered for inlining
   freq:0.41 loop depth: 0 size: 4 time: 13 predicate: (op0[ref offset:
256] >= 0)
-   op1 points to local or readonly memory


 Analyzing function: decltype (QtPrivate::dc{0, l, df::t ...}) df(aq,
dd, de ...) [with aq = QString; dd = int; de = {am, }]/72

diff --git a/small.cxx.089i.inline b/small.cxx.089i.inline
index c31c4f8..abe0280 100644
--- a/small.cxx.089i.inline
+++ b/small.cxx.089i.inline
@@ -182,7 +182,6 @@ IPA function summary for void ar::bc::operator++()
[with aq = QStringView]/7
   calls:
 long long int QtPrivate::ck(QStringView, long long int)/27 function not
considered for inlining
   freq:0.41 loop depth: 0 size: 4 time: 13 callee size:12 stack: 0
predicate: (op0[ref offset: 256] >= 0)
-   op1 points to local or readonly memory

 IPA summary for bn<  >::~bn() [with ag = QString]/68
is missing.
 IPA function summary for bn<  >::~bn() [with ag =
QString]/67 inlinable
@@ -820,7 +819,7 @@ Updated mod-ref summary for int main()/52
 Considering long long int QtPrivate::ck(QStringView, long long int)/27 with 24
size
  to be inlined into void ar::bc::operator++() [with aq = QStringView]/79
in small.cxx:177
  Estimated badness is -0.15, frequency 3.36.
-  Parm map:  -1 -5
+  Parm map:  -1 -1
 Updated mod-ref summary for int main()/52
   loads:
   Base 0: alias set 8

[Bug target/120941] [16 Regression] 10-40% slowdown of 519.lbm_r on Zen{2,3} since r16-1644-gaba3b9d3a48a07

2025-07-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

H.J. Lu  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #4 from H.J. Lu  ---
(In reply to Richard Biener from comment #3)
> placing sth at the nearest common dominator can increase register pressure
> and cause extra spilling?

Haochen couldn't reproduce this regression on Zen3 client.  Can someone
help him reproduce it?

[Bug tree-optimization/120952] New: GCC fails to optimize division to shift when divisor is non-constant power of two

2025-07-04 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120952

Bug ID: 120952
   Summary: GCC fails to optimize division to shift when divisor
is non-constant power of two
   Product: gcc
   Version: 16.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

Given std::bit_ceil (which returns the smallest power of two not less than its
argument), I hoped the following would be optimized to use a shift:

#include <bit>
using std::size_t;

size_t get_block_size(size_t bytes, size_t alignment)
{
    alignment = std::__bit_ceil(alignment);
    if (!std::__has_single_bit(alignment)) // definitely a power of two now!
        __builtin_unreachable();
    return ((bytes + alignment - 1) / alignment) * alignment;
}

on x86_64 GCC optimizes this to:

"get_block_size(unsigned long, unsigned long)":
mov rax, rdi
mov edi, 1
cmp rsi, 1
jbe .L2
sub rsi, 1
bsr rsi, rsi
lea ecx, [rsi+1]
sal rdi, cl
.L2:
lea rcx, [rdi-1+rax]
xor edx, edx
mov rax, rcx
div rdi
mov rax, rcx
sub rax, rdx
ret

Whereas clang uses a shift:

get_block_size(unsigned long, unsigned long):
mov eax, 1
cmp rsi, 2
jb  .LBB0_2
dec rsi
mov ecx, 127
bsr rcx, rsi
xor ecx, 63
neg cl
shl rax, cl
.LBB0_2:
lea rcx, [rdi + rax]
dec rcx
neg rax
and rax, rcx
ret


If I remove the bit_ceil and just tell the compiler it's already a power of two
we avoid the branch but GCC still doesn't use a shift:

using size_t = decltype(sizeof(0));

size_t get_block_size(size_t bytes, size_t alignment)
{
    if (alignment & (alignment - 1))
        __builtin_unreachable();
    return ((bytes + alignment - 1) / alignment) * alignment;
}


GCC still uses DIV:

"get_block_size(unsigned long, unsigned long)":
lea rax, [rsi-1+rdi]
xor edx, edx
div rsi
lea rax, [rsi-1+rdi]
sub rax, rdx
ret

And Clang still shifts:

get_block_size(unsigned long, unsigned long):
lea rax, [rdi + rsi]
dec rax
rep   bsf rcx, rsi
shr rax, cl
imul rax, rsi
ret

On IRC Andrew P said that we don't track pop count on ssa names.

[Bug libstdc++/118681] [C++17] unsynchronized_pool_resource may fail to respect alignment

2025-07-04 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118681

Jonathan Wakely  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=120952

--- Comment #9 from Jonathan Wakely  ---
Thanks. I was already thinking we should do this first:

alignment = std::bit_ceil(alignment);

to be safe against callers that violate the precondition to pass a valid
alignment, and then we know it's a power of two. And with your version, we're
not affected by GCC's missed-optimization (filed as Bug 120952).

[Bug tree-optimization/120952] GCC fails to optimize division to shift when divisor is non-constant power of two

2025-07-04 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120952

--- Comment #1 from Jonathan Wakely  ---
As noted in Bug 118681 comment 8, when we know it's a power of two we can do:

return (bytes + alignment - 1) & ~(alignment - 1);

to get even better codegen.
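
A minimal sketch comparing the two round-up forms discussed in this bug. The
function names are made up for illustration, and the mask form is only valid
under the stated assumption that the alignment is a power of two.

```cpp
#include <cstddef>

// Round up using a division, as in the original report.
std::size_t round_up_div(std::size_t bytes, std::size_t alignment) {
  return ((bytes + alignment - 1) / alignment) * alignment;
}

// Round up using a mask; assumes alignment is a non-zero power of two.
std::size_t round_up_mask(std::size_t bytes, std::size_t alignment) {
  return (bytes + alignment - 1) & ~(alignment - 1);
}
```

For example, both forms round 13 up to 16 with an alignment of 8. The mask
form needs no division or shift count at all, which is why it does not depend
on the compiler proving that the divisor has a single bit set.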

[Bug jit/120960] jit: The static initialization of builtin_data uses flag_openacc and flag_openmp (which have not been set yet)

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120960

--- Comment #2 from Andrew Pinski  ---
Actually, both_p being either false or true is OK here.


#undef DEF_GOACC_BUILTIN_COMPILER
#define DEF_GOACC_BUILTIN_COMPILER(ENUM, NAME, TYPE, ATTRS) \
  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
   flag_openacc, true, true, ATTRS, false, true)


The only place which uses both_p and fallback_p is in get_asm_name .
```
  const char *get_asm_name () const
  {
if (both_p && fallback_p)
  return name + prefix_len;
else
  return name;
  }
```

Which means if there is a call added to it and it does not get simplified, then
it will call __builtin_.  So the question becomes: will anyone use JIT with
openmp/openacc offloading?

Maybe not so much.

[Bug c++/120961] ICE on C++20 module when compiling Eigen.

2025-07-04 Thread shyeyian at petalmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120961

--- Comment #1 from shyeyian  ---
The attachment with size 8.7MB (zipped into 1.1MB) exceeds the size limit of
gcc-bugzilla requirements. **attachment** has been uploaded to
[attachment-online](https://raw.githubusercontent.com/AnonymousPC/gcc-bug-report/refs/heads/main/gcc-ice-preprocessed.out).

**Thanks :)**

[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec

2025-07-04 Thread kargls at comcast dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958

--- Comment #5 from kargls at comcast dot net ---
(In reply to Martin Jambor from comment #1)
> And indeed the following hack in Fortran FE "fixes" the benchmark (of
> course, this is not meant as a proposed fix, just as a demonstration
> where the problem is):
> 

So, if I understand, you want an fnspec of ". . w w w w w w w".
Can you show f->sym and f->sym->attr from gdb?

Prior to F2008, MPI says the interface is 

USE MPI
! or the older form: INCLUDE 'mpif.h'
MPI_IRECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERROR)
    <type> BUF(*)
    INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERROR

with F2008, one has

USE mpi_f08
MPI_Irecv(buf, count, datatype, source, tag, comm, request, ierror)
 TYPE(*), DIMENSION(..), ASYNCHRONOUS :: buf
 INTEGER, INTENT(IN) :: count, source, tag
 TYPE(MPI_Datatype), INTENT(IN) :: datatype
 TYPE(MPI_Comm), INTENT(IN) :: comm
 TYPE(MPI_Request), INTENT(OUT) :: request
 INTEGER, OPTIONAL, INTENT(OUT) :: ierror

In either case an explicit interface is required, and gfortran
should deal with the '<type> buf(*)' and 'type(*), dimension(..)'
without your hack.  Of particular interest is the f->sym->ts.type.

[Bug demangler/67268] demangler does not elide empty argument pack in operator()

2025-07-04 Thread hmeyer.eu at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67268

Henning Meyer  changed:

   What|Removed |Added

 CC||hmeyer.eu at gmail dot com

--- Comment #2 from Henning Meyer  ---
This bug still exists in GCC 15.

I noticed when creating a std::thread

_ZNSt6threadC1IRFvvEJEvEEOT_DpOT0_ demangles to "std::thread::thread(void (&)())"

The simplest example is


template<typename... Args, typename = void>
void funk(Args&&... ) {}

which when called as funk();

mangles to

_Z4funkIJEvEvDpOT_

which demangles to "void funk<, void>()"

For a templated function or method, a template parameter pack may be followed
by further template arguments, including more packs, possibly empty.

When I build the standalone demangler in libiberty, I get this parse tree
./a.out -v _Z4funkIJEvEvDpOT_
typed name
  template
name 'funk'
template argument list
  template argument list
  template argument list
builtin type void
  function type
builtin type void
argument list
  pack expansion
rvalue reference
  template parameter 0
void funk<, void>()

which is not what I would expect. I would expect 

typed name
  template
name 'funk'
template argument list
  template argument list
  builtin type void

for a template argument list consisting of an empty pack and the type void.

Which indicates to me that the bug is in the parsing logic and not in the
formatting logic.
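
For anyone who wants to reproduce this without building the standalone
libiberty demangler, here is a small sketch that goes through
`abi::__cxa_demangle` from `<cxxabi.h>`. The helper name `demangle` is made up
for illustration, and the exact output text depends on the demangler
implementation behind the runtime in use, so no output is shown.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium C++ ABI symbol name; returns an empty string on failure.
std::string demangle(const char* mangled) {
  int status = 0;
  char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && text != nullptr) ? text : "";
  std::free(text);  // the result buffer is allocated with malloc
  return result;
}
```

Calling `demangle("_Z4funkIJEvEvDpOT_")` and printing the result shows how the
empty pack is rendered by the demangler that the program is linked against.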

[Bug jit/120960] jit: The static initialization of builtin_data uses flag_openacc and flag_openmp (which have not been set yet)

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120960

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2025-07-04
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
The builtins that matter:
DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device",
BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)


DEF_GOMP_BUILTIN_COMPILER (BUILT_IN_OMP_IS_INITIAL_DEVICE,
   "omp_is_initial_device", BT_FN_INT,
   ATTR_CONST_NOTHROW_LIST)
DEF_GOMP_BUILTIN_COMPILER (BUILT_IN_OMP_GET_INITIAL_DEVICE,
   "omp_get_initial_device", BT_FN_INT,
   ATTR_PURE_NOTHROW_LIST)
DEF_GOMP_BUILTIN_COMPILER (BUILT_IN_OMP_GET_NUM_DEVICES, "omp_get_num_devices",
   BT_FN_INT, ATTR_PURE_NOTHROW_LIST)

Which are used for offloading and figuring out where the code is going to be
run on. I am 99% sure if we always enable them in jit it would just work.

So we could do:
#define flag_openacc true
#define flag_openmp true

around the static array.
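
The underlying problem can be sketched in isolation. This is not the libgccjit
code: `flag_feature`, `BuiltinEntry`, `builtin_table` and `process_options` are
hypothetical stand-ins for the option flag, the builtin_data entries and option
processing.

```cpp
// Hypothetical stand-in for an option flag such as flag_openacc.
bool flag_feature = false;  // only set later, during option processing

// Hypothetical stand-in for one builtin_data entry.
struct BuiltinEntry {
  const char* name;
  bool enabled;  // copied from the flag when the table is initialized
};

// Initialized before main() runs, so it records the flag's initial value.
static const BuiltinEntry builtin_table[] = {
    {"__builtin_example", flag_feature},
};

// Later option processing flips the flag, but the table keeps its snapshot.
void process_options() { flag_feature = true; }
```

After `process_options()` runs, `flag_feature` is true while
`builtin_table[0].enabled` is still false, which is the situation the bug title
describes. Defining the flags as `true` around the array sidesteps the snapshot
entirely.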

[Bug c++/120962] " XXX references internal linkage entity YYY" when compiling a module interface

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120962

Andrew Pinski  changed:

   What|Removed |Added

 Blocks||103524

--- Comment #1 from Andrew Pinski  ---
Hmm, can you try gcc 15.1.0 which has a large number of modules fixes?

I am not sure if this is valid or not either.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103524
[Bug 103524] [meta-bug] modules issue

[Bug target/120941] [16 Regression] 24-40% slowdown of 519.lbm_r on Zen2 and 470.lbm on Zen5 since r16-1644-gaba3b9d3a48a07

2025-07-04 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

--- Comment #9 from H.J. Lu  ---
Created attachment 61803
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61803&action=edit
A patch

Please try this.

[Bug c++/120962] " XXX references internal linkage entity YYY" when compiling a module interface

2025-07-04 Thread markus.parker at arcor dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120962

--- Comment #2 from Markus  ---
I would have tried a newer version if I had had easy access to it. The Alpine
distro does not provide newer packages, yet. What I could try was clang, which
compiled the file without any errors.

[Bug c++/37590] g++ should emit different debug info for variable's type

2025-07-04 Thread hmeyer.eu at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37590

Henning Meyer  changed:

   What|Removed |Added

 CC||hmeyer.eu at gmail dot com

--- Comment #9 from Henning Meyer  ---
This is not a GCC bug, this is a display issue in readelf --debug-dump=info.

This is a display issue specific to GNU binutils.

If I compile the example with GCC 15 and run it through readelf
--debug-dump=info, I get

 <1><49f2>: Abbrev Number: 102 (DW_TAG_variable)
<49f3>   DW_AT_name: (string) s
<49f5>   DW_AT_decl_file   : (data1) 1
<49f6>   DW_AT_decl_line   : (data1) 2
<49f7>   DW_AT_decl_column : (data1) 13
<49f8>   DW_AT_linkage_name: (strp) (offset: 0x5830): _Z1sB5cxx11
<49fc>   DW_AT_type: (ref4) <0x315a>, string, basic_string, std::allocator >

DW_AT_type points to a typedef which has name string, but is a child of the
namespace DIE with name std, so the debug information is correct.

If you use elfutils readelf instead of binutils readelf, the output is 

 [  49f2]variable abbrev: 102
 name (string) "s"
 decl_file(data1) string.cpp (1)
 decl_line(data1) 2
 decl_column  (data1) 13
 linkage_name (strp) "_Z1sB5cxx11"
 type (ref4) [  315a]
 external (flag_present) yes
 location (exprloc) 

it won't show an incomplete name for the typedef.

[Bug tree-optimization/120952] GCC fails to optimize division to shift when divisor is non-constant power of two

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120952

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement
   Last reconfirmed||2025-07-04
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
.

[Bug target/120782] RISC-V: vector-strict-align not working for spec17 521 ref size

2025-07-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120782

--- Comment #5 from Robin Dapp  ---
I tried reproducing this with a recent trunk (r16-1965-gc512c9090f52e7) but
didn't see the exact code sequence. wrf also ran to completion on the Banana
Pi.

Did you use a stock GCC 15.1 or a specific commit?

[Bug other/120963] New: Missed optimization of (a > b ? a - b : 0) pattern

2025-07-04 Thread Explorer09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120963

Bug ID: 120963
   Summary: Missed optimization of (a > b ? a - b : 0) pattern
   Product: gcc
   Version: 15.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

This could be multiple missed-optimization issues in the same test code. I just
don't know how to reduce the issues to even simpler test cases.

```c
#include <stdint.h>

int32_t func1a(int32_t a, int32_t b) {
    if (b - a < 0)
        a = b;
    return b - a;
}
int32_t func1b(int32_t a, int32_t b) {
    if (b < a)
        a = b;
    return b - a;
}
int32_t func1c(int32_t a, int32_t b) {
    if (b < a)
        return 0;
    return b - a;
}
```

x86-64 gcc 15.1 with `-Os -fno-wrapv` options produces the following assembly:

```x86asm
func1a:
        subl    %edi, %esi
        movl    $0, %eax
        cmovns  %esi, %eax
        ret
func1b:
        cmpl    %edi, %esi
        movl    %esi, %eax
        cmovle  %esi, %edi
        subl    %edi, %eax
        ret
func1c:
        movl    %esi, %eax
        xorl    %edx, %edx
        subl    %edi, %eax
        cmpl    %edi, %esi
        cmovl   %edx, %eax
        ret
```

Note that the three functions are equivalent, and yet the generated assembly
codes are different, and none of them are optimal.

The optimal code, AFAIK, is this:

```x86asm
func1:
        xorl    %eax, %eax
        subl    %edi, %esi
        cmovg   %esi, %eax
        ret
```

When compared with the assembly of three functions above:

* func1a missed the part that the `movl $0, %eax` can be simplified to
`xorl %eax, %eax` if the instruction is performed _before_ the `sub`
instruction, the condition code of which is checked later.
* As signed integer overflow is undefined behavior in C, the `cmovns` in func1a
could as well be transformed to `cmovg`. (This optimization should be disabled
for `-fwrapv`.)
* func1c missed that the `sub` and `cmp` could be merged to one operation. This
can also save the `mov` instruction at the beginning. (I have reported Bug
113680 as similar to this issue.)
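
The equivalence claim can be spot-checked with a small sketch. `clamped_diff`
is a hypothetical reference form added for illustration (with the roles of `a`
and `b` matching the three functions, not the summary line), and the comparison
only holds for inputs where `b - a` does not overflow.

```cpp
#include <cstdint>

// Hypothetical reference form: b - a clamped at zero from below.
std::int32_t clamped_diff(std::int32_t a, std::int32_t b) {
  return b > a ? b - a : 0;
}

// The three variants from the report, copied so the sketch is self-contained.
std::int32_t func1a(std::int32_t a, std::int32_t b) {
  if (b - a < 0)
    a = b;
  return b - a;
}
std::int32_t func1b(std::int32_t a, std::int32_t b) {
  if (b < a)
    a = b;
  return b - a;
}
std::int32_t func1c(std::int32_t a, std::int32_t b) {
  if (b < a)
    return 0;
  return b - a;
}
```

Looping both arguments over a small range such as -50 to 50 and comparing each
variant against `clamped_diff` is enough to confirm that they agree wherever
no overflow occurs.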

[Bug testsuite/120859] FAIL: gcc.dg/tree-prof/afdo-crossmodule-1b.c compilation

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120859

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org

--- Comment #5 from Andrew Pinski  ---
Will handle this tomorrow or later tonight ...

[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

--- Comment #9 from Andrew Pinski  ---
Created attachment 61804
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61804&action=edit
V2 of the patch

OK, so this does what I was originally going to do before I thought changing
EQ_EXPR over to ORDERED_EXPR would fix the issue: create the comparison in its
own statement.

[Bug c++/120953] New: Accepts invalid with range for

2025-07-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120953

Bug ID: 120953
   Summary: Accepts invalid with range for
   Product: gcc
   Version: 16.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

struct S { int a; long b; short c; };
constexpr S d = { 1, 2, 3 }, e = { 4, 5, 6 }, f = { 7, 8, 9 };

void
foo ()
{
  long long r = 0;
  constexpr S h[] = { d, e, f };
  for (static auto &g : h)
r += g.a + g.b + g.c;
}

is accepted by g++ and rejected by clang++:
loop variable 'g' may not be declared 'static'
https://eel.is/c++draft/stmt.ranged#2
says that only type-specifier and constexpr can be present, so I wonder what
else we fail to diagnose and whether, because we've been accepting it, this
should be just a pedwarn or an error.

[Bug c++/120954] New: [15 Regression] False positive -Warray-bounds=2 warning

2025-07-04 Thread sirl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120954

Bug ID: 120954
   Summary: [15 Regression] False positive -Warray-bounds=2
warning
   Product: gcc
   Version: 15.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sirl at gcc dot gnu.org
  Target Milestone: ---

This small code snippet warns with r15-9921 (r15-9866 was still OK):

static const int map01[32] = { 11, 12, 13, 14, 15 };
static const int map02[32] = { 21, 22, 23, 24, 25 };
static const int map03[32] = { 31, 32, 33, 34, 35 };
static const int map11[32] = { 111, 112, 113, 114, 115 };
static const int map12[32] = { 121, 122, 123, 124, 125 };
static const int map13[32] = { 131, 132, 133, 134, 135 };

int test(int n, int ver)
{
    int r = 0;
    if (n >= 0 && n < 32) {
        r = ((ver >= 4) ? ((ver >= 0x65) ? map01 : map02 ) : map03)[n];
    } else if (n >= 0x100 && n < 0x120) {
        r = ((ver >= 4) ? ((ver >= 0x65) ? map11 : map12 ) : map13)[n - 0x100];
    };
    return r;
}

# g++-15 -c -O2 -Warray-bounds=2 test-Warray-bounds.cpp
test-Warray-bounds.cpp: In function 'int test(int, int)':
test-Warray-bounds.cpp:16:86: warning: intermediate array offset -256 is outside array bounds of 'const int [32]' [-Warray-bounds=]
   16 | r = ((ver >= 4) ? ((ver >= 0x65) ? map11 : map12 ) : map13)[n - 0x100];
      | ~^

This one also might have been caused by

commit r15-9896-g7fdf47538a659f6af8dadbecbb63c8a226b63754
Author: Jakub Jelinek 
Date:   Tue Jul 1 15:28:10 2025 +0200

c++: Fix up cp_build_array_ref COND_EXPR handling [PR120471]

But this time I'm not very confident about that. Compiling this as C code
doesn't warn.

[Bug c++/120955] New: 50 % increase in data segment size on avr-gcc for -Os

2025-07-04 Thread fiesh at zefix dot tv via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120955

Bug ID: 120955
   Summary: 50 % increase in data segment size on avr-gcc for -Os
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fiesh at zefix dot tv
  Target Milestone: ---

For our software that controls various machinery in our factory, we have the
following sizes:

Commit 0c83096f19b:
   text    data     bss     dec     hex filename
 212830    1398    3760  217988   35384 /p_o2/phacility
   text    data     bss     dec     hex filename
 152420    1834    3767  158021   26945 /p_os/phacility

Commit 12de1942a0a:
   text    data     bss     dec     hex filename
 250090    1140    3760  254990   3e40e /p_o2/phacility
   text    data     bss     dec     hex filename
 142112    2742    3767  148621   2448d /p_os/phacility

(o2 refers to an -O2 build with LTO, os refers to an -Os build with LTO)

So 12de1942a0a caused:

For O2:
* The text section to become 17.5 % larger
* The data section to become 18.5 % smaller

For Os:
* The text section to become 6.8 % smaller
* The data section to become 49.5 % larger
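
The percentages above follow directly from the size output. A small sketch of
the arithmetic, where pct_change is a made-up helper name and the inputs are
the text and data columns quoted above:

```cpp
// Hypothetical helper: relative change from 'before' to 'after', in percent.
double
pct_change (long before, long after)
{
  return 100.0 * static_cast<double> (after - before)
         / static_cast<double> (before);
}
```

For example, pct_change (1834, 2742) covers the -Os data section and
pct_change (212830, 250090) covers the -O2 text section.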

Please note that since this is a Harvard architecture, data and bss occupy RAM
and thus reduce the available stack size.

(In our case, we now can use neither O2 nor Os.  With one the text segment is
too big; with the other, data + bss leave too little stack for our software to
work.)

Alas I do not have a single translation unit to reproduce this easily.  But I'm
happy to share the project code that reproduces this to anyone who can do
something with it or help in any other way.  Please let me know and sorry for
the lack of a testcase.

[Bug c++/120954] [15 Regression] False positive -Warray-bounds=2 warning

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120954

--- Comment #1 from Sam James  ---
(In reply to Franz Sirl from comment #0)
> But this time I'm not very confident about that. Compiling this a C code
> doesn't warn.

It does for me, and it warns before Jakub's change.

[Bug c++/120954] [15 Regression] False positive -Warray-bounds=2 warning

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120954

--- Comment #2 from Sam James  ---
(In reply to Franz Sirl from comment #0)
> This small code snippet warns with r15-9921 (r15-9866 was still OK):
> 

r15-9873-g06a26f4d643a5d warns for me.

```
$ git shortlog r15-9866-g8d600e98004b63..r15-9873-g06a26f4d643a5d

Eric Botcazou (3):
  Fix misoptimization of CONSTRUCTOR with reverse SSO
  Ada: Fix assertion failure on problematic container aggregate
  Fix compilation of concatenation with illegal character constant

GCC Administrator (2):
  Daily bump.
  Daily bump.

Harald Anlauf (2):
  Fortran: fix checking of renamed-on-use interface name [PR120784]
  Fortran: follow-up fix to checking of renamed-on-use interface name [PR120784]
```

It could be Eric's gimple-fold change but that looks unlikely to me. Are you
sure your references are right, given it warns for me with C too?

> But this time I'm not very confident about that. Compiling this a C code
> doesn't warn.

[Bug c++/120955] [15/16 Regression] 50 % increase in data segment size on avr-gcc for -Os

2025-07-04 Thread fiesh at zefix dot tv via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120955

--- Comment #1 from fiesh at zefix dot tv ---
12de1942a0a is r15-6052-g12de1942a0a673

[Bug c++/120953] Accepts invalid with range for

2025-07-04 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120953

Jason Merrill  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2025-07-04

--- Comment #1 from Jason Merrill  ---
In cp_parser_init_declarator we've decided whether it's a range-for and still
have the declspecs, so we could scan them at that point.

pedwarn seems fine for this.

[Bug other/120963] Missed optimization of (a > b ? a - b : 0) pattern

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120963

Andrew Pinski  changed:

   What|Removed |Added

 Depends on||3507

--- Comment #1 from Andrew Pinski  ---
I have not fully looked but I suspect PR 3507 is most of the issue.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3507
[Bug 3507] appalling optimisation with sub/cmp on multiple targets

[Bug libstdc++/119742] [C++26] Implement P2697R1, Interfacing bitset with string_view

2025-07-04 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119742

--- Comment #4 from Jonathan Wakely  ---
I've just noticed that the debug mode bitset in  needs the new
constructor added.

make check RUNTESTFLAGS="conformance.exp=*bitset* --target_board=unix/-D_GLIBCXX_DEBUG"

FAIL: 20_util/bitset/cons/string_view.cc  -std=gnu++26 (test for excess errors)

[Bug c++/84009] No diagnostic issued if the decl-specifier in the decl-specifier-seq of a for-range-declaration is register, static,or thread_local

2025-07-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84009

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Jakub Jelinek  ---
Created attachment 61802
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61802&action=edit
gcc16-pr84009.patch

Untested fix.

[Bug target/118241] RISC-V ICE: internal compiler error: in int_mode_for_mode, at stor-layout.cc:407 caused by prefetch instructions

2025-07-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118241

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Vineet Gupta :

https://gcc.gnu.org/g:f2a3ab7ebf3c40da77f54e8329272fe048ec48a6

commit r16-2032-gf2a3ab7ebf3c40da77f54e8329272fe048ec48a6
Author: Vineet Gupta 
Date:   Fri Jul 4 12:33:51 2025 -0700

RISC-V: prefetch: fix LRA failing to allocate reg [PR118241]

prefetch was recently fixed/tightened (with Q reg constraint) to only
support right address patterns (REG or REG+D with lower 5 bits clear).
However in some cases that's too restrictive for LRA and it fails to
allocate a reg resulting in following ICE...

| gcc/testsuite/gcc.target/riscv/pr118241-b.cc:31:19: error: unable to
generate reloads for:
|   31 | void m() { a.l(); }
|  |   ^
|(insn 26 25 27 7 (prefetch (mem/f:DI (plus:DI (reg/f:DI 143 [ _5 ])
|(const_int 56 [0x38])) [5 _5->batch[6]+0 S8 A64])
|(const_int 0 [0])
|(const_int 3 [0x3]))
"gcc/testsuite/gcc.target/riscv/pr118241-b.cc":18:29 498 {prefetch}
| (expr_list:REG_DEAD (reg/f:DI 142 [ _5->batch[6] ])
|(nil)))
|during RTL pass: reload

Fix that by providing a fallback alternative register constraint to reload
the address.

PR target/118241

gcc/ChangeLog:

* config/riscv/riscv.md (prefetch): Add alternative "r".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr118241-b.cc: New test.

Signed-off-by: Vineet Gupta 
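
The address restriction mentioned in the commit message (REG, or REG+D with the
lower 5 bits of D clear) can be illustrated with a tiny predicate. This is only
a sketch: prefetch_offset_ok is a made-up name, it is not the constraint
implementation from the RISC-V backend, and it deliberately ignores any range
limit on the displacement.

```cpp
// Hypothetical check: true if displacement d has its lower 5 bits clear,
// i.e. it is a multiple of 32.
bool
prefetch_offset_ok (long d)
{
  return (d & 0x1f) == 0;
}
```

The displacement in the insn quoted above is 56 (0x38), for which this check
is false, which appears consistent with the address needing a reload into a
plain register.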

[Bug c++/120961] ICE on C++20 module when compiling Eigen.

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120961

--- Comment #2 from Andrew Pinski  ---
Created attachment 61801
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61801&action=edit
testcase compressed using xz

xz is able to compress this below the limit.

[Bug target/120941] [16 Regression] 24-40% slowdown of 519.lbm_r on Zen2 and 470.lbm on Zen5 since r16-1644-gaba3b9d3a48a07

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

Filip Kastl  changed:

   What|Removed |Added

Summary|[16 Regression] 24-40%  |[16 Regression] 24-40%
   |slowdown of 519.lbm_r on|slowdown of 519.lbm_r on
   |Zen2 since  |Zen2 and 470.lbm on Zen5
   |r16-1644-gaba3b9d3a48a07|since
   ||r16-1644-gaba3b9d3a48a07

--- Comment #8 from Filip Kastl  ---
The same commit (r16-1644-gaba3b9d3a48a07) causes a ~20% slowdown of 470.lbm
from SPEC 2006 on Zen5 with -Ofast -march=native -flto -fprofile-use.

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1283.240.0

[Bug target/118891] [14/15/16 regression] gcc 14 fails to build from source on aarch64_be: "error: ‘dynamic_cast’ not permitted with ‘-fno-rtti’"

2025-07-04 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118891

Richard Sandiford  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org

--- Comment #23 from Richard Sandiford  ---
I've posted a couple of patches that should help with this:

* https://gcc.gnu.org/pipermail/gcc-patches/2025-July/688599.html
* https://gcc.gnu.org/pipermail/gcc-patches/2025-July/688605.html

[Bug target/120782] RISC-V: vector-strict-align not working for spec17 521 ref size

2025-07-04 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120782

--- Comment #6 from Li Pan  ---
(In reply to Robin Dapp from comment #5)
> I tried reproducing this with a recent trunk (r16-1965-gc512c9090f52e7) but
> didn't see the exact code sequence. wrf also ran to completion on the Banana
> Pi.
> 
> Did you use a stock GCC 15.1 or a specific commit?

Yes, I use 15.1; with rv64gcvb I can hit that code sequence, but it disappears
when building with rv64gcvb_zvl256b.

riscv64-linux-gnu-gcc (GCC) 15.1.0
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


riscv64-linux-gnu-g++ (GCC) 15.1.0
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug tree-optimization/119965] [16 Regression] 531.deepsjeng_r binary is 50% bigger since r16-116-gcfb04e0de6aa43

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965

--- Comment #2 from Jan Hubicka  ---
This is likely an ipa-cp heuristics issue: it now decides to clone, but the
benefits are not really visible after all.

[Bug tree-optimization/119965] [16 Regression] 531.deepsjeng_r binary is 50% bigger since r16-116-gcfb04e0de6aa43

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119965

--- Comment #3 from Jan Hubicka  ---
There is also a 3% performance regression that got lost in the transition to the new PR:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=958.387.0

[Bug tree-optimization/120948] Cannot detect potential division-by-zero when numerator is 1 and denominator is variable

2025-07-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120948

--- Comment #4 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #1)
> So you say that we fail to optimize
> 
> int foo (unsigned x)
> {
>   unsigned tem = 1/x;
>   if (x == 0)
>     return 5;
>   return tem;
> }
> 
> because we turn 1/x into x == 1?
> 
> A phase ordering issue, obviously.  Relevant in practice?  I'm not sure.

We could surely defer the optimization until GIMPLE and optimize 1/x into ({ if
(!x) __builtin_unreachable (); 1}).  The question is if it is worth it, plus
having conditionals in the IL created by match.pd is difficult.
Perhaps we want some variant way of representing very simple if (!cond)
__builtin_unreachable (); as IFN_RANGE (x_123, ...); where ... would somehow
represent what range x_123 can have.  And a question where to drop those as
well.
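
A source-level sketch of the idea, assuming the folded form of 1/x for unsigned
x is x == 1 as mentioned in comment #1. The function names are made up for
illustration, and this is not what match.pd or any GCC pass actually emits:

```cpp
// Plain division: undefined behaviour when x == 0.
unsigned
one_over_plain (unsigned x)
{
  return 1 / x;
}

// Folded form that keeps the "x is never 0" knowledge explicit.
unsigned
one_over_folded (unsigned x)
{
  if (!x)
    __builtin_unreachable ();  // GCC/Clang builtin; records that x != 0 here
  return x == 1;               // 1/x is 1 only for x == 1, otherwise 0
}
```

With the unreachable call kept, a later "if (x == 0)" test can still be
recognised as dead, which is the information lost when 1/x is folded to
x == 1 on its own.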

[Bug tree-optimization/120929] [16 Regression] file-5.45 triggers _FORTIFY_SOURCE false positives since r16-1905-g7165ca43caf470

2025-07-04 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120929

--- Comment #21 from Siddhesh Poyarekar  ---
(In reply to Richard Biener from comment #20)
> so for
> 
>  _1 = _2;
> 
> we merge from _2.  For
> 
>  _1 = *_2;
> 
> we _also_ merge from _2.  But those are semantically not the same!

Yes, it only "makes sense" in the context of .ACCESS_WITH_SIZE as Qing
originally conjectured, because .ACCESS_WITH_SIZE for _1 is stored in the
context of &_1.

> IMO this change was bogus and should be reverted.

I'm testing a simple fix that constrains this to just .ACCESS_WITH_SIZE. 
Hopefully that should avoid the need to revert.

[Bug c++/120955] [15/16 Regression] 50 % increase in data segment size on avr-gcc for -Os

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120955

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |15.2
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2025-07-04
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120702
 Ever confirmed|0   |1
  Component|middle-end  |c++

--- Comment #2 from Andrew Pinski  ---
Without a testcase it is hard to see what is going wrong.

It could be similar to PR 120702 or not; especially when it comes to constexpr.

[Bug c++/120953] Accepts invalid with range for

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120953

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 84009 ***

[Bug c++/84009] No diagnostic issued if the decl-specifier in the decl-specifier-seq of a for-range-declaration is register, static,or thread_local

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84009

Andrew Pinski  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Andrew Pinski  ---
*** Bug 120953 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

--- Comment #5 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #4)
> Better reduced testcase:
> ```
> double f(double r, double i) {
>   return__builtin_fmod(r, i);
> }
> ```

```
double f(double r, double i) {
   return __builtin_fmod(r, i);
}
```
`-O1  -fnon-call-exceptions -fsignaling-nans `

[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

--- Comment #4 from Andrew Pinski  ---
Better reduced testcase:
```
double f(double r, double i) {
  return__builtin_fmod(r, i);
}
```

[Bug tree-optimization/120929] [16 Regression] file-5.45 triggers _FORTIFY_SOURCE false positives since r16-1905-g7165ca43caf470

2025-07-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120929

--- Comment #20 from Richard Biener  ---
(In reply to qinzhao from comment #16)
> (In reply to Siddhesh Poyarekar from comment #12)
> > This is interesting here's the IR dump right after objsz:
> > 
> > The key bit is:
> > 
> >   map2_4 = __builtin_malloc (8);
> >   pin_pointer (&buf);
> >   _1 = &map2_4->magic;
> >   _9 = __builtin_malloc (9);
> >   *_1 = _9;
> >   goto ; [100.00%]
> > 
> >[local count: 1073741824]:
> >   b = "";
> >   ptr_10 = *_1;
> >   _11 = 8;
> >   __builtin___memcpy_chk (ptr_10, &b, 9, _11);
> > 
> > where *_1 gets updated to _9, but when one follows the *_1 through ptr_10,
> > it doesn't end up with _9, the def statement is:
> > 
> >   _1 = &map2_4->magic;
> > 
> > which leads to the incorrect value for the object size.  This is because the
> > pass doesn't know that a MEM_REF could be the LHS (for the zero byte offset
> > case like it is here) for an assignment, i.e. this:
> > 
> >   *_1 = _9;
> 
> There are two possible solutions to the above issue:
> 
> A. Add handling for *_1 = _9 to enable the object size propagate through the
> correct data flow path.
> Or
> B. in my latest change that triggered this issue:
> 
> /* Handle the following stmt #2 to propagate the size from the
>stmt #1 to #3:
> 1  _1 = .ACCESS_WITH_SIZE (_3, _4, 1, 0, -1, 0B);
> 2  _5 = *_1;
> 3  _6 = __builtin_dynamic_object_size (_5, 1);
>  */
> else if (TREE_CODE (rhs) == MEM_REF
>  && POINTER_TYPE_P (TREE_TYPE (rhs))
>  && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME
>  && integer_zerop (TREE_OPERAND (rhs, 1)))
>   reexamine = merge_object_sizes (osi, var, TREE_OPERAND (rhs, 0));
> 
> 
> I feel that propagating the size through _5 = *_1 might not be correct in
> general, we should only limit it to the case when the RHS is a pointer
> defined by .ACCESS_WITH_SIZE?
> 
> what do you think?

I'm confused as to what this does for the case in question.  The IL is

  _1 = &map2_4->magic;
  _9 = __builtin_malloc (9);
  *_1 = _9;
  ptr_10 = *_1;
  _11 = __builtin_dynamic_object_size (ptr_10, 0);

ptr_10 is equal to _9, but the objsize dump suggests we are instead
looking at the size of what _1 points to, instead of the size of what
*_1 points to?!  That would be completely incorrect of course!

Also if that's not the case but we still look at *_1 but with _1
from the earlier def that cannot be done either because you skipped
an intermediate definition of *_1.

I think the code is obviously bogus.  Quoting more in full:

else if (gimple_assign_single_p (stmt)
         || gimple_assign_unary_nop_p (stmt))
  {
    if (TREE_CODE (rhs) == SSA_NAME
        && POINTER_TYPE_P (TREE_TYPE (rhs)))
      reexamine = merge_object_sizes (osi, var, rhs);
    /* Handle the following stmt #2 to propagate the size from the
       stmt #1 to #3:
        1  _1 = .ACCESS_WITH_SIZE (_3, _4, 1, 0, -1, 0B);
        2  _5 = *_1;
        3  _6 = __builtin_dynamic_object_size (_5, 1);
     */
    else if (TREE_CODE (rhs) == MEM_REF
             && POINTER_TYPE_P (TREE_TYPE (rhs))
             && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME
             && integer_zerop (TREE_OPERAND (rhs, 1)))
      reexamine = merge_object_sizes (osi, var, TREE_OPERAND (rhs, 0));

so for

 _1 = _2;

we merge from _2.  For

 _1 = *_2;

we _also_ merge from _2.  But those are semantically not the same!

IMO this change was bogus and should be reverted.
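
To make the distinction concrete, here is a rough source-level analogue of the
IL above. It is a sketch only: the struct and function names are invented, and
it uses the static __builtin_object_size rather than the dynamic variant from
the IL. The result depends on the compiler and optimization level, so no
particular value is claimed.

```cpp
#include <cstddef>
#include <cstdlib>

struct magic_map { char *magic; };   // hypothetical stand-in for the map type

// Returns what the compiler reports as the size of the object ptr points to.
std::size_t
size_seen_through_load ()
{
  magic_map *m = static_cast<magic_map *> (std::malloc (sizeof (magic_map)));
  if (!m)
    return static_cast<std::size_t> (-1);
  char **slot = &m->magic;                         // _1 = &map2_4->magic
  *slot = static_cast<char *> (std::malloc (9));   // *_1 = _9
  char *ptr = *slot;                               // ptr_10 = *_1
  // ptr points to the 9-byte allocation, not to the object holding the slot.
  std::size_t n = __builtin_object_size (ptr, 0);
  std::free (ptr);
  std::free (m);
  return n;
}
```

A correct answer is either 9 or "unknown" ((size_t)-1); the size of the
allocation that contains the slot would be the result of confusing _1 = _2
with _1 = *_2.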

[Bug c/120951] New: error: gimple cond condition cannot throw

2025-07-04 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

Bug ID: 120951
   Summary: error: gimple cond condition cannot throw
   Product: gcc
   Version: 16.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

Created attachment 61799
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61799&action=edit
C source code

The attached code does this with recent gcc:

foundBugs $ ~/gcc/results/bin/gcc -c  -O1 -fnon-call-exceptions -fsignaling-nans  bug1107.c
floatobj.c: In function ‘calc2_double’:
floatobj.c:315:53: error: gimple cond condition cannot throw
if (r_48 == r_48)
during GIMPLE pass: cdce
floatobj.c:315:53: internal compiler error: verify_gimple failed

foundBugs $ ~/gcc/results/bin/gcc -c  -O1 -fnon-call-exceptions   bug1107.c
foundBugs $ ~/gcc/results/bin/gcc -c  -O1 -fsignaling-nans   bug1107.c
foundBugs $ ~/gcc/results/bin/gcc -c   -fsignaling-nans -fnon-call-exceptions bug1107.c
foundBugs $

[Bug c/120951] error: gimple cond condition cannot throw

2025-07-04 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

David Binderman  changed:

   What|Removed |Added

   Keywords||needs-bisection

--- Comment #1 from David Binderman  ---
Reduced code seems to be:

typedef struct {
  double real
} Float;
double float_from_double_inplace_r;
double fmod(double, double);
void float_from_double_inplace() {
  float_from_double_inplace_r = fmod(((Float *)0)->real, ((Float *)0)->real);
}

[Bug tree-optimization/120916] debug line info for IV increment is lost

2025-07-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120916

--- Comment #7 from Jan Hubicka  ---
LLVM also gets execution counts wrong, just in a different (and less harmful)
way:

test:270773509:9780
 1: 9116
 2: 51984  for (
 4: 51984   iThis Inner Loop Header: Depth=1
.loc    0 10 15 is_stmt 1 discriminator 33 # ll.c:10:15
movdqa  (%rsi,%rdi), %xmm1  
movdqa  16(%rsi,%rdi), %xmm2
psubd   %xmm0, %xmm1
psubd   %xmm0, %xmm2
movdqa  %xmm1, (%rsi,%rdi)  
movdqa  %xmm2, 16(%rsi,%rdi)
.loc    0 9 15 discriminator 33 # ll.c:9:15
addq    $32, %rsi
cmpq    %rsi, %rdx
jne     .LBB0_4

So it has only lines 9 and 10. The large discriminator numbers seem to be the
FS discriminator encoding.  LLVM assigns discriminators twice.  The first pass
is done similarly to ours, but scaled up.

I think it is supposed to handle the case where a statement gets duplicated
into multiple basic blocks, like a[i]++ does.  So it has:

.loc    0 10 15 is_stmt 1 discriminator 33 # ll.c:10:15
movdqa  (%rsi,%rdi), %xmm1  
movdqa  16(%rsi,%rdi), %xmm2
psubd   %xmm0, %xmm1
psubd   %xmm0, %xmm2
movdqa  %xmm1, (%rsi,%rdi)  
movdqa  %xmm2, 16(%rsi,%rdi)

for the vectorized body and

.loc    0 10 15 is_stmt 1   # ll.c:10:15
leaq    (%rcx,%rdx,4), %rdi
incl    (%rsi,%rdi)

for the epilogue. The tool has a -fuse_discriminator_encoding option which then
merges the values back.  I will look into what this really does.

[Bug tree-optimization/120948] Cannot detect potential division-by-zero when numerator is 1 and denominator is variable

2025-07-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120948

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
(In reply to huyubiao from comment #2)
> I believe such non-compliant code makes maintenance troubleshooting
> significantly harder. Could we implement a compiler warning when x=0
> potentially occurs?

No.  The most common case is that the compiler doesn't know whether the divisor
can be 0.  Such a warning would trigger all the time.

[Bug tree-optimization/120944] [12/13/14/15/16 Regression] Incorrect optimization with accessing a volatile structure member

2025-07-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120944

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:6ed1e2ae1a742d859c2dd74c9e7cebdd3618e8b1

commit r16-2019-g6ed1e2ae1a742d859c2dd74c9e7cebdd3618e8b1
Author: Richard Biener 
Date:   Fri Jul 4 09:08:19 2025 +0200

tree-optimization/120944 - bogus VN with volatile copies

The following avoids translating expressions through volatile
copies.

PR tree-optimization/120944
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Gate optimizations
invalid when volatile is involved.

* gcc.dg/torture/pr120944.c: New testcase.

[Bug tree-optimization/120944] [12/13/14/15 Regression] Incorrect optimization with accessing a volatile structure member

2025-07-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120944

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
  Known to work||16.0
  Known to fail||15.1.0
Summary|[12/13/14/15/16 Regression] |[12/13/14/15 Regression]
   |Incorrect optimization with |Incorrect optimization with
   |accessing a volatile|accessing a volatile
   |structure member|structure member

[Bug target/120956] New: [16 Regression] 6% slowdown of 503.bwaves_r since

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120956

Bug ID: 120956
   Summary: [16 Regression] 6% slowdown of 503.bwaves_r since
   Product: gcc
   Version: 16.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pheeck at gcc dot gnu.org
  Target Milestone: ---

[Bug target/120956] [16 Regression] 6% slowdown of 503.bwaves_r since

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120956

Filip Kastl  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Filip Kastl  ---
sorry, please ignore this

[Bug c++/120575] [15/16 Regression] ICE: in cp_parser_abort_tentative_parse, at cp/parser.cc:36574

2025-07-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120575

--- Comment #2 from GCC Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:35d6f55f7d6655a8683b45286283d44674fa997e

commit r16-2024-g35d6f55f7d6655a8683b45286283d44674fa997e
Author: Jason Merrill 
Date:   Fri Jul 4 05:15:00 2025 -0400

c++: -Wtemplate-body and tentative parsing [PR120575]

Here we were asserting non-zero errorcount, which is not the case if the
parse error was reduced to a warning (or silenced) in a template body.  So
check seen_error instead.

PR c++/120575
PR c++/116064

gcc/cp/ChangeLog:

* parser.cc (cp_parser_abort_tentative_parse): Check seen_error
instead of errorcount.

gcc/testsuite/ChangeLog:

* g++.dg/template/permissive-error3.C: New test.

[Bug c++/116064] [15 Regression] SPEC 2017 523.xalancbmk_r failed to build

2025-07-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116064

--- Comment #16 from GCC Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:35d6f55f7d6655a8683b45286283d44674fa997e

commit r16-2024-g35d6f55f7d6655a8683b45286283d44674fa997e
Author: Jason Merrill 
Date:   Fri Jul 4 05:15:00 2025 -0400

c++: -Wtemplate-body and tentative parsing [PR120575]

Here we were asserting non-zero errorcount, which is not the case if the
parse error was reduced to a warning (or silenced) in a template body.  So
check seen_error instead.

PR c++/120575
PR c++/116064

gcc/cp/ChangeLog:

* parser.cc (cp_parser_abort_tentative_parse): Check seen_error
instead of errorcount.

gcc/testsuite/ChangeLog:

* g++.dg/template/permissive-error3.C: New test.

[Bug c++/116064] [15 Regression] SPEC 2017 523.xalancbmk_r failed to build

2025-07-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116064

--- Comment #17 from GCC Commits  ---
The releases/gcc-15 branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:799dfe7c5f638a645d33b47750f797b3fb87329b

commit r15-9927-g799dfe7c5f638a645d33b47750f797b3fb87329b
Author: Jason Merrill 
Date:   Fri Jul 4 05:15:00 2025 -0400

c++: -Wtemplate-body and tentative parsing [PR120575]

Here we were asserting non-zero errorcount, which is not the case if the
parse error was reduced to a warning (or silenced) in a template body.  So
check seen_error instead.

PR c++/120575
PR c++/116064

gcc/cp/ChangeLog:

* parser.cc (cp_parser_abort_tentative_parse): Check seen_error
instead of errorcount.

gcc/testsuite/ChangeLog:

* g++.dg/template/permissive-error3.C: New test.

(cherry picked from commit 35d6f55f7d6655a8683b45286283d44674fa997e)

[Bug c++/120575] [15/16 Regression] ICE: in cp_parser_abort_tentative_parse, at cp/parser.cc:36574

2025-07-04 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120575

Jason Merrill  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Jason Merrill  ---
Fixed for 15.2/16.

[Bug tree-optimization/120948] Cannot detect potential division-by-zero when numerator is 1 and denominator is variable

2025-07-04 Thread h13958451065 at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120948

--- Comment #2 from huyubiao  ---
(In reply to Richard Biener from comment #1)
> So you say that we fail to optimize
> 
> int foo (unsigned x)
> {
>   unsigned tem = 1/x;
>   if (x == 0)
>     return 5;
>   return tem;
> }
> 
> because we turn 1/x into x == 1?
> 
> A phase ordering issue, obviously.  Relevant in practice?  I'm not sure.

I believe such non-compliant code makes maintenance troubleshooting
significantly harder. Could we implement a compiler warning when x=0
potentially occurs?

[Bug middle-end/120951] [16 regression] error: gimple cond condition cannot throw

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2025-07-04
   Keywords|needs-bisection |ice-on-valid-code
 Ever confirmed|0   |1
Summary|error: gimple cond  |[16 regression] error:
   |condition cannot throw  |gimple cond condition
   ||cannot throw
 CC||pinskia at gcc dot gnu.org
   Target Milestone|--- |16.0
 Status|UNCONFIRMED |ASSIGNED
  Component|c   |middle-end

--- Comment #2 from Andrew Pinski  ---
I suspect this started ICEing when the verification was added.

I will take a look soon.

[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

Andrew Pinski  changed:

   What|Removed |Added

  Component|middle-end  |tree-optimization

--- Comment #3 from Andrew Pinski  ---
Note the bug is in the cdce pass where it creates GIMPLE_COND. When the
condition can throw, a temporary needs to be used.
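
As a loose source-level analogy only (the real fix works on GIMPLE inside the
cdce pass, and the function name below is made up), the shape being described
is to evaluate the possibly-trapping comparison in its own statement and then
branch on the stored result:

```cpp
// Hypothetical illustration: with -fnon-call-exceptions -fsignaling-nans the
// comparison r == r may trap, so it is evaluated separately from the branch.
bool
is_ordered_with_self (double r)
{
  bool tmp = (r == r);   // comparison in its own statement
  if (tmp)               // the branch condition itself cannot throw
    return true;
  return false;
}
```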

[Bug target/120959] New: [16 Regression] 9% slowdown of 549.fotonik3d_r on Zen5 since r16-1645-g309dbcea2cabb3

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120959

Bug ID: 120959
   Summary: [16 Regression] 9% slowdown of 549.fotonik3d_r on Zen5 since r16-1645-g309dbcea2cabb3
   Product: gcc
   Version: 16.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pheeck at gcc dot gnu.org
CC: tnfchris at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1244.527.0

there was a 9% exec time slowdown of the 549.fotonik3d_r SPEC 2017
benchmark when run with -O2 -flto -fprofile-use (generic march) on an AMD Zen 5
machine.
I bisected it to r16-1645-g309dbcea2cabb3.

commit 309dbcea2cabb31bde1a65cdfd30bb7f87b170a2
Author: Tamar Christina 
AuthorDate: Tue Jun 24 07:13:22 2025 +0100
Commit: Tamar Christina 
CommitDate: Tue Jun 24 07:13:22 2025 +0100

middle-end: replace log_vf usages with vf to allow support for non-power of two vf


This is a regression against GCC 15.  If I measure manually the most recent
commit of releases/gcc-15, I get the same exec time as before Tamar's commit on
trunk.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/120959] [16 Regression] 9% slowdown of 549.fotonik3d_r on Zen5 since r16-1645-g309dbcea2cabb3

2025-07-04 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120959

Tamar Christina  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org

--- Comment #1 from Tamar Christina  ---
OK, will take a look next Monday.

[Bug target/120957] [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957

Filip Kastl  changed:

   What|Removed |Added

   Target Milestone|--- |16.0

[Bug target/120957] New: [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957

Bug ID: 120957
   Summary: [16 Regression] 6-9% slowdown of 503.bwaves_r on
Zen{2,3} since r16-1647-gc06979ff957485
   Product: gcc
   Version: 16.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pheeck at gcc dot gnu.org
CC: liuhongt at gcc dot gnu.org
Blocks: 26163
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

As seen here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.427.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.427.0

there was a 6% exec time slowdown of the 503.bwaves SPEC 2017
benchmark when run with -Ofast -march=native on an AMD Zen2 machine, 9%
slowdown on an AMD Zen3 machine.
I bisected this to r16-1647-gc06979ff957485 (2025-06-24).

c06979ff95748559da0c2d3aa4eda9d5999eaaf6 is the first bad commit
commit c06979ff95748559da0c2d3aa4eda9d5999eaaf6
Author: hongtao.liu 
Date:   Wed Mar 5 12:25:32 2025 +0100

Don't duplicate setup code cost when do group-candidate cost calucalution.

-  /* Uses in a group can share setup code, so only add setup cost once.  */
-  cost -= cost.scratch;

It looks like the original code took into account avoiding double
counting, but unfortunately cost is reset inside the follow loop which
invalidates the upper code, and makes same setup code cost duplicated in
each use of the group.

The patch fix the issue. It can also improve 548.exchange_r by 6% with
-march=x86-64-v3 -O2 due to better ivopt on EMR.

No big performance impact for SPEC2017 on graviton4/SPR with -mcpu=native
-Ofast -fomit-framepointer -flto=auto.

gcc/ChangeLog:

PR target/115842
* tree-ssa-loop-ivopts.cc (determine_group_iv_cost_address):
Don't recalculate inv_expr when group-candidate cost
calucalution.

 gcc/tree-ssa-loop-ivopts.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)
bisect found first bad commit
Connection to tiber.arch.suse.cz closed.
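
The double counting that the quoted commit message describes can be pictured
with a small C++ sketch. This is not the ivopts code: the function names, the
flat integer costs and the vector of per-use costs are all assumptions made
for illustration. It only contrasts charging a shared setup cost once per
group with charging it again for every use.

```cpp
#include <vector>

// Hypothetical model: each use in a group has its own cost, and the whole
// group shares one piece of setup code.

// Setup cost is added once for the whole group.
int group_cost_shared_setup(const std::vector<int>& use_costs, int setup_cost) {
    int total = setup_cost;
    for (int use_cost : use_costs)
        total += use_cost;
    return total;
}

// Setup cost is added again for every use, so it is duplicated.
int group_cost_duplicated_setup(const std::vector<int>& use_costs, int setup_cost) {
    int total = 0;
    for (int use_cost : use_costs)
        total += use_cost + setup_cost;
    return total;
}
```

With three uses the two totals differ by twice the setup cost, which is the
kind of shift that could change which candidate looks cheapest.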


This is a regression against GCC 15. See the comparison (Zen2)
here:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=1070.427.0&plot.1=1219.427.0&plot.2=295.427.0


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug fortran/120958] New: tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec

2025-07-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958

Bug ID: 120958
   Summary: tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in
Fortran 77 because of wrong fnspec
   Product: gcc
   Version: 15.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Since my commit r14-5831-gaae723d360ca26 (Martin Jambor: sra: SRA of
non-escaped aggregates passed by reference to calls), gcc produces a
non-working MPI version of the CG benchmark from NAS Parallel
Benchmarks version 3.3.1, which is written in Fortran 77. (The
benchmark has been rewritten in a newer version of Fortran in version
3.4 of the suite and I suspect that one no longer has this problem.)

The problem is that tree-sra is told by escape analysis that the
address of the first parameter of mpi_irecv does not escape.  The
aggregate passed in that parameter therefore becomes an SRA candidate
and is broken down into scalar components, which are reloaded
immediately after mpi_irecv returns rather than after the call to
mpi_wait.

The reason why escape analysis says that is that the fnspec of
mpi_irecv is ". w w w w w w w w ", which indeed says (the first w)
that the first parameter does not escape.

This fnspec is created by a function in gcc/fortran/trans-types.cc
which, AFAICT, simply deduces it from the call statement in the
benchmark source (but I may easily be wrong here).
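
To make the string above easier to read, here is a small C++ sketch that picks
out the per-argument character. It is not GCC code. It assumes the layout I
understand the fnspec string to have (two leading characters for the return
value and the function, then two characters per argument); the helper names
are made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Assumed layout: characters 0 and 1 describe the return value and the
// function itself; argument i is described starting at position 2 + 2*i.
char fnspec_arg_char(const std::string& spec, std::size_t arg_index) {
    std::size_t pos = 2 + 2 * arg_index;
    // '.' is used here as "nothing known" when the string is too short.
    return pos < spec.size() ? spec[pos] : '.';
}

// As described in this report, 'w' says the parameter does not escape.
bool fnspec_arg_is_w(const std::string& spec, std::size_t arg_index) {
    return fnspec_arg_char(spec, arg_index) == 'w';
}
```

Under that assumption, `fnspec_arg_is_w(". w w w w w w w w ", 0)` is true for
the buffer argument of mpi_irecv, which matches what escape analysis reports.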

My first impression was that this is simply a limitation of Fortran 77
and asynchronous MPI simply cannot work in this language standard.
However, Richi pointed out that there must be a lot of Fortran 77 code
using asynchronous MPI that we do not want to break, which is a
reasonable point of view.

The benchmark can be downloaded from
https://www.nas.nasa.gov/software/npb.html.  I have used mpich 4.1.2
MPI implementation from openSUSE Leap 15.6 and my configuration file
config/make.def is:

--
## Compiler
MPIF77 = mpif77
MPICC  = mpicc

# libhugetlbfs relinking
LHBDT = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-link=BDT
LHB = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-link=B
LHALIGN = -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align

# Fortran Optimisation
FLINK  = mpif77
F_LIB  = $(LHRELINK) $(LHLIB)
F_INC  =
FFLAGS = -O3  -mcmodel=large  -g -fallow-argument-mismatch -fallow-invalid-boz -m64
FLINKFLAGS = -O3  -lmpi -g -fallow-argument-mismatch -fallow-invalid-boz -mcmodel=large -m64 $(LHRELINK) $(LHLIB)

# C Optimisation
CLINK  = mpicc
C_LIB  = $(LHRELINK) $(LHLIB)
C_INC  =
CFLAGS = -O3  -mcmodel=large -m64
CLINKFLAGS = -O3  -lmpi -mcmodel=large -m64 $(LHRELINK) $(LHLIB)

# Other
UCC= mpicc
BINDIR = ../bin
RAND   = randi8
WTIME  = wtime.c
--

The problematic variable which is SRAed is norm_temp2 defined on line:

  double precision   norm_temp1(2), norm_temp2(2)

and then used in code snippet:

         do i = 1, l2npcols
            if (timeron) call timer_start(t_ncomm)
            call mpi_irecv( norm_temp2,
     >                      2,
     >                      dp_type,
     >                      reduce_exch_proc(i),
     >                      i,
     >                      mpi_comm_world,
     >                      request,
     >                      ierr )
            call mpi_send(  norm_temp1,
     >                      2,
     >                      dp_type,
     >                      reduce_exch_proc(i),
     >                      i,
     >                      mpi_comm_world,
     >                      ierr )
            call mpi_wait( request, status, ierr )
            if (timeron) call timer_stop(t_ncomm)

            norm_temp1(1) = norm_temp1(1) + norm_temp2(1)
            norm_temp1(2) = norm_temp1(2) + norm_temp2(2)
         enddo

[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec

2025-07-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958

--- Comment #1 from Martin Jambor  ---
And indeed the following hack in Fortran FE "fixes" the benchmark (of
course, this is not meant as a proposed fix, just as a demonstration
where the problem is):

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 1754d982153..7ee65a983a7 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3327,6 +3327,7 @@ create_fn_spec (gfc_symbol *sym, tree fntype)
}
 }

+  bool is_mpi_irecv = (strcmp (sym->name, "mpi_irecv") == 0);
   for (f = gfc_sym_get_dummy_args (sym); f; f = f->next)
 if (spec_len < sizeof (spec))
   {
@@ -3342,6 +3343,7 @@ create_fn_spec (gfc_symbol *sym, tree fntype)
  }

if (f->sym == NULL || is_pointer || f->sym->attr.target
+   || is_mpi_irecv
|| f->sym->attr.external || f->sym->attr.cray_pointer
|| (f->sym->ts.type == BT_DERIVED
&& (f->sym->ts.u.derived->attr.proc_pointer_comp

[Bug c/118948] [15/16 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_single_nonnegative_warnv_p, at fold-const.cc:14878 since r15-328

2025-07-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118948

--- Comment #7 from GCC Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:f24015a4c2ca6d6fbbf7090004b3a83081f18f03

commit r16-2026-gf24015a4c2ca6d6fbbf7090004b3a83081f18f03
Author: Andrew Pinski 
Date:   Thu Jul 3 11:58:50 2025 -0700

fold: Change comparison of error_mark_node to use error_operand_p in
tree_expr_nonnegative_warnv_p [PR118948]

This is an obvious fix for this small regression. Basically after
r15-328-g5726de79e2154a,
there is a call to tree_expr_nonnegative_warnv_p where the type of the
expression is now
error_mark_node. Though there was only a check if the expression was
error_mark_node.

Bootstrapped and tested on x86_64-linux-gnu.

PR c/118948

gcc/ChangeLog:

* fold-const.cc (tree_expr_nonnegative_warnv_p): Use
error_operand_p instead of checking for error_mark_node directly.

gcc/testsuite/ChangeLog:

* gcc.dg/pr118948-1.c: New test.

Signed-off-by: Andrew Pinski 
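
The difference between the two checks named in the commit message can be
sketched with a toy model. This is not GCC's tree representation: the `node`
struct, `error_mark` and both helper functions are hypothetical stand-ins. The
point is only that a check on the expression alone misses an expression whose
type is the error node.

```cpp
// Toy model: a node that optionally links to a node describing its type.
struct node {
    const node *type;
};

// Stand-in for error_mark_node.
static const node error_mark = {nullptr};

// Narrow check: true only when the expression itself is the error node.
bool is_error_node(const node *t) {
    return t == &error_mark;
}

// Broader check, in the spirit of error_operand_p: also true when the
// expression's type is the error node.
bool is_error_operand(const node *t) {
    return t == &error_mark || (t && t->type == &error_mark);
}
```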

[Bug c/118948] [15/16 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in tree_single_nonnegative_warnv_p, at fold-const.cc:14878 since r15-328

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118948

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|15.2|16.0

--- Comment #8 from Andrew Pinski  ---
Fixed on the trunk

[Bug target/117850] GCC emits DUP, UMULL instead of UMULL2

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117850

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #7 from Andrew Pinski  ---
.

[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec

2025-07-04 Thread kargls at comcast dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958

kargls at comcast dot net changed:

   What|Removed |Added

 CC||kargls at comcast dot net

--- Comment #2 from kargls at comcast dot net ---
(In reply to Martin Jambor from comment #0)
>
> The problem is that tree-sra is told by escape analysis that the
> address of the first parameter of mpi_irecv does not escape.  And so
> the aggregate passed in that parameter is an SRA candidate and is
> broken down into scalar components and these are reloaded immediately
> after the function returns and not after a call of mpi_wait.

I don't know what SRA is.  What does it mean for a parameter to
escape or not escape?  In particular, this is likely not
restricted to just Fortran 77.  It may affect all versions of
Fortran.

[Bug tree-optimization/120358] [15/16 regression] qtbase-6.9.0 miscompiled since r15-580-gf3e5f4c58591f5

2025-07-04 Thread holger--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120358

--- Comment #27 from Holger Hoffstätte  ---
If I add lifetime-extending printfs to QStringTokenizerBase::next() like this:

--- qstringtokenizer.h  2025-07-04 17:56:28.523676630 +0200
+++ qstringtokenizer-printf.h   2025-07-04 17:56:03.998840901 +0200
@@ -389,11 +389,14 @@ auto QStringTokenizerBase return final element:
 result = m_haystack.sliced(state.start);
+__builtin_printf("\nfinal: '%s'\n", result.toString().toStdString().c_str());
 }
 if ((m_sb & Qt::SkipEmptyParts) && result.isEmpty()) {
+__builtin_printf("\nskipping empty result: '%s'\n", result.toString().toStdString().c_str());
 continue;
 }
 return {result, true, state};

..it starts to work. Removing any one of these printfs breaks the lifetime of
the intermediate "Haystack result" and leads to garbage when trying to find the
last element.

[Bug libstdc++/120949] [16 regression] rejected with clang-20.1.7

2025-07-04 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120949

Jonathan Wakely  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2025-07-04

[Bug target/120957] [16 Regression] 6-9% slowdown of 503.bwaves_r on Zen{2,3} since r16-1647-gc06979ff957485

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120957

--- Comment #1 from Filip Kastl  ---
The slowdown is also present on 410.bwaves from 2006 SPEC
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=467.40.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=301.40.0
again, both on Zen2 and Zen3

[Bug target/120943] [16 Regression] 5% slowdown of 527.cam4_r on Zen{4,5} since r16-1643-gd073bb6cfc219d

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120943

--- Comment #3 from Filip Kastl  ---
(In reply to H.J. Lu from comment #1)
> Please try:
> 
> https://patchwork.sourceware.org/project/gcc/list/?series=48886

Yes, if I apply this patch, the slowdown goes away

[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

--- Comment #6 from Andrew Pinski  ---
use_internal_fn has:
```
  /* Skip the call if LHS == LHS.  If we reach here, EDOM is the only
 valid errno value and it is used iff the result is NaN.  */
  conds.quick_push (gimple_build_cond (EQ_EXPR, lhs, lhs,
   NULL_TREE, NULL_TREE));

```
This is only a problem because `a == a` can trap on signaling NaNs and, with
non-call exceptions enabled, can therefore throw.

The easiest correct fix is s/EQ_EXPR/ORDERED_EXPR/: without signaling NaNs the
two are equivalent, and with signaling NaNs ORDERED_EXPR is non-trapping, so it
works here.
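
The value-level equivalence can be sketched in plain C++. This is not GCC
internals: the two helper names are made up, and `std::isunordered` is used
only as a source-level analogue of an ordered test. The sketch shows that both
forms give the same answer for ordinary values and quiet NaNs; it does not
demonstrate anything about trapping behaviour.

```cpp
#include <cmath>

// "Not a NaN" written as a self-comparison, like LHS == LHS.
bool not_nan_by_eq(double x) {
    return x == x;
}

// The same predicate written as an ordered test: x is ordered with itself
// exactly when it is not a NaN.
bool not_nan_by_ordered(double x) {
    return !std::isunordered(x, x);
}
```

Both functions assume the default floating-point model; options such as
-ffinite-math-only allow the compiler to fold either test away.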

[Bug libstdc++/118681] [C++17] unsynchronized_pool_resource may fail to respect alignment

2025-07-04 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118681

Jonathan Wakely  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
   Target Milestone|--- |13.5
 Status|NEW |ASSIGNED

[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec

2025-07-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958

--- Comment #3 from Martin Jambor  ---
SRA is the "Scalar Replacement of Aggregates" pass.  It is one of our
optimization passes which can split up aggregates into scalar
components and thus allow further optimizations.

To "escape" means that the address of the parameter is stored
somewhere in the global state of the program during the execution of
the called function and so it can be modified through an alias or in a
subsequent call to some function.
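
A minimal C++ sketch of why this matters for an asynchronous receive follows.
It is not MPI and not what GCC generates: `fake_irecv`, `fake_wait` and the
global `pending` pointer are hypothetical stand-ins. The first call stores the
buffer address in global state (the address escapes) and a later call writes
through it, so copying the buffer into scalars right after the first call
yields stale values.

```cpp
// Hypothetical stand-ins for an asynchronous receive and its completion.
static double *pending = nullptr;

// Remembers the buffer address; nothing is written yet.
void fake_irecv(double *buf) {
    pending = buf;
}

// Completes the "receive" by writing through the remembered address.
void fake_wait() {
    if (pending) {
        pending[0] = 1.5;
        pending[1] = 2.5;
        pending = nullptr;
    }
}

// Reads the buffer after the wait: sees the received values.
double sum_read_after_wait() {
    double tmp[2] = {0.0, 0.0};
    fake_irecv(tmp);
    fake_wait();
    return tmp[0] + tmp[1];
}

// Copies the buffer into scalars right after the first call, which is
// the effect of treating the address as non-escaping.
double sum_read_too_early() {
    double tmp[2] = {0.0, 0.0};
    fake_irecv(tmp);
    double t0 = tmp[0];
    double t1 = tmp[1];
    fake_wait();
    return t0 + t1;
}
```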

My knowledge of Fortran is limited but my understanding is that later
versions of Fortran introduced the ASYNCHRONOUS attribute to deal with
these situations.

[Bug tree-optimization/120951] [16 regression] error: gimple cond condition cannot throw

2025-07-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120951

--- Comment #7 from Andrew Pinski  ---
Created attachment 61800
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=61800&action=edit
Patch which I am testing

[Bug jit/120960] New: jit: The static initialization of builtin_data uses flag_openacc and flag_openmp (which have not been set yet)

2025-07-04 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120960

Bug ID: 120960
   Summary: jit: The static initialization of builtin_data uses
flag_openacc and flag_openmp (which have not been set
yet)
   Product: gcc
   Version: 15.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: jit
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: antoyo at gcc dot gnu.org
  Target Milestone: ---

In the jit "FE", the builtin_data array holding information about
builtin functions is a static array and is statically initialized using
the macros DEF_GOACC_BUILTIN_COMPILER, DEF_GOMP_BUILTIN_COMPILER and
others, which use flag_openacc and flag_openmp to initialize one of the
fields of the array's elements.

Unless I am missing something obvious, because this is done in a
static initializer, this happens before the flags have a chance to
hold any meaningful value - although I am not sure whether there is
such a thing as a meaningful value for these in the case of JIT.
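
The concern can be reproduced in miniature with the following C++ sketch. It
is not the jit code: `demo_flag`, `demo_builtin` and `demo_builtins` are
hypothetical stand-ins for the real flag variables and the builtin_data array.
A static table whose initializer reads a global flag captures whatever the
flag held during static initialization and does not follow later changes.

```cpp
// Hypothetical stand-in for an option variable such as flag_openmp.
int demo_flag = 0;

struct demo_builtin {
    const char *name;
    bool available;
};

// The initializer reads demo_flag once, during static initialization,
// before any option processing could have changed it.
static const demo_builtin demo_builtins[] = {
    {"demo_builtin_fn", demo_flag != 0},
};

// Simulates option processing that runs later and turns the flag on,
// then reports what the table recorded.
bool available_after_flag_set() {
    demo_flag = 1;
    return demo_builtins[0].available;
}
```

Here `available_after_flag_set()` returns false even though the flag is now
set, because the table entry was computed from the flag's initial value.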

Of course, feel free to close this as WONTFIX if this is not deemed to
be a problem for this use case.

[Bug fortran/120958] tree-sra "miscompiles" asynchronous MPI (mpi_irecv) in Fortran 77 because of wrong fnspec

2025-07-04 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958

--- Comment #4 from rguenther at suse dot de  ---
> On 04.07.2025 at 18:18, jamborm at gcc dot gnu.org wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120958
> 
> --- Comment #3 from Martin Jambor  ---
> SRA is the "Scalar Replacement of Aggregates" pass.  It is one of our
> optimization passes which can split up aggregates into scalar
> components and thus allow further optimizations.
> 
> To "escape" means that the address of the parameter is stored
> somewhere in the global state of the program during the execution of
> the called function and so it can be modified through an alias or in a
> subsequent call to some function.
> 
> My knowledge of Fortran is limited but my understanding is that later
> versions of Fortran introduced the ASYNCHRONOUS attribute to deal with
> these situations.

In particular Fortran 77 doesn’t have POINTER, but Fortran 77 can call C
functions (though not in a standard-defined way).  It is probably uncommon that
this case happens on automatic variables.  I'm unsure whether we can do better
than either ignoring the problem or not setting any fnspec for all functions in
legacy/f77 standard programs.


[Bug target/120941] [16 Regression] 24-40% slowdown of 519.lbm_r on Zen2 since r16-1644-gaba3b9d3a48a07

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

--- Comment #7 from Filip Kastl  ---
(In reply to Filip Kastl from comment #0)
> there was a 40% exec time slowdown (on another machine I measured only 24%)
> of 527.cam4_r SPEC 2017 benchmark when run with -Ofast -march=native -flto

and this should have said 519.lbm_r instead of 527.cam4_r.  Apparently I was
confused when writing this.

[Bug target/120941] [16 Regression] 10-40% slowdown of 519.lbm_r on Zen2 since r16-1644-gaba3b9d3a48a07

2025-07-04 Thread pheeck at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

Filip Kastl  changed:

   What|Removed |Added

Summary|[16 Regression] 10-40%  |[16 Regression] 10-40%
   |slowdown of 519.lbm_r on|slowdown of 519.lbm_r on
   |Zen{2,3} since  |Zen2 since
   |r16-1644-gaba3b9d3a48a07|r16-1644-gaba3b9d3a48a07

--- Comment #6 from Filip Kastl  ---
(In reply to Haochen Jiang from comment #5)
> I have tried on Zen3 Client machine with -Ofast -flto -march=native/znver2
> and see no regression on both options before and after the commit.

Ah, my bad.  The Zen3 slowdown must be something else.  I forgot to check if
the dates match and they don't (the Zen3 slowdown happened earlier than the
Zen2 slowdown).  So this seems to be a Zen2-only issue.  Sorry for that.

[Bug c++/120954] [15 Regression] False positive -Warray-bounds=2 warning

2025-07-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120954

--- Comment #3 from Sam James  ---
15.1: https://godbolt.org/z/3bM5bEY7b
14.3: https://godbolt.org/z/fahjWYEfr
13.4: https://godbolt.org/z/KK3bh5Gz4
12.4: https://godbolt.org/z/fM3c913qh

However, when compiling it as C++, I only see it on trunk (but with builds of
the branches that are a few days old).
