[Bug c++/119102] New: GCC 15.0 'import std;' fails with Ofast (not with O3) due to some openmp internal error

2025-03-03 Thread igor.machado at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119102

Bug ID: 119102
   Summary: GCC 15.0 'import std;' fails with Ofast (not  with O3)
due to some openmp internal error
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: igor.machado at gmail dot com
  Target Milestone: ---

I successfully used GCC 15.0 to compile several programs with C++ modules, but I
noticed that it fails with -Ofast, which does not happen with -O0, -O1, -O2 or -O3.
The error mentions -fopenmp, which I don't use. The code is trivial,
just:

file: main.cpp
```
import std;
int main() {  return 0; }
```

Compiling with -Ofast FAILS:
g++-15 -std=c++23 -Ofast -fmodules -fsearch-include-path bits/std.cc main.cpp
-o example

```
In module imported at ./main.cpp:1:1:
std: error: module contains OpenMP, use ‘-fopenmp’ to enable
std: error: failed to read compiled module: Bad file data
std: note: compiled module file is ‘gcm.cache/std.gcm’
std: fatal error: returning to the gate for a mechanical issue
compilation terminated.
```

This works fine:
g++-15 -std=c++23 -O3 -fmodules -fsearch-include-path bits/std.cc main.cpp -o
example

Even when `-fopenmp` is enabled, compiler breaks:

```
$ g++-15 -std=c++23 -O3 -fopenmp -fmodules -freport-bug -fsearch-include-path
bits/std.cc main.cpp -o example

/usr/include/c++/15/bits/std.cc:37:8: internal compiler error: in decl_node, at
cp/module.cc:8808
   37 | export module std;
  |^~
0x73d907e2a1c9 __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x73d907e2a28a __libc_start_main_impl
../csu/libc-start.c:360
Please submit a full bug report, with preprocessed source.
Please include the complete backtrace with any bug report.
See  for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
```

Version is: g++-15 (Ubuntu 15-20250213-1ubuntu1) 15.0.1 20250213 (experimental)
[master r15-7502-g26baa2c09b3]

Operating System is Ubuntu 24.04, with gcc-15 from Ubuntu 25.04 repo:
- deb http://cz.archive.ubuntu.com/ubuntu plucky main universe

I know that CXX Modules and "import std;" is still quite experimental, but it
seemed strange to generate a compiler bug, that's why I'm reporting.

Good luck!

[Bug c++/119102] GCC 15.0 'import std;' fails with Ofast (not with O3) due to some openmp internal error

2025-03-03 Thread igor.machado at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119102

--- Comment #1 from Igor Machado Coelho  ---
Note that building the std module with -O3 first and then compiling main.cpp
with -Ofast works fine:

g++-15 -std=c++23 -O3 -fmodules -fsearch-include-path bits/std.cc main.cpp -o
example

# this generates gcm.cache/std.gcm

Then... use Ofast:

g++-15 -std=c++23 -Ofast -fmodules main.cpp -o example

This does not break the compiler... so it's indeed something related to
compiling the std.cc with Ofast.

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #6 from Iain Sandoe  ---
is this related to or maybe a dup of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117364 ?

[Bug c++/119073] [15 Regression] ICE in cp_gimplify_expr, at cp/cp-gimplify.cc:911 with temporary vector in range-for with -std=c++23 since r15-7481

2025-03-03 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119073

Jason Merrill  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jason at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #7 from Jakub Jelinek  ---
Most likely yes.

[Bug fortran/103391] [12/13/14/15 Regression] ICE: gimplification failed since r7-4021-g574284e9c49687d8

2025-03-03 Thread paul.richard.thomas at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103391

--- Comment #9 from paul.richard.thomas at gmail dot com  ---
That was a question at the end,  not a statement :-) I cannot see anything
wrong with the test case but wondered if one of the more eagle-eyed of us
could see a standardese problem with it.

Have you had any experience with ChatGPT or similar? I was wondering
whether or not it is up to the resolution of standard questions.

Cheers

Paul


On Mon, 3 Mar 2025 at 14:34, vehre at gcc dot gnu.org <
gcc-bugzi...@gcc.gnu.org> wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103391
>
> Andre Vehreschild  changed:
>
>What|Removed |Added
>
> 
>Assignee|unassigned at gcc dot gnu.org  |vehre at gcc dot
> gnu.org
>  Status|NEW |ASSIGNED

[Bug fortran/77872] [12/13/14/15 Regression] ICE in gfc_conv_descriptor_token, at fortran/trans-array.c:305

2025-03-03 Thread vehre at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77872

Andre Vehreschild  changed:

   What|Removed |Added

 Status|ASSIGNED|WAITING

--- Comment #15 from Andre Vehreschild  ---
Awaiting review at:
https://gcc.gnu.org/pipermail/fortran/2025-March/061822.html

[Bug rtl-optimization/119071] [12/13/14/15 Regression] Miscompile at -O2 since r10-7268

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119071

--- Comment #10 from Jakub Jelinek  ---
cse1 optimizes insn 15 away in:
(insn 10 9 11 2 (parallel [
(set (reg:SI 84 [ _3 ])
(plus:SI (reg:SI 96)
(reg:SI 92 [ _18 ])))
(clobber (reg:CC 17 flags))
]) 185 {*addsi_1}
 (nil))
(insn 11 10 12 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 84 [ _3 ])
(const_int 1 [0x1]))) 11 {*cmpsi_1}
 (nil))
(insn 12 11 13 2 (set (reg:QI 98)
(ne:QI (reg:CCZ 17 flags)
(const_int 0 [0]))) 732 {*setcc_qi}
 (nil))
(insn 13 12 14 2 (set (reg:SI 97)
(zero_extend:SI (reg:QI 98))) 119 {*zero_extendqisi2}
 (nil))
(insn 14 13 15 2 (set (reg:SI 89 [ _15 ])
(reg:SI 97)) 67 {*movsi_internal}
 (nil))
(insn 15 14 16 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 84 [ _3 ])
(const_int 1 [0x1]))) 11 {*cmpsi_1}
 (nil))
(insn 16 15 17 2 (set (reg:QI 100)
(eq:QI (reg:CCZ 17 flags)
(const_int 0 [0]))) 732 {*setcc_qi}
 (nil))
which looks reasonable, nothing clobbers flags in between and insn 11 sets it
to the same value.

I think things go wrong during combine.
Before that revision we have
(insn 5 2 6 2 (set (reg:CCZ 17 flags)
(compare:CCZ (mem/c:SI (symbol_ref:DI ("a") [flags 0x2]  ) [1 a+0 S4 A32])
(const_int -2 [0xfffe]))) 11 {*cmpsi_1}
 (nil))
...
(insn 18 17 19 2 (set (reg:SI 97)
(eq:SI (reg:CCZ 17 flags)
(const_int 0 [0]))) 731 {*setcc_si_1_movzbl}
 (expr_list:REG_DEAD (reg:CC 17 flags)
(nil)))
(insn 19 18 20 2 (set (reg:SI 102)
(ne:SI (reg:CCZ 17 flags)
(const_int 0 [0]))) 731 {*setcc_si_1_movzbl}
 (expr_list:REG_DEAD (reg:CC 17 flags)
(nil)))
(insn 20 19 21 2 (parallel [
(set (reg:SI 93 [  ])
(minus:SI (reg:SI 102)
(reg:SI 97)))
(clobber (reg:CC 17 flags))
]) 254 {*subsi_1}
 (expr_list:REG_DEAD (reg:SI 102)
(expr_list:REG_DEAD (reg:SI 97)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)
(insn 21 20 25 2 (set (mem/c:SI (symbol_ref:DI ("b") [flags 0x2]  ) [1 b+0 S4 A32])
(reg:SI 93 [  ])) 67 {*movsi_internal}
 (nil))
which feels reasonable.  The testcase has UB when a == -2 (left shift by -1)
and otherwise sets b to 1.
But starting with r10-7268 combiner combines this into
(insn 5 2 6 2 (set (reg:CCZ 17 flags)
(compare:CCZ (mem/c:SI (symbol_ref:DI ("a") [flags 0x2]  ) [1 a+0 S4 A32])
(const_int -2 [0xfffe]))) 11 {*cmpsi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
...
(insn 20 19 21 2 (set (reg:SI 93 [  ])
(const_int 0 [0])) 67 {*movsi_internal}
 (nil))
(insn 21 20 25 2 (set (mem/c:SI (symbol_ref:DI ("b") [flags 0x2]  ) [1 b+0 S4 A32])
(reg:SI 93 [  ])) 67 {*movsi_internal}
 (nil))
which is wrong for the non-UB case of a not being -2.
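
The pre-combine RTL quoted above computes the value stored to b as (a != -2) minus (a == -2), while the post-combine RTL stores a constant 0. A minimal C++ model of that arithmetic might look like the sketch below. It is not the PR's testcase (which is not shown in this comment), and the function names are hypothetical.

```cpp
#include <cassert>

// Toy model of insns 18-21 before r10-7268 (names are hypothetical):
// r97 = (a == -2), r102 = (a != -2), r93 = r102 - r97, b = r93.
int b_before_combine(int a) {
    int r97 = (a == -2);
    int r102 = (a != -2);
    return r102 - r97;
}

// Toy model of the combined sequence: insn 20 loads the constant 0,
// so the stored value no longer depends on a.
int b_after_combine(int /*a*/) {
    return 0;
}

// Small self-check: for any a other than -2 the two models disagree.
void check_models() {
    assert(b_before_combine(7) == 1);
    assert(b_after_combine(7) == 0);
}
```

Under this model every input other than -2 yields 1 before the combination and 0 after it, which is the discrepancy the comment describes.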

[Bug rtl-optimization/119099] [15 regression] Compile-time hang in ext-dce

2025-03-03 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119099

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2025-03-03
 Status|UNCONFIRMED |ASSIGNED

--- Comment #3 from Jeffrey A. Law  ---
Bi-directional dataflow is notoriously hard to get correct and I have zero
confidence this code handles that reasonably.  I thought I had some checks for
this, though I don't immediately see them.

I see 2-3 WTF things going on, but as Alexey noted, the key one is the
expansion and contraction of the sets.

[Bug rtl-optimization/119071] [12/13/14/15 Regression] Miscompile at -O2 since r10-7268

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119071

Jakub Jelinek  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek  ---
So, I think the problematic combination is
Trying 13, 18, 17 -> 19:
   13: r97:SI=flags:CCZ!=0
   18: {r101:SI=-r97:SI;clobber flags:CC;}
  REG_UNUSED flags:CC
   17: r99:SI=flags:CCZ==0
  REG_DEAD flags:CCZ
   19: {r102:SI=r99:SI<

[Bug rtl-optimization/118739] [15 Regression] wrong code at -O{s,3} with "-fno-tree-forwprop -fno-tree-vrp" on x86_64-linux-gnu since r15-268-g9dbff9c05520a7

2025-03-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118739

--- Comment #18 from GCC Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:a92dc3fe31c95d56019b2fb95a58414bca06241f

commit r15-7793-ga92dc3fe31c95d56019b2fb95a58414bca06241f
Author: Uros Bizjak 
Date:   Wed Feb 12 11:19:57 2025 +0100

combine: Discard REG_UNUSED note in i2 when register is also referenced in
i3 [PR118739]

The combine pass is trying to combine:

Trying 16, 22, 21 -> 23:
   16: r104:QI=flags:CCNO>0
   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
  REG_UNUSED flags:CC
   21: r119:QI=flags:CCNO<=0
  REG_DEAD flags:CCNO
   23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
  REG_DEAD r120:QI
  REG_DEAD r119:QI
  REG_UNUSED flags:CC

and creates the following two insn sequence:

modifying insn i2    22: r104:QI=flags:CCNO>0
  REG_DEAD flags:CC
deferring rescan insn with uid = 22.
modifying insn i3    23: r110:QI=flags:CCNO<=0
  REG_DEAD flags:CC
deferring rescan insn with uid = 23.

where the REG_DEAD note in i2 is not correct, because the flags
register is still referenced in i3.  In try_combine() megafunction,
we have this part:

--cut here--
/* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
if (i3notes)
  distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
                    elim_i2, elim_i1, elim_i0);
if (i2notes)
  distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
                    elim_i2, elim_i1, elim_i0);
if (i1notes)
  distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
                    elim_i2, local_elim_i1, local_elim_i0);
if (i0notes)
  distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
                    elim_i2, elim_i1, local_elim_i0);
if (midnotes)
  distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
                    elim_i2, elim_i1, elim_i0);
--cut here--

where the compiler distributes REG_UNUSED note from i2:

   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
  REG_UNUSED flags:CC

via distribute_notes() using the following:

--cut here--
  /* Otherwise, if this register is used by I3, then this register
     now dies here, so we must put a REG_DEAD note here unless there
     is one already.  */
  else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3))
           && ! (REG_P (XEXP (note, 0))
                 ? find_regno_note (i3, REG_DEAD,
                                    REGNO (XEXP (note, 0)))
                 : find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
    {
      PUT_REG_NOTE_KIND (note, REG_DEAD);
      place = i3;
    }
--cut here--

Flags register is used in I3, but there already is a REG_DEAD note in I3.
The above condition doesn't trigger and continues in the "else" part where
REG_DEAD note is put to I2.  The proposed solution corrects the above
logic to trigger every time the register is referenced in I3, avoiding the
"else" part.

PR rtl-optimization/118739

gcc/ChangeLog:

* combine.cc (distribute_notes) : Correct the
logic when the register is used by I3.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr118739.c: New test.
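
The decision described in the commit message above can be pictured with a toy model. The sketch below is not GCC's code: the enum, the function names and the two boolean inputs are hypothetical stand-ins for `reg_referenced_p` and the `find_reg_note` lookup. What happens to the note when I3 already carries REG_DEAD (here it is dropped) is an assumption of the sketch, not something the message states.

```cpp
#include <cassert>

// Where the converted REG_DEAD note ends up in this toy model.
enum class Place { None, I2, I3 };

// Old condition: the I3 branch is taken only when I3 references the
// register AND has no REG_DEAD note yet.  Otherwise control reaches the
// "else" part, modeled here as attaching REG_DEAD to I2.
Place old_placement(bool referenced_in_i3, bool i3_has_reg_dead) {
    if (referenced_in_i3 && !i3_has_reg_dead)
        return Place::I3;
    return Place::I2;
}

// Proposed logic as described: whenever I3 references the register, the
// "else" part is avoided.  Dropping the note when I3 already has
// REG_DEAD is an assumption of this sketch.
Place proposed_placement(bool referenced_in_i3, bool i3_has_reg_dead) {
    if (referenced_in_i3)
        return i3_has_reg_dead ? Place::None : Place::I3;
    return Place::I2;
}

// The PR118739 situation: flags is referenced in I3 and I3 already has
// a REG_DEAD note for it.
void check_pr118739_case() {
    assert(old_placement(true, true) == Place::I2);        // wrong note on I2
    assert(proposed_placement(true, true) == Place::None); // no note on I2
}
```

In this model the two versions differ only for the input (referenced, already noted), which is the case the commit message singles out.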

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #5 from Jakub Jelinek  ---
I'm unsure if this should be a P1, the P1-ish part on this is solely that a
test was added that ICEs, but the test ICEd before for several years as well.

[Bug tree-optimization/117919] [14/15 Regression] ICE: in propagate, at gimple-ssa-sccopy.cc:625 with -O -fno-tree-forwprop -fnon-call-exceptions --param=early-inlining-insns=192

2025-03-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117919

--- Comment #7 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Filip Kastl:

https://gcc.gnu.org/g:6ffbc711afbda9446df51fd2b542ecd61853283d

commit r14-11373-g6ffbc711afbda9446df51fd2b542ecd61853283d
Author: Filip Kastl 
Date:   Sun Mar 2 06:39:17 2025 +0100

gimple: sccopy: Prune removed statements from SCCs [PR117919]

While writing the sccopy pass I didn't realize that 'replace_uses_by ()' can
remove portions of the CFG.  This happens when replacing arguments of some
statement results in the removal of an EH edge.  Because of this sccopy can
then work with GIMPLE statements that aren't part of the IR anymore.  In
PR117919 this triggered an assertion within the pass which assumes that
statements the pass works with are reachable.

This patch tells the pass to notice when a statement isn't in the IR anymore
and remove it from its worklist.

PR tree-optimization/117919

gcc/ChangeLog:

* gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Prune
statements that 'replace_uses_by ()' removed.

gcc/testsuite/ChangeLog:

* g++.dg/pr117919.C: New test.

Signed-off-by: Filip Kastl 
(cherry picked from commit 5349aa2accdf34a7bf9cabd1447878aaadfc0e87)

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #8 from Iain Sandoe  ---
The comments in PR117364 lead me to believe that this is a problem downstream
of the FE that happens to be exposed frequently by coroutines (since we need to
populate  because of the phasing required of that with the initial
suspend).

Eric seemed to agree that NRVO could just as easily result in the same
circumstance.

As of now I am not sure how to proceed... if this is to become a P1 we need to
discuss how to meet the constraints.

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #9 from Jakub Jelinek  ---
Looking at the simpler
struct C { void *p; explicit C (void *p) : p(p) {} };

C foo (int i, void *p) { C c (p); return c; }
test, -O2 -m32 vs. -O2 -m64 -mptr64 the reason why  is used in the
first case and not in the latter is in want_nrvo_p.
can_do_nrvo_p is true in both cases, but aggregate_value_p (functype,
current_function_decl) is true only in the former case, not in the latter.
So, the question is what is going on during the coroutine handling that NRVO is
still used in a function which uses pretty much the same class.

[Bug ipa/119093] ICE: in function_and_variable_visibility, at ipa-visibility.cc:715 with weakref to target_clone

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119093

Richard Biener  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code

--- Comment #1 from Richard Biener  ---
We need a ice-on-dubious-code ;)

[Bug fortran/118747] [15 Regression]: seg fault on accessing an elemental procedure dummy argument's deferred-length component

2025-03-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118747

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Andre Vehreschild :

https://gcc.gnu.org/g:43c11931acc50f3a44efb485b03e6a8d44df97e0

commit r15-7789-g43c11931acc50f3a44efb485b03e6a8d44df97e0
Author: Andre Vehreschild 
Date:   Wed Feb 26 14:30:13 2025 +0100

Fortran: Fix regression on double free on elemental function [PR118747]

Fix a regression where adding a temporary variable inserted a copy of the
argument to the elemental function.  That copy was then later used to
free allocated memory, but the freeing was not tracked in the source
array correctly.

PR fortran/118747

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_trans_array_ctor_element): Remove copy to
temporary variable.
* trans-expr.cc (gfc_conv_procedure_call): Use references to
array members instead of copies when freeing after use.
Formatting fix.

gcc/testsuite/ChangeLog:

* gfortran.dg/alloc_comp_auto_array_4.f90: New test.

[Bug go/119098] GO built from GCC 14 sources no longer works when installing libgo23 build from GCC 15

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119098

--- Comment #2 from Richard Biener  ---
In the Makefile I see

version.go: s-version; @true
s-version: Makefile
rm -f version.go.tmp
echo "package sys" > version.go.tmp
echo 'const GccgoToolDir = "$(libexecsubdir)"' >> version.go.tmp
echo 'const StackGuardMultiplierDefault = 1' >> version.go.tmp
$(SHELL) $(srcdir)/mvifdiff.sh version.go.tmp version.go
$(STAMP) $@

[Bug go/119098] GO built from GCC 14 sources no longer works when installing libgo23 build from GCC 15

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119098

--- Comment #3 from Richard Biener  ---
So for GCC 15 I suggest to bump the SONAME.  But this behavior really looks odd
with bad separation of compiler driver and runtime?

[Bug c++/119076] [15 Regression] ICE with Segmentation fault with modules due to char array in a template

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119076

--- Comment #10 from Jakub Jelinek  ---
Created attachment 60643
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60643&action=edit
gcc15-pr119076.patch

Untested fix.

[Bug target/119083] Remove SSE_FIRST_REG from ix86_class_likely_spilled_p

2025-03-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119083

--- Comment #7 from Hongtao Liu  ---
(In reply to Hongtao Liu from comment #5)
> (In reply to H.J. Lu from comment #3)
> > Created attachment 60640 [details]
> > A patch to remove SSE_FIRST_REG from ix86_class_likely_spilled_p
> > 
> > Hongtao, can you measure its impact on SPEC CPU2017?
> 
> Sure.

No big impact for this.

[Bug c/118983] I'm using the gcc comes from the Ubuntu 20.04, but it faied to compile a C program

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118983

--- Comment #3 from Andrew Pinski  ---
*** Bug 119095 has been marked as a duplicate of this bug. ***

[Bug c/119095] GCC in Ubuntu 20.04, 22.04 and 24.04 all have this problem.

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119095

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
Without a full testcase, it is hard to help you. Also PR 118983 was the one you
filed.

Plus, we don't support GCC builds provided by a distro; it is the distro that
supports them.

*** This bug has been marked as a duplicate of bug 118983 ***

[Bug target/119090] [MAME] [Model 1] 3D graphics are full of glitches if built with CXXFLAGS="-march=native -mtune=native"

2025-03-03 Thread redwindwanderer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119090

--- Comment #3 from Manuel Alfayate  ---
 WHAT "-march=native -mtune=native" ENABLES ON MY SYSTEM 

gcc -march=native -mtune=native -Q --help=target

The following options are target specific:
  -m128bit-long-double            [enabled]
  -m16                            [disabled]
  -m32                            [disabled]
  -m3dnow                         [disabled]
  -m3dnowa                        [disabled]
  -m64                            [enabled]
  -m80387                         [enabled]
  -m8bit-idiv                     [disabled]
  -m96bit-long-double             [disabled]
  -mabi=                          sysv
  -mabm                           [enabled]
  -maccumulate-outgoing-args      [disabled]
  -maddress-mode=                 long
  -madx                           [enabled]
  -maes                           [enabled]
  -malign-data=                   compat
  -malign-double                  [disabled]
  -malign-functions=              0
  -malign-jumps=                  0
  -malign-loops=                  0
  -malign-stringops               [enabled]
  -mamx-bf16                      [disabled]
  -mamx-complex                   [disabled]
  -mamx-fp16                      [disabled]
  -mamx-int8                      [disabled]
  -mamx-tile                      [disabled]
  -mandroid                       [disabled]
  -mapx-features=                 none
  -mapx-inline-asm-use-gpr32      [disabled]
  -mapxf                          [disabled]
  -march=                         znver4
  -masm=                          att
  -mavx                           [enabled]
  -mavx10.1                       -mavx10.1-256
  -mavx10.1-256                   [disabled]
  -mavx10.1-512                   [disabled]
  -mavx2                          [enabled]
  -mavx256-split-unaligned-load   [disabled]
  -mavx256-split-unaligned-store  [disabled]
  -mavx5124fmaps                  [disabled]
  -mavx5124vnniw                  [disabled]
  -mavx512bf16                    [enabled]
  -mavx512bitalg                  [enabled]
  -mavx512bw                      [enabled]
  -mavx512cd                      [enabled]
  -mavx512dq                      [enabled]
  -mavx512er                      [disabled]
  -mavx512f                       [enabled]
  -mavx512fp16                    [disabled]
  -mavx512ifma                    [enabled]
  -mavx512pf                      [disabled]
  -mavx512vbmi                    [enabled]
  -mavx512vbmi2                   [enabled]
  -mavx512vl                      [enabled]
  -mavx512vnni                    [enabled]
  -mavx512vp2intersect            [disabled]
  -mavx512vpopcntdq               [enabled]
  -mavxifma                       [disabled]
  -mavxneconvert                  [disabled]
  -mavxvnni                       [disabled]
  -mavxvnniint16                  [disabled]
  -mavxvnniint8                   [disabled]
  -mbionic                        [disabled]
  -mbmi                           [enabled]
  -mbmi2                          [enabled]
  -mbranch-cost=<0,5>             3
  -mcall-ms2sysv-xlogues          [disabled]
  -mcet-switch                    [disabled]
  -mcld                           [disabled]
  -mcldemote                      [disabled]
  -mclflushopt                    [enabled]
  -mclwb                          [enabled]
  -mclzero                        [enabled]
  -mcmodel=                       [default]
  -mcmpccxadd                     [disabled]
  -mcpu=
  -mcrc32                         [enabled]
  -mcx16                          [enabled]
  -mdaz-ftz                       [disabled]
  -mdirect-extern-access          [enabled]
  -mdispatch-scheduler            [disabled]
  -mdump-tune-features            [disabled]
  -menqcmd                        [disabled]
  -mevex512                       [enabled]
  -mf16c                          [enabled]
  -mfancy-math-387                [enabled]
  -mfentry                        [disabled]
  -mfentry-name=
  -mfentry-section=
  -mfma                           [enabled]
  -mfma4                          [disabled]
  -mforce-drap                    [disabled]
  -mforce-indirect-call           [disabled]
  -mfp-ret-in-387                 [enabled]
  -mfpmath=                       sse
  -mfsgsbase                      [enabled]
  -mfunction-retur

[Bug c++/99538] ICE: in maybe_add_lambda_conv_op, at cp/lambda.c:1037

2025-03-03 Thread simartin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99538

Simon Martin  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |simartin at gcc dot 
gnu.org

--- Comment #2 from Simon Martin  ---
Working on this one.

[Bug lto/119067] [14/15 Regression] ICE when building firefox-135.0.1 with LTO (tree check: expected none of vector_type, have vector_type in odr_types_equivalent_p, at ipa-devirt.cc:1262)

2025-03-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119067

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:f22e89167b3abfbf6d67f42fc4d689d8ffdc1810

commit r15-7790-gf22e89167b3abfbf6d67f42fc4d689d8ffdc1810
Author: Richard Biener 
Date:   Mon Mar 3 09:54:15 2025 +0100

ipa/119067 - bogus TYPE_PRECISION check on VECTOR_TYPE

odr_types_equivalent_p can end up using TYPE_PRECISION on vector
types which is a no-go.  The following instead uses TYPE_VECTOR_SUBPARTS
for vector types so we also end up comparing the number of vector elements.

PR ipa/119067
* ipa-devirt.cc (odr_types_equivalent_p): Check
TYPE_VECTOR_SUBPARTS for vectors.

* g++.dg/lto/pr119067_0.C: New testcase.
* g++.dg/lto/pr119067_1.C: Likewise.

[Bug lto/119067] [14 Regression] ICE when building firefox-135.0.1 with LTO (tree check: expected none of vector_type, have vector_type in odr_types_equivalent_p, at ipa-devirt.cc:1262)

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119067

Richard Biener  changed:

   What|Removed |Added

  Known to work||15.0
Summary|[14/15 Regression] ICE when |[14 Regression] ICE when
   |building firefox-135.0.1|building firefox-135.0.1
   |with LTO (tree check:   |with LTO (tree check:
   |expected none of|expected none of
   |vector_type, have   |vector_type, have
   |vector_type in  |vector_type in
   |odr_types_equivalent_p, at  |odr_types_equivalent_p, at
   |ipa-devirt.cc:1262) |ipa-devirt.cc:1262)

--- Comment #13 from Richard Biener  ---
Fixed on trunk so far, I have queued a patch to sync hashing and streaming for
GCC 16.

[Bug lto/119067] [14/15 Regression] ICE when building firefox-135.0.1 with LTO (tree check: expected none of vector_type, have vector_type in odr_types_equivalent_p, at ipa-devirt.cc:1262)

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119067

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #11 from Richard Biener  ---
So the odr_types_equivalent_p code is obviously broken, but more interesting is
why we didn't merge the two at this point identical vector types.  This is
because their SCC (size one) hash is different, 4263663699 vs 4287848316 and
that is because of how we hash modes vs. how we stream them:

  if (CODE_CONTAINS_STRUCT (code, TS_TYPE_COMMON))
    {
      hstate.add_hwi (TYPE_MODE (t));

but

  /* For offloading, avoid streaming out TYPE_MODE for aggregate type since
     it may be host-specific. For eg, aarch64 uses OImode for ARRAY_TYPE
     whose size is 256-bits, which is not representable on accelerator.
     Instead stream out VOIDmode, and while streaming-in, recompute
     appropriate TYPE_MODE for accelerator.  */
  if (lto_stream_offload_p
      && (AGGREGATE_TYPE_P (expr) || VECTOR_TYPE_P (expr)))
    bp_pack_machine_mode (bp, VOIDmode);
  /* for VECTOR_TYPE, TYPE_MODE reevaluates the mode using target_flags
     not necessary valid in a global context.
     Use the raw value previously set by layout_type.  */
  else
    bp_pack_machine_mode (bp, TYPE_MODE_RAW (expr));

I have a fix for the ICE, leaving the two identical type copies around.

[Bug c/119092] Add support for clang/LLVM builtin __builtin_{reduce,elementwise}_*

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119092

Richard Biener  changed:

   What|Removed |Added

  Component|middle-end  |c

--- Comment #3 from Richard Biener  ---
__builtin_elementwise_atan - what do they do?  Use __has_builtin and then
assume there's a library implementation (with what ABI?).

IMO _iff_ we want to support those the only practical way at the moment
is to lower them in the frontend to scalar operations on vector extracts
and build up a vector from the results.

The reduction builtins look somewhat more useful (what does OpenCL have
here?), so I'd rather track both in separate bug reports.
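
The lowering suggested in the comment above (scalar operations on vector extracts, then building a vector from the results) can be illustrated by hand with the GNU vector extension. This is only a sketch of the idea, not what a front end would emit: the type name `v4sf` and the function name are hypothetical, and the element type and width are arbitrary choices.

```cpp
#include <cassert>
#include <cmath>

// GNU vector extension type: four floats in a 16-byte vector.
typedef float v4sf __attribute__((vector_size(16)));

// Hand-written stand-in for a lowered elementwise atan: extract each
// element, apply the scalar operation, and store it into the result.
v4sf elementwise_atan_lowered(v4sf v) {
    v4sf result;
    for (int i = 0; i < 4; ++i)
        result[i] = std::atan(v[i]);
    return result;
}

// Small self-check: atan(0) is exactly 0 in every lane.
void check_zero_input() {
    v4sf zeros = {0.0f, 0.0f, 0.0f, 0.0f};
    v4sf out = elementwise_atan_lowered(zeros);
    for (int i = 0; i < 4; ++i)
        assert(out[i] == 0.0f);
}
```

Written this way no vector math library or ABI is needed, since only the scalar `std::atan` is called; whether the result gets vectorized again is left to the optimizers.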

[Bug c/119095] New: GCC in Ubuntu 20.04, 22.04 and 24.04 all have this problem.

2025-03-03 Thread wzis at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119095

Bug ID: 119095
   Summary: GCC in Ubuntu 20.04, 22.04 and 24.04 all have this
problem.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wzis at hotmail dot com
  Target Milestone: ---

wzsgcreg.c:237:6: internal compiler error: in subspan, at input.h:68
  237 |  st.st_size, ptime, line);
  |  ^~
0x7f8f35858082 __libc_start_main
../csu/libc-start.c:308

As I tested, the issue happens on Ubuntu 20.04 and 24.04, which means it is
present from GCC 9.4.0 all the way to 13.3.0.
The C statement that it complains about is as follows:
sprintf(key, CERT_FMT_NORMAL,
  licCRC, MaxCertSize, sha384sumf(line),
  st.st_size, ptime, line);
Just for your info, this program has been compiled on many other versions of
Linux, and on AIX, Solaris and MacOS, with no issue.

I submitted this bug a few days ago, but I can't find it any more. In that
report I was told that it is a duplicate, that the original bug is fixed, and
that I should talk to Ubuntu; but when I asked Ubuntu about this, they didn't
recognize it.

[Bug tree-optimization/119096] New: Loop with conditional, cast and reduction vectorized incorrectly with AVX-512

2025-03-03 Thread someone12469 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119096

Bug ID: 119096
   Summary: Loop with conditional, cast and reduction vectorized
incorrectly with AVX-512
   Product: gcc
   Version: 14.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: someone12469 at gmail dot com
  Target Milestone: ---

The following C program, which should output 0, outputs 8 when compiled with
gcc -O2 -mavx512f on 64-bit Linux.

int printf(const char *, ...);
long sum(int* A, int* B)
{
    long total = 0;
    for(int j = 0; j < 16; j++)
        if((A[j] > 0) & (B[j] > 0))
            total += (long)A[j];
    return total;
}
int main()
{
    int A[16] = { 1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1 };
    int B[16] = { };
    printf("%ld\n", sum(A, B));
}

The singular & is intentional and significant; in the original program it was
used to hint that the read is safe, to get better vectorization. In the
resulting assembly for sum:

sum:
.LFB0:
        .cfi_startproc
        vmovdqu32       (%rdi), %zmm0
        vpxor           %xmm2, %xmm2, %xmm2
        vmovdqu32       (%rsi), %zmm3
        vpcmpd          $6, %zmm2, %zmm0, %k1
        vextracti64x4   $0x1, %zmm0, %ymm1
        vpmovsxdq       %ymm0, %zmm0
        vpmovsxdq       %ymm1, %zmm1
        vpcmpd          $6, %zmm2, %zmm3, %k1{%k1}
        vmovdqa64       %zmm1, %zmm2
        kshiftrw        $8, %k1, %k1
        vpaddq          %zmm1, %zmm0, %zmm2{%k1}
        vextracti64x4   $0x1, %zmm2, %ymm1
        vpaddq          %ymm2, %ymm1, %ymm1
        vextracti128    $0x1, %ymm1, %xmm0
        vpaddq          %xmm1, %xmm0, %xmm0
        vpsrldq         $8, %xmm0, %xmm1
        vpaddq          %xmm1, %xmm0, %xmm0
        vmovq           %xmm0, %rax
        vzeroupper
        ret
        .cfi_endproc

the main issue appears to be the "vpaddq %zmm1, %zmm0, %zmm2{%k1}", which keeps
the value from the lower half when the upper half is masked, even if the lower
half is masked as well.
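
To make the effect of the masked add concrete, here is a scalar sketch that sits next to the reference semantics. The second function is one reading of the listing, not compiler output. It assumes that after `kshiftrw $8` the mask covers elements 8..15, that `%zmm0` and `%zmm1` hold the sign-extended lower and upper halves of A, and that lanes with a clear mask bit keep the prior contents of `%zmm2`, which the listing copied from `%zmm1`. The function names are hypothetical.

```cpp
#include <cassert>

// Reference semantics of the reported loop: add A[j] only when both
// A[j] and B[j] are positive.
long reference_sum(const int *A, const int *B) {
    long total = 0;
    for (int j = 0; j < 16; j++)
        if ((A[j] > 0) & (B[j] > 0))
            total += A[j];
    return total;
}

// Scalar model of the listing under the assumptions stated above.
long modeled_listing_sum(const int *A, const int *B) {
    long total = 0;
    for (int i = 0; i < 8; i++) {
        // k1 after kshiftrw $8: mask bit of element i + 8.
        bool hi_mask = (A[i + 8] > 0) && (B[i + 8] > 0);
        long lo = A[i];      // lane of zmm0 (lower half, sign-extended)
        long hi = A[i + 8];  // lane of zmm1, also the old value of zmm2
        // Merge-masked vpaddq: masked-off lanes keep the old zmm2 value.
        long lane = hi_mask ? lo + hi : hi;
        total += lane;       // horizontal reduction of zmm2
    }
    return total;
}

// The inputs from the report: A is all ones, B is all zeros.
void check_report_inputs() {
    int A[16] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    int B[16] = {};
    assert(reference_sum(A, B) == 0);
    assert(modeled_listing_sum(A, B) == 8);
}
```

With the inputs from the report the model gives 8 where the reference gives 0, matching the observed output.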

Tested on x86_64-pc-linux-gnu on gcc 14.2.1 and a local build of the latest
commit. Since the bug reporting instructions insist, the output of gcc -v for
my distribution's 14.2.1:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure
--enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust
--enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues
--with-build-config=bootstrap-lto --with-linker-hash-style=gnu
--with-system-zlib --enable-__cxa_atexit --enable-cet=auto
--enable-checking=release --enable-clocale=gnu --enable-default-pie
--enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object
--enable-libstdcxx-backtrace --enable-link-serialization=1
--enable-linker-build-id --enable-lto --enable-multilib --enable-plugin
--enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch
--disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.2.1 20240910 (GCC)

[Bug c++/119097] New: Modules references internal linkage entity

2025-03-03 Thread hypengwip at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119097

Bug ID: 119097
   Summary: Modules references internal linkage entity
   Product: gcc
   Version: 14.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hypengwip at gmail dot com
  Target Milestone: ---

When a variable is declared inside an unnamed namespace in a header file and
then used in a struct definition, it causes a compilation error due to internal
linkage. The issue occurs when the struct is used both in a header (.h) and
inside a module (.cppm).  

However, using the same variable in a function inside a module (.cppm) does not
produce any error.

```
// error: ‘struct A’ references internal linkage entity ‘constexpr const int
// {anonymous}::default_val’

// part1.h


#pragma once

namespace {
static constexpr const int default_val { 0 };
}


// if A is defined here
struct A {
 int value { default_val }; // failed
 A() : value(default_val) {} // failed
 static auto a() { return default_val; } // failed
};


// part1.cppm

module;
#include "part1.h"
export module mymod;


// if A is defined here
struct A {
 int value { default_val }; // failed
 A() : value(default_val) {} // ok
 static auto a() { return default_val; } // ok
};

export auto a() { return default_val; } // ok

export void part1_fun(A) {}
```
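One possible workaround, sketched here under assumptions and not verified
against the affected GCC versions, is to give the constant external linkage so
that struct A no longer refers to an internal linkage entity. A namespace-scope
constexpr variable is implicitly const and therefore has internal linkage
unless it is declared inline (C++17). The namespace name part1_detail is
hypothetical and not part of the report:

```cpp
// part1.h (sketch): same shape as the header in the report, but the constant
// has external linkage. The namespace name part1_detail is hypothetical.
namespace part1_detail {
inline constexpr int default_val { 0 };  // 'inline' gives external linkage
}

struct A {
  int value { part1_detail::default_val };
  A() : value(part1_detail::default_val) {}
  static auto a() { return part1_detail::default_val; }
};
```

This only sidesteps the diagnostic; it does not answer whether GCC should
accept the original header.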

[Bug tree-optimization/119096] [14/15 regression] Loop with conditional, cast and reduction vectorized incorrectly with AVX-512

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119096

--- Comment #2 from Andrew Pinski  ---
The vectorizer looks ok though:
  mask_patt_37.15_52 = [vec_unpack_lo_expr] mask__9.14_51;
  mask_patt_37.15_53 = [vec_unpack_hi_expr] mask__9.14_51;
  vect_patt_36.18_57 = .COND_ADD (mask_patt_37.15_52, vect__10.16_54,
vect_total_21.17_56, vect__10.16_54);
  vect_patt_36.18_58 = .COND_ADD (mask_patt_37.15_53, vect__10.16_55,
vect_patt_36.18_57, vect__10.16_55);


 /* A ? B : B -> B.  */
 (simplify
  (cnd @0 @1 @1)
  @1)

Confirmed, I think COND_ADD  folding goes wrong.

[Bug tree-optimization/119096] Loop with conditional, cast and reduction vectorized incorrectly with AVX-512

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119096

--- Comment #1 from Andrew Pinski  ---
Gimple level:
  vect__4.8_45 = MEM  [(int *)A_15(D)];
  vect__10.16_54 = [vec_unpack_lo_expr] vect__4.8_45;
  vect__10.16_55 = [vec_unpack_hi_expr] vect__4.8_45;
  mask__5.9_46 = vect__4.8_45 > { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0 };
  vect__7.12_49 = MEM  [(int *)B_16(D)];
  mask__8.13_50 = vect__7.12_49 > { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0 };
  mask__9.14_51 = mask__5.9_46 & mask__8.13_50;
  mask_patt_37.15_53 = [vec_unpack_hi_expr] mask__9.14_51; // Only use the
upper half
  vect_patt_36.18_58 = .COND_ADD (mask_patt_37.15_53, vect__10.16_54,
vect__10.16_55, vect__10.16_55);
  _60 = .REDUC_PLUS (vect_patt_36.18_58); [tail call]

[Bug tree-optimization/119096] [14/15 regression] Loop with conditional, cast and reduction vectorized incorrectly with AVX-512

2025-03-03 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119096

Sam James  changed:

   What|Removed |Added

   Target Milestone|--- |14.3
  Known to work||13.3.1
  Known to fail||14.2.1, 15.0
Summary|Loop with conditional, cast |[14/15 regression] Loop
   |and reduction vectorized|with conditional, cast and
   |incorrectly with AVX-512|reduction vectorized
   ||incorrectly with AVX-512

[Bug tree-optimization/119096] [14/15 regression] Loop with conditional, cast and reduction vectorized incorrectly with AVX-512

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119096

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2025-03-03
 Ever confirmed|0   |1

[Bug go/119098] New: GO built from GCC 14 sources no longer works when installing libgo23 build from GCC 15

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119098

Bug ID: 119098
   Summary: GO built from GCC 14 sources no longer works when
installing libgo23 build from GCC 15
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: go
  Assignee: ian at airs dot com
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When the compiler runtime, which includes libgo23, is updated from the one
provided by GCC 14 to the one provided by GCC 15 (which has an unchanged libgo
SONAME), the Go compiler built from GCC 14 no longer works.

The symptom is

[   25s] go: no such tool "cgo"

I suspect that somehow (parts of?) the path to cgo are built into this
shared library instead of the driver for whatever reason.  Possibly
the GO runtime knows how to compile?!

Strace difference between shlibs when invoking gccgo-14 is thus

- 29430 newfstatat(AT_FDCWD, "/usr/lib64/gcc/x86_64-suse-linux/14/cgo", 

+ 29833 newfstatat(AT_FDCWD, "/usr/lib64/gcc/x86_64-suse-linux/15/cgo", 


It does feel somewhat like a deja-vu ..?

[Bug go/119098] GO built from GCC 14 sources no longer works when installing libgo23 build from GCC 15

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119098

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108057

--- Comment #1 from Richard Biener  ---
PR108057 also had the 'no such tool "cgo"' issue, but after other type issues.

[Bug tree-optimization/119096] [14/15 regression] Loop with conditional, cast and reduction vectorized incorrectly with AVX-512

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119096

--- Comment #3 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #2)
> The vectorizer looks ok though:
>   mask_patt_37.15_52 = [vec_unpack_lo_expr] mask__9.14_51;
>   mask_patt_37.15_53 = [vec_unpack_hi_expr] mask__9.14_51;
>   vect_patt_36.18_57 = .COND_ADD (mask_patt_37.15_52, vect__10.16_54,
> vect_total_21.17_56, vect__10.16_54);
>   vect_patt_36.18_58 = .COND_ADD (mask_patt_37.15_53, vect__10.16_55,
> vect_patt_36.18_57, vect__10.16_55);
> 
> 
>  /* A ? B : B -> B.  */
>  (simplify
>   (cnd @0 @1 @1)
>   @1)
> 
> Confirmed, I think COND_ADD  folding goes wrong.

Wait maybe the original COND_ADD is incorrect. I can't remember how COND_ADD
works. I thought it was `mask_patt_37.15_52 ?
(vect__10.16_54+vect_total_21.17_56) : vect__10.16_54` if so then the original
COND_ADD is wrong.
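
For reference, a minimal lane-wise model of the semantics being discussed,
assuming .COND_ADD (mask, a, b, else) yields a + b in lanes where the mask is
set and the else operand otherwise. The helper names below are hypothetical and
are not GCC code. The sketch contrasts a masked reduction step that falls back
to the accumulator with one that falls back to the addend, which is the shape
of the calls in the dump quoted above:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
using Lanes = std::array<std::int64_t, N>;

// Lane-wise model: result[i] = mask[i] ? a[i] + b[i] : else_val[i].
template <std::size_t N>
Lanes<N> cond_add(const std::array<bool, N>& mask, const Lanes<N>& a,
                  const Lanes<N>& b, const Lanes<N>& else_val)
{
    Lanes<N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = mask[i] ? a[i] + b[i] : else_val[i];
    return out;
}

// Masked reduction step whose fallback is the running total, so inactive
// lanes keep their accumulated value.
template <std::size_t N>
Lanes<N> step_keep_total(const std::array<bool, N>& mask,
                         const Lanes<N>& value, const Lanes<N>& total)
{
    return cond_add(mask, value, total, total);
}

// Same step, but the fallback is the addend (as in the quoted dump), so
// inactive lanes lose the running total and take the unselected value.
template <std::size_t N>
Lanes<N> step_keep_addend(const std::array<bool, N>& mask,
                          const Lanes<N>& value, const Lanes<N>& total)
{
    return cond_add(mask, value, total, value);
}
```

With mask {true, false}, value {5, 7} and total {100, 200}, the first variant
gives {105, 200} while the second gives {105, 7}.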

[Bug target/119090] [MAME] [Model 1] 3D graphics are full of glitches if built with CXXFLAGS="-march=native -mtune=native"

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119090

--- Comment #2 from Richard Biener  ---
Also can you specify what 'native' maps to for you?  What processor do you
have?

[Bug rtl-optimization/119099] [15 regression] Compile-time hang in ext-dce

2025-03-03 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119099

Sam James  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
Summary|Compile-time hang in DCE|[15 regression]
   ||Compile-time hang in
   ||ext-dce
   Keywords||compile-time-hog
   Target Milestone|--- |15.0

--- Comment #1 from Sam James  ---
We have at least one other PR about ext-dce's use of df.

[Bug tree-optimization/119096] [14/15 regression] Loop with conditional, cast and reduction vectorized incorrectly with AVX-512

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119096

--- Comment #5 from Richard Biener  ---
I'm testing

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index dc15b955aad..52533623cab 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9064,7 +9064,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
new_stmt = gimple_build_call_internal (internal_fn (code),
   op.num_ops,
   vop[0], vop[1], vop[2],
-  vop[1]);
+  vop[reduc_index]);
  else
new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
vop[0], vop[1], vop[2]);

[Bug fortran/68241] [meta-bug] [F03] Deferred-length character

2025-03-03 Thread vehre at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68241
Bug 68241 depends on bug 118747, which changed state.

Bug 118747 Summary: [15 Regression]: seg fault on accessing an elemental 
procedure dummy argument's deferred-length component
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118747

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/119057] [15 regression] ICE at -O{2,3} with "-fno-tree-vrp -fno-tree-forwprop" on x86_64-linux-gnu: in operator[], at vec.h:910 since r15-1055

2025-03-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119057

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:758de6263dfc7ba8701965fa468691ac23cb7eb5

commit r15-7791-g758de6263dfc7ba8701965fa468691ac23cb7eb5
Author: Richard Biener 
Date:   Mon Mar 3 13:21:53 2025 +0100

tree-optimization/119057 - bogus double reduction detection

We are detecting a cycle as double reduction where the inner loop
cycle has extra out-of-loop uses.  This clashes at least with
assumptions from the SLP discovery code which says the cycle
isn't reachable from another SLP instance.  It also was not intended
to support this case, in fact with GCC 14 we seem to generate wrong
code here.

PR tree-optimization/119057
* tree-vect-loop.cc (check_reduction_path): Add argument
specifying whether we're analyzing the inner loop of a
double reduction.  Do not allow extra uses outside of the
double reduction cycle in this case.
(vect_is_simple_reduction): Adjust.

* gcc.dg/vect/pr119057.c: New testcase.

[Bug tree-optimization/119057] [12/13/14 regression] ICE at -O{2,3} with "-fno-tree-vrp -fno-tree-forwprop" on x86_64-linux-gnu: in operator[], at vec.h:910 since r15-1055

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119057

Richard Biener  changed:

   What|Removed |Added

  Known to work||15.0
Summary|[15 regression] ICE at  |[12/13/14 regression] ICE
   |-O{2,3} with "-fno-tree-vrp |at -O{2,3} with
   |-fno-tree-forwprop" on  |"-fno-tree-vrp
   |x86_64-linux-gnu: in|-fno-tree-forwprop" on
   |operator[], at vec.h:910|x86_64-linux-gnu: in
   |since r15-1055  |operator[], at vec.h:910
   ||since r15-1055
   Keywords||fixed-but-no-testcase,
   ||wrong-code
   Priority|P1  |P2
   Target Milestone|15.0|12.5

--- Comment #5 from Richard Biener  ---
So the issue goes back much longer, it was first partly fixed by
r11-4865-g2686de5617bfb5

I'm queueing this for backports, even without having a wrong-code testcase
(the outer loop use lacks a inner loop reduction "epilogue").

[Bug rtl-optimization/119099] New: Compile-time hang in DCE

2025-03-03 Thread alexey.merzlyakov at samsung dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119099

Bug ID: 119099
   Summary: Compile-time hang in DCE
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alexey.merzlyakov at samsung dot com
  Target Milestone: ---

Created attachment 60644
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60644&action=edit
Reduced test-case

The following code, c-reduced from a csmith-generated testcase, causes GCC to
hang at -O2 at compile time:
  int a, b;

  void func2(short);

  void func1() {
    while (1) {
      int loc = 8;
      while (1) {
        func2(loc);
        if (a)
          loc = 3;
        else if (b)
          break;
        loc |= a;
      }
    }
  }

The hang appears on trunk and affects at least RISC-V 64/32, ARM 64/32,
powerpc64, MRISC32 targets.
Here is the example link on godbolt for RV64: https://godbolt.org/z/eWxfW3q1K

GCC hangs in the ext-dce pass, where "df_worklist_dataflow_doublequeue()" loops
forever on the following basic blocks taken from the double-worklist queue:

  BB7->BB6->BB4->BB3->BB5->worklists swap->BB7->BB6->BB4->BB3->BB5->worklists
swap ...

The dataflow solver keeps processing each BB node while the BB sets are still
changing. Whether it continues is decided by the "changed" variable, which is
set by the "con_fun_n" function. For ext-dce, this is a pointer to
"ext_dce_rd_transfer_n()". It initializes the "livenow" bitmap from the
"livein[]" states, processes it, and finally writes "livenow" back to
"livein[current BB]". For the selected basic block, the "livein" state
flip-flops each time the loop returns to processing this BB. E.g. in the
example above for BB=4:

  Breakpoint 1, ext_dce_rd_transfer_n (bb_index=4) at ext-dce.cc:980
  980 return true;
  (gdb) p/x (&livein[bb_index])->first->next->bits
  $8 = {0xf30f000, 0xff0}
  Continuing...
  worklist <-> pending swap
  Breakpoint 2, df_worklist_dataflow_doublequeue (...) at df-core.cc:1097
  1097        std::swap (pending, worklist);
  Continuing...
  (gdb) p/x (&livein[bb_index])->first->next->bits
  $11 = {0xf70f000, 0xff0}
  Continuing...

So the livein[4] bits flip between the states 0xf30f000 -> 0xf70f000 ->
0xf30f000 -> ... forever.

In other words, ext-dce + df-core can be treated as a finite-state machine
whose states are the "livenow" and "livein" bitmaps. On the given testcase the
machine loops forever, flip-flopping (widening/narrowing) the "livein" states
with "livenow". The dataflow solver algorithm will never reach the
empty-worklist state in this case.

The root of all this seems to be the "livein" state, which can change in two
directions (widened or narrowed), allowing the machine to loop forever. If so,
the solution could be to allow the "livein" bitmap state to change in only one
direction: in our case, to expand. This would let "ext_dce_rd_transfer_n()"
guarantee that the dataflow solver algorithm converges to its final state.

The proposed solution could be as follows:
  @@ -1094,8 +1094,13 @@ ext_dce_rd_transfer_n (int bb_index)
the generic dataflow code that something changed.  */
 if (!bitmap_equal_p (&livein[bb_index], livenow))
   {
  -  bitmap_copy (&livein[bb_index], livenow);
  -  return true;
  +  bitmap tmp = BITMAP_ALLOC (NULL);
  +  bitmap_and (tmp, &livein[bb_index], livenow);
  +  if (!bitmap_equal_p (tmp, livenow))
  + {
  +   bitmap_ior_into (&livein[bb_index], livenow);
  +   return true;
  + }
   }

 return false;

This allows "livein[bb_index]" only to be widened and returns true only if it
actually changed.
The patch works for me locally and passed simple internal GCC testing. If this
idea is considered acceptable, I will do more serious testing on different
targets and prepare a final version for the mailing list.
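
The convergence argument can be sketched outside of GCC. The following is a toy
model only: LiveSet is a hypothetical 64-bit stand-in for the "livein" bitmap,
and the function names are made up for illustration. It compares a
replace-style update, which can flip-flop, with a widen-only update in the
spirit of the proposed patch:

```cpp
#include <cstdint>

// Hypothetical stand-in for a liveness bitmap.
using LiveSet = std::uint64_t;

// Replace-style update: livein may grow or shrink, so two alternating
// inputs keep reporting "changed".
bool update_replace(LiveSet& livein, LiveSet livenow)
{
    if (livein == livenow)
        return false;
    livein = livenow;
    return true;
}

// Widen-only update: report a change only when livenow adds bits that
// livein does not have yet, and never drop bits.
bool update_widen_only(LiveSet& livein, LiveSet livenow)
{
    if ((livein & livenow) == livenow)
        return false;
    livein |= livenow;
    return true;
}

// Feed two alternating candidate sets to an update function and count how
// often it reports a change within max_rounds rounds.
template <typename Update>
int count_changes(Update update, LiveSet first, LiveSet second, int max_rounds)
{
    LiveSet livein = 0;
    int changes = 0;
    for (int round = 0; round < max_rounds; ++round) {
        const LiveSet livenow = (round % 2 == 0) ? first : second;
        if (update(livein, livenow))
            ++changes;
    }
    return changes;
}
```

With the two states from the gdb session (0xf30f000 and 0xf70f000) and 10
rounds, the replace-style update reports a change in every round, while the
widen-only update reports a change twice and then stays stable.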

[Bug fortran/118747] [15 Regression]: seg fault on accessing an elemental procedure dummy argument's deferred-length component

2025-03-03 Thread vehre at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118747

Andre Vehreschild  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED

[Bug tree-optimization/116125] [12/13/14/15 Regression] Does not fully checking for overlapping memory regions with the vectorizer

2025-03-03 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125

Richard Sandiford  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Richard Sandiford  ---
Will have a look (but might be a few days).

[Bug ipa/118318] [15 regression] ICE when building firefox-134.0 with PGO

2025-03-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118318

--- Comment #18 from Martin Jambor  ---
I have proposed the patch on the mailing list:
https://inbox.sourceware.org/gcc-patches/ri6bjui45il@virgil.suse.cz/T/#u

[Bug c++/119082] GCC Incorrectly Accepts Explicit Destructor Call for Scalar Type in constexpr Context

2025-03-03 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119082

--- Comment #3 from Jonathan Wakely  ---
(In reply to qurong from comment #0)
> GCC 12.4/13.3/11.4 erroneously compiles code

So you already figured out that this bug was fixed in GCC 14?

[Bug tree-optimization/118976] [12/13/14/15 regression] Correctness Issue: SVE vectorization results in data corruption when cpu has 128bit vectors but compiled with -mcpu=neoverse-v1 (which is only f

2025-03-03 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118976

Richard Sandiford  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #15 from Richard Sandiford  ---
Oops, yes, a typo indeed.

[Bug tree-optimization/119057] [15 regression] ICE at -O{2,3} with "-fno-tree-vrp -fno-tree-forwprop" on x86_64-linux-gnu: in operator[], at vec.h:910 since r15-1055

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119057

--- Comment #3 from Richard Biener  ---
So there's code in check_reduction_path that checks for additional uses of the
path defs that explicitly allows out-of-loop uses for the "tail":

  /* Check there's only a single stmt the op is used on.  For the
 not value-changing tail and the last stmt allow out-of-loop uses.
 ???  We could relax this and handle arbitrary live stmts by
 forcing a scalar epilogue for example.  */
...
  else if (!is_gimple_debug (op_use_stmt)
   && (*code != ERROR_MARK
   || flow_bb_inside_loop_p (loop,
 gimple_bb (op_use_stmt))))
FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
  cnt++;

we have

 [local count: 955630224]:
# a.1_28 = PHI <_2(9), a.1_6(3)>
# b_lsm.15_27 = PHI <_25(9), b_lsm.15_33(3)>   <---
# c_lsm.17_3 = PHI <_20(9), c_lsm.17_26(3)>

 [local count: 7731917314]:
# d.6_31 = PHI <_14(10), 0(4)>
# b_lsm.15_13 = PHI <_12(10), b_lsm.15_27(4)>  <---
# ivtmp_23 = PHI 
b.3_9 = (unsigned int) b_lsm.15_13;<---
_11 = b.3_9 | e.4_10;  <--- (*)
_12 = (int) _11;   <---
_14 = d.6_31 + 1;
ivtmp_22 = ivtmp_23 - 1;
if (ivtmp_22 != 0)
  goto ; [89.00%]
else
  goto ; [11.00%]

 [local count: 955630224]:
# _51 = PHI <_11(5)>(*)
# _25 = PHI <_12(5)>  <---
c.9_18 = (unsigned int) c_lsm.17_3;
_19 = _51 | c.9_18;

with the (*) def being used outside of (the inner) loop.

That's undesirable behavior for the double-reduction case.  Testing a patch.

[Bug rtl-optimization/119099] [15 regression] Compile-time hang in ext-dce

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119099

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

--- Comment #2 from Richard Biener  ---
I also noticed this odd thing (without a testcase), and questioned whether it
would converge.

So yes, making the solution either only grow or only shrink sounds like
a correct fix for this (of course making the thing even slower).  It might
be that this was intended to happen by the way the thing is written of course
and it just not working out.  In this case the "fix" would be a workaround
for the real bug.

Not converging is a P1 definitely.

[Bug tree-optimization/119096] [14/15 regression] Loop with conditional, cast and reduction vectorized incorrectly with AVX-512

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119096

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Richard Biener  ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > The vectorizer looks ok though:
> >   mask_patt_37.15_52 = [vec_unpack_lo_expr] mask__9.14_51;
> >   mask_patt_37.15_53 = [vec_unpack_hi_expr] mask__9.14_51;
> >   vect_patt_36.18_57 = .COND_ADD (mask_patt_37.15_52, vect__10.16_54,
> > vect_total_21.17_56, vect__10.16_54);
> >   vect_patt_36.18_58 = .COND_ADD (mask_patt_37.15_53, vect__10.16_55,
> > vect_patt_36.18_57, vect__10.16_55);
> > 
> > 
> >  /* A ? B : B -> B.  */
> >  (simplify
> >   (cnd @0 @1 @1)
> >   @1)
> > 
> > Confirmed, I think COND_ADD  folding goes wrong.
> 
> Wait maybe the original COND_ADD is incorrect. I can't remember how COND_ADD
> works. I thought it was `mask_patt_37.15_52 ?
> (vect__10.16_54+vect_total_21.17_56) : vect__10.16_54` if so then the
> original COND_ADD is wrong.

Yes, I think the .COND_ADD handling fails to handle the single-use-def chain
optimization.

[Bug ipa/118785] [15 Regression] ICE when building vpl-gpu-rt (during IPA pass, ICE in decompose, at wide-int.h:1049)

2025-03-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118785

--- Comment #13 from GCC Commits  ---
The master branch has been updated by Martin Jambor :

https://gcc.gnu.org/g:d05b64bdd048ffb7f72d97553888934a9bcd13fa

commit r15-7792-gd05b64bdd048ffb7f72d97553888934a9bcd13fa
Author: Martin Jambor 
Date:   Mon Mar 3 14:53:03 2025 +0100

ipa-vr: Handle non-conversion unary ops separately from conversions (PR
118785)

Since we construct arithmetic jump functions even when there is a
type conversion in between the operation encoded in the jump function
and when it is passed in a call argument, the IPA propagation phase
must also perform the operation and conversion in two steps.  IPA-VR
had actually been doing it even before for binary operations but, as
PR 118756 exposes, not in the case on unary operations.  This patch
adds the necessary step to rectify that.

Like in the scalar constant case, we depend on
expr_type_first_operand_type_p to determine the type of the result of
the arithmetic operation.  On top this, the patch special-cases
ABSU_EXPR because it looks useful an so that the PR testcase exercises
the added code-path.  This seems most appropriate for stage 4, long
term we should probably stream the types, probably after also encoding
them with a string of expr_eval_op rather than what we have today.

A check for expr_type_first_operand_type_p was also missing in the
handling of binary ops and the intermediate value_range was
initialized with a wrong type, so I also fixed this.

gcc/ChangeLog:

2025-02-24  Martin Jambor  

PR ipa/118785

* ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): Handle
non-conversion
unary operations separately before doing any conversions.  Check
expr_type_first_operand_type_p for non-unary operations too.  Fix
type
of op_res.

gcc/testsuite/ChangeLog:

2025-02-24  Martin Jambor  

PR ipa/118785
* g++.dg/lto/pr118785_0.C: New test.

[Bug tree-optimization/118756] tree-ssa-loop-ivopts.cc:1156: Function defined but not used

2025-03-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118756

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Martin Jambor :

https://gcc.gnu.org/g:d05b64bdd048ffb7f72d97553888934a9bcd13fa

commit r15-7792-gd05b64bdd048ffb7f72d97553888934a9bcd13fa
Author: Martin Jambor 
Date:   Mon Mar 3 14:53:03 2025 +0100

ipa-vr: Handle non-conversion unary ops separately from conversions (PR
118785)

Since we construct arithmetic jump functions even when there is a
type conversion in between the operation encoded in the jump function
and when it is passed in a call argument, the IPA propagation phase
must also perform the operation and conversion in two steps.  IPA-VR
had actually been doing it even before for binary operations but, as
PR 118756 exposes, not in the case on unary operations.  This patch
adds the necessary step to rectify that.

Like in the scalar constant case, we depend on
expr_type_first_operand_type_p to determine the type of the result of
the arithmetic operation.  On top this, the patch special-cases
ABSU_EXPR because it looks useful an so that the PR testcase exercises
the added code-path.  This seems most appropriate for stage 4, long
term we should probably stream the types, probably after also encoding
them with a string of expr_eval_op rather than what we have today.

A check for expr_type_first_operand_type_p was also missing in the
handling of binary ops and the intermediate value_range was
initialized with a wrong type, so I also fixed this.

gcc/ChangeLog:

2025-02-24  Martin Jambor  

PR ipa/118785

* ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): Handle
non-conversion
unary operations separately before doing any conversions.  Check
expr_type_first_operand_type_p for non-unary operations too.  Fix
type
of op_res.

gcc/testsuite/ChangeLog:

2025-02-24  Martin Jambor  

PR ipa/118785
* g++.dg/lto/pr118785_0.C: New test.

[Bug ipa/118785] [15 Regression] ICE when building vpl-gpu-rt (during IPA pass, ICE in decompose, at wide-int.h:1049)

2025-03-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118785

Martin Jambor  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Martin Jambor  ---
Fixed.

[Bug ipa/118318] [15 regression] ICE when building firefox-134.0 with PGO

2025-03-03 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118318

--- Comment #19 from Sam James  ---
Thank you both. I wanted to have a go but was a bit lost.

[Bug c++/103379] ICE: tree check: expected class 'type', have 'declaration' (namespace_decl) in comptypes, at cp/typeck.c:1544

2025-03-03 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103379

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed|2021-11-23 00:00:00 |2025-3-3

--- Comment #4 from Jonathan Wakely  ---
This is an ice-on-invalid C++23 example that produces the same error:

template
constexpr auto&&
forward_like(U&& x) noexcept
{ return x; }

template class C
{
  int value{};

  template
  friend constexpr auto&&
  get(this Self&& z) noexcept
  {
  return std::forward_like(z.value);
  }
};


(invalid because you can't use deducing this on a non-member function)

[Bug target/119100] New: RISC-V: missed opportunities for vector-scalar instructions

2025-03-03 Thread parras at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119100

Bug ID: 119100
   Summary: RISC-V: missed opportunities for vector-scalar
instructions
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: parras at gcc dot gnu.org
  Target Milestone: ---

Created attachment 60645
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60645&action=edit
Source reduced from 554.roms

A number of RVV instructions have two variants: vector-vector and
vector-scalar. For instance, vfmadd.vv and vfmadd.vf: the latter accepts one
scalar operand.

However, SPEC2017's 554.roms shows that many opportunities of emitting the
vector-scalar variant are missed.

$ riscv64-linux-gnu-gfortran -S -Ofast -mabi=lp64d
-march=rv64gcv_zvl256b_zba_zbb_zbs_zicond -mrvv-vector-bits=zvl
rho_eos_tile.F90 -o rho_eos_tile.riscv64.s
$ cat rho_eos_tile.riscv64.s
...
(1) vfmv.v.f    v6,fa0
vlse64.v    v2,0(t0),s2
vmv.v.i v5,0
(2) vfmadd.vv   v9,v6,v7
...

Here (1) and (2) could be combined into:
vfmadd.vf   v9,fa0,v7

In RTL terms, it means combining:

(set (reg:RVVM1DF 516)
(vec_duplicate:RVVM1DF (reg:DF 517)))

into:

(set (reg:RVVM1DF 515)
(plus:RVVM1DF (mult:RVVM1DF (reg:RVVM1DF 362 [ vect_M.84_273.156 ])
(reg:RVVM1DF 516))
(reg:RVVM1DF 519)))

I have a draft patch dealing with the simple case where both instructions live
in the same basic block. However, the vec_duplicate often gets hoisted to the
loop preamble before reaching the combine pass.
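
For illustration only, the general loop shape in question is a multiply-add
with a loop-invariant scalar operand. The attached reproducer is Fortran and is
not shown here; the function below is a hypothetical C++ stand-in, and whether
GCC emits a broadcast followed by vfmadd.vv or a single vfmadd.vf for it
depends on the compiler version and flags and has not been verified:

```cpp
#include <cstddef>

// y[i] = a * x[i] + y[i], where the scalar 'a' is loop-invariant. When
// vectorized, 'a' is either broadcast into a vector register up front or
// passed directly to a vector-scalar multiply-add.
void axpy(double a, const double* x, double* y, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```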

[Bug fortran/103391] [12/13/14/15 Regression] ICE: gimplification failed since r7-4021-g574284e9c49687d8

2025-03-03 Thread vehre at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103391

Andre Vehreschild  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |vehre at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug fortran/119101] New: Function compiled with Gflortran appears to produce a pointer that points at itself.

2025-03-03 Thread David.Applegate at global dot amentum.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119101

Bug ID: 119101
   Summary: Function compiled with Gflortran appears to produce a
pointer that points at itself.
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: David.Applegate at global dot amentum.com
  Target Milestone: ---

Created attachment 60646
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60646&action=edit
Source file

This bug occurs on gfortran 13.2 and 14.2, but not 7.3, 8.5 or 11.5

OS was Oracle Linux 8 and Oracle Linux 9.

One of the modules in the fortran code has two return functions: getBC at line
142 and getBC2 at line 165. As far as I can tell these should produce identical
results, but you will see from the program output that they do not. Running ddd
I was able to see that a circular pointer reference was created at line 97 for
the incorrect output. I think the fortran I have written is valid, but even if
not, ideally some sort of compilation error when compiled or a runtime error
when run would be handy. It took me a long time to track down the issue. 

This is the compiler version output:

COLLECT_GCC=/project/connectflow/gcc/linux64-13.2.0/bin/gcc
COLLECT_LTO_WRAPPER=/export/project/connectflow/gcc/linux64-13.2.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/13.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-13.2.0/configure
--prefix=/users/davida/SOFTWARE/gcc-13.2.0_install/ --disable-multilib
--enable-languages=c,c++,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.2.0 (GCC)

This is the compilation command (-save-temps was tried but didn't produce
anything):

(base) [davida@ada ~]$ /project/connectflow/gcc/linux64-13.2.0/bin/gfortran
-Wall -Wextra -fsanitize=address,undefined -fno-strict-aliasing -fwrapv
-fno-aggressive-loop-optimizations  test.f90

No output was produced to the terminal.

This was the output when the program was run:

(base) [davida@ada ~]$ ./a.out 
 DIRICHLET
 DEFAULT
 DEFAULT
 DEFAULT

[Bug target/119100] RISC-V: missed opportunities for vector-scalar instructions

2025-03-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119100

--- Comment #1 from Richard Biener  ---
doesn't late-combine and/or forwprop not have the single-BB restriction?  Also
when the vec-duplicate is hoisted out of a loop this then becomes a
register pressure in vector vs. scalar regset issue only?

[Bug tree-optimization/119070] gcc15 incorrectly reporting negative array-bounds errors

2025-03-03 Thread taylor.hutt at broadcom dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119070

--- Comment #7 from Taylor Hutt  ---
(In reply to Andrew Pinski from comment #6)
> You could do:
> 
>struct_1  *v1 = &global_0.f_2_0;
>asm("":"+r"(v1));
>unsigned char *v2 = (unsigned char *)v1;
> 
> to hide from GCC that the address of v2 is related to a global variable.
> And that should get rid of the warning too.
> 
> But otherwise this is undefined code.

Why not cast the pointer to uintptr_t at the point of the undefined behavior
pointer arithmetic?

[Bug libstdc++/119089] FAIL: 23_containers/vector/debug/assign4_backtrace_neg.cc -std=gnu++17 (test for excess errors)

2025-03-03 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119089

John David Anglin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #10 from John David Anglin  ---
Resolved.

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #12 from Jakub Jelinek  ---
(In reply to Iain Sandoe from comment #10)
> In the coroutine handling to deal with 
> https://eel.is/c++draft/dcl.fct.def.coroutine#7
> 
> we unconditionally create the return object in the  slot - if we
> create it somewhere else, that causes us to produce an unexpected additional
> copy.

If the return type has non-trivial copy ctor, then it is TREE_ADDRESSABLE type
and so aggregate_value_p is true.  This PR is just about the corner cases where
the return type doesn't need to be returned in memory.

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #13 from Jakub Jelinek  ---
That check_return_expr call in
  /* Without a relevant location, bad conversions in check_return_expr
 result in unusable diagnostics, since there is not even a mention
 of the relevant function.  Here we carry out the first part of
 finish_return_expr().  */
  input_location = fn_start;
  r = check_return_expr (get_ro, &no_warning, &dangling);
  input_location = UNKNOWN_LOCATION;
  gcc_checking_assert (!dangling);
actually calls want_nrvo_p and that returns false.  But I guess check_return_expr
isn't expecting a first argument like TARGET_EXPR; normal user code would return
some VAR_DECL, not a TARGET_EXPR.
So perhaps add before that call something like
  if (!aggregate_return_p (fn_return_type, current_function_decl))
{
  tree var = build_local_temp (fn_return_type);
  // Perhaps pushdecl it or else arrange it to be in BLOCK_VARS?
  // Emit INIT_EXPR for it from get_ro.
  get_ro = var;
}
so that you initialize the temp var instead of RESULT_DECL directly?

[Bug rtl-optimization/119071] [12/13/14/15 Regression] Miscompile at -O2 since r10-7268

2025-03-03 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119071

--- Comment #13 from Uroš Bizjak  ---
(In reply to Sam James from comment #12)
> This works for me on trunk. Did Uros' r15-7793-ga92dc3fe31c95d fix it?

Yes, this is the same issue.

[Bug rtl-optimization/119071] [12/13/14/15 Regression] Miscompile at -O2 since r10-7268

2025-03-03 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119071

Sam James  changed:

   What|Removed |Added

 CC||uros at gcc dot gnu.org

--- Comment #12 from Sam James  ---
This works for me on trunk. Did Uros' r15-7793-ga92dc3fe31c95d fix it?

[Bug fortran/119101] Function compiled with Gflortran appears to produce a pointer that points at itself.

2025-03-03 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119101

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

  Known to fail||13.3.0, 14.2.0
   Keywords||wrong-code
  Known to work||11.5.0, 12.4.1, 13.3.1,
   ||14.2.1, 15.0
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2025-03-03
 Ever confirmed|0   |1

--- Comment #1 from anlauf at gcc dot gnu.org ---
I can confirm this with the release versions 13.3.0, 14.2.0, which print:

 DIRICHLET
 DEFAULT
 DEFAULT
 DEFAULT

but at r12-10972, r13-9407, r14-11370, and 15-trunk I get:

 DIRICHLET
 DEFAULT
 DIRICHLET
 DEFAULT

So it apparently has been fixed on these branches in the meantime.

Are you able to update and verify that it is fixed for you, too?

[Bug target/101507] ICE for gcc.dg/Wstringop-overflow-69.c with -march=iwmmxt (internal compiler error: maximum number of generated reload insns per insn achieved (90))

2025-03-03 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101507

--- Comment #2 from Vladimir Makarov  ---
Sorry, I've tried gcc-12, gcc-13, gcc-14, trunk dated Aug 1, and today's trunk,
but I did not manage to reproduce the error.

Probably, it was fixed by some LRA patch (there were a lot of them since 2021).

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #11 from Jakub Jelinek  ---
In the
#include <coroutine>

struct B {
  bool await_ready () const noexcept;
  void await_suspend (std::coroutine_handle<> h) const noexcept;
  void await_resume () const noexcept;
};

struct C {
  struct promise_type {
const char *value;
std::suspend_never initial_suspend ();
std::suspend_always final_suspend () noexcept;
void return_value (const char *v);
void unhandled_exception ();
C get_return_object () { return C{this}; }
  };
  promise_type *p;
  explicit C (promise_type *p) : p(p) {}
  const char *get ();
};

C
bar (bool x)
{
  if (x)
co_await B{};
  co_return "foobar";
}
testcase (-O2 -m64 -mptr64 -fcoroutines -std=c++23) I see can_do_nrvo_p return
false twice when functype is RECORD_TYPE C,
once in get_return_object and once in bar function; the reason it returns false
is that retval is not a VAR_DECL, but TARGET_EXPR in both cases.
The coroutines.cc code refers to DECL_RESULT unconditionally then.
I think the problematic INIT_EXPR is created by
#5  0x02090641 in build2 (code=INIT_EXPR, tt=, arg0=, arg1=) at ../../gcc/tree.cc:5199
#6  0x00df98f8 in build2_loc (loc=84139394, code=INIT_EXPR,
type=, arg0=,
arg1=)
at ../../gcc/tree.h:4825
#7  0x0120f81e in cp_build_init_expr (loc=84139394, target=, init=) at
../../gcc/cp/typeck2.cc:2820
#8  0x00d8a860 in cp_build_init_expr (t=,
i=) at ../../gcc/cp/cp-tree.h:8600
#9  0x012027cb in check_return_expr (retval=, no_warning=0x7fffd14f, dangling=0x7fffd14e) at
../../gcc/cp/typeck.cc:11513
#10 0x00e23d13 in cp_coroutine_transform::build_ramp_function
(this=0x3c0bb60) at ../../gcc/cp/coroutines.cc:5186

Obviously, for the aggregate_value_p (TREE_TYPE (TREE_TYPE
(current_function_decl)), current_function_decl) case what the code does is
what we want.  But I guess as these 2 PRs show, for !aggregate_value_p we want
to just initialize a temporary VAR_DECL and then have GIMPLE_RETURN which
returns that VAR_DECL.

[Bug rtl-optimization/119071] [12/13/14/15 Regression] Miscompile at -O2 since r10-7268

2025-03-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119071

--- Comment #14 from Jakub Jelinek  ---
Indeed, r15-7793-ga92dc3fe31c95d56019b2fb95a58414bca06241f fixed this.
I'll prepare a patch with the testcases.

[Bug c/119104] Unclear documentation for [[gnu::nonnull_if_nonzero]]

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119104

--- Comment #1 from Andrew Pinski  ---
Nonzero and zero here refer to runtime values, rather than compile-time
characteristics of that argument.

Maybe just:
If the runtime value of the integral argument is zero, the pointer argument can
be null; or if it is non-zero, the pointer argument must not be null.
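
To illustrate the runtime reading of the wording above, here is a minimal sketch in C. The names NONNULL_IF_NONZERO and copy_bytes are hypothetical and exist only for this example; the macro expands to nothing on compilers that do not report support for the attribute, so the sketch should still build there.

```c
#include <stddef.h>
#include <string.h>

/* Use the attribute only where the compiler reports it; elsewhere the
   macro expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(nonnull_if_nonzero)
#  define NONNULL_IF_NONZERO(ptr, n) \
     __attribute__((nonnull_if_nonzero(ptr, n)))
# endif
#endif
#ifndef NONNULL_IF_NONZERO
# define NONNULL_IF_NONZERO(ptr, n)
#endif

/* Hypothetical wrapper, not a real library function.  Arguments 1 and 2
   must be non-null only when argument 3 is nonzero at run time.  */
NONNULL_IF_NONZERO(1, 3) NONNULL_IF_NONZERO(2, 3)
void *copy_bytes(void *dest, const void *src, size_t len)
{
    if (len == 0)
        return dest;            /* null pointers are acceptable here */
    return memcpy(dest, src, len);
}
```

Under this reading, copy_bytes (NULL, NULL, 0) is a valid call, while copy_bytes (NULL, NULL, len) is invalid whenever len turns out to be nonzero when the call executes.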

[Bug target/119083] Remove SSE_FIRST_REG from ix86_class_likely_spilled_p

2025-03-03 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119083

--- Comment #8 from H.J. Lu  ---
Created attachment 60647
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60647&action=edit
A patch to remove CREG and BREG from ix86_class_likely_spilled_p

Hongtao, can you measure its impact on SPEC CPU 2017?

[Bug c++/118924] [12/13/14/15 regression] Wrong code at -O2 and above leading to uninitialized accesses on aarch64-linux-gnu since r10-917-g3b47da42de621c

2025-03-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118924

Martin Jambor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jamborm at gcc dot gnu.org

--- Comment #13 from Martin Jambor  ---
(In reply to rguent...@suse.de from comment #10)
[...]
> 
> And still SRA should not use a random RHS "model" to build a new
> LHS access, most definitely not when the original aggregate LHS
> isn't TBAA compatible with it.

That could be accomplished by:

diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index c26559edc66..f780285254f 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -3451,7 +3451,7 @@ create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos,
   access->grp_write = parent->grp_write;
   access->grp_total_scalarization = 1;
   access->grp_hint = 1;
-  access->grp_same_access_path = path_comparable_for_same_access (expr);
+  access->grp_same_access_path = 0;
   access->reverse = reverse_storage_order_for_component_p (expr);

   access->next_sibling = next_sibling;


Which works for the testcase but I am afraid it might not be
sufficient.  If there was a way to actually create a pre-SRA access to
an individual element of the array with the wrong (int) type in the
function and there wasn't any with the other type, then SRA, not being
a flow-sensitive pass, would happily use the type again because it would
not be "random" any more.

> The array assignment from the front-end is good enough for the
> middle-end as far as IL type hygiene is concerned given the
> element types are useless-type-convertible.

It is quite evil :-)  What would be a good predicate to detect such
compatible but TBAA-different assignments, if there is one?

Because I think we need to prevent building of references "according to
a model" for all scalar replacements under them.

[Bug c++/118924] [12/13/14/15 regression] Wrong code at -O2 and above leading to uninitialized accesses on aarch64-linux-gnu since r10-917-g3b47da42de621c

2025-03-03 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118924

--- Comment #14 from Martin Jambor  ---
So something like the following - which is completely untested, the
type test may be a wrong one, I'd like to think this through a little
more before actually proposing this, but any comments still welcome:

diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index c26559edc66..88b350800ce 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -979,6 +979,7 @@ create_access (tree expr, gimple *stmt, bool write)
   access->type = TREE_TYPE (expr);
   access->write = write;
   access->grp_unscalarizable_region = unscalarizable_region;
+  access->grp_same_access_path = true;
   access->stmt = stmt;
   access->reverse = reverse;

@@ -1522,6 +1523,10 @@ build_accesses_from_assign (gimple *stmt)
   racc = build_access_from_expr_1 (rhs, stmt, false);
   lacc = build_access_from_expr_1 (lhs, stmt, true);

+  bool tbaa_hazard
+= (TYPE_MAIN_VARIANT (TREE_TYPE (lhs))
+   == TYPE_MAIN_VARIANT (TREE_TYPE (rhs)));
+
   if (lacc)
 {
   lacc->grp_assignment_write = 1;
@@ -1536,6 +1541,8 @@ build_accesses_from_assign (gimple *stmt)
bitmap_set_bit (cannot_scalarize_away_bitmap,
DECL_UID (lacc->base));
}
+  if (tbaa_hazard)
+   lacc->grp_same_access_path = false;
 }

   if (racc)
@@ -1555,6 +1562,8 @@ build_accesses_from_assign (gimple *stmt)
}
   if (storage_order_barrier_p (lhs))
racc->grp_unscalarizable_region = 1;
+  if (tbaa_hazard)
+   racc->grp_same_access_path = false;
 }

   if (lacc && racc
@@ -2396,7 +2405,7 @@ sort_and_splice_var_accesses (tree var)
   bool grp_partial_lhs = access->grp_partial_lhs;
   bool first_scalar = is_gimple_reg_type (access->type);
   bool unscalarizable_region = access->grp_unscalarizable_region;
-  bool grp_same_access_path = true;
+  bool grp_same_access_path = access->grp_same_access_path;
   bool bf_non_full_precision
= (INTEGRAL_TYPE_P (access->type)
   && TYPE_PRECISION (access->type) != access->size
@@ -2432,7 +2441,8 @@ sort_and_splice_var_accesses (tree var)
  return NULL;
}

-  grp_same_access_path = path_comparable_for_same_access (access->expr);
+  if (grp_same_access_path)
+   grp_same_access_path = path_comparable_for_same_access (access->expr);

   j = i + 1;
   while (j < access_count)

[Bug c/119104] New: Unclear documentation for [[gnu::nonnull_if_nonzero]]

2025-03-03 Thread alx at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119104

Bug ID: 119104
   Summary: Unclear documentation for [[gnu::nonnull_if_nonzero]]
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alx at kernel dot org
  Target Milestone: ---

The documentation about [[gnu::nonnull]] says:

nonnull_if_nonzero
nonnull_if_nonzero (arg-index, arg2-index)

The nonnull_if_nonzero attribute is a conditional version of the nonnull
attribute. It has two arguments, the first argument shall be argument index of
a pointer argument which must be in some cases non-null and the second argument
shall be argument index of an integral argument (other than boolean). If the
integral argument is zero, the pointer argument can be null, if it is non-zero,
the pointer argument must not be null.

extern void *
my_memcpy (void *dest, const void *src, size_t len)
__attribute__((nonnull (1, 2)));
extern void *
my_memcpy2 (void *dest, const void *src, size_t len)
__attribute__((nonnull_if_nonzero (1, 3),
   nonnull_if_nonzero (2, 3)));

With these declarations, it is invalid to call my_memcpy (NULL, NULL, 0);
or to call my_memcpy2 (NULL, NULL, 4); but it is valid to call my_memcpy2
(NULL, NULL, 0);. This attribute should be used on declarations which have e.g.
an exception for zero sizes, in which case null may be passed.


It says what happens when the value is 0.  It says what happens when the value
is nonzero.  But these are rarely passed as constant expressions, so the
compiler will most of the time not be able to determine if it is zero or
nonzero.

What's the behavior when a variable that the compiler cannot know if it's zero
or not is passed?  Does it trigger the diagnostics documented for
[[gnu::nonnull]] or not?

[Bug c++/117061] Error on use of parameter in lambda outside function body

2025-03-03 Thread eczbek.void at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117061

--- Comment #4 from eczbek.void at gmail dot com ---
Constructors too :(

```
template
struct S {
S(int x) requires(requires { [x] { x; }; }) {}
};
```


```
: In lambda function:
:3:41: error: use of parameter outside function body before ';' token
[-Wtemplate-body]
3 | S(int x) requires(requires { [x] { x; }; }) {}
  | ^
```

[Bug target/118996] Should TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P return false for x86-64?

2025-03-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118996

--- Comment #16 from Hongtao Liu  ---
(In reply to Hongtao Liu from comment #14)
> (In reply to H.J. Lu from comment #13)
> > (In reply to H.J. Lu from comment #11)
> > > Created attachment 60609 [details]
> > > An untested patch
> > 
> > Hongtao, do you have SPEC CPU2017 data on this patch?
> 
> I haven't since #c9, assume the new patch fix the issue?
> I'll start a test.

No big impact (all benchmarks are within a <1% change) for both -march=x86-64-v3 -O2
and -march=icelake-server -Ofast -funroll-loops -flto on ICX.

[Bug target/119083] Remove SSE_FIRST_REG from ix86_class_likely_spilled_p

2025-03-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119083

--- Comment #9 from Hongtao Liu  ---
(In reply to H.J. Lu from comment #8)
> Created attachment 60647 [details]
> A patch to remove CREG and BREG from ix86_class_likely_spilled_p
> 
> Hongtao, can you measure its impact on SPEC CPU 2017?
Ok.

[Bug ipa/119009] [15 regression] AArch64: Commit 'Node clones share order' (r15-6345-g0895aef01c64c3) causes regression in Snappy workload for -mcpu=neoverse-v2 with LTO

2025-03-03 Thread mjires at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119009

--- Comment #3 from Michal Jireš  ---
Thanks a lot for the script.

I have reproduced it:
# bad3714b - before my patch
BM_UIOVecSink/0   33.8 us   33.8 us   20659 bytes_per_second=2.82508G/s html
# 0895aef0 - my patch
BM_UIOVecSink/0   41.0 us   41.0 us   16890 bytes_per_second=2.32381G/s html

However current trunk shows the opposite:
# 3605e057 - trunk
BM_UIOVecSink/0   33.7 us   33.7 us   20161 bytes_per_second=2.82955G/s html
# revert patch
BM_UIOVecSink/0   39.9 us   39.9 us   17399 bytes_per_second=2.38832G/s html

Is it still a problem on your machine with current trunk?



Perf record/report of:
snappy_benchmark --benchmark_filter=BM_UIOVecSink/0
--benchmark_min_warmup_time=5 --benchmark_time_unit=us

shows regression in functions:
  61.46% void
snappy::SnappyDecompressor::DecompressAllTags(snappy::SnappyIOVecWriter*)
 
  25.65% snappy::(anonymous namespace)::IncrementalCopy(char const*, char*,
char*, char*)

relevant symbols:
_ZN6snappy18SnappyDecompressor17DecompressAllTagsINS_17SnappyIOVecWriterEEEvPT_ 
_ZN6snappy12_GLOBAL__N_1L15IncrementalCopyEPKcPcS3_S3_
are identical outside of address changes.

Changing alignment of DecompressAllTags with asm("nop; nop") or
__attribute__((aligned(128))) removes the regression.

19,023,629  branch-misses:u # bad3714b
53,781,446  branch-misses:u # 0895aef0
The underlying problem seems to be branch misses caused by different alignment,
but I cannot pinpoint any specific instruction(s) as a source.

I am not sure we can reliably prevent this. In any case, reliable solution
would be unrelated to my patch.

[Bug fortran/103391] [12/13/14/15 Regression] ICE: gimplification failed since r7-4021-g574284e9c49687d8

2025-03-03 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103391

--- Comment #11 from anlauf at gcc dot gnu.org ---
(In reply to anlauf from comment #10)
> ChatGPT seems to be no real help.  I just tried it on comment#7, and it said:
> 
> "Conclusion
> 
> The original code is not standard-conforming because it performs intrinsic
> assignment to a pointer array, which is not allowed by the Fortran standard.
> Changing the pointer to an allocatable array resolves this issue."

Torturing ChatGPT, i.e. telling it several times that its analysis is wrong,
and giving some hints, I finally get:

"Conclusion

✅ The assignment f%a = x is standard-conforming, as long as f%a is properly
allocated before assignment.

So, my original claim that the assignment was invalid was incorrect. You were
right to question it!"

:-)

[Bug c++/119102] GCC 15.0 'import std;' fails with Ofast (not with O3) due to some openmp internal error

2025-03-03 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119102

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||c++23, rejects-valid
 Ever confirmed|0   |1
   Last reconfirmed||2025-03-03

--- Comment #2 from Jonathan Wakely  ---
(In reply to Igor Machado Coelho from comment #0)
> I know that CXX Modules and "import std;" is still quite experimental, but
> it seemed strange to generate a compiler bug, that's why I'm reporting.

Yes, thanks for reporting it - we need to fix things like this to make it less
experimental!

I see the same behaviour, so confirmed.

[Bug c++/119102] GCC 15.0 'import std;' fails with Ofast (not with O3) due to some openmp internal error

2025-03-03 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119102

--- Comment #3 from Jonathan Wakely  ---
Comparing the preprocessed source for the std.cc module definition file, I see
lots of lines with a simd attribute added when compiled with -Ofast:

--- std-O3.ii  2025-03-03 17:20:32.885607902 +
+++ std-Ofast.ii   2025-03-03 17:20:09.410578347 +
@@ -103248,49 +103248,49 @@
 # 313 "/usr/include/math.h" 2 3 4
 # 1 "/usr/include/bits/mathcalls.h" 1 3 4
 # 53 "/usr/include/bits/mathcalls.h" 3 4
- extern double acos (double __x) noexcept (true); extern double __acos (double
__x) noexcept (true);
+__attribute__ ((__simd__ ("notinbranch"))) extern double acos (double __x)
noexcept (true); extern double __acos (double __x) noexcept (true);


The attribute is added as a result of glibc <math.h> doing:

/* Get machine-dependent vector math functions declarations.  */
#include <bits/math-vector.h>

which does:

#if defined __x86_64__ && defined __FAST_MATH__
# if defined _OPENMP && _OPENMP >= 201307
/* OpenMP case.  */
#  define __DECL_SIMD_x86_64 _Pragma ("omp declare simd notinbranch")
# elif __GNUC_PREREQ (6,0)
/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
#  define __DECL_SIMD_x86_64 __attribute__ ((__simd__ ("notinbranch")))
# endif
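
The branch selection in the excerpt above can be checked for a given set of flags with a small probe. This is only a sketch: decl_simd_branch is a hypothetical helper, and since __GNUC_PREREQ is a glibc-internal macro, the probe approximates it with a plain __GNUC__ >= 6 test.

```c
/* Report which branch of the glibc excerpt would be selected for the
   current compiler flags.  This only inspects predefined macros; it does
   not include any glibc-internal header.  */
const char *decl_simd_branch(void)
{
#if defined __x86_64__ && defined __FAST_MATH__
# if defined _OPENMP && _OPENMP >= 201307
    return "pragma omp declare simd";
# elif defined __GNUC__ && __GNUC__ >= 6   /* stand-in for __GNUC_PREREQ (6,0) */
    return "attribute simd";
# else
    return "none";
# endif
#else
    return "none";
#endif
}
```

Since -Ofast enables -ffast-math, which defines __FAST_MATH__, the probe would be expected to take the attribute branch with -Ofast on x86-64 and to report "none" with -O3, consistent with the difference in the preprocessed output above.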

[Bug tree-optimization/119103] New: Very suboptimal AVX2 code generation of simple shift loop

2025-03-03 Thread gcc at haasn dot dev via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119103

Bug ID: 119103
   Summary: Very suboptimal AVX2 code generation of simple shift
loop
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc at haasn dot dev
  Target Milestone: ---

== Summary ==

On x86_64 with -mavx2, GCC has a very hard time optimizing a shift by a small
unsigned unknown, even if I add knowledge that the shift amount is sufficiently
small.

In particular, GCC always chooses vpslld instead of vpsllw, and there seems to
be no way to convince it otherwise short of hand written asm or intrinsics.

See demonstration here: https://godbolt.org/z/4YobqhsG4

== Code ==

#include <stdint.h>

void lshift(uint16_t *x, uint8_t amount)
{
if (amount > 15)
__builtin_unreachable();

for (int i = 0; i < 16; i++)
x[i] <<= amount;
}

== Output of `gcc -O3 -mavx2 -ftree-vectorize` ==

lshift:
vmovdqu ymm1, YMMWORD PTR [rdi]
movzx   eax, sil
vmovq   xmm2, rax
vpmovzxwd   ymm0, xmm1
vextracti128xmm1, ymm1, 0x1
vpmovzxwd   ymm1, xmm1
vpslld  ymm0, ymm0, xmm2
vpslld  ymm1, ymm1, xmm2
vpxor   xmm2, xmm2, xmm2
vpblendwymm0, ymm2, ymm0, 85
vpblendwymm2, ymm2, ymm1, 85
vpackusdw   ymm0, ymm0, ymm2
vpermq  ymm0, ymm0, 216
vmovdqu YMMWORD PTR [rdi], ymm0
vzeroupper
ret

== Expected result ==

lshift:
vmovdqu ymm1, YMMWORD PTR [rdi]
movzx   esi, sil
vmovd   xmm0, esi
vpsllw  ymm0, ymm1, xmm0
vmovdqu YMMWORD PTR [rdi], ymm0
vzeroupper
ret

Compiled from:

void lshift(uint16_t *x, uint8_t amount)
{
__m256i data = _mm256_loadu_si256((__m256i *) x);
__m128i shift_amount = _mm_cvtsi32_si128(amount);
__m256i shifted = _mm256_sll_epi16(data, shift_amount);
_mm256_storeu_si256((__m256i *) x, shifted);
}

[Bug fortran/101577] [Interop] TYPE with BIND(C): Reject empty TYPE with zero components

2025-03-03 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101577

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |15.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from anlauf at gcc dot gnu.org ---
Fixed in gcc-15.

[Bug fortran/32630] [meta-bug] ISO C binding

2025-03-03 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32630
Bug 32630 depends on bug 101577, which changed state.

Bug 101577 Summary: [Interop] TYPE with BIND(C): Reject empty TYPE with zero components
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101577

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/119100] RISC-V: missed opportunities for vector-scalar instructions

2025-03-03 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119100

--- Comment #2 from Jeffrey A. Law  ---
It's even more complicated than that.  You have to consider that there can be a
cost to move data across the units.  I.e., it may actually be cheaper to use the
variant that broadcasts the value across a vector (vv form) rather than using a
value from the scalar int/fp register file (vf/vi forms).  It really depends on
the uarch behavior.

Profitability may also depend on how many other similar cases are nearby.  At
least in our uarch we have the concept of a "scalar source buffer" where these
values are queued up speculatively from the scalar units into a limited sized
buffer for consumption on the vector units.  If you don't fill up that buffer,
then the vf/vi forms are likely profitable, but if you fill up the buffer, then
you're going to stall various things waiting for that buffer to drain and make
entries available.

My general sense is that we probably want to default towards the vf/vi forms,
but I don't have empirical data to back that up yet.

Paul -- have you run your patch on any design?  And if so what did you run and
what was the performance delta before/after?

[Bug c/119104] Unclear documentation for [[gnu::nonnull_if_nonzero]]

2025-03-03 Thread alx at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119104

--- Comment #2 from Alejandro Colomar  ---
(In reply to Andrew Pinski from comment #1)
> Non zero and zero are runtime values of here. Rather than compile
> characteristics of that argument.
> 
> Maybe just:
> If the runtime value of the integral argument is zero, the pointer argument
> can be null; or if it is non-zero, the pointer argument must not be null.

Hi Andrew,

They are run-time properties, but the analyzer still warns about them with
[[gnu::nonnull]].  I'm worried that this new attribute might reduce the number
of diagnostics, which would be a bad thing IMO.  Indeed, I have been able to
install gcc-15 from Debian experimental, and my worries seem to be confirmed.


alx@debian:~/tmp$ cat foo.c 
#include <stdlib.h>

[[gnu::nonnull]]
void f(void *);
void g(void *);
[[gnu::nonnull_if_nonzero(1, 2)]]
void h(void *, int);

int
main(int argc, char *[])
{
void *p;

p = malloc(100);
f(p);
free(p);

p = malloc(100);
g(p);
free(p);

p = malloc(100);
h(p, argc);
free(p);
}
alx@debian:~/tmp$ gcc-15 -Wall -Wextra -fanalyzer -S foo.c 
foo.c: In function ‘main’:
foo.c:15:9: warning: use of possibly-NULL ‘p’ where non-null expected [CWE-690]
[-Wanalyzer-possible-null-argument]
   15 | f(p);
  | ^~~~
  ‘main’: events 1-2
   14 | p = malloc(100);
  | ^~~
  | |
  | (1) this call could return NULL
   15 | f(p);
  |  
  | |
  | (2) ⚠️  argument 1 (‘p’) from (1) could be NULL where non-null
expected
foo.c:4:6: note: argument 1 of ‘f’ must be non-null
4 | void f(void *);
  |  ^


This is a regression for memcpy(3) et al.  There was a diagnostic with
-fanalyzer when it was marked [[gnu::nonnull]], and we're losing that with
[[gnu::nonnull_if_nonzero]].

I've been trying to convince Joseph, Aaron, and the C Committee that it was a
terrible mistake to allow a null pointer here, precisely for this worry, and it
seems my worries were correct.

[Bug fortran/103391] [12/13/14/15 Regression] ICE: gimplification failed since r7-4021-g574284e9c49687d8

2025-03-03 Thread vehre at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103391

--- Comment #12 from Andre Vehreschild  ---
Mhhh, when one needs to know the "correct answer" to get it from an AI, what
help is the AI then?

[Bug libstdc++/119089] FAIL: 23_containers/vector/debug/assign4_backtrace_neg.cc -std=gnu++17 (test for excess errors)

2025-03-03 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119089

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #12 from Xi Ruoyao  ---
(In reply to Jonathan Wakely from comment #11)
> (In reply to John David Anglin from comment #9)
> > In addition to regenerating the gcc fixincludes, I believe gcc needs
> > rebuilding as the initializer is used in gthr.h.
> 
> I *think* it's an ABI-compatible change. At least it had better be! So
> existing code shouldn't need to be recompiled.

It seems so.  We (Linux From Scratch) have some users who complained about the
fixincludes issue but their GCC is just fine after removing the stale "fixed"
headers, without being rebuilt.

[Bug tree-optimization/119103] shift not demotated when shift amount range is known

2025-03-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119103

Hongtao Liu  changed:

   What|Removed |Added

 CC||liuhongt at gcc dot gnu.org

--- Comment #4 from Hongtao Liu  ---
vect_recog_over_widening_pattern could be extended with range info for this?

[Bug tree-optimization/119103] shift not demotated when shift amount range is known

2025-03-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119103

--- Comment #5 from Hongtao Liu  ---
(In reply to Hongtao Liu from comment #4)
> vect_recog_over_widening_pattern could be extended with range info for this?

Looks like the vectorizer already has range_info from
vect_determine_precisions_from_range

[Bug c/119095] GCC in Ubuntu 20.04, 22.04 and 24.04 all have this problem.

2025-03-03 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119095

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #2 from Xi Ruoyao  ---
(In reply to wzis from comment #0)

> I submitted the bug a few days ago, but I couldn't find it any more

Use the "Open bugs reported by me" link on https://gcc.gnu.org/bugzilla/.

> but I asked Ubuntu for this, they didn't recognize it.

It's their problem then, and it's not a valid reason to spam the upstream bug
tracker.

[Bug middle-end/118874] [15 regression] ICE in copy_rtx, at rtl.cc:372

2025-03-03 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118874

--- Comment #10 from Iain Sandoe  ---

In the coroutine handling to deal with 
https://eel.is/c++draft/dcl.fct.def.coroutine#7

we unconditionally create the return object in the  slot - if we create
it somewhere else, that causes us to produce an unexpected additional copy.

So, I suppose, the difference is the unconditional use (it's not clear to me at
the moment how to avoid that - the intent (AFAIU) is that the return object is
available to the coroutine body (including initial suspend)).

[Bug c++/119102] GCC 15.0 'import std;' fails with Ofast (not with O3) due to some openmp internal error

2025-03-03 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119102

--- Comment #4 from Jonathan Wakely  ---
We're taking the non-OpenMP branch there, but the error from GCC seems to be
incorrectly referring to OpenMP.

The docs for attribute simd say:

If the attribute is specified and #pragma omp declare simd is present on a
declaration and the -fopenmp or -fopenmp-simd switch is specified, then the
attribute is ignored.

[Bug tree-optimization/119103] Very suboptimal AVX2 code generation of simple shift loop

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119103

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Severity|normal  |enhancement
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2025-03-03

--- Comment #2 from Andrew Pinski  ---
  # RANGE [irange] int [0, 65535] MASK 0x VALUE 0x0
  _5 = (intD.6) _4;
  # RANGE [irange] int [0, 15] MASK 0xf VALUE 0x0
  _6 = (intD.6) amount_11(D);
  # RANGE [irange] int [0, 2147450880] MASK 0x7fff VALUE 0x0
  _7 = _5 << _6;
  _8 = (short unsigned intD.18) _7;


That should be able to reduce down to just:
_8 = _4 << _6;

Since _6 has a range for [0,15] so we know it is defined.

I suspect once that happens the other part will be optimized.
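
The reason the narrowing is safe can be checked exhaustively at the source level. The sketch below is not GCC code; the three function names are made up for this example. It compares the shift as C's integer promotions perform it with a model of a true 16-bit shift, for every value and every shift amount in [0, 15].

```c
#include <stdint.h>

/* The shift as written in the source: widen to int, shift, then truncate
   back to 16 bits.  The largest intermediate is 65535 << 15, which still
   fits in a 32-bit int.  */
uint16_t shift_promoted(uint16_t x, uint8_t amount)
{
    int wide = (int)x << amount;
    return (uint16_t)wide;
}

/* Model of a 16-bit shift: bits that would leave the 16-bit lane are
   dropped before shifting, so the result never exceeds 16 bits.  */
uint16_t shift_16bit_model(uint16_t x, uint8_t amount)
{
    uint16_t kept = x & (uint16_t)(0xFFFFu >> amount);
    return (uint16_t)(kept << amount);
}

/* Returns 1 if both forms agree for all x and all amounts in [0, 15].  */
int shifts_agree(void)
{
    for (unsigned amount = 0; amount <= 15; amount++)
        for (unsigned x = 0; x <= 0xFFFFu; x++)
            if (shift_promoted((uint16_t)x, (uint8_t)amount)
                != shift_16bit_model((uint16_t)x, (uint8_t)amount))
                return 0;
    return 1;
}
```

The equivalence relies on the shift amount staying below 16; for larger amounts a 16-bit shift would not be defined in the same way, which is why the range information matters.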

[Bug tree-optimization/119103] shift not demotated when shift amount range is known

2025-03-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119103

Andrew Pinski  changed:

   What|Removed |Added

Summary|Very suboptimal AVX2 code   |shift not demotated when
   |generation of simple shift  |shift amount range is known
   |loop|

--- Comment #3 from Andrew Pinski  ---
RTL handles &0xf but if the range is there we don't optimize it:
E.g. it can be shown by:
```
#include <stdint.h>

void lshift(uint16_t *x, uint8_t amount)
{
  x[0] = x[0] << (amount&0xF);
}
void lshift1(uint16_t *x, uint8_t amount)
{
  if (amount >= 16) 
__builtin_unreachable();
  x[0] = x[0] << (amount&0xF);
}
```

[Bug fortran/101577] [Interop] TYPE with BIND(C): Reject empty TYPE with zero components

2025-03-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101577

--- Comment #2 from GCC Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:f9f16b9f74b767ca799a82f25be66a5fed25756d

commit r15-7798-gf9f16b9f74b767ca799a82f25be66a5fed25756d
Author: Harald Anlauf 
Date:   Sun Mar 2 22:20:28 2025 +0100

Fortran: reject empty derived type with bind(C) attribute [PR101577]

PR fortran/101577

gcc/fortran/ChangeLog:

* symbol.cc (verify_bind_c_derived_type): Generate error message
for derived type with no components in standard conformance mode,
indicating that this is a GNU extension.

gcc/testsuite/ChangeLog:

* gfortran.dg/empty_derived_type.f90: Adjust dg-options.
* gfortran.dg/empty_derived_type_2.f90: New test.
