[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533

--- Comment #18 from Richard Biener  ---
With -fipa-pta we add

+t.c:28:24: optimized: loop with 5 iterations completely unrolled (header execution count 43151276)
+t.c:30:16: optimized: loop turned into non-loop; it never loops
...
-t.c:28:24: optimized: loop with 4 iterations completely unrolled (header execution count 107374186)
+t.c:36:11: optimized: basic block part vectorized using 32 byte vectors
+t.c:36:11: optimized: basic block part vectorized using 8 byte vectors

the testcase still breaks when adding -fdisable-tree-cunroll
-fno-tree-loop-vectorize, then the only change is

+t.c:36:11: optimized: basic block part vectorized using 16 byte vectors
+t.c:36:11: optimized: basic block part vectorized using 8 byte vectors

when failing we have

t.c:36:11: missed:   can't determine dependence between *_65 and *ad_68
t.c:36:11: note:  removing SLP instance operations starting from: *_65 = _66;

w/o IPA PTA we have

  # PT = nonlocal escaped null
  _65 = a.6_13 + _64;
  # PT = nonlocal escaped
  ad_68 = ad_205 + 4;

with IPA PTA

  # PT = null { D.4063 D.4066 } (nonlocal, escaped, escaped heap)
  _65 = a.6_13 + _64;
  # PT = { D.4062 D.4065 } (nonlocal, escaped, escaped heap)
  ad_68 = ad_205 + 4;

[Bug rtl-optimization/951] Documentation of compiler passes and sources very out of date

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=951

--- Comment #17 from Richard Biener  ---
There's definitely passes missing and the initial section of how the
compilation flows until the pass manager takes over needs work.  I'd say we
keep this as a general bug that the passes section needs TLC.

[Bug tree-optimization/115602] [15 Regression] ICE on liblapack-3.12.0: in vect_schedule_slp_node, at tree-vect-slp.cc:9643 since r15-1565-g2a345214fc332b

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115602

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-06-24

--- Comment #6 from Richard Biener  ---
Mine.

[Bug middle-end/115528] [15 regression] segmentation fault in legacy F77 code since r15-1238-g1fe55a1794863b

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115528

--- Comment #29 from Richard Biener  ---
(In reply to Jürgen Reuter from comment #28)
> Richard, unfortunately the fix (it seems it was committed to gcc git master
> on last Friday) did not fix our problem yet. The original test case still
> segfaults:
> Backtrace for this error:
> #0  0x7f36f52a3a6c in ???
> #1  0x7f36f52a2b45 in ???
> #2  0x7f36f4fe204f in ???
> #3  0x7f36f5c9b323 in curr_
>   at ../../../contrib/tauola/formf.f:501
> #4  0x7f36f5cabc16 in dam4pi_
>   at ../../../contrib/tauola/tauola.f:4106
> #5  0x7f36f5cacdb6 in dph4pi_
>   at ../../../contrib/tauola/tauola.f:4067
> #6  0x7f36f5cb1330 in dadnew_
>   at ../../../contrib/tauola/tauola.f:3667
> #7  0x7f36f5cb167c in dexnew_
>   at ../../../contrib/tauola/tauola.f:3592
> #8  0x7f36f5cb7a4d in dexay_
>   at ../../../contrib/tauola/tauola.f:525
> #9  0x7f36f5cb9e3a in initdk_
>   at ../../../contrib/tauola/tauola_photos_ini.f:452
> #10  0x7f36f5aeebbc in __tauola_interface_MOD_wo_tauola_init_call
>   at ../../../src/tauola/tauola_interface_sub.f90:903

The fix wasn't yet committed.

[Bug tree-optimization/115599] ICE: qsort checking failed during GIMPLE pass: reassoc (error: qsort comparator non-negative on sorted output: 150142972)

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115599

--- Comment #5 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:ae13af26060eb686418ea9c9d455cd665049402d

commit r15-1577-gae13af26060eb686418ea9c9d455cd665049402d
Author: Richard Biener 
Date:   Sun Jun 23 14:37:53 2024 +0200

tree-optimization/115599 - reassoc qsort comparator issue

The compare_repeat_factors comparator fails qsort checking eventually
because it uses rf2->rank - rf1->rank to compare unsigned numbers
which causes issues for ranks that interpret negative as signed.

Fixed by re-writing the obvious way.  I've also fixed the count
comparison which suffers from truncation as count is 64bit signed
while the comparator result is 32bit int (that's a lot less likely
to hit in practice though).

The testcase from the PR is too large to include.

PR tree-optimization/115599
* tree-ssa-reassoc.cc (compare_repeat_factors): Use explicit
compares to avoid truncations.
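
A minimal C sketch of the two comparator styles the commit message contrasts. The struct is a hypothetical stand-in for reassoc's repeat-factor record that models only an unsigned rank and a 64-bit signed count. The sort order (ascending count, then descending rank) is an assumption inferred from the rf2->rank - rf1->rank expression, not a copy of tree-ssa-reassoc.cc.

```c
#include <stdint.h>

/* Hypothetical stand-in for the repeat-factor record: only the two
   fields named in the commit message are modelled.  */
struct rf_sketch
{
  unsigned rank;   /* unsigned: rf2->rank - rf1->rank wraps around */
  int64_t count;   /* 64-bit signed: a difference may not fit in int */
};

/* Subtraction style.  The unsigned rank difference is converted to int,
   and the 64-bit count difference is truncated to 32 bits (and can even
   overflow for extreme counts), so the sign of the result can be wrong
   and the ordering inconsistent.  */
int
compare_by_subtraction (const void *x1, const void *x2)
{
  const struct rf_sketch *rf1 = x1;
  const struct rf_sketch *rf2 = x2;
  if (rf1->count != rf2->count)
    return (int) (rf1->count - rf2->count);
  return (int) (rf2->rank - rf1->rank);
}

/* Explicit comparisons: only the sign matters, so return -1, 0 or 1.
   Assumed order: ascending count, then descending rank.  */
int
compare_explicit (const void *x1, const void *x2)
{
  const struct rf_sketch *rf1 = x1;
  const struct rf_sketch *rf2 = x2;
  if (rf1->count != rf2->count)
    return rf1->count < rf2->count ? -1 : 1;
  if (rf1->rank != rf2->rank)
    return rf1->rank > rf2->rank ? -1 : 1;
  return 0;
}
```

For example, with equal counts, ranks 0 and 0x80000001 make compare_by_subtraction return a negative value on typical two's-complement targets, which puts the smaller rank first, while compare_explicit returns 1. Counts 0x100000000 and 0 differ only above bit 31, so the truncated difference is 0 on such targets and the two entries wrongly compare equal.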

[Bug tree-optimization/115599] ICE: qsort checking failed during GIMPLE pass: reassoc (error: qsort comparator non-negative on sorted output: 150142972)

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115599

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #6 from Richard Biener  ---
Fixed.

[Bug middle-end/82407] [meta-bug] qsort_chk fallout tracking

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82407
Bug 82407 depends on bug 115599, which changed state.

Bug 115599 Summary: ICE: qsort checking failed during GIMPLE pass: reassoc (error: qsort comparator non-negative on sorted output: 150142972)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115599

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/115602] [15 Regression] ICE on liblapack-3.12.0: in vect_schedule_slp_node, at tree-vect-slp.cc:9643 since r15-1565-g2a345214fc332b

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115602

Richard Biener  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org
   Keywords||missed-optimization

--- Comment #7 from Richard Biener  ---
Ah, interesting - somehow we managed to create a self-referencing cycle!?

t.c:13:6: note: node 0x4ab34e0 (max_nunits=1, refcnt=3) vector(2) double
t.c:13:6: note: op: VEC_PERM_EXPR
t.c:13:6: note: stmt 0 _11 = gvevent_motion_job.zoom;
t.c:13:6: note: stmt 1 _11 = gvevent_motion_job.zoom;
t.c:13:6: note: lane permutation { 0[1] 0[0] }
t.c:13:6: note: children 0x4ab34e0

that's because this permute is the same as the load that was originally
feeding it:

t.c:13:6: note: node 0x4ab3690 (max_nunits=2, refcnt=1) vector(2) double
t.c:13:6: note: op template: _11 = gvevent_motion_job.zoom;
t.c:13:6: note: stmt 0 _11 = gvevent_motion_job.zoom;
t.c:13:6: note: stmt 1 _11 = gvevent_motion_job.zoom;
t.c:13:6: note: load permutation { 2 2 }

that's a missed optimization caused by the SLP optimize pass which inserts
this permute as compensation.

Richard - can you look at where it is best to detect that uniform nodes (through a
uniform load permute) do not require a permute?

I'll see to somehow make CSE robust against this.
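
A minimal C sketch of the check being asked for, under simplified encodings that are assumptions here: a load permutation is an array of element indices (like { 2 2 }), and a lane permutation is an array of (child, lane) pairs (like { 0[1] 0[0] }). If every lane of the single child loads the same element, any same-width permute of that child produces identical lanes and could be dropped. All names are hypothetical and are not GCC's SLP API.

```c
#include <stdbool.h>
#include <stddef.h>

/* One lane-permutation entry, written N[L] in the dump: lane L of child N.  */
struct lane_sel
{
  unsigned child;
  unsigned lane;
};

/* True if every lane loads the same element, e.g. load permutation { 2 2 }.  */
bool
load_perm_is_uniform (const unsigned *load_perm, size_t nlanes)
{
  for (size_t i = 1; i < nlanes; ++i)
    if (load_perm[i] != load_perm[0])
      return false;
  return true;
}

/* True if a permute over a single uniform load child is a no-op: every
   output lane selects a lane of child 0, and all of those lanes hold the
   same value.  The lane counts must match so the vector type is unchanged.  */
bool
permute_of_uniform_is_redundant (const struct lane_sel *lane_perm,
                                 size_t nout,
                                 const unsigned *child_load_perm,
                                 size_t child_lanes)
{
  if (nout != child_lanes
      || !load_perm_is_uniform (child_load_perm, child_lanes))
    return false;
  for (size_t i = 0; i < nout; ++i)
    if (lane_perm[i].child != 0 || lane_perm[i].lane >= child_lanes)
      return false;
  return true;
}
```

With the values from the dump, lane permutation { 0[1] 0[0] } over load permutation { 2 2 } is reported redundant, so a caller could reuse the load node directly instead of emitting the compensating permute.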

[Bug rtl-optimization/106594] [13/14/15 Regression] sign-extensions no longer merged into addressing mode

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #29 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:792f97b44ffc5e6a967292b3747fd835e99396e7

commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7
Author: Richard Sandiford 
Date:   Mon Jun 24 08:43:19 2024 +0100

Add a late-combine pass [PR106594]

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.
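
The all-or-nothing rule in that objective can be illustrated with a toy model in C. Everything below is hypothetical: the def/use kinds and the use_accepts rule stand in for real instruction matching, and the register-pressure check of the pre-RA instance is left out. It only shows that a definition is deleted when every use of its register can take the defining expression, and that nothing changes otherwise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy kinds of defining expression and of use.  */
enum def_kind { DEF_SIGN_EXTEND, DEF_OTHER };
enum use_kind { USE_ADDRESS, USE_ARITH };

struct toy_def
{
  int reg;              /* register the definition sets */
  enum def_kind kind;
  bool deleted;
};

struct toy_use
{
  int reg;              /* register this use reads */
  enum use_kind kind;
  bool substituted;     /* true once the defining expression is folded in */
};

/* Hypothetical stand-in for "does the rewritten instruction still
   match?": here only addresses accept a sign-extended index.  */
static bool
use_accepts (const struct toy_use *use, const struct toy_def *def)
{
  return use->kind == USE_ADDRESS && def->kind == DEF_SIGN_EXTEND;
}

/* Delete DEF only if every use of its register accepts the substitution.  */
bool
try_substitute_all_uses (struct toy_def *def, struct toy_use *uses,
                         size_t nuses)
{
  /* First pass: check every use; any rejection leaves everything as-is.  */
  for (size_t i = 0; i < nuses; ++i)
    if (uses[i].reg == def->reg && !use_accepts (&uses[i], def))
      return false;

  /* Second pass: commit the substitution and drop the definition.  */
  for (size_t i = 0; i < nuses; ++i)
    if (uses[i].reg == def->reg)
      uses[i].substituted = true;
  def->deleted = true;
  return true;
}
```

The sign-extension case echoes PR106594 (sign-extensions merged into addressing modes), but the toy works on flags rather than RTL.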

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.

This is just a first step.  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

On most targets, the pass is enabled by default at -O2 and above.
However, it has a tendency to undo x86's STV and RPAD passes,
by folding the more complex post-STV/RPAD form back into the
simpler pre-pass form.

Also, running a pass after register allocation means that we can
now match define_insn_and_splits that were previously only matched
before register allocation.  This trips things like:

  (define_insn_and_split "..."
    [...pattern...]
    "...cond..."
    "#"
    "&& 1"
    [...pattern...]
    {
      ...unconditional use of gen_reg_rtx ()...;
    }

because matching and splitting after RA will call gen_reg_rtx when
pseudos are no longer allowed.  rs6000 has several instances of this.

xtensa has a variation in which the split condition is:

"&& can_create_pseudo_p ()"

The failure then is that, if we match after RA, we'll never be
able to split the instruction.

The patch therefore disables the pass by default on i386, rs6000
and xtensa.  Hopefully we can fix those ports later (if their
maintainers want).  It seems better to add the pass first, though,
to make it easier to test any such fixes.

gcc.target/aarch64/bitfield-bitint-abi-align{16,8}.c would need
quite a few updates for the late-combine output.  That might be
worth doing, but it seems too complex to do as part of this patch.

I tried compiling at least one target per CPU directory and comparing
the assembly output for parts of the GCC testsuite.  This is just a way
of getting a flavour of how the pass performs; it obviously isn't a
meaningful benchmark.  All targets seemed to improve on average:

Target                 Tests   Good  Bad   %Good    Delta  Median
====================== =====   ====  ===  ======  =======  ======
aarch64-linux-gnu       2215   1975  240  89.16%    -4159      -1
aarch64_be-linux-gnu    1569   1483   86  94.52%   -10117      -1
alpha-linux-gnu         1454   1370   84  94.22%    -9502      -1
amdgcn-amdhsa           5122   4671  451  91.19%   -35737      -1
arc-elf                 2166   1932  234  89.20%   -37742      -1
arm-linux-gnueabi       1953   1661  292  85.05%   -12415      -1
arm-linux-gnueabihf     1834   1549  285  84.46%   -11137      -1
avr-elf                 4789   4330  459  90.42%  -441276      -4
bfin-elf                2795   2394  401  85.65%   -19252      -1
bpf-elf                 3122   2928  194  93.79%    -8785      -1
c6x-elf                 2227   1929  298  86.62%   -17339      -1
cris-elf                3464   3270  194  94.40%   -23263      -2
csky-elf                2915   2591  324  88.89%   -22146      -1
epiphany-elf            2399   2304   95  96.04%   -28698      -2
fr30-elf                7712   7299  413  94.64%   -99830      -2
frv-linux-gnu           3332   2877  455  86.34%   -25108      -1
ft32-elf                2775   2667  108  96.11%   -25029      -1
h8300-elf               3176   2862  314  90.11%   -29305      -2
hppa64-hp-hpux11.23     4287   4247   40  99.07%   -45963      -2
ia64-linux-gnu          2343   1946  397  83.06%    -9907      -2
iq2000-elf              9684   9637   47  99.51%  -126557      -2
lm32-elf                2681   2608   73  97.28%   -59884      -3
loongarch64-linux-gnu   1303   1218   85  93.48%   -13375      -2
m32r-

[Bug rtl-optimization/114575] [15 Regression] SVE addressing modes broken since g:839bc42772ba7af66af3bd16efed4a69511312ae

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114575

--- Comment #4 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:792f97b44ffc5e6a967292b3747fd835e99396e7

commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7
Author: Richard Sandiford 
Date:   Mon Jun 24 08:43:19 2024 +0100

Add a late-combine pass [PR106594]

[Bug rtl-optimization/114515] [15 Regression] Failure to use aarch64 lane forms after PR101523

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

--- Comment #13 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:792f97b44ffc5e6a967292b3747fd835e99396e7

commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7
Author: Richard Sandiford 
Date:   Mon Jun 24 08:43:19 2024 +0100

Add a late-combine pass [PR106594]

[Bug rtl-optimization/115104] [15 Regression] RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104

--- Comment #6 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:792f97b44ffc5e6a967292b3747fd835e99396e7

commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7
Author: Richard Sandiford 
Date:   Mon Jun 24 08:43:19 2024 +0100

Add a late-combine pass [PR106594]

[Bug rtl-optimization/114996] [15 Regression] [RISC-V] 2->2 combination no longer occurring

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114996

--- Comment #7 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:792f97b44ffc5e6a967292b3747fd835e99396e7

commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7
Author: Richard Sandiford 
Date:   Mon Jun 24 08:43:19 2024 +0100

Add a late-combine pass [PR106594]

[Bug c++/115609] New: Wrongly numbered ‘auto’ in diagnostics when used as template argument of a function

2024-06-24 Thread jwo at jwo dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115609

Bug ID: 115609
   Summary: Wrongly numbered ‘auto’ in diagnostics when used as
template argument of a function
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jwo at jwo dot cz
  Target Milestone: ---

Created attachment 58503
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58503&action=edit
demonstration of the bug

For functions declared as ‘void whats_this(foo<auto> arg)’, it is possible to
trigger an error message in which the types substituted for ‘auto’ are misnumbered:

> example.cpp: In instantiation of ‘void whats_this(foo<auto:2>) [with auto:2 = void]’:
> [...]

In the message above, ‘auto:2’ should be ‘auto:1’.

This is caused by placing declarations of functions that use ‘auto’ as a template
argument above the functions that trigger the diagnostic.

Complete example with diagnostic message is provided in an attachment. No
preprocessing is required.


Tested with GCC commit fd536b8412d4dae42aa04739c06f99a915be6261 (the latest at
the time of writing).

Tested with GCC 13.2.1_p20240210.

[Bug target/115608] ICE in extract_insn, at recog.cc:2812 when building with -mv8plus

2024-06-24 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115608

Eric Botcazou  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-24
 Ever confirmed|0   |1
 Status|UNCONFIRMED |WAITING

--- Comment #3 from Eric Botcazou  ---
You always need to specify how the compiler has been configured.

[Bug rtl-optimization/115261] [11/12/13/14/15 regression] FAIL: gcc.target/s390/vector/vec-abi-vararg-1.c

2024-06-24 Thread stefansf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115261

Stefan Schulze Frielinghaus  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Stefan Schulze Frielinghaus  ---
Fixed on mainline.

[Bug rtl-optimization/106594] [13/14/15 Regression] sign-extensions no longer merged into addressing mode

2024-06-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

Richard Sandiford  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #30 from Richard Sandiford  ---
Fixed on trunk.

[Bug rtl-optimization/114515] [15 Regression] Failure to use aarch64 lane forms after PR101523

2024-06-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515

Richard Sandiford  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Richard Sandiford  ---
Fixed.

[Bug rtl-optimization/8537] Optimizer Removes Code Necessary for Security

2024-06-24 Thread divinity76 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8537

Hans Henrik Bergan  changed:

   What|Removed |Added

 CC||divinity76 at gmail dot com

--- Comment #6 from Hans Henrik Bergan  ---
Just for future reference: for this particular issue C11 has memset_s (in the
optional Annex K), and C23 also has memset_explicit; neither call is allowed to
be optimized out.
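Below is a minimal C sketch of how these calls might be used to clear a secret buffer. It assumes the Annex K memset_s is only usable where the library defines __STDC_LIB_EXT1__ (glibc, for one, does not provide it). The names secure_clear and memset_v are hypothetical. The volatile-function-pointer fallback is a common best-effort idiom, not something the standard guarantees. Where a C23 library is available, memset_explicit (buf, 0, len) could replace the fallback.

```c
/* Request Annex K declarations; this must precede the library includes. */
#define __STDC_WANT_LIB_EXT1__ 1
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fallback (assumption: best effort only).  Because the pointer is
   volatile, the compiler cannot prove the call is plain memset, so it
   cannot drop it as a dead store.  */
static void *(*const volatile memset_v)(void *, int, size_t) = memset;

/* Hypothetical helper: zero LEN bytes at BUF in a way intended to
   survive dead-store elimination.  */
static void secure_clear(void *buf, size_t len)
{
#if defined(__STDC_LIB_EXT1__)
  memset_s(buf, len, 0, len);   /* C11 Annex K: must not be optimized out */
#else
  memset_v(buf, 0, len);        /* portable best-effort fallback */
#endif
}
```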

[Bug target/115610] New: -flate-combine disabled by default for x86 port

2024-06-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115610

Bug ID: 115610
   Summary: -flate-combine disabled by default for x86 port
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rsandifo at gcc dot gnu.org
CC: crazylht at gmail dot com, hubicka at gcc dot gnu.org,
ubizjak at gmail dot com
  Target Milestone: ---
Target: i?86-*-* x86_64-*-*

The late-combine pass is disabled by default for x86:

  /* Late combine tends to undo some of the effects of STV and RPAD,
     by combining instructions back to their original form.  */
  if (!OPTION_SET_P (flag_late_combine_instructions))
    flag_late_combine_instructions = 0;
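To make the override idiom above concrete, here is a hypothetical C model of what the OPTION_SET_P check buys. It is not GCC's option machinery: struct flag_opt and target_override are invented for illustration. The target only changes the default, so an explicit command-line choice still wins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a flag's value plus whether the user set it
   explicitly (the information OPTION_SET_P queries in GCC).  */
struct flag_opt
{
  int value;
  bool set_explicitly;
};

/* Mirrors the target override: switch the pass off by default, but
   leave an explicit user choice untouched.  */
static void target_override(struct flag_opt *late_combine)
{
  if (!late_combine->set_explicitly)
    late_combine->value = 0;
}
```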

To give more details, from an earlier version of the pass:


For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

(insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
(vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(const_int 1 [0x1]))) -1
 (nil))
(insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
(subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
 (expr_list:REG_DEAD (reg:SI 120)
(nil)))
(insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
(reg:SI 118)) -1
 (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:

(insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
(sqrt:DF (mem/c:DF (symbol_ref:DI ("d2") {*sqrtdf2_sse}
 (nil))

into:

(insn 45 26 44 6 (set (reg:V4SF 108)
(const_vector:V4SF [
(const_double:SF 0.0 [0x0.0p+0]) repeated x4
])) -1
 (nil))
(insn 44 45 27 6 (set (reg:V2DF 109)
(vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2")
(subreg:V2DF (reg:V4SF 108) 0)
(const_int 1 [0x1]))) -1
 (nil))
(insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
(subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
 (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.


[Bug target/115610] -flate-combine disabled by default for x86 port

2024-06-24 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115610

Hongtao Liu  changed:

   What|Removed |Added

 CC||liuhongt at gcc dot gnu.org
   Last reconfirmed||2024-06-24
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1

--- Comment #1 from Hongtao Liu  ---
Thanks, I'll take a look.

[Bug target/115519] s390 fallout from removing vcond{,u,eq} patterns

2024-06-24 Thread stefansf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115519

--- Comment #2 from Stefan Schulze Frielinghaus  
---
Just saw on the ML that a match.pd fix already exists
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655484.html

A quick test shows that this fixes vcond-shift.c where we now emit

  ((int) ((unsigned int) xx >> 31) + xx) >> 1

and previously

  (xx - (xx >> 31)) >> 1

which is equivalent.  We just have to adapt the scan-assembler-times counts
for the signed/unsigned shifts:

diff --git a/gcc/testsuite/gcc.target/s390/vector/vcond-shift.c
b/gcc/testsuite/gcc.target/s390/vector/vcond-shift.c
index a6b4e97aa50..b942f44039d 100644
--- a/gcc/testsuite/gcc.target/s390/vector/vcond-shift.c
+++ b/gcc/testsuite/gcc.target/s390/vector/vcond-shift.c
@@ -3,13 +3,13 @@
 /* { dg-do compile { target { s390*-*-* } } } */
 /* { dg-options "-O3 -march=z13 -mzarch" } */

-/* { dg-final { scan-assembler-times "vesraf\t%v.?,%v.?,31" 6 } } */
-/* { dg-final { scan-assembler-times "vesrah\t%v.?,%v.?,15" 6 } } */
-/* { dg-final { scan-assembler-times "vesrab\t%v.?,%v.?,7" 6 } } */
+/* { dg-final { scan-assembler-times "vesraf\t%v.?,%v.?,31" 4 } } */
+/* { dg-final { scan-assembler-times "vesrah\t%v.?,%v.?,15" 4 } } */
+/* { dg-final { scan-assembler-times "vesrab\t%v.?,%v.?,7" 4 } } */
 /* { dg-final { scan-assembler-not "vzero\t*" } } */
-/* { dg-final { scan-assembler-times "vesrlf\t%v.?,%v.?,31" 4 } } */
-/* { dg-final { scan-assembler-times "vesrlh\t%v.?,%v.?,15" 4 } } */
-/* { dg-final { scan-assembler-times "vesrlb\t%v.?,%v.?,7" 4 } } */
+/* { dg-final { scan-assembler-times "vesrlf\t%v.?,%v.?,31" 6 } } */
+/* { dg-final { scan-assembler-times "vesrlh\t%v.?,%v.?,15" 6 } } */
+/* { dg-final { scan-assembler-times "vesrlb\t%v.?,%v.?,7" 6 } } */

 /* Make it expand to two vector operations.  */
 #define ITER(X) (2 * (16 / sizeof (X[1])))
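As a quick sanity check of the equivalence claimed above, the sketch below evaluates both scalar forms and compares them with xx / 2. It assumes GCC's documented behaviour of shifting negative int values right arithmetically, which ISO C leaves implementation-defined. The function names are hypothetical. The new form replaces one arithmetic shift with a logical one. That fits the diff, which moves two matches per element width from vesra* to vesrl*.

```c
#include <assert.h>
#include <limits.h>

/* New form: the unsigned >> 31 is a logical shift that extracts the
   sign bit, followed by one arithmetic shift.  */
static int div2_new(int xx)
{
  return ((int) ((unsigned int) xx >> 31) + xx) >> 1;
}

/* Previous form: two arithmetic shifts (xx >> 31 is 0 or -1).  */
static int div2_old(int xx)
{
  return (xx - (xx >> 31)) >> 1;
}
```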

[Bug target/115611] New: mve: vsetq_lane for 64-bits has wrong codegen when setting lane 1

2024-06-24 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115611

Bug ID: 115611
   Summary: mve: vsetq_lane for 64-bits has wrong codegen when
setting lane 1
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

When compiling:
$ cat t.c
#include 

int64x2_t fn (int64x2_t v, int64_t a)
{
return vsetq_lane_s64(a, v, 1);
}

compiled with:
$ gcc -O2 -mfloat-abi=hard -mcpu=cortex-m85 t.c -S

yields:
fn:
vmov d1, r2, r3
bx  lr

r2 and r3 do not hold any argument here, so their contents are undefined; the
instruction should have used r0, r1.

This is due to an issue with printing the operands in mve_vec_setv2di_internal.
I have a patch for this.
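Here is a hypothetical C model of what the lane-1 insert should do. It is not MVE intrinsics code. It assumes the hard-float AAPCS passes v in q0, whose lane 1 is d1, and passes the 64-bit a in r0 (low word) and r1 (high word) on a little-endian target. Under those assumptions the expected instruction is vmov d1, r0, r1. The names qreg, vmov_d_rr and set_lane1 are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a Q register as two 64-bit D halves:
   for q0, lane 0 is d0 and lane 1 is d1.  */
typedef struct { uint64_t d[2]; } qreg;

/* Models "vmov dN, rlo, rhi": rlo supplies bits 0-31, rhi bits 32-63.  */
static void vmov_d_rr(qreg *q, int n, uint32_t rlo, uint32_t rhi)
{
  q->d[n] = ((uint64_t) rhi << 32) | rlo;
}

/* Expected vsetq_lane_s64 (a, v, 1) behaviour: only lane 1 changes,
   and it is built from the two core registers that carry A.  */
static qreg set_lane1(qreg v, int64_t a)
{
  uint32_t r0 = (uint32_t) a;                      /* low word of a  */
  uint32_t r1 = (uint32_t) ((uint64_t) a >> 32);   /* high word of a */
  vmov_d_rr(&v, 1, r0, r1);
  return v;
}
```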

[Bug target/115612] New: powerpc: define_insn_and_splits calling gen_reg_rtx unconditionally

2024-06-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115612

Bug ID: 115612
   Summary: powerpc: define_insn_and_splits calling gen_reg_rtx
unconditionally
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rsandifo at gcc dot gnu.org
CC: dje at gcc dot gnu.org, linkw at gcc dot gnu.org, segher at 
gcc dot gnu.org
  Target Milestone: ---
Target: powerpc*-*-*

The late-combine pass is disabled by default for rs6000:

  /* One of the late-combine passes runs after register allocation
     and can match define_insn_and_splits that were previously used
     only before register allocation.  Some of those define_insn_and_splits
     use gen_reg_rtx unconditionally.  Disable late-combine by default
     until the define_insn_and_splits are fixed.  */
  if (!OPTION_SET_P (flag_late_combine_instructions))
    flag_late_combine_instructions = 0;

For example, compiling gcc.c-torture/compile/20021001-1.c with -Os
-flate-combine-instructions results in:

0x9c34cb gen_reg_rtx(machine_mode)
   .././src/gcc/emit-rtl.cc:1177
0x1b248a7 gen_split_452(rtx_insn*, rtx_def**)
   .././src/gcc/config/rs6000/rs6000.md:13257
0x1c1c783 split_17
   .././src/gcc/config/rs6000/rs6000.md:13254
0x1c1c783 split_insns(rtx_def*, rtx_insn*)
   .././src/gcc/config/rs6000/rs6000.md:12707
0x9c8847 try_split(rtx_def*, rtx_insn*, int)
   .././src/gcc/emit-rtl.cc:3941
0xe58963 split_insn
   .././src/gcc/recog.cc:3409
0xe5e3ef split_all_insns()
   .././src/gcc/recog.cc:3513
0xe5e5cb execute
   .././src/gcc/recog.cc:4482

due to:

(define_insn_and_split "*_cc"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(fp_rev:GPR (match_operand:CCFP 1 "cc_reg_operand" "y")
(const_int 0)))]
  "!flag_finite_math_only"
  "#"
  "&& 1"
  [(pc)]
{
  rtx_code revcode = reverse_condition_maybe_unordered ();
  rtx eq = gen_rtx_fmt_ee (revcode, mode, operands[1], const0_rtx);
  rtx tmp = gen_reg_rtx (mode);
  emit_move_insn (tmp, eq);
  emit_insn (gen_xor3 (operands[0], tmp, const1_rtx));
  DONE;
}
  [(set_attr "length" "12")])

The instruction can be matched before or after RA, but the split only works
before RA.

It looked from a quick scan like there were a few other instances of this
(although the vast majority of define_insn_and_splits are written to work both
before and after RA).
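The split above computes a condition as its reverse XORed with 1, and the reverse must account for unordered (NaN) operands. The following is a hedged C illustration. It takes UNLE as an example code, chosen for illustration because the members of the fp_rev iterator are not shown in the report. It also assumes reverse_condition_maybe_unordered maps UNLE to GT. Under those assumptions, (a > b) ^ 1 matches UNLE for every input, including NaN, whereas plain LE does not. The intermediate value is the "tmp" for which the split calls gen_reg_rtx. Function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* UNLE (unordered or less-or-equal), written from its definition.  */
static int unle(double a, double b)
{
  return isunordered(a, b) || a <= b;
}

/* The split's strategy: evaluate the reversed code (assumed GT for
   UNLE) into a temporary, then XOR it with 1.  */
static int unle_via_reverse(double a, double b)
{
  int tmp = a > b;   /* the value the split places in a new pseudo */
  return tmp ^ 1;
}
```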

[Bug target/115613] New: xtensa: splits dependent on can_create_pseudo_p

2024-06-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115613

Bug ID: 115613
   Summary: xtensa: splits dependent on can_create_pseudo_p
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rsandifo at gcc dot gnu.org
CC: jcmvbkbc at gcc dot gnu.org
  Target Milestone: ---
Target: xtensa-*-*

The late-combine pass is disabled by default for xtensa:

  /* One of the late-combine passes runs after register allocation
     and can match define_insn_and_splits that were previously used
     only before register allocation.  Some of those define_insn_and_splits
     require the split to take place, but have a split condition of
     can_create_pseudo_p, and so matching after RA will give an
     unsplittable instruction.  Disable late-combine by default until
     the define_insn_and_splits are fixed.  */
  if (!OPTION_SET_P (flag_late_combine_instructions))
    flag_late_combine_instructions = 0;

For example, compiling gcc.c-torture/compile/bx.c with -Os -flate-combine
gives:

(insn 53 21 27 4 (set (reg/i:SI 2 a2)
(and:SI (not:SI (reg:SI 8 a8 [orig:44 _10 ] [44]))
(reg:SI 2 a2 [55])))
"/home/ricsan01/gcc/git/gcc/gcc/testsuite/gcc.c-torture/compile/bx.c":5:1 36
{*andsi3_bitcmpl}
 (expr_list:REG_DEAD (reg:SI 8 a8 [orig:44 _10 ] [44])
(nil)))
during RTL pass: final
.../gcc.c-torture/compile/bx.c:5:1: internal compiler error: in
final_scan_insn_1, at final.cc:2807
0xe331b7 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
   .././src/gcc/rtl-error.cc:108
0x9e5667 final_scan_insn_1
   .././src/gcc/final.cc:2807
0x9e5877 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
   .././src/gcc/final.cc:2886
0x9e659f final_1
   .././src/gcc/final.cc:1977
0x9e689f rest_of_handle_final
   .././src/gcc/final.cc:4239
0x9e689f execute
   .././src/gcc/final.cc:4317

The associated define_insn_and_split is:

(define_insn_and_split "*andsi3_bitcmpl"
  [(set (match_operand:SI 0 "register_operand" "=a")
(and:SI (not:SI (match_operand:SI 1 "register_operand" "r"))
(match_operand:SI 2 "register_operand" "r")))]
  ""
  "#"
  "&& can_create_pseudo_p ()"
  [(set (match_dup 3)
(and:SI (match_dup 1)
(match_dup 2)))
   (set (match_dup 0)
(xor:SI (match_dup 3)
(match_dup 2)))]
{
  operands[3] = gen_reg_rtx (SImode);
}
  [(set_attr "type" "arith")
   (set_attr "mode" "SI")
   (set_attr "length"   "6")])

The define_insn can be matched before or after register allocation, but the "&&
can_create_pseudo_p ()" condition means that the split can only happen before
RA.  And the "#" means that the split must happen: there is no assembly
fallback.  Matching the define_insn after register allocation therefore leads
to an ICE.
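The split relies on the identity ~op1 & op2 == (op1 & op2) ^ op2. The C sketch below checks that identity. Its function names are hypothetical and registers are modelled as array slots. As an illustration rather than a proposed fix, it also shows why reusing the destination as the temporary after RA would not be safe. In insn 53 above, operand 0 and operand 2 are both a2, so writing the temporary into a2 destroys operand 2 before the XOR reads it.

```c
#include <assert.h>
#include <stdint.h>

/* The define_insn: op0 = ~op1 & op2.  */
static uint32_t andsi3_bitcmpl(uint32_t op1, uint32_t op2)
{
  return ~op1 & op2;
}

/* The split with op3 a fresh temporary (what gen_reg_rtx provides):
     op3 = op1 & op2;  op0 = op3 ^ op2.  */
static uint32_t split_with_temp(uint32_t op1, uint32_t op2)
{
  uint32_t op3 = op1 & op2;
  return op3 ^ op2;
}

/* Hypothetical post-RA variant that uses the destination as the
   temporary.  DST and SRC2 may name the same slot, as a2 does in
   insn 53, in which case the first store overwrites op2.  */
static void split_reusing_dest(uint32_t *regs, int dst, int src1, int src2)
{
  regs[dst] = regs[src1] & regs[src2];
  regs[dst] = regs[dst] ^ regs[src2];
}
```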

See: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655446.html for more
discussion.

[Bug target/115608] ICE in extract_insn, at recog.cc:2812 when building with -mv8plus

2024-06-24 Thread glaubitz at physik dot fu-berlin.de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115608

--- Comment #4 from John Paul Adrian Glaubitz  ---
(In reply to Eric Botcazou from comment #3)
> You always need to specify how the compiler has been configured.

Here you go:

(sid_sparc64-dchroot)glaubitz@stadler:~$ gcc-14 -v
Using built-in specs.
COLLECT_GCC=gcc-14
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/sparc64-linux-gnu/14/lto-wrapper
Target: sparc64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 14.1.0-2'
--with-bugurl=file:///usr/share/doc/gcc-14/README.Bugs
--enable-languages=c,ada,c++,go,fortran,objc,obj-c++,m2,rust --prefix=/usr
--with-gcc-major-version-only --program-suffix=-14
--program-prefix=sparc64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace
--enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support
--enable-plugin --enable-default-pie --with-system-zlib --enable-objc-gc=auto
--enable-multiarch --disable-werror --with-cpu-32=ultrasparc
--enable-targets=all --with-long-double-128 --enable-multilib
--enable-checking=release --build=sparc64-linux-gnu --host=sparc64-linux-gnu
--target=sparc64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.1.0 (Debian 14.1.0-2) 
(sid_sparc64-dchroot)glaubitz@stadler:~$

[Bug c++/115614] New: Invalid (?) template substitution on variadic constrained packs

2024-06-24 Thread cohenarthur at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115614

Bug ID: 115614
   Summary: Invalid (?) template substitution on variadic
constrained packs
   Product: gcc
   Version: 14.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cohenarthur at gcc dot gnu.org
  Target Milestone: ---

GCC 14.1 rejects code that has been accepted by clang since version 15, which
makes me think it should be accepted. 

```
#include 

#include 
#include 
#include 

template 
struct l {};

template 
struct c {
  using type = T;
};

template 
concept e = true;

template 
auto subs0(std::index_sequence,
   std::index_sequence,
   std::index_sequence) {
  return []... prefix,
e... infix,
e... suffix>
  (prefix..., infix..., suffix...) {
return l{};
  };
}

template 
auto subs(l ls) {
  return subs0(std::make_index_sequence{},
   std::make_index_sequence{},
   std::make_index_sequence{}
  )(c{}...);
}

template 
using sub = decltype(subs(Ts{}));

void x(sub<0, 3, l>) {}
void x(sub<0, 2, l>) {}
void x(sub<1, 2, l>) {}
void x(sub<1, 3, l>) {}
```

clang shows no errors, whereas g++ outputs the following:

```
: In instantiation of 'auto subs(l) [with long unsigned int a =
0; long unsigned int b = 3; Ts = {int, int*, int**}]':
:39:32:   required by substitution of 'template using sub = decltype (subs(Ts{})) [with
long unsigned int a = 0; long unsigned int b = 3; Ts = l]'
   39 | using sub = decltype(subs(Ts{}));
  |  ~~^~
:41:36:   required from here
   41 | void x(sub<0, 3, l>) {}
  |^~
:35:16: error: no match for call to '(subs0<, 0, 1,
2>(std::index_sequence<>, std::index_sequence<0, 1, 2>,
std::index_sequence<>)::) (c,
c, c)'
   32 |   return subs0(std::make_index_sequence{},
  |  
   33 |std::make_index_sequence{},
  |~~
   34 |std::make_index_sequence{}
  |~
   35 |   )(c{}...);
  |   ~^~~~
:22:10: note: candidate: 'template  requires ((... && e)) && ((... && e))
&& ((... && e)) subs0<, 0, 1, 2>(std::index_sequence<>,
std::index_sequence<0, 1, 2>, std::index_sequence<>)::'
   22 |   return []... prefix,
  |  ^
:22:10: note:   template argument deduction/substitution failed:
:22:10: note: constraints not satisfied
: In substitution of 'template  requires ((... && e)) && ((... && e)) &&
((... && e)) subs0<, 0, 1, 2>(std::index_sequence<>,
std::index_sequence<0, 1, 2>, std::index_sequence<>):: [with prefix = {}; infix = {0, 1, 2}; suffix = {}]':
:35:16:   required from 'auto subs(l) [with long unsigned int a
= 0; long unsigned int b = 3; Ts = {int, int*, int**}]'
   32 |   return subs0(std::make_index_sequence{},
  |  
   33 |std::make_index_sequence{},
  |~~
   34 |std::make_index_sequence{}
  |~
   35 |   )(c{}...);
  |   ~^~~~
:39:32:   required by substitution of 'template using sub = decltype (subs(Ts{})) [with
long unsigned int a = 0; long unsigned int b = 3; Ts = l]'
   39 | using sub = decltype(subs(Ts{}));
  |  ~~^~
:41:36:   required from here
   41 | void x(sub<0, 3, l>) {}
  |^~
:22:10:   required by the constraints of 'template template  requires ((... && e)) &&
((... && e)) && ((... && e)) subs0(std::index_sequence, std::index_sequence, std::index_sequence)::'
:35:16: error: mismatched argument pack lengths while expanding
'e'
   32 |   return subs0(std::make_index_sequence{},
  |  
   33 |std::make_index_sequence{},
  |~~
   34 |std::make_index_sequence{}
  |~
   35 |   )(c{}...);
  |   ~^~~~
: In instantiation of 'auto subs(l) [with long unsigned int a =
0; long unsigned int b = 2; Ts = {int, int*, int**}]':
:39:32:   required by substitution of 'template using sub = decltype (subs(Ts{})) [with
long unsigned int a = 0; long unsigned int b = 2; Ts = l]'
   39 | using sub = decltype(subs(Ts{}));
  |  ~~^~
:42:36:   required from here
   42 | void x(sub<0, 2, l>) {}
  |^~
:35:16: error: no match for call to '(subs0<, 0, 1,
0>(std::index_sequenc

[Bug tree-optimization/115602] [15 Regression] ICE on liblapack-3.12.0: in vect_schedule_slp_node, at tree-vect-slp.cc:9643 since r15-1565-g2a345214fc332b

2024-06-24 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115602

David Binderman  changed:

   What|Removed |Added

 CC||dcb314 at hotmail dot com

--- Comment #8 from David Binderman  ---
Created attachment 58504
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58504&action=edit
C++ source code

A C++ example. -O3 required. From project box2d.

cvise $ ~/gcc/results/bin/gcc -c -O3 bug1039.cc
during GIMPLE pass: slp
bug1039.cc: In function ‘void b2Distance()’:
bug1039.cc:27:6: internal compiler error: in vect_schedule_slp_node, at
tree-vect-slp.cc:9644
   27 | void b2Distance() {
  |  ^~
0x14c966b vect_schedule_slp_node(vec_info*, _slp_tree*, _slp_instance*)
/home/dcb40b/gcc/working/gcc/../../trunk/gcc/tree-vect-slp.cc:9643

[Bug target/115591] internal error on global variable-length array

2024-06-24 Thread simon at pushface dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115591

--- Comment #4 from simon at pushface dot org ---
bug.adb compiles without error after applying the patch.

[Bug tree-optimization/115033] [12/13/14/15 Regression] Incorrect optimization of by-reference closure fields by fre1 pass since r12-5113-gd70ef65692fced

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115033

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #17 from Richard Biener  ---
  int resultIsStatic = 1;
  func t ={&resultIsStatic};
  map_to_vector(&t);

  if (resultIsStatic)

we disambiguate the call against the load via modref data.  The call summary
for early modref at -O1 is

  loads:
  Base 0: alias set 0
Ref 0: alias set 0
  access: Parm 0 param offset:0 offset:0 size:64 max_size:64
  stores:
Every base
  Global memory written
  parm 0 flags: no_direct_clobber no_indirect_clobber no_direct_escape
no_indirect_escape no_indirect_read

where obviously no_{indirect,}_clobber isn't correct(?).  Note this seems
to result in PTA computing a stmt call clobber set that's

$20 = {anything = 0, nonlocal = 1, escaped = 1, ipa_escaped = 0, null = 0, 
  const_pool = 0, vars_contains_nonlocal = 0, vars_contains_escaped = 0, 
  vars_contains_escaped_heap = 0, vars_contains_restrict = 0, 
  vars_contains_interposable = 0, vars = 0x771b7b80}

which relies on us having 't' as escaped, but it's not (due to modref).

While modref analysis doesn't do the disambiguation, the last-resort check
of the call clobber set does.  In particular, we create the constraints:

t = &resultIsStatic
callescape(12) = &NONLOCAL
CALLUSED(13) = callescape(12)
callarg(15) = &t
callarg(15) = callarg(15) + UNKNOWN
CALLUSED(13) = callarg(15)
resultIsStatic.0_1 = resultIsStatic

so points-to doesn't consider anything call clobbered or escaped.

In particular this looks at gimple_call_arg_flags, which computes
EAF_NO_INDIRECT_READ | EAF_NO_DIRECT_ESCAPE | EAF_NO_INDIRECT_ESCAPE
| EAF_NO_DIRECT_CLOBBER | EAF_NO_INDIRECT_CLOBBER; that again is wrong,
since the argument is indirectly clobbered here.

I think the analysis of map_to_vector goes wrong:

 - Analyzing store: t
   - Read-only or local, ignoring.
 - Analyzing load: *F_2(D)
   - Recording base_set=0 ref_set=0  Parm 0 param offset:0 offset:0 size:64
max_size:64
 - Analyzing call:t = map_iterator (*F_2(D));
 - ECF_CONST | ECF_NOVOPS, ignoring all stores and all loads except for args.

 - Analyzing call:ff (&t.F);
 - Merging side effects of ff/0
   Parm map: -5
 - Analyzing store: t
   - Read-only or local, ignoring.

missing that the t = map_iterator stmt copies *F to t, making t contain
references to non-local vars which ff indirectly clobbers.


Honza?
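Here is a C sketch of the data flow described above. It uses hypothetical stand-ins for the C++ types in the testcase: func, iterator and the function bodies are invented, and the sketch is not claimed to reproduce the miscompilation. It shows why parm 0 of map_to_vector cannot be no_indirect_clobber. The by-value copy made by map_iterator carries F's pointer into t, and ff stores through it.

```c
#include <assert.h>

/* Hypothetical stand-ins: a closure capturing a pointer, and an
   iterator that wraps a copy of it.  */
typedef struct { int *captured; } func;
typedef struct { func F; } iterator;

/* Copies *F into the result, like 't = map_iterator (*F_2(D))'.  */
static iterator map_iterator(func f)
{
  iterator it = { f };
  return it;
}

/* Stores through the pointer held in the copy: an indirect clobber of
   whatever the caller's closure points to.  */
static void ff(func *fp)
{
  *fp->captured = 0;
}

static void map_to_vector(func *F)
{
  iterator t = map_iterator(*F);  /* t now holds a copy of F->captured */
  ff(&t.F);                       /* clobbers the caller's variable */
}

static int run_example(void)
{
  int resultIsStatic = 1;
  func t = { &resultIsStatic };
  map_to_vector(&t);
  return resultIsStatic;          /* the store must be observed: 0 */
}
```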

[Bug middle-end/115528] [15 regression] segmentation fault in legacy F77 code since r15-1238-g1fe55a1794863b

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115528

--- Comment #30 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:2f83ea87ee328d337f87d4430861221be9babe1e

commit r15-1582-g2f83ea87ee328d337f87d4430861221be9babe1e
Author: Richard Biener 
Date:   Fri Jun 21 13:19:26 2024 +0200

tree-optimization/115528 - fix vect alignment analysis for outer loop vect

For outer loop vectorization of a data reference in the inner loop
we have to look at both steps to see if they preserve alignment.

What is special for this testcase is that the outer loop step is
one element but the inner loop step four and that we now use SLP
and the vectorization factor is one.

PR tree-optimization/115528
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Make sure to look at both the inner and outer loop step
behavior.

* gfortran.dg/vect/pr115528.f: New testcase.
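Here is a simplified model of the check the commit message describes. It is not the actual vect_compute_data_ref_alignment logic. The helper names, the 4-byte element and the 16-byte vector are assumptions for illustration. With an inner step of four elements and an outer step of one element, the inner step alone preserves 16-byte alignment but the outer step does not. The reference therefore cannot be treated as aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A step preserves alignment if it keeps every access at the same
   offset modulo the vector size.  */
static bool step_preserves_alignment(size_t step_bytes, size_t vec_bytes)
{
  return step_bytes % vec_bytes == 0;
}

/* For an inner-loop data reference under outer-loop vectorization,
   both the inner and the outer step have to preserve alignment.  */
static bool dr_alignment_preserved(size_t inner_step, size_t outer_step,
                                   size_t vec_bytes)
{
  return step_preserves_alignment(inner_step, vec_bytes)
         && step_preserves_alignment(outer_step, vec_bytes);
}
```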

[Bug middle-end/115528] [15 regression] segmentation fault in legacy F77 code since r15-1238-g1fe55a1794863b

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115528

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #31 from Richard Biener  ---
Fixed.

[Bug target/115485] [11/12/13/14/15 Regression] CASEServer.cpp:203:1: internal compiler error: in require_pic_register, at config/arm/arm.c:7855

2024-06-24 Thread gang.peng at aclsemi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115485

--- Comment #16 from Gang Peng  ---
(In reply to Andrew Pinski from comment #15)
> Most likely r7-1945-gb88bd5e0ca1208 introduced/exposed the ICE. It changes
> the behavior of -mno-pic-data-is-text-relative but adding -msingle-pic-base
> didn't ICE in GCC 6.

Dear Andrew,

Thank you very much for your kind reply.

So can I revert this change and rebuild the toolchain to test it?

Thanks &
BRs

Gang Peng

[Bug tree-optimization/115602] [15 Regression] ICE on liblapack-3.12.0: in vect_schedule_slp_node, at tree-vect-slp.cc:9643 since r15-1565-g2a345214fc332b

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115602

--- Comment #9 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:c43c74f6ec795a586388de7abfdd20a0040f6f16

commit r15-1583-gc43c74f6ec795a586388de7abfdd20a0040f6f16
Author: Richard Biener 
Date:   Mon Jun 24 09:52:39 2024 +0200

tree-optimization/115602 - SLP CSE results in cycles

The following prevents SLP CSE from creating new cycles, which happened
because of a 1:1 permute node being present whose child was then
CSEd to the permute node.  Fixed by making a node available to
CSE only after recursing.

PR tree-optimization/115602
* tree-vect-slp.cc (vect_cse_slp_nodes): Delay populating the
bst-map to avoid cycles.

* gcc.dg/vect/pr115602.c: New testcase.

[Bug tree-optimization/115602] [15 Regression] ICE on liblapack-3.12.0: in vect_schedule_slp_node, at tree-vect-slp.cc:9643 since r15-1565-g2a345214fc332b

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115602

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #10 from Richard Biener  ---
Should be fixed, but the inefficiency in creating the permute is still present.
I guess I'll create another PR for that.

[Bug target/115519] s390 fallout from removing vcond{,u,eq} patterns

2024-06-24 Thread stefansf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115519

--- Comment #3 from Stefan Schulze Frielinghaus  
---
The failing autovec-long-double-signaling-*.c tests stem from the fact that
vcond_mask_mn is not implemented for V1TF, which can easily be done by
switching to the VT mode iterator and extending TOINTVEC/tointvec.

[Bug tree-optimization/115615] New: SLP permute optimization creates unnecessary permute

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115615

Bug ID: 115615
   Summary: SLP permute optimization creates unnecessary permute
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

For the testcases in PR115602, notably the following one at -O2 on x86_64,
the SLP permute pass materializes a permute on top of the following node,
which is always going to be an identity transform.

t.c:13:6: note: node 0x4ab3690 (max_nunits=2, refcnt=1) vector(2) double
t.c:13:6: note: op template: _11 = gvevent_motion_job.zoom;
t.c:13:6: note: stmt 0 _11 = gvevent_motion_job.zoom;
t.c:13:6: note: stmt 1 _11 = gvevent_motion_job.zoom;
t.c:13:6: note: load permutation { 2 2 }



typedef struct {
  double x, y;
} pointf;
struct {
  pointf focus;
  double zoom;
  pointf devscale;
  char button;
  pointf oldpointer;
} gvevent_motion_job;
char gvevent_motion_job_4;
double gvevent_motion_pointer_1, gvevent_motion_pointer_0;
void gvevent_motion() {
  double dx = (gvevent_motion_pointer_0 - gvevent_motion_job.oldpointer.x) /
  gvevent_motion_job.devscale.x,
 dy = (gvevent_motion_pointer_1 - gvevent_motion_job.oldpointer.y) /
  gvevent_motion_job.devscale.y;
  if (dx && dy < .0001)
return;
  switch (gvevent_motion_job_4)
  case 2: {
gvevent_motion_job.focus.x -= dy / gvevent_motion_job.zoom;
gvevent_motion_job.focus.y += dx / gvevent_motion_job.zoom;
  }
}

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533

Richard Biener  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #19 from Richard Biener  ---
Note I can't see anything obviously wrong.  -ffp-contract=fast triggers
BB vectorization in 'ac', and placing __restrict on 'ae' like

void ac(float *ad, float * __restrict ae, size_t t, float *a, float *b, size_t
af,
uint32_t) {
...

means the testcase no longer requires -fipa-pta to reproduce the issue.  I
_think_ this __restrict is OK (the allocation is unnecessarily obfuscated in
the test).

Alex fixed -ffp-contract=on, but =fast still seems to be the default.

[Bug middle-end/114855] ICE: Segfault

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855

--- Comment #10 from Richard Biener  ---
Created attachment 58505
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58505&action=edit
preprocessed testcase

[Bug middle-end/114855] ICE: Segfault

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855

--- Comment #11 from Richard Biener  ---
Btw, a question to the reporter - I suppose the files are machine-generated. 
Are you able to create a file of smaller size?  This one has ~20 lines,
some with 2000 and 2 lines would be perfect.

[Bug c++/115616] New: Friend-injecting a template function causes an ICE if you inject after trying to instantiate that function

2024-06-24 Thread iamsupermouse at mail dot ru via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115616

Bug ID: 115616
   Summary: Friend-injecting a template function causes an ICE if
you inject after trying to instantiate that function
   Product: gcc
   Version: 14.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iamsupermouse at mail dot ru
  Target Milestone: ---

The following causes an internal compiler error in GCC 14 and trunk:
https://gcc.godbolt.org/z/z9PWhWvzh

Same bug in Clang: https://github.com/llvm/llvm-project/issues/96485
MSVC compiles this successfully and calls `bar<10,20>()`.

template  void bar() {}

template 
struct Reader
{
template 
friend void foo(Reader);
};

template 
struct Writer
{
template 
friend void foo(Reader) {bar();}
};

int main()
{
foo<10>(Reader{});
Writer{};
}

GCC says: (this is trunk, v14 just says "segmentation fault")

: In instantiation of 'void foo(Reader) [with int X = 10; T =
int; int Y = ]':
:19:12:   required from here
 19 | foo<10>(Reader{});
| ~~~^~~
:14:33: internal compiler error: tree check: accessed elt 2 of
'tree_vec' with 1 elts in tsubst, at cp/pt.cc:16362
 14 | friend void foo(Reader) {bar();}
| ^
0x26cfe8c internal_error(char const*, ...)
???:0
0x97a4bb tree_vec_elt_check_failed(int, int, char const*, int, char const*)
???:0
0xcbc689 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
???:0
0xcabce3 instantiate_decl(tree_node*, bool, bool)
???:0
0xcd625b instantiate_pending_templates(int)
???:0
0xb6e830 c_parse_final_cleanups()
???:0
0xdcad68 c_common_parse_file()
???:0

[Bug c++/115567] Internal Compiler Error: Segmentation Fault during build

2024-06-24 Thread jjleksmi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115567

--- Comment #6 from Jayalekshmi Jayakumar  ---
Could you please tell me how I can get it to work without this error

[Bug c++/115567] Internal Compiler Error: Segmentation Fault during build

2024-06-24 Thread jjleksmi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115567

--- Comment #7 from Jayalekshmi Jayakumar  ---
Could you please tell me how I can get it to work without this error

[Bug c++/115617] New: Wrong diagnostic message for non-const expr in constexpr context

2024-06-24 Thread jengelh at inai dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115617

Bug ID: 115617
   Summary: Wrong diagnostic message for non-const expr in
constexpr context
   Product: gcc
   Version: 13.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jengelh at inai dot de
  Target Milestone: ---

Input:

enum E { FOO = 2 };
int main()
{
  static constexpr auto x = reinterpret_cast(static_cast(FOO));
  static constexpr auto y = reinterpret_cast(static_cast(2));
}

Observed output:

$ g++ -c x.cpp -Wall -std=c++17
x.cpp: In function ‘int main()’:
x.cpp:4:29: error: ‘reinterpret_cast(2)’ is not a constant expression
x.cpp:5:29: error: ‘reinterpret_cast’ from integer to pointer

Expected output:

x.cpp: In function ‘int main()’:
x.cpp:4:29: error: ‘reinterpret_cast(2)’ is not a constant expression
x.cpp:5:29: error: ‘reinterpret_cast(2)’ is not a constant expression

[Bug tree-optimization/113673] [12/13/14/15 Regression] ICE: verify_flow_info failed: BB 5 cannot throw but has an EH edge with -Os -finstrument-functions -fnon-call-exceptions -ftrapv

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113673

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:d8b05aef77443e1d3d8f3f5d2c56ac49a503fee3

commit r15-1584-gd8b05aef77443e1d3d8f3f5d2c56ac49a503fee3
Author: Roger Sayle 
Date:   Mon Jun 24 15:34:03 2024 +0100

PR tree-optimization/113673: Avoid load merging when potentially trapping.

This patch fixes PR tree-optimization/113673, a P2 ice-on-valid regression
caused by load merging of (ptr[0]<<8)+ptr[1] when -ftrapv has been
specified.  When the operator is | or ^ this is safe, but for addition
of signed integer types, a trap may be generated/required, so merging this
idiom into a single non-trapping instruction is inappropriate, confusing
the compiler by transforming a basic block with an exception edge into one
without.

This revision implements Richard Biener's feedback to add an early check
for stmt_can_throw_internal (cfun, stmt) to prevent transforming in the
presence of any statement that could trap, not just overflow on addition.
The one other tweak included in this patch is to mark the local function
find_bswap_or_nop_load as static ensuring that it isn't called from outside
this file, and guaranteeing that it is dominated by stmt_can_throw_internal
checking.

2024-06-24  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR tree-optimization/113673
* gimple-ssa-store-merging.cc (find_bswap_or_nop_load): Make
static.
(find_bswap_or_nop_1): Avoid transformations (load merging) when
stmt_can_throw_internal indicates that a statement can trap.

gcc/testsuite/ChangeLog
PR tree-optimization/113673
* g++.dg/pr113673.C: New test case.
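For readers unfamiliar with load merging, a minimal C sketch of the idiom named in the commit message. The function names are hypothetical, and be16_merged is a hand-written approximation of the single-load form, not actual compiler output; it relies on the GCC/Clang predefined byte-order macros and __builtin_bswap16. The sketch only shows that the values agree. The bug was not about the value: under -ftrapv the signed + is a potentially trapping statement (with an EH edge under -fnon-call-exceptions), and a single merged load cannot preserve that.

```c
#include <stdint.h>
#include <string.h>

/* The idiom from the PR: two byte loads combined into a big-endian
   16-bit value with a shift and a signed int addition.  */
int be16_add (const unsigned char *ptr)
{
  return (ptr[0] << 8) + ptr[1];
}

/* Same value with | instead of +: the shifted high byte and the low
   byte have no set bits in common, so no carry can occur.  */
int be16_or (const unsigned char *ptr)
{
  return (ptr[0] << 8) | ptr[1];
}

/* Approximation of the merged form: one 16-bit load, byte-swapped on
   little-endian hosts.  */
int be16_merged (const unsigned char *ptr)
{
  uint16_t v;
  memcpy (&v, ptr, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  v = __builtin_bswap16 (v);
#endif
  return v;
}

/* Returns 1 if all three forms agree for every pair of input bytes.  */
int be16_forms_agree (void)
{
  for (int hi = 0; hi < 256; hi++)
    for (int lo = 0; lo < 256; lo++)
      {
        unsigned char buf[2] = { (unsigned char) hi, (unsigned char) lo };
        int a = be16_add (buf);
        if (a != be16_or (buf) || a != be16_merged (buf))
          return 0;
      }
  return 1;
}
```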

[Bug middle-end/114855] ICE: Segfault

2024-06-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855

--- Comment #12 from Richard Biener  ---
At -O1 we have

Samples: 2M of event 'cycles:u', Event count (approx.): 2983686432518
Overhead   Samples  Command  Shared Object  Symbol
  19.77%    467950  cc1      cc1            [.] bitmap_bit_p
  12.31%    300919  cc1      cc1            [.] wide_int_storage::operator=
   6.79%    158610  cc1      cc1            [.] gori_compute::may_recompute_p
   4.84%    113100  cc1      cc1            [.] ranger_cache::range_from_dom
   3.79%     88582  cc1      cc1            [.] bitmap_set_bit
   3.24%     75772  cc1      cc1            [.] block_range_cache::get_bb_range
   2.40%     56058  cc1      cc1            [.] get_immediate_dominator
   2.37%     55493  cc1      cc1            [.] gori_map::exports
   2.15%     50244  cc1      cc1            [.] gori_map::is_export_p
   1.87%     45710  cc1      cc1            [.] wide_int_storage::wide_int_storage
   1.73%     40436  cc1      cc1            [.] infer_range_manager::has_range_p
   1.70%     39586  cc1      cc1            [.] gimple_has_side_effects
   1.17%     28642  cc1      cc1            [.] irange_storage::get_irange
   1.13%     27004  cc1      cc1            [.] back_jt_path_registry::adjust_paths_after_duplication

so it's DOM's jump threader that takes the time.  Using -O1 -fno-thread-jumps,
this improves a lot, to

Samples: 362K of event 'cycles:u', Event count (approx.): 441041461405
Overhead   Samples  Command  Shared Object  Symbol
  22.44%     78191  cc1      cc1            [.] wide_int_storage::operator=
  11.02%     38451  cc1      cc1            [.] bitmap_bit_p
   3.55%     12318  cc1      cc1            [.] dom_oracle::register_transitives
   3.45%     12016  cc1      cc1            [.] wide_int_storage::wide_int_storage

I'm going to try to collect a callgrind profile for -O1.

[Bug tree-optimization/115602] [15 Regression] ICE on liblapack-3.12.0: in vect_schedule_slp_node, at tree-vect-slp.cc:9643 since r15-1565-g2a345214fc332b

2024-06-24 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115602

--- Comment #11 from Sergei Trofimovich  ---
The change fixed both liblapack-3.12.0 and graphviz-10.0.1 builds for me. Thank
you!

[Bug middle-end/47081] Macro usage too clever for localization

2024-06-24 Thread goeran at uddeborg dot se via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47081

Göran Uddeborg  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #8 from Göran Uddeborg  ---
We seem to agree this can be closed.

[Bug translation/40883] [meta-bug] Translation breakage with trivial fixes

2024-06-24 Thread goeran at uddeborg dot se via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
Bug 40883 depends on bug 47081, which changed state.

Bug 47081 Summary: Macro usage too clever for localization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47081

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/115618] New: GCC 13.3 should defined __ARM_FEATURE_CRYPTO with +aes+sha2

2024-06-24 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115618

Bug ID: 115618
   Summary: GCC 13.3 should defined __ARM_FEATURE_CRYPTO with
+aes+sha2
   Product: gcc
   Version: 13.3.1
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
CC: tnfchris at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

I think this has been fixed in GCC 14 onwards, but with GCC 13-based compilers
we're seeing __ARM_FEATURE_CRYPTO missing in some -mcpu=native cases that should
include it (the prerequisite "aes pmull sha1 sha2" info exists in cpuinfo).

I think this can be reproduced with:
#ifndef __ARM_FEATURE_CRYPTO
#error "__ARM_FEATURE_CRYPTO should be defined!"
#endif
void
foo (void)
{
}

compiled with GCC 13.3 with -march=armv9-a+aes+sha2 gives an error but it works
with GCC 14.1. I remember there was much rework in this area, could something
be backported to the branch?

[Bug c++/115617] inconsistent diagnostic message for reinterpret_cast in constexpr context (enum vs integer constant)

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115617

Andrew Pinski  changed:

   What|Removed |Added

Summary|Wrong diagnostic message|inconsistent diagnostic
   |for non-const expr in   |message for
   |constexpr context   |reinterpret_cast in
   ||constexpr context (enum vs
   ||integer constant)
   Last reconfirmed||2024-06-24
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug target/115618] GCC 13.3 should defined __ARM_FEATURE_CRYPTO with +aes+sha2

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115618

--- Comment #1 from Andrew Pinski  ---
r14-6612-g8d30107455f230

[Bug target/115618] [11/12/13 only] should defined __ARM_FEATURE_CRYPTO with +aes+sha2

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115618

Andrew Pinski  changed:

   What|Removed |Added

Summary|GCC 13.3 should defined |[11/12/13 only] should
   |__ARM_FEATURE_CRYPTO with   |defined
   |+aes+sha2   |__ARM_FEATURE_CRYPTO with
   ||+aes+sha2
   Target Milestone|--- |13.4

[Bug sanitizer/115619] New: [ASAN] new-delete-type-mismatch on aligned operator new

2024-06-24 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115619

Bug ID: 115619
   Summary: [ASAN] new-delete-type-mismatch on aligned operator
new
   Product: gcc
   Version: 14.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: thiago at kde dot org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org
  Target Milestone: ---

Simple test case:

#include <new>
int main()
{
delete new (std::align_val_t(64)) char;
}

Produces:

=
==31603==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x50900040 in
thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   1 bytes;
  size of the deallocated type: 1 bytes.
  alignment of the allocated type:   64 bytes;
  alignment of the deallocated type: default-aligned.
#0 0x7f8abfefd0d8 in operator delete(void*, unsigned long)
(/lib64/libasan.so.8+0xfd0d8) (BuildId:
1827a4c72065a9f25ba519b25166029eebbf519f)
#1 0x40118a in main (/tmp/asan+0x40118a) (BuildId:
8bfb14839297ab61e6a8de28f913cc801a1f7cd7)
#2 0x7f8abf62a1ef in __libc_start_call_main (/lib64/libc.so.6+0x2a1ef)
(BuildId: a2c0942c27fb9483b47886a1b937337a797bbceb)
#3 0x7f8abf62a2b8 in __libc_start_main_alias_2 (/lib64/libc.so.6+0x2a2b8)
(BuildId: a2c0942c27fb9483b47886a1b937337a797bbceb)
#4 0x401094 in _start ../sysdeps/x86_64/start.S:115

0x50900040 is located 0 bytes inside of 1-byte region
[0x50900040,0x50900041)
allocated by thread T0 here:
#0 0x7f8abfefc708 in operator new(unsigned long, std::align_val_t)
(/lib64/libasan.so.8+0xfc708) (BuildId:
1827a4c72065a9f25ba519b25166029eebbf519f)
#1 0x401178 in main (/tmp/asan+0x401178) (BuildId:
8bfb14839297ab61e6a8de28f913cc801a1f7cd7)
#2 0x7f8abf62a1ef in __libc_start_call_main (/lib64/libc.so.6+0x2a1ef)
(BuildId: a2c0942c27fb9483b47886a1b937337a797bbceb)

SUMMARY: AddressSanitizer: new-delete-type-mismatch
(/lib64/libasan.so.8+0xfd0d8) (BuildId:
1827a4c72065a9f25ba519b25166029eebbf519f) in operator delete(void*, unsigned
long)
==31603==HINT: if you don't care about these errors you may set
ASAN_OPTIONS=new_delete_type_mismatch=0
==31603==ABORTING

Reproduced with GCC 13, 14 and with Clang 18.

[Bug sanitizer/115619] [ASAN] new-delete-type-mismatch on aligned operator new

2024-06-24 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115619

--- Comment #1 from Thiago Macieira  ---
Matching Clang bug report: https://github.com/llvm/llvm-project/issues/96512

[Bug fortran/115563] Unnecessary brackets prevent fortran vectorisation

2024-06-24 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563

--- Comment #6 from mjr19 at cam dot ac.uk ---
A further comment to aid others reading this report. It is not just unnecessary
brackets which used to prevent vectorisation, but also necessary ones.

subroutine foo(a,b,c,n)
  complex (kind(1d0)) :: a(*),b,c
  integer :: i,n

  do i=1,n
 a(i)=(a(i)+b)*c
  enddo
end subroutine foo

does not vectorise with gfortran-14, but does with gfortran-15.0-20240623.

The performance increase in loops making extensive use of complex variables can
therefore be quite significant -- fifty percent or more.

The almost-equivalent C code of

void foo(_Complex double *a, _Complex double b, _Complex double c, int n){
  int i;
  for(i=0;i

[Bug target/115608] ICE in extract_insn, at recog.cc:2812 when building with -mv8plus

2024-06-24 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115608

Eric Botcazou  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #5 from Eric Botcazou  ---
Tentative fix:

diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
index 1e2e4aef2ad..83e0d6874d9 100644
--- a/gcc/config/sparc/linux64.h
+++ b/gcc/config/sparc/linux64.h
@@ -162,7 +162,7 @@ extern const char *host_detect_local_cpu (int argc, const
char **argv);
 "%{m32:%{m64:%emay not use both -m32 and -m64}} \
 %{m32:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
   %{!mcpu*:-mcpu=cypress}} \
-%{mv8plus:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
+%{mv8plus:-m32 -mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
   %{!mcpu*:-mcpu=v9}} \
 %{!m32:%{!mcpu*:-mcpu=ultrasparc}} \
 %{!mno-vis:%{!m32:%{!mcpu=v9:-mvis}}}"

[Bug target/115608] ICE in extract_insn, at recog.cc:2812 when building with -mv8plus

2024-06-24 Thread glaubitz at physik dot fu-berlin.de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115608

--- Comment #6 from John Paul Adrian Glaubitz  ---
(In reply to Eric Botcazou from comment #5)
> Tentative fix:
> 
> diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
> index 1e2e4aef2ad..83e0d6874d9 100644
> --- a/gcc/config/sparc/linux64.h
> +++ b/gcc/config/sparc/linux64.h
> @@ -162,7 +162,7 @@ extern const char *host_detect_local_cpu (int argc,
> const char **argv);
>  "%{m32:%{m64:%emay not use both -m32 and -m64}} \
>  %{m32:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
>%{!mcpu*:-mcpu=cypress}} \
> -%{mv8plus:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
> +%{mv8plus:-m32 -mptr32 -mno-stack-bias
> %{!mlong-double-128:-mlong-double-64} \
>%{!mcpu*:-mcpu=v9}} \
>  %{!m32:%{!mcpu*:-mcpu=ultrasparc}} \
>  %{!mno-vis:%{!m32:%{!mcpu=v9:-mvis}}}"

Indeed, passing -m32 on the command line as well fixes the problem.

[Bug target/115608] ICE in extract_insn, at recog.cc:2812 when building with -mv8plus

2024-06-24 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115608

--- Comment #7 from Eric Botcazou  ---
Right, but Solaris does it automatically so Linux can probably mimic it.

[Bug c++/115583] [14/15 Regression] C++23: Call to consteval function in `if consteval` immediate function context rejected at -O1 since r14-4140

2024-06-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115583

Marek Polacek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

[Bug tree-optimization/115344] Missing loop counter reversal

2024-06-24 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115344

--- Comment #4 from Andi Kleen  ---
Pedantry aside, the basic problem is that doloop optimization depends on the
target supporting doloop, but the loop reversal would be useful everywhere.

So there are two options: add doloop to every target of interest or make the
reversal optimization independent.
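For readers unfamiliar with the transformation, a hand-written C sketch of what "loop counter reversal" means here (function names are hypothetical, and this is not what any GCC pass emits). The memory walk stays forward, but the trip count is tracked by a counter that runs down to zero, so the exit test becomes a compare against zero, roughly the form doloop-style loops use. The bound n is a runtime value in this sketch.

```c
#include <stddef.h>

/* Original form: one up-counting IV drives both the addressing and the
   exit test (i < n).  */
long sum_up (const int *a, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Reversed counter: a separate down-counting variable controls the loop
   and the exit test compares against zero; the pointer still advances.  */
long sum_reversed_counter (const int *a, size_t n)
{
  long s = 0;
  for (size_t left = n; left != 0; left--)
    s += *a++;
  return s;
}
```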

[Bug middle-end/115607] missed tail call with large structure size

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115607

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug c++/115561] [14/15 Regression] ICE checking constraints when a local class is involved since r14-9659

2024-06-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115561

Marek Polacek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot 
gnu.org

[Bug c++/115558] Trivial noexcept(false) default constructor does not make value initialization potentially throwing

2024-06-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115558

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #3 from Marek Polacek  ---
Probably mine then; thanks for the report.

[Bug c++/115620] New: internal compiler error: output_operand: invalid expression as operand

2024-06-24 Thread iamanonymous.cs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115620

Bug ID: 115620
   Summary: internal compiler error: output_operand: invalid
expression as operand
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iamanonymous.cs at gmail dot com
  Target Milestone: ---
Target: x86_64

***
The compiler produces an internal error during tsubst_pack_expansion when
compiling the provided code with the specified options. 
The issue can also be reproduced on Compiler Explorer.

***
OS and Platform:
# uname -a
Linux ubuntu 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023
x86_64 x86_64 x86_64 GNU/Linux
***
# g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/root/gdbtest/gcc/gcc-15/libexec/gcc/x86_64-pc-linux-gnu/15.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /root/gdbtest/gcc/obj/../gcc/configure
--prefix=/root/gdbtest/gcc/gcc-15 --enable-languages=c,c++,fortran,go
--disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.0.0 20240509 (experimental) (GCC) 
***
Program:
# cat code_0.cpp

#include 
#include 
#include 

[[maybe_unused]] constexpr inline static auto lpartial =
[](auto fn, auto... t0) constexpr noexcept {
  return [=](auto... rest) constexpr noexcept
requires requires { fn(t0..., rest...); }
  { return fn(t0..., rest...); };
};

[[maybe_unused]] constexpr inline static auto fix =
[](auto self) constexpr noexcept {
  return [=](auto... args) constexpr noexcept
requires requires { self(self, args...); }
  { return self(self, args...); };
};

[[maybe_unused]] constexpr inline static auto currify0 =
[](auto self, auto v, auto fn) constexpr noexcept {
  return [=](auto... as) constexpr noexcept
requires(
(v.value > sizeof...(as) && requires { lpartial(fn, as...); }) ||
(v.value == sizeof...(as) && requires { fn(as...); }))
  {
if constexpr (v.value > sizeof...(as))
  return self(self, std::integral_constant{}, lpartial(fn, as...));
else
  return fn(as...);
  };
};

[[maybe_unused]] constexpr static inline auto plus =
[](auto x, auto y) constexpr noexcept
  requires requires { x + y; }
{ return x + y; };

#undef lambda_body

[[maybe_unused]] constexpr static inline auto plus_ =
fix(currify0)(std::integral_constant{}, plus);




***
Command Lines:
# g++ code_0.cpp -O3 -Wpedantic -Wall -Wextra -Wconversion -Wshadow
-Wnon-virtual-dtor -Wold-style-cast -Wcast-align -Wunused -Woverloaded-virtual
-Wpedantic -Wsign-conversion -Wmisleading-indentation -Wduplicated-cond
-Wnull-dereference -Wdouble-promotion -Wformat=2 -Wno-unused-parameter -c -o
code_0.o
code_0.cpp:8:9: warning: identifier ‘requires’ is a keyword in C++20
[-Wc++20-compat]
8 | requires requires { fn(t0..., rest...); }
  | ^~~~
code_0.cpp: In lambda function:
code_0.cpp:8:9: error: ‘requires’ only available with ‘-std=c++20’ or
‘-fconcepts’
code_0.cpp:8:18: error: ‘requires’ was not declared in this scope
8 | requires requires { fn(t0..., rest...); }
  |  ^~~~
code_0.cpp:8:50: error: expected ‘;’ before ‘{’ token
8 | requires requires { fn(t0..., rest...); }
  |  ^
  |  ;
9 |   { return fn(t0..., rest...); };
  |   ~   
code_0.cpp:9:26: error: ‘rest’ was not declared in this scope
9 |   { return fn(t0..., rest...); };
  |  ^~~~
code_0.cpp: In lambda function:
code_0.cpp:15:9: error: ‘requires’ only available with ‘-std=c++20’ or
‘-fconcepts’
   15 | requires requires { self(self, args...); }
  | ^~~~
code_0.cpp:15:18: error: ‘requires’ was not declared in this scope
   15 | requires requires { self(self, args...); }
  |  ^~~~
code_0.cpp:15:51: error: expected ‘;’ before ‘{’ token
   15 | requires requires { self(self, args...); }
  |   ^
  |   ;
   16 |   { return self(self, args...); };
  |   ~
code_0.cpp:16:27: error: ‘args’ was not declared in this scope
  

[Bug tree-optimization/115344] Missing loop counter reversal

2024-06-24 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115344

--- Comment #5 from Andi Kleen  ---
Another problem is that doloop optimization only handles known bounds, while
generic reversal works for unknown bounds too.

[Bug fortran/55978] class_optional_2.f90 -Os fails

2024-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55978

--- Comment #34 from GCC Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:f02c70dafd384f0c44d7a0920f4a75a30e267045

commit r15-1585-gf02c70dafd384f0c44d7a0920f4a75a30e267045
Author: Harald Anlauf 
Date:   Sun Jun 23 22:36:43 2024 +0200

Fortran: fix passing of optional dummy as actual to optional argument
[PR55978]

gcc/fortran/ChangeLog:

PR fortran/55978
* trans-array.cc (gfc_conv_array_parameter): Do not dereference
data component of a missing allocatable dummy array argument for
passing as actual to optional dummy.  Harden logic of presence
check for optional pointer dummy by using TRUTH_ANDIF_EXPR instead
of TRUTH_AND_EXPR.

gcc/testsuite/ChangeLog:

PR fortran/55978
* gfortran.dg/optional_absent_12.f90: New test.

[Bug tree-optimization/115120] Bad interaction between ivcanon and early break vectorization

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120

--- Comment #4 from Tamar Christina  ---
You asked why this doesn't happen with a normal vector loop, Richi.

For a normal loop, when IVcanon adds the downward-counting IV there are two
main differences.

1. For a single-exit loop, the downward IV is the main IV, which we ignore as
the vectorizer replaces the loop exit condition with a bound iteration check.

2.  When we peel, the main loop has a known iteration count, so the starting
value of the downward IV for the scalar loop is a known constant.  That means
we compute the start of the IV statically.  As such, there's no data flow for
this downward-counting IV from the main loop into the scalar loop.

i.e. in this loop:

   [local count: 1063004408]:
  # i_8 = PHI 
  # ivtmp_2 = PHI 
  res[i_8] = i_8;
  i_5 = i_8 + 1;
  ivtmp_1 = ivtmp_2 - 1;
  if (ivtmp_1 != 0)
goto ; [98.99%]
  else
goto ; [1.01%]

   [local count: 1052266995]:
  goto ; [100.00%]

when we vectorize the final loop looks like:

   [local count: 1063004408]:
  # i_8 = PHI 
  # ivtmp_2 = PHI 
  # vect_vec_iv_.6_19 = PHI <_20(5), { 0, 1, 2, 3 }(2)>
  # vectp_res.7_21 = PHI 
  # ivtmp_24 = PHI 
  _20 = vect_vec_iv_.6_19 + { 4, 4, 4, 4 };
  MEM  [(int *)vectp_res.7_21] = vect_vec_iv_.6_19;
  i_5 = i_8 + 1;
  ivtmp_1 = ivtmp_2 - 1;
  vectp_res.7_22 = vectp_res.7_21 + 16;
  ivtmp_25 = ivtmp_24 + 1;
  if (ivtmp_25 < 271)
goto ; [98.99%]
  else
goto ; [1.01%]

   [local count: 1052266995]:
  goto ; [100.00%]

   [local count: 10737416]:
  # i_16 = PHI 
  # ivtmp_17 = PHI 

   [local count: 32212248]:
  # i_7 = PHI 
  # ivtmp_11 = PHI 
  res[i_7] = i_7;
  i_13 = i_7 + 1;
  ivtmp_14 = ivtmp_11 - 1;
  if (ivtmp_14 != 0)
goto ; [66.67%]
  else
goto ; [33.33%]

   [local count: 21474835]:
  goto ; [100.00%]

For this vector code, neither assumption holds any longer.

1.  The vectorizer may pick an exit other than the one controlled by the
downward-counting IV, in particular if the early exit has a known iteration
count lower than that of the main exit.

2.  Because we don't know which exit the loop takes, we can't tell how many
iterations the scalar loop has to do at a minimum; we only know the maximum.
As such, the reduction into the second loop is:

   [local count: 58465242]:
  # vect_vec_iv_.6_30 = PHI 
  # vect_vec_iv_.7_35 = PHI 
  _36 = BIT_FIELD_REF ;
  ivtmp_26 = _36;
  _31 = BIT_FIELD_REF ;
  i_25 = _31;
  goto ; [100.00%]

   [local count: 214528238]:
  # i_3 = PHI 
  # ivtmp_17 = PHI 

Since we don't know the iteration count, we require both IVs to be live.  The
down-counting IV is live because the scalar loop needs a starting point, and the
incrementing IV is live due to addressing-mode uses.

This means neither can be removed.

In the single exit case, the downward IV is only used for loop control:

   [local count: 32212248]:
  # i_7 = PHI 
  # ivtmp_11 = PHI 
  res[i_7] = i_7;
  i_13 = i_7 + 1;
  ivtmp_14 = ivtmp_11 - 1;
  if (ivtmp_14 != 0)
goto ; [66.67%]
  else
goto ; [33.33%]

   [local count: 21474835]:
  goto ; [100.00%]

and so IVopts rewrites the addressing mode usages of `i` into

   [local count: 32212248]:
  # ivtmp.12_2 = PHI 
  _5 = (unsigned int) ivtmp.12_2;
  i_7 = (int) _5;
  MEM[(int *)&res + ivtmp.12_2 * 4] = i_7;
  ivtmp.12_8 = ivtmp.12_2 + 1;
  if (ivtmp.12_8 != 1087)
goto ; [66.67%]
  else
goto ; [33.33%]

   [local count: 21474835]:
  goto ; [100.00%]

and rewrites the loop back into an incrementing loop.  This also happens for
the early-exit loop, which is why the scalar code doesn't have the double IVs.
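A C rendering of the single-exit scalar loop, before and after IVopts, may make the two-IV point easier to follow. This is a hand-written sketch, not compiler output: the function names are hypothetical, the loop is assumed to start at i = 0 with a trip count n >= 1 (the PHI start values are not shown above), and the constant 1087 from the dump is replaced by the parameter n.

```c
/* Shape after ivcanon: i drives the store, while ivtmp counts down from
   the trip count to zero and is used only for the exit test.
   Assumes n >= 1, matching the do-while shape of the GIMPLE loop.  */
void fill_two_ivs (int *res, unsigned n)
{
  int i = 0;
  unsigned ivtmp = n;
  do
    {
      res[i] = i;
      i = i + 1;
      ivtmp = ivtmp - 1;
    }
  while (ivtmp != 0);
}

/* Shape after IVopts in the single-exit case: one incrementing IV serves
   both the addressing and the exit test against the bound.  */
void fill_one_iv (int *res, unsigned n)
{
  for (unsigned iv = 0; iv != n; iv++)
    res[iv] = (int) iv;
}
```

In the single-exit case the down-counting IV has no use besides the exit test, which is what lets IVopts fold it into the incrementing one.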

But for the vector loop we have this issue because the second IV must stay live.

We might be able to rewrite the vector IVs in IVopts as you say; however, not
only does IVopts not rewrite vector IVs, it also doesn't rewrite multiple-exit
loops in general.

It has two checks:

  /* Make sure that the loop iterates till the loop bound is hit, as otherwise
 the calculation of the BOUND could overflow, making the comparison
 invalid.  */
  if (!data->loop_single_exit_p)
return false;

It also seems to lose a lot of information when niter_for_single_dom_exit (..)
is null.  It seems that, for this to work correctly, IVopts needs to know which
exit the vectorizer has chosen, i.e. I think it would have issues with a PEELED
loop.

We also have the problem where both IVs are required:

int arr[1024];
int f()
{
int i;
for (i = 0; i < 1024; i++)
  if (arr[i] == 42)
return i;
return *(arr + i);
}

but with the downward counting IV enabled, we get a much more complicated
latch.

> Note this isn't really because of IVCANON but because the IV is live.  
> IVCANON adds a downward counting IV historically to enable RTL doloop 
> transforms.

IVopts currently has:

  /* Similar to doloop_optimize, check iteration description to know it's
 suitable or not.  Keep it as simple as possible, feel free to extend it
 if you find any multiple exits cases matter.  */
  edge e

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533

--- Comment #20 from Alexander Monakov  ---
Sam, can you provide more context? It seems there is no downstream bugreport?
How does the alleged miscompilation manifest?

Note that the effects of the interplay of fp-contract=fast and vectorization
can be pretty epic, like the completely wrong, strong green tint of the blur
filter in RawTherapee (see screenshot at
https://github.com/Beep6581/RawTherapee/issues/6384 which was minimized and
reported as our PR 106902).

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533

--- Comment #21 from Sam James  ---
It fell out when building media-libs/flac's tests after I threw in -fipa-pta in
a test container for a single run where I check trunk for regressions.

Building flac itself, I can reproduce it with:
```
export CFLAGS="-O3 -ggdb3 -march=znver2 -fipa-pta -fno-vect-cost-model"
export CXXFLAGS="-O3 -ggdb3 -march=znver2 -fipa-pta -fno-vect-cost-model"
export LDFLAGS="-Wl,-O1"
cmake -B build-bad -DINSTALL_MANPAGES=OFF -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON
make -C build-bad
ctest --test-dir build-bad -R generate_streams
ctest --test-dir build-bad --tests-information -o -R replaygain --output-on-failure
```

(*BUILD_TYPE and *SHARED_LIBS can be omitted if desired; they were just there
for debugging, as I wanted assertions.  You can also just run make -C build-bad
and then make -C build-bad check.)

It fails like:
```
[...]
replaygain.flac: 64.82 1.00 64.82 1.00
CPU info (x86-64):
  CMOV ... Y
  MMX  Y
  SSE  Y
  SSE2 ... Y
  SSE3 ... Y
  SSSE3 .. Y
  SSE41 .. Y
  SSE42 .. Y
  AVX  Y
  FMA  Y
  AVX2 ... Y
  BMI2 ... Y
  AVX OS sup . Y
CPU info (x86-64):
  CMOV ... Y
  MMX  Y
  SSE  Y
  SSE2 ... Y
  SSE3 ... Y
  SSSE3 .. Y
  SSE41 .. Y
  SSE42 .. Y
  AVX  Y
  FMA  Y
  AVX2 ... Y
  BMI2 ... Y
  AVX OS sup . Y
ERROR, Expected -12.73 db instead of comment[1]: REPLAYGAIN_TRACK_GAIN=+64.82
dB


0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.09 sec

The following tests FAILED:
  7 - replaygain (Failed)
Errors while running CTest
```

(Will check out your link now, thanks!)

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533

--- Comment #22 from Alexander Monakov  ---
Similar to the RawTherapee issue, SLP opportunities are created by predcom, so
either -fno-predictive-commoning or -fno-tree-slp-vectorize avoids numerical
runaway on the small testcase.
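For context on the predcom remark, a hand-written C sketch of what predictive commoning does in general. This is not the flac code or the PR testcase, and the function names are hypothetical: a value loaded in one iteration is carried in a scalar into the next iteration instead of being reloaded. The sketch assumes a and b do not overlap; with overlap the two versions could differ.

```c
#include <stddef.h>

/* Plain form: each iteration loads both a[i] and a[i + 1].  */
void pair_sum_plain (double *b, const double *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    b[i] = a[i] + a[i + 1];
}

/* Hand-applied predictive commoning: the value loaded as a[i + 1] is
   kept in a scalar and reused as a[i] in the next iteration, so each
   iteration performs one new load.  Assumes a and b do not overlap.  */
void pair_sum_predcom (double *b, const double *a, size_t n)
{
  if (n == 0)
    return;
  double prev = a[0];
  for (size_t i = 0; i < n; i++)
    {
      double next = a[i + 1];
      b[i] = prev + next;
      prev = next;
    }
}
```

Both versions perform the same additions on the same operands, so their results are bitwise identical; any numerical difference in the PR comes from what later passes do with the transformed code.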

[Bug c++/115605] structured binding break if a variable named tuple_size is visibile at the decomposition site

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115605

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #3 from Andrew Pinski  ---
Ok, I will take this.

[Bug target/115478] [15 Regression] gcc.target/aarch64/bitint-args.c fails since r15-1120-g2277f987979445

2024-06-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115478

Richard Sandiford  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #5 from Richard Sandiford  ---
How about adding a new match_operator predicate to common.md for this kind of
situation?  It would be nice if it could automatically detect when the two
operands have no nonzero bits in common, but doing that would need some
refactoring of the nonzero_bits code, to ensure that the predicate gives a
consistent result (and does that without polluting the current nonzero_bits
cache).

In the meantime, it might be enough to say that the insn must enforce the
non-overlapping bits check itself.
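The usual reason the "no nonzero bits in common" property matters is the standard identity that, when two operands share no set bits, addition cannot produce a carry, so PLUS, IOR and XOR give the same result. A small exhaustive C check over 8-bit values, as an illustration only (the function name is hypothetical; this is not the proposed predicate):

```c
/* Returns 1 if, for every pair of 8-bit values with no set bits in
   common, a + b, a | b and a ^ b all produce the same result.  */
int disjoint_ops_agree (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        if ((a & b) != 0)
          continue;   /* operands share a bit: the forms may differ */
        if ((a + b) != (a | b) || (a | b) != (a ^ b))
          return 0;
      }
  return 1;
}
```

When the operands do overlap, the identity fails, e.g. 3 + 1 is 4 while 3 | 1 is 3, which is why the insn would have to enforce the check itself.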

[Bug c++/115605] structured binding break if a variable named tuple_size is visibile at the decomposition site

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115605

--- Comment #4 from Andrew Pinski  ---
Created attachment 58506
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58506&action=edit
Fully self contained example

[Bug c/70930] VLAs in structs in loop headers are not evaluated each iteration

2024-06-24 Thread uecker at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70930

uecker at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed|2016-05-04 00:00:00 |2024-6-24
 CC||uecker at gcc dot gnu.org

--- Comment #2 from uecker at gcc dot gnu.org ---
This seems fixed in GCC 14
https://godbolt.org/z/TWWznxE49

[Bug c++/115621] New: internal compiler error: Segmentation fault with ambiguous operator

2024-06-24 Thread jan.zizka at nokia dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115621

Bug ID: 115621
   Summary: internal compiler error: Segmentation fault with
ambiguous operator
   Product: gcc
   Version: 14.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jan.zizka at nokia dot com
  Target Milestone: ---

Created attachment 58507
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58507&action=edit
Reproducer preprocessed source

When compiling the code below with gcc 14.1.1 (in Fedora 40), an internal error
is thrown.
The same code compiles with gcc 13.3.1 (in Fedora 39).

$ g++ -freport-bug reproduce.cpp -c -o reproduce.o
reproduce.cpp: In function ‘void test()’:
reproduce.cpp:68:38: internal compiler error: Segmentation fault
   68 | log::print() << "data: " << master_data;
  |  ^~~
Please submit a full bug report, with preprocessed source.
See  for instructions.
Preprocessed source stored into /tmp/ccCeduvz.out file, please attach this to
your bugreport.


#include   
#include  
#include   
#include  

class data_class
{   
public: 
inline const int& operator*() const { return data; }
private:
int data;   
};  

class data_stream   
{   
public: 
data_stream& operator<<(char c) { return *this; }   
};  

inline data_stream& operator<<(data_stream& out, const data_class& value) {
return out << *value; }   

class text_stream   
{   
public: 
template
text_stream& operator<<(T val) { return *this; }
};  

inline text_stream& operator<<(text_stream& out, const data_class& value) {
return out  << *value; }  

template  
class log_stream_aa 
{   
public: 
template
log_stream_aa& operator<<(const T& t) { return *this; } 
};  

template  
struct log_config   
{   
typedef log_stream_aa stream; 
};  

template   
class basic_log 
{   
public: 
template   
class print : public log_config::stream   
{   
public: 
print() : log_config::stream() { }
};  
};  

typedef basic_log log; 

typedef unsigned char data_type[8];  

[Bug c++/115605] structured binding break if a variable named tuple_size is visibile at the decomposition site

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115605

--- Comment #5 from Andrew Pinski  ---
Created attachment 58508
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58508&action=edit
Patch which I am testing

Tested it on both my self contained example (which was failing before) and the
original testcase. Both work now.

[Bug c++/115621] internal compiler error: Segmentation fault with ambiguous operator

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115621

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
Already fixed.

*** This bug has been marked as a duplicate of bug 115239 ***

[Bug c++/115239] [14 Regression] ICE: Segmentation fault with ambiguous function call in some cases (`const char*` vs `char` with `long` vs `unsigned`) since r14-6522

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115239

Andrew Pinski  changed:

   What|Removed |Added

 CC||jan.zizka at nokia dot com

--- Comment #8 from Andrew Pinski  ---
*** Bug 115621 has been marked as a duplicate of this bug. ***

[Bug other/115622] New: gcc.dg/ipa/iinline-attr.c fails after r15-1579-g792f97b44ffc5e

2024-06-24 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115622

Bug ID: 115622
   Summary: gcc.dg/ipa/iinline-attr.c fails after
r15-1579-g792f97b44ffc5e
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:792f97b44ffc5e6a967292b3747fd835e99396e7, r15-1579-g792f97b44ffc5e
make  -k check-gcc RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/iinline-attr.c"
FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline "hooray[^\\n]*inline copy
in test"

commit 792f97b44ffc5e6a967292b3747fd835e99396e7 (HEAD)
Author: Richard Sandiford 
Date:   Mon Jun 24 08:43:19 2024 +0100

Add a late-combine pass [PR106594]

[Bug c++/115623] New: ICE: Segmentation fault ( in contains_struct_check and finish_for_cond for cpp)

2024-06-24 Thread iamanonymous.cs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623

Bug ID: 115623
   Summary: ICE: Segmentation fault  ( in contains_struct_check
and finish_for_cond for cpp)
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iamanonymous.cs at gmail dot com
  Target Milestone: ---
Target: x86_64

***
The compiler segfaults in contains_struct_check when compiling the provided
code with the specified options.
The issue can also be reproduced on Compiler Explorer.

***
OS and Platform:
# uname -a
Linux ubuntu 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023
x86_64 x86_64 x86_64 GNU/Linux
***
# g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/root/gdbtest/gcc/gcc-15/libexec/gcc/x86_64-pc-linux-gnu/15.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /root/gdbtest/gcc/obj/../gcc/configure
--prefix=/root/gdbtest/gcc/gcc-15 --enable-languages=c,c++,fortran,go
--disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.0.0 20240509 (experimental) (GCC) 
***
Program:
# cat 1.c

void f (char *a, int i)
{
#pragma GCC novector
  for (;;i++)
a[i] *= 2;
}




***
Command Lines:
# g++ 1.c -O3 -Wpedantic -Wall -Wextra -Wconversion -Wshadow -Wunused
-Woverloaded-virtual -Wpedantic -Wsign-conversion -Wmisleading-indentation
-Wduplicated-cond -Wnull-dereference -Wdouble-promotion -c -o 1.o
1.c: In function ‘void f(char*, int)’:
1.c:4:8: internal compiler error: Segmentation fault
4 |   for (;;i++)
  |^
0x13a93af crash_signal
/root/gdbtest/gcc/obj/../gcc/gcc/toplev.cc:319
0xd22180 contains_struct_check(tree_node*, tree_node_structure_enum, char
const*, int, char const*)
/root/gdbtest/gcc/obj/../gcc/gcc/tree.h:3769
0xd22180 finish_for_cond(tree_node*, tree_node*, bool, tree_node*, bool)
/root/gdbtest/gcc/obj/../gcc/gcc/cp/semantics.cc:1506
0xc840c4 cp_parser_c_for
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:14088
0xc840c4 cp_parser_for
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:14056
0xc840c4 cp_parser_iteration_statement
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:14690
0xc50ec5 cp_parser_pragma
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:51340
0xc84977 cp_parser_statement
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:12843
0xc59807 cp_parser_statement_seq_opt
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:13427
0xc59a2f cp_parser_compound_statement
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:13281
0xc7ccd5 cp_parser_function_body
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:26072
0xc7ccd5 cp_parser_ctor_initializer_opt_and_function_body
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:26123
0xc8241e cp_parser_function_definition_after_declarator
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:32903
0xc835be cp_parser_function_definition_from_specifiers_and_declarator
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:32820
0xc835be cp_parser_init_declarator
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:23451
0xc5668f cp_parser_simple_declaration
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:15958
0xc8ffea cp_parser_declaration
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:15631
0xc90fea cp_parser_toplevel_declaration
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:15652
0xc90fea cp_parser_translation_unit
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:5284
0xc90fea c_parse_file()
/root/gdbtest/gcc/obj/../gcc/gcc/cp/parser.cc:51440
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.


***

Also ICEs on trunk; Compiler Explorer: https://godbolt.org/z/zY8bvPj3T

***

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||tnfchris at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-25

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623

--- Comment #2 from Andrew Pinski  ---
Note `#pragma GCC unroll(1)` gives an error message:

: In function 'void f(char*, int)':
:5:9: error: missing loop condition in loop with 'GCC unroll' pragma
before ';' token
5 |   for (;;i++)
  | ^

[Bug c++/115624] New: '-Wnrvo' is not an option that controls warnings

2024-06-24 Thread albrecht.guendel at web dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115624

Bug ID: 115624
   Summary: '-Wnrvo' is not an option that controls warnings
   Product: gcc
   Version: 14.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: albrecht.guendel at web dot de
  Target Milestone: ---

Hi,
the new -Wnrvo warning is an amazing diagnostic.
However, it does not (yet) play well with the #pragma GCC diagnostic directive.

Test-Case: https://godbolt.org/z/Tq8e8zPx8  

For the sake of completeness, here is the code:

Using GCC 14.1 with the "-Wnrvo" option:

#include 
#pragma GCC diagnostic ignored "-Wnrvo"

std::string no_nrvo(int i)
{
std::string ret{};
if (i % 2)
return {};
return ret;
}


Current behavior:
warning: '-Wnrvo' is not an option that controls warnings [-Wpragmas]
warning: not eliding copy on return in 'std::string no_nrvo(int)' [-Wnrvo]

Expected behavior:
accept the "#pragma GCC diagnostic ignored" and do not diagnose the missed
optimization.
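Independently of the pragma issue, a minimal sketch of why this particular function triggers the warning: GCC generally performs NRVO only when every `return` names the same local variable, and here one path returns `{}` instead of `ret`. The rewrite `with_nrvo` is hypothetical (its "even" value is made up for illustration), and whether it silences -Wnrvo on a given GCC version is an assumption to check.

```cpp
#include <cassert>
#include <string>

// Shape from the report: two different objects are returned (`{}` and
// `ret`), so the copy of `ret` is not elided and -Wnrvo warns.
std::string no_nrvo(int i)
{
    std::string ret{};
    if (i % 2)
        return {};
    return ret;
}

// Hypothetical rewrite: every path returns the same named local `ret`,
// which is the shape NRVO is normally applied to.
std::string with_nrvo(int i)
{
    std::string ret{};
    if (!(i % 2))
        ret = "even"; // illustrative work; odd inputs keep `ret` empty
    return ret;
}
```

Restructuring the function is a workaround, not a substitute for the requested `#pragma GCC diagnostic ignored "-Wnrvo"` support.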

[Bug c++/115624] '-Wnrvo' is not an option that controls warnings

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115624

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org
 Ever confirmed|0   |1
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=58487
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-06-25

--- Comment #1 from Andrew Pinski  ---
I know the obvious patch which should fix this.

[Bug libstdc++/115585] --disable-libstdcxx-verbose causes undefined symbol: _ZSt21__glibcxx_assert_failPKciS0_S0_, version GLIBCXX_3.4.30

2024-06-24 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115585

--- Comment #11 from cqwrteur  ---
Hi, could anyone help review and merge my patch? Thank you.

https://patchwork.sourceware.org/project/gcc/patch/sa1pr11mb71305d480b48400426c253d9b2...@sa1pr11mb7130.namprd11.prod.outlook.com/

[Bug libgcc/115242] libgcc unwinder does not handle vector registers, even if the target machine supports them.

2024-06-24 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115242

Sam James  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-25
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1

[Bug c++/115605] structured binding break if a variable named tuple_size is visibile at the decomposition site

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115605

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> Created attachment 58508 [details]
> Patch which I am testing
> 
> Tested it on both my self contained example (which was failing before) and
> the original testcase. Both work now.

Note this patch is slightly wrong. I have a fix to that.

[Bug c++/115413] Missing optimization: devirtualize the call in "if(typeid(*a)==typeid(A)) a->f();" structure

2024-06-24 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115413

Jason Merrill  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org

--- Comment #2 from Jason Merrill  ---
If you're going to write code like this, why not

if(typeid(*a)==typeid(A)) a->A::f();

to force the non-virtual call?
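A minimal sketch of the difference, using hypothetical classes `A` and `B` (not from the report): a qualified call such as `a->A::f()` names the function explicitly and is not dispatched through the vtable. It is therefore only equivalent to `a->f()` under the `typeid(*a) == typeid(A)` guard; on an object whose dynamic type is `B` it would still call `A::f`.

```cpp
#include <cassert>
#include <typeinfo>

struct A {
    virtual ~A() = default;
    virtual int f() const { return 1; }
};

struct B : A {
    int f() const override { return 2; }
};

// Virtual call: dispatched at run time through the vtable.
int call_virtual(const A* a) { return a->f(); }

// Qualified call: always calls A::f, whatever the dynamic type is.
int call_qualified(const A* a) { return a->A::f(); }

// Pattern from this bug: the typeid check proves the dynamic type is
// exactly A, so the qualified (non-virtual) call matches a->f() there.
int call_guarded(const A* a)
{
    if (typeid(*a) == typeid(A))
        return a->A::f();
    return a->f();
}
```

With `B b;`, `call_virtual(&b)` yields 2 while `call_qualified(&b)` yields 1, which is why the qualified form is only safe inside the guard.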

[Bug sanitizer/115625] New: [10/11/13 Regression] misaligned address check missing

2024-06-24 Thread bic60176 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115625

Bug ID: 115625
   Summary: [10/11/13 Regression] misaligned address check missing
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bic60176 at gmail dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org
  Target Milestone: ---

Created attachment 58509
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58509&action=edit
Test file used in the report.

OS: Ubuntu 22.04.3 LTS
We found that gcc-13.2.0 fails to catch a misaligned address error at
optimization level 1 (-O1) with -fsanitize=undefined, while gcc-14.1.0 reports it:

$ ~/compiler-builds/gcc-13.2.0_build/bin/gcc -fsanitize=undefined -g -lgcc_s
-I/home/csmith/include/csmith-2.3.0 -O1 testcase.c -o exec
$ timeout 5s ./exec 2>exec.err
$ cat exec.err
$ ~/compiler-builds/gcc-14.1.0_build/bin/gcc -fsanitize=undefined -g -lgcc_s
-I/home/csmith/include/csmith-2.3.0 -O1 testcase.c -o exec
$ timeout 5s ./exec 2>exec.err
$ cat exec.err
testcase.c:25:7: runtime error: load of misaligned address 0x7ffe94ed505a for
type 'int32_t', which requires 4 byte alignment
0x7ffe94ed505a: note: pointer points here
 00 00  00 00 00 00 00 00 09 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00
00  01 00 00 00 00 00
  ^
$

gcc-11.4.0 and gcc-10.5.0 likewise fail to catch the misaligned address error
at optimization level 1.

[Bug sanitizer/115625] [10/11/13 Regression] misaligned address check missing

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115625

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andrew Pinski  ---
```
  uint16_t f[1];
  int g;
  f[g] = 9;
```

This code is undefined, but at -O1 and above the store is optimized out, since
nobody uses the value; it is only ever written.

[Bug c++/115413] Missing optimization: devirtualize the call in "if(typeid(*a)==typeid(A)) a->f();" structure

2024-06-24 Thread user202729 at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115413

--- Comment #3 from user202729  ---
(In reply to Jason Merrill from comment #2)
> If you're going to write code like this, why not
> 
> if(typeid(*a)==typeid(A)) a->A::f();
> 
> to force the non-virtual call?

The practical reason is that this was inspired by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . The knowledge of the
object's runtime type and the virtual function call live at different levels,
and only the optimizer can inline the function.

I can't think of any better way to address the issue, and I don't think the
optimization generates incorrect code anyway.

An idea I had was:

--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1325,7 +1325,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   {
 __glibcxx_requires_nonempty();
 --this->_M_impl._M_finish;
-_Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
+
+if (std::is_same, _Alloc>::value) {
+  this->_M_impl._M_finish->_Tp::~_Tp();
+} else {
+  _Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
+}
+
 _GLIBCXX_ASAN_ANNOTATE_SHRINK(1);
   }


but that only works for std::allocator, and it is also incorrect (the user can
partially specialize std::allocator as well:
https://stackoverflow.com/q/61151170)

[Bug c++/115626] New: relax -Wsign-conversion when initializing from a literal

2024-06-24 Thread michael.kenzel at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115626

Bug ID: 115626
   Summary: relax -Wsign-conversion when initializing from a
literal
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.kenzel at gmail dot com
  Target Milestone: ---

Initializing an unsigned integer like

unsigned int mask = -1;

or

unsigned int mask = ~0;

is common practice, guaranteed to produce the desired value, and arguably the
idiomatic way to initialize a bitmask to all bits set.
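A compile-time sketch of why these initializations are well-defined: converting a negative value to an unsigned type is reduction modulo 2^N, so -1 always becomes the all-ones value, and `~0` is the `int` -1 on GCC's two's-complement targets (guaranteed by the standard since C++20). The `all_ones` template is a hypothetical helper, not an existing API.

```cpp
#include <climits>

// Hypothetical helper: converting -1 to an unsigned type is defined as
// reduction modulo 2^N, giving the maximum value (all bits set) for any width.
template <typename T>
constexpr T all_ones() { return static_cast<T>(-1); }

// The two initializations quoted above. Both yield UINT_MAX, even though
// -Wsign-conversion flags them in C++ (the subject of this request).
constexpr unsigned int mask_a = -1;
constexpr unsigned int mask_b = ~0; // ~0 is the int -1 (two's complement)

static_assert(mask_a == UINT_MAX, "-1 converts to all bits set");
static_assert(mask_b == UINT_MAX, "~0 converts to all bits set");
static_assert(all_ones<unsigned long>() == ULONG_MAX, "same for wider types");
static_assert(all_ones<unsigned char>() == UCHAR_MAX, "and for narrower ones");
```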

Alternatives like explicitly providing the unsigned value

unsigned long mask = 0xUL;

are error-prone and not generic. They are not portable, as they cannot account
for the varying width of types across target platforms, and they may not work
reliably for types that have no literal suffix (e.g. extended integer types).

Mixing signed and unsigned arithmetic is a prolific source of bugs. Thus,
enabling -Wsign-conversion is generally a good idea. However, doing so can lead
to copious amounts of false positives in code that is heavy on the use of
bitmasks. Quieting these warnings by means of explicit casts reduces
readability.

The likelihood that an unsigned integer being initialized from a literal -1 or
~0 constitutes a bug is small, while legitimate and perfectly safe uses of such
constructs are ubiquitous.

I would like to propose relaxing -Wsign-conversion to not warn upon
initialization of an unsigned integer from a literal -1 or ~0 expression, or
any unary - or ~ expression with literal operands and a signed value that does
not exceed the range of the corresponding signed type, i.e., has a
corresponding unsigned value with the same untruncated bit pattern. Maybe even
consider allowing any constant expression with such a value.

If changing the behavior of -Wsign-conversion is deemed not an option, would it
be possible to introduce something like a -Wnonliteral-sign-conversion or
-Wnonconstant-sign-conversion option that explicitly opts into the warning only
for cases that cannot be classified as most likely harmless at compile time?

[Bug c++/115626] relax -Wsign-conversion when initializing from a literal

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115626

--- Comment #1 from Andrew Pinski  ---
-1ul and ~0ul are portable by the way.

[Bug c++/115626] relax -Wsign-conversion when initializing from a literal

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115626

--- Comment #2 from Andrew Pinski  ---
>may not work reliably for types for which no literal suffixes exist (e.g.: 
>extended integer types)

You can always do `~(cast)0` too.

[Bug c++/115626] relax -Wsign-conversion when initializing from a literal

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115626

--- Comment #3 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #2)
> You can always do `~(cast)0` too.

That is:
__uint128_t t = ~(__uint128_t)0;

does not warn.
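A sketch that generalizes the `~(cast)0` suggestion into a template (the name `mask_all` is hypothetical). One detail worth knowing: for types narrower than `int`, such as `unsigned char`, the operand of `~` undergoes integer promotion, so `~(unsigned char)0` is the `int` value -1; the helper therefore casts the result back to `T`.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper generalizing `~(cast)0`: complement zero in type T and
// cast the result back to T. Intended for unsigned types.
template <typename T>
constexpr T mask_all()
{
    // For T narrower than int, ~ promotes its operand to int, so the
    // outer cast is what brings the value back into T.
    return static_cast<T>(~static_cast<T>(0));
}

// The promotion caveat: ~(unsigned char)0 is an int with value -1.
static_assert(std::is_same<decltype(~static_cast<unsigned char>(0)), int>::value,
              "operand of ~ is promoted to int");
static_assert(~static_cast<unsigned char>(0) == -1, "promoted value is -1");

static_assert(mask_all<unsigned char>() == std::numeric_limits<unsigned char>::max(), "");
static_assert(mask_all<unsigned long>() == ~0ul, "matches the ~0ul literal form");

#ifdef __SIZEOF_INT128__
// GCC's __uint128_t (as in the comment above) has no literal suffix.
static_assert(mask_all<__uint128_t>() == ~static_cast<__uint128_t>(0), "");
#endif
```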

[Bug c++/92675] sign-conversion C++ unsigned int j = -1;

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92675

Andrew Pinski  changed:

   What|Removed |Added

 CC||michael.kenzel at gmail dot com

--- Comment #7 from Andrew Pinski  ---
*** Bug 115626 has been marked as a duplicate of this bug. ***

[Bug c++/115626] relax -Wsign-conversion when initializing from a literal

2024-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115626

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Andrew Pinski  ---
dup.

*** This bug has been marked as a duplicate of bug 92675 ***

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot 
gnu.org

--- Comment #3 from Tamar Christina  ---
It looks like cp_parser_c_for is missing the handling for novector.

Mine.

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623

--- Comment #4 from Tamar Christina  ---
novect3.c: In function 'void f(char*, int)':
novect3.c:4:9: error: missing loop condition in loop with 'GCC novector' pragma
before ';' token
4 |   for (;;i++)
  | 

should do it, will send patch later today.
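For contrast, a minimal sketch (an assumption, not taken from the patch) of a loop the pragma is meant for: the crash was in `finish_for_cond` for a `for` loop without a condition, so giving the loop an explicit bound, here through a hypothetical parameter `n`, should avoid that path. `#pragma GCC novector` itself is only recognized by GCC 14 and later.

```cpp
#include <cassert>

// Same body as the reproducer, but the loop has an explicit condition.
// The exclusive upper bound `n` is a hypothetical addition.
void f(char *a, int i, int n)
{
#pragma GCC novector
    for (; i < n; i++)
        a[i] *= 2;
}
```

For example, with `char buf[4] = {1, 2, 3, 4};`, calling `f(buf, 1, 3)` doubles only `buf[1]` and `buf[2]`.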
