[Bug middle-end/114661] Bit operations not optimized to multiplication

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114661

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:df9451936c6c9e4faea371e3f188e1fc6b6d39e3

commit r15-2053-gdf9451936c6c9e4faea371e3f188e1fc6b6d39e3
Author: Roger Sayle 
Date:   Tue Jul 16 07:58:28 2024 +0100

PR tree-optimization/114661: Generalize MULT_EXPR recognition in match.pd.

This patch resolves PR tree-optimization/114661, by generalizing the set
of expressions that we canonicalize to multiplication.  This extends the
optimization(s) contributed (by me) back in July 2021.
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html

The existing transformation folds (X*C1)^(X< 3) __builtin_unreachable();
return c << 18 | c << 15 |
   c << 12 | c << 9 |
   c << 6 | c << 3 | c;
}

GCC on x86_64 with -O2 previously generated:

mul:    movzbl  %dil, %edi
        leal    (%rdi,%rdi,8), %edx
        leal    0(,%rdx,8), %eax
        movl    %edx, %ecx
        sall    $15, %edx
        orl     %edi, %eax
        sall    $9, %ecx
        orl     %ecx, %eax
        orl     %edx, %eax
        ret

with this patch we now generate:

mul:    movzbl  %dil, %eax
        imull   $299593, %eax, %eax
        ret
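
The equivalence behind this transformation can be checked directly. The sketch below is not the test case from the patch: the function header is truncated in the message above, so `shift_or` and `as_multiply` are hypothetical names, and restricting the argument to 0..3 is an assumption based on the visible `3) __builtin_unreachable();` fragment. For such small values the shifted copies occupy disjoint bit fields, so OR behaves like addition and the expression equals a multiplication by 1 + 2^3 + 2^6 + 2^9 + 2^12 + 2^15 + 2^18 = 299593, the constant in the imull above.

```c
#include <assert.h>

/* Hypothetical reconstruction of the shift-and-OR form. */
static unsigned shift_or(unsigned char c)
{
    unsigned x = c;
    return x << 18 | x << 15 | x << 12 | x << 9 | x << 6 | x << 3 | x;
}

/* The single multiplication the patched compiler emits. */
static unsigned as_multiply(unsigned char c)
{
    return c * 299593u;
}

/* Assumed input range: 0..3, where the two forms must agree. */
static void check_small_inputs(void)
{
    for (unsigned c = 0; c <= 3; c++)
        assert(shift_or((unsigned char)c) == as_multiply((unsigned char)c));
}
```

For a larger input such as 9 the shifted fields overlap and the two forms differ, which is why the range information matters for the fold.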

2024-07-16  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR tree-optimization/114661
* match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
type conversions around multiplications, such as those inserted
by this transformation.

gcc/testsuite/ChangeLog
PR tree-optimization/114661
* gcc.dg/pr114661.c: New test case.

[Bug target/115937] duplicate .plt in module's elf header

2024-07-16 Thread ellery1016 at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115937

--- Comment #8 from Ellery  ---
(In reply to Andrew Pinski from comment #7)
> Can you attach the file ./arch/arm64/kernel/module.lds ?

Thanks a lot.
It turns out that my scripts/module-common.lds and arch/arm64/kernel/module.lds
both define a plt. I removed the plt definition from scripts/module-common.lds
and now the messages no longer appear.

Still, I'm curious why things go differently in gcc 7.3.0. According to the
result of readelf, the two plt sections are merged into one (the section size
is 2, while each plt actually holds 1 byte), so there is no duplicate plt in
the section header.

Can I get an answer for this?

My scripts/module-common.lds:
```
/*
 * Common module linker script, always used when linking a module.
 * Archs are free to supply their own linker scripts.  ld will
 * combine them automatically.
 */
SECTIONS {
/DISCARD/ : {
*(.discard)
*(.discard.*)
}

__ksymtab               0 : { *(SORT(___ksymtab+*)) }
__ksymtab_gpl           0 : { *(SORT(___ksymtab_gpl+*)) }
__ksymtab_unused        0 : { *(SORT(___ksymtab_unused+*)) }
__ksymtab_unused_gpl    0 : { *(SORT(___ksymtab_unused_gpl+*)) }
__ksymtab_gpl_future    0 : { *(SORT(___ksymtab_gpl_future+*)) }
__kcrctab               0 : { *(SORT(___kcrctab+*)) }
__kcrctab_gpl           0 : { *(SORT(___kcrctab_gpl+*)) }
__kcrctab_unused        0 : { *(SORT(___kcrctab_unused+*)) }
__kcrctab_unused_gpl    0 : { *(SORT(___kcrctab_unused_gpl+*)) }
__kcrctab_gpl_future    0 : { *(SORT(___kcrctab_gpl_future+*)) }

.init_array             0 : ALIGN(8) { *(SORT(.init_array.*))
*(.init_array) }

__jump_table            0 : ALIGN(8) { KEEP(*(__jump_table)) }
}
SECTIONS {
 . = ALIGN(4);
 .plt : { BYTE(0) }
 .plt.idx : { BYTE(0) }
}
```

My arch/arm64/kernel/module.lds:
```
SECTIONS {
.plt (NOLOAD) : { BYTE(0) }
.init.plt (NOLOAD) : { BYTE(0) }
.text.ftrace_trampoline (NOLOAD) : { BYTE(0) }
}
```

[Bug c/115848] ICE: 'verify_type' failed with -flto and strub attribute and typedef of the function type

2024-07-16 Thread aoliva at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115848

Alexandre Oliva  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #2 from Alexandre Oliva  ---
Mine

[Bug target/115937] duplicate .plt in module's elf header

2024-07-16 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115937

Andrew Pinski  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #9 from Andrew Pinski  ---
>Still I'm curious why things goes differently in gcc 7.3.0.

When it comes to the linker script, the difference that matters is in binutils,
not GCC.

[Bug rtl-optimization/115948] [SH] wrong fpu mode-switch

2024-07-16 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115948

Oleg Endo  changed:

   What|Removed |Added

   Last reconfirmed||2024-07-16
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
  Known to fail||14.1.1

[Bug rtl-optimization/115912] [15 regression] Harfbuzz testsuite fails (mvar_partial_instance test) since r15-1901-g98914f9eba5f19

2024-07-16 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115912

--- Comment #11 from Andrew Pinski  ---
(In reply to Sam James from comment #10)
> libharfbuzz_subset_la-hb-subset.cc.300r.ext_dce.xz:
> https://dev.gentoo.org/~sam/bugs/gcc/gcc-harfbuzz-dce/libharfbuzz_subset_la-
> hb-subset.cc.300r.ext_dce.xz. Too big to attach, sorry.

The majority of the `Successfully transformed to` messages come from things like:
```
void f(unsigned *a, unsigned short *b)
{
*b = __builtin_bswap16(*a);
}

void f1(unsigned b, unsigned short * a)
{
*a = __builtin_bswap16(b);
}

```

Note there is an extra move in f1 in GCC 15 compared to 14 on x86_64 too. I
didn't look through all 287 `Successfully transformed to` to verify all of them
though.

Note it might also be useful to add dbgcnt.def support to ext-dce.cc. If I get
a chance I might add it too.

[Bug tree-optimization/82255] Vectorizer cost model overcounts cost of some vectorized loads

2024-07-16 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82255

--- Comment #14 from Andrew Pinski  ---
This sounds very similar to what I am now running into
https://gcc.gnu.org/pipermail/gcc/2024-July/244362.html .

[Bug tree-optimization/115843] [14/15 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:1e3aa9c9278db69d4bdb661a750a7268789188d6

commit r15-2054-g1e3aa9c9278db69d4bdb661a750a7268789188d6
Author: Richard Biener 
Date:   Mon Jul 15 13:01:24 2024 +0200

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

[Bug tree-optimization/115843] [14/15 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #13 from Richard Biener  ---
Hmm, interesting.  We even vectorize this with just -mavx512f but end up
using vector(16) int besides vector(8) long and equality compares of
vector(16) int:

vpcmpd  $0, %zmm7, %zmm0, %k2

according to docs that's fine with AVX512F.  But then for both long and double
you need byte masks so I wonder why kmovb isn't in AVX512F ...

I will adjust the testcase to use only AVX512F and push the fix now.  I can't
reproduce the runfail in a different worktree.

Note I don't see all-zero masks but

  vect_patt_22.11_6 = .MASK_LOAD (&MEM  [(void *)&KingSafetyMask1
+ 8B], 64B, { -1, 0, 0, 0, 0, 0, 0, 0 });

could be optimized to movq $mem, %zmmN (just a single or just a power-of-two
number of initial elements read).  Not sure if the corresponding

  vect_patt_20.17_34 = .MASK_LOAD (&MEM  [(void
*)&KingSafetyMask1 + -8B], 64B, { 0, 0, 0, 0, 0, 0, 0, -1 });

is worth optimizing to xor %zmmN, %zmmN and pinsr $MEM, %zmmN?  Eliding
constant masks might help to avoid STLF issues due to false dependences on
masked out elements (IIRC all uarchs currently suffer from that).

Note even all-zero masks cannot be optimized on GIMPLE currently since the
value of the masked out lanes isn't well-defined there (we're working on that).

[Bug tree-optimization/115843] [14/15 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #14 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:a177be05f6952c3f7e62186d2e138d96c475b81a

commit r15-2055-ga177be05f6952c3f7e62186d2e138d96c475b81a
Author: Richard Biener 
Date:   Mon Jul 15 13:50:58 2024 +0200

tree-optimization/115843 - fix wrong-code with fully-masked loop and
peeling

When AVX512 uses a fully masked loop and peeling we fail to create the
correct initial loop mask when the mask is composed of multiple
components in some cases.  The following fixes this by properly applying
the bias for the component to the shift amount.
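
As a rough illustration of what biasing the shift per component means, here is a simplified model, not the code in tree-vect-loop-manip.cc. It assumes a loop mask split into 8-lane components and a number of initial lanes `skip` that are peeled away for alignment. All names and the component width are hypothetical. Component `k` starts at global lane `k * 8`, so the shift applied to its all-ones mask has to be `skip` minus that bias, clamped to the component.

```c
#include <assert.h>
#include <stdint.h>

#define LANES_PER_COMPONENT 8u  /* assumed width of one mask component */

/* Initial mask for component k when the first `skip` lanes of the whole
   vector are masked out: bit i is set iff global lane k*8 + i >= skip. */
static uint8_t initial_component_mask(unsigned k, unsigned skip)
{
    unsigned bias = k * LANES_PER_COMPONENT;  /* first global lane of k */
    if (skip <= bias)
        return 0xff;                          /* nothing skipped here */
    if (skip >= bias + LANES_PER_COMPONENT)
        return 0;                             /* component fully skipped */
    return (uint8_t)(0xffu << (skip - bias)); /* biased shift amount */
}

/* Reference: build the same mask lane by lane. */
static uint8_t reference_mask(unsigned k, unsigned skip)
{
    uint8_t m = 0;
    for (unsigned i = 0; i < LANES_PER_COMPONENT; i++)
        if (k * LANES_PER_COMPONENT + i >= skip)
            m |= (uint8_t)(1u << i);
    return m;
}

static void check_masks(void)
{
    for (unsigned k = 0; k < 4; k++)
        for (unsigned skip = 0; skip <= 32; skip++)
            assert(initial_component_mask(k, skip) == reference_mask(k, skip));
}
```

Shifting every component by the unbiased `skip` would only be right for component 0, which is the kind of error this model is meant to make visible.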

PR tree-optimization/115843
* tree-vect-loop-manip.cc
(vect_set_loop_condition_partial_vectors_avx512): Properly
bias the shift of the initial mask for alignment peeling.

* gcc.dg/vect/pr115843.c: New testcase.

[Bug target/115749] Non optimal assembly for integer modulo by a constant on x86-64 CPUs

2024-07-16 Thread lingling.kong7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115749

--- Comment #11 from kong lingling  ---
After adjusting the rtx_cost of imulq from COST_N_INSNS (4) to COST_N_INSNS (3),
I tested the benchmark on a Sierra Forest machine with GCC trunk, and the
algorithm with 2 multiplications is 2% faster. For SPEC2017 the performance
improvement is around 0.2% (1 copy, -march=native -Ofast -funroll-loops -flto
/ -mtune=generic -O2 -march=x86-64-v3).
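
For readers unfamiliar with the sequence in question, here is a minimal sketch, assuming "the algorithm with 2 multiplications" means the usual reciprocal approach: one multiply-high to obtain the quotient, then one multiply back and a subtract. The divisor 10 and the function names are illustrative choices, not taken from the bug report.

```c
#include <assert.h>
#include <stdint.h>

/* x % 10 using two multiplications.  0xCCCCCCCD is ceil(2^35 / 10),
   so the shifted product is exactly x / 10 for every 32-bit x. */
static uint32_t mod10_two_mults(uint32_t x)
{
    uint32_t q = (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35); /* x / 10 */
    return x - q * 10u;                                         /* x - q*10 */
}

/* Compare against the built-in remainder on a sample of inputs. */
static void check_mod10(void)
{
    for (uint32_t x = 0; x < 100000u; x++)
        assert(mod10_two_mults(x) == x % 10u);
    assert(mod10_two_mults(UINT32_MAX) == UINT32_MAX % 10u);
}
```

Whether the compiler keeps both multiplications or expands one of them into shifts and adds depends on the multiply cost in the target's cost tables, which is what the rtx_cost adjustment above changes.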

[Bug target/115950] New: Missed SVE fold to INCP

2024-07-16 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115950

Bug ID: 115950
   Summary: Missed SVE fold to INCP
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

#include <arm_sve.h>

using u64 = uint64_t;

u64 foo2(u64 x, svbool_t pg)
{
  return x+svcntp_b8(pg, pg);
}

compiled with -O3 -march=armv9-a generates:
foo2(unsigned long, __SVBool_t, __SVBool_t):
        cntp    x1, p0, p0.b
        add     x0, x1, x0
        ret

but that should be folded to:
foo2(unsigned long, __SVBool_t, __SVBool_t):    // @foo2(unsigned long, __SVBool_t, __SVBool_t)
        incp    x0, p0.b
        ret

Like LLVM does.

[Bug tree-optimization/115843] [14 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

Richard Biener  changed:

   What|Removed |Added

Summary|[14/15 Regression]  |[14 Regression]
   |531.deepsjeng_r fails to|531.deepsjeng_r fails to
   |verify with -O3 |verify with -O3
   |-march=znver4 --param   |-march=znver4 --param
   |vect-partial-vector-usage=2 |vect-partial-vector-usage=2

--- Comment #15 from Richard Biener  ---
This should be fixed on trunk, I'll backport in time for 14.2.

[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-07-16 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545

--- Comment #15 from Jonathan Wakely  ---
The wmemchr case is covered by PR 115040

[Bug target/114189] Target implements obsolete vcond{,u,eq} expanders

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114189

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Stefan Schulze Frielinghaus
:

https://gcc.gnu.org/g:75c0bf997d2808561451e62aa6b7ae7c8e32b9e9

commit r15-2058-g75c0bf997d2808561451e62aa6b7ae7c8e32b9e9
Author: Stefan Schulze Frielinghaus 
Date:   Tue Jul 16 10:41:52 2024 +0200

s390: Drop vcond{,u} expanders

Optabs vcond{,u} will be removed for GCC 15.  Since regtest shows no
fallout, dropping the expanders, now.

gcc/ChangeLog:

PR target/114189
* config/s390/vector.md (V_HW2): Remove.
(vcond): Remove.
(vcondu): Remove.

[Bug target/115921] Missed optimization: and->ashift might be cheaper than ashift->and on typical RISC targets

2024-07-16 Thread lis8215 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115921

--- Comment #1 from Siarhei Volkau  ---
Also take into account examples like this:

uint32_t high_const_and_compare(uint32_t x)
{
if ( (x & 0x7000) == 0x3000)
  return do_some();
return do_other();
}

It might be profitable to do the right shift first there, so that the constants
become smaller. Currently, even if you do this optimization manually, GCC
throws it away.
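
To make the suggested rewrite concrete, the following sketch checks that the masked compare and its shift-first form are equivalent. The function names are hypothetical and the calls to do_some()/do_other() are left out. After shifting right by 12, the mask and the compared value shrink to 7 and 3, which fit small immediate fields on typical RISC targets.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: wide constants 0x7000 and 0x3000. */
static int compare_high_const(uint32_t x)
{
    return (x & 0x7000u) == 0x3000u;
}

/* Shift-first form: the same test with constants 7 and 3. */
static int compare_after_shift(uint32_t x)
{
    return ((x >> 12) & 0x7u) == 0x3u;
}

/* Bits 12..14 are the only ones that matter; cover them and neighbours. */
static void check_equivalence(void)
{
    for (uint32_t x = 0; x <= 0x1FFFFu; x++)
        assert(compare_high_const(x) == compare_after_shift(x));
}
```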

[Bug target/115949] [SH] unrecognized insn in postreload

2024-07-16 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115949

--- Comment #1 from Oleg Endo  ---
With -mlra on GCC 12 we get the following error:

internal compiler error: maximum number of generated reload insns per insn
achieved (90)
  177 | }
  | ^
0xb86282 lra_constraints(bool)
../../gcc/gcc/lra-constraints.cc:5126
0xb72242 lra(_IO_FILE*)
../../gcc/gcc/lra.cc:2375
0xb2d7e9 do_reload
../../gcc/gcc/ira.cc:5941
0xb2d7e9 execute
../../gcc/gcc/ira.cc:6127

[Bug debug/95574] line table entry in sequence with address after sequence

2024-07-16 Thread aoliva at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

Alexandre Oliva  changed:

   What|Removed |Added

 CC||aoliva at gcc dot gnu.org

--- Comment #9 from Alexandre Oliva  ---
A noreturn function may raise an exception, and var tracking notes from the
call would be relevant within the exception handler.  Arguably, issuing a .loc
just before end of sequence, because of such tracking notes, is poor practice. 
However, the possibility of zero-sized padding, asm notes, etc, makes
eliminating such possibilities altogether extremely tricky, with forced padding
as a likely result of mandatory compliance with this rule.  IMHO relaxing the
rule makes more sense.

Location views and presumed-return-address notes change the notion that an end
of sequence can't have other information associated with it.  Location views
even change the notion that there's a single row per address, unless you
conceive of them as a fractional part of the address, which would then cause
the end of sequence opcode to bump (increment or reset) the view, thus getting
a distinct address, at least in the fractional part.

[Bug tree-optimization/115841] 521.wrf_r ICEs when building with -march=znver4 -Ofast -flto --param vect-partial-vector-usage=1

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115841

--- Comment #4 from Richard Biener  ---
r15-2054-g1e3aa9c9278db6, when backported to the branch, avoids the failure,
it's still latent of course.

The fortran loop is the DO KR=1,NRM loop from
module_mp_fast_sbm.fppized.f90:6100 which is the JERTIMESC subroutine

SUBROUTINE JERTIMESC(FI1,X1,SFN11,SFN12 &
 &  ,B11_MY,B12_MY,RIEC,CF,ID,COL,NKR)
  IMPLICIT NONE
   INTEGER NRM,KR,ICE,ID,NKR
  REAL B12,B11,FUN,DELM,FK,CF,SFN12S,SFN11S
REAL  COL, &
 & X1(NKR,ID),FI1(NKR,ID),B11_MY(NKR,ID),B12_MY(NKR,ID) &
 &,RIEC(NKR,ID),SFN11,SFN12

NRM=NKR-1
DO 1 ICE=1,ID
 SFN11S=0.
 SFN12S=0.
 SFN11=CF*SFN11S
 SFN12=CF*SFN12S
 DO KR=1,NRM
! VALUE OF DISTRIBUTION FUNCTION
FK=FI1(KR,ICE)
! DELTA-M
DELM=X1(KR,ICE)*3.*COL
! INTEGRAL'S EXPRESSION
FUN=FK*DELM
! VALUES OF INTEGRALS
B11=B11_MY(KR,ICE)
B12=B12_MY(KR,ICE)
SFN11S=SFN11S+FUN*B11
SFN12S=SFN12S+FUN*B12
 ENDDO
! CORRECTION
 SFN11=CF*SFN11S
 SFN12=CF*SFN12S
1   CONTINUE
! END
RETURN
END SUBROUTINE JERTIMESC

It's an inlined copy in ONECOND1 (and that is a IPA CP clone).

The key to reproduce is the peeling for alignment.  We have

module_mp_fast_sbm.fppized.f90:6100:19: note:  vectorization_factor = 16,
niters = 32
module_mp_fast_sbm.fppized.f90:6100:19: note:   ===
vect_analyze_data_refs_alignment ===
module_mp_fast_sbm.fppized.f90:6100:19: note:   recording new base alignment
for &A.170
  alignment:64
  misalignment: 0
  based on: fk_206 = MEM  [(float[0:D.7065]
*)&A.170][_205];
module_mp_fast_sbm.fppized.f90:6100:19: note:   recording new base alignment
for &xl
  alignment:32
  misalignment: 0
  based on: _171 = MEM  [(float[0:D.7069] *)&xl][_205];
module_mp_fast_sbm.fppized.f90:6100:19: note:   recording new base alignment
for &A.166
  alignment:64
  misalignment: 0
  based on: b11_164 = MEM  [(float[0:D.7075]
*)&A.166][_205];
module_mp_fast_sbm.fppized.f90:6100:19: note:   recording new base alignment
for &A.167
  alignment:64
  misalignment: 0
  based on: b12_161 = MEM  [(float[0:D.7079]
*)&A.167][_205];
module_mp_fast_sbm.fppized.f90:6100:19: note:  
vect_compute_data_ref_alignment:
module_mp_fast_sbm.fppized.f90:6100:19: missed:   misalign = 0 bytes of ref MEM
 [(float[0:D.7065] *)&A.170][_205]
module_mp_fast_sbm.fppized.f90:6100:19: note:  
vect_compute_data_ref_alignment:
module_mp_fast_sbm.fppized.f90:6100:19: note:   can't force alignment of ref:
MEM  [(float[0:D.7069] *)&xl][_205]
module_mp_fast_sbm.fppized.f90:6100:19: note:  
vect_compute_data_ref_alignment:
module_mp_fast_sbm.fppized.f90:6100:19: missed:   misalign = 0 bytes of ref MEM
 [(float[0:D.7075] *)&A.166][_205]
module_mp_fast_sbm.fppized.f90:6100:19: note:  
vect_compute_data_ref_alignment:
module_mp_fast_sbm.fppized.f90:6100:19: missed:   misalign = 0 bytes of ref MEM
 [(float[0:D.7079] *)&A.167][_205]
module_mp_fast_sbm.fppized.f90:6100:19: note:   ===
vect_prune_runtime_alias_test_list ===
module_mp_fast_sbm.fppized.f90:6100:19: note:   ===
vect_enhance_data_refs_alignment ===
module_mp_fast_sbm.fppized.f90:6100:19: missed:   Unknown misalignment,
naturally aligned
module_mp_fast_sbm.fppized.f90:6100:19: note:   vect_can_advance_ivs_p:
module_mp_fast_sbm.fppized.f90:6100:19: note:   Analyze phi: sfn11s_17 = PHI

module_mp_fast_sbm.fppized.f90:6100:19: note:   reduc or virtual phi. skip.
module_mp_fast_sbm.fppized.f90:6100:19: note:   Analyze phi: sfn12s_2 = PHI

...
module_mp_fast_sbm.fppized.f90:6100:19: note:   Alignment of access forced
using peeling.
module_mp_fast_sbm.fppized.f90:6100:19: note:   Peeling for alignment will be
applied.

where we align the xl load and end up with misaligned others.  The C testcase
has the As aligned to 16 and xl aligned to 32.  We're doing runtime
alignment of xl with a scalar prologue.

[Bug tree-optimization/115841] 521.wrf_r ICEs when building with -march=znver4 -Ofast -flto --param vect-partial-vector-usage=1

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115841

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-07-16
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #5 from Richard Biener  ---
Reduced testcase, fails with -Ofast -mavx512vl -mtune=znver4 --param
vect-partial-vector-usage=1 -fcommon (-fcommon is the key from fortran so we
can't re-align xl).
For an arch with "proper" costs (aligned loads cheaper) one would swap
the static/not static but even with -fno-vect-cost-model which should make
three loads aligned here it doesn't reproduce.

unsigned char xl[192];
static unsigned char A170[192*3];

void jerate (unsigned char *, unsigned char *);
float foo (unsigned n)
{
  jerate (xl, A170);

  unsigned i = 32;
  int kr = 1;
  float sfn11s = 0.f;
  float sfn12s = 0.f;
  do
{
  int krm1 = kr - 1;
  long j = krm1;
  float a = (*(float(*)[n])A170)[j];
  float b = (*(float(*)[n])xl)[j];
  float c = a * b;
  float d = c * 6.93149983882904052734375e-1f;
  float e = (*(float(*)[n])A170)[j+48];
  float f = (*(float(*)[n])A170)[j+96];
  float g = d * e;
  sfn11s = sfn11s + g;
  float h = f * d;
  sfn12s = sfn12s + h;
  kr++;
}
  while (--i != 0);
  float tem = sfn11s + sfn12s;
  return tem;
}

[Bug bootstrap/115951] New: [15 Regression] pgo+lto enabled bootstrap fails building gnat (ICE in fold_stmt, at gimple-range-fold.cc:701)

2024-07-16 Thread doko at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115951

Bug ID: 115951
   Summary: [15 Regression] pgo+lto enabled bootstrap fails
building gnat (ICE in fold_stmt, at
gimple-range-fold.cc:701)
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: doko at gcc dot gnu.org
  Target Milestone: ---

seen on the trunk, targeting linux, architecture doesn't matter. probably seen
for the first time in early June.

this is a pgo+lto enabled bootstrap

during GIMPLE pass: vrp
../../src/gcc/ada/bindo-graphs.adb: In function
'bindo__graphs__library_graphs__find_components':
../../src/gcc/ada/bindo-graphs.adb:1842:7: internal compiler error: in
fold_stmt, at gimple-range-fold.cc:701
 1842 |   procedure Find_Components (G : Library_Graph) is
  |   ^
0x2a56b80 internal_error(char const*, ...)
../../src/gcc/diagnostic-global-context.cc:491
0xdf7f01 fancy_abort(char const*, int, char const*)
../../src/gcc/diagnostic.cc:1725
0xdb4402 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&, tree_node*)
../../src/gcc/gimple-range-fold.cc:701
0x27f71cd gimple_ranger::fold_range_internal(vrange&, gimple*, tree_node*)
../../src/gcc/gimple-range.cc:277
0x27f71cd gimple_ranger::range_of_stmt(vrange&, gimple*, tree_node*)
../../src/gcc/gimple-range.cc:338
0x17edc32 range_query::value_of_stmt(gimple*, tree_node*)
../../src/gcc/value-query.cc:133
0x1652f8f substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
../../src/gcc/tree-ssa-propagate.cc:824
0x2796e7e dom_walker::walk(basic_block_def*)
../../src/gcc/domwalk.cc:311
0x1651f65 substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
../../src/gcc/tree-ssa-propagate.cc:1007
0x17b1770 execute_ranger_vrp(function*, bool)
../../src/gcc/tree-vrp.cc:1099
0x17b64ee execute
../../src/gcc/tree-vrp.cc:1347
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
make[6]: *** [/tmp/ccCo9h2R.mk:14: /tmp/ccibijV1.ltrans6.ltrans.o] Error 1
lto-wrapper: fatal error: /usr/bin/make returned 2 exit status
compilation terminated.
/usr/bin/x86_64-linux-gnu-ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[5]: *** [../../src/gcc/ada/gcc-interface/Make-lang.in:769: gnatbind] Error
1
make[5]: Leaving directory '/<>/build/gcc'
make[4]: *** [Makefile:5307: all-stagefeedback-gcc] Error 2
make[4]: Leaving directory '/<>/build'
make[3]: *** [Makefile:33241: stagefeedback-bubble] Error 2
make[3]: Leaving directory '/<>/build'
make[2]: *** [Makefile:33272: profiledbootstrap-lean] Error 2


configured with
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust
 --prefix=/usr/lib/gcc-snapshot
 --with-gcc-major-version-only
 --program-prefix=
 --enable-shared
 --enable-linker-build-id
 --disable-nls
 --enable-bootstrap
 --enable-clocale=gnu
 --enable-libstdcxx-debug
 --enable-libstdcxx-time=yes
 --with-default-libstdcxx-abi=new
 --enable-libstdcxx-backtrace
 --enable-gnu-unique-object
 --disable-vtable-verify
 --enable-plugin
 --with-system-zlib
 --enable-libphobos-checking=release
 --with-target-system-zlib=auto
 --enable-objc-gc=auto
 --enable-multiarch
 --disable-werror
 --enable-cet
 --with-arch-32=i686
 --with-abi=m64
 --with-multilib-list=m32,m64,mx32
 --enable-multilib
 --with-tune=generic

--enable-offload-targets=nvptx-none=/<>/debian/tmp-nvptx/usr/lib/gcc-snapshot,amdgcn-amdhsa=/<>/debian/tmp-gcn/usr/lib/gcc-snapshot
 --enable-offload-defaulted
 --without-cuda-driver
 --enable-checking=yes,extra,rtl
 --build=x86_64-linux-gnu
 --host=x86_64-linux-gnu
 --target=x86_64-linux-gnu
 --with-build-config=bootstrap-lto-lean
 --enable-link-serialization=2

[Bug target/113719] [13/14 regression] g++.target/i386/pr103696.C FAILs

2024-07-16 Thread aoliva at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #16 from Alexandre Oliva  ---
Ok, I'd tested the backports on gcc-13 recently, so I'm going to install both
patches in both gcc-13 and gcc-14, the latter under the assumption that if it
works in gcc-13 and trunk, it will be fine for gcc-14 as well.

[Bug target/113719] [13/14 regression] g++.target/i386/pr103696.C FAILs

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #17 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Alexandre Oliva
:

https://gcc.gnu.org/g:102bcf147892855463c5854119aacda752ed033c

commit r14-10426-g102bcf147892855463c5854119aacda752ed033c
Author: Alexandre Oliva 
Date:   Tue Jul 16 06:27:06 2024 -0300

[i386] restore recompute to override opts after change [PR113719]

The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on
toolchains configured to --enable-frame-pointer, because the
optimization node created within handle_optimize_attribute had
flag_omit_frame_pointer incorrectly set, whereas
default_optimization_node didn't.  With this difference,
can_inline_edge_by_limits_p flagged an optimization mismatch and we
refused to inline the function that had a redundant optimization flag
into one that didn't, which is exactly what is tested for there.

This patch restores the calls to ix86_default_align and
ix86_recompute_optlev_based_flags that used to be, and ought to be,
issued during TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE, but preserves the
intent of the original change, of having those functions called at
different spots within ix86_option_override_internal.  To that end,
the remaining bits were refactored into a separate function, that was
in turn adjusted to operate on explicitly-passed opts and opts_set,
rather than going for their global counterparts.


for  gcc/ChangeLog

PR target/113719
* config/i386/i386-options.cc
(ix86_override_options_after_change_1): Add opts and opts_set
parms, operate on them, after factoring out of...
(ix86_override_options_after_change): ... this.  Restore calls
of ix86_default_align and ix86_recompute_optlev_based_flags.
(ix86_option_override_internal): Call the factored-out bits.

(cherry picked from commit bf2fc0a27b35de039c3d45e6d7ea9ad0a8a305ba)

[Bug target/113719] [13/14 regression] g++.target/i386/pr103696.C FAILs

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #18 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Alexandre Oliva
:

https://gcc.gnu.org/g:7bc63f1c70331763989d72b7df051e0ce67ff84c

commit r14-10427-g7bc63f1c70331763989d72b7df051e0ce67ff84c
Author: Alexandre Oliva 
Date:   Tue Jul 16 06:27:09 2024 -0300

[i386] adjust flag_omit_frame_pointer in a single function [PR113719]

The first two patches for PR113719 have each regressed
gcc.dg/ipa/iinline-attr.c on a different target.  The reason for this
instability is that there are competing flag_omit_frame_pointer
overriders on x86:

- ix86_recompute_optlev_based_flags computes and sets a
  -f[no-]omit-frame-pointer default depending on
  USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size

- ix86_option_override_internal enables flag_omit_frame_pointer for
  -momit-leaf-frame-pointer to take effect

ix86_option_override[_internal] calls
ix86_recompute_optlev_based_flags before setting
flag_omit_frame_pointer.  It is called during global process_options.

But ix86_recompute_optlev_based_flags is also called by
parse_optimize_options, during attribute processing, and at that
point, ix86_option_override is not called, so the final overrider for
global options is not applied to the optimize attributes.  If they
differ, the testcase fails.

In order to fix this, we need to process all overriders of this option
whenever we process any of them.  Since this setting is affected by
optimization options, it makes sense to compute it in
parse_optimize_options, rather than in process_options.


for  gcc/ChangeLog

PR target/113719
* config/i386/i386-options.cc (ix86_option_override_internal):
Move flag_omit_frame_pointer final overrider...
(ix86_recompute_optlev_based_flags): ... here.

(cherry picked from commit bf8e80f9d164f8778d86a3dc50e501cf19a9eff1)

[Bug tree-optimization/115841] [12/13/14 Regression] 521.wrf_r ICEs when building with -march=znver4 -Ofast -flto --param vect-partial-vector-usage=1

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115841

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.2
Summary|521.wrf_r ICEs when |[12/13/14 Regression]
   |building with -march=znver4 |521.wrf_r ICEs when
   |-Ofast -flto --param|building with -march=znver4
   |vect-partial-vector-usage=1 |-Ofast -flto --param
   ||vect-partial-vector-usage=1

--- Comment #6 from Richard Biener  ---
The problem has been latent since GCC 12 added reusable accumulator support.
I'm currently testing a fix and will push it to trunk once ready.

[Bug c++/115952] New: g++ 14.1.0 internal compiler error for ambiguous function template overloads

2024-07-16 Thread aslobodkins at mail dot smu.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115952

Bug ID: 115952
   Summary: g++ 14.1.0 internal compiler error for ambiguous
function template overloads
   Product: gcc
   Version: 14.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aslobodkins at mail dot smu.edu
  Target Milestone: ---

*
I am using g++ 14.1.0 on Ubuntu 22.04 and encountered an internal compiler
error when working on one of my recent projects. The function call is ambiguous
and the compiler should issue a corresponding error message. Instead, the
output is as follows:


*
slobod@LinuxHPC:~/Documents/Research/strict-lib/example$ make example2
g++-14.1 -std=c++20 -DSTRICT_DEBUG_OFF example2.cpp -o example2.x -lm -I
../src/
example2.cpp: In function ‘int main()’:
example2.cpp:11:29: internal compiler error: Segmentation fault
   11 |Z(place::all, place::all)(0, 0);
  |~^~
0x742a4804251f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x742a48029d8f __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x742a48029e3f __libc_start_main_impl
../csu/libc-start.c:392
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
make: *** [Makefile:32: example2] Error 1


*
Other compilers, such as clang++ 18.1.3 or g++ 13.2.0, report the error
properly as follows:


*
clang++ -std=c++20 -DSTRICT_DEBUG_OFF example2.cpp -o example2.x -lm -I ../src/
example2.cpp:11:4: error: call to object of type
'Derived2D>' (aka
'Derived2D>>>') is
ambiguous
   11 |Z(place::all, place::all)(0, 0);
  |^
../src/derived2D.hpp:70:53: note: candidate function [with I1 = int, I2 = int]
   70 |STRICT_NODISCARD_CONSTEXPR_INLINE decltype(auto) operator()(I1 i, I2
j);
  | ^
../src/derived2D.hpp:247:36: note: candidate function [with S1 = int, S2 = int]
  247 |STRICT_NODISCARD_CONSTEXPR auto operator()(S1 s1, S2 s2) &&
  |^
1 error generated.
make: *** [Makefile:32: example2] Error 1


*
slobod@LinuxHPC:~/Documents/Research/strict-lib/example$ make example2
g++-13.2 -std=c++20 -DSTRICT_DEBUG_OFF example2.cpp -o example2.x -lm -I
../src/
example2.cpp: In function ‘int main()’:
example2.cpp:11:29: error: call of
‘(slib::Derived2D
> > >) (int, int)’ is ambiguous
   11 |Z(place::all, place::all)(0, 0);
  |~^~
In file included from ../src/array_IO.hpp:13,
 from ../src/strict_lib.hpp:10,
 from example2.cpp:2:
../src/derived2D.hpp:357:50: note: candidate: ‘constexpr decltype(auto)
slib::Derived2D::operator()(I1, I2) [with I1 = int; I2 = int; Base =
slib::ConstSliceArrayBase2D > >]’
  357 | STRICT_NODISCARD_CONSTEXPR_INLINE decltype(auto)
Derived2D::operator()(I1 i, I2 j) {
  |  ^~~
../src/derived2D.hpp:366:50: note: candidate: ‘constexpr decltype(auto)
slib::Derived2D::operator()(I1, I2) const [with I1 = int; I2 = int; Base
= slib::ConstSliceArrayBase2D > >]’
  366 | STRICT_NODISCARD_CONSTEXPR_INLINE decltype(auto)
Derived2D::operator()(I1 i, I2 j) const {
  |  ^~~
../src/derived2D.hpp:830:33: note: candidate: ‘constexpr auto
slib::Derived2D::operator()(S1, S2) const & [with S1 = int; S2 = int;
Base = slib::ConstSliceArrayBase2D >
>]’
  830 | STRICT_NODISCARD_CONSTEXPR auto Derived2D::operator()(S1 s1, S2
s2) const& {
  | ^~~
../src/derived2D.hpp:863:33: note: candidate: ‘constexpr auto
slib::Derived2D::operator()(S1, S2) && requires
!(ArrayTwoDimType >) [with S1 = int; S2 = int; Base =
slib::ConstSliceArrayBase2D > >]’
  863 | STRICT_NODISCARD_CONSTEXPR auto Derived2D::operator()(S1 s1, S2
s2) &&
  | ^~~
make: *** [Makefile:32: example2] Error 1



*
I have attached the preprocessed file, generated by adding the -save-temps option
to the Makefile. The compiler configuration settings were:

../gcc-releases-gcc-14.1.0/configure -v --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--prefix=/usr/local/gcc-14.1.0 --enabl

[Bug c++/115952] g++ 14.1.0 internal compiler error for ambiguous function template overloads

2024-07-16 Thread aslobodkins at mail dot smu.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115952

--- Comment #1 from Slobodkins, Arkadijs  ---
Created attachment 58683
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58683&action=edit
preprocessed file

[Bug target/113719] [13/14 regression] g++.target/i386/pr103696.C FAILs

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #19 from GCC Commits  ---
The releases/gcc-13 branch has been updated by Alexandre Oliva:

https://gcc.gnu.org/g:0b9d6829b503cfc72c4271ead2948d8100cce25c

commit r13-8915-g0b9d6829b503cfc72c4271ead2948d8100cce25c
Author: Alexandre Oliva 
Date:   Tue Jul 16 06:48:18 2024 -0300

[i386] restore recompute to override opts after change [PR113719]

The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on
toolchains configured to --enable-frame-pointer, because the
optimization node created within handle_optimize_attribute had
flag_omit_frame_pointer incorrectly set, whereas
default_optimization_node didn't.  With this difference,
can_inline_edge_by_limits_p flagged an optimization mismatch and we
refused to inline the function that had a redundant optimization flag
into one that didn't, which is exactly what is tested for there.

This patch restores the calls to ix86_default_align and
ix86_recompute_optlev_based_flags that used to be, and ought to be,
issued during TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE, but preserves the
intent of the original change, of having those functions called at
different spots within ix86_option_override_internal.  To that end,
the remaining bits were refactored into a separate function, that was
in turn adjusted to operate on explicitly-passed opts and opts_set,
rather than going for their global counterparts.


for  gcc/ChangeLog

PR target/113719
* config/i386/i386-options.cc
(ix86_override_options_after_change_1): Add opts and opts_set
parms, operate on them, after factoring out of...
(ix86_override_options_after_change): ... this.  Restore calls
of ix86_default_align and ix86_recompute_optlev_based_flags.
(ix86_option_override_internal): Call the factored-out bits.

(cherry picked from commit bf2fc0a27b35de039c3d45e6d7ea9ad0a8a305ba)

[Bug target/113719] [13/14 regression] g++.target/i386/pr103696.C FAILs

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #20 from GCC Commits  ---
The releases/gcc-13 branch has been updated by Alexandre Oliva:

https://gcc.gnu.org/g:52959e34c8a7a0473784ca044487d05e791286c1

commit r13-8916-g52959e34c8a7a0473784ca044487d05e791286c1
Author: Alexandre Oliva 
Date:   Tue Jul 16 06:48:36 2024 -0300

[i386] adjust flag_omit_frame_pointer in a single function [PR113719]

The first two patches for PR113719 have each regressed
gcc.dg/ipa/iinline-attr.c on a different target.  The reason for this
instability is that there are competing flag_omit_frame_pointer
overriders on x86:

- ix86_recompute_optlev_based_flags computes and sets a
  -f[no-]omit-frame-pointer default depending on
  USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size

- ix86_option_override_internal enables flag_omit_frame_pointer for
  -momit-leaf-frame-pointer to take effect

ix86_option_override[_internal] calls
ix86_recompute_optlev_based_flags before setting
flag_omit_frame_pointer.  It is called during global process_options.

But ix86_recompute_optlev_based_flags is also called by
parse_optimize_options, during attribute processing, and at that
point, ix86_option_override is not called, so the final overrider for
global options is not applied to the optimize attributes.  If they
differ, the testcase fails.

In order to fix this, we need to process all overriders of this option
whenever we process any of them.  Since this setting is affected by
optimization options, it makes sense to compute it in
parse_optimize_options, rather than in process_options.


for  gcc/ChangeLog

PR target/113719
* config/i386/i386-options.cc (ix86_option_override_internal):
Move flag_omit_frame_pointer final overrider...
(ix86_recompute_optlev_based_flags): ... here.

(cherry picked from commit bf8e80f9d164f8778d86a3dc50e501cf19a9eff1)

[Bug lto/115953] New: --wrap does now work with lto

2024-07-16 Thread koule2333 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115953

Bug ID: 115953
   Summary: --wrap does now work with lto
   Product: gcc
   Version: 12.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: koule2333 at gmail dot com
  Target Milestone: ---

I am using gcc 12.3.1 and ld 2.41, and this error occurs again.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88643

my case:

int foo();

int __wrap_foo() {
return 6;
}

int main() {
return foo();
}

When I compile it with 'gcc -flto ccc.c -Wl,--wrap=foo', I get an undefined
reference to `__wrap_foo'.

Can anyone tell me why?

[Bug c++/115952] [14 Regression] g++ 14.1.0 internal compiler error for ambiguous function template overloads

2024-07-16 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115952

Jonathan Wakely  changed:

   What|Removed |Added

Summary|g++ 14.1.0 internal |[14 Regression] g++ 14.1.0
   |compiler error for  |internal compiler error for
   |ambiguous function template |ambiguous function template
   |overloads   |overloads
  Known to fail||14.1.0
   Target Milestone|--- |14.2
 Status|UNCONFIRMED |RESOLVED
   Keywords||ice-on-invalid-code
  Known to work||13.2.0, 15.0
 Resolution|--- |DUPLICATE

--- Comment #2 from Jonathan Wakely  ---
The ICE for GCC 14 started with r14-6522

It was fixed on trunk by r15-1292

So this is a dup of PR 115239

*** This bug has been marked as a duplicate of bug 115239 ***

[Bug c++/115239] [14 Regression] ICE: Segmentation fault with ambiguous function call in some cases (`const char*` vs `char` with `long` vs `unsigned`) since r14-6522

2024-07-16 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115239

Jonathan Wakely  changed:

   What|Removed |Added

 CC||aslobodkins at mail dot smu.edu

--- Comment #9 from Jonathan Wakely  ---
*** Bug 115952 has been marked as a duplicate of this bug. ***

[Bug c++/103909] coroutines: co_yield of aggregate-initialized temporaries leads to segmentation faults.

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103909

Arsen Arsenović  changed:

   What|Removed |Added

 CC||arsen at gcc dot gnu.org
 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #6 from Arsen Arsenović  ---
fixed by r13-4479-g58a7b1e354530d

*** This bug has been marked as a duplicate of bug 101367 ***

[Bug c++/101367] [coroutines] destructor for capture in lambda temporary operand to co_yield expression called twice

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101367

Arsen Arsenović  changed:

   What|Removed |Added

 CC||johannes.kalmbach@googlemai
   ||l.com

--- Comment #11 from Arsen Arsenović  ---
*** Bug 103909 has been marked as a duplicate of this bug. ***

[Bug ipa/115942] [14/15 regression] lto1: ICE in record_argument_state, at ipa-param-manipulation.cc:2122

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115942

Richard Biener  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org
   Target Milestone|--- |14.2
  Component|lto |ipa
   Keywords||ice-on-valid-code, lto

[Bug bootstrap/115951] [15 Regression] pgo+lto enabled bootstrap fails building gnat (ICE in fold_stmt, at gimple-range-fold.cc:701)

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115951

Richard Biener  changed:

   What|Removed |Added

 Blocks||85316
   Target Milestone|--- |15.0
   Keywords||build, ice-on-valid-code,
   ||needs-reduction
 CC||aldyh at gcc dot gnu.org,
   ||amacleod at redhat dot com

--- Comment #1 from Richard Biener  ---
It might also fail on the GCC 14 branch when checking is enabled since this is

  // We sometimes get compatible types copied from operands, make sure
  // the correct type is being returned.
  if (name && TREE_TYPE (name) != r.type ()) 
{
  gcc_checking_assert (range_compatible_p (r.type (), TREE_TYPE (name)));


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316
[Bug 85316] [meta-bug] VRP range propagation missed cases

[Bug lto/115953] --wrap does not work with lto

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115953

Richard Biener  changed:

   What|Removed |Added

Summary|--wrap does now work with   |--wrap does not work with
   |lto |lto
 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Richard Biener  ---
The linker-side bug isn't resolved yet:
https://sourceware.org/bugzilla/show_bug.cgi?id=24415

*** This bug has been marked as a duplicate of bug 88643 ***

[Bug lto/88643] -Wl,--wrap not supported with LTO

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88643

Richard Biener  changed:

   What|Removed |Added

 CC||koule2333 at gmail dot com

--- Comment #13 from Richard Biener  ---
*** Bug 115953 has been marked as a duplicate of this bug. ***

[Bug middle-end/115954] New: Alignment of _Atomic structs incompatible between GCC and LLVM

2024-07-16 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115954

Bug ID: 115954
   Summary: Alignment of _Atomic structs incompatible between GCC
and LLVM
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilco at gcc dot gnu.org
  Target Milestone: ---

The following code shows ABI inconsistencies between GCC and LLVM:

#include <stdio.h>
#include <stdalign.h>
#include <stdatomic.h>

_Atomic struct A3 { char a[3]; } a3;
_Atomic struct A7 { char a[7]; } a7;
_Atomic struct A8 { char a[8]; } a8;
_Atomic struct A9 { char a[9]; } a9;
_Atomic struct A16 { char a[16]; } a16;

int main (void)
{
   printf("size %ld align %ld lockfree %d\n", sizeof (a3), alignof (a3),
atomic_is_lock_free (&a3));
   printf("size %ld align %ld lockfree %d\n", sizeof (a7), alignof (a7),
atomic_is_lock_free (&a7));
   printf("size %ld align %ld lockfree %d\n", sizeof (a8), alignof (a8),
atomic_is_lock_free (&a8));
   printf("size %ld align %ld lockfree %d\n", sizeof (a9), alignof (a9),
atomic_is_lock_free (&a9));
   printf("size %ld align %ld lockfree %d\n", sizeof (a16), alignof (a16),
atomic_is_lock_free (&a16));
   return 0;
}

Compiled with GCC -O2 -latomic I get this on AArch64:

size 3 align 1 lockfree 1
size 7 align 1 lockfree 1
size 8 align 8 lockfree 1
size 9 align 1 lockfree 0
size 16 align 16 lockfree 0

However LLVM reports:

size 4 align 4 lockfree 1
size 8 align 8 lockfree 1
size 8 align 8 lockfree 1
size 16 align 16 lockfree 1
size 16 align 16 lockfree 1

The same is true for x86_64 GCC:

size 3 align 1 lockfree 0
size 7 align 1 lockfree 1  (due to alignment in libatomic)
size 8 align 8 lockfree 1
size 9 align 1 lockfree 0
size 16 align 16 lockfree 0

and LLVM:

size 4 align 4 lockfree 1
size 8 align 8 lockfree 1
size 8 align 8 lockfree 1
size 16 align 16 lockfree 0
size 16 align 16 lockfree 0

Increasing the alignment of small _Atomic structs to a power of 2 means these
will always be lock free rather than sometimes depending on alignment.

This also has the nice property that all types smaller than the maximum
supported atomic size are always lock free so there is no need to make
libatomic calls.

[Bug lto/88643] -Wl,--wrap not supported with LTO

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88643

--- Comment #14 from Richard Biener  ---
Just to note with BFD ld from 2.41.0 I see

1
a-t.o 3
194 b4b00c6ef6ad050b PREVAILING_DEF_IRONLY __wrap_cook
198 b4b00c6ef6ad050b PREVAILING_DEF main
214 b4b00c6ef6ad050b RESOLVED_IR cook

when building

int cook(void);
int __wrap_cook(void)
{
  return 0;
}
int main()
{
  if (cook () == -1)
__builtin_abort ();

  return 0;
}

but an LTO link fails with

/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld:
./a.ltrans0.ltrans.o: in function `main':
:(.text+0x10): undefined reference to `__wrap_cook'

GCC emits at LTRANS stage

.file   ""
.text
.type   __wrap_cook, @function
__wrap_cook:
.LFB0:
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
movl$0, %eax
popq%rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size   __wrap_cook, .-__wrap_cook
.globl  main
.type   main, @function
main:
.LFB1:
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
callcook
cmpl$-1, %eax
jne .L4
callabort
.L4:
movl$0, %eax
popq%rbp
.cfi_def_cfa 7, 8
ret

which I think should be OK.  The main difference is that __wrap_cook is
an internal symbol but that's OK since the linker told GCC

194 b4b00c6ef6ad050b PREVAILING_DEF_IRONLY __wrap_cook

it probably should have used

194 b4b00c6ef6ad050b PREVAILING_DEF __wrap_cook

instead.

[Bug tree-optimization/115843] [14 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #16 from GCC Commits  ---
The releases/gcc-11 branch has been updated by Richard Biener:

https://gcc.gnu.org/g:bcb2a35a0c04417c407a97d9ff05c2af1d6d1b8d

commit r11-11578-gbcb2a35a0c04417c407a97d9ff05c2af1d6d1b8d
Author: Richard Biener 
Date:   Mon Jul 15 13:01:24 2024 +0200

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

(cherry picked from commit 1e3aa9c9278db69d4bdb661a750a7268789188d6)

[Bug middle-end/115954] Alignment of _Atomic structs incompatible between GCC and LLVM

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115954

Richard Biener  changed:

   What|Removed |Added

 Target||aarch64 x86_64-*-*
 CC||matz at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
Not sure what the x86 psABI says here (possibly nothing for aggregate _Atomic).

[Bug middle-end/115954] Alignment of _Atomic structs incompatible between GCC and LLVM

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115954

--- Comment #2 from Richard Biener  ---
(In reply to Richard Biener from comment #1)
> Not sure what the x86 psABI says here (possibly nothing for aggregate
> _Atomic).

It doesn't consider _Atomic [influencing the ABI] at all.

Note I think your test queries actual object alignment which a compiler
can of course increase vs. what the ABI requires as minimum alignment,
you should possibly cross-check with alignof/sizeof of the type.

On x86 clang returns size 8 and align 8 for the atomic A7 type (GCC does not).

[Bug libstdc++/115955] New: atomic::wait _S_for uses a poor hash function

2024-07-16 Thread jakub.lopuszanski at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115955

Bug ID: 115955
   Summary: atomic::wait _S_for uses a poor hash function
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub.lopuszanski at oracle dot com
  Target Milestone: ---

Summary:


The way I understand the code in 
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/atomic_wait.h#L251-L258
and the article at
https://developers.redhat.com/articles/2022/12/06/implementing-c20-atomic-waiting-libstdc#
is that for std::atomic::wait the implementation uses 16 elements and
maps the address of the actual atomic to the element by a very simple hash
function:
```
  static __waiter_pool_base&
  _S_for(const void* __addr) noexcept
  {
constexpr uintptr_t __ct = 16;
static __waiter_pool_base __w[__ct];
auto __key = (uintptr_t(__addr) >> 2) % __ct;
return __w[__key];
  }
```
In my experience this is a very poor choice, because the result depends only on
four of the lowest six bits of the address (bits 2 to 5). Data structures used in
multi-threaded apps, and in particular those that use atomics, are often aligned
to the cache line to avoid "false sharing". Because a cache line usually has 64
bytes, __addr%64 is usually the same value for all instances of a given class,
and quite often that value is 0.
This is a performance problem, because all such atomics then map to the same
element, which leads to spurious wake-ups.
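
To make the collision concrete, here is a small sketch that repeats the index
arithmetic of the quoted _S_for on plain integers. The names bucket_for and
all_share_one_bucket, and the layout (objects spaced 64 bytes apart with the
atomic at a fixed offset), are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Same arithmetic as the quoted _S_for: drop the two lowest bits,
// then keep the next four.
constexpr std::size_t bucket_for(std::uintptr_t addr) noexcept
{
  constexpr std::uintptr_t ct = 16;
  return static_cast<std::size_t>((addr >> 2) % ct);
}

// Hypothetical layout: `count` objects placed every 64 bytes starting at
// `base`, each holding its atomic at the same `offset`.
constexpr bool all_share_one_bucket(std::uintptr_t base, std::uintptr_t offset,
                                    int count) noexcept
{
  const std::size_t first = bucket_for(base + offset);
  for (int i = 1; i < count; ++i)
    if (bucket_for(base + offset + 64u * static_cast<std::uintptr_t>(i)) != first)
      return false;
  return true;
}
```

Adding 64 to an address adds exactly 16 to (addr >> 2), so the index modulo 16
never changes, whatever the offset inside the object is.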

Background:
===

I am working on InnoDB engine for MySQL, which uses a very old `struct
os_event` which among other things lets people call epoch=reset(), set(),
wait_low(epoch). This structure is used a lot in other old classes such as
ib_mutex_t or rw_lock_t to notify threads waiting for a latch that it got
released. As each page in Buffer Pool has several such latches, and BP size can
be in TBs, one can imagine we have a lot of os_event-s. They are implemented as
a combination of boolean flag, uint64_t counter, a native mutex, and a native
cond var, leading to a huge sizeof(os_event) (upwards of 96 bytes on Linux) and
a lot of non-trivial code to maintain. In 2020 [Facebook
proposed](https://bugs.mysql.com/bug.php?id=102045) to use an idea similar to
[WebKit's Parking Lot](https://db.in.tum.de/~boettcher/locking.pdf), which is
to have a pool of such structs, and somehow reuse/share them. This would make
the logic even more complicated, but would save a lot of memory (their claim:
1GB).

Recently MySQL's codebase moved to C++20, and I saw an opportunity to replace
all of this custom logic with std::atomic as there is a natural
mapping:
```
epoch=reset() --> counter=load(), 
set() --> fetch_add(1)+notify_all(), 
wait_low(epoch) --> wait(counter).
```
This is not exactly the same thing, but close enough semantically, so I've
implemented the change and run sysbench OLTP-RW test suite, and saw...disaster:
```
$num_of_runs $median $median_abs_diff ($min < $avg < $max) "$version"
pareto 64
6 27800 +- 113 ( 27423 < 27739 < 27943 ) "use std::atomic instead of os_event_t
+ 32-bit"
6 27676 +- 255 ( 27401 < 27696 < 28019 ) "use std::atomic instead of
os_event_t"
6 27527 +- 163 ( 27171 < 27545 < 27806 ) "baseline"

pareto 1024
9 32234 +- 113 ( 31930 < 32244 < 32555 ) "use std::atomic instead of os_event_t
+ 32-bit"
9 32080 +- 134 ( 31946 < 32137 < 32460 ) "baseline"
9 21791 +- 677 ( 19965 < 21408 < 22486 ) "use std::atomic instead of
os_event_t"

uniform 64
9 27326 +- 170 ( 26799 < 27287 < 27666 ) "use std::atomic instead of os_event_t
+ 32-bit"
9 27264 +- 98 ( 26677 < 27162 < 27453 ) "use std::atomic instead of os_event_t"
9 27246 +- 195 ( 26758 < 27206 < 27561 ) "baseline"

uniform 1024
9 35766 +- 125 ( 35246 < 35671 < 36173 ) "use std::atomic instead of os_event_t
+ 32-bit"
9 35623 +- 146 ( 35446 < 35669 < 36150 ) "baseline"
9 23631 +- 313 ( 22835 < 23603 < 24399 ) "use std::atomic instead of
os_event_t"
```
This is a machine with 96 CPUs. In the uniform run with 1024 threads, replacing
the default os_event_t with the atomic drops the number of transactions per
second from 35.6K to 23.6K.

A flame graph diff showed a lot of time spent in `syscall` during latch
acquisitions, i.e. when calling "wait".
I guess this is not so much about the duration as about the frequency of these
calls, as there's a retry `while` loop in atomic_wait.h which calls the futex
syscall again whenever a spurious wake-up occurs.
At any rate, the problem is somewhere in the handling of 64-bit atomics,
because changing the counter to a 32-bit std::atomic made the problem go away,
as the "+ 32-bit" rows show.
I'd rather use uint64_t to make ABA almost impossible.

Proposed fix:
=
How about:
a) using %17 instead of %16? 
b) looking at the next 4 bits?
c) pre-multiply the __addr by a random odd constant and take the top 4 bits?
d) use a real hash function, such 
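
A minimal sketch of what option c) might look like, assuming 64-bit addresses.
The function name bucket_mul and the multiplier are picked for the example (the
constant is a commonly used odd 64-bit value derived from the golden ratio);
neither comes from libstdc++.

```cpp
#include <cstddef>
#include <cstdint>

// Multiply the address by an odd constant and keep the top 4 bits,
// giving an index in [0, 16). Unsigned overflow wraps, which is intended.
constexpr std::size_t bucket_mul(std::uint64_t addr) noexcept
{
  constexpr std::uint64_t k = 0x9E3779B97F4A7C15ull; // illustrative odd multiplier
  return static_cast<std::size_t>((addr * k) >> 60);
}
```

Unlike (addr >> 2) % 16, the top bits of the product depend on the low bits of
the address beyond bit 5, so two addresses that differ by one cache line (64
bytes) no longer have to land on the same element.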

[Bug middle-end/115954] Alignment of _Atomic structs incompatible between GCC and LLVM

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115954

--- Comment #3 from Richard Biener  ---
And I'll note the original JTC1/SC22/WG14 - N2771 Title: C23 Atomics paper
mentions "ABI would have been fully determined to be compatible with non-atomic
type, leaving no room to implementations for introducing inconsistencies." but
I can't find where/if this went into the actual standard.

[Bug target/113719] [13/14 regression] g++.target/i386/pr103696.C FAILs

2024-07-16 Thread aoliva at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

Alexandre Oliva  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #21 from Alexandre Oliva  ---
Backports done.

[Bug bootstrap/115951] [15 Regression] pgo+lto enabled bootstrap fails building gnat (ICE in fold_stmt, at gimple-range-fold.cc:701)

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115951

Arsen Arsenović  changed:

   What|Removed |Added

 CC||arsen at gcc dot gnu.org

--- Comment #2 from Arsen Arsenović  ---
PR115918 possibly related?

[Bug libstdc++/115955] atomic::wait _S_for uses a poor hash function

2024-07-16 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115955

Jonathan Wakely  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-07-16
 Status|UNCONFIRMED |NEW

--- Comment #1 from Jonathan Wakely  ---
Yes, this is a known limitation that will get fixed with the planned
refactoring of atomic::wait.

[Bug middle-end/115954] Alignment of _Atomic structs incompatible between GCC and LLVM

2024-07-16 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115954

--- Comment #4 from Jonathan Wakely  ---
(In reply to Richard Biener from comment #1)
> Not sure what the x86 psABI says here (possibly nothing for aggregate
> _Atomic).

I've been asking for it to say something for years.
https://groups.google.com/g/ia32-abi/c/Tlu6Hs-ohPY

[Bug middle-end/115954] Alignment of _Atomic structs incompatible between GCC and LLVM

2024-07-16 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115954

--- Comment #5 from Wilco  ---
(In reply to Richard Biener from comment #2)
> (In reply to Richard Biener from comment #1)
> > Not sure what the x86 psABI says here (possibly nothing for aggregate
> > _Atomic).
> 
> It doesn't consider _Atomic [influencing the ABI] at all.
> 
> Note I think your test queries actual object alignment which a compiler
> can of course increase vs. what the ABI requires as minimum alignment,
> you should possibly cross-check with alignof/sizeof of the type.
> 
> On x86 clang returns size 8 and align 8 for the atomic A7 type (GCC does
> not).

I tried using the type for sizeof/alignof, and it returns the same values. So
GCC overaligns structs whose size is an exact power of 2.

[Bug c++/95457] Inadequate diagnostics on constrained coroutines

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95457

Arsen Arsenović  changed:

   What|Removed |Added

 CC||arsen at gcc dot gnu.org

--- Comment #1 from Arsen Arsenović  ---
hm, the promise itself is not constrained, the traits specialization is.  I
imagine that might be why the diagnostic fails to manifest.  convoluting the
template a bit to make it so that there is a constraint failure on the promise
itself results in a diagnostic:

template 
class coroutine_traits {
public:
template
requires (... && !rvalue_reference)
struct my_promise {
void return_value(int x) {  }
std::suspend_never initial_suspend() { return {}; }
std::suspend_never final_suspend() noexcept { return {}; }
dummy_coroutine get_return_object() { return {}; }
void unhandled_exception() {}
};
using promise_type = my_promise;
};

result:
: In instantiation of 'struct
std::__n4861::coroutine_traits':
:36:15:   required from here
   36 | co_return x;
  |   ^
:26:11: error: template constraint failure for 'template template  requires (... &&!(rvalue_reference)) struct
std::__n4861::coroutine_traits::my_promise'
   26 | using promise_type = my_promise;
  |   ^~~~
:26:11: note: constraints not satisfied
: In substitution of 'template template 
requires (... &&!(rvalue_reference)) struct
std::__n4861::coroutine_traits::my_promise [with
 = void; Args = {int&&}]':
:26:11:   required from 'struct
std::__n4861::coroutine_traits'
:36:15:   required from here
   36 | co_return x;
  |   ^
:19:12:   required by the constraints of 'template
template  requires (... &&!(rvalue_reference)) struct
std::__n4861::coroutine_traits::my_promise'
:18:19: note: the expression '(... &&!(rvalue_reference)) [with
Args = {int&&}]' evaluated to 'false'
   18 | requires (... && !rvalue_reference)
  |  ~^~~
: In function 'dummy_coroutine foo(int&&)':
:36:5: error: unable to find the promise type for this coroutine
   36 | co_return x;
  | ^
Compiler returned: 1


unsure if this is solvable

[Bug tree-optimization/115895] [15 Regression] FAIL: gcc.dg/vect/pr115385.c execution test with -march=znver4 --param partial-vector-usage=2

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115895

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
Summary|FAIL:   |[15 Regression] FAIL:
   |gcc.dg/vect/pr115385.c  |gcc.dg/vect/pr115385.c
   |execution test with |execution test with
   |-march=znver4 --param   |-march=znver4 --param
   |partial-vector-usage=2  |partial-vector-usage=2

[Bug c++/115956] New: ICE: in change_stack, at reg-stack.cc:2732

2024-07-16 Thread iamanonymous.cs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115956

Bug ID: 115956
   Summary: ICE: in change_stack, at reg-stack.cc:2732
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iamanonymous.cs at gmail dot com
  Target Milestone: ---
Target: x86_64

Created attachment 58684
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58684&action=edit
cpp testcase

***
The compiler produces a segfault during build_this_conversion when compiling
the provided code with the specified options. 
The issue can also be reproduced on Compiler Explorer.

***
OS and Platform:
# uname -a
Linux ubuntu 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023
x86_64 x86_64 x86_64 GNU/Linux
***
# g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/root/gdbtest/gcc/gcc-15/libexec/gcc/x86_64-pc-linux-gnu/15.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /root/gdbtest/gcc/obj/../gcc/configure
--prefix=/root/gdbtest/gcc/gcc-15 --enable-languages=c,c++,fortran,go
--disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 15.0.0 20240509 (experimental) (GCC) 
***
Program:Please refer to the attachment.
***
Command Lines:
# g++ random.cpp -O2 -std=c11 -march=native -msse2 -Wall -Wextra 
-Wconversion -Wshadow -Wold-style-cast -Wcast-align -Woverloaded-virtual  -c -o
random.o

cc1plus: warning: command-line option '-std=c11' is valid for C/ObjC but not
for C++
random.cpp: In member function 'float Random::operator()(int, int, float,
float, float) const':
random.cpp:75:21: warning: use of old-style cast to 'int' [-Wold-style-cast]
random.cpp:76:21: warning: use of old-style cast to 'int' [-Wold-style-cast]
random.cpp:77:21: warning: use of old-style cast to 'int' [-Wold-style-cast]
random.cpp:95:22: warning: conversion from 'int' to 'float' may change value
[-Wconversion]
random.cpp:96:22: warning: conversion from 'int' to 'float' may change value
[-Wconversion]
random.cpp:97:22: warning: conversion from 'int' to 'float' may change value
[-Wconversion]
random.cpp:102:11: warning: narrowing conversion of '((5.0e-1 *
((double)((float)dx))) * ((double)float)dx) * float)dx) * (float)-1) +
(float)2)) - (float)1)))' from 'double' to 'float' [-Wnarrowing]
random.cpp:102:11: warning: conversion from 'double' to 'float' may change
value [-Wfloat-conversion]
random.cpp:103:8: warning: narrowing conversion of '(5.0e-1 *
((double)float)dx) * (((float)dx) * (((float)3 * ((float)dx)) - (float)5)))
+ (float)2)))' from 'double' to 'float' [-Wnarrowing]
random.cpp:103:8: warning: conversion from 'double' to 'float' may change value
[-Wfloat-conversion]
random.cpp:104:11: warning: narrowing conversion of '((5.0e-1 *
((double)((float)dx))) * ((double)float)dx) * (((float)-3 * ((float)dx)) +
(float)4)) + (float)1)))' from 'double' to 'float' [-Wnarrowing]
random.cpp:104:11: warning: conversion from 'double' to 'float' may change
value [-Wfloat-conversion]
random.cpp:105:14: warning: narrowing conversion of '(((5.0e-1 *
((double)((float)dx))) * ((double)((float)dx))) * ((double)(((float)dx) -
(float)1)))' from 'double' to 'float' [-Wnarrowing]
random.cpp:105:14: warning: conversion from 'double' to 'float' may change
value [-Wfloat-conversion]
random.cpp:110:11: warning: narrowing conversion of '((5.0e-1 *
((double)((float)dy))) * ((double)float)dy) * float)dy) * (float)-1) +
(float)2)) - (float)1)))' from 'double' to 'float' [-Wnarrowing]
random.cpp:110:11: warning: conversion from 'double' to 'float' may change
value [-Wfloat-conversion]
random.cpp:111:8: warning: narrowing conversion of '(5.0e-1 *
((double)float)dy) * (((float)dy) * (((float)3 * ((float)dy)) - (float)5)))
+ (float)2)))' from 'double' to 'float' [-Wnarrowing]
random.cpp:111:8: warning: conversion from 'double' to 'float' may change value
[-Wfloat-conversion]
random.cpp:112:11: warning: narrowing conversion of '((5.0e-1 *
((double)((float)dy))) * ((double)float)dy) * (((float)-3 * ((float)dy)) +
(float)4)) + (float)1)))' from 'double' to 'float' [-Wnarrowing]
random.cpp:112:11: warning: conversion from 'double' to 'float' may change
value [-Wfloat-conversion]
random.cpp:113:14: warning: narrowing conversion of '(((5.0e-1 *
((double)((float)dy))) * ((double)((float)dy))) * ((double)(((float)dy) -
(float)1)))' from 'double' to 'float' [-Wnarrowing]
random.cpp:113:14: warning: conversion from 'double' to 'float' may change

[Bug middle-end/115459] [14 regression] Alpha/Linux ICE: in gen_rtx_SUBREG, at emit-rtl.cc:1032 around g-debpoo.adb:1896:8, as from r14-1187-gd6b756447cd5

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115459

--- Comment #7 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Alexandre Oliva
:

https://gcc.gnu.org/g:c8fdef7fc25dafc8c7a12727c1046b3c7f2b89bb

commit r14-10433-gc8fdef7fc25dafc8c7a12727c1046b3c7f2b89bb
Author: Alexandre Oliva 
Date:   Tue Jul 16 08:54:20 2024 -0300

[alpha] adjust MEM alignment for block move [PR115459]

Before issuing loads or stores for a block move, adjust the MEM
alignments if analysis of the addresses enabled the inference of
stricter alignment.  This ensures that the MEMs are sufficiently
aligned for the corresponding insns, which avoids trouble in case of
e.g. substitutions into SUBREGs.


for  gcc/ChangeLog

PR target/115459
* config/alpha/alpha.cc (alpha_expand_block_move): Adjust
MEMs to match inferred alignment.

(cherry picked from commit ccfe7151803956d178947d0afda0bd66ce097275)

[Bug gcov-profile/114715] Gcov allocates branches to wrong row for nested switches

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114715

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.4
 Status|ASSIGNED|RESOLVED

--- Comment #8 from Richard Biener  ---
Fixed for GCC 12.4, 13.3 and 14+.

[Bug c++/114480] [12/13/14/15 Regression] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|rguenth at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #34 from Richard Biener  ---
I don't have any more patches or analysis for this bug; there are now separate
bugs for the issues found.

[Bug libstdc++/109162] C++23 improvements to std::format

2024-07-16 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109162

--- Comment #7 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #0)
> https://wg21.link/P2419R2 localized chrono formatting (also p2372r3)

Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657386.html

[Bug libstdc++/115776] [C++26] Implement P2757R3 Type checking format args

2024-07-16 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115776

Jonathan Wakely  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657390.html
   Keywords||patch

--- Comment #1 from Jonathan Wakely  ---
Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657390.html

[Bug libstdc++/110356] [C++26] P2637R3 Member visit

2024-07-16 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110356

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||patch

--- Comment #1 from Jonathan Wakely  ---
Patches posted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657388.html (variant)
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657389.html (format)

[Bug c++/103953] Leak of coroutine return object

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103953

Arsen Arsenović  changed:

   What|Removed |Added

 CC||arsen at gcc dot gnu.org

--- Comment #1 from Arsen Arsenović  ---
appears to be fixed in trunk and in one of the commits between 12.2 and 12.3. 
will try to figure out which (12.2 is building..)

[Bug sanitizer/105336] truncated address sanitizer stack traces, coroutines + -Og

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105336

Arsen Arsenović  changed:

   What|Removed |Added

 CC||arsen at gcc dot gnu.org

--- Comment #7 from Arsen Arsenović  ---
interestingly, I also got the same result with

  -Og -fno-branch-count-reg -fno-delayed-branch -fno-dse -fno-if-conversion
-fno-if-conversion2 -fno-inline-functions-called-once -fno-move-loop-invariants
-fno-move-loop-stores -fno-ssa-phiopt -fno-tree-bit-ccp -fno-tree-dse
-fno-tree-pta -fno-tree-sra

(so, -Og and -fno- of each option the info section of -Og describes as part of
-Og)

[Bug c++/103953] Leak of coroutine return object

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103953

--- Comment #2 from Arsen Arsenović  ---
seems to have been r12-9435-g6fd32842404ac1.

[Bug middle-end/115527] incorrect folding of __builtin_clear_padding()

2024-07-16 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115527

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
Testcase more appropriate for the testsuite:

/* PR middle-end/115527 */

struct T { struct S { double a; signed char b; long c; } d[3]; int e; };

int
main ()
{
  struct T t = { { { 1., 2, 3 }, { 4., 5, 6 }, { 7., 8, 9 } }, 10 };
  __builtin_clear_padding (&t);
  for (int i = 0; i < 3; ++i)
    if (t.d[i].a != 1. + 3 * i
        || t.d[i].b != 3 * i + 2
        || t.d[i].c != 3 * i + 3)
      __builtin_abort ();
  if (t.e != 10)
    __builtin_abort ();
}
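
For readers unfamiliar with where the padding in this testcase lives, here is
a minimal sketch (not part of the bug report) that measures it with offsetof
and sizeof.  The helper names are made up for illustration, and the actual
numbers depend on the target ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Same inner struct as in the testcase.  */
struct S { double a; signed char b; long c; };

/* Bytes of struct S not covered by any member, i.e. the padding that
   __builtin_clear_padding is expected to zero in each array element.  */
static size_t
padding_bytes_in_S (void)
{
  return sizeof (struct S)
         - (sizeof (double) + sizeof (signed char) + sizeof (long));
}

/* Padding between the end of b and the start of c.  */
static size_t
gap_after_b (void)
{
  return offsetof (struct S, c)
         - (offsetof (struct S, b) + sizeof (signed char));
}

/* Hypothetical self-check: the gap after b is part of the total padding.  */
static void
check_layout (void)
{
  assert (offsetof (struct S, a) == 0);
  assert (gap_after_b () <= padding_bytes_in_S ());
}
```

The testcase itself only checks that the named members survive the call; the
padding bytes counted here are the ones the builtin is allowed to overwrite.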

[Bug c++/101367] [coroutines] destructor for capture in lambda temporary operand to co_yield expression called twice

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101367

Arsen Arsenović  changed:

   What|Removed |Added

 CC||benni.buch at gmail dot com

--- Comment #12 from Arsen Arsenović  ---
*** Bug 104872 has been marked as a duplicate of this bug. ***

[Bug c++/104872] Memory corruption in Coroutine with POD type

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104872

Arsen Arsenović  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||arsen at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #7 from Arsen Arsenović  ---
indeed, this one is also fixed by the same commit

*** This bug has been marked as a duplicate of bug 101367 ***

[Bug modula2/115957] New: ICE on procedure-local CONST declaration

2024-07-16 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115957

Bug ID: 115957
   Summary: ICE on procedure-local CONST declaration
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: modula2
  Assignee: gaius at gcc dot gnu.org
  Reporter: gaius at gcc dot gnu.org
  Target Milestone: ---

As reported on the gm2 mailing list:


The following program (Test.mod) causes an ICE on gm2 (GCC) 15.0.1 20240707 
(x86_64, Arch Linux):

MODULE Test;

IMPORT SYSTEM;

TYPE
  T = POINTER TO CONS;
  CONS = RECORD
    CAR: SYSTEM.ADDRESS;
    CDR: T;
  END;

PROCEDURE POP(VAR LST: T): SYSTEM.ADDRESS;
  CONST CAR = LST.CAR;
BEGIN
  RETURN NIL;
END POP;

BEGIN
END Test.

[Bug modula2/115957] ICE on procedure-local CONST declaration

2024-07-16 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115957

Gaius Mulley  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-07-16

--- Comment #1 from Gaius Mulley  ---
Confirmed.

[Bug c++/103963] Coroutine return type needs not be copy- or move-constructible

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103963

Arsen Arsenović  changed:

   What|Removed |Added

 CC||arsen at gcc dot gnu.org

--- Comment #1 from Arsen Arsenović  ---
possibly dupe of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99575 ?

[Bug modula2/115957] ICE on procedure-local CONST declaration

2024-07-16 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115957

--- Comment #2 from Gaius Mulley  ---
Created attachment 58685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58685&action=edit
Proposed fix

Here is a proposed fix.

[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls

2024-07-16 Thread julien.voisin+gnu at dustri dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #19 from jvoisin  ---
> That's not a good reason to weaken the security of the generated code.

Having BTI with more valid targets is still better than no BTI at all, and it
would still be better than what clang is doing.

[Bug c++/105595] Coroutines can trigger -Wsubobject-linkage

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105595

Arsen Arsenović  changed:

   What|Removed |Added

 CC||arsen at gcc dot gnu.org

--- Comment #3 from Arsen Arsenović  ---
hm, I think the frame type (and the functions consuming pointers to those)
could potentially be made anonymous always?  I think they're only used via
pointers.

[Bug tree-optimization/102392] Failure to optimize a sign extension to a zero extension

2024-07-16 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102392

Gabriel Ravier  changed:

   What|Removed |Added

 Target|X86_64-linux-gnu|x86_64-linux-gnu
Version|12.0|15.0

--- Comment #5 from Gabriel Ravier  ---
I've wound up stumbling upon a very similar bug (which I think is the same bug
at its core) while examining the following code:

#include <stddef.h>
#include <stdint.h>

static uint32_t f(int8_t x)
{
    return (~(uint32_t)x) & 1;
}

void floop(uint32_t *r, int8_t *x, size_t n)
{
#ifndef __clang__
    _Pragma("GCC unroll 0") _Pragma("GCC novector")
#else
    _Pragma("clang loop unroll(disable) vectorize(disable)")
#endif
    for (size_t i = 0; i < n; ++i)
        r[i] = f(x[i]);
}

where for the loop, GCC generates:

.L3:
  movsx eax, BYTE PTR [rsi+rdx]   # <--- sign extension
  not eax
  and eax, 1
  mov DWORD PTR [rdi+rdx*4], eax
  add rdx, 1
  cmp rcx, rdx
  jne .L3

whereas LLVM manages:

.LBB0_2: # =>This Inner Loop Header: Depth=1
  movzx ecx, byte ptr [rsi + rax]   # <--- zero extension
  not ecx
  and ecx, 1
  mov dword ptr [rdi + 4*rax], ecx
  inc rax
  cmp rdx, rax
  jne .LBB0_2

which makes LLVM's output slightly faster (according to llvm-mca) for the same
reasons (i.e. lack of conversion from sign extension to zero extension).
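
The reason the kind of extension is irrelevant here is that only bit 0 of the
result is kept, and bit 0 of the widened value is the same under sign and zero
extension.  A minimal sketch that checks this exhaustively follows; f_zext and
check_extensions_agree are illustrative names, not from the report.

```c
#include <assert.h>
#include <stdint.h>

/* As in the comment above: int8_t -> uint32_t is a sign extension.  */
static uint32_t
f_sext (int8_t x)
{
  return (~(uint32_t) x) & 1;
}

/* Illustrative rewrite: convert to uint8_t first, so the widening to
   uint32_t is a zero extension.  */
static uint32_t
f_zext (int8_t x)
{
  return (~(uint32_t) (uint8_t) x) & 1;
}

/* Both variants agree on every possible int8_t input.  */
static void
check_extensions_agree (void)
{
  for (int v = INT8_MIN; v <= INT8_MAX; ++v)
    assert (f_sext ((int8_t) v) == f_zext ((int8_t) v));
}
```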

[Bug c++/107768] Bogus -Wzero-as-null-pointer-constant in coroutine

2024-07-16 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107768

Arsen Arsenović  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||arsen at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #11 from Arsen Arsenović  ---
this was fixed, and the fix backported, so I think this can be considered
resolved

[Bug middle-end/115527] incorrect folding of __builtin_clear_padding()

2024-07-16 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115527

--- Comment #8 from Jakub Jelinek  ---
(In reply to qinzhao from comment #6)
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -4815,6 +4815,7 @@ clear_padding_type (clear_padding_struct *buf, tree
> type,
>   unsigned int prev_align = buf->align;
>   HOST_WIDE_INT off = buf->off + buf->size;
>   HOST_WIDE_INT prev_sz = buf->sz;
> + HOST_WIDE_INT prev_size = buf->size;
>   clear_padding_flush (buf, true);
>   tree elttype = TREE_TYPE (type);
>   buf->base = create_tmp_var (build_pointer_type (elttype));
> @@ -4835,8 +4836,8 @@ clear_padding_type (clear_padding_struct *buf, tree
> type,
>   buf->base = base;
>   buf->sz = prev_sz;
>   buf->align = prev_align;
> - buf->size = off % UNITS_PER_WORD;
> - buf->off = off - buf->size;
> + buf->size = prev_size + nelts * fldsz;
> + buf->off = 0;
>   memset (buf->buf, 0, buf->size);
>   break;
> }

That is incorrect.
I think the right fix is
--- gcc/gimple-fold.cc.jj   2024-07-16 13:36:36.0 +0200
+++ gcc/gimple-fold.cc  2024-07-16 15:50:26.493782065 +0200
@@ -4832,6 +4832,7 @@ clear_padding_type (clear_padding_struct
  buf->off = 0;
  buf->size = 0;
  clear_padding_emit_loop (buf, elttype, end, for_auto_init);
+ off += sz;
  buf->base = base;
  buf->sz = prev_sz;
  buf->align = prev_align;
Will try to test it soon.

[Bug driver/47229] Objective C and C++ compiler specific drivers

2024-07-16 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47229

Eric Gallager  changed:

   What|Removed |Added

 CC||egallager at gcc dot gnu.org

--- Comment #3 from Eric Gallager  ---
Apparently some distros ship such drivers:
https://lists.gnu.org/archive/html/bug-autoconf/2024-07/msg0.html

[Bug tree-optimization/115841] [12/13/14 Regression] 521.wrf_r ICEs when building with -march=znver4 -Ofast -flto --param vect-partial-vector-usage=1

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115841

--- Comment #7 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:016c947b02e79a5c0c0c2d4ad5cb71aa04db3efd

commit r15-2065-g016c947b02e79a5c0c0c2d4ad5cb71aa04db3efd
Author: Richard Biener 
Date:   Tue Jul 16 11:53:17 2024 +0200

tree-optimization/115841 - reduction epilogue placement issue

When emitting the compensation to the vectorized main loop for
a vector reduction value to be re-used in the vectorized epilogue
we fail to place it in the correct block when the main loop is
known to be entered (no loop_vinfo->main_loop_edge) but the
epilogue is not (a loop_vinfo->skip_this_loop_edge).  The code
currently disregards this situation.

With the recent znver4 cost fix I couldn't trigger this situation
with the testcase but I adjusted it so it could eventually trigger
on other targets.

PR tree-optimization/115841
* tree-vect-loop.cc (vect_transform_cycle_phi): Correctly
place the partial vector reduction for the accumulator
re-use when the main loop cannot be skipped but the
epilogue can.

* gcc.dg/vect/pr115841.c: New testcase.

[Bug tree-optimization/95817] Failure to optimize shift with constant to compare

2024-07-16 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95817

--- Comment #3 from Gabriel Ravier  ---
A place where this does seem to give faster results in real code is code where
multiple conditions like this are placed in a row, in e.g. the code at
https://github.com/openbsd/xenocara/blob/0c50e27b4c04e035f08a7c09a95e25bf7e4cf7c3/lib/mesa/src/intel/compiler/brw_eu_compact.c#L1612-L1616
(from OpenBSD's Xenocara X server).

LLVM is able to transform a condition like `(((int)imm >> 11) == 0 || ((int)imm
>> 11) == -1)` (from the above-mentioned OpenBSD code) to `(imm + 2048) <
4096`, whereas GCC isn't.
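
To make the equivalence concrete: both shift comparisons together say that
(int)imm lies in [-2048, 2047], and adding 2048 maps that range onto
[0, 4095] in unsigned arithmetic.  The sketch below checks the two forms
against each other.  It assumes a 32-bit imm and GCC's documented
implementation-defined behaviour (wrapping unsigned-to-int conversion,
arithmetic right shift of negative values); the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* The condition as quoted from the OpenBSD code.  */
static int
fits_shift (uint32_t imm)
{
  return ((int) imm >> 11) == 0 || ((int) imm >> 11) == -1;
}

/* The single unsigned range check LLVM is reported to produce.  */
static int
fits_range (uint32_t imm)
{
  return (imm + 2048u) < 4096u;
}

/* Compare both forms on boundary values and on a window around zero.  */
static void
check_forms_agree (void)
{
  static const uint32_t edges[] = {
    0x7fffffffu, 0x80000000u, 0xfffff7ffu, 0xfffff800u, 0xffffffffu
  };
  for (unsigned i = 0; i < sizeof edges / sizeof edges[0]; ++i)
    assert (fits_shift (edges[i]) == fits_range (edges[i]));
  for (uint32_t d = 0; d < 8192u; ++d)
    {
      assert (fits_shift (d) == fits_range (d));
      assert (fits_shift (0u - d) == fits_range (0u - d));
    }
}
```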

[Bug middle-end/115527] incorrect folding of __builtin_clear_padding()

2024-07-16 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115527

--- Comment #9 from qinzhao at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #8)
> (In reply to qinzhao from comment #6)
> > --- a/gcc/gimple-fold.cc
> > +++ b/gcc/gimple-fold.cc
> > @@ -4815,6 +4815,7 @@ clear_padding_type (clear_padding_struct *buf, tree
> > type,
> >   unsigned int prev_align = buf->align;
> >   HOST_WIDE_INT off = buf->off + buf->size;
> >   HOST_WIDE_INT prev_sz = buf->sz;
> > + HOST_WIDE_INT prev_size = buf->size;
> >   clear_padding_flush (buf, true);
> >   tree elttype = TREE_TYPE (type);
> >   buf->base = create_tmp_var (build_pointer_type (elttype));
> > @@ -4835,8 +4836,8 @@ clear_padding_type (clear_padding_struct *buf, tree
> > type,
> >   buf->base = base;
> >   buf->sz = prev_sz;
> >   buf->align = prev_align;
> > - buf->size = off % UNITS_PER_WORD;
> > - buf->off = off - buf->size;
> > + buf->size = prev_size + nelts * fldsz;
> > + buf->off = 0;
> >   memset (buf->buf, 0, buf->size);
> >   break;
> > }
> 
> That is incorrect.
> I think the right fix is
> --- gcc/gimple-fold.cc.jj 2024-07-16 13:36:36.0 +0200
> +++ gcc/gimple-fold.cc2024-07-16 15:50:26.493782065 +0200
> @@ -4832,6 +4832,7 @@ clear_padding_type (clear_padding_struct
> buf->off = 0;
> buf->size = 0;
> clear_padding_emit_loop (buf, elttype, end, for_auto_init);
> +   off += sz;
> buf->base = base;
> buf->sz = prev_sz;
> buf->align = prev_align;
> Will try to test it soon.

Yes, this fix is better, and the test case runs correctly.

[Bug tree-optimization/115843] [14 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #17 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Richard Biener
:

https://gcc.gnu.org/g:d702a957753caf020cb550d143e9e9a62f79e9f5

commit r14-10434-gd702a957753caf020cb550d143e9e9a62f79e9f5
Author: Richard Biener 
Date:   Mon Jul 15 13:01:24 2024 +0200

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

(cherry picked from commit 1e3aa9c9278db69d4bdb661a750a7268789188d6)

[Bug tree-optimization/115867] [14 Regression] ICE: tree check: expected vector_type, have integer_type in TYPE_VECTOR_SUBPARTS, at tree.h:4246

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115867

--- Comment #3 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Richard Biener
:

https://gcc.gnu.org/g:ca275b68ef11d7d70bff8d7426e45b3734b3

commit r14-10436-gca275b68ef11d7d70bff8d7426e45b3734b3
Author: Richard Biener 
Date:   Thu Jul 11 10:18:55 2024 +0200

tree-optimization/115867 - ICE with simdcall vectorization in masked loop

When only a loop mask is to be supplied for the inbranch arg to a
simd function we fail to handle integer mode masks correctly.  We
need to guess the number of elements represented by it.  This assumes
that excess arguments are all for masks; I wasn't able to create
a simdclone with more than one integer mode mask argument.

The gcc.dg/vect/vect-simd-clone-20.c exercises this with -mavx512vl

PR tree-optimization/115867
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Properly
guess the number of mask elements for integer mode masks.

(cherry picked from commit 4f4478f0f31263997bfdc4159f90e58dd79b38f9)

[Bug ipa/115701] [11/12/13/14 Regression] wrong code at -O1 and above with "-fipa-pta" on x86_64-linux-gnu

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115701

--- Comment #8 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Richard Biener
:

https://gcc.gnu.org/g:e01012c459c931ae39558b019107226c232fa4d1

commit r14-10438-ge01012c459c931ae39558b019107226c232fa4d1
Author: Richard Biener 
Date:   Sun Jun 30 11:34:43 2024 +0200

tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_copy

The following restricts copying of points-to info from defs that
might be in regions invoking UB and are never executed.

PR tree-optimization/115701
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy):
Only copy info from within the same BB.

* gcc.dg/torture/pr115701.c: New testcase.

(cherry picked from commit b77f17c5feec9614568bf2dee7f7d811465ee4a5)

[Bug tree-optimization/115841] [12/13/14 Regression] 521.wrf_r ICEs when building with -march=znver4 -Ofast -flto --param vect-partial-vector-usage=1

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115841

--- Comment #8 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Richard Biener
:

https://gcc.gnu.org/g:59ed01d5e3d2b0e59163d3248bdba9f1e35de599

commit r14-10440-g59ed01d5e3d2b0e59163d3248bdba9f1e35de599
Author: Richard Biener 
Date:   Tue Jul 16 11:53:17 2024 +0200

tree-optimization/115841 - reduction epilogue placement issue

When emitting the compensation to the vectorized main loop for
a vector reduction value to be re-used in the vectorized epilogue
we fail to place it in the correct block when the main loop is
known to be entered (no loop_vinfo->main_loop_edge) but the
epilogue is not (a loop_vinfo->skip_this_loop_edge).  The code
currently disregards this situation.

With the recent znver4 cost fix I couldn't trigger this situation
with the testcase but I adjusted it so it could eventually trigger
on other targets.

PR tree-optimization/115841
* tree-vect-loop.cc (vect_transform_cycle_phi): Correctly
place the partial vector reduction for the accumulator
re-use when the main loop cannot be skipped but the
epilogue can.

* gcc.dg/vect/pr115841.c: New testcase.

(cherry picked from commit 016c947b02e79a5c0c0c2d4ad5cb71aa04db3efd)

[Bug ipa/115701] [11/12/13/14 Regression] wrong code at -O1 and above with "-fipa-pta" on x86_64-linux-gnu

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115701

--- Comment #7 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Richard Biener
:

https://gcc.gnu.org/g:6f74a5f5dc12bc337068f0f6a554d72604488959

commit r14-10437-g6f74a5f5dc12bc337068f0f6a554d72604488959
Author: Richard Biener 
Date:   Sun Jun 30 11:28:11 2024 +0200

tree-optimization/115701 - factor out maybe_duplicate_ssa_info_at_copy

The following factors out the code that preserves SSA info of the LHS
of a SSA copy LHS = RHS when LHS is about to be eliminated to RHS.

PR tree-optimization/115701
* tree-ssanames.h (maybe_duplicate_ssa_info_at_copy): Declare.
* tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy): New
function, split out from ...
* tree-ssa-copy.cc (fini_copy_prop): ... here.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): ...
and here.

(cherry picked from commit b5c64b413fd5bc03a1a8ef86d005892071e42cbe)

[Bug tree-optimization/115867] [14 Regression] ICE: tree check: expected vector_type, have integer_type in TYPE_VECTOR_SUBPARTS, at tree.h:4246

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115867

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
  Known to fail|14.1.1  |14.1.0
 Resolution|--- |FIXED
  Known to work||14.1.1
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug tree-optimization/115843] [14 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #18 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Richard Biener
:

https://gcc.gnu.org/g:06829e593d2e5611e7924624cb8228795691e2b7

commit r14-10439-g06829e593d2e5611e7924624cb8228795691e2b7
Author: Richard Biener 
Date:   Mon Jul 15 13:50:58 2024 +0200

tree-optimization/115843 - fix wrong-code with fully-masked loop and
peeling

When AVX512 uses a fully masked loop and peeling we fail to create the
correct initial loop mask when the mask is composed of multiple
components in some cases.  The following fixes this by properly applying
the bias for the component to the shift amount.

PR tree-optimization/115843
* tree-vect-loop-manip.cc
(vect_set_loop_condition_partial_vectors_avx512): Properly
bias the shift of the initial mask for alignment peeling.

* gcc.dg/vect/pr115843.c: New testcase.

(cherry picked from commit a177be05f6952c3f7e62186d2e138d96c475b81a)

[Bug tree-optimization/115841] [12/13/14 Regression] 521.wrf_r ICEs when building with -march=znver4 -Ofast -flto --param vect-partial-vector-usage=1

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115841

Richard Biener  changed:

   What|Removed |Added

  Known to fail||14.1.0
   Priority|P3  |P2
  Known to work||14.1.1

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 115867, which changed state.

Bug 115867 Summary: [14 Regression] ICE: tree check: expected vector_type, have 
integer_type in TYPE_VECTOR_SUBPARTS, at tree.h:4246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115867

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 115843, which changed state.

Bug 115843 Summary: [14 Regression] 531.deepsjeng_r fails to verify with -O3 
-march=znver4 --param vect-partial-vector-usage=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/115843] [14 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

Richard Biener  changed:

   What|Removed |Added

  Known to work||14.1.1
 Status|ASSIGNED|RESOLVED
  Known to fail|14.1.1  |14.1.0
 Resolution|--- |FIXED

--- Comment #19 from Richard Biener  ---
Fixed.

[Bug modula2/115957] ICE on procedure-local CONST declaration

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115957

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Gaius Mulley :

https://gcc.gnu.org/g:d9709fafb2c498ba2f4c920f953c9b78fa3bf114

commit r15-2067-gd9709fafb2c498ba2f4c920f953c9b78fa3bf114
Author: Gaius Mulley 
Date:   Tue Jul 16 15:27:21 2024 +0100

PR modula2/115957 ICE on procedure local const declaration

An ICE would occur if a constant was declared using a variable term.
This fix catches variable terms in constant expressions and generates
an unrecoverable error.

gcc/m2/ChangeLog:

PR modula2/115957
* gm2-compiler/M2StackAddress.mod (PopAddress): Detect tail=NIL
and generate an internal error.
* gm2-compiler/PCBuild.bnf (InConstParameter): New variable.
(InConstBlock): New variable.
(ErrorString): Rewrite using MetaErrorStringT0.
(ErrorArrayAt): Rewrite using MetaErrorStringT0.
(WarnMissingToken): Use MetaErrorStringT0.
(CompilationUnit): Set seenError FALSE.
(init): Initialize InConstParameter and InConstBlock.
(ConstantDeclaration): Set InConstBlock.
(ConstSetOrQualidentOrFunction): Call CheckNotVar if not
InConstParameter and InConstBlock.
(ConstActualParameters): Set InConstParameter TRUE and restore
value at the end.
* gm2-compiler/PCSymBuild.def (CheckNotVar): New procedure.
Remove all unnecessary export qualified list.
* gm2-compiler/PCSymBuild.mod (CheckNotVar): New procedure.

gcc/testsuite/ChangeLog:

PR modula2/115957
* gm2/errors/fail/badconst.mod: New test.
* gm2/pim/fail/tinyadr.mod: New test.

Signed-off-by: Gaius Mulley 

[Bug modula2/115957] ICE on procedure-local CONST declaration

2024-07-16 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115957

Gaius Mulley  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Gaius Mulley  ---
Closing now that the patch has been applied.

[Bug rtl-optimization/115901] [15 regression] ICE when building coreutils-9.5 on arm64 with -O3 -flto -fno-vect-cost-model -ftrivial-auto-var-init=zero

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115901

--- Comment #11 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:851ec9960b084ad37556ec627e6931e985e41a24

commit r15-2070-g851ec9960b084ad37556ec627e6931e985e41a24
Author: Richard Sandiford 
Date:   Tue Jul 16 15:31:17 2024 +0100

recog: restrict paradoxical mode punning in insn_propagation [PR115901]

In g:44fc801e97a8dc626a4806ff4124439003420b20 I'd extended
insn_propagation to handle simple cases of hard-reg mode punning.
One of the checks was that the new use mode occupied the same
number of registers as the original definition mode.  However,
as PR115901 shows, we need to avoid increasing the size of any
registers in the punned "to" expression as well.

Specifically, the test includes a DImode move from GPR x0 to
a vector register, followed by a V2DI use of the vector register.
The simplification would then create a V2DI spanning x0 and x1,
manufacturing a new, unwanted use of x1.

Checking for that kind of thing directly seems too cumbersome,
and is not related to the original motivation (which was to improve
handling of shared vector zeros on aarch64).  This patch therefore
restricts the paradoxical case to constants.

gcc/
PR rtl-optimization/115901
* recog.cc (insn_propagation::apply_to_rvalue_1): Restrict
paradoxical mode punning to cases where "to" is constant.

gcc/testsuite/
PR rtl-optimization/115901
* gcc.dg/torture/pr115901.c: New test.

[Bug target/115891] [15 regression] libgcrypt tests segfault in crc32_less_than_16 with LTO with late-combine

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115891

--- Comment #4 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:9f9faebb8ebfc0103461641cc49ba0b21877b2b1

commit r15-2069-g9f9faebb8ebfc0103461641cc49ba0b21877b2b1
Author: Richard Sandiford 
Date:   Tue Jul 16 15:31:17 2024 +0100

rtl-ssa: Enforce earlyclobbers on hard-coded clobbers [PR115891]

The asm in the testcase has a memory operand and also clobbers ax.
The clobber means that ax cannot be used to hold inputs, which
extends to the address of the memory.

I think I had an implicit assumption that constrain_operands
would enforce this, but in hindsight, that clearly wasn't going
to be true.  constrain_operands only looks at constraints, and
these clobbers are by definition outside the constraint system.
(And that's why they have to be handled conservatively, since there's
no way to distinguish the earlyclobber and non-earlyclobber cases.)

The semantics of hard-coded clobbers are generic enough that I think
they should be handled directly by rtl-ssa, rather than by consumers.
And in the context of rtl-ssa, the easiest way to check for a clash is
to walk the list of input registers, which we already have to hand.
It therefore seemed better not to push this down to a more generic
rtl helper.

The patch detects hard-coded clobbers in the same way as regrename:
by temporarily stubbing out the operands with pc_rtx.
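
The overlap check described above can be sketched in C as follows.  This
is only a model under stated assumptions, not the rtl-ssa implementation:
the regset type, the register numbering and the function
inputs_clash_with_clobbers_p are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: hard registers are small integers and a clobber
   set is a bitmask with bit N set if register N is clobbered.  */
typedef unsigned long long regset;

/* Return true if any of the N_INPUTS registers in INPUTS is also in
   CLOBBERS.  Hard-coded clobbers cannot be split into earlyclobber and
   non-earlyclobber cases, so any overlap is treated as a clash.  */
bool
inputs_clash_with_clobbers_p (const unsigned int *inputs, size_t n_inputs,
                              regset clobbers)
{
  for (size_t i = 0; i < n_inputs; ++i)
    if (inputs[i] < 64 && (clobbers & (1ULL << inputs[i])) != 0)
      return true;
  return false;
}
```

In this model, an asm whose memory address lives in a clobbered register
is rejected, while the same asm with the address in any other register is
accepted.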

gcc/
PR rtl-optimization/115891
* rtl-ssa/changes.cc (find_clobbered_access): New function.
(recog_level2): Use it to check for overlap between input
registers and hard-coded clobbers.  Conditionally reset
recog_data.insn after changing the insn code.

gcc/testsuite/
PR rtl-optimization/115891
* gcc.target/i386/pr115891.c: New test.

[Bug rtl-optimization/115929] [15 regression] ICE on valid code at -O{2,3} with "-fschedule-insns" on x86_64-linux-gnu: Segmentation fault

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115929

--- Comment #2 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:fec38d7987dd6d68b234b0076b57ac66a30a3a1d

commit r15-2071-gfec38d7987dd6d68b234b0076b57ac66a30a3a1d
Author: Richard Sandiford 
Date:   Tue Jul 16 15:33:23 2024 +0100

rtl-ssa: Fix removal of order_nodes [PR115929]

order_nodes are used to implement ordered comparisons between
two insns with the same program point number.  remove_insn would
remove an order_node from its splay tree, but didn't remove it
from the insn.  This caused confusion if the insn was later
reinserted somewhere else that also needed an order_node.
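
The stale-pointer problem described above can be modelled with a short C
sketch.  The structures and function names here are hypothetical
stand-ins, not the rtl-ssa classes, and the splay tree is reduced to a
single "in_tree" flag.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the rtl-ssa structures.  */
struct order_node { int uid; int in_tree; };
struct insn { int point; struct order_node *order; };

/* Model of the old behaviour: the node leaves the tree, but the insn
   keeps pointing at it.  */
void
remove_insn_buggy (struct insn *i)
{
  if (i->order)
    i->order->in_tree = 0;
}

/* Model of the fixed behaviour: also detach the node from the insn.  */
void
remove_insn_fixed (struct insn *i)
{
  if (i->order)
    {
      i->order->in_tree = 0;
      i->order = NULL;
    }
}

/* Return nonzero if I still refers to a node that is no longer in the
   tree, which is the state that confused later reinsertion.  */
int
has_stale_order_node (const struct insn *i)
{
  return i->order != NULL && !i->order->in_tree;
}
```

After remove_insn_buggy the insn is left in the stale state; after
remove_insn_fixed it has no order node at all, so a later reinsertion
can allocate a fresh one if needed.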

gcc/
PR rtl-optimization/115929
* rtl-ssa/insns.cc (function_info::remove_insn): Remove an
order_node from the instruction as well as from the splay tree.

gcc/testsuite/
PR rtl-optimization/115929
* gcc.dg/torture/pr115929-1.c: New test.

[Bug target/115891] [15 regression] libgcrypt tests segfault in crc32_less_than_16 with LTO with late-combine

2024-07-16 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115891

Richard Sandiford  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Richard Sandiford  ---
Fixed.

[Bug plugins/112520] gcc.dg/plugin/cpython-plugin-test-PyList_Append.c -fplugin=./analyzer_cpython_plugin.so ICE (segmentation fault) with Python 3.12+

2024-07-16 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112520

--- Comment #8 from John David Anglin  ---
I think get_field_by_name needs updating to handle the struct layout
changes.  DECL_NAME (field) is null for the unnamed union, which causes
a segmentation fault when IDENTIFIER_POINTER dereferences it.
get_field_by_name needs to search inside unions and structs for the
field.  If the field isn't found, NULL_TREE is returned, and the current
code has no checks for this.
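
A rough sketch of the kind of lookup suggested above, written as
standalone C rather than against GCC's tree API.  The struct field type
and find_field are hypothetical; a NULL name models a field whose
DECL_NAME is null, and the sketch assumes that only unnamed aggregates
need to be searched recursively.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a field list; this is not GCC's tree API.  */
struct field
{
  const char *name;            /* NULL for an unnamed struct or union.  */
  const struct field *members; /* First member of an aggregate, or NULL.  */
  const struct field *next;    /* Next sibling field, or NULL.  */
};

/* Search FIELDS for a field called NAME, descending into unnamed
   aggregates.  Return NULL if no such field exists; callers must check
   for that.  */
const struct field *
find_field (const struct field *fields, const char *name)
{
  for (const struct field *f = fields; f != NULL; f = f->next)
    {
      if (f->name != NULL)
        {
          /* Only compare names that exist; never dereference NULL.  */
          if (strcmp (f->name, name) == 0)
            return f;
        }
      else if (f->members != NULL)
        {
          const struct field *inner = find_field (f->members, name);
          if (inner != NULL)
            return inner;
        }
    }
  return NULL;
}
```

The two points the sketch is meant to show are the null-name check before
any comparison and the explicit "not found" result that the caller has to
handle.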

The tests will be fragile if they have to check the Python version.  The
test is probably broken now on all systems using Python 3.12.

[Bug rtl-optimization/115901] [15 regression] ICE when building coreutils-9.5 on arm64 with -O3 -flto -fno-vect-cost-model -ftrivial-auto-var-init=zero

2024-07-16 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115901

Richard Sandiford  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #12 from Richard Sandiford  ---
Fixed.

[Bug rtl-optimization/115929] [15 regression] ICE on valid code at -O{2,3} with "-fschedule-insns" on x86_64-linux-gnu: Segmentation fault

2024-07-16 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115929

--- Comment #3 from Richard Sandiford  ---
As it turned out, the two tests exposed different bugs.  I've submitted a patch
for the other one and will close once that's resolved.

[Bug c/115954] Alignment of _Atomic structs incompatible between GCC and LLVM

2024-07-16 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115954

--- Comment #6 from Andrew Pinski  ---
https://gitlab.com/x86-psABIs/i386-ABI/-/issues/1 for x86_64 abi.

Aarch64 should most likely also do the same ...

[Bug fortran/84868] [11/12/13/14/15 Regression] ICE in gfc_conv_descriptor_offset, at fortran/trans-array.c:208

2024-07-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84868

--- Comment #18 from GCC Commits  ---
The master branch has been updated by Paul Thomas :

https://gcc.gnu.org/g:9f966b6a8ff0244dd6f8bf36d876799d5f9bbaee

commit r15-2072-g9f966b6a8ff0244dd6f8bf36d876799d5f9bbaee
Author: Paul Thomas 
Date:   Tue Jul 16 15:56:44 2024 +0100

Fortran: Simplify len_trim with array ref and fix mapping bug[PR84868].

2024-07-16  Paul Thomas  

gcc/fortran
PR fortran/84868
* simplify.cc (gfc_simplify_len_trim): If the argument is an
element of a parameter array, simplify all the elements and
build a new parameter array to hold the result, after checking
that it doesn't already exist.
* trans-expr.cc (gfc_get_interface_mapping_array): If a string
length is available, use it for the typespec.
(gfc_add_interface_mapping): Supply the se string length.

gcc/testsuite/
PR fortran/84868
* gfortran.dg/pr84868.f90: New test.
