[Bug middle-end/115346] [15] Volatile load elimination with packed struct bitfields

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115346

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #4 from Richard Biener  ---
duplicate

*** This bug has been marked as a duplicate of bug 99258 ***

[Bug middle-end/99258] volatile struct access optimized away

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99258

Richard Biener  changed:

   What|Removed |Added

 CC||patrick at rivosinc dot com

--- Comment #4 from Richard Biener  ---
*** Bug 115346 has been marked as a duplicate of this bug. ***

[Bug lto/46083] gcc.dg/initpri1.c FAILs with -flto/-fwhopr (attribute constructor/destructor doesn't work)

2024-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46083

--- Comment #7 from GCC Commits  ---
The trunk branch has been updated by Thomas Schwinge :

https://gcc.gnu.org/g:38dd7419324490b386bbac06ddc5fafbfe8629d3

commit r15-1024-g38dd7419324490b386bbac06ddc5fafbfe8629d3
Author: Thomas Schwinge 
Date:   Wed Apr 24 10:11:02 2024 +0200

Clarify that 'gcc.dg/initpri3.c' is a LTO variant of 'gcc.dg/initpri1.c':
'gcc.dg/initpri1-lto.c' [PR46083]

Added in commit 06c9eb5136fe0e778cc3a643131eba2a3dfb77a8 (Subversion
r168642)
"re PR lto/46083 (gcc.dg/initpri1.c FAILs with -flto/-fwhopr (attribute
constructor/destructor doesn't work))".

PR lto/46083
gcc/testsuite/
* gcc.dg/initpri3.c: Remove.
* gcc.dg/initpri1-lto.c: New.

[Bug tree-optimization/115347] [12/13/14/15 Regression] wrong code at -O3 on x86_64-linux-gnu

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115347

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112859
Version|unknown |14.1.1

--- Comment #2 from Richard Biener  ---
it's loop distribution doing

t2.c:7:12: optimized: Loop nest 1 distributed: split to 2 loops and 0 library calls.

We get

  for (; f < 1; f++) {
    for (h = 0; h < 2; h++) {
      d = e[f];
    }
  }
  for (; f < 1; f++) {
    for (h = 0; h < 2; h++) {
      g = e[1].c;
      e[f].c = 1;
    }
  }

I think this is similar to the other still open issue where zero-distance
inner loop dependences (&e[f].c doesn't vary in the inner loop) cause
issues with the interpretation of classical dependence analysis.

I'm somewhat lost there.  PR112859.

[Bug rtl-optimization/115351] [14/15 regression] pointless movs when passing by value on x86-64

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115351

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*
Summary|[14 regression] pointless   |[14/15 regression]
   |movs when passing by value  |pointless movs when passing
   |on x86-64   |by value on x86-64
   Target Milestone|--- |14.2
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-05
  Component|c++ |rtl-optimization
   Keywords||missed-optimization,
   ||needs-bisection

--- Comment #1 from Richard Biener  ---
Confirmed.  The IL we expand from is the same.

[Bug tree-optimization/115354] [14/15 Regression] Large -Os code size increase related to -ftree-sra

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115354

Richard Biener  changed:

   What|Removed |Added

Summary|Large -Os code size |[14/15 Regression] Large
   |increase related to |-Os code size increase
   |-ftree-sra  |related to -ftree-sra
   Target Milestone|--- |14.2
 CC||jamborm at gcc dot gnu.org
   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
The optimization is performed optimistically, anticipating that followup
optimizations will make up for the bloat it immediately causes (that's my
understanding).  I'm not sure whether we make any attempt to assess how
likely that is, but certainly this transform could be disabled when
optimizing for size or for cold calls?

[Bug rtl-optimization/115351] [14/15 regression] pointless movs when passing by value on x86-64

2024-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115351

Hongtao Liu  changed:

   What|Removed |Added

 CC||liuhongt at gcc dot gnu.org

--- Comment #2 from Hongtao Liu  ---
There're 

(insn 5 4 6 2 (set (reg:TI 110)
(ior:TI (and:TI (reg:TI 110)
(const_wide_int 0x))
(zero_extend:TI (subreg:DI (reg:DF 111) 0
"/app/example.cpp":8:1 136 {*insvti_lowpart_1}
 (nil))
(insn 6 5 7 2 (set (reg:TI 110)
(ior:TI (and:TI (reg:TI 110)
(const_wide_int 0x0))
(ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112) 0))
(const_int 64 [0x40] "/app/example.cpp":8:1 133
{*insvti_highpart_1}
 (nil))
(insn 7 6 8 2 (set (reg/v:TI 109 [ z ])

in GCC14's rtl dump, guess related to r14-589-g1e3054d27c83ee?

[Bug target/69374] install.texi is bit-rotten

2024-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69374

--- Comment #15 from GCC Commits  ---
The trunk branch has been updated by Gerald Pfeifer :

https://gcc.gnu.org/g:993142677e2cf780ef578e1d46309f0042743dd5

commit r15-1029-g993142677e2cf780ef578e1d46309f0042743dd5
Author: Gerald Pfeifer 
Date:   Wed Jun 5 09:26:58 2024 +0200

doc: Streamline recommendation of GNU awk

GNU awk 3.1.5 was released in August 2005; no need to specify this in
the context of "recent version".

gcc:
PR other/69374
* doc/install.texi (Prerequisites): Drop reference to GNU awk
version 3.1.5. Remove fluff.

[Bug ipa/96503] attribute alloc_size effect lost after inlining

2024-06-05 Thread nrk at disroot dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503

nrk at disroot dot org changed:

   What|Removed |Added

 CC||nrk at disroot dot org

--- Comment #9 from nrk at disroot dot org ---
This bug is particularly nasty when you have `alloc` and `resize` and the
`alloc` call retains the `alloc_size` information but the `resize` call gets
inlined (thus losing the new `alloc_size`) and now you're left with a pointer
that GCC thinks has the *old* size.

Here's a minimal demo:

static int *arena;

__attribute(( malloc, alloc_size(1), noinline ))
static void *alloc(int size) { return arena++; }

//__attribute((noinline))
__attribute(( alloc_size(2) ))
static void *extend(void *oldptr, int newsize) { ++arena; return oldptr; }

#include <stdlib.h>
int main(void)
{
    arena = malloc(4 * sizeof(int));
    if (!arena) abort();

    int *a = alloc(sizeof *a);
    a[0] = 4;
    a = extend(a, sizeof *a * 2);
    a[1] = 8;
    return a[1];
}

(In practice the `alloc` and `resize` functions are more sophisticated, but the
above suffices for demo purposes.)

Compile with `gcc -O2 -fsanitize=address,undefined` and the `a[1]` will trigger
UBSan because it's still operating on the old size information. Uncommenting
the `noinline` from extend() "fixes" the issue.

But littering the allocation routines with `noinline` is not really a good idea,
since it would regress performance for custom allocators that have trivial
logic (e.g. bump allocators).

This bug unfortunately makes the `alloc_size` attribute unusable for my
purposes.
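
For context, here is a minimal sketch of what `alloc_size` is meant to provide when the information is not lost. The names `tagged_alloc` and `visible_size` are hypothetical and not taken from the report. The value returned by `__builtin_object_size` depends on the compiler and the optimization level: it is typically the requested size when optimizing, and `(size_t)-1` ("unknown") otherwise, so no specific output is claimed here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator wrapper.  alloc_size(1) tells the compiler that
   the returned object is as large as the first argument.  */
__attribute__((malloc, alloc_size(1), noinline))
static void *tagged_alloc(size_t size) { return malloc(size); }

/* Ask the compiler how large it believes the object behind p is.
   Mode 0 yields the maximum remaining size, or (size_t)-1 if unknown.  */
static size_t visible_size(void)
{
    int *p = tagged_alloc(2 * sizeof(int));
    size_t n = __builtin_object_size(p, 0);
    free(p);
    return n;
}
```

Bounds checks derived from this size are what go stale in the scenario described above, once the resizing call is inlined and its `alloc_size` no longer applies.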

[Bug c++/110137] implement clang -fassume-sane-operator-new

2024-06-05 Thread user202729 at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137

--- Comment #14 from user202729  ---
Regarding alias analysis. The current implementation is such that:

compiler       | flag              | can alias? | can modify global? |
gcc            | sane              | no         | no                 | << NEW
               | no-sane [default] | no         | yes                |
clang          | sane [default]    | no         | no                 |
               | no-sane           | yes        | yes                |
[the standard] |                   | no         | yes                |

As pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035#c13 , gcc
already assumes that operator new's returned pointer cannot alias any existing
pointer. So no change is needed there.

[Bug middle-end/114532] gcc -fno-common option causes performance degradation on certain architectures

2024-06-05 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #7 from David Brown  ---
(In reply to Xi Ruoyao from comment #6)
> (In reply to Zhaohaifeng from comment #5)
> 
> > Does gcc implement -fsection-anchors like function in -fcommon option for
> > x86? In general concept, gcc should has some similar feature for x86 and 
> > ARM.
> 

AFAIK, -fsection-anchors and -fcommon / -fno-common are completely independent.
 But section anchors cannot work with "common" symbols, no matter what
architecture, because at compile time the compiler does not know the order of
allocation of the common symbols.  It /does/ know the order of allocation of
symbols defined in the current translation unit, such as initialised data,
-fno-common zero initialised data, and static data.  This information can be
used with section anchors and also with other optimisations based on the
relative positions of objects.

> AFAIK it's not very useful for CISC architectures supporting variable-length
> fancy memory operands.

That seems strange to me.  But I know very little about how targets such as
x86-64 work for global data that might be complicated with load-time or
run-time linking - my experience and understanding is all with statically
linked binaries.

It seems, from my brief testing, that for the x86-64 target, the compiler does
not do any optimisations based on the relative positions of data defined in a
unit (whether initialised, non-common bss, or static).  For targets such as the
ARM, gcc can optimise as though the individual variables were fields in a
struct where it knows the relative positions.  I don't see any reason why
x86-64 should not benefit from some of these, though I realise that scheduling
and out-of-order execution will mean some apparent optimisations would be
counter-productive.  Maybe there is some kind of address space layout
randomisation that is playing a role here?


Anyway, I cannot see any reason why -fno-common should result in the slower
run-times the OP saw (though I have only looked at current gcc versions).  I
haven't seen any differences in the code generated for -fcommon and -fno-common
on the x86-64.  And my experience on other targets is that -fno-common allows
optimisations that cannot be done with -fcommon, thus giving faster code.

I have not, however, seen the OP's real code - I've just made small tests.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:abe6d39365476e6be724815d09d072e305018755

commit r15-1030-gabe6d39365476e6be724815d09d072e305018755
Author: Pan Li 
Date:   Tue May 28 15:37:44 2024 +0800

Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

This patch adds the middle-end representation for saturating
subtraction, i.e. the result of the subtraction is set to the minimum
on underflow.  It matches a pattern similar to the one below.

SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example for uint8_t, we have

* SAT_SUB (255, 0)   => 255
* SAT_SUB (1, 2) => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255)   => 0

Given below SAT_SUB for uint64

uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t)(x >= y));
}

Before this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  _Bool _1;
  long unsigned int _3;
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _1 = x_4(D) >= y_5(D);
  _3 = x_4(D) - y_5(D);
  _6 = _1 ? _3 : 0;
  return _6;
;;succ:   EXIT
}

After this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
  return _6;
;;succ:   EXIT
}

The below tests are running for this patch:
*. The riscv fully regression tests.
*. The x86 bootstrap tests.
*. The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
* match.pd: Add new match for SAT_SUB.
* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
new decl for generated in match.pd.
(build_saturation_binary_arith_call): Add new helper function
to build the gimple call to binary SAT alu.
(match_saturation_arith): Rename from.
(match_unsigned_saturation_add): Rename to.
(match_unsigned_saturation_sub): Add new func to match the
unsigned sat sub.
(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
try when COND_EXPR.

Signed-off-by: Pan Li 
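
The pattern quoted in the commit message can be tried out directly. The sketch below is only an illustration of the source-level idiom, not of the internal function itself; the function names are chosen here for the example. The mask `-(TYPE)(x >= y)` is all ones when no underflow occurs and zero otherwise.

```c
#include <stdint.h>

/* Branchless unsigned saturating subtraction, following the pattern
   (x - y) & (-(TYPE)(x >= y)).  When x >= y the mask is all ones and
   the difference passes through; otherwise the mask is zero and the
   result saturates to 0.  */
static uint8_t sat_sub_u8 (uint8_t x, uint8_t y)
{
  return (uint8_t) ((x - y) & (-(uint8_t) (x >= y)));
}

static uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t) (x >= y));
}
```

With these definitions, `sat_sub_u8 (255, 0)` yields 255, while `sat_sub_u8 (1, 2)`, `sat_sub_u8 (254, 255)` and `sat_sub_u8 (0, 255)` all yield 0, matching the uint8_t examples listed in the commit message.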

[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

2024-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #22 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:abe6d39365476e6be724815d09d072e305018755

commit r15-1030-gabe6d39365476e6be724815d09d072e305018755
Author: Pan Li 
Date:   Tue May 28 15:37:44 2024 +0800

Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

This patch adds the middle-end representation for saturating
subtraction, i.e. the result of the subtraction is set to the minimum
on underflow.  It matches a pattern similar to the one below.

SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example for uint8_t, we have

* SAT_SUB (255, 0)   => 255
* SAT_SUB (1, 2) => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255)   => 0

Given below SAT_SUB for uint64

uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t)(x >= y));
}

Before this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  _Bool _1;
  long unsigned int _3;
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _1 = x_4(D) >= y_5(D);
  _3 = x_4(D) - y_5(D);
  _6 = _1 ? _3 : 0;
  return _6;
;;succ:   EXIT
}

After this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
  return _6;
;;succ:   EXIT
}

The below tests are running for this patch:
*. The riscv fully regression tests.
*. The x86 bootstrap tests.
*. The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
* match.pd: Add new match for SAT_SUB.
* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
new decl for generated in match.pd.
(build_saturation_binary_arith_call): Add new helper function
to build the gimple call to binary SAT alu.
(match_saturation_arith): Rename from.
(match_unsigned_saturation_add): Rename to.
(match_unsigned_saturation_sub): Add new func to match the
unsigned sat sub.
(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
try when COND_EXPR.

Signed-off-by: Pan Li 

[Bug middle-end/114532] gcc -fno-common option causes performance degradation on certain architectures

2024-06-05 Thread zhaohaifeng4 at huawei dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #8 from Zhaohaifeng  ---
(In reply to David Brown from comment #7)
> (In reply to Xi Ruoyao from comment #6)
> > (In reply to Zhaohaifeng from comment #5)
> > 
> > > Does gcc implement -fsection-anchors like function in -fcommon option for
> > > x86? In general concept, gcc should has some similar feature for x86 and 
> > > ARM.
> > 
> 
> AFAIK, -fsection-anchors and -fcommon / -fno-common are completely
> independent.  But section anchors cannot work with "common" symbols, no
> matter what architecture, because at compile time the compiler does not know
> the order of allocation of the common symbols.  It /does/ know the order of
> allocation of symbols defined in the current translation unit, such as
> initialised data, -fno-common zero initialised data, and static data.  This
> information can be used with section anchors and also with other
> optimisations based on the relative positions of objects.
> 
> > AFAIK it's not very useful for CISC architectures supporting variable-length
> > fancy memory operands.
> 
> That seems strange to me.  But I know very little about how targets such as
> x86-64 work for global data that might be complicated with load-time or
> run-time linking - my experience and understanding is all with statically
> linked binaries.
> 
> It seems, from my brief testing, that for the x86-64 target, the compiler
> does not do any optimisations based on the relative positions of data
> defined in a unit (whether initialised, non-common bss, or static).  For
> targets such as the ARM, gcc can optimise as though the individual variables
> were fields in a struct where it knows the relative positions.  I don't see
> any reason why x86-64 should not benefit from some of these, though I
> realise that scheduling and out-of-order execution will mean some apparent
> optimisations would be counter-productive.  Maybe there is some kind of
> address space layout randomisation that is playing a role here?
> 
> 
> Anyway, I cannot see any reason while -fno-common should result in the
> slower run-times the OP saw (though I have only looked at current gcc
> versions).  I haven't seen any differences in the code generated for
> -fcommon and -fno-common on the x86-64.  And my experience on other targets
> is that -fcommon allows optimisations that cannot be done with -fno-common,
> thus giving faster code.
> 
> I have not, however, seen the OP's real code - I've just made small tests.

The difference generated for -fcommon and -fno-common is just the global
variable order in memory address.

-fcommon is as following (some special order):
stderr@GLIBC_2.2.5
completed.0
Begin_Time
Arr_2_Glob
Ch_2_Glob
Run_Index
Microseconds
Ptr_Glob
Dhrystones_Per_Second
End_Time
Int_Glob
Bool_Glob
User_Time
Next_Ptr_Glob
Arr_1_Glob
Ch_1_Glob

-fno-common is as following (reversed order of source code):
stderr@GLIBC_2.2.5
completed.0
Dhrystones_Per_Second
Microseconds
User_Time
End_Time
Begin_Time
Reg
Arr_2_Glob
Arr_1_Glob
Ch_2_Glob
Ch_1_Glob
Bool_Glob
Int_Glob
Next_Ptr_Glob
Ptr_Glob
Run_Index

[Bug target/115115] [12/13/14/15 Regression] highway-1.0.7 wrong _mm_cvttps_epi32() constant fold

2024-06-05 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115115

--- Comment #14 from Sergei Trofimovich  ---
The change fixed highway-1.0.7 test suite for me. Thank you!

[Bug target/115161] highway-1.0.7 miscompilation of _mm_cvttps_epi32(): invalid result assumed

2024-06-05 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161

--- Comment #27 from Sergei Trofimovich  ---
The change fixed highway-1.0.7 test suite for me. Thank you!

[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-06-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545

--- Comment #8 from Jonathan Wakely  ---
Yes, but it's only a missed-optimization bug so there are much higher
priorities.

[Bug middle-end/114532] gcc -fno-common option causes performance degradation on certain architectures

2024-06-05 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #9 from Xi Ruoyao  ---
Then will -fno-toplevel-reorder help?

[Bug middle-end/114532] gcc -fno-common option causes performance degradation on certain architectures

2024-06-05 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #10 from Xi Ruoyao  ---
Anyway if you really require a specific order of some data you need to either
use -fno-toplevel-reorder, or group the data with a struct or linker script
explicitly.
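
As a rough sketch of the "group the data with a struct" option: the struct and member names below are hypothetical, loosely modelled on the variable names listed earlier in this thread. Struct members are laid out in declaration order, so their relative positions no longer depend on how the compiler or linker places individual globals.

```c
#include <stddef.h>

/* Hypothetical grouping of related globals.  Member order in memory
   follows declaration order, independent of -fcommon/-fno-common.  */
struct bench_globals
{
  int  int_glob;
  int  bool_glob;
  char ch_1_glob;
  char ch_2_glob;
  int  arr_1_glob[50];
};

static struct bench_globals globals;

/* Distance in bytes from the first member to the array; because
   int_glob sits at offset 0, this equals offsetof (..., arr_1_glob).  */
static size_t arr_1_distance_from_int_glob (void)
{
  return (size_t) ((char *) globals.arr_1_glob - (char *) &globals.int_glob);
}
```

The cost of this approach is that every access has to be written as `globals.member`, which may be intrusive in existing code.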

Relying on any implicit behavior like -fcommon is just fragile and it may
"break" if the compiler or the linker are changed.

[Bug middle-end/115352] wrong code with _BitInt() __builtin_sub_overflow_p() at -O0

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115352

Andrew Pinski  changed:

   What|Removed |Added

  Component|tree-optimization   |middle-end
 Target|x86_64-pc-linux-gnu |x86_64-pc-linux-gnu
   ||aarch64-linux-gnu
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-06-05
   Host|x86_64-pc-linux-gnu |

--- Comment #1 from Andrew Pinski  ---
Reducing it down to `128*300` works but `128*400` fails. The gimple level
difference between 128*300 vs 128*400 is just the argument that gets passed.
So I don't think the bug is __builtin_sub_overflow_p  expansion but I could be
wrong.

Note clang is very useless at testing this since it unrolls the loop always.
(that is after changing __builtin_sub_overflow_p to __builtin_sub_overflow:

  _BitInt (65) t;
  return __builtin_sub_overflow (0, b, &t);

)


It also fails on aarch64-linux-gnu.
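
For readers unfamiliar with the builtin, here is a small sketch of what `__builtin_sub_overflow (0, b, &t)` reports. It deliberately uses ordinary integer types instead of `_BitInt (65)`, so it does not reproduce the bug; the function names are made up for this example.

```c
#include <stdbool.h>
#include <limits.h>

/* 0 - b is negative for every nonzero unsigned b, so it does not fit
   in an unsigned result type: overflow is reported iff b != 0.  */
static bool neg_overflows_unsigned (unsigned long long b)
{
  unsigned long long t;
  return __builtin_sub_overflow (0, b, &t);
}

/* With a signed result type, 0 - b only fails to fit when b is the
   most negative value.  */
static bool neg_overflows_signed (long long b)
{
  long long t;
  return __builtin_sub_overflow (0, b, &t);
}
```

The builtin computes the difference as if in infinite precision and then checks whether it fits the type that the third argument points to.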

[Bug middle-end/114532] gcc -fno-common option causes performance degradation on certain architectures

2024-06-05 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #11 from David Brown  ---
(In reply to Zhaohaifeng from comment #8)
> (In reply to David Brown from comment #7)
> > (In reply to Xi Ruoyao from comment #6)

> > Anyway, I cannot see any reason while -fno-common should result in the
> > slower run-times the OP saw (though I have only looked at current gcc
> > versions).  I haven't seen any differences in the code generated for
> > -fcommon and -fno-common on the x86-64.  And my experience on other targets
> > is that -fcommon allows optimisations that cannot be done with -fno-common,
> > thus giving faster code.
> > 
> > I have not, however, seen the OP's real code - I've just made small tests.
> 
> The difference generated for -fcommon and -fno-common is just the global
> variable order in memory address.
> 
> -fcommon is as following (some special order):
> stderr@GLIBC_2.2.5
> completed.0
> Begin_Time
...
> -fno-common is as following (reversed order of source code):
> stderr@GLIBC_2.2.5
> completed.0
> Dhrystones_Per_Second
> Microseconds
> User_Time
...

A change in the order is not unexpected.  But it is hard to believe this will
make a significant difference to the speed of the code as much as you describe
- it would have to involve particularly unlucky cache issues.

On the x86-64, defined variables appear to be allocated in the reverse order
from the source code unless there are overriding reasons to change that.  I
don't know why that is the case.  You can avoid this by using the
"-fno-toplevel-reorder" switch.  I don't know how common variables are
allocated - that may depend on ordering in the code, or linker scripts, or
declarations in headers.

I have no idea about your program, but one situation where the details of
memory  layout can have a big effect is if you have multiple threads, and
nominally independent data used by multiple threads happen to share a cache
line.  Access patterns to arrays and structs can also have different effects
depending on the alignment of the data to cache lines.

So you might try "-fno-toplevel-reorder" to have tighter control of the
ordering.  It may also be worth adding cacheline-sized _Alignas specifiers to
some objects, particularly bigger or critical structs or arrays.  (If you are
using a C standard prior to C11, gcc's __attribute__((aligned(XXX))) can be
used.)
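
A minimal sketch of the alignment suggestion, assuming a 64-byte cache line (the real size depends on the target) and using hypothetical object names:

```c
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size, target dependent */

struct counters { long hits; long misses; };

/* Each object starts on its own cache line, so data used by different
   threads does not end up sharing a line by accident.  */
static _Alignas (CACHE_LINE) struct counters thread_a_counters;
static _Alignas (CACHE_LINE) struct counters thread_b_counters;
static _Alignas (CACHE_LINE) int big_table[1024];

/* Returns nonzero if p is aligned to the assumed cache-line size.  */
static int is_cache_aligned (const void *p)
{
  return ((uintptr_t) p % CACHE_LINE) == 0;
}
```

Over-aligning every small global wastes memory, so this is probably best reserved for the larger or more contended objects.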

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #9 from Tamar Christina  ---
It's taken me a bit of time to track down all the reasons for the speedup with
the earlier patch.

This comes from two parts:

1. Signed IVs don't get simplified.  Due to possible UB with signed overflows
gimple expressions don't get simplified when the type is signed.

However, for addressing modes it doesn't matter, since after simplifying the
constants any potential overflow can still happen.  Secondly, most
architectures say you can never reach the full address space range anyway.
Those that do (like those that offer baremetal variants, such as Arm and
AArch64) explicitly specify that overflow is defined as wrapping around.  That
means that IVs, for their use in IV opts, should be safe to simplify as if they
were unsigned.

I have a patch that during the creation of IV candidates folds them to unsigned
and then folds them back to their original signed types.  This maintains all
the original overflow analysis and the correct typing in gimple.

2. The second problem is that due to Fortran not having unsigned types, the
front-end generates a signed IV.  Some optimizations as they work can convert
these to unsigned due to folding, e.g. extract_muldiv is one place where this
is done.

This can make us end up having the same IV as both signed and unsigned, as is
the case here:

:
  inv_expr 1: stride.3_27 * 4
  inv_expr 2: (unsigned long) stride.3_27 * 4

These end up being used in the same group:

Group 1:
  cand  cost  compl.  inv.expr.  inv.vars
  1     0     0       NIL;       6
  2     0     0       NIL;       6
  3     0     0       NIL;       6
  4     0     0       NIL;       6

which ends up with IV opts picking the signed and unsigned IVs:

Improved to:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

and so generates the same IV as both signed and unsigned:

;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
  # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
  # ivtmp.26_51 = PHI 
  # ivtmp.28_90 = PHI 

...

;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
;;    prev block 22, 
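
To illustrate point 1 above at the C level (this is only an analogy, not the IVopts implementation or the patch itself): an offset written in signed arithmetic cannot be freely reassociated because of possible overflow, whereas the same computation in unsigned arithmetic wraps and can be simplified, then converted back. The function names and the particular expression are assumptions made for this sketch.

```c
#include <stdint.h>

/* Byte offset as it might be written with a signed IV.  */
static int64_t offset_signed (int64_t i, int64_t stride)
{
  return (i + 1) * stride * 4 - stride * 4;
}

/* The same offset simplified to i * stride * 4 in unsigned arithmetic,
   where wraparound is well defined, then converted back to signed.
   For in-range non-negative results the two functions agree.  */
static int64_t offset_via_unsigned (int64_t i, int64_t stride)
{
  uint64_t u = (uint64_t) i * (uint64_t) stride * 4u;
  return (int64_t) u;
}
```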

[Bug fortran/90068] Array Constructor Containing Function Call Leaks Memory

2024-06-05 Thread vehre at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90068

--- Comment #2 from Andre Vehreschild  ---
Created attachment 58354
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58354&action=edit
Add final blocks to free temp. memory.

[Bug fortran/90068] Array Constructor Containing Function Call Leaks Memory

2024-06-05 Thread vehre at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90068

Andre Vehreschild  changed:

   What|Removed |Added

 Status|ASSIGNED|WAITING

--- Comment #3 from Andre Vehreschild  ---
Patch submitted, waiting for review.

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #10 from Richard Biener  ---
I think the question is why IVOPTs ends up using both the signed and unsigned
variant of the same IV instead of expressing all uses of both with one IV?

That's where I'd look into.

[Bug target/115355] New: PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread jens.seifert at de dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Bug ID: 115355
   Summary: PPCLE: Auto-vectorization creates wrong code for
Power9
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jens.seifert at de dot ibm.com
  Target Milestone: ---

Input setToIdentity.C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void setToIdentityGOOD(unsigned long long *mVec, unsigned int mLen)
{
  for (unsigned long long i = 0; i < mLen; i++)
  {
    mVec[i] = i;
  }
}

void setToIdentityBAD(unsigned long long *mVec, unsigned int mLen)
{
  for (unsigned int i = 0; i < mLen; i++)
  {
    mVec[i] = i;
  }
}

unsigned long long vec1[100];
unsigned long long vec2[100];

int main(int argc, char *argv[])
{
  unsigned int l = argc > 1 ? atoi(argv[1]) : 29;
  setToIdentityGOOD(vec1, l);
  setToIdentityBAD(vec2, l);

  if (memcmp(vec1, vec2, l*sizeof(vec1[0])) != 0)
  {
    for (unsigned int i = 0; i < l; i++)
    {
      printf("%llu %llu\n", vec1[i], vec2[i]);
    }
  }
  else
  {
    printf("match\n");
  }
  return 0;
}


Fails
gcc -O3 -mcpu=power9 -m64 setToIdentity.C -save-temps -fverbose-asm -o pwr9.exe
-mno-isel


Good:
gcc -O3 -mcpu=power8 -m64 setToIdentity.C -save-temps -fverbose-asm -o pwr8.exe
-mno-isel

"-mno-isel" is only specified to reduce the diff.


Failing output:

pwr9.exe
0 0
1 1
2 0
3 4294967296
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20
21 21
22 22
23 23
24 24
25 25
26 26
27 27
28 28

4th element contains wrong data.

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-05 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #4 from YunQiang Su  ---
Ohh, RISC-V has solved this problem in recent release.
So we can just do similar work.

[Bug ada/115349] compiler infers the wrong Accum_Type for a Reducer expression

2024-06-05 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115349

Eric Botcazou  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-05
 CC||ebotcazou at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Eric Botcazou  ---
What crash do you get though?  AFAICS it's a standard Constraint_Error.

gfortran-14.1.1: issue with -Jpath search order

2024-06-05 Thread Satish Balay via Gcc-bugs
A test case:

>
$ ls
incdir/  moddir/  srcdir/
$ ls incdir/
$ ls moddir/
$ ls srcdir/
modtest.F90
$ cat srcdir/modtest.F90 
module modtest
integer a
end module
program main
use modtest
end
$ gfortran --version
GNU Fortran (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ strace --follow-forks -o trace.log gfortran -Jmoddir -Iincdir srcdir/modtest.F90
$ 


From trace.log:


231249 openat(AT_FDCWD, "moddir/modtest.mod", O_RDONLY) = -1 ENOENT (No such file or directory)
231249 unlink("moddir/modtest.mod") = -1 ENOENT (No such file or directory)
231249 rename("moddir/modtest.mod0", "moddir/modtest.mod") = 0
231249 openat(AT_FDCWD, "modtest.mod", O_RDONLY) = -1 ENOENT (No such file or directory)
231249 openat(AT_FDCWD, "srcdir/modtest.mod", O_RDONLY) = -1 ENOENT (No such file or directory)
231249 openat(AT_FDCWD, "incdir/modtest.mod", O_RDONLY) = -1 ENOENT (No such file or directory)
231249 openat(AT_FDCWD, "moddir/modtest.mod", O_RDONLY) = 5


i.e. after moddir/modtest.mod is created, it's searched for in the following
order:

- pwd
- src-file-dir
- -Ipath
- -Jpath

With this search order - a buggy/old/incorrect modtest.mod in pwd or 
src-file-dir gets picked up - resulting in broken builds.

Checking ifx from OneAPI, I see:


$ ifx --version
ifx (IFORT) 2023.0.0 20221201
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.

$ ifx --help |& grep -A2 \\-module
-module path
   specify path where mod files should be placed and first location to
   look for mod files

With ifx, I don't run into the problem of picking up a buggy/old/incorrect 
modtest.mod. With a slightly tweaked test:


$ cat srcdir/modtest.F90
module modtest
integer a
end module
program main
use modtestwrong
end
$ strace --follow-forks -o trace.log ifx -module moddir -Iincdir 
srcdir/modtest.F90
srcdir/modtest.F90(5): error #7002: Error in opening the compiled module file.  
Check INCLUDE paths.   [MODTESTWRONG]
use modtestwrong
^
compilation aborted for srcdir/modtest.F90 (code 1)
$ 
<<<

trace.log has:

>>>
1573952 openat(AT_FDCWD, "moddir/modtestwrong.mod", O_RDONLY) = -1 ENOENT (No 
such file or directory)
1573952 openat(AT_FDCWD, "modtestwrong.mod", O_RDONLY) = -1 ENOENT (No such 
file or directory)
1573952 openat(AT_FDCWD, "srcdir/modtestwrong.mod", O_RDONLY) = -1 ENOENT (No 
such file or directory)
1573952 openat(AT_FDCWD, "./modtestwrong.mod", O_RDONLY) = -1 ENOENT (No such 
file or directory)
1573952 openat(AT_FDCWD, "incdir/modtestwrong.mod", O_RDONLY) = -1 ENOENT (No 
such file or directory)
...

[i.e., the -module path (the -Jpath equivalent) is searched first, then pwd, src-file-dir, ./, -Ipath, ...]
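
The difference between the two search orders can be sketched as follows. This is only an illustration, not either compiler's implementation: `find_module`, `gfortran_order` and `ifx_order` are hypothetical names, the orders are the ones read off the two strace logs above, and a `std::set` of paths stands in for the file system.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: return the first "<dir>/<file>" that exists, trying the
// directories in the given order. `existing` stands in for the file system.
std::optional<std::string> find_module(const std::string& file,
                                       const std::vector<std::string>& search_order,
                                       const std::set<std::string>& existing) {
    assert(!file.empty());  // a module file name is required
    for (const std::string& dir : search_order) {
        const std::string candidate = dir + "/" + file;
        if (existing.count(candidate) != 0) {
            return candidate;  // first hit wins, later directories are ignored
        }
    }
    return std::nullopt;
}

// Orders as observed in the traces (an assumption, not documented behaviour).
const std::vector<std::string> gfortran_order = {".", "srcdir", "incdir", "moddir"};
const std::vector<std::string> ifx_order      = {"moddir", ".", "srcdir", "incdir"};
```

With a stale `./modtest.mod` and a fresh `moddir/modtest.mod` both present, the first order returns the stale file and the second returns the freshly written one, which is the situation described above.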

So is this a bug in gfortran?

thanks,
Satish



[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread jens.seifert at de dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #1 from Jens Seifert  ---
Same issue with gcc 13.2.1

[Bug libstdc++/98678] 30_threads/future/members/poll.cc execution test FAILs

2024-06-05 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98678

--- Comment #8 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #1 from Jonathan Wakely  ---

> This test is a bit tricky. The whole point is to check that performance of one
> operation is acceptable compared to a baseline. But the definition of
> "acceptable" and the relative difference between the speed of the different
> operations varies with arch. We could just increase the tolerances, but then 
> we
> allow worse performance on the targets that don't need it. Maybe we want to
> change the 30 and 100 magic numbers to depend on the target.

I've made some more checks on Solaris now.  The test consistently PASSes
on Solaris/SPARC, both 32 and 64-bit.  However, on Solaris/x86 the
failure is just as reliable, e.g.

/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/30_threads/future/members/poll.cc:132:
int main(): Assertion 'wait_until_sys_min < (ready * 100)' failed.
wait_for(0s): 3674ns for 200 calls, avg 18.37ns per call
wait_until(system_clock minimum): 419918ns for 200 calls, avg 2099.59ns per
call
wait_until(steady_clock minimum): 459775ns for 200 calls, avg 2298.88ns per
call
wait_until(system_clock epoch): 1117280ns for 200 calls, avg 5586.4ns per call
wait_until(steady_clock epoch: 956073ns for 200 calls, avg 4780.36ns per call
wait_for when ready: 3194ns for 200 calls, avg 15.97ns per call

It also makes no difference if the system is under full load or
completely idle.  I've also checked a wider range of systems/CPUs:

  host          32-bit  64-bit  CPU

  nahe          1.31    1.40    2.60 GHz Xeon Gold 6132
  lokon         1.43    1.66    3.10 GHz Core i5-2400
  itzacchiuatl  0.69    1.53    3.20 GHz Core i7-8700
  manam         0.89    2.22    3.50 GHz Xeon E3-1245
  lucy          0.54    0.59    2.00 GHz Xeon E7-4850

The attached patch uses a scale factor of 2.5 to accommodate this.
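
As a rough check of what such a scale factor does to the failing assertion, here is a minimal sketch. It is not the attached patch: `within_tolerance` is a hypothetical helper, and it assumes that `ready` in the assertion corresponds to the "wait_for when ready" total in the log above.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the shape of the failing assertion
// 'wait_until_sys_min < (ready * 100)', with an extra per-target scale factor.
// Both times are total nanoseconds for the same number of calls.
constexpr bool within_tolerance(std::int64_t measured_ns, std::int64_t baseline_ns,
                                double scale) {
    return static_cast<double>(measured_ns)
         < static_cast<double>(baseline_ns) * 100.0 * scale;
}

// Totals taken from the Solaris/x86 log above.
constexpr std::int64_t wait_until_sys_min_ns = 419918;
constexpr std::int64_t ready_ns = 3194;

// 3194 * 100 = 319400, which is below 419918, so the unscaled check fails.
static_assert(!within_tolerance(wait_until_sys_min_ns, ready_ns, 1.0), "fails unscaled");
// 319400 * 2.5 = 798500, which is above 419918, so the scaled check passes.
static_assert(within_tolerance(wait_until_sys_min_ns, ready_ns, 2.5), "passes with 2.5");
```

For this log the measured time is about 1.31 times the unscaled limit, so any factor comfortably above that would let the run pass.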

[Bug libstdc++/98678] 30_threads/future/members/poll.cc execution test FAILs

2024-06-05 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98678

--- Comment #9 from Rainer Orth  ---
Created attachment 58355
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58355&action=edit
Proposed patch

[Bug target/113357] [14/15 regression] m68k-linux bootstrap failure in stage2 due to segfault compiling unwind-dw2.c since r14-4664-g04c9cf5c786b94

2024-06-05 Thread mikpelinux at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113357

--- Comment #13 from Mikael Pettersson  ---
(In reply to Mikael Pettersson from comment #9)
> (In reply to Manolis Tsamis from comment #8)
> > Created attachment 58335 [details]
> > Do not modify live_out registers
> > 
> > After looking again at the dumps from PR112415, which I believe is closely
> > related to the issue here, I saw that while f-m-o was checking the uses of
> > the folded registers, it was still modifying registers that were at the BB's
> > live_out set.
> > 
> > I have attached a patch that I'm testing for addressing this. Could you
> > please check if this fixes this issue?
> 
> I'm starting a bootstrap on m68k-linux-gnu with this now, should know by
> Friday if it resolves the issue or not. (It's running in full-system
> emulation mode on Aranym, so it's slow.)

Failed with a compile-warning-turned-error, haven't had time to investigate.

In file included from /mnt/scratch/gcc-15-20240602/gcc/system.h:726,
 from /mnt/scratch/gcc-15-20240602/gcc/gimple-range-edge.cc:24:
In member function 'wide_int_storage& wide_int_storage::operator=(const T&)
[with T = wi::hwi_with_prec]',
inlined from 'generic_wide_int&
generic_wide_int::operator=(const T&) [with T = wi::hwi_with_prec; storage =
wide_int_storage]' at /mnt/scratch/gcc-15-20240602/gcc/wide-int.h:1002:23,
inlined from 'void irange_bitmask::set_unknown(unsigned int)' at
/mnt/scratch/gcc-15-20240602/gcc/value-range.h:165:27,
inlined from 'virtual void irange::set_varying(tree)' at
/mnt/scratch/gcc-15-20240602/gcc/value-range.h:1140:25,
inlined from 'int_range::int_range(tree) [with unsigned int N
= 3; bool RESIZABLE = true]' at
/mnt/scratch/gcc-15-20240602/gcc/value-range.h:1102:15,
inlined from 'void gimple_outgoing_range::calc_switch_ranges(gswitch*)' at
/mnt/scratch/gcc-15-20240602/gcc/gimple-range-edge.cc:140:36:
/mnt/scratch/gcc-15-20240602/gcc/wide-int.h:1241:23: error:
'default_range.int_range<3,
true>::.irange::m_bitmask.irange_bitmask::m_value.generic_wide_int::.wide_int_storage::u.wide_int_storagevalp' may be used uninitialized [-Werror=maybe-uninitialized]
 1241 | XDELETEVEC (u.valp);
/mnt/scratch/gcc-15-20240602/gcc/../include/libiberty.h:370:48: note: in
definition of macro 'XDELETEVEC'
  370 | #define XDELETEVEC(P)   free ((void*) (P))
  |^
/mnt/scratch/gcc-15-20240602/gcc/gimple-range-edge.cc: In member function 'void
gimple_outgoing_range::calc_switch_ranges(gswitch*)':
/mnt/scratch/gcc-15-20240602/gcc/gimple-range-edge.cc:140:17: note:
'default_range' declared here
  140 |   int_range_max default_range (type);
  | ^
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1197: gimple-range-edge.o] Error 1

[Bug target/115353] [14/15 regression] Missed thumb2 table branch instruction optimisations since r14-4946-g7006e5d2d7b5b2

2024-06-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115353

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-05
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Richard Earnshaw  ---
Confirmed.

[Bug c++/115356] New: not a constant expression can be used as non-type template argument inside requires expression

2024-06-05 Thread fchelnokov at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115356

Bug ID: 115356
   Summary: not a constant expression can be used as non-type
template argument inside requires expression
   Product: gcc
   Version: 14.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fchelnokov at gmail dot com
  Target Milestone: ---

This program

```
template <int>
struct A {};

constexpr bool foo(int v) {
    int && x = int{v};
    return requires {
        typename A<x>;
        typename A<x>::T;
    };
}

static_assert( foo(1) );
```

is rejected in Clang and MSVC, because `x` is not initialized by a constant
expression and because there is no type `A<1>::T`, but GCC accepts the program
just fine. Online demo: https://gcc.godbolt.org/z/o9oEee5ve

[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Richard Biener  changed:

   What|Removed |Added

 Target||powerpc64le
   Keywords||wrong-code

--- Comment #2 from Richard Biener  ---
wild guess - store-with-len with bogus initial len/bias value?

[Bug c++/115357] New: template argument deduction/substitution failed on lambda function

2024-06-05 Thread dongkyun.s at samsung dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115357

Bug ID: 115357
   Summary: template argument deduction/substitution failed on
lambda function
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dongkyun.s at samsung dot com
  Target Milestone: ---

Hello

I noticed that compilation of the example code at https://godbolt.org/z/8s78YPWs3
fails starting with GCC 13.

<source>: In lambda function:
<source>:10:12: error: no matching function for call to 'foo(const char [])'
   10 | foo(STR);
      | ~~~^
<source>:2:6: note: candidate: 'template void foo(const char (&)[N])'
    2 | void foo(const char (&data)[N]) {}
      |  ^~~
<source>:2:6: note:   template argument deduction/substitution failed:
<source>:10:12: note:   mismatched types 'const char [N]' and 'const char []'
   10 | foo(STR);
      | ~~~^
ASM generation compiler returned: 1

Any idea what causes this failure?
Thank you in advance!

[Bug c++/115358] New: template argument deduction/substitution failed on lambda function

2024-06-05 Thread dongkyun.s at samsung dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115358

Bug ID: 115358
   Summary: template argument deduction/substitution failed on
lambda function
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dongkyun.s at samsung dot com
  Target Milestone: ---

Hello

I noticed that compilation of the example code at https://godbolt.org/z/8s78YPWs3
fails starting with GCC 13.

<source>: In lambda function:
<source>:10:12: error: no matching function for call to 'foo(const char [])'
   10 | foo(STR);
      | ~~~^
<source>:2:6: note: candidate: 'template void foo(const char (&)[N])'
    2 | void foo(const char (&data)[N]) {}
      |  ^~~
<source>:2:6: note:   template argument deduction/substitution failed:
<source>:10:12: note:   mismatched types 'const char [N]' and 'const char []'
   10 | foo(STR);
      | ~~~^
ASM generation compiler returned: 1

Any idea what causes this failure?
Thank you in advance!

[Bug c++/115358] template argument deduction/substitution failed on lambda function

2024-06-05 Thread dongkyun.s at samsung dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115358

--- Comment #1 from dongkyun.s at samsung dot com ---
Created attachment 58356
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58356&action=edit
example code

example code

[Bug c++/115358] template argument deduction/substitution failed on lambda function

2024-06-05 Thread dongkyun.s at samsung dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115358

--- Comment #2 from dongkyun.s at samsung dot com ---
This might be related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106649,
which was applied in GCC 13; the latest clang can build this example, though.

[Bug go/87589] [11/12/13/14/15 regression] index0-out.go FAILs

2024-06-05 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87589

--- Comment #11 from Rainer Orth  ---
Created attachment 58357
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58357&action=edit
Proposed patch

Wouldn't the attached patch be TRT then?

Btw., ISTM that this should be unsupported instead of untested, according to
the DejaGnu docs.

[Bug c++/115358] template argument deduction/substitution failed on lambda function

2024-06-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115358

--- Comment #3 from Jonathan Wakely  ---
*** Bug 115357 has been marked as a duplicate of this bug. ***

[Bug c++/115357] template argument deduction/substitution failed on lambda function

2024-06-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115357

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Jonathan Wakely  ---
reported twice

*** This bug has been marked as a duplicate of bug 115358 ***

[Bug c++/110137] implement clang -fassume-sane-operator-new

2024-06-05 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137

--- Comment #15 from Jan Hubicka  ---
> As pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035#c13 , 
> gcc
> already assume operator new's retuned pointer cannot alias any existing
> pointer. So no change is needed there.
Seems you are right, -fsane-operator-new is weaker than I assumed.
In addition to the assumption about modifying globals, one missed
optimization is removal of unused vectors (we have separate PR for that)

#include <vector>
void foo (int *);
int test()
{
std::vector<int> a(1);
return 0;
}
int test2(int size)
{
int *a = new int;
*a=0;
delete a;
return 0;
}
int test3(int size)
{
int *a = (int *)::operator new (sizeof (int));
*a=0;
::operator delete (a);
return 0;
}
At -O2 Clang optimizes test and test2 to "return 0", while we optimize only
test2.  libstdc++ uses clang's __builtin_operator_new that enables this
optimization based on claim that user's possibly insane new/delete can
not rely on internals of std::vector implementation and thus can not
reliably use memory being deleted.

I expected that -fassume-sane-operator-new should enable clang or GCC to
do the optimization for all three variants, but it does not (for clang).
I think it would be useful to have such a flag. There are a number of
transformations we may want to do, including
 - removal of dead stores to memory being deleted
 - removal of unnecessary reallocations, i.e. when the user creates an empty
   vector and does a sequence of push_backs.
 - avoiding new/delete for small vectors that fit on the stack.
I believe those are sane for builtin_operator_new/delete. However since
clang does not define it this way, perhaps we will need
-fassume-really-sane-operator-new-delete for that?

It would also be nice to be able to determine functions that do paired
new/delete but have no other side effects as pure/const etc.

BTW clang also does not optimize out load from global here:
int test3(int size)
{
global = 0;
int *a = (int *)::operator new (sizeof (int));
*a=0;
return global;
}
I would expect -fassume-sane-operator-new to make it possible?

[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Peter Bergner  changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org,
   ||dje at gcc dot gnu.org,
   ||linkw at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org

--- Comment #3 from Peter Bergner  ---
I'll find someone to look into this.  Thanks for the test case!

[Bug c++/115358] [13/14/15 Regression] template argument deduction/substitution failed in generic lambda function

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115358

Andrew Pinski  changed:

   What|Removed |Added

Summary|template argument   |[13/14/15 Regression]
   |deduction/substitution  |template argument
   |failed in generic lambda|deduction/substitution
   |function|failed in generic lambda
   ||function
   Last reconfirmed||2024-06-05
   Keywords||needs-bisection,
   ||rejects-valid
 Ever confirmed|0   |1
   Target Milestone|--- |13.4
 Status|UNCONFIRMED |NEW

--- Comment #4 from Andrew Pinski  ---
Confirmed.

[Bug c++/115358] [13/14/15 Regression] template argument deduction/substitution failed in generic lambda function

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115358

--- Comment #5 from Andrew Pinski  ---
Note this is not related to NSDMI nor related to use of STR in a non-complete
type context as shown by:
```
template <int N>
void foo(const int (&data)[N]) {}

template <typename T>
struct Bar
{
    static constexpr int STR[] = {1,2,3};
    Bar();
};

template <typename T>
Bar<T>::Bar()
{
    [](auto){
        foo(STR);
    };
}

int main()
{
    Bar<int>{};
}
```
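
For contrast, here is a minimal sketch of the deduction the testcase relies on, outside any class template or generic lambda. `bound_of` and `Plain` are hypothetical names introduced only for this illustration. The array is declared without a bound, but its initializer completes the type, so `N` can be deduced:

```cpp
#include <cstddef>

// Deduces the array bound N from a reference to an array of known size.
template <std::size_t N>
constexpr std::size_t bound_of(const int (&)[N]) { return N; }

struct Plain {
    // The bound is omitted in the declaration but fixed to 3 by the initializer.
    static constexpr int STR[] = {1, 2, 3};
};

// In this non-template context the type of STR is 'const int [3]'.
static_assert(bound_of(Plain::STR) == 3, "bound deduced from the initializer");
```

The 'const int []' in the diagnostic suggests that, in the failing case, the array type is still seen as incomplete at the point of the call.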

[Bug target/115342] [14/15 Regression] AArch64: Function multiversioning initialization incorrect

2024-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115342

--- Comment #1 from GCC Commits  ---
The master branch has been updated by Wilco Dijkstra :

https://gcc.gnu.org/g:d7cbcfe7c33645eaf95f175f19884d443817857b

commit r15-1036-gd7cbcfe7c33645eaf95f175f19884d443817857b
Author: Wilco Dijkstra 
Date:   Wed Jun 5 14:04:33 2024 +0100

AArch64: Fix cpu features initialization [PR115342]

The CPU features initialization code uses CPUID registers (rather than
HWCAP).  The equality comparisons it uses are incorrect: for example
FEAT_SVE
is not set if SVE2 is available.  Using HWCAPs for these is both simpler
and
correct.  The initialization must also be done atomically to avoid multiple
threads causing corruption due to non-atomic RMW accesses to the global.

libgcc:
PR target/115342
* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
Use HWCAP where possible.  Use atomic write for initialization.
Fix FEAT_PREDRES comparison.
(__init_cpu_features_resolver): Use atomic load for correct
initialization.
(__init_cpu_features): Likewise.
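
The two points in the commit message (equality comparisons that miss newer features, and non-atomic read-modify-write of a global) can be illustrated with a small sketch. This is not the libgcc code: every name below is hypothetical, `sve_level` is a stand-in for whatever the platform reports, and the actual commit switches to HWCAP rather than changing the comparison.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; bit 0 marks "initialization done".
constexpr std::uint64_t kInitialized = std::uint64_t{1} << 0;
constexpr std::uint64_t kFeatSve     = std::uint64_t{1} << 1;
constexpr std::uint64_t kFeatSve2    = std::uint64_t{1} << 2;

std::atomic<std::uint64_t> g_features{0};

// Build the mask in a local variable, then publish it with one atomic store,
// so no thread can observe a partially updated value.
// sve_level is an assumed encoding: 0 = none, 1 = SVE, 2 = SVE2.
void init_features(unsigned sve_level) {
    assert(sve_level <= 2);
    if (g_features.load(std::memory_order_relaxed) & kInitialized) {
        return;  // already published
    }
    std::uint64_t mask = kInitialized;
    if (sve_level >= 1) mask |= kFeatSve;   // ">=" rather than "==": SVE2 implies SVE
    if (sve_level >= 2) mask |= kFeatSve2;
    g_features.store(mask, std::memory_order_relaxed);
}

// Readers use an atomic load and test the bit they care about.
bool has_feature(std::uint64_t bit) {
    return (g_features.load(std::memory_order_relaxed) & bit) != 0;
}
```

If two threads race through `init_features` with the same input, both compute and store the same mask, so the race is benign in this sketch.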

[Bug target/115083] undefined reference for aarch64-w64-mingw32 target

2024-06-05 Thread Evgeny.Karpov at microsoft dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115083

Evgeny Karpov  changed:

   What|Removed |Added

 CC||Evgeny.Karpov at microsoft dot 
com

--- Comment #7 from Evgeny Karpov  ---
Thank you for using the new aarch64-w64-mingw32 target. It seems that C++ is
being used in the configuration. This is not currently supported, but minimal
C++ support without SEH is coming soon, most likely with patch series 3 or 4 in
June or July. We are currently upstreaming patch series 2 and preparing patch
series 3. Please exclude C++ from configuration.

[Bug go/87589] [11/12/13/14/15 regression] index0-out.go FAILs

2024-06-05 Thread ian at airs dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87589

--- Comment #12 from Ian Lance Taylor  ---
Sure, we can do that patch for now.  Thanks.  unsupported is fine too.

Let's not close the bug, though.  The real fix is to not put very large objects
on the stack--we don't want to do that for split-stack either.  There should be
some size cut off where we push objects onto the heap.  But we don't have to do
that today.

[Bug target/108678] Windows on ARM64 platform target aarch64-w64-mingw32

2024-06-05 Thread Evgeny.Karpov at microsoft dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108678

Evgeny Karpov  changed:

   What|Removed |Added

 CC||Evgeny.Karpov at microsoft dot 
com

--- Comment #15 from Evgeny Karpov  ---
It looks like the same issue which was described here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115083
Please specify only C language in the configuration for now.

[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-06-05
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org

--- Comment #4 from Kewen Lin  ---
Thanks for reporting, I'll have a look first.

[Bug lto/115359] New: ICE in warn_types_mismatch: lto1: internal compiler error: Segmentation fault

2024-06-05 Thread a.horodniceanu at proton dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115359

Bug ID: 115359
   Summary: ICE in warn_types_mismatch: lto1: internal compiler
error: Segmentation fault
   Product: gcc
   Version: 14.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: a.horodniceanu at proton dot me
  Target Milestone: ---

Given the two source files:

a.cpp:
-
struct Foo { };
void bar(Foo foo);
int main() {
bar({});
}


a.d:

extern(C++):
struct Foo { }
void bar(Foo foo) { }


Compile them with: `g++ a.d a.cpp -flto -freport-bug`:

a.cpp:2:6: warning: ‘bar’ violates the C++ One Definition Rule [-Wodr]
2 | void bar(Foo foo);
  |  ^
a.d:3:6: note: type mismatch in parameter 1
3 | void bar(Foo foo) { }
  |  ^
lto1: internal compiler error: Segmentation fault
0x5654dba30e94 internal_error(char const*, ...)
???:0
0x5654dbaccbd1 xstrdup
???:0
0x5654da59f465 warn_types_mismatch(tree_node*, tree_node*, unsigned int,
unsigned int)
???:0
0x5654da30d1e1 lto_symtab_merge_decls()
???:0
0x5654da3154a2 read_cgraph_and_symbols(unsigned int, char const**)
???:0
0x5654da2fdce6 lto_main()
???:0
Please submit a full bug report, with preprocessed source.
Please include the complete backtrace with any bug report.
See  for instructions.
lto-wrapper: fatal error: g++ returned 1 exit status
compilation terminated.
/usr/lib/gcc/x86_64-pc-linux-gnu/14/../../../../x86_64-pc-linux-gnu/bin/ld:
error: lto-wrapper failed
collect2: error: ld returned 1 exit status


g++ --version:

g++ (Gentoo Hardened 14.1.1_p20240518 p1) 14.1.1 20240516
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


I think this can be fixed with:

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index a7ce434bf..efeb3766c 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -1084,7 +1084,9 @@ warn_types_mismatch (tree t1, tree t2, location_t loc1,
location_t loc2)
   if (odr1 != NULL && odr2 != NULL && odr1 != odr2)
 {
   const int opts = DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES;
-  char *name1 = xstrdup (cplus_demangle (odr1, opts));
+  char *name1 = cplus_demangle (odr1, opts);
+  if (name1)
+ name1 = xstrdup(name1);
   char *name2 = cplus_demangle (odr2, opts);
   if (name1 && name2 && strcmp (name1, name2))
{

but I'm not sure if cplus_demangle failing to demangle the D symbol is the real
problem.
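
The pattern the proposed patch applies, checking for a null result before duplicating it, can be sketched in isolation. `dup_or_null` is a hypothetical stand-in written for this illustration. It is not libiberty's xstrdup, which (judging by the backtrace) does not appear to tolerate a null argument.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical null-tolerant duplicate: returns a malloc'ed copy of `s`,
// or nullptr if `s` is null or the allocation fails.
char* dup_or_null(const char* s) {
    if (s == nullptr) {
        return nullptr;  // nothing to copy, e.g. when demangling failed
    }
    const std::size_t len = std::strlen(s) + 1;  // include the terminating NUL
    char* copy = static_cast<char*>(std::malloc(len));
    if (copy != nullptr) {
        std::memcpy(copy, s, len);
    }
    return copy;
}
```

The caller still has to handle the null case, as the existing `if (name1 && name2 && ...)` test in the patched function already does.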

[Bug testsuite/111658] test-function-bodies fails to find functions with single-letter names

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111658

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
Fixed by r15-1035-gacdc9df371fbe99e814a3f35a439531e08af79e7
(https://gcc.gnu.org/pipermail/gcc-cvs/2024-June/403789.html).

[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #5 from Peter Bergner  ---
FYI, fails for me with gcc 12 and later and works with gcc 11.  It also fails
with -O3 -mcpu=power10.

[Bug target/115360] New: cmse_nonsecure_call wrapper missing STT_FUNCTION

2024-06-05 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115360

Bug ID: 115360
   Summary: cmse_nonsecure_call wrapper missing STT_FUNCTION
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: avieira at gcc dot gnu.org
  Target Milestone: ---

The Arm ABI requires a linker to handle calls to 'distant' functions by
inserting a wrapper veneer, or trampoline.  Such functions need to be given
permission to do this by marking them as type STT_FUNC (so that it is clear
that there is a scratch register available for use within the veneer). 
Unfortunately, __gnu_cmse_nonsecure_call is not marked at all (defaulting to
STT_NOTYPE).  A separate bug in GNU ld means this problem is not diagnosed at
link time, and the linker silently picks the wrong veneer type into the
bargain, leading to run-time crashes when the CPU is asked to switch into Arm
state on a thumb-only processor.

[Bug target/115360] cmse_nonsecure_call wrapper on arm missing STT_FUNCTION

2024-06-05 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115360

Richard Earnshaw  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-05
Summary|cmse_nonsecure_call wrapper |cmse_nonsecure_call wrapper
   |missing STT_FUNCTION|on arm missing STT_FUNCTION

--- Comment #1 from Richard Earnshaw  ---
Confirmed by observation.

[Bug c++/115356] a reference to a non-constant integer expression can be used as non-type template argument inside requires expression

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115356

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-05
 Ever confirmed|0   |1
Summary|not a constant expression   |a reference to a
   |can be used as non-type |non-constant integer
   |template argument inside|expression can be used as
   |requires expression |non-type template argument
   ||inside requires expression
   Keywords||accepts-invalid
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
A const reference has the same issue as rvalue reference.

Confirmed.

[Bug c++/115361] New: "possibly dangling reference to a temporary" when object is_empty

2024-06-05 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115361

Bug ID: 115361
   Summary: "possibly dangling reference to a temporary" when
object is_empty
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: arthur.j.odwyer at gmail dot com
  Target Milestone: ---

Bug #115229 is related.

// https://godbolt.org/z/os63oEEax

struct GetKey {
  int& rn(  int& x) const { return  x; }
const int& rc(const int& x) const { return  x; }
  int* pn(  int& x) const { return &x; }
const int* pc(const int& x) const { return &x; }

static   int& srn(  int& x) { return  x; }
static const int& src(const int& x) { return  x; }
};

int f0(int  t) { const int& x = GetKey().rn(t); return x; } // False positive
int f1(int  t) {   int& x = GetKey().rn(t); return x; } // OK
int f2(int  t) { const int& x = GetKey().rc(t); return x; } // False positive
int f3(char c) { const int& x = GetKey().rc(c); return x; } // True(*) positive
int f4(int  t) { const int* x = GetKey().pn(t); return *x; } // OK (compare f0)
int f5(int  t) {   int* x = GetKey().pn(t); return *x; } // OK
int f6(int  t) { const int* x = GetKey().pc(t); return *x; } // OK (compare f2)
int f7(char c) { const int* x = GetKey().pc(c); return *x; } // False negative (compare f3)
int f8(int  t) { const int& x = GetKey().srn(t); return x; } // OK (compare f0)
int f9(char c) { const int& x = GetKey().src(c); return x; } // True(*) positive (compare f3)

// f9 in 14.1 is a true(*) positive; it had been a false negative in 13.1.
// f7 in 14.1 gives -Wuninitialized on the use of `x`, but fails to give -Wdangling-reference!


GCC 14.1 complains---

<source>:12:29: warning: possibly dangling reference to a temporary [-Wdangling-reference]
   12 | int f0(int  t) { const int& x = GetKey().rn(t); return x; } // False positive
      |                             ^
<source>:12:44: note: the temporary was destroyed at the end of the full expression 'GetKey().GetKey::rn(t)'
   12 | int f0(int  t) { const int& x = GetKey().rn(t); return x; } // False positive
  | 

I notice that GCC 13.1 includes -Wdangling-reference in -Wall, but then 13.2
removed it from -Wall again, presumably because of all the false positives. But
then in 14.1 it's back in -Wall causing trouble again! IMO it doesn't even
belong in -Wextra right now; but I see how you have to ship it in order to find
out how it works in practice. (Still, the answer is "it doesn't work great.")

The warning message could be improved immediately by changing the phrase "the
temporary was destroyed" to "a temporary of type 'GetKey' was destroyed". And
then change "reference *to* a temporary" to "reference *into* a temporary,"
since obviously we have no reference to the 'GetKey' temporary anywhere in this
code.

Another reason to mention the type of the temporary is that the naïve
programmer (me!) might assume that the compiler was giving the warning because
of cases like `f3`, which legitimately does dangle; but then it makes no sense
that `f7` is accepted quietly! The solution to that puzzle is that GCC doesn't
care about the *actual dangling* in `f3` at all; the warning is essentially a
false positive about the temporary of type `GetKey`; GCC doesn't even realize
that we have a temporary of type `int`! By mentioning the type of the
temporary, you'll help the programmer build the correct mental model (of what
the compiler is diagnosing) more quickly, with less trial-and-error.

---

The big problem with this diagnostic, for programmers, is that it's so
encumbered with heuristics that it's not easily actionable. My coworkers
discovered by trial and error that you can get around it by replacing our
code's

const int& x = GetKey()(t);

with

static constexpr GetKey g;
const int& x = g(t);

(here `GetKey` is a template policy parameter, basically something like
`std::hash`, with an `operator()` — which also means that we cannot work
around the problem by making the function `static`, as shown in `src` above).
But this is dumb. Another more generic solution — proposed only tongue-in-cheek
— was

// https://godbolt.org/z/ezE9KjKnP
const int& x = (+[](const int& z) -> decltype(auto) { return GetKey()(z);
})(t);

---

Since the heuristic is concerned only with dangling references to *members of
the GetKey object itself*, could you maybe add yet another heuristic to
suppress the warning whenever the GetKey object is_empty?
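
A minimal sketch of the property the suggested heuristic would test, together with the named-object workaround mentioned earlier. The `GetKey` below is a simplified hypothetical version with only an `operator()`, and `lookup` is an illustrative name; whether GCC could cheaply apply such a check is not something this sketch shows.

```cpp
#include <type_traits>

// Stateless policy object, similar in spirit to the GetKey in the report.
struct GetKey {
    const int& operator()(const int& x) const { return x; }
};

// An empty class has no non-static data members, so a reference returned by
// one of its member functions cannot point into the object itself.
static_assert(std::is_empty_v<GetKey>, "GetKey carries no state");

// The workaround from the report: call through a named object instead of a
// temporary, so there is no temporary for the warning to refer to.
int lookup(int t) {
    static constexpr GetKey g{};
    const int& x = g(t);  // refers to the parameter t, which outlives x
    return x;
}
```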

[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-06-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545

Jonathan Wakely  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2024-June/65
   ||3731.html
   Keywords||patch

--- Comment #9 from Jonathan Wakely  ---
Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653731.html

Rerunning benchmarks with this patch would be very welcome.

[Bug c++/115361] "possibly dangling reference to a temporary" when object is_empty

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115361

--- Comment #1 from Andrew Pinski  ---
GetKey() is the temporary in all cases.

[Bug target/115355] [12/13/14/15 Regression] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |12.4
Summary|PPCLE: Auto-vectorization   |[12/13/14/15 Regression]
   |creates wrong code for  |PPCLE: Auto-vectorization
   |Power9  |creates wrong code for
   ||Power9

[Bug driver/103949] gcc fails to provide a standard conforming C11 or C++17 environment even when specifying -std=c11 or -std=c++17

2024-06-05 Thread frankhb1989 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103949

frankhb1989 at gmail dot com changed:

   What|Removed |Added

 CC||frankhb1989 at gmail dot com

--- Comment #19 from frankhb1989 at gmail dot com ---
(In reply to Jonathan Wakely from comment #18)
> (In reply to Jörn Heusipp from comment #17)
> > All these issues are a tremendous user experience nightmare,
> 
> OK.
> 
>  and it sadly
> > looks like I absolutely have to point out every single one of them
> > explicitly by asking every single detail question possible, so that you
> > actually can in fact feel and realize the mess you have created for users to
> > deal with.
> 
> I think you'll just make people ignore your pages of ranting.
> 
> Patches to improve the docs are welcome.

It may be quite difficult to improve the docs without a first step from the
maintainers to make the sensible default clear enough. Anyway, whether the
issue is a bug or an enhancement depends on what the spec says, but this does
not work when the spec in the doc is simply missing. Besides the supported
standards (which are specs), users have to guess what features are expected by
default, or silently accept the status quo (everything is by design). This will
need additional communication between the maintainers to prevent real bugs from
being ignored, and is hence inefficient and error-prone.

This even happens when the spec is clear. For example, [intro.multithread]/1
explicitly allows programs under a hosted implementation to have concurrent
threads (not in C++98/03, though), and whether multi-threading is supported in
a freestanding implementation is implementation-defined. As far as I've checked
(https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Implementation.html), this entry
is missing, which is a bug because it fails to meet the mandated requirements
for conformance in the spec (the standard). However, users usually do not have
enough knowledge to fix it. Virtually only the maintainers know what should be
here. No *policies* are visible to others.

In this issue, for some parts of C++, silent degradation of performance
is certainly bad, so keeping `-pthread` off by default makes sense, esp. for
programs that assume no knowledge of a multi-threading environment (which can
be at least conforming to C++98/03). It is also not a bug in the
sense that the standard actually allows single-threading, provided the doc bug
above is fixed (with the acknowledgement that a single-thread model for hosted
implementations is not conforming after C++03). However, chasing
performance over other concerns cannot be the policy in general. In
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937 the direction is actually
the opposite. AFAIK all the primary platforms in the release criteria
(https://gcc.gnu.org/gcc-14/criteria.html) support ELF and (most of) POSIX, so
it seems that not sacrificing the features available on these platforms
is one candidate policy. But this is just my guess, and
specific to the releases; I fail to find anything closer to a guarantee in
other GCC docs. So, please clarify such meta issues first to reduce potential
disagreements.

[Bug c++/115362] New: fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

Bug ID: 115362
   Summary: fixed_size_simd dot product recognition not working
for stdx::reduce
   Product: gcc
   Version: 14.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jondaniel879 at gmail dot com
  Target Milestone: ---

namespace stdx = std::experimental;
using namespace stdx::parallelism_v2;

template, const
stdx::fixed_size_simd> && ...)>>
static inline constexpr T dot(FIRST first, OTHER&&... other)
{
   return stdx::reduce(first * (... * std::forward(other)));
}

doesn't generate vdpp(s/d) on AVX machines.

[Bug target/111376] missed optimization of one bit test on MIPS32r1

2024-06-05 Thread syq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #5 from YunQiang Su  ---
I copied the RTL pattern from RISC-V, and it seems to work

```
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -6253,6 +6253,40 @@ (define_insn "*branch_bit_inverted"
 }
   [(set_attr "type" "branch")
(set_attr "branch_likely" "no")])
+
+(define_insn_and_split "*branch_on_bit"
+  [(set (pc)
+   (if_then_else
+   (match_operator 0 "equality_operator"
+   [(zero_extract:GPR (match_operand:GPR 2 "register_operand" "d")
+(const_int 1)
+(match_operand:GPR 3 "const_int_operand"))
+(const_int 0)])
+   (label_ref (match_operand 1))
+   (pc)))]
+  "!ISA_HAS_BBIT && !ISA_HAS_EXT_INS && !TARGET_MIPS16"
+  "#"
+  "!reload_completed"
+  [(set (match_dup 4)
+   (ashift:GPR (match_dup 2) (match_dup 3)))
+   (set (pc)
+   (if_then_else
+   (match_op_dup 0 [(match_dup 4) (const_int 0)])
+   (label_ref (match_operand 1))
+   (pc)))]
+{
+  int shift = GET_MODE_BITSIZE (mode) - 1 - INTVAL (operands[3]);
+  operands[3] = GEN_INT (shift);
+  operands[4] = gen_reg_rtx (mode);
+
+  if (GET_CODE (operands[0]) == EQ)
+operands[0] = gen_rtx_GE (mode, operands[4], const0_rtx);
+  else
+operands[0] = gen_rtx_LT (mode, operands[4], const0_rtx);
+}
+[(set_attr "type" "branch")])
+
+
```

[Bug c++/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

--- Comment #1 from Jon Daniel  ---
the generated code should be similar to the following using __m128 as
FIRST/OTHER type for floating point.

inline constexpr uint8_t mask4dp(size_t n)
{
switch(n)
{
case 1: return 0xff >> 3;
case 2: return 0xff >> 2;
case 3: return 0xff >> 1;
case 4: return 0xff >> 0;
}
}

template
static inline constexpr FIRST dot(FIRST first, OTHER&&... other)
{
   return _mm_dp_ps(first, (... * std::forward(other)), mask4dp(N));
}

[Bug tree-optimization/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
  Component|c++ |tree-optimization

--- Comment #2 from Andrew Pinski  ---
Can you provide a full compilable testcase?

[Bug ada/115349] GNAT infers the wrong Accum_Type for a Reducer expression

2024-06-05 Thread devotus at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115349

--- Comment #2 from Jack Perry  ---
Sorry, that's what I mean by "crash":
```
raised CONSTRAINT_ERROR : intvec.adb:14 range check failed
```

[Bug tree-optimization/54013] Loop with control flow not vectorized

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013

--- Comment #3 from Andrew Pinski  ---
I think for SVE(2?) this could be vectorized using first-faulting loads.

[Bug tree-optimization/54013] Loop with control flow not vectorized

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013

Tamar Christina  changed:

   What|Removed |Added

 Blocks||115130

--- Comment #4 from Tamar Christina  ---
Since there's only one source here, alignment peeling should be enough to
vectorize it.

Our pending patches should support it.  Will add it to the verify list.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
[Bug 115130] [meta-bug] early break vectorization

[Bug tree-optimization/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

--- Comment #3 from Jon Daniel  ---
Created attachment 58358
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58358&action=edit
compilable testcase

Compile:

g++ -march=native -mfpmath=sse -mveclibabi=svml -O3 -std=gnu++26 dotsimd.cpp -o dotsimd

Run:

./dotsimd

[Bug tree-optimization/115363] New: Missing loop vectorization due to loop bound load not being pulled out

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115363

Bug ID: 115363
   Summary: Missing loop vectorization due to loop bound load not
being pulled out
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
struct out {
  unsigned *array;
};

struct m{
  void f(out *output);
  void f1(out *output);
  int size;
};

void  m::f(out *output)
{
  for (int k = 0; k < size; k++) {
    output->array[k] += 1;
  }
}

void  m::f1(out *output)
{
  int tmp = size;
  for (int k = 0; k < tmp; k++) {
    output->array[k] += 1;
  }
}
```

We should be able to vectorize `m::f`, but currently we do not, since
this->size might alias array[k].
But we could version the loop to pull the this->size load out of the loop, and
then we could vectorize it.
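
Written by hand, the versioning suggested above might look roughly like the
following. This is only a sketch of the idea, not what GCC generates: the name
`f_versioned` and the address-range test used as the versioning condition are
assumptions made for illustration.

```cpp
#include <cstdint>

struct out {
  unsigned *array;
};

struct m {
  void f_versioned(out *output);  // hypothetical hand-versioned form of m::f
  int size;
};

void m::f_versioned(out *output)
{
  const int n = size;
  unsigned *a = output->array;
  if (n <= 0)
    return;

  // Versioning condition: does this->size overlap array[0..n)?
  const auto s  = reinterpret_cast<std::uintptr_t>(&size);
  const auto lo = reinterpret_cast<std::uintptr_t>(a);
  const auto hi = reinterpret_cast<std::uintptr_t>(a + n);

  if (s + sizeof(size) <= lo || s >= hi) {
    // No overlap: the bound is loop-invariant, so this copy can be vectorized.
    for (int k = 0; k < n; k++)
      a[k] += 1;
  } else {
    // Possible overlap: keep the original semantics and reload size each time.
    for (int k = 0; k < size; k++)
      output->array[k] += 1;
  }
}
```

The slow path keeps the original loop untouched, so the result is the same
either way; only the fast path has a bound the vectorizer can rely on.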

[Bug c++/115364] New: ICE-on-invalid when calling non-const template member on const object

2024-06-05 Thread blubban at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115364

Bug ID: 115364
   Summary: ICE-on-invalid when calling non-const template member
on const object
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: blubban at gmail dot com
  Target Milestone: ---

struct foo
{
template void not_const() {}
};
void fn(const foo& obj) { obj.not_const<5>(); }


No flags needed.


: In member function 'void foo::fn() const':
:4:41: error: cannot convert 'const foo*' to 'foo*'
4 | void fn() const { this->not_const<5>(); }
  |   ~~^~
:4:41: internal compiler error: tree check: expected function_decl,
have template_decl in get_fndecl_argument_location, at cp/call.cc:8354
0x26aa1ac internal_error(char const*, ...)
???:0
0x96fd9e tree_check_failed(tree_node const*, char const*, int, char const*,
...)
???:0
0xa7d15f complain_about_bad_argument(unsigned int, tree_node*, tree_node*,
tree_node*, int)
???:0
0xa8c20a build_new_method_call(tree_node*, tree_node*, vec**, tree_node*, int, tree_node**, int)
???:0
0xce8267 finish_call_expr(tree_node*, vec**, bool,
bool, int)
???:0
0xc6a70a c_parse_file()
???:0
0xdbfcf9 c_common_parse_file()
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
Compiler returned: 1


Online reproducer: https://godbolt.org/z/86noE3935


I would've expected something as simple as this to have been tested and fixed
long ago, but apparently not. Probably because it's just a tree check; from
what I've heard, they don't exist in versioned releases of GCC.

[Bug tree-optimization/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

--- Comment #4 from Jon Daniel  ---
Created attachment 58359
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58359&action=edit
dotsimd assembly output with dot_sse only

[Bug tree-optimization/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

--- Comment #5 from Jon Daniel  ---
Created attachment 58360
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58360&action=edit
dotsimd assembly output with dot only

[Bug c++/115364] [11/12/13/14/15 Regression] ICE-on-invalid when calling non-const template member on const object

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115364

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |11.5
Summary|ICE-on-invalid when calling |[11/12/13/14/15 Regression]
   |non-const template member   |ICE-on-invalid when calling
   |on const object |non-const template member
   ||on const object
   Last reconfirmed||2024-06-05
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Keywords||diagnostic, error-recovery,
   ||ice-checking

--- Comment #1 from Andrew Pinski  ---
>Probably because it's just a tree check; from what I've heard, they 

Yes, it does not show up with release checking, so most folks won't see the
ICE in this case. The code is only trying to find the location of the argument,
which in this case is `this`, and that does not really have a location.

Anyways confirmed, 99% sure it was introduced by r8-3378-g9003adc732305c .

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #11 from Tamar Christina  ---
(In reply to Richard Biener from comment #10)
> I think the question is why IVOPTs ends up using both the signed and
> unsigned variant of the same IV instead of expressing all uses of both with
> one IV?
> 
> That's where I'd look into.

It looks like this is because of a subtle difference in the expressions.

In get_loop_invariant_expr IVOPTs first tries to strip away all casts with
STRIP_NOPS.

The first expression is (unsigned long) (stride.3_27 * 4) and the second
expression is ((unsigned long) stride.3_27) * 4 (The pretty printing here is
pretty bad...)

So the first one becomes:
  (unsigned long) (stride.3_27 * 4) -> stride.3_27 * 4

and second one:
  ((unsigned long) stride.3_27) * 4 -> ((unsigned long) stride.3_27) * 4

since we don't care about overflow here, it looks like the stripping should
be recursive as long as it's a NOP expression between two integral types.

That would get them to hash to the same IV expression.  Trying now..
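
The difference between outermost-only and recursive stripping can be seen on a
toy expression tree. This is a sketch only: `Expr`, `make`, `strip_outer`,
`strip_recursive` and `show` are hypothetical names, not GCC's tree API, and the
toy ignores the conditions mentioned above (casts between integral types,
overflow not mattering).

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy expression node standing in for a GCC tree (not GCC's real API).
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  enum Kind { Var, Const, Mul, Cast } kind;
  std::string text;  // Var: name, Const: literal, Cast: target type
  ExprPtr lhs, rhs;  // Mul uses both, Cast uses lhs only
};

ExprPtr make(Expr::Kind k, std::string t, ExprPtr l = nullptr, ExprPtr r = nullptr) {
  return std::make_shared<Expr>(Expr{k, std::move(t), std::move(l), std::move(r)});
}

// Models the behaviour described above: peel casts off the outermost node only.
ExprPtr strip_outer(ExprPtr e) {
  while (e->kind == Expr::Cast)
    e = e->lhs;
  return e;
}

// The proposed variant: also strip casts found inside the operands.
ExprPtr strip_recursive(ExprPtr e) {
  e = strip_outer(e);
  if (e->kind == Expr::Mul)
    return make(Expr::Mul, "", strip_recursive(e->lhs), strip_recursive(e->rhs));
  return e;
}

// Structural printer, used here in place of hashing the expression.
std::string show(const ExprPtr &e) {
  switch (e->kind) {
    case Expr::Mul:  return "(" + show(e->lhs) + " * " + show(e->rhs) + ")";
    case Expr::Cast: return "(" + e->text + ") " + show(e->lhs);
    default:         return e->text;
  }
}
```

With the two expressions from this comment, `strip_outer` leaves the inner cast
of the second one in place, while `strip_recursive` reduces both to the same
form, which is what would let them map to one IV expression.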

[Bug other/115241] header-tools scripts not compatible to python3

2024-06-05 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115241

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Jonathan Wakely  ---
Fixed on trunk now, thanks for the patch

[Bug other/115365] New: New test case gcc.dg/pr100927.c from r15-1022-gb05288d1f1e4b6 fails

2024-06-05 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115365

Bug ID: 115365
   Summary: New test case gcc.dg/pr100927.c from
r15-1022-gb05288d1f1e4b6 fails
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:b05288d1f1e4b632eddf8830b4369d4659f6c2ff, r15-1022-gb05288d1f1e4b6

I am seeing this failing on a power10 LE system.

make  -k check-gcc RUNTESTFLAGS="dg.exp=gcc.dg/pr100927.c"
FAIL: gcc.dg/pr100927.c scan-rtl-dump-times final "(?n)\\(fix:SI" 3

commit b05288d1f1e4b632eddf8830b4369d4659f6c2ff (HEAD)
Author: liuhongt 
Date:   Tue May 21 16:57:17 2024 +0800

Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

* gcc.dg/pr100927.c: New test.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-06-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

Uroš Bizjak  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #11 from Uroš Bizjak  ---
(In reply to Jonathan Wakely from comment #0)
> These two implementations of C++26 saturating addition
> (std::add_sat) have equivalent behaviour:
> 
> unsigned
> add_sat(unsigned x, unsigned y) noexcept
> {
> unsigned z;
> if (!__builtin_add_overflow(x, y, &z))
>   return z;
> return -1u;
> }

[...]

> For -O3 on x86_64 GCC uses a branch for the first one:
> 
> add_sat(unsigned int, unsigned int):
> add edi, esi
> jc  .L3
> mov eax, edi
> ret
> .L3:
> or  eax, -1
> ret

The if-conversion to cmove fails because of the "weird" compare
arguments, a consequence of the addsi3_cc_overflow_1 definition:

(insn 9 4 10 2 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (plus:SI (reg:SI 106)
                        (reg:SI 107))
                    (reg:SI 106)))
            (set (reg:SI 104)
                (plus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":7:12 477 {addsi3_cc_overflow_1}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

the noce_try_cmove path fails in noce_emit_cmove:

Breakpoint 1, noce_emit_cmove (if_info=0x7fffd750, x=0x7fffe9fe4e40,
code=LTU, cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
1774        return NULL_RTX;
(gdb) list
1766      /* Don't even try if the comparison operands are weird
1767         except that the target supports cbranchcc4.  */
1768      if (! general_operand (cmp_a, GET_MODE (cmp_a))
1769          || ! general_operand (cmp_b, GET_MODE (cmp_b)))
1770        {
1771          if (!have_cbranchcc4
1772              || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
1773              || cmp_b != const0_rtx)
1774            return NULL_RTX;
1775        }
1776
1777      target = emit_conditional_move (x, { code, cmp_a, cmp_b, VOIDmode },
1778                                      vtrue, vfalse, GET_MODE (x),
(gdb) bt
#0  noce_emit_cmove (if_info=0x7fffd750, x=0x7fffe9fe4e40, code=LTU,
cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
#1  0x020d995b in noce_try_cmove (if_info=0x7fffd750) at
../../git/gcc/gcc/ifcvt.cc:1884
#2  0x020dec37 in noce_process_if_block (if_info=0x7fffd750) at
../../git/gcc/gcc/ifcvt.cc:4149
#3  0x020e0248 in noce_find_if_block (test_bb=0x7fffe9fb5d80,
then_edge=0x7fffe9fd7cc0, else_edge=0x7fffe9fd7c60, pass=1)
at ../../git/gcc/gcc/ifcvt.cc:4716
#4  0x020e08e9 in find_if_header (test_bb=0x7fffe9fb5d80, pass=1) at
../../git/gcc/gcc/ifcvt.cc:4921
#5  0x020e3255 in if_convert (after_combine=true) at
../../git/gcc/gcc/ifcvt.cc:6068

(gdb) p debug_rtx (cmp_a)
(plus:SI (reg:SI 106)
(reg:SI 107))
$1 = void
(gdb) p debug_rtx (cmp_b)
(reg:SI 106)
$2 = void

The above cmp_a RTX fails the general_operand check.

Please note that similar testcase:

unsigned
sub_sat(unsigned x, unsigned y)
{
unsigned z;
return __builtin_sub_overflow(x, y, &z) ? 0 : z;
}

results in the expected:

        subl    %esi, %edi      # 52    [c=4 l=2]  *subsi_3/0
        movl    $0, %eax        # 53    [c=4 l=5]  *movsi_internal/0
        cmovnb  %edi, %eax      # 54    [c=4 l=3]  *movsicc_noc/0
        ret             # 50    [c=0 l=1]  simple_return_internal

due to:

(insn 9 4 10 2 (parallel [
            (set (reg:CC 17 flags)
                (compare:CC (reg:SI 106)
                    (reg:SI 107)))
            (set (reg:SI 104)
                (minus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":28:12 416 {*subsi_3}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

So, either addsi3_cc_overflow_1 RTX is not correct, or noce_emit_cmove should
be improved to handle the above "weird" operand form.

Let's ask Jakub.
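
For reference, the saturating behaviour under discussion can be checked with a
small sketch. `add_sat_builtin` repeats the quoted form; `add_sat_wrap` is a
plain wrap-around formulation written for this illustration and is not
necessarily the second implementation elided from the quote. Both function
names are made up here.

```cpp
#include <limits>

// The form quoted above, built on the GCC/Clang overflow builtin.
unsigned add_sat_builtin(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_add_overflow(x, y, &z))
        return z;
    return -1u;
}

// Illustrative alternative: unsigned addition wraps, so the sum is smaller
// than an operand exactly when the addition overflowed.
unsigned add_sat_wrap(unsigned x, unsigned y) noexcept
{
    unsigned z = x + y;
    return z < x ? std::numeric_limits<unsigned>::max() : z;
}
```

The wrap-around test `z < x` corresponds to the carry condition expressed by the
`(compare:CCC (plus ...) (reg))` form shown in the first insn dump.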

[Bug target/115355] [12/13/14/15 Regression] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #6 from Peter Bergner  ---
Created attachment 58361
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58361&action=edit
setToIdentityBAD-char.s

Code generated for setToIdentityBAD.c when using unsigned char for the index
variable.

[Bug target/115355] [12/13/14/15 Regression] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #7 from Peter Bergner  ---
The test fails when setToIdentityBAD's index var is unsigned int.  It passes
when using unsigned long long, unsigned long, unsigned short and unsigned char.
When using unsigned long long/unsigned long, we do not vectorize the loop.  We
vectorize the loop when using unsigned int/short/char.  The vectorized code is
a little strange, in that the smaller the integer type we use for the index
var, the more code we generate.

The vectorized code for unsigned char is truly huge!  ...although it does seem
to work correctly.  I'm attaching the "unsigned char i" code gen for
setToIdentityBAD for people to examine.  Even though it gives "correct"
results, it can't really be the code we want to generate, correct???

[Bug target/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

Jon Daniel  changed:

   What|Removed |Added

  Attachment #58358|0   |1
is obsolete||

--- Comment #6 from Jon Daniel  ---
Created attachment 58362
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58362&action=edit
dot product and determinant testcase

[Bug target/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

--- Comment #7 from Jon Daniel  ---
sign of determinant result using the dot product differs from clang++ generated
binary

[Bug target/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

Jon Daniel  changed:

   What|Removed |Added

  Attachment #58362|0   |1
is obsolete||

--- Comment #8 from Jon Daniel  ---
Created attachment 58363
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58363&action=edit
dot product and determinant testcase

[Bug target/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

Jon Daniel  changed:

   What|Removed |Added

  Attachment #58363|0   |1
is obsolete||

--- Comment #9 from Jon Daniel  ---
Created attachment 58364
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58364&action=edit
dot product and determinant testcase

[Bug target/115362] fixed_size_simd dot product recognition not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

Jon Daniel  changed:

   What|Removed |Added

  Attachment #58364|0   |1
is obsolete||

--- Comment #10 from Jon Daniel  ---
Created attachment 58365
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58365&action=edit
dot product and determinant testcase

[Bug target/115362] fixed_size_simd dot product recognition and sign of determinant not working for stdx::reduce

2024-06-05 Thread jondaniel879 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

--- Comment #11 from Jon Daniel  ---
g++ output:

dot_product stdx::reduce: -16.00
dot_product_mm_dp_ps: -16.00
determinant: dot_product: 717.00
determinant: submatrices: -717.00

clang++ output:

dot_product stdx::reduce: -16.00
dot_product_mm_dp_ps: -16.00
determinant: dot_product: -717.00
determinant: submatrices: -717.00

[Bug target/115351] [14/15 regression] pointless movs when passing by value on x86-64

2024-06-05 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115351

Roger Sayle  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |roger at 
nextmovesoftware dot com

--- Comment #3 from Roger Sayle  ---
I have a fix (to ix86_rtx_costs) that I'm bootstrapping and regression testing.
By making concatditi3 slightly cheaper than *insvti_highpart (COSTS_N_INSNS(2)
vs. COSTS_N_INSNS(2)+1), fwprop is able to do the right thing.

[Bug c++/115366] New: Missing optimzation: fold `return (bool)(((a / 8) * 4) << f)` to `return (bool)(a / 8)`

2024-06-05 Thread zhiwuyazhe154 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115366

Bug ID: 115366
   Summary: Missing optimzation: fold `return (bool)(((a / 8) * 4)
<< f)` to `return (bool)(a / 8)`
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhiwuyazhe154 at gmail dot com
  Target Milestone: ---

Godbolt example: https://godbolt.org/z/c9zePoc31

code example:
bool fn1(short a, bool f) {
return ((a / 8) * 4) << f; // equals to return a / 8;
}

In this case, it is actually equivalent to calculating 8 | a.

GCC -O3:
fn1(short, bool):
        test    di, di
        lea     eax, [rdi+7]
        mov     ecx, esi
        cmovns  eax, edi
        sar     ax, 3
        cwde
        sal     eax, 2
        sal     eax, cl
        test    eax, eax
        setne   al
        ret

Expected Code(CLANG -O3):
fn1(short, bool):
        add     edi, -8
        cmp     di, -15
        setb    al
        ret
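
The claimed equivalence can be checked exhaustively over all `short` values. In
this sketch the function names are made up, and the shift is done on an
unsigned value so that it is well-defined in every C++ version, which is an
adjustment made only for this check. `clang_style` mirrors the three-instruction
sequence above and assumes a 16-bit `short`.

```cpp
#include <climits>

// The expression from the report; the unsigned cast only avoids shifting a
// negative value and does not change whether the result is zero.
bool original(short a, bool f)
{
    return (static_cast<unsigned>((a / 8) * 4) << f) != 0;
}

// The proposed folded form.
bool folded(short a)
{
    return (a / 8) != 0;
}

// Mirrors "add edi, -8; cmp di, -15; setb al" as an unsigned 16-bit compare.
bool clang_style(short a)
{
    return static_cast<unsigned short>(a - 8) < static_cast<unsigned short>(-15);
}

// Compares all three forms for every short value and both values of f.
bool all_agree()
{
    for (int a = SHRT_MIN; a <= SHRT_MAX; ++a) {
        const short s = static_cast<short>(a);
        for (int f = 0; f <= 1; ++f) {
            if (original(s, f != 0) != folded(s) || folded(s) != clang_style(s))
                return false;
        }
    }
    return true;
}
```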

[Bug target/114428] [x86] psrad xmm, xmm, 16 and pand xmm, const_vector (0xffff x4) can be optimized to psrld

2024-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114428

--- Comment #1 from GCC Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:7876cde25cbd2f026a0ae488e5263e72f8e9bfa0

commit r15-1047-g7876cde25cbd2f026a0ae488e5263e72f8e9bfa0
Author: liuhongt 
Date:   Fri Apr 19 10:29:34 2024 +0800

Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

When mask is (1 << (prec - imm) - 1) which is used to clear upper bits
of A, then it can be simplified to LSHIFTRT.

i.e Simplify
(and:v8hi
  (ashifrt:v8hi A 8)
  (const_vector 0xff x8))
to
(lshifrt:v8hi A 8)

gcc/ChangeLog:

PR target/114428
* simplify-rtx.cc
(simplify_context::simplify_binary_operation_1):
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
specific mask.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114428-1.c: New test.
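
The identity behind this simplification can be checked on a single 16-bit lane
in scalar code. This is a sketch with made-up helper names, not the code in
simplify-rtx.cc; it reads the mask in the commit message as
`(1 << (prec - imm)) - 1` with `prec` = 16.

```cpp
#include <cstdint>

// Arithmetic right shift of one 16-bit lane, spelled out portably.
std::uint16_t ashiftrt16(std::uint16_t a, int imm)
{
    // Sign-extend the lane into an int without relying on narrowing casts.
    const int v = (a & 0x8000) ? static_cast<int>(a) - 65536 : static_cast<int>(a);
    const int r = v < 0 ? ~(~v >> imm) : v >> imm;
    return static_cast<std::uint16_t>(r);
}

// Logical right shift of one 16-bit lane.
std::uint16_t lshiftrt16(std::uint16_t a, int imm)
{
    return static_cast<std::uint16_t>(a >> imm);
}

// Checks (ashiftrt a imm) & mask == (lshiftrt a imm) for every lane value.
bool identity_holds()
{
    for (int imm = 0; imm < 16; ++imm) {
        const unsigned mask = (1u << (16 - imm)) - 1u;
        for (unsigned a = 0; a <= 0xffffu; ++a) {
            const std::uint16_t lane = static_cast<std::uint16_t>(a);
            if ((ashiftrt16(lane, imm) & mask) != lshiftrt16(lane, imm))
                return false;
        }
    }
    return true;
}
```

The mask clears exactly the `imm` upper bits that the arithmetic shift filled
with copies of the sign bit, which is why the logical shift gives the same lane.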

[Bug target/114428] [x86] psrad xmm, xmm, 16 and pand xmm, const_vector (0xffff x4) can be optimized to psrld

2024-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114428

--- Comment #2 from GCC Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:961dd0d635217c703a38c48903981e0d60962546

commit r15-1048-g961dd0d635217c703a38c48903981e0d60962546
Author: liuhongt 
Date:   Fri Apr 19 10:39:53 2024 +0800

Adjust rtx_cost for MEM to enable more simplication

For CONST_VECTOR_DUPLICATE_P in constant_pool, it is just broadcast or
variants in ix86_vector_duplicate_simode_const.
Adjust the cost to COSTS_N_INSNS (2) + speed which should be a little
bit larger than broadcast.

gcc/ChangeLog:
PR target/114428
* config/i386/i386.cc (ix86_rtx_costs): Adjust cost for
CONST_VECTOR_DUPLICATE_P in constant_pool.
* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
Remove static.
* config/i386/i386-protos.h (ix86_broadcast_from_constant):
Declare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114428.c: New test.

[Bug target/114428] [x86] psrad xmm, xmm, 16 and pand xmm, const_vector (0xffff x4) can be optimized to psrld

2024-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114428

Hongtao Liu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Hongtao Liu  ---
Fixed in GCC15.

[Bug tree-optimization/98909] Failure to optimize odd loop pattern

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98909

--- Comment #3 from Andrew Pinski  ---
Note this is very similar to PR 112104, in that `~a` can be treated as `a ^
-1`.

[Bug other/115365] New test case gcc.dg/pr100927.c from r15-1022-gb05288d1f1e4b6 fails

2024-06-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115365

--- Comment #1 from Hongtao Liu  ---
pr100927.c.349r.final:(fix:SI (reg:SF 32 0 [120])))
"../../gcc/intel-innersource/pr115365/gcc/testsuite/gcc.dg/pr100927.c":12:10
428 {*fix_truncsfsi2_p8}
pr100927.c.349r.final: (expr_list:REG_EQUIV (fix:SI (const_double:SF
2.147483648e+9 [0x0.8p+32]))
pr100927.c.349r.final:(fix:SI (reg:SF 32 0 [120])))
"../../gcc/intel-innersource/pr115365/gcc/testsuite/gcc.dg/pr100927.c":21:10
428 {*fix_truncsfsi2_p8}
pr100927.c.349r.final: (expr_list:REG_EQUIV (fix:SI (const_double:SF -Inf
[-Inf]))
pr100927.c.349r.final:(fix:SI (reg:SF 32 0 [120])))
"../../gcc/intel-innersource/pr115365/gcc/testsuite/gcc.dg/pr100927.c":30:10
428 {*fix_truncsfsi2_p8}

There are 5 fix:SI in the final dump.

[Bug tree-optimization/115366] Missing optimzation: fold `(bool)(a<< boolvalue)` to `(bool)(a)`

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115366

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
Summary|Missing optimzation: fold   |Missing optimzation: fold
   |`return (bool)(((a / 8) *   |`(bool)(a<< boolvalue)` to
   |4) << f)` to `return|`(bool)(a)`
   |(bool)(a / 8)`  |
   Last reconfirmed||2024-06-06
   Severity|normal  |enhancement
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
The missed optimization here is really just:
```
bool fn1(unsigned short a, bool f) {
    return a << f != 0;
}
```
is not optimized to:
```
bool fn1(unsigned short a, bool f) {
    return a != 0;
}
```

Since `((int)a) << [0,1] != 0` is the same as `((int)a) != 0`.

After that, GCC already has the rest.

[Bug analyzer/111567] RFE: support __attribute__((counted_by)) in -fanalyzer

2024-06-05 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111567

Eric Gallager  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-06
 Status|UNCONFIRMED |NEW
   Keywords||diagnostic
 CC||egallager at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Eric Gallager  ---
Confirmed.

[Bug libgomp/115367] New: The implementation of OMP_DYNAMIC is not dynamic

2024-06-05 Thread mail+gcc at nh2 dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115367

Bug ID: 115367
   Summary: The implementation of OMP_DYNAMIC is not dynamic
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mail+gcc at nh2 dot me
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Please see:

"Why does my OpenMP app sometimes use only 1 thread, sometimes 3, sometimes all
cores?"

https://stackoverflow.com/questions/78584145/why-does-my-openmp-app-sometimes-use-only-1-thread-sometimes-3-sometimes-all-c/78584146

OMP_DYNAMIC is implemented like this (on Linux, likely other platforms):

https://github.com/gcc-mirror/gcc/blob/10cb3336ba1ac89b258f627222e668b023a6d3d4/libgomp/config/linux/proc.c#L180-L188

/* When OMP_DYNAMIC is set, at thread launch determine the number of
   threads we should spawn for this team.  */
/* ??? I have no idea what best practice for this is.  Surely some
   function of the number of processors that are *still* online and
   the load average.  Here I use the number of processors online
   minus the 15 minute load average.  */

unsigned
gomp_dynamic_max_threads (void) {
// ...
return n_onln - loadavg;


### `OMP_DYNAMIC` (of `libgomp`) is really bad

* Because of this logic, your app will use only 1 thread, even though the
system is completely idle _now_, just because it was busy 10 minutes ago.
* The dynamic limit is determined _at process start_ (loading time), and fixed
forever.
  * So started programs stay slow _forever_ if they were started at a time 5
minutes after the system was busy.
* It means a server can never achieve full utilisation when working down a
queue of jobs.
  * Say you have 8 cores, and a queue of N jobs to process, each of which takes
15 minutes full-CPU.
  * The first job starts at 0 15-min-utilisation, thus using all cores.
  * The next job starts, using only 1 core.
  * The next job starts, using only 7 cores.
  * The next job starts, using only 1 core.
  * The next job starts, using only 7 cores.
  * ...
  * In the long run, the **server uses only half of its cores on average**.
* It makes performance behaviour completely irreproducible.
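
The job-queue scenario above can be sketched as a tiny simulation. It rests on a simplifying assumption of mine: when each job starts, the 15-minute load average equals the number of cores the previous job kept busy. The function name and signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the queue scenario: jobs run one after another,
   and each job gets "cores minus previous job's core usage" threads,
   but never fewer than 1.  Returns the total cores used over all jobs.  */
unsigned
sketch_simulate_queue (unsigned n_cores, size_t n_jobs, unsigned *cores_used)
{
  assert (n_cores > 0);

  unsigned total = 0;
  unsigned load = 0;  /* The system is idle before the first job.  */

  for (size_t i = 0; i < n_jobs; i++)
    {
      unsigned threads = load >= n_cores ? 1 : n_cores - load;
      cores_used[i] = threads;
      total += threads;
      /* Assumption: this job's usage becomes the next job's load.  */
      load = threads;
    }
  return total;
}
```

For 8 cores this model produces 8, 1, 7, 1, 7, ... so after the first job every pair of jobs uses 8 cores in total, i.e. 4 of 8 cores on average.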

None of this behaviour is documented

* in [`libgomp`](https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fDYNAMIC.html)
* or in the [OpenMP spec for
`OMP_DYNAMIC`](https://www.openmp.org/spec-html/5.0/openmpse51.html).

Those docs make the behaviour sound nicely "runtime-dynamic", when in fact it
is fixed across the process's lifetime and based on ultra-slow rolling
averages.


I argue that libgomp does not implement the OpenMP spec well here.

The spec says:

> OpenMP implementation may adjust the number of threads to use for executing 
> parallel regions in order to optimize the use of system resources


This suggests that the OpenMP implementation may do something sensible to
adjust the number of threads "DYNAMIC"ally.

Nowhere does it say that it should determine this at the start of the process,
and never adjust it again.

That's the opposite of "dynamic"!

And on top of that, it is combined with a very-much-not-dynamic 15-minute
delay.

I read the spec text as "do something sensible like GNU make, which checks the
(short-term!) loadavg() of the current system periodically and adjusts its
parallelism accordingly".
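
As a rough sketch of what I mean (not a proposed patch), the thread count could be recomputed from the 1-minute load average each time it is needed. The function name, the rounding, and the fallback when `getloadavg` fails are my assumptions; `getloadavg` itself is the glibc/BSD function declared in `<stdlib.h>`.

```c
#define _DEFAULT_SOURCE  /* Exposes getloadavg in glibc's <stdlib.h>.  */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: thread budget from the short-term (1-minute)
   load average, meant to be re-evaluated before every parallel region
   rather than once per process.  */
unsigned
sketch_threads_for_region (unsigned n_onln)
{
  assert (n_onln > 0);

  double avg[1];
  /* Assumption: if the load is unavailable, use all online CPUs.  */
  if (getloadavg (avg, 1) != 1)
    return n_onln;

  if (avg[0] >= (double) n_onln)
    return 1;

  /* Assumption: round the load to the nearest integer.  */
  unsigned load = avg[0] > 0.0 ? (unsigned) (avg[0] + 0.5) : 0;
  return load >= n_onln ? 1 : n_onln - load;
}
```

An application could approximate this today by passing such a value to `omp_set_num_threads` before each parallel region, though that only works around the issue rather than fixing `OMP_DYNAMIC` itself.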

[Bug tree-optimization/115354] [14/15 Regression] Large -Os code size increase related to -ftree-sra since r14-5831-gaae723d360ca26

2024-06-05 Thread gus at projectgus dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115354

--- Comment #2 from Angus Gratton  ---
Thanks Richard, that's very helpful context (I don't quite have my head around
SRA to be honest!)

For non-LTO MicroPython builds (mostly C, no C++), building with -Os
-fno-tree-sra has almost no impact (for GCC versions before this change), and
undoes the code size increase for GCC versions after this change.

For LTO MicroPython builds, -Os -fno-tree-sra consistently results in a big
code size increase. (And there is also a code size increase when building with
the GCC version after this change vs before.)

I'll experiment with the function and see if I can find what is causing later
optimisations to not shrink the code size back down.

[Bug tree-optimization/115354] [14/15 Regression] Large -Os code size increase related to -ftree-sra since r14-5831-gaae723d360ca26

2024-06-05 Thread gus at projectgus dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115354

--- Comment #3 from Angus Gratton  ---
Sorry, my notes about LTO builds look like they were wrong.

MicroPython LTO builds with -Os -fno-tree-sra seem to consistently reduce code
size as well, for both the "before" and "after" GCC versions, including undoing
the increase from the "after" version.
