[Bug tree-optimization/111147] bitwise_inverted_equal_p can be used in the `(x | y) & (~x ^ y)` pattern to catch more

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111147

--- Comment #2 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:7c04da768c1fc22e0607e3ccad87e2c793499797

commit r14-3540-g7c04da768c1fc22e0607e3ccad87e2c793499797
Author: Andrew Pinski 
Date:   Mon Aug 28 10:04:00 2023 -0700

MATCH: Move `(x | y) & (~x ^ y)` over to use bitwise_inverted_equal_p

This moves the match pattern `(x | y) & (~x ^ y)` over to use
bitwise_inverted_equal_p.
This now also allows optimizing comparisons and also catches the missed
`(~x | y) & (x ^ y)`
transformation into `~x & y`.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/111147
* match.pd (`(x | y) & (~x ^ y)`): Use bitwise_inverted_equal_p
instead of matching bit_not.

gcc/testsuite/ChangeLog:

PR tree-optimization/111147
* gcc.dg/tree-ssa/cmpbit-4.c: New test.
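
The two bitwise identities behind this commit can be checked by brute force. The following is a minimal sketch, not the new test from the commit: the function names are hypothetical, and the result `x & y` for the original pattern is derived from the truth table rather than quoted from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Original pattern; by truth table it reduces to x & y. */
static uint8_t or_and_notxor(uint8_t x, uint8_t y)
{
  return (uint8_t) ((x | y) & (~x ^ y));
}

/* Newly caught variant; the commit message gives ~x & y as the result. */
static uint8_t notor_and_xor(uint8_t x, uint8_t y)
{
  return (uint8_t) ((~x | y) & (x ^ y));
}

/* Exhaustively compare both patterns with their simplified forms
   over every pair of 8-bit inputs.  */
static void check_identities(void)
{
  for (unsigned i = 0; i < 256; i++)
    for (unsigned j = 0; j < 256; j++)
      {
        uint8_t x = (uint8_t) i, y = (uint8_t) j;
        assert(or_and_notxor(x, y) == (uint8_t) (x & y));
        assert(notor_and_xor(x, y) == (uint8_t) (~x & y));
      }
}
```

Because the operations are purely bitwise, a check over 8-bit values covers every per-bit combination; wider types behave the same way bit by bit.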

[Bug tree-optimization/111147] bitwise_inverted_equal_p can be used in the `(x | y) & (~x ^ y)` pattern to catch more

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111147

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |14.0

--- Comment #3 from Andrew Pinski  ---
Fixed.

[Bug tree-optimization/110111] bool patterns that should produce a?b:c

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110111

--- Comment #4 from Andrew Pinski  ---
>/* 1bit `((x ^ y) & m) ^ x` should just be converted into `m ? y : x` early */


Actually it is true for all zero_one_valued_p. Even more, if m is just
zero_one_valued_p we could convert it to
(m ? y : x) & 1
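
For the case where x, y and m are all 0 or 1, the equivalence with `m ? y : x` can be confirmed over all eight input combinations. This is a small sketch with hypothetical function names; it deliberately covers only the all-zero-or-one case and does not test the `& 1` variant mentioned above.

```c
#include <assert.h>

/* The masked-xor form from the quoted comment. */
static unsigned masked_xor(unsigned x, unsigned y, unsigned m)
{
  return ((x ^ y) & m) ^ x;
}

/* The conditional form it is compared against. */
static unsigned select_form(unsigned x, unsigned y, unsigned m)
{
  return m ? y : x;
}

/* Compare both forms for every combination of 0/1 inputs. */
static void check_zero_one_inputs(void)
{
  for (unsigned x = 0; x <= 1; x++)
    for (unsigned y = 0; y <= 1; y++)
      for (unsigned m = 0; m <= 1; m++)
        assert(masked_xor(x, y, m) == select_form(x, y, m));
}
```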

[Bug testsuite/111216] [14 regression] instructions counts for vector tests change after r14-3258-ge7a36e4715c716

2023-08-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111216

Richard Biener  changed:

   What|Removed |Added

   Keywords||testsuite-fail
   Target Milestone|--- |14.0

[Bug c/111219] -Wformat-truncation intentional false negative with %p modifier is undocumented

2023-08-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111219

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-29
 Ever confirmed|0   |1
   Keywords||documentation
 Status|UNCONFIRMED |NEW

[Bug middle-end/111209] GCC fails to understand adc pattern what its document describes

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111209

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:a7aec76a74dd38524be325343158d3049b6ab3ac

commit r14-3541-ga7aec76a74dd38524be325343158d3049b6ab3ac
Author: Jakub Jelinek 
Date:   Tue Aug 29 10:46:01 2023 +0200

tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]

The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
middle, which adds/subtracts carry-in (from lower limbs) and computes
carry-out (to higher limbs).  Before optimizations (unless user writes
it intentionally that way already), all the steps look the same, but
optimizations simplify the handling of the least significant limb
(one which adds/subtracts 0 carry-in) to just a single
.{ADD,SUB}_OVERFLOW and the handling of the most significant limb
if the computed carry-out is ignored to normal addition/subtraction
of multiple operands.
Now, match_uaddc_usubc has code to turn that least significant
.{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
a more significant limb above it is matched into .U{ADD,SUB}C; this
isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
functionally equal to .UADDC (x, y, 0) (provided the types of operands
are the same and result is complex type with that type element), and
it also has code to match the most significant limb with ignored carry-out
(in that case one pattern match turns both the penultimate limb pair of
.{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction
of the 4 values (2 carries) into another .U{ADD,SUB}C).

As the following patch shows, what we weren't handling is the case when
one uses either the __builtin_{add,sub}c builtins or hand written forms
thereof (either __builtin_*_overflow or even that written by hand) for
just 2 limbs, where the least significant has 0 carry-in and the most
significant ignores carry-out.  The following patch matches that, e.g.
  _16 = .ADD_OVERFLOW (_1, _2);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _15 = _3 + _4;
  _12 = _15 + _18;
into
  _16 = .UADDC (_1, _2, 0);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _19 = .UADDC (_3, _4, _18);
  _12 = IMAGPART_EXPR <_19>;
so that we can emit better code.

As the 2 later comments show, we must do that carefully, because the
pass walks the IL from first to last stmt in a bb and we must avoid
pattern matching this way something that should be matched on a later
instruction differently.

2023-08-29  Jakub Jelinek  

PR middle-end/79173
PR middle-end/111209
* tree-ssa-math-opts.cc (match_uaddc_usubc): Match also
just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored
carry-out on higher limb.  Don't match it though if it could be
matched later on 4 argument addition/subtraction.

* gcc.target/i386/pr79173-12.c: New test.
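
The 2-limb shape described in this commit message can be written in C roughly as follows. This is a sketch under assumptions, not the new test case: the `u128_limbs` type and `add_2limb` name are made up for illustration, and only `__builtin_add_overflow` is an actual GCC built-in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit limbs. */
typedef struct { uint64_t lo, hi; } u128_limbs;

static u128_limbs add_2limb(u128_limbs a, u128_limbs b)
{
  u128_limbs r;
  /* Least significant limb: no carry-in, carry-out is kept. */
  unsigned carry = __builtin_add_overflow(a.lo, b.lo, &r.lo);
  /* Most significant limb: carry-in is added, carry-out is ignored,
     so this is an ordinary addition of three operands.  */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* Small self-check: the low limb wraps and carries into the high limb. */
static void check_add_2limb(void)
{
  u128_limbs a = { UINT64_MAX, 1 };
  u128_limbs b = { 1, 2 };
  u128_limbs r = add_2limb(a, b);
  assert(r.lo == 0);
  assert(r.hi == 4);
}
```

Whether a given compiler version actually turns this exact source into an add/adc pair depends on the target and optimization level, so treat it as an illustration of the source-level pattern only.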

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #33 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:a7aec76a74dd38524be325343158d3049b6ab3ac

commit r14-3541-ga7aec76a74dd38524be325343158d3049b6ab3ac
Author: Jakub Jelinek 
Date:   Tue Aug 29 10:46:01 2023 +0200

tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]

The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
middle, which adds/subtracts carry-in (from lower limbs) and computes
carry-out (to higher limbs).  Before optimizations (unless user writes
it intentionally that way already), all the steps look the same, but
optimizations simplify the handling of the least significant limb
(one which adds/subtracts 0 carry-in) to just a single
.{ADD,SUB}_OVERFLOW and the handling of the most significant limb
if the computed carry-out is ignored to normal addition/subtraction
of multiple operands.
Now, match_uaddc_usubc has code to turn that least significant
.{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
a more significant limb above it is matched into .U{ADD,SUB}C; this
isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
functionally equal to .UADDC (x, y, 0) (provided the types of operands
are the same and result is complex type with that type element), and
it also has code to match the most significant limb with ignored carry-out
(in that case one pattern match turns both the penultimate limb pair of
.{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction
of the 4 values (2 carries) into another .U{ADD,SUB}C).

As the following patch shows, what we weren't handling is the case when
one uses either the __builtin_{add,sub}c builtins or hand written forms
thereof (either __builtin_*_overflow or even that written by hand) for
just 2 limbs, where the least significant has 0 carry-in and the most
significant ignores carry-out.  The following patch matches that, e.g.
  _16 = .ADD_OVERFLOW (_1, _2);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _15 = _3 + _4;
  _12 = _15 + _18;
into
  _16 = .UADDC (_1, _2, 0);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _19 = .UADDC (_3, _4, _18);
  _12 = IMAGPART_EXPR <_19>;
so that we can emit better code.

As the 2 later comments show, we must do that carefully, because the
pass walks the IL from first to last stmt in a bb and we must avoid
pattern matching this way something that should be matched on a later
instruction differently.

2023-08-29  Jakub Jelinek  

PR middle-end/79173
PR middle-end/111209
* tree-ssa-math-opts.cc (match_uaddc_usubc): Match also
just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored
carry-out on higher limb.  Don't match it though if it could be
matched later on 4 argument addition/subtraction.

* gcc.target/i386/pr79173-12.c: New test.

[Bug rtl-optimization/110034] The first popped allocno doesn't take precedence over later popped in ira coloring

2023-08-29 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110034

--- Comment #5 from HaoChen Gui  ---
(In reply to Vladimir Makarov from comment #4)
> Thank you for providing the test case.
> 
> To be honest I don't see why assigning to hr3 to r134 is better.
> Currently we have the following assignments:
> 
> hr9->r134; hr3->r173; hr3->r124
> 
> and the related preferences:
> 
>   cp11:a18(r134)<->a29(r173)@125:shuffle
>   pref3:a29(r173)<-hr3@2000
>   pref4:a0(r124)<-hr3@125
> 
> This removes cost 2000 (pref3) and cost 125 (pref4) and adds cost 125
> (cp11).  The profit is 2000
> 
> If we started with r173, we would have the following assignments:
> 
> hr3->r173; hr3->r134; ->r124
> 
> This would remove cost 2000 (pref3) and cost 125 (cp11) and add cost
> 125 (pref).  The profit would be the same 2000.
> 
> Choice of heuristics is very time consuming.  I spent a lot of time to
> try and benchmark numerous ones.  I clearly remember that introduction
> of pseudo threads for colorable bucket gave visible performance
> improvement.  Currently we assign pseudos from a thread with the
> biggest frequency first (r173 and r134) and a pseudo (r134) with the
> biggest frequency first from the same thread.  I think it is logical.
> 
> Also it is always possible to find a test (not this case) where
> heuristics give some undesirable results.  RA is NP-complete task even
> in the simplest formulation. We can not get the optimal solution for
> reasonable time.
> 
> Still I am open to change any heuristic if somebody can show that it
> improves performance for some credible benchmark (I prefer SPEC2007)
> on major GCC targets.

Thanks for your explanation. I agree with it. I also checked the assembly and
found there is no potential performance gain when r3 is assigned to r134. It
should not be a bug.

[Bug middle-end/110973] 9% 444.namd regression between g:c2a447d840476dbd (2023-08-03 18:47) and g:73da34a538ddc2ad (2023-08-09 20:17)

2023-08-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110973

--- Comment #5 from Jan Hubicka  ---
Note that some (not all?) namd scores seem to be back to pre-regression levels
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=798.120.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=791.120.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=299.120.0
between 2a0b19f52596d75b (2023-08-07 00:16) and b0894a12e9e04dea (2023-08-10
13:29)

[Bug middle-end/110973] 9% 444.namd regression between g:c2a447d840476dbd (2023-08-03 18:47) and g:73da34a538ddc2ad (2023-08-09 20:17)

2023-08-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110973

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #6 from Richard Biener  ---
Likely r14-3440-ge80f7c13f64e10 fixed it.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2023-08-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 110973, which changed state.

Bug 110973 Summary: 9% 444.namd regression between g:c2a447d840476dbd 
(2023-08-03 18:47) and g:73da34a538ddc2ad (2023-08-09 20:17)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110973

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/111221] Floating point handling a*1.0 vs. a+0.0

2023-08-29 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111221

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #3 from Xi Ruoyao  ---
FWIW if you want "add0" to be optimized use -fno-signed-zeros.
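
The reason `a + 0.0` cannot be folded to `a` by default, while `a * 1.0` can, shows up with a negative zero input. A minimal sketch, assuming IEEE 754 arithmetic in the default round-to-nearest mode and a build without -ffast-math or -fno-signed-zeros; `mul1` and `check_signed_zero` are hypothetical names, `add0` follows the name used in the comment above:

```c
#include <assert.h>
#include <math.h>

static double add0(double a) { return a + 0.0; }
static double mul1(double a) { return a * 1.0; }

static void check_signed_zero(void)
{
  /* volatile keeps the value out of compile-time constant folding. */
  volatile double neg_zero = -0.0;

  /* -0.0 * 1.0 is still -0.0, so a * 1.0 == a for every a. */
  assert(signbit(mul1(neg_zero)) != 0);

  /* -0.0 + 0.0 is +0.0, so a + 0.0 differs from a in this one case. */
  assert(signbit(add0(neg_zero)) == 0);
}
```

With -fno-signed-zeros the compiler is allowed to ignore that difference, which is what permits the simplification.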

[Bug middle-end/110973] 9% 444.namd regression between g:c2a447d840476dbd (2023-08-03 18:47) and g:73da34a538ddc2ad (2023-08-09 20:17)

2023-08-29 Thread fkastl at suse dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110973

--- Comment #7 from Filip Kastl  ---
Not all measurements are back to pre-regression. The Ofast zen3 generic score
that Martin mentioned
(https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=466.120.0) is still
higher than before.

[Bug ipa/111157] [14 Regression] 416.gamess fails with a run-time abort when compiled with -O2 -flto after r14-3226-gd073e2d75d9ed4

2023-08-29 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111157

--- Comment #8 from Jan Hubicka  ---
> This is what I wanted to ask about.  Looking at the dumps, ipa-modref
> knows it is "killed."  Is that enough or does it need to be also not
> read to be known to be useless?

The killed info means that the data does not need to be stored before
function call (since it will always be overwritten before reading).
So indeed that is what breaks with ipa-cp/FRE transform now.

[Bug c++/111224] New: modules: xtreme-header-1_a.H etc. ICE (in core_vals, at cp/module.cc:6108) on AArch64

2023-08-29 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111224

Bug ID: 111224
   Summary: modules: xtreme-header-1_a.H etc. ICE (in core_vals,
at cp/module.cc:6108) on AArch64
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xry111 at gcc dot gnu.org
  Target Milestone: ---

FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++17 (internal compiler error:
in core_vals, at cp/module.cc:6108)
FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-1_a.H module-cmi 
(gcm.cache/$srcdir/g++.dg/modules/xtreme-header-1_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header-1_b.C -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-1_c.C -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++2a (internal compiler error:
in core_vals, at cp/module.cc:6108)
FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-1_a.H module-cmi 
(gcm.cache/$srcdir/g++.dg/modules/xtreme-header-1_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header-1_b.C -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-1_c.C -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++2b (internal compiler error:
in core_vals, at cp/module.cc:6108)
FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-1_a.H module-cmi 
(gcm.cache/$srcdir/g++.dg/modules/xtreme-header-1_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header-1_b.C -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-1_c.C -std=c++2b (test for excess errors)

[Bug c++/111224] modules: xtreme-header-1_a.H etc. ICE (in core_vals, at cp/module.cc:6108) on AArch64

2023-08-29 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111224

--- Comment #1 from Xi Ruoyao  ---
The stack trace in g++.log:

/home/xry111/git-repos/gcc/gcc/testsuite/g++.dg/modules/xtreme-header-1_a.H:
internal compiler error: in core_vals, at cp/module.cc:6108
0x9563a3 trees_out::core_vals(tree_node*)
../../gcc/gcc/cp/module.cc:6108
0x95a5af trees_out::tree_node_vals(tree_node*)
../../gcc/gcc/cp/module.cc:7218
0x95a5af trees_out::tree_value(tree_node*)
../../gcc/gcc/cp/module.cc:9083
0x954463 trees_out::tree_node(tree_node*)
../../gcc/gcc/cp/module.cc:9281
0x955423 trees_out::core_vals(tree_node*)
../../gcc/gcc/cp/module.cc:5998
0x958cbb trees_out::tree_node_vals(tree_node*)
../../gcc/gcc/cp/module.cc:7218
0x958cbb trees_out::fn_parms_init(tree_node*)
../../gcc/gcc/cp/module.cc:10189
0x952233 trees_out::decl_value(tree_node*, depset*)
../../gcc/gcc/cp/module.cc:7782
0x95d017 depset::hash::find_dependencies(module_state*)
../../gcc/gcc/cp/module.cc:13328
0x95df5b module_state::write_begin(elf_out*, cpp_reader*, module_state_config&,
unsigned int&)
../../gcc/gcc/cp/module.cc:17895
0x95f1ef finish_module_processing(cpp_reader*)
../../gcc/gcc/cp/module.cc:20237
0x8c8feb c_parse_final_cleanups()
../../gcc/gcc/cp/decl2.cc:5184
0xb95a0f c_common_parse_file()
../../gcc/gcc/c-family/c-opts.cc:1275

[Bug c++/111224] modules: xtreme-header-1_a.H etc. ICE (in core_vals, at cp/module.cc:6108) on AArch64

2023-08-29 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111224

--- Comment #2 from Xi Ruoyao  ---
It seems to be related to the Glibc version.

[Bug libstdc++/111162] signed integer overflow triggered by std::chrono::parse

2023-08-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111162

--- Comment #1 from Jonathan Wakely  ---
Testing a patch ...

[Bug target/111119] maskload and maskstore for integer modes are oddly conditional on AVX2

2023-08-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111119

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |14.0
 Resolution|--- |FIXED

--- Comment #6 from Richard Biener  ---
.

[Bug target/85919] Incomplete transition to IFNs for scatter/gather support, drop vectorize.builtin_{gather,scatter} target hooks

2023-08-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85919

--- Comment #2 from Richard Biener  ---
See long thread at
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577577.html for the
attempt to fix this and how it failed.

[Bug c++/111224] modules: xtreme-header-1_a.H etc. ICE (in core_vals, at cp/module.cc:6108) on AArch64

2023-08-29 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111224

Xi Ruoyao  changed:

   What|Removed |Added

   Keywords|needs-reduction |ice-on-valid-code

--- Comment #3 from Xi Ruoyao  ---
Reduced:

$ cat t.ii
typedef __SVBool_t __sv_bool_t;
void _ZGVsMxv_sin(__sv_bool_t);
$ ./git-repos/gcc-build/gcc/cc1plus t.ii -fmodule-header
 creating:./t.ii
t.ii: internal compiler error: in core_vals, at cp/module.cc:6108
0x800a27 trees_out::core_vals(tree_node*)
../../gcc/gcc/cp/module.cc:6108
0x803c93 trees_out::tree_node_vals(tree_node*)
../../gcc/gcc/cp/module.cc:7216
0x803c93 trees_out::tree_value(tree_node*)
../../gcc/gcc/cp/module.cc:9081
0x7ff4d3 trees_out::tree_node(tree_node*)
../../gcc/gcc/cp/module.cc:9279
0x7ffe6f trees_out::core_vals(tree_node*)
../../gcc/gcc/cp/module.cc:5998
0x8019ef trees_out::tree_node_vals(tree_node*)
../../gcc/gcc/cp/module.cc:7216
0x8019ef trees_out::fn_parms_init(tree_node*)
../../gcc/gcc/cp/module.cc:10187
0x80421b trees_out::decl_value(tree_node*, depset*)
../../gcc/gcc/cp/module.cc:7780
0x804c67 depset::hash::find_dependencies(module_state*)
../../gcc/gcc/cp/module.cc:13326
0x80515f module_state::write_begin(elf_out*, cpp_reader*, module_state_config&,
unsigned int&)
../../gcc/gcc/cp/module.cc:17893
0x80619f finish_module_processing(cpp_reader*)
../../gcc/gcc/cp/module.cc:20235
0x7aaf6b c_parse_final_cleanups()
../../gcc/gcc/cp/decl2.cc:5183
0x966a73 c_common_parse_file()
../../gcc/gcc/c-family/c-opts.cc:1266
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug c++/111224] modules: xtreme-header-1_a.H etc. ICE (in core_vals, at cp/module.cc:6108) on AArch64

2023-08-29 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111224

Xi Ruoyao  changed:

   What|Removed |Added

  Known to fail||11.1.0, 11.4.0, 12.3.0

--- Comment #4 from Xi Ruoyao  ---
Not a regression.

[Bug tree-optimization/111015] [11/12/13/14 Regression] __int128 bitfields optimized incorrectly to the 64 bit operations

2023-08-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111015

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug target/111225] New: ICE in curr_insn_transform, unable to generate reloads for xor, since r14-2447-g13c556d6ae84be

2023-08-29 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225

Bug ID: 111225
   Summary: ICE in curr_insn_transform, unable to generate reloads
for xor, since r14-2447-g13c556d6ae84be
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jamborm at gcc dot gnu.org
CC: crazylht at gmail dot com, fkastl at suse dot cz
  Target Milestone: ---
  Host: x86_64-linux
Target: x86_64-linux

On GCC 14 development master, since revision r14-2447-g13c556d6ae84be, I get
an ICE when compiling our testcase
gcc/testsuite/gcc.target/i386/avx-vptest-5.c with options -fsanitize=thread
-O1 -mforce-drap -mavx512cd:

$ ~/gcc/small/inst/bin/gcc
/home/mjambor/gcc/trunk/src/gcc/testsuite/gcc.target/i386/avx-vptest-5.c
-fsanitize=thread -O1 -mforce-drap -mavx512cd
/home/mjambor/gcc/trunk/src/gcc/testsuite/gcc.target/i386/avx-vptest-5.c: In
function ‘foo’:
/home/mjambor/gcc/trunk/src/gcc/testsuite/gcc.target/i386/avx-vptest-5.c:10:1:
error: unable to generate reloads for:
   10 | }
  | ^
(insn 11 10 12 2 (set (reg:V4DI 91)
(xor:V4DI (mem/c:V4DI (plus:DI (reg/f:DI 19 frame)
(const_int -80 [0xffffffffffffffb0])) [1 %sfp+-64 S32
A256])
(const_vector:V4DI [
(const_int -1 [0xffffffffffffffff]) repeated x4
])))
"/home/mjambor/gcc/trunk/src/gcc/testsuite/gcc.target/i386/avx-vptest-5.c":8:19
6853 {*one_cmplv4di2}   
 (expr_list:REG_DEAD (reg/v:V4DI 89 [ y ])
(nil)))
during RTL pass: reload
/home/mjambor/gcc/trunk/src/gcc/testsuite/gcc.target/i386/avx-vptest-5.c:10:1:
internal compiler error: in curr_insn_transform, at lra-constraints.cc:4259
0x7cd910 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/home/mjambor/gcc/small/src/gcc/rtl-error.cc:108
0x7a42c0 curr_insn_transform
/home/mjambor/gcc/small/src/gcc/lra-constraints.cc:4259
0xdcda7e lra_constraints(bool)
/home/mjambor/gcc/small/src/gcc/lra-constraints.cc:5430
0xdb99c2 lra(_IO_FILE*)
/home/mjambor/gcc/small/src/gcc/lra.cc:2396
0xd715f1 do_reload
/home/mjambor/gcc/small/src/gcc/ira.cc:5967
0xd715f1 execute
/home/mjambor/gcc/small/src/gcc/ira.cc:6153
Please submit a full bug report, with preprocessed source (by using
-freport-bug).


As noted earlier, I have bisected this down to:

commit 13c556d6ae84be3ee2bc245a56eafa58221de86a (HEAD)
Author: liuhongt 
Date:   Thu Jun 29 14:25:28 2023 +0800

Break false dependence for vpternlog by inserting vpxor or setting
constraint of input operand to '0'

False dependency happens when destination is only updated by
pternlog. There is no false dependency when destination is also used
in source. So either a pxor should be inserted, or input operand
should be set with constraint '0'.

gcc/ChangeLog:

PR target/110438
PR target/110202
* config/i386/predicates.md
(int_float_vector_all_ones_operand): New predicate.
* config/i386/sse.md (*vmov_constm1_pternlog_false_dep): New
define_insn.
(*_cvtmask2_pternlog_false_dep):
Ditto.
(*_cvtmask2_pternlog_false_dep):
Ditto.
(*_cvtmask2): Adjust to
define_insn_and_split to avoid false dependence.
(*_cvtmask2): Ditto.
(one_cmpl2): Adjust constraint
of operands 1 to '0' to avoid false dependence.
(*andnot3): Ditto.
(iornot3): Ditto.
(*3): Ditto.

[Bug c++/111226] New: constexpr doesn't detect change of union to empty member

2023-08-29 Thread nathanieloshead at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111226

Bug ID: 111226
   Summary: constexpr doesn't detect change of union to empty
member
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nathanieloshead at gmail dot com
  Target Milestone: ---

While working on a patch for PR101631, I found that the following code is
currently incorrectly handled by GCC: (https://godbolt.org/z/1YevacMK3)


struct Empty {};

union U {
  int x;
  Empty e;
};

constexpr int foo() {
  U u{ 10 };
  u.e = {};
  return u.x;  // incorrectly accepted, even pre-C++20
}
constexpr auto y = foo();

constexpr Empty bar() {
  U u{ 10 };
  u.e = {};
  return u.e;  // incorrectly errors, thinks active member is still 'x'
}
constexpr auto x = bar();


The cause seems to be that the zero-sized trivial assignment is removed in
call.cc (since PR43075) and after constant folding is no longer in the tree
that the constexpr handling machinery receives.

[Bug target/102957] [riscv64] ICE on bogus -march value

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102957

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Tsukasa OI :

https://gcc.gnu.org/g:8b0662254cdac3e0b670c1c54752e1d43113b0f4

commit r14-3544-g8b0662254cdac3e0b670c1c54752e1d43113b0f4
Author: Tsukasa OI 
Date:   Fri Aug 11 06:09:34 2023 +0000

RISC-V: Make PR 102957 tests more comprehensive

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic
messages in testsuite") "fixed" test failures caused by that change
(on pr102957.c, by testing the error message after the first change).

However, the latter change will partially break the original intent of PR
102957 test case because we wanted to make sure that we can parse a valid
two-letter extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit adds pr102957-2.c to make sure that there will be no errors if
we parse a valid two-letter extension name.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957-2.c: New test case using the 'Zk'
extension to continue testing whether we can use valid two-letter
extensions.

[Bug tree-optimization/111015] [11/12/13/14 Regression] __int128 bitfields optimized incorrectly to the 64 bit operations

2023-08-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111015

--- Comment #5 from Jakub Jelinek  ---
Created attachment 55811
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55811&action=edit
gcc14-pr111015.patch

Untested fix.

[Bug middle-end/110973] 9% 444.namd regression between g:c2a447d840476dbd (2023-08-03 18:47) and g:73da34a538ddc2ad (2023-08-09 20:17)

2023-08-29 Thread fkastl at suse dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110973

--- Comment #8 from Filip Kastl  ---
(In reply to Filip Kastl from comment #7)
> Not all measurements are back to pre-regression. The Ofast zen3 generic
> score that Martin mentioned
> (https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=466.120.0) is
> still higher than before.

There's also these two:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=789.120.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=300.120.0

And these benchmarks haven't run yet since the commit that Richard thinks fixes
the regression:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=785.120.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=790.120.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=288.120.0

[Bug analyzer/105899] RFE: -fanalyzer could complain about misuses of standard C string APIs

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105899

--- Comment #12 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:f687fc1ff6d4a44db87a35e9e3be7f20425bdacc

commit r14-3549-gf687fc1ff6d4a44db87a35e9e3be7f20425bdacc
Author: David Malcolm 
Date:   Tue Aug 29 10:57:42 2023 -0400

analyzer: improve strdup handling [PR105899]

gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf.cc (kf_strdup::impl_call_pre): Set size of
dynamically-allocated buffer.  Simulate copying the string from
the source region to the new buffer.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* c-c++-common/analyzer/pr99193-2.c: Add
-Wno-analyzer-too-complex.
* gcc.dg/analyzer/strdup-1.c: Include "analyzer-decls.h".
(test_concrete_strlen): New.
(test_symbolic_strlen): New.

Signed-off-by: David Malcolm 

[Bug target/93448] PPC: missing builtin for DFP quantize(dqua,dquai,dquaq,dquaiq)

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93448

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Carl Love :

https://gcc.gnu.org/g:14a3839c63d550957556d70e824a8293938646e6

commit r14-3550-g14a3839c63d550957556d70e824a8293938646e6
Author: Carl Love 
Date:   Tue Aug 29 11:19:40 2023 -0400

rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int RM)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md (UNSPEC_DQUAN): New unspec.
(dfp_dqua_, dfp_dquai_): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448

[Bug tree-optimization/110914] [11/12/13/14 Regression] Optimization eliminating necessary assignment before 0-byte memcpy since r10-5451

2023-08-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110914

Jakub Jelinek  changed:

           What    |Removed                     |Added

            Summary|[11/12/13/14 Regression]    |[11/12/13/14 Regression]
                   |Optimization eliminating    |Optimization eliminating
                   |necessary assignment before |necessary assignment before
                   |0-byte memcpy               |0-byte memcpy since
                   |                            |r10-5451
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Started with r10-5451-gef29b12cfbb4979a89b3c

C testcase for -O2:
__attribute__ ((noipa)) int
foo (const char *s, unsigned long l)
{
  unsigned char r = 0;
  __builtin_memcpy (&r, s, l != 0);
  return r;
}

int
main ()
{
  const char *p = "123456";
  int a = foo (p, __builtin_strlen (p) - 5);
  int b = foo (p, __builtin_strlen (p) - 6);
  if (a != '1')
__builtin_abort ();
  if (b != 0)
__builtin_abort ();
}

[Bug target/93448] PPC: missing builtin for DFP quantize(dqua,dquai,dquaq,dquaiq)

2023-08-29 Thread carll at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93448

Carl Love  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
 CC||carll at gcc dot gnu.org

--- Comment #7 from Carl Love  ---
The patch to add built-ins for DFP quantize has been committed.

[Bug target/93448] PPC: missing builtin for DFP quantize(dqua,dquai,dquaq,dquaiq)

2023-08-29 Thread carll at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93448

--- Comment #8 from Carl Love  ---
Status updated to resolved, fixed.

[Bug gcov-profile/110827] C++20 coroutines aren't being measured by gcov

2023-08-29 Thread mwd at md5i dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110827

Michael Duggan  changed:

   What|Removed |Added

  Attachment #55648|0   |1
is obsolete||

--- Comment #7 from Michael Duggan  ---
Created attachment 55812
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55812&action=edit
Better example

This example has both a coroutine and a non-coroutine with the same contents. 
Rather than using cout, it calls an empty function that is designed to not be
optimized away.

[Bug gcov-profile/110827] C++20 coroutines aren't being measured by gcov

2023-08-29 Thread mwd at md5i dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110827

--- Comment #8 from Michael Duggan  ---
Using the better test case, I have determined that the coroutine _is_ being
instrumented with gcov counters.  When disassembled, the output contains the
following in the bar() actor function:

Dump of assembler code for function bar(_Z3barv.Frame *):
...
   0x67a3 <+527>:   mov    0x5286(%rip),%rax        # 0xba30 <__gcov0._Z3barPZ3barvE13_Z3barv.Frame.actor+112>
   0x67aa <+534>:   add    $0x1,%rax
   0x67ae <+538>:   mov    %rax,0x527b(%rip)        # 0xba30 <__gcov0._Z3barPZ3barvE13_Z3barv.Frame.actor+112>
=> 0x67b5 <+545>:   call   0x6359 <_Z5emptyv>
   0x67ba <+550>:   mov    0x5277(%rip),%rax        # 0xba38 <__gcov0._Z3barPZ3barvE13_Z3barv.Frame.actor+120>
   0x67c1 <+557>:   add    $0x1,%rax
   0x67c5 <+561>:   mov    %rax,0x526c(%rip)        # 0xba38 <__gcov0._Z3barPZ3barvE13_Z3barv.Frame.actor+120>
   0x67cc <+568>:   call   0x6359 <_Z5emptyv>
   0x67d1 <+573>:   mov    0x5268(%rip),%rax        # 0xba40 <__gcov0._Z3barPZ3barvE13_Z3barv.Frame.actor+128>
   0x67d8 <+580>:   add    $0x1,%rax
   0x67dc <+584>:   mov    %rax,0x525d(%rip)        # 0xba40 <__gcov0._Z3barPZ3barvE13_Z3barv.Frame.actor+128>
   0x67e3 <+591>:   call   0x6359 <_Z5emptyv>


Therefore, the problem probably lies either in the mapping from the counters
to the line numbers or in gcov itself, possibly by missing the "actor" version
of bar in favor of the ramp function.

I'll note the following entries in the symbol table, from readelf:

    36: 7aa0    96 OBJECT  LOCAL  DEFAULT   27 __gcov0._Z3barv
    37: 2594   943 FUNC    LOCAL  DEFAULT   15 bar(bar()::_Z3barv.Frame*) [clone .actor]
    38: 2943    83 FUNC    LOCAL  DEFAULT   15 bar(bar()::_Z3barv.Frame*) [clone .destroy]
    39: 79c0   216 OBJECT  LOCAL  DEFAULT   27 __gcov0._Z3barPZ3barvE13_Z3barv.Frame.actor
    40: 79b0    16 OBJECT  LOCAL  DEFAULT   27 __gcov0._Z3barPZ3barvE13_Z3barv.Frame.destroy
   107: 23e2   434 FUNC    GLOBAL DEFAULT   15 bar()

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2023-08-29 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #38 from Peter Bergner  ---
(In reply to Jakub Jelinek from comment #37)
> What happened with this patch?

It looks like David approved Roger's patch here:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586813.html

...but it was never committed upstream.

[Bug fortran/111218] Conflict in BIND(C) INTERFACEs in two Modules leads to ICE.

2023-08-29 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111218

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords|ice-on-invalid-code |ice-on-valid-code
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-08-29
 Ever confirmed|0   |1

--- Comment #2 from anlauf at gcc dot gnu.org ---
I get an ICE for all gcc >= 11 here, and I think the code is actually valid.
(Other compilers, like NAG, Intel, NVidia accept it).

Might be a regression, as my installs of 7 <= gcc <= 10 all pass.

[Bug fortran/111218] Conflict in BIND(C) INTERFACEs in two Modules leads to ICE.

2023-08-29 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111218

--- Comment #3 from anlauf at gcc dot gnu.org ---
A workaround is to add a 'private' statement to either of the first two modules.

[Bug tree-optimization/110914] [11/12/13/14 Regression] Optimization eliminating necessary assignment before 0-byte memcpy since r10-5451

2023-08-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110914

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Created attachment 55813
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55813&action=edit
gcc14-pr110914.patch

Untested fix.

The bogus change in the above mentioned commit was the:
   if (olddsi != NULL
-  && tree_fits_uhwi_p (len)
   && !integer_zerop (len))
-adjust_last_stmt (olddsi, stmt, false);
+{
+  maybe_warn_overflow (stmt, len, rvals, olddsi, false, true);
+  adjust_last_stmt (olddsi, stmt, false);
+}
part.  I haven't analyzed what exactly maybe_warn_overflow does, it is some
warning stuff and perhaps can be called when len is not constant, but the
previous guarding of adjust_last_stmt was completely intentional.
As adjust_last_stmt function comment says:
If the last .MEM setter statement before STMT is
memcpy (x, y, strlen (y) + 1), the only .MEM use of it is STMT
and STMT is known to overwrite x[strlen (x)], adjust the last memcpy to
just memcpy (x, y, strlen (y)).  SI must be the zero length
strinfo.
so obviously the fact that the (second) memcpy doesn't have a constant 0 as its
last argument doesn't mean that it is a non-zero length memcpy; we only know
that if the length is a non-zero constant, or a variable which, say,
value-range can prove is not zero.  We have an adjust_last_stmt call later in
the function which handles a length of strlen (x) + 1 though.  So, this patch
just reverts the guard of that function back to what it was before.
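
To make the adjust_last_stmt comment concrete, here is a minimal C sketch of
the source-level pattern it is about.  This is not GCC code: both function
names are made up for illustration, and the second function only models by
hand what the adjusted statement sequence would compute.  It shows why the
adjustment is only safe when the following copy is known to have non-zero
length.

```c
#include <string.h>

/* Hypothetical helper: copy y into x, then append n bytes of z.
   The first memcpy copies strlen (y) + 1 bytes, i.e. it includes the
   terminating NUL.  */
void
copy_then_append (char *x, const char *y, const char *z, size_t n)
{
  size_t ylen = strlen (y);
  memcpy (x, y, ylen + 1);
  memcpy (x + ylen, z, n);	/* Overwrites x[ylen] only if n != 0.  */
  if (n != 0)
    x[ylen + n] = '\0';
}

/* Hand-written model of the adjusted form: the first copy no longer
   writes the NUL.  Equivalent to the above only when n != 0.  */
void
copy_then_append_adjusted (char *x, const char *y, const char *z, size_t n)
{
  size_t ylen = strlen (y);
  memcpy (x, y, ylen);		/* NUL dropped.  */
  memcpy (x + ylen, z, n);
  if (n != 0)
    x[ylen + n] = '\0';
}
```

With n == 0 the adjusted variant never writes x[strlen (y)], so the buffer is
left unterminated; that mirrors the dropped store in the testcase from
comment #3.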

[Bug tree-optimization/111149] bool0 != bool1 should be expanded as bool0 ^ bool1

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49

--- Comment #3 from Andrew Pinski  ---
Just a few notes here.
canonicalize_cond_expr_cond does handle `a ^ b` .

gimple_cond_get_ops_from_tree/is_gimple_condexpr_1 does not but they do handle
TRUTH_NOT_EXPR which is shocking because that can only come from fold ...

So forwprop and ifcombine all handle this correctly (since they use
canonicalize_cond_expr_cond) but others don't.
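
For reference, the equivalence behind this PR can be checked exhaustively at
the source level.  The following is only a sketch in plain C (the function
name is made up); it says nothing about how the GIMPLE passes mentioned above
handle the two forms.

```c
#include <stdbool.h>

/* Returns 1 if `a != b` and `a ^ b` agree for every pair of bool
   values, 0 otherwise.  */
int
bool_ne_matches_xor (void)
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      {
	bool a = i, b = j;
	if ((a != b) != (bool) (a ^ b))
	  return 0;
      }
  return 1;
}
```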

[Bug rtl-optimization/110093] [12/13/14 Regression][avr] Move frenzy leading to code bloat

2023-08-29 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093

--- Comment #3 from Vladimir Makarov  ---
I worked on the avr issues for quite some time.  Here are my findings.
Before IRA we have start of BB2:

;; lr  in  14 [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20]
21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] 34 [argL]
44 45 46

   33: r51:QI=r22:QI
   REG_DEAD r22:QI
   34: r52:QI=r23:QI
  REG_DEAD r23:QI
   35: r53:QI=r24:QI
  REG_DEAD r24:QI
   36: r54:QI=r25:QI
  REG_DEAD r25:QI
   37: r44:SI#0=r51:QI
  REG_DEAD r51:QI
   38: r44:SI#1=r52:QI
  REG_DEAD r52:QI
   39: r44:SI#2=r53:QI
  REG_DEAD r53:QI
   40: r44:SI#3=r54:QI
  REG_DEAD r54:QI

According to GCC, pseudo r44 conflicts with r51, r52 ...  In reality it
does not.  I could modify BB live analysis in IRA although it is a lot of
work.

But there is a bigger problem. A lot of passes including IRA use the
data-flow analysis framework for global life analysis and it does not
work on subreg level.  You can see that r44 still lives (lr in) at the
beginning of BB2.  DFA is not my responsibility but I can say
modifying DFA this way is a huge project as it will affect a lot of
targets.

Instead, as AVR regs are very small, I propose to avoid the above RTL
code by switching off subreg3 pass (or -fsplit-wide-types) for AVR by
default as it was for gcc-8.

There is still one minor problem: an additional reg-reg move generation for the
test case in comparison with gcc-8.  I'll try to fix it.

[Bug tree-optimization/111149] bool0 != bool1 should be expanded as bool0 ^ bool1

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49

--- Comment #4 from Andrew Pinski  ---
This solves part of `bug 110637` too.

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2023-08-29 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #39 from Roger Sayle  ---
My apologies for dropping the ball on this patch (series)... My only access to
PowerPC hardware is/was via the GCC compile farm, which complicates things.

Shortly after David's approval, Segher enquired whether the patch could be
modified to also handle -mcpu=power10 (which represents carry differently):
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586868.html

Trying to (also) address this then opened up a rabbit hole/can of worms
related to how middle-end (and rs6000.md) represents overflow, which included a
combine patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586572.html

Soon after, GCC entered stage 4 (or stage 3), and the above patches (and an
unsubmitted one for power10) simply got lost in the backlog.  I believe this
patch is sound, but unfortunately I don't have the bandwidth/patience to
(re)check it against mainline on (multiple variants of) rs6000.

If one of the IBM folks could take it from here, that'd be much appreciated.

[Bug testsuite/111228] New: [14 regression] gcc.target/powerpc/vsx-extract-6.c fails after r14-3381-g27de9aa152141e

2023-08-29 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111228

Bug ID: 111228
   Summary: [14 regression] gcc.target/powerpc/vsx-extract-6.c
fails after r14-3381-g27de9aa152141e
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:27de9aa152141e7f3ee66372647d0f2cd94c4b90, r14-3381-g27de9aa152141e
make  -k check-gcc
RUNTESTFLAGS="powerpc.exp=gcc.target/powerpc/vsx-extract-6.c"
FAIL: gcc.target/powerpc/vsx-extract-6.c scan-assembler-times \\mxxpermdi\\M 1
FAIL: gcc.target/powerpc/vsx-extract-6.c scan-assembler-not \\mxxspltib\\M
# of expected passes            10
# of unexpected failures        2

Also

FAIL: gcc.target/powerpc/vsx-extract-7.c scan-assembler-times \\mxxpermdi\\M 1
FAIL: gcc.target/powerpc/vsx-extract-7.c scan-assembler-not \\mxxspltib\\M


commit 27de9aa152141e7f3ee66372647d0f2cd94c4b90 (HEAD, refs/bisect/bad)
Author: Richard Biener 
Date:   Wed Jul 12 15:01:47 2023 +0200

tree-optimization/94864 - vector insert of vector extract simplification

[Bug analyzer/111229] New: -fanalyzer confused about conditional operator branch name

2023-08-29 Thread medhefgo at web dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111229

Bug ID: 111229
   Summary: -fanalyzer confused about conditional operator branch
name
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: medhefgo at web dot de
  Target Milestone: ---

In the following test case, the analyzer has the conditional operator's
branches confused: It thinks the "true" branch is "false" and vice-versa.

$ cat test.c
char test_true() {
  char *c, v = 1;
  if (v) {
    c = (char *)0;
  } else {
    c = "a";
  }
  return *c;
}
char test_false() {
  char *c, v = 1;
  c = v ? (char *)0 : "a";
  return *c;
}

$ gcc -o/dev/null -c test.c -fanalyzer 
test.c: In function ‘test_true’:
test.c:8:10: warning: dereference of NULL ‘c’ [CWE-476]
[-Wanalyzer-null-dereference]
8 |   return *c;
  |  ^~
  ‘test_true’: events 1-4
|
|3 |   if (v) {
|  |  ^
|  |  |
|  |  (1) following ‘true’ branch (when ‘v != 0’)...
|4 | c = (char *)0;
|  | ~
|  |   |
|  |   (2) ...to here
|  |   (3) ‘c’ is NULL
|..
|8 |   return *c;
|  |  ~~
|  |  |
|  |  (4) dereference of NULL ‘c’
|
test.c: In function ‘test_false’:
test.c:13:10: warning: dereference of NULL ‘c’ [CWE-476]
[-Wanalyzer-null-dereference]
   13 |   return *c;
  |  ^~
  ‘test_false’: events 1-5
|
|   12 |   c = v ? (char *)0 : "a";
|  |   ~~^
|  | |   |
|  | |   (1) following ‘false’ branch (when ‘v !=
0’)...
|  | |   (2) ...to here
|  | |   (3) ‘0’ is NULL
|  | (4) ‘c’ is NULL
|   13 |   return *c;
|  |  ~~  
|  |  |
|  |  (5) dereference of NULL ‘c’
|

https://godbolt.org/z/hTvn9PMb4

[Bug target/106562] PRU: Inefficient code for zero check of 64-bit (boolean) AND result

2023-08-29 Thread dimitar at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106562

--- Comment #6 from Dimitar Dimitrov  ---
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599276.html gives a good
analysis why deferring expansion decisions to the backend is preferred.

Most backends already define cstore patterns, so it would not be valuable to
add generic code in emit_store_flag_int() as a fallback if cstore expansion
fails. Such a fallback would simply not be utilized on most architectures.

Hence I intend to add a cstore pattern for PRU as a non-intrusive fix for this
PR.
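
As a source-level illustration of the operation in question, the sketch below
shows a store-flag of a 64-bit AND result next to a hand-split variant that
works on 32-bit halves.  The function names are hypothetical and the second
function is only an assumption about what a good expansion on a 32-bit target
could compute; it is not the PRU cstore pattern.

```c
#include <stdint.h>

/* Straightforward form: store-flag of a 64-bit AND result.  */
int
and64_nonzero (uint64_t a, uint64_t b)
{
  return (a & b) != 0;
}

/* Hand-split form: AND each 32-bit half, OR the partial results and
   do a single comparison against zero.  */
int
and64_nonzero_split (uint64_t a, uint64_t b)
{
  uint32_t lo = (uint32_t) a & (uint32_t) b;
  uint32_t hi = (uint32_t) (a >> 32) & (uint32_t) (b >> 32);
  return (lo | hi) != 0;
}
```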

[Bug libstdc++/104167] Implement C++20 std::chrono::utc_clock, std::chrono::tzdb etc.

2023-08-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104167

--- Comment #11 from Jonathan Wakely  ---
My best guess is that including <chrono> in those files causes a dependency on
std::chrono::tzdb::current_zone() which depends on
std::filesystem::read_symlink, which will pull in the symbols in
src/c++17/fs_ops.cc

Does arm-eabi build libstdc++.a with -ffunction-sections?

The tests should be built with -Wl,--gc-sections which combined with
-ffunction-sections should mean that the tests do not pull in symbols they
don't need.

I would *really* prefer not to have to split src/c++20/tzdb.cc and
src/c++17/fs_*.cc into dozens of separate files.

Please file a separate bug for these failures.

[Bug c++/111230] New: show explicit functions in possible candidates

2023-08-29 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111230

Bug ID: 111230
   Summary: show explicit functions in possible candidates
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

struct T {
   T() { } // #1
   explicit T(const T&) { } // #2
};

void
g ()
{
   T t{};
   throw t;
}

shows

h.C: In function ‘void g()’:
h.C:10:10: error: no matching function for call to ‘T::T(T)’
   10 |throw t;
  |  ^
h.C:2:4: note: candidate: ‘T::T()’
2 |T() { } // #1
  |^
h.C:2:4: note:   candidate expects 0 arguments, 1 provided
h.C:10:10: note:   in thrown expression
   10 |throw t;
  |  ^

but it never mentions #2.

[Bug c++/111230] show explicit functions in possible candidates

2023-08-29 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111230

Marek Polacek  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-08-29
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot 
gnu.org
   Keywords||diagnostic

--- Comment #1 from Marek Polacek  ---
Jason: "in add_candidates when we see an explicit constructor we could add it
to bad_fns instead of ignoring it"

[Bug analyzer/99860] RFE: analyzer does not respect "restrict"

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99860

--- Comment #3 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:034d99e81484fbb83f15da91ee1a744b9301b04f

commit r14-3556-g034d99e81484fbb83f15da91ee1a744b9301b04f
Author: David Malcolm 
Date:   Tue Aug 29 18:12:09 2023 -0400

analyzer: new warning: -Wanalyzer-overlapping-buffers [PR99860]

gcc/ChangeLog:
PR analyzer/99860
* Makefile.in (ANALYZER_OBJS): Add analyzer/ranges.o.

gcc/analyzer/ChangeLog:
PR analyzer/99860
* analyzer-selftests.cc (selftest::run_analyzer_selftests): Call
selftest::analyzer_ranges_cc_tests.
* analyzer-selftests.h (selftest::run_analyzer_selftests): New
decl.
* analyzer.opt (Wanalyzer-overlapping-buffers): New option.
* call-details.cc: Include "analyzer/ranges.h" and "make-unique.h".
(class overlapping_buffers): New.
(call_details::complain_about_overlap): New.
* call-details.h (call_details::complain_about_overlap): New decl.
* kf.cc (kf_memcpy_memmove::impl_call_pre): Call
cd.complain_about_overlap for memcpy and memcpy_chk.
(kf_strcat::impl_call_pre): Call cd.complain_about_overlap.
(kf_strcpy::impl_call_pre): Likewise.
* ranges.cc: New file.
* ranges.h: New file.

gcc/ChangeLog:
PR analyzer/99860
* doc/invoke.texi: Add -Wanalyzer-overlapping-buffers.

gcc/testsuite/ChangeLog:
PR analyzer/99860
* c-c++-common/analyzer/overlapping-buffers.c: New test.

Signed-off-by: David Malcolm 
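
For readers unfamiliar with the problem the new warning targets: memcpy (like
strcpy and strcat) has undefined behavior when source and destination overlap.
The following is a minimal sketch in C of an explicit overlap check; the helper
names are made up, and it does not reflect how the analyzer's ranges.cc
implements its check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if [a, a + n) and [b, b + n) share at
   least one byte.  */
int
buffers_overlap (const void *a, const void *b, size_t n)
{
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  if (n == 0)
    return 0;
  return pa < pb ? pb - pa < n : pa - pb < n;
}

/* Copy n bytes, falling back to memmove when memcpy would be
   undefined because the buffers overlap.  */
void *
copy_bytes (void *dst, const void *src, size_t n)
{
  if (buffers_overlap (dst, src, n))
    return memmove (dst, src, n);
  return memcpy (dst, src, n);
}
```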

[Bug libfortran/111022] ES0.0E0 format gave ES0.dE0 output with d too high.

2023-08-29 Thread john.harper at vuw dot ac.nz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111022

--- Comment #15 from john.harper at vuw dot ac.nz ---
My previous test program tried Ex0.0E0 output but not Ex0.0, where x is 
N,S, or absent. Below is a revised version which includes all 6 cases. 
It also tries EN and ES before trying E, with an error stop if an error
is detected. Below the program is its output from ifort, which I think 
is f2018-compliant, compiled with options -standard-semantics -check all

(All lines of gfortran output after the first appeared to be wrong.)

program testen0es0e0 ! EN0.0,ES0.0 good f2018, E0.0 bad, with E0 or not
   integer,parameter::p1 = kind(1e0), p2 = kind(1d0), &
p3 = selected_real_kind(precision(1.0_p2)+1), &
hp = selected_real_kind(precision(1.0_p3)+1), &
p4 = merge(hp,p3,hp>0) ! in gfortran p4 /= p3, in ifort p4 == p3
   character:: ens(3)*2=["EN","ES"," E"],e0(2)*2=["  ","E0"], fmt*14, &
msg*200
   integer iens,ie0,ios
   write(*,"(A,4(1X,I0))") 'real kinds',p1,p2,p3,p4
   do iens = 1,3
  do ie0 = 1,2
 fmt = "(A,1X,"//ens(iens)//"0.0"//e0(ie0)//")"
 write(*, fmt,iostat=ios,iomsg=msg) 'With '//fmt, 666.0_p1
 if(ios/=0) error stop trim(msg)
 write(*, fmt) 'With '//fmt, 666.0_p2
 write(*, fmt) 'With '//fmt, 666.0_p3
 if(p3/=p4) write(*, fmt) 'With '//fmt, 666.0_p4
  end do
   end do
end program testen0es0e0

real kinds 4 8 16 16
With (A,1X,EN0.0  ) 666.E+00
With (A,1X,EN0.0  ) 666.E+00
With (A,1X,EN0.0  ) 666.E+00
With (A,1X,EN0.0E0) 666.E+0
With (A,1X,EN0.0E0) 666.E+0
With (A,1X,EN0.0E0) 666.E+0
With (A,1X,ES0.0  ) 7.E+02
With (A,1X,ES0.0  ) 7.E+02
With (A,1X,ES0.0  ) 7.E+02
With (A,1X,ES0.0E0) 7.E+2
With (A,1X,ES0.0E0) 7.E+2
With (A,1X,ES0.0E0) 7.E+2
With (A,1X, E0.0  ) *
output conversion error, unit 6, file /dev/pts/0


-- John Harper, School of Mathematics and Statistics
Victoria Univ. of Wellington, PO Box 600, Wellington 6140, New Zealand.
e-mail john.har...@vuw.ac.nz

[Bug tree-optimization/111149] bool0 != bool1 should be expanded as bool0 ^ bool1

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49

--- Comment #5 from Andrew Pinski  ---
So implementing this breaks forwprop ...
Which I kind of expected.

But we can do the canonicalization inside fold_stmt_1 and that works better.

[Bug libfortran/111022] ES0.0E0 format gave ES0.dE0 output with d too high.

2023-08-29 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111022

--- Comment #16 from Jerry DeLisle  ---
(In reply to john.harper from comment #15)
> My previous test program tried Ex0.0E0 output but not Ex0.0, where x is 
> N,S, or absent. Below is a revised version which includes all 6 cases. 
> It also tries EN and ES before trying E, with an error stop if an error
> is detected. Below the program is its output from ifort, which I think 
> is f2018-compliant, compiled with options -standard-semantics -check all

Thanks John, test cases are always helpful.

[Bug middle-end/99098] invalid/missing -Wfree-nonheap-object warnings

2023-08-29 Thread pross at xvid dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99098

Peter Ross  changed:

   What|Removed |Added

 CC||pross at xvid dot org

--- Comment #2 from Peter Ross  ---
The following test case produces a -Wfree-nonheap-object false positive. I
argue that the memory being free'd is heap memory. It is offset by one to
accommodate the negative offset applied immediately after malloc.

```
#include <stdlib.h>
char * knn_alloc()
{
    char * w = malloc(sizeof(char));
    if (!w)
        return NULL;
    return w - 1;
}
void knn_free(char * w)
{
    free(w + 1);
}
int main()
{
    char * w = knn_alloc();
    if (!w)
        return -1;

    knn_free(w);
    return 0;
}
```

```
$ gcc knn.c -save-temps
knn.c: In function ‘knn_free’:
knn.c:11:5: warning: ‘free’ called on pointer ‘w’ with nonzero offset 1
[-Wfree-nonheap-object]
   11 | free(w + 1);
  | ^~~

```

gcc --version: gcc (Debian 13.2.0-2) 13.2.0
uname -a: Linux computer 6.4.0-3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.4.11-1
(2023-08-17) x86_64 GNU/Linux

[Bug middle-end/99098] invalid/missing -Wfree-nonheap-object warnings

2023-08-29 Thread pross at xvid dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99098

--- Comment #3 from Peter Ross  ---
Created attachment 55814
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55814&action=edit
Test case -save-temps output

[Bug middle-end/99098] invalid/missing -Wfree-nonheap-object warnings

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99098

--- Comment #4 from Andrew Pinski  ---
(In reply to Peter Ross from comment #2)
> The following test case produces a -Wfree-nonheap-object false positive. I
> argue that the memory being free'd is heap memory. It is offset by one to
> accommodate the negative offset applied immediately after malloc.

Doing -1 on an allocated memory location is undefined, because according to the
C standard you can only form addresses in the range 0...size of the "object".
So the warning might seem wrong, but the code has undefined behavior
happening.
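
One way to get 1-based access without ever forming the pointer w - 1 is to keep
the pointer that malloc returned and apply the offset at access time.  The
following is only a sketch under that assumption; the struct and function
names are hypothetical and are not taken from the reporter's code.

```c
#include <stdlib.h>

/* Hypothetical 1-based buffer: base is exactly what malloc returned.  */
struct knn_buf
{
  char *base;
  size_t len;
};

/* Returns 0 on success, -1 if the allocation failed.  */
int
knn_buf_init (struct knn_buf *k, size_t len)
{
  k->base = malloc (len);
  k->len = k->base ? len : 0;
  return k->base ? 0 : -1;
}

/* Valid for 1 <= i <= k->len; base + (i - 1) stays inside the object.  */
char *
knn_buf_at (struct knn_buf *k, size_t i)
{
  return k->base + (i - 1);
}

void
knn_buf_free (struct knn_buf *k)
{
  free (k->base);		/* The pointer malloc returned, unmodified.  */
  k->base = NULL;
  k->len = 0;
}
```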

[Bug middle-end/99098] invalid/missing -Wfree-nonheap-object warnings

2023-08-29 Thread pross at xvid dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99098

--- Comment #5 from Peter Ross  ---
The -1 occurs after checking the malloc()==0 case, so the negative offset is
only ever applied to addresses in [1..limit] range. Thanks for your time!

[Bug target/111225] ICE in curr_insn_transform, unable to generate reloads for xor, since r14-2447-g13c556d6ae84be

2023-08-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225

--- Comment #1 from Hongtao.liu  ---
So reload thought CT_SPECIAL_MEMORY is always a win for spilled_pseudo_p, but
here Br should be a vec_dup:mem which doesn't match spilled_pseudo_p.

case CT_SPECIAL_MEMORY:
  if (satisfies_memory_constraint_p (op, cn))
win = true;
  else if (spilled_pseudo_p (op))
win = true;
  break;

[Bug target/111225] ICE in curr_insn_transform, unable to generate reloads for xor, since r14-2447-g13c556d6ae84be

2023-08-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111225

--- Comment #2 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #1)
> So reload thought CT_SPECIAL_MEMORY is always a win for spilled_pseudo_p, but
> here Br should be a vec_dup:mem which doesn't match spilled_pseudo_p.
>  
>   case CT_SPECIAL_MEMORY:
> if (satisfies_memory_constraint_p (op, cn))
>   win = true;
> else if (spilled_pseudo_p (op))
>   win = true;
> break;

The vmBr constraint is OK as long as m is matched before Br, but here m is
invalid, which exposed the problem.
The backend workaround is to disable Br when m is not available.

Or the middle-end fix should be removing win for spilled_pseudo_p (op) in
CT_SPECIAL_MEMORY.

[Bug target/111212] [13/14 Regression] internal compiler error: in extract_insn, at recog.cc:2791

2023-08-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111212

Kewen Lin  changed:

   What|Removed |Added

 CC||linkw at gcc dot gnu.org
 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Kewen Lin  ---
Thanks for reporting.  lxvl isn't valid for a 32-bit env; this is a duplicate of
PR96762. Haochen has posted a patch to fix it:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628637.html. I've
reviewed and approved it, so it should land soon.

*** This bug has been marked as a duplicate of bug 96762 ***

[Bug target/96762] ICE in extract_insn, at recog.c:2294 (error: unrecognizable insn)

2023-08-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96762

Kewen Lin  changed:

   What|Removed |Added

 CC||malat at debian dot org

--- Comment #5 from Kewen Lin  ---
*** Bug 111212 has been marked as a duplicate of this bug. ***

[Bug target/111212] [13/14 Regression] internal compiler error: in extract_insn, at recog.cc:2791

2023-08-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111212

--- Comment #4 from Kewen Lin  ---
btw, I think the field "known to work" isn't quite exact; at least I verified
that it fails with powerpc64 gcc 12.3.0 with -m32.  Given the release PR96762
was filed for, I'd expect it to fail for gcc 11.4.0 as well.

[Bug testsuite/111228] [14 regression] gcc.target/powerpc/vsx-extract-6.c fails after r14-3381-g27de9aa152141e

2023-08-29 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111228

Peter Bergner  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-30
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Peter Bergner  ---
Confirmed.  The testsuite log shows for vsx-extract-6.c and vsx-extract-7.c:

gcc.target/powerpc/vsx-extract-6.c: \\mxxpermdi\\M found 2 times
FAIL: gcc.target/powerpc/vsx-extract-6.c scan-assembler-times \\mxxpermdi\\M 1
FAIL: gcc.target/powerpc/vsx-extract-6.c scan-assembler-not \\mvspltisw\\M

So we have one more xxpermdi than we expected and we also have a vspltisw when
we expected none.  I haven't looked at whether the code is better or worse
though, to know whether we should just update the expected counts or whether
this is really a code quality regression.

[Bug target/109725] [14 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4430

2023-08-29 Thread xuli1 at eswincomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109725

xuli1 at eswincomputing dot com  changed:

   What|Removed |Added

 CC||xuli1 at eswincomputing dot com

--- Comment #4 from xuli1 at eswincomputing dot com  ---
The gcc-13 branch also has the same issue
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61), can I backport this
patch to gcc-13?

[Bug libstdc++/91263] unordered_map and unordered_set operator== double key comparison causes exponential behavior

2023-08-29 Thread fdumont at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91263

François Dumont  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from François Dumont  ---
Can be closed now.

[Bug target/109725] [14 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4430

2023-08-29 Thread xuli1 at eswincomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109725

--- Comment #5 from xuli1 at eswincomputing dot com  ---
(In reply to xu...@eswincomputing.com from comment #4)
> The gcc-13 branch also has the same issue
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61), can I backport this
> patch to gcc-13?

@kito.ch...@gmail.com @juzhe.zh...@rivai.ai @dimi...@gcc.gnu.org

[Bug target/109725] [14 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4430

2023-08-29 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109725

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #6 from Kito Cheng  ---
Ok for back port :)

[Bug tree-optimization/111149] bool0 != bool1 should be expanded as bool0 ^ bool1

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49

--- Comment #6 from Andrew Pinski  ---
So g++.dg/vect/simd-bool-comparison-1.cc fails with the canonicalization ...
Looking into how to fix that too.

[Bug testsuite/111228] [14 regression] gcc.target/powerpc/vsx-extract-6.c fails after r14-3381-g27de9aa152141e

2023-08-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111228

--- Comment #2 from Kewen Lin  ---
(In reply to Peter Bergner from comment #1)
> Confirmed.  The testsuite log shows for vsx-extract-6.c and vsx-extract-7.c:
> 
> gcc.target/powerpc/vsx-extract-6.c: \\mxxpermdi\\M found 2 times
> FAIL: gcc.target/powerpc/vsx-extract-6.c scan-assembler-times \\mxxpermdi\\M
> 1
> FAIL: gcc.target/powerpc/vsx-extract-6.c scan-assembler-not \\mvspltisw\\M
> 
> So we have one more xxpermdi than we expected and we also have a vspltisw
> when we expected none.  I haven't looked at whether the code is better or
> worse though, to know whether we should just update the expected counts or
> whether this is really a code quality regression.

With the commit, vsx-extract-6.c ends up with:

test_vpasted:
.LFB0:
.cfi_startproc
xxspltib 0,0
xxpermdi 34,34,0,1
xxpermdi 34,34,35,1
blr

instead of (the original expected):

test_vpasted:
.LFB0:
.cfi_startproc
xxpermdi 34,34,35,1
blr

I think it's a code quality regression. The optimized gimple IR is changed to:

__vector unsigned long long test_vpasted (__vector unsigned long long high,
__vector unsigned long long low)
{
  __vector unsigned long long res;

   [local count: 1073741824]:
  res_3 = VEC_PERM_EXPR ;
  res_5 = VEC_PERM_EXPR ;
  return res_5;

}

from:

__vector unsigned long long test_vpasted (__vector unsigned long long high,
__vector unsigned long long low)
{
  __vector unsigned long long res;
  long long unsigned int _1;
  long long unsigned int _2;

   [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  res_5 = BIT_INSERT_EXPR ;
  _2 = BIT_FIELD_REF ;
  res_7 = BIT_INSERT_EXPR ;
  return res_7;

}

For gimple IRs:

  res_3 = VEC_PERM_EXPR ;
  res_5 = VEC_PERM_EXPR ;

I'd expect it can be further optimized into

  res_5 = VEC_PERM_EXPR ;

[Bug tree-optimization/111149] bool0 != bool1 should be expanded as bool0 ^ bool1

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49

--- Comment #7 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #6)
> So g++.dg/vect/simd-bool-comparison-1.cc fails with the canonicalization ...
> Looking into how to fix that too.

The difference is:
  _3 = _1 ^ c.0_2;
  cstore_10 = _3 ? 0.0 : 1.0e+0;


vs:
  _3 = _1 != c.0_2;
  cstore_10 = _3 ? 0.0 : 1.0e+0;


Original:
/app/example.cpp:12:17: note:   === vect_determine_precisions ===
/app/example.cpp:12:17: note:   using boolean precision 8 for _3 = _1 != c.0_2;

New:
t.cc:14:17: note:   === vect_determine_precisions ===
t.cc:14:17: note:   using normal nonmask vectors for _3 = _1 ^ c.0_2;

[Bug target/108728] gcc.dg/torture/float128-cmp-invalid.c fails on power 9 BE

2023-08-29 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108728

HaoChen Gui  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from HaoChen Gui  ---
Fixed by xfailing the test case.

[Bug target/109725] [14 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4430

2023-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109725

--- Comment #7 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Li Xu :

https://gcc.gnu.org/g:b81d476756a1f17617f0837761785c4b5d1d195d

commit r13-7766-gb81d476756a1f17617f0837761785c4b5d1d195d
Author: Dimitar Dimitrov 
Date:   Mon Jun 5 21:39:16 2023 +0300

riscv: Fix scope for memory model calculation

During libgcc configure stage for riscv32-none-elf, when
"--enable-checking=yes,rtl" has been activated, the following error
is observed:

  during RTL pass: final
  conftest.c: In function 'main':
  conftest.c:16:1: internal compiler error: RTL check: expected code
'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4462
 16 | }
| ^
  0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*,
int, char const*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
  0x8ea823 riscv_print_operand
  /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:4462
  0xde84b5 output_operand(rtx_def*, int)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3632
  0xde8ef8 output_asm_insn(char const*, rtx_def**)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3544
  0xded33b output_asm_insn(char const*, rtx_def**)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3421
  0xded33b final_scan_insn_1
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2841
  0xded6cb final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2887
  0xded8b7 final_1
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:1979
  0xdee518 rest_of_handle_final
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4240
  0xdee518 execute
  /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4318

Fix by moving the calculation of memmodel to the cases where it is used.

Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.

PR target/109725

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Calculate
memmodel only when it is valid.

Signed-off-by: Dimitar Dimitrov 
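
The shape of this fix can be sketched in C with a toy model. Everything below is hypothetical and heavily simplified (`toy_rtx`, `toy_intval`, the `'A'` operand letter and both functions are invented for illustration and are not the real riscv.cc code). It only shows why reading an operand as an integer before knowing that the operand is a constant trips a checked accessor, and why moving the read into the case that uses it avoids that.

```c
/* Hypothetical stand-ins for GCC's rtx; not the real API. */
enum toy_code { TOY_CONST_INT, TOY_REG };

struct toy_rtx {
    enum toy_code code;
    long value;               /* meaningful only for TOY_CONST_INT */
};

/* Counts accesses made on the wrong kind of operand.  A compiler built
   with RTL checking would abort here instead of counting.  */
static int toy_check_failures = 0;

static long toy_intval(const struct toy_rtx *x)
{
    if (x->code != TOY_CONST_INT)
        toy_check_failures++;
    return x->value;
}

/* Eager version: reads the operand as an integer up front, even for
   operand letters that never use the result.  */
static long print_operand_eager(const struct toy_rtx *op, char letter)
{
    long model = toy_intval(op);     /* also runs when op is a register */
    switch (letter) {
    case 'A':
        return model;
    default:
        return -1;
    }
}

/* Scoped version: the read happens only in the case that needs it.  */
static long print_operand_scoped(const struct toy_rtx *op, char letter)
{
    switch (letter) {
    case 'A':
        return toy_intval(op);
    default:
        return -1;
    }
}
```

Calling `print_operand_eager` with a register operand and an unrelated letter bumps `toy_check_failures`, while `print_operand_scoped` does not touch the operand in that case.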

[Bug target/111161] [13 Regression] ICE: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4394 during build

2023-08-29 Thread xuli1 at eswincomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111161

xuli1 at eswincomputing dot com  changed:

   What|Removed |Added

 CC||xuli1 at eswincomputing dot com

--- Comment #1 from xuli1 at eswincomputing dot com  ---
Backported
https://github.com/gcc-mirror/gcc/commit/7f26e76c9848aeea9ec10ea701a6168464a4a9c2
to gcc-13; this should be fixed now.

[Bug tree-optimization/105490] unvectorized loop due to bool condition loaded from memory and different size data

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105490

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=49

--- Comment #1 from Andrew Pinski  ---
So what is interesting is we do handle:
```
#define N 256
typedef short T;
extern T a[N];
extern T b[N];
extern T c[N];
extern _Bool pb[N];
extern _Bool pb1[N];

void predicate_by_bool_ne()
{
  for (int i = 0; i < N; i++)
    c[i] = pb[i] != pb1[i] ? a[i] : b[i];
}
```

But not:
```
#define N 256
typedef short T;
extern T a[N];
extern T b[N];
extern T c[N];
extern _Bool pb[N];
extern _Bool pb1[N];

void predicate_by_bool_and()
{
  for (int i = 0; i < N; i++)
    c[i] = (pb[i] & pb1[i]) ? a[i] : b[i];
}
```

And if I change the canonical form of `bool != bool` into `bool ^ bool`, things
break down in a similar way. I tried to look into a reasonable way of handling
this in the vectorizer, but I could not figure out how to treat `^` in the same
way as `!=`.
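
For anyone wanting to compare the `!=` and `^` forms directly, here is a minimal self-contained sketch. The function names and the pointer parameters are assumptions made so the loops can be compiled and called without the extern arrays above. It only shows that the two predicates select the same elements for `_Bool` inputs; it makes no claim about which form the vectorizer accepts.

```c
#include <stdbool.h>

#define N 256
typedef short T;

/* Select a[i] or b[i] using `!=` on two bool arrays. */
void select_by_bool_ne(T *c, const T *a, const T *b,
                       const bool *pb, const bool *pb1)
{
    for (int i = 0; i < N; i++)
        c[i] = pb[i] != pb1[i] ? a[i] : b[i];
}

/* Same selection written with `^`.  For bool operands the result of `^`
   is 0 or 1, so the condition is equivalent to the `!=` form.  */
void select_by_bool_xor(T *c, const T *a, const T *b,
                        const bool *pb, const bool *pb1)
{
    for (int i = 0; i < N; i++)
        c[i] = (pb[i] ^ pb1[i]) ? a[i] : b[i];
}
```

Building both functions with `-O3 -fopt-info-vec` is one way to see whether each loop gets vectorized on a given GCC version.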

[Bug tree-optimization/111149] bool0 != bool1 should be expanded as bool0 ^ bool1

2023-08-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111149

--- Comment #8 from Andrew Pinski  ---
Created attachment 55815
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55815&action=edit
What I have so far

[Bug target/111231] New: armhf: Miscompilation at O2 level

2023-08-29 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Bug ID: 111231
   Summary: armhf: Miscompilation at O2 level
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: malat at debian dot org
  Target Milestone: ---

The highway test suite is currently failing on Debian/armhf with GCC 13.2.0.

350/502 Test #350:
HwyMulTestGroup/HwyMulTest.TestAllSatWidenMulPairwiseAdd/EMU128  # GetParam() =
2305843009213693952 Subprocess aborted***Exception:   0.03
sec
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter =
HwyMulTestGroup/HwyMulTest.TestAllSatWidenMulPairwiseAdd/EMU128
[==] Running 1 test from 1 test suite.
[--] Global test environment set-up.
[--] 1 test from HwyMulTestGroup/HwyMulTest
[ RUN  ] HwyMulTestGroup/HwyMulTest.TestAllSatWidenMulPairwiseAdd/EMU128


i16x8 expect [0+ ->]:
  0x,0x0002,0x,0x,0x,0x,0x,
i16x8 actual [0+ ->]:
  0x,0x0004,0x,0x,0x,0x,0x,
Abort at ./hwy/tests/mul_test.cc:587: EMU128, i16x8 lane 1 mismatch: expected
'0x0002', got '0x0004'.

[Bug target/111231] armhf: Miscompilation at O2 level

2023-08-29 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #1 from Mathieu Malaterre  ---
Created attachment 55816
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55816&action=edit
Preprocessed source

% /usr/bin/c++ -save-temps -DHWY_STATIC_DEFINE -I/home/malat/highway -O2 -g
-DNDEBUG -fPIE -fvisibility=hidden -fvisibility-inlines-hidden
-Wno-builtin-macro-redefined -D__DATE__=\"redacted\"
-D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants
-Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor
-fmath-errno -fno-exceptions -march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard
-mfp16-format=ieee -DHWY_IS_TEST=1 -DGTEST_HAS_PTHREAD=1 -MD -MT
CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o -MF
CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o.d -o
CMakeFiles/mul_test.dir/hwy/tests/mul_test.cc.o -c
/home/malat/highway/hwy/tests/mul_test.cc

[Bug target/111231] armhf: Miscompilation at O2 level

2023-08-29 Thread malat at debian dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #2 from Mathieu Malaterre  ---
reported upstream as:
* https://github.com/google/highway/issues/1683