[Bug tree-optimization/103995] [11/12 Regression] conj() ignored with tree loop vectorizer

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103995

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2022-01-13
   Keywords||needs-bisection
 CC||marxin at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org,
   ||tnfchris at gcc dot gnu.org
 Blocks||53947
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #4 from Richard Biener  ---
Confirmed on the GCC 11 branch head.  I suspect it has something to do with SLP
patterns; I cannot reproduce it with recent GCC 12.

Tamar, are there any fixes that need backporting in this area?

Martin, can you bisect what fixed it on trunk?


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||rguenth at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org
   Last reconfirmed||2022-01-13

--- Comment #4 from Richard Biener  ---
I think

  /* If the target does not support partial vectors we can shorten the
     number of modes to analyze for the epilogue as we know we can't pick a
     mode that has at least as many NUNITS as the main loop's vectorization
     factor, since that would imply the epilogue's vectorization factor
     would be at least as high as the main loop's and we would be
     vectorizing for more scalar iterations than there would be left.  */
  if (!supports_partial_vectors
      && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
    {

is completely bogus.  For -1b.c we autodetect V8SImode and first_vinfo_vf is 8,
so we skip V8SImode, which is OK.  But then vector_modes[1] is V32QImode, and
it doesn't make sense to compare NUNITS of a vector mode with the VF of a loop
(or NUNITS of two vector modes with different element modes).

Previously we started from mode_i of the first loop or what it would consider
as next mode which I suppose provided this "skipping" in case the mode
array is sorted after VF (and not preference).

That said, for this case we do nothing until we hit V4QImode which obviously
cannot be used to vectorize a double.
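
To illustrate the mismatch described above, here is a minimal standalone
sketch.  It is not GCC code: the helper names lanes and
skipped_by_nunits_check are made up for illustration, and plain integers stand
in for poly_int values.  It only shows how a comparison of the same shape as
the quoted check behaves for QImode vectors of different sizes.

```c
#include <stdbool.h>

/* Hypothetical helper: number of lanes in a vector of VECTOR_BITS bits
   whose elements are ELEM_BITS bits wide.  */
unsigned
lanes (unsigned vector_bits, unsigned elem_bits)
{
  return vector_bits / elem_bits;
}

/* Same shape as the quoted check, on plain integers: a mode is skipped
   when its lane count is at least the main loop's VF.  */
bool
skipped_by_nunits_check (unsigned mode_nunits, unsigned first_vf)
{
  return mode_nunits >= first_vf;
}
```

With first_vf == 8, lanes (256, 8) is 32 (V32QImode) and is skipped, as are the
16- and 8-lane QImode vectors; only lanes (32, 8) == 4 (V4QImode) passes, even
though the lane count of a QImode vector says nothing about the VF of a loop
over doubles.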

[Bug tree-optimization/103998] [12 Regression] Recent vectorizer testsuite regressions on x86 since r12-6420 and r12-6523

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103998

--- Comment #6 from Richard Biener  ---
The issue is likely the same as in PR103997, I've commented there.

[Bug tree-optimization/103999] Vectorizer failed to reduce sum with conversion.

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103999

Richard Biener  changed:

   What|Removed |Added

 Blocks||53947
   Keywords||missed-optimization
 Ever confirmed|0   |1
   Last reconfirmed||2022-01-13
 Status|UNCONFIRMED |NEW

--- Comment #2 from Richard Biener  ---
t.c:4:23: note:   reduction path: ans_15 _7 _9 _4 ans_18
t.c:4:23: note:   reduction: unknown pattern

We do handle integer sign conversions by "ignoring" them but with float
truncations/extensions we have to be more careful.

We start with

  ((double)(float)((double)(float)(array[0] + 1.) + array[1] + 1.) + array[2] + 1.) ...

and would need to associate that in some way.  Ideally we'd arrange for
doing the reduction as 'double' and only truncate the final value getting
us "only" extra precision.  But that would require major surgery in
reduction handling.
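
The difference between the two evaluation orders can be sketched in plain C.
The testcase t.c is not shown in this thread, so the loop shape below is an
assumption reconstructed from the expression above, and the function names
sum_narrow and sum_wide are hypothetical.

```c
/* Accumulate in float: every partial sum is truncated to float,
   mirroring the (double)(float)(...) chain above.  */
float
sum_narrow (const double *array, int n)
{
  float ans = 0.0f;
  for (int i = 0; i < n; i++)
    ans = (float) ((double) ans + array[i] + 1.);
  return ans;
}

/* Accumulate in double and truncate only the final value.  */
float
sum_wide (const double *array, int n)
{
  double acc = 0.0;
  for (int i = 0; i < n; i++)
    acc = acc + array[i] + 1.;
  return (float) acc;
}
```

For small inputs both functions agree.  For an input such as
{ 16777215., 0., 0. } they can differ on a target with IEEE float/double and no
excess precision, because 2^24 + 1 is not representable as a float and is
rounded away at each step of sum_narrow.  That is the "extra precision" the
double reduction would introduce.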


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #3 from Hongtao.liu  ---
for patt_42 = () patt_40;

vectype_in (QImode:nunits 4)

 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea18ddc8 precision:1 min  max >
QI size  unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea18de70 nunits:4>

vectype_out(HImode)

 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea18ddc8 precision:1 min  max >
HI
size  constant 16>
unit-size  constant 2>
align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea18df18 nunits:16>

And ‘vec_pack_sbool_trunc_m’ only handles the case where input and output have
the same mode.

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #4 from Hongtao.liu  ---

(In reply to Hongtao.liu from comment #3)
> for patt_42 = () patt_40;
> 
> vectype_in (QImode:nunits 4)
> 
>   type  size 
> unit-size 
> align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffea18ddc8 precision:1 min  max
> >
> QI size  unit-size  0x7fffea2e2e58 1>
> align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffea18de70 nunits:4>
> 
> vectype_out(HImode)
> 
>   type  size 
> unit-size 
> align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffea18ddc8 precision:1 min  max
> >
> HI
> size  bitsizetype> constant 16>
> unit-size  sizetype> constant 2>
> align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffea18df18 nunits:16>
> 
> And ‘vec_pack_sbool_trunc_m’ only handle situation when input and output
> have same mode.

And the GCC vectorizer only handles 2X elements, not 4X, 8X, ...

  /* For scalar masks we may have different boolean
     vector types having the same QImode.  Thus we
     add additional check for elements number.  */
  if (known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
                TYPE_VECTOR_SUBPARTS (narrow_vectype)))

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #5 from Hongtao.liu  ---

> And GCC vectorizer only handle 2X elements, but not 4X,8X,...
> 
>   /* For scalar masks we may have different boolean
>vector types having the same QImode.  Thus we
>add additional check for elements number.  */
>  if (known_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
>   TYPE_VECTOR_SUBPARTS (narrow_vectype)))


We can have vec_pack_trunc_qi + vec_pack_sbool_trunc_qi by extending
vec_pack_sbool_trunc_qi to handle 8/4, 4/2, 2/1 vectype_out/vectype_in
elements.
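
As a rough illustration of the 2X restriction versus a multi-step pack, the
sketch below counts how many doubling steps are needed to get from the input
lane count to the output lane count.  It is not vectorizer code: the names
single_2x_pack and pack_steps are hypothetical and plain integers replace
TYPE_VECTOR_SUBPARTS.

```c
#include <stdbool.h>

/* Integer analogue of the known_eq check quoted above: exactly one
   2X packing step.  */
bool
single_2x_pack (unsigned in_lanes, unsigned out_lanes)
{
  return in_lanes * 2 == out_lanes;
}

/* Number of 2X packing steps needed to go from IN_LANES to OUT_LANES,
   or -1 if OUT_LANES is not IN_LANES times a power of two.  */
int
pack_steps (unsigned in_lanes, unsigned out_lanes)
{
  if (in_lanes == 0 || out_lanes < in_lanes)
    return -1;
  int steps = 0;
  while (in_lanes < out_lanes)
    {
      in_lanes *= 2;
      steps++;
    }
  return in_lanes == out_lanes ? steps : -1;
}
```

For the 4-lane input and 16-lane output discussed in this bug, single_2x_pack
fails while pack_steps reports two steps, which is the case a chain of two
pack operations would have to cover.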

[Bug tree-optimization/103995] [11/12 Regression] conj() ignored with tree loop vectorizer

2022-01-13 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103995

Martin Liška  changed:

   What|Removed |Added

   Keywords|needs-bisection |

--- Comment #5 from Martin Liška  ---
Fixed on master with r12-2573-g3c91efec15af4f92.

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #5 from avieira at gcc dot gnu.org ---
Yeah I made a mistake there using the vector_mode like that, since that vector
mode really only determines vector size (and vector ISA for aarch64).

I am almost finished testing a patch that instead goes through the
'used_vector_modes' to find the largest element for all used vector modes, then
use related_vector_mode to get the vector mode for that element with the same
size as the current vector_mode[mode_i]. That would give us the lowest possible
VF for that loop and vector size.

Should be posting the fix soon.
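
The heuristic described above can be sketched on plain integers as follows.
This is only an illustration under stated assumptions, not the pending patch:
lowest_vf is a hypothetical name, element sizes are given in bits, and the real
code would work on machine modes via related_vector_mode.

```c
#include <stddef.h>

/* Given the element widths (in bits) of all used vector modes, return the
   lowest possible VF for vectors of VECTOR_BITS bits: the lane count of
   the vector built from the largest element.  Returns 0 if no element
   widths are given.  */
unsigned
lowest_vf (unsigned vector_bits, const unsigned *elem_bits, size_t n)
{
  unsigned largest = 0;
  for (size_t i = 0; i < n; i++)
    if (elem_bits[i] > largest)
      largest = elem_bits[i];
  return largest ? vector_bits / largest : 0;
}
```

For a loop using 8-bit and 64-bit elements with 256-bit vectors this gives 4,
independent of which element mode vector_modes[mode_i] happens to name.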

[Bug target/104001] New: [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6538-g5f19303ada7db92c155332e7ba317233ca05946b

2022-01-13 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104001

Bug ID: 104001
   Summary: [12 Regression] ICE in extract_insn, at recog.c:2769
since
r12-6538-g5f19303ada7db92c155332e7ba317233ca05946b
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: haochen.jiang at intel dot com
  Target Milestone: ---

The following crashes:

$ cat math.ii
long select_mask, select_from0, select_from1, bigint_cnd_abs_x_0,
bigint_cnd_abs_size___trans_tmp_1 =
select_from0 & select_mask | select_from1 & ~select_mask;

$ g++ math.ii -c -march=znver1 -O2 -fmax-errors=1
math.ii: In function ‘(static initializers for math.ii)’:
math.ii:3:65: error: unrecognizable insn:
3 | select_from0 & select_mask | select_from1 & ~select_mask;
  | ^
(insn 15 14 0 2 (parallel [
(set (mem/c:DI (symbol_ref:DI ("bigint_cnd_abs_size___trans_tmp_1")
[flags 0x2] ) [1
bigint_cnd_abs_size___trans_tmp_1+0 S8 A64])
(ior:DI (reg:DI 92)
(reg:DI 93)))
(clobber (reg:CC 17 flags))
]) "math.ii":3:36 -1
 (expr_list:REG_DEAD (reg:DI 93)
(expr_list:REG_DEAD (reg:DI 92)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)
during RTL pass: ira
math.ii:3:65: internal compiler error: in extract_insn, at recog.c:2769
0x218a6a8 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/home/marxin/Programming/gcc/gcc/rtl-error.c:108
0x218a6ca _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/home/marxin/Programming/gcc/gcc/rtl-error.c:116
0x21584d3 extract_insn(rtx_insn*)
/home/marxin/Programming/gcc/gcc/recog.c:2769
0x1f4bb13 ira_remove_insn_scratches(rtx_insn*, bool, _IO_FILE*, rtx_def*
(*)(rtx_def*))
/home/marxin/Programming/gcc/gcc/ira.c:5350
0x1f4cd3e remove_scratches
/home/marxin/Programming/gcc/gcc/ira.c:5394
0x1f4cd3e ira
/home/marxin/Programming/gcc/gcc/ira.c:5718
0x1f4cd3e execute
/home/marxin/Programming/gcc/gcc/ira.c:6077
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug tree-optimization/104002] New: ICE ‘verify_gimple’ failed since r12-1128-gef8176e0fac935c0

2022-01-13 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104002

Bug ID: 104002
   Summary: ICE ‘verify_gimple’ failed since
r12-1128-gef8176e0fac935c0
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Fails since introduction of the feature:

$ gcc
/home/marxin/Programming/gcc/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c
-O2 -c
/home/marxin/Programming/gcc/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c:
In function ‘foo’:
/home/marxin/Programming/gcc/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c:16:1:
error: invalid RHS for gimple memory store: ‘bit_field_ref’
   16 | }
  | ^
D.1983

BIT_FIELD_REF <_1, 64, 0>;

# .MEM_4 = VDEF <.MEM_3(D)>
D.1983 = BIT_FIELD_REF <_1, 64, 0>;
during GIMPLE pass: ssa
/home/marxin/Programming/gcc/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c:16:1:
internal compiler error: verify_gimple failed
0x1ee78a5 verify_gimple_in_cfg(function*, bool)
/home/marxin/Programming/gcc/gcc/tree-cfg.c:5559
0x1d4149e execute_function_todo
/home/marxin/Programming/gcc/gcc/passes.c:2084
0x1d41abb execute_todo
/home/marxin/Programming/gcc/gcc/passes.c:2138
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2022-01-13
 CC||rguenth at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #6 from Richard Biener  ---
(In reply to Tamar Christina from comment #1)
> Looks like the change causes the simpler conditional to be detected by the
> vectorizer as a masked operation, which in principle makes sense:
> 
> note:   vect_recog_mask_conversion_pattern: detected: iftmp.0_21 = x.1_14 >
> 255 ? iftmp.0_19 : iftmp.0_20;
> note:   mask_conversion pattern recognized: patt_43 = patt_42 ? iftmp.0_19 :
> iftmp.0_20;
> note:   extra pattern stmt: patt_40 = x.1_14 > 255;
> note:   extra pattern stmt: patt_42 = () patt_40;

if we look at the ifcvt result we see

  iftmp.0_19 = (unsigned char) _18;
  iftmp.0_20 = (unsigned char) _5;
  iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;
  *_6 = iftmp.0_21;

I suspect we intended to carry the fact that x.1_14 > 255 will produce
a mask from a SImode vector element compare but we need a mask suitable
for a QImode vector element select (iftmp.0_19 and iftmp.0_20).  That's
not something we can express in scalar code and thus a pattern I think.

So the pattern as generated doesn't make very much sense to me.  I suppose
it might try to convert an AVX512 mask to an AVX2-style mask but that needs
to be done with

  patt_42 = patt_40 ? ()-1 : 0;

not with a conversion.  But it's also somewhat pointless since it will
simply cause the same issue again - the COND_EXPR vectorization will need
to cobble up 4 AVX512 masks to produce the desired result.
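
For readers unfamiliar with the two mask styles, the selection form
"mask ? -1 : 0" can be sketched in scalar C.  This is an illustration only:
expand_bitmask is a hypothetical name, an 8-lane mask is assumed, and bit i of
the bitmask is assumed to correspond to lane i.

```c
#include <stdint.h>

/* Expand a one-bit-per-lane mask (AVX512 style) into one element per
   lane holding all-ones or zero (AVX2 style).  */
void
expand_bitmask (uint8_t kmask, int8_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = ((kmask >> i) & 1) ? -1 : 0;
}
```

A plain integer conversion of the mask value would instead yield a single
number, not a per-lane all-ones/zero vector, which is why the conversion
pattern does not express this.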

[Bug target/104003] New: [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6488-g820ac79e8448ad6c

2022-01-13 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104003

Bug ID: 104003
   Summary: [12 Regression] ICE in extract_insn, at recog.c:2769
since r12-6488-g820ac79e8448ad6c
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: ubizjak at gmail dot com
  Target Milestone: ---

The following fails:

$ g++ /home/marxin/Programming/gcc/gcc/testsuite/g++.target/i386/pr103861-1.C
-mxop -c
/home/marxin/Programming/gcc/gcc/testsuite/g++.target/i386/pr103861-1.C: In
function ‘__v2qu uu(__v2qu, __v2qu)’:
/home/marxin/Programming/gcc/gcc/testsuite/g++.target/i386/pr103861-1.C:11:60:
error: unrecognizable insn:
   11 | __v2qu uu (__v2qu a, __v2qu b) { return (a > b) ? au : bu; }
  |^
(insn 12 11 15 2 (set (reg:V2QI 84 [ _7 ])
(if_then_else:V2QI (reg:V2QI 84 [ _7 ])
(reg:V2QI 82 [ au.0_2 ])
(reg:V2QI 83 [ bu.1_3 ])))
"/home/marxin/Programming/gcc/gcc/testsuite/g++.target/i386/pr103861-1.C":11:56
-1
 (nil))
during RTL pass: vregs
/home/marxin/Programming/gcc/gcc/testsuite/g++.target/i386/pr103861-1.C:11:60:
internal compiler error: in extract_insn, at recog.c:2769
0x218a6a8 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/home/marxin/Programming/gcc/gcc/rtl-error.c:108
0x218a6ca _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/home/marxin/Programming/gcc/gcc/rtl-error.c:116
0x21584d3 extract_insn(rtx_insn*)
/home/marxin/Programming/gcc/gcc/recog.c:2769
0x1c99f83 instantiate_virtual_regs_in_insn
/home/marxin/Programming/gcc/gcc/function.c:1611
0x1c99f83 instantiate_virtual_regs
/home/marxin/Programming/gcc/gcc/function.c:1985
0x1c99f83 execute
/home/marxin/Programming/gcc/gcc/function.c:2034
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug tree-optimization/100280] ICE in lower_omp_target, at omp-low.c:12287

2022-01-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100280

--- Comment #1 from CVS Commits  ---
The master branch has been updated by Thomas Schwinge :

https://gcc.gnu.org/g:9b32c1669aad5459dd053424f9967011348add83

commit r12-6542-g9b32c1669aad5459dd053424f9967011348add83
Author: Thomas Schwinge 
Date:   Thu Dec 16 22:02:37 2021 +0100

OpenACC 'kernels' decomposition: Mark variables used in synthesized data
clauses as addressable [PR100280]

... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a
temporary:

      else if (is_gimple_reg (var))
        {
          gcc_assert (offloaded);
          tree avar = create_tmp_var (TREE_TYPE (var));
          mark_addressable (avar);

..., which (a) is only implemented for actually *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
to make work, so let's just avoid this case.

gcc/
PR middle-end/100280
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
Mark variables used in synthesized data clauses as addressable.
gcc/testsuite/
PR middle-end/100280
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
'--param openacc-kernels=decompose'.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Update.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
'--param openacc-kernels=decompose'.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
libgomp/
PR middle-end/100280
*
testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Update.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.

Suggested-by: Julian Brown 

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #7 from Richard Biener  ---
Forcing the pattern to not trigger produces the expected

t.c:8:6: missed:   not vectorized: relevant stmt not supported: iftmp.0_21 =
x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;

since condition vectorization itself doesn't know how to handle this, we end
up at

  if (vectype1 && !useless_type_conversion_p (vectype, vectype1))
    return false;

with vectype V32QI and vectype1 V8SI.
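
The original testcase is not shown in this thread.  As an assumption
reconstructed from the ifcvt dump quoted earlier, the loop has roughly the
following shape; the function and parameter names are hypothetical.

```c
/* Compare 32-bit values but select between 8-bit values.  */
void
select_narrow (unsigned char *dst, const unsigned int *x,
               const unsigned char *a, const unsigned char *b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = x[i] > 255 ? a[i] : b[i];
}
```

With 256-bit vectors the compare works on 8 lanes (V8SI) while the select
works on 32 lanes (V32QI), so one select needs the results of four compares.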

Splitting out the compare from the COND_EXPR in the pattern but leaving out
the attempt to "widen" it reveals the same fact that vectorizable_condition
doesn't support packing of multiple vector defs for the mask operand.

I think that is what we need to add.  We also don't have a good representation
for "packing" of masks.

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 3ea905538e1..729a1d32612 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -4679,8 +4679,10 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
  rhs1_type);
}

-  if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
-   TYPE_VECTOR_SUBPARTS (vectype2)))
+  /* AVX512 style masks cannot be packed/unpacked.  */
+  if (TYPE_PRECISION (TREE_TYPE (vectype2)) != 1
+ && maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
+  TYPE_VECTOR_SUBPARTS (vectype2)))
tmp = build_mask_conversion (vinfo, rhs1, vectype1, stmt_vinfo);
   else
tmp = rhs1;

[Bug tree-optimization/103995] [11/12 Regression] conj() ignored with tree loop vectorizer

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103995

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
Mine then.

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #8 from Hongtao.liu  ---
(In reply to Richard Biener from comment #7)
> Forcing the pattern to not trigger produces the expected
> 
> t.c:8:6: missed:   not vectorized: relevant stmt not supported: iftmp.0_21 =
> x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;
> 
> since condition vectorization itself doesn't know how to handle this, we end
> up at
> 
>   if (vectype1 && !useless_type_conversion_p (vectype, vectype1))
> return false;
> 
> with vectype V32QI and vectype1 V8SI.
> 
> Splitting out the compare from the COND_EXPR in the pattern but leaving out
> the attempt to "widen" it reveals the same fact that vectorizable_condition
> doesn't support packing of multiple vector defs for the mask operand.
> 
> I think that is what we need to add.  We also don't have a good
> representation
> for "packing" of masks.
> 
Shouldn't multi_step_cvt be able to handle this?  There is just some ambiguity
between ‘vec_pack_sbool_trunc_m’ and ‘vec_pack_trunc_m’: we need to make the
vectorizer use vec_pack_sbool_trunc_qi + vec_pack_trunc_qi to get a HI mask,
just like we pack 4 QImode masks into 1 SImode mask with vec_pack_trunc_qi +
vec_pack_trunc_hi (in the main loop with -mprefer-vector-width=256):

  mask_patt_40.26_118 = vect_x.18_95 > { 255, 255, 255, 255, 255, 255, 255, 255 };
  mask_patt_40.26_119 = vect_x.18_96 > { 255, 255, 255, 255, 255, 255, 255, 255 };
  mask_patt_40.26_120 = vect_x.18_97 > { 255, 255, 255, 255, 255, 255, 255, 255 };
  mask_patt_40.26_121 = vect_x.18_98 > { 255, 255, 255, 255, 255, 255, 255, 255 };
  mask_patt_42.28_122 = VEC_PACK_TRUNC_EXPR ;
  mask_patt_42.28_123 = VEC_PACK_TRUNC_EXPR ;
  mask_patt_42.27_124 = VEC_PACK_TRUNC_EXPR ;
  vect_patt_43.29_125 = VEC_COND_EXPR ;
  iftmp.0_21 = x.1_14 > 255 ? iftmp.0_19 : iftmp.0_20;

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #9 from rsandifo at gcc dot gnu.org ---
(In reply to Richard Biener from comment #7)
> I think that is what we need to add.  We also don't have a good
> representation
> for "packing" of masks.
> 
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 3ea905538e1..729a1d32612 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -4679,8 +4679,10 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
>   rhs1_type);
> }
>  
> -  if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> -   TYPE_VECTOR_SUBPARTS (vectype2)))
> +  /* AVX512 style masks cannot be packed/unpacked.  */
> +  if (TYPE_PRECISION (TREE_TYPE (vectype2)) != 1
> + && maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> +  TYPE_VECTOR_SUBPARTS (vectype2)))
> tmp = build_mask_conversion (vinfo, rhs1, vectype1, stmt_vinfo);
>else
> tmp = rhs1;
Haven't had time to look at it properly yet, but my first impression
is that that's likely to regress SVE.  Packing and unpacking are
natural operations on boolean vector modes.

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

--- Comment #6 from rguenther at suse dot de  ---
On Thu, 13 Jan 2022, avieira at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997
> 
> avieira at gcc dot gnu.org changed:
> 
>What|Removed |Added
> 
>  CC||avieira at gcc dot gnu.org
> 
> --- Comment #5 from avieira at gcc dot gnu.org ---
> Yeah I made a mistake there using the vector_mode like that, since that vector
> mode really only determines vector size (and vector ISA for aarch64).
> 
> I am almost finished testing a patch that instead goes through the
> 'used_vector_modes' to find the largest element for all used vector modes, 
> then
> use related_vector_mode to get the vector mode for that element with the same
> size as the current vector_mode[mode_i]. That would give us the lowest 
> possible
> VF for that loop and vector size.

That's of course only accurate in case the vectorization will happen
with the very same structure.  But since we re-do pattern detection
it might be we end up with a lower VF requirement even?  Guess we can
revisit that when it happens ...

It does sound like a reasonable heuristic though.

[Bug target/104003] [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6488-g820ac79e8448ad6c

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104003

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |12.0

[Bug tree-optimization/104002] ICE ‘verify_gimple’ failed since r12-1128-gef8176e0fac935c0

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104002

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2022-01-13
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Richard Biener  ---
I'll have a look.

[Bug target/104001] [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6538-g5f19303ada7db92c155332e7ba317233ca05946b

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104001

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
   Priority|P3  |P1

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #10 from Hongtao.liu  ---
with
@@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code code,
   c1 = VEC_PACK_TRUNC_EXPR;
   if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
  && VECTOR_BOOLEAN_TYPE_P (vectype)
- && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
+ && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
+ || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
  && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
optab1 = vec_pack_sbool_trunc_optab;
   else
@@ -12213,6 +12214,7 @@ supportable_narrowing_operation (enum tree_code code,
   if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
  && VECTOR_BOOLEAN_TYPE_P (prev_type)
  && intermediate_mode == prev_mode
+ && known_lt (TYPE_VECTOR_SUBPARTS (intermediate_type), BITS_PER_UNIT)
  && SCALAR_INT_MODE_P (prev_mode))
interm_optab = vec_pack_sbool_trunc_optab;
   else

-march=icelake-server -O3 -mprefer-vector-width=128 now produces a vectorized
loop:


        vmovdqu8    (%rsi,%rax), %xmm0
        vpmovzxbw   %xmm0, %xmm2
        vpmovzxwd   %xmm2, %xmm1
        vpsrldq     $8, %xmm0, %xmm0
        vpsrldq     $8, %xmm2, %xmm2
        vpmovzxbw   %xmm0, %xmm0
        vpmovzxwd   %xmm2, %xmm2
        vpmulld     %xmm9, %xmm1, %xmm1
        vpmulld     %xmm9, %xmm2, %xmm2
        vpmovzxwd   %xmm0, %xmm4
        vpsrldq     $8, %xmm0, %xmm0
        vpmovzxwd   %xmm0, %xmm0
        vpmulld     %xmm9, %xmm4, %xmm4
        vpmulld     %xmm9, %xmm0, %xmm0
        vpcmpud     $6, %xmm6, %xmm1, %k0
        vpsubd      %xmm1, %xmm7, %xmm3
        vpcmpud     $6, %xmm6, %xmm2, %k1
        vpsubd      %xmm2, %xmm7, %xmm5
        vpsrad      $31, %xmm5, %xmm5
        vpsrad      $31, %xmm3, %xmm3
        vpermt2w    %xmm5, %xmm8, %xmm3
        vpsubd      %xmm0, %xmm7, %xmm10
        vpsubd      %xmm4, %xmm7, %xmm5
        kshiftlb    $4, %k1, %k1
        vpcmpud     $6, %xmm6, %xmm0, %k2
        vpsrad      $31, %xmm5, %xmm5
        vpsrad      $31, %xmm10, %xmm10
        kandb       %k3, %k0, %k0
        korb        %k1, %k0, %k0
        vpcmpud     $6, %xmm6, %xmm4, %k1
        vpermt2w    %xmm10, %xmm8, %xmm5
        vpermt2w    %xmm2, %xmm8, %xmm1
        vpermt2w    %xmm0, %xmm8, %xmm4
        vpermt2b    %xmm5, %xmm11, %xmm3
        vpermt2b    %xmm4, %xmm11, %xmm1
        kandb       %k3, %k1, %k1
        kshiftlb    $4, %k2, %k2
        korb        %k2, %k1, %k1
        kunpckbw    %k0, %k1, %k1
        vmovdqu8    %xmm3, %xmm1{%k1}
        vmovdqu8    %xmm1, (%rdi,%rax)
        addq        $16, %rax
        cmpq        %rax, %r8
        jne         .L4
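
The mask handling in the loop above (kandb, kshiftlb $4, korb, then kunpckbw)
joins four 4-lane compare results into one 16-lane mask.  A scalar sketch of
that idea follows.  It rests on assumptions: %k3 is taken to hold 0x0f, the
order in which the pieces land in the final mask is not checked against the
assembly, and the names join_nibbles and join_bytes are hypothetical.

```c
#include <stdint.h>

/* Join two 4-lane masks into one 8-lane mask: keep the low four bits of
   LO and place the low four bits of HI above them.  */
uint8_t
join_nibbles (uint8_t lo, uint8_t hi)
{
  return (uint8_t) ((lo & 0x0f) | ((hi & 0x0f) << 4));
}

/* Join two 8-lane masks into one 16-lane mask.  */
uint16_t
join_bytes (uint8_t lo, uint8_t hi)
{
  return (uint16_t) (lo | ((uint16_t) hi << 8));
}
```

Two join_nibbles calls followed by one join_bytes call give a 16-bit mask
suitable for a 16-lane byte select such as the masked vmovdqu8 at the end of
the loop.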

[Bug tree-optimization/104002] ICE ‘verify_gimple’ failed since r12-1128-gef8176e0fac935c0

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104002

--- Comment #2 from Richard Biener  ---
Somewhat related to the recently fixed issue.  At -O0 we have invalid

  _1 = VEC_PERM_EXPR ;
  _3 = BIT_FIELD_REF <_1, 64, 0>;
  _5 = VIEW_CONVERT_EXPR(_3)[i_4(D)];

while with -O update_address_taken is run and sets DECL_NOT_GIMPLE_REG_P (var),
leading to

  _1 = VEC_PERM_EXPR ;
  D.1983 = BIT_FIELD_REF <_1, 64, 0>;
  _6 = VIEW_CONVERT_EXPR(D.1983)[i_5(D)];

note the non-register requirement is because of the variable index ARRAY_REF.
The decl should have been marked addressable but c_common_mark_addressable_vec
doesn't handle the TARGET_EXPR we build around the shufflevector result which
is the issue we have here.

[Bug target/104004] New: [12 Regression] ICE: in extract_insn, at recog.c:2769 (error: unrecognizable insn)

2022-01-13 Thread asolokha at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104004

Bug ID: 104004
   Summary: [12 Regression] ICE: in extract_insn, at recog.c:2769
(error: unrecognizable insn)
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: ice-on-invalid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
CC: linkw at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
Target: powerpc-*-linux-gnu

1. gcc 12.0.0 20220109 snapshot (g:49d73c9fb644673323845efebfe6b3106e70af8a)
ICEs when compiling gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c w/
-mcpu=e300c2:

% powerpc-e300c3-linux-gnu-gcc-12.0.0 -mcpu=e300c2 -c
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c: In function 'foo':
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c:9:1: error: unrecognizable insn:
9 | }
  | ^
(insn 15 14 16 2 (set (reg:DF 119)
(unspec_volatile:DF [
(const_int 0 [0])
] UNSPECV_MFFS))
"gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c":8:3 -1
 (nil))
during RTL pass: vregs
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c:9:1: internal compiler error: in
extract_insn, at recog.c:2769
0x6adda1 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/rtl-error.c:108
0x6addc1 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/rtl-error.c:116
0x6ac3cf extract_insn(rtx_insn*)
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/recog.c:2769
0xb23843 instantiate_virtual_regs_in_insn
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/function.c:1611
0xb23843 instantiate_virtual_regs
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/function.c:1985
0xb23843 execute
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/function.c:2034

2. After adding -O1 (in fact, w/ any optimization level) to the command line,
it fails to recognize UNSPECV_MTFSB0 instead:

% powerpc-e300c3-linux-gnu-gcc-12.0.0 -mcpu=e300c2 -O1 -c
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c: In function 'foo':
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c:9:1: error: unrecognizable insn:
9 | }
  | ^
(insn 5 2 6 2 (unspec_volatile [
(const_int 31 [0x1f])
] UNSPECV_MTFSB0) "gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c":8:3
-1
 (nil))
during RTL pass: vregs
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c:9:1: internal compiler error: in
extract_insn, at recog.c:2769
0x6adda1 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/rtl-error.c:108
0x6addc1 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/rtl-error.c:116
0x6ac3cf extract_insn(rtx_insn*)
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/recog.c:2769
0xb2309c instantiate_virtual_regs_in_insn
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/function.c:1660
0xb2309c instantiate_virtual_regs
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/function.c:1985
0xb2309c execute
   
/var/tmp/portage/cross-powerpc-e300c3-linux-gnu/gcc-12.0.0_p20220109/work/gcc-12-20220109/gcc/function.c:2034

Other builtins related to loading (storing) data from (to) FPSCR are also
affected.

gcc 11.2 rejects the code correctly w/ the following diagnostics:

% powerpc-e300c3-linux-gnu-gcc-11.2.0 -mcpu=e300c2 -O1 -c
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c: In function 'foo':
gcc/testsuite/gcc.target/powerpc/mffscrni_p9.c:8:3: error:
'__builtin_set_fpscr_rn' not supported with '-msoft-float'
8 |   __builtin_set_fpscr_rn (val);
  |   ^~~~

[Bug target/104003] [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6488-g820ac79e8448ad6c

2022-01-13 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104003

Uroš Bizjak  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2022-01-13
 Ever confirmed|0   |1

--- Comment #1 from Uroš Bizjak  ---
Mine.

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-13 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

--- Comment #7 from avieira at gcc dot gnu.org ---
Hmm thinking out loud here. As vector sizes (or ISAs) change, vectorization
strategies could indeed change. The best example I can think of is
rounding, where you might need to do operations in higher precision, and some
targets could potentially support instructions that widen, round and narrow
again in the same instruction at some size + ISA combination and not in others,
which means some would have a 'higher' element size mode in there where others
don't. But that assumes the vectorizer would represent such 'widen + round +
narrow' instructions in a single pattern, hiding the 'higher precision'
elements. As far as I know, no such patterns exist right now.

There may be other cases I can't think of ofc. We could always be even more
conservative and only skip if the highest possible element size for the current
vector size + ISA would lead to a mode with NUNITS greater or equal to the
current vector mode. Or ... just never skip a mode, I don't have a good feeling
for how much that would cost compile time wise though.
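
As a rough illustration of the more conservative idea above (a sketch only, not
GCC code): the struct vmode, both function names and the max_elem_bytes
parameter are hypothetical, and the first function is just a simplified
stand-in for the NUNITS-based skip discussed earlier in this bug.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified description of a candidate epilogue vector mode. */
struct vmode {
  unsigned vector_bytes; /* total vector size in bytes */
  unsigned nunits;       /* number of elements in this particular mode */
};

/* Simplified stand-in for the existing rule: skip the mode when its
   NUNITS is at least the main loop's vectorization factor. */
bool skip_by_nunits(struct vmode m, unsigned main_vf)
{
  return m.nunits >= main_vf;
}

/* More conservative variant: skip only if even the widest element the
   target could use at this vector size still yields at least main_vf
   lanes. max_elem_bytes is an assumed, target-provided value. */
bool skip_by_widest_element(struct vmode m, unsigned max_elem_bytes,
                            unsigned main_vf)
{
  assert(max_elem_bytes != 0);
  return m.vector_bytes / max_elem_bytes >= main_vf;
}
```

With a 32-byte, 32-lane byte mode and a main loop VF of 8, the first rule skips
the mode while the second one keeps it, assuming 8-byte elements are the widest
available.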

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #11 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #10)
> with
> @@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code code,
>c1 = VEC_PACK_TRUNC_EXPR;
>if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> && VECTOR_BOOLEAN_TYPE_P (vectype)
> -   && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> +   && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> +   || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
> && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
>   optab1 = vec_pack_sbool_trunc_optab;
>else
> @@ -12213,6 +12214,7 @@ supportable_narrowing_operation (enum tree_code code,
>if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
> && VECTOR_BOOLEAN_TYPE_P (prev_type)
> && intermediate_mode == prev_mode
> +   && known_lt (TYPE_VECTOR_SUBPARTS (intermediate_type), BITS_PER_UNIT)
> && SCALAR_INT_MODE_P (prev_mode))
>   interm_optab = vec_pack_sbool_trunc_optab;
>else
> 
> -march=icelake-server -O3 -mprefer-vector-width=128 now can get vectorized
> loop.
> 
> 
>   vmovdqu8(%rsi,%rax), %xmm0
>   vpmovzxbw   %xmm0, %xmm2
>   vpmovzxwd   %xmm2, %xmm1
>   vpsrldq $8, %xmm0, %xmm0
>   vpsrldq $8, %xmm2, %xmm2
>   vpmovzxbw   %xmm0, %xmm0
>   vpmovzxwd   %xmm2, %xmm2
>   vpmulld %xmm9, %xmm1, %xmm1
>   vpmulld %xmm9, %xmm2, %xmm2
>   vpmovzxwd   %xmm0, %xmm4
>   vpsrldq $8, %xmm0, %xmm0
>   vpmovzxwd   %xmm0, %xmm0
>   vpmulld %xmm9, %xmm4, %xmm4
>   vpmulld %xmm9, %xmm0, %xmm0
>   vpcmpud $6, %xmm6, %xmm1, %k0
>   vpsubd  %xmm1, %xmm7, %xmm3
>   vpcmpud $6, %xmm6, %xmm2, %k1
>   vpsubd  %xmm2, %xmm7, %xmm5
>   vpsrad  $31, %xmm5, %xmm5
>   vpsrad  $31, %xmm3, %xmm3
>   vpermt2w%xmm5, %xmm8, %xmm3
>   vpsubd  %xmm0, %xmm7, %xmm10
>   vpsubd  %xmm4, %xmm7, %xmm5
>   kshiftlb$4, %k1, %k1
>   vpcmpud $6, %xmm6, %xmm0, %k2
>   vpsrad  $31, %xmm5, %xmm5
>   vpsrad  $31, %xmm10, %xmm10
>   kandb   %k3, %k0, %k0
>   korb%k1, %k0, %k0
>   vpcmpud $6, %xmm6, %xmm4, %k1
>   vpermt2w%xmm10, %xmm8, %xmm5
>   vpermt2w%xmm2, %xmm8, %xmm1
>   vpermt2w%xmm0, %xmm8, %xmm4
>   vpermt2b%xmm5, %xmm11, %xmm3
>   vpermt2b%xmm4, %xmm11, %xmm1
>   kandb   %k3, %k1, %k1
>   kshiftlb$4, %k2, %k2
>   korb%k2, %k1, %k1
>   kunpckbw%k0, %k1, %k1
>   vmovdqu8%xmm3, %xmm1{%k1}
>   vmovdqu8%xmm1, (%rdi,%rax)
>   addq$16, %rax
>   cmpq%rax, %r8
>   jne .L4

But it is still not as good as before: in the original version we only needed
to pack the data produced by vec_cond_expr, but now we additionally need to
pack the mask.
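
The mask packing visible in the assembly above (kshiftlb $4 followed by korb,
then kunpckbw) can be modeled in scalar C roughly as follows. This is only an
illustrative sketch: the function names are made up, and which half ends up in
the low bits is an assumption of the example rather than something read off the
instruction operands.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 4-lane bit masks into one 8-lane mask: shift the high
   half left by 4 and OR it with the low half (kshiftlb + korb style). */
uint8_t pack_mask4x2(uint8_t lo, uint8_t hi)
{
  assert(lo < 16 && hi < 16); /* each input carries only 4 mask bits */
  return (uint8_t)((hi << 4) | lo);
}

/* Combine two 8-lane bit masks into one 16-lane mask (kunpckbw style). */
uint16_t pack_mask8x2(uint8_t lo, uint8_t hi)
{
  return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

Each such step is extra work on top of packing the data vectors themselves.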

before

  # x_24 = PHI 
  # vectp_src.11_73 = PHI 
  # vectp_dst.23_112 = PHI 
  # ivtmp_115 = PHI 
  # DEBUG x => NULL
  # DEBUG BEGIN_STMT
  _1 = (sizetype) x_24;
  _2 = src_11(D) + _1;
  vect__3.13_75 = MEM  [(uint8_t *)vectp_src.11_73];
  _3 = *_2;
  vect__4.15_76 = [vec_unpack_lo_expr] vect__3.13_75;
  vect__4.15_77 = [vec_unpack_hi_expr] vect__3.13_75;
  vect__4.14_78 = [vec_unpack_lo_expr] vect__4.15_76;
  vect__4.14_79 = [vec_unpack_hi_expr] vect__4.15_76;
  vect__4.14_80 = [vec_unpack_lo_expr] vect__4.15_77;
  vect__4.14_81 = [vec_unpack_hi_expr] vect__4.15_77;
  _4 = (int) _3;
  vect__5.16_83 = vect__4.14_78 * vect_cst__82;
  vect__5.16_84 = vect__4.14_79 * vect_cst__82;
  vect__5.16_85 = vect__4.14_80 * vect_cst__82;
  vect__5.16_86 = vect__4.14_81 * vect_cst__82;
  _5 = _4 * i_scale_12(D);
  _6 = dst_13(D) + _1;
  # DEBUG x => NULL
  # DEBUG INLINE_ENTRY x264_clip_uint8
  # DEBUG BEGIN_STMT
  vect__14.17_88 = vect__5.16_83 & vect_cst__87;
  vect__14.17_89 = vect__5.16_84 & vect_cst__87;
  vect__14.17_90 = vect__5.16_85 & vect_cst__87;
  vect__14.17_91 = vect__5.16_86 & vect_cst__87;
  _14 = _5 & -256;
  vect__17.18_92 = -vect__5.16_83;
  vect__17.18_93 = -vect__5.16_84;
  vect__17.18_94 = -vect__5.16_85;
  vect__17.18_95 = -vect__5.16_86;
  _17 = -_5;
  vect__18.19_96 = vect__17.18_92 >> 31;
  vect__18.19_97 = vect__17.18_93 >> 31;
  vect__18.19_98 = vect__17.18_94 >> 31;
  vect__18.19_99 = vect__17.18_95 >> 31;
  _18 = _17 >> 31;
  iftmp.0_19 = (unsigned char) _18;
  iftmp.0_20 = (unsigned char) _5;
  _101 = vect__14.17_88 != vect_cst__100;
  vect_patt_40.20_102 = VEC_COND_EXPR <_101, vect__18.19_96, vect__5.16_83>;
  _103 = vect__14.17_89 != vect_cst__100;
  vect_patt_40.20_104 = VEC_COND_EXPR <_103, vect__18.19_97, vect__5.16_84>;
  _105 = vect__14.17_90 != vect_cst__100;
  vect_patt_40.20_106 = VEC_COND_EXPR <_105, vect__18.19_98, vect__5.16_85>;
  _107 = vect__14.17_91 != vect_cst__100;
  vect_patt_40.20_108 = VEC_COND_EXPR <_107, vect__18.19_99, vect__5.16_86>;
  vect_patt_41.22_109 = VEC_PACK_TRUNC_EXPR ;
  vect_p

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #12 from rguenther at suse dot de  ---
On Thu, 13 Jan 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> 
> --- Comment #10 from Hongtao.liu  ---
> with
> @@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code code,
>c1 = VEC_PACK_TRUNC_EXPR;
>if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
>   && VECTOR_BOOLEAN_TYPE_P (vectype)
> - && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> + && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> + || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
>   && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))

I think we instead simply want

 if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
 && TYPE_PRECISION (TREE_TYPE (narrow_vectype)) == 1
 && VECTOR_BOOLEAN_TYPE_P (vectype)
 && TYPE_PRECISION (TREE_TYPE (vectype)) == 1)

note the docs of vec_pack_sbool_trunc say

This instruction pattern is used when all the vector input and output
operands have the same scalar mode @var{m} and thus using
@code{vec_pack_trunc_@var{m}} would be ambiguous.

It also says "_Narrow_ and merge the elements of two vectors.", I think
"narrow" is misleading here, _trunc in the optab name as well.  So
with the above it suggests we could have used vect_pack_trunc_hi here?

To avoid breaking things for the VnBImode using targets we probably
want to retain the SCALAR_INT_MODE_P (prev_mode) check.  And we
probably want to adjust the documentation a bit.

This all is with my pasted pattern patch or is this with the weird
inserted conversion still?

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #13 from rguenther at suse dot de  ---
On Thu, 13 Jan 2022, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> 
> --- Comment #9 from rsandifo at gcc dot gnu.org  
> ---
> (In reply to Richard Biener from comment #7)
> > I think that is what we need to add.  We also don't have a good
> > representation
> > for "packing" of masks.
> > 
> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > index 3ea905538e1..729a1d32612 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -4679,8 +4679,10 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
> >   rhs1_type);
> > }
> >  
> > -  if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> > -   TYPE_VECTOR_SUBPARTS (vectype2)))
> > +  /* AVX512 style masks cannot be packed/unpacked.  */
> > +  if (TYPE_PRECISION (TREE_TYPE (vectype2)) != 1
> > + && maybe_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> > +  TYPE_VECTOR_SUBPARTS (vectype2)))
> > tmp = build_mask_conversion (vinfo, rhs1, vectype1, stmt_vinfo);
> >else
> > tmp = rhs1;
> Haven't had time to look at it properly yet, but my first impression
> is that that's likely to regress SVE.  Packing and unpacking are
> natural operations on boolean vector modes.

Sure, but we can't produce scalar code mimicking this for
1 bit element vectors.  It "works" for the others because based
on the width of the elements we choose a vector with a different
number of elements.  But here the pattern produces an 8 bit element,
which isn't what is desired here.

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #14 from Hongtao.liu  ---

> 
> But it is still not as good as before: in the original version we only needed
> to pack the data produced by vec_cond_expr, but now we additionally need to
> pack the mask.
> 
>

Also, it looks like a regression for non-AVX512, non-SVE targets.

https://godbolt.org/z/4aavMjjG4

[Bug tree-optimization/102192] Curious '-O2'-only '-Wmaybe-uninitialized' diagnostics for 'libgomp.oacc-fortran/routine-10.f90'

2022-01-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102192

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Thomas Schwinge :

https://gcc.gnu.org/g:2edbcaed95b8d8cbb05a6af486179db0da6e3245

commit r12-6547-g2edbcaed95b8d8cbb05a6af486179db0da6e3245
Author: Thomas Schwinge 
Date:   Thu Aug 26 16:55:21 2021 +0200

Document current '-Wuninitialized' diagnostics for
'libgomp.oacc-fortran/routine-10.f90' [PR102192]

libgomp/
PR tree-optimization/102192
* testsuite/libgomp.oacc-fortran/routine-10.f90: Document current
'-Wuninitialized' diagnostics.

[Bug tree-optimization/101615] [12 Regression] wrong code at -O3 on x86_64-linux-gnu since r12-1872

2022-01-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101615

--- Comment #7 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Richard Biener
:

https://gcc.gnu.org/g:7f49f50f756c06f4093358ff77c11152777fff1c

commit r11-9458-g7f49f50f756c06f4093358ff77c11152777fff1c
Author: Richard Biener 
Date:   Wed Jul 28 15:12:00 2021 +0200

tree-optimization/101615 - SLP permute opt with CTOR roots

CTOR roots are not explicitely represented so we have to make sure
to materialize permutes on SLP graph entries to them.

2021-07-28  Richard Biener  

PR tree-optimization/101615
PR tree-optimization/103995
* tree-vect-slp.c (vect_optimize_slp): Materialize permutes
at CTOR SLP graph entries.

* gcc.dg/vect/bb-slp-pr101615-2.c: New testcase.

[Bug tree-optimization/103995] [11/12 Regression] conj() ignored with tree loop vectorizer

2022-01-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103995

--- Comment #7 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Richard Biener
:

https://gcc.gnu.org/g:7f49f50f756c06f4093358ff77c11152777fff1c

commit r11-9458-g7f49f50f756c06f4093358ff77c11152777fff1c
Author: Richard Biener 
Date:   Wed Jul 28 15:12:00 2021 +0200

tree-optimization/101615 - SLP permute opt with CTOR roots

CTOR roots are not explicitely represented so we have to make sure
to materialize permutes on SLP graph entries to them.

2021-07-28  Richard Biener  

PR tree-optimization/101615
PR tree-optimization/103995
* tree-vect-slp.c (vect_optimize_slp): Materialize permutes
at CTOR SLP graph entries.

* gcc.dg/vect/bb-slp-pr101615-2.c: New testcase.

[Bug tree-optimization/103995] [11/12 Regression] conj() ignored with tree loop vectorizer

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103995

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
  Known to work||12.0

--- Comment #8 from Richard Biener  ---
In my testing this fixed it on the branch.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 103995, which changed state.

Bug 103995 Summary: [11/12 Regression] conj() ignored with tree loop vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103995

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/104003] [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6488-g820ac79e8448ad6c

2022-01-13 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104003

--- Comment #2 from Uroš Bizjak  ---
 (define_insn "*xop_pcmov_"
-  [(set (match_operand:VI_32 0 "register_operand" "=x")
-(if_then_else:VI_32
-  (match_operand:VI_32 3 "register_operand" "x")
-  (match_operand:VI_32 1 "register_operand" "x")
-  (match_operand:VI_32 2 "register_operand" "x")))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=x")
+(if_then_else:VI_16_32
+  (match_operand:VI_16_32 3 "register_operand" "x")
+  (match_operand:VI_16_32 1 "register_operand" "x")
+  (match_operand:VI_16_32 2 "register_operand" "x")))]
   "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])

[Bug target/104004] [12 Regression] ICE: in extract_insn, at recog.c:2769 (error: unrecognizable insn)

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104004

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |12.0

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #5 from Jakub Jelinek  ---
The reason we warn is that at -Og we don't optimize away the dead code.  In
uninit2 we have:
  MEM[(struct _Optional_payload_base *)&D.34851]._M_engaged = 0;
...
  _27 = MEM[(const struct _Optional_payload_base &)&D.34851]._M_engaged;
  if (_27 != 0)
goto ; [33.00%]
  else
goto ; [67.00%]

   [local count: 719407024]:
  goto ; [100.00%]

   [local count: 354334800]:
  MEM[(struct __as_base  &)&b] ={v} {CLOBBER};
  MEM[(struct shared_ptr *)&b] ={v} {CLOBBER};
  MEM[(struct __shared_ptr *)&b] ={v} {CLOBBER};
  _30 = MEM[(const struct __shared_ptr &)&D.34851]._M_ptr;
  MEM[(struct __shared_ptr *)&b]._M_ptr = _30;
  MEM[(struct __shared_count *)&b + 8B] ={v} {CLOBBER};
  _31 = MEM[(const struct __shared_count &)&D.34851 + 8]._M_pi;
  MEM[(struct __shared_count *)&b + 8B]._M_pi = _31;
and we haven't figured out that bb 3 is all dead, because we store into
_M_engaged 0 and bb 3 is done only if _M_engaged is non-zero.
-Og runs fre1, but that is too early, fre1 sees
  MEM[(struct _Optional_payload_base *)&D.34851]._M_engaged = 0;
  D.35431 ={v} {CLOBBER};
  b ={v} {CLOBBER};
  MEM[(struct optional *)&b] ={v} {CLOBBER};
  std::_Optional_base::_Optional_base (&MEM[(struct optional
*)&b].D.34640, &D.34851.D.34640);
and _Optional_base isn't inlined there.  And -Og doesn't do any FRE or PRE
after inlining, unlike e.g. -O1 which has fre3 very soon after inlining.
-Og doesn't even do forwprop after inlining.
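
The shape of the problem can be sketched in plain C. This is a simplified
analogue, not the original std::optional testcase; struct payload and both
function names are hypothetical. The read of the value is only reachable when
the flag is set, and the flag was just stored as false, so the guarded block is
dead once the call is inlined and the store is forwarded to the load.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an optional payload: a value plus a flag. */
struct payload {
  int value;
  bool engaged;
};

/* Copy the value only if the source is engaged, similar in spirit to an
   optional copy constructor. */
static void copy_payload(struct payload *dst, const struct payload *src)
{
  assert(dst && src);
  dst->engaged = false;
  if (src->engaged) {
    /* Dead when the caller has just stored engaged = false. */
    dst->value = src->value;
    dst->engaged = true;
  }
}

int demo(void)
{
  struct payload empty; /* value deliberately left unset */
  struct payload b;

  empty.engaged = false;
  copy_payload(&b, &empty); /* the guarded read of empty.value never runs */
  return b.engaged ? b.value : -1;
}
```

Whether a given GCC version warns on this reduced form is not something this
sketch claims; it only shows why value numbering after inlining would let the
compiler prove the guarded read unreachable.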

[Bug tree-optimization/100280] ICE in lower_omp_target, at omp-low.c:12287

2022-01-13 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100280

Thomas Schwinge  changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org,
   ||tschwinge at gcc dot gnu.org
 Resolution|--- |FIXED
   See Also|https://gcc.gnu.org/bugzill |
   |a/show_bug.cgi?id=100400|
   Assignee|unassigned at gcc dot gnu.org  |tschwinge at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Thomas Schwinge  ---
Thanks for the report, now fixed in master branch.  Not planning on backporting
OpenACC 'kernels' decomposition changes to release branches -- unless that'd be
useful for you?

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
Honza, -Og was supposed to not do so much work, I intended to disable IPA
inlining but there's no knob for that.  I wonder where to best put such
guard?  I set flag_inline_small_functions to zero for -Og but we still
run inline_small_functions ().  Basically -Og was supposed to only do
early opts and then what is necessary for correct RTL expansion.  Doing
IPA inlining defeats this :/

Can you help?  Is it safe to simply gate the inline_small_functions ()
call?  Do we want an extra -f[no-]ipa-inline like we have -fearly-inlining?

Using -fdisable-ipa-inline gets rid of the diagnostic

[Bug c/103996] Provide Better diagnostic for invalid reuse of a function name

2022-01-13 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103996

Eric Gallager  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=87656
 CC||egallager at gcc dot gnu.org

--- Comment #3 from Eric Gallager  ---
So, -Wshadow actually used to warn about this, but people didn't like it, so
that part of -Wshadow got removed from GCC in 4.8:
https://gcc.gnu.org/gcc-4.8/changes.html

See also: https://lkml.org/lkml/2006/11/28/239

However, as of GCC 7, the -Wshadow flag now takes additional arguments
(-Wshadow=global, -Wshadow=local, -Wshadow=compatible-local) to let users be
more specific about which shadowing warnings they want:
https://gcc.gnu.org/gcc-7/changes.html 

So, I'm wondering if the part of -Wshadow that got removed in 4.8 could get
added back to it as a non-default argument to the flag? e.g. -Wshadow=function

And maybe the arguments to -Wshadow could become a comma-separated list, too,
so users could do things like -Wshadow=function,global or
-Wshadow=function,local or something (in bug 87656 I was thinking that a
-Wshadow=compatible-global might make sense, too)
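
For reference, a minimal sketch of the kind of shadowing under discussion,
where a local object reuses the name of a function. The names are made up for
illustration, and the code is valid C; it is not the testcase from this bug.

```c
#include <assert.h>

int counter(void)
{
  return 1;
}

int use_shadowed_name(void)
{
  int counter = 3; /* the local object hides the function 'counter' here */

  assert(counter == 3); /* 'counter' names the int, not the function */
  return counter;
}
```

Inside use_shadowed_name the function counter cannot be called by name, which
is the situation a hypothetical -Wshadow=function argument would flag.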

[Bug target/104005] New: Regression on arm+sve with -O2 -fPIC

2022-01-13 Thread gilles.gouaillardet at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104005

Bug ID: 104005
   Summary: Regression on arm+sve with -O2 -fPIC
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gilles.gouaillardet at gmail dot com
  Target Milestone: ---

The GROMACS (2021.3) regression test suite now hangs with the latest trunk
when shared libraries are used, at -O2 and above, with two or more thread-MPI
threads. I only tried 512-bit SVE vectors.

I ran a git bisect and it pointed to the following commit:

commit 526e1639aa76b0a8496b0dc3a3ff2c450229544e (refs/bisect/bad)
Author: Richard Sandiford 
Date:   Fri Nov 12 17:33:00 2021 +

aarch64: Detect more consecutive MEMs


When investigating this issue, I found out that rebuilding the
src/gromacs/domdec/redistribute.cpp file **without** -fPIC is enough to avoid
the issue (a crash or a hang depending on the test cases).

FWIW, thread-MPI is a "fake" MPI implementation that uses threads instead of
processes. In this case, that means two threads. Each thread then enters its own
#pragma omp parallel section (and each thread-MPI thread has a unique OpenMP
thread).


The attached tarball includes preprocessed source (redistribute.cpp.i) and
assembly with -fPIC (redistribute.cpp.PIC.s) and without -fPIC
(redistribute.cpp.noPIC.s) for the commit mentioned above and the commit right
before.


Please let me know if this is enough for you to make sense of this bug,
otherwise I will provide instructions on how to build GROMACS from sources and
manually work around the issue by "removing" the -fPIC flags for the
redistribute.cpp file.

[Bug target/104001] [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6538-g5f19303ada7db92c155332e7ba317233ca05946b

2022-01-13 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104001

Zdenek Sojka  changed:

   What|Removed |Added

 CC||zsojka at seznam dot cz

--- Comment #1 from Zdenek Sojka  ---
Probably the same issue:

$ cat testcase.C
int b, c, d;
int r;

void
foo ()
{
  r = ((b & ~d) | (c & d));
}
$ x86_64-pc-linux-gnu-gcc -O -mavx512bw testcase.c 
testcase.c: In function 'foo':
testcase.c:8:1: error: unrecognizable insn:
8 | }
  | ^
(insn 15 14 0 2 (parallel [
(set (mem/c:SI (symbol_ref:DI ("r") [flags 0x2] ) [1 r+0 S4 A32])
(ior:SI (reg:SI 92)
(reg:SI 93)))
(clobber (reg:CC 17 flags))
]) "testcase.c":7:17 -1
 (expr_list:REG_DEAD (reg:SI 93)
(expr_list:REG_DEAD (reg:SI 92)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)
during RTL pass: ira
testcase.c:8:1: internal compiler error: in extract_insn, at recog.c:2769
0x764803 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/repo/gcc-trunk/gcc/rtl-error.c:108
0x76487f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/repo/gcc-trunk/gcc/rtl-error.c:116
0x753518 extract_insn(rtx_insn*)
/repo/gcc-trunk/gcc/recog.c:2769
0x115b51f ira_remove_insn_scratches(rtx_insn*, bool, _IO_FILE*, rtx_def*
(*)(rtx_def*))
/repo/gcc-trunk/gcc/ira.c:5350
0x115e115 remove_scratches
/repo/gcc-trunk/gcc/ira.c:5394
0x115e115 ira
/repo/gcc-trunk/gcc/ira.c:5718
0x115e115 execute
/repo/gcc-trunk/gcc/ira.c:6077
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
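
For context, the expression in the testcase above is a bitwise select: each
result bit comes from c where the corresponding bit of d is set, and from b
otherwise. The sketch below only illustrates those semantics with two
equivalent formulations; the function names are made up, and it says nothing
about which form the compiler generates internally.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the testcase: take bits of c where d is 1, else bits of b. */
uint32_t bit_select(uint32_t b, uint32_t c, uint32_t d)
{
  return (b & ~d) | (c & d);
}

/* Equivalent XOR-based formulation of the same select. */
uint32_t bit_select_xor(uint32_t b, uint32_t c, uint32_t d)
{
  return b ^ ((b ^ c) & d);
}

/* Returns 1 when both formulations agree for the given inputs. */
int bit_select_forms_agree(uint32_t b, uint32_t c, uint32_t d)
{
  return bit_select(b, c, d) == bit_select_xor(b, c, d);
}
```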

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #7 from hubicka at kam dot mff.cuni.cz ---
> --- Comment #6 from Richard Biener  ---
> Honza, -Og was supposed to not do so much work, I intended to disable IPA
> inlining but there's no knob for that.  I wonder where to best put such
> guard?  I set flag_inline_small_functions to zero for -Og but we still
> run inline_small_functions ().  Basically -Og was supposed to only do
> early opts and then what is necessary for correct RTL expansion.  Doing
> IPA inlining defeats this :/
> 
> Can you help?  Is it safe to simply gate the inline_small_functions ()
> call?  Do we want an extra -f[no-]ipa-inline like we have -fearly-inlining?
> 
> Using -fdisable-ipa-inline gets rid of the diagnostic

You cannot disable an IPA pass because then we will mishandle
optimize attributes.  I think you simply want to set

flag_inline_small_functions = 0
flag_inline_functions_called_once = 0 

and we should only inline always_inlines. inline_small_functions will
still loop and check inlinability of functions but if everything is
compiled with -Og it will not find anything inlinable and exit.

Perhaps we may also extend initialize_inline_failed to add
CIF_DEBUG_OPTIMIZE so -Winline says something more useful than
"function not considered"

Honza

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #8 from hubicka at kam dot mff.cuni.cz ---
> You cannot disable an IPA pass because then we will mishandle
> optimize attributes.  I think you simply want to set
> 
> flag_inline_small_functions = 0
> flag_inline_functions_called_once = 0 

Actually I forgot, we have flag_no_inline which makes
tree_inlinable_function_p return false for everything except for
ALWAYS_INLINE and so we only want to set this one for Og.

[Bug tree-optimization/100280] ICE in lower_omp_target, at omp-low.c:12287

2022-01-13 Thread asolokha at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100280

--- Comment #3 from Arseny Solokha  ---
(In reply to Thomas Schwinge from comment #2)
> Thanks for the report, now fixed in master branch.  Not planning on
> backporting OpenACC 'kernels' decomposition changes to release branches --
> unless that'd be useful for you?

Thanks. Not being OpenACC user, I'm fine with that.

[Bug target/104004] [12 Regression] ICE: in extract_insn, at recog.c:2769 (error: unrecognizable insn)

2022-01-13 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104004

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2022-01-13

--- Comment #1 from Kewen Lin  ---
Thanks for reporting!

We had the check below in rs6000_expand_set_fpscr_rn_builtin before:

  if (rs6000_isa_flags & OPTION_MASK_SOFT_FLOAT)
{
  error ("%<__builtin_set_fpscr_rn%> not supported with %<-msoft-float%>");
  return const0_rtx;
}

It was supposed to be replaced by the new nosoft attribute in the new bif
support.

But grepping the latest trunk code, no bifs use nosoft; I guess something
unexpectedly never got upstreamed.

The fix seems to be adding the nosoft attribute to the __builtin_set_fpscr_rn
entry. Besides, judging from the old bif support, the ones below also require
nosoft:

__builtin_set_fpscr_drn (this one needs no32bit)
__builtin_mffsl
__builtin_mtfsb0
__builtin_mtfsb1

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2022-01-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #7 from Segher Boessenkool  ---
> Seems cmp+isel on P9 is sub-optimal.

For this particular test, perhaps.  But it was better overall, at least some
years ago.  It was benchmarked (with SPEC) on p9.

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #9 from Richard Biener  ---
(In reply to hubicka from comment #8)
> > You cannot disable an IPA pass because then we will mishandle
> > optimize attributes.  I think you simply want to set
> > 
> > flag_inline_small_functions = 0
> > flag_inline_functions_called_once = 0 

I'm doing the above.

> Actually I forgot, we have flag_no_inline which makes
> tree_inlinable_function_p to return false for everything except for
> ALWAYS_INLINE and so we only want to set this one for Og.

And I'm intentionally not doing this because -Og should still remove
abstraction during early inlining (for functions marked 'inline'), we
just don't want to spend the extra compile time doing IPA inlining
or cleaning up after IPA inlining.

As can be seen with the testcase there are 5 calls inlined at IPA time
still (and 47 in early inlining):

> grep optimized: t.C.083i.inline 
/home/rguenther/install/gcc-12/usr/local/include/c++/12.0.0/optional:703:11:
optimized:  Inlined constexpr std::_Optional_base<_Tp, <anonymous>, <anonymous>
>::_Optional_base(const std::_Optional_base<_Tp, <anonymous>, <anonymous> >&)
[with _Tp = A; bool <anonymous> = false; bool <anonymous> = false]/387 into int
main()/360 which now has time 34.125000 and size 17, net change of -7.
/home/rguenther/install/gcc-12/usr/local/include/c++/12.0.0/bits/shared_ptr_base.h:785:26:
optimized:  Inlined void std::_Sp_counted_base<_Lp>::_M_add_ref_copy() [with
__gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]/459 into A::A(const A&)/453
which now has time 13.010361 and size 17, net change of -6.
/home/rguenther/install/gcc-12/usr/local/include/c++/12.0.0/bits/shared_ptr_base.h:196:28:
optimized:  Inlined void std::_Sp_counted_base<_Lp>::_M_release_last_use()
[with __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]/416 into void
std::_Sp_counted_base<_Lp>::_M_release_last_use_cold() [with
__gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]/367 which now has time
33.41 and size 23, net change of -7.
/home/rguenther/install/gcc-12/usr/local/include/c++/12.0.0/bits/stl_construct.h:119:7:
optimized:  Inlined A::A(const A&)/453 into constexpr std::_Optional_base<_Tp,
<anonymous>, <anonymous> >::_Optional_base(const std::_Optional_base<_Tp,
<anonymous>, <anonymous> >&) [with _Tp = A; bool <anonymous> = false; bool
<anonymous> = false]/387 which now has time 32.808419 and size 11, net change
of -9.
/home/rguenther/install/gcc-12/usr/local/include/c++/12.0.0/bits/shared_ptr_base.h:778:21:
optimized:  Inlined void std::_Sp_counted_base<_Lp>::_M_release() [with
__gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]/272 into constexpr void
std::_Optional_payload_base<_Tp>::_M_reset() [with _Tp = A]/427 which now has
time 14.914755 and size 39, net change of -7.

but of course IPA inline size estimates are a bit off since we are not
going to do any optimization on the inlined body.

[Bug c++/99445] [11 Regression] ICE in hashtab_chk_error, at hash-table.c:137 since r11-7011-g6e0a231a4aa2407b

2022-01-13 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99445

--- Comment #10 from Jonathan Wakely  ---
There's a report of a regression caused by this:
https://gcc.gnu.org/pipermail/gcc-help/2022-January/141127.html
I'll ask for it to be reported to bugzilla.

[Bug libfortran/104006] New: [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

Bug ID: 104006
   Summary: [12 regression] power-ieee128 merge breaks Solaris
build
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libfortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ro at gcc dot gnu.org
CC: jakub at gcc dot gnu.org, tkoenig at gcc dot gnu.org
  Target Milestone: ---
Target: *-*-solaris2.11

Between 20220111 and 20220112, Solaris bootstrap (both sparc and x86) got badly
broken, obviously due to the power-ieee128 merge:

* Initially, the build aborted with

make[2]: *** No rule to make target 'gfortran.ver-sun', needed by 'all'.  Stop.

  This can easily be fixed by renaming the target to gfortran.ver-sun.

* What's way worse, however, is that parallel builds fail repeatedly in
  libgfortran:

  On x86 (-j48), I get

In file included from
/vol/gcc/src/hg/master/local/libgfortran/runtime/bounds.c:25:
/vol/gcc/src/hg/master/local/libgfortran/libgfortran.h:393:31: error: unknown
type name 'GFC_REAL_4'
  393 | typedef GFC_ARRAY_DESCRIPTOR (GFC_REAL_4) gfc_array_r4;
  |   ^~
/vol/gcc/src/hg/master/local/libgfortran/libgfortran.h:375:3: note: in
definition of macro 'GFC_ARRAY_DESCRIPTOR'
  375 |   type *base_addr;\
  |   ^~~~
/vol/gcc/src/hg/master/local/libgfortran/libgfortran.h:394:31: error: unknown
type name 'GFC_REAL_8'
  394 | typedef GFC_ARRAY_DESCRIPTOR (GFC_REAL_8) gfc_array_r8;
  |   ^~
/vol/gcc/src/hg/master/local/libgfortran/libgfortran.h:375:3: note: in
definition of macro 'GFC_ARRAY_DESCRIPTOR'
  375 |   type *base_addr;\
  |   ^~~~
/vol/gcc/src/hg/master/local/libgfortran/libgfortran.h:404:31: error: unknown
type name 'GFC_COMPLEX_4'
  404 | typedef GFC_ARRAY_DESCRIPTOR (GFC_COMPLEX_4) gfc_array_c4;
  |   ^
/vol/gcc/src/hg/master/local/libgfortran/libgfortran.h:375:3: note: in
definition of macro 'GFC_ARRAY_DESCRIPTOR'
  375 |   type *base_addr;\
  |   ^~~~
/vol/gcc/src/hg/master/local/libgfortran/libgfortran.h:405:31: error: unknown
type name 'GFC_COMPLEX_8'
  405 | typedef GFC_ARRAY_DESCRIPTOR (GFC_COMPLEX_8) gfc_array_c8;
  |   ^
/vol/gcc/src/hg/master/local/libgfortran/libgfortran.h:375:3: note: in
definition of macro 'GFC_ARRAY_DESCRIPTOR'
  375 |   type *base_addr;\
  |   ^~~~

  also for several other files.  All those definitions live in the generated
  kinds.h.  I cannot yet tell if the header isn't generated atomically.

  On sparc (-j64), I get instead:

/vol/gcc/src/hg/master/local/libgfortran/ieee/ieee_exceptions.F90:28:2:

   28 | #include "c99_protos.inc"
  |  1~~
Fatal Error: kinds.inc: No such file or directory

  I notice that there's no .deps/*.Plo file for this source file; probably
  some dependencies are missing.

Manually restarting the libgfortran build (often several times) ultimately
lets the build succeed.

For comparison's sake I also tried -j64 builds on x86_64-pc-linux-gnu and
i686-pc-linux-gnu.  Both succeeded without issues.

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #15 from Hongtao.liu  ---
(In reply to rguent...@suse.de from comment #12)
> On Thu, 13 Jan 2022, crazylht at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> > 
> > --- Comment #10 from Hongtao.liu  ---
> > with
> > @@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code 
> > code,
> >c1 = VEC_PACK_TRUNC_EXPR;
> >if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> >   && VECTOR_BOOLEAN_TYPE_P (vectype)
> > - && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> > + && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> > + || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
> >   && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
> 
> I think we instead simply want
> 
>  if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
>  && TYPE_PRECISION (TREE_TYPE (narrow_vectype)) == 1
>  && VECTOR_BOOLEAN_TYPE_P (vectype)
>  && TYPE_PRECISION (TREE_TYPE (vectype)) == 1)
> 
> note the docs of vec_pack_sbool_trunc say
> 
> This instruction pattern is used when all the vector input and output
> operands have the same scalar mode @var{m} and thus using
> @code{vec_pack_trunc_@var{m}} would be ambiguous.
> 
> It also says "_Narrow_ and merge the elements of two vectors.", I think
> "narrow" is misleading here, _trunc in the optab name as well.  So
> with the above it suggests we could have used vect_pack_trunc_hi here?
> 
> To avoid breaking things for the VnBImode using targets we probably
> want to retain the SCALAR_INT_MODE_P (prev_mode) check.  And we
> probably want to adjust the documentation a bit.
> 
> This all is with my pasted pattern patch or is this with the weird
> inserted conversion still?

It's w/o your patch. I'm trying to handle the weird conversion in multiple
steps (first pack QI:4 -> QI:8 through vec_pack_sbool_trunc_qi, then pack QI:8
-> HI:16 through vec_pack_sbool_trunc_hi). But on the other hand, the weird
inserted conversion shouldn't exist.

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

Rainer Orth  changed:

   What|Removed |Added

   Target Milestone|--- |12.0

[Bug target/104001] [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6538-g5f19303ada7db92c155332e7ba317233ca05946b

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104001

--- Comment #2 from Hongtao.liu  ---
I'm testing

1 file changed, 3 insertions(+), 3 deletions(-)
gcc/config/i386/i386.md | 6 +++---

modified   gcc/config/i386/i386.md
@@ -10455,7 +10455,7 @@ (define_insn_and_split "*xordi_1_btc"

 ;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask)
 (define_insn_and_split "*xor2andn"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand")
+  [(set (match_operand:SWI248 0 "register_operand")
(xor:SWI248
  (and:SWI248
(xor:SWI248
@@ -10464,8 +10464,7 @@ (define_insn_and_split "*xor2andn"
(match_operand:SWI248 3 "nonimmediate_operand"))
  (match_dup 1)))
 (clobber (reg:CC FLAGS_REG))]
-  "(TARGET_BMI || TARGET_AVX512BW)
-   && ix86_pre_reload_split ()"
+  "TARGET_BMI && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 4)
@@ -10486,6 +10485,7 @@ (define_insn_and_split "*xor2andn"
  (clobber (reg:CC FLAGS_REG))])]
 {
   operands[1] = force_reg (mode, operands[1]);
+  operands[2] = force_reg (mode, operands[2]);
   operands[3] = force_reg (mode, operands[3]);
   operands[4] = gen_reg_rtx (mode);
   operands[5] = gen_reg_rtx (mode);
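
The identity named in the pattern comment, a ^ ((a ^ b) & mask) == (~mask & a) | (b & mask),
can be checked outside the compiler. The following is only a sketch in plain C++; the
function names are made up for illustration and nothing here is taken from i386.md.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the identity from the "*xor2andn" comment.
constexpr std::uint32_t xor_form(std::uint32_t a, std::uint32_t b, std::uint32_t mask) {
    return a ^ ((a ^ b) & mask);
}

// Right-hand side: bits of b where mask is set, bits of a elsewhere.
constexpr std::uint32_t andn_form(std::uint32_t a, std::uint32_t b, std::uint32_t mask) {
    return (~mask & a) | (b & mask);
}

// Hypothetical helper: compare both forms on every triple of 8-bit values.
// The identity is bitwise, so agreement on bytes covers every bit position.
inline bool forms_agree_on_bytes() {
    for (std::uint32_t a = 0; a < 256; ++a)
        for (std::uint32_t b = 0; b < 256; ++b)
            for (std::uint32_t m = 0; m < 256; ++m)
                if (xor_form(a, b, m) != andn_form(a, b, m))
                    return false;
    return true;
}
```

Calling forms_agree_on_bytes() from a small driver should return true; both forms select
the bits of b under the mask and the bits of a everywhere else.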

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

Jakub Jelinek  changed:

   What|Removed |Added

   Last reconfirmed||2022-01-13
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
Created attachment 52176
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52176&action=edit
gcc12-pr104006.patch

Sorry for the gfortran.map-sun left-over.  The kinds.inc missing dependency
seems to be there since forever, wonder what changed.
Anyway, this patch ought to fix both.

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #2 from Jakub Jelinek  ---
Sorry, only added kinds.inc dependencies and not kinds.h.

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #10 from hubicka at kam dot mff.cuni.cz ---
> And I'm intentionally not doing this because -Og should still remove
> abstraction during early inlining (for functions marked 'inline'), we
> just don't want to spend the extra compile time doing IPA inlining
> or cleaning up after IPA inlining.

Indeed, it seemed a bit too extreme to disable inlining completely at -Og :)
So you want the early inliner to behave normally according to flags,
while the IPA inliner skips all calls where either caller or
callee is -Og and the callee is not always_inline?
This can be done in can_inline_edge_p.  I will make a patch for that.
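
To make the proposed rule concrete, here is a toy model of the decision in C++. It is
only a sketch: FunctionInfo and ipa_may_inline are invented names for illustration and
are not GCC's data structures or the real can_inline_edge_p.

```cpp
#include <cassert>

// Toy model only; these are not GCC's data structures.
struct FunctionInfo {
    bool optimize_debug;  // function is compiled with -Og
    bool always_inline;   // function carries the always_inline attribute
};

// Sketch of the rule discussed above for the IPA inliner: skip the call when
// either caller or callee is -Og, unless the callee is always_inline.
constexpr bool ipa_may_inline(const FunctionInfo& caller, const FunctionInfo& callee) {
    if (callee.always_inline)
        return true;
    return !(caller.optimize_debug || callee.optimize_debug);
}
```

In this model the early inliner is unaffected; only the IPA-time decision consults the
-Og state of both ends of the call edge.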

It may be nice to also avoid re-analyzing functions completely to save
some compile time, but that may be a bit tricky if we decide to do things
like cross-module always_inline.  I will look into that too, but perhaps
that can wait for next stage1.
> 
> but of course IPA inline size estimates are a bit off since we are not
> going to do any optimization on the inlined body.

We still do late ccp and other things, but indeed inline estimates
anticipate FRE to happen which it doesn't.

Looking into what passes are in the pipeline, I also noticed that
we could probably skip late modref in the -Og optimization pipeline.
(I think David added it there originally since we do pure-const.)
I am thinking about retiring pure-const from pure-const discovery next
stage1, since modref should be monotonically stronger at doing that (and
if we add a stripped-down mode of modref, it should not be more expensive
than pure-const).

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #16 from rguenther at suse dot de  ---
On Thu, 13 Jan 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> 
> --- Comment #15 from Hongtao.liu  ---
> (In reply to rguent...@suse.de from comment #12)
> > On Thu, 13 Jan 2022, crazylht at gmail dot com wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771
> > > 
> > > --- Comment #10 from Hongtao.liu  ---
> > > with
> > > @@ -12120,7 +12120,8 @@ supportable_narrowing_operation (enum tree_code 
> > > code,
> > >c1 = VEC_PACK_TRUNC_EXPR;
> > >if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> > >   && VECTOR_BOOLEAN_TYPE_P (vectype)
> > > - && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> > > + && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
> > > + || known_lt (TYPE_VECTOR_SUBPARTS (vectype), BITS_PER_UNIT))
> > >   && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
> > 
> > I think we instead simply want
> > 
> >  if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
> >  && TYPE_PRECISION (TREE_TYPE (narrow_vectype)) == 1
> >  && VECTOR_BOOLEAN_TYPE_P (vectype)
> >  && TYPE_PRECISION (TREE_TYPE (vectype)) == 1)
> > 
> > note the docs of vec_pack_sbool_trunc say
> > 
> > This instruction pattern is used when all the vector input and output
> > operands have the same scalar mode @var{m} and thus using
> > @code{vec_pack_trunc_@var{m}} would be ambiguous.
> > 
> > It also says "_Narrow_ and merge the elements of two vectors.", I think
> > "narrow" is misleading here, _trunc in the optab name as well.  So
> > with the above it suggests we could have used vect_pack_trunc_hi here?
> > 
> > To avoid breaking things for the VnBImode using targets we probably
> > want to retain the SCALAR_INT_MODE_P (prev_mode) check.  And we
> > probably want to adjust the documentation a bit.
> > 
> > This all is with my pasted pattern patch or is this with the weird
> > inserted conversion still?
> 
> It's w/o your patch. I'm trying to handle the weird conversion in multiple
> steps (first pack QI:4 -> QI:8 through vec_pack_sbool_trunc_qi, then pack QI:8
> -> HI:16 through vec_pack_sbool_trunc_hi). But on the other hand, the weird
> inserted conversion shouldn't exist.

But the weird conversion suggests packing { 0, 1, 0, 1 } as
{ 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, ... }
thus expanding each bit to 8 bits.  So it's rather an unpacking :/
As said, the scalar conversion does not make any sense...

But maybe I'm missing something very obvious?
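
The difference between the two operations can be sketched on plain integers. This is
an illustration only, under the assumption that "packing" here means concatenating two
n-bit masks into one 2n-bit mask; the helper names are made up and do not correspond to
any optab.

```cpp
#include <cassert>
#include <cstdint>

// Concatenate two n-bit masks into one 2n-bit mask, low half first (n < 16).
constexpr std::uint32_t pack_masks(std::uint32_t lo, std::uint32_t hi, unsigned n) {
    return (lo & ((1u << n) - 1)) | ((hi & ((1u << n) - 1)) << n);
}

// Expand each of the low n bits of mask into 8 identical bits (n <= 8).
constexpr std::uint64_t expand_bits_to_bytes(std::uint32_t mask, unsigned n) {
    std::uint64_t out = 0;
    for (unsigned i = 0; i < n; ++i)
        if (mask & (1u << i))
            out |= std::uint64_t{0xFF} << (8 * i);
    return out;
}
```

With element 0 in bit 0, the mask { 0, 1, 0, 1 } is 0xA. Expanding it gives 0xFF00FF00,
i.e. eight copies of every bit, which is an unpacking. Packing 0xA together with a second
4-bit mask such as 0x5 gives the 8-bit mask 0x5A instead.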

[Bug c++/78655] gcc doesn't exploit the fact that the result of pointer addition can not be nullptr

2022-01-13 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78655

--- Comment #13 from Andrew Macleod  ---
Probably.  Ranger's nonnull processing also invokes infer_nonnull_range () on
the statement, so it should also be picking it up.

The latter test case is really about recomputation then

x_2 = a_1(D) == 0B;
a_3 = a_1(D) + 40;
return x_2;

When x_2 is defined, we don't know that a_1 is non-null.  We only know it's
non-null if we were to recompute x_2's value at the use in return x_2.

We can leave this for now.  I'll follow up on it when we revisit the nonnull
processing and recomputation model for the next stage 1.
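
For reference, a source-level function of roughly this shape is sketched below. It is an
assumption about what the test case looks like, not the test case from the bug; the
offset of 10 ints matches the + 40 in the dump only where int is 4 bytes.

```cpp
#include <cassert>

// Mirrors the statement order in the dump: the null test is evaluated first,
// and the pointer addition that implies a non-null operand comes after it.
inline bool null_test_then_add(int* a) {
    bool x = (a == nullptr);  // x_2 = a_1(D) == 0B
    int* b = a + 10;          // a_3 = a_1(D) + 40, assuming 4-byte int
    (void)b;                  // the sum is unused, as in the dump
    return x;                 // return x_2
}
```

Any caller that stays within defined behaviour passes a pointer into an array of at
least 10 elements, so the function returns false for it. Folding the result to false at
compile time needs the comparison to be re-evaluated at the return, after the addition
has been seen, which is the recomputation discussed above.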

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #3 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #1 from Jakub Jelinek  ---
> Created attachment 52176
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52176&action=edit
> gcc12-pr104006.patch

The patch lists gcc as toplevel dir instead of libgfortran...

> Sorry for the gfortran.map-sun left-over.  The kinds.inc missing dependency

No worries, that was easily fixed.

> seems to be there since forever, wonder what changed.
> Anyway, this patch ought to fix both.

I've moved the previous libgfortran dirs (32 and 64-bit) aside on both
sparc and x86, then ran make -jN configure-target-libgfortran
all-target-libgfortran.

On x86 (-j48), the build worked just fine, on sparc (-j64) I now get the
same error I had on x86 before (while compiling runtime/bounds.c,
libgfortran.h doesn't find GFC_(REAL|COMPLEX)_[48] definitions),
apparently only in the sparcv9 multilib.  Rerunning the make
all-target-libgfortran succeeds, though.

Really strange.  If kinds.h were missing completely at that point, I'd
expect a gcc message to that effect; that's why I suspected the header
being incomplete instead.

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #11 from rguenther at suse dot de  ---
On Thu, 13 Jan 2022, hubicka at kam dot mff.cuni.cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989
> 
> --- Comment #10 from hubicka at kam dot mff.cuni.cz ---
> > And I'm intentionally not doing this because -Og should still remove
> > abstraction during early inlining (for functions marked 'inline'), we
> > just don't want to spend the extra compile time doing IPA inlining
> > or cleaning up after IPA inlining.
> 
> Indeed it seemed bit too extreme to disable inlining completely at -Og :)
> So you want early inliner to behave normally according to flags
> while IPA inliner to skip all calls where either caller or
> callee is -Og and callee is not always_inline?
> This can be done in can_inline_edge_p. I will make patch for that.

Yeah, and since we inline all always inline and also flatten during
early inline the IPA inliner should really do nothing.

> It may be nice to also avoid re-analyzing functions completely to save
> some compile time, but that may be bit tricky if we decide to do things
> like cross-module always_inline.  I will look into that too, but perhaps
> that can wait for next stage1.

I think we decided to have all always inline early and drop bodies now,
didn't you patch it that way this stage1?

> > 
> > but of course IPA inline size estimates are a bit off since we are not
> > going to do any optimization on the inlined body.
> 
> We still do late ccp and other things, but indeed inline estimates
> anticipate FRE to happen which it doesn't.

IIRC the CCP was necessary for some odd reason I don't remember
right now ;)

> Looking into what passes are in the pipeline I also noticed that
> we could also probably skip late modref from -Og optimization pipeline.

Yes, I noticed it was there just now ...

> (I think David added it there originally since we do pure-const).
> I am thinking about retiring pure-const from pure-const discovery next
> stage1 since modref should be monotonically stronger doing that (and
> if we add a stripped down mode of modref it should not be more expensive
> than pure-const)

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #12 from hubicka at kam dot mff.cuni.cz ---
> Yeah, and since we inline all always inline and also flatten during
> early inline the IPA inliner should really do nothing.

OK, can_inline_edge_p will do that, but we will still walk the calls,
which is a bit of wasted effort.  Will look into that incrementally.
> 
> > It may be nice to also avoid re-analyzing functions completely to save
> > some compile time, but that may be bit tricky if we decide to do things
> > like cross-module always_inline.  I will look into that too, but perhaps
> > that can wait for next stage1.
> 
> I think we decided to have all always inline early and drop bodies now,
> didn't you patch it that way this stage1?
I think that gets into trouble, e.g. with the kernel calling always_inlines
indirectly. It is a mess...
> 
> IIRC the CCP was necessary for some odd reason I don't remember
> right now ;)

I would bet it was builtin_constant_p and inlining, so perhaps if we
completely ban late inlining, ccp can go.
> 
> > Looking into what passes are in the pipeline I also noticed that
> > we could also probably skip late modref from -Og optimization pipeline.
> 
> Yes, I noticed it was there just now ...

I will make patch to drop it for trunk.  If we disable all optimization
the repeated pure-const seems pointless as well?

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #13 from Richard Biener  ---
(In reply to hubicka from comment #12)
> > Yeah, and since we inline all always inline and also flatten during
> > early inline the IPA inliner should really do nothing.
> 
> OK, can_inline_edge_p will do that but we will still walk the calls
> which is bit of wasted effort.  Will look into that incrementally.
> > 
> > > It may be nice to also avoid re-analyzing functions completely to save
> > > some compile time, but that may be bit tricky if we decide to do things
> > > like cross-module always_inline.  I will look into that too, but perhaps
> > > that can wait for next stage1.
> > 
> > I think we decided to have all always inline early and drop bodies now,
> > didn't you patch it that way this stage1?
> I think that gets into trouble i.e. with kernel calling always_inlines
> indirectly. It is a mess...

Sure - I just remember (falsely?) that we finally decided to do it :)
If we don't run IPA inline we don't figure we failed to inline the
always_inline either ;)  And IPA inline can expose more indirect
always-inlines we only discover after even more optimization so the
issue is really moot unless we sorry () (or link-fail).

> > 
> > IIRC the CCP was necessary for some odd reason I don't remember
> > right now ;)
> 
> I would bet it was builtin_constant_p and inlining, so perhaps if we
> completely ban late inlining ccp can go.

Yeah, or __builtin_unreachable, or whatever ;)

> > 
> > > Looking into what passes are in the pipeline I also noticed that
> > > we could also probably skip late modref from -Og optimization pipeline.
> > 
> > Yes, I noticed it was there just now ...
> 
> I will make patch to drop it for trunk.  If we disable all optimization
> the repeated pure-const seems pointless as well?

Yes.

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #52176|0   |1
is obsolete||

--- Comment #4 from Jakub Jelinek  ---
Created attachment 52177
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52177&action=edit
gcc12-pr104006.patch

Updated patch.

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #5 from Jakub Jelinek  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #3)
> Really strange.  If kinds.h were missing completely at that point, I'd
> expect gcc message to that effect, that's why I suspected the header
> being incomplete instead.

You can hit the case where kinds.h has already been created but nothing has
been stored to it yet: the mk-kinds-h.sh script is invoked with > kinds.h, so the
file is created as soon as the kinds.h goal starts, but the script does some
compilations first before it echoes anything to stdout.
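
A common way to avoid readers seeing an empty or half-written generated file is to
write to a temporary name and rename it into place once it is complete. The sketch
below shows that general technique in C++; it is not what the attached patch does, and
write_file_atomically, read_file and the ".tmp" suffix are made-up names.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Write contents to target so that a concurrent reader sees either the old
// file or the complete new one: fill a temporary sibling, then rename it.
inline void write_file_atomically(const fs::path& target, const std::string& contents) {
    fs::path tmp = target;
    tmp += ".tmp";  // hypothetical temporary name next to the target
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << contents;
    }  // stream is closed and flushed before the rename
    fs::rename(tmp, target);  // replaces target if it already exists
}

// Read a whole file into a string.
inline std::string read_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With this pattern the target name only ever refers to a finished file, so a compile
that starts while the generator is still running either waits on the dependency or
fails with a clear "no such file" error rather than with missing definitions.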

[Bug tree-optimization/97909] expr_not_equal_to (mainly in match.pd) vs. ranger

2022-01-13 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97909

--- Comment #4 from Andrew Macleod  ---
This functionality was added with fc4076752067fb400b43adbd629081df658da246

Commentary here
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583216.html

All one needs is an active ranger via the enable_ranger() API.
*All* queries made from within folding are READ_ONLY, i.e., no new information
is ever created.

So for accurate results, you need to query the ranges of any operands before
invoking fold to make sure the caches and information are up to date.

Then ::fold_stmt needs to be invoked via ranger->fold_stmt() instead which
provides a hook for the context required.

Any invocation of folding not done through this interface will revert to whatever
ranger knows about global ranges.

If we wish to utilize this via different API points, we can add them.

Ultimately, we can try to get tighter integration of context with folding.

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #14 from hubicka at kam dot mff.cuni.cz ---
> 
> Sure - I just remember (falsely?) that we finally decided to do it :)

I do not recall this, but I may have forgotten :))

> If we don't run IPA inline we don't figure we failed to inline the
> always_inline either ;)  And IPA inline can expose more indirect
> always-inlines we only discover after even more optimization so the
> issue is really moot unless we sorry () (or link-fail).

The problem with the kernel was that it relied on quite complicated indirect
inlining of always-inline functions and did not work without it.  In the beginning,
I think, we should have introduced two attributes - always_inline and
disregard_inline_limits - just like we have internally.  always_inline
should never have allowed public linkage or taking its address, but
it is probably too late to fix that :(

Honza

[Bug c++/104007] New: new (std::nothrow) S[n] always calls ~S

2022-01-13 Thread sbergman at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104007

Bug ID: 104007
   Summary: new (std::nothrow) S[n] always calls ~S
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sbergman at redhat dot com
  Target Milestone: ---

Apparently a recent regression on GCC trunk:

> $ cat test.cc
> #include <cstdlib>
> #include <new>
> struct S { ~S() { std::abort(); } };
> int main() {
> new (std::nothrow) S[1];
> }

> $ g++ test.cc
> $ ./a.out
> Aborted (core dumped)
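
The expected behaviour can also be observed without aborting by counting destructor
calls. The following is a sketch, not part of the report; Counted, g_destroyed and
dtor_calls_right_after_new are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

int g_destroyed = 0;  // counts ~Counted calls

struct Counted {
    ~Counted() { ++g_destroyed; }
};

// Allocate n objects with nothrow array new, record how many destructors had
// run by the time the new-expression returned, then release the array.
int dtor_calls_right_after_new(std::size_t n) {
    g_destroyed = 0;
    Counted* p = new (std::nothrow) Counted[n];
    int seen = g_destroyed;
    delete[] p;  // fine even if p is null
    return seen;
}
```

With a correct compiler the function returns 0, and after the delete[] the counter
equals n (assuming the allocation succeeded). A non-zero return value would indicate
the behaviour reported here.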

[Bug c++/104007] [12 Regression] new (std::nothrow) S[n] always calls ~S since r12-6328-gbeaee0a871b6485d

2022-01-13 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104007

Martin Liška  changed:

   What|Removed |Added

Summary|new (std::nothrow) S[n] |[12 Regression] new
   |always calls ~S |(std::nothrow) S[n] always
   ||calls ~S since
   ||r12-6328-gbeaee0a871b6485d
 CC||jason at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2022-01-13
 Status|UNCONFIRMED |NEW

--- Comment #1 from Martin Liška  ---
Started with r12-6328-gbeaee0a871b6485d.

[Bug c++/104007] [12 Regression] new (std::nothrow) S[n] always calls ~S since r12-6328-gbeaee0a871b6485d

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104007

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |12.0
   Priority|P3  |P1

[Bug c++/104008] New: New g++ folly compile error with gcc 11.x. Bisected to PR99445 c++: Alias template in pack expansion

2022-01-13 Thread ahornby at fb dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104008

Bug ID: 104008
   Summary: New g++ folly compile error with gcc 11.x. Bisected to
PR99445 c++: Alias template in pack expansion
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ahornby at fb dot com
  Target Milestone: ---

Created attachment 52178
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52178&action=edit
gzipped preprocessed source used for the bisect

When compiling folly on Fedora 35 with gcc from the included 11.2.1-7 package,
I found that the badge_test code fails to compile, whereas it builds fine with
gcc 10.x and with the Fedora 35 clang (13.0.0-3.fc35).

Bisecting from https://gcc.gnu.org/git/gcc.git with pre-processed source
indicates the problem was introduced in commit:
[a2531859bf5bf6cf1f29c0dca85fd26e80904a5d] c++: Alias template in pack
expansion [PR99445] (mirror at
https://github.com/gcc-mirror/gcc/commit/a2531859bf5bf6cf1f29c0dca85fd26e80904a5d)

I've also tested latest master and the problem is present there as well.

Bisected with these commands (bisection script below):
git bisect start basepoints/gcc-12 basepoints/gcc-11
git bisect run ~/local/bisect/gxx_bisect.sh

An example of the problem from a folly build, along with the command line I got
the preprocessed source from, is in
https://github.com/facebook/folly/commit/af966d2ce25c14c96373bf39c8ae2b406219ffb4

Error looks like:

'/home/alex/local/bisect/test/0cc79337ad265aabccab63882a810f9dc509a9d0/build'
In file included from /home/alex/local/folly/folly/lang/test/BadgeTest.cpp:19:
/home/alex/local/folly/folly/lang/Badge.h: In instantiation of ‘class
folly::any_badge<{anonymous}::FriendClass, {anonymous}::OtherFriendClass>’:
/home/alex/local/folly/folly/lang/test/BadgeTest.cpp:38:40:   required from
here
/home/alex/local/folly/folly/lang/Badge.h:99:18: error: expansion pattern
‘folly::StrictDisjunction...>’ contains no
parameter packs
   99 |   /* implicit */ any_badge(any_badge) noexcept {}
  |  ^
/home/alex/local/folly/folly/lang/Badge.h: In instantiation of ‘class
folly::any_badge<{anonymous}::FriendClass, {anonymous}::OtherFriendClass,
{anonymous}::DummyClass>’:
/home/alex/local/folly/folly/lang/test/BadgeTest.cpp:39:53:   required from
here
/home/alex/local/folly/folly/lang/Badge.h:99:18: error: expansion pattern
‘folly::StrictDisjunction...>’ contains no
parameter packs
/home/alex/local/folly/folly/lang/test/BadgeTest.cpp: In static member function
‘static void {anonymous}::ProtectedClass::subset({anonymous}::SubsetBadges)’:
/home/alex/local/folly/folly/lang/test/BadgeTest.cpp:39:54: error: cannot
convert ‘any_badge<{anonymous}::FriendClass, {anonymous}::OtherFriendClass>’ to
‘any_badge<{anonymous}::FriendClass, {anonymous}::OtherFriendClass,
{anonymous}::DummyClass>’
   39 |   static void subset(SubsetBadges badges) { superset(badges); }
  |  ^~
  |  |
  | 
any_badge<{anonymous}::FriendClass, {anonymous}::OtherFriendClass>
/home/alex/local/folly/folly/lang/test/BadgeTest.cpp:40:24: note:  
initializing argument 1 of ‘static void
{anonymous}::ProtectedClass::superset({anonymous}::SupersetBadges)’
   40 |   static void superset(SupersetBadges) {}
  |^~
In file included from /home/alex/local/folly/folly/lang/test/BadgeTest.cpp:19:
/home/alex/local/folly/folly/lang/Badge.h: In instantiation of ‘class
folly::any_badge<{anonymous}::FriendClass>’:
/home/alex/local/folly/folly/lang/test/BadgeTest.cpp:47:35:   required from
here
/home/alex/local/folly/folly/lang/Badge.h:99:18: error: expansion pattern
‘folly::StrictDisjunction...>’ contains no
parameter packs
   99 |   /* implicit */ any_badge(any_badge) noexcept {}
  |  ^
Exited with 0


Bisection script was:

#!/bin/sh
# adapted from http://moxielogic.org/blog/bisecting-gcc.html
# Test with:
#   cd local/gcc #(or wherever gcc git repo is)
#   ./bisect/gxx_bisect.sh
# Run with:
#   git bisect run ~/local/bisect/gxx_bisect.sh

# git clone of the gcc tree
GCCSRC="$HOME/local/gcc"

# pre-processed test case
TESTSRC="$HOME/local/bisect/bisect_source.i"

COMMIT=`git rev-parse HEAD`

# Where to put gcc build and install dirs
testdir="$HOME/local/bisect/test/$COMMIT"

mkdir -p "$testdir/build"
mkdir -p "$testdir/install"

# configure for C & C++
(cd "$testdir/build" &&
 "$GCCSRC/configure" --prefix="$testdir/install" --enable-languages=c,c++ \
   --with-system-zlib --disable-multilib --disable-libsanitizer \
   --disable-bootstrap &&
 make -j 32 && make -j 32 install)

cxxbin="$testdir/install/bin/g++"

if test -x "$cxxbin"; then
  # build test case
  if "$cxxbin" -std=gnu++17 -c "$TES

[Bug ipa/101941] [12 Regression] Linux kernel build failure due to retaining fnsplit fragment with __attribute__((__error__))

2022-01-13 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101941

--- Comment #26 from Martin Liška  ---
I've just rebuilt kernel-default package from openSUSE:Factory with the
following config:
https://gist.githubusercontent.com/marxin/d5373a0dd6ab35233a47a25337e73dc5/raw/d2c810d2d32104619b57b6f1d118d052302c519f/.config

and there's one more compilation error for fs/lockd/svclock.c:

gcc fs_lockd_svclock.i -c -O2
In file included from ./include/linux/string.h:253,
 from ./include/linux/bitmap.h:10,
 from ./include/linux/cpumask.h:12,
 from ./arch/x86/include/asm/cpumask.h:5,
 from ./arch/x86/include/asm/msr.h:11,
 from ./arch/x86/include/asm/processor.h:22,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:60,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:78,
 from ./include/linux/spinlock.h:55,
 from ./include/linux/mmzone.h:8,
 from ./include/linux/gfp.h:6,
 from ./include/linux/slab.h:15,
 from fs/lockd/svclock.c:25:
In function ‘strcpy’,
inlined from ‘nlmdbg_cookie2a’ at fs/lockd/svclock.c:74:4,
inlined from ‘nlmsvc_lookup_block’ at fs/lockd/svclock.c:157:721:
./include/linux/fortify-string.h:319:17: error: call to ‘__write_overflow’
declared with attribute error: detected write beyond size of object (1st
parameter)
  319 | __write_overflow();
  | ^~
In function ‘strcpy’,
inlined from ‘nlmdbg_cookie2a’ at fs/lockd/svclock.c:74:4,
inlined from ‘nlmsvc_find_block’ at fs/lockd/svclock.c:196:672,
inlined from ‘nlmsvc_grant_reply’ at fs/lockd/svclock.c:963:16:
./include/linux/fortify-string.h:319:17: error: call to ‘__write_overflow’
declared with attribute error: detected write beyond size of object (1st
parameter)
  319 | __write_overflow();
  | ^~

that started with r12-6030-g422f9eb7011b76c1.

[Bug ipa/101941] [12 Regression] Linux kernel build failure due to retaining fnsplit fragment with __attribute__((__error__))

2022-01-13 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101941

--- Comment #27 from Martin Liška  ---
Created attachment 52179
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52179&action=edit
not reduced test-case

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #6 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #4 from Jakub Jelinek  ---
> Created attachment 52177
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52177&action=edit
> gcc12-pr104006.patch
>
> Updated patch.

Thanks.  The previous errors are gone, but now I get

/vol/gcc/src/hg/master/local/libgfortran/intrinsics/selected_real_kind.f90:56:31:

   56 | if (p2 <= real_infos (i) % precision) found_p = .true.
  |   1
Error: Symbol ‘real_infos’ at (1) has no IMPLICIT type

and several more on both sparc and x86.

[Bug ipa/101941] [12 Regression] Linux kernel build failure due to retaining fnsplit fragment with __attribute__((__error__))

2022-01-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101941

--- Comment #28 from Andrew Pinski  ---
(In reply to Martin Liška from comment #26)
> that started with r12-6030-g422f9eb7011b76c1.

Please file that bug separately and it might be related to PR 103961 which was
just fixed too.

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #7 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #5 from Jakub Jelinek  ---
> (In reply to r...@cebitec.uni-bielefeld.de from comment #3)
>> Really strange.  If kinds.h were missing completely at that point, I'd
>> expect gcc message to that effect, that's why I suspected the header
>> being incomplete instead.
>
> You can hit the case where kinds.h has been created already but nothing has
> been stored to it yet, mk-kinds-h.sh script is invoked with > kinds.h so that
> is created immediately kinds.h goal starts, but does some compilations first
> before it echoes something to stdout.

Which suggests kinds.h (and other files generated in a similar way)
should use move-if-change to avoid this.

make kinds.h takes about 1.2s on my sparc box, so there's a considerable
window during which the file could be incomplete.
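
As a rough illustration of the idea (not the move-if-change script from the
GCC tree), the C sketch below writes the generated text to a temporary file
first and only then renames it over the target, so a concurrent reader never
sees a partially written file, and an unchanged target is left untouched.  The
name `replace_if_changed` and the `.tmp` suffix are hypothetical, and the
atomic replacement relies on POSIX `rename` semantics.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the file at PATH exists and its content equals TEXT.  */
static int
same_content (const char *path, const char *text)
{
  FILE *f = fopen (path, "rb");
  if (!f)
    return 0;
  size_t len = strlen (text);
  char *buf = malloc (len + 1);
  if (!buf)
    {
      fclose (f);
      return 0;
    }
  /* Ask for one extra byte so that a longer file is detected.  */
  size_t got = fread (buf, 1, len + 1, f);
  fclose (f);
  int same = got == len && memcmp (buf, text, len) == 0;
  free (buf);
  return same;
}

/* Hypothetical helper: install TEXT as PATH.
   Returns 1 if PATH was replaced, 0 if it was already up to date,
   and -1 on error.  */
int
replace_if_changed (const char *path, const char *text)
{
  assert (path != NULL && text != NULL);
  if (same_content (path, text))
    return 0;

  char tmp[4096];
  if (snprintf (tmp, sizeof tmp, "%s.tmp", path) >= (int) sizeof tmp)
    return -1;

  /* Write the complete content under a temporary name first.  */
  FILE *f = fopen (tmp, "wb");
  if (!f)
    return -1;
  int ok = fputs (text, f) >= 0;
  ok &= fclose (f) == 0;

  /* POSIX rename replaces PATH atomically; readers see either the old
     file or the complete new one.  */
  if (!ok || rename (tmp, path) != 0)
    {
      remove (tmp);
      return -1;
    }
  return 1;
}
```

Leaving an unchanged file alone also keeps its timestamp, so make does not
rebuild everything that depends on it.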

[Bug target/104001] [12 Regression] ICE in extract_insn, at recog.c:2769 since r12-6538-g5f19303ada7db92c155332e7ba317233ca05946b

2022-01-13 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104001

David Binderman  changed:

   What|Removed |Added

 CC||dcb314 at hotmail dot com

--- Comment #3 from David Binderman  ---
Another very similar test case:

int Mpm_p_pTruth0, Mpm_p_pTruth1, Mpm_p_tC, Mpm_p_t;

void Mpm_p() {
  Mpm_p_t = Mpm_p_tC & Mpm_p_pTruth1 | ~Mpm_p_tC & Mpm_p_pTruth0;
}

goes wrong with -O1 and -march=bdver2.
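
For reference, the expression in this test case is a plain bitwise select:
each result bit comes from Mpm_p_pTruth1 where Mpm_p_tC has a one and from
Mpm_p_pTruth0 elsewhere.  The sketch below only restates what the expression
computes, together with an equivalent XOR form; the function names are
hypothetical and it makes no claim about why the compiler fails on it.

```c
/* Bitwise select written as in the test case: take the bits of IF_SET
   where MASK has ones and the bits of IF_CLEAR everywhere else.  */
int
select_bits (int mask, int if_set, int if_clear)
{
  return (mask & if_set) | (~mask & if_clear);
}

/* Equivalent form: start from IF_CLEAR and flip exactly those bits that
   differ from IF_SET and are selected by MASK.  */
int
select_bits_xor (int mask, int if_set, int if_clear)
{
  return if_clear ^ (mask & (if_set ^ if_clear));
}
```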

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #8 from Jakub Jelinek  ---
libgfortran/ieee/ieee_arithmetic.F90:#include "fpu-target.inc"
libgfortran/ieee/ieee_exceptions.F90:#include "fpu-target.inc"
libgfortran/intrinsics/selected_int_kind.f90:  include "selected_int_kind.inc"
libgfortran/intrinsics/selected_real_kind.f90:  include
"selected_real_kind.inc"
libgfortran/runtime/fpu.c:#include "fpu-target.h"

[Bug target/103771] [12 Regression] Missed vectorization under -mavx512f -mavx512vl after r12-5489

2022-01-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103771

--- Comment #17 from Hongtao.liu  ---
> As said, the scalar conversion does not make any sense...
Agree.

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #52177|0   |1
is obsolete||

--- Comment #9 from Jakub Jelinek  ---
Created attachment 52180
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52180&action=edit
gcc12-pr104006.patch

So yet another version.

[Bug c++/104008] [11/12 Regression] New g++ folly compile error with gcc 11.x. Bisected to PR99445 c++: Alias template in pack expansion

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104008

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P2
 CC||jakub at gcc dot gnu.org,
   ||jason at gcc dot gnu.org
   Target Milestone|--- |11.3
Summary|New g++ folly compile error |[11/12 Regression] New g++
   |with gcc 11.x. Bisected to  |folly compile error with
   |PR99445 c++: Alias template |gcc 11.x. Bisected to
   |in pack expansion   |PR99445 c++: Alias template
   ||in pack expansion

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #15 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:53ead5787921be799593232cfc9931f916b79002

commit r12-6550-g53ead5787921be799593232cfc9931f916b79002
Author: Jakub Jelinek 
Date:   Thu Jan 13 15:59:47 2022 +0100

inliner: Don't emit copy stmts for empty type parameters [PR103989]

The following patch avoids emitting a parameter copy statement when inlining
if the parameter has empty type.  E.g. the gimplifier does something similar
(except that it needs to evaluate side-effects if any, which isn't the case
here):
  /* For empty types only gimplify the left hand side and right hand
     side as statements and throw away the assignment.  Do this after
     gimplify_modify_expr_rhs so we handle TARGET_EXPRs of addressable
     types properly.  */
  if (is_empty_type (TREE_TYPE (*from_p))
      && !want_value
      /* Don't do this for calls that return addressable types, expand_call
         relies on those having a lhs.  */
      && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p))
           && TREE_CODE (*from_p) == CALL_EXPR))
    {
      gimplify_stmt (from_p, pre_p);
      gimplify_stmt (to_p, pre_p);
      *expr_p = NULL_TREE;
      return GS_ALL_DONE;
    }
Unfortunately, this patch doesn't cure the uninit warnings in that PR,
which are caused by ipa inlining happening even at -Og when the post-IPA
-Og passes don't expect the need to clean up after ipa inlining,
but I think it is desirable anyway.

2022-01-13  Jakub Jelinek  

PR tree-optimization/103989
* tree-inline.c (setup_one_parameter): Don't copy parms with
empty type.

[Bug ipa/101941] [12 Regression] Linux kernel build failure due to retaining fnsplit fragment with __attribute__((__error__))

2022-01-13 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101941

--- Comment #29 from Siddhesh Poyarekar  ---
(In reply to Andrew Pinski from comment #28)
> (In reply to Martin Liška from comment #26)
> > that started with r12-6030-g422f9eb7011b76c1.
> 
> Please file that bug separately and it might be related to PR 103961 which
> was just fixed too.

It's kinda like PR 103961, but not the same.  I'll file a new bug; I've got a
reduced reproducer too.

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #10 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #9 from Jakub Jelinek  ---
> Created attachment 52180
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52180&action=edit
> gcc12-pr104006.patch
>
> So yet another version.

That one's quite promising: only one error left on both sparc and x86:

/vol/gcc/src/hg/master/local/libgfortran/ieee/ieee_arithmetic.F90:33:7:

   33 |   use IEEE_EXCEPTIONS
  |   1
Fatal Error: Cannot open module file ‘ieee_exceptions.mod’ for reading at (1):
No such file or directory
compilation terminated.
make[1]: *** [Makefile:7643: ieee_arithmetic.lo] Error 1

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #16 from Jakub Jelinek  ---
Perhaps if we punt for -Og caller (and maybe -Og callees) on IPA inlining
except for always_inline, we could set some flag if IPA inlining happened and
schedule some extra cleanup passes just for those rare cases?
Though arguably, if a call to always_inline function was indirect during
einline,  we don't need to guarantee that it will be inlined.
But, what about -Og -fno-early-inlining?

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #11 from Jakub Jelinek  ---
That is weird.
We have:
ieee_arithmetic.lo: ieee/ieee_arithmetic.F90 ieee_exceptions.lo
dependency and ieee_exceptions.mod is created when compiling
ieee_exceptions.lo.

[Bug tree-optimization/104002] ICE ‘verify_gimple’ failed since r12-1128-gef8176e0fac935c0

2022-01-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104002

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:f45a2232bc8d6b88f52859cac502611395f3caf5

commit r12-6551-gf45a2232bc8d6b88f52859cac502611395f3caf5
Author: Richard Biener 
Date:   Thu Jan 13 11:55:14 2022 +0100

c/104002 - shufflevector variable indexing

Variable indexing of a __builtin_shufflevector result is broken because
we fail to properly mark the TARGET_EXPR decl as addressable.

2022-01-13  Richard Biener  

PR c/104002
gcc/c-family/
* c-common.c (c_common_mark_addressable_vec): Handle TARGET_EXPR.

gcc/testsuite/
* c-c++-common/builtin-shufflevector-3.c: Move ...
* c-c++-common/torture/builtin-shufflevector-3.c: ... here.

[Bug tree-optimization/104002] ICE ‘verify_gimple’ failed since r12-1128-gef8176e0fac935c0

2022-01-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104002

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug tree-optimization/103989] [12 regression] std::optional and bogus -Wmaybe-unitialized at -Og since r12-1992-g6feb628a706e86eb

2022-01-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989

--- Comment #17 from rguenther at suse dot de  ---
On Thu, 13 Jan 2022, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103989
> 
> --- Comment #16 from Jakub Jelinek  ---
> Perhaps if we punt for -Og caller (and maybe -Og callees) on IPA inlining
> except for always_inline, we could set some flag if IPA inlining happened and
> schedule some extra cleanup passes just for those rare cases?

I'd rather not.  At some point I wanted to refactor the main opt
pipeline to work like this and skip some of the early passes there
if we did _not_ inline ...

> Though arguably, if a call to always_inline function was indirect during
> einline,  we don't need to guarantee that it will be inlined.
> But, what about -Og -fno-early-inlining?

-Og -fno-early-inlining will still do always_inline inlining early.

  /* Even when not optimizing or not inlining inline always-inline
 functions.  */
  inlined = inline_always_inline_functions (node);

[Bug libfortran/104006] [12 regression] power-ieee128 merge breaks Solaris build

2022-01-13 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104006

--- Comment #12 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #11 from Jakub Jelinek  ---
> That is weird.
> We have:
> ieee_arithmetic.lo: ieee/ieee_arithmetic.F90 ieee_exceptions.lo
> dependency and ieee_exceptions.mod is created when compiling
> ieee_exceptions.lo.

Here's what I see: ieee_exceptions.mod is missing.

$ make -n ieee_arithmetic.lo
/bin/ksh ./libtool  --tag=FC   --mode=compile
/var/gcc/regression/master/11.4-gcc/build/./gcc/gfortran
-B/var/gcc/regression/master/11.4-gcc/build/./gcc/
-B/vol/gcc/i386-pc-solaris2.11/bin/ -B/vol/gcc/i386-pc-solaris2.11/lib/
-isystem /vol/gcc/i386-pc-solaris2.11/include -isystem
/vol/gcc/i386-pc-solaris2.11/sys-include -DHAVE_CONFIG_H -I.
-I/vol/gcc/src/hg/master/local/libgfortran 
-iquote/vol/gcc/src/hg/master/local/libgfortran/io
-I/vol/gcc/src/hg/master/local/libgfortran/../gcc
-I/vol/gcc/src/hg/master/local/libgfortran/../gcc/config
-I/vol/gcc/src/hg/master/local/libgfortran/../libquadmath -I../.././gcc
-I/vol/gcc/src/hg/master/local/libgfortran/../libgcc -I../libgcc
-I/vol/gcc/src/hg/master/local/libgfortran/../libbacktrace -I../libbacktrace
-I../libbacktrace  -I . -Wall -Werror -fimplicit-none -fno-repack-arrays
-fno-underscoring   -Wno-unused-dummy-argument -Wno-c-binding-type
-ffree-line-length-0 -fallow-leading-underscore -fsignaling-nans
-fbuilding-libgfortran -g -O2 -c -o ieee_arithmetic.lo
/vol/gcc/src/hg/master/local/libgfortran/ieee/ieee_arithmetic.F90
$ make -n ieee_exceptions.mod
:
$ make -n ieee_exceptions.lo
make: Nothing to be done for 'ieee_exceptions.lo'.

ieee_exceptions.lo doesn't exist either.

[Bug tree-optimization/104009] New: r12-6030-g422f9eb7011b76c1 breaks kernel build

2022-01-13 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104009

Bug ID: 104009
   Summary: r12-6030-g422f9eb7011b76c1 breaks kernel build
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: siddhesh at gcc dot gnu.org
  Target Milestone: ---

Reproducer gleaned from the kernel:

const char *
nlmdbg_cookie2a(unsigned n, char **data)
{
  static char buf[255];
  unsigned int i, len = sizeof(buf);
  char *p = buf;

  len--;  /* allow for trailing \0 */
  for (i = 0 ; i < n ; i++)
    {
      if (len < 2)
        {
          __builtin___strcpy_chk(p-3, "...", __builtin_object_size (p-3, 1));
          break;
        }
      p += 2;
      len -= 2;
    }
  *p = '\0';

  return buf;
}

$ cat repr.c.031t.early_objsz 

;; Function nlmdbg_cookie2a (nlmdbg_cookie2a, funcdef_no=0, decl_uid=1980,
cgraph_uid=1, symbol_order=0)

Computing maximum subobject size for _1:
Visiting use-def links for _1
Visiting use-def links for p_6
Visiting use-def links for p_9
Visiting use-def links for p_14
Found a dependency loop at p_6
Need to reexamine p_14
Need to reexamine p_6
Need to reexamine _1
Visiting use-def links for _1
Need to reexamine _1
Reexamining _1
Visiting use-def links for p_6
Need to reexamine p_6
Reexamining p_6
Visiting use-def links for p_14
Need to reexamine p_14
Reexamining p_14
_1: maximum subobject size 0
p_6: maximum subobject size 255
p_9: maximum subobject size 255
p_14: maximum subobject size 253
const char * nlmdbg_cookie2a (unsigned int n, char * * data)
{
  char * p;
  unsigned int len;
  unsigned int i;
  static char buf[255];
  char * _1;
  long unsigned int _2;
  char * _3;
  const char * _19;
  long unsigned int _20;

   :
  len_8 = 255;
  p_9 = &buf;
  len_10 = len_8 + 4294967295;
  i_11 = 0;
  goto ; [INV]

   :
  if (len_5 <= 1)
goto ; [INV]
  else
goto ; [INV]

   :
  _1 = p_6 + 18446744073709551613;
  _20 = __builtin_object_size (_1, 1);
  _2 = MIN_EXPR <_20, 0>;
  _3 = p_6 + 18446744073709551613;
  __builtin___memcpy_chk (_3, "...", 4, _2);
  goto ; [INV]

   :
  p_14 = p_6 + 2;
  len_15 = len_5 + 4294967294;
  i_16 = i_4 + 1;

   :
  # i_4 = PHI 
  # len_5 = PHI 
  # p_6 = PHI 
  if (i_4 < n_12(D))
goto ; [INV]
  else
goto ; [INV]

   :
  *p_6 = 0;
  _19 = &buf;
  return _19;

}


Basically since p_6 is an estimate (i.e. the wholesize) and not a precise
value, negative offsets don't quite work.  I need to figure out a way to punt
on negative offsets if we're using size estimates instead of precise sizes. 
This means that it'll work for dynamic object sizes (because at the moment
they're always precise expressions) but not always for static sizes.

Right now it breaks for dynamic object sizes too, but that's only because
early_objsz treats __builtin_dynamic_object_size as __builtin_object_size to
get an upper bound and ends up zeroing it.  So punting should fix that too.
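
The punt described above can be sketched as follows.  This is only an
illustration of the reasoning, not code from the object size pass;
`remaining_size`, `SIZE_UNKNOWN` and the `precise` flag are hypothetical names.
In the dump, the constant 18446744073709551613 is -3 converted to a 64-bit
unsigned type, so `_1` is `p_6 - 3`.  Treating the estimate for `p_6` (255,
the whole size) as if it were precise places `p_6` at the start of `buf`, so
`p_6 - 3` appears to lie before the object and the size becomes 0.

```c
#include <assert.h>
#include <stddef.h>

/* "Don't know" answer for a maximum object size query; this is the value
   __builtin_object_size returns for types 0 and 1 when it cannot tell.  */
#define SIZE_UNKNOWN ((size_t) -1)

/* WHOLE is the size of the underlying object and REMAINING the number of
   bytes assumed to be available from the pointer to the end of the object.
   PRECISE is nonzero when REMAINING is exact and zero when it is only an
   upper-bound estimate (for example the whole size assumed for a loop PHI).
   Return the bytes available after adding OFFSET to the pointer.  */
size_t
remaining_size (size_t whole, size_t remaining, ptrdiff_t offset, int precise)
{
  if (whole == SIZE_UNKNOWN || remaining == SIZE_UNKNOWN)
    return SIZE_UNKNOWN;
  assert (remaining <= whole);

  if (offset >= 0)
    /* Moving forward only shrinks the bound, so an estimate stays valid.  */
    return (size_t) offset > remaining ? 0 : remaining - (size_t) offset;

  /* Magnitude of the negative offset, computed without signed overflow.  */
  size_t back = (size_t) 0 - (size_t) offset;

  if (!precise)
    /* The pointer's real position is unknown, so punt instead of
       concluding that the access starts before the object.  */
    return SIZE_UNKNOWN;

  /* With a precise size the pointer's position inside the object is known,
     so stepping back can be checked against the start of the object.  */
  size_t position = whole - remaining;
  return back > position ? 0 : remaining + back;
}
```

With these definitions, `remaining_size (255, 255, -3, 1)` gives 0, mirroring
the bogus result in the dump, while passing 0 for `precise` gives
`SIZE_UNKNOWN`, which lets the fortified call through.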

[Bug tree-optimization/104009] r12-6030-g422f9eb7011b76c1 breaks kernel build

2022-01-13 Thread siddhesh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104009

Siddhesh Poyarekar  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |siddhesh at gcc dot gnu.org
   Priority|P3  |P1
 CC||marxin at gcc dot gnu.org
   Last reconfirmed||2022-01-13
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Siddhesh Poyarekar  ---
Bumping to P1 since we want to be able to build the kernel with fortification.

[Bug c++/103672] [10/11/12 Regression] using with template class> causes internal compiler error

2022-01-13 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103672

Patrick Palka  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=91911

--- Comment #2 from Patrick Palka  ---
Closely related to PR91911 due to the use of CTAD + alias template

[Bug tree-optimization/104009] [12 Regression] r12-6030-g422f9eb7011b76c1 breaks kernel build

2022-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104009

Jakub Jelinek  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |12.0
 CC||jakub at gcc dot gnu.org
Summary|r12-6030-g422f9eb7011b76c1  |[12 Regression]
   |breaks kernel build |r12-6030-g422f9eb7011b76c1
   ||breaks kernel build

--- Comment #2 from Jakub Jelinek  ---
Well, it is a P1 just from being a regression from 11 on a primary or secondary
platform (and even more so as it is wrong-code).

[Bug tree-optimization/104010] New: [12 regression] short loop no longer vectorized with Neon after r12-6513

2022-01-13 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

Bug ID: 104010
   Summary: [12 regression] short loop no longer vectorized with
Neon after r12-6513
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

This short loop:
void test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i=0; i<4; i++) {
    dest[i] = a[i] == b[i];
  }
}

used to be vectorized as:
test_vcmpeq_s32x2:
vld1.32 {d16}, [r1]
vmov.i32 d17, #0x1  @ v2si
vld1.32 {d19}, [r2]
vmov.i32 d18, #0  @ v2si
vceq.i32 d16, d16, d19
vbsl d16, d17, d18
vst1.32 {d16}, [r0]
bx  lr

After r12-6513, we get:
test_vcmpeq_s32x2:
ldr ip, [r1]
ldr r3, [r1, #4]
str lr, [sp, #-4]!
ldr lr, [r2]
ldr r2, [r2, #4]
sub ip, ip, lr
clz ip, ip
sub r3, r3, r2
lsr ip, ip, #5
clz r3, r3
lsr r3, r3, #5
str ip, [r0]
str r3, [r0, #4]
ldr pc, [sp], #4

when compiling for arm-none-linux-gnueabihf with -mcpu=cortex-a9 -mfpu=neon
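
For readers unfamiliar with the scalar sequence above: each sub/clz/lsr #5
triple computes one a[i] == b[i] without a branch, because the difference is
zero exactly when its count of leading zeros is 32, and 32 is the only
possible count with bit 5 set.  A small C sketch of that idiom follows;
`clz32` and `eq_via_clz` are hypothetical helper names, the code assumes a
32-bit `unsigned int`, and `clz32` special-cases zero because
`__builtin_clz (0)` is undefined in GNU C whereas the Arm clz instruction
returns 32.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value, returning 32 for zero as the
   Arm clz instruction does.  */
unsigned
clz32 (uint32_t x)
{
  return x ? (unsigned) __builtin_clz (x) : 32u;
}

/* Branch-free equality test: 1 if A == B, otherwise 0.  */
int32_t
eq_via_clz (int32_t a, int32_t b)
{
  /* Subtract as unsigned so that the wrap-around is well defined.  */
  uint32_t diff = (uint32_t) a - (uint32_t) b;
  /* clz32 yields 0..32; shifting right by 5 keeps only the "is 32" bit.  */
  return (int32_t) (clz32 (diff) >> 5);
}
```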

[Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-6513

2022-01-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

--- Comment #1 from Andrew Pinski  ---
I think you have the wrong revision in there as that commit only adds a
testcase.
