[Bug target/100320] [8/9/10/11/12 Regression] 32-bit x86 memcpy is suboptimal

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
           Keywords|                            |missed-optimization
             Target|                            |i?86-*-*

[Bug target/100329] New: ICE: verify_ssa failed (error: definition in block 3 does not dominate use in block 4)

2021-04-29 Thread asolokha at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100329

            Bug ID: 100329
           Summary: ICE: verify_ssa failed (error: definition in block 3
                    does not dominate use in block 4)
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: asolokha at gmx dot com
  Target Milestone: ---
            Target: aarch64-linux-gnu

gcc-11.0.1-alpha20210426 snapshot (g:d3212299e2cfc3c16dd23bab26ec6c49024105f8)
ICEs when compiling the following testcase, reduced from
gcc/testsuite/gcc.c-torture/compile/asmgoto-5.c, w/ -O1 -fwrapv:

int a0;

int
foo (int a1, int a2)
{
  int x;

  asm goto ("" : "=r" (x) : : : lab);
  a0 = x;

 lab:
  return x + a1 + a2 + 1;
}

% aarch64-linux-gnu-gcc-11.0.1 -O1 -fwrapv -c gugaqse9.c
gugaqse9.c: In function 'foo':
gugaqse9.c:4:1: error: definition in block 3 does not dominate use in block 4
4 | foo (int a1, int a2)
  | ^~~
for SSA_NAME: _2 in statement:
_9 = _1 + _2;
during GIMPLE pass: reassoc
gugaqse9.c:4:1: internal compiler error: verify_ssa failed
0x112e39e verify_ssa(bool, bool)
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210426/work/gcc-11-20210426/gcc/tree-ssa.c:1214
0xdfd118 execute_function_todo
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210426/work/gcc-11-20210426/gcc/passes.c:2049
0xdfd711 do_per_function
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210426/work/gcc-11-20210426/gcc/passes.c:1687
0xdfd711 execute_todo
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210426/work/gcc-11-20210426/gcc/passes.c:2096

GCC 10 rejects this code.

[Bug d/100324] gcc-10.2.0 (and earlier) fails to build on x86_64, but has builds just fine aarch64

2021-04-29 Thread torel at simula dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100324

--- Comment #2 from Tor  ---
Same issue on gcc-11.1.0.  Can't pick D language on x86_64/amd64 platform.

make[2]: Leaving directory
'/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/x86_64-linux-gnu/zlib'
make[2]: Entering directory
'/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/x86_64-linux-gnu/libphobos'
make "AR_FLAGS=rc" "CC_FOR_BUILD=x86_64-linux-gnu-gcc"
"CC_FOR_TARGET=/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/xgcc
-B/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/"
"CCASFLAGS=-g -O2" "CFLAGS=-g -O2" "CXXFLAGS=-g -O2 -D_GNU_SOURCE"
"CFLAGS_FOR_BUILD=-g -O2" "CFLAGS_FOR_TARGET=-g -O2"
"GDC_FOR_TARGET=/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/gdc
-B/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/"
"GDC=/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/gdc
-B/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/
-B/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu/bin/
-B/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu/lib/ -isystem
/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu/include -isystem
/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu/sys-include   -fchecking=1"
"GDCFLAGS=-O2 -g" "INSTALL=/usr/bin/install -c" "INSTALL_DATA=/usr/bin/install
-c -m 644" "INSTALL_PROGRAM=/usr/bin/install -c"
"INSTALL_SCRIPT=/usr/bin/install -c" "LDFLAGS=" "LIBCFLAGS=-g -O2"
"LIBCFLAGS_FOR_TARGET=-g -O2" "MAKE=make" "MAKEINFO=makeinfo
--split-size=500 --split-size=500 " "PICFLAG=" "PICFLAG_FOR_TARGET="
"SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/cm/shared/apps/gcc/11.1.0"
"infodir=/cm/shared/apps/gcc/11.1.0/share/info"
"libdir=/cm/shared/apps/gcc/11.1.0/usr/lib"
"includedir=/cm/shared/apps/gcc/11.1.0/include"
"prefix=/cm/shared/apps/gcc/11.1.0"
"tooldir=/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu"
"gdc_include_dir=/cm/shared/apps/gcc/11.1.0/usr/lib/gcc/x86_64-linux-gnu/11/include/d"
"AR=x86_64-linux-gnu-ar"
"AS=/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/as"
"LD=/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/collect-ld"
"RANLIB=x86_64-linux-gnu-ranlib"
"NM=/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/nm"
"NM_FOR_BUILD=" "NM_FOR_TARGET=x86_64-linux-gnu-nm" "DESTDIR=" "WERROR="
all-recursive
make[3]: Entering directory
'/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/x86_64-linux-gnu/libphobos'
Making all in libdruntime
make[4]: Entering directory
'/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/x86_64-linux-gnu/libphobos/libdruntime'
/bin/sh ../libtool --tag=D   --mode=compile
/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/gdc
-B/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/./gcc/
-B/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu/bin/
-B/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu/lib/ -isystem
/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu/include -isystem
/cm/shared/apps/gcc/11.1.0/x86_64-linux-gnu/sys-include   -fchecking=1
-prefer-pic -fversion=Shared -Wall  -frelease  -ffunction-sections
-fdata-sections -fcf-protection -mshstk -fversion=CET -O2 -g  -nostdinc -I
../../../../libphobos/libdruntime -I . -c -o core/atomic.lo
../../../../libphobos/libdruntime/core/atomic.d
/bin/sh: ../libtool: No such file or directory
Makefile:2338: recipe for target 'core/atomic.lo' failed
make[4]: *** [core/atomic.lo] Error 127
make[4]: Leaving directory
'/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/x86_64-linux-gnu/libphobos/libdruntime'
Makefile:484: recipe for target 'all-recursive' failed
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory
'/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/x86_64-linux-gnu/libphobos'
Makefile:411: recipe for target 'all' failed
make[2]: *** [all] Error 2
make[2]: Leaving directory
'/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64/x86_64-linux-gnu/libphobos'
Makefile:23601: recipe for target 'all-target-libphobos' failed
make[1]: *** [all-target-libphobos] Error 2
make[1]: Leaving directory
'/home/torel/workspace/GCC/gcc-11_1_0-release/Build-x86_64'
Makefile:1001: recipe for target 'all' failed
make: *** [all] Error 2
torel@srl-login1:~/workspace/GCC/gcc-11_1_0-release/Build-x86_64$

[Bug middle-end/100323] #pragma and attribute optimize don't enable inlining

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100323

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
   Last reconfirmed|                            |2021-04-29
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
Note in your example 'f' is compiled with -O0 so it wouldn't be inlined
because there's a mismatch in optimize options.

But the real problem is that IPA and early inlining look at the global
'optimize' and -finline settings.  For the early inliner that's possible to
fix.

Fixed testcase:

__attribute__ ((optimize (2)))
static int f (void) { return 0; }

__attribute__ ((optimize (2)))
int g (void) { return f (); }

#pragma GCC optimize ("2")
int h (void) { return f (); }

[Bug d/100324] gcc-10.2.0 (and earlier) fails to build on x86_64, but has builds just fine aarch64

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100324

--- Comment #3 from Richard Biener  ---
/bin/sh: ../libtool: No such file or directory

that's odd.  What's your host operating system, in particular what shell
is /bin/sh?

You'd need to see what (and why) libphobos picks up as @LIBTOOL@

[Bug c++/100326] Crash with `#pragma GCC unroll` when calling value which can't be called in template function

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100326

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Keywords|                            |ice-on-invalid-code
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-04-29

--- Comment #1 from Richard Biener  ---
Confirmed.

#0  0x00d66842 in cp_parser_pragma_unroll (parser=0x765977b8, 
pragma_tok=0x77f2f0c8)
at /home/rguenther/src/gcc3/gcc/cp/parser.c:44896
44896 if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
(gdb) l
44891 location_t location = cp_lexer_peek_token (parser->lexer)->location;
44892 tree expr = cp_parser_constant_expression (parser);
44893 unsigned short unroll;
44894 expr = maybe_constant_value (expr);
44895 HOST_WIDE_INT lunroll = 0;
44896 if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
44897 || TREE_CODE (expr) != INTEGER_CST
44898 || (lunroll = tree_to_shwi (expr)) < 0
44899 || lunroll >= USHRT_MAX)
44900   {
(gdb) p debug_tree (expr)
 >
used VOID t.ii:1:31
align:8 warn_if_not_align:0 context >
t.ii:2:25 start: t.ii:2:24 finish: t.ii:2:26>

so we parsed this to an invalid GENERIC CALL_EXPR with a NULL TREE_TYPE

(gdb) p expr->typed.type
$3 = 

it should have at least been error_mark_node (or void_type_node).

[Bug bootstrap/100327] [12 regression] bootstrap failure after r12-228

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100327

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
   Target Milestone|---                         |12.0

--- Comment #2 from Richard Biener  ---
I think backends or the FE are supposed to pre-define __LIBGCC_KF_* if
-fbuilding-libgcc

See c-family/c-cppbuiltin.c where it checks for flag_building_libgcc.

Not sure why that doesn't pick up KFmode?

[Bug bootstrap/100327] [12 regression] bootstrap failure after r12-228

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100327

--- Comment #3 from Richard Biener  ---
Possibly

  if (!targetm.scalar_mode_supported_p (mode)
      || !targetm.libgcc_floating_mode_supported_p (mode))
    continue;

[Bug d/100324] gcc-10.2.0 (and earlier) fails to build on x86_64, but has builds just fine aarch64

2021-04-29 Thread torel at simula dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100324

--- Comment #4 from Tor  ---
(In reply to Richard Biener from comment #3)

> that's odd.  What's your host operating system, in particular what shell
> is /bin/sh?

Ubuntu 18.04.5 LTS on every arch. Interestingly, on aarch64 /bin/sh is dash, while
on x86_64, where the build is not working, it is bash.

torel@srl-login1:~$ uname -ar
Linux srl-login1 4.15.0-142-generic #124-Ubuntu SMP Thu Oct 15 13:03:05 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux

torel@srl-login1:~$ ll /bin/sh
lrwxrwxrwx 1 root root 4 Mar 18  2019 /bin/sh -> bash*
torel@srl-login1:~$ srun --partition=armq --ntasks=1 --nodes=1
--cpus-per-task=256 --pty /bin/bash --login
srun: job 155896 queued and waiting for resources
srun: job 155896 has been allocated resources
You are on an Aarch64 node

torel@n005:~$ ll /bin/sh
lrwxrwxrwx 1 root root 4 Apr 30  2019 /bin/sh -> dash*

torel@n005:~$ uname -ar
Linux n005 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:08:49 UTC 2021
aarch64 aarch64 aarch64 GNU/Linux
torel@n005:~$

[Bug c++/61592] ICE with large array with initialization

2021-04-29 Thread sbence92 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61592

Bence Szabó  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sbence92 at gmail dot com

--- Comment #2 from Bence Szabó  ---
A variant of this is still an issue even with GCC 11.

The below is a reduced version of a file that took 33 minutes(!) to compile,
with cc1 starting from 1GB of memory usage and eventually reaching 2GB.
This example generates 10MB of assembly as it unrolls the constructor calls
even with -O0.

struct B
{
   B() {}
};

struct A
{

   B b[128][1875];
};

A a{};


I've tried to find a similar PR, this was the closest one.
Is this a known issue?

[Bug tree-optimization/100329] ICE: verify_ssa failed (error: definition in block 3 does not dominate use in block 4)

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100329

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ice-on-valid-code,
                   |                            |needs-bisection
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-04-29
          Component|target                      |tree-optimization

--- Comment #1 from Richard Biener  ---
Looks like a latent issue; at least I didn't find anything that would avoid this
in other control-flow situations.  Eventually rank computation does,
but then a high reassoc-width can still cause association in a way that breaks
SSA dominance.

[local count: 1073741824]:
+  _1 = a2_8(D) + 1;
   __asm__ goto("" : "=r" x_4 :  :  : "lab" lab);

[local count: 536870913]:
+  _2 = x_4 + a1_7(D);
   a0 = x_4;

[local count: 1073741824]:
 lab:
-  _10 = a2_8(D) + 1;
-  _11 = a1_7(D) + _10;
-  _9 = x_4 + _11;
+  _9 = _1 + _2;
   return _9;

So the issue is that we're using 'insert_stmt_before_use' even for 'leafs'
(and thus uses outside of the chain we're rewriting).

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Tom de Vries:

https://gcc.gnu.org/g:4d7c874e2c64ebf7631049ace642d246843febae

commit r12-249-g4d7c874e2c64ebf7631049ace642d246843febae
Author: Tom de Vries 
Date:   Wed Apr 28 16:00:01 2021 +0200

[omp, simt] Fix expand_GOMP_SIMT_*

When running the test-case included in this patch using an
nvptx accelerator, it fails in execution.

The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
during pass_jump as "trivially dead insns".

This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
...
  class expand_operand ops[3];
  create_output_operand (&ops[0], target, mode);
  ...
  expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
...
which doesn't guarantee that target is assigned to by the expanded insn.

F.i., if target is:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...
then after expand_insn, we have:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some
calls [PR94941]" for a similar problem.

Fix this in the same way, by adding:
...
  if (!rtx_equal_p (target, ops[0].value))
    emit_move_insn (target, ops[0].value);
...
where applicable in the expand_GOMP_SIMT_* functions.

Tested libgomp on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2021-04-28  Tom de Vries  

PR target/100232
* internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
(expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
(expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
(expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.
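The copy-back pattern from the commit message can be pictured with a toy model. This is only a sketch and does not use GCC's RTL API: `RegFile`, `OutputOperand`, `toy_expand` and `toy_expand_and_copy_back` are hypothetical names, and the "expander" is reduced to a function that may write its result to a fresh register instead of the requested target (as happens above, where a subreg target ends up as `(reg:QI 57)`).

```cpp
#include <map>

// Hypothetical register file: register number -> value.
using RegFile = std::map<int, int>;

struct OutputOperand {
    int target;  // register the caller wants to hold the result
    int value;   // register the expander actually wrote to
};

// Stand-in for expanding an insn: if the requested target is not directly
// usable, the result lands in fresh_reg and op.value records that.
void toy_expand(RegFile& regs, OutputOperand& op, int result,
                bool target_usable, int fresh_reg) {
    op.value = target_usable ? op.target : fresh_reg;
    regs[op.value] = result;
}

// Same expansion followed by the fix: copy back only when the expander
// chose a register other than the requested target.
void toy_expand_and_copy_back(RegFile& regs, OutputOperand& op, int result,
                              bool target_usable, int fresh_reg) {
    toy_expand(regs, op, result, target_usable, fresh_reg);
    if (op.value != op.target)
        regs[op.target] = regs[op.value];
}
```

Without the copy-back, nothing ever writes the target register in the redirected case, which is the situation the commit describes as the expansion being removed as trivially dead.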

[Bug d/100324] gcc-10.2.0 (and earlier) fails to build on x86_64, but has builds just fine aarch64

2021-04-29 Thread ibuclaw at gdcproject dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100324

--- Comment #5 from Iain Buclaw  ---
(In reply to Tor from comment #0)
> make -j 64
> 
Never had an issue with parallel builds up to -j16. Won't have the hardware to
even attempt -j64 for another fortnight.

Minimal libtool support is in libphobos/m4/libtool.m4, however I don't think
there'd be anything there that might explain why libtool is called before it is
created.

[Bug middle-end/38474] compile time explosion in dataflow_set_preserve_mem_locs at -O3

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #101 from CVS Commits  ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:c57a8aea0c3ab8394f7dbfa417ee27b4613f63b7

commit r12-280-gc57a8aea0c3ab8394f7dbfa417ee27b4613f63b7
Author: Richard Biener 
Date:   Thu Apr 29 08:32:00 2021 +0200

middle-end/38474 - speedup PTA constraint solving

In testcases like PR38474 and PR99912 we're seeing very slow
PTA solving.  One can observe an excessive amount of forwarding,
mostly during sd constraint solving.  The way we solve the graph
does not avoid forwarding the same bits through multiple paths,
and especially when such alternate path involves ESCAPED as
intermediate this causes the ESCAPED solution to be expanded
in receivers.

The following adds heuristic to add_graph_edge which adds
forwarding edges but also guards the initial solution forwarding
(which is the expensive part) to detect the case of ESCAPED
receiving the same set and the destination already containing
ESCAPED.

This speeds up the PTA solving process by more than 50%.

2021-04-29  Richard Biener  

PR middle-end/38474
* tree-ssa-structalias.c (add_graph_edge): Avoid direct
forwarding when indirect forwarding through ESCAPED
already happens.

[Bug target/100305] [10/11/12 Regression] aarch64: ICE in output_operand_lossage with -O3

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100305

--- Comment #10 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:cd0a059bd384da58d43674496a79ecb7de610800

commit r11-8323-gcd0a059bd384da58d43674496a79ecb7de610800
Author: Richard Sandiford 
Date:   Thu Apr 29 09:27:52 2021 +0100

aarch64: Fix address mode for vec_concat pattern [PR100305]

The load_pair_lanes patterns match a vec_concat of two
adjacent 64-bit memory locations as a single 128-bit load.
The Utq constraint made sure that the address was suitable
for a 128-bit vector, but this meant that it allowed some
addresses that aren't valid for the 64-bit element mode.

Two obvious fixes were:

(1) Continue to accept addresses that aren't valid for the element
modes.  This would mean changing the mode of operands[1] before
printing it.  It would also mean using a custom predicate instead
of the current memory_operand.

(2) Restrict addresses to the intersection of those that are valid
element and vector addresses.

The problem with (1) is that, as well as being more complicated,
it doesn't deal with the fact that we still have a memory_operand
for the second element.  If we encourage the first operand to be
outside the range of a normal element memory_operand, we'll have
to reload the second operand to make it valid.  This reload will
often be dead code, but will be kept around because the RTL
pattern makes it look as though the second element address
is still needed.

This patch therefore does (2) instead.

As mentioned in the PR notes, I think we have a general problem
with the way that the aarch64 port deals with paired addresses.
There's nothing to guarantee that the two addresses will be
reloaded in a way that keeps them "obviously" adjacent, so the
rtx_equal_p conditions could fail if something rechecked them
later.

For this particular pattern, I think it would be better to teach
simplify-rtx.c to fold the vec_concat to a normal vector memory
reference, to remove any suggestion that targets should try to
match the unsimplified form.  That obviously wouldn't be suitable
for backports though.

gcc/
PR target/100305
* config/aarch64/constraints.md (Utq): Require the address to
be valid for both the element mode and for V2DImode.

gcc/testsuite/
PR target/100305
* gcc.c-torture/compile/pr100305.c: New test.

(cherry picked from commit 668df9e769e7d89bcefa07f72b68dcae9a8f3970)

[Bug target/100270] [10/11 Backport] _Generic can't distinguish VLS SVE vectors and GNU vectors

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100270

--- Comment #3 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:dfaa29b5441689ce05e3c09012d3afe269770e94

commit r11-8322-gdfaa29b5441689ce05e3c09012d3afe269770e94
Author: Richard Sandiford 
Date:   Thu Apr 29 09:27:51 2021 +0100

aarch64: Handle SVE attributes in comp_type_attributes [PR100270]

Even though "SVE type" and "SVE sizeless type" are marked as
affecting type identity, the middle end doesn't truly believe
it unless we also handle them in comp_type_attributes.

gcc/
PR target/100270
* config/aarch64/aarch64.c (aarch64_comp_type_attributes): Handle
SVE attributes.

gcc/testsuite/
PR target/100270
* gcc.target/aarch64/sve/acle/general-c/pr100270_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Change
expected error message when subtracting pointers to different
vector types.  Expect warnings when mixing them elsewhere.
* gcc.target/aarch64/sve/acle/general/attributes_7.c: Remove
XFAILs.  Tweak error messages for some cases.

(cherry picked from commit 4cea5b8cb715e40e10174e6de405f26202fa3d6a)

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #7 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Tom de Vries:

https://gcc.gnu.org/g:f94c6caac7f03815c26c03a532f834c37517519c

commit r11-8324-gf94c6caac7f03815c26c03a532f834c37517519c
Author: Tom de Vries 
Date:   Wed Apr 28 16:00:01 2021 +0200

[omp, simt] Fix expand_GOMP_SIMT_*

When running the test-case included in this patch using an
nvptx accelerator, it fails in execution.

The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
during pass_jump as "trivially dead insns".

This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
...
  class expand_operand ops[3];
  create_output_operand (&ops[0], target, mode);
  ...
  expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
...
which doesn't guarantee that target is assigned to by the expanded insn.

F.i., if target is:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...
then after expand_insn, we have:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some
calls [PR94941]" for a similar problem.

Fix this in the same way, by adding:
...
  if (!rtx_equal_p (target, ops[0].value))
    emit_move_insn (target, ops[0].value);
...
where applicable in the expand_GOMP_SIMT_* functions.

Tested libgomp on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2021-04-28  Tom de Vries  

PR target/100232
* internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
(expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
(expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
(expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.

(cherry picked from commit 4d7c874e2c64ebf7631049ace642d246843febae)

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aaron at aarongraham dot com

--- Comment #3 from Jonathan Wakely  ---
*** Bug 100322 has been marked as a duplicate of this bug. ***

[Bug c++/100322] Switching from std=c++17 to std=c++20 causes performance regression in relationals

2021-04-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100322

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #8 from Jonathan Wakely  ---
Ah yes, thanks, Marc.

*** This bug has been marked as a duplicate of bug 94589 ***

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94006

--- Comment #4 from Jonathan Wakely  ---
PR 100322 shows that this missed-optimization causes a regression for code
using std::chrono::duration types. Since C++20 their comparisons use
operator<=> and so produce much worse code than the same source compiled as
C++17.
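As a concrete instance of the kind of code affected, the sketch below is ordinary duration comparison that compiles unchanged as C++17 or C++20. The function name `expires_before` is made up for illustration. Per the comment above, it is the C++20 build in which this `<` goes through `operator<=>`.

```cpp
#include <chrono>

// Plain relational comparison of two durations. The source is identical in
// both language modes; only the way the comparison is resolved differs.
bool expires_before(std::chrono::milliseconds deadline,
                    std::chrono::milliseconds now) {
    return deadline < now;
}
```

Comparing the assembly for this function under -std=c++17 and -std=c++20 at the same optimization level is one way to observe the difference reported in PR 100322.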

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |11.2
         Resolution|---                         |FIXED

--- Comment #8 from Tom de Vries  ---
I tried backporting to releases/gcc-10, but ran into:
...
FAIL: libgomp.c/target-43.c (test for excess errors)
Excess errors:
unresolved symbol __sync_val_compare_and_swap_1
mkoffload: fatal error:
/home/vries/oacc/trunk/install/offload-nvptx-none/bin//x86_64-pc-linux-gnu-accel-nvptx-none-gcc
returned 1 exit status
compilation terminated.
...

So I guess backporting stops at gcc-11.

[Bug c++/100319] Incorrect check for detach clause argument in data-sharing clauses

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100319

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:1b462deabf70e0f4bebb1f85118827d9c2eeffb5

commit r12-281-g1b462deabf70e0f4bebb1f85118827d9c2eeffb5
Author: Jakub Jelinek 
Date:   Thu Apr 29 11:11:37 2021 +0200

c++: Fix up detach clause vs. data-sharing clause checking [PR100319]

The standard says that "The event-handle will be considered as if it
was specified on a firstprivate clause." which means that it can't
be explicitly specified in some other data-sharing clause.
The checking is implemented correctly for C, but for C++ when detach_seen
is true (i.e. the construct had detach clause) we were comparing
OMP_CLAUSE_DECL (c) with t, which was previously initialized to
OMP_CLAUSE_DECL (c), which means it complained about any explicit
data-sharing clause on the same construct with a detach clause.

Fixed by remembering the detach clause in detach_seen (instead of a boolean
flag) and comparing against its OMP_CLAUSE_DECL.

2021-04-29  Jakub Jelinek  

PR c++/100319
* semantics.c (finish_omp_clauses): Fix up check that variable
mentioned in detach clause doesn't appear in data-sharing clauses.

* c-c++-common/gomp/task-detach-3.c: New test.
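The logic error described in the commit message can be modelled outside the compiler. The sketch below is a toy and not the code in semantics.c: `Clause`, `is_data_sharing`, `count_errors_buggy` and `count_errors_fixed` are hypothetical names, clauses are reduced to a kind plus a variable name, and only three data-sharing kinds are listed.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified clause: its kind and the variable it names.
struct Clause {
    std::string kind;  // e.g. "detach", "firstprivate", "shared"
    std::string decl;  // variable named by the clause
};

bool is_data_sharing(const Clause& c) {
    return c.kind == "firstprivate" || c.kind == "private" || c.kind == "shared";
}

// Models the described bug: only a boolean is remembered, and the clause's
// decl is compared with a copy of itself, so the comparison is always true.
int count_errors_buggy(const std::vector<Clause>& clauses) {
    bool detach_seen = false;
    for (const Clause& c : clauses)
        if (c.kind == "detach") detach_seen = true;

    int errors = 0;
    for (const Clause& c : clauses) {
        const std::string& t = c.decl;  // initialized from the same clause
        if (detach_seen && is_data_sharing(c) && c.decl == t) ++errors;
    }
    return errors;
}

// Models the described fix: remember the detach clause itself and compare
// against the variable it names.
int count_errors_fixed(const std::vector<Clause>& clauses) {
    const Clause* detach_seen = nullptr;
    for (const Clause& c : clauses)
        if (c.kind == "detach") detach_seen = &c;

    int errors = 0;
    for (const Clause& c : clauses)
        if (detach_seen && is_data_sharing(c) && c.decl == detach_seen->decl)
            ++errors;
    return errors;
}
```

For a clause list such as detach(ev) firstprivate(x), the buggy model reports one error and the fixed model none. Both report an error for detach(ev) shared(ev).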

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #5 from Jonathan Wakely  ---
(In reply to Marc Glisse from comment #0)
> For most comparisons @ we do optimize (i<=>0)@0 to just i@0, but not for >
> and <=. Spaceship operator<=> is very painful to use, but I expect we will
> end up seeing a lot of it with C++20, and comparing its result with 0 is
> almost the only way to use its output, so it seems important to optimize
> this common case.

Maybe even common enough to transform it in the FE.


(In reply to Victor Khimenko from comment #2)
> Note that gcc looks bad even on the example from Microsoft's blog post:
> 
> https://godbolt.org/z/Jc7TcN


Right, this doesn't only affect (i<=>0)@0 but also (i<=>j)@0 which is very
common, because it's what the FE synthesizes for operator@ when the type only
defines operator<=>


https://godbolt.org/z/19dM8PdaM

#include <compare>

struct X
{
    int i = 0;
    auto operator<=>(const X&) const = default;
};

bool lt(X l, X r) { return l < r; }

[...]

Defining <=> once is easier than defining all of < > <= >= but the code is bad:

lt(X, X):
        xor     eax, eax
        cmp     edi, esi
        je      .L2
        setge   al
        lea     eax, [rax-1+rax]
.L2:
        shr     al, 7
        ret
lt(Y, Y):
        cmp     edi, esi
        setl    al
        ret

It seems like it should be possible for the FE to recognize that in this
trivial case the defaulted <=> is just comparing integers, and therefore the
synthesized op< could be transformed from (l.i <=> r.i) < 0 to simply l.i < r.i

The FE is not required to synthesize exactly (l.i<=>r.i) @ 0 as long as the
result is correct, so l.i @ r.i would be OK (and avoids the poor codegen until
the missed-optimization gets done).
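The equivalence being asked for can be spelled out in plain code. The sketch below avoids C++20 on purpose: `three_way` is a hand-written stand-in for `l.i <=> r.i` that produces the same -1/0/1 value the lt(X, X) assembly above appears to materialize (setge followed by the lea computes 2*ge - 1), and the function names are made up for illustration.

```cpp
// Stand-in for a three-way comparison of two ints: -1, 0 or 1.
int three_way(int l, int r) {
    if (l == r) return 0;
    return 2 * (l >= r) - 1;  // 1 if l > r, -1 if l < r
}

// What the synthesized operator< effectively computes: (l <=> r) < 0.
bool lt_via_three_way(int l, int r) { return three_way(l, r) < 0; }

// What it could be simplified to for a type that only wraps an integer.
bool lt_direct(int l, int r) { return l < r; }
```

Both functions agree for every pair of inputs, which is why replacing `(l.i <=> r.i) < 0` with `l.i < r.i` would preserve the result while giving the shorter code shown for lt(Y, Y).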

[Bug c++/100330] New: operator bool() is used when operator<() is available to do comparison

2021-04-29 Thread yin.li at bytedance dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100330

            Bug ID: 100330
           Summary: operator bool() is used when operator<() is available
                    to do comparison
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yin.li at bytedance dot com
  Target Milestone: ---

code:
#include <iostream>
#include <map>

class vertex_descriptor {
public:
    vertex_descriptor() : p(nullptr) {}
    vertex_descriptor(void *pp) : p(pp) {}

    operator bool() const {
        std::cout << __func__ << std::endl;
        return p;
    }

    bool operator<(const vertex_descriptor b) const {
        std::cout << __func__ << std::endl;
        return p < b.p;
    }
private:
    void *p;
};

int main()
{
    std::map<std::pair<vertex_descriptor, int>, vertex_descriptor> vs;
    vertex_descriptor v1(nullptr), v2((void *)1);

    vs[std::make_pair(v1, 0)] = v1;
    vs[std::make_pair(v2, 0)] = v2;
    return 0;
}

result:
~ $ g++ --std=c++17 1.cpp && ./a.out
operator<
operator<
operator<
operator<
~ $ g++ --std=c++20 1.cpp && ./a.out
operator bool
operator bool
operator bool
operator bool
operator bool
operator bool
~ $ g++ --version
g++ (Ubuntu 10.2.0-13ubuntu1) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug target/100302] [11/12 Regression] ICE in abs_hwi, at hwint.h:324 since r11-7861

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100302

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:1bb3e2c0ce6ed363c72caf814a6ba6d7b17c3e0a

commit r12-282-g1bb3e2c0ce6ed363c72caf814a6ba6d7b17c3e0a
Author: Jakub Jelinek 
Date:   Thu Apr 29 11:34:50 2021 +0200

aarch64: Fix ICE in aarch64_add_offset_1_temporaries [PR100302]

In PR94121 I've changed aarch64_add_offset_1 to use absu_hwi instead of
abs_hwi because offset can be HOST_WIDE_INT_MIN.  As can be seen with
the testcase below, aarch64_add_offset_1_temporaries suffers from the same
problem and should be in sync with aarch64_add_offset_1, i.e. for
HOST_WIDE_INT_MIN it needs a temporary.

2021-04-29  Jakub Jelinek  

PR target/100302
* config/aarch64/aarch64.c (aarch64_add_offset_1_temporaries): Use
absu_hwi instead of abs_hwi.

* gcc.target/aarch64/sve/pr100302.c: New test.
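The reason an unsigned absolute value is needed for the minimum value can be shown in isolation. This is a sketch, not GCC's hwint.h: `hwi` and `uhwi` are stand-ins for HOST_WIDE_INT and its unsigned counterpart (assumed here to be 64 bits wide), and `absu` and `signed_abs_would_overflow` are illustrative names.

```cpp
#include <cstdint>
#include <limits>

// Stand-ins for HOST_WIDE_INT and unsigned HOST_WIDE_INT (64-bit assumed).
using hwi = std::int64_t;
using uhwi = std::uint64_t;

// Unsigned absolute value: defined for every input, including the minimum,
// because the negation is done in unsigned (modular) arithmetic.
uhwi absu(hwi x) {
    return x >= 0 ? static_cast<uhwi>(x) : uhwi{0} - static_cast<uhwi>(x);
}

// The one input whose magnitude does not fit in the signed type, i.e. the
// case where a signed absolute value cannot return a correct result.
bool signed_abs_would_overflow(hwi x) {
    return x == std::numeric_limits<hwi>::min();
}
```

For the minimum value, `absu` returns 2^63, which has no signed 64-bit representation. That is the input for which the commit says a temporary is needed.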

[Bug c++/100319] Incorrect check for detach clause argument in data-sharing clauses

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100319

--- Comment #3 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:50c826db7a3ee672be2a0b41a937651e1834a837

commit r11-8325-g50c826db7a3ee672be2a0b41a937651e1834a837
Author: Jakub Jelinek 
Date:   Thu Apr 29 11:11:37 2021 +0200

c++: Fix up detach clause vs. data-sharing clause checking [PR100319]

The standard says that "The event-handle will be considered as if it
was specified on a firstprivate clause." which means that it can't
be explicitly specified in some other data-sharing clause.
The checking is implemented correctly for C, but for C++ when detach_seen
is true (i.e. the construct had detach clause) we were comparing
OMP_CLAUSE_DECL (c) with t, which was previously initialized to
OMP_CLAUSE_DECL (c), which means it complained about any explicit
data-sharing clause on the same construct with a detach clause.

Fixed by remembering the detach clause in detach_seen (instead of a boolean
flag) and comparing against its OMP_CLAUSE_DECL.

2021-04-29  Jakub Jelinek  

PR c++/100319
* semantics.c (finish_omp_clauses): Fix up check that variable
mentioned in detach clause doesn't appear in data-sharing clauses.

* c-c++-common/gomp/task-detach-3.c: New test.

(cherry picked from commit 1b462deabf70e0f4bebb1f85118827d9c2eeffb5)
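
The shape of the fix (remember the detach clause itself rather than a boolean, then compare other clauses against its declaration) can be sketched with simplified types. Everything below is hypothetical and only illustrates the idea; it is not the code in finish_omp_clauses, and Clause, ClauseKind and count_detach_conflicts are invented names.

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical model of OpenMP clauses on one construct.
enum class ClauseKind { Detach, Firstprivate, Shared, Private };

struct Clause {
    ClauseKind kind;
    std::string decl;  // name of the variable the clause refers to
};

// Count data-sharing clauses that name the detach event handle.
// The detach clause is remembered as a pointer (not a flag), so each
// other clause is compared against the detach clause's own decl
// instead of against itself.
inline int count_detach_conflicts(const std::vector<Clause>& clauses) {
    const Clause* detach_seen = nullptr;
    for (const Clause& c : clauses)
        if (c.kind == ClauseKind::Detach)
            detach_seen = &c;
    if (!detach_seen)
        return 0;

    int conflicts = 0;
    for (const Clause& c : clauses)
        if (c.kind != ClauseKind::Detach && c.decl == detach_seen->decl)
            ++conflicts;
    return conflicts;
}
```

With a boolean flag and a comparison of a clause's decl against itself, every data-sharing clause on a construct with a detach clause would be reported, which matches the symptom described above.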

[Bug target/100302] [11/12 Regression] ICE in abs_hwi, at hwint.h:324 since r11-7861

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100302

--- Comment #6 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:5ac1313f32c5cd875ad047f6575dd4608e1949cf

commit r11-8326-g5ac1313f32c5cd875ad047f6575dd4608e1949cf
Author: Jakub Jelinek 
Date:   Thu Apr 29 11:34:50 2021 +0200

aarch64: Fix ICE in aarch64_add_offset_1_temporaries [PR100302]

In PR94121 I've changed aarch64_add_offset_1 to use absu_hwi instead of
abs_hwi because offset can be HOST_WIDE_INT_MIN.  As can be seen with
the testcase below, aarch64_add_offset_1_temporaries suffers from the same
problem and should be in sync with aarch64_add_offset_1, i.e. for
HOST_WIDE_INT_MIN it needs a temporary.

2021-04-29  Jakub Jelinek  

PR target/100302
* config/aarch64/aarch64.c (aarch64_add_offset_1_temporaries): Use
absu_hwi instead of abs_hwi.

* gcc.target/aarch64/sve/pr100302.c: New test.

(cherry picked from commit 1bb3e2c0ce6ed363c72caf814a6ba6d7b17c3e0a)

[Bug target/100302] [11/12 Regression] ICE in abs_hwi, at hwint.h:324 since r11-7861

2021-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100302

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Jakub Jelinek  ---
Fixed for 11.2 and on the trunk.

[Bug c++/100319] Incorrect check for detach clause argument in data-sharing clauses

2021-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100319

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |11.2

--- Comment #4 from Jakub Jelinek  ---
Fixed for 11.2 and on the trunk.

[Bug target/100305] [10 Regression] aarch64: ICE in output_operand_lossage with -O3

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100305

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:62a44a9797edce11b1f7051ea0016ee975d41233

commit r12-283-g62a44a9797edce11b1f7051ea0016ee975d41233
Author: Jakub Jelinek 
Date:   Thu Apr 29 11:42:08 2021 +0200

testsuite: Remove dg-options from pr100305.c [PR100305]

The test FAILs on i?86-linux (due to -Wpsabi warnings).  But, on closer
inspection it seems there is another problem, the dg-options in the testcase
means that the test is compiled with -O0 -O, -O1 -O, -O2 -O, -O3 -O, -Os -O
etc. options, so effectively is tested multiple times with the same options.

Fixed by dropping the dg-options line, then we have -w by default and iterate
over all the optimization levels (including the -O).

2021-04-29  Jakub Jelinek  

PR target/100305
* gcc.c-torture/compile/pr100305.c: Remove dg-options.  Add PR
line.

[Bug tree-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

--- Comment #6 from Richard Biener  ---
So the issue is we're getting a dataref pointer like

&MEM <__int128 unsigned> [(char * {ref-all})&s + 25B]

and the first access has DR_MISALIGNMENT of 9 and the target alignment is 16.
So we have

align == 16
misalign == 9

then we do

  data_ref = fold_build2 (MEM_REF, vectype,
                          dataref_ptr,
                          dataref_offset
                          ? dataref_offset
                          : build_int_cst (ref_type, 0));
  if (aligned_access_p (first_dr_info))
    ;
  else if (DR_MISALIGNMENT (first_dr_info) == -1)
    TREE_TYPE (data_ref)
      = build_aligned_type (TREE_TYPE (data_ref),
                            align * BITS_PER_UNIT);
  else
    TREE_TYPE (data_ref)
      = build_aligned_type (TREE_TYPE (data_ref),
                            TYPE_ALIGN (elem_type));

but since DR_MISALIGNMENT is not -1 we assume element alignment
(since DR_MISALIGNMENT is the misalign in elements and at least at
some point wasn't arbitrary ... unless I misremember).  Since
the vector type is vector(1) __int128 unsigned we get an aligned
access.  Note how we're using 'align' in the == -1 case but that's
the target alignment ...

The load code has the same issue.  I'm testing a simplification.
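
The numbers quoted above (offset 25, target alignment 16, misalignment 9) can be checked with a small sketch. It assumes the base object s is itself aligned to the target alignment; the function names are hypothetical and unrelated to the vectorizer's own helpers.

```cpp
#include <cstdint>

// Misalignment of an access at 'byte_offset' from a base that is
// assumed (for this sketch) to be aligned to 'target_align' bytes.
inline unsigned misalignment(std::uint64_t byte_offset, unsigned target_align) {
    return static_cast<unsigned>(byte_offset % target_align);
}

// An aligned vector move is only valid when the misalignment is zero.
inline bool aligned_move_is_safe(std::uint64_t byte_offset, unsigned target_align) {
    return misalignment(byte_offset, target_align) == 0;
}
```

For the access at &s + 25B with a 16-byte target alignment this gives 9, so an aligned 16-byte move is not valid there.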

[Bug target/100305] [10 Regression] aarch64: ICE in output_operand_lossage with -O3

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100305

--- Comment #12 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:a515ce926b9d779922debc33ecad424c9ac22c65

commit r11-8327-ga515ce926b9d779922debc33ecad424c9ac22c65
Author: Jakub Jelinek 
Date:   Thu Apr 29 11:42:08 2021 +0200

testsuite: Remove dg-options from pr100305.c [PR100305]

The test FAILs on i?86-linux (due to -Wpsabi warnings).  But, on closer
inspection it seems there is another problem, the dg-options in the testcase
means that the test is compiled with -O0 -O, -O1 -O, -O2 -O, -O3 -O, -Os -O
etc. options, so effectively is tested multiple times with the same options.

Fixed by dropping the dg-options line, then we have -w by default and iterate
over all the optimization levels (including the -O).

2021-04-29  Jakub Jelinek  

PR target/100305
* gcc.c-torture/compile/pr100305.c: Remove dg-options.  Add PR
line.

(cherry picked from commit 62a44a9797edce11b1f7051ea0016ee975d41233)

[Bug d/100324] gcc-10.2.0 (and earlier) fails to build on x86_64, but has builds just fine aarch64

2021-04-29 Thread torel at simula dot no via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100324

--- Comment #6 from Tor  ---
(In reply to Iain Buclaw from comment #5)

> Never had an issue with parallel builds up to -j16. Won't have the hardware
> to even attempt -j64 for another fortnight.
> 
> Minimal libtool support is in libphobos/m4/libtool.m4, however I don't think
> there'd be anything there that might explain why libtool is called before it
> is created.

Built with -j 256 on aarch64 Cavium Tx2 CN9980 nodes. Worked like a charm.

Not sure why "dash" is ok, while "bash" is not!? Do you?

I will set default shell to dash on amd64/x86_64 and see how it goes. 

Thx.

[Bug tree-optimization/99954] [8 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954

--- Comment #9 from CVS Commits  ---
The releases/gcc-8 branch has been updated by Richard Biener:

https://gcc.gnu.org/g:ab620c241b61733c92a7596620b73af4f380b5e0

commit r8-10928-gab620c241b61733c92a7596620b73af4f380b5e0
Author: Richard Biener 
Date:   Wed Apr 7 13:17:05 2021 +0200

tree-optimization/99954 - fix loop distribution memcpy classification

This fixes bogus classification of a copy as memcpy.  We cannot use
plain dependence analysis to decide between memcpy and memmove when
it computes no dependence.  Instead we have to try harder later which
the patch does for the gcc.dg/tree-ssa/ldist-24.c testcase by resorting
to tree-affine to compute the difference between src and dest and
compare against the copy size.

2021-04-07  Richard Biener  

PR tree-optimization/99954
* tree-loop-distribution.c: Include tree-affine.h.
(generate_memcpy_builtin): Try using tree-affine to prove
non-overlap.
(loop_distribution::classify_builtin_ldst): Always classify
as PKIND_MEMMOVE.

* gcc.dg/torture/pr99954.c: New testcase.

(cherry picked from commit b091cb1efa1881e93fb2e264daaab8876acf6800)
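
The decision described in the commit message (compare the distance between source and destination against the copy size, and only then use memcpy) can be sketched as plain C++. This is an illustration under simplifying assumptions: the difference is a known constant, and the function names are hypothetical rather than taken from tree-loop-distribution.c.

```cpp
#include <cstddef>
#include <cstring>

// True when two ranges of 'size' bytes whose start addresses differ by
// 'dest_minus_src' cannot overlap, i.e. |dest - src| >= size.
inline bool provably_disjoint(std::ptrdiff_t dest_minus_src, std::size_t size) {
    const std::size_t distance =
        dest_minus_src < 0
            ? std::size_t{0} - static_cast<std::size_t>(dest_minus_src)
            : static_cast<std::size_t>(dest_minus_src);
    return distance >= size;
}

// Copy within one buffer: memcpy only when non-overlap is proven,
// otherwise fall back to memmove, which tolerates overlap.
inline void copy_within(unsigned char* base, std::ptrdiff_t dest_off,
                        std::ptrdiff_t src_off, std::size_t size) {
    if (provably_disjoint(dest_off - src_off, size))
        std::memcpy(base + dest_off, base + src_off, size);
    else
        std::memmove(base + dest_off, base + src_off, size);
}
```

Defaulting to memmove and upgrading to memcpy only on proof mirrors the "always classify as PKIND_MEMMOVE" entry in the ChangeLog above.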

[Bug tree-optimization/99954] [8 Regression] Copy loop over array of unions at -O3 generates memcpy instead of memmove, resulting in incorrect code

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99954

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
  Known to work||8.4.1
 Status|ASSIGNED|RESOLVED

--- Comment #10 from Richard Biener  ---
Fixed.

[Bug c++/100330] operator bool() is used when operator<() is available to do comparison

2021-04-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100330

--- Comment #1 from Jonathan Wakely  ---
Reduced:

#include <utility>

extern "C" int puts(const char*);

struct vertex_descriptor {

    operator bool() const {
        puts(__func__);
        return i;
    }

    bool operator<(const vertex_descriptor b) const {
        puts(__func__);
        return i < b.i;
    }

    int i = 0;
};

int main()
{
    std::pair<vertex_descriptor, int> p1({1}, 0);
    std::pair<vertex_descriptor, int> p2({2}, 0);
    if (!(p1 < p2))
        throw 1;
}

This aborts in C++20. I think GCC is correct according to the standard.

In C++20 the operator< for std::pair is synthesized from operator<=>.

That uses the synth-three-way helper defined in
http://wg21.link/expos.only.func which determines that vertex_descriptor is
three-way-comparable, and so uses <=> to compare them. But that uses your
implicit conversion to bool, i.e. it is equivalent to p1.first==true <=>
p2.first==true

An explicit operator bool avoids the problem, because synth-three-way will not
implicitly convert the operands to bool, and so will do:

  if (p1.first < p2.first) return weak_ordering::less;
  if (p2.first < p1.first) return weak_ordering::greater;
  return weak_ordering::equivalent;

This uses vertex_descriptor::operator< as intended.
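
A minimal sketch of the suggested workaround, with call counters added so the chosen operator can be observed. The counters and the helper first_pair_is_less are additions for illustration only; the second pair member is assumed to be int.

```cpp
#include <utility>

// Counters (illustrative additions) recording which operator ran.
int bool_calls = 0;
int less_calls = 0;

struct vertex_descriptor {
    // 'explicit' prevents the implicit conversion to bool that made the
    // type look three-way comparable in C++20.
    explicit operator bool() const {
        ++bool_calls;
        return i != 0;
    }

    bool operator<(const vertex_descriptor& b) const {
        ++less_calls;
        return i < b.i;
    }

    int i = 0;
};

// Compare two pairs whose first members differ.
inline bool first_pair_is_less() {
    std::pair<vertex_descriptor, int> p1(vertex_descriptor{1}, 0);
    std::pair<vertex_descriptor, int> p2(vertex_descriptor{2}, 0);
    return p1 < p2;
}
```

With the explicit conversion the comparison goes through operator< and the conversion function is never called, both in C++17 and in C++20.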

[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552

2021-04-29 Thread mark at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217

--- Comment #13 from Mark Wielaard  ---
(In reply to Jakub Jelinek from comment #12)
> For valgrind, the quick workaround would be -march=z13 when compiling the
> s390x tests that have register long double variables.

Yes, this works: if fpext and pfpo are built with -march=z13 they compile (and
the tests pass under valgrind).

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

Richard Biener  changed:

   What|Removed |Added

  Known to work||11.1.0

--- Comment #6 from Richard Biener  ---
Note the original testcase is optimized with GCC 11

   [local count: 1073741824]:
  _2 = i_1(D) > 0;
  return _2;

but not on the GCC 10 branch.  Not sure what fixed it there.

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #7 from Marc Glisse  ---
Some key steps in the optimization:
PRE turns PHI<-1,0,1> > 0 into PHI<0,0,1>
reassoc then combines the operations (it didn't in gcc-10)
forwprop+phiopt cleans up (i>0)!=0?1:0 into just i>0.

Having to wait until phiopt4 to get the simplified form is still very long, and
most likely causes missed optimizations in earlier passes. But nice progress!

[Bug target/100331] New: 128 bit arithmetic --- suboptimal after shifting when referencing other variables

2021-04-29 Thread zero at smallinteger dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100331

Bug ID: 100331
   Summary: 128 bit arithmetic --- suboptimal after shifting when
referencing other variables
   Product: gcc
   Version: 9.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zero at smallinteger dot com
  Target Milestone: ---

Created attachment 50706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50706&action=edit
Reproduction test case

Compile the given C program with -O2.  Enabling the #if 1 branch results in:

compute(unsigned long, unsigned long):
        mov     ecx, edi
        xor     edx, edx
        mov     rax, rsi
        xor     esi, esi
        and     ecx, 63
        shrd    rax, rdx, cl
        shr     rdx, cl
        test    cl, 64
        mov     r8d, ecx
        cmovne  rax, rdx
        cmovne  rdx, rsi
        and     r8d, 63
        mov     rsi, rax
        mov     rax, r8
        mov     rdi, rdx
        xor     edx, edx
        add     rax, rsi
        adc     rdx, rdi
        ret

Note test cl, 64 and the subsequent cmovs are unnecessary because the result of
the test is already known after and ecx, 63.  Note also mov r8d, ecx, followed
by and r8d, 63, redoing work.

Enabling the #if 0 branch results in this code instead.

compute(unsigned long, unsigned long):
        mov     rcx, rdi
        xor     edx, edx
        mov     rax, rsi
        shrd    rax, rdx, cl
        shr     rdx, cl
        ret

That is, now gcc realizes the range of possible values for cl and does not emit
the test, the cmovs, and the redoing of the and with r8d.  One way or another,
the double precision shift is also unnecessary because only the lower 64 bits
of result may be non-zero.

Verified on Ubuntu 20.04 LTS, as well as Godbolt with gcc 9.3.0, gcc 11.1, and
gcc trunk.  This issue is similar to other 128 bit arithmetic reported bugs,
but unlike those others this one seems to be controlled exclusively by the
addition in the #if 1 branch.

For the sake of comparison, clang trunk emits the following code for the #if 1
and #if 0 branches, as per Godbolt.

compute(unsigned long, unsigned long):   # @compute(unsigned long, unsigned long)
        mov     rcx, rdi
        mov     eax, ecx
        shr     rsi, cl
        and     eax, 63
        xor     edx, edx
        add     rax, rsi
        setb    dl
        ret


compute(unsigned long, unsigned long):   # @compute(unsigned long, unsigned long)
        mov     rax, rsi
        mov     rcx, rdi
        shr     rax, cl
        xor     edx, edx
        ret
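
The attachment itself is not reproduced in this message. The sketch below is only a guess at the kind of source that would produce the assembly above, inferred from the instruction sequences; the function names and the exact expressions are assumptions, and unsigned __int128 is a GCC/Clang extension.

```cpp
// Assumed reconstruction, NOT the attached test case.
using u128 = unsigned __int128;

// Corresponds (by assumption) to the "#if 1" variant: a 128-bit shift
// by a count masked to 0..63, followed by adding the masked count.
inline u128 compute_shift_and_add(unsigned long long count, unsigned long long value) {
    const unsigned long long masked = count & 63;
    return (static_cast<u128>(value) >> masked) + masked;
}

// Corresponds (by assumption) to the "#if 0" variant: the shift alone.
inline u128 compute_shift_only(unsigned long long count, unsigned long long value) {
    return static_cast<u128>(value) >> (count & 63);
}
```

Because the value being shifted is zero-extended from 64 bits and the count is below 64, the upper half of the shifted result is always zero, which is the point made above about the double-precision shift being unnecessary.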

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #8 from Marc Glisse  ---
PR96480 would be my guess.

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #9 from Jonathan Wakely  ---
(In reply to Richard Biener from comment #6)
> Not sure what fixed it there.

Seems to be r11-2593

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #10 from Jonathan Wakely  ---
(In reply to Marc Glisse from comment #8)
> PR96480 would be my guess.

Yes

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #11 from rguenther at suse dot de  ---
On Thu, 29 Apr 2021, glisse at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589
> 
> --- Comment #7 from Marc Glisse  ---
> Some key steps in the optimization:
> PRE turns PHI<-1,0,1> > 0 into PHI<0,0,1>
> reassoc then combines the operations (it didn't in gcc-10)
> forwprop+phiopt cleans up (i>0)!=0?1:0 into just i>0.
> 
> Having to wait until phiopt4 to get the simplified form is still very 
> long, and most likely causes missed optimizations in earlier passes. But 
> nice progress!

Agreed - requiring PRE (and thus -O2) is also less than optimal for
such a core feature.  But the IL we get is simply awkward ;)

ifcombine/phiopt2 see

   [local count: 1073741824]:
  if (i_1(D) == 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 536870913]:
  if (i_1(D) < 0)
goto ; [41.00%]
  else
goto ; [59.00%]

   [local count: 316753840]:

   [local count: 1073741824]:
  # c$_M_value_2 = PHI <0(2), -1(3), 1(4)>
  _4 = c$_M_value_2 > 0;

I guess that's what we should try to work with.  For PR7
I have prototyped a forwprop patch to try constant folding
stmts with all-constant PHIs, thus in this case c$_M_value_2 > 0,
when there's only a single use of it (that basically does what
PRE later does but earlier and for a very small subset of cases).
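
The IL above can be mirrored in ordinary C++ to show why folding the comparison into each constant PHI argument is valid. This is a hand-written illustration with hypothetical function names, not compiler output.

```cpp
// Mirrors the control flow of the IL: the PHI yields 0, -1 or 1.
inline int three_way_value(int i) {
    if (i == 0)
        return 0;
    if (i < 0)
        return -1;
    return 1;
}

// The single use of the PHI result: c > 0.
inline bool greater_via_phi(int i) {
    return three_way_value(i) > 0;
}

// Folding "> 0" into each constant argument turns PHI<0, -1, 1> into
// PHI<false, false, true>; only the i > 0 path yields true.
inline bool greater_folded(int i) {
    return i > 0;
}
```

Both functions agree for every input, which is the equivalence the optimization relies on.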

[Bug tree-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:af4ccaa7515b8e72449448c509916575831e6292

commit r12-284-gaf4ccaa7515b8e72449448c509916575831e6292
Author: Richard Biener 
Date:   Thu Apr 29 11:52:08 2021 +0200

tree-optimization/100253 - fix bogus aligned vectorized loads/stores

At some point DR_MISALIGNMENT was supposed to be -1 when the
access was not element aligned.  That's obviously not true at this
point so this adjusts both store and load vectorizing to no longer
assume this which in turn allows simplifying the code.

2021-04-29  Richard Biener  

PR tree-optimization/100253
* tree-vect-stmts.c (vectorizable_load): Do not assume
element alignment when DR_MISALIGNMENT is -1.
(vectorizable_store): Likewise.

* g++.dg/pr100253.C: New testcase.

[Bug tree-optimization/100253] [10/11 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253

Richard Biener  changed:

   What|Removed |Added

  Known to work||12.0
Summary|[10/11/12 Regression] wrong |[10/11 Regression] wrong
   |code with -O2   |code with -O2
   |-fno-tree-bit-ccp   |-fno-tree-bit-ccp
   |-ftree-slp-vectorize|-ftree-slp-vectorize
   |(unaligned movdqa)  |(unaligned movdqa)
  Known to fail|12.0|

--- Comment #8 from Richard Biener  ---
Fixed on trunk so far.  The issue seems quite old btw, it just got easier to
trigger.

[Bug target/100332] New: mcore-elf: error: 'prev_addr' may be used uninitialized in this function [-Werror=maybe-uninitialized]

2021-04-29 Thread jbglaw--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100332

Bug ID: 100332
   Summary: mcore-elf: error: 'prev_addr' may be used
uninitialized in this function
[-Werror=maybe-uninitialized]
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jbg...@lug-owl.de
  Target Milestone: ---

I'm revamping my testing efforts, building cross compilers based on targets
listed in ./contrib/config-list.mk.

With .../configure --target=mcore-elf --enable-werror-always
--enable-languages=all --prefix=/tmp/gcc-mcore-elf (using g++ (Debian
20210320-1) 11.0.1 20210320 (experimental) [master revision
3279a9a5a9a:6526c452d22:5f256a70a05fcfc5a1caf56678ceb12b4f87f781] as the host's
compiler), build breaks (as of 37846c42f1f5ac4d9ba190d49c4373673c89c8b5) with
this (cf. http://toolchain.lug-owl.de:8080/jobs/gcc-mcore-elf/11):

make all-gcc
[...]
[all 2021-04-29 09:39:37.528923] /usr/lib/gcc-snapshot/bin/g++  -fno-PIE -c   -g
-O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I.
-I../.././gcc -I../.././gcc/. -I../.././gcc/../include
-I../.././gcc/../libcpp/include -I../.././gcc/../libcody 
-I../.././gcc/../libdecnumber -I../.././gcc/../libdecnumber/dpd
-I../libdecnumber -I../.././gcc/../libbacktrace   -o dwarf2out.o -MT
dwarf2out.o -MMD -MP -MF ./.deps/dwarf2out.TPo ../.././gcc/dwarf2out.c
[all 2021-04-29 09:40:06.839260] ../.././gcc/dwarf2out.c: In function 'void
output_one_line_info_table(dw_line_info_table*)':
[all 2021-04-29 09:40:06.839476] ../.././gcc/dwarf2out.c:12831:82: error:
'prev_addr' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
[all 2021-04-29 09:40:06.839552] 12831 |
ASM_GENERATE_INTERNAL_LABEL (prev_label, LINE_CODE_LABEL, prev_addr->val);
[all 2021-04-29 09:40:06.839627]   |   
  ^~~
[all 2021-04-29 09:41:39.466117] cc1plus: all warnings being treated as errors
[all 2021-04-29 09:41:39.525930] make[1]: *** [Makefile:1141: dwarf2out.o]
Error 1
[all 2021-04-29 09:41:39.578999] make[1]: Leaving directory
'/var/lib/laminar/run/gcc-mcore-elf/11/gcc/host-x86_64-pc-linux-gnu/gcc'
[all 2021-04-29 09:41:39.818955] make: *** [Makefile:4428: all-gcc] Error 2


Thanks,
Jan-Benedict

[Bug rtl-optimization/99927] Wrong code since r11-39-gf9e1ea10e657af9f

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Richard Biener  changed:

   What|Removed |Added

   Priority|P1  |P3
   Target Milestone|11.2|---

[Bug rtl-optimization/100311] UB in sel-sched.c:init_regs_for_mode with -march=armv8-m.base

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100311

--- Comment #3 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Richard Earnshaw:

https://gcc.gnu.org/g:bda407c9a0da4aacdc62306c85712b93afa1bbc3

commit r11-8328-gbda407c9a0da4aacdc62306c85712b93afa1bbc3
Author: Richard Earnshaw 
Date:   Wed Apr 28 17:56:38 2021 +0100

arm: fix UB due to missing mode check [PR100311]

Some places in the compiler iterate over all the fixed registers to
check if that register can be used in a particular mode.  The idiom is
to iterate over the register and then for that register, if it
supports the current mode to check all that register and any
additional registers needed (HARD_REGNO_NREGS).  If these two checks
are not fully aligned then it is possible to generate a buffer overrun
when testing data objects that are sized by the number of hard regs in
the machine.

The VPR register is a case where these checks were not consistent and
because this is the last HARD register the result was that we ended up
overflowing the fixed_regs array.

gcc:
PR target/100311
* config/arm/arm.c (arm_hard_regno_mode_ok): Only allow VPR to be
used in HImode.
(cherry picked from commit 59f5d16f2c5db4d9592c8ce6453afe81334bb012)
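
The hazard described in the commit message can be sketched with a toy register file. All names and sizes below are hypothetical; the sketch only shows why the "mode is OK for this register" test and the "walk regno .. regno + nregs - 1" loop have to agree.

```cpp
#include <array>

// Toy register file with four hard registers (an assumption for the
// sketch; real targets have many more).
constexpr int kNumHardRegs = 4;

// Consistent mode check: a value needing 'nregs' consecutive registers
// may only start at 'regno' if the whole span exists.
inline bool mode_ok(int regno, int nregs) {
    return regno >= 0 && nregs > 0 && regno + nregs <= kNumHardRegs;
}

// Walk the span only after the mode check has confirmed it is in range,
// so indexing 'fixed' can never run past the last hard register.
inline bool span_is_free(const std::array<bool, kNumHardRegs>& fixed,
                         int regno, int nregs) {
    if (!mode_ok(regno, nregs))
        return false;
    for (int r = regno; r < regno + nregs; ++r)
        if (fixed[r])
            return false;
    return true;
}
```

If mode_ok accepted a multi-register mode for the last register, the loop would index one past the end of the array, which is the overrun reported for the VPR register.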

[Bug rtl-optimization/100311] UB in sel-sched.c:init_regs_for_mode with -march=armv8-m.base

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100311

--- Comment #4 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Richard Earnshaw:

https://gcc.gnu.org/g:d0ae39ce2c3b4d635de6102ec3750cf6109cdc8d

commit r10-9778-gd0ae39ce2c3b4d635de6102ec3750cf6109cdc8d
Author: Richard Earnshaw 
Date:   Wed Apr 28 17:56:38 2021 +0100

arm: fix UB due to missing mode check [PR100311]

Some places in the compiler iterate over all the fixed registers to
check if that register can be used in a particular mode.  The idiom is
to iterate over the register and then for that register, if it
supports the current mode to check all that register and any
additional registers needed (HARD_REGNO_NREGS).  If these two checks
are not fully aligned then it is possible to generate a buffer overrun
when testing data objects that are sized by the number of hard regs in
the machine.

The VPR register is a case where these checks were not consistent and
because this is the last HARD register the result was that we ended up
overflowing the fixed_regs array.

gcc:
PR target/100311
* config/arm/arm.c (arm_hard_regno_mode_ok): Only allow VPR to be
used in HImode.
(cherry picked from commit 59f5d16f2c5db4d9592c8ce6453afe81334bb012)

[Bug d/100324] gcc-10.2.0 (and earlier) fails to build on x86_64, but has builds just fine aarch64

2021-04-29 Thread ibuclaw at gdcproject dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100324

--- Comment #7 from Iain Buclaw  ---
(In reply to Richard Biener from comment #3)
> /bin/sh: ../libtool: No such file or directory
> 
> that's odd.  What's your host operating system, in particular what shell
> is /bin/sh?
> 
> You'd need to see what (and why) libphobos picks up as @LIBTOOL@
LIBTOOL is unconditionally set as:

---
# Always use our own libtool.
LIBTOOL='$(SHELL) $(top_builddir)/libtool'
---

Which is the same everywhere as far as I understand.

[Bug rtl-optimization/100311] UB in sel-sched.c:init_regs_for_mode with -march=armv8-m.base

2021-04-29 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100311

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from Richard Earnshaw  ---
Fixed on all relevant branches.

[Bug other/63426] [meta-bug] Issues found with -fsanitize=undefined

2021-04-29 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63426
Bug 63426 depends on bug 100311, which changed state.

Bug 100311 Summary: UB in sel-sched.c:init_regs_for_mode with 
-march=armv8-m.base
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100311

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/100173] telecom/viterb00data_1 has 16.92% regression compared O2 -ftree-vectorize -fvect-cost-model=very-cheap to O2 on CLX/ICX, 9% regression on znver3

2021-04-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100173

--- Comment #4 from Hongtao.liu  ---

> but yes, cselim will also sink the first store, moving it across the

Can we also sink loads? Assign the pointer to another temp pointer in both the if
and else bb, and then load the value from this temp pointer. Those assignments in
the if and else branches would finally be transformed to conditional moves.

Performance can benefit 100% with the change below.

  for (i = 0; i < (1<<5)/2; i++) {

    esMetricIn = *pBranchMetric++;

    esMetric1 = pIn1->m_esPathMetric - esMetricIn;
    esMetric2 = pIn2->m_esPathMetric + esMetricIn;

    e_s16 *t1p = (esMetric1 >= esMetric2) ? &(pIn1->m_esState)
                                          : &(pIn2->m_esState);
    e_s16 t1  = (esMetric1 >= esMetric2) ? esMetric1 : esMetric2;
    pOut->m_esPathMetric = t1;
    pOut->m_esState = *t1p << 1;
    pOut++;

    esMetric1 = pIn1->m_esPathMetric + esMetricIn;
    esMetric2 = pIn2->m_esPathMetric - esMetricIn;

    e_s16 *t2p = (esMetric1 >= esMetric2) ? &(pIn1->m_esState)
                                          : &(pIn2->m_esState);
    e_s16 t2  = (esMetric1 >= esMetric2) ? esMetric1 : esMetric2;
    pOut->m_esPathMetric = t2;
    pOut->m_esState = *t2p << 1;
    pOut++;

    pIn1++;
    pIn2++;
  }

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #12 from Marc Glisse  ---
(In reply to rguent...@suse.de from comment #11)

> For PR7
> I have prototyped a forwprop patch to try constant folding
> stmts with all-constant PHIs, thus in this case c$_M_value_2 > 0,
> when there's only a single use of it

Maybe we could handle any case where trying to fold the single use (counting
x*x as a single use of x) with each possible value satisfies is_gimple_val (or
whatever the condition is to be allowed in a PHI, and without introducing a use
of a ssa_name before it is defined), so that things like PHI & X would
simplify. But the constant case is indeed the most important, and should allow
the optimization in this PR before the vectorizer using reassoc1.

[Bug libstdc++/100259] ODR violations in

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100259

--- Comment #3 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:4e54a34eed4f941cf09f27245bb3a5bfdb406a16

commit r11-8330-g4e54a34eed4f941cf09f27245bb3a5bfdb406a16
Author: Jonathan Wakely 
Date:   Mon Apr 26 11:37:38 2021 +0100

libstdc++: Add missing 'inline' specifiers to net::ip functions [PR 100259]

libstdc++-v3/ChangeLog:

PR libstdc++/100259
* include/experimental/internet (net::ip::make_error_code)
(net::ip::make_error_condition, net::ip::make_network_v4)
(net::ip::operator==(const udp&, const udp&)): Add 'inline'.

(cherry picked from commit 3f4aa4579a6c03e0a0b0a6aec68aa5a301264d45)
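
Why a missing 'inline' in a header matters can be shown with a reduced example. The namespace and function names here are hypothetical and only imitate the shape of the functions listed in the ChangeLog; imagine the code living in a header that several translation units include.

```cpp
// Imagine this block is a header included from more than one .cc file.
namespace demo {

struct udp {
    int family;
};

// Without 'inline', every including translation unit would emit its own
// external definition of these functions, violating the one-definition
// rule and typically failing at link time with duplicate symbols.
inline int make_code(int value) {
    return value;
}

inline bool operator==(const udp& a, const udp& b) {
    return a.family == b.family;
}

}  // namespace demo
```

Function templates and functions defined inside a class body are implicitly inline, which is presumably why only a handful of free functions needed the specifier.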

[Bug libstdc++/100259] ODR violations in

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100259

--- Comment #4 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:9b6fecd9e594c7342141c334644d3938c8e5fcdb

commit r10-9780-g9b6fecd9e594c7342141c334644d3938c8e5fcdb
Author: Jonathan Wakely 
Date:   Mon Apr 26 11:37:38 2021 +0100

libstdc++: Add missing 'inline' specifiers to net::ip functions [PR 100259]

libstdc++-v3/ChangeLog:

PR libstdc++/100259
* include/experimental/internet (net::ip::make_error_code)
(net::ip::make_error_condition, net::ip::make_network_v4)
(net::ip::operator==(const udp&, const udp&)): Add 'inline'.

(cherry picked from commit 3f4aa4579a6c03e0a0b0a6aec68aa5a301264d45)

[Bug libstdc++/100259] ODR violations in

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100259

--- Comment #5 from CVS Commits  ---
The releases/gcc-9 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:75d1d6841dd90649ea03cbe2ec43c3cfb27bebe8

commit r9-9475-g75d1d6841dd90649ea03cbe2ec43c3cfb27bebe8
Author: Jonathan Wakely 
Date:   Mon Apr 26 11:37:38 2021 +0100

libstdc++: Add missing 'inline' specifiers to net::ip functions [PR 100259]

libstdc++-v3/ChangeLog:

PR libstdc++/100259
* include/experimental/internet (net::ip::make_error_code)
(net::ip::make_error_condition, net::ip::make_network_v4)
(net::ip::operator==(const udp&, const udp&)): Add 'inline'.

(cherry picked from commit 3f4aa4579a6c03e0a0b0a6aec68aa5a301264d45)

[Bug target/100333] New: arm: ICE (unrecognizable insn) with CMSE nonsecure call and -march=armv8.1-m.main

2021-04-29 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100333

Bug ID: 100333
   Summary: arm: ICE (unrecognizable insn) with CMSE nonsecure
call and -march=armv8.1-m.main
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

The following fails:

$ cat test.c
typedef void __attribute__((cmse_nonsecure_call)) t(void);
t g;
void f() {
  g();
}
$ ./arm-eabi-gcc -c test.c -mcmse -march=armv8.1-m.main -mfloat-abi=soft
test.c: In function 'f':
test.c:5:1: error: unrecognizable insn:
5 | }
  | ^
(call_insn 5 2 0 2 (parallel [
(call (unspec:SI [
(mem:SI (symbol_ref:SI ("g") [flags 0x41]
) [0 g S4 A32])
] UNSPEC_NONSECURE_MEM)
(const_int 0 [0]))
(use (const_int 0 [0]))
(clobber (reg:SI 14 lr))
]) "test.c":4:3 -1
 (nil)
(nil))
during RTL pass: vregs
test.c:5:1: internal compiler error: in extract_insn, at recog.c:2770
0xcc2347 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/home/alecop01/toolchain/src/gcc/gcc/rtl-error.c:108
0xcc2366 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/home/alecop01/toolchain/src/gcc/gcc/rtl-error.c:116
0xc90fa6 extract_insn(rtx_insn*)
/home/alecop01/toolchain/src/gcc/gcc/recog.c:2770
0x9b74e4 instantiate_virtual_regs_in_insn
/home/alecop01/toolchain/src/gcc/gcc/function.c:1658
0x9b74e4 instantiate_virtual_regs
/home/alecop01/toolchain/src/gcc/gcc/function.c:1983
0x9b74e4 execute
/home/alecop01/toolchain/src/gcc/gcc/function.c:2032
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

The ICE does not occur if I instead specify -march=armv8-m.main.

[Bug target/100333] arm: ICE (unrecognizable insn) with CMSE nonsecure call and -march=armv8.1-m.main

2021-04-29 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100333

Alex Coplan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Keywords||ice-on-valid-code
  Known to fail||12.0
   Last reconfirmed||2021-04-29
 Ever confirmed|0   |1
 Target||arm-eabi

--- Comment #1 from Alex Coplan  ---
Taking a look at this.

[Bug libstdc++/100259] ODR violations in

2021-04-29 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100259

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |9.4
 Status|NEW |RESOLVED

--- Comment #6 from Jonathan Wakely  ---
Fixed for 9.4, 10.4, 11.2 and trunk (but the networking code is still largely
broken/useless on the old branches and only likely to get fixed on trunk).

[Bug middle-end/90773] Improve piecewise operation

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90773

--- Comment #6 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:985b3a6837dee7001e6b618f073ed74f0edf5787

commit r12-285-g985b3a6837dee7001e6b618f073ed74f0edf5787
Author: H.J. Lu 
Date:   Mon Jun 10 09:57:15 2019 -0700

Generate offset adjusted operation for op_by_pieces operations

Add an overlap_op_by_pieces_p target hook for op_by_pieces operations
between two areas of memory to generate one offset adjusted operation
in the smallest integer mode for the remaining bytes on the last piece
operation of a memory region to avoid doing more than one smaller
operations.

Pass the RTL information from the previous iteration to m_constfn in
op_by_pieces operation so that builtin_memset_[read|gen]_str can
generate the new RTL from the previous RTL.

Tested on Linux/x86-64.

gcc/

PR middle-end/90773
* builtins.c (builtin_memcpy_read_str): Add a dummy argument.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Add an argument for the previous RTL
information and generate the new RTL from the previous RTL info.
(builtin_memset_gen_str): Likewise.
* builtins.h (builtin_strncpy_read_str): Update the prototype.
(builtin_memset_read_str): Likewise.
* expr.c (by_pieces_ninsns): If targetm.overlap_op_by_pieces_p()
returns true, round up size and alignment to the widest integer
mode for maximum size.
(pieces_addr::adjust): Add a pointer to by_pieces_prev argument
and pass it to m_constfn.
(op_by_pieces_d): Add m_push and m_overlap_op_by_pieces.
(op_by_pieces_d::op_by_pieces_d): Add a bool argument to
initialize m_push.  Initialize m_overlap_op_by_pieces with
targetm.overlap_op_by_pieces_p ().
(op_by_pieces_d::run): Pass the previous RTL information to
pieces_addr::adjust and generate overlapping operations if
m_overlap_op_by_pieces is true.
(PUSHG_P): New.
(move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
change.
(store_by_pieces_d::store_by_pieces_d): Updated for op_by_pieces_d
change.
(can_store_by_pieces): Use by_pieces_constfn on constfun.
(store_by_pieces): Use by_pieces_constfn on constfun.  Updated
for op_by_pieces_d change.
(clear_by_pieces_1): Add a dummy argument.
(clear_by_pieces): Updated for op_by_pieces_d change.
(compare_by_pieces_d::compare_by_pieces_d): Likewise.
(string_cst_read_str): Add a dummy argument.
* expr.h (by_pieces_constfn): Add a dummy argument.
(by_pieces_prev): New.
* target.def (overlap_op_by_pieces_p): New target hook.
* config/i386/i386.c (TARGET_OVERLAP_OP_BY_PIECES_P): New.
* doc/tm.texi.in: Add TARGET_OVERLAP_OP_BY_PIECES_P.
* doc/tm.texi: Regenerated.

gcc/testsuite/

PR middle-end/90773
* g++.dg/pr90773-1.h: New test.
* g++.dg/pr90773-1a.C: Likewise.
* g++.dg/pr90773-1b.C: Likewise.
* g++.dg/pr90773-1c.C: Likewise.
* g++.dg/pr90773-1d.C: Likewise.
* gcc.target/i386/pr90773-1.c: Likewise.
* gcc.target/i386/pr90773-2.c: Likewise.
* gcc.target/i386/pr90773-3.c: Likewise.
* gcc.target/i386/pr90773-4.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-6.c: Likewise.
* gcc.target/i386/pr90773-7.c: Likewise.
* gcc.target/i386/pr90773-8.c: Likewise.
* gcc.target/i386/pr90773-9.c: Likewise.
* gcc.target/i386/pr90773-10.c: Likewise.
* gcc.target/i386/pr90773-11.c: Likewise.
* gcc.target/i386/pr90773-12.c: Likewise.
* gcc.target/i386/pr90773-13.c: Likewise.
* gcc.target/i386/pr90773-14.c: Likewise.
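
To illustrate the idea behind the commit above: when the tail of a by-pieces operation is smaller than the widest move, one offset-adjusted move that overlaps bytes already handled can replace several smaller moves. The following is a minimal hand-written C++ sketch, not GCC code; copy_4_to_8 is a hypothetical name and the 4-byte piece width is assumed purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not GCC code: copy n bytes (4 <= n <= 8) using two
// 4-byte moves. The second move is shifted back so that it ends exactly at
// byte n, overlapping the first move instead of falling back to 2-byte and
// 1-byte moves for the tail.
inline void copy_4_to_8(unsigned char *dst, const unsigned char *src,
                        std::size_t n)
{
    assert(n >= 4 && n <= 8);
    std::uint32_t head, tail;
    // Load both pieces before storing anything.
    std::memcpy(&head, src, 4);          // bytes [0, 4)
    std::memcpy(&tail, src + n - 4, 4);  // bytes [n - 4, n)
    std::memcpy(dst, &head, 4);
    std::memcpy(dst + n - 4, &tail, 4);
}
```

For n = 7 this issues two 4-byte moves (at offsets 0 and 3), where a non-overlapping split would need 4 + 2 + 1.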

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #13 from Richard Biener  ---
Created attachment 50707
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50707&action=edit
prototype

For reference this is the prototype patch I mentioned.  I wasn't entirely happy
and wanted to explore the ??? in the commit message.

[Bug tree-optimization/100173] telecom/viterb00data_1 has 16.92% regression compared O2 -ftree-vectorize -fvect-cost-model=very-cheap to O2 on CLX/ICX, 9% regression on znver3

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100173

--- Comment #5 from Richard Biener  ---
(In reply to Hongtao.liu from comment #4)
> > but yes, cselim will also sink the first store, moving it across the
> 
> Can we also sink loads?

loads are usually hoisted, not sunk.

> assign pointer to another temp pointer in both if
> and else bb, and then load val from this temp pointer. Those assignments
> in the if and else branches would finally be transformed to conditional mov.

Hmm, so conditional move is faster.  No, I don't think we'd do this at
the moment and I'm not sure we want in general (since aggressive
if-conversion tends to be bad).

> Performance can benefit 100% with the change below.
> 
>  for (i = 0; i < (1<<5)/2; i++) {
> 
> esMetricIn = *pBranchMetric++;
> 
> esMetric1 = pIn1->m_esPathMetric - esMetricIn;
> esMetric2 = pIn2->m_esPathMetric + esMetricIn;
> 
> e_s16 *t1p = (esMetric1 >= esMetric2) ? &(pIn1->m_esState) :
> &(pIn2->m_esState);
> e_s16 t1  = (esMetric1 >= esMetric2) ? esMetric1 : esMetric2;
> pOut->m_esPathMetric = t1;
> pOut->m_esState = *t1p << 1;
> pOut++;
> 
> esMetric1 = pIn1->m_esPathMetric + esMetricIn;
> esMetric2 = pIn2->m_esPathMetric - esMetricIn;
> 
> e_s16 *t2p = (esMetric1 >= esMetric2) ? &(pIn1->m_esState) :
> &(pIn2->m_esState);
> e_s16 t2  = (esMetric1 >= esMetric2) ? esMetric1 : esMetric2;
> pOut->m_esPathMetric = t2;
> pOut->m_esState = *t2p << 1;
> pOut++;
> 
> pIn1++;
> pIn2++;
>   }

[Bug libgomp/81778] libgomp.c/for-5.c failure on nvptx -- illegal memory access

2021-04-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81778

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |12.0
 Status|NEW |RESOLVED

--- Comment #12 from Tom de Vries  ---
Fixed in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=fc14ff611181c274584c7963bc597a6ca50c20a1

[Bug target/100331] 128 bit arithmetic --- suboptimal after shifting when referencing other variables

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100331

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2021-04-29
  Known to fail||12.0
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 Target||x86_64-*-*
   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
Confirmed with GCC head.

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #14 from Richard Biener  ---
#include <compare>
bool k(int a, int b){
  auto c = (a <=> b);
  return c>0;
}

Produces

  <bb 2> [local count: 1073741824]:
  if (a_1(D) == b_3(D))
    goto <bb 5>; [34.00%]
  else
    goto <bb 3>; [66.00%]

  <bb 3> [local count: 708669601]:
  if (a_1(D) < b_3(D))
    goto <bb 5>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 4> [local count: 354334801]:

  <bb 5> [local count: 1073741824]:
  # prephitmp_8 = PHI <0(2), 0(3), 1(4)>
  return prephitmp_8;

from PRE (or the proposed patch) where it is not matched further.  This kind
of pattern could be handled by phiopt.

[Bug libstdc++/100334] New: atomic::notify_one() sometimes wakes wrong thread

2021-04-29 Thread m.cencora at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334

Bug ID: 100334
   Summary: atomic::notify_one() sometimes wakes wrong thread
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: m.cencora at gmail dot com
  Target Milestone: ---

If the waiter pool implementation is used in std::atomic<T>::wait/notify for a
given T, then notify_one must call notify_all underneath to make sure that the
proper thread is woken.
I.e. if multiple threads call atomic<T>::wait() on different atomic<T>
instances, but all of them share the same waiter, then notify_one on only one
of the atomics may wake the wrong thread.
This can lead to program hangs, deadlocks, etc.

The following test app reproduces the bug:
g++-11 -std=c++20 -lpthread

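#include <atomic>
#include <chrono>
#include <cstdlib>
#include <future>
#include <iostream>
#include <memory>
#include <source_location>
#include <thread>
#include <vector>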

void verify(bool cond, std::source_location loc =
std::source_location::current())
{
if (!cond)
{
std::cout << "Failed at line " << loc.line() << '\n';
std::abort();
}
}

template <typename T>
struct atomics_sharing_same_waiter
{
   std::unique_ptr<std::atomic<T>> a[4];
};

unsigned get_waiter_key(void * ptr)
{
   return std::_Hash_impl::hash(ptr) & 0xf;
}

template <typename T>
atomics_sharing_same_waiter<T> create_atomics()
{
   std::vector<std::unique_ptr<std::atomic<T>>> non_matching_atomics;

   atomics_sharing_same_waiter<T> atomics;
   atomics.a[0] = std::make_unique<std::atomic<T>>(0);

   auto key = get_waiter_key(atomics.a[0].get());
   for (auto i = 1u; i < 4u; ++i)
   {
      while (true)
      {
         auto atom = std::make_unique<std::atomic<T>>(0);
         if (get_waiter_key(atom.get()) == key)
         {
            atomics.a[i] = std::move(atom);
            break;
         }
         else
         {
            non_matching_atomics.push_back(std::move(atom));
         }
      }
   }

   return atomics;
}


int main()
{
// all atomic share the same waiter
auto atomics = create_atomics();

auto fut0 = std::async(std::launch::async, [&] {
atomics.a[0]->wait(0);
});

auto fut1 = std::async(std::launch::async, [&] {
atomics.a[1]->wait(0);
});

auto fut2 = std::async(std::launch::async, [&] {
atomics.a[2]->wait(0);
});

auto fut3 = std::async(std::launch::async, [&] {
atomics.a[3]->wait(0);
});

// make sure all the threads are already waiting
std::this_thread::sleep_for(std::chrono::milliseconds{100});

atomics.a[2]->store(1);
// changing to notify_all() allows this test to pass
atomics.a[2]->notify_one();

verify(std::future_status::timeout ==
fut0.wait_for(std::chrono::milliseconds{100}));
verify(std::future_status::timeout ==
fut1.wait_for(std::chrono::milliseconds{100}));
verify(std::future_status::ready ==
fut2.wait_for(std::chrono::milliseconds{100}));
verify(std::future_status::timeout ==
fut3.wait_for(std::chrono::milliseconds{100}));

atomics.a[0]->store(1);
atomics.a[0]->notify_one();
atomics.a[1]->store(1);
atomics.a[1]->notify_one();
atomics.a[3]->store(1);
atomics.a[3]->notify_one();
}

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #15 from Jakub Jelinek  ---
C testcase (though surely we need C++ one too because with C++ there are
aggregates and inline functions involved):

int f1 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == 0; }
int f2 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a < 0; }
int f3 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a > 0; }
int f4 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != 0; }
int f5 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a <= 0; }
int f6 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a >= 0; }
int f7 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == -1; }
int f8 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != -1; }
int f9 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a == 1; }
int f10 (int x, int y) { int a = x == y ? 0 : x < y ? -1 : 1; return a != 1; }
int f11 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a ==
0; }
int f12 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a < 0;
}
int f13 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a > 0;
}
int f14 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a !=
0; }
int f15 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a <=
0; }
int f16 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a >=
0; }
int f17 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a ==
-1; }
int f18 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a !=
-1; }
int f19 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a ==
1; }
int f20 (float x, float y) { int a = x == y ? 0 : x < y ? -1 : 1; return a !=
1; }
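
For the integer functions, the simplifications being asked for can be spelled out as plain comparisons. The sketch below checks those identities over a small range; the helper names (cmp3, cmp3f, check_int_identities, nan_compares_as_greater) are made up for this illustration. The float functions need more care: with a NaN operand both x == y and x < y are false, so a evaluates to 1, and for example a > 0 corresponds to !(x <= y) rather than to x > y.

```cpp
#include <cassert>
#include <cmath>

// Same three-way expression as in the testcase above.
static int cmp3(int x, int y) { return x == y ? 0 : x < y ? -1 : 1; }
static int cmp3f(float x, float y) { return x == y ? 0 : x < y ? -1 : 1; }

// Returns true if every comparison of cmp3's result against 0, -1 and 1
// matches the corresponding direct comparison of x and y.
bool check_int_identities()
{
    for (int x = -3; x <= 3; ++x)
        for (int y = -3; y <= 3; ++y) {
            int a = cmp3(x, y);
            if ((a == 0) != (x == y)) return false;   // f1
            if ((a < 0) != (x < y)) return false;     // f2
            if ((a > 0) != (x > y)) return false;     // f3
            if ((a != 0) != (x != y)) return false;   // f4
            if ((a <= 0) != (x <= y)) return false;   // f5
            if ((a >= 0) != (x >= y)) return false;   // f6
            if ((a == -1) != (x < y)) return false;   // f7
            if ((a != -1) != (x >= y)) return false;  // f8
            if ((a == 1) != (x > y)) return false;    // f9
            if ((a != 1) != (x <= y)) return false;   // f10
        }
    return true;
}

// With a NaN operand both x == y and x < y are false, so cmp3f yields 1.
bool nan_compares_as_greater()
{
    float n = std::nanf("");
    return cmp3f(n, 0.0f) == 1 && cmp3f(0.0f, n) == 1;
}
```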

[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread

2021-04-29 Thread m.cencora at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334

--- Comment #1 from m.cencora at gmail dot com ---
This test assumes the previous waiter implementation (I used the gcc-11
available from Ubuntu 21.04). The latest atomic_wait implementation has the
same problem; the waiter is just selected in a different way, so create_atomics
would have to be adjusted accordingly (e.g. allocate 49 atomics in a single
vector; instances 0, 16, 32 and 48 should then share the same waiter instance).

[Bug target/99401] Rebuilding the compiler with itself fails at -O2

2021-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

--- Comment #7 from Jakub Jelinek  ---
Maybe dup of PR99872 ?

[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread

2021-04-29 Thread m.cencora at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334

--- Comment #2 from m.cencora at gmail dot com ---
I have adapted the test to gcc trunk, but I am not entirely sure it is correct:
I don't have gcc trunk locally, so I was only testing this on wandbox.org.

The problem is even bigger here: it seems that, besides the wrong thread being
woken up, the wrong std::atomic instance also finishes its wait() call!

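#include <atomic>
#include <chrono>
#include <cstdint>
#include <future>
#include <iostream>
#include <memory>
#include <source_location>
#include <thread>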

void verify(bool cond, std::source_location loc =
std::source_location::current())
{
if (!cond)
{
std::cout << "Failed at line " << loc.line() << '\n';
//std::abort();
}
}


void emptyDeleter(void *)
{}

template <typename T>
struct atomics_sharing_same_waiter
{
   std::atomic<T> tmp[49 * 4] = {};
   std::shared_ptr<std::atomic<T>> a[4] = {
      { &tmp[0], emptyDeleter },
      { &tmp[16 * 4], emptyDeleter },
      { &tmp[32 * 4], emptyDeleter },
      { &tmp[48 * 4], emptyDeleter }
   };
};

constexpr unsigned key(void * a)
{
constexpr uintptr_t __ct = 16;
return (uintptr_t(a) >> 2) % __ct;
}

int main()
{
// all atomic share the same waiter
atomics_sharing_same_waiter atomics;
for (auto& atom : atomics.a)
{
   atom->store(0);
}

std::cout << "atom0 " << atomics.a[0].get() << " key " <<
key(atomics.a[0].get()) << '\n';
std::cout << "atom1 " << atomics.a[1].get() << " key " <<
key(atomics.a[1].get()) << '\n';
std::cout << "atom2 " << atomics.a[2].get() << " key " <<
key(atomics.a[2].get()) << '\n';
std::cout << "atom3 " << atomics.a[3].get() << " key " <<
key(atomics.a[3].get()) << '\n';


verify(&std::__detail::__waiter_pool_base::_S_for(reinterpret_cast(atomics.a[0].get())) == 
   &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast(atomics.a[1].get(;

auto fut0 = std::async(std::launch::async, [&] {
atomics.a[0]->wait(0);
});

auto fut1 = std::async(std::launch::async, [&] {
atomics.a[1]->wait(0);
});

auto fut2 = std::async(std::launch::async, [&] {
atomics.a[2]->wait(0);
});

auto fut3 = std::async(std::launch::async, [&] {
atomics.a[3]->wait(0);
});

// make sure all the threads are already waiting
std::this_thread::sleep_for(std::chrono::milliseconds{100});

atomics.a[2]->store(1);
// changing to notify_all() doesn't help on gcc trunk
atomics.a[2]->notify_one();

verify(std::future_status::timeout ==
fut0.wait_for(std::chrono::milliseconds{100}));
verify(atomics.a[0]->load() == 0);
verify(std::future_status::timeout ==
fut1.wait_for(std::chrono::milliseconds{100}));
verify(atomics.a[1]->load() == 0);
verify(std::future_status::ready ==
fut2.wait_for(std::chrono::milliseconds{100}));
verify(atomics.a[2]->load() == 1);
verify(std::future_status::timeout ==
fut3.wait_for(std::chrono::milliseconds{100}));
verify(atomics.a[3]->load() == 0);

atomics.a[0]->store(1);
atomics.a[0]->notify_one();
atomics.a[1]->store(1);
atomics.a[1]->notify_one();
atomics.a[3]->store(1);
atomics.a[3]->notify_one();
}

[Bug bootstrap/100327] [12 regression] bootstrap failure after r12-228

2021-04-29 Thread meissner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100327

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #4 from Michael Meissner  ---
Created attachment 50708
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50708&action=edit
patch to define floating point constants for KFmode and IFmode.

I am about to kick off a bootstrap with this patch.

[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread

2021-04-29 Thread m.cencora at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334

--- Comment #3 from m.cencora at gmail dot com ---
If my analysis is correct then:
 - we need to force __all = true param in __waiter_pool_base::_M_notify,
 - protect from spurious wakeups in __waiter_pool::_M_do_wait by rechecking if
the value has changed from old, if not then wait again
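
The two points above can be illustrated with an analogy built on std::condition_variable rather than on libstdc++'s internal waiter pool. This is only a sketch under that assumption: shared_waiter, wait_for_flag, set_flag and run_demo are hypothetical names, not libstdc++ API. Several logical waits share one condition variable, so the notifier wakes all of them and each waiter rechecks its own flag before returning.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a pooled waiter: one mutex/condvar pair serves
// several independent flags.
struct shared_waiter {
    std::mutex m;
    std::condition_variable cv;
    std::array<bool, 2> flag{{false, false}};

    // Blocks until flag[i] is set. The predicate overload of wait() rechecks
    // flag[i] after every wakeup, so wakeups meant for another flag (or
    // spurious ones) just send this thread back to sleep.
    void wait_for_flag(std::size_t i)
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return flag[i]; });
    }

    // Sets flag[i] and wakes every thread on the shared condvar. notify_one()
    // could instead wake a thread waiting for a different flag, which would
    // go back to sleep and leave the intended waiter blocked.
    void set_flag(std::size_t i)
    {
        {
            std::lock_guard<std::mutex> lock(m);
            flag[i] = true;
        }
        cv.notify_all();
    }
};

// Runs two waiters on the shared waiter and releases waiter 1 first.
// Returns true if waiter 0 was still blocked once waiter 1 had finished.
bool run_demo()
{
    shared_waiter w;
    bool done0 = false;  // protected by w.m
    std::thread t0([&] {
        w.wait_for_flag(0);
        std::lock_guard<std::mutex> lock(w.m);
        done0 = true;
    });
    std::thread t1([&] { w.wait_for_flag(1); });

    w.set_flag(1);
    t1.join();  // completes because notify_all reaches waiter 1

    bool waiter0_still_blocked;
    {
        std::lock_guard<std::mutex> lock(w.m);
        waiter0_still_blocked = !done0;
    }
    w.set_flag(0);
    t0.join();
    return waiter0_still_blocked;
}
```

The cost of this design is extra wakeups: every thread sharing the condvar runs its predicate on each notification, which is the price of not knowing which waiter a notification was meant for.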

[Bug target/100312] __builtin_ia32_maskloadpd256 and friends should be pure

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100312

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:fd5d57946036c967dae292330fa0aa856a58fb4b

commit r12-290-gfd5d57946036c967dae292330fa0aa856a58fb4b
Author: Uros Bizjak 
Date:   Thu Apr 29 16:43:33 2021 +0200

i386: Mark x86 masked load builtins pure [PR100312]

Mark x86 AVX and AVX2 masked load builtins pure to enable dead code
elimination and more appropriate alias analysis.

2021-04-29  Uroš Bizjak  
Richard Biener  
gcc/
PR target/100312
* config/i386/i386-builtin.def (IX86_BUILTIN_MASKLOADPD)
(IX86_BUILTIN_MASKLOADPS, IX86_BUILTIN_MASKLOADPD256)
(IX86_BUILTIN_MASKLOADPS256, IX86_BUILTIN_MASKLOADD)
(IX86_BUILTIN_MASKLOADQ, IX86_BUILTIN_MASKLOADD256)
(IX86_BUILTIN_MASKLOADQ256): Move from SPECIAL_ARGS
to PURE_ARGS category.
* config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins):
Handle PURE_ARGS category.
* config/i386/i386-expand.c (ix86_expand_builtin): Ditto.

[Bug c++/100335] New: Using statement of a ref-qualified method from base class: method not callable on derived object

2021-04-29 Thread Daniel.Withopf at web dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100335

Bug ID: 100335
   Summary: Using statement of a ref-qualified method from base
class: method not callable on derived object
   Product: gcc
   Version: 10.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Daniel.Withopf at web dot de
  Target Milestone: ---

The following code example defines two ref-qualified methods with identical
names in a base class (const& and const&&). Derived also defines a (non-const)
method with the same name and adds a "using Base::method;".

With all gcc versions I tested (10.3, 9.3, 8.3, 7.5), compilation with
--std=c++11 or --std=c++14 fails if the method in the derived class does not
have a ref-qualifier.

class Base {
public:
  void method() const&& {}
  void method() const& {}
};

class Derived : public Base {
public:
  using Base::method;
  // this leads to a compiler error
  void method() {}

  // with a ref-qualifier the code compiles
  //  void method() & {}
};

int main() {
const Derived test;
test.method();
}

I believe that this is incorrect and the code should also compile when the
method in the Derived class has no ref-qualifier.
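
The variant that the report says does compile, with a ref-qualifier on the Derived method, can be sketched as follows. The int return values and the selected_overloads helper are added here purely to show which overload is picked; they are not part of the original testcase.

```cpp
#include <cassert>
#include <utility>

class Base {
public:
  int method() const&& { return 2; }
  int method() const& { return 1; }
};

class Derived : public Base {
public:
  using Base::method;
  // Ref-qualified, as in the commented-out line of the testcase.
  int method() & { return 3; }
};

// Calls method() on a const lvalue, a const rvalue and a non-const lvalue,
// and packs the selected overloads into one number for easy checking.
int selected_overloads()
{
  const Derived c{};
  Derived d;
  int on_const_lvalue = c.method();             // Base::method() const&
  int on_const_rvalue = std::move(c).method();  // Base::method() const&&
  int on_lvalue = d.method();                   // Derived::method() &
  return on_const_lvalue * 100 + on_const_rvalue * 10 + on_lvalue;
}
```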

[Bug bootstrap/100327] [12 regression] bootstrap failure after r12-228

2021-04-29 Thread meissner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100327

--- Comment #5 from Michael Meissner  ---
Unfortunately the patch does not work because there aren't suffixes for IFmode
and KFmode.

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-04-29 Thread munroesj at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #5 from Steven Munroe  ---
Any progress on this?

[Bug target/100312] __builtin_ia32_maskloadpd256 and friends should be pure

2021-04-29 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100312

Uroš Bizjak  changed:

   What|Removed |Added

   Assignee|rguenth at gcc dot gnu.org |ubizjak at gmail dot com
 Target|x86_64-*-* i?86-*-* |x86
   Target Milestone|--- |12.0
 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Uroš Bizjak  ---
Fixed.

[Bug target/100321] [OpenMP][nvptx, SIMT] (Con't) Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100321

--- Comment #3 from Tom de Vries  ---
C example:
...
/* { dg-additional-options "-foffload=-latomic" } */

#include <stdio.h>

struct s
{
  int i;
};

#pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i)

int
main (void)
{
  const int N0 = 32768;

  printf ("Expected: %d\n", N0);

  struct s counter_N0 = { 0 };
#pragma omp target
#pragma omp for simd reduction(+: counter_N0)
  for (int i0 = 0 ; i0 < N0 ; i0++ )
counter_N0.i += 1;
  printf ("Got : %d\n", counter_N0.i);

  return 0;
}
...

[Bug c++/95486] ICE for alias CTAD with non-dependent argument and constrained constructor

2021-04-29 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95486

--- Comment #12 from Patrick Palka  ---
(In reply to Johel Ernesto Guerrero Peña from comment #11)
> Thank you. But the first CE link: https://godbolt.org/z/cPWdGW, and with the
> addition in Comment 2: https://godbolt.org/z/Gezh5h5W4, they still ICE.

Hmm, these CE links are still using 10.1.  These examples should compile fine
with a recent GCC built from the 10 release branch (which I don't think CE
provides unfortunately).

[Bug target/100321] [OpenMP][nvptx, SIMT] (Con't) Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100321

--- Comment #4 from Tom de Vries  ---
During lower_rec_input_clauses in omp-low.c, the reduction clause is handled:
...
case OMP_CLAUSE_REDUCTION:
case OMP_CLAUSE_IN_REDUCTION:
  /* OpenACC reductions are initialized using the   
 GOACC_REDUCTION internal function.  */
  if (is_gimple_omp_oacc (ctx->stmt))
break;
  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
...

AFAICT, the problem is that the SIMT handling code is added only in the
!OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) case.

For this test-case, the OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) path is taken
instead.

So, something like this reflects the current state:
...
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7b122059c6e..a0561800977 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6005,6 +6005,11 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
  tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c);
  gimple *tseq;
  tree ptype = TREE_TYPE (placeholder);
+ if (sctx.is_simt)
+   {
+ sorry ("SIMT not fully implemented");
+ abort ();
+   }
  if (cond)
{
  x = error_mark_node;
...

[Bug bootstrap/100327] [12 regression] bootstrap failure after r12-228

2021-04-29 Thread meissner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100327

Michael Meissner  changed:

   What|Removed |Added

  Attachment #50708|0   |1
is obsolete||
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2021-04-29
   Assignee|unassigned at gcc dot gnu.org  |meissner at gcc dot gnu.org

--- Comment #6 from Michael Meissner  ---
Created attachment 50709
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50709&action=edit
Use FLT128 constants for complex KFmode division.

This patch uses the FLT128 constants that are already defined for building
_divkf3.c instead of expecting the constants to be defined as __LIBGCC_KF.
The __LIBGCC_KF constants are not defined.

I am starting a build now with this patch.

[Bug target/100336] New: file trunk/gcc/config/i386/i386-isa.def doesn't get installed ok ?

2021-04-29 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100336

Bug ID: 100336
   Summary: file trunk/gcc/config/i386/i386-isa.def doesn't get
installed ok ?
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

A couple of days ago I had a go at compiling the new Linux kernel-5.12
with an x86_64 gcc dated 20210426. I got this error:

/home/dcb/gcc/results.20210426/lib/gcc/x86_64-pc-linux-gnu/12.0.0/plugin/include/config/i386/i386.h:2197:10:
fatal error: i386-isa.def: No such file or directory

I bodged it by copying the relevant file out of the gcc trunk development tree 
into the results directory:

$ cp /home/dcb/gcc/trunk.git/gcc/config/i386/i386-isa.def
/home/dcb/gcc/results.20210429.asan.ubsan/lib/gcc/x86_64-pc-linux-gnu/12.0.0/plugin/include/config/i386/
$

My best guess is that the "make install" should do this copy.

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #16 from Jakub Jelinek  ---
Writing a phiopt patch now.

[Bug fortran/92482] BIND(C) with array-descriptor mishandled for type character

2021-04-29 Thread ivan.tubert-brohman at schrodinger dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92482

Ivan Tubert-Brohman  changed:

   What|Removed |Added

 CC||ivan.tubert-brohman@schrodi
   ||nger.com

--- Comment #3 from Ivan Tubert-Brohman  ---
I also encountered this issue. I was trying the example given by Metcalf et al.
in _Modern Fortran Explained_ (2018), section 21.4, "Assumed character length":

  interface
 subroutine err_msg(string) bind(c)
use iso_c_binding
character(*, c_char), intent(in):: string
 end subroutine err_msg
  end interface

The example worked with the Intel Fortran compiler, but with gfortran 11.1 I
get

  Error: Character argument ‘string’ at (1) must be length 1 because procedure
‘err_msg’ is BIND(C)

I only comment to confirm that both a famous author and an alternative compiler
implementation agree with the interpretation of the standard that this should
work. :-)

[Bug c++/100288] [11/12 Regression] g++-11 internal error and fails to precompile a concept

2021-04-29 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100288

Patrick Palka  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ppalka at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug debug/100303] [11/12 Regression] -fcompare-debug failure (length) with -O -fno-dce -ftracer

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100303

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:c97351c0cf4872cc0e99e73ed17fb16659fd38b3

commit r12-292-gc97351c0cf4872cc0e99e73ed17fb16659fd38b3
Author: Richard Sandiford 
Date:   Thu Apr 29 17:24:57 2021 +0100

rtl-ssa: Fix -fcompare-debug failure [PR100303]

This patch fixes an oversight in the handling of debug instructions
in rtl-ssa.  At the moment (and whether this is a good idea or not
remains to be seen), we maintain a linear RPO sequence of definitions
and non-debug uses.  If a register is defined more than once, we use
a degenerate phi to reestablish a previous definition where necessary.

However, debug instructions shouldn't of course affect codegen,
so we can't create a new definition just for them.  In those situations
we instead hang the debug use off the real definition (meaning that
debug uses do not follow a linear order wrt definitions).  Again,
it remains to be seen whether that's a good idea.

The problem in the PR was that we weren't taking this into account
when increasing (or potentially increasing) the live range of an
existing definition.  We'd create the phi even if it would only
be used by debug instructions.

The patch goes for the simple but inelegant approach of passing
a bool to say whether the use is a debug use or not.  I imagine
this area will need some tweaking based on experience in future.

gcc/
PR rtl-optimization/100303
* rtl-ssa/accesses.cc (function_info::make_use_available): Take a
boolean that indicates whether the use will only be used in
debug instructions.  Treat it in the same way that existing
cross-EBB debug references would be handled if so.
(function_info::make_uses_available): Likewise.
* rtl-ssa/functions.h (function_info::make_uses_available): Update
prototype accordingly.
(function_info::make_uses_available): Likewise.
* fwprop.c (try_fwprop_subst): Update call accordingly.

[Bug debug/100303] [11 Regression] -fcompare-debug failure (length) with -O -fno-dce -ftracer

2021-04-29 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100303

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

Summary|[11/12 Regression]  |[11 Regression]
   |-fcompare-debug failure |-fcompare-debug failure
   |(length) with -O -fno-dce   |(length) with -O -fno-dce
   |-ftracer|-ftracer

--- Comment #4 from rsandifo at gcc dot gnu.org ---
Fixed on trunk so far.

[Bug libstdc++/100334] atomic::notify_one() sometimes wakes wrong thread

2021-04-29 Thread rodgertq at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100334

--- Comment #4 from Thomas Rodgers  ---
This analysis is likely correct, except for -

"- protect from spurious wakeups in __waiter_pool::_M_do_wait by rechecking if
the value has changed from old, if not then wait again"

An earlier version of this code did this, but was subject to ABA problems, and 
the standard doesn't require that we do this -

"(23.3) — Blocks until it is unblocked by an atomic notifying operation or is
unblocked spuriously."

[Bug target/58067] ICE in GFortran recog.c:2158

2021-04-29 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58067

Zdenek Sojka  changed:

   What|Removed |Added

 CC||zsojka at seznam dot cz

--- Comment #13 from Zdenek Sojka  ---
Even the original fortran testcase does not fail in 8.3.1, 9.3.0, 10.3.0,
11.1.0 for me.

[Bug target/46250] ICE: in extract_insn, at recog.c:2110 (unrecognizable insn) with -fPIC -mcmodel=large and __thread variable

2021-04-29 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46250

Zdenek Sojka  changed:

   What|Removed |Added

  Known to work||5.5.0, 6.5.0, 7.5.0

--- Comment #7 from Zdenek Sojka  ---
No longer fails for me since at least 5.5.0.

[Bug fortran/100337] New: Should be able to pass non-present optional arguments to CO_BROADCAST

2021-04-29 Thread everythingfunctional at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100337

Bug ID: 100337
   Summary: Should be able to pass non-present optional arguments
to CO_BROADCAST
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: everythingfunctional at protonmail dot com
  Target Milestone: ---

Attempting to write a CO_BROADCAST wrapper for a derived type, I found that
passing optional arguments to CO_BROADCAST caused a segfault if the arguments
were not present. This should be allowed. See the following minimal example:

module iso_varying_string
implicit none

type :: varying_string
character(len=1), allocatable :: characters(:)
contains
procedure :: co_broadcast => co_broadcast_varying_string
end type
contains
subroutine co_broadcast_varying_string(a, source_image, stat, errmsg)
class(varying_string), intent(inout) :: a
integer, intent(in) :: source_image
integer, intent(out), optional :: stat
character(len=*), intent(inout), optional :: errmsg

integer :: string_length

if (this_image() == source_image) string_length = size(a%characters)

call co_broadcast(string_length, source_image, stat, errmsg)

if (present(stat)) then
if (stat /= 0) return
end if

if (this_image() /= source_image) then
allocate(a%characters(string_length))
end if

call co_broadcast(a%characters, source_image, stat, errmsg)
end subroutine
end module

program main
use iso_varying_string, only: varying_string

implicit none

character(len=*), parameter :: MESSAGE = "Hello, World!"
integer :: i
type(varying_string) :: the_string

if (this_image() == 1) the_string%characters = [(MESSAGE(i:i), i = 1, len(MESSAGE))]
call the_string%co_broadcast(1)
print *, the_string%characters
end program

Compiling and executing with gfortran produces the following:

$ gfortran -fcoarray=single -g example.f90
$ ./a.out 

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fed85487d01 in ???
#1  0x7fed85486ed5 in ???
#2  0x7fed852be94f in ???
#3  0x55cd33d21683 in __iso_varying_string_MOD_co_broadcast_varying_string
at /home/brad/tmp/co_broadcast_optional/example.f90:20
#4  0x55cd33d21872 in MAIN__
at /home/brad/tmp/co_broadcast_optional/example.f90:43
#5  0x55cd33d21915 in main
at /home/brad/tmp/co_broadcast_optional/example.f90:35
zsh: segmentation fault (core dumped)  ./a.out

Note that using opencoarrays this program works, as follows:

$ caf example.f90 
$ cafrun -n 2 ./a.out 
 Hello, World!
 Hello, World!

[Bug target/99401] Rebuilding the compiler with itself fails at -O2

2021-04-29 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99401

Brecht Sanders  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Brecht Sanders ---
Recently released version 11.1.0 does build for Windows 32-bit with MinGW-w64
without issues.

[Bug c++/94102] Variadic template deduction guide issue - error: 'In instantiation of'

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94102

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Marek Polacek :

https://gcc.gnu.org/g:f24702258fc78ac37b3e8154d76239cccd30c422

commit r12-294-gf24702258fc78ac37b3e8154d76239cccd30c422
Author: Marek Polacek 
Date:   Thu Apr 29 13:30:39 2021 -0400

c++: Add testcase for already fixed PR [PR94102]

We correctly accept this testcase since r11-1571.

gcc/testsuite/ChangeLog:

PR c++/94102
* g++.dg/cpp1z/class-deduction87.C: New test.

[Bug c++/94102] Variadic template deduction guide issue - error: 'In instantiation of'

2021-04-29 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94102

Marek Polacek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Marek Polacek  ---
Reduced test added.

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2021-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #17 from Jakub Jelinek  ---
Created attachment 50710
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50710&action=edit
gcc12-pr94589-wip.patch

WIP patch that just matches those spaceship comparisons followed by single use
comparison of that with -1/0/1, but doesn't yet perform any of the
optimizations.
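
The equivalence behind the summary line can be sketched as follows. Since
`<=>` needs C++20, this uses a hand-written stand-in, `three_way`, that
returns -1/0/1. That function and the other names here are assumptions made
for illustration and do not come from the patch.

```cpp
#include <cassert>

// Stand-in for (a <=> b) on ints, returning -1, 0 or 1.
constexpr int three_way(int a, int b) { return (a > b) - (a < b); }

// The pattern to be matched: a three-way result with a single use,
// compared against 0.
constexpr bool gt_via_three_way(int i) { return three_way(i, 0) > 0; }

// The form it could be simplified to.
constexpr bool gt_direct(int i) { return i > 0; }

// Check that both forms agree for a given input.
inline void check_equivalence(int i)
{
    assert(gt_via_three_way(i) == gt_direct(i));
}
```

Calling `check_equivalence` with negative, zero and positive inputs
exercises all three outcomes of the three-way comparison.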

[Bug c++/68942] overly strict use of deleted function before argument-dependent lookup (ADL)

2021-04-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68942

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Patrick Palka :

https://gcc.gnu.org/g:efeca0ac4155b76ce713155f190422aac20537c5

commit r12-295-gefeca0ac4155b76ce713155f190422aac20537c5
Author: Patrick Palka 
Date:   Thu Apr 29 13:43:00 2021 -0400

c++: Overeager use of deleted function before ADL [PR68942]

Here, at template definition time, ordinary name lookup for 'foo(t)'
finds only the deleted function, and so we form a CALL_EXPR thereof.
Later at instantiation time, when initially substituting into this
CALL_EXPR with T=N::A, we end up calling mark_used on this deleted
function (since it's the only function in the overload set), triggering
a bogus "use of deleted function" error, before we get to augment the
overload set via ADL.

This patch fixes this issue by using the tf_conv flag to disable
mark_used during the initial substitution into the callee of a
CALL_EXPR when KOENIG_P, since at this point we're still figuring out
which functions are candidates.

gcc/cp/ChangeLog:

PR c++/68942
* pt.c (tsubst_copy_and_build) <case CALL_EXPR>: When KOENIG_P,
set tf_conv during the initial substitution into the function.

gcc/testsuite/ChangeLog:

PR c++/68942
* g++.dg/template/koenig12.C: New test.

[Bug c++/68942] overly strict use of deleted function before argument-dependent lookup (ADL)

2021-04-29 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68942

Patrick Palka  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |12.0

--- Comment #3 from Patrick Palka  ---
Fixed for GCC 12, thanks for the bug report.

[Bug target/100331] 128 bit arithmetic --- suboptimal after shifting when referencing other variables

2021-04-29 Thread mbenfield at google dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100331

Michael Benfield  changed:

   What|Removed |Added

 CC||mbenfield at google dot com

--- Comment #2 from Michael Benfield  ---
Code with unused variables or parameters of some struct type that warns under
`-Wunused-variable` in C also warns when compiled as C++. The same is not
true of `-Wunused-but-set-variable`: code that warns in C does not warn in C++.
This seems inconsistent to me; I suspect the code should warn in both
languages.

Similar comments apply to `-Wunused-parameter` and
`-Wunused-but-set-parameter`. 

Given this code in both gcc-test.c and gcc-test.cpp:

struct S { 
  int x;
};

void f_unused_but_set(struct S p1) {
  struct S s;
  p1 = s; 

  struct S v1;
  v1 = s;
} 

void f_unused(struct S p2) { 
  struct S v2; 
}

and compiling with `gcc -fsyntax-only -Wunused -Wextra gcc-test.c`, for
gcc-test.c, we get warnings for all of p1, v1, p2, v2, while for gcc-test.cpp
we get warnings only for p2 and v2.
