[Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #20 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:f24dfc76177b3994434c8beb287cde1a9976b5ce

commit r12-7318-gf24dfc76177b3994434c8beb287cde1a9976b5ce
Author: Richard Biener 
Date:   Fri Feb 18 11:50:44 2022 +0100

tree-optimization/104582 - make SLP node available in vector cost hook

This adjusts the vectorizer costing API to allow passing down the
SLP node the vector stmt is created from.

2022-02-18  Richard Biener  

PR tree-optimization/104582
* tree-vectorizer.h (stmt_info_for_cost::node): New field.
(vector_costs::add_stmt_cost): Add SLP node parameter.
(dump_stmt_cost): Likewise.
(add_stmt_cost): Likewise, new overload and adjust.
(add_stmt_costs): Adjust.
(record_stmt_cost): New overload.
* tree-vectorizer.cc (dump_stmt_cost): Dump the SLP node.
(vector_costs::add_stmt_cost): Adjust.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
Adjust.
* tree-vect-slp.cc (vect_prologue_cost_for_slp): Record
the SLP node for costing.
(vectorizable_slp_permutation): Likewise.
* tree-vect-stmts.cc (record_stmt_cost): Adjust and add
new overloads.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Adjust.
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
Adjust.
* config/rs6000/rs6000.cc (rs6000_vector_costs::add_stmt_cost):
Adjust.
(rs6000_cost_data::adjust_vect_cost_per_loop): Likewise.

[Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #21 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:90d693bdc9d71841f51d68826ffa5bd685d7f0bc

commit r12-7319-g90d693bdc9d71841f51d68826ffa5bd685d7f0bc
Author: Richard Biener 
Date:   Fri Feb 18 14:32:14 2022 +0100

target/99881 - x86 vector cost of CTOR from integer regs

This uses the SLP node now passed to the vectorizer costing hook
to adjust vector construction costs for the cost of moving an
integer component from a GPR to a vector register when that's
required for building a vector from components.  A crucial difference
here is whether the component is loaded from memory or extracted
from a vector register, as in those cases no intermediate GPR is involved.

The pr99881.c testcase can be Un-XFAILed with this patch; the
pr91446.c testcase now produces scalar code which looks superior
to me, so I've adjusted it as well.

2022-02-18  Richard Biener  

PR tree-optimization/104582
PR target/99881
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Cost GPR to vector register moves for integer vector construction.

* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: New.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Likewise.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Likewise.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Likewise.
* gcc.target/i386/pr99881.c: Un-XFAIL.
* gcc.target/i386/pr91446.c: Adjust to not expect vectorization.
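
To make the distinction drawn in the commit message concrete, here is a small illustrative sketch. It is not taken from the PR testcases; `Pair`, `store_from_regs`, `copy_from_memory` and `example_usage_pair` are hypothetical names. In the first function the two 64-bit components arrive as integer arguments, so building a two-element vector from them would involve GPR-to-vector moves. In the second they are loaded from memory, so no intermediate GPR is needed. Whether a particular GCC version vectorizes either function depends on the target and flags and is not claimed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-element aggregate: the adjacent stores to lo/hi form the
// kind of store group an SLP vectorizer may consider for one vector store.
struct Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Components arrive as integer arguments (in GPRs on x86-64), so a vector
// built from them would need GPR -> vector register moves first.
void store_from_regs(Pair* out, std::uint64_t a, std::uint64_t b) {
    out->lo = a;
    out->hi = b;
}

// Components are loaded from memory, so a vector load can feed the store
// without passing through a GPR.
void copy_from_memory(Pair* out, const Pair* in) {
    out->lo = in->lo;
    out->hi = in->hi;
}

// Both variants are functionally plain copies; only their cost differs.
void example_usage_pair() {
    Pair a{0, 0};
    store_from_regs(&a, 1, 2);
    Pair b{0, 0};
    copy_from_memory(&b, &a);
    assert(b.lo == 1 && b.hi == 2);
}
```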

[Bug target/99881] Regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:90d693bdc9d71841f51d68826ffa5bd685d7f0bc

commit r12-7319-g90d693bdc9d71841f51d68826ffa5bd685d7f0bc
Author: Richard Biener 
Date:   Fri Feb 18 14:32:14 2022 +0100

target/99881 - x86 vector cost of CTOR from integer regs

This uses the SLP node now passed to the vectorizer costing hook
to adjust vector construction costs for the cost of moving an
integer component from a GPR to a vector register when that's
required for building a vector from components.  A crucial difference
here is whether the component is loaded from memory or extracted
from a vector register, as in those cases no intermediate GPR is involved.

The pr99881.c testcase can be Un-XFAILed with this patch; the
pr91446.c testcase now produces scalar code which looks superior
to me, so I've adjusted it as well.

2022-02-18  Richard Biener  

PR tree-optimization/104582
PR target/99881
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Cost GPR to vector register moves for integer vector construction.

* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: New.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Likewise.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Likewise.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Likewise.
* gcc.target/i386/pr99881.c: Un-XFAIL.
* gcc.target/i386/pr91446.c: Adjust to not expect vectorization.

[Bug tree-optimization/104582] [11 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-* i?86-*-*
Summary|[11/12 Regression]  |[11 Regression] Unoptimal
   |Unoptimal code for __negdi2 |code for __negdi2 (and
   |(and others) from libgcc2   |others) from libgcc2 due to
   |due to unwanted |unwanted vectorization
   |vectorization   |
  Known to work||12.0

--- Comment #22 from Richard Biener  ---
This is now fixed on trunk for x86.

[Bug target/99881] Regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #13 from Richard Biener  ---
This is now fixed again for GCC 12 which enables vectorization at -O2.

[Bug tree-optimization/100457] [meta bug] Enabling O2 vectorization in GCC 12

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100457
Bug 100457 depends on bug 99881, which changed state.

Bug 99881 Summary: Regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

[Bug target/101929] [12 Regression] r12-7319 regress x264_r by 4% on CLX.

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Blocks||26163
   Target Milestone|--- |12.0
   Last reconfirmed||2022-02-22
 Status|RESOLVED|ASSIGNED
 Resolution|FIXED   |---
Summary|r12-2549 regress x264_r by  |[12 Regression] r12-7319
   |4% on CLX.  |regress x264_r by 4% on
   ||CLX.
 Ever confirmed|0   |1

--- Comment #6 from Richard Biener  ---
Proactively re-opening.  I will see whether something can be done.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 101929, which changed state.

Bug 101929 Summary: [12 Regression] r12-7319 regress x264_r by 4% on CLX.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

   What|Removed |Added

 Status|RESOLVED|ASSIGNED
 Resolution|FIXED   |---

[Bug c++/100983] Deduction guide for member template class rejected at class scope

2022-02-22 Thread gcc at ebasoft dot com.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100983

Artur Bać  changed:

   What|Removed |Added

 CC||gcc at ebasoft dot com.pl

--- Comment #4 from Artur Bać  ---
Trunk on Compiler Explorer still rejects valid code.

https://godbolt.org/z/v4ebhj9Gh: relative to gcc 11.2, only the message about
requiring namespace scope is missing; "invalid use of template-name without an
argument list" is still reported.

https://godbolt.org/z/7Wev6saWz: "ctad" must be declared at namespace scope,
plus "invalid use of template-name without an argument list".

clang https://godbolt.org/z/vavPTbf36 works as expected.

[Bug c++/104631] New: Visibility of static member s yields duplicate symbols.

2022-02-22 Thread max.sagebaum at scicomp dot uni-kl.de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104631

Bug ID: 104631
   Summary: Visibility of static member s yields duplicate
symbols.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: max.sagebaum at scicomp dot uni-kl.de
  Target Milestone: ---

Created attachment 52488
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52488&action=edit
Small case for showing the bug.

We develop a header-only library where we have a static structure member inside
of a class. We require that this static member is seen by all operations on
this class.

If our library is included with '-fvisibility=hidden' then we get multiple
symbols of the same static member. We tried to fix this by declaring the static
member with '__attribute__((visibility("default")))' but for members which are
structs this does not seem to work.

We created a simple example, that can reproduce the behavior. If the static
member is an 'int' then everything works. If this static member is a structure,
then there is a duplication of the symbols.

You can run the example with 'make T=2'. 'T=1' sets no hidden flag and 'T=3'
removes the hidden flag for the library.

The output should always be:
func: Pointer A::counter: 0x404194
func: Pointer A::counter.counter: 0x404198
main: Pointer A::counter: 0x404194
main: Pointer A::counter.counter: 0x404198
Counter value 'int': 1
Counter value 'Inc': 1

But with T=2 the output on our machines is:
func: Pointer A::counter: 0x404194
func: Pointer A::counter.counter: 0x7f9f69625050
main: Pointer A::counter: 0x404194
main: Pointer A::counter.counter: 0x404198
Counter value 'int': 1
Counter value 'Inc': 0

In the real-world case the struct 'Inc' is very involved and uses nearly every
other structure in our library. We would be fine with changing the visibility of
our library to default, but we could not find any preprocessor macro that
indicates that the library is included with hidden visibility. (That is,
'-fvisibility=hidden -E -dM func.cpp' and '-E -dM func.cpp' yielded the same
results.)

I hope this is the correct place to submit this bug and ask the question about
the preprocessor macro.

System: Fedora  5.16.9-200.fc35.x86_64
g++: g++ (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
ld: GNU ld version 2.37-10.fc35

[Bug target/103069] cmpxchg isn't optimized

2022-02-22 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #15 from Hongyu Wang  ---
(In reply to Thiago Macieira from comment #14)
> I'd restrict relaxations to loops emitted by the compiler. All other atomic
> operations shouldn't be modified at all, unless the user asks for it. That
> includes non-looping atomic operations (like LOCK BTC, LOCK XADD) as well as
> a pure LOCK CMPXCHG that came from a single __atomic_compare_exchange by the
> user.
> 
> I'd welcome the ability to relax the latter, especially if with one codebase
> I could be efficient in CAS architectures as well as LL/SC ones.

The latest patch relaxed the pure LOCK CMPXCHG with -mrelax-cmpxchg-loop, as the
commit message shows. So if you want, I can split this part out into another
switch like -mrelax-cmpxchg-insn.
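
As a rough illustration of what "relaxing" a compare-and-swap loop means, here is a hand-written analogue in portable C++. It is a sketch under assumptions, not the code GCC emits: `fetch_or_relaxed_loop`, `cpu_relax` and `example_usage_fetch_or` are hypothetical names. The idea is to re-read the value with a plain load and only attempt the locked CMPXCHG when the value still matches, pausing otherwise.

```cpp
#include <atomic>
#include <cassert>

// Spin-wait hint; a no-op on targets where the builtin is not available.
inline void cpu_relax() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_ia32_pause();
#endif
}

// fetch_or implemented as a CAS loop with an early load-and-compare.
// Returns the value held before the OR was applied.
unsigned fetch_or_relaxed_loop(std::atomic<unsigned>& target, unsigned mask) {
    unsigned expected = target.load(std::memory_order_relaxed);
    for (;;) {
        // Early check: if another thread changed the value, the CAS would
        // fail anyway, so skip it and avoid bouncing the cache line.
        const unsigned current = target.load(std::memory_order_relaxed);
        if (current != expected) {
            expected = current;
            cpu_relax();
            continue;
        }
        if (target.compare_exchange_weak(expected, expected | mask,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
            return expected;
        }
        // On failure compare_exchange_weak refreshed `expected`; retry.
    }
}

void example_usage_fetch_or() {
    std::atomic<unsigned> flags{0x1};
    const unsigned before = fetch_or_relaxed_loop(flags, 0x4);
    assert(before == 0x1u);
    assert(flags.load() == 0x5u);
}
```

A single user-written `__atomic_compare_exchange` has no loop to relax, which is why the comment above treats it as a separate case.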

[Bug c++/104632] New: Missed optimization about backward reads

2022-02-22 Thread lh_mouse at 126 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104632

Bug ID: 104632
   Summary: Missed optimization about backward reads
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lh_mouse at 126 dot com
  Target Milestone: ---
Target: x86_64-linux-gnu

This is a piece of code that has been simplified from a Boyer-Moore-Horspool
implementation:

https://gcc.godbolt.org/z/766GYM8xf
```c++
// In real code this was
//   `load_le32_backwards(::std::reverse_iterator ptr)
unsigned
load_le32_backwards(const unsigned char* ptr)
  {
    unsigned word = ptr[-1];
    word = word << 8 | ptr[-2];
    word = word << 8 | ptr[-3];
    word = word << 8 | ptr[-4];
    return word;
  }
```

This is equivalent to `return ((unsigned*)ptr)[-1];` on x86_64, but GCC fails
to optimize it:

GCC output:
```
load_le32_backwards(unsigned char const*):
movzx   edx, BYTE PTR [rdi-1]
movzx   eax, BYTE PTR [rdi-2]
sal edx, 8
or  eax, edx
movzx   edx, BYTE PTR [rdi-3]
sal eax, 8
or  edx, eax
movzx   eax, BYTE PTR [rdi-4]
sal edx, 8
or  eax, edx
ret
```

Clang output:
```
load_le32_backwards(unsigned char const*): # @load_le32_backwards(unsigned char const*)
mov eax, dword ptr [rdi - 4]
ret
```
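
The claimed equivalence can be checked with a small sketch. `load_le32_memcpy`, `host_is_little_endian` and `example_usage_load_le32` are hypothetical helpers added for illustration; the `memcpy` form stands in for the `((unsigned*)ptr)[-1]` cast so that the sketch does not rely on alignment or aliasing assumptions. The two loads agree only on a little-endian host such as x86_64.

```cpp
#include <cassert>
#include <cstring>

// Byte-by-byte version from the report: combines the four bytes that
// precede ptr, most significant byte taken from ptr[-1].
unsigned load_le32_backwards(const unsigned char* ptr) {
    unsigned word = ptr[-1];
    word = word << 8 | ptr[-2];
    word = word << 8 | ptr[-3];
    word = word << 8 | ptr[-4];
    return word;
}

// Single 32-bit load of the same four bytes in host byte order.
unsigned load_le32_memcpy(const unsigned char* ptr) {
    static_assert(sizeof(unsigned) == 4, "sketch assumes 32-bit unsigned");
    unsigned word;
    std::memcpy(&word, ptr - 4, sizeof word);
    return word;
}

// True when the least significant byte is stored first.
bool host_is_little_endian() {
    const unsigned one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

void example_usage_load_le32() {
    const unsigned char buf[4] = {0x78, 0x56, 0x34, 0x12};
    assert(load_le32_backwards(buf + 4) == 0x12345678u);
    if (host_is_little_endian())
        assert(load_le32_memcpy(buf + 4) == load_le32_backwards(buf + 4));
}
```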

[Bug c++/104631] Visibility of static member s yields duplicate symbols.

2022-02-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104631

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andrew Pinski  ---
Inc is a different class in the shared library and the header if you use
-fvisibility=hidden. Since the type of the static variable is still hidden, the
variable will still be hidden even if it is marked with default visibility.
If you want to share Inc across shared libraries and executables, then you need
to mark that class as visibility default too.
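
A minimal sketch of that advice, assuming a simplified stand-in for the attached example: the `Inc` and `A` layouts below and `example_usage_visibility` are hypothetical and not copied from the attachment. The type of the static member carries default visibility as well as the member itself.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's 'Inc' type. The attribute on the
// type keeps it visible even when compiling with -fvisibility=hidden.
struct __attribute__((visibility("default"))) Inc {
    int counter = 0;
};

struct A {
    // Marking only this member was not enough in the report, because the
    // member's type was still hidden; here both carry the attribute.
    static Inc counter __attribute__((visibility("default")));

    static int bump() { return ++counter.counter; }
};

// Single definition for this sketch; a header-only library would need an
// inline variable (C++17) instead of an out-of-class definition.
Inc A::counter;

void example_usage_visibility() {
    const int first = A::bump();
    assert(A::bump() == first + 1);
}
```

The attribute has no observable effect inside a single translation unit; it matters once the header is included both by a shared library and by the executable that loads it.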

[Bug tree-optimization/104632] Missed optimization about reading backwards

2022-02-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104632

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
  Component|c++ |tree-optimization

--- Comment #1 from Andrew Pinski  ---
There might be another bug that is similar to this.

[Bug target/103353] Indefinite recursion when compiling -mmma requiring testcase w/ -maltivec

2022-02-22 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103353

Kewen Lin  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org,
   ||wschmidt at gcc dot gnu.org

--- Comment #3 from Kewen Lin  ---
I am also curious about what we would miss if we just changed it back to
return const0_rtx.

Back to the failure itself: without TARGET_MMA set we don't have the optab
movoo support. When expanding the bif, it tries to emit_move_insn
(target, valreg)

(gdb) pr target
(reg:OO 117 [ _1 ])

(gdb) pr valreg
(reg:OO 66 2)

The target is a pseudo, so it is able to get a subreg:SI; the valreg is a
hard reg, so generating a subreg fails, and it then calls
operand_subword_force (y, i, mode) to get the subword, which goes on with:

   copy_to_reg (op);

 further call: emit_move_insn (temp, x) 

// back to the original OOmode move from the hard register to another pseudo,
again and again... until memory runs out for allocation.

If the answer to the question above is that it's still meaningful to expand
this call without an expected context, I think we have to extend OOmode and
XOmode handling.

Any thoughts?

[Bug c/104633] New: [12 Regression] -Winfinite-recursion diagnoses fortify wrappers

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104633

Bug ID: 104633
   Summary: [12 Regression] -Winfinite-recursion diagnoses fortify
wrappers
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

extern inline __attribute__((always_inline))
int memcmp (const void * p, const void *q, unsigned long size)
{
  return __builtin_memcmp (p, q, size);
}

are diagnosed with

t.c: In function 'memcmp':
t.c:2:5: warning: infinite recursion detected [-Winfinite-recursion]
2 | int memcmp (const void * p, const void *q, unsigned long size)
  | ^~
t.c:4:10: note: recursive call
4 |   return __builtin_memcmp (p, q, size);
  |  ^

This pattern happens in glibc fortify wrappers (but needs -Wsystem-headers in
addition to -Wall).  It's reportedly also triggering for the kernel's wrappers
in its fortify-string.h, which do not get the benefit of the doubt via
-Wsystem-headers.

These kinds of wrappers are not recursions (I think the kernel doesn't have
'extern' on the inline): the inline instances will not themselves be called and
__builtin_XXX are never inline expanded(?).
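
For readers unfamiliar with the pattern, here is a hedged sketch of a fortify-style wrapper. `checked_memcmp` and `example_usage_checked_memcmp` are hypothetical names and the bounds check is illustrative; this is not glibc's or the kernel's code. The sketch deliberately uses a name other than `memcmp`, so a library call emitted for `__builtin_memcmp` cannot refer back to the wrapper. The real wrappers reuse the libc name, which is the situation the warning reacts to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Fortify-style wrapper: check the requested size against what the compiler
// knows about the objects, then defer to the builtin.
inline __attribute__((always_inline))
int checked_memcmp(const void* p, const void* q, std::size_t size) {
    // __builtin_object_size yields (size_t)-1 when the size is unknown,
    // in which case the check below never fires.
    const std::size_t p_size = __builtin_object_size(p, 0);
    const std::size_t q_size = __builtin_object_size(q, 0);
    if (size > p_size || size > q_size)
        std::abort();  // would read past a known object bound
    return __builtin_memcmp(p, q, size);
}

void example_usage_checked_memcmp() {
    const char a[] = "abc";
    const char b[] = "abd";
    assert(checked_memcmp(a, a, 3) == 0);
    assert(checked_memcmp(a, b, 3) < 0);
}
```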

[Bug c/104633] [12 Regression] -Winfinite-recursion diagnoses fortify wrappers

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104633

Richard Biener  changed:

   What|Removed |Added

   Keywords||diagnostic
   Target Milestone|--- |12.0
 CC||msebor at gcc dot gnu.org

[Bug c++/104631] Visibility of static member s yields duplicate symbols.

2022-02-22 Thread max.sagebaum at scicomp dot uni-kl.de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104631

--- Comment #2 from Max S.  ---
OK, thank you for the answer. In the example I can set this for the class, but
in the library it would be problematic. So there the best solution would be to
set the whole library to default.

Since the user should decide what the visibility of the library is, we do not
want to set it to default. After all, our library might be used only inside a
library of the user, and other programs would never need to see it.

But we would like to warn the user if our library is included with hidden
visibility.

Is there a preprocessor macro that can provide this information? 

Or is it possible to detect the visibility of a type during compile time in
order to trigger a static assert?

Thanks.

[Bug target/104363] hppa: __asm__ directive .global and multiple .symver not supported

2022-02-22 Thread mathieu.malaterre at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104363

Mathieu Malaterre  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #10 from Mathieu Malaterre  ---
This is not a regression, I can reproduce the exact same error using gcc-10:

[...]
libkcapi-1.3.1/apps/kcapi-rng.c:302: undefined reference to
`kcapi_rng_generate'
/usr/lib/gcc-cross/hppa-linux-gnu/10/../../../../hppa-linux-gnu/bin/ld:
libkcapi-1.3.1/apps/kcapi-rng.c:328: undefined reference to
`kcapi_memset_secure'
[...]


Closing as invalid.

[Bug tree-optimization/104632] Missed optimization about reading backwards

2022-02-22 Thread lh_mouse at 126 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104632

--- Comment #2 from LIU Hao  ---
I don't think it's a duplicate. This only happens when reading through a
pointer by negative offsets. If I change the code to read by non-negative
offsets, GCC is actually very happy about it:

https://gcc.godbolt.org/z/sT9hzcndW

```
// In real code this was
//   `load_le32_backwards(::std::reverse_iterator ptr)
unsigned
load_le32_backwards(const unsigned char* ptr)
  {
    unsigned word = ptr[3];
    word = word << 8 | ptr[2];
    word = word << 8 | ptr[1];
    word = word << 8 | ptr[0];
    return word;
  }
```

```
load_le32_backwards(unsigned char const*):
mov eax, DWORD PTR [rdi]
ret
```

[Bug testsuite/104146] FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104146

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Tom de Vries :

https://gcc.gnu.org/g:6263b656c8fcfc6d7e1d2af55a88bc0429a4b352

commit r12-7322-g6263b656c8fcfc6d7e1d2af55a88bc0429a4b352
Author: Tom de Vries 
Date:   Mon Feb 21 20:02:13 2022 +0100

[libgomp, testsuite, nvptx] Fix pr96390.c without CUDA

When running the libgomp testsuite on x86_64 with nvptx accelerator, we run
into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...

The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error   : Call to 'baz' requires call
prototype
ptxas /tmp/ccZYDw8N.o, line 90; error   : Unknown symbol 'baz'
...

But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.

So instead, we run into the same error at execution time.

Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
 { target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-02-21  Tom de Vries  

PR testsuite/104146
* testsuite/libgomp.c++/pr96390.C: Add additional-option
-foffload=-Wa,--verify for nvptx.
* testsuite/libgomp.c-c++-common/pr96390.c: Same.

[Bug testsuite/104146] FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test

2022-02-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104146

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |12.0
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Tom de Vries  ---
Fixed by "[libgomp, testsuite, nvptx] Fix pr96390.c without CUDA".

[Bug target/104612] [12 Regression] ICE in mark_jump_label_1 since r12-3435

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104612

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:7e691189ca9c04fdba71ceada1faba62afbc1463

commit r12-7323-g7e691189ca9c04fdba71ceada1faba62afbc1463
Author: Jakub Jelinek 
Date:   Tue Feb 22 10:38:37 2022 +0100

i386: Fix up copysign/xorsign expansion [PR104612]

We ICE on the following testcase for -m32 since r12-3435, because
operands[2] is (subreg:SF (reg:DI ...) 0) and
lowpart_subreg (V4SFmode, operands[2], SFmode)
returns NULL, and that is what we use in AND etc. insns we emit.

My earlier version of the patch fixes that by calling force_reg for the
input operands, to make sure they are really REGs and so lowpart_subreg
will succeed on them - even for theoretical MEMs using REGs there seems
desirable, we don't want to read following memory slots for the paradoxical
subreg.  For the outputs, I thought we'd get better code by always
computing the result into a new pseudo and then moving the lowpart of that
pseudo into dest.

Unfortunately it regressed
FAIL: gcc.target/i386/pr89984-2.c scan-assembler-not vmovaps
on which the patch changes:
vandps  .LC0(%rip), %xmm1, %xmm1
-   vxorps  %xmm0, %xmm1, %xmm0
+   vxorps  %xmm0, %xmm1, %xmm1
+   vmovaps %xmm1, %xmm0
ret
The RA sees:
(insn 8 4 9 2 (set (reg:V4SF 85)
(and:V4SF (subreg:V4SF (reg:SF 90) 0)
(mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16
A128]))) "pr89984-2.c":7:12 2838 {*andv4sf3}
 (expr_list:REG_DEAD (reg:SF 90)
(nil)))
(insn 9 8 10 2 (set (reg:V4SF 87)
(xor:V4SF (reg:V4SF 85)
(subreg:V4SF (reg:SF 89) 0))) "pr89984-2.c":7:12 2842
{*xorv4sf3}
 (expr_list:REG_DEAD (reg:SF 89)
(expr_list:REG_DEAD (reg:V4SF 85)
(nil
(insn 10 9 14 2 (set (reg:SF 82 [  ])
(subreg:SF (reg:V4SF 87) 0)) "pr89984-2.c":7:12 142
{*movsf_internal}
 (expr_list:REG_DEAD (reg:V4SF 87)
(nil)))
(insn 14 10 15 2 (set (reg/i:SF 20 xmm0)
(reg:SF 82 [  ])) "pr89984-2.c":8:1 142 {*movsf_internal}
 (expr_list:REG_DEAD (reg:SF 82 [  ])
(nil)))
(insn 15 14 0 2 (use (reg/i:SF 20 xmm0)) "pr89984-2.c":8:1 -1
 (nil))
and doesn't know that if it would use xmm0 not just for pseudo 82
but also for pseudo 87, it could create a noop move in insn 10 and
so could avoid an extra register copy and nothing later on is able
to figure that out either.  I don't know how the RA should know
that though.

So that we don't regress, this version of the patch
will do this stuff (i.e. use fresh vector pseudo as destination and
then move lowpart of that to dest) over what it used before (i.e.
use paradoxical subreg of the dest) only if lowpart_subreg returns NULL.

2022-02-22  Jakub Jelinek  

PR target/104612
* config/i386/i386-expand.cc (ix86_expand_copysign): Call force_reg
on input operands before calling lowpart_subreg on it.  For output
operand, use a vmode pseudo as destination and then move its lowpart
subreg into operands[0] if lowpart_subreg fails on dest.
(ix86_expand_xorsign): Likewise.

* gcc.dg/pr104612.c: New test.
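
As background for the AND/XOR instructions in the assembly above, here is a scalar bit-level sketch of the two operations. It assumes IEEE-754 single precision; `to_bits`, `from_bits`, `copysign_bits`, `xorsign_bits` and `example_usage_signs` are hypothetical helper names, and the real expander works on vector registers rather than on integers like this.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");

const std::uint32_t kSignMask = 0x80000000u;  // sign bit of an IEEE-754 float

static std::uint32_t to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

static float from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// copysign(x, y): magnitude of x combined with the sign of y.
float copysign_bits(float x, float y) {
    return from_bits((to_bits(x) & ~kSignMask) | (to_bits(y) & kSignMask));
}

// xorsign(x, y): x with its sign flipped when y is negative, that is
// x * copysign(1, y). One AND to isolate y's sign, one XOR to apply it.
float xorsign_bits(float x, float y) {
    return from_bits(to_bits(x) ^ (to_bits(y) & kSignMask));
}

void example_usage_signs() {
    assert(copysign_bits(2.0f, -3.0f) == -2.0f);
    assert(xorsign_bits(-2.0f, -3.0f) == 2.0f);
}
```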

[Bug tree-optimization/104604] [12 Regression]wrong code with -O2 VRP Complex integer division issue since r12-3328

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104604

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:d44dc131f48254fccc69ec4178fec030e0e2761d

commit r12-7324-gd44dc131f48254fccc69ec4178fec030e0e2761d
Author: Jakub Jelinek 
Date:   Tue Feb 22 10:43:13 2022 +0100

ranger: Fix up REALPART_EXPR/IMAGPART_EXPR handling [PR104604]

The following testcase is miscompiled since r12-3328.
That change assumed that if rhs1 of a GIMPLE_ASSIGN is COMPLEX_CST, then
that is the value of the lhs of the stmt, but that is not the case always,
only if it is a GIMPLE_SINGLE_RHS stmt.  If it is e.g.
GIMPLE_UNARY_RHS or GIMPLE_BINARY_RHS (the latter happens in the testcase),
then it can be e.g.
__complex__ (3, 0) / var
and the REALPART_EXPR of that isn't 3, but the realpart of the division.
I assume once the ranger can do complex numbers adjust_*part_expr will just
fetch one or the other range from an underlying complex range, but until
then, we should limit this to what r12-3328 meant to do.

2022-02-22  Jakub Jelinek  

PR tree-optimization/104604
* gimple-range-fold.cc (adjust_imagpart_expr, adjust_realpart_expr):
Only check if gimple_assign_rhs1 is COMPLEX_CST if
gimple_assign_rhs_code is COMPLEX_CST.

* gcc.c-torture/execute/pr104604.c: New test.
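
The point that the real part of `__complex__ (3, 0) / var` is not simply 3 can be seen with a small worked example. This sketch uses `std::complex<double>` rather than the complex integer division from the PR, which is an assumption made for portability; `real_of_three_over` and `example_usage_realpart` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Real part of (3 + 0i) / divisor. The first operand being a constant says
// nothing about the result unless the statement is a plain copy.
double real_of_three_over(std::complex<double> divisor) {
    const std::complex<double> three(3.0, 0.0);
    return (three / divisor).real();
}

void example_usage_realpart() {
    // (3 + 0i) / (1 + 1i) = 3(1 - 1i) / 2 = 1.5 - 1.5i
    const double re = real_of_three_over(std::complex<double>(1.0, 1.0));
    assert(std::fabs(re - 1.5) < 1e-12);
}
```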

[Bug c++/104634] New: Explicit template instantiation does not work when there are multiple partial template specialization using concepts Денис Шкиря

2022-02-22 Thread denis.shkirja at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104634

Bug ID: 104634
   Summary: Explicit template instantiation does not work when
there are multiple partial template specialization
using concepts  Денис Шкиря 
   Product: gcc
   Version: 10.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: denis.shkirja at gmail dot com
  Target Milestone: ---

In the code below I am trying to explicitly instantiate two different
partial specializations of a template struct with static function and
I expect to see two global weak symbols (W) that correspond to
different instantiations of that function in the object file. But for
some reason I see only one symbol that corresponds to the first
specialization. I observe this behavior on all gcc versions from 10.1 to 12 (I
checked it using godbolt - https://godbolt.org/z/4ev4ePPYe).
Also, it looks like Clang and MSVC produce both symbols.

The reproducer:
```
$ cat main.cpp
#include 

template
struct Struct {
static void func();
};

template T>
struct Struct {
static void func() {}
};

template T>
struct Struct {
static void func() {}
};

template struct Struct;
template struct Struct;


$ g++ -c ./main.cpp -std=gnu++20 -o main.o
$ nm -C main.o
 W Struct::func()
```
GCC version that I used to reproduce the problem locally:
```
$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.1.0-1ubuntu1~18.04.1'
--with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2
--prefix=/usr --with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/lib
--enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror
--disable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-YRKbe7/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-YRKbe7/gcc-11-11.1.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (Ubuntu 11.1.0-1ubuntu1~18.04.1)
```

I am not sure if this will be useful, but here is what the symbols
look like when compiled with clang:
```
$ /usr/lib/llvm-13/bin/clang -c ./main.cpp -std=c++20 -o main_clang.o
$ nm -C main_clang.o
 W Struct::func()
 W Struct::func()
```

[Bug tree-optimization/104604] [12 Regression]wrong code with -O2 VRP Complex integer division issue since r12-3328

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104604

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Jakub Jelinek  ---
Fixed.

[Bug target/104612] [12 Regression] ICE in mark_jump_label_1 since r12-3435

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104612

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Jakub Jelinek  ---
Fixed.

[Bug c++/104634] Explicit template instantiation does not work when there are multiple partial template specialization using concepts

2022-02-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104634

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Keywords||link-failure
   Last reconfirmed||2022-02-22
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed,
template 
constexpr bool type_same = false;

template
constexpr bool type_same = true;

template 
concept same_as = type_same;

template
struct Struct {
static void func1(){}
};

template T>
struct Struct {
static void func3() {}
};

template T>
struct Struct {
static void func2() {}
};

template struct Struct;
template struct Struct;

- CUT 
The function is in .original so maybe it is not being marked as used 

[Bug c/104633] [12 Regression] -Winfinite-recursion diagnoses fortify wrappers

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104633

--- Comment #1 from Richard Biener  ---
Note the inline function needs __attribute__((gnu_inline)) or -fgnu89-inline to
not compile to an endless recursion, but we then still get the undesired
diagnostic.

[Bug c++/104008] [11/12 Regression] New g++ folly compile error since r11-7931-ga2531859bf5bf6cf

2022-02-22 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104008

Martin Liška  changed:

   What|Removed |Added

 Status|WAITING |NEW
Summary|[11/12 Regression] New g++  |[11/12 Regression] New g++
   |folly compile error with|folly compile error since
   |gcc 11.x. Bisected to   |r11-7931-ga2531859bf5bf6cf
   |PR99445 c++: Alias template |
   |in pack expansion   |
   Keywords|needs-reduction |

--- Comment #7 from Martin Liška  ---
Confirmed, started with r11-7931-ga2531859bf5bf6cf.

Test-case:

template  struct conjunction;
template  struct disjunction;
template  struct is_same;
template  struct enable_if;
template  using enable_if_t = typename enable_if<_Cond>::type;
struct B;
struct __uniq_ptr_impl {
  struct _Ptr {
using type = B *;
  };
  using pointer = _Ptr::type;
};
struct unique_ptr {
  using pointer = __uniq_ptr_impl::pointer;
  unique_ptr(pointer);
};
template  unique_ptr make_unique(_Args... __args)
{
  return new B(__args...);
}
template 
using IsOneOf = disjunction...>;
template  class any_badge;
struct badge {
  badge(any_badge<>);
  badge();
};
template  struct any_badge {
  template ...>::value>>
  any_badge();
};
struct B {
  B(badge);
  unique_ptr b_ = make_unique(badge{});
};

[Bug tree-optimization/102645] [12 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: Segmentation fault

2022-02-22 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102645

--- Comment #5 from Martin Liška  ---
@Richi: Can you please add the testcase and close this issue?

[Bug ipa/104533] [12 Regression] ICE in update_vtable_references, at ipa-visibility.cc:383 since r12-2900-gfa28520fadb9405f

2022-02-22 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104533

Martin Liška  changed:

   What|Removed |Added

Summary|[12 Regression] ICE in  |[12 Regression] ICE in
   |update_vtable_references,   |update_vtable_references,
   |at ipa-visibility.cc:383|at ipa-visibility.cc:383
   ||since
   ||r12-2900-gfa28520fadb9405f
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška  ---
With:

g++ pr104533.C -c -fPIC -Ofast -fno-semantic-interposition

it crashes since r12-2900-gfa28520fadb9405f. I can take a look.

[Bug rtl-optimization/104589] [11/12 Regression] Emitted binary code changes when -g is enabled at -O0 -flto and optimize attribute since r11-3026-gfea13fcd0da03535

2022-02-22 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104589

--- Comment #2 from Martin Liška  ---
Started with r11-3026-gfea13fcd0da03535.

[Bug c++/104618] [12 Regression] trunk 20220221 on x86_64-linux-gnu ICEs building sh.cc for sh4-linux-gnu (in build_call_a, at cp/call.cc:381)

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104618

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug c++/104622] [12 Regression] ICE in compare_ics, at cp/call.cc:11419

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104622

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug c++/104623] [11/12 Regression] ICE in cp_parser_skip_to_pragma_eol, at cp/parser.cc:4107

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104623

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug c++/104624] [10/11/12 Regression] ICE in standard_conversion, at cp/call.cc:1213

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104624

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.4
   Priority|P3  |P2

[Bug c/104627] [12 Regression] New failure in deprecated.c

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104627

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1
   Keywords||diagnostic

[Bug lto/104617] Bug in handling of 64k+ sections

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104617

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:2f59f067610f22c3f2ec9b1516e24b85836676ed

commit r12-7325-g2f59f067610f22c3f2ec9b1516e24b85836676ed
Author: Jakub Jelinek 
Date:   Tue Feb 22 11:32:08 2022 +0100

libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
 #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
 #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
 #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.
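
The distinction drawn above can be illustrated with a minimal sketch. This is not the libiberty code: the helper names `in_reserved_range` and `remap_section_index` and the old-to-new index table are assumptions made for illustration; only the two `SHN_*` values come from the ELF specification. The point it tries to show is that a 32-bit sh_link/sh_info section index is remapped the same way whether or not it falls in the reserved range.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reserved section-index range, with values from the ELF specification.
constexpr std::uint32_t SHN_LORESERVE = 0xff00;
constexpr std::uint32_t SHN_HIRESERVE = 0xffff;

// Hypothetical helper: true if the value collides with the reserved range.
// This only matters for 16-bit fields such as st_shndx, e_shnum, e_shstrndx.
constexpr bool in_reserved_range(std::uint32_t index) {
  return index >= SHN_LORESERVE && index <= SHN_HIRESERVE;
}

// Hypothetical helper: remap a 32-bit sh_link/sh_info section index through
// an old-index -> new-index table. There is deliberately no reserved-range
// check here, because a 32-bit field can hold any section index.
std::uint32_t remap_section_index(std::uint32_t old_index,
                                  const std::vector<std::uint32_t>& new_index_of) {
  assert(old_index < new_index_of.size());
  return new_index_of[old_index];
}
```

With a table that maps old index 65321 to some small new index, `remap_section_index` returns the new index even though `in_reserved_range(65321)` is true, which is the behaviour the commit message argues for.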

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  

PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL
in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
range (inclusive).

[Bug tree-optimization/104632] Missed optimization about reading backwards

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104632

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2022-02-22
 Status|UNCONFIRMED |NEW

--- Comment #3 from Richard Biener  ---
The bswap pass is supposed to handle these but has

bool
find_bswap_or_nop_load (gimple *stmt, tree ref, struct symbolic_number *n)
{
...
  /* Avoid returning a negative bitpos as this may wreak havoc later.  */
  if (maybe_lt (bit_offset, 0))
{

Commenting out the code produces the desired

load_le32_backwards:
.LFB0:
.cfi_startproc
movl    -4(%rdi), %eax
ret

the code is present since the introduction of memory source to the bswap pass.
It needs to be investigated what exactly the "havoc" is but clearly the "havoc"
should be mitigated closer to the offenders since the above case seems to work
just fine.
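
For readers without the bug's attachment at hand, here is a sketch of the kind of function this comment is about. Only the name `load_le32_backwards` and the expected `movl -4(%rdi), %eax` come from the comment; the parameter name `end` and the exact body are assumptions, not the reporter's testcase. It assembles a little-endian 32-bit value from the four bytes that precede a pointer, which is where the negative bit offset comes from.

```cpp
#include <cassert>
#include <cstdint>

// Reads the four bytes immediately before `end` as a little-endian 32-bit
// value. Every byte access uses a negative offset from the pointer.
std::uint32_t load_le32_backwards(const std::uint8_t* end) {
  assert(end != nullptr);
  return static_cast<std::uint32_t>(end[-4])
       | static_cast<std::uint32_t>(end[-3]) << 8
       | static_cast<std::uint32_t>(end[-2]) << 16
       | static_cast<std::uint32_t>(end[-1]) << 24;
}
```

On a little-endian target the four byte loads are equivalent to one 32-bit load at offset -4, which is the single `movl` shown above.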

[Bug c++/104390] internal compiler error: tree check: accessed elt 2 of 'tree_vec' with 0 elts in tsubst_pack_expansion, at cp/pt.cc:13125

2022-02-22 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104390

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #2 from Martin Liška  ---
All GCC releases I have ICE with -std=c++17:

pr104390.C: In function ‘void g()’:
pr104390.C:15:28: internal compiler error: tree check: accessed elt 2 of
tree_vec with 0 elts in tsubst_pack_expansion, at cp/pt.c:10007
 C::f<&B::v>({});
^

[Bug c++/104635] New: for loop optimized into infinite loop

2022-02-22 Thread szullo.adam at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

Bug ID: 104635
   Summary: for loop optimized into infinite loop
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: szullo.adam at gmail dot com
  Target Milestone: ---

When compiling with g++, any optimization level other than -O0 turns the for loop into an infinite
loop

#include <stdio.h>
int test()
{
for (int i=0; i<4; i++)
{
printf("%d\n",i);
}
}
int main()
{
test();
}

I DO ACKNOWLEDGE that (contrary to C) in C++ a missing return value is invalid
(-Wreturn-type)

Expected behaviour:
return with garbage value
-or-
segfault on return
// or maybe instead of warning, throw error

[Bug c/104633] [12 Regression] -Winfinite-recursion diagnoses fortify wrappers

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104633

--- Comment #2 from Richard Biener  ---
The following is a valid use of extern inline I think

extern int memcmp (const void * p, const void *q, unsigned long size);

extern inline __attribute__((always_inline,gnu_inline))
int memcmp (const void * p, const void *q, unsigned long size)
{
  return memcmp (p, q, size);
}

int foo (const void * p, const void *q)
{
  return memcmp (p, q, 4);
}

I think the user needs to avoid name-lookup issues here by using an alias
for the call to memcmp in the inline wrapper which might also avoid the
diagnostic.  That __builtin_memcmp finds the memcmp definition is quite
unfortunate and makes it not usable as a way to avoid using an alias.

[Bug fortran/104391] [9/10/11 Regression] bind(C) and allocatable or pointer attribute don't work

2022-02-22 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104391

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2022-02-22
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||burnus at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org,
   ||pault at gcc dot gnu.org

--- Comment #2 from Martin Liška  ---
Fixed on master with r12-2511-g0cbf03689e3e7d9d, started with
r9-5372-gbbf18dc5d248a79a.

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Richard Biener  ---
There's nothing wrong here - by omitting the return you tell GCC that the loop
exit cannot be possibly reached which means we elide it and the exit test.

You do get a diagnostic by default for this which cannot be an error since the
behavior is only undefined at runtime.
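
As an illustration only, here is a sketch of the reported function with the undefined behaviour removed. The `printed` counter is an assumption added so that there is a value to return; the loop itself is unchanged from the report. Once every path through the function reaches a return statement, the loop exit is reachable and the exit test has to be kept.

```cpp
#include <cassert>
#include <cstdio>

// Variant of the reported function in which control never flows off the end
// of a non-void function.
int test() {
  int printed = 0;
  for (int i = 0; i < 4; i++) {
    std::printf("%d\n", i);
    ++printed;
  }
  assert(printed == 4);  // the loop body ran exactly four times
  return printed;
}
```

Declaring the function `void` instead would serve equally well if no result is needed.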

[Bug fortran/104393] incorrect results with elemental functions of scalar derived types with allocatable components

2022-02-22 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104393

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška  ---
(In reply to Andrew Pinski from comment #1)
> In GCC before 9, the program would seg fault at runtime.
> 
> Confirmed.

That got fixed with r9-5091-g07b700ead5362478.

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread szullo.adam at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

--- Comment #2 from Szüllő Ádám  ---
>There's nothing wrong here
How can a missing return statement corrupt an independent code block, with a "private"
variable inside its own scope?

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Please read https://blog.regehr.org/archives/213

[Bug c/104633] [12 Regression] -Winfinite-recursion diagnoses fortify wrappers

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104633

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #2)
> The following is a valid use of extern inline I think
> 
> extern int memcmp (const void * p, const void *q, unsigned long size);
> 
> extern inline __attribute__((always_inline,gnu_inline))
> int memcmp (const void * p, const void *q, unsigned long size)
> {
>   return memcmp (p, q, size);
> }

I think the above isn't actually valid, the compiler should still inline it
infinitely.  This is an infinite recursion.
extern inline __attribute__((always_inline,gnu_inline))
int memcmp (const void * p, const void *q, unsigned long size)
{
  return __builtin_memcmp (p, q, size);
}
is ok, __builtin_memcmp there doesn't mean it should use the user memcmp
inline, it should handle it as the builtin and if it decides it wants to call,
will call memcmp (but the out of line one).

What glibc uses are either the call __builtin_whatever forms, or call an alias,
say:
extern int __memcmp_alias (const void *, const void *, unsigned long) __asm
("memcmp");

extern inline __attribute__((always_inline,gnu_inline))
int memcmp (const void * p, const void *q, unsigned long size)
{
  return __memcmp_alias (p, q, size);
}
This one is also fine, it should call the external function, not the inline
recursively.
So, -Winfinite-recursion shouldn't warn about these 2 forms and should warn
about the #c2 case.

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread szullo.adam at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

--- Comment #4 from Szüllő Ádám  ---
I understand that a missing return value is undefined behaviour.
My point is that this should be limited to the act of returning (return with
garbage, segfault, stuck in an infinite loop _after_ the for loop).

What I can't understand is how, even if GCC sees that the loop exit cannot
possibly be reached, that leads to the for loop disobeying the test
expression.

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

--- Comment #5 from Jakub Jelinek  ---
If you'd bothered to read the link I've provided (actually all 3 parts of it),
maybe you'd understand.  Anyway, bugzilla is for reporting bugs (there is none
on the compiler side), not for teaching users how to program.

[Bug target/103353] Indefinite recursion when compiling -mmma requiring testcase w/ -maltivec

2022-02-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103353

--- Comment #4 from Segher Boessenkool  ---
You miss all extra errors the expand_call can generate.  This is the general
reason why we try to continue instead of stopping after the first error.  The
reason is that later errors may be more obvious to the user.  This of course
no longer works so well because our errors now take 30 lines instead of 1.

It probably is best if the generic opaque-mode emit_move code does not try
to move it via some other mode_class.  Peter?

Failing that, we can work around it by having move patterns for those modes
always, but hard erroring on them (FAIL is no good).

[Bug c++/104636] New: implicit use of explicit constructor

2022-02-22 Thread gcc at ebasoft dot com.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104636

Bug ID: 104636
   Summary: implicit use of explicit constructor
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc at ebasoft dot com.pl
  Target Milestone: ---

During a build of kde-misc/kdeconnect with clang and then gcc, I found that gcc
allows implicit use of an explicit ctor, even while mentioning that it would use
an explicit one; it sounds like it is compiling this as C++98 or so.

https://godbolt.org/z/WsWPbbKfe

:14:13: warning: converting to 'foo' from initializer list would use
explicit constructor 'foo::foo(const char*, uint8_t)'
   14 | return {};
  | ^
:14:13: note: in C++11 and above a default constructor can be explicit

similar bug was resolved and fixed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40685

[Bug c/104627] [12 Regression] New failure in deprecated.c

2022-02-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104627

--- Comment #4 from Segher Boessenkool  ---
The old warning was more helpful and specific, it would be nice if we could
have that back.

[Bug target/104637] New: ICE: maximum number of LRA assignment passes is achieved (30) with -Og -fno-forward-propagate -mavx

2022-02-22 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637

Bug ID: 104637
   Summary: ICE: maximum number of LRA assignment passes is
achieved (30) with -Og -fno-forward-propagate -mavx
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 52489
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52489&action=edit
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -Og -fno-forward-propagate -mavx testcase.c
testcase.c: In function 'foo':
testcase.c:12:5: warning: division by zero [-Wdiv-by-zero]
   12 |   u /= 0;
  | ^~
during RTL pass: reload
testcase.c:16:1: internal compiler error: maximum number of LRA assignment
passes is achieved (30)
   16 | }
  | ^
0x11b4d4c lra_assign(bool&)
/repo/gcc-trunk/gcc/lra-assigns.cc:1694
0x11af12f lra(_IO_FILE*)
/repo/gcc-trunk/gcc/lra.cc:2395
0x115fbc9 do_reload
/repo/gcc-trunk/gcc/ira.cc:5940
0x115fbc9 execute
/repo/gcc-trunk/gcc/ira.cc:6126
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug c++/96442] [9/10/11/12 Regression] ICE in tree check: expected integer_type or enumeral_type or boolean_type or real_type or fixed_point_type, have record_type in int_fits_type_p, at tree.c:8954

2022-02-22 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96442

Roger Sayle  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |roger at nextmovesoftware dot com
 CC||roger at nextmovesoftware dot com
 Status|NEW |ASSIGNED

--- Comment #4 from Roger Sayle  ---
Patch proposed.
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590716.html

[Bug rtl-optimization/104638] New: ICE: maximum number of LRA assignment passes is achieved (30) with -O

2022-02-22 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104638

Bug ID: 104638
   Summary: ICE: maximum number of LRA assignment passes is
achieved (30) with -O
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 52490
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52490&action=edit
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O testcase.c
during RTL pass: reload
testcase.c: In function 'foo':
testcase.c:19:1: internal compiler error: maximum number of LRA assignment
passes is achieved (30)
   19 | }
  | ^
0x11b4d4c lra_assign(bool&)
/repo/gcc-trunk/gcc/lra-assigns.cc:1694
0x11af12f lra(_IO_FILE*)
/repo/gcc-trunk/gcc/lra.cc:2395
0x115fbc9 do_reload
/repo/gcc-trunk/gcc/ira.cc:5940
0x115fbc9 execute
/repo/gcc-trunk/gcc/ira.cc:6126
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r12-7324-20220222104313-gd44dc131f48-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu
--with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r12-7324-20220222104313-gd44dc131f48-checking-yes-rtl-df-extra-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.1 20220222 (experimental) (GCC)

[Bug c/104633] [12 Regression] -Winfinite-recursion diagnoses fortify wrappers

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104633

Jakub Jelinek  changed:

   What|Removed |Added

   Last reconfirmed||2022-02-22
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #4 from Jakub Jelinek  ---
Created attachment 52491
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52491&action=edit
gcc12-pr104633.patch

Untested fix.

[Bug target/102768] [feature request] Add compiler support for aarch64 shadow call stack

2022-02-22 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0

--- Comment #9 from nsz at gcc dot gnu.org ---
i'm closing this as fixed. open separate bugs for further improvements.

Fixed by

https://gcc.gnu.org/g:ce09ab17ddd21f73ff2caf6eec3b0ee9b0e1a11e

commit ce09ab17ddd21f73ff2caf6eec3b0ee9b0e1a11e
Author: Dan Li 
AuthorDate: 2022-02-21 20:01:14 +

aarch64: Add compiler support for Shadow Call Stack

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/ChangeLog:

* config/aarch64/aarch64.cc (SLOT_REQUIRED):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_layout_frame): Likewise, and
change callee_adjust when scs is enabled.
(aarch64_save_callee_saves):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_restore_callee_saves):
Change wb_candidate[12] to wb_pop_candidate[12].
(aarch64_get_separate_components):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 from SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
wb_pop_candidate[12], and rename wb_candidate[12] to
wb_push_candidate[12].
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.cc (parse_sanitizer_options): Add shadow-call-stack
and exclude SANITIZE_SHADOW_CALL_STACK.
* target.def: New hook.
* toplev.cc (process_options): Add SCS compile option check.
* ubsan.cc (ubsan_expand_null_ifn): Enum type conversion.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.

[Bug tree-optimization/104639] New: Useless loop not fully optimized anymore

2022-02-22 Thread denis.campredon at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104639

Bug ID: 104639
   Summary: Useless loop not fully optimized anymore
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: denis.campredon at gmail dot com
  Target Milestone: ---

The following code compiled with -O2
---
bool foo(int i) {
while (i == 4)
i += 2;
return i;
}
--

Trunk generates the following assembly
-
foo(int):
cmp edi, 4
mov eax, 6
cmove   edi, eax
test    edi, edi
setne   al
ret
-

whereas 11.2 generates more optimized assembly

-
foo(int):
test    edi, edi
setne   al
ret


[Bug tree-optimization/104639] [12 Regression] Useless loop not fully optimized anymore

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104639

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-02-22
   Target Milestone|--- |12.0
 Ever confirmed|0   |1
 CC||aldyh at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org
Summary|Useless loop not fully  |[12 Regression] Useless
   |optimized anymore   |loop not fully optimized
   ||anymore
   Priority|P3  |P1

--- Comment #1 from Jakub Jelinek  ---
Started with r12-3453-g01b5038718056b024b370b74a874fbd92c5bbab3

[Bug tree-optimization/104639] [12 Regression] Useless loop not fully optimized anymore

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104639

--- Comment #2 from Jakub Jelinek  ---
One thing is not threading through loop latches, but in this case once the loop
is optimized into straight line code in thread2 we don't thread that further,
so end up with
  if (i_2(D) == 4)
goto ; [97.00%]
  else
goto ; [3.00%]

   [local count: 3540129]:

   [local count: 118111600]:
  # i_6 = PHI 
  _3 = i_6 != 0;
  return _3;
For the result, the
 i_6 = i_2(D) == 4 ? 6 : i_2(D);
is equivalent to just i_2(D) because we only care whether it is non-zero and
both 4 and 6 are non-zero.
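
The equivalence claimed above can be checked with a small sketch. The function names `foo_with_select`, `foo_simplified` and `forms_agree` are hypothetical; the two expressions are taken from the comment.

```cpp
#include <cassert>

// Straight-line form left after threading: select 6 when i == 4, then test
// the selected value against zero.
constexpr bool foo_with_select(int i) { return ((i == 4) ? 6 : i) != 0; }

// Simplified form: 4 and 6 are both non-zero, so the select cannot change
// the outcome of the comparison.
constexpr bool foo_simplified(int i) { return i != 0; }

static_assert(foo_with_select(4) && foo_simplified(4), "4 maps to true in both forms");
static_assert(!foo_with_select(0) && !foo_simplified(0), "0 maps to false in both forms");

// Compares the two forms over the closed range [lo, hi].
bool forms_agree(int lo, int hi) {
  assert(lo <= hi);
  for (int i = lo; i <= hi; ++i)
    if (foo_with_select(i) != foo_simplified(i)) return false;
  return true;
}
```

The only input the select changes is 4, and both 4 and 6 compare unequal to zero, so the two forms agree for every `int`.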

[Bug c++/100983] Deduction guide for member template class rejected at class scope

2022-02-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100983

--- Comment #5 from Patrick Palka  ---
(In reply to Artur Bać from comment #4)
> trunk at compiler explorer still rejects valid code
> 
> https://godbolt.org/z/v4ebhj9Gh, only the message of requirement of
> namespace scope is missing from gcc 11.2, invalid use of template-name
> without an argument list
> 
> https://godbolt.org/z/7Wev6saWz "ctad" must be declared at namespace scope +
> invalid use of template-name without an argument list

It works on trunk if you omit the dubious 'typename' before the template name:
https://godbolt.org/z/77M4dh3n5

[Bug c++/100983] Deduction guide for member template class rejected at class scope

2022-02-22 Thread gcc at ebasoft dot com.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100983

--- Comment #6 from Artur Bać  ---
The typename was from my real code by mistake, where value_type is a template
param.
But in real code within a template I have to use typename, and it doesn't work
with trunk either.

https://godbolt.org/z/E6Pavhfza

[Bug target/104409] -march=armv8.6-a+ls64 crashes, LS64 builtins causes ICE

2022-02-22 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104409

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org,
   ||wirkus at gcc dot gnu.org

--- Comment #6 from Martin Liška  ---
So the root cause is quite simple:

#0  pushclass (type=) at
/home/marxin/Programming/gcc/gcc/cp/class.cc:8069
#1  0x00f869ea in begin_class_definition (t=) at /home/marxin/Programming/gcc/gcc/cp/semantics.cc:3466
#2  0x00d284da in cxx_simulate_record_decl (loc=1, name=0x2edeca1
"__arm_data512_t", fields=...) at
/home/marxin/Programming/gcc/gcc/cp/decl.cc:16689
#3  0x01e51927 in aarch64_init_ls64_builtins_types () at
/home/marxin/Programming/gcc/gcc/config/aarch64/aarch64-builtins.cc:1610
#4  0x01e51ac4 in aarch64_init_ls64_builtins () at
/home/marxin/Programming/gcc/gcc/config/aarch64/aarch64-builtins.cc:1622
#5  0x01e51f17 in aarch64_general_init_builtins () at
/home/marxin/Programming/gcc/gcc/config/aarch64/aarch64-builtins.cc:1735
#6  0x01d57551 in aarch64_init_builtins () at
/home/marxin/Programming/gcc/gcc/config/aarch64/aarch64.cc:14489
#7  0x0105c566 in c_define_builtins
(va_list_ref_type_node=,
va_list_arg_type_node=) at
/home/marxin/Programming/gcc/gcc/c-family/c-common.cc:4225
#8  0x0105e204 in c_common_nodes_and_builtins () at
/home/marxin/Programming/gcc/gcc/c-family/c-common.cc:4712
#9  0x00cf6279 in cxx_init_decl_processing () at
/home/marxin/Programming/gcc/gcc/cp/decl.cc:4452
#10 0x00d9bb5e in cxx_init () at
/home/marxin/Programming/gcc/gcc/cp/lex.cc:328
#11 0x0186796c in lang_dependent_init (name=0x3982f00
"/home/marxin/Programming/testcases/pr104409.c") at
/home/marxin/Programming/gcc/gcc/toplev.cc:1858
#12 0x018681a7 in do_compile (no_backend=false) at
/home/marxin/Programming/gcc/gcc/toplev.cc:2153
#13 0x018685d2 in toplev::main (this=0x7fffdb5a, argc=20,
argv=0x7fffdc88) at /home/marxin/Programming/gcc/gcc/toplev.cc:2320
#14 0x02bdc260 in main (argc=20, argv=0x7fffdc88) at
/home/marxin/Programming/gcc/gcc/main.cc:39

It's called at the time when current_class_stack_size == 0 which explains the
ICE.

The variable is initialized in   init_class_processing (), which is also called
from cxx_init_decl_processing (void), but later.

CCing Przemyslaw

[Bug rtl-optimization/104596] Means to add a comment in the assembly

2022-02-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104596

--- Comment #3 from Tom de Vries  ---
Submitted patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html

[Bug c++/100983] Deduction guide for member template class rejected at class scope

2022-02-22 Thread gcc at ebasoft dot com.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100983

--- Comment #7 from Artur Bać  ---
Do I have to open a new bug because you marked it as fixed while it is not
fixed?

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread szullo.adam at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

--- Comment #6 from Szüllő Ádám  ---
I did bother to read the articles, and there was no new information (nor
anything relevant to this exact case).
You are right that this is not a bug, because the code is invalid, as I myself
emphasized in the description. This case is perfectly covered by undefined
behaviour. This is why I was hesitant to open the bug report in the first
place.
Still, I opened it because it did not feel right: the optimizer optimized away
well-defined code, something that might be worth a check.

[Bug c++/104640] New: incomplete unicode support for User-defined literals

2022-02-22 Thread maik.urbannek at cattatech dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104640

Bug ID: 104640
   Summary: incomplete unicode support for User-defined literals
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: maik.urbannek at cattatech dot de
  Target Milestone: ---

```
int operator""_π(const unsigned long long i){return (static_cast<int>(i));}

template <char...> double operator "" _π(){return 12.0;}

int main() {
  const double d=0.0_π;
  return 234_π+static_cast<int>(d);
}
}
```

In my opinion this should work (clang compiles it).

gcc produces the following error:
```
 error: expected initializer before '\U03c0'
1 | int operator""_π(const unsigned long long i){return (static_cast<int>(i));}
  |^
Compiler returned: 1
```

GCC (since GCC 10) can compile the second operator (line 3), but not the first
(line 1).

[Bug c++/100983] Deduction guide for member template class rejected at class scope

2022-02-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100983

--- Comment #8 from Patrick Palka  ---
(In reply to Artur Bać from comment #7)
> Do I have to open new bug because of You marked it as fixed while it is not
> fixed ?

Yes please, it'd be easier to track that way.

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

--- Comment #7 from Jonathan Wakely  ---
It isn't well-defined code though. It's undefined, as you yourself said. It
can't be both.

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread szullo.adam at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

--- Comment #8 from Szüllő Ádám  ---
Yes, the code as a whole is invalid.
But for(int i=0; i<4; i++) is well defined.

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

--- Comment #9 from Jonathan Wakely  ---
(In reply to Szüllő Ádám from comment #4)
> i understand that missing return value is undefined behaviour.
> my point is, that this should be limited to the act of return (return with
> garbage, segfault, stuck in an infinite loop _after_ the for loop)

No.

> what i can't understand is, that even if GCC sees that the loop exit cannot
> be possibly reached, how that will lead to the for loop disobeying the test
> expression.

Undefined behaviour does not mean "returns garbage, but otherwise behaves as
you expect".

There is no bound on how surprising undefined behaviour can be.

[Bug c++/104635] for loop optimized into infinite loop

2022-02-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104635

--- Comment #10 from Jonathan Wakely  ---
(In reply to Szüllő Ádám from comment #8)
> Yes, the code as a whole is invalid.
> But for(int i=0; i<4; i++) is well defined.

No, that's not how undefined behaviour works. It isn't bounded or localised to
a specific part of the program.

[Bug target/104363] hppa: __asm__ directive .global and multiple .symver not supported

2022-02-22 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104363

--- Comment #11 from dave.anglin at bell dot net ---
On 2022-02-22 4:15 a.m., mathieu.malaterre at gmail dot com wrote:
> [...]
> libkcapi-1.3.1/apps/kcapi-rng.c:302: undefined reference to
> `kcapi_rng_generate'
> /usr/lib/gcc-cross/hppa-linux-gnu/10/../../../../hppa-linux-gnu/bin/ld:
> libkcapi-1.3.1/apps/kcapi-rng.c:328: undefined reference to
> `kcapi_memset_secure'
> [...]
As I tried to say previously, the problem was with the asm used in libkcapi.

[Bug c++/104641] New: Deduction guide for member template class rejected at class scope when used with typename dependant type

2022-02-22 Thread gcc at ebasoft dot com.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104641

Bug ID: 104641
   Summary: Deduction guide for member template class rejected at
class scope when used with typename dependant type
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc at ebasoft dot com.pl
  Target Milestone: ---

Follow up of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100983

Doesn't work when used with template typename inside template
https://godbolt.org/z/E6Pavhfza

[Bug c++/100983] Deduction guide for member template class rejected at class scope

2022-02-22 Thread gcc at ebasoft dot com.pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100983

--- Comment #9 from Artur Bać  ---
created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104641

[Bug c++/104642] New: Add __builtin_trap() for missing return at -O0

2022-02-22 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104642

Bug ID: 104642
   Summary: Add __builtin_trap() for missing return at -O0
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

Again and again and again users are surprised (and often angry) about the
effects of a missing return in C++. It's undefined behaviour but that doesn't
stop bug reports like PR 104635 and questions like
https://gcc.gnu.org/pipermail/gcc/2022-January/238200.html

Users expect the undefined behaviour for a missing return to be "semi-defined".
We should just make it trap at -O0, then at least we can tell them that for
unoptimized code, the result is guaranteed to crash. With optimisation, it's
still undefined and can invalidate all their assumptions.

As I said in https://gcc.gnu.org/pipermail/gcc/2022-January/238204.html

What if we inserted the trap for -O0?

1. Not everybody uses ubsan even when they should use it.

2. The code can use unreachable annotations if it really needs to leave
some paths unhandled, but really can't live with the branch and trap
instructions. (The C++ standard is getting std::unreachable and std::assume
to do that in a portable way, so there is less excuse for not doing it).
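
To make the "missing return" case concrete, here is a minimal sketch (the
enum and function names are made up for illustration and are not taken from
PR 104635). Every enumerator is handled, yet control can still syntactically
flow off the end of the function, which is undefined behaviour for a non-void
function. Ending the function with an explicit trap, or with an unreachable
annotation, states the intent instead of relying on whatever the optimizer
does.

```cpp
#include <cstdlib>

// Hypothetical example type, for illustration only.
enum class Channel { Red, Green, Blue };

int channel_index(Channel c) {
  switch (c) {
    case Channel::Red:   return 0;
    case Channel::Green: return 1;
    case Channel::Blue:  return 2;
  }
  // Without this line, a Channel value outside the enumerators would flow
  // off the end of a non-void function, which is undefined behaviour.
  // std::abort() gives a deterministic crash; __builtin_trap() is the
  // GCC built-in alternative, and __builtin_unreachable() is the annotation
  // to use when the branch and trap instructions are unacceptable.
  std::abort();
}
```

With the explicit std::abort(), the behaviour is the same at every
optimization level, which is the guarantee the proposal asks for at -O0 only.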

[Bug tree-optimization/101636] [11/12 Regression] ICE: verify_gimple failed (error: conversion of register to a different size in 'view_convert_expr')

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101636

--- Comment #14 from Richard Biener  ---
Created attachment 52492
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52492&action=edit
GIMPLE testcase

So I think that the IL we produce from SLP vectorizing the if-converted loop
body is not great and we should address this issue there.  In particular
emitting a
VECTOR_BOOLEAN_TYPE_P CTOR for the external bools is not OK which is also what
the iffy code in vect_create_constant_vectors shows.  A non-loop GIMPLE
testcase
for this is attached.

It doesn't ICE but the code generated is just awful.

I've tried to compensate in vect_create_constant_vectors itself by creating
a non-VECTOR_BOOLEAN_TYPE_P CTOR and producing a VECTOR_BOOLEAN_TYPE_P via
a NE comparison but with just AVX512F we can handle V16SImode compares but
not V16QImode which is what would naturally appear - and vector lowering will
decompose that again and we have no means of failing vectorization in this
function.

Instead I think this needs to be handled by patterns and if it is not,
rejected.  In this case it's vectorizable_operation for bitwise ops
that just picks the result vector type here

  /* If op0 is an external or constant def, infer the vector type
     from the scalar type.  */
  if (!vectype)
    {
      /* For boolean type we cannot determine vectype by
         invariant value (don't know whether it is a vector
         of booleans or vector of integers).  We use output
         vectype because operations on boolean don't change
         type.  */
      if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op0)))
        {
          if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (scalar_dest)))
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "not supported operation on bool value.\n");
              return false;
            }
          vectype = vectype_out;
        }

but that assumes we can create a vector bool from invariants or externals
which we generally cannot.  If we disable that here we'll run into the
same issue for the COND_EXPR.

Looking at vect_recog_bool_pattern it really does two things at the same time,
optimize |& sequences _and_ perform correctness transforms based on mask
uses.  In this case we only start from the COND_EXPR as a mask use but
once we see the internal-def & external-def mask def we decide we do not
want to optimize it.  But we'd still need to make the external def suitable
for the mask use (and we know the precision to use there).

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Tom de Vries :

https://gcc.gnu.org/g:5ed77fb3ed1ee0289a0ec9499ef52b99b39421f1

commit r12-7332-g5ed77fb3ed1ee0289a0ec9499ef52b99b39421f1
Author: Tom de Vries 
Date:   Tue Apr 20 08:47:03 2021 +0200

[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.
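
The generation-based wait described above can be sketched as follows. This is
only an illustration in portable C++, not libgomp's code: GenBarrier and
wait_for_generation are hypothetical names, and the loop busy-polls where the
linux implementation would sleep on a futex. The point it shows is that a
waiter arriving after the generation has already reached the wanted value
returns at once, without needing any other thread to take part.

```cpp
#include <atomic>

// Hypothetical stand-in for the barrier state (not libgomp's gomp_barrier_t).
struct GenBarrier {
  std::atomic<unsigned> generation{0};
};

// Wait until the barrier's generation equals `wanted`.
// Returns how many times the value had to be re-read before it matched;
// 0 means the caller did not have to wait at all.
unsigned wait_for_generation(const GenBarrier &bar, unsigned wanted) {
  unsigned polls = 0;
  while (bar.generation.load(std::memory_order_acquire) != wanted)
    ++polls;  // busy-poll for brevity; a futex wait would sleep here
  return polls;
}
```

A rendezvous primitive such as bar.sync has no such early exit: it completes
only when the stated number of threads have all arrived.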

In the nvptx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  

PR target/99555
* config/nvptx/bar.c (generation_to_barrier): New function, copied
from config/rtems/bar.c.
(futex_wait, futex_wake): New function.
(do_spin, do_wait): New function, copied from config/linux/wait.h.
(gomp_barrier_wait_end, gomp_barrier_wait_last)
(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
and replace with include of config/linux/bar.c.
* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
(gomp_barrier_init): Init new fields.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
workarounds.
* testsuite/libgomp.c/pr99555-1.c: Same.
* testsuite/libgomp.fortran/task-detach-6.f90: Same.

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2022-02-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |12.0

--- Comment #12 from Tom de Vries  ---
Fixed in "[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end".

[Bug c++/104642] Add __builtin_trap() for missing return at -O0

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104642

--- Comment #1 from Richard Biener  ---
Not sure, people will still see the surprising behavior at -O1+ then and at -O0
we're not exploiting the __builtin_unreachable () in too surprising ways (we'll
just fall thru to the next function or so - heh).

I'd rather have -funreachable-traps or so, enabled by default at -O0, rather
than special-casing the return case btw.

[Bug c++/104641] Deduction guide for member template class rejected at class scope when used with typename dependant type

2022-02-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104641

Patrick Palka  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ppalka at gcc dot gnu.org
   Last reconfirmed||2022-02-22
 Ever confirmed|0   |1
 CC||ppalka at gcc dot gnu.org
 Depends on||100983
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Patrick Palka  ---
Confirmed, this never worked.  Reduced rejects-valid testcase:

template
struct A {
  template struct B { B(U); };
};

template
void f() {
  typename A::B x(0);
}

template void f();


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100983
[Bug 100983] Deduction guide for member template class rejected at class scope

[Bug tree-optimization/104639] [12 Regression] Useless loop not fully optimized anymore

2022-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104639

--- Comment #3 from Richard Biener  ---
it's odd that VRP doesn't optimize this though.  VRP2 says

Exported global range table:

i_6  : int ~[4, 4]
bool foo (int i)
{
  bool _3;

   [local count: 118111600]:
  if (i_2(D) == 4)
goto ; [97.00%]
  else
goto ; [3.00%]

   [local count: 955630224]:

   [local count: 118111600]:
  # i_6 = PHI 
  _3 = i_6 != 0;
  return _3;

but shouldn't ranger figure that i_2(D) != 0 && 6 != 0 is the same as
i_2(D) != 0?  Alternatively this could be sth for phiopt.  PRE still
sees the loop (I guess it was previously the one optimizing this)

[Bug c++/104624] [10/11/12 Regression] ICE in standard_conversion, at cp/call.cc:1213

2022-02-22 Thread gscfq--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104624

--- Comment #1 from G. Steinmetz  ---
> Started between 20200712 and 20200719 :
Sorry, a cut&pasto from pr104623. It started between 20200419 and 20200509.

[Bug tree-optimization/104639] [12 Regression] Useless loop not fully optimized anymore

2022-02-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104639

--- Comment #4 from Jakub Jelinek  ---
The ranger itself can't, i_2(D) in the PHI arg is ~[4, 4], and _3 is still [0,
1] aka VARYING.
Yes, phiopt could handle this by seeing a PHI result is only used in an
equality comparison and try to figure out something for it using range info or
so, or vrp2 could.
In GCC 11, it was indeed PRE that optimized that:
  if (i_2(D) == 4)
goto ; [97.00%]
  else
goto ; [3.00%]

   [local count: 3540129]:

   [local count: 118111600]:
  # i_6 = PHI 
  _3 = i_6 != 0;
which is what we have on the trunk until optimized with:
  _Bool _1;
  _Bool prephitmp_7;

   [local count: 118111600]:
  if (i_2(D) == 4)
goto ; [97.00%]
  else
goto ; [3.00%]

   [local count: 3540129]:
  _1 = i_2(D) != 0;

   [local count: 118111600]:
  # i_6 = PHI 
  # prephitmp_7 = PHI <_1(3), 1(2)>
and later reassoc2 optimizes that to just
  _6 = i_2(D) != 0;
  return _6;

[Bug target/101325] [12 Regression] arm: Wrong code with MVE vcmpeqq intrinsic since r12-671-gd083fbf72

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101325

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:884f77b489510e1df9db2889b60c5df6fcda

commit r12-7338-g884f77b489510e1df9db2889b60c5df6fcda
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:22 2021 +

arm: Implement MVE predicates as vectors of booleans

This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map relevant
builtins HImode arguments and return value to the appropriate vector
of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.

In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b'.  This patch updates the aarch64 definition of VNx*BI as
needed.
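
As a rough illustration of the MVE representation described above, the sketch
below packs a vector of boolean lanes into a 16-bit predicate by repeating
each lane's bit across that lane's share of the 16 bits. It assumes the
predicate holds one bit per byte of a 128-bit vector, so a 4-lane vector gets
4 bits per lane; pack_predicate is a hypothetical helper written for this
example, not a GCC or ACLE function.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Pack N boolean lanes into a 16-bit predicate, replicating each lane's
// truth value across 16 / N consecutive bits (assumed MVE-style layout).
template <std::size_t N>
std::uint16_t pack_predicate(const std::array<bool, N> &lanes) {
  static_assert(N == 4 || N == 8 || N == 16, "lane count must divide 16");
  constexpr unsigned bits_per_lane = 16 / N;
  constexpr unsigned lane_mask = (1u << bits_per_lane) - 1u;
  unsigned pred = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (lanes[i])
      pred |= lane_mask << (i * bits_per_lane);
  return static_cast<std::uint16_t>(pred);
}
```

For example, four lanes {true, false, false, true} pack to 0xF00F under this
layout, whereas a one-bit-per-element encoding would give 0b1001.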

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  
Richard Sandiford  

gcc/
PR target/100757
PR target/101325
* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
VNx2BI): Update definition.
* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Add new
simd types.
(arm_init_builtin): Map predicate vectors arguments to HImode.
(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
rtx. Move return value to HImode rtx.
* config/arm/arm-builtins.h (arm_type_qualifiers): Add
qualifier_predicate.
* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New
modes.
* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
Pred2x8_t,Pred4x4_t): New.
* emit-rtl.cc (init_emit_once): Handle all boolean modes.
* genmodes.cc (mode_data): Add boolean field.
(blank_mode): Initialize it.
(make_complex_modes): Fix handling of boolean modes.
(make_vector_modes): Likewise.
(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
(make_vector_bool_mode): Likewise.
(BOOL_MODE): New.
(make_bool_mode): New.
(emit_insn_modes_h): Fix generation of boolean modes.
(emit_class_narrowest_mode): Likewise.
* machmode.def: (VECTOR_BOOL_MODE): Document new COMPONENT
parameter.  Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to
define BImode.
* rtx-vector-builder.cc (rtx_vector_builder::find_cached_value):
Fix handling of constm1_rtx for VECTOR_BOOL.
* simplify-rtx.cc (native_encode_rtx): Fix support for VECTOR_BOOL.
(native_decode_vector_rtx): Likewise.
(test_vector_ops_duplicate): Skip vec_merge test
with vectors of booleans.
* varasm.cc (output_constant_pool_2): Likewise.

[Bug target/100757] [12 Regression] arm: ICE (unrecognizable insn) with MVE VPSELQ_S since r12-834-ga6eacbf10

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100757

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:884f77b489510e1df9db2889b60c5df6fcda

commit r12-7338-g884f77b489510e1df9db2889b60c5df6fcda
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:22 2021 +

arm: Implement MVE predicates as vectors of booleans

[Bug target/101325] [12 Regression] arm: Wrong code with MVE vcmpeqq intrinsic since r12-671-gd083fbf72

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101325

--- Comment #14 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:91224cf625dc90304bb515a0cc602beed48fe3da

commit r12-7339-g91224cf625dc90304bb515a0cc602beed48fe3da
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:27 2021 +

arm: Implement auto-vectorized MVE comparisons with vectors of boolean
predicates

We make use of qualifier_predicate to describe MVE builtins
prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
as they are exercised by the tests added earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.

We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns
to use the new MVE_7_HI iterator which covers HI and the new VxBI
modes, in conjunction with the new DB constraint for a constant vector
of booleans.

This patch also adds tests derived from the one provided in PR
target/101325: there is a compile-only test because I did not have
access to anything that could execute MVE code until recently.  I have
been able to add an executable test since QEMU supports MVE.

Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
ensures arm_mve_hw passes even if the toolchain does not generate MVE
code by default.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon 
Richard Sandiford  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
(BINOP_PRED_NONE_NONE_QUALIFIERS)
(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm-protos.h (mve_bool_vec_to_const): New.
* config/arm/arm.cc (arm_hard_regno_mode_ok): Handle new VxBI
modes.
(arm_mode_to_pred_mode): New.
(arm_expand_vector_compare): Use the right VxBI mode instead of
HI.
(arm_expand_vcond): Likewise.
(simd_valid_immediate): Handle MODE_VECTOR_BOOL.
(mve_bool_vec_to_const): New.
(neon_make_constant): Call mve_bool_vec_to_const when needed.
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
(vpselq_s, vpselq_f): Use new predicated qualifiers.
* config/arm/constraints.md (DB): New.
* config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators.
(MVE_VPRED, MVE_vpred): New attribute iterators.
* config/arm/mve.md (@mve_vcmpq_)
(@mve_vcmpq_f, @mve_vpselq_)
(@mve_vpselq_f): Use MVE_VPRED instead of HI.
(@mve_vpselq_v2di): Define separately.
(mov): New expander for VxBI modes.
* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use
MVE_7_HI iterator and add support for DB constraint.

gcc/testsuite/
PR target/100757
PR target/101325
* gcc.dg/rtl/arm/mve-vxbi.c: New test.
* gcc.target/arm/simd/pr101325.c: New.
* gcc.target/arm/simd/pr101325-2.c: New.
* lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
add_options_for_arm_v8_1m_mve_fp.

[Bug target/100757] [12 Regression] arm: ICE (unrecognizable insn) with MVE VPSELQ_S since r12-834-ga6eacbf10

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100757

--- Comment #17 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:91224cf625dc90304bb515a0cc602beed48fe3da

commit r12-7339-g91224cf625dc90304bb515a0cc602beed48fe3da
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:27 2021 +

arm: Implement auto-vectorized MVE comparisons with vectors of boolean
predicates

[Bug target/100757] [12 Regression] arm: ICE (unrecognizable insn) with MVE VPSELQ_S since r12-834-ga6eacbf10

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100757

--- Comment #18 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:df0e57c2c032cea0f77f2e68231c035f282b26d6

commit r12-7340-gdf0e57c2c032cea0f77f2e68231c035f282b26d6
Author: Christophe Lyon 
Date:   Wed Oct 20 15:30:16 2021 +

arm: Fix vcond_mask expander for MVE (PR target/100757)

The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
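implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore.  The new *<MVE_vpred> patterns listed above are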
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(!<Is_float_mode> || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.
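
As a rough illustration of the mask merging described above, here is a hypothetical scalar model in C++. It is not GCC or ACLE code: the names Pred16, lane_bits, cmp_eq, psel and merged_select are invented for this sketch. It assumes the MVE predicate layout of 16 bits, one per byte of the 128-bit vector, so each 32-bit lane owns 4 bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model, not GCC or ACLE code.
// Assumption: a 16-bit predicate with one bit per vector byte, so each
// 32-bit lane owns 4 consecutive bits.
using Pred16 = std::uint16_t;
using Vec4f = std::array<float, 4>;

// Predicate bits that belong to one 32-bit lane.
inline Pred16 lane_bits(int lane) {
  assert(lane >= 0 && lane < 4);
  return static_cast<Pred16>(0xFu << (4 * lane));
}

// Models a vector compare for equality: set the bits of every matching lane.
inline Pred16 cmp_eq(const Vec4f &v, float x) {
  Pred16 p = 0;
  for (int lane = 0; lane < 4; ++lane)
    if (v[lane] == x)
      p = static_cast<Pred16>(p | lane_bits(lane));
  return p;
}

// Models a predicated select: take `t` where the lane is set, else `f`.
inline Vec4f psel(Pred16 p, const Vec4f &t, const Vec4f &f) {
  Vec4f r{};
  for (int lane = 0; lane < 4; ++lane)
    r[lane] = (p & lane_bits(lane)) ? t[lane] : f[lane];
  return r;
}

// Two compare results are merged with a plain 16-bit AND; no vectors of
// 0 and 1 are materialised along the way.
inline Vec4f merged_select(const Vec4f &a0, const Vec4f &a1, float x,
                           const Vec4f &t, const Vec4f &f) {
  Pred16 merged = static_cast<Pred16>(cmp_eq(a0, x) & cmp_eq(a1, x));
  return psel(merged, t, f);
}
```

In this model, comparing (2.0, 2.0, 3.0, 2.0) and (2.0, 2.0, 2.0, 2.0) against 2.0f gives the masks 0xF0FF and 0xFFFF, which merge to 0xF0FF, so only lane 2 takes the value from `f`.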

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
        ldr         r3, .L3+48
        vldr.64     d4, .L3          // q2=(2.0,2.0,2.0,2.0)
        vldr.64     d5, .L3+8
        vldrw.32    q0, [r3]         // q0=a(0..3)
        adds        r3, r3, #16
        vcmp.f32    eq, q0, q2       // cmp a(0..3) == (2.0,2.0,2.0,2.0)
        vldrw.32    q1, [r3]         // q1=a(4..7)
        vmrs        r3, P0
        vcmp.f32    eq, q1, q2       // cmp a(4..7) == (2.0,2.0,2.0,2.0)
        vmrs        r2, P0  @ movhi
        ands        r3, r3, r2       // r3=select(a(0..3)) & select(a(4..7))
        vldr.64     d4, .L3+16       // q2=(5.0,5.0,5.0,5.0)
        vldr.64     d5, .L3+24
        vmsr        P0, r3
        vldr.64     d6, .L3+32       // q3=(4.0,4.0,4.0,4.0)
        vldr.64     d7, .L3+40
        vpsel       q3, q3, q2       // q3=vcond_mask(4.0,5.0)
        vmov.32     r2, q3[1]        // keep the scalar max
        vmov.32     r0, q3[3]
        vmov.32     r3, q3[2]
        vmov.f32    s11, s12
        vmov        s15, r2
        vmov        s14, r3
        vmaxnm.f32  s15, s11, s15
        vmaxnm.f32  s15, s15, s14
        vmov        s14, r0
        vmaxnm.f32  s15, s15, s14
        vmov        r0, s15
        bx          lr
.L4:
        .align  3
.L3:
        .word   1073741824  // 2.0f
        .word   1073741824
        .word   1073741824
        .word   1073741824
        .word   1084227584  // 5.0f
        .word   1084227584
        .word   1084227584
        .word   1084227584
        .word   1082130432  // 4.0f
        .word   1082130432
        .word   1082130432
        .word   1082130432

This patch adds tests that trigger an ICE without this fix.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support

[Bug target/100757] [12 Regression] arm: ICE (unrecognizable insn) with MVE VPSELQ_S since r12-834-ga6eacbf10

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100757

--- Comment #19 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:e6a4aefce8e47a7d3ba781066a1410ebfa963e59

commit r12-7341-ge6a4aefce8e47a7d3ba781066a1410ebfa963e59
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:35 2021 +

arm: Convert remaining MVE vcmp builtins to predicate qualifiers

This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (BINOP_UNONE_NONE_NONE_QUALIFIERS):
Delete.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
predicated qualifiers.
* config/arm/mve.md (mve_vcmpq_n_)
(mve_vcmp*q_m_f): Use MVE_VPRED instead of HI.

[Bug target/101325] [12 Regression] arm: Wrong code with MVE vcmpeqq intrinsic since r12-671-gd083fbf72

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101325

--- Comment #15 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:e6a4aefce8e47a7d3ba781066a1410ebfa963e59

commit r12-7341-ge6a4aefce8e47a7d3ba781066a1410ebfa963e59
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:35 2021 +

arm: Convert remaining MVE vcmp builtins to predicate qualifiers

This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (BINOP_UNONE_NONE_NONE_QUALIFIERS):
Delete.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
predicated qualifiers.
* config/arm/mve.md (mve_vcmpq_n_)
(mve_vcmp*q_m_f): Use MVE_VPRED instead of HI.

[Bug target/101325] [12 Regression] arm: Wrong code with MVE vcmpeqq intrinsic since r12-671-gd083fbf72

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101325

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:724d6566cd11c676f3bc082a9771784c825affb1

commit r12-7342-g724d6566cd11c676f3bc082a9771784c825affb1
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:40 2021 +

arm: Convert more MVE builtins to predicate qualifiers

This patch covers all builtins that have an HI operand and use the
<mode> iterator, thus we can replace HI with <MVE_VPRED>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc
(TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(TERNOP_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(TERNOP_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_NONE_NONE_UNONE_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS): ... this.
(QUADOP_NONE_NONE_NONE_NONE_PRED_QUALIFIERS): New.
(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_NONE_NONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
(STRS_P_QUALIFIERS): Use predicate qualifier.
(STRU_P_QUALIFIERS): Likewise.
(STRSU_P_QUALIFIERS): Likewise.
(STRSS_P_QUALIFIERS): Likewise.
(LDRGS_Z_QUALIFIERS): Likewise.
(LDRGU_Z_QUALIFIERS): Likewise.
(LDRS_Z_QUALIFIERS): Likewise.
(LDRU_Z_QUALIFIERS): Likewise.
(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to
...
(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(BINOP_NONE_NONE_PRED_QUALIFIERS): New.
(BINOP_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def: Use new predicated qualifiers.
* config/arm/mve.md: Use MVE_VPRED instead of HI.

[Bug target/100757] [12 Regression] arm: ICE (unrecognizable insn) with MVE VPSELQ_S since r12-834-ga6eacbf10

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100757

--- Comment #20 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:724d6566cd11c676f3bc082a9771784c825affb1

commit r12-7342-g724d6566cd11c676f3bc082a9771784c825affb1
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:40 2021 +

arm: Convert more MVE builtins to predicate qualifiers

This patch covers all builtins that have an HI operand and use the
<mode> iterator, thus we can replace HI with <MVE_VPRED>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc
(TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(TERNOP_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(TERNOP_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_NONE_NONE_UNONE_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS): ... this.
(QUADOP_NONE_NONE_NONE_NONE_PRED_QUALIFIERS): New.
(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_NONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_NONE_NONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
(QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
(STRS_P_QUALIFIERS): Use predicate qualifier.
(STRU_P_QUALIFIERS): Likewise.
(STRSU_P_QUALIFIERS): Likewise.
(STRSS_P_QUALIFIERS): Likewise.
(LDRGS_Z_QUALIFIERS): Likewise.
(LDRGU_Z_QUALIFIERS): Likewise.
(LDRS_Z_QUALIFIERS): Likewise.
(LDRU_Z_QUALIFIERS): Likewise.
(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to
...
(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
(BINOP_NONE_NONE_PRED_QUALIFIERS): New.
(BINOP_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def: Use new predicated qualifiers.
* config/arm/mve.md: Use MVE_VPRED instead of HI.

[Bug target/100757] [12 Regression] arm: ICE (unrecognizable insn) with MVE VPSELQ_S since r12-834-ga6eacbf10

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100757

--- Comment #21 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:6a7c13a0cf2290b60ab36f9ce1027b92838586bd

commit r12-7343-g6a7c13a0cf2290b60ab36f9ce1027b92838586bd
Author: Christophe Lyon 
Date:   Wed Oct 20 15:39:17 2021 +

arm: Convert more load/store MVE builtins to predicate qualifiers

This patch covers a few builtins where we do not use the <mode>
iterator and thus we cannot use <MVE_VPRED>.

For v2di instructions, we keep the HI mode for predicates.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

[Bug target/101325] [12 Regression] arm: Wrong code with MVE vcmpeqq intrinsic since r12-671-gd083fbf72

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101325

--- Comment #17 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:6a7c13a0cf2290b60ab36f9ce1027b92838586bd

commit r12-7343-g6a7c13a0cf2290b60ab36f9ce1027b92838586bd
Author: Christophe Lyon 
Date:   Wed Oct 20 15:39:17 2021 +

arm: Convert more load/store MVE builtins to predicate qualifiers

This patch covers a few builtins where we do not use the <mode>
iterator and thus we cannot use <MVE_VPRED>.

For v2di instructions, we keep the HI mode for predicates.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

[Bug target/101325] [12 Regression] arm: Wrong code with MVE vcmpeqq intrinsic since r12-671-gd083fbf72

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101325

--- Comment #18 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:c6b4ea7ab1aa6c5c07798fa6c6ad15dd1761b5ed

commit r12-7344-gc6b4ea7ab1aa6c5c07798fa6c6ad15dd1761b5ed
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:49 2021 +

arm: Convert more MVE/CDE builtins to predicate qualifiers

This patch covers a few non-load/store builtins where we do not use
the <mode> iterator and thus we cannot use <MVE_VPRED>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (CX_UNARY_UNONE_QUALIFIERS): Use
predicate.
(CX_BINARY_UNONE_QUALIFIERS): Likewise.
(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
* config/arm/mve.md: Use VxBI instead of HI.

[Bug target/100757] [12 Regression] arm: ICE (unrecognizable insn) with MVE VPSELQ_S since r12-834-ga6eacbf10

2022-02-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100757

--- Comment #22 from CVS Commits  ---
The master branch has been updated by Christophe Lyon :

https://gcc.gnu.org/g:c6b4ea7ab1aa6c5c07798fa6c6ad15dd1761b5ed

commit r12-7344-gc6b4ea7ab1aa6c5c07798fa6c6ad15dd1761b5ed
Author: Christophe Lyon 
Date:   Wed Oct 13 09:16:49 2021 +

arm: Convert more MVE/CDE builtins to predicate qualifiers

This patch covers a few non-load/store builtins where we do not use
the <mode> iterator and thus we cannot use <MVE_VPRED>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (CX_UNARY_UNONE_QUALIFIERS): Use
predicate.
(CX_BINARY_UNONE_QUALIFIERS): Likewise.
(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
* config/arm/mve.md: Use VxBI instead of HI.

[Bug c++/104641] Deduction guide for member template class rejected at class scope when used with typename dependant type

2022-02-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104641

--- Comment #2 from Patrick Palka  ---
One workaround is to use a helper member function (defined alongside the nested
class template) that performs the CTAD, something like:

#include <utility>

template<class T>
struct A {
  template<class U> struct B { B(U); };

  template<class... Ts>
  static auto make_B(Ts&&... args) {
    return B(std::forward<Ts>(args)...);
  }
};

template<class T>
void f() {
  auto x = A<T>::make_B(0);
}

template void f<int>();

https://godbolt.org/z/vqjff9nbs
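
For reference, a self-contained variant of the same workaround that can be compiled and linked on its own might look like the sketch below. It assumes C++17. The `value` member, the constructor body and the `demo` function are additions for illustration only and are not part of the snippet above.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template<class T>
struct A {
  template<class U> struct B {
    U value;                 // illustrative member, not in the original snippet
    B(U v) : value(v) {}
  };

  // CTAD is performed here, inside A, using B's implicit deduction guides.
  template<class... Ts>
  static auto make_B(Ts&&... args) {
    return B(std::forward<Ts>(args)...);
  }
};

// The helper deduces U from the argument, independently of A's own T.
static_assert(std::is_same_v<decltype(A<char>::make_B(0)), A<char>::B<int>>);

// Illustrative use: callers never have to spell out B's template argument.
inline int demo() {
  auto b = A<char>::make_B(42);
  assert(b.value == 42);
  return b.value;
}
```

The static_assert checks that the helper yields A<char>::B<int> for an int argument, so the deduction that would otherwise be written at the call site is done entirely inside the class.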
