[Bug d/108763] va_arg usage in D doesn't compile

2023-02-12 Thread ibuclaw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108763

ibuclaw at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ibuclaw at gcc dot gnu.org

--- Comment #5 from ibuclaw at gcc dot gnu.org ---
I abandoned the idea of supporting RTTI-based variadics years ago. Even the
current reference implementation only supports a subset of the x86_64 ABI in
its current incarnation as far as I recall.

I had considered whether libffi might allow us to do this, but I could not
find anything that would let me say "retrieve the next variadic argument of
size SIZE and mode MODE", even though, as I understand it, there is limited
support for constructing a variadic call to a C function.
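For readers unfamiliar with the limitation, a minimal C sketch of the problem: va_arg needs the argument type at compile time, so a run-time-typed variadic has to dispatch by hand over a fixed set of types. The `arg_kind` tags and `sum_tagged` are hypothetical names made up for this illustration; they are not part of GDC, druntime, or libffi.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical run-time type tags standing in for RTTI. */
enum arg_kind { ARG_INT, ARG_LONG, ARG_DOUBLE };

/* Sums `count` (tag, value) pairs.  Every va_arg call names a type that
   is fixed at compile time; there is no portable way to request "the
   next argument of size SIZE and mode MODE" chosen at run time. */
double sum_tagged(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* Enumeration constants have type int, so read the tag as int. */
        enum arg_kind kind = (enum arg_kind) va_arg(ap, int);
        switch (kind) {
        case ARG_INT:    total += va_arg(ap, int);           break;
        case ARG_LONG:   total += (double) va_arg(ap, long); break;
        case ARG_DOUBLE: total += va_arg(ap, double);        break;
        }
    }
    va_end(ap);
    return total;
}
```

A call such as `sum_tagged(3, ARG_INT, 2, ARG_LONG, 40L, ARG_DOUBLE, 0.5)` works only because every supported type has its own hard-coded `va_arg` branch.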

[Bug d/108763] va_arg usage in D doesn't compile

2023-02-12 Thread ibuclaw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108763

--- Comment #6 from ibuclaw at gcc dot gnu.org ---
I'll add it as a note to the deviations page.

https://gcc.gnu.org/onlinedocs/gdc/Missing-Features.html#Missing-Features

I'd actually forgotten about this.

[Bug target/108764] [RISCV] Cost model for RVB is too aggressive

2023-02-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

--- Comment #2 from Andrew Pinski  ---
slli    a4,a2,3
sh3add  a5,a2,a0

vs
slli    a2,a2,3
add     a5,a0,a2

I think the first one is really better because you have two independent
instructions that can be issued at the same time.
Really this is all core specific and the generic tuning should be "generic"
which means this is the correct tuning ...
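To make the dependency argument concrete, here is a minimal C sketch that models the two sequences. It assumes the Zba definition of sh3add (rd = rs2 + (rs1 << 3)); the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Zba sh3add rd, rs1, rs2: rd = rs2 + (rs1 << 3). */
static uint64_t sh3add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 3);
}

/* The two values each sequence leaves behind. */
struct regs { uint64_t shifted, addr; };

/* First sequence: both results are computed from the original a2, so
   neither instruction depends on the other. */
struct regs with_sh3add(uint64_t a0, uint64_t a2)
{
    struct regs r;
    r.shifted = a2 << 3;        /* slli   a4,a2,3  */
    r.addr = sh3add(a2, a0);    /* sh3add a5,a2,a0 */
    return r;
}

/* Second sequence: the add has to wait for the result of the shift. */
struct regs with_add(uint64_t a0, uint64_t a2)
{
    struct regs r;
    r.shifted = a2 << 3;        /* slli a2,a2,3 */
    r.addr = a0 + r.shifted;    /* add  a5,a0,a2 */
    return r;
}

/* Both sequences must produce the same pair of values. */
void check_sequences(uint64_t a0, uint64_t a2)
{
    struct regs x = with_sh3add(a0, a2);
    struct regs y = with_add(a0, a2);
    assert(x.shifted == y.shifted);
    assert(x.addr == y.addr);
}
```

The results are identical; only the dependency chain differs, which is why the better choice depends on the core.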

[Bug target/100927] [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization

2023-02-12 Thread michael.crusoe at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927

Michael Crusoe  changed:

   What|Removed |Added

 CC||michael.crusoe at gmail dot com

--- Comment #1 from Michael Crusoe  ---
2023 update: this is still happening in GCC 10.1+ including trunk

https://godbolt.org/z/YKKcdP8MY

[Bug target/100927] [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization

2023-02-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927

--- Comment #2 from Andrew Pinski  ---
Are you sure _mm_cvttpd_epi32 is documented that way? I suspect it is just
unspecified behavior.

[Bug target/108764] [RISCV] Cost model for RVB is too aggressive

2023-02-12 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #3 from Kito Cheng  ---
> I think one solution is to change the cost model of such complex instructions
> to the sum of the cost for each part. E.g.
> cost for shNadd = COSTS_N_INSNS (SINGLE_SHIFT_COST) + COSTS_N_INSNS (1) # cost of addition

As far as I know, some RISC-V core implementations do execute shNadd in one
cycle, but I know that is not true for every implementation.

Anyway, it's really uarch dependent, so I would prefer to keep it as is for
now, and then extend the cost model function to handle different uarchs
(-mtune) more easily once GCC 14 opens.
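A rough sketch of how the two cost choices compare numerically. COSTS_N_INSNS is written out here as N * 4 to mirror GCC's definition, SINGLE_SHIFT_COST is assumed to be 1, and the current model is assumed to charge shNadd as a single instruction; the helper functions are hypothetical and are not the backend's actual code.

```c
#include <assert.h>

/* Assumed to mirror GCC's definition of COSTS_N_INSNS. */
#define COSTS_N_INSNS(N) ((N) * 4)
/* Assumed value for the RISC-V backend's shift cost. */
#define SINGLE_SHIFT_COST 1

/* Hypothetical: shNadd charged as one instruction (assumed current model). */
int shnadd_cost_single(void)
{
    return COSTS_N_INSNS(1);
}

/* Hypothetical: shNadd charged as shift plus add (the quoted proposal). */
int shnadd_cost_sum(void)
{
    return COSTS_N_INSNS(SINGLE_SHIFT_COST) + COSTS_N_INSNS(1);
}

/* Cost of a separate slli followed by add, for comparison. */
int slli_plus_add_cost(void)
{
    return COSTS_N_INSNS(SINGLE_SHIFT_COST) + COSTS_N_INSNS(1);
}

/* Under the proposal shNadd is no cheaper than slli + add; under the
   single-instruction model it is strictly cheaper. */
void compare_models(void)
{
    assert(shnadd_cost_single() < slli_plus_add_cost());
    assert(shnadd_cost_sum() == slli_plus_add_cost());
}
```

With these assumed numbers the proposal removes any preference for shNadd over the two-instruction form, which is why a per-uarch (-mtune) cost table looks like the more flexible route.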

[Bug target/108764] [RISCV] Cost model for RVB is too aggressive

2023-02-12 Thread sinan.lin at linux dot alibaba.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

--- Comment #4 from Sinan  ---
(In reply to Andrew Pinski from comment #2)
> slli    a4,a2,3
> sh3add  a5,a2,a0
> 
> vs
> slli    a2,a2,3
> add     a5,a0,a2
> 
> I think the first one is really better because you have two independent
> instructions that can be issued at the same time.
> Really this is all core specific and the generic tuning should be "generic"
> which means this is the correct tuning ...

Thanks for pointing it out. This might not be a good test case (I only noticed
the extra `mv` introduced by zba). I just did a quick check with spec2017, and
it seems that the current cost model does indeed do a better job with respect
to the dependency between slli and add.

[Bug target/108764] [RISCV] Cost model for RVB is too aggressive

2023-02-12 Thread sinan.lin at linux dot alibaba.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

--- Comment #5 from Sinan  ---
(In reply to Kito Cheng from comment #3)
> > I think one solution is to change the cost model of such complex
> > instructions to the sum of the cost for each part. E.g.
> > cost for shNadd = COSTS_N_INSNS (SINGLE_SHIFT_COST) + COSTS_N_INSNS (1) # cost of addition
> 
> As far as I know, some RISC-V core implementations do execute shNadd in one
> cycle, but I know that is not true for every implementation.

Thanks for the info. Interestingly, the shNadd-like instructions (add reg1,
reg2, reg3, lsl #N) on AArch64/neoverse-n1 are also one-cycle operations
(https://developer.arm.com/documentation/pjdoc466751330-9707/latest),
but their cost model differs from the one in the riscv backend (AArch64
doesn't generate add r1, r2, r3, lsl #3 for the given test case).

> Anyway, it's really uarch dependent, so I would prefer to keep it as is for
> now, and then extend the cost model function to handle different uarchs
> (-mtune) more easily once GCC 14 opens.

Agree.

[Bug target/100927] [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization

2023-02-12 Thread michael.crusoe at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927

--- Comment #3 from Michael Crusoe  ---
Good question, let's check the reference.

Summary: it is specified behavior that _mm_cvttpd_epi32 returns Integer
Indefinite (80000000H) for NaN inputs.

All references below are from the December 2022 edition (Order Number:
325462-078US) of "Intel® 64 and IA-32 Architectures Software Developer’s Manual
Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" from
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

The formal signature of the _mm_cvttpd_epi32 intrinsic is in Table C-1 "Simple
Intrinsics" on page 2987, reminding us that the mnemonic is CVTTPD2DQ.

The formal definition of CVTTPD2DQ is given in section 5.6.1.6 "Intel® SSE2
Conversion Instructions" on page 133

> Convert with truncation packed double precision floating-point values to
> packed doubleword integers.

On page 106 we learn more about what truncation means in the definition of
CVTTPD2DQ

> 4.8.4.2 Truncation with Intel® SSE, SSE2, and AVX Conversion Instructions
> The following Intel SSE/SSE2 instructions automatically truncate the results
> of conversions from floating-point values to integers when the result is
> inexact: CVTTPD2DQ, CVTTPS2DQ, CVTTPD2PI, CVTTPS2PI, CVTTSD2SI, and
> CVTTSS2SI. Here, truncation means the round toward zero mode described in
> Table 4-8. There are also several Intel AVX2 and AVX-512 instructions which
> use truncation (VCVTT*)

Table 4-8 from section 4.8.4 states

> Rounding Mode: Round toward zero (Truncate)
> Description: Rounded result is closest to but no greater in absolute value
> than the infinitely precise result.

Section 11.4.1.6 ("SSE2 Conversion Instructions") states that

> The CVTTPD2DQ (convert with truncation packed double precision floating-point
> values to packed doubleword integers) instruction is similar to the CVTPD2DQ
> instruction except that truncation is used to round a source value to an
> integer value.

Table 11-1. "Masked Responses of SSE/SSE2/SSE3 Instructions to Invalid
Arithmetic Operations" states that

> Condition: Conversion to integer when the value in the source register is a
> NaN, ∞, or exceeds the representable range for CVTPS2PI, CVTTPS2PI, CVTSS2SI,
> CVTTSS2SI, CVTPD2PI, CVTSD2SI, CVTPD2DQ, CVTTPD2PI, CVTTSD2SI, CVTTPD2DQ,
> CVTPS2DQ, or CVTTPS2DQ

> Masked Response: Return the integer Indefinite

This is stated more explicitly in section D.4.2.2 "Results of Operations with
NaN Operands or a NaN Result for SSE/SSE2/SSE3 Numeric Instructions" where Table
D-8 (page 455) ("CVTPS2PI, CVTSS2SI, CVTTPS2PI, CVTTSS2SI, CVTPD2PI, CVTSD2SI,
CVTTPD2PI, CVTTSD2SI, CVTPS2DQ, CVTTPS2DQ, CVTPD2DQ, CVTTPD2DQ") states that
the masked result from any type of NaN (SNaN or QNaN) will be the Integer
Indefinite (80000000H for 32-bit values).
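As a portable illustration of the documented behavior, here is a scalar C model of a single CVTTPD2DQ lane with the invalid exception masked. It is a sketch based on the passages quoted above, not code from GCC or Intel, and `cvtt_f64_to_i32` is a made-up name.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* 32-bit integer indefinite value, 80000000H. */
#define INT32_INDEFINITE INT32_MIN

/* Scalar model of one CVTTPD2DQ lane (invalid exception masked). */
int32_t cvtt_f64_to_i32(double x)
{
    /* NaN, infinities, and values whose truncation does not fit in
       32 bits all produce the integer indefinite value. */
    if (isnan(x) || x >= 2147483648.0 || x <= -2147483649.0)
        return INT32_INDEFINITE;

    /* In range: a C conversion truncates toward zero, matching the
       "round toward zero" mode of the truncating instructions. */
    return (int32_t) x;
}

/* A few spot checks of the model. */
void cvtt_self_check(void)
{
    assert(cvtt_f64_to_i32(NAN) == INT32_INDEFINITE);
    assert(cvtt_f64_to_i32(INFINITY) == INT32_INDEFINITE);
    assert(cvtt_f64_to_i32(-1.9) == -1);
}
```

The point for this bug is the first check: a NaN input is expected to give 80000000H, not 0.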

[Bug objc/108743] -fconstant-cfstrings not supported

2023-02-12 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108743

--- Comment #8 from Iain Sandoe  ---
(In reply to Andrew Pinski from comment #7)
> Hmm,
> https://inbox.sourceware.org/gcc-patches/B4F496F4-F31D-41D2-8942-
> 1f0aefbd7...@sandoe-acoustics.co.uk/
> 
> Seems didn't get installed even though it was approved ...

These things happen. I guess we can make it a darwin-specific driver option (as
Jakub says, the 'm' version is technically correct, but we have to accommodate
compatibility sometimes).

There is at least one other platform that I think is using the NeXT library
(it is open sourced), so maybe it is an appropriate option for that platform
too.

[Bug c++/108761] Add option to produce a unique section for non-COMDAT __attribute__((section("foo"))) object

2023-02-12 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108761

--- Comment #3 from Fangrui Song  ---
New syntax setting the flags will be useful. Also, currently there is no way to
customize the section type.

[Bug c/105660] [12/13 Regression] ICE in warn_parm_array_mismatch when merging two function decls and VLA arguments since r12-1218-gc6503fa93b5565c9

2023-02-12 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105660

--- Comment #11 from Martin Uecker  ---
PATCH: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611817.html

[Bug c/108423] [12/13 Regression] ICE in make_ssa_name_fn with VLA types in arguments and inlining since r12-5338-g4e6bf0b9dd5585df

2023-02-12 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108423

--- Comment #8 from Martin Uecker  ---
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611562.html

[Bug middle-end/108765] New: ICE with non-local goto

2023-02-12 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

            Bug ID: 108765
           Summary: ICE with non-local goto
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: muecker at gwdg dot de
  Target Milestone: ---

ICE. This is from simplifying PR107840, but seems to be a separate issue and a
new regression.

https://godbolt.org/z/xbT7rrqdW


int main()
{
    void foo(void)
    {
        __label__ trgt;

        void jmp(void)
        {
            goto trgt;
        }
    trgt:   ;
    }

    foo();
}

[Bug middle-end/108765] ICE with non-local goto

2023-02-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail||10.1.0, 6.1.0
Version|13.0|unknown
   Keywords||ice-checking,
   ||ice-on-valid-code

--- Comment #1 from Andrew Pinski  ---
>a new regression.

I really doubt it is a new regression; the ICE only shows up with checking
turned on.

[Bug middle-end/108765] ICE with non-local goto

2023-02-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Andrew Pinski  ---
It is just a reduced testcase.

*** This bug has been marked as a duplicate of bug 107840 ***

[Bug middle-end/107840] ICE when compiling cursed setjmp/longjmp nested function calls and non-local jumps

2023-02-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107840

--- Comment #5 from Andrew Pinski  ---
*** Bug 108765 has been marked as a duplicate of this bug. ***

[Bug middle-end/107840] ICE when compiling cursed setjmp/longjmp nested function calls and non-local jumps

2023-02-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107840

--- Comment #6 from Andrew Pinski  ---
Reduced testcase:
```
int main()
{
  void foo(void)
  {
    __label__ trgt;
    void jmp(void)  {  goto trgt; }
    trgt: ;
  }
  foo();
}
```

[Bug middle-end/108765] ICE with non-local goto

2023-02-12 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

--- Comment #3 from Martin Uecker  ---
I see. Thanks. Is the checking new? Or is it just because it is not a release build?

[Bug middle-end/108765] ICE with non-local goto

2023-02-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

--- Comment #4 from Andrew Pinski  ---
>seems to be a separate issue and a new regression.

It is not; it is just a reduced testcase, and a very similar ICE happens with
GCC 6 and above with -fchecking.

[Bug target/108766] New: unaligned byteswapped 16bit load is just bad

2023-02-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108766

            Bug ID: 108766
           Summary: unaligned byteswapped 16bit load is just bad
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv

Take:
```
short f(unsigned char *a)
{
  return a[0] << 8 | a[1];
}
```
`-O2 -march=rv64iadc_zba_zbb_zbc_zbs_zicsr` produces:
```
lbu     a5,1(a0)
lbu     a4,0(a0)
slli    a0,a5,8
or      a0,a0,a4
slliw   a5,a0,8
srli    a0,a0,8
or      a0,a0,a5
sext.h  a0,a0
ret
```

That is just horrible.
It should just be:
```
lbu     a5,1(a0)
lbu     a4,0(a0)
slli    a0,a4,8
or      a0,a0,a5
sext.h  a0,a0
ret
```

It is trying to do an unaligned short load and then a byteswap.
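The two code shapes can be written out in C to show they compute the same value. This is only an illustrative sketch: `load_be16` is the function from the report under a different name, and `load_le16_then_bswap` is a hand-written model of the longer sequence, not compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* The function from the report: assemble the halfword directly in
   big-endian order. */
short load_be16(const unsigned char *a)
{
    return a[0] << 8 | a[1];
}

/* Model of the longer sequence: assemble a little-endian halfword
   first, then swap its two bytes. */
short load_le16_then_bswap(const unsigned char *a)
{
    uint16_t le = (uint16_t) (a[0] | a[1] << 8);
    uint16_t swapped = (uint16_t) (le << 8 | le >> 8);
    /* Values above SHRT_MAX convert in an implementation-defined way;
       both functions convert the same value, so they still agree. */
    return (short) swapped;
}

/* Checks that both forms agree for one pair of bytes. */
void check_loads(unsigned char first, unsigned char second)
{
    const unsigned char bytes[2] = { first, second };
    assert(load_be16(bytes) == load_le16_then_bswap(bytes));
}
```

Since the direct form needs only one shift and one OR, the extra shift pair in the generated code is pure overhead.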

[Bug tree-optimization/108752] word_mode vectorization is pessimized by hard limit on nunits

2023-02-12 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108752

Hans-Peter Nilsson  changed:

   What|Removed |Added

 CC||hp at gcc dot gnu.org

--- Comment #3 from Hans-Peter Nilsson  ---
(In reply to Richard Biener from comment #0)
> emulated vectors (aka word_mode vectorization).

Ackchyually, more commonly known as SWAR: https://en.wikipedia.org/wiki/SWAR.
(IWBN if options and identifiers were keyed off that acronym.)
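For anyone who has not met the term, a minimal SWAR sketch in C: four 8-bit lanes packed into one 32-bit word are added with ordinary integer instructions, with carries kept from crossing lane boundaries. The function names are made up for this example and are unrelated to GCC's vectorizer code.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise wrapping add of four 8-bit values packed in a 32-bit word. */
uint32_t swar_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t high = 0x80808080u;   /* top bit of every lane */

    /* Add the low 7 bits of each lane; the sum of two 7-bit values fits
       in 8 bits, so no carry can reach the neighbouring lane. */
    uint32_t low_sum = (a & ~high) + (b & ~high);

    /* Fold the top bits back in with XOR, i.e. add without carry-out. */
    return low_sum ^ ((a ^ b) & high);
}

/* Reference implementation: the same operation one lane at a time. */
uint32_t scalar_add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t lane = ((a >> (8 * i)) + (b >> (8 * i))) & 0xffu;
        r |= lane << (8 * i);
    }
    return r;
}

/* The SWAR form must match the lane-by-lane reference. */
void check_swar(uint32_t a, uint32_t b)
{
    assert(swar_add_u8x4(a, b) == scalar_add_u8x4(a, b));
}
```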

[Bug target/108272] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:339

2023-02-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108272

--- Comment #7 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:e5a63c986978699a25f4bfb9b58a0111951e7d43

commit r12-9168-ge5a63c986978699a25f4bfb9b58a0111951e7d43
Author: Kewen Lin 
Date:   Mon Jan 16 02:15:39 2023 -0600

rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]

As PR108272 shows, there are some invalid uses of MMA opaque
types in inline asm statements.  This patch is to teach the
function rs6000_opaque_type_invalid_use_p for inline asm,
check and error any invalid use of MMA opaque types in input
and output operands.

PR target/108272

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses in inline asm, factor out the checking and
erroring to lambda function check_and_error_invalid_use.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108272-1.c: New test.
* gcc.target/powerpc/pr108272-2.c: New test.
* gcc.target/powerpc/pr108272-3.c: New test.
* gcc.target/powerpc/pr108272-4.c: New test.

(cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)

[Bug target/108348] ICE in gen_movoo, at config/rs6000/mma.md:292

2023-02-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108348

--- Comment #8 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:3c7bb6c0b0003f4e1fb52f814ad1a9a7f09573c6

commit r12-9169-g3c7bb6c0b0003f4e1fb52f814ad1a9a7f09573c6
Author: Kewen Lin 
Date:   Wed Jan 18 02:34:19 2023 -0600

rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]

PR108348 shows one special case that MMA opaque types are
used in function arguments and treated as pass by reference,
it results in one copying from argument to a temp variable,
since this copying happens before rs6000_function_arg check,
it can cause ICE without MMA support then.  This patch is to
teach function rs6000_opaque_type_invalid_use_p to check if
any function argument in a gcall stmt has the invalid use of
MMA opaque types.

btw, I checked the handling on return value, it doesn't have
this kind of issue as its checking and error emission is quite
early, so this doesn't handle function return value.

PR target/108348

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses of MMA opaque type in function arguments.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108348-1.c: New test.
* gcc.target/powerpc/pr108348-2.c: New test.

(cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)

[Bug target/108396] [12/13 Regression] PPCLE: vec_vsubcuq missing since r12-5752-gd08236359eb229

2023-02-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108396

--- Comment #7 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:cb6861acc4074fd2c30a96b52d68c2cd33b9e94d

commit r12-9170-gcb6861acc4074fd2c30a96b52d68c2cd33b9e94d
Author: Kewen Lin 
Date:   Wed Jan 18 02:34:25 2023 -0600

rs6000: Fix typo on vec_vsubcuq in rs6000-overload.def [PR108396]

As Andrew pointed out in PR108396, there is one typo in
rs6000-overload.def on built-in function vec_vsubcuq:

  [VEC_VSUBCUQ, vec_vsubcuqP, __builtin_vec_vsubcuq]

"vec_vsubcuqP" should be "vec_vsubcuq", this typo caused
us to define vec_vsubcuqP in rs6000-vecdefines.h instead
of vec_vsubcuq, so that compiler is not able to realize
the built-in function name vec_vsubcuq any more.

Co-authored-By: Andrew Pinski 

PR target/108396

gcc/ChangeLog:

* config/rs6000/rs6000-overload.def (VEC_VSUBCUQ): Fix typo
vec_vsubcuqP with vec_vsubcuq.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108396.c: New test.

(cherry picked from commit aaf29ae6cdbaad58b709a77784375d15138174b3)

[Bug target/108272] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:339

2023-02-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108272

--- Comment #8 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:79a81d526babb6ffb6d85b4a05b29269470ab49d

commit r11-10521-g79a81d526babb6ffb6d85b4a05b29269470ab49d
Author: Kewen Lin 
Date:   Mon Jan 16 02:15:39 2023 -0600

rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]

As PR108272 shows, there are some invalid uses of MMA opaque
types in inline asm statements.  This patch is to teach the
function rs6000_opaque_type_invalid_use_p for inline asm,
check and error any invalid use of MMA opaque types in input
and output operands.

PR target/108272

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses in inline asm, factor out the checking and
erroring to lambda function check_and_error_invalid_use.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108272-1.c: New test.
* gcc.target/powerpc/pr108272-2.c: New test.
* gcc.target/powerpc/pr108272-3.c: New test.
* gcc.target/powerpc/pr108272-4.c: New test.

(cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)

[Bug target/108348] ICE in gen_movoo, at config/rs6000/mma.md:292

2023-02-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108348

--- Comment #9 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:0e41d8a77887b838de5493c491f411274376227a

commit r11-10522-g0e41d8a77887b838de5493c491f411274376227a
Author: Kewen Lin 
Date:   Wed Jan 18 02:34:19 2023 -0600

rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]

PR108348 shows one special case that MMA opaque types are
used in function arguments and treated as pass by reference,
it results in one copying from argument to a temp variable,
since this copying happens before rs6000_function_arg check,
it can cause ICE without MMA support then.  This patch is to
teach function rs6000_opaque_type_invalid_use_p to check if
any function argument in a gcall stmt has the invalid use of
MMA opaque types.

btw, I checked the handling on return value, it doesn't have
this kind of issue as its checking and error emission is quite
early, so this doesn't handle function return value.

PR target/108348

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses of MMA opaque type in function arguments.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108348-1.c: New test.
* gcc.target/powerpc/pr108348-2.c: New test.

(cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)

[Bug target/108272] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:339

2023-02-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108272

--- Comment #9 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:ec4d91aa885297c3b5bb4bbfb3133ffe2e5e6a2f

commit r10-11211-gec4d91aa885297c3b5bb4bbfb3133ffe2e5e6a2f
Author: Kewen Lin 
Date:   Sun Feb 12 09:35:27 2023 -0600

rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]

As PR108272 shows, there are some invalid uses of MMA opaque
types in inline asm statements.  This patch is to teach the
function rs6000_opaque_type_invalid_use_p for inline asm,
check and error any invalid use of MMA opaque types in input
and output operands.

PR target/108272

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses in inline asm, factor out the checking and
erroring to lambda function check_and_error_invalid_use.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108272-1.c: New test.
* gcc.target/powerpc/pr108272-2.c: New test.
* gcc.target/powerpc/pr108272-3.c: New test.
* gcc.target/powerpc/pr108272-4.c: New test.

(cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)

[Bug target/108348] ICE in gen_movoo, at config/rs6000/mma.md:292

2023-02-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108348

--- Comment #10 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:7bbed35a05d735387d406afbf866384feaac21e7

commit r10-11212-g7bbed35a05d735387d406afbf866384feaac21e7
Author: Kewen Lin 
Date:   Wed Jan 18 02:34:19 2023 -0600

rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]

PR108348 shows one special case that MMA opaque types are
used in function arguments and treated as pass by reference,
it results in one copying from argument to a temp variable,
since this copying happens before rs6000_function_arg check,
it can cause ICE without MMA support then.  This patch is to
teach function rs6000_opaque_type_invalid_use_p to check if
any function argument in a gcall stmt has the invalid use of
MMA opaque types.

btw, I checked the handling on return value, it doesn't have
this kind of issue as its checking and error emission is quite
early, so this doesn't handle function return value.

PR target/108348

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses of MMA opaque type in function arguments.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108348-1.c: New test.
* gcc.target/powerpc/pr108348-2.c: New test.

(cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)

[Bug target/108396] [12/13 Regression] PPCLE: vec_vsubcuq missing since r12-5752-gd08236359eb229

2023-02-12 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108396

Kewen Lin  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #8 from Kewen Lin  ---
Fixed on trunk.

[Bug target/108348] ICE in gen_movoo, at config/rs6000/mma.md:292

2023-02-12 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108348

Kewen Lin  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #11 from Kewen Lin  ---
Fixed on trunk and backported to related branches.

[Bug target/108272] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:339

2023-02-12 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108272

Kewen Lin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Kewen Lin  ---
Fixed on trunk and backported to related branches.

[Bug analyzer/108767] New: O2 optimization has side effects on static analysis.

2023-02-12 Thread geoffreydgr at icloud dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108767

            Bug ID: 108767
           Summary: O2 optimization has side effects on static analysis.
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: analyzer
          Assignee: dmalcolm at gcc dot gnu.org
          Reporter: geoffreydgr at icloud dot com
  Target Milestone: ---

Hi, David. The following case shows that the `-O2` optimization affects static
analysis: the GCC static analyzer falsely gives an NPD (null pointer
dereference) warning under `-O2`.

Input:
```c
#include "stdio.h"
extern void __analyzer_describe ();
extern void __analyzer_eval ();
extern void __analyzer_dump ();

int main()
{
    int b = 1;
    int e = 2;
    int f = 3;
    int *g[] = {&e, &e};
    int *h = &b;
    int *j = &f;

    for (int d = 0; d <= 1; d++)
    {
        *j = (*h && (h = g[d]));
        // __analyzer_dump ();
        __analyzer_eval(h==0);
        // __analyzer_describe(0,h);
    }
    printf("NPD_FLAG %d\n", *j);

}
```

options: -O2 -fanalyzer
Output:
```
: In function 'main':
:19:9: warning: FALSE
   19 | __analyzer_eval(h==0);
  | ^
:19:9: warning: UNKNOWN
:19:9: warning: TRUE
:19:9: warning: TRUE
:19:9: warning: UNKNOWN
:19:9: warning: TRUE
:19:9: warning: TRUE
:19:9: warning: UNKNOWN
:17:15: warning: dereference of NULL 'h' [CWE-476] [-Wanalyzer-null-dereference]
   17 | *j = (*h && (h = g[d]));
  |   ^~
  'main': events 1-9
|
|   15 | for (int d = 0; d <= 1; d++)
|  | ~~^~~~
|  |   |
|  |   (1) following 'true' branch (when 'd != 2')...
|  |   (5) following 'true' branch (when 'd != 2')...
|  |   (7) following 'true' branch (when 'd != 2')...
|   16 | {
|   17 | *j = (*h && (h = g[d]));
|  | ~~~
|  | |  |  |  |
|  | |  |  |  (3) following 'true' branch...
|  | |  |  (9) dereference of NULL 'h'
|  | |  (4) ...to here
|  | (2) ...to here
|  | (6) ...to here
|  | (8) ...to here
|
Compiler returned: 0
```

options : -O1 -fanalyzer
Output:
```
: In function 'main':
:19:9: warning: FALSE
   19 | __analyzer_eval(h==0);
  | ^
:19:9: warning: UNKNOWN
Compiler returned: 0
```
-O2: https://godbolt.org/z/GeTaeGMaf
-O1: https://godbolt.org/z/adnY8aa3K
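A small run-time check of the reported loop, to show why the warning looks like a false positive. This is a sketch that replaces the analyzer hooks with an explicit counter; `count_null_h` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Runs the loop from the report and returns how many times h was NULL
   after the assignment. */
int count_null_h(void)
{
    int b = 1, e = 2, f = 3;
    int *g[] = {&e, &e};
    int *h = &b;
    int *j = &f;
    int nulls = 0;

    for (int d = 0; d <= 1; d++) {
        /* *h is nonzero on both iterations (b == 1, then e == 2), so the
           right-hand side runs and h becomes g[d] == &e, never NULL. */
        *j = (*h && (h = g[d]));
        if (h == NULL)
            nulls++;
    }

    /* The && expression evaluated to 1 each time, so f ends up as 1. */
    assert(*j == 1);
    return nulls;
}
```

At run time the function returns 0, so the NULL state reported for `h` under `-O2` does not correspond to any real execution path.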

[Bug c/108768] New: bogus -Warray-bounds warnings

2023-02-12 Thread mi+gcc at aldan dot algebra.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108768

            Bug ID: 108768
           Summary: bogus -Warray-bounds warnings
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mi+gcc at aldan dot algebra.com
  Target Milestone: ---

Created attachment 54453
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54453&action=edit
Test case (otpcode.c after passing preprocessor)

Compiling the attached file (part of TCL-TRF package) with:

gcc12 -O2 -Wall -Werror -c otpcode.i -o otpcode.o

You'll get:
In function 'extract',
inlined from 'FlushDecoder' at otpcode.i:6170:21:
otpcode.i:6262:9: error: array subscript 9 is outside array bounds of 'char[9]' [-Werror=array-bounds]
 6262 |   cc = s[start/8 +1];
  |~^~~~
otpcode.i: In function 'FlushDecoder':
otpcode.i:6131:8: note: at offset 9 into object 'b' of size 9
 6131 |   char b[9];
  |^
In function 'extract',
inlined from 'FlushDecoder' at otpcode.i:6170:21:
otpcode.i:6263:9: error: array subscript 10 is outside array bounds of 'char[9]' [-Werror=array-bounds]
 6263 |   cr = s[start/8 +2];
  |~^~~~
otpcode.i: In function 'FlushDecoder':
otpcode.i:6131:8: note: at offset 10 into object 'b' of size 9
 6131 |   char b[9];
  |^


Note how the "start/8 + 1" is being misread as "9"...

[Bug target/100927] [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization

2023-02-12 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #4 from Hongtao.liu  ---
The intrinsic is expanded to RTL FIX, which is then constant-folded to 0 for NaNs.

      /* Although the overflow semantics of RTL's FIX and UNSIGNED_FIX
         operators are intentionally left unspecified (to ease implementation
         by target backends), for consistency, this routine implements the
         same semantics for constant folding as used by the middle-end.  */

      /* This was formerly used only for non-IEEE float.
         egg...@twinsun.com says it is safe for IEEE also.  */
      REAL_VALUE_TYPE t;
      const REAL_VALUE_TYPE *x = CONST_DOUBLE_REAL_VALUE (op);
      wide_int wmax, wmin;
      /* This is part of the abi to real_to_integer, but we check
         things before making this call.  */
      bool fail;

      switch (code)
        {
        case FIX:
          if (REAL_VALUE_ISNAN (*x))
            return const0_rtx;

According to IEEE 754-2019, when a NaN or infinite operand cannot be represented
in the destination format and this cannot otherwise be indicated, the invalid
operation exception shall be signaled.
The comments also say "for consistency, this routine implements the same
semantics for constant folding as used by the middle-end." and "This was
formerly used only for non-IEEE float."

Maybe we should prevent this.
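To spell out the mismatch, a small C model of the two paths for a NaN operand. Only the NaN case is taken from the snippet and the manual; `fold_fix`, `runtime_cvtt`, and `folding_changes_result` are hypothetical names, and all other inputs are assumed to be in range.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Model of the folding rule in the snippet: a NaN constant becomes 0.
   Non-NaN inputs are assumed to fit in 32 bits. */
int32_t fold_fix(double x)
{
    if (isnan(x))
        return 0;
    assert(x > -2147483649.0 && x < 2147483648.0);
    return (int32_t) x;
}

/* Model of the masked run-time response: NaN gives integer indefinite. */
int32_t runtime_cvtt(double x)
{
    if (isnan(x))
        return INT32_MIN;   /* 80000000H */
    assert(x > -2147483649.0 && x < 2147483648.0);
    return (int32_t) x;
}

/* Nonzero when constant folding and run-time evaluation disagree. */
int folding_changes_result(double x)
{
    return fold_fix(x) != runtime_cvtt(x);
}
```

In this model the two paths agree for ordinary values and differ only for NaN, which matches the report that the wrong result appears with NaN constants plus optimization.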

[Bug tree-optimization/106722] bogus uninit warning in tree-vect-loop-manip.cc

2023-02-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106722

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:338739645b8e5bf34636d8d4829d7650001ad08c

commit r13-5958-g338739645b8e5bf34636d8d4829d7650001ad08c
Author: Richard Biener 
Date:   Fri Feb 10 10:28:29 2023 +0100

tree-optimization/106722 - fix CD-DCE edge marking

The following fixes a latent issue when we mark control edges but
end up with marking a block with no stmts necessary.  In this case
we fail to mark dependent control edges of that block.

PR tree-optimization/106722
* tree-ssa-dce.cc (mark_last_stmt_necessary): Return
whether we marked a stmt.
(mark_control_dependent_edges_necessary): When
mark_last_stmt_necessary didn't mark any stmt make sure
to mark its control dependent edges.
(propagate_necessity): Likewise.

* gcc.dg/torture/pr108737.c: New testcase.

[Bug tree-optimization/108737] [13 Regression] Apparent miscompile of infinite loop on gcc trunk in cddce2 pass since r13-3875

2023-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108737

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Richard Biener ---
I botched the changelog.  Fixed with the following, but latent everywhere.

commit 338739645b8e5bf34636d8d4829d7650001ad08c (origin/master, origin/HEAD)
Author: Richard Biener 
Date:   Fri Feb 10 10:28:29 2023 +0100

tree-optimization/106722 - fix CD-DCE edge marking

The following fixes a latent issue when we mark control edges but
end up with marking a block with no stmts necessary.  In this case
we fail to mark dependent control edges of that block.

PR tree-optimization/106722
* tree-ssa-dce.cc (mark_last_stmt_necessary): Return
whether we marked a stmt.
(mark_control_dependent_edges_necessary): When
mark_last_stmt_necessary didn't mark any stmt make sure
to mark its control dependent edges.
(propagate_necessity): Likewise.

* gcc.dg/torture/pr108737.c: New testcase.

[Bug tree-optimization/108500] [11/12 Regression] -O -finline-small-functions results in "internal compiler error: Segmentation fault" on a very large program (700k function calls)

2023-02-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500

--- Comment #22 from Richard Biener ---
(In reply to Vladimir Makarov from comment #20)
> (In reply to Richard Biener from comment #14)
> > Thanks for the new testcase.  With -O0 (and a --enable-checking=release
> > built compiler) this builds in ~11 minutes (on a Ryzen 9 7900X) with
> > 
> >  integrated RA              :  38.96 (  6%)   1.94 ( 20%)  42.00 (  6%)  3392M ( 23%)
> >  LRA non-specific           :  18.93 (  3%)   1.24 ( 13%)  23.78 (  4%)   450M (  3%)
> >  LRA virtuals elimination   :   5.67 (  1%)   0.05 (  1%)   5.75 (  1%)   457M (  3%)
> >  LRA reload inheritance     : 318.25 ( 49%)   0.24 (  2%) 318.51 ( 48%)     0  (  0%)
> >  LRA create live ranges     : 199.24 ( 31%)   0.12 (  1%) 199.38 ( 30%)   228M (  2%)
> > 645.67user 10.29system 11:04.42elapsed 98%CPU (0avgtext+0avgdata 30577844maxresident)k
> > 3936200inputs+1091808outputs (122053major+10664929minor)pagefaults 0swaps
> >
> 
> I've tried test-1M.i with -O0 for clang-14.  It took about 12hours on
> E5-2697 v3 vs about 30min for GCC.  The most time (99%) of clang is spent in
> "fast register allocator":
> 
>   Total Execution Time: 42103.9395 seconds (42243.9819 wall clock)
> 
>      ---User Time---     --System Time--     --User+System--      ---Wall Time---   --- Name ---
>   41533.7657 ( 99.5%)  269.5347 ( 78.6%)  41803.3005 ( 99.3%)  41942.4177 ( 99.3%)  Fast Register Allocator
>     139.1669 (  0.3%)   16.4785 (  4.8%)    155.6454 (  0.4%)    156.3196 (  0.4%)  X86 DAG->DAG Instruction Selection
> 
> I've tried the same for -O1.  Again gcc took about 30min and I stopped clang
> (with another used RA algorithm) after 120hours.
> 
> So the situation with RA is not so bad for GCC.  But in any case I'll try to
> improve the speed for this case.

I bet the LLVM folks do not focus on making -O{0,1} usable for these kinds
of testcases, which have practical application for auto-generated code.

Of course that's not a reason to not improve GCC even more! ;)

> > so register allocation taking all of the time.  There's maybe the
> > possibility to gate some of its features on the # of BBs or insns (or
> > whatever the actual "bad" thing is - I didn't look closer yet).
> > 
> > It also seems to use 30GB of peak memory at -O0 ...
> > 
> 
> I see only 3GB.  Improving this is hard task.  The IRA for -O0 uses very
> simple algorithm with usage of very few resources.  We could use even
> simpler method (assigning memory only for all pseudos) but I think it does
> not worth to do as the generated code will be much bigger and probably will
> be 1.5-2 times slower.

For some RTL optimization algorithms, simply splitting large blocks tends to
help.  Also, some passes gate only on the number of BBs even though their
algorithms are quadratic in the number of insns ...
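
As a back-of-the-envelope illustration of why splitting helps, assume a pass whose cost is quadratic in the insns of each block taken separately. That per-block cost model, and the names `cost_after_split` and `gate_on_bb_count`, are assumptions made for this sketch and not anything taken from GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Cost of a hypothetical per-block quadratic pass after cutting one
   block of N_INSNS insns into PIECES equal parts.  PIECES == 1 is the
   unsplit block.  */
unsigned long long
cost_after_split (unsigned long long n_insns, unsigned long long pieces)
{
  assert (pieces > 0 && n_insns % pieces == 0);
  unsigned long long per_piece = n_insns / pieces;
  return pieces * per_piece * per_piece;
}

/* A gate that only looks at the number of blocks.  It accepts a
   function made of a single huge block, which is the expensive case
   for the cost model above.  */
int
gate_on_bb_count (size_t n_bbs, size_t max_bbs)
{
  return n_bbs <= max_bbs;
}
```

With one block of 1,000,000 insns the modelled cost is 10^12 units, while 1000 blocks of 1000 insns cost 10^9, a factor of 1000 less. A block-count gate with a limit of 1000 blocks still lets the single huge block through.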

Of course we cannot simply gate RA ... maybe there's a way to have a
"simpler" algorithm that works on smaller regions of a function and
only promotes allocnos live across region boundaries to memory?  Ideally
you'd have something that has linear time complexity - for LRA that should
be possible, since we have already done global RA?
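
A very rough sketch of that region idea, under heavy assumptions: regions are fixed-size windows of the insn stream, each allocno has one contiguous live range, and every name here (`struct live_range`, `choose_home`, `count_memory_homes`) is hypothetical rather than IRA or LRA API. It only shows the classification step, which takes one pass over the allocnos.

```c
#include <assert.h>
#include <stddef.h>

enum home
{
  HOME_REGISTER_CANDIDATE,  /* Lives inside one region.  */
  HOME_MEMORY               /* Crosses a region boundary.  */
};

/* Hypothetical live range, as inclusive insn indices.  */
struct live_range
{
  size_t first_insn;
  size_t last_insn;
};

/* An allocno whose range starts and ends in the same region can be
   handled by a region-local allocator; any other goes to memory.  */
enum home
choose_home (struct live_range r, size_t region_size)
{
  assert (region_size > 0 && r.first_insn <= r.last_insn);
  if (r.first_insn / region_size == r.last_insn / region_size)
    return HOME_REGISTER_CANDIDATE;
  return HOME_MEMORY;
}

/* Single pass over N ranges, so linear in the number of allocnos.  */
size_t
count_memory_homes (const struct live_range *ranges, size_t n,
                    size_t region_size)
{
  size_t in_memory = 0;
  for (size_t i = 0; i < n; i++)
    if (choose_home (ranges[i], region_size) == HOME_MEMORY)
      in_memory++;
  return in_memory;
}
```

With a region size of 10, a range covering insns 0 to 9 stays a register candidate and one covering insns 5 to 12 goes to memory. How registers are then assigned inside each region is left open.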

Anyway - thanks for improving things here!