[Bug d/108763] va_arg usage in D doesn't compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108763

ibuclaw at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibuclaw at gcc dot gnu.org

--- Comment #5 from ibuclaw at gcc dot gnu.org ---
I abandoned the idea of supporting RTTI-based variadics years ago. Even the current reference implementation only supports a subset of the x86_64 ABI in its current incarnation as far as I recall.

I had considered that libffi might allow us to do this, but I could not find anything that would let me say "retrieve the next variadic argument of size SIZE and mode MODE", even though, as I understand it, there is limited support for constructing a variadic call to a C function.
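For readers less familiar with the underlying constraint, here is a minimal sketch in C (not D) of how C-style variadics work: `va_arg` must be given the argument's type at compile time, which is why a run-time "size SIZE and mode MODE" lookup has no direct equivalent. The helper name `sum_ints` is hypothetical and is not taken from the bug report.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments.
   Note that va_arg takes the static type (int) as its second operand;
   the type cannot be chosen from run-time type information. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* type fixed at compile time */
    va_end(ap);
    return total;
}
```

Calling `sum_ints(3, 1, 2, 3)` walks the three trailing arguments as `int`; passing a `double` there would be undefined behavior, because the callee has no way to discover the real type.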
[Bug d/108763] va_arg usage in D doesn't compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108763 --- Comment #6 from ibuclaw at gcc dot gnu.org --- I'll add it as a note to the deviations page. https://gcc.gnu.org/onlinedocs/gdc/Missing-Features.html#Missing-Features I'd actually forgotten about this.
[Bug target/108764] [RISCV] Cost model for RVB is too aggressive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

--- Comment #2 from Andrew Pinski ---
        slli    a4,a2,3
        sh3add  a5,a2,a0

vs

        slli    a2,a2,3
        add     a5,a0,a2

I think the first one is better really because you have two independent instructions that can be issued at the same time.
Really this is all core specific and the generic tuning should be "generic" which means this is the correct tuning ...
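To make the comparison concrete, the two sequences can be modeled in C. This is only a sketch: the function names are hypothetical, and it models the arithmetic (Zba `sh3add rd, rs1, rs2` computes `rs2 + (rs1 << 3)`), not the issue timing of any particular core.

```c
#include <assert.h>
#include <stdint.h>

/* Zba: sh3add rd, rs1, rs2  =>  rd = rs2 + (rs1 << 3) */
static uint64_t sh3add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 3);
}

/* Sequence 1: slli a4,a2,3 ; sh3add a5,a2,a0
   Both instructions read the original a2, so neither waits on the other. */
static void seq_sh3add(uint64_t a0, uint64_t a2, uint64_t *a4, uint64_t *a5)
{
    *a4 = a2 << 3;
    *a5 = sh3add(a2, a0);
}

/* Sequence 2: slli a2,a2,3 ; add a5,a0,a2
   The add consumes the result of the slli, forming a dependency chain. */
static void seq_add(uint64_t a0, uint64_t a2, uint64_t *shifted, uint64_t *a5)
{
    a2 = a2 << 3;
    *shifted = a2;
    *a5 = a0 + a2;
}
```

Both sequences produce the same two values; the difference the comment points at is only that the second one serializes the shift and the add.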
[Bug target/100927] [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927

Michael Crusoe changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael.crusoe at gmail dot com

--- Comment #1 from Michael Crusoe ---
2023 update: this is still happening in GCC 10.1+ including trunk

https://godbolt.org/z/YKKcdP8MY
[Bug target/100927] [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927 --- Comment #2 from Andrew Pinski --- Are you sure _mm_cvttpd_epi32 is documented that way? I suspect it is just unspecified behavior.
[Bug target/108764] [RISCV] Cost model for RVB is too aggressive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

Kito Cheng changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org

--- Comment #3 from Kito Cheng ---
> I think one solution is to change the cost model of such complex instructions
> to the sum of the cost for each part. E.g.
> cost for shNadd = COSTS_N_INSNS (SINGLE_SHIFT_COST) + COSTS_N_INSNS (1) #
> cost of addition

Some RISC-V core implementations do execute shNadd in one cycle as far as I know, but I know that is not true for every implementation.

Anyway, it's really uarch dependent, so I would prefer to keep it as is for now, and then extend the cost model function to handle different uarchs (-mtune) more easily when GCC 14 is open.
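The arithmetic behind the quoted proposal can be sketched as follows. `COSTS_N_INSNS (N)` is GCC's macro for "the cost of N simple instructions" and expands to `(N) * 4`; the value used here for `SINGLE_SHIFT_COST` is an assumption made for illustration, and the function names are hypothetical rather than taken from the riscv backend.

```c
#include <assert.h>

/* As in GCC's rtl.h: cost of N "fast" instructions. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Assumed value, for illustration only. */
#define SINGLE_SHIFT_COST 1

/* shNadd treated as one single-cycle instruction. */
static int shnadd_cost_as_one_insn(void)
{
    return COSTS_N_INSNS(1);
}

/* Proposal from the quoted text: shift cost plus addition cost. */
static int shnadd_cost_as_sum(void)
{
    return COSTS_N_INSNS(SINGLE_SHIFT_COST) + COSTS_N_INSNS(1);
}

/* A separate slli followed by add, for comparison. */
static int slli_plus_add_cost(void)
{
    return COSTS_N_INSNS(SINGLE_SHIFT_COST) + COSTS_N_INSNS(1);
}
```

Under these assumptions the summed model makes shNadd exactly as expensive as the slli/add pair, so the optimizers would no longer prefer it; with the one-instruction model it is half the cost. Which is right depends on the microarchitecture, which is the point made above.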
[Bug target/108764] [RISCV] Cost model for RVB is too aggressive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

--- Comment #4 from Sinan ---
(In reply to Andrew Pinski from comment #2)
>         slli    a4,a2,3
>         sh3add  a5,a2,a0
>
> vs
>         slli    a2,a2,3
>         add     a5,a0,a2
>
> I think the first one is better really because you have two independent
> instructions and can be issued at the same time.
> Really this is all core specific and the generic tuning should be "generic"
> which means this is the correct tuning ...

Thanks for pointing it out. This might not be a good case (I only noticed the extra `mv` brought in by zba). I just did a quick check with spec2017, and it seems that the current cost model indeed does a better job in terms of the dependency between slli and add.
[Bug target/108764] [RISCV] Cost model for RVB is too aggressive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108764

--- Comment #5 from Sinan ---
(In reply to Kito Cheng from comment #3)
> > I think one solution is to change the cost model of such complex
> > instructions to the sum of the cost for each part. E.g.
> > cost for shNadd = COSTS_N_INSNS (SINGLE_SHIFT_COST) + COSTS_N_INSNS (1) #
> > cost of addition
>
> Some RISC-V core implementation did has one cycle for shNadd operation as I
> know, but I know it's not true for every implementation.

Thanks for the info. Interestingly, the shNadd-like instructions (add reg1, reg2, reg3, lsl #N) in AArch64/neoverse-n1 are also one-cycle operations (https://developer.arm.com/documentation/pjdoc466751330-9707/latest), but the cost model for them is different from the one in the riscv backend (AArch64 doesn't generate add r1, r2, r3, lsl #3 for the given test case).

> Anyway, it's really uarch dependent, so I would prefer keep as it for now,
> and then extend the cost model function to easier handle different uarch
> (-mtune) when GCC 14 is open.

Agree.
[Bug target/100927] [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927

--- Comment #3 from Michael Crusoe ---
Good question, let's check the reference.

Summary: it is specified behavior that _mm_cvttpd_epi32 returns the Integer Indefinite (80000000H for 32-bit values) for NaN inputs.

All references below are from the December 2022 edition (Order Number: 325462-078US) of "Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" from https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

The formal signature of the _mm_cvttpd_epi32 intrinsic is in Table C-1 "Simple Intrinsics" on page 2987, reminding us that the mnemonic is CVTTPD2DQ.

The formal definition of CVTTPD2DQ is given in section 5.6.1.6 "Intel® SSE2 Conversion Instructions" on page 133:

> Convert with truncation packed double precision floating-point values to
> packed doubleword integers.

On page 106 we learn more about what truncation means in the definition of CVTTPD2DQ:

> 4.8.4.2 Truncation with Intel® SSE, SSE2, and AVX Conversion Instructions
> The following Intel SSE/SSE2 instructions automatically truncate the results of
> conversions from floating-point values to integers when the result is inexact:
> CVTTPD2DQ, CVTTPS2DQ, CVTTPD2PI, CVTTPS2PI, CVTTSD2SI, and CVTTSS2SI. Here,
> truncation means the round toward zero mode described in Table 4-8. There are
> also several Intel AVX2 and AVX-512 instructions which use truncation (VCVTT*)

Table 4-8 from section 4.8.4 states:

> Rounding Mode: Round toward zero (Truncate)
> Description: Rounded result is closest to but no greater in absolute value
> than the infinitely precise result.

Section 11.4.1.6 ("SSE2 Conversion Instructions") states that:

> The CVTTPD2DQ (convert with truncation packed double precision floating-point
> values to packed doubleword integers) instruction is similar to the CVTPD2DQ
> instruction except that truncation is used to round a source value to an
> integer value.

Table 11-1. "Masked Responses of SSE/SSE2/SSE3 Instructions to Invalid Arithmetic Operations" states that:

> Condition: Conversion to integer when the value in the source register is a
> NaN, ∞, or exceeds the representable range for CVTPS2PI, CVTTPS2PI, CVTSS2SI,
> CVTTSS2SI, CVTPD2PI, CVTSD2SI, CVTPD2DQ, CVTTPD2PI, CVTTSD2SI, CVTTPD2DQ,
> CVTPS2DQ, or CVTTPS2DQ
> Masked Response: Return the integer Indefinite

This is stated more explicitly in section D.4.2.2 "Results of Operations with NaN Operands or a NaN Result for SSE/SSE2/SSE3 Numeric Instructions", where Table D-8 (page 455) ("CVTPS2PI, CVTSS2SI, CVTTPS2PI, CVTTSS2SI, CVTPD2PI, CVTSD2SI, CVTTPD2PI, CVTTSD2SI, CVTPS2DQ, CVTTPS2DQ, CVTPD2DQ, CVTTPD2DQ") states that the masked result from any type of NaN (SNaN or QNaN) will be the Integer Indefinite (80000000H for 32-bit values).
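The behavior described in the cited tables can be summarized as a small portable model of one 32-bit lane of CVTTPD2DQ with the invalid-operation exception masked. This is a sketch for illustration, not the intrinsic itself: the function name `cvtt_lane_model` is hypothetical, and the range bounds are derived from the "exceeds the representable range" condition quoted above.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical model of one lane of CVTTPD2DQ (masked response). */
static int32_t cvtt_lane_model(double x)
{
    /* NaN, infinities, and values whose truncation does not fit in
       32 bits all produce the integer indefinite, 0x80000000. */
    if (isnan(x) || x >= 2147483648.0 || x <= -2147483649.0)
        return INT32_MIN;

    /* In range: a C double-to-int conversion truncates toward zero,
       matching the "round toward zero" mode of Table 4-8. */
    return (int32_t)x;
}
```

For example, the model maps `1.9` to `1`, `-1.9` to `-1`, and any NaN to `INT32_MIN`, which is the value the bug report expects from `_mm_cvttpd_epi32`.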
[Bug objc/108743] -fconstant-cfstrings not supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108743

--- Comment #8 from Iain Sandoe ---
(In reply to Andrew Pinski from comment #7)
> Hmm,
> https://inbox.sourceware.org/gcc-patches/B4F496F4-F31D-41D2-8942-
> 1f0aefbd7...@sandoe-acoustics.co.uk/
>
> Seems didn't get installed even though it was approved

... these things happen, I guess we can make it a darwin-specific driver option (as Jakub says, the 'm' version is technically correct, but we have to accommodate compatibility sometimes).

There is at least one other platform that I think is using the NeXT library (it is open sourced), so maybe it is an appropriate option for that platform too.
[Bug c++/108761] Add option to produce a unique section for non-COMDAT __attribute__((section("foo"))) object
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108761 --- Comment #3 from Fangrui Song --- New syntax setting the flags will be useful. Also, currently there is no way to customize the section type.
[Bug c/105660] [12/13 Regression] ICE in warn_parm_array_mismatch when merging two function decls and VLA arguments since r12-1218-gc6503fa93b5565c9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105660 --- Comment #11 from Martin Uecker --- PATCH: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611817.html
[Bug c/108423] [12/13 Regression] ICE in make_ssa_name_fn with VLA types in arguments and inlining since r12-5338-g4e6bf0b9dd5585df
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108423 --- Comment #8 from Martin Uecker --- https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611562.html
[Bug middle-end/108765] New: ICE with non-local goto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

            Bug ID: 108765
           Summary: ICE with non-local goto
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: muecker at gwdg dot de
  Target Milestone: ---

ICE. This is from simplifying PR107840, but seems to be a separate issue and a new regression.

https://godbolt.org/z/xbT7rrqdW

int main()
{
        void foo(void)
        {
                __label__ trgt;
                void jmp(void)
                {
                        goto trgt;
                }
        trgt:
                ;
        }
        foo();
}
[Bug middle-end/108765] ICE with non-local goto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |10.1.0, 6.1.0
            Version|13.0                        |unknown
           Keywords|                            |ice-checking,
                   |                            |ice-on-valid-code

--- Comment #1 from Andrew Pinski ---
>a new regression.

I really doubt it is a new regression, the ICE only shows up with checking turned on.
[Bug middle-end/108765] ICE with non-local goto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #2 from Andrew Pinski ---
It is just a reduced testcase.

*** This bug has been marked as a duplicate of bug 107840 ***
[Bug middle-end/107840] ICE when compiling cursed setjmp/longjmp nested function calls and non-local jumps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107840 --- Comment #5 from Andrew Pinski --- *** Bug 108765 has been marked as a duplicate of this bug. ***
[Bug middle-end/107840] ICE when compiling cursed setjmp/longjmp nested function calls and non-local jumps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107840

--- Comment #6 from Andrew Pinski ---
Reduced testcase:
```
int main()
{
        void foo(void)
        {
                __label__ trgt;
                void jmp(void)
                {
                        goto trgt;
                }
        trgt:
                ;
        }
        foo();
}
```
[Bug middle-end/108765] ICE with non-local goto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

--- Comment #3 from Martin Uecker ---
I see. Thanks. Is the checking new? Or does it only show up because this is not a release build?
[Bug middle-end/108765] ICE with non-local goto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108765

--- Comment #4 from Andrew Pinski ---
>seems to be a separate issue and a new regression.

It is not; it is just a reduced testcase, and a very similar ICE happens with GCC 6 and above with -fchecking.
[Bug target/108766] New: unaligned byteswapped 16bit load is just bad
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108766

            Bug ID: 108766
           Summary: unaligned byteswapped 16bit load is just bad
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv

Take:
```
short f(unsigned char *a)
{
  return a[0] << 8 | a[1];
}
```
`-O2 -march=rv64iadc_zba_zbb_zbc_zbs_zicsr` produces:
```
        lbu     a5,1(a0)
        lbu     a4,0(a0)
        slli    a0,a5,8
        or      a0,a0,a4
        slliw   a5,a0,8
        srli    a0,a0,8
        or      a0,a0,a5
        sext.h  a0,a0
        ret
```

That is just horrible. It should just be:
```
        lbu     a5,1(a0)
        lbu     a4,0(a0)
        slli    a0,a4,8
        or      a0,a0,a5
        sext.h  a0,a0
        ret
```

It is trying to do an unaligned short load and then a byteswap.
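The last sentence of the report can be illustrated in C. The sketch below places the function from the report next to a hypothetical `f_load_then_swap` that mirrors the longer instruction sequence step by step: a little-endian 16-bit load assembled from two bytes, then a byte swap, then sign extension. The mapping of statements to instructions in the comments is an interpretation of the listing, not compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* The function from the report: a big-endian 16-bit read. */
static short f(unsigned char *a)
{
    return a[0] << 8 | a[1];
}

/* Hypothetical equivalent of the longer sequence. */
static short f_load_then_swap(unsigned char *a)
{
    /* lbu, lbu, slli, or: assemble a little-endian halfword. */
    uint16_t le = (uint16_t)(a[1] << 8 | a[0]);

    /* slliw, srli, or: swap the two bytes. */
    uint16_t swapped = (uint16_t)(le << 8 | le >> 8);

    /* sext.h: sign-extend the low 16 bits. */
    return (short)swapped;
}
```

Both functions return the same value for every input, which is why the two extra shifts and the extra `or` are redundant: the bytes can simply be combined in the desired order in the first place.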
[Bug tree-optimization/108752] word_mode vectorization is pessimized by hard limit on nunits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108752

Hans-Peter Nilsson changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hp at gcc dot gnu.org

--- Comment #3 from Hans-Peter Nilsson ---
(In reply to Richard Biener from comment #0)
> emulated vectors (aka word_mode vectorization).

Ackchyually, more commonly known as SWAR: https://en.wikipedia.org/wiki/SWAR.
(IWBN if options and identifiers were keyed off that acronym.)
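For readers who have not met the term, here is a minimal SWAR sketch in C: eight 8-bit lanes packed into one 64-bit word are added with ordinary integer instructions, with carries kept from crossing lane boundaries. This is a well-known textbook formulation written for illustration; the function name is hypothetical and the code is not taken from GCC's vectorizer.

```c
#include <assert.h>
#include <stdint.h>

/* Add eight packed 8-bit lanes, each wrapping modulo 256. */
static uint64_t swar_add_u8x8(uint64_t x, uint64_t y)
{
    const uint64_t high = UINT64_C(0x8080808080808080);

    /* Add the low 7 bits of every lane; with the top bit of each lane
       cleared, a carry can reach bit 7 but never the next lane. */
    uint64_t low_sum = (x & ~high) + (y & ~high);

    /* Fold the top bits back in with XOR, which is addition without
       carry-out, so nothing leaks into the neighboring lane. */
    return low_sum ^ ((x ^ y) & high);
}
```

A single 64-bit add and a few logical operations thus stand in for eight byte additions, which is the kind of code "word_mode vectorization" emits when no real vector unit is available.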
[Bug target/108272] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:339
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108272 --- Comment #7 from CVS Commits --- The releases/gcc-12 branch has been updated by Kewen Lin : https://gcc.gnu.org/g:e5a63c986978699a25f4bfb9b58a0111951e7d43 commit r12-9168-ge5a63c986978699a25f4bfb9b58a0111951e7d43 Author: Kewen Lin Date: Mon Jan 16 02:15:39 2023 -0600 rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272] As PR108272 shows, there are some invalid uses of MMA opaque types in inline asm statements. This patch is to teach the function rs6000_opaque_type_invalid_use_p for inline asm, check and error any invalid use of MMA opaque types in input and output operands. PR target/108272 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the support for invalid uses in inline asm, factor out the checking and erroring to lambda function check_and_error_invalid_use. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108272-1.c: New test. * gcc.target/powerpc/pr108272-2.c: New test. * gcc.target/powerpc/pr108272-3.c: New test. * gcc.target/powerpc/pr108272-4.c: New test. (cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)
[Bug target/108348] ICE in gen_movoo, at config/rs6000/mma.md:292
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108348 --- Comment #8 from CVS Commits --- The releases/gcc-12 branch has been updated by Kewen Lin : https://gcc.gnu.org/g:3c7bb6c0b0003f4e1fb52f814ad1a9a7f09573c6 commit r12-9169-g3c7bb6c0b0003f4e1fb52f814ad1a9a7f09573c6 Author: Kewen Lin Date: Wed Jan 18 02:34:19 2023 -0600 rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348] PR108348 shows one special case that MMA opaque types are used in function arguments and treated as pass by reference, it results in one copying from argument to a temp variable, since this copying happens before rs6000_function_arg check, it can cause ICE without MMA support then. This patch is to teach function rs6000_opaque_type_invalid_use_p to check if any function argument in a gcall stmt has the invalid use of MMA opaque types. btw, I checked the handling on return value, it doesn't have this kind of issue as its checking and error emission is quite early, so this doesn't handle function return value. PR target/108348 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the support for invalid uses of MMA opaque type in function arguments. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108348-1.c: New test. * gcc.target/powerpc/pr108348-2.c: New test. (cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)
[Bug target/108396] [12/13 Regression] PPCLE: vec_vsubcuq missing since r12-5752-gd08236359eb229
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108396 --- Comment #7 from CVS Commits --- The releases/gcc-12 branch has been updated by Kewen Lin : https://gcc.gnu.org/g:cb6861acc4074fd2c30a96b52d68c2cd33b9e94d commit r12-9170-gcb6861acc4074fd2c30a96b52d68c2cd33b9e94d Author: Kewen Lin Date: Wed Jan 18 02:34:25 2023 -0600 rs6000: Fix typo on vec_vsubcuq in rs6000-overload.def [PR108396] As Andrew pointed out in PR108396, there is one typo in rs6000-overload.def on built-in function vec_vsubcuq: [VEC_VSUBCUQ, vec_vsubcuqP, __builtin_vec_vsubcuq] "vec_vsubcuqP" should be "vec_vsubcuq", this typo caused us to define vec_vsubcuqP in rs6000-vecdefines.h instead of vec_vsubcuq, so that compiler is not able to realize the built-in function name vec_vsubcuq any more. Co-authored-By: Andrew Pinski PR target/108396 gcc/ChangeLog: * config/rs6000/rs6000-overload.def (VEC_VSUBCUQ): Fix typo vec_vsubcuqP with vec_vsubcuq. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108396.c: New test. (cherry picked from commit aaf29ae6cdbaad58b709a77784375d15138174b3)
[Bug target/108272] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:339
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108272 --- Comment #8 from CVS Commits --- The releases/gcc-11 branch has been updated by Kewen Lin : https://gcc.gnu.org/g:79a81d526babb6ffb6d85b4a05b29269470ab49d commit r11-10521-g79a81d526babb6ffb6d85b4a05b29269470ab49d Author: Kewen Lin Date: Mon Jan 16 02:15:39 2023 -0600 rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272] As PR108272 shows, there are some invalid uses of MMA opaque types in inline asm statements. This patch is to teach the function rs6000_opaque_type_invalid_use_p for inline asm, check and error any invalid use of MMA opaque types in input and output operands. PR target/108272 gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the support for invalid uses in inline asm, factor out the checking and erroring to lambda function check_and_error_invalid_use. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108272-1.c: New test. * gcc.target/powerpc/pr108272-2.c: New test. * gcc.target/powerpc/pr108272-3.c: New test. * gcc.target/powerpc/pr108272-4.c: New test. (cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)
[Bug target/108348] ICE in gen_movoo, at config/rs6000/mma.md:292
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108348 --- Comment #9 from CVS Commits --- The releases/gcc-11 branch has been updated by Kewen Lin : https://gcc.gnu.org/g:0e41d8a77887b838de5493c491f411274376227a commit r11-10522-g0e41d8a77887b838de5493c491f411274376227a Author: Kewen Lin Date: Wed Jan 18 02:34:19 2023 -0600 rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348] PR108348 shows one special case that MMA opaque types are used in function arguments and treated as pass by reference, it results in one copying from argument to a temp variable, since this copying happens before rs6000_function_arg check, it can cause ICE without MMA support then. This patch is to teach function rs6000_opaque_type_invalid_use_p to check if any function argument in a gcall stmt has the invalid use of MMA opaque types. btw, I checked the handling on return value, it doesn't have this kind of issue as its checking and error emission is quite early, so this doesn't handle function return value. PR target/108348 gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the support for invalid uses of MMA opaque type in function arguments. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108348-1.c: New test. * gcc.target/powerpc/pr108348-2.c: New test. (cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)
[Bug target/108272] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:339
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108272 --- Comment #9 from CVS Commits --- The releases/gcc-10 branch has been updated by Kewen Lin : https://gcc.gnu.org/g:ec4d91aa885297c3b5bb4bbfb3133ffe2e5e6a2f commit r10-11211-gec4d91aa885297c3b5bb4bbfb3133ffe2e5e6a2f Author: Kewen Lin Date: Sun Feb 12 09:35:27 2023 -0600 rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272] As PR108272 shows, there are some invalid uses of MMA opaque types in inline asm statements. This patch is to teach the function rs6000_opaque_type_invalid_use_p for inline asm, check and error any invalid use of MMA opaque types in input and output operands. PR target/108272 gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the support for invalid uses in inline asm, factor out the checking and erroring to lambda function check_and_error_invalid_use. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108272-1.c: New test. * gcc.target/powerpc/pr108272-2.c: New test. * gcc.target/powerpc/pr108272-3.c: New test. * gcc.target/powerpc/pr108272-4.c: New test. (cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)
[Bug target/108348] ICE in gen_movoo, at config/rs6000/mma.md:292
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108348 --- Comment #10 from CVS Commits --- The releases/gcc-10 branch has been updated by Kewen Lin : https://gcc.gnu.org/g:7bbed35a05d735387d406afbf866384feaac21e7 commit r10-11212-g7bbed35a05d735387d406afbf866384feaac21e7 Author: Kewen Lin Date: Wed Jan 18 02:34:19 2023 -0600 rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348] PR108348 shows one special case that MMA opaque types are used in function arguments and treated as pass by reference, it results in one copying from argument to a temp variable, since this copying happens before rs6000_function_arg check, it can cause ICE without MMA support then. This patch is to teach function rs6000_opaque_type_invalid_use_p to check if any function argument in a gcall stmt has the invalid use of MMA opaque types. btw, I checked the handling on return value, it doesn't have this kind of issue as its checking and error emission is quite early, so this doesn't handle function return value. PR target/108348 gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_opaque_type_invalid_use_p): Add the support for invalid uses of MMA opaque type in function arguments. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108348-1.c: New test. * gcc.target/powerpc/pr108348-2.c: New test. (cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)
[Bug target/108396] [12/13 Regression] PPCLE: vec_vsubcuq missing since r12-5752-gd08236359eb229
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108396

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #8 from Kewen Lin ---
Fixed on trunk.
[Bug target/108348] ICE in gen_movoo, at config/rs6000/mma.md:292
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108348

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #11 from Kewen Lin ---
Fixed on trunk and backported to related branches.
[Bug target/108272] [13 Regression] ICE in gen_movxo, at config/rs6000/mma.md:339
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108272

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Kewen Lin ---
Fixed on trunk and backported to related branches.
[Bug analyzer/108767] New: O2 optimization has side effects on static analysis.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108767

            Bug ID: 108767
           Summary: O2 optimization has side effects on static analysis.
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: analyzer
          Assignee: dmalcolm at gcc dot gnu.org
          Reporter: geoffreydgr at icloud dot com
  Target Milestone: ---

Hi, David. Through the following case I found that the `-O2` optimization level has side effects on static analysis: the GCC static analyzer falsely gives an NPD warning under `-O2`.

Input:
```c
#include "stdio.h"
extern void __analyzer_describe ();
extern void __analyzer_eval ();
extern void __analyzer_dump ();

int main()
{
    int b = 1;
    int e = 2;
    int f = 3;
    int *g[] = {&e, &e};
    int *h = &b;
    int *j = &f;

    for (int d = 0; d <= 1; d++)
    {
        *j = (*h && (h = g[d]));
        // __analyzer_dump ();
        __analyzer_eval(h==0);
        // __analyzer_describe(0,h);
    }
    printf("NPD_FLAG %d\n", *j);
}
```

options: -O2 -fanalyzer

Output:
```
: In function 'main':
:19:9: warning: FALSE
   19 |         __analyzer_eval(h==0);
      |         ^
:19:9: warning: UNKNOWN
:19:9: warning: TRUE
:19:9: warning: TRUE
:19:9: warning: UNKNOWN
:19:9: warning: TRUE
:19:9: warning: TRUE
:19:9: warning: UNKNOWN
:17:15: warning: dereference of NULL 'h' [CWE-476] [-Wanalyzer-null-dereference]
   17 |         *j = (*h && (h = g[d]));
      |               ^~
  'main': events 1-9
    |
    |   15 |     for (int d = 0; d <= 1; d++)
    |      |                     ~~^~~~
    |      |                       |
    |      |                       (1) following 'true' branch (when 'd != 2')...
    |      |                       (5) following 'true' branch (when 'd != 2')...
    |      |                       (7) following 'true' branch (when 'd != 2')...
    |   16 |     {
    |   17 |         *j = (*h && (h = g[d]));
    |      |               ~~~
    |      |               | | |
    |      |               | | (3) following 'true' branch...
    |      |               | | (9) dereference of NULL 'h'
    |      |               | (4) ...to here
    |      |               (2) ...to here
    |      |               (6) ...to here
    |      |               (8) ...to here
    |
Compiler returned: 0
```

options: -O1 -fanalyzer

Output:
```
: In function 'main':
:19:9: warning: FALSE
   19 |         __analyzer_eval(h==0);
      |         ^
:19:9: warning: UNKNOWN
Compiler returned: 0
```

-O2: https://godbolt.org/z/GeTaeGMaf
-O1: https://godbolt.org/z/adnY8aa3K
[Bug c/108768] New: bogus -Warray-bounds warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108768

            Bug ID: 108768
           Summary: bogus -Warray-bounds warnings
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mi+gcc at aldan dot algebra.com
  Target Milestone: ---

Created attachment 54453
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54453&action=edit
Test case (otpcode.c after passing preprocessor)

Compiling the attached file (part of TCL-TRF package) with:

        gcc12 -O2 -Wall -Werror -c otpcode.i -o otpcode.o

You'll get:

In function 'extract',
    inlined from 'FlushDecoder' at otpcode.i:6170:21:
otpcode.i:6262:9: error: array subscript 9 is outside array bounds of 'char[9]' [-Werror=array-bounds]
 6262 |   cc = s[start/8 +1];
      |        ~^~~~
otpcode.i: In function 'FlushDecoder':
otpcode.i:6131:8: note: at offset 9 into object 'b' of size 9
 6131 |   char b[9];
      |        ^
In function 'extract',
    inlined from 'FlushDecoder' at otpcode.i:6170:21:
otpcode.i:6263:9: error: array subscript 10 is outside array bounds of 'char[9]' [-Werror=array-bounds]
 6263 |   cr = s[start/8 +2];
      |        ~^~~~
otpcode.i: In function 'FlushDecoder':
otpcode.i:6131:8: note: at offset 10 into object 'b' of size 9
 6131 |   char b[9];
      |        ^

Note how the "start/8 + 1" is being misread as "9"...
[Bug target/100927] [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927

Hongtao.liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #4 from Hongtao.liu ---
The intrinsic is expanded to RTL FIX, which is then constant-folded to 0 for NaNs.

      /* Although the overflow semantics of RTL's FIX and UNSIGNED_FIX
         operators are intentionally left unspecified (to ease implementation
         by target backends), for consistency, this routine implements the
         same semantics for constant folding as used by the middle-end.  */

      /* This was formerly used only for non-IEEE float.
         egg...@twinsun.com says it is safe for IEEE also.  */
      REAL_VALUE_TYPE t;
      const REAL_VALUE_TYPE *x = CONST_DOUBLE_REAL_VALUE (op);
      wide_int wmax, wmin;
      /* This is part of the abi to real_to_integer, but we check
         things before making this call.  */
      bool fail;

      switch (code)
        {
        case FIX:
          if (REAL_VALUE_ISNAN (*x))
            return const0_rtx;

According to IEEE 754-2019, when a NaN or infinite operand cannot be represented in the destination format and this cannot otherwise be indicated, the invalid operation exception shall be signaled.

The comments also say "for consistency, this routine implements the same semantics for constant folding as used by the middle-end." and "This was formerly used only for non-IEEE float."

Maybe we should prevent this.
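The mismatch this comment describes can be shown with two small models placed side by side. This is a hedged sketch: `folded_fix` models only the NaN case visible in the quoted excerpt, `runtime_cvtt` models the masked response cited from the Intel manual in comment #3, and all names are hypothetical. Inputs other than NaN are assumed to be finite and in range.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Integer indefinite for a 32-bit destination (0x80000000). */
#define INTEGER_INDEFINITE_32 INT32_MIN

/* Model of the quoted folding rule: a NaN constant folds to 0. */
static int32_t folded_fix(double x)
{
    if (isnan(x))
        return 0;                       /* "return const0_rtx" */
    return (int32_t)x;                  /* truncation toward zero */
}

/* Model of what the instruction returns when executed at run time. */
static int32_t runtime_cvtt(double x)
{
    if (isnan(x))
        return INTEGER_INDEFINITE_32;   /* masked invalid response */
    return (int32_t)x;
}

/* Nonzero when compile-time folding and run-time execution agree. */
static int fold_matches_runtime(double x)
{
    return folded_fix(x) == runtime_cvtt(x);
}
```

For ordinary values the two models agree, but for a NaN constant the folded result is `0` while the executed instruction would yield `0x80000000`, so the observable result depends on whether the optimizer saw a constant.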
[Bug tree-optimization/106722] bogus uninit warning in tree-vect-loop-manip.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106722 --- Comment #8 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:338739645b8e5bf34636d8d4829d7650001ad08c commit r13-5958-g338739645b8e5bf34636d8d4829d7650001ad08c Author: Richard Biener Date: Fri Feb 10 10:28:29 2023 +0100 tree-optimization/106722 - fix CD-DCE edge marking The following fixes a latent issue when we mark control edges but end up with marking a block with no stmts necessary. In this case we fail to mark dependent control edges of that block. PR tree-optimization/106722 * tree-ssa-dce.cc (mark_last_stmt_necessary): Return whether we marked a stmt. (mark_control_dependent_edges_necessary): When mark_last_stmt_necessary didn't mark any stmt make sure to mark its control dependent edges. (propagate_necessity): Likewise. * gcc.dg/torture/pr108737.c: New testcase.
[Bug tree-optimization/108737] [13 Regression] Apparent miscompile of infinite loop on gcc trunk in cddce2 pass since r13-3875
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108737 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #5 from Richard Biener --- I botched the changelog. Fixed with the following, but latent everywhere. commit 338739645b8e5bf34636d8d4829d7650001ad08c (origin/master, origin/HEAD) Author: Richard Biener Date: Fri Feb 10 10:28:29 2023 +0100 tree-optimization/106722 - fix CD-DCE edge marking The following fixes a latent issue when we mark control edges but end up with marking a block with no stmts necessary. In this case we fail to mark dependent control edges of that block. PR tree-optimization/106722 * tree-ssa-dce.cc (mark_last_stmt_necessary): Return whether we marked a stmt. (mark_control_dependent_edges_necessary): When mark_last_stmt_necessary didn't mark any stmt make sure to mark its control dependent edges. (propagate_necessity): Likewise. * gcc.dg/torture/pr108737.c: New testcase.
[Bug tree-optimization/108500] [11/12 Regression] -O -finline-small-functions results in "internal compiler error: Segmentation fault" on a very large program (700k function calls)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500

--- Comment #22 from Richard Biener ---
(In reply to Vladimir Makarov from comment #20)
> (In reply to Richard Biener from comment #14)
> > Thanks for the new testcase.  With -O0 (and a --enable-checking=release
> > built compiler) this builds in ~11 minutes (on a Ryzen 9 7900X) with
> >
> >  integrated RA              :  38.96 (  6%)   1.94 ( 20%)  42.00 (  6%)  3392M ( 23%)
> >  LRA non-specific           :  18.93 (  3%)   1.24 ( 13%)  23.78 (  4%)   450M (  3%)
> >  LRA virtuals elimination   :   5.67 (  1%)   0.05 (  1%)   5.75 (  1%)   457M (  3%)
> >  LRA reload inheritance     : 318.25 ( 49%)   0.24 (  2%) 318.51 ( 48%)     0  (  0%)
> >  LRA create live ranges     : 199.24 ( 31%)   0.12 (  1%) 199.38 ( 30%)   228M (  2%)
> > 645.67user 10.29system 11:04.42elapsed 98%CPU (0avgtext+0avgdata 30577844maxresident)k
> > 3936200inputs+1091808outputs (122053major+10664929minor)pagefaults 0swaps
> >
>
> I've tried test-1M.i with -O0 for clang-14.  It took about 12hours on
> E5-2697 v3 vs about 30min for GCC.  The most time (99%) of clang is spent in
> "fast register allocator":
>
>   Total Execution Time: 42103.9395 seconds (42243.9819 wall clock)
>
>      ---User Time---     --System Time--     --User+System--     ---Wall Time---    --- Name ---
>   41533.7657 ( 99.5%)   269.5347 ( 78.6%)  41803.3005 ( 99.3%)  41942.4177 ( 99.3%)  Fast Register Allocator
>     139.1669 (  0.3%)    16.4785 (  4.8%)    155.6454 (  0.4%)    156.3196 (  0.4%)  X86 DAG->DAG Instruction Selection
>
> I've tried the same for -O1.  Again gcc took about 30min and I stopped clang
> (with another used RA algorithm) after 120hours.
>
> So the situation with RA is not so bad for GCC.  But in any case I'll try to
> improve the speed for this case.

I bet the LLVM folks do not focus on making -O{0,1} usable for these kinds of testcases, which have practical application for auto-generated code. Of course that's not a reason to not improve GCC even more! ;)

> > so register allocation taking all of the time.  There's maybe the possibility
> > to gate some of its features on the # of BBs or insns (or whatever the actual
> > "bad" thing is - I didn't look closer yet).
> >
> > It also seems to use 30GB of peak memory at -O0 ...
> >
>
> I see only 3GB.  Improving this is hard task.  The IRA for -O0 uses very
> simple algorithm with usage of very few resources.  We could use even
> simpler method (assigning memory only for all pseudos) but I think it does
> not worth to do as the generated code will be much bigger and probably will
> be 1.5-2 times slower.

For some RTL optimization algorithms, simply splitting large blocks tends to help. Also, some gate on the number of BBs only, but their algorithms are quadratic in the number of insns instead ...

Of course we cannot simply gate RA ... maybe there's a way to have a "simpler" algorithm that works on smaller regions of a function and only promotes allocnos live across region boundaries to memory? Ideally you'd have something with linear time complexity - for LRA that should be possible, since we have done global RA already?

Anyway - thanks for improving things here!