[Bug c/95133] [9/10/11 Regression] ICE in gimple_redirect_edge_and_branch_force, at tree-cfg.c:6075
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95133 --- Comment #2 from James Greenhalgh --- Should reproduce further back if you force vectorization on with -ftree-vectorize, i.e. gcc foo.c -ftree-vectorize -O3. Breaks somewhere between: gcc version 7.0.0 20160615 and gcc version 7.0.0 20160907
[Bug target/96313] [AArch64] vqmovun* return types should be unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96313 James Greenhalgh changed: What|Removed |Added CC||jgreenhalgh at gcc dot gnu.org Status|WAITING |NEW --- Comment #2 from James Greenhalgh --- Confirmed by inspection; types in arm_neon.h are: int8_t vqmovunh_s16 (int16_t __a) int16_t vqmovuns_s32 (int32_t __a) int32_t vqmovund_s64 (int64_t __a) Types in the documentation are: uint8_t vqmovunh_s16 (int16_t a) uint16_t vqmovuns_s32 (int32_t a) uint32_t vqmovund_s64 (int64_t a)
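To illustrate why the return type matters, here is a minimal scalar sketch in C of a signed-to-unsigned saturating narrow from 16 bits to 8 bits, which is the kind of operation the vqmovun family performs. The name model_vqmovunh_s16 is hypothetical; this is not the arm_neon.h implementation, only a model of the value range of the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model (not the arm_neon.h intrinsic): narrow a
   signed 16-bit input to an unsigned 8-bit result with saturation.  */
uint8_t
model_vqmovunh_s16 (int16_t a)
{
  if (a < 0)
    return 0;            /* Negative inputs saturate to 0.  */
  if (a > UINT8_MAX)
    return UINT8_MAX;    /* Inputs above 255 saturate to 255.  */
  return (uint8_t) a;    /* In-range inputs pass through unchanged.  */
}

/* Small self-check of the three cases above.  */
void
model_vqmovunh_s16_self_check (void)
{
  assert (model_vqmovunh_s16 (-5) == 0);
  assert (model_vqmovunh_s16 (300) == 255);
  assert (model_vqmovunh_s16 (200) == 200);
}
```

Results span 0..255, so a value such as 200 does not fit in an int8_t; with a signed return type it would typically read back as a negative number, which is why the unsigned types in the documentation look like the intended ones.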
[Bug libstdc++/96958] New: Long Double in Hash Table policy forces soft-float calculations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96958 Bug ID: 96958 Summary: Long Double in Hash Table policy forces soft-float calculations Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Target Milestone: --- It was pointed out that some forks of GCC ( https://github.com/FEX-Emu/gcc/commit/8a2b7389f50a50a4e26ec98101d47fb1fc1c1bcd ) reduce the hashtable policy implementation from a long double to a double. Doing this reduces it from a soft-float calculation to hardware floating-point. Reading the discussion on libstdc++ from when this code was introduced the intention was to provide massive amounts of forwards compatibility for Very Big hash tables. We're taking quite an efficiency hit for that future proofing.
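As a rough illustration of the trade-off, the sketch below computes a minimum bucket count from an element count and a maximum load factor, once in long double and once in double. The function names and the exact formula are assumptions made for illustration; they are not copied from the libstdc++ hashtable policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the policy arithmetic, done in long double.
   On targets where long double has no hardware support this division
   is typically carried out by soft-float library routines.  */
size_t
min_buckets_long_double (size_t n_elements, float max_load_factor)
{
  long double min_bkts = n_elements / (long double) max_load_factor;
  return (size_t) min_bkts + 1;
}

/* The same arithmetic in double, which maps to hardware floating-point
   on most targets.  Integers are represented exactly only up to 2^53.  */
size_t
min_buckets_double (size_t n_elements, float max_load_factor)
{
  double min_bkts = n_elements / (double) max_load_factor;
  return (size_t) min_bkts + 1;
}

/* Self-check: for everyday table sizes both versions agree.  */
void
min_buckets_self_check (void)
{
  assert (min_buckets_long_double (100, 1.0f) == min_buckets_double (100, 1.0f));
  assert (min_buckets_long_double (100, 0.5f) == min_buckets_double (100, 0.5f));
}
```

The two versions can only diverge once the element count approaches the limit of what a double represents exactly, which is the "Very Big hash tables" case the extra precision was meant to protect.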
[Bug libstdc++/96958] Long Double in Hash Table policy forces soft-float calculations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96958 --- Comment #1 from James Greenhalgh --- Asleep at the wheel today, I had intended to link to the https://gcc.gnu.org/pipermail/libstdc++/2011-September/036420.html original discussion rather than leave it as a tedious exercise for the reader.
[Bug target/57586] New: ICE when expanding volatile asm using unaligned pointer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57586 Bug ID: 57586 Summary: ICE when expanding volatile asm using unaligned pointer Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: major Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Created attachment 30290 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30290&action=edit Reduced testcase Using built-in specs. COLLECT_GCC=../build-arm-none-eabi/install/bin/arm-none-eabi-gcc COLLECT_LTO_WRAPPER=/work/build-arm-none-eabi/install/libexec/gcc/arm-none-eabi/4.9.0/lto-wrapper Target: arm-none-eabi Configured with: /work/src/gcc/configure --target=arm-none-eabi --prefix=/work/build-arm-none-eabi/install --with-gmp=/work/build-arm-none-eabi/host-tools --with-mpfr=/work/build-arm-none-eabi/host-tools --with-mpc=/work/build-arm-none-eabi/host-tools --with-pkgversion=unknown --disable-shared --disable-nls --disable-threads --disable-tls --enable-checking=yes --enable-languages=c,c++ --with-newlib Thread model: single gcc version 4.9.0 20130326 (experimental) (unknown) ../build-arm-none-eabi/install/bin/arm-none-eabi-gcc ../testcases/pr-reduced.c -O1 ../testcases/pr-reduced.c: In function 'foo': ../testcases/pr-reduced.c:12:3: error: output number 0 not directly addressable __asm__ __volatile__("": "+m" (c->x) : "r" (&c->x) : ); ^ ../testcases/pr-reduced.c:12:3: internal compiler error: in expand_asm_operands, at stmt.c:910 0x8c1be8 expand_asm_operands /work/oban-dev/src/gcc/gcc/stmt.c:910 0x8c28a7 expand_asm_stmt(gimple_statement_d*) /work/oban-dev/src/gcc/gcc/stmt.c:1151 0x5dfe5f expand_gimple_stmt_1 /work/oban-dev/src/gcc/gcc/cfgexpand.c:2154 0x5dfe5f expand_gimple_stmt /work/oban-dev/src/gcc/gcc/cfgexpand.c:2309 0x5e1b69 expand_gimple_basic_block /work/oban-dev/src/gcc/gcc/cfgexpand.c:4143 0x5e4a33 gimple_expand_cfg /work/oban-dev/src/gcc/gcc/cfgexpand.c:4662 Please submit a full bug report, with preprocessed source if appropriate. 
Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. No difference with -mno-unaligned-access or -maligned-access. A manifestation of this bug prevents a Linux Kernel build.
[Bug target/57586] ICE when expanding volatile asm using unaligned pointer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57586 --- Comment #1 from jgreenhalgh at gcc dot gnu.org --- A bisect shows that this bug first occurs after r197095: 2013-03-26 Richard Biener * emit-rtl.c (set_mem_attributes_minus_bitpos): Remove alignment computations and rely on get_object_alignment_1 for the !TYPE_P case. Commonize DECL/COMPONENT_REF handling in the ARRAY_REF path.
[Bug target/57586] ICE when expanding volatile asm using unaligned pointer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57586 jgreenhalgh at gcc dot gnu.org changed: What|Removed |Added CC||jgreenhalgh at gcc dot gnu.org --- Comment #2 from jgreenhalgh at gcc dot gnu.org --- Created attachment 30292 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30292&action=edit Working testcase Modifying the '1' in: counter *c = &((counter_wrapper *)(1))->y; To something more aligned like a '4' as in the attached file and in: counter *c = &((counter_wrapper *)(4))->y; causes compilation to proceed as expected without an ICE.
[Bug target/57586] ICE when expanding volatile asm using unaligned pointer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57586 --- Comment #4 from jgreenhalgh at gcc dot gnu.org --- Created attachment 30293 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30293&action=edit Less reduced failing testcase Yes, the same thing happens for packed versions of those structs. Perhaps the attached, less-reduced, version of the testcase will make the issue more clear. This expanded testcase fails with the same error and ICE: ../build-arm-none-eabi/install/bin/arm-none-eabi-gcc ../testcases/pr-less-reduced.c -O1 -Wall ../testcases/pr-less-reduced.c: In function 'inet_rtm_getroute': ../testcases/pr-less-reduced.c:22:3: error: output number 0 not directly addressable __asm__ __volatile__("" ^ ../testcases/pr-less-reduced.c:22:3: internal compiler error: in expand_asm_operands, at stmt.c:910 0x8c1be8 expand_asm_operands /work/oban-dev/src/gcc/gcc/stmt.c:910 0x8c28a7 expand_asm_stmt(gimple_statement_d*) /work/oban-dev/src/gcc/gcc/stmt.c:1151 0x5dfe5f expand_gimple_stmt_1 /work/oban-dev/src/gcc/gcc/cfgexpand.c:2154 0x5dfe5f expand_gimple_stmt /work/oban-dev/src/gcc/gcc/cfgexpand.c:2309 0x5e1b69 expand_gimple_basic_block /work/oban-dev/src/gcc/gcc/cfgexpand.c:4143 0x5e4a33 gimple_expand_cfg /work/oban-dev/src/gcc/gcc/cfgexpand.c:4662 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions.
[Bug middle-end/58106] ICE: in ipa_edge_duplication_hook, at ipa-prop.c:2839
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58106 jgreenhalgh at gcc dot gnu.org changed: What|Removed |Added Last reconfirmed||2013-08-09 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #1 from jgreenhalgh at gcc dot gnu.org --- Confirmed on aarch64-none-elf.
[Bug rtl-optimization/58383] New: ICE when RTL folds vector operations using constants after gne_int_mode changes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58383 Bug ID: 58383 Summary: ICE when RTL folds vector operations using constants after gne_int_mode changes Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org The patch set around [1/4] Using gen_int_mode instead of GEN_INT causes a number of similar regressions when building for AArch64. To pick one example, when building gcc.target/aarch64/vect-fcm-eq-d.c we can get into the situation where simplify_unary_expression_1 is trying to simplify (V2DI: NOT (NEG X)) and will thus try to generate (V2DI: PLUS (X - 1)). Now we will call plus_constant, and from there gen_int_mode (-1, v2di). From here we call trunc_int_for_mode (-1, v2di) and trigger the assert: /* You want to truncate to a _what_? */ gcc_assert (SCALAR_INT_MODE_P (mode)); The failures eventually look like: In file included from ../src/gcc/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c:9:0: ../src/gcc/gcc/testsuite/gcc.target/aarch64/vect-fcm.x: In function 'foo': ../src/gcc/gcc/testsuite/gcc.target/aarch64/vect-fcm.x:25:1: internal compiler error: in trunc_int_for_mode, at explow.c:55 } ^ 0x6abc8e trunc_int_for_mode(long, machine_mode) /work/gcc-dev/src/gcc/gcc/explow.c:55 0x69bb28 gen_int_mode(long, machine_mode) /work/gcc-dev/src/gcc/gcc/emit-rtl.c:420 0x6abcf2 plus_constant /work/gcc-dev/src/gcc/gcc/explow.c:189 0x6abcf2 plus_constant /work/gcc-dev/src/gcc/gcc/explow.c:79 0x8f107f simplify_gen_unary(rtx_code, machine_mode, rtx_def*, machine_mode) /work/gcc-dev/src/gcc/gcc/simplify-rtx.c:369 0xc55e09 propagate_rtx_1 /work/gcc-dev/src/gcc/gcc/fwprop.c:490 0xc55e6f propagate_rtx_1 /work/gcc-dev/src/gcc/gcc/fwprop.c:497 0xc55e86 propagate_rtx_1 /work/gcc-dev/src/gcc/gcc/fwprop.c:498 0xc56409 propagate_rtx /work/gcc-dev/src/gcc/gcc/fwprop.c:675 0xc57dff forward_propagate_and_simplify /work/gcc-dev/src/gcc/gcc/fwprop.c:1337 
0xc57dff forward_propagate_into /work/gcc-dev/src/gcc/gcc/fwprop.c:1394 0xc58593 forward_propagate_into /work/gcc-dev/src/gcc/gcc/fwprop.c:1359 0xc58593 fwprop /work/gcc-dev/src/gcc/gcc/fwprop.c:1479 0xc58593 execute /work/gcc-dev/src/gcc/gcc/fwprop.c:1515 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions.
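The simplification described in the report rests on the two's-complement identity ~(-x) == x - 1, applied to each lane. A minimal C sketch of that identity on a two-lane, 64-bit-per-lane model follows; the lane array and helper names are hypothetical and only model the arithmetic, not GCC's RTL representation.

```c
#include <assert.h>
#include <stdint.h>

#define NLANES 2  /* Models V2DI: two 64-bit lanes.  */

/* out[i] = ~(-in[i]).  Unsigned arithmetic makes the negation wrap,
   so the behaviour is well defined for every input.  */
void
not_of_neg (const uint64_t in[NLANES], uint64_t out[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    out[i] = ~(UINT64_C (0) - in[i]);
}

/* out[i] = in[i] + (-1).  Note that every lane needs its own copy of
   the constant, i.e. the -1 has to be a vector constant, not a scalar.  */
void
plus_minus_one (const uint64_t in[NLANES], uint64_t out[NLANES])
{
  const uint64_t minus_one[NLANES] = { UINT64_MAX, UINT64_MAX };
  for (int i = 0; i < NLANES; i++)
    out[i] = in[i] + minus_one[i];
}

/* Return 1 when both forms agree on every lane for inputs {a, b}.  */
int
forms_agree_for (uint64_t a, uint64_t b)
{
  const uint64_t in[NLANES] = { a, b };
  uint64_t lhs[NLANES], rhs[NLANES];
  not_of_neg (in, lhs);
  plus_minus_one (in, rhs);
  for (int i = 0; i < NLANES; i++)
    if (lhs[i] != rhs[i])
      return 0;
  return 1;
}
```

The identity itself is sound; the failure in the report comes from building the -1 operand through a routine that only accepts scalar integer modes.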
[Bug rtl-optimization/58383] ICE when RTL folds vector operations using constants after gne_int_mode changes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58383 jgreenhalgh at gcc dot gnu.org changed: What|Removed |Added CC||jgreenhalgh at gcc dot gnu.org --- Comment #1 from jgreenhalgh at gcc dot gnu.org --- Created attachment 30788 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30788&action=edit Proposed fix A patch along these lines works for me, covering the case where gen_int_mode is called to generate a vector integer.
[Bug tree-optimization/58553] New: New fail in PASS->FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 Bug ID: 58553 Summary: New fail in PASS->FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64 Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Created attachment 30917 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30917&action=edit Preprocessed source Jeff's change to the Jump-Threading code here: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01910.html Introduced a regression for arm and aarch64 in gcc.c-torture/execute/memcpy-2.c, such that I now see: *** EXIT code emu: host signal 0 When executing the testcase on a model with command line: /work/gcc-clean/build-arm-none-eabi/install/bin/arm-none-eabi-gcc -B/work/gcc-clean/build-arm-none-eabi/obj/gcc2/gcc/ /work/gcc-clean/src/gcc/gcc/testsuite/gcc.c-torture/execute/memcpy-2.c -fno-diagnostics-show-caret -fdiagnostics-color=never -w -O3 -g -Wa,-mno-warn-deprecated -lm -marm -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -o /work/gcc-clean/build-arm-none-eabi/obj/gcc2/gcc/testsuite/gcc/memcpy-2.x -save-temps I've attached the preprocessed source and the output from -fdump-tree-dom1-details
[Bug tree-optimization/58553] New fail in PASS->FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 --- Comment #1 from jgreenhalgh at gcc dot gnu.org --- Created attachment 30918 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30918&action=edit Output of dom1
[Bug middle-end/59037] ICE when accessing invalid element (nelts + 1) of vector
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59037 jgreenhalgh at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2013-11-07 CC||jgreenhalgh at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from jgreenhalgh at gcc dot gnu.org --- Reproduced on aarch64-none-elf and arm-none-eabi.
[Bug tree-optimization/54742] Switch elimination in FSM loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742 jgreenhalgh at gcc dot gnu.org changed: What|Removed |Added CC||jgreenhalgh at gcc dot gnu.org --- Comment #27 from jgreenhalgh at gcc dot gnu.org --- Created attachment 31308 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31308&action=edit Dumps for less reduced testcase in comment 27 As of revision 205398, I'm not seeing this optimisation trigger when compiling the benchmark in question. I've attached the dumps from a less aggressively reduced version of the testcase given in the initial report, which we don't currently thread. This testcase is more representative of the control structure in the benchmark code. In particular, we have the problematic scenario of two 'joiner' blocks in the thread path. Looking at the dumps for this testcase I think that we would need to spot threads like: (17, 23) incoming edge; (23, 4) joiner; (4, 5) joiner; (5, 8) back-edge; (8, 15) switch-statement; The testcase I am using is:
---
int sum0, sum1, sum2, sum3;
int foo(char * s, char** ret)
{
  int state=0;
  char c;
  for (; *s && state != 4; s++)
    {
      c = *s;
      if (c == '*')
        {
          s++;
          break;
        }
      switch (state)
        {
        case 0:
          if (c == '+') state = 1;
          else if (c != '-') sum0+=c;
          break;
        case 1:
          if (c == '+') state = 2;
          else if (c == '-') state = 0;
          else sum1+=c;
          break;
        case 2:
          if (c == '+') state = 3;
          else if (c == '-') state = 1;
          else sum2+=c;
          break;
        case 3:
          if (c == '-') state = 2;
          else if (c == 'x') state = 4;
          break;
        default:
          break;
        }
    }
  *ret = s;
  return state;
}
[Bug tree-optimization/54742] Switch elimination in FSM loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742 jgreenhalgh at gcc dot gnu.org changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #28 from jgreenhalgh at gcc dot gnu.org --- I've REOPENED this bug for the less-reduced testcase given in #27. If anyone has objections, or thinks it would be more appropriate, I can open a new bug.
[Bug tree-optimization/19794] [meta-bug] Jump threading related bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19794 Bug 19794 depends on bug 54742, which changed state. Bug 54742 Summary: Switch elimination in FSM loop http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742 What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |---
[Bug tree-optimization/59471] New: ICE using vector extensions (non-top-level BIT_FIELD_REF, IMAGPART_EXPR or REALPART_EXPR)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59471 Bug ID: 59471 Summary: ICE using vector extensions (non-top-level BIT_FIELD_REF, IMAGPART_EXPR or REALPART_EXPR) Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org The following code: typedef unsigned char uint8x8_t __attribute__ ((__vector_size__ (8))); typedef unsigned short uint16x8_t __attribute__ ((__vector_size__ (16))); typedef unsigned long uint64x2_t __attribute__ ((__vector_size__ (16))); uint8x8_t foo (uint16x8_t x) { return (uint8x8_t) ((uint64x2_t) x)[0]; } Will give this ICE for current trunk on AArch64, ARM and X86_64: /work/build-x86/install/bin/gcc ../testcases/view-convert-expr.c -O3 ../testcases/view-convert-expr.c: In function ‘foo’: ../testcases/view-convert-expr.c:11:1: error: non-top-level BIT_FIELD_REF, IMAGPART_EXPR or REALPART_EXPR foo (uint16x8_t x) ^ BIT_FIELD_REF (x), 64, 0> ../testcases/view-convert-expr.c:13:3: note: in statement return (uint8x8_t) ((uint64x2_t) x)[0]; ^ D.1792 = VIEW_CONVERT_EXPR(BIT_FIELD_REF (x), 64, 0>); ../testcases/view-convert-expr.c:11:1: internal compiler error: verify_gimple failed foo (uint16x8_t x) ^ 0x9b5a5a verify_gimple_in_cfg(function*) ../../src/gcc/gcc/tree-cfg.c:4837 0x8df347 execute_function_todo ../../src/gcc/gcc/passes.c:1847 0x8dfb73 execute_todo ../../src/gcc/gcc/passes.c:1877 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. Looking at -fdump-tree-all-raw, I see this expression in view-convert-expr.c.004t.gimple: foo (uint16x8_t x) gimple_bind < uint8x8_t D.1792; gimple_assign (BIT_FIELD_REF (x), 64, 0>), NULL, NULL> gimple_return > For reference, my x86 compiler was configured as: Configured with: ../src/gcc/configure --prefix=/work/build-x86/install
[Bug c/88887] New: Warn on unexpected continuation of 'return' to new line in if statement.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88887 Bug ID: 88887 Summary: Warn on unexpected continuation of 'return' to new line in if statement. Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Target Milestone: --- A colleague tripped up on this typo:

void bar();
void foo (int x)
{
  if (x)
    return
  bar ();
}

Their intention was to return immediately if (x) holds, but they missed the semicolon after 'return' and, because bar() is declared with a void return type, didn't hit any warnings. In my opinion, it would be reasonable for -Wmisleading-indentation to cover a case like this. The related case:

void foo2 (int x)
{
  if (x)
    return
      bar ();
}

could also be warned about.
[Bug c++/85466] Performance is slow when doing 'branchless' conditional style math operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466 James Greenhalgh changed: What|Removed |Added CC||jgreenhalgh at gcc dot gnu.org --- Comment #3 from James Greenhalgh --- Created attachment 43988 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43988&action=edit Reduced testcase I believe this testcase shows the issue being reported here. Clang seems to spot this is essentially a memset across the array, while GCC doesn't. On AArch64 with Clang:

.LBB1_9: // =>This Inner Loop Header: Depth=1
  stp q0, q0, [x8, #-16]
  subs x20, x20, #8 // =8
  add x8, x8, #32 // =32
  b.ne .LBB1_9

On x86-64 with Clang:

.LBB1_9: # =>This Inner Loop Header: Depth=1
  movups %xmm0, -144(%rax,%rcx,4)
  movups %xmm0, -128(%rax,%rcx,4)
  movups %xmm0, -112(%rax,%rcx,4)
  movups %xmm0, -96(%rax,%rcx,4)
  movups %xmm0, -80(%rax,%rcx,4)
  movups %xmm0, -64(%rax,%rcx,4)
  movups %xmm0, -48(%rax,%rcx,4)
  movups %xmm0, -32(%rax,%rcx,4)
  movups %xmm0, -16(%rax,%rcx,4)
  movups %xmm0, (%rax,%rcx,4)
  addq $40, %rcx
  cmpq $100036, %rcx # imm = 0x186C4
  jne .LBB1_9

GCC doesn't spot this. On the other hand G++'s inlining of the various random number initialisation routines really hammers Clang, which ends up emulating 128-bit arithmetic on AArch64.
[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466 --- Comment #11 from James Greenhalgh --- With Jonathon's suggested change, copied in to the original poster's framework (without -fno-trapping-math), Clang hot loop ( score: 165065 http://quick-bench.com/6NaD8ay0f8qMh9n0aMriYEiuKNA ) is:

 0.16% movups 0x61a80(%r15,%rax,4),%xmm6
 1.15% movups 0x61a90(%r15,%rax,4),%xmm7
 0.60% movaps %xmm1,%xmm3
 5.44% cmpltps %xmm6,%xmm3
 0.44% movaps %xmm1,%xmm6
 0.40% cmpltps %xmm7,%xmm6
 0.44% movaps %xmm5,%xmm7
 4.97% andps %xmm3,%xmm7
 0.20% andnps %xmm4,%xmm3
 0.36% orps %xmm7,%xmm3
 1.04% movaps %xmm5,%xmm7
 4.97% andps %xmm6,%xmm7
 0.11% andnps %xmm4,%xmm6
 4.95% orps %xmm7,%xmm6
 5.53% movups %xmm3,0x61a80(%rbx,%rax,4)
 0.47% movups %xmm6,0x61a90(%rbx,%rax,4)
 4.42% movups 0x61aa0(%r15,%rax,4),%xmm3
20.42% movups 0x61ab0(%r15,%rax,4),%xmm6
 1.00% movaps %xmm1,%xmm7
 0.49% cmpltps %xmm3,%xmm7
 9.79% movaps %xmm1,%xmm3
 0.16% cmpltps %xmm6,%xmm3
 2.26% movaps %xmm5,%xmm6
 0.60% andps %xmm7,%xmm6
 4.20% andnps %xmm4,%xmm7
 1.18% orps %xmm6,%xmm7
 2.22% movaps %xmm5,%xmm6
 0.47% andps %xmm3,%xmm6
 4.24% andnps %xmm4,%xmm3
 4.88% movups %xmm7,0x61aa0(%rbx,%rax,4)
 0.27% orps %xmm6,%xmm3
 5.22% movups %xmm3,0x61ab0(%rbx,%rax,4)
 6.02% add $0x10,%rax
       jne 405b30

GCC hot loop ( score: 2385754 http://quick-bench.com/ehLe-aqkpXkkx2sHLd6TWq_p4g4 ) is:

 0.56% movss 0x0(%rbp,%rdx,1),%xmm0
 1.47% xor %eax,%eax
 2.00% subss %xmm2,%xmm0
 7.02% ucomiss %xmm1,%xmm0
 6.77% seta %al
 4.96% xor %ecx,%ecx
 0.25% ucomiss %xmm0,%xmm1
 0.84% pxor %xmm0,%xmm0
 0.09% seta %cl
 5.40% sub %ecx,%eax
 3.22% cvtsi2ss %eax,%xmm0
 9.87% ucomiss %xmm0,%xmm1
 6.53% ja 4053a8
10.24% mulss %xmm4,%xmm0
11.55% addss %xmm3,%xmm0
 5.46% movss %xmm0,(%rbx,%rdx,1)
 2.00% add $0x4,%rdx
       cmp $0x61a80,%rdx
       jne 405350

Daniel Elliott, does that better match your expectations? If so, I think this can be resolved as missed optimization of invalid code.
[Bug middle-end/85682] New: Regression: gcc.dg/tree-ssa/prefetch-5.c at r259995
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85682 Bug ID: 85682 Summary: Regression: gcc.dg/tree-ssa/prefetch-5.c at r259995 Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: luis.machado at linaro dot org Reporter: jgreenhalgh at gcc dot gnu.org CC: hjl.tools at gmail dot com, law at redhat dot com, luis.machado at linaro dot org Target Milestone: --- Target: x86-64-none-linux-gnu Hi, our bisect robot spotted failures in gcc.dg/tree-ssa/prefetch-9.c gcc.dg/tree-ssa/prefetch-8.c gcc.dg/tree-ssa/prefetch-7.c gcc.dg/tree-ssa/prefetch-6.c gcc.dg/tree-ssa/prefetch-3.c gcc.target/i386/opt-1.c gcc.target/i386/opt-2.c gcc.dg/tree-ssa/loop-28.c gcc.dg/tree-ssa/prefetch-5.c after revision r259995 on x86-64-none-linux-gnu. Would you mind taking a look?
[Bug middle-end/85682] Regression: gcc.dg/tree-ssa/prefetch-5.c at r259995
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85682 --- Comment #3 from James Greenhalgh --- The bisect robot doesn't bootstrap; it only builds a stage 1 compiler. I've checked your most recent patch against these testcases, and they execute and complete fine. (In reply to Luis Machado from comment #2) > I did a fresh x86-64 bootstrap with the changes in and those prefetch tests > are not executed as part of dg.exp. Running by hand they look sane to me. I'm sure this is just a typo, but you probably didn't mean "dg.exp" in this case - the prefetch tests are in tree-ssa.exp and the opt-1 and opt-2 tests are in i386.exp .
[Bug target/83663] [8 regression] aarch64_be regressions after r255946
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83663 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2018-01-03 CC||jgreenhalgh at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |jgreenhalgh at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from James Greenhalgh --- I spotted this too, the problem is (as it always is for big-endian vectors in GCC) the mismatch in lane numbering between our architecture and GCC's numbering. I'm working on a patch. Sorry for the inconvenience.
[Bug middle-end/84040] [8 regression] compilation time of gcc.c-torture/compile/limits-blockid.c is 50x slower
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84040 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-01-25 CC||jgreenhalgh at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from James Greenhalgh --- Confirmed on aarch64-none-linux-gnu. My bisect pointed to the same revision, r255569. The 50x slow-down is surprising, and may be much larger than expected? Otherwise we could work around this with -gno-statement-frontiers for this test.
[Bug lto/84242] New: [8 Regression] g++.dg/torture/pr67600.C at r257412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84242 Bug ID: 84242 Summary: [8 Regression] g++.dg/torture/pr67600.C at r257412 Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org Target Milestone: --- Target: aarch64-none-linux-gnu, x86-64-none-linux-gnu Hi, our testing robot spotted a failure in g++.dg/torture/pr67600.C after revision r257412 on aarch64-none-linux-gnu and x86-64-none-linux-gnu. Would you mind taking a look?
[Bug lto/84242] [8 Regression] g++.dg/torture/pr67600.C at r257412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84242 --- Comment #1 from James Greenhalgh --- Also gcc.target/i386/mvc9.c on x86-64-none-linux-gnu.
[Bug testsuite/84243] New: [8 Regression] gcc.target/i386/cet-intrin-4.c at r257414
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84243 Bug ID: 84243 Summary: [8 Regression] gcc.target/i386/cet-intrin-4.c at r257414 Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: testsuite Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org CC: itsimbal at gcc dot gnu.org Target Milestone: --- Target: x86-64-none-linux-gnu, aarch64-none-linux-gnu Hi, our bisect robot spotted a failure in gcc.target/i386/cet-intrin-3.c, gcc.target/i386/cet-intrin-4.c, after revision r257414 on x86-64-none-linux-gnu, and c-c++-common/fcf-protection-6.c and c-c++-common/fcf-protection-7.c on aarch64-none-linux-gnu. Would you mind taking a look? Your new tests will always FAIL on non-x86 targets (for example aarch64-none-linux-gnu). Is dg-error really the right directive? That is a guaranteed FAIL; I would expect a skip.
[Bug testsuite/84243] [8 Regression] gcc.target/i386/cet-intrin-4.c at r257414
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84243 --- Comment #2 from James Greenhalgh --- gcc -v: Configured with: .../gcc/configure --disable-bootstrap --enable-languages=c,c++,fortran --disable-multilib --disable-libsanitizer --prefix=.../build/install/ FAIL: gcc.target/i386/cet-intrin-3.c (internal compiler error) FAIL: gcc.target/i386/cet-intrin-3.c (test for excess errors) Excess errors: .../build/gcc/include/pmmintrin.h:35:9: internal compiler error: in ix86_option_override_internal, at config/i386/i386.c:4952 0xfa1687 ix86_option_override_internal .../gcc/config/i386/i386.c:4952 0xfaf246 ix86_valid_target_attribute_tree(tree_node*, gcc_options*, gcc_options*) .../gcc/config/i386/i386.c:5656 0x76b7cb ix86_pragma_target_parse .../gcc/config/i386/i386-c.c:539 0x743cd3 handle_pragma_target .../gcc/c-family/c-pragma.c:907 0x6c2349 c_parser_pragma .../gcc/c/c-parser.c:11122 0x6e600d c_parser_external_declaration .../gcc/c/c-parser.c:1624 0x6e6971 c_parser_translation_unit .../gcc/c/c-parser.c:1524 0x6e6971 c_parse_file() .../gcc/c/c-parser.c:18410 0x7417f5 c_common_parse_file() .../gcc/c-family/c-opts.c:1132 FAIL: gcc.target/i386/cet-intrin-4.c (test for excess errors) Excess errors: cc1: error: '-fcf-protection=full' requires Intel CET support. Use -mcet or both of -mibt and -mshstk options to enable CET
[Bug rtl-optimization/86685] [8/9 Regression] 436.cactusADM regression on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86685 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-07-30 CC||jgreenhalgh at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from James Greenhalgh --- On the platforms I'm looking at, this is equal to a 13% regression in dynamic instruction count, and a code size regression in the key loop. Confirmed.
[Bug target/84521] [8 Regression] aarch64: Frame-pointer corruption with setjmp/longjmp and -fomit-frame-pointer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-02-22 CC||ramana.radhakrishnan at arm dot com Ever confirmed|0 |1 --- Comment #2 from James Greenhalgh --- It is a bug that we have changed to -fomit-frame-pointer by default for AArch64. That changes a long standing ABI decision made at the dawn of the port, and promised as a feature of the architecture. I would like to see this fixed for GCC 8. Ramana was testing a patch to fix this and change us back to -fno-omit-frame-pointer; it (or someone else's patch achieving the same) would be appreciated as the immediate fix for this issue. I haven't validated the longer-term problem you mention with -fomit-frame-pointer. Ramana, can you pick this up and set us back to the appropriate default? Otherwise, I can spin a patch. We should fix this urgently, or we miss the good value that comes from whole-distribution testing.
[Bug tree-optimization/69556] New: [6 Regression] forwprop4/match.pd undoing work from recip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69556 Bug ID: 69556 Summary: [6 Regression] forwprop4/match.pd undoing work from recip Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Target Milestone: --- For this code compiled at -Ofast: double bar (double, double, double, double, double); double foo (double a) { return bar (1.0/a, 2.0/a, 4.0/a, 8.0/a, 16.0/a); } GCC 5 generates: foo: .LFB0: .cfi_startproc movsd .LC0(%rip), %xmm1 movsd .LC1(%rip), %xmm4 movsd .LC2(%rip), %xmm3 divsd %xmm0, %xmm1 movsd .LC3(%rip), %xmm2 mulsd %xmm1, %xmm4 movapd %xmm1, %xmm0 mulsd %xmm1, %xmm3 mulsd %xmm1, %xmm2 addsd %xmm1, %xmm1 jmp bar (i.e. one divide, 4 multiplies) GCC trunk at revision r232907 generates: foo: .LFB0: .cfi_startproc movapd %xmm0, %xmm5 movsd .LC0(%rip), %xmm4 movsd .LC4(%rip), %xmm0 movsd .LC1(%rip), %xmm3 movsd .LC2(%rip), %xmm2 movsd .LC3(%rip), %xmm1 divsd %xmm5, %xmm0 divsd %xmm5, %xmm4 divsd %xmm5, %xmm3 divsd %xmm5, %xmm2 divsd %xmm5, %xmm1 jmp bar (i.e. 5 divides) This is bad for performance. forwprop4 shows: Applying pattern match.pd:453, gimple-match.c:32116 gimple_simplified to _2 = 1.6e+1 / a_1(D); Applying pattern match.pd:453, gimple-match.c:32116 gimple_simplified to _3 = 8.0e+0 / a_1(D); Applying pattern match.pd:453, gimple-match.c:32116 gimple_simplified to _4 = 4.0e+0 / a_1(D); Applying pattern match.pd:453, gimple-match.c:32116 gimple_simplified to _5 = 2.0e+0 / a_1(D); This starts with r229107 which moves the (C1/X)*C2 into (C1*C2)/X pattern from fold-const.c to match.pd.
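For readers unfamiliar with the transformation, the sketch below writes out by hand the two shapes of code being compared: five separate divides, and one divide whose reciprocal is reused by four multiplies (the form the GCC 5 output corresponds to). The struct and function names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* Hypothetical container for the five arguments passed to bar.  */
struct quotients { double q1, q2, q4, q8, q16; };

/* Shape generated by trunk in the report: five divides.  */
struct quotients
five_divides (double a)
{
  return (struct quotients) { 1.0 / a, 2.0 / a, 4.0 / a, 8.0 / a, 16.0 / a };
}

/* Shape generated by GCC 5: one divide, then multiplies by the
   reciprocal.  In general this may round differently from the
   five-divide form, which is why it needs -Ofast style flags.  */
struct quotients
one_divide (double a)
{
  double r = 1.0 / a;
  return (struct quotients) { r, 2.0 * r, 4.0 * r, 8.0 * r, 16.0 * r };
}

/* Return 1 when both shapes give identical results for A.  */
int
same_results (double a)
{
  struct quotients d = five_divides (a);
  struct quotients m = one_divide (a);
  return d.q1 == m.q1 && d.q2 == m.q2 && d.q4 == m.q4
         && d.q8 == m.q8 && d.q16 == m.q16;
}
```

Since a divide is usually far more expensive than a multiply, undoing the second shape back into the first is the performance regression being reported.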
[Bug tree-optimization/69556] [6 Regression] forwprop4/match.pd undoing work from recip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69556 James Greenhalgh changed: What|Removed |Added CC||jgreenhalgh at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #3 from James Greenhalgh --- (In reply to Andrew Pinski from comment #2) > (In reply to Andrew Pinski from comment #1) > > I suspect we should disable "Fold (C1/X)*C2 into (C1*C2)/X" for gimple then > > and have it only for generic. > > Or check for single use of the divide. I had thought that was what the :s in the first line of pattern was trying to do: (simplify (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2) (if (flag_associative_math) (with { tree tem = const_binop (MULT_EXPR, type, @0, @2); } (if (tem) (rdiv { tem; } @1) If I capture the rdiv, and explicitly check it for single_use (as in the untested patch below), then the rule fails. So there's either a misunderstanding/disagreement here about what :s implies, or the match.pd machinery has a bug. diff --git a/gcc/match.pd b/gcc/match.pd index 5f28215..9460a9b 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -445,11 +445,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Fold (C1/X)*C2 into (C1*C2)/X. */ (simplify - (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2) + (mult (rdiv:s@3 REAL_CST@0 @1) REAL_CST@2) (if (flag_associative_math) (with { tree tem = const_binop (MULT_EXPR, type, @0, @2); } -(if (tem) +(if (tem && single_use (@3)) (rdiv { tem; } @1) /* Convert C1/(X*C2) into (C1/C2)/X */
[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570 --- Comment #2 from James Greenhalgh --- (In reply to Jakub Jelinek from comment #1) > I guess ifcvt only triggers some latent bug, either RA or more likely in > reg-stack. That said, all the comments about the r229822 changes say its > purpose is to handle multiple sets in the conditional block, but clearly the > patch as implemented considers one set to be also multiple sets. The > problem with that is that it handles it worse than the code later on in > ifcvt - it uses temporaries and hopes later passes get rid of those > temporaries, but they actually affect the register allocation. > By restricting the ifcvt multiple sets conversion to actually multiple sets > like: > --- gcc/ifcvt.c.jj 2016-01-21 17:53:32.0 +0100 > +++ gcc/ifcvt.c 2016-01-31 13:47:34.171323086 +0100 > @@ -3295,7 +3295,7 @@ bb_ok_for_noce_convert_multiple_sets (ba > if (count > limit) > return false; > > - return count > 0; > + return count > 1; > } > > /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to > convert > the test passes again, after ifcvt there are no additional unneeded > temporaries and e.g. postreload dump contains 5 fewer instructions, and has > fewer spills/fills. Of course we really need to figure out what the bug > actually is, but unless there is some strong reason (which should be > documented), IMHO the above patch is right too. Yes, that patch makes sense to me. If other ifcvt paths are doing a better job of handling a single register move than the multiple-set code, then we should use them.
[Bug target/69671] New: [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671 Bug ID: 69671 Summary: [6 Regression] FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org CC: kyrylo.tkachov at arm dot com Target Milestone: --- Target: x86_64-none-linux-gnu Starts with r233133: PASS->FAIL: gcc.target/i386/avx512vl-vpmovdb-1.c scan-assembler-times vpmovdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovdb-1.c scan-assembler-times vpmovdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovdb-1.c scan-assembler-times vpmovdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovdb-1.c scan-assembler-times vpmovdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovqb-1.c scan-assembler-times vpmovqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovqw-1.c scan-assembler-times vpmovqw[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovqw-1.c scan-assembler-times vpmovqw[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovqw-1.c scan-assembler-times vpmovqw[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? 
PASS->FAIL: gcc.target/i386/avx512vl-vpmovqw-1.c scan-assembler-times vpmovqw[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsdb-1.c scan-assembler-times vpmovsdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsdb-1.c scan-assembler-times vpmovsdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsdb-1.c scan-assembler-times vpmovsdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsdb-1.c scan-assembler-times vpmovsdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqb-1.c scan-assembler-times vpmovsqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqb-1.c scan-assembler-times vpmovsqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqb-1.c scan-assembler-times vpmovsqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqb-1.c scan-assembler-times vpmovsqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqw-1.c scan-assembler-times vpmovsqw[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqw-1.c scan-assembler-times vpmovsqw[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqw-1.c scan-assembler-times vpmovsqw[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovsqw-1.c scan-assembler-times vpmovsqw[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovusdb-1.c scan-assembler-times vpmovusdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovusdb-1.c scan-assembler-times vpmovusdb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? 
PASS->FAIL: gcc.target/i386/avx512vl-vpmovusdb-1.c scan-assembler-times vpmovusdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovusdb-1.c scan-assembler-times vpmovusdb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovusqb-1.c scan-assembler-times vpmovusqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovusqb-1.c scan-assembler-times vpmovusqb[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovusqb-1.c scan-assembler-times vpmovusqb[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(? PASS->FAIL: gcc.target/i386/avx512vl-vpmovusqb-1.c scan-assembler-times vpmovusqb[ \\t]+[^{\n]*%ymm[0-9]+
[Bug testsuite/69371] UNRESOLVED: special_functions/18_riemann_zeta/check_value.cc compilation failed to produce executable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69371 James Greenhalgh changed: What|Removed |Added Target|arm-none-eabi |arm-none-eabi, ||aarch64-none-elf, ||aarch64_be-none-elf Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-04 Ever confirmed|0 |1 --- Comment #6 from James Greenhalgh --- Confirmed, and also seen on aarch64-none-elf and aarch64_be-none-elf.
[Bug target/69841] Wrong template instantiation in C++11 on armv7l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-22 CC||jgreenhalgh at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from James Greenhalgh --- Confirmed, I'm trying to figure out what is going wrong.
[Bug target/69841] Wrong template instantiation in C++11 on armv7l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841 James Greenhalgh changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jgreenhalgh at gcc dot gnu.org --- Comment #2 from James Greenhalgh --- At the heart of the problem, the compiler has decided that the second parameter to this templated function has an overaligned member (64-byte aligned in f2, 8-byte aligned in f1). This gives different parameter passing rules, and you get the code difference above. I haven't figured out what causes the alignment to differ between the two TUs, or why the compiler feels it is safe to propagate the alignment information without specializing the function name. I'll take the bug while I look deeper.
[Bug target/69841] Wrong template instantiation in C++11 on armv7l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841 James Greenhalgh changed: What|Removed |Added CC||alan.lawrence at arm dot com, ||jakub at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #3 from James Greenhalgh --- I'm still confused by this. After coming out of the front end I checked the DECL_ALIGN for each field of each of the parameters being passed to this function. I see: f1.ii std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_emplace_hint_unique(std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::const_iterator, _Args&& ...) (struct _Rb_tree * const this (decl alignment: 32), struct const_iterator __pos Fields: _M_node (decl alignment: 32) _Rb_tree_const_iterator (decl alignment: 8) value_type (decl alignment: 8) reference (decl alignment: 32) pointer (decl alignment: 32) iterator (decl alignment: 8) iterator_category (decl alignment: 8) difference_type (decl alignment: 32) _Self (decl alignment: 8) _Base_ptr (decl alignment: 32) _Link_type (decl alignment: 32) (decl alignment: 32, max field alignment: 32), const struct piecewise_construct_t & __args#0 (decl alignment: 32), struct tuple & __args#1 (decl alignment: 32), struct tuple & __args#2 (decl alignment: 32)) f2.ii std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_emplace_hint_unique(std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::const_iterator, _Args&& ...) 
(struct _Rb_tree * const this (decl alignment: 32), struct const_iterator __pos Fields: _M_node (decl alignment: 32) _Rb_tree_const_iterator (decl alignment: 8) value_type (decl alignment: 64) reference (decl alignment: 32) pointer (decl alignment: 32) iterator (decl alignment: 32) iterator_category (decl alignment: 8) difference_type (decl alignment: 32) _Self (decl alignment: 8) _Base_ptr (decl alignment: 32) _Link_type (decl alignment: 32) (decl alignment: 32, max field alignment: 64), const struct piecewise_construct_t & __args#0 (decl alignment: 32), struct tuple & __args#1 (decl alignment: 32), struct tuple & __args#2 (decl alignment: 32)) --- That is to say, after gimplification we've already decided that the alignment of the value_type field of the std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::const_iterator parameter to std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_emplace_hint_unique in f2.ii is 64, whereas in f1.ii we don't have any extra alignment information. I know nothing about the C++ front-end and how we could end up in this situation. I can understand why, given this, we would generate the code we do for ARM.
[Bug target/69841] Wrong template instantiation in C++11 on armv7l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841 James Greenhalgh changed: What|Removed |Added CC||jason at gcc dot gnu.org --- Comment #4 from James Greenhalgh --- Goes away on trunk after r223301 Author: jason Date: Mon May 18 17:14:11 2015 + DR 1391 * pt.c (type_unification_real): Check convertibility here. (unify_one_argument): Not here. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@223301 After which the DECL_ALIGN in both TUs is 64, fixing the bug.
[Bug testsuite/70009] test case libgomp.oacc-c-c++-common/vprop.c fails starting with its introduction in r233607
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70009 James Greenhalgh changed: What|Removed |Added Target|powerpc*-*-*, aarch64-*-* |powerpc*-*-*, aarch64-*-*, ||arm*-*-* Last reconfirmed|2016-02-29 00:00:00 |2016-3-7 CC||jgreenhalgh at gcc dot gnu.org --- Comment #5 from James Greenhalgh --- Also failing on arm/aarch64 (so good further evidence of signed vs. unsigned char). Forcing the macro to use signed types clears the error for me on arm-none-linux-gnueabihf (though I don't know if this is correct).
[Bug target/69841] Wrong template instantiation in C++11 on armv7l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841 --- Comment #5 from James Greenhalgh --- I don't know enough about the C++ standard to know whether this patch is reasonable to backport to GCC 5. Jason, do you have an opinion?
[Bug testsuite/68232] gcc.dg/ifcvt-4.c fails on some arm configurations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232 --- Comment #8 from James Greenhalgh --- (In reply to Pat Haugen from comment #6) > (In reply to James Greenhalgh from comment #5) > > "Fixed" with the testsuite skips. Feel free to add any other target triplets > > for which this test is unreliable. > > I was going to modify the powerpc64le triplet to just powerpc*-*-* since it > also fails for powerpc64 (big endian) and powerpc-ibm-aix, but looking at > gcc/config/rs6000/rs6000.h, it has BRANCH_COST defined to a non-zero value: > > #define BRANCH_COST(speed_p, predictable_p) 3 > > > So there must be something more than just "doesn't work for targets with > branch cost == 0". I'm still happy to make the change if there are other > reasons, but didn't want to do so without hearing first. Sorry that I took a while to get round to looking at this. For powerpc64 you'll need to enable conditional move instructions using "-misel" (or equivalent) for this test to pass. For hppa64, the "experimental" movdicc pattern has this restriction: if (GET_MODE (XEXP (operands[1], 0)) != DImode || GET_MODE (XEXP (operands[1], 0)) != GET_MODE (XEXP (operands[1], 1))) But, we're trying to expand with this comparison in operands[1]: (le (subreg/s/u:SI (reg/v:DI 70 [ x+-4 ]) 4) (subreg/s/u:SI (reg/v:DI 71 [ y+-4 ]) 4)) so this test fails, and we fail to ifcvt the sequence. The test should be skipped on hppa64 until more complete support for conditional moves is added.
[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133 James Greenhalgh changed: What|Removed |Added Status|WAITING |NEW CC||jgreenhalgh at gcc dot gnu.org --- Comment #5 from James Greenhalgh --- The crux of this issue is going to be that your Cortex-A53 has no support for the cryptography extension, but does have support for the CRC extension. By inspection of host_detect_local_cpu, I see that we run through all the extensions that we know about, checking to see whether that extension is a substring of the Features we read from /proc/cpuinfo. If it is, we add +extension; if not, we add +noextension. So, it seems reasonable to me that if we run this algorithm on a core without crypto, but with CRC, we'll get the string described (-march=armv8-a+fp+simd+nocrypto+crc+nolse) forwarded to the assembler on the command line. And sure enough, the assembler wants to read everything you've got before you start telling it what you've not got. I see a few issues. 1) There's not really a good reason for an assembler to have this syntax restriction. The code does the right thing whatever order you put your features in. 2) We'll have to support these older assemblers anyway, so at the least we'll have to hold off writing the "+no" extension strings until we're done with the "+" extension strings. 3) We should think about whether we need to put out these +no extension strings at all. I don't like that for my older systems I'll need to keep updating my binutils to cover any new extension strings (e.g. +nolse) that are added by GCC if I want to use -march=native. We shouldn't force that if we don't have to. So, Confirmed.
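Point 2 above can be sketched as a two-pass walk over the extension table: emit every "+ext" first, then every "+noext". This is an illustrative sketch only, not the code in driver-aarch64.c. The table contents, the feature tokens and the name build_march are assumptions made for the example, and the substring test is kept deliberately naive to mirror the description in the comment.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical extension table: the -march suffix and the token we
   look for in the Features line of /proc/cpuinfo (both assumed).  */
struct ext
{
  const char *name;
  const char *feature;
};

static const struct ext exts[] = {
  { "fp", "fp" },
  { "simd", "asimd" },
  { "crypto", "aes" },
  { "crc", "crc32" },
  { "lse", "atomics" },
};

#define N_EXTS (sizeof exts / sizeof exts[0])

/* Build an -march value in BUF (size LEN) from a Features string.
   Pass 0 writes the "+ext" strings, pass 1 the "+noext" strings, so an
   assembler that wants all positives first is satisfied.  Returns the
   number of characters the full string needs.  */
size_t
build_march (const char *features, char *buf, size_t len)
{
  size_t used = (size_t) snprintf (buf, len, "armv8-a");

  for (int pass = 0; pass < 2; pass++)
    for (size_t i = 0; i < N_EXTS; i++)
      {
        int present = strstr (features, exts[i].feature) != NULL;

        /* Pass 0 handles present extensions, pass 1 absent ones.  */
        if (present != (pass == 0))
          continue;

        if (used < len)
          used += (size_t) snprintf (buf + used, len - used, "+%s%s",
                                     present ? "" : "no", exts[i].name);
      }
  return used;
}
```

Given a Features string of "fp asimd evtstrm crc32", this produces armv8-a+fp+simd+crc+nocrypto+nolse rather than the interleaved ordering from the report. It does nothing about point 3, which asks whether the "+no" strings should be written at all.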
[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133 James Greenhalgh changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jgreenhalgh at gcc dot gnu.org --- Comment #8 from James Greenhalgh --- (In reply to Christophe Lyon from comment #6) > > 3) We should think about whether we need to put out these +no extension > > strings at all. I don't like that for my older systems I'll need to keep > > updating my binutils to cover any new extension strings (e.g. +nolse) that > > are added by GCC if I want to use -march=native . We shouldn't force that if > > we don't have to. > > > > Do you know why these +no were introduced in the first place? > > Why would there be a difference between "+nolse" and "" for instance? We don't keep track (in aarch64-driver.c) of which flags are implicitly included (e.g. +fp+simd) and would need an explicit +nofp to disable, and which flags need to be explicitly enabled (e.g. +crc) and so don't need to be explicitly disabled. I'm working on a clean-up.
[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133 James Greenhalgh changed: What|Removed |Added Target||aarch64*-none-linux-gnu Host||aarch64*-none-linux-gnu Version|5.3.1 |6.0 Target Milestone|--- |6.0
[Bug rtl-optimization/68749] FAIL: gcc.dg/ifcvt-4.c scan-rtl-dump ce1 "2 true changes made"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68749 --- Comment #4 from James Greenhalgh --- Hi, sorry I missed this. I need to write a better filter for bugs I'm CCed on, I'll work on that. I'm hitting the limits of what I can guess from the Sparc machine files. I don't understand why we get an expansion for the conditional branch that explicitly generates new temporaries for i and j, necessitating an if..else.. structure. Compare how we expand on Sparc: --- (insn 12 5 13 2 (set (reg:CC 100 %icc) (compare:CC (subreg/s/u:SI (reg/v:DI 113 [ xD.1388+-4 ]) 4) (subreg/s/u:SI (reg/v:DI 114 [ yD.1389+-4 ]) 4))) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:12 -1 (nil)) (jump_insn 13 12 14 2 (set (pc) (if_then_else (le (reg:CC 100 %icc) (const_int 0 [0])) (label_ref:DI 29) (pc))) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:12 48 {*normal_branch} (int_list:REG_BR_PROB 3900 (nil)) -> 29) ;; succ: 4 [61.0%] (FALLTHRU) ;; 5 [39.0%] ;; basic block 4, loop depth 0, count 0, freq 6100, maybe hot ;; prev block 2, next block 5, flags: (NEW, REACHABLE, RTL, MODIFIED) ;; pred: 2 [61.0%] (FALLTHRU) (note 14 13 8 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn 8 14 9 4 (set (reg/v:SI 110 [ jD.1394 ]) (subreg:SI (reg/v:DI 115 [ aD.1390+-4 ]) 4)) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:15 -1 (nil)) (insn 9 8 26 4 (set (reg/v:SI 109 [ iD.1393 ]) (subreg:SI (reg/v:DI 115 [ aD.1390+-4 ]) 4)) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:14 -1 (nil)) (jump_insn 26 9 27 4 (set (pc) (label_ref 15)) -1 (nil) -> 15) ;; succ: 6 [100.0%] (barrier 27 26 29) ;; basic block 5, loop depth 0, count 0, freq 3900, maybe hot ;; prev block 4, next block 6, flags: (NEW, REACHABLE, RTL, MODIFIED) ;; pred: 2 [39.0%] (code_label 29 27 28 5 3 "" [1 uses]) (note 28 29 6 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn 6 28 7 5 (set (reg/v:SI 110 [ jD.1394 ]) (subreg:SI (reg/v:DI 114 [ yD.1389+-4 ]) 4)) -1 (nil)) (insn 7 6 15 5 (set (reg/v:SI 109 [ iD.1393 ]) (subreg:SI (reg/v:DI 113 [ xD.1388+-4 ]) 4)) -1 (nil)) ;; succ: 6 
[100.0%] (FALLTHRU) ;; basic block 6, loop depth 0, count 0, freq 1, maybe hot ;; prev block 5, next block 1, flags: (NEW, REACHABLE, RTL) ;; pred: 5 [100.0%] (FALLTHRU) ;; 4 [100.0%] (code_label 15 7 16 6 2 "" [1 uses]) (note 16 15 17 6 [bb 6] NOTE_INSN_BASIC_BLOCK) (insn 17 16 18 6 (set (reg:DI 117) (mult:DI (subreg:DI (reg/v:SI 109 [ iD.1393 ]) 0) (subreg:DI (reg/v:SI 110 [ jD.1394 ]) 0))) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:17 -1 (nil)) --- Where [bb 5] acts as an else block setting registers 109/110 to the "old" values. And the AArch64 expansion of the same: --- (insn 10 5 11 2 (set (reg:CC 66 cc) (compare:CC (reg/v:SI 74 [ xD.2750 ]) (reg/v:SI 75 [ yD.2751 ]))) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:12 -1 (nil)) (jump_insn 11 10 12 2 (set (pc) (if_then_else (le (reg:CC 66 cc) (const_int 0 [0])) (label_ref 13) (pc))) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:12 -1 (int_list:REG_BR_PROB 3900 (nil)) -> 13) ;; succ: 4 [61.0%] (FALLTHRU) ;; 5 [39.0%] ;; basic block 4, loop depth 0, count 0, freq 6100, maybe hot ;; prev block 2, next block 5, flags: (NEW, REACHABLE, RTL, MODIFIED) ;; pred: 2 [61.0%] (FALLTHRU) (note 12 11 6 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn 6 12 7 4 (set (reg/v:SI 75 [ yD.2751 ]) (reg/v:SI 76 [ aD.2752 ])) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:15 -1 (nil)) (insn 7 6 13 4 (set (reg/v:SI 74 [ xD.2750 ]) (reg/v:SI 76 [ aD.2752 ])) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:14 -1 (nil)) ;; succ: 5 [100.0%] (FALLTHRU) ;; basic block 5, loop depth 0, count 0, freq 1, maybe hot ;; prev block 4, next block 1, flags: (NEW, REACHABLE, RTL) ;; pred: 2 [39.0%] ;; 4 [100.0%] (FALLTHRU) (code_label 13 7 14 5 2 "" [1 uses]) (note 14 13 15 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn 15 14 16 5 (set (reg:SI 77) (mult:SI (reg/v:SI 74 [ xD.2750 ]) (reg/v:SI 75 [ yD.2751 ]))) ../../src/gcc/gcc/testsuite/gcc.dg/ifcvt-4.c:17 -1 (nil)) --- I guess it is those subregs down from DImode to SImode. 
Sure enough, if we swap int for long in this testcase, we get the expected expansion and the expected number of true changes made. So, I'm not worried that the optimization is broken for Sparc (it does the right thing for long), but I'm not sure I know the best way to work around this for your target. Swapping int for long would also help HPPA. HPPA chose to skip the test entirely. That might also be right for Sparc. What do you think?
[Bug c++/70494] Capturing an array of vectors in a lambda
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70494 James Greenhalgh changed: What|Removed |Added Target||*-*-* Status|UNCONFIRMED |NEW Last reconfirmed||2016-04-01 CC||jgreenhalgh at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from James Greenhalgh --- Fails for me on trunk and 5.3. Trunk backtrace for an aarch64-none-elf compiler (but the target doesn't matter, same fail on arm-none-eabi and a not-quite-trunk x86_64-none-linux-gnu): foo.cpp: In function ‘int main()’: foo.cpp:7:23: internal compiler error: tree check: expected record_type or union_type or qual_union_type, have array_type in build_special_member_call, at cp/call.c:7936 auto lambda = [v]{}; 0xf52300 tree_check_failed(tree_node const*, char const*, int, char const*, ...) .../tree.c:9643 0x5b24b2 tree_check3(tree_node*, char const*, int, char const*, tree_code, tree_code, tree_code) .../tree.h:3046 0x5b24b2 build_special_member_call(tree_node*, tree_node*, vec**, tree_node*, int, int) .../cp/call.c:7951 0x661169 split_nonconstant_init_1 .../cp/typeck2.c:695 0x66248d split_nonconstant_init(tree_node*, tree_node*) .../cp/typeck2.c:745 0x666ca1 store_init_value(tree_node*, tree_node*, vec**, int) .../cp/typeck2.c:850 0x5df656 check_initializer .../cp/decl.c:6150 0x5e4d52 cp_finish_decl(tree_node*, tree_node*, bool, tree_node*, int) .../cp/decl.c:6798 0x6e7109 cp_parser_init_declarator .../cp/parser.c:18658 0x6e73bb cp_parser_simple_declaration .../cp/parser.c:12379 0x6e7f7c cp_parser_block_declaration .../cp/parser.c:12248 0x6e80c6 cp_parser_declaration_statement .../cp/parser.c:11860 0x6c7b07 cp_parser_statement .../cp/parser.c:10528 0x6c7bea cp_parser_statement_seq_opt .../cp/parser.c:10806 0x6c7ce6 cp_parser_compound_statement .../cp/parser.c:10760 0x6e647d cp_parser_function_body .../cp/parser.c:20653 0x6e647d cp_parser_ctor_initializer_opt_and_function_body .../cp/parser.c:20689 0x6e677d cp_parser_function_definition_after_declarator .../cp/parser.c:25351 0x6e6b52 
cp_parser_function_definition_from_specifiers_and_declarator .../cp/parser.c:25263 0x6e6b52 cp_parser_init_declarator .../cp/parser.c:18429
[Bug target/67896] Inconsistent behaviour between C and C++ for types poly8x8_t and poly16x8_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67896 --- Comment #6 from James Greenhalgh --- Author: jgreenhalgh Date: Fri Apr 1 09:45:44 2016 New Revision: 234665 URL: https://gcc.gnu.org/viewcvs?rev=234665&root=gcc&view=rev Log: Backport: [PATCH] Do not set structural equality on polynomial types gcc/ChangeLog: PR target/67896 * config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtin_types): Do not set structural equality to __Poly{8,16,64,128}_t types. gcc/testsuite/ChangeLog: PR target/67896 * gcc.target/aarch64/simd/pr67896.C: New. Added: branches/gcc-5-branch/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C - copied unchanged from r232818, trunk/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C Modified: branches/gcc-5-branch/ (props changed) branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/aarch64/aarch64-builtins.c branches/gcc-5-branch/gcc/testsuite/ChangeLog Propchange: branches/gcc-5-branch/ ('svn:mergeinfo' modified)
[Bug target/67896] Inconsistent behaviour between C and C++ for types poly8x8_t and poly16x8_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67896 James Greenhalgh changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from James Greenhalgh --- Fixed on trunk and 5.
[Bug c++/70531] Turning optimisation level 2 causes the output program to go into infinite loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70531 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jgreenhalgh at gcc dot gnu.org Resolution|--- |INVALID --- Comment #1 from James Greenhalgh --- Try compiling and running with -fsanitize=undefined. You have a bug in your logic that results in an out-of-bounds memory access: .../ab2.cpp:97:26: runtime error: index -1 out of bounds for type 'long long int [101]' .../ab2.cpp:97:18: runtime error: index -1 out of bounds for type 'long long int [101][101][101]' Segmentation fault (core dumped) (At least) this condition is in the wrong place: if (xs > xe || ys > ye) return 0; When rec is called with arguments (0, -1, 0, -1) (as it will be), this condition comes after the memory dereference at: if (dp[xs][xe][ys][ye] != -1) return dp[xs][xe][ys][ye]; So you will be trying to access dp[0][-1][0][-1] - which is invalid. I haven't fully audited your code for other logic errors. Please check your algorithm. For simple inputs I always get a crash, not an infinite loop - but such is the nature of undefined behaviour. If your bug report relies on particular input to cause the loop, you'll need to provide that. As it stands, this looks invalid, but feel free to reopen it after you have audited your code for other undefined sequences.
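The ordering problem described above can be sketched in C as follows. The reporter's full program is not reproduced here, so the table size N, the init_dp helper and the recurrence inside rec are placeholders invented for the example. Only the position of the range check relative to the memo lookup reflects the advice in the comment.

```c
#include <string.h>

#define N 8   /* small table for the sketch; the report used 101 */

static long long dp[N][N][N][N];

/* Mark every memo entry as "not computed yet".  Filling with 0xFF
   bytes gives -1 in each long long element.  */
void
init_dp (void)
{
  memset (dp, -1, sizeof dp);
}

long long
rec (int xs, int xe, int ys, int ye)
{
  /* Empty-range check FIRST, so that a call such as rec (0, -1, 0, -1)
     returns before dp is ever indexed.  Callers must still keep
     non-empty ranges inside [0, N).  */
  if (xs > xe || ys > ye)
    return 0;

  if (dp[xs][xe][ys][ye] != -1)
    return dp[xs][xe][ys][ye];

  /* Placeholder recurrence (an assumption): count the cells in the
     rectangle one row at a time.  */
  long long result = (ye - ys + 1) + rec (xs + 1, xe, ys, ye);

  dp[xs][xe][ys][ye] = result;
  return result;
}
```

Building with -fsanitize=undefined, as suggested above, is still the quickest way to confirm that no other call path indexes the table out of bounds.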
[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133 --- Comment #10 from James Greenhalgh --- Author: jgreenhalgh Date: Mon Apr 11 10:14:59 2016 New Revision: 234876 URL: https://gcc.gnu.org/viewcvs?rev=234876&root=gcc&view=rev Log: [Patch AArch64 2/3] Rework the code to print extension strings (pr70133) gcc/ PR target/70133 * config/aarch64/aarch64-common.c (aarch64_option_extension): Keep track of a canonical flag name. (all_extensions): Likewise. (arch_to_arch_name): Also track extension flags enabled by the arch. (all_architectures): Likewise. (aarch64_parse_extension): Move to here. (aarch64_get_extension_string_for_isa_flags): Take a new argument, rework. (aarch64_rewrite_selected_cpu): Update for above change. * config/aarch64/aarch64-option-extensions.def: Rework the way flags are handled, such that the single explicit value enabled by an extension is kept separate from the implicit values it also enables. * config/aarch64/aarch64-protos.h (aarch64_parse_opt_result): Move to here. (aarch64_parse_extension): New. * config/aarch64/aarch64.c (aarch64_parse_opt_result): Move from here to config/aarch64/aarch64-protos.h. (aarch64_parse_extension): Move from here to common/config/aarch64/aarch64-common.c. (aarch64_option_print): Update. (aarch64_declare_function_name): Likewise. (aarch64_start_file): Likewise. * config/aarch64/driver-aarch64.c (arch_extension): Keep track of the canonical flag for extensions. * config.gcc (aarch64*-*-*): Extend regex for capturing extension flags. gcc/testsuite/ PR target/70133 * gcc.target/aarch64/mgeneral-regs_4.c: Fix expected output. * gcc.target/aarch64/target_attr_15.c: Likewise. 
Modified: trunk/gcc/ChangeLog trunk/gcc/common/config/aarch64/aarch64-common.c trunk/gcc/config.gcc trunk/gcc/config/aarch64/aarch64-option-extensions.def trunk/gcc/config/aarch64/aarch64-protos.h trunk/gcc/config/aarch64/aarch64.c trunk/gcc/config/aarch64/driver-aarch64.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_4.c trunk/gcc/testsuite/gcc.target/aarch64/target_attr_15.c
[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133 --- Comment #11 from James Greenhalgh --- Author: jgreenhalgh Date: Mon Apr 11 10:16:26 2016 New Revision: 234877 URL: https://gcc.gnu.org/viewcvs?rev=234877&root=gcc&view=rev Log: [Patch AArch64 3/3] Fix up for pr70133 gcc/ PR target/70133 * config/aarch64/driver-aarch64.c (aarch64_get_extension_string_for_isa_flags): New. (arch_extension): Rename to... (aarch64_arch_extension): ...This. (ext_to_feat_string): Rename to... (aarch64_extensions): ...This. (aarch64_core_data): Keep track of architecture extension flags. (cpu_data): Rename to... (aarch64_cpu_data): ...This. (aarch64_arch_driver_info): Keep track of architecture extension flags. (get_arch_name_from_id): Rename to... (get_arch_from_id): ...This, change return type. (host_detect_local_cpu): Update and reformat for renames, handle extensions through common infrastructure. Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/driver-aarch64.c
[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133 James Greenhalgh changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #12 from James Greenhalgh --- Fixed on trunk with r234875, r234876 and r234877. You'll need to contact Linaro through their support/bug channels if you think these fixes should be ported to the Linaro releases.
[Bug target/69841] Wrong template instantiation in C++11 on armv7l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69841 James Greenhalgh changed: What|Removed |Added CC||jason at redhat dot com --- Comment #6 from James Greenhalgh --- *ping*
[Bug c++/70657] testing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70657 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jgreenhalgh at gcc dot gnu.org Resolution|--- |INVALID --- Comment #1 from James Greenhalgh --- Please, stop this.
[Bug c/70707] INT_MAX used before it is defined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70707 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jgreenhalgh at gcc dot gnu.org Resolution|--- |INVALID --- Comment #1 from James Greenhalgh --- Hi Lewis, This bugzilla is for reporting bugs against GCC, rather than asking for usage help. Feel free to post the same message on gcc-h...@gcc.gnu.org where you're more likely to get an answer. Thanks, James Greenhalgh
[Bug target/70809] New: [AArch64] aarch64_vmls pattern should be rejected if -ffp-contract=off
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70809 Bug ID: 70809 Summary: [AArch64] aarch64_vmls pattern should be rejected if -ffp-contract=off Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Target Milestone: --- Target: aarch64*-*-* Take this simple testcase: void foo (float * __restrict__ __attribute__ ((aligned (16))) a, float * __restrict__ __attribute__ ((aligned (16))) x, float * __restrict__ __attribute__ ((aligned (16))) y, float * __restrict__ __attribute__ ((aligned (16))) z) { unsigned i = 0; for (i = 0; i < 256; i++) a[i] = x[i] - (y[i] * z[i]); } GCC for AArch64 (all versions) will generate a vectorized fmls instruction even when given the -ffp-contract=off option (for trunk and 6 you'll need to play with -mcpu options to find one which permits the combine through the cost model): (for trunk) $ gcc -O3 -ffp-contract=off -mcpu=xgene1 foo.c .L4: ldr q2, [x9, x4] add w5, w5, 1 ldr q1, [x8, x4] cmp w5, w7 ldr q0, [x10, x4] fmls v0.4s, v2.4s, v1.4s str q0, [x6, x4] add x4, x4, 16 bcc .L4 The problem seems pretty clear: the aarch64_vmls pattern needs to be tightened up so that it does not fuse multiplies and subtracts when we're not in -ffp-contract=fast. (define_insn "aarch64_vmls" [(set (match_operand:VDQF 0 "register_operand" "=w") (minus:VDQF (match_operand:VDQF 1 "register_operand" "0") (mult:VDQF (match_operand:VDQF 2 "register_operand" "w") (match_operand:VDQF 3 "register_operand" "w"] "TARGET_SIMD" "fmls\\t%0., %2., %3." [(set_attr "type" "neon_fp_mla__scalar")] )
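To see why the fusion is observable, and so why -ffp-contract=off has to be honoured, here is a small scalar sketch. The unfused form rounds the product to float before subtracting; a fused multiply-subtract does not round the product. The function names are invented for the example, and the second function only models the fused result by doing the arithmetic in double (where the product of two floats is exact) instead of calling a real fused instruction.

```c
/* x - (y * z) with the product rounded to float first, which is what
   -ffp-contract=off asks for.  The volatile store keeps the compiler
   from contracting the expression itself.  */
float
mls_unfused (float x, float y, float z)
{
  volatile float product = y * z;
  return x - product;
}

/* Model of the fused form: the float product is exact in double, so
   it is not rounded before the subtraction.  (An approximation of
   fmls, used here for illustration only.)  */
float
mls_fused_model (float x, float y, float z)
{
  return (float) ((double) x - (double) y * (double) z);
}
```

For example, with y = z = 1 + 2^-12 and x = 1 + 2^-11, the exact product is 1 + 2^-11 + 2^-24. Rounding it to float gives 1 + 2^-11, so the unfused result is 0, while the unrounded product gives -2^-24.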
[Bug target/66200] GCC for ARM / AArch64 doesn't define TARGET_RELAXED_ORDERING
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66200 James Greenhalgh changed: What|Removed |Added Status|ASSIGNED|RESOLVED CC||jgreenhalgh at gcc dot gnu.org Resolution|--- |FIXED --- Comment #11 from James Greenhalgh --- Looks like this is fixed on all live branches. Ramana, please reopen if there is something more to be done that I've missed.
[Bug tree-optimization/71478] New: ICE in tree-ssa-reassoc.c after r236564
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71478 Bug ID: 71478 Summary: ICE in tree-ssa-reassoc.c after r236564 Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org CC: kugan.vivekanandarajah at linaro dot org Target Milestone: --- Created attachment 38671 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38671&action=edit Reduced testcase I took a look at PR71170, PR71230, PR71252 and PR71281, but they seemed subtly different to my issue. The attached testcase fails for me on trunk with -O1 on x86_64-unknown-linux-gnu and aarch64-none-linux-gnu. The ICE looks like: x.c: In function 'foo': x.c:7:1: internal compiler error: gimple check: expected gimple_assign(error_mark), have gimple_call() in gimple_assign_rhs1, at gimple.h:2493 foo (void) ^~~ 0x856d7b gimple_check_failed(gimple const*, char const*, int, char const*, gimple_code, tree_code) .../gcc/gimple.c:1177 0xc92547 GIMPLE_CHECK2 .../gcc/gimple.h:73 0xc92547 gimple_assign_rhs1 .../gcc/gimple.h:2493 0xc96d28 rewrite_expr_tree .../gcc/tree-ssa-reassoc.c:3834 0xc97112 rewrite_expr_tree .../gcc/tree-ssa-reassoc.c:3931 0xca1970 reassociate_bb .../gcc/tree-ssa-reassoc.c:5372 0xca1c07 reassociate_bb .../gcc/tree-ssa-reassoc.c:5414 0xca21c0 do_reassoc .../gcc/tree-ssa-reassoc.c:5528 0xca21c0 execute_reassoc .../gcc/tree-ssa-reassoc.c:5615 0xca21c0 execute .../gcc/tree-ssa-reassoc.c:5654 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions.
[Bug target/81456] [7/8 Regression] x86-64 optimizer makes wrong decision when optimizing for size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81456 --- Comment #2 from James Greenhalgh --- (In reply to Martin Liška from comment #1) > Confirmed, started with r238594. The cost model relies on the target giving a reasonable approximation for an instruction size through ix86_rtx_costs. The basic branch structure looks like: t = mod if (a / b % 2) t = b - mod In RTL, this looks like: (insn 14 13 15 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 99) (const_int 0 [0]))) "foo.c":5 3 {*cmpsi_ccno_1} (expr_list:REG_DEAD (reg:SI 99) (nil))) (jump_insn 15 14 16 2 (set (pc) (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref:DI 22) (pc))) "foo.c":5 617 {*jcc_1} (expr_list:REG_DEAD (reg:CCZ 17 flags) (int_list:REG_BR_PROB 2 (nil))) -> 22) (note 16 15 17 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn 17 16 22 3 (parallel [ (set (reg/v:SI 93 [ ]) (minus:SI (reg/v:SI 95 [ b ]) (reg/v:SI 93 [ ]))) (clobber (reg:CC 17 flags)) ]) "foo.c":5 273 {*subsi_1} (expr_list:REG_DEAD (reg/v:SI 95 [ b ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (code_label 22 17 25 4 1 (nil) [1 uses]) That is to say, we're starting with a comparison, a branch and a subtract. We want to know if that sequence is cheaper than a subtract and a conditional select. In the cost model, we take an approximation for the branch and comparison of COSTS_N_INSNS (2) and the backend tells us the cost of a subtract is COSTS_N_INSNS (1). Thus, the cost before transformation is COSTS_N_INSNS (3) == 12. 
After the transformation, we create this RTL: (insn 31 0 32 (set (reg:SI 102) (reg/v:SI 93 [ ])) 82 {*movsi_internal} (nil)) (insn 32 31 33 (parallel [ (set (reg:SI 101) (minus:SI (reg/v:SI 95 [ b ]) (reg/v:SI 93 [ ]))) (clobber (reg:CC 17 flags)) ]) 273 {*subsi_1} (nil)) (insn 33 32 34 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 99) (const_int 0 [0]))) 3 {*cmpsi_ccno_1} (nil)) (insn 34 33 0 (set (reg/v:SI 93 [ ]) (if_then_else:SI (ne (reg:CCZ 17 flags) (const_int 0 [0])) (reg:SI 101) (reg:SI 102))) 966 {*movsicc_noc} (nil)) That is a set to protect the "false" value, the same subtract, a comparison to set the flags, and a conditional move. When we ask the backend to give us costs for this it gives us COSTS_N_INSNS (1) for the set, COSTS_N_INSNS (1) for the subtract, COSTS_N_INSNS (1) for the comparison, and COSTS_N_INSNS (2) for the conditional move. That's a total cost of COSTS_N_INSNS (5) == 20 for the whole sequence. 20 > 12, so from the perspective of the ifcvt cost model this is a bad transformation. Note that ifcvt is not aware that an extra set will be introduced after the original subtract, nor does it care about the final movl %edx, %eax as that is unconditional. I think it is being asked to trade test, branch, subtract for set, subtract, test, conditional move - when you spell it out like that it should be clear why it makes the decision it does. I can't reproduce your comment about -m32 - I still see branches at -Os.
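The arithmetic in this comment can be replayed with a short C sketch. COSTS_N_INSNS is the GCC macro (one instruction is four cost units); the function names and the "not more expensive" profitability test are assumptions made for illustration, not the actual ifcvt code.

```c
/* As in GCC's rtl.h: the cost of N simple instructions.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Cost of the original sequence: compare + branch, then subtract.  */
int cost_before_ifcvt (void)
{
  int compare_and_branch = COSTS_N_INSNS (2);
  int subtract = COSTS_N_INSNS (1);
  return compare_and_branch + subtract;
}

/* Cost of the if-converted sequence, using the per-insn costs the
   comment reports from the x86 backend.  */
int cost_after_ifcvt (void)
{
  int set = COSTS_N_INSNS (1);
  int subtract = COSTS_N_INSNS (1);
  int compare = COSTS_N_INSNS (1);
  int conditional_move = COSTS_N_INSNS (2);
  return set + subtract + compare + conditional_move;
}

/* Assumed rule: convert only if the new sequence is not more
   expensive than the old one.  */
int ifcvt_is_profitable (void)
{
  return cost_after_ifcvt () <= cost_before_ifcvt ();
}
```

Here cost_before_ifcvt () is 12 and cost_after_ifcvt () is 20, so the sketch rejects the conversion, matching the decision described above.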
[Bug middle-end/81832] [8 Regression] ICE in expand_LOOP_DIST_ALIAS, at internal-fn.c:2273
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81832 James Greenhalgh changed: What|Removed |Added CC||amker at gcc dot gnu.org --- Comment #2 from James Greenhalgh --- (In reply to Martin Liška from comment #1) > Confirmed, started with r250619. Interesting. That commit seems unlikely to have broken anything (if it did, the bug would be latent and would have been possible to trigger using the prior revision). My bisect points to r250959, which seems much more likely, given the backtrace. What I imagine you've done with your bisect is continued back through the revisions with -ftree-loop-distribute set; that does get you to r250619, but as this is also really just a change to default "options", you should continue going back with -ftree-vectorize to find the real culprit. For example, r250617 will also ICE with -O3 -ftree-loop-distribute -ftree-vectorize. I think this is a general and latent problem with the interaction between the copy-header pass and the loop distribution pass. Tracing back further I see this start with r249994.
[Bug rtl-optimization/82237] New: [AArch64] Destructive operations result in poor register allocation after scheduling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82237 Bug ID: 82237 Summary: [AArch64] Destructive operations result in poor register allocation after scheduling Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Target Milestone: --- A destructive operation is one in which an input operand is both read and written. For example, in the vector FMLA instruction in AArch64: FMLA v0.4s, v1.4s, v2.4s The first operand is used for the accumulator value (the operation is v0 = v0 + v1 * v2) and is both read and written by the instruction. In RTL terms, this is: (define_insn "fma<mode>4" [(set (match_operand:VHSDF 0 "register_operand" "=w") (fma:VHSDF (match_operand:VHSDF 1 "register_operand" "w") (match_operand:VHSDF 2 "register_operand" "w") (match_operand:VHSDF 3 "register_operand" "0")))] "TARGET_SIMD" "fmla\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" [(set_attr "type" "neon_fp_mla_<stype><q>")] ) from config/aarch64/aarch64-simd.md. We can get suboptimal code where a read/write operand is used both by a destructive operation and by a non-destructive operation, and the destructive operation is scheduled before the non-destructive operation. For example, with this auto-vectorizable code (with trunk, -O3 -mcpu=cortex-a57): void foo (float* __restrict__ in1, float* __restrict__ in2, float* __restrict__ out1, float* __restrict__ out2) { for (int i = 0; i < 1024; i++) { float t = out1[i]; out1[i] = t + in1[i] * in2[i]; out2[i] = t + in1[i]; } } ldr q1, [x2, x4] ldr q0, [x0, x4] ldr q2, [x1, x4] mov v3.16b, v1.16b // <<<<<< 1) fmla v3.4s, v2.4s, v0.4s // <<<<<< 2) fadd v0.4s, v0.4s, v1.4s // <<<<<< 3) str q3, [x2, x4] str q0, [x3, x4] The scheduling of 2) before 3) forces a reload from v1 into v3 at 1). 
With an improved schedule, this could be: ldr q1, [x2, x4] ldr q0, [x0, x4] ldr q2, [x1, x4] fadd v4.4s, v0.4s, v1.4s // <<<<<< 3) fmla v1.4s, v2.4s, v0.4s // <<<<<< 2) str q1, [x2, x4] str q4, [x3, x4] In larger loops, we can end up in this situation more frequently than we would like - the cost of the extra move instructions can be high.
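The effect of the ordering can be modelled with a scalar C sketch in which a destructive multiply-accumulate overwrites its accumulator. All names here are hypothetical, and the move counter only models the extra register copy; it says nothing about how the scheduler or register allocator are implemented.

```c
/* Destructive multiply-accumulate: *acc is both read and written,
   like the first operand of FMLA.  */
void fmla (float *acc, float a, float b)
{
  *acc = *acc + a * b;
}

/* Destructive operation first: t is still needed by the add, so it
   must be copied before it is clobbered.  Returns the move count.  */
int schedule_fmla_first (float t, float in1, float in2,
                         float *out1, float *out2)
{
  int moves = 0;
  float copy = t;          /* 1) extra move to preserve t */
  moves++;
  fmla (&copy, in2, in1);  /* 2) clobbers the copy */
  *out2 = in1 + t;         /* 3) still reads the original t */
  *out1 = copy;
  return moves;
}

/* Non-destructive operation first: t is dead after the add, so the
   multiply-accumulate can clobber it in place.  */
int schedule_fadd_first (float t, float in1, float in2,
                         float *out1, float *out2)
{
  *out2 = in1 + t;         /* 3) last non-destructive use of t */
  fmla (&t, in2, in1);     /* 2) clobbers t directly */
  *out1 = t;
  return 0;
}
```

Both orderings compute the same out1 and out2; only the first one pays for a copy.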
[Bug rtl-optimization/82237] [AArch64] Destructive operations result in poor register allocation after scheduling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82237 James Greenhalgh changed: What|Removed |Added Target||aarch64*-*-* Status|UNCONFIRMED |NEW Last reconfirmed||2017-09-18 Ever confirmed|0 |1
[Bug testsuite/77634] some vectorized testcases fail with -mcpu=thunderx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77634 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jgreenhalgh at gcc dot gnu.org Resolution|--- |FIXED --- Comment #2 from James Greenhalgh --- Comment 1 claims this is fixed, Andrew, please reopen if it is still an issue.
[Bug target/63250] Complex fp16 arithmetic uses nonexistent libgcc functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63250 James Greenhalgh changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from James Greenhalgh --- This should have been fixed by my work last year, I think.
[Bug tree-optimization/79534] [7 Regression] tree-ifcombine aarch64 performance regression with trunk@245151
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79534 --- Comment #8 from James Greenhalgh --- In the case before Honza's patch, corrupt profile information leads to a branch being marked as 100% taken. After Honza's patch, the branch is instead seen with 95.6% taken: (jump_insn 1916 1915 1922 309 (set (pc) (if_then_else (ne (reg:CC 66 cc) (const_int 0 [0])) (label_ref 1905) (pc))) "foo.cpp":59 9 {condjump} (expr_list:REG_DEAD (reg:CC 66 cc) (int_list:REG_BR_PROB 1 (nil))) -> 1905) ;; succ: 227 [95.6%] ;; 226 [4.4%] (FALLTHRU) That's enough for GCC to consider the branch unpredictable, which in turn causes GCC to use the "unpredictable" number for BRANCH_COST when setting the maximum , which when tuning for Cortex-A57 is 1 for predictable branches (not high enough to trigger the transform) and 3 for unpredictable branches (high enough to trigger the transform). That explains why we don't see the performance difference for -mcpu=generic, where BRANCH_COST always returns 2 - which is always high enough to trigger this if-conversion. The cost model looks reasonable, this is clearly a borderline case for the heuristic. The only thing I found surprising in my analysis of this regression is that GCC considers a 95.6% taken branch as unpredictable. I'm not sure what the correct course for fixing this is - nothing in the compiler seems to be broken, we're just on an unlucky side of the static prediction engine and the ifcvt heuristics.
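A rough C sketch of why a 95.6% taken branch ends up on the "unpredictable" side. It assumes GCC's default predictable-branch-outcome parameter of 2 (percent) and a probability base of 10000; the branch-cost values are the ones quoted in the comment above, and the trigger threshold of 2 is inferred from that comment rather than taken from the source.

```c
#define REG_BR_PROB_BASE 10000

/* Assumed default of the predictable-branch-outcome parameter (%).  */
#define PREDICTABLE_BRANCH_OUTCOME 2

/* An edge counts as predictable only if it is taken almost never or
   almost always, i.e. within 2% of either extreme.  */
int edge_is_predictable (int prob)
{
  int cutoff = PREDICTABLE_BRANCH_OUTCOME * REG_BR_PROB_BASE / 100;
  return prob <= cutoff || prob >= REG_BR_PROB_BASE - cutoff;
}

/* Branch costs as described in the comment for each tuning.  */
int branch_cost_cortex_a57 (int predictable)
{
  return predictable ? 1 : 3;
}

int branch_cost_generic (int predictable)
{
  (void) predictable;
  return 2;
}

/* Hypothetical threshold: 1 is too low, 2 and 3 are high enough.  */
int transform_triggers (int branch_cost)
{
  return branch_cost >= 2;
}
```

With a probability of 9558 the edge misses the 98% cutoff, so Cortex-A57 tuning uses a cost of 3 and the transform fires; at 10000 (the corrupt 100% profile) it would have used 1 and not fired.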
[Bug tree-optimization/79534] [7 Regression] tree-ifcombine aarch64 performance regression with trunk@245151
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79534 --- Comment #10 from James Greenhalgh --- The most striking improvement was in libquantum, for which we saw a 15% performance improvement on Cortex-A72 (3% on Cortex-A57) directly attributable to basic block ordering after this patch. Otherwise, I don't have a direct before/after comparison for just Honza's patch across a wider set of benchmarks, but our nightly runs show general improvements in benchmarks from SPEC which are sensitive to block reordering after the day of the patch. I don't see any large regressions in this time.
[Bug tree-optimization/79534] [7/8 Regression] tree-ifcombine aarch64 performance regression with trunk@245151
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79534 --- Comment #12 from James Greenhalgh --- So while there's nothing buggy about the if-conversion which causes the performance issue, it does show an interesting missed optimization that ifcvt can't handle. We make the transform through find_if_case_2, which looks for things of the form: /* TEST BLOCK */ if (test) goto E; // x not live /* FALLTHRU */ /* ELSE BLOCK */ x = big(); goto L; E: /* THEN BLOCK */ x = b; goto M; And transforms them to: /* Unconditional copy of THEN BLOCK */ x = b; /* TEST BLOCK */ if (test) goto M; /* ELSE BLOCK */ x = big(); goto L; In the testcase, using the naming conventions above, and snipping irrelevant details, this looks like: TEST BLOCK (309) ;; basic block 309, loop depth 4, count 0, freq 3153, maybe hot ;; prev block 308, next block 311, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 219 [98.0%] (insn 1915 1914 1916 309 (set (reg:CC 66 cc) (compare:CC (reg:SI 1117) (const_int 0 [0]))) "foo.cpp":59 391 {cmpsi} (expr_list:REG_DEAD (reg:SI 1117) (nil))) (jump_insn 1916 1915 1922 309 (set (pc) (if_then_else (ne (reg:CC 66 cc) (const_int 0 [0])) (label_ref 1905) (pc))) "foo.cpp":59 9 {condjump} (expr_list:REG_DEAD (reg:CC 66 cc) (int_list:REG_BR_PROB 9558 (nil))) -> 1905) ;; succ: 227 [95.6%] ;; 226 [4.4%] (FALLTHRU) ELSE BLOCK (226) ;; basic block 226, loop depth 4, count 0, freq 201, maybe hot ;; prev block 224, next block 227, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 309 [4.4%] (FALLTHRU) ;; 311 [3.8%] (FALLTHRU) (code_label 1917 1413 1417 226 141 (nil) [0 uses]) (note 1417 1917 1418 226 [bb 226] NOTE_INSN_BASIC_BLOCK) (insn 1418 1417 1905 226 (set (reg:SI 690 [ _1517 ]) (plus:SI (reg:SI 703 [ ivtmp.56D.5375 ]) (const_int -3 [0xfffd]))) 95 {*addsi3_aarch64} (nil)) ;; succ: 237 [100.0%] (FALLTHRU) THEN BLOCK (227) ;; basic block 227, loop depth 4, count 0, freq 3013, maybe hot ;; prev block 226, next block 228, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 309 [95.6%] (code_label 1905 
1418 1421 227 140 (nil) [1 uses]) (note 1421 1905 1422 227 [bb 227] NOTE_INSN_BASIC_BLOCK) (insn 1422 1421 1802 227 (set (reg:SI 690 [ _1517 ]) (plus:SI (reg:SI 703 [ ivtmp.56D.5375 ]) (const_int -3 [0xfffd]))) 95 {*addsi3_aarch64} (nil)) ;; succ: 237 [100.0%] (FALLTHRU) So the interesting thing is that the THEN block and the ELSE block are as good as identical! Both compute (plus (reg 703) (const_int -3)) and both fall through to block 237. The normal if-convert machinery won't catch this because basic block 226 (the ELSE block) has multiple predecessors. But the transformation we make through find_if_case_2 ends up looking silly! (again, snipping some unrelated details/insns): TEST BLOCK (279) ;; basic block 279, loop depth 4, count 0, freq 3153, maybe hot ;; prev block 278, next block 280, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 203 [98.0%] /* Unconditional copy of THEN BLOCK. */ (insn 1422 1914 1915 279 (set (reg:SI 690 [ _1517 ]) (plus:SI (reg:SI 703 [ ivtmp.56D.5375 ]) (const_int -3 [0xfffd]))) 95 {*addsi3_aarch64} (nil)) (insn 1915 1422 1916 279 (set (reg:CC 66 cc) (compare:CC (reg:SI 1117) (const_int 0 [0]))) "foo.cpp":59 391 {cmpsi} (expr_list:REG_DEAD (reg:SI 1117) (nil))) (jump_insn 1916 1915 1922 279 (set (pc) (if_then_else (ne (reg:CC 66 cc) (const_int 0 [0])) (label_ref:DI 1470) (pc))) "foo.cpp":59 9 {condjump} (expr_list:REG_DEAD (reg:CC 66 cc) (int_list:REG_BR_PROB 9558 (nil))) -> 1470) ;; succ: 218 [95.6%] ;; 209 [4.4%] (FALLTHRU) ELSE BLOCK (209): ;; basic block 209, loop depth 4, count 0, freq 201, maybe hot ;; prev block 208, next block 210, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 279 [4.4%] (FALLTHRU) ;; 280 [3.8%] (FALLTHRU) (code_label 1917 1413 1417 209 141 (nil) [0 uses]) (note 1417 1917 1418 209 [bb 209] NOTE_INSN_BASIC_BLOCK) (insn 1418 1417 1802 209 (set (reg:SI 690 [ _1517 ]) (plus:SI (reg:SI 703 [ ivtmp.56D.5375 ]) (const_int -3 [0xfffd]))) 95 {*addsi3_aarch64} (nil)) ;; succ: 218 [100.0%] (FALLTHRU) Note that if we are on the 
"else" path, we now we compute (pl
[Bug target/80530] New: [7 Regression][AArch64] ICE when expanding reciprocal square root with -mcpu=exynos-m1 or -mcpu=xgene-1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80530 Bug ID: 80530 Summary: [7 Regression][AArch64] ICE when expanding reciprocal square root with -mcpu=exynos-m1 or -mcpu=xgene-1 Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Target Milestone: --- This testcase: double bar (double a) { return 1.0/__builtin_sqrt(a); } Fails with an ICE on AArch64 with the options: gcc -funsafe-math-optimizations -O1 foo.c -mcpu=xgene1 on Linux. g.c: In function ‘bar’: g.c:11:14: internal compiler error: in expand_insn, at optabs.c:7130 return 1.0/__builtin_sqrt(a); ^ 0xa70a15 expand_insn(insn_code, unsigned int, expand_operand*) .../gcc/optabs.c:7130 0x94589e expand_direct_optab_fn .../gcc/internal-fn.c:2600 0x71d4b7 expand_call_stmt .../gcc/cfgexpand.c:2569 0x71d4b7 expand_gimple_stmt_1 .../gcc/cfgexpand.c:3571 0x71d4b7 expand_gimple_stmt .../gcc/cfgexpand.c:3737 0x71ee69 expand_gimple_basic_block .../gcc/cfgexpand.c:5744 0x7247d6 execute .../gcc/cfgexpand.c:6357 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. The problem will be somewhere in the approximate square root expander, as the same ICE does not occur for -mcpu values which do not use the approximate square root expansion path.
[Bug tree-optimization/80457] vectorizable_condition does not update the vectorizer cost model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80457 --- Comment #3 from James Greenhalgh --- (In reply to Bill Schmidt from comment #2) > Per https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00967.html, James > Greenhalgh has a more comprehensive patch for this, so removing myself from > the Assignee field and will await his patch. Thanks, James! I'm out of office until June; would you mind applying the patch on my behalf (and reverting it if anything goes wrong!) in my absence? Thanks!
[Bug tree-optimization/80457] vectorizable_condition does not update the vectorizer cost model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80457 --- Comment #7 from James Greenhalgh --- Thanks for your help!
[Bug target/71778] [6/7/8 Regression][ARM] ICE using non-constant argument to Neon intrinsic that requires constant arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71778 --- Comment #7 from James Greenhalgh --- Author: jgreenhalgh Date: Fri Jun 16 17:29:56 2017 New Revision: 249272 URL: https://gcc.gnu.org/viewcvs?rev=249272&root=gcc&view=rev Log: [Patch ARM] Fix PR71778 gcc/ PR target/71778 * config/arm/arm-builtins.c (arm_expand_builtin_args): Return TARGET if given a non-constant argument for an intrinsic which requires a constant. gcc/testsuite/ PR target/71778 * gcc.target/arm/pr71778.c: New. Added: trunk/gcc/testsuite/gcc.target/arm/pr71778.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm-builtins.c trunk/gcc/testsuite/ChangeLog
[Bug target/71778] [6/7/8 Regression][ARM] ICE using non-constant argument to Neon intrinsic that requires constant arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71778 --- Comment #8 from James Greenhalgh --- Author: jgreenhalgh Date: Mon Jun 19 16:58:03 2017 New Revision: 249379 URL: https://gcc.gnu.org/viewcvs?rev=249379&root=gcc&view=rev Log: Backport: [Patch ARM] Fix PR71778 gcc/ PR target/71778 * config/arm/arm-builtins.c (arm_expand_builtin_args): Return TARGET if given a non-constant argument for an intrinsic which requires a constant. gcc/testsuite/ PR target/71778 * gcc.target/arm/pr71778.c: New. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/pr71778.c - copied unchanged from r249272, trunk/gcc/testsuite/gcc.target/arm/pr71778.c Modified: branches/gcc-7-branch/ (props changed) branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/arm/arm-builtins.c branches/gcc-7-branch/gcc/testsuite/ChangeLog Propchange: branches/gcc-7-branch/ ('svn:mergeinfo' added)
[Bug target/71778] [6/7/8 Regression][ARM] ICE using non-constant argument to Neon intrinsic that requires constant arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71778 --- Comment #9 from James Greenhalgh --- Author: jgreenhalgh Date: Mon Jun 19 17:12:12 2017 New Revision: 249380 URL: https://gcc.gnu.org/viewcvs?rev=249380&root=gcc&view=rev Log: Backport: [Patch ARM] Fix PR71778 gcc/ PR target/71778 * config/arm/arm-builtins.c (arm_expand_builtin_args): Return TARGET if given a non-constant argument for an intrinsic which requires a constant. gcc/testsuite/ PR target/71778 * gcc.target/arm/pr71778.c: New. Added: branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/pr71778.c - copied unchanged from r249272, trunk/gcc/testsuite/gcc.target/arm/pr71778.c Modified: branches/gcc-6-branch/ (props changed) branches/gcc-6-branch/gcc/ (props changed) branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/config/arm/arm-builtins.c branches/gcc-6-branch/gcc/testsuite/ChangeLog Propchange: branches/gcc-6-branch/ ('svn:mergeinfo' modified) Propchange: branches/gcc-6-branch/gcc/ ('svn:mergeinfo' modified)
[Bug target/71778] [6/7/8 Regression][ARM] ICE using non-constant argument to Neon intrinsic that requires constant arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71778 James Greenhalgh changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #10 from James Greenhalgh --- Fixed on all active branches.
[Bug target/63250] Complex fp16 arithmetic uses nonexistent libgcc functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63250 --- Comment #5 from James Greenhalgh --- Author: jgreenhalgh Date: Wed Nov 23 17:36:21 2016 New Revision: 242784 URL: https://gcc.gnu.org/viewcvs?rev=242784&root=gcc&view=rev Log: [Patch ARM 17/17] Enable _Float16 for ARM and fix PR target/63250 gcc/ PR target/63250 * config/arm/arm-builtins.c (arm_simd_floatHF_type_node): Rename to... (arm_fp16_type_node): ...This, make visibile. (arm_simd_builtin_std_type): Rename arm_simd_floatHF_type_node to arm_fp16_type_node. (arm_init_simd_builtin_types): Likewise. (arm_init_fp16_builtins): Likewise. * config/arm/arm.c (arm_excess_precision): New. (arm_floatn_mode): Likewise. (TARGET_C_EXCESS_PRECISION): Likewise. (TARGET_FLOATN_MODE): Likewise. (arm_promoted_type): Only promote arm_fp16_type_node. * config/arm/arm.h (arm_fp16_type_node): Declare. gcc/testsuite/ PR target/63250 * lib/target-supports.exp (add_options_for_float16): Add -mfp16-format=ieee when testign arm*-*-*. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm-builtins.c trunk/gcc/config/arm/arm.c trunk/gcc/config/arm/arm.h trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/lib/target-supports.exp
[Bug middle-end/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509 --- Comment #2 from James Greenhalgh --- (In reply to Rainer Orth from comment #1) > James, this is caused by your patch series > > [Patch 1/17] Add a new target hook for describing excess precision intentions > > I believe. > > Rainer Thanks, and sorry for the break. Can you help me out with a configure line that would get me to a stage 1 solaris/x32 compiler so I can debug this?
[Bug middle-end/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509 James Greenhalgh changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2016-11-24 Assignee|unassigned at gcc dot gnu.org |jgreenhalgh at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #4 from James Greenhalgh --- Well, certainly this comment and assert in tree.c: /* The target should not ask for unpredictable float evaluation (though it might advertise that implicitly the evaluation is unpredictable, but we don't care about that here, it will have been reported elsewhere). If it does ask for unpredictable evaluation, we have nothing to do here. */ gcc_assert (target_flt_eval_method != FLT_EVAL_METHOD_UNPREDICTABLE); suggest that the implementation I've put in for TARGET_C_EXCESS_PRECISION on i386 is wrong (or the assert needs to be weakened). static enum flt_eval_method ix86_excess_precision (enum excess_precision_type type) { switch (type) { case EXCESS_PRECISION_TYPE_FAST: /* The fastest type to promote to will always be the native type, whether that occurs with implicit excess precision or otherwise. */ return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT; case EXCESS_PRECISION_TYPE_STANDARD: case EXCESS_PRECISION_TYPE_IMPLICIT: /* Otherwise, the excess precision we want when we are in a standards compliant mode, and the implicit precision we provide can be identical. */ if (!TARGET_80387) return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT; else if (TARGET_MIX_SSE_I387) return FLT_EVAL_METHOD_UNPREDICTABLE; else if (!TARGET_SSE_MATH) return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE; else if (TARGET_SSE2) return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT; else return FLT_EVAL_METHOD_UNPREDICTABLE; default: gcc_unreachable (); } return FLT_EVAL_METHOD_UNPREDICTABLE; } I think the right fix is probably to return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for EXCESS_PRECISION_TYPE_STANDARD, but I'll need to think about that. Sorry again for the break, by inspection it is obvious how you hit that assert.
[Bug middle-end/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509 --- Comment #6 from James Greenhalgh --- None of the logic was there in the original code, so there is not much to compare. The question for the backend when TYPE is EXCESS_PRECISION_TYPE_FAST or EXCESS_PRECISION_TYPE_STANDARD is: does it want tree.c to insert operations to guarantee explicit excess precision for the types, or does it want tree.c to keep them as their native types? The assert exists because it makes no sense to ask the front-end to explicitly make the operations unpredictable. The fix which most closely maps to the semantics I think i386 wants is... For EXCESS_PRECISION_TYPE_FAST: Always return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT For EXCESS_PRECISION_TYPE_STANDARD: If we're in a mode which should never promote, or we're in a mode which will be implicitly unpredictable, return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT If we're in the mode which should explicitly promote to LONG_DOUBLE, do that. For EXCESS_PRECISION_TYPE_IMPLICIT: Keep the current logic. I'll write a patch along those lines, and test it as well as I can, but I don't really know how to get good -m32 testing out of my x86_64 box, which doesn't have a good multilib environment set up. If you can point me at a machine in the compile farm I can use I'd be happy to test more extensively.
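The three cases proposed above can be sketched as standalone C. This is an illustration of the described semantics only, not the patch that was committed: struct x86_fp_flags and sketch_excess_precision are hypothetical stand-ins for the TARGET_* macros and the real hook, and the enum values are written out here so the example compiles on its own.

```c
enum flt_eval_method
{
  FLT_EVAL_METHOD_UNPREDICTABLE = -1,
  FLT_EVAL_METHOD_PROMOTE_TO_FLOAT = 0,
  FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE = 1,
  FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE = 2
};

enum excess_precision_type
{
  EXCESS_PRECISION_TYPE_IMPLICIT,
  EXCESS_PRECISION_TYPE_STANDARD,
  EXCESS_PRECISION_TYPE_FAST
};

/* Hypothetical stand-in for TARGET_80387, TARGET_MIX_SSE_I387,
   TARGET_SSE_MATH and TARGET_SSE2.  */
struct x86_fp_flags
{
  int has_80387;
  int mix_sse_i387;
  int sse_math;
  int sse2;
};

enum flt_eval_method
sketch_excess_precision (enum excess_precision_type type,
                         struct x86_fp_flags f)
{
  enum flt_eval_method implicit;

  /* FAST: always keep operations in their native type.  */
  if (type == EXCESS_PRECISION_TYPE_FAST)
    return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;

  /* IMPLICIT: the logic quoted earlier in this bug, unchanged.  */
  if (!f.has_80387)
    implicit = FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
  else if (f.mix_sse_i387)
    implicit = FLT_EVAL_METHOD_UNPREDICTABLE;
  else if (!f.sse_math)
    implicit = FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
  else if (f.sse2)
    implicit = FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
  else
    implicit = FLT_EVAL_METHOD_UNPREDICTABLE;

  /* STANDARD: never ask the front end for "unpredictable".  */
  if (type == EXCESS_PRECISION_TYPE_STANDARD
      && implicit == FLT_EVAL_METHOD_UNPREDICTABLE)
    return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;

  return implicit;
}
```

The only behavioural difference from the quoted hook is in the STANDARD case, where an unpredictable result is replaced by FLT_EVAL_METHOD_PROMOTE_TO_FLOAT so the assert in tree.c can no longer fire.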
[Bug target/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509 --- Comment #8 from James Greenhalgh --- (In reply to Jakub Jelinek from comment #7) > (In reply to James Greenhalgh from comment #6) > > None of the logic was there in the original code, so there is not much to > > compare. > > ?? Since -fexcess-precision=standard has been introduced, gcc has the > excess precision notion. So there is something to compare. > E.g. try > float foo (float x, float y, float z) > { > return x + y + z; > } > before your changes with > -fdump-tree-gimple -m32 -msse2 -mno-80387 -fexcess-precision=standard > -fdump-tree-gimple -m32 -msse2 -mfpmath=387+sse -fexcess-precision=standard > -fdump-tree-gimple -m32 -msse2 -mfpmath=387 -fexcess-precision=standard > -fdump-tree-gimple -m32 -msse2 -mfpmath=sse -fexcess-precision=standard > -fdump-tree-gimple -m32 -msse -mno-sse2 -mfpmath=sse > -fexcess-precision=standard > to match the different cases in your hook, and compare that to what you get > with the current trunk. Right, I think we might have been talking about comparing different things. That works for a test of observable behaviour. I've done what I suggested above, tested it as you suggested, and posted a fix to the mailing list https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02568.html
[Bug target/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509 --- Comment #9 from James Greenhalgh --- Author: jgreenhalgh Date: Fri Nov 25 09:25:31 2016 New Revision: 242866 URL: https://gcc.gnu.org/viewcvs?rev=242866&root=gcc&view=rev Log: [Patch i386] PR78509 - TARGET_C_EXCESS_PRECISION should not return "unpredictable" for EXCESS_PRECISION_TYPE_STANDARD gcc/ PR target/78509 * config/i386/i386.c (i386_excess_precision): Do not return FLT_EVAL_METHOD_UNPREDICTABLE when "type" is EXCESS_PRECISION_TYPE_STANDARD. * target.def (excess_precision): Document that targets should not return FLT_EVAL_METHOD_UNPREDICTABLE when "type" is EXCESS_PRECISION_TYPE_STANDARD or EXCESS_PRECISION_TYPE_FAST. Fix typo in first sentence. * doc/tm.texi: Regenerate. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/doc/tm.texi trunk/gcc/target.def
[Bug target/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509 --- Comment #10 from James Greenhalgh --- Should now be fixed, but I'll leave open for Rainer to confirm.
[Bug target/78509] [7 regression] ICE in in excess_precision_type, at tree.c:8875
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78509 --- Comment #12 from James Greenhalgh --- I tried looking at the generated assembly for that test with the compilers I built before my patch series, and after the patch series + the fix above. I couldn't see any difference in code generated for the testcase you mention for each of the sets of options Jakub gave above (with -m3dnow, -O2, -m32 for the testcase). If this turns out to be my fault, I'll gladly look into it - but I'll need help getting the x86 flags right again!
[Bug target/70120] [6 Regression][aarch64] -g causes Assembler messages: Error: unaligned opcodes detected in executable segment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70120 James Greenhalgh changed: What|Removed |Added Status|RESOLVED|REOPENED CC||jgreenhalgh at gcc dot gnu.org Resolution|FIXED |--- --- Comment #12 from James Greenhalgh --- I can still trigger this with a testcase using 16-bit floating-point types, and the tiny memory model: int main (__fp16 x) { __fp16 a = 6.5504e4; return (x <= a); } gcc foo.c -O3 -mcmodel=tiny -g /tmp/ccwJITmo.s: Assembler messages: /tmp/ccwJITmo.s: Error: unaligned opcodes detected in executable segment In this test case, a call to force_const_mem in ira adds a new 32-bit constant in the constant pool, but ultimately doesn't use it. That means that when we sweep patterns looking for which constant pool entries to emit, we don't mark the unused pattern created by ira, and it doesn't get emitted. But, that leaves us with inconsistent information between the offset we think we've got, and what we've actually emitted. Presumably IRA isn't the only pass at fault here. Anything which eliminates a reference to a constant pool entry can cause the constant pool offset information to become stale. Maybe force_const_mem shouldn't be updating the offset information at all, and we should only update that as we make the sweep looking for live pool entries? I guess the trouble there is that we don't record the mode of the mem in the constant_descriptor_rtx - but if we were to do that it looks like we might be able to defer calculating offset until when we actually emit the pool. rs6000 might need some changes, but a better interface for their uses of get_pool_size looks like it would be "pool_empty_p" anyway. I'm not sure of this code though, so I don't know if that would make for a clean design. If you think this needs to be a separate bug, feel free to reclose this and open a new one.
[Bug rtl-optimization/78561] New: Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

            Bug ID: 78561
           Summary: Constant pool size (offset) can become stale where constant pool entires become unused
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---

This bug report is mostly from inspection, but the effects of this issue can be seen with this testcase on AArch64 (see also PR70120 for why we need the size of the constant pool to be correct).

  int main (__fp16 x)
  {
    __fp16 a = 6.5504e4;
    return (x <= a);
  }

  gcc foo.c -O3 -mcmodel=tiny -g

  /tmp/ccwJITmo.s: Assembler messages:
  /tmp/ccwJITmo.s: Error: unaligned opcodes detected in executable segment

In this test case, a call to force_const_mem in ira adds a new 32-bit constant in the constant pool, but ultimately doesn't use it. That means that when we sweep patterns looking for which constant pool entries to emit, we don't mark the unused pattern created by ira, and it doesn't get emitted. But, that leaves us with inconsistent information between the offset we think we've got, and what we've actually emitted.

Presumably IRA isn't the only pass at fault here. Anything which eliminates a reference to a constant pool entry can cause the constant pool offset information to become stale, as it is only updated when inserting entries to the constant pool, not when we decide those entries are actually used.

Maybe force_const_mem shouldn't be updating the offset information at all, and we should only update that as we make the sweep in mark_constant_pool looking for live pool entries? I guess the trouble there is that we don't record the mode of the mem in the constant_descriptor_rtx - but if we were to do that it looks like we might be able to defer calculating offset until when we actually emit the pool.

rs6000 might need some changes, but a better interface for their uses of get_pool_size looks like it would be "pool_empty_p" anyway. I'm not sure of this code though, so I don't know if that would make for a clean design.
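The bookkeeping mismatch described in this report can be modelled with a toy pool. This is only a sketch under stated assumptions: `toy_entry`, `toy_pool`, `toy_pool_add` and `toy_pool_emitted_size` are hypothetical names invented for illustration, sizes double as alignments, and none of it mirrors GCC's real `constant_descriptor_rtx` or the code in varasm.c.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not GCC's real data structures.  */
struct toy_entry {
  unsigned size;    /* size in bytes, also used as alignment (power of two) */
  unsigned offset;  /* offset assigned eagerly when the entry was added */
  int used;         /* set later if the sweep finds a live reference */
};

struct toy_pool {
  struct toy_entry entries[8];
  size_t count;
  unsigned offset;  /* running pool size, updated only on insertion */
};

/* Round X up to a multiple of ALIGN, where ALIGN is a power of two.  */
static unsigned
toy_align_up (unsigned x, unsigned align)
{
  return (x + align - 1) & ~(align - 1);
}

/* Add an entry and assign its offset immediately.  Nothing ever undoes
   this if the entry later turns out to be unused.  */
static struct toy_entry *
toy_pool_add (struct toy_pool *pool, unsigned size)
{
  assert (pool->count < 8);
  struct toy_entry *e = &pool->entries[pool->count++];
  pool->offset = toy_align_up (pool->offset, size);
  e->size = size;
  e->offset = pool->offset;
  e->used = 0;
  pool->offset += size;
  return e;
}

/* Bytes that would really be written out: only used entries count.  */
static unsigned
toy_pool_emitted_size (const struct toy_pool *pool)
{
  unsigned emitted = 0;
  for (size_t i = 0; i < pool->count; i++)
    {
      const struct toy_entry *e = &pool->entries[i];
      if (e->used)
        emitted = toy_align_up (emitted, e->size) + e->size;
    }
  return emitted;
}
```

Adding a 2-byte entry and then a 4-byte entry, and marking only the first as used, leaves `pool.offset` at 8 while `toy_pool_emitted_size` reports 2. The recorded size and the emitted size disagree in the same way the report describes, although the sizes chosen here are illustrative rather than taken from the testcase.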
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

James Greenhalgh changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64*-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-11-28
     Ever confirmed|0                           |1
[Bug target/70120] [6 Regression][aarch64] -g causes Assembler messages: Error: unaligned opcodes detected in executable segment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70120

James Greenhalgh changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from James Greenhalgh ---
> I do think a new bug should be opened.

OK. PR78561.
[Bug rtl-optimization/78547] [7 Regression] ICE: in loc_cmp, at var-tracking.c:3417 with -Os -g -mstringop-strategy=libcall -freorder-blocks-algorithm=simple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78547

James Greenhalgh changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl at gcc dot gnu.org,
                   |                            |ienkovich at gcc dot gnu.org

--- Comment #2 from James Greenhalgh ---
I'd be surprised if r238594 was the root cause, but it may have exposed something latent. This revision changed the cost model for if conversion, effectively disabling it in this testcase.

You can emulate turning off if-conversion for the testcase with -fno-if-conversion on the command line. Adding that, you can continue the bisect further back until r237647 which looks more probable given the testcase body.

I'm now compiling with:

  -Os -g -mstringop-strategy=libcall -freorder-blocks-algorithm=simple -fdump-rtl-all-all -fno-if-conversion

Author: hjl
Date: Tue Jun 21 14:24:31 2016 +

    Convert V1TImode register to TImode in debug insn

    TImode register referenced in debug insn can be converted to V1TImode by scalar to vector optimization. After converting a TImode register to V1TImode, we need to check all debug insns on its use chain to convert the V1TImode register to SUBREG TImode.

    gcc/

    2016-06-21  H.J. Lu
                Ilya Enkovich

        PR target/71549
        * config/i386/i386.c (timode_scalar_chain::fix_debug_reg_uses):
        New member function to convert V1TImode register to SUBREG
        TImode in debug insn.
        (timode_scalar_chain::convert_insn): Call fix_debug_reg_uses
        after changing register mode to V1TImode.

    gcc/testsuite/

    2016-06-21  H.J. Lu

        PR target/71549
        * gcc.target/i386/pr71549.c: New test.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237647
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

James Greenhalgh changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |jgreenhalgh at gcc dot gnu.org

--- Comment #1 from James Greenhalgh ---
Well, confirmed - and an easy fix is to recompute the offset data while sweeping for valid constants at the end of compilation.
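The idea of recomputing offsets during the final sweep can be sketched as follows. This is a minimal illustration, not the patch itself: `toy_entry` and `toy_recompute_offsets` are hypothetical names, each entry's size is assumed to double as its power-of-two alignment, and the real change in varasm.c may differ in every detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy type, not GCC's real constant pool descriptor.  */
struct toy_entry {
  unsigned size;    /* size in bytes, also used as alignment (power of two) */
  unsigned offset;  /* position within the emitted pool */
  int used;         /* nonzero if the sweep found a live reference */
};

/* Walk the entries in order, give each live entry a fresh offset and
   return the size of the pool that will actually be emitted.  Offsets
   of dead entries are left untouched because nothing reads them.  */
static unsigned
toy_recompute_offsets (struct toy_entry *entries, size_t count)
{
  unsigned offset = 0;
  for (size_t i = 0; i < count; i++)
    {
      struct toy_entry *e = &entries[i];
      if (!e->used)
        continue;  /* dead entries take no space */
      /* The rounding below is only valid for power-of-two sizes.  */
      assert (e->size != 0 && (e->size & (e->size - 1)) == 0);
      offset = (offset + e->size - 1) & ~(e->size - 1);
      e->offset = offset;
      offset += e->size;
    }
  return offset;
}
```

Because the offsets are derived only from entries that survive the sweep, the returned size matches what is emitted by construction, whichever pass dropped the reference.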
[Bug tree-optimization/77445] [7 Regression] Performance drop after r239219 on coremark test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77445

James Greenhalgh changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2016-09-03 00:00:00         |2016-11-30
                 CC|                            |law at gcc dot gnu.org

--- Comment #5 from James Greenhalgh ---
I posted this on list a few weeks back: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01454.html

The early threader is running with speed_p set to false (second parameter to find_jump_threads_backwards)

  unsigned int
  pass_early_thread_jumps::execute (function *fun)
  {
    /* Try to thread each block with more than one successor.  */
    basic_block bb;
    FOR_EACH_BB_FN (bb, fun)
      {
        if (EDGE_COUNT (bb->succs) > 1)
          find_jump_threads_backwards (bb, false);
      }
    thread_through_all_blocks (true);
    return 0;
  }

So even though profile information is ignored, we think we are compiling for size and won't thread. The relevant check in profitable_jump_thread_path is:

  if (speed_p && optimize_edge_for_speed_p (taken_edge))
    {
    }
  else if (n_insns > 1)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "FSM jump-thread path not considered: "
                 "duplication of %i insns is needed and optimizing for size.\n",
                 n_insns);
      path->pop ();
      return NULL;
    }

Changing false to true (or even to optimize_bb_for_size_p) in the above hunk looks like it would enable some of the threading we're relying on here.
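The effect of the hard-coded speed_p argument can be seen in a reduced model of the quoted check. This is a sketch under assumptions: `toy_rejected_by_size_check` is a hypothetical stand-in, `edge_for_speed` stands in for the result of optimize_edge_for_speed_p, and whatever the real function does inside the first branch is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of the quoted condition.  Returns true when a
   path would be rejected by the "optimizing for size" branch.  */
static bool
toy_rejected_by_size_check (bool speed_p, bool edge_for_speed, int n_insns)
{
  assert (n_insns >= 0);
  if (speed_p && edge_for_speed)
    return false;        /* speed branch: this size limit is not applied */
  return n_insns > 1;    /* size branch: refuse to duplicate more than one insn */
}
```

With speed_p false, any path that needs more than one insn duplicated is rejected regardless of what the edge looks like, which is why the early threader declines most opportunities in this model.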
[Bug tree-optimization/77445] [7 Regression] Performance drop after r239219 on coremark test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77445

--- Comment #7 from James Greenhalgh ---
Right, I've trimmed too much context from my message.

This performance regression starts with r239219 which adds a cost model to the threader which relies on frequency information (arguably this is a bad cost model for threading, as at a switch statement you might expect multiple cold edges, and still want to thread the switch, but that's a separate discussion).

The threader does a bad job of updating frequency information when it creates new paths, with the effect that the edges we'd want to thread in this test case appear to be cold. The new cost model from r239219 sees the cold edges, and rejects the threading opportunity.

The message I was replying to above had said:

> Hmm, this is interesting. The patch should have "fixed" the previous
> degradation by making the profile correct (backward threader still doe not
> update it, but because most threading now happens early and profile is built
> afterwards this should be less of issue). I am now looking into the profile
> update issues and will try to check why coremarks degrade again.

The answer to which is that the early-threader has hard-coded that it is compiling for size, which causes most backward threading to be rejected, so wouldn't fix this issue. However, if we were to use optimize_bb_for_size_p in pass_early_thread_jumps::execute rather than just passing false then the early threader would have resolved this issue (as the profile information is not used to decide if the edge should be optimised for speed).
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #2 from James Greenhalgh ---
Author: jgreenhalgh
Date: Fri Dec 2 14:29:35 2016
New Revision: 243182

URL: https://gcc.gnu.org/viewcvs?rev=243182&root=gcc&view=rev
Log:
[Patch 1/2 PR78561] Rename get_pool_size to get_pool_size_upper_bound

gcc/

    PR rtl-optimization/78561
    * config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p) Rename
    get_pool_size to get_pool_size_upper_bound.
    (rs6000_stack_info): Likewise.
    (rs6000_emit_prologue): Likewise.
    (rs6000_elf_declare_function_name): Likewise.
    (rs6000_set_up_by_prologue): Likewise.
    (rs6000_can_eliminate): Likewise, reformat spaces to tabs.
    * output.h (get_pool_size): Rename to...
    (get_pool_size_upper_bound): ...This.
    * varasm.c (get_pool_size): Rename to...
    (get_pool_size_upper_bound): ...This.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/output.h
    trunk/gcc/varasm.c
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #3 from James Greenhalgh ---
Author: jgreenhalgh
Date: Fri Dec 2 14:31:10 2016
New Revision: 243183

URL: https://gcc.gnu.org/viewcvs?rev=243183&root=gcc&view=rev
Log:
[Patch 2/2 PR78561] Recalculate constant pool size before emitting it

gcc/

    PR rtl-optimization/78561
    * varasm.c (recompute_pool_offsets): New.
    (output_constant_pool): Call it.

gcc/testsuite/

    PR rtl-optimization/78561
    * gcc.target/aarch64/pr78561.c: New.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/varasm.c
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #6 from James Greenhalgh ---
Author: jgreenhalgh
Date: Mon Dec 5 09:35:28 2016
New Revision: 243239

URL: https://gcc.gnu.org/viewcvs?rev=243239&root=gcc&view=rev
Log:
[Patch 2/2 PR78561] Recalculate constant pool size before emitting it

gcc/testsuite/

    PR rtl-optimization/78561
    * gcc.target/aarch64/pr78561.c: Add missing testcase from r243183.

Added:
    trunk/gcc/testsuite/gcc.target/aarch64/pr78561.c
Modified:
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #7 from James Greenhalgh ---
(In reply to Segher Boessenkool from comment #5)
> Oh btw, you forgot to commit the testcase in 2/2.

Thanks, that's the easy one to fix.

Would you be able to help me with a configure line I can use for a PowerPC bootstrap on one of the compile farm machines so I can debug the issue I've introduced?
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #9 from James Greenhalgh ---
(In reply to Segher Boessenkool from comment #8)
> I usually use --disable-libgomp, but otherwise everything default (well,
> --enable-languages=all,ada,go,obj-c++).

I need a bit more hand holding on this one - is there a compile farm machine set up that if I log in and run your configure line I'll be able to get a 32-bit PowerPC ADA bootstrap going? (I tried gcc112, but that doesn't have GNAT installed).
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #11 from James Greenhalgh ---
My bootstrap at r243245 on gcc110 seemed to work fine.

  [jgreenhalgh@gcc1-power7 gcc]$ ../build-gcc/gcc/xgcc -v
  Using built-in specs.
  COLLECT_GCC=../build-gcc/gcc/xgcc
  Target: powerpc64-unknown-linux-gnu
  Configured with: ../gcc/configure --enable-languages=all,ada,go,obj-c++
  Thread model: posix
  gcc version 7.0.0 20161205 (experimental) (GCC)

The file you mentioned (gcc/ada/rts_32/a-chahan.adb) seemed to have been compiled with no issues. Am I missing something to get the 32-bit multilib build, or maybe I need to target it explicitly?
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #13 from James Greenhalgh ---
(In reply to Segher Boessenkool from comment #12)
> It still happens here, also on gcc110. Note you need --disable-werror,
> to avoid another bootstrap error.
>
> Did you perchance use --disable-bootstrap?

I didn't need disable-werror either, which makes me think I'm building a completely different toolchain to you. Maybe I'm missing something very obvious?

All I'm doing is cloning from the git mirror, checking out the revisions we've discussed here and on IRC, creating a new folder out of tree, running configure, then running make -j41.

  ssh gcc110.fsffrance.org
  git clone git://gcc.gnu.org/git/gcc.git
  cd gcc
  git checkout
  mkdir ../build-gcc
  cd ../build-gcc
  ../gcc/configure --enable-languages=all,ada,go,obj-c++
  make -j41 >& build.log

And that works for me. If I'm missing a step or an environment variable, I'm happy to try again.
[Bug rtl-optimization/78561] Constant pool size (offset) can become stale where constant pool entires become unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78561

--- Comment #15 from James Greenhalgh ---
(In reply to Segher Boessenkool from comment #14)
> I used trunk. --disable-bootstrap fails the same, just much faster ;-)
>
> Maybe the binutils etc. version matters? Do you have a "modern" GCC on path?

I'll just be bootstrapping with the system compiler for stage 1, so might be missing newer warnings?