[Bug target/94145] New: Longcalls mis-optimize loading the function address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145 Bug ID: 94145 Summary: Longcalls mis-optimize loading the function address Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: meissner at gcc dot gnu.org Target Milestone: --- I'm working on a feature where we convert some or all built-in function calls to use the longcall sequence. I discovered that the compiler is mis-optimizing the load of the function address. This showed up in the Spec 2017 wrf_r benchmark, where I replaced some 60,000 direct calls with longcalls. In particular, the PowerPC backend does not mark the load of the function address as volatile. This allows the compiler to move the load out of a loop. However, with the current ELF semantics you don't want to do this, because the function address changes: on the first call to the function the address is that of the PLT stub, but on subsequent calls, after the shared library has been loaded, it is the address of the function itself. In addition, because UNSPECs are used, the compiler is likely to store the function address on the stack and reload it. Given that the UNSPEC is just a load, it would be better not to generate the extra store and load. In fixing the linker bug that this feature uncovered, Alan Modra has a simple patch to fix it.
[Bug target/94145] Longcalls mis-optimize loading the function address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145 --- Comment #1 from Michael Meissner --- Created attachment 48021 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48021&action=edit Example code Compile with -mcpu=future -mpcrel -O3 to see the load of the address being moved out of the loop.
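For readers without access to the attachment, the shape of the problem can be sketched as follows. This is not the attached test case: the names step, run_steps and the LONGCALL macro are made up for illustration, and in a real reproduction the callee would live in a separate shared library rather than in the same file. The point is only that the load of the address of step is loop-invariant from the compiler's point of view, so it is a candidate for hoisting out of the loop.

```c
/* Hypothetical sketch, not the attachment from the report.  */
#if defined(__powerpc__) || defined(__powerpc64__)
#define LONGCALL __attribute__ ((longcall))
#else
#define LONGCALL /* the attribute is PowerPC-specific */
#endif

/* Stand-in for a function that would normally come from a shared
   library and be reached through the longcall sequence.  */
LONGCALL int step (int x);

int
step (int x)
{
  return x + 1;
}

/* Every iteration calls step.  With longcalls, each call first loads
   the function address; the report is about that load being moved
   out of this kind of loop.  */
int
run_steps (int n)
{
  int value = 0;
  for (int i = 0; i < n; i++)
    value = step (value);
  return value;
}
```

On a PowerPC target, comparing the assembler for run_steps with and without optimization should show whether the address load stays inside the loop.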
[Bug target/81594] Optimize PowerPC vector set and store
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81594 Michael Meissner changed: What|Removed |Added Attachment #41854|0 |1 is obsolete|| --- Comment #4 from Michael Meissner --- Created attachment 48057 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48057&action=edit Update proposed patch to fix the problem
[Bug target/93937] Variable vector extract & zero extend insn can never match
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93937 Michael Meissner changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #2 from Michael Meissner --- Fixed on Feb. 28, 2020
[Bug target/81594] Optimize PowerPC vector set and store
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81594 --- Comment #6 from Michael Meissner --- If you look at the original patch, it did try to do this optimization. When I looked at it some time later, the combiner no longer generated the sequence because it thought it was slower (due to length, etc.). You could spend a lot of time tuning the code so that the combiner eventually generates it again, but it was simpler to just put the peephole in to catch the cases that show up. If you want to take on the bug and do it earlier, go ahead. A peephole2 might not catch all uses, but it prevents whack-a-mole, where a change causes other code generation changes down the pike. Note, the original patch was written in the power8 time frame, and it would now need to be adjusted for power9 and future systems (i.e. the patch only does the splitting if the value is in an FPR or GPR, while on power9 it could be in a traditional Altivec register). However, the splitter uses reload_completed, which you always seem to object to. It could be done before register allocation, but then you would need to make sure that no other pass recombines the two separate items back into a vector.
[Bug target/94451] New: April 1st 2020 GCC does not compile spec 2017 gcc_r benchmark with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94451 Bug ID: 94451 Summary: April 1st 2020 GCC does not compile spec 2017 gcc_r benchmark with -O3 Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: meissner at gcc dot gnu.org Target Milestone: --- Created attachment 48166 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48166&action=edit decimal64.i file that shows the bug. I was building Spec 2017 with the current master compiler branch, and it failed in 3 benchmarks. I looked at the failure of the gcc_r benchmark, and I discovered that the decimal64.c function gets a compiler error when I build a compiler with default checks enabled. I narrowed it down so that it fails with -O2 -fsplit-loops -ftree-vectorize and -fgnu89-inline (the -fgnu89-inline is not needed for the failure, but it is generally needed to compile Spec 2017). -perch-> /opt/at13.0/bin/gdb cc1 GNU gdb (GDB) 8.3.1.20191211-git Copyright (C) 2019 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "powerpc64le-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from cc1... Breakpoint 1 at 0x101dba40: file /home/meissner/fsf-src/trunk/gcc/diagnostic.c, line 1777. Breakpoint 2 at 0x1193e368: file /home/meissner/fsf-src/trunk/gcc/diagnostic.c, line 1706. 
Breakpoint 3 at 0x11a15fe8 Breakpoint 4 at 0x11a15fc4 File tree.h will be skipped when stepping. File is-a.h will be skipped when stepping. File line-map.h will be skipped when stepping. File timevar.h will be skipped when stepping. Function rtx_expr_list::next will be skipped when stepping. Function rtx_expr_list::element will be skipped when stepping. Function rtx_insn_list::next will be skipped when stepping. Function rtx_insn_list::insn will be skipped when stepping. Function rtx_sequence::len will be skipped when stepping. Function rtx_sequence::element will be skipped when stepping. Function rtx_sequence::insn will be skipped when stepping. Function INSN_UID will be skipped when stepping. Function PREV_INSN will be skipped when stepping. Function SET_PREV_INSN will be skipped when stepping. Function NEXT_INSN will be skipped when stepping. Function SET_NEXT_INSN will be skipped when stepping. Function BLOCK_FOR_INSN will be skipped when stepping. Function PATTERN will be skipped when stepping. Function INSN_LOCATION will be skipped when stepping. Function INSN_HAS_LOCATION will be skipped when stepping. Function JUMP_LABEL_AS_INSN will be skipped when stepping. 
Successfully loaded GDB hooks for GCC
(gdb) r -O2 -fsplit-loops -ftree-vectorize -fgnu89-inline -quiet foo-decimal64.i
Starting program: /home/meissner/fsf-build-ppc64le/trunk/gcc/cc1 -O2 -fsplit-loops -ftree-vectorize -fgnu89-inline -quiet foo-decimal64.i
decimal64.c: In function ‘decDigitsToDPD’:
decimal64.c:662:6: error: missing definition for SSA_NAME: _292 in statement:
target_205 = _292;
Breakpoint 2, internal_error (gmsgid=0x11acd0d0 "verify_ssa failed") at /home/meissner/fsf-src/trunk/gcc/diagnostic.c:1787
1787 global_dc->diagnostic_group_nesting_depth++;
(gdb) where
#0 internal_error (gmsgid=0x11acd0d0 "verify_ssa failed") at /home/meissner/fsf-src/trunk/gcc/diagnostic.c:1787
#1 0x10e2efac in verify_ssa (check_modified_stmt=, check_ssa_operands=) at /home/meissner/fsf-src/trunk/gcc/tree-ssa.c:1208
#2 0x109b6ea0 in execute_function_todo (fn=0x75a41550, data=) at /home/meissner/fsf-src/trunk/gcc/passes.c:1992
#3 0x109b80d4 in do_per_function (callback=, data=) at /home/meissner/fsf-src/trunk/gcc/passes.c:1640
#4 0x109b82fc in execute_todo (flags=) at /home/meissner/fsf-src/trunk/gcc/passes.c:2039
#5 0x109bbcc4 in execute_one_pass (pass=pass@entry=) at /home/meissner/fsf-src/trunk/gcc/passes.c:2539
#6 0x109bca64 in execute_pass_list_1 (pass=) at /home/meissner/fsf-src/trunk/gcc/passes.c:2590
#7 0x109bca7c in execute_pass_list_1 (pass=) at /home/meissner/fsf-src/trunk/gcc/passes.c:2591
#8 0x109bca7c in execute_pass_list_1 (pass=) at /home/meissner/fsf-src/trunk/gcc/passes.c:2591
#9 0x109bcb08 in execute_pass_list (fn=
[Bug target/94451] April 1st 2020 GCC does not compile spec 2017 gcc_r benchmark with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94451 Michael Meissner changed: What|Removed |Added CC||amodra at gcc dot gnu.org, ||bergner at gcc dot gnu.org, ||dje at gcc dot gnu.org, ||meissner at gcc dot gnu.org, ||segher at gcc dot gnu.org, ||wschmidt at gcc dot gnu.org Severity|normal |critical Host||powerpc64le-gnu-linux Build||powerpc64le-gnu-linux Target||powerpc64le-gnu-linux Priority|P3 |P2 --- Comment #1 from Michael Meissner --- I built the compiler on Ubuntu 18.04 on a little endian power9 system using --with-cpu=power9. I used the Advance Toolchain AT13 compiler to build the compiler. I did not bootstrap the compiler.
[Bug target/94557] [9 regression] r9-8486 causes several builtin instruction test case execution failures on power 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94557 --- Comment #1 from Michael Meissner --- Created attachment 48263 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48263&action=edit Proposed patch to fix the problem. This patch backports a necessary fix from the trunk to fix the problem.
[Bug target/94557] [9 regression] r9-8486 causes several builtin instruction test case execution failures on power 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94557 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Last reconfirmed||2020-04-13 --- Comment #2 from Michael Meissner --- The issue is that with the backport patch for PR target/93932, GCC is more likely to optimize variable extracts from a vector that is in memory to be a simple load, instead of loading the vector into a vector register, and doing a vector extract on power9. The test cases rely on having indexes outside of the range of valid indexes. If the vector was loaded into a register, we would automatically mask the index as part of the extract. However, if we converted the operation to a single load, we did not do the masking, and the load would load some random value outside of the vector boundary. The trunk had previously had other changes that did this masking as part of the changes for -mcpu=future and PC-relative support. The proposed patch just makes sure the index is properly masked before it is used.
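The masking described above can be modeled in plain C. This is only a scalar sketch of the idea, not the code in the patch: the function name extract_masked and the constant N_ELEMENTS are invented for illustration, and a 4-element vector of int is assumed.

```c
#include <stddef.h>

enum { N_ELEMENTS = 4 };   /* assumed: a vector of four 32-bit elements */

/* Scalar model of a variable extract from a vector held in memory.
   Masking the index first keeps the load inside the vector even when
   the caller passes an out-of-range element number, which matches what
   the in-register extract does implicitly.  */
int
extract_masked (const int vec[N_ELEMENTS], size_t index)
{
  return vec[index & (N_ELEMENTS - 1)];
}
```

With this model, an index of 5 selects element 1 rather than reading past the end of the vector, which is the behavior the test cases depend on.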
[Bug target/94557] [9 regression] r9-8486 causes several builtin instruction test case execution failures on power 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94557 --- Comment #3 from Michael Meissner --- Just to be clear, this bug is only in the GCC 9 branch, and it came about due to the backport of the patch for PR target/93932 to the GCC 9 branch. The master branch generates correct code. So, I'm not sure this warrants being a P1 blocker for the GCC 10 release.
[Bug target/94557] [9 regression] r9-8486 causes several builtin instruction test case execution failures on power 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94557 Michael Meissner changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #5 from Michael Meissner --- Fixed by a change to GCC 9 on April 16th, 2020.
[Bug target/93932] PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93932 Michael Meissner changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #8 from Michael Meissner --- Now that the fix for PR target/94557 (the regression caused on the GCC 9 branch by the PR target/93932 patch) has been committed, this bug can be closed.
[Bug target/94630] New: General bug for changes needed to switch the PowerPC long double default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94630 Bug ID: 94630 Summary: General bug for changes needed to switch the PowerPC long double default Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: meissner at gcc dot gnu.org Target Milestone: --- This is a bug to hold patches and observations about the changes needed to switch the compiler default with a configuration switch in the GCC 11 time frame (with a backport to GCC 10.2).
[Bug target/94630] General bug for changes needed to switch the PowerPC long double default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94630 Michael Meissner changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |meissner at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Version|10.0|unknown Ever confirmed|0 |1 Priority|P3 |P4 Severity|normal |enhancement Last reconfirmed||2020-04-17
[Bug target/94630] General bug for changes needed to switch the PowerPC long double default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94630 --- Comment #1 from Michael Meissner --- Created attachment 48296 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48296&action=edit Patch to do the correct mapping for built-in math functions when the long double default is IEEE.
[Bug target/94630] General bug for changes needed to switch the PowerPC long double default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94630 --- Comment #2 from Michael Meissner --- When the default is changed, we will need to map __builtin_sprintf and company just like GLIBC will do it if the user includes stdio.h. Otherwise the gcc.dg/tree-ssa/builtin-sprintf.c test fails because it calls the wrong sprintf for long double arguments.
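The kind of call affected looks roughly like the following. This is an illustrative sketch, not code from the failing test; the helper name format_long_double is made up. The concern is that a call like this, when made through the built-in rather than through the stdio.h declaration, must end up at the printf variant that matches the format of long double in use.

```c
#include <stdio.h>

/* Format a long double through a printf-family call.  The "L" length
   modifier tells the library to read a long double argument, so the
   library routine that is called has to agree with the compiler about
   which long double format is in use.  */
int
format_long_double (char *buf, size_t size, long double value)
{
  return snprintf (buf, size, "%.2Lf", value);
}
```

If the compiler and the library disagreed about the long double format, the formatted text would be wrong even though the call compiles and links.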
[Bug target/94630] General bug for changes needed to switch the PowerPC long double default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94630 --- Comment #3 from Michael Meissner --- Created attachment 48297 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48297&action=edit Patch to mangle *printf and *scanf built-ins if long double is IEEE-128
[Bug target/94630] General bug for changes needed to switch the PowerPC long double default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94630 --- Comment #5 from Michael Meissner --- Note, at the moment, the patches are to make the existing configure switch (--with-long-double=ieee) work correctly. However, we need all of the pieces in place (gcc, glibc, libstdc++, etc.) before we can contemplate changing the ABI.
[Bug middle-end/91512] [10 Regression] Fortran compile time regression.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512 --- Comment #31 from Michael Meissner --- For the Spec 2017 521.wrf_r benchmark on little endian PowerPC power9 systems, there was no difference in runtime between a normal run using -Ofast -mcpu=power9 and one with -Ofast -mcpu=power9 -fno-inline-arg-packing. Of the seven rate benchmarks in Spec 2017 that use Fortran (548.exchange2_r, 503.bwaves_r, 507.cactuBSSN_r, 521.wrf_r, 527.cam4_r, 549.fotonik3d_r, and 554.roms_r), none varies by more than 0.7% depending on whether the switch is used or not. I used the compiler checked out from the master branch on March 27, 2020 to build and run the benchmarks. As others have said, using -fno-inline-arg-packing does dramatically reduce the time it takes to compile 521.wrf_r.
[Bug target/94630] General bug for changes needed to switch the PowerPC long double default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94630 --- Comment #7 from Michael Meissner --- Created attachment 48364 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48364&action=edit Proposed patch to build ibm-ldouble.c with -mno-gnu-attributes. ibm-ldouble.c in libgcc must be compiled without GNU attributes, so that the __ibm128 functions can be called if long double is IEEE 128-bit.
[Bug target/92218] New: PowerPC indexed insn attribute misses some insns (bswap, atomic, small int float memory)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92218 Bug ID: 92218 Summary: PowerPC indexed insn attribute misses some insns (bswap, atomic, small int float memory) Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: meissner at gcc dot gnu.org Target Milestone: --- In working on the PowerPC 'future' processor, I was using the 'indexed' insn attribute to know when a certain insn used indexed addressing instead of offset addressing. However, it fails in one crucial case. If the address is a single register (i.e. indirect addressing) and the insn form requires indexed addressing, the indexed_address_mem predicate function will fail. Off the top of my head, the places where this happens are: 1) Load/store of 8/16/32-bit integers to/from vector/FPR registers; 2) Byte swap to/from memory; or 3) Atomic memory operations. The simplest approach is to go into each of the problematic insns and explicitly set 'indexed' to 'yes' for the alternatives that require indexed addressing.
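As a concrete instance of case 2, consider a byte-swapped load through a plain pointer. The sketch below is illustrative only (the function name load_swapped is made up). On PowerPC a load like this is expected to become a byte-reversed load such as lwbrx, which exists only in indexed (X-form) encoding, even though the address here is a single register.

```c
#include <stdint.h>

/* Load a 32-bit value and reverse its bytes.  The address is just the
   pointer register, with no index register and no offset, yet the
   byte-reversed load instruction only comes in the indexed form.  */
uint32_t
load_swapped (const uint32_t *p)
{
  return __builtin_bswap32 (*p);
}
```

This is the situation where a predicate that looks only at the shape of the address would report "not indexed" for an instruction that is in fact X-form.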
[Bug target/92218] PowerPC indexed insn attribute misses some insns (bswap, atomic, small int float/vector load/store)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92218 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2019-10-25 Assignee|unassigned at gcc dot gnu.org |meissner at gcc dot gnu.org Summary|PowerPC indexed insn|PowerPC indexed insn |attribute misses some insns |attribute misses some insns |(bswap, atomic, small int |(bswap, atomic, small int |float memory) |float/vector load/store) Ever confirmed|0 |1
[Bug target/92218] PowerPC indexed insn attribute misses some insns (bswap, atomic, small int float/vector load/store)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92218 --- Comment #1 from Michael Meissner --- The VSX instructions that load a scalar from memory and splat it into the register are another class of X-form-only memory instructions that would need the indexed insn attribute set.
[Bug target/93011] New: PowerPC GCC has warning that aggregate alignment changed in GCC 5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93011 Bug ID: 93011 Summary: PowerPC GCC has warning that aggregate alignment changed in GCC 5 Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: meissner at gcc dot gnu.org Target Milestone: --- I've been doing Spec 2017 builds, and I notice that some of the benchmarks get notes of the form: note: the layout of aggregates containing vectors with 8-byte alignment has changed in GCC 5 (this was from coverage.c:268 in the gcc_r benchmark). When GCC 10 comes out, it will be 5 releases since the change was made. I doubt many people are just now porting code from back then. Perhaps it is time to retire the message.
[Bug target/93230] New: PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 Bug ID: 93230 Summary: PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: meissner at gcc dot gnu.org Target Milestone: --- Created attachment 47634 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47634&action=edit Example code

In working on some bugs and extensions for -mcpu=future, I noticed that the code for vec_extract is not optimal when you are extracting an 8/16/32-bit integer from a vector in memory. In this case, we convert the vec_extract to be a load of the scalar value, but we don't have the proper combine insns to fold the sign extend or zero extend into the load, which means we have to issue a separate conversion instruction.

For example, consider:

    #include <altivec.h>

    unsigned long
    v8hi_uns_1 (vector unsigned short *p)
    {
      return (unsigned long) vec_extract (*p, 1);
    }

    long
    v8hi_sign_1 (vector unsigned short *p)
    {
      return (long) vec_extract (*p, 1);
    }

It generates:

    v8hi_uns_1:
            lhz 3,2(3)
            rlwinm 3,3,0,0x
            blr

    v8hi_sign_1:
            lhz 3,2(3)
            extsh 3,3
            blr

It should generate:

    v8hi_uns_1:
            lhz 3,2(3)
            blr

    v8hi_sign_1:
            lhz 3,2(3)
            blr
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2020-01-10 Ever confirmed|0 |1
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 --- Comment #1 from Michael Meissner --- Created attachment 47635 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47635&action=edit Example assembler generated for -mcpu=power9
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 Michael Meissner changed: What|Removed |Added Severity|normal |enhancement
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 --- Comment #2 from Michael Meissner --- There is this code in rs6000.md that thinks it is combining the conversion with the load, but the insn is using the wrong types: ;; Optimize extracting a single scalar element from memory. (define_insn_and_split "*vsx_extract__load" [(set (match_operand: 0 "register_operand" "=r") (vec_select: (match_operand:VSX_EXTRACT_I 1 "memory_operand" "m") (parallel [(match_operand:QI 2 "" "n")]))) (clobber (match_scratch:DI 3 "=&b"))] "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" [(set (match_dup 0) (match_dup 4))] { operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2], operands[3], mode); } [(set_attr "type" "load") (set_attr "length" "8")]) In addition, the code should also handle sign extension, and loading up the value into a vector register.
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 Michael Meissner changed: What|Removed |Added Status|NEW |ASSIGNED
[Bug target/93230] PowerPC GCC vec_extract of a vector in memory does not fold sign/zero extension into load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93230 --- Comment #3 from Michael Meissner --- Also this code if the element number is variable: ;; Optimize extracting a single scalar element from memory. (define_insn_and_split "*vsx_extract__load" [(set (match_operand: 0 "register_operand" "=r") (vec_select: (match_operand:VSX_EXTRACT_I 1 "memory_operand" "m") (parallel [(match_operand:QI 2 "" "n")]))) (clobber (match_scratch:DI 3 "=&b"))] "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" [(set (match_dup 0) (match_dup 4))] { operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2], operands[3], mode); } [(set_attr "type" "load") (set_attr "length" "8")])
[Bug target/93568] [10 regression] r10-6418 causes many ICEs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93568 Michael Meissner changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |meissner at gcc dot gnu.org
[Bug target/93568] [10 regression] r10-6418 causes many ICEs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93568 Michael Meissner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #3 from Michael Meissner --- Fixed.
[Bug target/93569] [10 regression] r10-6419 causes ICE in gcc.target/powerpc/vsx-builtin-15d.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93569 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2020-02-05 Assignee|unassigned at gcc dot gnu.org |meissner at gcc dot gnu.org Ever confirmed|0 |1
[Bug target/93569] [10 regression] r10-6419 causes ICE in gcc.target/powerpc/vsx-builtin-15d.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93569 Michael Meissner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #3 from Michael Meissner --- Fixed on February 6th, 2020. commit r10-6494-ga66219dce7fcba068a0998dd926e2ffc6857f149
[Bug target/81594] Optimize PowerPC vector set and store
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81594 --- Comment #3 from Michael Meissner --- I looked at this a little. The proposed patch doesn't generate the expected code any more (due to setting the length attribute, which makes it look like the fix generates slower code). I re-implemented it as a peephole2 for ISA 2.07 (power9) and above. The peephole2 does find several places in the 2017 Spec INT benchmarks, where it replaces: MTVSRDD XXPERMDI STV with: STD STD
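Source code of roughly the following shape is the sort of thing that leads to the sequence above. This is a hedged sketch rather than the test case from the bug: the names v2di and store_pair are invented, and the GCC generic vector extension is used so that the example also compiles on targets other than PowerPC.

```c
/* Two 64-bit elements in one 16-byte vector (GCC vector extension).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Build a vector from two scalars and immediately store it.  The
   straightforward code moves both scalars into a vector register and
   then does a vector store; since the vector is not otherwise used,
   two scalar stores to adjacent doublewords do the same job.  */
void
store_pair (v2di *dest, long long a, long long b)
{
  v2di tmp = { a, b };
  *dest = tmp;
}
```

Because the vector value never needs to exist in a register, catching the pattern late with a peephole2 is enough to get the two-store form.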
[Bug target/93932] New: PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93932 Bug ID: 93932 Summary: PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: meissner at gcc dot gnu.org Target Milestone: --- I've been looking at vec_extract recently, both in terms of support for -mcpu=future and to look at supporting PR target/93230 in GCC 11. In some cases, if you have a vec_extract built-in function where the vector is in a register and the element number is variable, the compiler decides to store the vector to memory and then do the variable extract using a scalar load. Unfortunately, this leads to a STORE-HIT-LOAD slowdown, as the scalar load will likely have to wait for the vector store to finish. The test cases are the fold-vect-extract-<type>.p{7,8,9}.c files in the gcc.target/powerpc directory, where <type> is 'char', 'short', 'int', 'longlong', 'float' or 'double', and the p7/p8/p9 indicates whether the test is for -mcpu=power7, -mcpu=power8, or -mcpu=power9. For -mcpu=power8, the regressions are: fold-vect-extract-double.p8.c: GCC 9.x and current trunk fold-vect-extract-longlong.p8.c: GCC 9.x and current trunk For -mcpu=power9, the regressions are: fold-vect-extract-double.p9.c: GCC 9.x (current trunk is ok) fold-vect-extract-longlong.p9.c: GCC 9.x (current trunk is ok)
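A minimal function of the kind being discussed might look like this. It is a sketch, not one of the testsuite files: it uses GCC's generic vector subscripting instead of the vec_extract built-in so that it compiles on any target, and the names v2di and extract_var are made up.

```c
/* Two 64-bit elements in one 16-byte vector (GCC vector extension).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Extract an element whose number is only known at run time.  The
   vector arrives in a register, so the desirable code keeps it there
   and does the extract in registers instead of storing the vector to
   the stack and loading one element back.  */
long long
extract_var (v2di vec, unsigned long n)
{
  return vec[n & 1];   /* mask keeps the element number in range */
}
```

Inspecting the assembler for a function like this at -O2 with the different -mcpu= settings is one way to see whether the store followed by a scalar load shows up.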
[Bug target/93932] PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93932 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2020-02-25 CC||dje at gcc dot gnu.org, ||meissner at gcc dot gnu.org, ||segher at gcc dot gnu.org, ||wschmidt at gcc dot gnu.org Ever confirmed|0 |1
[Bug target/93932] PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93932 Michael Meissner changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |meissner at gcc dot gnu.org --- Comment #1 from Michael Meissner --- I've discovered that the issue is the combined insn for variable extract, which handles both the register case and the memory case: (define_insn_and_split "vsx_extract__var" [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r") (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT)) (clobber (match_scratch:DI 3 "=r,&b,&b")) (clobber (match_scratch:V2DI 4 "=&v,X,X"))] "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" [(const_int 0)] { rs6000_split_vec_extract_var (operands[0], operands[1], operands[2], operands[3], operands[4]); DONE; }) If I split the insn into two separate patterns, one that handles only the register case and the other that handles only memory accesses, the compiler doesn't create the store and does the variable extract in the register.
[Bug target/93932] PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93932 --- Comment #3 from Michael Meissner --- While I agree that in general, we should only use input_operand for moves and define_expands, I tend to think in the short term (GCC 10) we should just fix the case we know about. As you point out, this is used in every single place where we fold sign/zero/float extension into a load. In looking at gcc 8 and gcc 9, the variable extract patterns are mostly the same, except gcc 8 uses 'ww', etc. constraints, while gcc 9/10 uses the 'isa' attribute to eliminate the cases using power9 instructions on power8. I don't know why only V2DI/V2DF shows it up, when V4SF/V4SI/V8HI/V16QI use the same construct, and why -mcpu=power9 compiles it ok on trunk, but not gcc 9.
[Bug target/93937] New: Variable vector extract & zero extend insn can never match
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93937 Bug ID: 93937 Summary: Variable vector extract & zero extend insn can never match Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: meissner at gcc dot gnu.org Target Milestone: --- In looking at the variable vector extract code, the insns that attempt to merge a zero extend with a variable extract of a vector element will never match: (define_insn_and_split "*vsx_extract__mode_var" [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r") (zero_extend: (unspec: [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] UNSPEC_VSX_EXTRACT))) (clobber (match_scratch:DI 3 "=r,r,&b")) (clobber (match_scratch:V2DI 4 "=X,&v,X"))] "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" [(const_int 0)] { machine_mode smode = mode; rs6000_split_vec_extract_var (gen_rtx_REG (smode, REGNO (operands[0])), operands[1], operands[2], operands[3], operands[4]); DONE; } [(set_attr "isa" "p9v,*,*")]) It will never match, because the compiler will never generate code of the form: (set (reg:SI) (zero_extend:SI (unspec:SI [(reg:V4SI) (reg:DI)] UNSPEC_VSX_EXTRACT))) I.e. the zero_extend type should be DImode. Obviously the issue with PR target/93932 (using input_operand) will also apply to this insn, once the modes are fixed.
[Bug target/93937] Variable vector extract & zero extend insn can never match
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93937 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2020-02-26 Assignee|unassigned at gcc dot gnu.org |meissner at gcc dot gnu.org Ever confirmed|0 |1
[Bug target/56043] ICE in rs6000_builtin_vectorized_libmass for vsx-mass-1.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56043 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2013-02-07 AssignedTo|unassigned at gcc dot |meissner at gcc dot gnu.org |gnu.org | Ever Confirmed|0 |1
[Bug target/56043] ICE in rs6000_builtin_vectorized_libmass for vsx-mass-1.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56043 --- Comment #1 from Michael Meissner 2013-02-07 20:27:19 UTC --- Created attachment 29390 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29390 Patch to fix the problem There are two problems here. The first problem is the segmentation fault if the builtin function does not have an implicit function. The patch adds code to return NULL_TREE in this case, rather than cause a segmentation violation due to a NULL pointer. However, in the case of powerpc-none-eabi, the vsx-mass-1.c test would still fail, since some of the builtin functions are not treated as builtin (such as atan2, which is what caused the fault). Since the MASS library is only available for powerpc Linux, I have restricted the test to only run on powerpc*-*-linux*.
[Bug debug/55586] Incorrect .debug_line section for function with variable number of arguments in PowerPC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55586 --- Comment #2 from Michael Meissner 2013-02-07 23:49:34 UTC ---

As far as I can tell, it is a bug in earlier versions of GDB, and not in the compiler. Due to the ABIs, it will only show up in 32-bit powerpc with an older GDB. The 64-bit powerpc has a completely different ABI, and for stdarg functions, it does not pass the values in floating point registers, and it doesn't use CR6 to indicate that the floating point values were passed. So there isn't a jump, etc.

I tested GCC 4.8, 4.7 and found that they essentially generated the same code for the debugging information. On my SLES 10 system, I even used the system compiler which is 4.1.2 based, and it generated the same debug code.

If I used a GDB that was 7.3 or newer (SLES 11 SP2, IBM Advance Toolchain 5.0, etc.) and put a breakpoint on my_function, the debugger puts the breakpoint on the STWU instruction, and it hits the breakpoint. If I use the system debugger on SLES 10 which is version 7.1, the debugger skips the function start, and puts the breakpoint on the first STFD instruction as you mention, and it won't hit the breakpoint unless you pass floating point values in the floating point registers.

Here is the assembler output from one of the compilers for -O -m32. Note, there is a .loc before the first instruction at line 9 (the beginning of the function).

my_function:
.LFB12:
        .file 1 "bug-55586.c"
        .loc 1 9 0
.LVL0:
        stwu 1,-128(1)
.LCFI0:
        mflr 0
.LCFI1:
        stw 31,124(1)
.LCFI2:
        stw 0,132(1)
.LCFI3:
        stw 4,28(1)
        stw 5,32(1)
        stw 6,36(1)
        stw 7,40(1)
        stw 8,44(1)
        stw 9,48(1)
        stw 10,52(1)
        bne 1,.L2
        .loc 1 9 0
        stfd 1,56(1)
        stfd 2,64(1)
        stfd 3,72(1)
        stfd 4,80(1)
        stfd 5,88(1)
        stfd 6,96(1)
        stfd 7,104(1)
        stfd 8,112(1)
.L2:
[Bug target/56043] ICE in rs6000_builtin_vectorized_libmass for vsx-mass-1.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56043 --- Comment #2 from Michael Meissner 2013-02-08 19:36:12 UTC --- Author: meissner Date: Fri Feb 8 19:36:04 2013 New Revision: 195898 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195898 Log: [gcc] 2013-02-07 Michael Meissner PR target/56043 * config/rs6000/rs6000.c (rs6000_builtin_vectorized_libmass): If there is no implicit builtin declaration, just return NULL. [gcc/testsuite] 2013-02-07 Michael Meissner PR target/56043 * gcc.target/powerpc/vsx-mass-1.c: Only run this test on powerpc*-*-linux*. Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/rs6000.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c
[Bug target/56043] ICE in rs6000_builtin_vectorized_libmass for vsx-mass-1.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56043 --- Comment #3 from Michael Meissner 2013-02-08 19:47:07 UTC --- Author: meissner Date: Fri Feb 8 19:46:52 2013 New Revision: 195899 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195899 Log: [gcc] 2013-02-08 Michael Meissner PR target/56043 * config/rs6000/rs6000.c (rs6000_builtin_vectorized_libmass): If there is no implicit builtin declaration, just return NULL. [gcc/testsuite] 2013-02-08 Michael Meissner PR target/56043 * gcc.target/powerpc/vsx-mass-1.c: Only run this test on powerpc*-*-linux*. Modified: branches/gcc-4_7-branch/gcc/ChangeLog branches/gcc-4_7-branch/gcc/config/rs6000/rs6000.c branches/gcc-4_7-branch/gcc/testsuite/ChangeLog branches/gcc-4_7-branch/gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c
[Bug target/56043] ICE in rs6000_builtin_vectorized_libmass for vsx-mass-1.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56043 Michael Meissner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #4 from Michael Meissner 2013-02-08 19:50:26 UTC --- Fixed in the mainline with subversion id 195898. Fixed in the 4.7 branch with subversion id 195899.
[Bug target/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 --- Comment #5 from Michael Meissner 2013-02-12 18:13:45 UTC --- Created attachment 29426 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29426 Assembly file of slp-perm-1.c after lto with -mcpu=power6 -O3 -maltivec
[Bug target/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 --- Comment #6 from Michael Meissner 2013-02-12 18:16:28 UTC --- Created attachment 29427 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29427 slp-perm-1.c assembly file before LTO is run with -mcpu=power6 -O3 -maltivec
[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 Michael Meissner changed: What|Removed |Added Component|target |lto --- Comment #7 from Michael Meissner 2013-02-12 18:25:38 UTC --- I am switching this to LTO instead of target, as it appears to be an LTO bug. Before LTO is run, the alignment of the .rodata section is 16 byte alignment since the array used to initialize the auto array is copied with altivec instructions. After LTO, the alignment of the .rodata section is 4 bytes. The powerpc Altivec instructions ignore the bottom 4 bits of the address, and so depending on what else is linked, the test will randomly fail or succeed. I added attachments from compiling slp-perm-1.c with -O3 -mcpu=power6 -maltivec -save-temps to give the asm files.
[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 --- Comment #9 from Michael Meissner 2013-02-12 19:07:18 UTC --- The -fsection-anchors option appears to be important. If I use -fsection-anchors (which is default for powerpc64-linux), LTO does not align the .rodata section, but uses Altivec memory instructions. If I use -fno-section-anchors, the .rodata section is not aligned, but it doesn't use Altivec memory instructions, so the test passes.
[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 --- Comment #10 from Michael Meissner 2013-02-12 19:16:56 UTC --- If -fno-merge-constants (and the default -fsection-anchors) is used, then the correct alignment for the table is set (and Altivec memory instructions are used). At a guess, the problem is likely to be in the gimplify_init_constructor function in gimplify.c.
[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494 --- Comment #15 from Michael Meissner 2013-02-13 22:38:12 UTC --- The patch does align the .rodata section to 16 byte alignment, but the code to load up the auto vector from constant memory does not do vectorization. If I use -fno-section-anchors, it aligns .rodata to 4 byte alignment, and does not vectorize the code. If I use -fno-merge-constants, it aligns .rodata to 16 byte alignment, and does vectorize the code. If I use -fno-merge-constants without -flto, it aligns .rodata to 16 byte alignment, but it uses unaligned vector loads/stores. So the patch does help in that the tests that were randomly failing now pass. While it would be nice if we could get the initialization to be vectorized, I'm not sure how performance critical this is. Eric: if the constant data used to initialize the auto array is not aligned as expected, and the compiler uses Altivec instructions when it auto-vectorizes the copy, the wrong data gets used.
[Bug target/57150] New: GCC when targeting power7 spills long double using VSX instructions.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57150 Bug #: 57150 Summary: GCC when targeting power7 spills long double using VSX instructions. Classification: Unclassified Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: meiss...@gcc.gnu.org Created attachment 30008 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30008 Cut down example to show the problem, using -mcpu=power7 -m64 In the glibc file e_scalbl.c, the compiler is using VSX stxvd2x and lxvd2x instructions to spill long double, even though only 1/2 of the register is used. The compiler should use scalar load/store instructions.
[Bug target/57150] GCC when targeting power7 spills long double using VSX instructions.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57150 --- Comment #1 from Michael Meissner 2013-05-02 19:37:21 UTC --- Created attachment 30009 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30009 Assembler file
[Bug target/57150] GCC when targeting power7 spills long double using VSX instructions.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57150 Michael Meissner changed: What|Removed |Added Target||powerpc64-gnu-linux Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2013-05-02 Host||powerp64-gnu-linux Ever Confirmed|0 |1 Known to fail||4.5.0 Build||powerpc64-gnu-linux --- Comment #2 from Michael Meissner 2013-05-02 19:42:51 UTC --- This goes back to the original VSX submission for GCC 4.5. While the code is slow, it does appear to be correct.
[Bug target/57150] GCC when targeting power7 spills long double using VSX instructions.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57150 --- Comment #3 from Michael Meissner 2013-05-02 21:03:08 UTC --- It shows up due to -fcaller-saves, which creates a V2DF save area.
[Bug target/57150] GCC when targeting power7 spills long double using VSX instructions.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57150 --- Comment #4 from Michael Meissner 2013-05-03 19:18:21 UTC --- Created attachment 30028 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30028 Patch to use scalar modes for TF/TD caller saves.
[Bug target/57150] GCC when targeting power7 spills long double using VSX instructions.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57150 Michael Meissner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #5 from Michael Meissner 2013-05-07 16:26:02 UTC --- Fixed in subversion id 198593.
[Bug target/52775] Change default for using FCFID instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52775 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #4 from Michael Meissner 2012-08-16 22:58:02 UTC --- Fixed in April, 2012.
[Bug target/53487] [4.8 Regression] Unrecognizable insn for conditional move
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53487 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #5 from Michael Meissner 2012-08-16 22:59:35 UTC --- Fixed on June 5, 2012.
[Bug target/52495] rs6000.c fails to (cross-) build: "implicit declaration of function ‘ASM_WEAKEN_DECL’"
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52495 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2012-08-16 CC||meissner at gcc dot gnu.org Ever Confirmed|0 |1 --- Comment #1 from Michael Meissner 2012-08-16 23:14:17 UTC --- If the configure scripts think the cross assembler does not support .weak symbols, the compiler will fail because it does not define ASM_WEAKEN_DECL. Note, when I tried this on August 16th, 2012, the current head of binutils seems broken (the archiver segfaults), but the 2_21 branch builds it fine on my Linux system with a target of powerpc64-linux and additional targets of powerpc-linux. Obviously the compiler should do something more appropriate if the assembler does not support .weak symbols.
[Bug target/47251] New: Powerpc doesn't like -m32 -msoft-float -mcpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47251 Summary: Powerpc doesn't like -m32 -msoft-float -mcpu=power7 Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: meiss...@gcc.gnu.org ReportedBy: meiss...@gcc.gnu.org Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux Created attachment 22941 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22941 Function from libgcc.a that fails with -m32 -mcpu=power7 -msoft-float If you build GCC with --with-cpu=power7, it fails in building libgcc for -m32 -msoft-float. This is due to floatunsdidf/floatunsdfdi_mem not having checks for TARGET_HARD_FLOAT. The error is: /home/meissner/fsf-src/trunk-p7/libgcc/../gcc/libgcc2.c: In function ‘__fixunssfdi’: /home/meissner/fsf-src/trunk-p7/libgcc/../gcc/libgcc2.c:1340:1: error: unable to generate reloads for: (insn 24 22 25 2 (set (reg:DF 3 3) (unsigned_float:DF (reg:DI 10 10 [orig:138 hi+-4 ] [138]))) /home/meissner/fsf-src/trunk-p7/libgcc/../gcc/libgcc2.c:1297 314 {*floatunsdidf2_fcfidu} (expr_list:REG_DEAD (reg:DI 10 10 [orig:138 hi+-4 ] [138]) (nil))) /home/meissner/fsf-src/trunk-p7/libgcc/../gcc/libgcc2.c:1340:1: internal compiler error: in find_reloads, at reload.c:3805 Please submit a full bug report, with preprocessed source if appropriate.
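As a rough illustration of the kind of source that asks for an unsigned 64-bit to double conversion (the unsigned_float:DF of a DI value shown in the failing insn), consider the sketch below. It is an assumption that this is representative; the actual failing code is the libgcc function in the attachment, and the function name here is hypothetical.

```c
/* Hypothetical reproducer sketch: convert an unsigned 64-bit integer to
   double.  With hardware floating point this can map to a single
   conversion instruction; with -msoft-float it has to be done by a
   library routine instead.  */
double
u64_to_double (unsigned long long x)
{
  return (double) x;
}
```

Compiling something like this with the options from the report (-m32 -msoft-float -mcpu=power7) would be one way to check whether the hard-float pattern is still being selected.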
[Bug target/47251] Powerpc doesn't like -m32 -msoft-float -mcpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47251 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2011.01.10 21:27:14 Ever Confirmed|0 |1
[Bug target/47251] Powerpc doesn't like -m32 -msoft-float -mcpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47251 Michael Meissner changed: What|Removed |Added Target Milestone|--- |4.6.0
[Bug target/47272] New: In addition to the bug uncovered in 42751, gcc can't bootstrap using --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47272 Summary: In addition to the bug uncovered in 42751, gcc can't bootstrap using --with-cpu=power7 Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: major Priority: P3 Component: target AssignedTo: meiss...@gcc.gnu.org ReportedBy: meiss...@gcc.gnu.org CC: berg...@vnet.ibm.com Depends on: 42751 Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux The VSX support changed to use the VSX form of the instruction if both VSX and Altivec forms existed. Unfortunately, there are differences between the Altivec memory reference instructions (LVX/STVX) and the VSX memory reference instructions (LXVW4X/STXVW4X). In particular, the Altivec memory instructions ignore the bottom 4 bits of the address, and the VSX instructions do not. The Altivec code in libcpp/lex.c was written to rely on the bottom 4 bits of the load address being ignored. Thus we should modify __builtin_vec_ld and __builtin_vec_st to use the Altivec versions of the instructions, and provide other builtins that can use either the Altivec or VSX memory instructions, depending on the switches used. In addition, during testing, I discovered that __builtin_vec_ld and __builtin_vec_st don't support the vector double and vector long long types added with VSX.
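The difference between the two kinds of load can be modeled in portable C. The sketch below is only a model under stated assumptions: it uses memcpy instead of real vector instructions, ignores byte-order details, and the function names are hypothetical. It shows a load that rounds the address down to a 16-byte boundary (the LVX-style behaviour) next to a load that uses the address exactly as given (the LXVW4X-style behaviour).

```c
#include <stdint.h>
#include <string.h>

/* Address actually used by an Altivec-style load: the address rounded
   down to the enclosing 16-byte boundary.  */
uintptr_t
altivec_ea (uintptr_t addr)
{
  return addr & ~(uintptr_t) 15;
}

/* Model of an LVX-like load: reads the aligned 16-byte block that
   contains P, no matter where inside the block P points.  */
void
load_truncating (unsigned char out[16], const unsigned char *p)
{
  memcpy (out, (const unsigned char *) altivec_ea ((uintptr_t) p), 16);
}

/* Model of an LXVW4X-like load: reads 16 bytes starting exactly at P.  */
void
load_exact (unsigned char out[16], const unsigned char *p)
{
  memcpy (out, p, 16);
}
```

Code that passes a misaligned pointer and expects the enclosing aligned block back gets different bytes from the second form, which is the kind of mismatch the report describes for libcpp/lex.c.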
[Bug target/47272] GCC can't bootstrap on powerpc64-linx using --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47272 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2011.01.12 21:53:09 Ever Confirmed|0 |1
[Bug target/47272] GCC can't bootstrap on powerpc64-linx using --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47272 --- Comment #1 from Michael Meissner 2011-01-12 21:54:25 UTC --- Note, the fixes for 47251 will be needed in addition to changes for this bug in order to do a full bootstrap on a power7 system using the --with-cpu=power7 configure option.
[Bug regression/47385] New: Test gcc.target/powerpc/pr37168.c fails if compiled using a compiled configured with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47385 Summary: Test gcc.target/powerpc/pr37168.c fails if compiled using a compiled configured with --with-cpu=power7 Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: regression AssignedTo: meiss...@gcc.gnu.org ReportedBy: meiss...@gcc.gnu.org Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux The test case pr37168 fails if VSX instructions are enabled. This is due to the fact that the vector constant used has 4 single precision floating point values, and the compiler thinks it can create this via Altivec integer instructions (the bit value is 26, and the compiler wants to load 13 into each word and then double it to get 26). The case fails because in this case, V4SF uses the VSX vector unit and not the Altivec vector unit. The fix is to allow either VSX or Altivec vector units.
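The "load 13 and double it" trick follows from the Altivec splat-immediate instructions taking a 5-bit signed immediate, so only values from -16 to 15 can be splatted directly. A small sketch of that decision is shown below. It is not GCC's actual constant-classification code; the function names and return values are hypothetical and only illustrate the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V fits the 5-bit signed immediate of a splat instruction.  */
bool
fits_splat_imm (int32_t v)
{
  return v >= -16 && v <= 15;
}

/* Hypothetical classifier for a per-word constant V:
   1 = one splat is enough,
   2 = splat V/2 and add the vector to itself,
   0 = neither of these two sequences applies.  */
int
splat_steps (int32_t v)
{
  if (fits_splat_imm (v))
    return 1;
  if (v % 2 == 0 && fits_splat_imm (v / 2))
    return 2;
  return 0;
}
```

For the value in the report, splat_steps (26) takes the second branch: 26 is out of range, but 13 fits, so the constant can be built as 13 + 13.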
[Bug regression/47385] Test gcc.target/powerpc/pr37168.c fails if compiled using a compiled configured with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47385 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2011.01.20 20:43:06 Ever Confirmed|0 |1
[Bug regression/47385] Test gcc.target/powerpc/pr37168.c fails if compiled using a compiled configured with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47385 --- Comment #1 from Michael Meissner 2011-01-20 20:44:09 UTC --- Created attachment 23051 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23051 Patch to fix the problem
[Bug target/47251] Powerpc doesn't like -m32 -msoft-float -mcpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47251 Michael Meissner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #1 from Michael Meissner 2011-01-20 20:50:49 UTC --- Fixed on January 13th, 2011.
[Bug target/47272] GCC can't bootstrap on powerpc64-linx using --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47272 --- Comment #2 from Michael Meissner 2011-01-20 20:57:54 UTC --- Created attachment 23052 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23052 Preliminary patch to allow --with-cpu=power7 to work The root problem is that under VSX, the vec_ld/vec_st builtins use VSX memory instructions, which have different semantics than Altivec when the memory is not aligned. The Altivec speedup in libcpp/lex.c specifically relies on the Altivec behaviour and accesses the wrong memory location if the compiler is built with VSX instructions. This patch changes vec_ld/vec_st to go back to using Altivec instructions. It also adds vector double/vector long long support to the Altivec builtin whole vector memory operations. However, in doing so, it may affect users who have been using GCC 4.5 for VSX and expect these builtins to use VSX instructions. I anticipate this is not the final patch for the problem.
[Bug target/47408] New: Several of the Altivec tests fail if run with a compiler built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47408 Summary: Several of the Altivec tests fail if run with a compiler built with --with-cpu=power7 Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: meiss...@gcc.gnu.org ReportedBy: meiss...@gcc.gnu.org Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux Some of the Altivec tests fail if the default cpu for the powerpc compiler is power7, because these tests are looking for specific code sequences and/or errors. The fix is to add -mno-vsx to the options.
[Bug target/47408] Several of the Altivec tests fail if run with a compiler built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47408 --- Comment #1 from Michael Meissner 2011-01-21 20:00:09 UTC --- Created attachment 23072 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23072 Patch that adds -mno-vsx to altivec tests
[Bug target/47408] Several of the Altivec tests fail if run with a compiler built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47408 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2011.01.21 20:03:18 Ever Confirmed|0 |1
[Bug target/47408] Several of the Altivec tests fail if run with a compiler built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47408 --- Comment #2 from Michael Meissner 2011-01-24 16:47:20 UTC --- Author: meissner Date: Mon Jan 24 16:47:16 2011 New Revision: 169167 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169167 Log: Fix PR 47408 and 47385 Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/altivec.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/g++.dg/ext/altivec-15.C trunk/gcc/testsuite/g++.dg/ext/altivec-types-1.C trunk/gcc/testsuite/g++.dg/ext/altivec-types-2.C trunk/gcc/testsuite/g++.dg/ext/altivec-types-3.C trunk/gcc/testsuite/g++.dg/ext/altivec-types-4.C trunk/gcc/testsuite/gcc.target/powerpc/altivec-11.c trunk/gcc/testsuite/gcc.target/powerpc/altivec-14.c trunk/gcc/testsuite/gcc.target/powerpc/altivec-33.c trunk/gcc/testsuite/gcc.target/powerpc/altivec-types-1.c trunk/gcc/testsuite/gcc.target/powerpc/altivec-types-2.c trunk/gcc/testsuite/gcc.target/powerpc/altivec-types-3.c trunk/gcc/testsuite/gcc.target/powerpc/altivec-types-4.c trunk/gcc/testsuite/gcc.target/powerpc/ppc-vector-memcpy.c trunk/gcc/testsuite/gcc.target/powerpc/ppc-vector-memset.c
[Bug target/47408] Several of the Altivec tests fail if run with a compiler built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47408 --- Comment #3 from Michael Meissner 2011-01-24 16:57:07 UTC --- Author: meissner Date: Mon Jan 24 16:57:04 2011 New Revision: 169168 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169168 Log: Fix PR 47408 and 47385 Modified: branches/ibm/gcc-4_5-branch/gcc/ChangeLog.ibm branches/ibm/gcc-4_5-branch/gcc/config/rs6000/altivec.md branches/ibm/gcc-4_5-branch/gcc/testsuite/ChangeLog.ibm branches/ibm/gcc-4_5-branch/gcc/testsuite/g++.dg/ext/altivec-15.C branches/ibm/gcc-4_5-branch/gcc/testsuite/g++.dg/ext/altivec-types-1.C branches/ibm/gcc-4_5-branch/gcc/testsuite/g++.dg/ext/altivec-types-2.C branches/ibm/gcc-4_5-branch/gcc/testsuite/g++.dg/ext/altivec-types-3.C branches/ibm/gcc-4_5-branch/gcc/testsuite/g++.dg/ext/altivec-types-4.C branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/altivec-11.c branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/altivec-14.c branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/altivec-33.c branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/altivec-types-1.c branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/altivec-types-2.c branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/altivec-types-3.c branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/altivec-types-4.c branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/ppc-vector-memcpy.c branches/ibm/gcc-4_5-branch/gcc/testsuite/gcc.target/powerpc/ppc-vector-memset.c
[Bug target/43154] vec_mergel and vec_mergeh should support V2DF/V2DI
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43154 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #4 from Michael Meissner 2011-01-24 19:52:28 UTC --- Fixed on February 2, 2010.
[Bug target/47580] New: Powerpc GCC fails test gcc.dg/pr41551.c if built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47580 Summary: Powerpc GCC fails test gcc.dg/pr41551.c if built with --with-cpu=power7 Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: meiss...@gcc.gnu.org ReportedBy: meiss...@gcc.gnu.org Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux Test gcc.dg/pr41551.c fails for powerpc if the default target is power7. This is due to the fact that the expander for floatunsdidf (and others) uses gpc_reg_operand: (define_expand "floatunsdidf2" [(set (match_operand:DF 0 "gpc_reg_operand" "") (unsigned_float:DF (match_operand:DI 1 "gpc_reg_operand" "")))] "TARGET_HARD_FLOAT && (TARGET_FCFIDU || VECTOR_UNIT_VSX_P (DFmode))" "") However, the corresponding VSX matcher uses vsx_register_operand: (define_insn "vsx_floatuns2" [(set (match_operand:VSX_B 0 "vsx_register_operand" "=,?wa") (unsigned_float:VSX_B (match_operand: 1 "vsx_register_operand" ",")))] "VECTOR_UNIT_VSX_P (mode)" "xcvux %x0,%x1" [(set_attr "type" "") (set_attr "fp_type" "")]) Gpc_reg_operand allows the virtual stack registers while vsx_register_operand does not. Since the test is: __extension__ typedef __SIZE_TYPE__ size_t; int main(void) { int var, *p = &var; return (double)(size_t)(p); } It means the expander creates: (insn 5 4 6 3 (set (reg:DF 125) (unsigned_float:DF (reg/f:DI 115 virtual-stack-vars))) pr41551.c:11 -1 (nil)) Which then doesn't match when the target is VSX. There are several different ways this can be solved: 1) Allow virtual stack registers to be used in the vsx register operands. 2) Add a new predicate that doesn't allow virtual stack registers in the expander; 3) Add code in the expander to copy the results if it is in a virtual register.
[Bug target/47580] Powerpc GCC fails test gcc.dg/pr41551.c if built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47580 --- Comment #1 from Michael Meissner 2011-02-01 19:02:58 UTC --- Author: meissner Date: Tue Feb 1 19:02:55 2011 New Revision: 169499 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169499 Log: Fix PR 47580 Modified: branches/ibm/power7-meissner/gcc/ChangeLog.power7 branches/ibm/power7-meissner/gcc/config/rs6000/predicates.md
[Bug target/47580] Powerpc GCC fails test gcc.dg/pr41551.c if built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47580 --- Comment #2 from Michael Meissner 2011-02-01 19:09:53 UTC --- Created attachment 23203 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23203 Patch that allows virtual registers in vsx register predicates.
[Bug target/47580] Powerpc GCC fails test gcc.dg/pr41551.c if built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47580 Michael Meissner changed: What|Removed |Added Attachment #23203|0 |1 is obsolete|| --- Comment #3 from Michael Meissner 2011-02-01 19:17:48 UTC --- Created attachment 23204 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23204 Replacement patch
[Bug target/47580] Powerpc GCC fails test gcc.dg/pr41551.c if built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47580 Michael Meissner changed: What|Removed |Added Attachment #23204|0 |1 is obsolete|| --- Comment #4 from Michael Meissner 2011-02-02 01:16:01 UTC --- Created attachment 23207 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23207 Replacement patch #2
[Bug target/47580] Powerpc GCC fails test gcc.dg/pr41551.c if built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47580 --- Comment #5 from Michael Meissner 2011-02-03 00:41:21 UTC --- Author: meissner Date: Thu Feb 3 00:41:16 2011 New Revision: 169776 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169776 Log: Fix PR target/47580 Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/vsx.md
[Bug target/47272] GCC can't bootstrap on powerpc64-linx using --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47272 --- Comment #3 from Michael Meissner 2011-02-03 05:42:23 UTC --- Author: meissner Date: Thu Feb 3 05:42:19 2011 New Revision: 169780 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169780 Log: Fix PR target/47272 Added: trunk/gcc/testsuite/gcc.target/powerpc/vsx-builtin-8.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/altivec.h trunk/gcc/config/rs6000/altivec.md trunk/gcc/config/rs6000/rs6000-builtin.def trunk/gcc/config/rs6000/rs6000-c.c trunk/gcc/config/rs6000/rs6000-protos.h trunk/gcc/config/rs6000/rs6000.c trunk/gcc/config/rs6000/rs6000.h trunk/gcc/config/rs6000/vector.md trunk/gcc/config/rs6000/vsx.md trunk/gcc/doc/extend.texi trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/powerpc/avoid-indexed-addresses.c trunk/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c trunk/gcc/testsuite/gcc.target/powerpc/ppc64-abi-dfp-1.c
[Bug target/47272] GCC can't bootstrap on powerpc64-linx using --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47272 Michael Meissner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #4 from Michael Meissner 2011-02-03 05:43:34 UTC --- Patch committed Feb. 3, 2011, subversion id 169780.
[Bug target/47580] Powerpc GCC fails test gcc.dg/pr41551.c if built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47580 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #6 from Michael Meissner 2011-02-03 05:51:03 UTC --- Patch checked in on Feb. 2nd, 2011, subversion id 169776.
[Bug tree-optimization/46728] [4.6 Regression] GCC no longer generates fmadd for pow (x, 0.75)+y on powerpc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46728 --- Comment #2 from Michael Meissner 2011-02-04 18:11:09 UTC --- When the initial changes for bug 42694 were added to optimize pow (x, 0.75) into sqrt(sqrt(x))*sqrt(x) under fast math, there was a desire to move this RTL optimization to the tree level. Ideally it should run before the vectorization of math functions and the FMA (floating point multiplication and add) passes. Here is the discussion about the changes in April 2010: http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00788.html Presumably most of the optimizations done in expand_builtin_pow, expand_builtin_powi and expand_builtin_pow_root in the builtins.c file should be moved to a tree pass.
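The rewrite of pow (x, 0.75) rests on splitting the exponent into one half plus one quarter. For non-negative x, and ignoring rounding differences (which is why it is only done under fast math), the identity is:

```latex
x^{0.75} \;=\; x^{\frac12 + \frac14} \;=\; x^{\frac12}\cdot x^{\frac14}
        \;=\; \sqrt{x}\,\cdot\,\sqrt{\sqrt{x}}, \qquad x \ge 0 .
```

Since sqrt(x) appears twice, it only needs to be computed once, leaving two square roots and one multiply. That multiply is what a later FMA pass can merge with the following addition in an expression such as pow (x, 0.75) + y.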
[Bug target/47636] New: Powerpc rs6000.md uses RS6000_RECIP_HAVE_RSQRT_P
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47636 Summary: Powerpc rs6000.md uses RS6000_RECIP_HAVE_RSQRT_P Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: meiss...@gcc.gnu.org ReportedBy: meiss...@gcc.gnu.org Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux There is a typo in rs6000.md in the rsqrt generator functions. It refers to the RS6000_RECIP_HAVE_RSQRT_P macro, but the actual macro is RS6000_RECIP_HAVE_RSQRTE_P. You get a warning in the build that the function is unknown, but it doesn't stop the build since the compiler just puts out a relocation for the RS6000_RECIP_HAVE_RSQRT_P function to be resolved later. However, because the rsqrt generators are never called, you never get an error.
[Bug target/47636] Powerpc rs6000.md uses RS6000_RECIP_HAVE_RSQRT_P
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47636 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2011.02.07 19:43:31 Ever Confirmed|0 |1
[Bug target/47636] Powerpc rs6000.md uses RS6000_RECIP_HAVE_RSQRT_P
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47636 --- Comment #1 from Michael Meissner 2011-02-07 20:32:51 UTC --- Author: meissner Date: Mon Feb 7 20:32:45 2011 New Revision: 169901 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169901 Log: Fix PR target/47636 Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/rs6000.md
[Bug target/47636] Powerpc rs6000.md uses RS6000_RECIP_HAVE_RSQRT_P
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47636 --- Comment #2 from Michael Meissner 2011-02-07 20:35:02 UTC --- Created attachment 23269 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23269 Patch that fixes the problem Spell RS6000_RECIP_HAVE_RSQRTE_P correctly.
[Bug target/47755] New: VSX code generates a TOC reference to clear memory
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47755 Summary: VSX code generates a TOC reference to clear memory Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: major Priority: P3 Component: target AssignedTo: meiss...@gcc.gnu.org ReportedBy: meiss...@gcc.gnu.org Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux If you have an array of pointers or longs in 64-bit mode that you want to clear with a loop such as the following, and automatic vectorization is on: for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) array[i] = 0; The compiler generates the 128-bit zero constant, puts it into the constant pool, and loads it from memory in order to set the array with vector instructions. Normally this would be a missed optimization, but we discovered it in compiling _dl_start with -O3 -mcpu=power7, and at the time _dl_start is run, the TOC registers are not yet set up, so the program crashes before it starts. The cause of the bug is that V2DI mode is not true for either VSX_VECTOR_MODE or ALTIVEC_VECTOR_MODE, since there are no native 64-bit operations in the VSX or Altivec vector instructions. This means that easy_vector_constant fails, which in turn makes LEGITIMATE_CONSTANT_P fail. The solution is to use macros that test whether Altivec/VSX memory references can be done, instead of macros that say we have native arithmetic support for those modes.
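A self-contained version of the loop from the report might look like the sketch below. The array name, its size, and the function name are assumptions made for illustration; this is not the pr47755.c test that was later added to the testsuite.

```c
#include <stddef.h>

#define N_ENTRIES 64            /* hypothetical size */

/* Array of pointers; each element is 64 bits wide in 64-bit mode.  */
void *table[N_ENTRIES];

/* Clear every element.  With auto-vectorization the stores can become
   16-byte vector stores of an all-zero vector.  */
void
clear_table (void)
{
  size_t i;

  for (i = 0; i < sizeof (table) / sizeof (table[0]); i++)
    table[i] = 0;
}
```

Compiling this with the options from the report (-O3 -mcpu=power7) and inspecting the assembly for a load of the zero vector through the TOC would be one way to see whether the constant is being materialized in registers or fetched from memory.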
[Bug target/47755] VSX code generates a TOC reference to clear memory
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47755 --- Comment #1 from Michael Meissner 2011-02-15 15:41:49 UTC --- Created attachment 23352 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23352 Patch to allow V2DI easy vector constants
[Bug target/44218] Improve powerpc -mrecip support
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44218 Michael Meissner changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #3 from Michael Meissner 2011-02-15 17:56:50 UTC --- Fixed with checkin on June 3rd, 2010.
[Bug target/47636] Powerpc rs6000.md uses RS6000_RECIP_HAVE_RSQRT_P
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47636 Michael Meissner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #3 from Michael Meissner 2011-02-15 17:58:14 UTC --- Fixed on February 7th, 2011.
[Bug target/47408] Several of the Altivec tests fail if run with a compiler built with --with-cpu=power7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47408 Michael Meissner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #4 from Michael Meissner 2011-02-15 17:59:16 UTC --- Fixed with checkin on January 24th, 2011.
[Bug target/47755] VSX code generates a TOC reference to clear memory
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47755 --- Comment #2 from Michael Meissner 2011-02-15 18:43:01 UTC --- Author: meissner Date: Tue Feb 15 18:42:59 2011 New Revision: 170189 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170189 Log: Fix PR 47755 Added: trunk/gcc/testsuite/gcc.target/powerpc/pr47755.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/predicates.md trunk/gcc/testsuite/ChangeLog