[Bug target/95070] New: vec_cntlz_lsbb implementation uses BE semantics on LE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95070 Bug ID: 95070 Summary: vec_cntlz_lsbb implementation uses BE semantics on LE Product: gcc Version: 8.3.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: pc at us dot ibm.com Target Milestone: --- Created attachment 48512 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48512&action=edit test case This: -- vector unsigned char a = { 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }; int r = vec_cntlz_lsbb (a); -- returns 14 on LE and 0 on BE. It should return 0 on both. vec_cntlz_lsbb counts bytes with least significant bits of 0 *starting from the lowest element number*. In the above code, a[0] == 0xFF, so the count should find 0 bytes. The same issue occurs with vec_cnttz_lsbb (which should find 14 bytes in the above example on both LE and BE, but finds 0 and 14, respectively).
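The semantics described in the report can be modeled in plain scalar C, independent of endianness. This is only a sketch of the expected behavior, not the GCC implementation: the names cntlz_lsbb_model, cnttz_lsbb_model, and check_report_vector are hypothetical, and the vector is represented as a 16-byte array indexed by element number.

```c
#include <assert.h>

/* Hypothetical scalar model (not GCC code): count consecutive bytes whose
   least significant bit is 0, starting from element 0 and moving upward. */
int cntlz_lsbb_model(const unsigned char v[16])
{
    int n = 0;
    while (n < 16 && (v[n] & 1) == 0)
        n++;
    return n;
}

/* Hypothetical scalar model (not GCC code): same count, but starting from
   element 15 and moving downward. */
int cnttz_lsbb_model(const unsigned char v[16])
{
    int n = 0;
    while (n < 16 && (v[15 - n] & 1) == 0)
        n++;
    return n;
}

/* Checks the vector from the report against the values the report expects. */
void check_report_vector(void)
{
    /* Elements 0 and 1 are 0xFF; the remaining 14 elements are zero. */
    const unsigned char a[16] = { 0xFF, 0xFF };

    assert(cntlz_lsbb_model(a) == 0);   /* a[0] has its LSB set */
    assert(cnttz_lsbb_model(a) == 14);  /* a[15] down to a[2] have LSB clear */
}
```

Because the model indexes by element number rather than by byte position in a register, it gives the same answer on LE and BE, which is the behavior the report asks for.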
[Bug target/95082] LE implementations of vec_cnttz_lsbb and vec_cntlz_lsbb are wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95082 Paul Clarke changed: What|Removed |Added CC||pc at us dot ibm.com --- Comment #2 from Paul Clarke --- This is a dup of bug 95070. (I am unable to mark it as such.)
[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402 --- Comment #2 from Paul Clarke --- I'd like to take a stab at fixing this.
[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402 --- Comment #3 from Paul Clarke --- (In reply to Steven Munroe from comment #0) > The rs6000/emmintrin.h implementation of _mm_slli_epi32 reports: > error: argument 1 must be a 5-bit signed literal > > For constant shift values > 15. I thought this would be trivial to reproduce, but not able to provoke it. Do you have a testcase? I will attach the one I tried.
[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402 --- Comment #4 from Paul Clarke --- Created attachment 43829 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43829&action=edit unhelpful testcase $ gcc --version gcc (GCC) 8.0.1 20180402 (experimental) Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ gcc -o 83402 83402.c -DNO_WARN_X86_INTRINSICS -Wall -O3 -mcpu=power8 $ gcc -o 83402 83402.c -DNO_WARN_X86_INTRINSICS -Wall -mcpu=power8 $
[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402 --- Comment #6 from Paul Clarke --- (In reply to Steven Munroe from comment #5) > You need to look at the generated asm code. And see what the compiler is > doing. > > Basically it should be generating a vspltisw vr,si for vec_splat_s32. > > But if the immediate signed int (si) is greater than 15, should failure with: > > error: argument 1 must be a 5-bit signed literal I was hoping you'd tell me the scenario with which you saw that error. :-) > The vec_splats should work for any value as it will load a const vector from > storage. > > Perhaps the compiler is generating bad code and not reporting it. > > Or the compiler is too smart and converting the vec_splat_s32 to the more > general vec_splats under the covers. I think the compiler is doing this. Here's an extract from a (new) simple test case:
--
out(a);
a = _mm_slli_epi32( a, 7 );
out(a);
a = _mm_slli_epi32( a, 31 );
out(a);
--
li r0,32
stvx v31,r1,r0
bl 1628
addis r9,r2,-2
vspltisw v0,7
addi r9,r9,-30976
lvx v31,0,r9
vslw v31,v31,v0
xxlor vs34,vs63,vs63
bl 1628
addis r9,r2,-2
addi r9,r9,-30960
lvx v2,0,r9
vslw v2,v31,v2
bl 1628
--
So, if the shift value is < 16, it uses vspltisw. If the shift value is >= 16, it loads a const vector from memory. Is this issue now moot?
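The "5-bit signed literal" limit discussed above can be made concrete with a small range check. This is only an illustration of the constraint, not code from GCC or from emmintrin.h: fits_simm5, splat_kind, and choose_splat are hypothetical names, and the selection merely mirrors what the generated assembly in this comment appears to do.

```c
#include <assert.h>
#include <stdbool.h>

/* A 5-bit signed immediate covers -16..15 inclusive. */
bool fits_simm5(int n)
{
    return n >= -16 && n <= 15;
}

/* Hypothetical labels for the two code shapes seen in the listing. */
enum splat_kind {
    SPLAT_IMMEDIATE,    /* vspltisw with the shift count as an immediate */
    SPLAT_FROM_MEMORY   /* constant vector loaded from storage (lvx) */
};

/* Illustrative only: pick the shape the compiler appeared to choose. */
enum splat_kind choose_splat(int shift)
{
    return fits_simm5(shift) ? SPLAT_IMMEDIATE : SPLAT_FROM_MEMORY;
}
```

Under this model a shift of 7 takes the immediate form and a shift of 31 takes the load-from-memory form, matching the two `_mm_slli_epi32` calls in the test case.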
[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402 --- Comment #8 from Paul Clarke --- (In reply to Steven Munroe from comment #7) > Please try the same test with AT11 gcc7. I know I hit this! voila! $ /opt/at11.0/bin/gcc -o 83402 83402.c -DNO_WARN_X86_INTRINSICS -Wall -mcpu=power8 -O3 In file included from /opt/at11.0/lib/gcc/powerpc64le-linux-gnu/7.3.1/include/emmintrin.h:62:0, from 83402.c:2: /opt/at11.0/lib/gcc/powerpc64le-linux-gnu/7.3.1/include/emmintrin.h: In function ‘main’: /opt/at11.0/lib/gcc/powerpc64le-linux-gnu/7.3.1/include/emmintrin.h:1513:20: error: argument 1 must be a 5-bit signed literal lshift = (__v4su) vec_splat_s32(__B); ^ $ rpm -q advance-toolchain-at11.0-devel advance-toolchain-at11.0-devel-11.0-3.ppc64le Now the question is whether to bother fixing this:
1. in GCC 8's rs6000/emmintrin.h, since it's not really "broken" there, and backport that to AT 11 (sounds a little silly)
2. backport the GCC 8 change that fixes this to AT 11 (sounds hard)
3. change only AT 11's emmintrin.h
Any strong opinions? Otherwise, I'm leaning toward option (3).
[Bug c++/77681] New: failing to inline simple function when using -fgnu-tm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77681 Bug ID: 77681 Summary: failing to inline simple function when using -fgnu-tm Product: gcc Version: 6.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: pc at us dot ibm.com Target Milestone: --- Created attachment 39671 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39671&action=edit testcase where always_inline fails with -fgnu-tm I've used several different versions of GCC, including 4.8.5 and 6.2.1, on different architectures (x86_64 and ppc64le). I first noticed that a very simple "static inline" function was not being inlined. When I added "__attribute__((always_inline))", an error was produced (see below). In narrowing down the testcase, I also tried to narrow down the command line, and discovered that the error is only produced when "-fgnu-tm" is present. The narrowed-down testcase makes no use of transactional memory, so there appears to be some inlining interference caused by "-fgnu-tm". -- $ g++ -O3 -c always-inline.cpp -fgnu-tm -o /dev/null always-inline.cpp: In member function ‘T* spsc::pop() [with T = int]’: always-inline.cpp:9:1: error: inlining failed in call to always_inline ‘void* _ZL13SPHGetFreePtrPv.constprop.0()’: SPHGetFreePtr (void *H) { ^ always-inline.cpp:19:32: error: called from here T** p = (T**) SPHGetFreePtr(0); $ g++ -O3 -c always-inline.cpp -o /dev/null $ --
[Bug c++/77681] failing to inline simple function when using -fgnu-tm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77681 --- Comment #1 from Paul Clarke --- shoot. this may be a dup of bug 53991
[Bug tree-optimization/53991] _mm_popcnt_u64 fails with -O3 -fgnu-tm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53991 Paul Clarke changed: What|Removed |Added CC||pc at us dot ibm.com --- Comment #10 from Paul Clarke --- *** Bug 77681 has been marked as a duplicate of this bug. ***
[Bug c++/77681] failing to inline simple function when using -fgnu-tm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77681 Paul Clarke changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from Paul Clarke --- I'll move over to the already-reported bug. *** This bug has been marked as a duplicate of bug 53991 ***
[Bug tree-optimization/53991] _mm_popcnt_u64 fails with -O3 -fgnu-tm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53991 --- Comment #11 from Paul Clarke --- We use TM for a multi-producer-multi-consumer queue implementation, and ran into the issue reported in this bug. (I had opened bug 77681 before discovering this report.) This report is surprisingly old. Is there any chance this could get bumped to higher priority?
[Bug debug/98875] New: DWARF5 as default causes perf probe to hang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98875 Bug ID: 98875 Summary: DWARF5 as default causes perf probe to hang Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: pc at us dot ibm.com Target Milestone: --- I sent this to gcc-patches, but realized I should open a bug report: -- The subject commit, 3804e937b0e252a7e42632fe6d9f898f1851a49c, causes a failure in the test suite for the IBM Advance Toolchain. The test in question uses "perf probe" to set a tracepoint at "main" in a newly built (with GCC 11) binary of "/bin/ld". With the patch applied, the command enters an infinite loop, calling libdw1 functions but making no progress. The infinite loop can be found in the Linux kernel tools/perf/util/probe-finder.c: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/probe-finder.c?h=v5.11-rc5#n1190 Reverting this patch permits the command to succeed. --
$ grep VERSION= /etc/os-release
VERSION="15-SP2"
$ uname -r
5.3.18-22-default
$ perf --version
perf version 5.3.18
Top of the GCC tree used: ATSRC_PACKAGE_REV=eb9883c1312c
Reversion patch:
--
$ cat ~/projects/gcc/gcc/gcc-revert-dwarf-5.patch
diff --git a/gcc/common.opt b/gcc/common.opt
index a8a2b67a99d..7aff4ac6079 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3179,7 +3179,7 @@ Common Driver JoinedOrMissing Negative(gdwarf-)
 Generate debug information in default version of DWARF format.
 
 gdwarf-
-Common Driver Joined UInteger Var(dwarf_version) Init(5) Negative(gstabs)
+Common Driver Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format.
 
 gdwarf32
--
Failing command:
$ perf probe -v -x /path/to/AT/at-next-15.0-0-alpha/bin/ld ldmain=main
[Bug debug/98875] DWARF5 as default causes perf probe to hang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98875 --- Comment #3 from Paul Clarke --- The IBM Advance Toolchain supports SLES 15, where the latest version of libdw is 0.168. We'll work around the issue by reverting the commit for the version of GCC included with the Advance Toolchain. I didn't see any update to the GCC documentation regarding the disruptive nature of the change causing the problem other than "[DWARF] Version 5 requires GDB 8.0 or higher". Should there be something about libdw as well? Anything else?
[Bug target/102107] New: protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 Bug ID: 102107 Summary: protocol register (r12) corrupted before a tail call Product: gcc Version: 11.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: pc at us dot ibm.com Target Milestone: --- Created attachment 51367 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51367&action=edit preprocessed source (large) I've been working on an effort to improve Python performance, and hit an issue when running with a libpython.so that was built with "-mcpu=power10". The problem appears to be not correctly setting up (and preserving) register 12 before calling into a dynamically loaded, non-PCrel Python module in the form of a shared object. GDB shows the following instruction stream:
=> 0x77d25014 : ld r12,0(r9)
=> 0x77d25018 : addi r1,r1,112
r12 0x7fffe921af60 140737104686944
=> 0x77d2501c : std r10,0(r30)
=> 0x77d25020 : ld r3,8(r9)
=> 0x77d25024 : ld r9,0(r31)
=> 0x77d25028 : ld r29,-24(r1)
=> 0x77d2502c : ld r30,-16(r1)
=> 0x77d25030 : mtctr r12
=> 0x77d25034 : lwz r12,8(r1)
r12 0x4000 16384
=> 0x77d25038 : addi r9,r9,1
=> 0x77d2503c : std r9,0(r31)
=> 0x77d25040 : ld r31,-8(r1)
=> 0x77d25044 : mtocrf 8,r12
=> 0x77d25048 : bctr
=> 0x7fffe921af60 : addis r2,r12,4
=> 0x7fffe921af64 : addi r2,r2,-12384
=> 0x7fffe921af68 : nop
=> 0x7fffe921af6c : ld r3,-32728(r2)
Program received signal SIGSEGV, Segmentation fault.
0x7fffe921af6c in _Py_INCREF (op=) at ../Python-3.9.6/Include/object.h:408
408 op->ob_refcnt++;
After the load at 0x77d25014 sets r12 to the address of the callee (0x7fffe921af60), the load at 0x77d25034 overwrites it with the CR save value just before the tail call (bctr) at 0x77d25048, resulting in the badness when setting up and using the TOC. I suspect some sort of instruction scheduling issue? I've attached a rather large pre-processed C file. It's complicated to reduce because of functions calling other functions.
I gave "creduce" a shot at it, but it's challenging (for me, at least) to craft a script that knows what to look for. I'll also attach the best I could get from creduce, but shield your eyes before looking at it.
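To see why a clobbered r12 leads to the fault, the first instructions at the callee's entry (addis r2,r12,4; addi r2,r2,-12384; ld r3,-32728(r2)) can be modeled as plain arithmetic. This is only a sketch built from the GDB trace in this report; toc_from_r12 and first_load_address are hypothetical helper names, and the model assumes the callee derives its TOC pointer solely from r12, as the listing suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Models "addis r2,r12,4" followed by "addi r2,r2,-12384":
   r2 = r12 + (4 << 16) - 12384. */
uint64_t toc_from_r12(uint64_t r12)
{
    return r12 + (4ULL << 16) - 12384;
}

/* Models the effective address of "ld r3,-32728(r2)". */
uint64_t first_load_address(uint64_t r12)
{
    return toc_from_r12(r12) - 32728;
}
```

With the intended value r12 = 0x7fffe921af60 the model gives r2 = 0x7fffe9257f00, an address near the callee. With the clobbered value r12 = 0x4000 it gives r2 = 0x40fa0, so the subsequent load reads from 0x38fc8, which has nothing to do with the module's TOC and is consistent with the SIGSEGV reported at that ld.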
[Bug target/102107] protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 Paul Clarke changed: What|Removed |Added Attachment #51367|0 |1 is obsolete|| --- Comment #1 from Paul Clarke --- Created attachment 51368 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51368&action=edit preprocessed source (large) Attach correct file. :-/
[Bug target/102107] protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 --- Comment #2 from Paul Clarke --- Created attachment 51369 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51369&action=edit creduced version (tiny, but ugly)
[Bug target/102107] protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 --- Comment #3 from Paul Clarke --- Created attachment 51371 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51371&action=edit preprocessed source (a bit smaller) I was able to cut out a bit more than half of the original code. It gets more difficult from here. If this is still "too big", I can hack at it some more.
[Bug target/102107] protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 --- Comment #4 from Paul Clarke --- Created attachment 51372 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51372&action=edit preprocessed source (yet a bit smaller) I was able to remove one of the cases of the switch statement in the function which exhibits the issue. Interestingly, removing any of the others hides the issue.
[Bug target/102107] protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 --- Comment #5 from Paul Clarke --- Fails with "-mcpu=power10" and "-O2" or "-O3".
[Bug target/102107] protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 --- Comment #8 from Paul Clarke --- $ /opt/at15.0/bin/gcc --version gcc (GCC) 11.2.1 20210802 (Advance-Toolchain 15.0-0) [ebcfb7a665c2]
[Bug target/102107] protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 --- Comment #11 from Paul Clarke --- This does produce the issue for me: -- $ git checkout remotes/vendors/ibm/gcc-11-branch gcc-AT $ mkdir gcc-AT-build $ cd gcc-AT-build $ ../gcc-AT/configure --enable-languages=c,c++ --disable-libada --disable-libsanitizer --disable-libssp --disable-libgomp --disable-libvtv --disable-nls --prefix=/home/pc/gcc-AT-install $ make $ make install $ ~/gcc-AT-install/bin/gcc -S -O3 -mcpu=power10 -fverbose-asm r12test2.c $ grep --before-context=15 bctr r12test2.s mtctr 12 # func, func # r12test2.c:3030: } lwz 12,8(1) #, # r12test2.c:3013: ++*p_format; addi 9,9,1 # tmp251, *p_format_31(D), std 9,0(31) # *p_format_31(D), tmp251 # r12test2.c:3030: } ld 31,-8(1) #, mtcrf 8,12 #, .cfi_restore 72 .cfi_restore 31 .cfi_restore 30 .cfi_restore 28 .cfi_restore 27 # r12test2.c:3014: return (*func)(); bctr # func --
[Bug target/102485] New: -Wa,-many no longer has any effect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102485 Bug ID: 102485 Summary: -Wa,-many no longer has any effect Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: pc at us dot ibm.com Target Milestone: --- The assembler option "-many" tells the assembler to support assembly of instructions from any vintage of processor. This can be passed through the GCC compiler using the command line option "-Wa,-many". The "-many" functionality has been under attack of late. ;-) It once was that GCC would always pass "-many" to the assembler. This was stopped with commit e154242724b084380e3221df7c08fcdbd8460674 "Don't pass -many to the assembler". A recent change to binutils, commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a "ignore sticky options for .machine" stopped preserving "sticky" options across a base `.machine` directive. This change caused sequences like: .machine altivec .machine power5 ...to disable AltiVec instructions afterward, because "power5" did not support AltiVec, and "power5" is a base ".machine" directive. A perhaps unintended consequence is that using GCC to pass "-many" to the assembler (via "-Wa,-many") has no effect because GCC adds a base ".machine" directive to every(?) assembler file given to the assembler, but only passes "-many" (no ".machine" directives are added). The assembler sees the "-many" parameter, then sees the base ".machine" directive, and suppresses any impact of the "-many" parameter.
--
mfppr32.c:
long f () {
  long ppr;
  asm volatile ("mfppr32 %0" : "=r"(ppr));
  return ppr;
}
--
$ gcc -c ./mfppr32.c
/tmp/ccAShoDb.s: Assembler messages:
/tmp/ccAShoDb.s:18: Error: unrecognized opcode: `mfppr32'
$ gcc -Wa,-many ./mfppr32.c
/tmp/cc0tRDPx.s: Assembler messages:
/tmp/cc0tRDPx.s:18: Error: unrecognized opcode: `mfppr32'
$ gcc -S -Wa,-many -O ./mfppr32.c
$ cat mfppr32.s
[edited for brevity]
.file "mfppr32.c"
.machine ppc
.section ".text"
.globl f
.type f, @function
f:
mfppr32 3
blr
$ as mfppr32.s
mfppr32.s: Assembler messages:
mfppr32.s:12: Error: unrecognized opcode: `mfppr32'

With older binutils, this worked:
$ older-as mfppr32.s
$
--
If the binutils assembler (as) is doing the right thing now with respect to the base ".machine" directives and sticky ".machine" directives, then it would perhaps be GCC's responsibility to build an assembler file that allows for passing the "-many" assembler command line option through GCC and have that continue to work as likely expected.
[Bug target/102107] protocol register (r12) corrupted before a tail call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107 Paul Clarke changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #23 from Paul Clarke --- Tested (trunk), works for me.
[Bug target/102485] -Wa,-many no longer has any effect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102485 --- Comment #2 from Paul Clarke --- GCC putting the base ".machine" directive at the beginning of the file makes any command-line use of "-many" (-Wa,-many) be ignored. Is that OK? "-many" is supposed to make those black boxes just work. This worked before recent changes to binutils/GCC. Is there any valid use of "-Wa,-many" now?
[Bug target/102783] New: [powerpc] FPSCR manipulations cannot be relied upon
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783 Bug ID: 102783 Summary: [powerpc] FPSCR manipulations cannot be relied upon Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: pc at us dot ibm.com Target Milestone: --- On all Power targets which support hardware floating-point, there are a few manipulations of the Floating-Point Status and Control Register (FPSCR) that have side-effects for subsequent floating-point computation. For example, changing the floating-point rounding mode, or changing whether floating-point exceptions are enabled. There are many ways to effect those manipulations:
- The set of fenv(3) calls
- A handful of builtins: __builtin_fpscr_set_rn, __builtin_mtfsf, __builtin_mtfsb{0,1}
- Inline asm using the appropriate instructions (mffsce, mffscdrn{i}, mffscrn{i}, mtfsf{i}, mtfsb{0,1})
The problem is that if any of the above methods are not effected in an out-of-line function, there is no way at present to restrict instruction scheduling such that nearby floating-point computations are prevented from moving before or after the FPSCR changes. (Possibly resulting in computation using a wrong rounding mode, or unexpected FP exceptions.) With asm statements, one could add artificial read and write dependencies to the input or output (if any) of the FPSCR manipulations and previous/subsequent FP computations, but this is not always practicable. (Current glibc is an example.)
[Bug target/101893] There is no vgbbd on p7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101893 --- Comment #1 from Paul Clarke --- I'll take ownership of this, except I'm not sure how to effect that. The fix has been posted at https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577023.html and awaits review/approval.