from:"jens.seifert at de dot ibm.com"

[Bug target/119912] New: PPC: Inefficient vector immediate shifts

2025-04-23 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Shifts by -1 should be performed by a 0xFF..FF constant as PPC has modulo shift and the constant generation for 0xFF..FF requires just 1 instruction. On Power9 always use

[Bug target/119702] New: PPCLE: Inefficient auto-vectorization for 64-bit shifts on Power9

2025-04-09 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- void lshift1(unsigned long long *a) { a[0] <<= 1; a[1] <<= 1; } Output: lshift1(unsigned long long*):

[Bug target/119468] PPCLE: Inefficient implementation of __builtin_parityll

2025-04-09 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468 --- Comment #4 from Jens Seifert --- clang is emitting extended mnemonics. On gcc, I only can enforce this by using inline assembly: unsigned long long parityfast(unsigned long long in) { __asm__("popcntd %0,%1":"+r"(in)); return in & 1

[Bug target/119468] PPCLE: Inefficient implementation of __builtin_parityll

2025-04-09 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468 --- Comment #2 from Jens Seifert --- popcnt + parity is slower than just 64-bit popcount and extracting last bit. "missed-optimization" opportunity applies as well to big endian. Optimal code: popcntd 3, 3 clrldi 3, 3, 63
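
The idea in this comment, taking the parity as the lowest bit of a full 64-bit population count, can be sketched in portable C. This is a minimal illustration rather than code from the bug report: the helper name `parity_via_popcount` is hypothetical, and whether a given compiler lowers it to `popcntd` followed by `clrldi` depends on the target and compiler version.

```c
#include <stdbool.h>

/* Hypothetical helper: parity of x, computed as the low bit of the
 * 64-bit population count instead of calling __builtin_parityll. */
static inline bool parity_via_popcount(unsigned long long x)
{
    /* An odd number of set bits leaves bit 0 of the count set. */
    return (__builtin_popcountll(x) & 1) != 0;
}
```

The function should agree with `__builtin_parityll` for every input; only the generated instruction sequence is expected to differ.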

[Bug target/119494] New: z196: Inefficient implementation for __builtin_parityll for z196 < z15

2025-03-27 Thread jens.seifert at de dot ibm.com via Gcc-bugs

ity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- bool parityll(unsigned long long x) { return __builtin_parityll(x); } Code generation for z15 and above is opti

[Bug target/119468] New: PPCLE: Inefficient implementation of __builtin_parityll

2025-03-25 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- bool parity(unsigned long long l) { return __builtin_parityll(l); } bool parity2(unsigned long long l) { return

[Bug target/117928] New: z14 builtin for VLBR instruction missing

2024-12-05 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- I want to use the z14 vlbr instruction, but I found no builtin for them. The assembler claims "unknown" mnemonic for vlbr, but I see the instruction in the "

[Bug target/117568] New: z13: Use vector instructions for fixed length memcmp

2024-11-13 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include #include Up to 16 bytes consider using vector instructions for memcmp. This is not required for 1,2,4,8 bytes, but for the rest. For general

[Bug target/117561] New: z13/z14 Please add a scalar_test_data_class builtin

2024-11-13 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- I found no way to efficiently check the fp data class on z using the wftcidb (z13) and wftcisb (z14) instructions. For PowerPC scalar_test_data_class exists and provides

[Bug target/116649] New: PPC: Suboptimal code for __builtin_bcdadd_ovf on Power10

2024-09-09 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- unsigned long long bcdadd(vector __int128 a, vector __int128 b, vector __int128 *c) { return __builtin_bcdadd_ov(a, b, 0

[Bug target/115973] PPCLE: Inefficient code for __builtin_uaddll_overflow and __builtin_addcll

2024-09-07 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115973 --- Comment #2 from Jens Seifert --- Assembly that better integrates: unsigned long long addc_opt(unsigned long long a, unsigned long long b, unsigned long long *res) { unsigned long long rc; __asm__("addc %0,%2,%3;\n\tsubfe %1,%1,%1":"=r

[Bug target/115973] New: PPCLE: Inefficient code for __builtin_uaddll_overflow and __builtin_addcll

2024-07-17 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- unsigned long long add(unsigned long long a, unsigned long long b, unsigned long long *ovf) { return

[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496

2024-06-06 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355 --- Comment #10 from Jens Seifert --- Does this affect both the loop vectorizer and the SLP vectorizer? -fno-tree-loop-vectorize prevents loop vectorization and works around this issue. Does the same problem also affect SLP vectorization, which

[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355 --- Comment #1 from Jens Seifert --- Same issue with gcc 13.2.1

[Bug target/115355] New: PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input setToIdentity.C: #include #include #include void setToIdentityGOOD(unsigned long long *mVec, unsigned int mLen) { for

[Bug target/114376] New: s390: Inefficient __builtin_bswap16

2024-03-18 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- unsigned short swap16(unsigned short in) { return __builtin_bswap16(in); } generates with -O3 -march=z196: swap16(unsigned short): lrvr %r2,%r2 srl %r2,16

[Bug target/93176] PPC: inefficient 64-bit constant consecutive ones

2023-08-17 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176 --- Comment #10 from Jens Seifert --- Looks like no patch in the area got delivered. I did a small test for unsigned long long c() { return 0xULL; } gcc 13.2.0: li 3,0 ori 3,3,0x

[Bug target/93176] PPC: inefficient 64-bit constant consecutive ones

2023-08-16 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176 --- Comment #7 from Jens Seifert --- What happened? Still waiting for an improvement.

[Bug target/106770] PPCLE: Unnecessary xxpermdi before mfvsrd

2023-02-27 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770 --- Comment #6 from Jens Seifert --- The left part of the VSX registers overlaps with the floating-point registers, which is why no xxpermdi is required and mfvsrd can access all (left) parts of VSX registers directly. The xxpermdi x,y,y,3 indic

[Bug target/106770] PPCLE: Unnecessary xxpermdi before mfvsrd

2023-02-27 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770 --- Comment #4 from Jens Seifert --- PPCLE with no special option means -mcpu=power8 -maltivec (altivecle to be more precise). vec_promote(, 1) should be a no-op on ppcle. But the value gets splatted to both the left and right parts of the vector register. =

[Bug c++/108560] New: builtin_va_arg_pack_len is documented to return size_t, but actually returns int

2023-01-26 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include bool test(const char *fmt, size_t numTokens, ...) { return __builtin_va_arg_pack_len() != numTokens

[Bug target/108396] New: PPCLE: vec_vsubcuq missing

2023-01-13 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: #include vector unsigned __int128 vsubcuq(vector unsigned __int128 a, vector unsigned __int128 b) { return vec_vsubcuq(a, b); } Command line: gcc -m64 -O2 -maltivec -mcpu

[Bug target/108049] s390: Compiler adds extra zero extend after xoring 2 zero extended values

2022-12-10 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108049 --- Comment #1 from Jens Seifert --- Sample above got compiled with -march=z196

[Bug target/108049] New: s390: Compiler adds extra zero extend after xoring 2 zero extended values

2022-12-10 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Same issue for PPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949 extern unsigned char magic1[256]; unsigned

[Bug rtl-optimization/107949] PPC: Unnecessary rlwinm after lbzx

2022-12-10 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949 --- Comment #3 from Jens Seifert --- *** Bug 108048 has been marked as a duplicate of this bug. ***

[Bug target/108048] PPCLE: gcc does not recognize that lbzx does zero extend

2022-12-10 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108048 Jens Seifert changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/108048] New: PPCLE: gcc does not recognize that lbzx does zero extend

2022-12-10 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- extern unsigned char magic1[256]; unsigned int hash(const unsigned char inp[4]) { const unsigned long long INIT = 0x1ULL

[Bug target/107949] PPC: Unnecessary rlwinm after lbzx

2022-12-02 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949 --- Comment #1 from Jens Seifert --- hash2 is only provided to show what the code should look like (without rlwinm).

[Bug target/107949] New: PPC: Unnecessary rlwinm after lbzx

2022-12-02 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- extern unsigned char magic1[256]; unsigned int hash(const unsigned char inp[4]) { const unsigned long long INIT = 0x1ULL; unsigned long long h1 = INIT; h1 = magic1

[Bug target/107757] New: PPCLE: Inefficient vector constant creation

2022-11-18 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Due to the fact that vslw, vsld, vsrd, ... only use the modulo of bit width for shifting, the combination with 0xFF..FF vector can be used to create vector constants
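
The modulo behaviour described here can be illustrated with a scalar model of a single 32-bit lane. This is a sketch under stated assumptions, not the vector code from the report: the helper names `lane_shl_mod32` and `lane_shr_mod32` are hypothetical, and they only mimic the rule that a word shift uses the shift amount modulo the element width.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of a modulo shift left:
 * only the low 5 bits of the shift amount are used. */
static inline uint32_t lane_shl_mod32(uint32_t value, uint32_t amount)
{
    return value << (amount & 31u);
}

/* Hypothetical scalar model of one 32-bit lane of a modulo logical
 * shift right. */
static inline uint32_t lane_shr_mod32(uint32_t value, uint32_t amount)
{
    return value >> (amount & 31u);
}
```

With an all-ones lane used as both operand and shift amount, the amount reduces to 31, so `lane_shl_mod32(0xFFFFFFFFu, 0xFFFFFFFFu)` yields the sign-bit mask `0x80000000` and `lane_shr_mod32(0xFFFFFFFFu, 0xFFFFFFFFu)` yields `1`, each from a single all-ones constant.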

[Bug target/86160] Implement isinf on PowerPC

2022-11-08 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86160 --- Comment #4 from Jens Seifert --- I am looking forward to getting the Power9 optimization using xststdcdp etc.

[Bug target/106770] PPCLE: Unnecessary xxpermdi before mfvsrd

2022-08-29 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770 --- Comment #2 from Jens Seifert --- vec_extract(vr, 1) should extract the left element. But xxpermdi x,x,x,3 extracts the right element. Looks like a bug in vec_extract for PPCLE and not a problem regarding unnecessary xxpermdi. Using assembly

[Bug target/106770] PPCLE: Unnecessary xxpermdi before mfvsrd

2022-08-29 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770 --- Comment #1 from Jens Seifert --- vec_extract(vr, 1) should extract the left element. But xxpermdi x,x,x,3 extracts the right element. Looks like a bug in vec_extract for PPCLE and not a problem regarding unnecessary xxpermdi.

[Bug target/106770] New: PPCLE: Unnecessary xxpermdi before mfvsrd

2022-08-29 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include int cmp2(double a, double b) { vector double va = vec_promote(a, 1); vector double vb = vec_promote(b, 1); vector long long vlt = (vector long long

[Bug target/106769] New: PPCLE: vec_extract(vector unsigned int) unnecessary rldicl after mfvsrwz

2022-08-28 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include unsigned int extr(vector unsigned int v) { return vec_extract(v, 2); } Generates: _Z4extrDv4_j: .LFB1

[Bug target/106701] New: s390: Compiler does not take into account number range limitation to avoid subtract from immediate

2022-08-21 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- unsigned long long subfic(unsigned long long a) { if (a > 15) __builtin_unreacha

[Bug target/106598] New: s390: Inefficient branchless conditionals for int

2022-08-12 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- int lt(int a, int b) { return a < b; } generates: cr %r2,%r3 lhi %r1,1 lhi %r2,0 locrnl %r1,%r2 l

[Bug target/106592] New: s390: Inefficient branchless conditionals for long long

2022-08-12 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Created attachment 53443 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53443&action=edit source code long long gtRef(long

[Bug target/106536] New: P9: gcc does not detect setb pattern

2022-08-05 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- int compare2(unsigned long long a, unsigned long long b) { return (a > b ? 1 : (a < b ? -1 : 0)); } Output: _Z8compare2yy: cmpld 0,3,4 bgt

[Bug target/106525] New: s390: Inefficient branchless conditionals for unsigned long long

2022-08-04 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Created attachment 53409 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53409&action=edit source code 1)

[Bug target/106043] Power10: lacking vec_blendv builtins

2022-07-13 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106043 Jens Seifert changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/106043] Power10: lacking vec_blendv builtins

2022-07-13 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106043 --- Comment #1 from Jens Seifert --- Found in documentation: https://gcc.gnu.org/onlinedocs/gcc-11.3.0/gcc/PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1.html#PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1

[Bug c/106043] New: Power10: lacking vec_blendv builtins

2022-06-21 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Missing builtins for vector instructions xxblendvb, xxblendvh, xxblendvw, xxblendvd. #include vector int blendv(vector int a, vector int b, vector int c) { return

[Bug target/104268] New: 390: inefficient vec_popcnt for 16-bit for z13

2022-01-28 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include vector unsigned short popcnt(vector unsigned short a) { return vec_popcnt(a); } Generates with -march=z13 _Z6popcntDv8_t: .LFB1

[Bug target/103743] New: PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2021-12-15 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- int overflow(); int negOverflow(long long in) { if (in

[Bug target/103731] New: 390: inefficient 64-bit constant generation

2021-12-15 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- unsigned long long M8() { return 0x; } Generates: .LC0: .quad 0x .text .align 8 .globl _Z2M8v .type

[Bug target/103106] New: PPC: Missing builtin for P9 vmsumudm

2021-11-06 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- I can't find a builtin for the vmsumudm instruction. I also found nothing in the Power vector intrinsic programming reference. https://openpowerfoundation.org/?resource_lib=

[Bug target/102265] New: s390: Inefficient code for __builtin_ctzll

2021-09-09 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- unsigned long long ctzll(unsigned long long x) { return __builtin_ctzll(x); } creates: lcgr %r1,%r2 ngr %r2,%r1 lghi %r1,63

[Bug target/102117] s390: Inefficient code for 64x64=128 signed multiply for <= z13

2021-08-29 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102117 --- Comment #1 from Jens Seifert --- Sorry small bug in optimal sequence. __int128 imul128_opt(long long a, long long b) { unsigned __int128 x = (unsigned __int128)(unsigned long long)a; unsigned __int128 y = (unsigned __int128)(unsigned

[Bug target/102117] New: s390: Inefficient code for 64x64=128 signed multiply for <= z13

2021-08-29 Thread jens.seifert at de dot ibm.com via Gcc-bugs

mal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- __int128 imul128(long long a, long long b) { return (__int128)a * (__int128)b; } creates sequence with 3 multipl
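
One well-known way to obtain a signed 64x64=128 product from a single unsigned widening multiply is to correct the high half afterwards. The sketch below illustrates that general identity only; it is not necessarily the sequence proposed in this report, the name `imul128_via_unsigned` is hypothetical, and it assumes a compiler with `__int128` support and wrapping unsigned-to-signed conversion (as GCC provides).

```c
/* Hypothetical helper: signed 64x64=128 multiply built from one
 * unsigned 64x64=128 multiply plus a correction of the high half. */
static inline __int128 imul128_via_unsigned(long long a, long long b)
{
    unsigned long long ua = (unsigned long long)a;
    unsigned long long ub = (unsigned long long)b;

    /* Single unsigned widening multiply. */
    unsigned __int128 p = (unsigned __int128)ua * ub;
    unsigned long long hi = (unsigned long long)(p >> 64);
    unsigned long long lo = (unsigned long long)p;

    /* A negative operand was read as value + 2^64, so subtract the
     * other operand from the high half (arithmetic is modulo 2^64). */
    if (a < 0) hi -= ub;
    if (b < 0) hi -= ua;

    return (__int128)(((unsigned __int128)hi << 64) | lo);
}
```

The two corrections are simple conditional subtractions, which a target with conditional or masked arithmetic can typically evaluate without additional multiplies.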

[Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9

2021-06-20 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866 --- Comment #9 from Jens Seifert --- I know that if I used the vec_perm builtin as an end user, you would need to conform to the LE specification, but you can always optimize the code as you like as long as it creates correct results afterw

[Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9

2021-06-18 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866 --- Comment #7 from Jens Seifert --- Regarding vec_revb for vector unsigned int. I agree that revb: .LFB0: .cfi_startproc vspltish %v1,8 vspltisw %v0,-16 vrlh %v2,%v2,%v1 vrlw %v2,%v2,%v0 blr work

[Bug target/101041] New: z13: Inefficient handling of vector register passed to function

2021-06-12 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include vector unsigned long long mul64(vector unsigned long long a, vector unsigned long long b) { return a * b; } creates

[Bug target/100930] New: PPC: Missing builtins for P9 vextsb2w, vextsb2w, vextsb2d, vextsh2d, vextsw2d

2021-06-06 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Using the same names as xlC would be appreciated: vec_extsbd, vec_extsbw, vec_extshd, vec_extshw, vec_extswd

[Bug target/100926] New: PPCLE: Inefficient code for vec_xl_be(unsigned short *) < P9

2021-06-05 Thread jens.seifert at de dot ibm.com via Gcc-bugs

mal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: vector unsigned short load_be(unsigned short *c) { return vec_xl_be(0L, c); } creates: _Z7load_bePt: .L

[Bug target/100808] PPC: ISA 3.1 builtin documentation

2021-06-02 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100808 --- Comment #3 from Jens Seifert --- - Avoid additional "int" unsigned long long int => unsigned long long Why? Those are exactly the same types! Yes, but the rest of the documentation uses unsigned long long. This is just for consistency wit

[Bug target/100871] New: z14: vec_doublee maps to wrong builtin in vecintrin.h

2021-06-02 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include Input: vector double doublee(vector float a) { return vec_doublee(a); } cause compile error: vec.C: In function ‘__vector(2) double doublee

[Bug target/100869] New: z13: Inefficient code for vec_reve(vector double)

2021-06-02 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: vector double reve(vector double a) { return vec_reve(a); } creates: _Z4reveDv2_d: .LFB3: .cfi_startproc larl %r5,.L12 vl

[Bug target/100868] New: PPC: Inefficient code for vec_reve(vector double)

2021-06-02 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: vector double reve(vector double a) { return vec_reve(a); } creates: _Z4reveDv2_d: .LFB3: .cfi_startproc .LCF3: 0: addis 2,12,.TOC.-.LCF3

[Bug target/100867] New: z13: Inefficient code for vec_revb(vector unsigned short)

2021-06-02 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: vector unsigned short revb(vector unsigned short a) { return vec_revb(a); } Creates: _Z4revbDv4_j: .LFB1

[Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9

2021-06-02 Thread jens.seifert at de dot ibm.com via Gcc-bugs

mal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: vector unsigned short revb(vector unsigned short a) { return vec_revb(a); } creates: _Z4revbDv8_t: .L

[Bug c/100808] PPC: ISA 3.1 builtin documentation

2021-05-28 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100808 --- Comment #1 from Jens Seifert --- https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1.html vector unsigned long long int vec_gnb (vector unsigned __int128, const unsigned char) should be unsigned

[Bug c++/100809] PPC: __int128 divide/modulo does not use P10 instructions vdivsq/vdivuq

2021-05-28 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809 --- Comment #1 from Jens Seifert --- Same applies to modulo.

[Bug c++/100809] New: PPC: __int128 divide/modulo does not use P10 instructions vdivsq/vdivuq

2021-05-28 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- unsigned __int128 div(unsigned __int128 a, unsigned __int128 b) { return a/b; } __int128 div(__int128 a, __int128 b

[Bug c/100808] New: PPC: ISA 3.1 builtin documentation

2021-05-28 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- https://gcc.gnu.org/onlinedocs/gcc/Basic-PowerPC-Built-in-Functions-Available-on-ISA-3_002e1.html#Basic-PowerPC-Built-in-Functions-Available-on-ISA-3_002e1 Please improve the

[Bug target/100694] New: PPC: initialization of __int128 is very inefficient

2021-05-20 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Initializing an __int128 from two 64-bit integers is implemented very inefficiently. The most natural code, which works well on all other platforms, generate
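
The report's own example is cut off in this listing. As an assumption about what "the most natural code" might look like, a shift-and-or combination of two 64-bit halves can be sketched as follows; the helper name `make_u128` is hypothetical and the block requires a compiler with `__int128` support.

```c
#include <stdint.h>

/* Hypothetical helper: build a 128-bit value from its high and low
 * 64-bit halves with a shift and an OR. */
static inline unsigned __int128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) | lo;
}
```

Since the two halves normally live in separate 64-bit registers, this construction ideally costs no more than register moves.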

[Bug target/100693] New: PPC: missing 64-bit addg6s

2021-05-20 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- gcc only provides unsigned int __builtin_addg6s (unsigned int, unsigned int); but addg6s is a 64-bit operation. I require unsigned long long __builtin_addg6s (unsigned long long

[Bug target/98020] PPC: mfvsrwz+extsw not merged to mtvsrwa

2020-12-08 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98020 Jens Seifert changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug target/98124] New: Z: Load and test LTDBR instruction gets not used for comparison against 0.0

2020-12-03 Thread jens.seifert at de dot ibm.com via Gcc-bugs

: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include double sign(double in) { return in == 0.0 ? 0.0 : copysign(1.0, in); } Command line: gcc -m64 -O2 -save

[Bug target/98020] New: PPC: mfvsrwz+extsw not merge to mtvsrwa

2020-11-26 Thread jens.seifert at de dot ibm.com via Gcc-bugs

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- int extract(vector signed int v) { return v[2]; } Command line: gcc -mcpu=power8 -maltivec -m64 -O3 -save-temps extract.C Output: _Z7extractDv4_i: .LFB0

[Bug target/70928] Load simple float constants via VSX operations on PowerPC

2020-11-14 Thread jens.seifert at de dot ibm.com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70928 Jens Seifert changed: What|Removed |Added CC||jens.seifert at de dot ibm.com

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-19 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 Jens Seifert changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|DUPLICATE

[Bug target/95737] New: PPC: Unnecessary extsw after negative less than

2020-06-18 Thread jens.seifert at de dot ibm.com

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- unsigned long long negativeLessThan(unsigned long long a, unsigned long long b) { return -(a < b); } gcc -m64 -O2 -save-temps negativeLessThan.C crea

[Bug target/95704] PPC: int128 shifts should be implemented branchless

2020-06-17 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704 --- Comment #5 from Jens Seifert --- Power9 code is branchfree but not good at all. _Z3shloy: .LFB0: .cfi_startproc addi 8,5,-64 subfic 6,5,63 srdi 10,3,1 li 7,0 sld 4,4,5 sld 5,3,5

[Bug target/95704] PPC: int128 shifts should be implemented branchless

2020-06-17 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704 --- Comment #3 from Jens Seifert --- GCC 8.3 generates: _Z3shloy: .LFB0: .cfi_startproc addi 9,5,-64 cmpwi 7,9,0 blt 7,.L2 sld 4,3,9 li 3,0 blr .p2align 4,,15 .L2: srdi 9,3,1

[Bug target/95704] PPC: int128 shifts should be implemented branchless

2020-06-16 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704 --- Comment #1 from Jens Seifert --- Created attachment 48742 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48742&action=edit assembly

[Bug target/95704] New: PPC: int128 shifts should be implemented branchless

2020-06-16 Thread jens.seifert at de dot ibm.com

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Created attachment 48741 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48741&action=edit input with branchless 128-bit shifts PowerPC processors don

[Bug target/94297] PPCLE std::replace internal compiler error

2020-04-07 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297 Jens Seifert changed: What|Removed |Added Status|RESOLVED|CLOSED --- Comment #9 from Jens Seifert

[Bug target/94297] PPCLE std::replace internal compiler error

2020-04-07 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297 --- Comment #8 from Jens Seifert --- Too old libgmp got picked up. Setting LD_LIBRARY_PATH=/lib64 solved the issue.

[Bug target/94297] PPCLE std::replace internal compiler error

2020-04-07 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297 Jens Seifert changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/94519] PPC: ICE: Segmentation fault on -DBL_MAX

2020-04-07 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94519 Jens Seifert changed: What|Removed |Added Status|RESOLVED|CLOSED --- Comment #2 from Jens Seifert

[Bug target/94519] New: PPC: ICE: Segmentation fault on -DBL_MAX

2020-04-07 Thread jens.seifert at de dot ibm.com

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: #include static const double dsmall[] = { -DBL_MAX }; gcc ccerr.C ccerr.C:3:1: internal compiler error: Segmentation fault static const double dsmall[] = { -DBL_MAX

[Bug target/94297] PPCLE std::replace internal compiler error

2020-03-24 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297 --- Comment #5 from Jens Seifert --- No options. Same failure with -O2. System is a RHEL 7.5. Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-8/root/usr/libexec/gcc/ppc64le-redhat-linux/8/lto-wrapper Target: ppc64le-

[Bug target/94297] PPCLE std::replace internal compiler error

2020-03-24 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297 --- Comment #3 from Jens Seifert --- Created attachment 48110 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48110&action=edit Pre-processed file created using -save-temps

[Bug target/94297] PPCLE std::replace internal compiler error

2020-03-24 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297 Jens Seifert changed: What|Removed |Added Summary|std::replace internal |PPCLE std::replace internal

[Bug c++/94297] New: std::replace internal compiler error

2020-03-24 Thread jens.seifert at de dot ibm.com

++ Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- #include #include void patch(std::string& s) { std::replace(s.begin(),s.end(),'.','-'); } gcc replace.C In file included from /opt/rh/devtoolset-

[Bug target/94135] PPC: subfic instead of neg used for rotate right

2020-03-16 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94135 --- Comment #4 from Jens Seifert --- Setting CA in XER increases issue-to-issue latency by 1 on Power8. See Table 10-14, Issue-to-Issue Latencies. In addition, setting CA restricts instruction reordering.

[Bug target/94135] PPC: subfic instead of neg used for rotate right

2020-03-11 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94135 --- Comment #2 from Jens Seifert --- POWER8 Processor User’s Manual for the Single-Chip Module: addi addis add add. subf subf. addic subfic adde addme subfme addze. subfze neg neg. nego 1 - 2 cycles (GPR) 2 cycles (XER) 5 cycles (CR) 6/cycle,

[Bug target/94135] New: PPC: subfic instead of neg used for rotate right

2020-03-11 Thread jens.seifert at de dot ibm.com

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: unsigned int rotr32(unsigned int v, unsigned int r) { return (v>>r)|(v<<(32-r)); } unsigned long long rotr64(unsigned long long v, unsigned
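As a hedged illustration of the point in the subject line: rotating right by r is the same as rotating left by -r modulo the word size, so the left-shift count can be formed by negating r and masking rather than computing 32 - r. The sketch below is not the code from the report; the `_neg` function names are hypothetical.

```c
#include <stdint.h>

/* Rotate right using a negated, masked count for the left shift.
   This also avoids the out-of-range shift that (v << (32 - r))
   would perform when r == 0. */
uint32_t rotr32_neg(uint32_t v, unsigned int r)
{
    r &= 31;
    return (v >> r) | (v << ((0u - r) & 31));
}

/* Same idea for 64-bit values. */
uint64_t rotr64_neg(uint64_t v, unsigned int r)
{
    r &= 63;
    return (v >> r) | (v << ((0u - r) & 63));
}
```

Whether a given compiler turns this form into a negate plus rotate on PPC is not something this sketch demonstrates; it only shows the arithmetic identity.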

[Bug target/93571] New: PPC: fmr gets used instead of faster xxlor

2020-02-04 Thread jens.seifert at de dot ibm.com

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- fmr is a 6-cycle instruction on Power8. Why is gcc not using the 2-cycle xxlor instruction? Input: double setflm(double x) { double r = __builtin_mffs

[Bug target/93570] New: PPC: __builtin_mtfsf does not return a value

2020-02-04 Thread jens.seifert at de dot ibm.com

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Documentation says: double __builtin_mtfsf(const int,double). Not documented in 8.3.0, but it somehow works; nevertheless, it looks like the prototype is wrong and should be

[Bug target/93449] PPC: Missing conversion builtin from vector to _Decimal128 and vice versa

2020-01-28 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449 --- Comment #4 from Jens Seifert --- Power8 has bcdadd, which can only be combined with _Decimal128 if there is some kind of conversion between BCDs stored in a vector register and _Decimal128. On Power9 vec_load_len/vec_store_len can be used to

[Bug target/93448] PPC: missing builtin for DFP quantize(dqua,dquai,dquaq,dquaiq)

2020-01-28 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93448 --- Comment #4 from Jens Seifert --- The inline asm constraint "d" works. Thank you.

[Bug target/93449] PPC: Missing conversion builtin from vector to _Decimal128 and vice versa

2020-01-28 Thread jens.seifert at de dot ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449 --- Comment #2 from Jens Seifert --- #include typedef float _Decimal128 __attribute__((mode(TD))); _Decimal128 bcdtodpd(vector double v) { _Decimal128 res; memcpy(&res, &v, sizeof(res)); res = __builtin_denbcdq(0, res); return res;

[Bug target/93453] New: PPC: rldimi not taken into account to avoid shift+or

2020-01-27 Thread jens.seifert at de dot ibm.com

Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- 2 samples: unsigned long long load8r(unsigned long long *in) { return __builtin_bswap64(*in); } unsigned long long rldimi(unsigned int hi, unsigned int lo

[Bug target/93449] New: PPC: Missing conversion builtin from vector to _Decimal128 and vice versa

2020-01-26 Thread jens.seifert at de dot ibm.com

: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- I am currently porting an application from AIX to PPCLE and found that I am lacking compiler builtins for transforming

[Bug c++/93448] New: PPC: missing builtin for DFP quantize(dqua,dquai,dquaq,dquaiq)

2020-01-26 Thread jens.seifert at de dot ibm.com

Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- I am currently porting an application to PPCLE and found that I am lacking compiler builtins for decimal floating point quantize on

[Bug target/93178] New: PPC: inefficient 64-bit constant generation if msb is off in low 16 bit

2020-01-06 Thread jens.seifert at de dot ibm.com

Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: unsigned long long hi16msbon_low16msboff() { return 0x87654321ULL; // expected: li 3,0x4321 ; oris 3,0x8765
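A minimal sketch of why the expected li/oris pair yields the constant directly: li sign-extends its 16-bit immediate (harmless here because 0x4321 has its top bit clear), while the immediate of oris is not sign-extended, so the upper 32 bits stay zero. A lis of 0x8765, by contrast, sign-extends. The `model_*` helpers are hypothetical C models of the instruction semantics, not compiler output.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 64 bits. */
uint64_t sext16(uint16_t imm)
{
    uint64_t v = imm;
    if (v & 0x8000u)
        v |= ~(uint64_t)0xFFFF;
    return v;
}

/* li: load the sign-extended 16-bit immediate. */
uint64_t model_li(uint16_t imm)
{
    return sext16(imm);
}

/* lis: sign-extended immediate shifted left by 16 bits. */
uint64_t model_lis(uint16_t imm)
{
    return sext16(imm) << 16;
}

/* oris: OR with the immediate shifted left by 16, no sign extension. */
uint64_t model_oris(uint64_t rs, uint16_t ui)
{
    return rs | ((uint64_t)ui << 16);
}
```

With these models, `model_oris(model_li(0x4321), 0x8765)` gives 0x87654321, whereas `model_lis(0x8765)` alone gives 0xFFFFFFFF87650000 and would need its upper half cleared afterwards.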

[Bug target/93176] New: PPC: inefficient 64-bit constant consecutive ones

2020-01-06 Thread jens.seifert at de dot ibm.com

: target Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- All 64-bit constants containing a sequence of ones can be constructed with 2 instructions (li/lis + rldicl). gcc creates up to 5 instructions. Input: unsigned long
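To make the li + rldicl idea concrete, here is a hedged sketch that models the two instructions in C and builds one constant with a run of ones. The example constant (ones in bits 8 to 23) and all helper names are assumptions chosen for illustration; they are not taken from the report, and the sketch does not attempt to cover every case the report claims.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by sh bits (sh taken modulo 64). */
uint64_t rotl64(uint64_t x, unsigned int sh)
{
    sh &= 63;
    return sh ? (x << sh) | (x >> (64 - sh)) : x;
}

/* li with a negative immediate: the value is sign-extended to 64 bits. */
uint64_t model_li_neg(unsigned int magnitude)
{
    return (uint64_t)0 - (uint64_t)magnitude;
}

/* rldicl: rotate left by sh, then clear the mb most significant bits. */
uint64_t model_rldicl(uint64_t rs, unsigned int sh, unsigned int mb)
{
    uint64_t mask = ~(uint64_t)0 >> (mb & 63);
    return rotl64(rs, sh) & mask;
}

/* Models "li 3,-256 ; rldicl 3,3,0,40": ones in bits 8..23. */
uint64_t ones_bits_8_to_23(void)
{
    return model_rldicl(model_li_neg(256), 0, 40);
}
```

Here li supplies the zeros below the run and rldicl clears everything above it, so two instructions suffice for this particular constant.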

[Bug target/93130] New: PCC simple memset not inlined

2020-01-02 Thread jens.seifert at de dot ibm.com

Assignee: unassigned at gcc dot gnu.org Reporter: jens.seifert at de dot ibm.com Target Milestone: --- Input: void memspace16(char *p) { memset(p, ' ', 16); } Expected result: li 4,0x2020 rldimi 4,4,16,0 rldimi 4,4,32,0 std 4,0(3) Splatting the memset input to 64-bit c
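The splat described above can be sketched in portable C: replicate the fill byte across a 64-bit value, then store that value twice to cover 16 bytes. This is only a model of the idea, under the assumption that two 8-byte stores are acceptable; `splat_byte` and `memspace16_sketch` are hypothetical names, not part of the report.

```c
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all eight bytes of a 64-bit value. */
uint64_t splat_byte(unsigned char c)
{
    uint64_t v = c;
    v |= v << 8;   /* 0x2020 for ' ' */
    v |= v << 16;  /* 0x20202020 */
    v |= v << 32;  /* 0x2020202020202020 */
    return v;
}

/* Fill 16 bytes with spaces using two 8-byte stores. */
void memspace16_sketch(char *p)
{
    uint64_t v = splat_byte(' ');
    memcpy(p, &v, sizeof v);      /* bytes 0..7 */
    memcpy(p + 8, &v, sizeof v);  /* bytes 8..15 */
}
```

Because every byte of the splatted value is identical, the result does not depend on endianness.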
