Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Shifts by -1 should be performed by a 0xFF..FF constant as
PPC has modulo shift and the constant generation for 0xFF..FF requires just 1
instruction.
On Power9 always use
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
void lshift1(unsigned long long *a)
{
a[0] <<= 1;
a[1] <<= 1;
}
Output:
lshift1(unsigned long long*):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468
--- Comment #4 from Jens Seifert ---
clang is emitting extended mnemonics.
On gcc, I can only enforce this by using inline assembly:
unsigned long long parityfast(unsigned long long in)
{
__asm__("popcntd %0,%1":"+r"(in));
return in & 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468
--- Comment #2 from Jens Seifert ---
popcnt + parity is slower than just
a 64-bit popcount and extracting the last bit.
The "missed-optimization" opportunity applies to big endian as well.
Optimal code:
popcntd 3, 3
clrldi 3, 3, 63
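The same idea can be written portably in C, as a minimal sketch: the parity is the low bit of the 64-bit population count. The helper name parity_via_popcount is hypothetical, and whether a given gcc version lowers it to the popcntd + clrldi pair above is an assumption that has to be checked against the generated assembly.

```c
#include <stdint.h>

/* Parity of a 64-bit value, taken as the low bit of its population count.
   parity_via_popcount is a hypothetical name used only for this sketch. */
static inline unsigned parity_via_popcount(uint64_t x)
{
    /* __builtin_popcountll counts the set bits; bit 0 of that count is the parity. */
    return (unsigned)__builtin_popcountll(x) & 1u;
}
```

For every input the result should agree with __builtin_parityll, so the two can be compared directly in a test.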
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
bool parityll(unsigned long long x)
{
return __builtin_parityll(x);
}
Code generation for z15 and above is opti
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
bool parity(unsigned long long l)
{
return __builtin_parityll(l);
}
bool parity2(unsigned long long l)
{
return
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
I want to use the z14 vlbr instruction, but I found no builtin for it.
The assembler reports an "unknown" mnemonic for vlbr, but I see the instruction in
the "
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
#include
For up to 16 bytes, consider using vector instructions for memcmp.
This is not required for 1, 2, 4, or 8 bytes, but for the rest.
For general
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
I found no way to efficiently check the fp data class on z using the wftcidb (z13) and
wftcisb (z14) instructions.
For PowerPC scalar_test_data_class exists and provides
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long bcdadd(vector __int128 a, vector __int128 b, vector __int128
*c)
{
return __builtin_bcdadd_ov(a, b, 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115973
--- Comment #2 from Jens Seifert ---
Assembly that better integrates:
unsigned long long addc_opt(unsigned long long a, unsigned long long b,
unsigned long long *res)
{
unsigned long long rc;
__asm__("addc %0,%2,%3;\n\tsubfe
%1,%1,%1":"=r
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long add(unsigned long long a, unsigned long long b, unsigned
long long *ovf)
{
return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355
--- Comment #10 from Jens Seifert ---
Does this affect both loop vectorization and SLP vectorization?
-fno-tree-loop-vectorize prevents loop vectorization and
works around this issue. Does the same problem also affect SLP vectorization,
which
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355
--- Comment #1 from Jens Seifert ---
Same issue with gcc 13.2.1
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input setToIdentity.C:
#include
#include
#include
void setToIdentityGOOD(unsigned long long *mVec, unsigned int mLen)
{
for
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned short swap16(unsigned short in)
{
return __builtin_bswap16(in);
}
generates -O3 -march=z196
swap16(unsigned short):
lrvr %r2,%r2
srl %r2,16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176
--- Comment #10 from Jens Seifert ---
Looks like no patch in the area got delivered. I did a small test for
unsigned long long c()
{
return 0xULL;
}
gcc 13.2.0:
li 3,0
ori 3,3,0x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93176
--- Comment #7 from Jens Seifert ---
What happened? Still waiting for an improvement.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #6 from Jens Seifert ---
The left part of the VSX registers overlaps with the floating point registers, which is
why no xxpermdi is required and mfvsrd can access all (left) parts of
the VSX registers directly.
The xxpermdi x,y,y,3 indic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #4 from Jens Seifert ---
PPCLE with no special option means -mcpu=power8 -maltivec (altivecle to be more
precise).
vec_promote(, 1) should be a noop on ppcle. But the value gets
splatted to both the left and right parts of the vector register.
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
bool test(const char *fmt, size_t numTokens, ...)
{
return __builtin_va_arg_pack_len() != numTokens
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
#include
vector unsigned __int128 vsubcuq(vector unsigned __int128 a, vector unsigned
__int128 b)
{
return vec_vsubcuq(a, b);
}
Command line:
gcc -m64 -O2 -maltivec -mcpu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108049
--- Comment #1 from Jens Seifert ---
Sample above got compiled with -march=z196
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Same issue for PPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949
extern unsigned char magic1[256];
unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949
--- Comment #3 from Jens Seifert ---
*** Bug 108048 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108048
Jens Seifert changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
extern unsigned char magic1[256];
unsigned int hash(const unsigned char inp[4])
{
const unsigned long long INIT = 0x1ULL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107949
--- Comment #1 from Jens Seifert ---
hash2 is only provided to show what the code should look like (without rlwinm).
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
extern unsigned char magic1[256];
unsigned int hash(const unsigned char inp[4])
{
const unsigned long long INIT = 0x1ULL;
unsigned long long h1 = INIT;
h1 = magic1
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Due to the fact that vslw, vsld, vsrd, ... only use the modulo of bit width for
shifting, the combination with 0xFF..FF vector can be used to create vector
constants
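The modulo behaviour described here can be modelled for a single lane in scalar C. This is only a sketch under stated assumptions: lane_shl_mod32 and lane_shr_mod32 are hypothetical helpers that mimic one 32-bit lane of a modulo shift such as vslw, they are not compiler builtins.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of a modulo left shift (vslw-like):
   only the low 5 bits of the shift count take part in the shift.
   Hypothetical helper, for illustration only. */
static inline uint32_t lane_shl_mod32(uint32_t value, uint32_t count)
{
    return value << (count & 31u);
}

/* Same model for a logical right shift of one 32-bit lane. */
static inline uint32_t lane_shr_mod32(uint32_t value, uint32_t count)
{
    return value >> (count & 31u);
}
```

In this model, shifting an all-ones lane by an all-ones count gives 0x80000000 for the left shift and 0x00000001 for the right shift, so one 0xFF..FF input yields further constants.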
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86160
--- Comment #4 from Jens Seifert ---
I am looking forward to getting the Power9 optimization using xststdcdp etc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #2 from Jens Seifert ---
vec_extract(vr, 1) should extract the left element. But xxpermdi x,x,x,3
extracts the right element.
Looks like a bug in vec_extract for PPCLE and not a problem regarding
unnecessary xxpermdi.
Using assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #1 from Jens Seifert ---
vec_extract(vr, 1) should extract the left element. But xxpermdi x,x,x,3
extracts the right element.
Looks like a bug in vec_extract for PPCLE and not a problem regarding
unnecessary xxpermdi.
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
int cmp2(double a, double b)
{
vector double va = vec_promote(a, 1);
vector double vb = vec_promote(b, 1);
vector long long vlt = (vector long long
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
unsigned int extr(vector unsigned int v)
{
return vec_extract(v, 2);
}
Generates:
_Z4extrDv4_j:
.LFB1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long subfic(unsigned long long a)
{
if (a > 15) __builtin_unreacha
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
int lt(int a, int b)
{
return a < b;
}
generates:
cr %r2,%r3
lhi %r1,1
lhi %r2,0
locrnl %r1,%r2
l
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Created attachment 53443
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53443&action=edit
source code
long long gtRef(long
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
int compare2(unsigned long long a, unsigned long long b)
{
return (a > b ? 1 : (a < b ? -1 : 0));
}
Output:
_Z8compare2yy:
cmpld 0,3,4
bgt
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Created attachment 53409
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53409&action=edit
source code
1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106043
Jens Seifert changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106043
--- Comment #1 from Jens Seifert ---
Found in documentation:
https://gcc.gnu.org/onlinedocs/gcc-11.3.0/gcc/PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1.html#PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Missing builtins for vector instructions xxblendvb, xxblendvh, xxblendvw,
xxblendvd.
#include
vector int blendv(vector int a, vector int b, vector int c)
{
return
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
vector unsigned short popcnt(vector unsigned short a)
{
return vec_popcnt(a);
}
Generates with -march=z13
_Z6popcntDv8_t:
.LFB1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
int overflow();
int negOverflow(long long in)
{
if (in
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long M8()
{
return 0x;
}
Generates:
.LC0:
.quad 0x
.text
.align 8
.globl _Z2M8v
.type
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
I can't find a builtin for the vmsumudm instruction.
I also found nothing in the Power vector intrinsic programming reference.
https://openpowerfoundation.org/?resource_lib=
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long ctzll(unsigned long long x)
{
return __builtin_ctzll(x);
}
creates:
lcgr %r1,%r2
ngr %r2,%r1
lghi %r1,63
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102117
--- Comment #1 from Jens Seifert ---
Sorry, small bug in the optimal sequence.
__int128 imul128_opt(long long a, long long b)
{
unsigned __int128 x = (unsigned __int128)(unsigned long long)a;
unsigned __int128 y = (unsigned __int128)(unsigned
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
__int128 imul128(long long a, long long b)
{
return (__int128)a * (__int128)b;
}
creates sequence with 3 multipl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866
--- Comment #9 from Jens Seifert ---
I know that if I use the vec_perm builtin as an end user, you then need
to comply with the LE specification, but you can always optimize the code as you
like as long as it creates correct results afterw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866
--- Comment #7 from Jens Seifert ---
Regarding vec_revb for vector unsigned int. I agree that
revb:
.LFB0:
.cfi_startproc
vspltish %v1,8
vspltisw %v0,-16
vrlh %v2,%v2,%v1
vrlw %v2,%v2,%v0
blr
work
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
vector unsigned long long mul64(vector unsigned long long a, vector unsigned
long long b)
{
return a * b;
}
creates
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Using the same names as xlC would be appreciated:
vec_extsbd, vec_extsbw, vec_extshd, vec_extshw, vec_extswd
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector unsigned short load_be(unsigned short *c)
{
return vec_xl_be(0L, c);
}
creates:
_Z7load_bePt:
.L
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100808
--- Comment #3 from Jens Seifert ---
- Avoid additional "int" unsigned long long int => unsigned long long
Why? Those are exactly the same types!
Yes, but the rest of the documentation uses unsigned long long.
This is just for consistency wit
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
Input:
vector double doublee(vector float a)
{
return vec_doublee(a);
}
causes a compile error:
vec.C: In function ‘__vector(2) double doublee
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector double reve(vector double a)
{
return vec_reve(a);
}
creates:
_Z4reveDv2_d:
.LFB3:
.cfi_startproc
larl %r5,.L12
vl
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector double reve(vector double a)
{
return vec_reve(a);
}
creates:
_Z4reveDv2_d:
.LFB3:
.cfi_startproc
.LCF3:
0: addis 2,12,.TOC.-.LCF3
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector unsigned short revb(vector unsigned short a)
{
return vec_revb(a);
}
Creates:
_Z4revbDv4_j:
.LFB1
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
vector unsigned short revb(vector unsigned short a)
{
return vec_revb(a);
}
creates:
_Z4revbDv8_t:
.L
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100808
--- Comment #1 from Jens Seifert ---
https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e1.html
vector unsigned long long int vec_gnb (vector unsigned __int128, const unsigned
char)
should be
unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809
--- Comment #1 from Jens Seifert ---
Same applies to modulo.
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned __int128 div(unsigned __int128 a, unsigned __int128 b)
{
return a/b;
}
__int128 div(__int128 a, __int128 b
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
https://gcc.gnu.org/onlinedocs/gcc/Basic-PowerPC-Built-in-Functions-Available-on-ISA-3_002e1.html#Basic-PowerPC-Built-in-Functions-Available-on-ISA-3_002e1
Please improve the
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Initializing an __int128 from two 64-bit integers is implemented very inefficiently.
The most natural code, which works well on all other platforms, generate
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
gcc only provides
unsigned int __builtin_addg6s (unsigned int, unsigned int);
but addg6s is a 64-bit operation. I require
unsigned long long __builtin_addg6s (unsigned long long
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98020
Jens Seifert changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
double sign(double in)
{
return in == 0.0 ? 0.0 : copysign(1.0, in);
}
Command line:
gcc -m64 -O2 -save
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
int extract(vector signed int v)
{
return v[2];
}
Command line:
gcc -mcpu=power8 -maltivec -m64 -O3 -save-temps extract.C
Output:
_Z7extractDv4_i:
.LFB0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70928
Jens Seifert changed:
What|Removed |Added
CC||jens.seifert at de dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737
Jens Seifert changed:
What|Removed |Added
Status|RESOLVED|UNCONFIRMED
Resolution|DUPLICATE
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
unsigned long long negativeLessThan(unsigned long long a, unsigned long long b)
{
return -(a < b);
}
gcc -m64 -O2 -save-temps negativeLessThan.C
crea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704
--- Comment #5 from Jens Seifert ---
Power9 code is branchfree but not good at all.
_Z3shloy:
.LFB0:
.cfi_startproc
addi 8,5,-64
subfic 6,5,63
srdi 10,3,1
li 7,0
sld 4,4,5
sld 5,3,5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704
--- Comment #3 from Jens Seifert ---
GCC 8.3 generates:
_Z3shloy:
.LFB0:
.cfi_startproc
addi 9,5,-64
cmpwi 7,9,0
blt 7,.L2
sld 4,3,9
li 3,0
blr
.p2align 4,,15
.L2:
srdi 9,3,1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704
--- Comment #1 from Jens Seifert ---
Created attachment 48742
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48742&action=edit
assembly
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Created attachment 48741
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48741&action=edit
input with branchless 128-bit shifts
PowerPC processors don
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
Jens Seifert changed:
What|Removed |Added
Status|RESOLVED|CLOSED
--- Comment #9 from Jens Seifert
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
--- Comment #8 from Jens Seifert ---
An outdated libgmp got picked up. Setting LD_LIBRARY_PATH=/lib64 solved the issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
Jens Seifert changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94519
Jens Seifert changed:
What|Removed |Added
Status|RESOLVED|CLOSED
--- Comment #2 from Jens Seifert
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
#include
static const double dsmall[] = { -DBL_MAX };
gcc ccerr.C
ccerr.C:3:1: internal compiler error: Segmentation fault
static const double dsmall[] = { -DBL_MAX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
--- Comment #5 from Jens Seifert ---
No options. Same failure with -O2. System is a RHEL 7.5.
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-8/root/usr/libexec/gcc/ppc64le-redhat-linux/8/lto-wrapper
Target: ppc64le-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
--- Comment #3 from Jens Seifert ---
Created attachment 48110
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48110&action=edit
Pre-processed file created using -save-temps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94297
Jens Seifert changed:
What|Removed |Added
Summary|std::replace internal |PPCLE std::replace internal
++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
#include
#include
void patch(std::string& s)
{
std::replace(s.begin(),s.end(),'.','-');
}
gcc replace.C
In file included from
/opt/rh/devtoolset-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94135
--- Comment #4 from Jens Seifert ---
Setting CA in XER increases the issue-to-issue latency by 1 on Power8.
See:
Table 10-14. Issue-to-Issue Latencies
In addition, setting CA restricts instruction reordering.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94135
--- Comment #2 from Jens Seifert ---
POWER8 Processor User’s Manual for the Single-Chip Module:
addi addis add add. subf subf. addic subfic adde addme subfme addze. subfze neg
neg. nego
1 - 2 cycles (GPR)
2 cycles (XER)
5 cycles (CR)
6/cycle,
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
unsigned int rotr32(unsigned int v, unsigned int r)
{
return (v>>r)|(v<<(32-r));
}
unsigned long long rotr64(unsigned long long v, unsigned
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
fmr is a 6 cycle instruction on Power8. Why is gcc not using the 2 cycle xxlor
instruction?
Input:
double setflm(double x)
{
double r = __builtin_mffs
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Documentation says:
double __builtin_mtfsf(const int,double)
Not documented in 8.3.0, but somehow works, nevertheless looks like the
prototype is wrong and should be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449
--- Comment #4 from Jens Seifert ---
Power8 has bcdadd, which can only be combined with _Decimal128 if you have some
kind of conversion between BCDs stored in vector registers and _Decimal128.
On Power9 vec_load_len/vec_store_len can be used to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93448
--- Comment #4 from Jens Seifert ---
The inline asm constraint "d" works. Thank you.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93449
--- Comment #2 from Jens Seifert ---
#include
typedef float _Decimal128 __attribute__((mode(TD)));
_Decimal128 bcdtodpd(vector double v)
{
_Decimal128 res;
memcpy(&res, &v, sizeof(res));
res = __builtin_denbcdq(0, res);
return res;
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
2 samples:
unsigned long long load8r(unsigned long long *in)
{
return __builtin_bswap64(*in);
}
unsigned long long rldimi(unsigned int hi, unsigned int lo
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
I am currently porting an application from AIX to PPCLE and found that I am
lacking compiler builtins for transforming
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
I am currently porting an application to PPCLE and found that I am lacking
compiler builtins for decimal floating point quantize on
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
unsigned long long hi16msbon_low16msboff()
{
return 0x87654321ULL; // expected: li 3,0x4321 ; oris 3,0x8765
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
All 64-bit constants containing a sequence of ones can be constructed with 2
instructions (li/lis + rldicl). gcc creates up to 5 instructions.
Input:
unsigned long
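The two-instruction claim can be checked with a scalar model of the instructions involved. The sketch below assumes the usual semantics (li sign-extends a 16-bit immediate; rldicl rotates left by SH and then clears the MB leftmost bits); li_model, rotl64 and rldicl_model are hypothetical helper names, not gcc builtins.

```c
#include <stdint.h>

/* Model of "li rD,imm": load a 16-bit signed immediate, sign-extended to 64 bits. */
static inline uint64_t li_model(int16_t imm)
{
    return (uint64_t)(int64_t)imm;
}

/* Rotate left by sh (0..63) without shifting by the full width. */
static inline uint64_t rotl64(uint64_t x, unsigned sh)
{
    sh &= 63u;
    return (x << sh) | (x >> ((64u - sh) & 63u));
}

/* Model of "rldicl rA,rS,SH,MB": rotate left by SH, then keep only
   bits MB..63 in IBM numbering, i.e. clear the MB most significant bits. */
static inline uint64_t rldicl_model(uint64_t rs, unsigned sh, unsigned mb)
{
    uint64_t mask = ~UINT64_C(0) >> (mb & 63u);
    return rotl64(rs, sh) & mask;
}
```

In this model, li_model(-256) followed by rldicl_model(..., 0, 40) gives 0x0000000000FFFF00, and li_model(0x0FF0) followed by rldicl_model(..., 48, 0) gives 0x0FF0000000000000, both from two modelled instructions.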
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.seifert at de dot ibm.com
Target Milestone: ---
Input:
void memspace16(char *p)
{
memset(p, ' ', 16);
}
Expected result:
li 4,0x2020
rldimi 4,4,16,0
rldimi 4,4,32,0
std 4,0(3)
Splatting the memset input to 64-bit c
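The expected sequence above amounts to a byte splat by doubling, which can be sketched in portable C. The helper names splat_byte64 and memspace16_sketch are hypothetical, and the mapping of each step to li/rldimi is an illustration, not a claim about what gcc emits.

```c
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all eight bytes of a 64-bit word by doubling. */
static inline uint64_t splat_byte64(unsigned char c)
{
    uint64_t v = (uint64_t)c | ((uint64_t)c << 8); /* 0x2020 for ' ', the value li would load */
    v |= v << 16;                                  /* mirrors rldimi 4,4,16,0 */
    v |= v << 32;                                  /* mirrors rldimi 4,4,32,0 */
    return v;
}

/* Fill 16 bytes with spaces using two 64-bit stores of the splatted word. */
static inline void memspace16_sketch(char *p)
{
    uint64_t v = splat_byte64(' ');
    memcpy(p, &v, sizeof v);     /* first 8 bytes */
    memcpy(p + 8, &v, sizeof v); /* second 8 bytes */
}
```

Because every byte of the word is identical, the result does not depend on endianness.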