[Bug c/92769] New: No way to set CR0[SO] on function return

2019-12-03 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92769

Bug ID: 92769
   Summary: No way to set CR0[SO] on function return
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

Linux system calls and Linux VDSO calls require the error status to be
reflected through the SO bit of the CR0 register on function return.

There is no way to do that from C functions. It requires adding assembly
trampoline functions just for that, with all the associated drawbacks (adding a
stack frame to save LR, etc.).

Would it be possible to add two builtin functions that would set/clear SO on
function return?

Something like:
- __builtin_ppc_return_with_so_set()
- __builtin_ppc_return_with_so_cleared()
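As background, here is a minimal model of the convention in question. On powerpc, a system call that fails returns a positive errno value in r3 with CR0[SO] set, while a successful one returns its result in r3 with SO cleared. The sketch below represents that pair of outputs as a plain C struct, only to illustrate what the proposed builtins would need to express. The type and helper names are hypothetical, and nothing here touches the real CR0 register.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of what a powerpc system call leaves behind:
 * the value in r3 and the state of CR0[SO] on return. */
struct ppc_sc_ret {
	long r3; /* result on success, positive errno on failure */
	int so;  /* nonzero when CR0[SO] is set, i.e. on error */
};

/* Success: result in r3, SO cleared. */
static struct ppc_sc_ret sc_ok(long value)
{
	struct ppc_sc_ret r = { value, 0 };
	return r;
}

/* Failure: positive errno in r3, SO set. */
static struct ppc_sc_ret sc_err(int err)
{
	struct ppc_sc_ret r = { err, 1 };
	return r;
}

/* Caller side: fold the pair into a single kernel-style "negative errno" value. */
static long sc_to_neg_errno(struct ppc_sc_ret r)
{
	return r.so ? -r.r3 : r.r3;
}

/* Quick self-check of the model. */
static void check_sc_convention(void)
{
	assert(sc_to_neg_errno(sc_ok(42)) == 42);
	assert(sc_to_neg_errno(sc_err(EINVAL)) == -EINVAL);
}
```

In C, the "SO" half of the result has to travel as a separate value; the point of the request is that a builtin would let the compiler set the real flag directly instead of going through an assembly trampoline.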

[Bug target/92769] Powerpc: No way to set CR0[SO] on function return

2019-12-11 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92769

--- Comment #2 from Christophe Leroy  ---
But CR0 being volatile doesn't prevent GCC from setting/clearing its SO bit just
before branching to LR, as the assembly functions do, does it?

r3 is also volatile in our ABIs, yet that doesn't prevent using it for the
function return value.

[Bug c/93800] New: GCC adds unwanted nops to align loops on powerpc 8xx

2020-02-18 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93800

Bug ID: 93800
   Summary: GCC adds unwanted nops to align loops on powerpc 8xx
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

When compiling for the powerpc 8xx, GCC 9.2 adds nops in front of loops,
whereas GCC 8.1 didn't. On the 8xx, a nop takes 1 cycle and loop alignment
provides no benefit, so this is a waste of cycles.

Reproducer:

volatile int g;
int f(int a, int b)
{
	int i;

	for (i = 0; i < b; i++)
		a += g;
	return a;
}

Built with -m32 -mcpu=860 -O2

 :
   0:   2c 04 00 00 cmpwi   r4,0
   4:   4c 81 00 20 blelr   
   8:   3d 40 00 00 lis r10,0
a: R_PPC_ADDR16_HA  g
   c:   7c 89 03 a6 mtctr   r4
  10:   39 4a 00 00 addi    r10,r10,0
12: R_PPC_ADDR16_LO g
  14:   60 00 00 00 nop
  18:   60 00 00 00 nop
  1c:   60 00 00 00 nop
  20:   81 2a 00 00 lwz r9,0(r10)
  24:   7c 63 4a 14 add r3,r3,r9
  28:   42 00 ff f8 bdnz    20
  2c:   4e 80 00 20 blr



The same with GCC 8.1:

 :
   0:   2c 04 00 00 cmpwi   r4,0
   4:   4c 81 00 20 blelr   
   8:   3d 40 00 00 lis r10,0
a: R_PPC_ADDR16_HA  g
   c:   7c 89 03 a6 mtctr   r4
  10:   39 4a 00 00 addi    r10,r10,0
12: R_PPC_ADDR16_LO g
  14:   81 2a 00 00 lwz r9,0(r10)
  18:   7c 63 4a 14 add r3,r3,r9
  1c:   42 00 ff f8 bdnz    14
  20:   4e 80 00 20 blr

[Bug target/93802] New: gcc generates a rlwinm/or pair instead of a single rlwimi (powerpc)

2020-02-18 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93802

Bug ID: 93802
   Summary: gcc generates a rlwinm/or pair instead of a single
rlwimi (powerpc)
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

unsigned long f(unsigned short x)
{
	return (x << 16) | x;
}



Results in:

 :
   0:   54 69 80 1e rlwinm  r9,r3,16,0,15
   4:   7d 23 1b 78 or  r3,r9,r3
   8:   4e 80 00 20 blr



Should instead be:

rlwimi r3, r3, 16, 0, 15
blr
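To see why the single rlwimi suffices, here is a small C model of the two rotate-and-mask instructions, using IBM bit numbering (bit 0 is the most significant bit). It is a sketch assuming 32-bit registers and that the unsigned short argument arrives zero-extended in r3. The helper names (rotl32, ppc_mask, rlwinm, rlwimi, f_ref) are made up for illustration and are not GCC builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n, 0 <= n <= 31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> ((32 - n) & 31));
}

/* MASK(mb, me) in IBM bit numbering (bit 0 is the MSB), with wrap-around. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
	uint32_t hi = 0xffffffffu >> mb;        /* bits mb..31 */
	uint32_t lo = 0xffffffffu << (31 - me); /* bits 0..me */
	return mb <= me ? (hi & lo) : (hi | lo);
}

/* rlwinm rA,rS,sh,mb,me: rotate, then keep only the masked bits. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
	return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi rA,rS,sh,mb,me: rotate rS, insert masked bits into rA. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh,
		       unsigned mb, unsigned me)
{
	uint32_t m = ppc_mask(mb, me);
	return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Same expression as f() in the report. */
static uint32_t f_ref(uint16_t x)
{
	return ((uint32_t)x << 16) | x;
}

/* Both sequences match f() for every 16-bit input. */
static void check_rlwimi_equivalence(void)
{
	for (uint32_t x = 0; x <= 0xffff; x++) {
		uint32_t two = rlwinm(x, 16, 0, 15) | x; /* rlwinm + or */
		uint32_t one = rlwimi(x, x, 16, 0, 15);  /* single rlwimi */
		assert(two == f_ref((uint16_t)x));
		assert(one == f_ref((uint16_t)x));
	}
}
```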

The problem is seen with at least GCC 9.2, GCC 8.1 and GCC 5.5.

[Bug c/86106] New: powerpc: Suboptimal logical operation

2018-06-11 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86106

Bug ID: 86106
   Summary: powerpc: Suboptimal logical operation
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

unsigned int g(unsigned int val)
{
	unsigned int mask = 0x7f7f7f7f;

	return ~(((val & mask) + mask) | val | mask);
}

generates the following:

0020 :
  20:   3d 20 7f 7f lis r9,32639
  24:   61 29 7f 7f ori r9,r9,32639
  28:   7c 69 48 38 and r9,r3,r9
  2c:   3d 29 7f 7f addis   r9,r9,32639
  30:   39 29 7f 7f addi    r9,r9,32639
  34:   7d 23 1b 78 or  r3,r9,r3
  38:   64 63 7f 7f oris    r3,r3,32639
  3c:   60 63 7f 7f ori r3,r3,32639
  40:   7c 63 18 f8 not r3,r3
  44:   4e 80 00 20 blr

Whereas I'd expect something like:

lis r4,32639
ori r4,r4,32639
and r9,r3,r4
or  r3,r3,r4
add r9,r9,r4
nor r3,r9,r3
blr
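The expected sequence can be checked by transcribing it into C, one statement per instruction, and comparing it against the original expression. This is a hedged sketch: r3, r4 and r9 are plain local variables standing in for the registers, the function names are hypothetical, and the check samples inputs rather than covering all 2^32 values.

```c
#include <assert.h>
#include <stdint.h>

/* The expression from the report. */
static uint32_t g_ref(uint32_t val)
{
	uint32_t mask = 0x7f7f7f7f;

	return ~(((val & mask) + mask) | val | mask);
}

/* The expected sequence, one C statement per instruction. */
static uint32_t g_expected(uint32_t r3)
{
	uint32_t r4, r9;

	r4 = (uint32_t)32639 << 16; /* lis r4,32639 (32639 == 0x7f7f) */
	r4 |= 32639;                /* ori r4,r4,32639 */
	r9 = r3 & r4;               /* and r9,r3,r4 */
	r3 = r3 | r4;               /* or  r3,r3,r4 */
	r9 = r9 + r4;               /* add r9,r9,r4 */
	return ~(r9 | r3);          /* nor r3,r9,r3 */
}

/* Compare both on a spread of inputs (not all 2^32 of them). */
static void check_g_sequence(void)
{
	for (uint32_t i = 0; i < (1u << 20); i++) {
		uint32_t v = i * 2654435761u; /* scatter samples over the range */
		assert(g_expected(v) == g_ref(v));
	}
	assert(g_expected(0xffffffffu) == g_ref(0xffffffffu));
}
```

Loading the mask once and reusing it as a register operand is what removes the second addis/addi and oris/ori pairs from GCC's output.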

[Bug c/86131] New: powerpc: gcc uses costly multiply instead of shift left

2018-06-13 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86131

Bug ID: 86131
   Summary: powerpc: gcc uses costly multiply instead of shift
left
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

unsigned long f1(unsigned long a, unsigned long b)
{
	return a >> ((4 - a) * 8);
}

unsigned long f2(unsigned long a, unsigned long b)
{
	return a >> ((4 - a) << 3);
}

unsigned long g1(unsigned long a, unsigned long b)
{
	return a >> (32 - a * 8);
}

unsigned long g2(unsigned long a, unsigned long b)
{
	return a >> (32 - (a << 3));
}

When compiling with GCC 8.1 using -O2 -mcpu=860, the following result is
obtained:

 :
   0:   1d 23 ff f8 mulli   r9,r3,-8
   4:   39 29 00 20 addi    r9,r9,32
   8:   7c 63 4c 30 srw r3,r3,r9
   c:   4e 80 00 20 blr

0010 :
  10:   21 23 00 04 subfic  r9,r3,4
  14:   55 29 18 38 rlwinm  r9,r9,3,0,28
  18:   7c 63 4c 30 srw r3,r3,r9
  1c:   4e 80 00 20 blr

0020 :
  20:   1d 23 ff f8 mulli   r9,r3,-8
  24:   39 29 00 20 addi    r9,r9,32
  28:   7c 63 4c 30 srw r3,r3,r9
  2c:   4e 80 00 20 blr

0030 :
  30:   54 69 18 38 rlwinm  r9,r3,3,0,28
  34:   21 29 00 20 subfic  r9,r9,32
  38:   7c 63 4c 30 srw r3,r3,r9
  3c:   4e 80 00 20 blr

mulli requires 2 cycles, so it shouldn't be used here, should it?
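The multiply is avoidable because, in modular unsigned arithmetic, (4 - a) * 8 and 32 - (a << 3) are the same value for every a, which is exactly what the e300c2 output computes with rlwinm and subfic. A minimal C sketch of the identity, assuming a 32-bit unsigned long as on ppc32 (hence uint32_t); the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Shift count as written in f1(); with -mcpu=860 GCC 8.1 computes it with mulli. */
static uint32_t count_mul(uint32_t a)
{
	return (4 - a) * 8;
}

/* Same count with a shift and a subtract, as in the e300c2 output:
 * rlwinm r9,r3,3,0,28 computes a << 3, subfic r9,r9,32 computes 32 - r9. */
static uint32_t count_shift(uint32_t a)
{
	return 32 - (a << 3);
}

/* Multiplying by 8 is a left shift by 3 modulo 2^32, so the two always agree. */
static void check_counts(void)
{
	for (uint32_t i = 0; i < (1u << 20); i++) {
		uint32_t a = i * 2654435761u; /* sampled inputs */
		assert(count_mul(a) == count_shift(a));
	}
	assert(count_mul(0xffffffffu) == count_shift(0xffffffffu));
}
```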

The same code compiled with -mcpu=e300c2 gives:

 :
   0:   54 69 18 38 rlwinm  r9,r3,3,0,28
   4:   21 29 00 20 subfic  r9,r9,32
   8:   7c 63 4c 30 srw r3,r3,r9
   c:   4e 80 00 20 blr

0010 :
  10:   21 23 00 04 subfic  r9,r3,4
  14:   55 29 18 38 rlwinm  r9,r9,3,0,28
  18:   7c 63 4c 30 srw r3,r3,r9
  1c:   4e 80 00 20 blr

0020 :
  20:   54 69 18 38 rlwinm  r9,r3,3,0,28
  24:   21 29 00 20 subfic  r9,r9,32
  28:   7c 63 4c 30 srw r3,r3,r9
  2c:   4e 80 00 20 blr

0030 :
  30:   54 69 18 38 rlwinm  r9,r3,3,0,28
  34:   21 29 00 20 subfic  r9,r9,32
  38:   7c 63 4c 30 srw r3,r3,r9
  3c:   4e 80 00 20 blr

[Bug c/80132] New: powerpc: irrelevant register move before operation

2017-03-21 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80132

Bug ID: 80132
   Summary: powerpc: irrelevant register move before operation
   Product: gcc
   Version: 5.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

In the function below, the first two 'mr' instructions are unneeded; the loop
should operate directly on r5 and r6.

void memset64(long long *p, long long v, unsigned int c)
{
	int i;

	for (i = 0; i < c; i++)
		*p++ = v;
}

test2.o: file format elf32-powerpc


Disassembly of section .text:

001c :
  1c:   2c 07 00 00 cmpwi   r7,0
  20:   7c cb 33 78 mr  r11,r6
  24:   7c aa 2b 78 mr  r10,r5
  28:   38 63 ff f8 addi    r3,r3,-8
  2c:   4d 82 00 20 beqlr   
  30:   7c e9 03 a6 mtctr   r7
  34:   95 43 00 08 stwu    r10,8(r3)
  38:   91 63 00 04 stw     r11,4(r3)
  3c:   42 00 ff f8 bdnz    34
  40:   4e 80 00 20 blr

[Bug c/80131] New: powerpc: 1U << (31 - x) doesn't generate optimised code

2017-03-21 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80131

Bug ID: 80131
   Summary: powerpc: 1U << (31 - x) doesn't generate optimised
code
   Product: gcc
   Version: 5.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

I would expect the two functions below to generate the same code, but they
don't:

unsigned int f1(unsigned int i)
{
	return 1U << (31 - i);
}

unsigned int f2(unsigned int i)
{
	return (1U << 31) >> i;
}

test3.o: file format elf32-powerpc


Disassembly of section .text:

 :
   0:   20 63 00 1f subfic  r3,r3,31
   4:   39 20 00 01 li  r9,1
   8:   7d 23 18 30 slw r3,r9,r3
   c:   4e 80 00 20 blr

0010 :
  10:   3d 20 80 00 lis r9,-32768
  14:   7d 23 1c 30 srw r3,r9,r3
  18:   4e 80 00 20 blr
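Both expressions are only defined for i in 0..31 (a shift count of 32 or more is undefined behaviour in C), and within that range they agree, so the compiler would be free to emit the two-instruction lis/srw form for f1 as well. A small exhaustive check of that range, assuming a 32-bit unsigned int as on ppc32; the function names are hypothetical:

```c
#include <assert.h>

/* Same expression as f1() in the report. */
static unsigned int via_shl(unsigned int i)
{
	return 1U << (31 - i);
}

/* Same expression as f2() in the report. */
static unsigned int via_shr(unsigned int i)
{
	return (1U << 31) >> i;
}

/* Only 0 <= i <= 31 is defined for either expression; check all of it. */
static void check_shifts(void)
{
	for (unsigned int i = 0; i <= 31; i++)
		assert(via_shl(i) == via_shr(i));
}
```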

[Bug c/80134] New: powerpc: loop on p[i] and *p++ should give the same code

2017-03-21 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80134

Bug ID: 80134
   Summary: powerpc: loop on p[i] and *p++ should give the same
code
   Product: gcc
   Version: 5.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

The two functions below should generate the same code; the result shouldn't
depend on whether we use p[i] or *p++:

void memset32a(int *p, int v, unsigned int c)
{
	int i;

	for (i = 0; i < c; i++)
		p[i] = v;
}

void memset32b(int *p, int v, unsigned int c)
{
	int i;

	for (i = 0; i < c; i++)
		*p++ = v;
}


test4.o: file format elf32-powerpc


Disassembly of section .text:

 :
   0:   2c 05 00 00 cmpwi   r5,0
   4:   4d 82 00 20 beqlr   
   8:   54 a9 10 3a rlwinm  r9,r5,2,0,29
   c:   39 29 ff fc addi    r9,r9,-4
  10:   55 29 f0 be rlwinm  r9,r9,30,2,31
  14:   39 29 00 01 addi    r9,r9,1
  18:   7d 29 03 a6 mtctr   r9
  1c:   38 63 ff fc addi    r3,r3,-4
  20:   94 83 00 04 stwu    r4,4(r3)
  24:   42 00 ff fc bdnz    20
  28:   4e 80 00 20 blr

002c :
  2c:   2c 05 00 00 cmpwi   r5,0
  30:   38 63 ff fc addi    r3,r3,-4
  34:   4d 82 00 20 beqlr   
  38:   7c a9 03 a6 mtctr   r5
  3c:   94 83 00 04 stwu    r4,4(r3)
  40:   42 00 ff fc bdnz    3c
  44:   4e 80 00 20 blr
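In memset32a, the four instructions before mtctr recompute the trip count as (((c << 2) - 4) >> 2) + 1. For 1 <= c <= 2^30 (the most 4-byte stores that fit in a 32-bit address space), that is just c, which memset32b moves to CTR directly; c == 0 is already handled by beqlr. A sketch checking this, with r9 as a plain variable and a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Trip count as computed before mtctr in memset32a(). */
static uint32_t ctr_indexed(uint32_t c)
{
	uint32_t r9;

	r9 = c << 2;   /* rlwinm r9,r5,2,0,29: c * 4 */
	r9 -= 4;       /* addi r9,r9,-4 */
	r9 >>= 2;      /* rlwinm r9,r9,30,2,31: logical shift right by 2 */
	return r9 + 1; /* addi r9,r9,1 */
}

/* For 1 <= c <= 2^30 the result is c itself; checked on a subset plus the top value. */
static void check_ctr(void)
{
	for (uint32_t c = 1; c <= (1u << 20); c++)
		assert(ctr_indexed(c) == c);
	assert(ctr_indexed(1u << 30) == (1u << 30));
}
```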

[Bug c/82940] New: Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc

2017-11-10 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940

Bug ID: 82940
   Summary: Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc
   Product: gcc
   Version: 5.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

unsigned char g(unsigned char t[], unsigned char v)
{
	return (t[v & 0x7f] & 0x7f) | (v & 0x80);
}

0008 :
   8:   54 89 06 7e clrlwi  r9,r4,25
   c:   7c 63 48 ae lbzx    r3,r3,r9
  10:   54 84 00 30 rlwinm  r4,r4,0,0,24
  14:   54 63 06 7e clrlwi  r3,r3,25
  18:   7c 63 23 78 or  r3,r3,r4
  1c:   4e 80 00 20 blr


I would expect:

0008 :
   8:   54 89 06 7e clrlwi  r9,r4,25
   c:   7c 63 48 ae lbzx    r3,r3,r9
  10:   50 83 06 30 rlwimi  r3,r4,0,24,24
  14:   4e 80 00 20 blr
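The expected sequence relies on lbzx zero-extending the loaded byte: once bit 0x80 (bit 24 in IBM numbering) is overwritten with the one from v, the remaining bits are already those of t[...] & 0x7f, so the clrlwi on r3 and the separate rlwinm/or become unnecessary. A minimal C model of rlwimi r3,r4,0,24,24 (SH is 0, so there is no rotation), with a hypothetical helper name, checked over all byte pairs:

```c
#include <assert.h>
#include <stdint.h>

/* rlwimi rA,rS,0,24,24: replace bit 0x80 of rA with the same bit of rS. */
static uint32_t insert_bit_0x80(uint32_t ra, uint32_t rs)
{
	const uint32_t m = 0x80; /* MASK(24,24) */

	return (rs & m) | (ra & ~m);
}

/* b: byte loaded by lbzx (zero-extended); v: the unsigned char argument. */
static void check_insert(void)
{
	for (uint32_t b = 0; b <= 0xff; b++)
		for (uint32_t v = 0; v <= 0xff; v++)
			assert(insert_bit_0x80(b, v) == ((b & 0x7f) | (v & 0x80)));
}
```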

[Bug regression/67288] New: [4.9 regression] non optimal simple function (useless additional shift/remove/shift/add)

2015-08-20 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288

Bug ID: 67288
   Summary: [4.9 regression] non optimal simple function (useless
additional shift/remove/shift/add)
   Product: gcc
   Version: 4.9.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

The following function (from the Linux kernel, compiled with -O2) resulted in
good assembly with GCC 4.8.3. With GCC 4.9.3, there are a lot of unnecessary
instructions:

#define L1_CACHE_SHIFT	4
#define L1_CACHE_BYTES	(1 << L1_CACHE_SHIFT)	/* 16 */

#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

static inline void dcbf(void *addr)
{
	__asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
}

void flush_dcache_range(unsigned long start, unsigned long stop)
{
	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
	unsigned int i;

	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
		dcbf(addr);
	if (i)
		mb();
}

Result with GCC 4.9.3: (15 insns)

c000d970 :
c000d970:   54 63 00 36 rlwinm  r3,r3,0,0,27
c000d974:   38 84 00 0f addi    r4,r4,15
c000d978:   7c 83 20 50 subf    r4,r3,r4
c000d97c:   54 89 e1 3f rlwinm. r9,r4,28,4,31
c000d980:   4d 82 00 20 beqlr   
c000d984:   55 24 20 36 rlwinm  r4,r9,4,0,27
c000d988:   39 24 ff f0 addi    r9,r4,-16
c000d98c:   55 29 e1 3e rlwinm  r9,r9,28,4,31
c000d990:   39 29 00 01 addi    r9,r9,1
c000d994:   7d 29 03 a6 mtctr   r9
c000d998:   7c 00 18 ac dcbf    0,r3
c000d99c:   38 63 00 10 addi    r3,r3,16
c000d9a0:   42 00 ff f8 bdnz    c000d998
c000d9a4:   7c 00 04 ac sync
c000d9a8:   4e 80 00 20 blr

The following sequence is useless (shift left 4 bits, subtract 16, shift right
4 bits, add 1):
c000d984:   55 24 20 36 rlwinm  r4,r9,4,0,27
c000d988:   39 24 ff f0 addi    r9,r4,-16
c000d98c:   55 29 e1 3e rlwinm  r9,r9,28,4,31
c000d990:   39 29 00 01 addi    r9,r9,1
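The four instructions are an identity on the value they operate on. r9 holds n = size >> 4, which is nonzero (beqlr has already returned otherwise) and below 2^28, so n << 4 cannot overflow, subtracting 16 yields 16 * (n - 1), and shifting back and adding 1 gives n again. A sketch of that reasoning in C, with the registers modelled as local variables and a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* The four instructions GCC 4.9.3 inserts before mtctr, applied to
 * n = size >> 4 (r9 after rlwinm. r9,r4,28,4,31). */
static uint32_t redundant_ctr(uint32_t n)
{
	uint32_t r4, r9;

	r4 = n << 4;   /* rlwinm r4,r9,4,0,27: n * 16 */
	r9 = r4 - 16;  /* addi r9,r4,-16 */
	r9 >>= 4;      /* rlwinm r9,r9,28,4,31: logical shift right by 4 */
	return r9 + 1; /* addi r9,r9,1 */
}

/* For reachable n (1 .. 2^28 - 1) the result is n itself; checked on a subset here. */
static void check_redundant_ctr(void)
{
	for (uint32_t n = 1; n <= (1u << 20); n++)
		assert(redundant_ctr(n) == n);
	assert(redundant_ctr(0x0fffffffu) == 0x0fffffffu);
}
```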



Result with GCC 4.8.3 was correct: (11 insns)

c000d894 :
c000d894:   54 63 00 36 rlwinm  r3,r3,0,0,27
c000d898:   38 84 00 0f addi    r4,r4,15
c000d89c:   7d 23 20 50 subf    r9,r3,r4
c000d8a0:   55 29 e1 3f rlwinm. r9,r9,28,4,31
c000d8a4:   4d 82 00 20 beqlr   
c000d8a8:   7d 29 03 a6 mtctr   r9
c000d8ac:   7c 00 18 ac dcbf    0,r3
c000d8b0:   38 63 00 10 addi    r3,r3,16
c000d8b4:   42 00 ff f8 bdnz    c000d8ac
c000d8b8:   7c 00 04 ac sync
c000d8bc:   4e 80 00 20 blr


[Bug target/67290] New: powerpc: suboptimal add of u64 with u32

2015-08-20 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67290

Bug ID: 67290
   Summary: powerpc: suboptimal add of u64 with u32
   Product: gcc
   Version: 4.9.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: christophe.le...@c-s.fr
  Target Milestone: ---

#define u32 unsigned long
#define u64 unsigned long long

u64 target(u64 base, u32 offset)
{
	return base + offset;
}


With GCC 4.9.3 we get: (same with GCC 4.8.3)

 :
   0:   7c ab 2b 78 mr  r11,r5
   4:   39 40 00 00 li  r10,0
   8:   7c 84 58 14 addc    r4,r4,r11
   c:   7c 63 51 14 adde    r3,r3,r10
  10:   4e 80 00 20 blr

I would expect:

:
addc r4,r4,r5
addze r3,r3
blr
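The shorter sequence works because the zero-extended high word of offset contributes nothing but the carry: addc adds the low words and records the carry-out in CA, and addze then adds only CA to the high word, so the mr/li pair and adde with a zero register are unnecessary. A C sketch of that two-word addition, assuming the 32-bit SysV layout used in the dump (high word in r3, low word in r4, offset in r5); the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit value as two 32-bit registers: hi in r3, lo in r4. */
struct u64_pair {
	uint32_t hi, lo;
};

/* addc r4,r4,r5 ; addze r3,r3 */
static struct u64_pair add_u64_u32(struct u64_pair base, uint32_t offset)
{
	struct u64_pair r;
	uint32_t ca;

	r.lo = base.lo + offset; /* addc: low word */
	ca = r.lo < base.lo;     /* CA: carry out of the low word */
	r.hi = base.hi + ca;     /* addze: high word plus CA only */
	return r;
}

/* Compare against native 64-bit addition on edge-case inputs. */
static void check_add(void)
{
	const uint64_t bases[] = { 0, 1, 0xffffffffu, 0x1ffffffffull,
				   UINT64_MAX, 0x123456789abcdef0ull };
	const uint32_t offs[] = { 0, 1, 0x80000000u, 0xffffffffu };

	for (size_t i = 0; i < sizeof bases / sizeof bases[0]; i++)
		for (size_t j = 0; j < sizeof offs / sizeof offs[0]; j++) {
			struct u64_pair b = { (uint32_t)(bases[i] >> 32),
					      (uint32_t)bases[i] };
			struct u64_pair r = add_u64_u32(b, offs[j]);
			uint64_t got = ((uint64_t)r.hi << 32) | r.lo;

			assert(got == bases[i] + offs[j]);
		}
}
```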


[Bug regression/67288] [4.9 regression] non optimal simple function (useless additional shift/remove/shift/add)

2015-08-24 Thread christophe.le...@c-s.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288

--- Comment #2 from Christophe Leroy  ---
The code below compiles standalone (and still exhibits the problem):

[root@localhost knl]# cat flush.c

#define L1_CACHE_SHIFT  4
#define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)

#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

static inline void dcbf(void *addr)
{
	__asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
}

void flush_dcache_range(unsigned long start, unsigned long stop)
{
	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
	unsigned int i;

	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
		dcbf(addr);
	if (i)
		mb();   /* sync */
}

[root@localhost knl]# ppc-linux-gcc -c -O2 flush.c

[root@localhost knl]# ppc-linux-objdump -d flush.o 
flush.o: file format elf32-powerpc


Disassembly of section .text:

 :
   0:   54 63 00 36 rlwinm  r3,r3,0,0,27
   4:   38 84 00 0f addi    r4,r4,15
   8:   7c 83 20 50 subf    r4,r3,r4
   c:   54 89 e1 3f rlwinm. r9,r4,28,4,31
  10:   4d 82 00 20 beqlr   
  14:   55 24 20 36 rlwinm  r4,r9,4,0,27
  18:   39 24 ff f0 addi    r9,r4,-16
  1c:   55 29 e1 3e rlwinm  r9,r9,28,4,31
  20:   39 29 00 01 addi    r9,r9,1
  24:   7d 29 03 a6 mtctr   r9
  28:   7c 00 18 ac dcbf    0,r3
  2c:   38 63 00 10 addi    r3,r3,16
  30:   42 00 ff f8 bdnz    28
  34:   7c 00 04 ac sync
  38:   4e 80 00 20 blr