[Bug c/92769] New: No way to set CR0[SO] on function return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92769

Bug ID: 92769
Summary: No way to set CR0[SO] on function return
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

Linux system calls and Linux VDSO calls require the error status to be reflected in the SO bit of the CR0 register on function return.

There is no way to do that from C functions. It requires adding assembly trampoline functions just for that, with all the associated drawbacks (an extra stack frame to save LR, etc.).

Would it be possible to add builtin functions that set or clear SO on function return? Something like:
- __builtin_ppc_return_with_so_set()
- __builtin_ppc_return_with_so_cleared()
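The convention itself can be modelled in portable C. The sketch below assumes the usual powerpc Linux rule that on failure r3 holds a positive errno and CR0[SO] is set, while on success SO is clear. The struct and function names are hypothetical, and the `so` field only stands in for the CR0 bit, which C code cannot reach without inline assembly. That gap is what the requested builtins would close.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of what a powerpc Linux syscall or VDSO call hands
 * back to its caller: GPR r3 plus the SO bit of CR0. */
struct ppc_sc_ret {
    long r3; /* result on success, positive errno on failure */
    int so;  /* CR0[SO]: nonzero when r3 holds an error code */
};

/* Caller side: turn the (r3, SO) pair into the usual -errno convention. */
static long ppc_sc_to_result(struct ppc_sc_ret ret)
{
    return ret.so ? -ret.r3 : ret.r3;
}

/* Callee side: what a VDSO function written in C would have to produce.
 * Errors are assumed to be -errno values in [-4095, -1] (the kernel's
 * MAX_ERRNO range). Setting `so` here stands in for the CR0 update that
 * the requested builtins would let GCC emit just before blr. */
static struct ppc_sc_ret ppc_sc_from_result(long res)
{
    struct ppc_sc_ret ret;

    if (res < 0 && res >= -4095) {
        ret.r3 = -res;
        ret.so = 1;
    } else {
        ret.r3 = res;
        ret.so = 0;
    }
    return ret;
}

/* Round-trip check for one error value and one success value. */
static void ppc_sc_check(void)
{
    assert(ppc_sc_to_result(ppc_sc_from_result(-EINVAL)) == -EINVAL);
    assert(ppc_sc_from_result(-EINVAL).so == 1);
    assert(ppc_sc_from_result(42).so == 0);
}
```

Without the builtins, only an assembly wrapper can turn the `so` decision into a real CR0 bit.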
[Bug target/92769] Powerpc: No way to set CR0[SO] on function return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92769

--- Comment #2 from Christophe Leroy ---
But CR0 being volatile doesn't prevent GCC from setting or clearing its SO bit just before branching to LR, as the assembly functions do, does it? r3 is also volatile in our ABIs, and that doesn't prevent it from being used for the function return value.
[Bug c/93800] New: GCC adds unwanted nops to align loops on powerpc 8xx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93800

Bug ID: 93800
Summary: GCC adds unwanted nops to align loops on powerpc 8xx
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

When compiling for the powerpc 8xx, GCC 9.2 adds nops in front of loops; GCC 8.1 didn't. On the 8xx, a nop costs 1 cycle and loop alignment provides no benefit, so this is a waste of cycles.

Reproducer:

volatile int g;

int f(int a, int b)
{
	int i;

	for (i = 0; i < b; i++)
		a += g;

	return a;
}

Built with -m32 -mcpu=860 -O2:

00000000 <f>:
   0:  2c 04 00 00   cmpwi   r4,0
   4:  4c 81 00 20   blelr
   8:  3d 40 00 00   lis     r10,0
                     a: R_PPC_ADDR16_HA  g
   c:  7c 89 03 a6   mtctr   r4
  10:  39 4a 00 00   addi    r10,r10,0
                     12: R_PPC_ADDR16_LO  g
  14:  60 00 00 00   nop
  18:  60 00 00 00   nop
  1c:  60 00 00 00   nop
  20:  81 2a 00 00   lwz     r9,0(r10)
  24:  7c 63 4a 14   add     r3,r3,r9
  28:  42 00 ff f8   bdnz    20
  2c:  4e 80 00 20   blr

The same with GCC 8.1:

00000000 <f>:
   0:  2c 04 00 00   cmpwi   r4,0
   4:  4c 81 00 20   blelr
   8:  3d 40 00 00   lis     r10,0
                     a: R_PPC_ADDR16_HA  g
   c:  7c 89 03 a6   mtctr   r4
  10:  39 4a 00 00   addi    r10,r10,0
                     12: R_PPC_ADDR16_LO  g
  14:  81 2a 00 00   lwz     r9,0(r10)
  18:  7c 63 4a 14   add     r3,r3,r9
  1c:  42 00 ff f8   bdnz    14
  20:  4e 80 00 20   blr
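The three nops are the padding needed to move the loop head from 0x14, where GCC 8.1 places it, to the next 16-byte boundary at 0x20. The sketch below assumes a 16-byte loop alignment. That value is inferred from the listing, not stated in the report, and a 32-byte alignment would give the same padding here. `nops_for_alignment` is an illustrative helper.

```c
#include <assert.h>

/* Number of 4-byte nops needed so that code starting at byte offset
 * `addr` lands on an `align`-byte boundary (`align` a power of two). */
static unsigned nops_for_alignment(unsigned addr, unsigned align)
{
    unsigned pad = (align - (addr & (align - 1))) & (align - 1);
    return pad / 4; /* every PowerPC instruction is 4 bytes long */
}

/* In the GCC 8.1 listing the loop body starts at 0x14; aligning it
 * to 16 bytes moves it to 0x20, as in the GCC 9.2 listing. */
static void check_listing(void)
{
    assert(nops_for_alignment(0x14, 16) == 3);
    assert(0x14 + 4 * nops_for_alignment(0x14, 16) == 0x20);
}
```

bdnz branches back to 0x20, so the nops run once per call rather than once per iteration. On the 8xx that is a fixed 3-cycle overhead per call.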
[Bug target/93802] New: gcc generates a rlwinm/or pair instead of a single rlwimi (powerpc)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93802

Bug ID: 93802
Summary: gcc generates a rlwinm/or pair instead of a single rlwimi (powerpc)
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

unsigned long f(unsigned short x)
{
	return (x << 16) | x;
}

Results in:

00000000 <f>:
   0:  54 69 80 1e   rlwinm  r9,r3,16,0,15
   4:  7d 23 1b 78   or      r3,r9,r3
   8:  4e 80 00 20   blr

Should instead be:

	rlwimi	r3,r3,16,0,15
	blr

Problem seen with at least GCC 9.2, GCC 8.1 and GCC 5.5.
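The single rlwimi works because rlwimi rotates rS, inserts the result into rA under a mask, and keeps the remaining bits of rA. The C model of rlwimi below follows the architected rotate-and-mask definition, using IBM bit numbering where bit 0 is the MSB. It checks that rlwimi r3,r3,16,0,15 matches (x << 16) | x for every 16-bit x. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n (0..31). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* MASK(mb, me) with IBM bit numbering (bit 0 is the MSB).
 * When mb > me the mask wraps around, as the architecture specifies. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;      /* bits mb..31 set */
    uint32_t to_me = 0xFFFFFFFFu << (31 - me); /* bits 0..me set */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwimi rA,rS,sh,mb,me: insert rotated rS into rA under the mask. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh,
                       unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* rlwimi r3,r3,16,0,15 against the C source of f() for all 16-bit x
 * (x is zero-extended in r3, as for an unsigned short argument). */
static void check_f(void)
{
    for (uint32_t x = 0; x <= 0xFFFF; x++)
        assert(rlwimi(x, x, 16, 0, 15) == ((x << 16) | x));
}
```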
[Bug c/86106] New: powerpc: Suboptimal logical operation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86106

Bug ID: 86106
Summary: powerpc: Suboptimal logical operation
Product: gcc
Version: 8.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

unsigned int g(unsigned int val)
{
	unsigned int mask = 0x7f7f7f7f;

	return ~(((val & mask) + mask) | val | mask);
}

generates the following:

00000020 <g>:
  20:  3d 20 7f 7f   lis     r9,32639
  24:  61 29 7f 7f   ori     r9,r9,32639
  28:  7c 69 48 38   and     r9,r3,r9
  2c:  3d 29 7f 7f   addis   r9,r9,32639
  30:  39 29 7f 7f   addi    r9,r9,32639
  34:  7d 23 1b 78   or      r3,r9,r3
  38:  64 63 7f 7f   oris    r3,r3,32639
  3c:  60 63 7f 7f   ori     r3,r3,32639
  40:  7c 63 18 f8   not     r3,r3
  44:  4e 80 00 20   blr

whereas I'd expect something like:

	lis	r4,32639
	ori	r4,r4,32639
	and	r9,r3,r4
	or	r3,r3,r4
	add	r9,r9,r4
	nor	r3,r9,r3
	blr
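The expected sequence builds the mask once, reuses it for the and, the add and the or, and fuses the final not/or pair into a single nor. The C model below mirrors it one instruction per line and compares it with the original expression on a pseudo-random sample of inputs. The function names and the LCG sampling are illustrative choices, not part of the report.

```c
#include <assert.h>
#include <stdint.h>

/* The function from the report. */
static uint32_t g_ref(uint32_t val)
{
    uint32_t mask = 0x7f7f7f7f;
    return ~(((val & mask) + mask) | val | mask);
}

/* Instruction-by-instruction model of the expected sequence. */
static uint32_t g_expected(uint32_t r3)
{
    uint32_t r4 = 0x7f7fu << 16; /* lis  r4,32639 (0x7f7f is positive) */
    r4 |= 0x7f7fu;               /* ori  r4,r4,32639 */
    uint32_t r9 = r3 & r4;       /* and  r9,r3,r4 */
    r3 = r3 | r4;                /* or   r3,r3,r4 */
    r9 = r9 + r4;                /* add  r9,r9,r4 */
    return ~(r9 | r3);           /* nor  r3,r9,r3 */
}

/* Sample the 32-bit input space with a simple LCG. */
static void check_g_sequence(void)
{
    uint32_t v = 1;
    for (int i = 0; i < 1000000; i++) {
        assert(g_ref(v) == g_expected(v));
        v = v * 1664525u + 1013904223u;
    }
}
```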
[Bug c/86131] New: powerpc: gcc uses costly multiply instead of shift left
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86131

Bug ID: 86131
Summary: powerpc: gcc uses costly multiply instead of shift left
Product: gcc
Version: 8.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

unsigned long f1(unsigned long a, unsigned long b)
{
	return a >> ((4 - a) * 8);
}

unsigned long f2(unsigned long a, unsigned long b)
{
	return a >> ((4 - a) << 3);
}

unsigned long g1(unsigned long a, unsigned long b)
{
	return a >> (32 - a * 8);
}

unsigned long g2(unsigned long a, unsigned long b)
{
	return a >> (32 - (a << 3));
}

When compiled with GCC 8.1 using -O2 -mcpu=860, the result is:

00000000 <f1>:
   0:  1d 23 ff f8   mulli   r9,r3,-8
   4:  39 29 00 20   addi    r9,r9,32
   8:  7c 63 4c 30   srw     r3,r3,r9
   c:  4e 80 00 20   blr

00000010 <f2>:
  10:  21 23 00 04   subfic  r9,r3,4
  14:  55 29 18 38   rlwinm  r9,r9,3,0,28
  18:  7c 63 4c 30   srw     r3,r3,r9
  1c:  4e 80 00 20   blr

00000020 <g1>:
  20:  1d 23 ff f8   mulli   r9,r3,-8
  24:  39 29 00 20   addi    r9,r9,32
  28:  7c 63 4c 30   srw     r3,r3,r9
  2c:  4e 80 00 20   blr

00000030 <g2>:
  30:  54 69 18 38   rlwinm  r9,r3,3,0,28
  34:  21 29 00 20   subfic  r9,r9,32
  38:  7c 63 4c 30   srw     r3,r3,r9
  3c:  4e 80 00 20   blr

mulli takes 2 cycles, so it shouldn't be used here, should it?

The same code compiled with -mcpu=e300c2 gives:

00000000 <f1>:
   0:  54 69 18 38   rlwinm  r9,r3,3,0,28
   4:  21 29 00 20   subfic  r9,r9,32
   8:  7c 63 4c 30   srw     r3,r3,r9
   c:  4e 80 00 20   blr

00000010 <f2>:
  10:  21 23 00 04   subfic  r9,r3,4
  14:  55 29 18 38   rlwinm  r9,r9,3,0,28
  18:  7c 63 4c 30   srw     r3,r3,r9
  1c:  4e 80 00 20   blr

00000020 <g1>:
  20:  54 69 18 38   rlwinm  r9,r3,3,0,28
  24:  21 29 00 20   subfic  r9,r9,32
  28:  7c 63 4c 30   srw     r3,r3,r9
  2c:  4e 80 00 20   blr

00000030 <g2>:
  30:  54 69 18 38   rlwinm  r9,r3,3,0,28
  34:  21 29 00 20   subfic  r9,r9,32
  38:  7c 63 4c 30   srw     r3,r3,r9
  3c:  4e 80 00 20   blr
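The mulli/addi pair and the rlwinm/subfic pair compute the same 32-bit shift count for every input, because multiplying by -8 modulo 2^32 is the same as negating a left shift by 3. The choice between them is therefore purely a cost-model question. The sketch below checks the equivalence on a pseudo-random sample. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Shift count as computed for f1/g1 with -mcpu=860:
 * mulli r9,r3,-8 then addi r9,r9,32. */
static uint32_t count_mulli(uint32_t a)
{
    return a * (uint32_t)-8 + 32u;
}

/* Same count as computed for g2 (and for all four with -mcpu=e300c2):
 * rlwinm r9,r3,3,0,28 (a << 3) then subfic r9,r9,32 (32 - x). */
static uint32_t count_shift(uint32_t a)
{
    return 32u - (a << 3);
}

/* Compare both forms on a pseudo-random sample of 32-bit inputs. */
static void check_counts(void)
{
    uint32_t a = 0;
    for (int i = 0; i < 1000000; i++) {
        assert(count_mulli(a) == count_shift(a));
        a = a * 1664525u + 1013904223u; /* simple LCG */
    }
}
```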
[Bug c/80132] New: powerpc: irrelevant register move before operation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80132

Bug ID: 80132
Summary: powerpc: irrelevant register move before operation
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

In the function below, the first two 'mr' instructions are unneeded; the loop should operate directly on r5 and r6.

void memset64(long long *p, long long v, unsigned int c)
{
	int i;

	for (i = 0; i < c; i++)
		*p++ = v;
}

test2.o:     file format elf32-powerpc

Disassembly of section .text:

0000001c <memset64>:
  1c:  2c 07 00 00   cmpwi   r7,0
  20:  7c cb 33 78   mr      r11,r6
  24:  7c aa 2b 78   mr      r10,r5
  28:  38 63 ff f8   addi    r3,r3,-8
  2c:  4d 82 00 20   beqlr
  30:  7c e9 03 a6   mtctr   r7
  34:  95 43 00 08   stwu    r10,8(r3)
  38:  91 63 00 04   stw     r11,4(r3)
  3c:  42 00 ff f8   bdnz    34
  40:  4e 80 00 20   blr
[Bug c/80131] New: powerpc: 1U << (31 - x) doesn't generate optimised code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80131

Bug ID: 80131
Summary: powerpc: 1U << (31 - x) doesn't generate optimised code
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

I would expect the two functions below to generate the same code, but they don't.

unsigned int f1(unsigned int i)
{
	return 1U << (31 - i);
}

unsigned int f2(unsigned int i)
{
	return (1U << 31) >> i;
}

test3.o:     file format elf32-powerpc

Disassembly of section .text:

00000000 <f1>:
   0:  20 63 00 1f   subfic  r3,r3,31
   4:  39 20 00 01   li      r9,1
   8:  7d 23 18 30   slw     r3,r9,r3
   c:  4e 80 00 20   blr

00000010 <f2>:
  10:  3d 20 80 00   lis     r9,-32768
  14:  7d 23 1c 30   srw     r3,r9,r3
  18:  4e 80 00 20   blr
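The two expressions agree over the whole range where either is defined in C, i ∈ [0, 31]. Outside that range both shifts are undefined, so a compiler is free to pick the shorter lis/srw form for both. The check below is a sketch that assumes a 32-bit unsigned int, as on powerpc32.

```c
#include <assert.h>

/* The two functions from the report (unsigned int assumed 32 bits). */
static unsigned int f1(unsigned int i) { return 1U << (31 - i); }
static unsigned int f2(unsigned int i) { return (1U << 31) >> i; }

/* Both shifts are only defined for i in [0, 31]; for larger i both
 * are undefined behavior, so only this range has to agree. */
static void check_f1_f2(void)
{
    for (unsigned int i = 0; i < 32; i++)
        assert(f1(i) == f2(i));
}
```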
[Bug c/80134] New: powerpc: loop on p[i] and *p++ should give the same code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80134

Bug ID: 80134
Summary: powerpc: loop on p[i] and *p++ should give the same code
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

The two functions below should compile to the same code; the result shouldn't depend on whether we use p[i] or *p++.

void memset32a(int *p, int v, unsigned int c)
{
	int i;

	for (i = 0; i < c; i++)
		p[i] = v;
}

void memset32b(int *p, int v, unsigned int c)
{
	int i;

	for (i = 0; i < c; i++)
		*p++ = v;
}

test4.o:     file format elf32-powerpc

Disassembly of section .text:

00000000 <memset32a>:
   0:  2c 05 00 00   cmpwi   r5,0
   4:  4d 82 00 20   beqlr
   8:  54 a9 10 3a   rlwinm  r9,r5,2,0,29
   c:  39 29 ff fc   addi    r9,r9,-4
  10:  55 29 f0 be   rlwinm  r9,r9,30,2,31
  14:  39 29 00 01   addi    r9,r9,1
  18:  7d 29 03 a6   mtctr   r9
  1c:  38 63 ff fc   addi    r3,r3,-4
  20:  94 83 00 04   stwu    r4,4(r3)
  24:  42 00 ff fc   bdnz    20
  28:  4e 80 00 20   blr

0000002c <memset32b>:
  2c:  2c 05 00 00   cmpwi   r5,0
  30:  38 63 ff fc   addi    r3,r3,-4
  34:  4d 82 00 20   beqlr
  38:  7c a9 03 a6   mtctr   r5
  3c:  94 83 00 04   stwu    r4,4(r3)
  40:  42 00 ff fc   bdnz    3c
  44:  4e 80 00 20   blr
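The four extra instructions in memset32a only recompute c before loading it into CTR, which memset32b does directly. The C model below reproduces them one per line and shows the result equals c for every count from 1 to 2^30. A loop of 2^30 four-byte stores already covers a whole 32-bit address space. `ctr_memset32a` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* CTR value computed by memset32a before the loop (c != 0):
 *   rlwinm r9,r5,2,0,29    r9 = c << 2
 *   addi   r9,r9,-4        r9 = r9 - 4
 *   rlwinm r9,r9,30,2,31   r9 = r9 >> 2 (logical)
 *   addi   r9,r9,1         r9 = r9 + 1
 */
static uint32_t ctr_memset32a(uint32_t c)
{
    uint32_t r9 = c << 2;
    r9 -= 4;
    r9 >>= 2;
    return r9 + 1;
}

/* The recomputed count equals c over the range a valid loop can have. */
static void check_ctr(void)
{
    for (uint32_t c = 1; c <= 100000; c++)
        assert(ctr_memset32a(c) == c);
    assert(ctr_memset32a(1u << 30) == (1u << 30));
}
```

Above 2^30 the model wraps (for example c = 2^30 + 1 gives 1). Such a loop could not stay inside a 32-bit address space anyway.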
[Bug c/82940] New: Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940

Bug ID: 82940
Summary: Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

unsigned char g(unsigned char t[], unsigned char v)
{
	return (t[v & 0x7f] & 0x7f) | (v & 0x80);
}

00000008 <g>:
   8:  54 89 06 7e   clrlwi  r9,r4,25
   c:  7c 63 48 ae   lbzx    r3,r3,r9
  10:  54 84 00 30   rlwinm  r4,r4,0,0,24
  14:  54 63 06 7e   clrlwi  r3,r3,25
  18:  7c 63 23 78   or      r3,r3,r4
  1c:  4e 80 00 20   blr

I would expect:

00000008 <g>:
   8:  54 89 06 7e   clrlwi  r9,r4,25
   c:  7c 63 48 ae   lbzx    r3,r3,r9
  10:  50 83 06 30   rlwimi  r3,r4,0,24,24
  14:  4e 80 00 20   blr
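The expected version relies on two facts. lbzx already zero-extends the loaded byte, so the clrlwi on r3 is redundant. And rlwimi r3,r4,0,24,24 copies only bit 24 (value 0x80) of v into r3, replacing that bit of the loaded byte. A C model of rlwimi, with illustrative helper names, confirms this for every pair of byte values.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n (0..31). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* MASK(mb, me) with IBM bit numbering (bit 0 is the MSB);
 * wraps around when mb > me. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;      /* bits mb..31 set */
    uint32_t to_me = 0xFFFFFFFFu << (31 - me); /* bits 0..me set */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwimi rA,rS,sh,mb,me: insert rotated rS into rA under the mask. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh,
                       unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* t is the byte loaded by lbzx into r3, v is the argument in r4. */
static void check_g(void)
{
    for (uint32_t t = 0; t <= 0xFF; t++)
        for (uint32_t v = 0; v <= 0xFF; v++)
            assert(rlwimi(t, v, 0, 24, 24) == ((t & 0x7f) | (v & 0x80)));
}
```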
[Bug regression/67288] New: [4.9 regression] non optimal simple function (useless additional shift/remove/shift/add)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288

Bug ID: 67288
Summary: [4.9 regression] non optimal simple function (useless additional shift/remove/shift/add)
Product: gcc
Version: 4.9.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: regression
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

The following function (Linux kernel, compiled with -O2) produced good assembly with GCC 4.8.3. With GCC 4.9.3 there are a lot of unnecessary instructions.

/* L1_CACHE_BYTES = 16 */
/* L1_CACHE_SHIFT = 4 */

#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

static inline void dcbf(void *addr)
{
	__asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
}

void flush_dcache_range(unsigned long start, unsigned long stop)
{
	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
	unsigned int i;

	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
		dcbf(addr);
	if (i)
		mb();
}

Result with GCC 4.9.3 (15 insns):

c000d970 <flush_dcache_range>:
c000d970:  54 63 00 36   rlwinm  r3,r3,0,0,27
c000d974:  38 84 00 0f   addi    r4,r4,15
c000d978:  7c 83 20 50   subf    r4,r3,r4
c000d97c:  54 89 e1 3f   rlwinm. r9,r4,28,4,31
c000d980:  4d 82 00 20   beqlr
c000d984:  55 24 20 36   rlwinm  r4,r9,4,0,27
c000d988:  39 24 ff f0   addi    r9,r4,-16
c000d98c:  55 29 e1 3e   rlwinm  r9,r9,28,4,31
c000d990:  39 29 00 01   addi    r9,r9,1
c000d994:  7d 29 03 a6   mtctr   r9
c000d998:  7c 00 18 ac   dcbf    0,r3
c000d99c:  38 63 00 10   addi    r3,r3,16
c000d9a0:  42 00 ff f8   bdnz    c000d998
c000d9a4:  7c 00 04 ac   sync
c000d9a8:  4e 80 00 20   blr

The following section is useless (shift left 4 bits, subtract 16, shift right 4 bits, add 1):

c000d984:  55 24 20 36   rlwinm  r4,r9,4,0,27
c000d988:  39 24 ff f0   addi    r9,r4,-16
c000d98c:  55 29 e1 3e   rlwinm  r9,r9,28,4,31
c000d990:  39 29 00 01   addi    r9,r9,1

The result with GCC 4.8.3 was correct (11 insns):

c000d894 <flush_dcache_range>:
c000d894:  54 63 00 36   rlwinm  r3,r3,0,0,27
c000d898:  38 84 00 0f   addi    r4,r4,15
c000d89c:  7d 23 20 50   subf    r9,r3,r4
c000d8a0:  55 29 e1 3f   rlwinm. r9,r9,28,4,31
c000d8a4:  4d 82 00 20   beqlr
c000d8a8:  7d 29 03 a6   mtctr   r9
c000d8ac:  7c 00 18 ac   dcbf    0,r3
c000d8b0:  38 63 00 10   addi    r3,r3,16
c000d8b4:  42 00 ff f8   bdnz    c000d8ac
c000d8b8:  7c 00 04 ac   sync
c000d8bc:  4e 80 00 20   blr
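The four flagged instructions take the trip count n = size >> 4, already in r9 after the rlwinm., and shift it back up, subtract one cache line, shift it down again and add one. The C model below, with an illustrative function name, shows this always returns n for the values that reach this code. beqlr has already returned for n == 0, and n < 2^28 because it is a 32-bit value shifted right by 4. So loading r9 into CTR directly, as GCC 4.8.3 does, is equivalent.

```c
#include <assert.h>
#include <stdint.h>

/* The four useless instructions, applied to n = size >> 4:
 *   rlwinm r4,r9,4,0,27    r4 = n << 4
 *   addi   r9,r4,-16       r9 = r4 - 16
 *   rlwinm r9,r9,28,4,31   r9 = r9 >> 4 (logical)
 *   addi   r9,r9,1         r9 = r9 + 1
 */
static uint32_t recomputed_count(uint32_t n)
{
    uint32_t r4 = n << 4;
    uint32_t r9 = r4 - 16;
    r9 >>= 4;
    return r9 + 1;
}

/* Sample 1 <= n < 2^28 (n == 0 never gets here) plus the top value. */
static void check_count(void)
{
    for (uint32_t n = 1; n < (1u << 28); n += 997)
        assert(recomputed_count(n) == n);
    assert(recomputed_count((1u << 28) - 1) == (1u << 28) - 1);
}
```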
[Bug target/67290] New: powerpc: suboptimal add of u64 with u32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67290

Bug ID: 67290
Summary: powerpc: suboptimal add of u64 with u32
Product: gcc
Version: 4.9.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.le...@c-s.fr
Target Milestone: ---

#define u32 unsigned long
#define u64 unsigned long long

u64 target(u64 base, u32 offset)
{
	return base + offset;
}

With GCC 4.9.3 (same result with GCC 4.8.3) we get:

00000000 <target>:
   0:  7c ab 2b 78   mr      r11,r5
   4:  39 40 00 00   li      r10,0
   8:  7c 84 58 14   addc    r4,r4,r11
   c:  7c 63 51 14   adde    r3,r3,r10
  10:  4e 80 00 20   blr

I would expect:

<target>:
	addc	r4,r4,r5
	addze	r3,r3
	blr
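The expected addc/addze pair is the standard two-word add with carry. Only the low words need a real addition. The 32-bit offset has a zero high word, so the high half just absorbs the carry instead of adding a register loaded with 0. The sketch below models this in C, assuming the 32-bit calling convention visible in the listing (base in r3:r4 with the high word in r3, offset in r5). The function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Model of:
 *   addc  r4,r4,r5   low words, carry out into XER[CA]
 *   addze r3,r3      high word plus carry
 */
static uint64_t add_u64_u32(uint64_t base, uint32_t offset)
{
    uint32_t r3 = (uint32_t)(base >> 32); /* high word of base */
    uint32_t r4 = (uint32_t)base;         /* low word of base */

    uint32_t low = r4 + offset;
    uint32_t ca = low < r4;               /* carry out of the 32-bit add */
    uint32_t high = r3 + ca;              /* addze: add zero plus carry */

    return ((uint64_t)high << 32) | low;
}

/* Compare with a plain 64-bit add on pseudo-random inputs. */
static void check_add(void)
{
    uint64_t base = 0x0123456789abcdefull;
    uint32_t off = 0xfedcba98u;
    for (int i = 0; i < 100000; i++) {
        assert(add_u64_u32(base, off) == base + off);
        base = base * 6364136223846793005ull + 1442695040888963407ull;
        off = off * 1664525u + 1013904223u;
    }
}
```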
[Bug regression/67288] [4.9 regression] non optimal simple function (useless additional shift/remove/shift/add)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288

--- Comment #2 from Christophe Leroy ---
The code below compiles fine as a standalone file, and the useless instructions are still generated:

[root@localhost knl]# cat flush.c
#define L1_CACHE_SHIFT 4
#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)

#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

static inline void dcbf(void *addr)
{
	__asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
}

void flush_dcache_range(unsigned long start, unsigned long stop)
{
	void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
	unsigned int size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
	unsigned int i;

	for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
		dcbf(addr);
	if (i)
		mb();	/* sync */
}
[root@localhost knl]# ppc-linux-gcc -c -O2 flush.c
[root@localhost knl]# ppc-linux-objdump -d flush.o

flush.o:     file format elf32-powerpc

Disassembly of section .text:

00000000 <flush_dcache_range>:
   0:  54 63 00 36   rlwinm  r3,r3,0,0,27
   4:  38 84 00 0f   addi    r4,r4,15
   8:  7c 83 20 50   subf    r4,r3,r4
   c:  54 89 e1 3f   rlwinm. r9,r4,28,4,31
  10:  4d 82 00 20   beqlr
  14:  55 24 20 36   rlwinm  r4,r9,4,0,27
  18:  39 24 ff f0   addi    r9,r4,-16
  1c:  55 29 e1 3e   rlwinm  r9,r9,28,4,31
  20:  39 29 00 01   addi    r9,r9,1
  24:  7d 29 03 a6   mtctr   r9
  28:  7c 00 18 ac   dcbf    0,r3
  2c:  38 63 00 10   addi    r3,r3,16
  30:  42 00 ff f8   bdnz    28
  34:  7c 00 04 ac   sync
  38:  4e 80 00 20   blr