https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
Bug ID: 80080
Summary: S390: Issues with emitted cs-instructions for __atomic builtins.
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: stli at linux dot vnet.ibm.com
CC: krebbel at gcc dot gnu.org
Target Milestone: ---
Target: S390

For s390, I am now using the C11 atomic builtins in glibc. There are some issues with the emitted cs-instructions.

If __atomic_compare_exchange_n is used as the condition of an if or while statement, GCC sometimes does not branch on the condition code directly. Instead, it extracts the condition code into a general register with ipm, shifts and compares it (sra, cijne), and only then branches on the result of that comparison.

int foo1 (int *mem)
{
  int val, newval;

  val = __atomic_load_n (mem, __ATOMIC_RELAXED);
  do
    {
      newval = val | 123;
    }
  while (!__atomic_compare_exchange_n (mem, &val, newval, 1,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
  /*
  0000000000000000 <foo>:
     0: 58 10 20 00        l      %r1,0(%r2)
     4: 18 31              lr     %r3,%r1
     6: a5 3b 00 7b        oill   %r3,123
     a: ba 13 20 00        cs     %r1,%r3,0(%r2)
     e: b2 22 00 30        ipm    %r3
    12: 8a 30 00 1c        sra    %r3,28
    16: ec 36 ff f7 00 7e  cijne  %r3,0,4 <foo+0x4>
    1c: a7 29 00 00        lghi   %r2,0
    20: 07 fe              br     %r14
    22: 07 07              nopr   %r7
    24: 07 07              nopr   %r7
    26: 07 07              nopr   %r7
  */
  return 0;
}

For __atomic_exchange_n, s390 has no dedicated instruction, so a cs-loop is used. The old value is saved in an extra register instead of reusing the "old" register of the cs-instruction, which cs already updates with the current memory contents when the new value is not stored. This extra register is reloaded in every iteration of the loop.
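The cs-loop described above corresponds roughly to the following C-level model. This is only a sketch of the semantics, not GCC's actual expansion, and the helper name exchange_via_cas is hypothetical. The point it illustrates: on failure, __atomic_compare_exchange_n writes the value it observed in memory back into its "expected" argument, just as cs updates its first operand, so the previous value is already available without a separate copy.

```c
/* Hypothetical helper: models __atomic_exchange_n (mem, newval, ACQUIRE)
   as a compare-and-swap loop.  On failure, __atomic_compare_exchange_n
   stores the current contents of *mem into 'old' (like cs updating its
   first operand), so 'old' never needs to be reloaded or copied.  */
int exchange_via_cas (int *mem, int newval)
{
  int old = __atomic_load_n (mem, __ATOMIC_RELAXED);

  /* Weak CAS: may fail spuriously, in which case we simply retry with
     the freshly observed value in 'old'.  */
  while (!__atomic_compare_exchange_n (mem, &old, newval, 1,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    ;

  return old; /* the value that was replaced */
}
```

In such a loop the only values live across iterations are the address, the new value and the observed old value, which is why a separate register holding a copy of the old value should not be necessary.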

extern int bar (int *mem);

int foo2 (int *mem)
{
  int old = __atomic_exchange_n (mem, 0, __ATOMIC_ACQUIRE);
  if (old >= 2)
    return bar (mem);
  /*
  0000000000000028 <foo2>:
    28: a7 48 00 00        lhi    %r4,0
    2c: 58 10 20 00        l      %r1,0(%r2)
    30: 18 31              lr     %r3,%r1
    32: ba 14 20 00        cs     %r1,%r4,0(%r2)
    36: a7 74 ff fd        jne    30 <foo2+0x8>
    3a: ec 3c 00 06 01 7e  cijnh  %r3,1,46 <foo2+0x1e>
    40: c0 f4 00 00 00 00  jg     40 <foo2+0x18>
                           42: R_390_PC32DBL bar+0x2
    46: a7 29 00 00        lghi   %r2,0
    4a: 07 fe              br     %r14
    4c: 07 07              nopr   %r7
    4e: 07 07              nopr   %r7
  */
  return 0;
}

When exchanging with zero, as above, the load-and-and instruction could be used instead. Then only one register is needed for both the new and the old value, and there is no loop.

If the exchanged memory is a global variable, its address is loaded inside the loop instead of before it (the larl at 0x5c is executed on every iteration of the loop that starts at 0x5a).

extern int foo3_mem;

int foo3 (void)
{
  return __atomic_exchange_n (&foo3_mem, 5, __ATOMIC_ACQUIRE);
  /*
  0000000000000050 <foo3>:
    50: c4 1d 00 00 00 00  lrl    %r1,50 <foo3>
                           52: R_390_PC32DBL foo3_mem+0x2
    56: a7 38 00 05        lhi    %r3,5
    5a: 18 21              lr     %r2,%r1
    5c: c0 40 00 00 00 00  larl   %r4,5c <foo3+0xc>
                           5e: R_390_PC32DBL foo3_mem+0x2
    62: ba 13 40 00        cs     %r1,%r3,0(%r4)
    66: a7 74 ff fa        jne    5a <foo3+0xa>
    6a: b9 14 00 22        lgfr   %r2,%r2
    6e: 07 fe              br     %r14
  */
}
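The load-and-and suggestion for foo2 rests on a simple identity: since x & 0 == 0 for every x, atomically ANDing a location with 0 stores 0 and returns the previous contents, which is exactly what an exchange with 0 does. The sketch below expresses that exchange with __atomic_fetch_and; the helper name exchange_zero_via_and is hypothetical. Whether a given GCC release emits lan for it depends on the target architecture level (lan belongs to the interlocked-access facility, which is assumed here to require -march=z196 or later) and is not established by this report.

```c
/* Hypothetical helper: an exchange with 0 written as an atomic AND with 0.
   Because x & 0 == 0 for every x, this stores 0 into *mem and returns the
   previous value, exactly like __atomic_exchange_n (mem, 0, ACQUIRE).  */
int exchange_zero_via_and (int *mem)
{
  return __atomic_fetch_and (mem, 0, __ATOMIC_ACQUIRE);
}
```

The sketch only shows the semantic equivalence; using it in foo2 in place of __atomic_exchange_n would leave the rest of the function (the old >= 2 test and the call to bar) unchanged.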