https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
Bug ID: 80080
Summary: S390: Issues with emitted cs-instructions for __atomic builtins.
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: stli at linux dot vnet.ibm.com
CC: krebbel at gcc dot gnu.org
Target Milestone: ---
Target: S390
For s390, I am now using the C11 atomic builtins in glibc.
There are some issues with the cs instructions that GCC emits for them.
If __atomic_compare_exchange_n is used within the condition of an if/while,
the generated code sometimes does not branch on the condition code directly.
Instead, it extracts the condition code into a general register via ipm,
followed by further instructions to compare it (sra and cijne in the listing
below), and then branches according to the result of that comparison.
int foo1 (int *mem)
{
  int val, newval;
  val = __atomic_load_n (mem, __ATOMIC_RELAXED);
  do
    {
      newval = val | 123;
    }
  while (!__atomic_compare_exchange_n (mem, &val, newval, 1, __ATOMIC_ACQUIRE,
                                       __ATOMIC_RELAXED));
  /*
  0000000000000000 <foo>:
  0: 58 10 20 00 l %r1,0(%r2)
  4: 18 31 lr %r3,%r1
  6: a5 3b 00 7b oill %r3,123
  a: ba 13 20 00 cs %r1,%r3,0(%r2)
  e: b2 22 00 30 ipm %r3
  12: 8a 30 00 1c sra %r3,28
  16: ec 36 ff f7 00 7e cijne %r3,0,4 <foo+0x4>
  1c: a7 29 00 00 lghi %r2,0
  20: 07 fe br %r14
  22: 07 07 nopr %r7
  24: 07 07 nopr %r7
  26: 07 07 nopr %r7
  */
  return 0;
}
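The report mentions if as well as while conditions, but only shows the while
form. A minimal sketch of the if form follows. The function name
try_lock_once and the values 0 and 1 are hypothetical and not taken from
glibc, and it has not been checked here whether this variant also triggers the
ipm sequence. It uses the strong variant (weak = 0), so a single call gives a
definite answer.

```c
/* Hypothetical helper, not part of the original report: tries once to
   change *mem from 0 to 1 and branches on the boolean result of the
   builtin.  */
int try_lock_once (int *mem)
{
  int expected = 0;
  if (__atomic_compare_exchange_n (mem, &expected, 1, 0, __ATOMIC_ACQUIRE,
                                   __ATOMIC_RELAXED))
    return 1; /* *mem was 0 and has been replaced by 1.  */
  return 0;   /* *mem was not 0; expected now holds the observed value.  */
}
```

Ideally the branch after cs would test the condition code directly, as the
jne in the foo2 listing does.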
For __atomic_exchange_n, s390 has no dedicated instruction, so a cs-loop is
used. An extra register is needed to save the old value, instead of using the
"old" register of the cs instruction, which is updated whenever the new value
is not stored to memory. This extra register is reloaded in every loop
iteration.
extern int bar (int *mem);
int foo2 (int *mem)
{
  int old = __atomic_exchange_n (mem, 0, __ATOMIC_ACQUIRE);
  if (old >= 2)
    return bar (mem);
  /*
  0000000000000028 <foo2>:
  28: a7 48 00 00 lhi %r4,0
  2c: 58 10 20 00 l %r1,0(%r2)
  30: 18 31 lr %r3,%r1
  32: ba 14 20 00 cs %r1,%r4,0(%r2)
  36: a7 74 ff fd jne 30 <foo2+0x8>
  3a: ec 3c 00 06 01 7e cijnh %r3,1,46 <foo2+0x1e>
  40: c0 f4 00 00 00 00 jg 40 <foo2+0x18>
  42: R_390_PC32DBL bar+0x2
  46: a7 29 00 00 lghi %r2,0
  4a: 07 fe br %r14
  4c: 07 07 nopr %r7
  4e: 07 07 nopr %r7
  */
  return 0;
}
When exchanging to a zero value, as above, the load-and-and (lan) instruction
could be used instead. Then only one register is needed for both the new and
the old value, and there is no loop.
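Both points can be sketched at source level. The function names below are
hypothetical. The first function is a hand-written cs-loop in which the
"expected" variable doubles as the old value, because a failed
compare-exchange writes the current memory contents back into it. The second
expresses an exchange to zero as an atomic AND with 0. Whether GCC emits lan
for it depends on the selected -march level and has not been verified here.

```c
/* Hypothetical sketch: exchange implemented as a compare-and-swap loop.
   No separate copy of the old value is kept; on failure the builtin
   refreshes OLD with the value currently in memory.  */
int exchange_via_cs_loop (int *mem, int newval)
{
  int old = __atomic_load_n (mem, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n (mem, &old, newval, 1, __ATOMIC_ACQUIRE,
                                       __ATOMIC_RELAXED))
    ; /* Retry with the refreshed OLD.  */
  return old;
}

/* Hypothetical sketch: exchanging to zero is equivalent to an atomic AND
   with 0, which returns the previous value and stores 0.  */
int exchange_to_zero (int *mem)
{
  return __atomic_fetch_and (mem, 0, __ATOMIC_ACQUIRE);
}
```

For a zero new value, both functions return the same result as
__atomic_exchange_n (mem, 0, __ATOMIC_ACQUIRE).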
If the exchanged memory is a global variable, its address is loaded inside the
loop (larl in the listing below) instead of once before the loop.
extern int foo3_mem;
int foo3 (void)
{
  return __atomic_exchange_n (&foo3_mem, 5, __ATOMIC_ACQUIRE);
  /*
  0000000000000050 <foo3>:
  50: c4 1d 00 00 00 00 lrl %r1,50 <foo3>
  52: R_390_PC32DBL foo3_mem+0x2
  56: a7 38 00 05 lhi %r3,5
  5a: 18 21 lr %r2,%r1
  5c: c0 40 00 00 00 00 larl %r4,5c <foo3+0xc>
  5e: R_390_PC32DBL foo3_mem+0x2
  62: ba 13 40 00 cs %r1,%r3,0(%r4)
  66: a7 74 ff fa jne 5a <foo3+0xa>
  6a: b9 14 00 22 lgfr %r2,%r2
  6e: 07 fe br %r14
  */
}