On 16.02.19 19:38, Jakub Jelinek wrote: > Hi! > > The following patch fixes wrong-code on the following testcase extracted > from pseudo-RNG with e.g. -march=zEC12 -O2. > The problem is in the instruction emitted by the *r<noxa>sbg_sidi_srl > patterns. We have in *.final correct: > (insn 67 65 68 (parallel [ > (set (reg:SI 1 %r1 [189]) > (xor:SI (subreg:SI (zero_extract:DI (reg/v:DI 11 %r11 > [orig:89 th ] [89]) > (const_int 32 [0x20]) > (const_int 24 [0x18])) 4) > (reg:SI 1 %r1 [187]))) > (clobber (reg:CC 33 %cc)) > ]) "pr89369.c":44:73 1419 {*rxsbg_sidi_srl} > (expr_list:REG_DEAD (reg/v:DI 11 %r11 [orig:89 th ] [89]) > (expr_list:REG_UNUSED (reg:CC 33 %cc) > (nil)))) > which is effectively (reg:SI %r1) ^= (unsigned) ((reg:DI %r11) >> 8). > But the pattern emits rxsbg %r1,%r11,40,63,56 which is effectively > (reg:SI %r1) ^= ((unsigned) ((reg:DI %r11) >> 8) & 0xffffff) > or equivalently > (reg:SI %r1) ^= ((reg:SI %r11) >> 8). Even in the pattern one can see > that it wants to extract exactly 32 bits always, no matter what the shift > count is. Fixed by always emitting 32,63,(32+pos from zero extract). > On that pr89369.c testcase, the patch also changes > - rxsbg %r12,%r9,64,63,32 > - rxsbg %r12,%r1,64,63,32 > + rxsbg %r12,%r9,32,63,32 > + rxsbg %r12,%r1,32,63,32 > and > - rxsbg %r1,%r3,64,63,32 > + rxsbg %r1,%r3,32,63,32 > which are all with zero_extract with len 32 and pos 0, so again, it wants > to extract the low 32 bits. I3 64 larger than I4 63 is just weird. > The patch also changes the instructions emitted in rXsbg_mode_sXl.c: > - rosbg %r2,%r3,34,63,62 > + rosbg %r2,%r3,32,63,62 > and > - rxsbg %r2,%r3,34,63,62 > + rxsbg %r2,%r3,32,63,62 > Here, it is > __attribute__ ((noinline)) unsigned int > rosbg_si_srl (unsigned int a, unsigned int b) > { > return a | (b >> 2); > } > __attribute__ ((noinline)) unsigned int > rxsbg_si_srl (unsigned int a, unsigned int b) > { > return a ^ (b >> 2); > } > so from quick POV, one might think 34,63,62 is better, as we want to or in > just the 30 bits from b. Both should actually work the same though, because > (subreg/s/v:SI (reg/v:DI 64 [ b+-4 ]) 4) - the b argument is passed zero > extended and so it doesn't really matter how many bits we extract, as long > as it is 30 or more. If I try instead: > __attribute__ ((noinline, noipa)) unsigned int > rosbg_si_srl (unsigned int a, unsigned long long b) > { > return a | ((unsigned) b >> 2); > } > __attribute__ ((noinline, noipa)) unsigned int > rxsbg_si_srl (unsigned int a, unsigned long long b) > { > return a ^ ((unsigned) b >> 2); > } > then both the unpatched and patched compiler emit properly > rosbg %r2,%r3,34,63,62 > and > rxsbg %r2,%r3,34,63,62 > through a different pattern, because in that case we must not rely on the > upper 32 bits of b being zero. > > In addition to this change, the patch adds a cleanup, there is no reason to > use a static buffer in each instruction and increase global state, we can > just tweak the arguments and let the caller deal with it. That is something > normally done in other parts of the s390.md as well. As small CONST_INTs > are hashed, it shouldn't increase compile time memory. > > Bootstrapped/regtested on s390x-linux, ok for trunk? > > 2019-02-16 Jakub Jelinek <ja...@redhat.com> > > PR target/89369 > * config/s390/s390.md (*r<noxa>sbg_<mode>_srl_bitmask, > *r<noxa>sbg_<mode>_sll, *r<noxa>sbg_<mode>_srl): Don't construct > pattern in a temporary buffer. > (*r<noxa>sbg_sidi_srl): Likewise. Always use 32 as I3 rather > than 64-operands[2]. > > * gcc.c-torture/execute/pr89369.c: New test. > * gcc.target/s390/md/rXsbg_mode_sXl.c (rosbg_si_srl, > rxsbg_si_srl): Expect last 3 operands 32,63,62 rather than > 34,63,62.
Ok. Thanks! Andreas