Tejas Belagod <tbela...@arm.com> writes: > Hi, > > I seem to have uncovered what seems to be a bug with handling SUBREG (MEM) > with > the MEM having side-effects while testing aarch64-4.7. This bug seems to be > latent on trunk. > > Here is a test case reduced from gcc.c-torture/execute/scal-to-vec1.c. > > #define vector(elcount, type) \ > __attribute__((vector_size((elcount)*sizeof(type)))) type > > #define vidx(type, vec, idx) (*((type *) &(vec) + idx)) > > #define operl(a, b, op) (a op b) > #define operr(a, b, op) (b op a) > > #define check(type, count, vec0, vec1, num, op, lr) \ > do {\ > int __i; \ > for (__i = 0; __i < count; __i++) {\ > if (vidx (type, vec1, __i) != oper##lr (num, vidx (type, vec0, > __i), > op)) \ > __builtin_abort (); \ > }\ > } while (0) > > #define veccompare(type, count, v0, v1) \ > do {\ > int __i; \ > for (__i = 0; __i < count; __i++) { \ > if (vidx (type, v0, __i) != vidx (type, v1, __i)) \ > __builtin_abort (); \ > } \ > } while (0) > > volatile int one = 1; > > int main (int argc, char *argv[]) { > > vector(8, short) v0 = {one, 1, 2, 3, 4, 5, 6, 7}; > vector(8, short) v1; > v1 = 2 << v0; check (short, 8, v0, v1, 2, <<, l); > return 0; > } > > When compiled with -O1, the combiner tries to combine these two instructions: > > (insn 87 86 89 4 (set (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ]) > (sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113 [ ivtmp.23 ])) [0 > MEM[base: D.2294_40, offset: 0B]+0 S2 A16]))) scal-to-vec1.c:35 54 > {*extendhisi2_aarch64} > (expr_list:REG_INC (reg:DI 113 [ ivtmp.23 ]) > (nil))) > > (insn 89 87 90 4 (set (reg:SI 155) > (ashift:SI (reg:SI 158) > (subreg:QI (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ]) > 0))) > scal-to-vec1.c:35 331 {*ashlsi3_insn} > (expr_list:REG_DEAD (reg:SI 154 [ MEM[base: D.2294_40, offset: 0B] ]) > (nil))) > > It tries to forward-propagate reg 154 to insn 89 and the intermediate RTL > looks > like this: > > (insn 89 87 90 4 (set (reg:SI 155) > (ashift:SI (reg:SI 158) > (subreg:QI ((sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113 > [ivtmp.23 ])) [0 MEM[base: > D.2294_40, offset: 0B]+0 S2 A16]))) > > The combiner then recursively tries to simplify this RTL. During simplifying > the > subreg part of the RTL i.e. > > (subreg:QI ((sign_extend:SI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0 > MEM[base: D.2294_40, offset: > 0B]+0 S2 A16]))) > > combine_simplify_rtx () calls simplify_subreg () and this returns > > (subreg:QI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0 > MEM[base: D.2294_40, offset: 0B]+0 S2 A16]))) > > The code that does this is in combine.c:combine_simplify_rtx () > > ... > rtx temp; > temp = simplify_subreg (mode, SUBREG_REG (x), op0_mode, > SUBREG_BYTE (x)); > if (temp) > return temp; > } > ... > > So, the above tree is returned as is and insn 87 gets combined into insn 89 to > become > > (insn 89 87 90 4 (set (reg:SI 155) > (ashift:SI (reg:SI 158) > (subreg:QI (mem:HI (pre_inc:DI (reg:DI 113 ivtmp.23 ])) [0 > MEM[base: D.2294_40, offset: 0B]+0 S2 A16]))) > > This ends up in reload and in cleanup_subreg_operands () the subreg:QI > (mem:HI) > gets narrowed to mem:QI, but the interesting bit is that the pre_inc stays the > same, so becomes a pre_inc for a QI mode address rather than the original HI > mode address and therefore instead of addressing a byte and incrementing the > address by 2 to get the next half word, the address gets incremented by 1! > > Also, I feel this is a latent bug on the trunk. This is because while the > combiner takes the same path on the trunk, the address expression is not a > pre_inc there - the trunk wants to keep the addressing to a mem (plus (reg) > (const)) and therefore combine_simplify_rtx () simplifies the subreg > expression > right down to the narrowed memref i.e. mem:QI (plus:DI (reg:DI) (const)). This > gets combined to insn 89 to become > > (insn 89 87 90 4 (set (reg:SI 155) > (ashift:SI (reg:SI 158) (mem:QI (plus:DI (reg:DI) (const)))) > > This then, gets later checked with recog () and since the predicate for > operand > 2 does not allow memory operands for shifts in aarch64.md, does not get > combined > and all is well. > > After the discussions Richard Earnshaw had on IRC with Andrew Pinski, it was > felt that it was best to fix the standard predicate > 'general_operand' to not allow SUBREG (MEM) with side-effects as it has known > issues associated with it - particularly reload not being able to deal with > such > cases. 'general_operand' currently outlaws cases like paradoxical SUBREG (MEM) > on targets with insn scheduling and SUBREG (MEM) with non-zero SUBREG_BYTE. > This > is another case it should not allow. Here is a patch that tightens > general_operands to not allow SUBREG (MEM) with MEM having side-effects. > > I would like to hear some thoughts on this.
LGTM. It would be good to get rid of subreg mem operands altogether at some point, but that's a little too drastic for stage 3. This patch looks like a strict improvement. Richard