On 19 September 2014 12:21, Tejas Belagod <tejas.bela...@arm.com> wrote: > The reason we avoided using type-punning using unions was that reload would > get confused with potential subreg(mem) that could be introduced because of > memory xfer caused by unions and large int modes. As a result, we would get > incorrect or sub-optimal code. But this seems to have fixed itself. :-) > > Because this involves xfers between large int modes and > CANNOT_CHANGE_MODE_CLASS has some impact on it, it would be good to test > what impact your patch has with C_C_M_C removed, so that it will be easier > to fix the fallout once we remove C_C_M_C eventually. To test this you will > need Richard's patch set > https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01440.html. > > Same for your other 2 patches in this series(3,4).
I tried those patches, and altered aarch64_cannot_change_mode_class to return false for all cases. However, this does not avoid the unnecessary moves. Taking a really simple test case: #include <arm_neon.h> int32x2x2_t xvld2_s32(int32_t *__a) { int32x2x2_t ret; __builtin_aarch64_simd_oi __o; __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a); ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0); ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1); return ret; } (disabling scheduling for clarity) $ aarch64-oe-linux-gcc -O2 -S -o - simd.c -fno-schedule-insns -fno-schedule-insns2 ... xvld2_s32: ld2 {v2.2s - v3.2s}, [x0] orr v0.8b, v2.8b, v2.8b orr v1.8b, v3.8b, v3.8b ret ... The reason is apparent in the rtl dump from ira: ... Allocno a0r73 of FP_REGS(32) has 31 avail. regs 33-63, node: 33-63 (confl regs = 0-32 64 65) ... (insn 2 4 3 2 (set (reg/v/f:DI 79 [ __a ]) (reg:DI 0 x0 [ __a ])) simd.c:5 34 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 0 x0 [ __a ]) (nil))) (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) (insn 6 3 20 2 (set (reg/v:OI 73 [ __o ]) (subreg:OI (vec_concat:V8SI (vec_concat:V4SI (unspec:V2SI [ (mem:TI (reg/v/f:DI 79 [ __a ]) [0 S16 A8]) ] UNSPEC_LD2) (vec_duplicate:V2SI (const_int 0 [0]))) (vec_concat:V4SI (unspec:V2SI [ (mem:TI (reg/v/f:DI 79 [ __a ]) [0 S16 A8]) ] UNSPEC_LD2) (vec_duplicate:V2SI (const_int 0 [0])))) 0)) simd.c:8 2149 {aarch64_ld2v2si_dreg} (expr_list:REG_DEAD (reg/v/f:DI 79 [ __a ]) (nil))) (insn 20 6 21 2 (set (reg:V2SI 32 v0) (subreg:V2SI (reg/v:OI 73 [ __o ]) 0)) simd.c:12 778 {*aarch64_simd_movv2si} (nil)) (insn 21 20 22 2 (set (reg:V2SI 33 v1) (subreg:V2SI (reg/v:OI 73 [ __o ]) 16)) simd.c:12 778 {*aarch64_simd_movv2si} (expr_list:REG_DEAD (reg/v:OI 73 [ __o ]) (nil))) (insn 22 21 23 2 (use (reg:V2SI 32 v0)) simd.c:12 -1 (nil)) (insn 23 22 0 2 (use (reg:V2SI 33 v1)) simd.c:12 -1 (nil)) The register allocator considers r73 to conflict with v0, because they are simultaneously live after insn 20. Without the 2nd use of v73 (eg if the write to res.val[1] is replaced with vdup_n_s32(0) ) then the allocator does do the right thing with the subreg and allocates v73 to {v0,v1}. I haven't read all of the old threads relating to Richard's patches yet, but I don't see why they would affect this issue. I don't think the register allocator is able to resolve this unless the conversion between the __builtin_simd type and the int32x4x2_t type is done as a single operation. However, type-punning is not possible with the arrays of 64 bit vectors, as the arrays are not the same size as the corresponding __builtin_simd types, and any solution for those would probably help with the q variants too. Maybe the solution is to pass the NEON intrinsic types directly to the builtins? Is there a reason that it wasn't done that way before? Thanks Charles