On Fri, 11 Oct 2019, Joel Hutton wrote: > Hi Richard, > > Thanks for your help, I've reworked my SLP RFC based on your feedback. > > I think a better place for the loop searching for CONSTRUCTORs is > > vect_slp_analyze_bb_1 where I'd put it before the check you remove, > > and I'd simply append found CONSTRUCTORs to the grouped_stores > > array > I've moved this check into a separate function and called it from > vect_slp_analyze_bb_1 > > > The fixup you do in vectorizable_operation doesn't > > belong there either, I'd add a new field to the SLP instance > > structure refering to the CONSTRUCTOR stmt and do the fixup > > in vect_schedule_slp_instance instead where you can simply > > replace the CONSTRUCTOR with the vectorized SSA name then. > > Done. > > > + /* Check that the constructor elements are unique. */ > > + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val) > > + { > > + tree prev_val; > > + int j; > > + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j, > > prev_val) > > + { > > + if (val == prev_val && i!=j) > > > > why's that necessary? (it looks incomplete, also doesn't catch > > [duplicate] constants) > > The thinking was that there was no benefit in vectorizing a constructor of > duplicates, or a vector of constants, although now you mention it that > thinking > may be flawed. I've removed it > > > You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS > > (we can have omitted trailing zeros).
^^^ I don't see this being handled? You give up on non-SSA names but not on the omitted trailing zeros. > ... > > What happens if you have a vector constructor that is twice > > as large as the machine supports? The vectorizer will happily > > produce a two vector SSA name vectorized result but your > > CONSTRUCTOR replacement doesn't work here. I think this should > > be made work correctly (not give up on that case). > > I've reworked the patch to account for this, by checking that the > vectorized version > has one vectorized stmt at the root of the SLP tree. I'm not sure how to > handle > a vector constructor twice as large as the machine supports, as far as I > can see, > when only analyzing a within a basic block, the SSA name of the > constructor has to > be maintained. You build a CONSTRUCTOR of vectors, thus _orig_ssa_1 = { vect_part1_2, vect_part2_3, ... }; + + if (constructor) + { + SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt; + } + else + SLP_INSTANCE_ROOT_STMT (new_instance) = NULL; + too much vertical space, no {} around single-stmt if clauses @@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb, stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt); if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info)) { + /* Check this is not a constructor that will be vectorized + away. */ + if (BB_VINFO_GROUPED_STORES (vinfo).contains (use_stmt_info)) + continue; hmm, so why not set the slp type on SLP_INSTANCE_ROOT_STMT instead? In theory the stmt should be part of the SLP tree itself but that's probably too awkward to be made work at the moment ;) vect_ssa_use_outside_bb and vect_slp_check_for_constructors are new functions which need comments. + /* For vector constructors, the same SSA name must be used to maintain data + flow into other basic blocks. */ + if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance) + && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1 + && SLP_TREE_VEC_STMTS (node).exists ()) + { it should read /* Vectorize the instance root. */ and be in vect_schedule_slp after the vect_schedule_slp_instance. As said above you need to handle SLP_TREE_NUMBER_OF_VEC_STMTS > 1, you also cannot simply do "nothing" here, "failing" vectorization (well, you probably can - DCE will remove the vectorized code - but at least if there were calls in the SLP tree they will be mangled by vectorization so the scalar code is wrecked). SO it should be if (SLP_INSTANCE_ROOT_STMT (instance)) .. you may not fail to replace the scalar root stmt here .. + if (CONSTRUCTOR_NELTS (rhs) == 0) + vectorizable = false; + if you use continue; you can elide the 'vectorizable' variable. + if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt))) + vectorizable = false; + why's that? no comments that clarify ;) The vector may be used as argument to a call or as source of a store. So I'd simply remove this check (and the function). Thanks, Richard. > Currently SLP vectorization can build SLP trees starting from reductions or > from group stores. This patch adds a third starting point: vector > constructors. > > For the following test case (compiled with -O3): > > char g_d[1024], g_s1[1024], g_s2[1024]; > void test_loop(void) > { > char d = g_d, s1 = g_s1, *s2 = g_s2; > for ( int y = 0; y < 128; y++ ) > > { > for ( int x = 0; x < 16; x++ ) > d[x] = s1[x] + s2[x]; > d += 16; > } > } > > before patch: > test_loop: > .LFB0: > .cfi_startproc > adrp x0, g_s1 > adrp x2, g_s2 > add x3, x0, :lo12:g_s1 > add x4, x2, :lo12:g_s2 > ldrb w7, [x2, #:lo12:g_s2] > ldrb w1, [x0, #:lo12:g_s1] > adrp x0, g_d > ldrb w6, [x4, 1] > add x0, x0, :lo12:g_d > ldrb w5, [x3, 1] > add w1, w1, w7 > fmov s0, w1 > ldrb w7, [x4, 2] > add w5, w5, w6 > ldrb w1, [x3, 2] > ldrb w6, [x4, 3] > add x2, x0, 2048 > ins v0.b[1], w5 > add w1, w1, w7 > ldrb w7, [x3, 3] > ldrb w5, [x4, 4] > add w7, w7, w6 > ldrb w6, [x3, 4] > ins v0.b[2], w1 > ldrb w8, [x4, 5] > add w6, w6, w5 > ldrb w5, [x3, 5] > ldrb w9, [x4, 6] > add w5, w5, w8 > ldrb w1, [x3, 6] > ins v0.b[3], w7 > ldrb w8, [x4, 7] > add w1, w1, w9 > ldrb w11, [x3, 7] > ldrb w7, [x4, 8] > add w11, w11, w8 > ldrb w10, [x3, 8] > ins v0.b[4], w6 > ldrb w8, [x4, 9] > add w10, w10, w7 > ldrb w9, [x3, 9] > ldrb w7, [x4, 10] > add w9, w9, w8 > ldrb w8, [x3, 10] > ins v0.b[5], w5 > ldrb w6, [x4, 11] > add w8, w8, w7 > ldrb w7, [x3, 11] > ldrb w5, [x4, 12] > add w7, w7, w6 > ldrb w6, [x3, 12] > ins v0.b[6], w1 > ldrb w12, [x4, 13] > add w6, w6, w5 > ldrb w5, [x3, 13] > ldrb w1, [x3, 14] > add w5, w5, w12 > ldrb w13, [x4, 14] > ins v0.b[7], w11 > ldrb w12, [x4, 15] > add w4, w1, w13 > ldrb w1, [x3, 15] > add w1, w1, w12 > ins v0.b[8], w10 > ins v0.b[9], w9 > ins v0.b[10], w8 > ins v0.b[11], w7 > ins v0.b[12], w6 > ins v0.b[13], w5 > ins v0.b[14], w4 > ins v0.b[15], w1 > .p2align 3,,7 > .L2: > str q0, [x0], 16 > cmp x2, x0 > bne .L2 > ret > .cfi_endproc > .LFE0: > > After patch: > > test_loop: > .LFB0: > .cfi_startproc > adrp x3, g_s1 > adrp x2, g_s2 > add x3, x3, :lo12:g_s1 > add x2, x2, :lo12:g_s2 > adrp x0, g_d > add x0, x0, :lo12:g_d > add x1, x0, 2048 > ldr q1, [x2] > ldr q0, [x3] > add v0.16b, v0.16b, v1.16b > .p2align 3,,7 > .L2: > str q0, [x0], 16 > cmp x0, x1 > bne .L2 > ret > .cfi_endproc > .LFE0: > > > 2019-10-11 Joel Hutton joel.hut...@arm.com > > * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector > constructors. > (vect_bb_slp_scalar_cost): Likewise. > (vect_ssa_use_outside_bb): New function. > (vect_slp_check_for_constructors): New function. > (vect_slp_analyze_bb_1): Add check for vector constructors. > (vect_schedule_slp_instance): Add case to fixup vector constructor > stmt. > * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field. > > > gcc/testsuite/ChangeLog: > > 2019-10-11 Joel Hutton joel.hut...@arm.com > > * gcc.dg/vect/bb-slp-40.c: New test. > > bootstrapped and regression tested on aarch64-none-linux-gnu > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)