...so that we can use them for variable-length vectors.  For now, fixed-length vectors continue to use VEC_PERM_EXPR and the vec_perm_const optab even for cases that the new optabs could handle.
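As a sketch (mine, not part of the patch) of the element orderings the five new optabs define, here is a scalar model on plain arrays. The fixed element count in the test is purely for illustration; the point of the optabs is precisely that the element count need not be a compile-time constant. On SVE these map to the zip1/zip2, uzp1/uzp2 and rev instructions, per the aarch64 expanders below.

```c
#include <assert.h>

/* vec_interleave_lo: zip the lowest-indexed halves of a and b, so that
   element i of a is followed by element i of b.  n is the (even) number
   of elements per vector.  */
void interleave_lo (const int *a, const int *b, int *out, int n)
{
  for (int i = 0; i < n / 2; i++)
    {
      out[i * 2] = a[i];
      out[i * 2 + 1] = b[i];
    }
}

/* vec_interleave_hi: the same, but on the highest-indexed halves.  */
void interleave_hi (const int *a, const int *b, int *out, int n)
{
  for (int i = 0; i < n / 2; i++)
    {
      out[i * 2] = a[n / 2 + i];
      out[i * 2 + 1] = b[n / 2 + i];
    }
}

/* vec_extract_even (odd == 0) and vec_extract_odd (odd == 1):
   concatenate a and b and keep alternate elements.  */
void extract_alternate (const int *a, const int *b, int *out, int n, int odd)
{
  for (int i = 0; i < n; i++)
    {
      int src = i * 2 + odd;	/* index into the 2n-element concatenation */
      out[i] = src < n ? a[src] : b[src - n];
    }
}

/* vec_reverse: element i of the result is element n-1-i of the input.  */
void reverse_elements (const int *a, int *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[n - 1 - i];
}
```

A pair of interleave_lo/interleave_hi calls is one level of the interleave tree that vect_permute_store_chain builds for power-of-2 group sizes; extract_alternate is the corresponding level of the extract tree used on the load side.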
The vector optabs are inconsistent about whether there should be an underscore before the mode part of the name, but the other lo/hi optabs have one.

Doing this means that we're able to optimise some SLP tests using non-SLP (for now) on targets with variable-length vectors, so the patch needs to add a few XFAILs.  Most of these go away with later patches.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?

Richard


2017-11-09  Richard Sandiford  <richard.sandif...@linaro.org>
	    Alan Hayward  <alan.hayw...@arm.com>
	    David Sherwood  <david.sherw...@arm.com>

gcc/
	* doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
	(vec_extract_even, vec_extract_odd): Document new optabs.
	* internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
	(VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
	functions.
	* optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
	(vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
	New optabs.
	* tree-vect-data-refs.c: Include internal-fn.h.
	(vect_grouped_store_supported): Try using IFN_VEC_INTERLEAVE_{LO,HI}.
	(vect_permute_store_chain): Use them here too.
	(vect_grouped_load_supported): Try using IFN_VEC_EXTRACT_{EVEN,ODD}.
	(vect_permute_load_chain): Use them here too.
	* tree-vect-stmts.c (can_reverse_vector_p): New function.
	(get_negative_load_store_type): Use it.
	(reverse_vector): New function.
	(vectorizable_store, vectorizable_load): Use it.
	* config/aarch64/iterators.md (perm_optab): New iterator.
	* config/aarch64/aarch64-sve.md (<perm_optab>_<mode>): New expander.
	(vec_reverse_<mode>): Likewise.

gcc/testsuite/
	* gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
	* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
	* gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
	* gcc.dg/vect/pr68445.c: Likewise.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-13-big-array.c: Likewise.
	* gcc.dg/vect/slp-13.c: Likewise.
	* gcc.dg/vect/slp-14.c: Likewise.
	* gcc.dg/vect/slp-15.c: Likewise.
	* gcc.dg/vect/slp-42.c: Likewise.
	* gcc.dg/vect/slp-multitypes-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-4.c: Likewise.
	* gcc.dg/vect/slp-multitypes-5.c: Likewise.
	* gcc.dg/vect/slp-reduc-4.c: Likewise.
	* gcc.dg/vect/slp-reduc-7.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_2.c: New test.
	* gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_3.c: New test.
	* gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_4.c: New test.
	* gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.

Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-09 13:21:01.989917982 +0000
+++ gcc/doc/md.texi	2017-11-09 13:21:02.323463345 +0000
@@ -5017,6 +5017,46 @@ There is no need for a target to supply
 and @samp{vec_perm_const@var{m}} if the former can trivially implement
 the operation with, say, the vector constant loaded into a register.
 
+@cindex @code{vec_reverse_@var{m}} instruction pattern
+@item @samp{vec_reverse_@var{m}}
+Reverse the order of the elements in vector input operand 1 and store
+the result in vector output operand 0.  Both operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length vectors.
+Targets with fixed-length vectors can instead handle any reverse-specific
+optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_interleave_lo_@var{m}} instruction pattern
+@item @samp{vec_interleave_lo_@var{m}}
+Take the lowest-indexed halves of vector input operands 1 and 2 and
+interleave the elements, so that element @var{x} of operand 1 is followed by
+element @var{x} of operand 2.  Store the result in vector output operand 0.
+All three operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length
+vectors.  Targets with fixed-length vectors can instead handle any
+interleave-specific optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_interleave_hi_@var{m}} instruction pattern
+@item @samp{vec_interleave_hi_@var{m}}
+Like @samp{vec_interleave_lo_@var{m}}, but operate on the highest-indexed
+halves instead of the lowest-indexed halves.
+
+@cindex @code{vec_extract_even_@var{m}} instruction pattern
+@item @samp{vec_extract_even_@var{m}}
+Concatenate vector input operands 1 and 2, extract the elements with
+even-numbered indices, and store the result in vector output operand 0.
+All three operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length vectors.
+Targets with fixed-length vectors can instead handle any
+extract-specific optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_extract_odd_@var{m}} instruction pattern
+@item @samp{vec_extract_odd_@var{m}}
+Like @samp{vec_extract_even_@var{m}}, but extract the elements with
+odd-numbered indices.
+
 @cindex @code{push@var{m}1} instruction pattern
 @item @samp{push@var{m}1}
 Output a push instruction.  Operand 0 is value to push.  Used only when

Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2017-11-09 13:21:01.989917982 +0000
+++ gcc/internal-fn.def	2017-11-09 13:21:02.323463345 +0000
@@ -102,6 +102,17 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
		       vec_mask_store_lanes, mask_store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (VEC_INTERLEAVE_LO, ECF_CONST | ECF_NOTHROW,
+		       vec_interleave_lo, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_INTERLEAVE_HI, ECF_CONST | ECF_NOTHROW,
+		       vec_interleave_hi, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT_EVEN, ECF_CONST | ECF_NOTHROW,
+		       vec_extract_even, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT_ODD, ECF_CONST | ECF_NOTHROW,
+		       vec_extract_odd, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_REVERSE, ECF_CONST | ECF_NOTHROW,
+		       vec_reverse, unary)
+
 DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
 
 /* Unary math functions.
   */

Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-09 13:21:01.989917982 +0000
+++ gcc/optabs.def	2017-11-09 13:21:02.323463345 +0000
@@ -309,6 +309,11 @@ OPTAB_D (vec_perm_optab, "vec_perm$a")
 OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
 OPTAB_D (vec_set_optab, "vec_set$a")
 OPTAB_D (vec_shr_optab, "vec_shr_$a")
+OPTAB_D (vec_interleave_lo_optab, "vec_interleave_lo_$a")
+OPTAB_D (vec_interleave_hi_optab, "vec_interleave_hi_$a")
+OPTAB_D (vec_extract_even_optab, "vec_extract_even_$a")
+OPTAB_D (vec_extract_odd_optab, "vec_extract_odd_$a")
+OPTAB_D (vec_reverse_optab, "vec_reverse_$a")
 OPTAB_D (vec_unpacks_float_hi_optab, "vec_unpacks_float_hi_$a")
 OPTAB_D (vec_unpacks_float_lo_optab, "vec_unpacks_float_lo_$a")
 OPTAB_D (vec_unpacks_hi_optab, "vec_unpacks_hi_$a")

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/tree-vect-data-refs.c	2017-11-09 13:21:02.326167766 +0000
@@ -52,6 +52,7 @@ Software Foundation; either version 3, o
 #include "params.h"
 #include "tree-cfg.h"
 #include "tree-hash-traits.h"
+#include "internal-fn.h"
 
 /* Return true if load- or store-lanes optab OPTAB is implemented for
    COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
@@ -4636,7 +4637,16 @@ vect_grouped_store_supported (tree vecty
       return false;
     }
 
-  /* Check that the permutation is supported.  */
+  /* Powers of 2 use a tree of interleaving operations.  See whether the
+     target supports them directly.  */
+  if (count != 3
+      && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_LO, vectype,
+					 OPTIMIZE_FOR_SPEED)
+      && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_HI, vectype,
+					 OPTIMIZE_FOR_SPEED))
+    return true;
+
+  /* Otherwise check for support in the form of general permutations.
*/ unsigned int nelt; if (VECTOR_MODE_P (mode) && GET_MODE_NUNITS (mode).is_constant (&nelt)) { @@ -4881,50 +4891,78 @@ vect_permute_store_chain (vec<tree> dr_c /* If length is not equal to 3 then only power of 2 is supported. */ gcc_assert (pow2p_hwi (length)); - /* vect_grouped_store_supported ensures that this is constant. */ - unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); - auto_vec_perm_indices sel (nelt); - sel.quick_grow (nelt); - for (i = 0, n = nelt / 2; i < n; i++) + if (direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_LO, vectype, + OPTIMIZE_FOR_SPEED) + && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_HI, vectype, + OPTIMIZE_FOR_SPEED)) + { + /* We could support the case where only one of the optabs is + implemented, but that seems unlikely. */ + perm_mask_low = NULL_TREE; + perm_mask_high = NULL_TREE; + } + else { - sel[i * 2] = i; - sel[i * 2 + 1] = i + nelt; + /* vect_grouped_store_supported ensures that this is constant. */ + unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); + auto_vec_perm_indices sel (nelt); + sel.quick_grow (nelt); + for (i = 0, n = nelt / 2; i < n; i++) + { + sel[i * 2] = i; + sel[i * 2 + 1] = i + nelt; + } + perm_mask_low = vect_gen_perm_mask_checked (vectype, sel); + + for (i = 0; i < nelt; i++) + sel[i] += nelt / 2; + perm_mask_high = vect_gen_perm_mask_checked (vectype, sel); } - perm_mask_high = vect_gen_perm_mask_checked (vectype, sel); - for (i = 0; i < nelt; i++) - sel[i] += nelt / 2; - perm_mask_low = vect_gen_perm_mask_checked (vectype, sel); + for (i = 0, n = log_length; i < n; i++) + { + for (j = 0; j < length / 2; j++) + { + vect1 = dr_chain[j]; + vect2 = dr_chain[j + length / 2]; - for (i = 0, n = log_length; i < n; i++) - { - for (j = 0; j < length/2; j++) - { - vect1 = dr_chain[j]; - vect2 = dr_chain[j+length/2]; + /* Create interleaving stmt: + low = VEC_PERM_EXPR <vect1, vect2, + {0, nelt, 1, nelt + 1, ...}> */ + low = make_temp_ssa_name (vectype, NULL,
"vect_inter_low"); + if (perm_mask_low) + perm_stmt = gimple_build_assign (low, VEC_PERM_EXPR, vect1, + vect2, perm_mask_low); + else + { + perm_stmt = gimple_build_call_internal + (IFN_VEC_INTERLEAVE_LO, 2, vect1, vect2); + gimple_set_lhs (perm_stmt, low); + } + vect_finish_stmt_generation (stmt, perm_stmt, gsi); + (*result_chain)[2 * j] = low; - /* Create interleaving stmt: - high = VEC_PERM_EXPR <vect1, vect2, {0, nelt, 1, nelt+1, - ...}> */ - high = make_temp_ssa_name (vectype, NULL, "vect_inter_high"); + /* Create interleaving stmt: + high = VEC_PERM_EXPR <vect1, vect2, + {nelt / 2, nelt * 3 / 2, + nelt / 2 + 1, nelt * 3 / 2 + 1, + ...}> */ + high = make_temp_ssa_name (vectype, NULL, "vect_inter_high"); + if (perm_mask_high) perm_stmt = gimple_build_assign (high, VEC_PERM_EXPR, vect1, vect2, perm_mask_high); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - (*result_chain)[2*j] = high; - - /* Create interleaving stmt: - low = VEC_PERM_EXPR <vect1, vect2, - {nelt/2, nelt*3/2, nelt/2+1, nelt*3/2+1, - ...}> */ - low = make_temp_ssa_name (vectype, NULL, "vect_inter_low"); - perm_stmt = gimple_build_assign (low, VEC_PERM_EXPR, vect1, - vect2, perm_mask_low); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - (*result_chain)[2*j+1] = low; - } - memcpy (dr_chain.address (), result_chain->address (), - length * sizeof (tree)); - } + else + { + perm_stmt = gimple_build_call_internal + (IFN_VEC_INTERLEAVE_HI, 2, vect1, vect2); + gimple_set_lhs (perm_stmt, high); + } + vect_finish_stmt_generation (stmt, perm_stmt, gsi); + (*result_chain)[2 * j + 1] = high; + } + memcpy (dr_chain.address (), result_chain->address (), + length * sizeof (tree)); + } } } @@ -5235,7 +5273,16 @@ vect_grouped_load_supported (tree vectyp return false; } - /* Check that the permutation is supported. */ + /* Powers of 2 use a tree of extract operations. See whether the + target supports them directly. 
*/ + if (count != 3 + && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_EVEN, vectype, + OPTIMIZE_FOR_SPEED) + && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_ODD, vectype, + OPTIMIZE_FOR_SPEED)) + return true; + + /* Otherwise check for support in the form of general permutations. */ unsigned int nelt; if (VECTOR_MODE_P (mode) && GET_MODE_NUNITS (mode).is_constant (&nelt)) { @@ -5464,17 +5511,30 @@ vect_permute_load_chain (vec<tree> dr_ch /* If length is not equal to 3 then only power of 2 is supported. */ gcc_assert (pow2p_hwi (length)); - /* vect_grouped_load_supported ensures that this is constant. */ - unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); - auto_vec_perm_indices sel (nelt); - sel.quick_grow (nelt); - for (i = 0; i < nelt; ++i) - sel[i] = i * 2; - perm_mask_even = vect_gen_perm_mask_checked (vectype, sel); - - for (i = 0; i < nelt; ++i) - sel[i] = i * 2 + 1; - perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel); + if (direct_internal_fn_supported_p (IFN_VEC_EXTRACT_EVEN, vectype, + OPTIMIZE_FOR_SPEED) + && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_ODD, vectype, + OPTIMIZE_FOR_SPEED)) + { + /* We could support the case where only one of the optabs is + implemented, but that seems unlikely. */ + perm_mask_even = NULL_TREE; + perm_mask_odd = NULL_TREE; + } + else + { + /* vect_grouped_load_supported ensures that this is constant. 
*/ + unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); + auto_vec_perm_indices sel (nelt); + sel.quick_grow (nelt); + for (i = 0; i < nelt; ++i) + sel[i] = i * 2; + perm_mask_even = vect_gen_perm_mask_checked (vectype, sel); + + for (i = 0; i < nelt; ++i) + sel[i] = i * 2 + 1; + perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel); + } for (i = 0; i < log_length; i++) { @@ -5485,19 +5545,33 @@ vect_permute_load_chain (vec<tree> dr_ch /* data_ref = permute_even (first_data_ref, second_data_ref); */ data_ref = make_temp_ssa_name (vectype, NULL, "vect_perm_even"); - perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR, - first_vect, second_vect, - perm_mask_even); + if (perm_mask_even) + perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR, + first_vect, second_vect, + perm_mask_even); + else + { + perm_stmt = gimple_build_call_internal + (IFN_VEC_EXTRACT_EVEN, 2, first_vect, second_vect); + gimple_set_lhs (perm_stmt, data_ref); + } vect_finish_stmt_generation (stmt, perm_stmt, gsi); - (*result_chain)[j/2] = data_ref; + (*result_chain)[j / 2] = data_ref; /* data_ref = permute_odd (first_data_ref, second_data_ref); */ data_ref = make_temp_ssa_name (vectype, NULL, "vect_perm_odd"); - perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR, - first_vect, second_vect, - perm_mask_odd); + if (perm_mask_odd) + perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR, + first_vect, second_vect, + perm_mask_odd); + else + { + perm_stmt = gimple_build_call_internal + (IFN_VEC_EXTRACT_ODD, 2, first_vect, second_vect); + gimple_set_lhs (perm_stmt, data_ref); + } vect_finish_stmt_generation (stmt, perm_stmt, gsi); - (*result_chain)[j/2+length/2] = data_ref; + (*result_chain)[j / 2 + length / 2] = data_ref; } memcpy (dr_chain.address (), result_chain->address (), length * sizeof (tree)); Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c 2017-11-09 13:21:01.989917982 +0000 +++ 
gcc/tree-vect-stmts.c 2017-11-09 13:21:02.327069240 +0000 @@ -1796,6 +1796,46 @@ perm_mask_for_reverse (tree vectype) return vect_gen_perm_mask_checked (vectype, sel); } +/* Return true if the target can reverse the elements in a vector of + type VECTOR_TYPE. */ + +static bool +can_reverse_vector_p (tree vector_type) +{ + return (direct_internal_fn_supported_p (IFN_VEC_REVERSE, vector_type, + OPTIMIZE_FOR_SPEED) + || perm_mask_for_reverse (vector_type)); +} + +/* Generate a statement to reverse the elements in vector INPUT and + return the SSA name that holds the result. GSI is a statement iterator + pointing to STMT, which is the scalar statement we're vectorizing. + VEC_DEST is the destination variable with which new SSA names + should be associated. */ + +static tree +reverse_vector (tree vec_dest, tree input, gimple *stmt, + gimple_stmt_iterator *gsi) +{ + tree new_temp = make_ssa_name (vec_dest); + tree vector_type = TREE_TYPE (input); + gimple *perm_stmt; + if (direct_internal_fn_supported_p (IFN_VEC_REVERSE, vector_type, + OPTIMIZE_FOR_SPEED)) + { + perm_stmt = gimple_build_call_internal (IFN_VEC_REVERSE, 1, input); + gimple_set_lhs (perm_stmt, new_temp); + } + else + { + tree perm_mask = perm_mask_for_reverse (vector_type); + perm_stmt = gimple_build_assign (new_temp, VEC_PERM_EXPR, + input, input, perm_mask); + } + vect_finish_stmt_generation (stmt, perm_stmt, gsi); + return new_temp; +} + /* A subroutine of get_load_store_type, with a subset of the same arguments. Handle the case where STMT is part of a grouped load or store. 
@@ -1999,7 +2039,7 @@ get_negative_load_store_type (gimple *st return VMAT_CONTIGUOUS_DOWN; } - if (!perm_mask_for_reverse (vectype)) + if (!can_reverse_vector_p (vectype)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -6760,20 +6800,10 @@ vectorizable_store (gimple *stmt, gimple if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) { - tree perm_mask = perm_mask_for_reverse (vectype); tree perm_dest = vect_create_destination_var (gimple_assign_rhs1 (stmt), vectype); - tree new_temp = make_ssa_name (perm_dest); - - /* Generate the permute statement. */ - gimple *perm_stmt - = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd, - vec_oprnd, perm_mask); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - - perm_stmt = SSA_NAME_DEF_STMT (new_temp); - vec_oprnd = new_temp; + vec_oprnd = reverse_vector (perm_dest, vec_oprnd, stmt, gsi); } /* Arguments are ready. Create the new vector stmt. */ @@ -7998,9 +8028,7 @@ vectorizable_load (gimple *stmt, gimple_ if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) { - tree perm_mask = perm_mask_for_reverse (vectype); - new_temp = permute_vec_elements (new_temp, new_temp, - perm_mask, stmt, gsi); + new_temp = reverse_vector (vec_dest, new_temp, stmt, gsi); new_stmt = SSA_NAME_DEF_STMT (new_temp); } Index: gcc/config/aarch64/iterators.md =================================================================== --- gcc/config/aarch64/iterators.md 2017-11-09 13:21:01.989917982 +0000 +++ gcc/config/aarch64/iterators.md 2017-11-09 13:21:02.322561871 +0000 @@ -1556,6 +1556,11 @@ (define_int_attr pauth_hint_num_a [(UNSP (UNSPEC_PACI1716 "8") (UNSPEC_AUTI1716 "12")]) +(define_int_attr perm_optab [(UNSPEC_ZIP1 "vec_interleave_lo") + (UNSPEC_ZIP2 "vec_interleave_hi") + (UNSPEC_UZP1 "vec_extract_even") + (UNSPEC_UZP2 "vec_extract_odd")]) + (define_int_attr perm_insn [(UNSPEC_ZIP1 "zip") (UNSPEC_ZIP2 "zip") (UNSPEC_TRN1 "trn") (UNSPEC_TRN2 "trn") (UNSPEC_UZP1 "uzp") (UNSPEC_UZP2 "uzp")]) Index: 
gcc/config/aarch64/aarch64-sve.md =================================================================== --- gcc/config/aarch64/aarch64-sve.md 2017-11-09 13:21:01.989917982 +0000 +++ gcc/config/aarch64/aarch64-sve.md 2017-11-09 13:21:02.320758923 +0000 @@ -630,6 +630,19 @@ (define_expand "vec_perm<mode>" } ) +(define_expand "<perm_optab>_<mode>" + [(set (match_operand:SVE_ALL 0 "register_operand") + (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand") + (match_operand:SVE_ALL 2 "register_operand")] + OPTAB_PERMUTE))] + "TARGET_SVE && !GET_MODE_NUNITS (<MODE>mode).is_constant ()") + +(define_expand "vec_reverse_<mode>" + [(set (match_operand:SVE_ALL 0 "register_operand") + (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand")] + UNSPEC_REV))] + "TARGET_SVE && !GET_MODE_NUNITS (<MODE>mode).is_constant ()") + (define_insn "*aarch64_sve_tbl<mode>" [(set (match_operand:SVE_ALL 0 "register_operand" "=w") (unspec:SVE_ALL Index: gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c =================================================================== --- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c 2017-11-09 13:21:02.323463345 +0000 @@ -51,7 +51,4 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ -/* Requires reverse for variable-length SVE, which is implemented for - by a later patch. Until then we report it twice, once for SVE and - once for 128-bit Advanced SIMD. 
*/ -/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */ +/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c =================================================================== --- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c 2017-11-09 13:21:02.323463345 +0000 @@ -183,7 +183,4 @@ int main () } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ -/* f4 requires reverse for SVE, which is implemented by a later patch. - Until then we report it twice, once for SVE and once for 128-bit - Advanced SIMD. */ -/* { dg-final { scan-tree-dump-times "dependence distance negative" 4 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */ +/* { dg-final { scan-tree-dump-times "dependence distance negative" 4 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/pr33953.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr33953.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/pr33953.c 2017-11-09 13:21:02.323463345 +0000 @@ -29,6 +29,6 @@ void blockmove_NtoN_blend_noremap32 (con } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { { vect_no_align && { ! 
vect_hw_misalign } } || vect_variable_length } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr68445.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr68445.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/pr68445.c 2017-11-09 13:21:02.323463345 +0000 @@ -16,4 +16,4 @@ void IMB_double_fast_x (int *destf, int } } -/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */ +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { xfail vect_variable_length } } } */ Index: gcc/testsuite/gcc.dg/vect/slp-12a.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-12a.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-12a.c 2017-11-09 13:21:02.323463345 +0000 @@ -75,5 +75,5 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */ Index: gcc/testsuite/gcc.dg/vect/slp-13-big-array.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-13-big-array.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-13-big-array.c 2017-11-09 13:21:02.324364818 +0000 @@ -134,4 +134,4 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! 
vect_pack_trunc } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ Index: gcc/testsuite/gcc.dg/vect/slp-13.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-13.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-13.c 2017-11-09 13:21:02.324364818 +0000 @@ -128,4 +128,4 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! 
vect_pack_trunc } } } } */ /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */ Index: gcc/testsuite/gcc.dg/vect/slp-14.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-14.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-14.c 2017-11-09 13:21:02.324364818 +0000 @@ -111,5 +111,5 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_int_mult } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult xfail vect_variable_length } } } */ Index: gcc/testsuite/gcc.dg/vect/slp-15.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-15.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-15.c 2017-11-09 13:21:02.324364818 +0000 @@ -112,6 +112,6 @@ int main (void) /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {target vect_int_mult } } } */ /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" {target { ! { vect_int_mult } } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target vect_int_mult } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target { ! 
{ vect_int_mult } } } } } */ Index: gcc/testsuite/gcc.dg/vect/slp-42.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-42.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-42.c 2017-11-09 13:21:02.324364818 +0000 @@ -15,5 +15,5 @@ void foo (int n) } } -/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */ +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { xfail vect_variable_length } } } */ /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c 2017-11-09 13:21:02.324364818 +0000 @@ -77,5 +77,5 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_variable_length } } } */ Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c 2017-11-09 13:21:01.989917982 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c 2017-11-09 13:21:02.324364818 +0000 @@ -52,5 +52,5 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_unpack } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_unpack } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_unpack xfail vect_variable_length } } } */ Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c =================================================================== --- 
gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c	2017-11-09 13:21:02.324364818 +0000
@@ -52,5 +52,5 @@ int main (void)
 }
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-4.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-4.c	2017-11-09 13:21:02.325266292 +0000
@@ -57,5 +57,5 @@ int main (void)
 }
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_min_max } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_min_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_min_max || vect_variable_length } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-7.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-7.c	2017-11-09 13:21:02.325266292 +0000
@@ -55,5 +55,5 @@ int main (void)
 }
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_add } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_add } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_add || vect_variable_length } } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)						\
+TYPE __attribute__ ((noinline, noclone))			\
+vec_reverse_##TYPE (TYPE *restrict a, TYPE *restrict b, int n)	\
+{								\
+  for (int i = 0; i < n; ++i)					\
+    a[i] = b[n - i - 1];					\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,29 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_2.c"
+
+#define N 153
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[N];						\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	b[i] = i * 2 + i % 5;					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_reverse_##TYPE (a, b, N);				\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	TYPE expected = (N - i - 1) * 2 + (N - i - 1) % 5;	\
+	if (a[i] != expected)					\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)					\
+TYPE __attribute__ ((noinline, noclone))		\
+vec_zip_##TYPE (TYPE *restrict a, TYPE *restrict b,	\
+		TYPE *restrict c, long n)		\
+{							\
+  for (long i = 0; i < n; ++i)				\
+    {							\
+      a[i * 8] = c[i * 4];				\
+      a[i * 8 + 1] = b[i * 4];				\
+      a[i * 8 + 2] = c[i * 4 + 1];			\
+      a[i * 8 + 3] = b[i * 4 + 1];			\
+      a[i * 8 + 4] = c[i * 4 + 2];			\
+      a[i * 8 + 5] = b[i * 4 + 2];			\
+      a[i * 8 + 6] = c[i * 4 + 3];			\
+      a[i * 8 + 7] = b[i * 4 + 3];			\
+    }							\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* Currently we can't use SLP for groups bigger than 128 bits.  */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 36 { xfail *-*-* } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,31 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_3.c"
+
+#define N (43 * 8)
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[N], c[N];					\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	b[i] = i * 2 + i % 5;					\
+	c[i] = i * 3;						\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_zip_##TYPE (a, b, c, N / 8);				\
+    for (unsigned int i = 0; i < N / 2; ++i)			\
+      {								\
+	TYPE expected1 = i * 3;					\
+	TYPE expected2 = i * 2 + i % 5;				\
+	if (a[i * 2] != expected1 || a[i * 2 + 1] != expected2)	\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)					\
+TYPE __attribute__ ((noinline, noclone))		\
+vec_uzp_##TYPE (TYPE *restrict a, TYPE *restrict b,	\
+		TYPE *restrict c, long n)		\
+{							\
+  for (long i = 0; i < n; ++i)				\
+    {							\
+      a[i * 4] = c[i * 8];				\
+      b[i * 4] = c[i * 8 + 1];				\
+      a[i * 4 + 1] = c[i * 8 + 2];			\
+      b[i * 4 + 1] = c[i * 8 + 3];			\
+      a[i * 4 + 2] = c[i * 8 + 4];			\
+      b[i * 4 + 2] = c[i * 8 + 5];			\
+      a[i * 4 + 3] = c[i * 8 + 6];			\
+      b[i * 4 + 3] = c[i * 8 + 7];			\
+    }							\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* We could use a single uzp1 and uzp2 per function by implementing
+   SLP load permutation for variable width.  XFAIL until then.  */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 { xfail *-*-* } } } */
+/* Delete these if the tests above start passing instead.  */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,29 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_4.c"
+
+#define N (43 * 8)
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[N], c[N];					\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	c[i] = i * 2 + i % 5;					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_uzp_##TYPE (a, b, c, N / 8);				\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	TYPE expected = i * 2 + i % 5;				\
+	if ((i & 1 ? b[i / 2] : a[i / 2]) != expected)		\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}