Richard Sandiford <richard.sandif...@arm.com> writes: > I'll try to have a patch ready tomorrow morning European time.
Well, I totally failed to hit that deadline. When testing on Power, I saw a couple of extra failures, but I now think they're improvements rather than regressions. See the point about single-iteration epilogues below. --- This patch fixes the fallout that Kewen reported on Power after the recent change to avoid unnecessary use of partial vectors. As Kewen said, the problem is that vect_analyze_loop_2 doesn't know how many epilogue iterations there will be, and so it cannot make a final decision about whether the number of iterations forces an epilogue loop to use partial vectors. This is similar to the current situation for peeling: we don't know during initial analysis whether an epilogue loop will itself require peeling. Instead we decide that during vect_do_peeling, where the final number of epilogue loop iterations is known. The patch takes a similar approach for the decision about whether to use partial vectors. As the comments in the patch say, the idea is that vect_analyze_loop_2 should make peeling and partial- vector decisions based on the assumption that the loop_vinfo will be used as the main loop, while vect_do_peeling should make them in the knowledge that the loop_vinfo will be used as an epilogue loop. This allows the same analysis to be used for both cases, which we rely on for implementing VECT_COMPARE_COSTS; see the big comment in vect_analyze_loop for details. I hope the patch makes the (mostly preexisting) structure a bit more obvious. It isn't what anyone would design from scratch, but that's the nature of working with a mature vector framework. Arranging things this way means that vect_verify_full_masking and vect_verify_loop_lens now become part of the “can” rather than “will” test for partial vectors. Also, while splitting out the logic that handles epilogues with constant iterations, I added a check to make sure that we don't try to use partial vectors to vectorise a single-scalar loop. This required some changes to the Power tests. Tested on aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard gcc/ * tree-vectorizer.h (determine_peel_for_niter): Delete in favor of... (vect_determine_partial_vectors_and_peeling): ...this new function. * tree-vect-loop-manip.c (vect_update_epilogue_niters): New function. Reject using vector epilogue loops for single iterations. Install the constant number of epilogue loop iterations in the associated loop_vinfo. Rely on vect_determine_partial_vectors_and_peeling to do the main part of the test. (vect_do_peeling): Use vect_update_epilogue_niters to handle epilogue loops with a known number of iterations. Skip recomputing the number of iterations later in that case. Otherwise, use vect_determine_partial_vectors_and_peeling to decide whether the epilogue loop needs to use partial vectors or peeling. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the default can_use_partial_vectors_p to false if partial-vector-usage=0. (determine_peel_for_niter): Remove in favor of... (vect_determine_partial_vectors_and_peeling): ...this new function, split out from... (vect_analyze_loop_2): ...here. Reflect the vect_verify_full_masking and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P rather than USING_PARTIAL_VECTORS_P. gcc/testsuite/ * gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the single-iteration epilogues of the 64-bit loops to be vectorized. * gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise. * gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise. --- .../gcc.target/powerpc/p9-vec-length-epil-1.c | 4 +- .../gcc.target/powerpc/p9-vec-length-epil-7.c | 2 +- .../gcc.target/powerpc/p9-vec-length-epil-8.c | 4 +- gcc/tree-vect-loop-manip.c | 83 +++++--- gcc/tree-vect-loop.c | 196 ++++++++++++------ gcc/tree-vectorizer.h | 3 +- 6 files changed, 192 insertions(+), 100 deletions(-) diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 9dffc5570e5..b7fa6bc8d2f 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -1967,7 +1967,8 @@ extern tree vect_create_addr_base_for_vector_ref (vec_info *, extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info loop_vinfo); bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *); /* Used in tree-vect-loop-manip.c */ -extern void determine_peel_for_niter (loop_vec_info); +extern opt_result vect_determine_partial_vectors_and_peeling (loop_vec_info, + bool); /* Used in gimple-loop-interchange.c and tree-parloops.c. */ extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree, enum tree_code); diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 47cfa6f4061..7cf00e6eed4 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2386,6 +2386,34 @@ slpeel_update_phi_nodes_for_lcssa (class loop *epilog) rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e)); } +/* EPILOGUE_VINFO is an epilogue loop that we now know would need to + iterate exactly CONST_NITERS times. Make a final decision about + whether the epilogue loop should be used, returning true if so. */ + +static bool +vect_update_epilogue_niters (loop_vec_info epilogue_vinfo, + unsigned HOST_WIDE_INT const_niters) +{ + /* Avoid wrap-around when computing const_niters - 1. Also reject + using an epilogue loop for a single scalar iteration, even if + we could in principle implement that using partial vectors. */ + unsigned int gap_niters = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo); + if (const_niters <= gap_niters + 1) + return false; + + /* Install the number of iterations. */ + tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (epilogue_vinfo)); + tree niters_tree = build_int_cst (niters_type, const_niters); + tree nitersm1_tree = build_int_cst (niters_type, const_niters - 1); + + LOOP_VINFO_NITERS (epilogue_vinfo) = niters_tree; + LOOP_VINFO_NITERSM1 (epilogue_vinfo) = nitersm1_tree; + + /* Decide what to do if the number of epilogue iterations is not + a multiple of the epilogue loop's vectorization factor. */ + return vect_determine_partial_vectors_and_peeling (epilogue_vinfo, true); +} + /* Function vect_do_peeling. Input: @@ -2493,6 +2521,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, int estimated_vf; int prolog_peeling = 0; bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0; + bool vect_epilogues_updated_niters = false; /* We currently do not support prolog peeling if the target alignment is not known at compile time. 'vect_gen_prolog_loop_niters' depends on the target alignment being constant. */ @@ -2601,8 +2630,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, if (vect_epilogues && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && prolog_peeling >= 0 - && known_eq (vf, lowest_vf) - && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (epilogue_vinfo)) + && known_eq (vf, lowest_vf)) { unsigned HOST_WIDE_INT eiters = (LOOP_VINFO_INT_NITERS (loop_vinfo) @@ -2612,13 +2640,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, eiters = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo); - unsigned int ratio; - unsigned int epilogue_gaps - = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo); - while (!(constant_multiple_p - (GET_MODE_SIZE (loop_vinfo->vector_mode), - GET_MODE_SIZE (epilogue_vinfo->vector_mode), &ratio) - && eiters >= lowest_vf / ratio + epilogue_gaps)) + while (!vect_update_epilogue_niters (epilogue_vinfo, eiters)) { delete epilogue_vinfo; epilogue_vinfo = NULL; @@ -2629,8 +2651,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, } epilogue_vinfo = loop_vinfo->epilogue_vinfos[0]; loop_vinfo->epilogue_vinfos.ordered_remove (0); - epilogue_gaps = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo); } + vect_epilogues_updated_niters = true; } /* Prolog loop may be skipped. */ bool skip_prolog = (prolog_peeling != 0); @@ -2928,7 +2950,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, skip_e edge. */ if (skip_vector) { - gcc_assert (update_e != NULL && skip_e != NULL); + gcc_assert (update_e != NULL + && skip_e != NULL + && !vect_epilogues_updated_niters); gphi *new_phi = create_phi_node (make_ssa_name (TREE_TYPE (niters)), update_e->dest); tree new_ssa = make_ssa_name (TREE_TYPE (niters)); @@ -2953,25 +2977,32 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, niters = PHI_RESULT (new_phi); } - /* Subtract the number of iterations performed by the vectorized loop - from the number of total iterations. */ - tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters), - before_loop_niters, - niters); - - LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters; - LOOP_VINFO_NITERSM1 (epilogue_vinfo) - = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters), - epilogue_niters, - build_one_cst (TREE_TYPE (epilogue_niters))); - /* Set ADVANCE to the number of iterations performed by the previous loop and its prologue. */ *advance = niters; - /* Redo the peeling for niter analysis as the NITERs and alignment - may have been updated to take the main loop into account. */ - determine_peel_for_niter (epilogue_vinfo); + if (!vect_epilogues_updated_niters) + { + /* Subtract the number of iterations performed by the vectorized loop + from the number of total iterations. */ + tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters), + before_loop_niters, + niters); + + LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters; + LOOP_VINFO_NITERSM1 (epilogue_vinfo) + = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters), + epilogue_niters, + build_one_cst (TREE_TYPE (epilogue_niters))); + + /* Decide what to do if the number of epilogue iterations is not + a multiple of the epilogue loop's vectorization factor. + We should have rejected the loop during the analysis phase + if this fails. */ + if (!vect_determine_partial_vectors_and_peeling (epilogue_vinfo, + true)) + gcc_unreachable (); + } } adjust_vec.release (); diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index b1a6e1508c7..bb9d87fd200 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -814,7 +814,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) vec_outside_cost (0), vec_inside_cost (0), vectorizable (false), - can_use_partial_vectors_p (true), + can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), using_partial_vectors_p (false), epil_using_partial_vectors_p (false), peeling_for_gaps (false), @@ -2003,22 +2003,123 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo) } } +/* Determine if operating on full vectors for LOOP_VINFO might leave + some scalar iterations still to do. If so, decide how we should + handle those scalar iterations. The possibilities are: -/* Decides whether we need to create an epilogue loop to handle - remaining scalar iterations and sets PEELING_FOR_NITERS accordingly. */ + (1) Make LOOP_VINFO operate on partial vectors instead of full vectors. + In this case: -void -determine_peel_for_niter (loop_vec_info loop_vinfo) + LOOP_VINFO_USING_PARTIAL_VECTORS_P == true + LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false + LOOP_VINFO_PEELING_FOR_NITER == false + + (2) Make LOOP_VINFO operate on full vectors and use an epilogue loop + to handle the remaining scalar iterations. In this case: + + LOOP_VINFO_USING_PARTIAL_VECTORS_P == false + LOOP_VINFO_PEELING_FOR_NITER == true + + There are two choices: + + (2a) Consider vectorizing the epilogue loop at the same VF as the + main loop, but using partial vectors instead of full vectors. + In this case: + + LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == true + + (2b) Consider vectorizing the epilogue loop at lower VFs only. + In this case: + + LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P == false + + When FOR_EPILOGUE_P is true, make this determination based on the + assumption that LOOP_VINFO is an epilogue loop, otherwise make it + based on the assumption that LOOP_VINFO is the main loop. The caller + has made sure that the number of iterations is set appropriately for + this value of FOR_EPILOGUE_P. */ + +opt_result +vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo, + bool for_epilogue_p) { - LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false; + /* Determine whether there would be any scalar iterations left over. */ + bool need_peeling_or_partial_vectors_p + = vect_need_peeling_or_partial_vectors_p (loop_vinfo); - if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) - /* The main loop handles all iterations. */ - LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false; - else if (vect_need_peeling_or_partial_vectors_p (loop_vinfo)) - LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true; -} + /* Decide whether to vectorize the loop with partial vectors. */ + LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false; + LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = false; + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) + && need_peeling_or_partial_vectors_p) + { + /* For partial-vector-usage=1, try to push the handling of partial + vectors to the epilogue, with the main loop continuing to operate + on full vectors. + + ??? We could then end up failing to use partial vectors if we + decide to peel iterations into a prologue, and if the main loop + then ends up processing fewer than VF iterations. */ + if (param_vect_partial_vector_usage == 1 + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo) + && !vect_known_niters_smaller_than_vf (loop_vinfo)) + LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true; + else + LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true; + } + + if (dump_enabled_p ()) + { + if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) + dump_printf_loc (MSG_NOTE, vect_location, + "operating on partial vectors%s.\n", + for_epilogue_p ? " for epilogue loop" : ""); + else + dump_printf_loc (MSG_NOTE, vect_location, + "operating only on full vectors%s.\n", + for_epilogue_p ? " for epilogue loop" : ""); + } + if (for_epilogue_p) + { + loop_vec_info orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo); + gcc_assert (orig_loop_vinfo); + if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) + gcc_assert (known_lt (LOOP_VINFO_VECT_FACTOR (loop_vinfo), + LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo))); + } + + if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) + && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) + { + /* Check that the loop processes at least one full vector. */ + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); + tree scalar_niters = LOOP_VINFO_NITERS (loop_vinfo); + if (known_lt (wi::to_widest (scalar_niters), vf)) + return opt_result::failure_at (vect_location, + "loop does not have enough iterations" + " to support vectorization.\n"); + + /* If we need to peel an extra epilogue iteration to handle data + accesses with gaps, check that there are enough scalar iterations + available. + + The check above is redundant with this one when peeling for gaps, + but the distinction is useful for diagnostics. */ + tree scalar_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo); + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) + && known_lt (wi::to_widest (scalar_nitersm1), vf)) + return opt_result::failure_at (vect_location, + "loop does not have enough iterations" + " to support peeling for gaps.\n"); + } + + LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) + = (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) + && need_peeling_or_partial_vectors_p); + + return opt_result::success (); +} /* Function vect_analyze_loop_2. @@ -2272,72 +2373,32 @@ start_over: LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; } - /* Decide whether to vectorize a loop with partial vectors for - this vectorization factor. */ - if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) - { - /* Don't use partial vectors if we don't need to peel the loop. */ - if (param_vect_partial_vector_usage == 0 - || !vect_need_peeling_or_partial_vectors_p (loop_vinfo)) - LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false; - else if (vect_verify_full_masking (loop_vinfo) - || vect_verify_loop_lens (loop_vinfo)) - { - /* The epilogue and other known niters less than VF - cases can still use vector access with length fully. */ - if (param_vect_partial_vector_usage == 1 - && !LOOP_VINFO_EPILOGUE_P (loop_vinfo) - && !vect_known_niters_smaller_than_vf (loop_vinfo)) - { - LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false; - LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true; - } - else - LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true; - } - else - LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false; - } - else - LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false; - - if (dump_enabled_p ()) - { - if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) - dump_printf_loc (MSG_NOTE, vect_location, - "operating on partial vectors.\n"); - else - dump_printf_loc (MSG_NOTE, vect_location, - "operating only on full vectors.\n"); - } - - /* If epilog loop is required because of data accesses with gaps, - one additional iteration needs to be peeled. Check if there is - enough iterations for vectorization. */ - if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) - && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) - { - poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); - tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo); - - if (known_lt (wi::to_widest (scalar_niters), vf)) - return opt_result::failure_at (vect_location, - "loop has no enough iterations to" - " support peeling for gaps.\n"); - } + /* If we still have the option of using partial vectors, + check whether we can generate the necessary loop controls. */ + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) + && !vect_verify_full_masking (loop_vinfo) + && !vect_verify_loop_lens (loop_vinfo)) + LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; /* If we're vectorizing an epilogue loop, the vectorized loop either needs to be able to handle fewer than VF scalars, or needs to have a lower VF than the main loop. */ if (LOOP_VINFO_EPILOGUE_P (loop_vinfo) - && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) + && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo), LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo))) return opt_result::failure_at (vect_location, "Vectorization factor too high for" " epilogue loop.\n"); + /* Decide whether this loop_vinfo should use partial vectors or peeling, + assuming that the loop will be used as a main loop. We will redo + this analysis later if we instead decide to use the loop as an + epilogue loop. */ + ok = vect_determine_partial_vectors_and_peeling (loop_vinfo, false); + if (!ok) + return ok; + /* Check the costings of the loop make vectorizing worthwhile. */ res = vect_analyze_loop_costing (loop_vinfo); if (res < 0) @@ -2350,7 +2411,6 @@ start_over: return opt_result::failure_at (vect_location, "Loop costings not worthwhile.\n"); - determine_peel_for_niter (loop_vinfo); /* If an epilogue loop is required make sure we can create one. */ if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c index ebb2f45c917..d248f091b52 100644 --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c @@ -10,6 +10,6 @@ /* { dg-final { scan-assembler-times {\mlxvx?\M} 20 } } */ /* { dg-final { scan-assembler-times {\mstxvx?\M} 10 } } */ -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */ -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */ +/* { dg-final { scan-assembler-times {\mlxvl\M} 14 } } */ +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c index 9d403287923..a27ee347ca1 100644 --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c @@ -8,4 +8,4 @@ #include "p9-vec-length-7.h" -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */ +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c index 6b54a29efaa..961df0d5646 100644 --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c @@ -8,5 +8,5 @@ #include "p9-vec-length-8.h" -/* { dg-final { scan-assembler-times {\mlxvl\M} 30 } } */ -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */ +/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */ +/* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */