Richard! I attached vect dump for hte part of attached test-case which illustrated how vectorization of epilogues works through masking: #define SIZE 1023 #define ALIGN 64
extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment, __SIZE_TYPE__ size) __attribute__((weak)); extern void free (void *); void __attribute__((noinline)) test_citer (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c) { int i; a = (int *)__builtin_assume_aligned (a, ALIGN); b = (int *)__builtin_assume_aligned (b, ALIGN); c = (int *)__builtin_assume_aligned (c, ALIGN); for (i = 0; i < SIZE; i++) c[i] = a[i] + b[i]; } It was compiled with -mavx2 --param vect-epilogues-mask=1 options. I did not include in this patch vectorization of low trip-count loops since in the original patch additional parameter was introduced: +DEFPARAM (PARAM_VECT_SHORT_LOOPS, + "vect-short-loops", + "Enable vectorization of low trip count loops using masking.", + 0, 0, 1) I assume that this ability can be included very quickly but it requires cost model enhancements also. Best regards. Yuri. 2016-11-28 17:39 GMT+03:00 Richard Biener <rguent...@suse.de>: > On Thu, 24 Nov 2016, Yuri Rumyantsev wrote: > >> Hi All, >> >> Here is the second patch which supports epilogue vectorization using >> masking without cost model. Currently it is possible >> only with passing parameter "--param vect-epilogues-mask=1". >> >> Bootstrapping and regression testing did not show any new regression. >> >> Any comments will be appreciated. > > Going over the patch the main question is one how it works -- it looks > like the decision whether to vectorize & mask the epilogue is made > when vectorizing the loop that generates the epilogue rather than > in the epilogue vectorization path? > > That is, I'd have expected to see this handling low-trip count loops > by masking? And thus masking the epilogue simply by it being > low-trip count? > > Richard. > >> ChangeLog: >> 2016-11-24 Yuri Rumyantsev <ysrum...@gmail.com> >> >> * params.def (PARAM_VECT_EPILOGUES_MASK): New. >> * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var. >> * tree-vect-loop.c: Include insn-config.h, recog.h and alias.h. >> (new_loop_vec_info): Add zeroing can_be_masked, mask_loop and >> required_mask fields. >> (vect_check_required_masks_widening): New. >> (vect_check_required_masks_narrowing): New. >> (vect_get_masking_iv_elems): New. >> (vect_get_masking_iv_type): New. >> (vect_get_extreme_masks): New. >> (vect_check_required_masks): New. >> (vect_analyze_loop_operations): Call vect_check_required_masks if all >> statements can be masked. >> (vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound. >> Add check that epilogue can be masked with the same vf with issue >> fail notes. Allow epilogue vectorization through masking of low trip >> loops. Set to true can_be_masked field before loop operation analysis. >> Do not set-up min_scalar_loop_bound for epilogue vectorization through >> masking. Do not peeling for epilogue masking. Reset can_be_masked >> field before repeat analysis. >> (vect_estimate_min_profitable_iters): Do not compute profitability >> for epilogue masking. Set up mask_loop filed to true if parameter >> PARAM_VECT_EPILOGUES_MASK is non-zero. >> (vectorizable_reduction): Add check that statement can be masked. >> (vectorizable_induction): Do not support masking for induction. >> (vect_gen_ivs_for_masking): New. >> (vect_get_mask_index_for_elems): New. >> (vect_get_mask_index_for_type): New. >> (vect_create_narrowed_masks): New. >> (vect_create_widened_masks): New. >> (vect_gen_loop_masks): New. >> (vect_mask_reduction_stmt): New. >> (vect_mask_mask_load_store_stmt): New. >> (vect_mask_load_store_stmt): New. >> (vect_mask_loop): New. >> (vect_transform_loop): Invoke vect_mask_loop if required. >> Use div_ceil to recompute upper bounds for masked loops. Issue >> statistics for epilogue vectorization through masking. Do not reduce >> vf for masking epilogue. >> * tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h. >> (can_mask_load_store): New. >> (vectorizable_mask_load_store): Check that mask conjuction is >> supported. Set-up first_copy_p field of stmt_vinfo. >> (vectorizable_simd_clone_call): Check that simd clone can not be >> masked. >> (vectorizable_store): Check that store can be masked. Mark the first >> copy of generated vector stores and provide it with vectype and the >> original data reference. >> (vectorizable_load): Check that load can be masked. >> (vect_stmt_should_be_masked_for_epilogue): New. >> (vect_add_required_mask_for_stmt): New. >> (vect_analyze_stmt): Add check on unsupported statements for masking >> with printing message. >> * tree-vectorizer.h (struct _loop_vec_info): Add new fields >> can_be_maske, required_masks, masl_loop. >> (LOOP_VINFO_CAN_BE_MASKED): New. >> (LOOP_VINFO_REQUIRED_MASKS): New. >> (LOOP_VINFO_MASK_LOOP): New. >> (struct _stmt_vec_info): Add first_copy_p field. >> (STMT_VINFO_FIRST_COPY_P): New. >> >> gcc/testsuite/ >> >> * gcc.dg/vect/vect-tail-mask-1.c: New test. >> >> 2016-11-18 18:54 GMT+03:00 Christophe Lyon <christophe.l...@linaro.org>: >> > On 18 November 2016 at 16:46, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> >> It is very strange that this test failed on arm, since it requires >> >> target avx2 to check vectorizer dumps: >> >> >> >> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { >> >> target avx2_runtime } } } */ >> >> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED >> >> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ >> >> >> >> Could you please clarify what is the reason of the failure? >> > >> > It's not the scan-dumps that fail, but the execution. >> > The test calls abort() for some reason. >> > >> > It will take me a while to rebuild the test manually in the right >> > debug environment to provide you with more traces. >> > >> > >> > >> >> >> >> Thanks. >> >> >> >> 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.l...@linaro.org>: >> >>> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> >>>> Hi All, >> >>>> >> >>>> Here is patch for non-masked epilogue vectoriziation. >> >>>> >> >>>> Bootstrap and regression testing did not show any new failures. >> >>>> >> >>>> Is it OK for trunk? >> >>>> >> >>>> Thanks. >> >>>> Changelog: >> >>>> >> >>>> 2016-11-15 Yuri Rumyantsev <ysrum...@gmail.com> >> >>>> >> >>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >> >>>> * tree-if-conv.c (tree_if_conversion): Make public. >> >>>> * * tree-if-conv.h: New file. >> >>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >> >>>> dynamic alias checks for epilogues. >> >>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >> >>>> * tree-vect-loop.c: include tree-if-conv.h. >> >>>> (new_loop_vec_info): Add zeroing orig_loop_info field. >> >>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >> >>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >> >>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >> >>>> using passed argument. >> >>>> (vect_transform_loop): Check if created epilogue should be returned >> >>>> for further vectorization with less vf. If-convert epilogue if >> >>>> required. Print vectorization success for epilogue. >> >>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >> >>>> if it is required, pass loop_vinfo produced during vectorization of >> >>>> loop body to vect_analyze_loop. >> >>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field >> >>>> orig_loop_info. >> >>>> (LOOP_VINFO_ORIG_LOOP_INFO): New. >> >>>> (LOOP_VINFO_EPILOGUE_P): New. >> >>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >> >>>> (vect_do_peeling): Change prototype to return epilogue. >> >>>> (vect_analyze_loop): Add argument of loop_vec_info type. >> >>>> (vect_transform_loop): Return created loop. >> >>>> >> >>>> gcc/testsuite/ >> >>>> >> >>>> * lib/target-supports.exp (check_avx2_hw_available): New. >> >>>> (check_effective_target_avx2_runtime): New. >> >>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >> >>>> >> >>> >> >>> Hi, >> >>> >> >>> This new test fails on arm-none-eabi (using default cpu/fpu/mode): >> >>> gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test >> >>> gcc.dg/vect/vect-tail-nomask-1.c execution test >> >>> >> >>> It does pass on the same target if configured --with-cpu=cortex-a9. >> >>> >> >>> Christophe >> >>> >> >>> >> >>> >> >>>> >> >>>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguent...@suse.de>: >> >>>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >> >>>>> <ysrum...@gmail.com> wrote: >> >>>>>>Richard, >> >>>>>> >> >>>>>>I checked one of the tests designed for epilogue vectorization using >> >>>>>>patches 1 - 3 and found out that build compiler performs vectorization >> >>>>>>of epilogues with --param vect-epilogues-nomask=1 passed: >> >>>>>> >> >>>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >> >>>>>>t1.new-nomask.s -fdump-tree-vect-details >> >>>>>>$ grep VECTORIZED -c t1.c.156t.vect >> >>>>>>4 >> >>>>>> Without param only 2 loops are vectorized. >> >>>>>> >> >>>>>>Should I simply add a part of tests related to this feature or I must >> >>>>>>delete all not necessary changes also? >> >>>>> >> >>>>> Please remove all not necessary changes. >> >>>>> >> >>>>> Richard. >> >>>>> >> >>>>>>Thanks. >> >>>>>>Yuri. >> >>>>>> >> >>>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguent...@suse.de>: >> >>>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >> >>>>>>> >> >>>>>>>> Richard, >> >>>>>>>> >> >>>>>>>> In my previous patch I forgot to remove couple lines related to aux >> >>>>>>field. >> >>>>>>>> Here is the correct updated patch. >> >>>>>>> >> >>>>>>> Yeah, I noticed. This patch would be ok for trunk (together with >> >>>>>>> necessary parts from 1 and 2) if all not required parts are removed >> >>>>>>> (and you'd add the testcases covering non-masked tail vect). >> >>>>>>> >> >>>>>>> Thus, can you please produce a single complete patch containing only >> >>>>>>> non-masked epilogue vectoriziation? >> >>>>>>> >> >>>>>>> Thanks, >> >>>>>>> Richard. >> >>>>>>> >> >>>>>>>> Thanks. >> >>>>>>>> Yuri. >> >>>>>>>> >> >>>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguent...@suse.de>: >> >>>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >> >>>>>>>> > >> >>>>>>>> >> Richard, >> >>>>>>>> >> >> >>>>>>>> >> I prepare updated 3 patch with passing additional argument to >> >>>>>>>> >> vect_analyze_loop as you proposed (untested). >> >>>>>>>> >> >> >>>>>>>> >> You wrote: >> >>>>>>>> >> tw, I wonder if you can produce a single patch containing just >> >>>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out >> >>>>>>>> >> changes only needed by later patches? >> >>>>>>>> >> >> >>>>>>>> >> Did you mean that I exclude all support for vectorization >> >>>>>>epilogues, >> >>>>>>>> >> i.e. exclude from 2-nd patch all non-related changes >> >>>>>>>> >> like >> >>>>>>>> >> >> >>>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> >>>>>>>> >> index 11863af..32011c1 100644 >> >>>>>>>> >> --- a/gcc/tree-vect-loop.c >> >>>>>>>> >> +++ b/gcc/tree-vect-loop.c >> >>>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >> >>>>>>>> >> LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >> >>>>>>>> >> LOOP_VINFO_PEELING_FOR_NITER (res) = false; >> >>>>>>>> >> LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >> >>>>>>>> >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >> >>>>>>>> >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >> >>>>>>>> >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >> >>>>>>>> >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >> >>>>>>>> >> + LOOP_VINFO_NEED_MASKING (res) = false; >> >>>>>>>> >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; >> >>>>>>>> > >> >>>>>>>> > Yes. >> >>>>>>>> > >> >>>>>>>> >> Did you mean also that new combined patch must be working patch, >> >>>>>>i.e. >> >>>>>>>> >> can be integrated without other patches? >> >>>>>>>> > >> >>>>>>>> > Yes. >> >>>>>>>> > >> >>>>>>>> >> Could you please look at updated patch? >> >>>>>>>> > >> >>>>>>>> > Will do. >> >>>>>>>> > >> >>>>>>>> > Thanks, >> >>>>>>>> > Richard. >> >>>>>>>> > >> >>>>>>>> >> Thanks. >> >>>>>>>> >> Yuri. >> >>>>>>>> >> >> >>>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguent...@suse.de>: >> >>>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote: >> >>>>>>>> >> > >> >>>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >> >>>>>>>> >> >> >> >>>>>>>> >> >> > Richard, >> >>>>>>>> >> >> > >> >>>>>>>> >> >> > Here is updated 3 patch. >> >>>>>>>> >> >> > >> >>>>>>>> >> >> > I checked that all new tests related to epilogue >> >>>>>>vectorization passed with it. >> >>>>>>>> >> >> > >> >>>>>>>> >> >> > Your comments will be appreciated. >> >>>>>>>> >> >> >> >>>>>>>> >> >> A lot better now. Instead of the ->aux dance I now prefer to >> >>>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as >> >>>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that >> >>>>>>>> >> >> loop_vinfo). OTOH I remember we mainly use it to get at the >> >>>>>>>> >> >> original vectorization factor? So we can pass down an >> >>>>>>(optional) >> >>>>>>>> >> >> forced vectorization factor as well? >> >>>>>>>> >> > >> >>>>>>>> >> > Btw, I wonder if you can produce a single patch containing just >> >>>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out >> >>>>>>>> >> > changes only needed by later patches? >> >>>>>>>> >> > >> >>>>>>>> >> > Thanks, >> >>>>>>>> >> > Richard. >> >>>>>>>> >> > >> >>>>>>>> >> >> Richard. >> >>>>>>>> >> >> >> >>>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener >> >>>>>><rguent...@suse.de>: >> >>>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >> >>>>>>>> >> >> > > >> >>>>>>>> >> >> > >> Hi Richard, >> >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > >> I did not understand your last remark: >> >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >> >>>>>>>> >> >> > >> > && dump_enabled_p ()) >> >>>>>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >> >>>>>>vect_location, >> >>>>>>>> >> >> > >> > "loop vectorized\n"); >> >>>>>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >> >>>>>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >> >>>>>>>> >> >> > >> > num_vectorized_loops++; >> >>>>>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >> >>>>>>it to be unrolled >> >>>>>>>> >> >> > >> > etc. */ >> >>>>>>>> >> >> > >> > loop->force_vectorize = false; >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >> >>>>>>it easier >> >>>>>>>> >> >> > >> > + to match loop and its epilogue vectorization >> >>>>>>in dumps >> >>>>>>>> >> >> > >> > + put new loop as the next loop to process. >> >>>>>>*/ >> >>>>>>>> >> >> > >> > + if (new_loop) >> >>>>>>>> >> >> > >> > + { >> >>>>>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >> >>>>>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >> >>>>>>>> >> >> > >> > + } >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >> >>>>>>new_loop) >> >>>>>>>> >> >> > >> f> unction which will set up stuff properly (and also >> >>>>>>perform >> >>>>>>>> >> >> > >> > the if-conversion of the epilogue there). >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue >> >>>>>>vectorization >> >>>>>>>> >> >> > >> > separately that would be great. >> >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > >> Could you please clarify your proposal. >> >>>>>>>> >> >> > > >> >>>>>>>> >> >> > > When a loop was vectorized set things up to immediately >> >>>>>>vectorize >> >>>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and >> >>>>>>avoiding >> >>>>>>>> >> >> > > the re-use of ->aux. >> >>>>>>>> >> >> > > >> >>>>>>>> >> >> > > Richard. >> >>>>>>>> >> >> > > >> >>>>>>>> >> >> > >> Thanks. >> >>>>>>>> >> >> > >> Yuri. >> >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener >> >>>>>><rguent...@suse.de>: >> >>>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> >> Hi All, >> >>>>>>>> >> >> > >> >> >> >>>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review >> >>>>>>which support >> >>>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low >> >>>>>>trip count. We >> >>>>>>>> >> >> > >> >> assume that the only patch - >> >>>>>>vec-tails-07-combine-tail.patch - was not >> >>>>>>>> >> >> > >> >> approved by Jeff. >> >>>>>>>> >> >> > >> >> >> >>>>>>>> >> >> > >> >> I did re-base of all patches and performed >> >>>>>>bootstrapping and >> >>>>>>>> >> >> > >> >> regression testing that did not show any new failures. >> >>>>>>Also all >> >>>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have >> >>>>>>been changed >> >>>>>>>> >> >> > >> >> accordingly. >> >>>>>>>> >> >> > >> >> >> >>>>>>>> >> >> > >> >> Is it OK for trunk? >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > I would have prefered that the series up to >> >>>>>>-03-nomask-tails would >> >>>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but >> >>>>>>unfortunately >> >>>>>>>> >> >> > >> > the patchset is oddly separated. >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > I have a comment on that part nevertheless: >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment >> >>>>>>(loop_vec_info >> >>>>>>>> >> >> > >> > loop_vinfo) >> >>>>>>>> >> >> > >> > /* Check if we can possibly peel the loop. */ >> >>>>>>>> >> >> > >> > if (!vect_can_advance_ivs_p (loop_vinfo) >> >>>>>>>> >> >> > >> > || !slpeel_can_duplicate_loop_p (loop, >> >>>>>>single_exit (loop)) >> >>>>>>>> >> >> > >> > - || loop->inner) >> >>>>>>>> >> >> > >> > + || loop->inner >> >>>>>>>> >> >> > >> > + /* Required peeling was performed in prologue >> >>>>>>and >> >>>>>>>> >> >> > >> > + is not required for epilogue. */ >> >>>>>>>> >> >> > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) >> >>>>>>>> >> >> > >> > do_peeling = false; >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > if (do_peeling >> >>>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment >> >>>>>>(loop_vec_info >> >>>>>>>> >> >> > >> > loop_vinfo) >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > do_versioning = >> >>>>>>>> >> >> > >> > optimize_loop_nest_for_speed_p (loop) >> >>>>>>>> >> >> > >> > - && (!loop->inner); /* FORNOW */ >> >>>>>>>> >> >> > >> > + && (!loop->inner) /* FORNOW */ >> >>>>>>>> >> >> > >> > + /* Required versioning was performed for the >> >>>>>>>> >> >> > >> > + original loop and is not required for >> >>>>>>epilogue. */ >> >>>>>>>> >> >> > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > if (do_versioning) >> >>>>>>>> >> >> > >> > { >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > please do that check in the single caller of this >> >>>>>>function. >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I >> >>>>>>believe that simply >> >>>>>>>> >> >> > >> > passing down info from the processed parent would be >> >>>>>>_much_ cleaner. >> >>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >> >>>>>>>> >> >> > >> > && dump_enabled_p ()) >> >>>>>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >> >>>>>>vect_location, >> >>>>>>>> >> >> > >> > "loop vectorized\n"); >> >>>>>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >> >>>>>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >> >>>>>>>> >> >> > >> > num_vectorized_loops++; >> >>>>>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >> >>>>>>it to be unrolled >> >>>>>>>> >> >> > >> > etc. */ >> >>>>>>>> >> >> > >> > loop->force_vectorize = false; >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >> >>>>>>it easier >> >>>>>>>> >> >> > >> > + to match loop and its epilogue vectorization >> >>>>>>in dumps >> >>>>>>>> >> >> > >> > + put new loop as the next loop to process. >> >>>>>>*/ >> >>>>>>>> >> >> > >> > + if (new_loop) >> >>>>>>>> >> >> > >> > + { >> >>>>>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >> >>>>>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >> >>>>>>>> >> >> > >> > + } >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >> >>>>>>new_loop) >> >>>>>>>> >> >> > >> > function which will set up stuff properly (and also >> >>>>>>perform >> >>>>>>>> >> >> > >> > the if-conversion of the epilogue there). >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue >> >>>>>>vectorization >> >>>>>>>> >> >> > >> > separately that would be great. >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and >> >>>>>>question its >> >>>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main >> >>>>>>vector loop). >> >>>>>>>> >> >> > >> > But it has already been approved ... oh well. >> >>>>>>>> >> >> > >> > >> >>>>>>>> >> >> > >> > Thanks, >> >>>>>>>> >> >> > >> > Richard. >> >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > >> >> >>>>>>>> >> >> > > >> >>>>>>>> >> >> > > -- >> >>>>>>>> >> >> > > Richard Biener <rguent...@suse.de> >> >>>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, >> >>>>>>Graham Norton, HRB 21284 (AG Nuernberg) >> >>>>>>>> >> >> > >> >>>>>>>> >> >> >> >>>>>>>> >> >> >> >>>>>>>> >> > >> >>>>>>>> >> > -- >> >>>>>>>> >> > Richard Biener <rguent...@suse.de> >> >>>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >> >>>>>>Norton, HRB 21284 (AG Nuernberg) >> >>>>>>>> >> >> >>>>>>>> > >> >>>>>>>> > -- >> >>>>>>>> > Richard Biener <rguent...@suse.de> >> >>>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >> >>>>>>Norton, HRB 21284 (AG Nuernberg) >> >>>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> Richard Biener <rguent...@suse.de> >> >>>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >> >>>>>>Norton, HRB 21284 (AG Nuernberg) >> >>>>> >> >>>>> >> > > -- > Richard Biener <rguent...@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)
t11.c.156t.vect
Description: Binary data