On Thu, Jun 12, 2014 at 6:04 AM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> Testing finished. No new regressions.
> Is the following patch ok?

+  if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1 ||
+      !vect_shift_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain))

||s and &&s go to the next line.

I miss testcases that make sure the vectorizer/backend code-paths are both
exercised. Put them in gcc.target/i386 and provide an appropriate -march.

The vectorizer changes are ok with the above fixed, I defer to backend
maintainers for the i386 changes.

Richard.

> 2014-06-11 Evgeny Stupachenko <evstu...@gmail.com>
>
>     * config/i386/i386.c (ix86_reassociation_width): Add alternative for
>     vector case.
>     * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
>     * config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
>     * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
>     Introduces alternative way of loads group permutations.
>     (vect_transform_grouped_load): Try alternative way of permutations.
>
> Thanks,
> Evgeny
>
> On Tue, Jun 10, 2014 at 4:43 PM, Evgeny Stupachenko <evstu...@gmail.com>
> wrote:
>> ix86_reassociation_width checks INTEGRAL_MODE_P and FLOAT_MODE_P which
>> include vector mode.
>> I'll try to separate this into scalar and vector part, but it will
>> require more testing (under the testing now).
>> What about the rest of the patch?
>>
>> Thanks,
>> Evgeny
>>
>> On Thu, Jun 5, 2014 at 3:54 PM, Ramana Radhakrishnan
>> <ramana.radhakrish...@arm.com> wrote:
>>> On 06/05/14 12:43, Evgeny Stupachenko wrote:
>>>>
>>>> New hook is related to vector instructions only. Vector instructions
>>>> could be sequential in pipeline, but scalar - parallel. For x86
>>>> architectures TARGET_SCHED_REASSOC_WIDTH does not give required
>>>> differentiation.
>>>> General hooks could be potentially reused in other algorithms/by other
>>>> architectures.
>>>
>>>
>>> It already takes a "mode" argument. Couldn't you use a vector mode to work
>>> this out ?
>>>
>>> If it is not enough then please be more specific about the documentation of
>>> this hook about where it is useful so that it's easy for people reading the
>>> documentation to understand at a glance what purpose it serves.
>>>
>>>
>>> Ramana
>>>
>>>
>>>>
>>>> Thanks,
>>>> Evgeny
>>>>
>>>> On Thu, Jun 5, 2014 at 2:04 PM, Ramana Radhakrishnan
>>>> <ramana....@googlemail.com> wrote:
>>>>>
>>>>> On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko <evstu...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The patch introduces alternative way of permutations for load groups
>>>>>> of size 2 and 3 which should be faster on architectures with low
>>>>>> parallelism.
>>>>>> The patch gives 2 times gain on Silvermont to the test from PR52252
>>>>>> (in addition to already committed 3 times gain).
>>>>>>
>>>>>> Patch passes bootstrap on x86. Make check is in progress.
>>>>>
>>>>>
>>>>> Why do we need a new hook ? Can't you derive this information from
>>>>> something which is equally badly named TARGET_SCHED_REASSOC_WIDTH
>>>>> though used in the reassociation logic but also serves a similar
>>>>> purpose ?
>>>>>
>>>>> Also the documentation of this hook is incomplete at best and wrong at
>>>>> worst as this is not applied everywhere in the vectorizer but just for
>>>>> this special case for load store permuting. Implying this is useful
>>>>> everywhere in the vectorizer does not appear to be correct.
>>>>>
>>>>> regards
>>>>> Ramana
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2014-05-28 Evgeny Stupachenko <evstu...@gmail.com>
>>>>>>
>>>>>>     * config/i386/i386.c (ix86_have_vector_parallel_execution):
>>>>>>     New.
>>>>>>     (TARGET_VECTORIZE_HAVE_VECTOR_PARALLEL_EXECUTION): New.
>>>>>>     * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
>>>>>>     * config/i386/x86-tune.def
>>>>>>     (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
>>>>>>     * target.def (have_vector_parallel_execution): New.
>>>>>>     * doc/tm.texi.in (have_vector_parallel_execution): New.
>>>>>>     * doc/tm.texi: Regenerate.
>>>>>>     * targhooks.c (default_have_vector_parallel_execution): New.
>>>>>>     * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
>>>>>>     Introduces alternative way of loads group permutations.
>>>>>>     (vect_transform_grouped_load): Try alternative way of
>>>>>>     permutations.
>>>>>>
>>>>>> Evgeny
>>>>
>>>>
>>>
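
For readers following the style remark on the quoted condition ("||s and
&&s go to the next line"), here is a minimal, self-contained sketch of the
same test with the operator leading the continuation line. Everything in it
is a hypothetical stand-in and not GCC code: reassociation_width_stub,
shift_permute_stub and generic_permute_stub only mimic the shape of
targetm.sched.reassociation_width and vect_shift_permute_load_chain. The
fallback to a generic permutation in the body is an assumption, since the
thread shows only the condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical knobs standing in for target and vectorizer state.  */
static int stub_width = 1;          /* pretend reassociation width */
static bool stub_shift_ok = true;   /* pretend shift permutation succeeds */
static int generic_calls = 0;       /* counts generic fallbacks */

/* Stand-in for the target hook; not the real GCC interface.  */
static int
reassociation_width_stub (void)
{
  return stub_width;
}

/* Stand-in for the shift-based permutation; true means it handled the
   load group.  */
static bool
shift_permute_stub (void)
{
  return stub_shift_ok;
}

/* Stand-in for the generic permutation (assumed fallback).  */
static void
generic_permute_stub (void)
{
  generic_calls++;
}

/* Returns true when the generic fallback was used.  The "||" starts the
   continuation line instead of ending the previous one.  */
bool
transform_grouped_load_sketch (void)
{
  if (reassociation_width_stub () > 1
      || !shift_permute_stub ())
    {
      generic_permute_stub ();
      return true;
    }
  return false;
}

/* Exercises the three interesting combinations of the two inputs.  */
void
sketch_self_check (void)
{
  generic_calls = 0;

  stub_width = 1;
  stub_shift_ok = true;
  assert (!transform_grouped_load_sketch ());

  stub_width = 2;
  assert (transform_grouped_load_sketch ());

  stub_width = 1;
  stub_shift_ok = false;
  assert (transform_grouped_load_sketch ());

  assert (generic_calls == 2);
}
```

Because "||" short-circuits, the shift-based stub is only attempted when
the width is 1, which matches the intent stated in the thread that the
alternative permutation targets architectures with low parallelism.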