Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

Richard Biener Tue, 11 Feb 2014 05:02:35 -0800

On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:

> Hi,
> 
> The patch gives the expected 3x speedup on the test case in PR52252
> (and 6x with AVX2).
> It passes make check and bootstrap on x86.
> SPEC2000/SPEC2006 show no regressions or gains on x86.
> 
> Is this patch ok?


I've worked on generalizing the permutation support in light of
the generic shuffle support now available in the IL, but hit some
road-blocks in the way code generation works for group loads with
permutations (I don't remember if I posted all the patches).

This patch seems to head in a slightly different direction, but it
again special-cases a specific permutation.  Why's that?  Why can't
we support groups of size 7, for example?  So, can this be generalized
to support arbitrary non-power-of-two load/store groups?
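The question about arbitrary group sizes can be made concrete with a small scalar model. The sketch below assumes only a two-input shuffle with VEC_PERM_EXPR-style semantics (the selector indexes the concatenation of both inputs, as GCC's `__builtin_shuffle (a, b, mask)` does in GNU C) and shows that field J of a load group of any size G >= 2 can be extracted from G contiguous vectors with G-1 such shuffles. `VLEN`, `vec_t`, `vec_perm` and `extract_field` are hypothetical names for illustration, not GCC internals, and this naive chain is not the scheme the patch implements.

```c
#include <assert.h>

#define VLEN 8 /* hypothetical number of lanes per vector */

/* Scalar model of one SIMD register.  */
typedef struct { int lane[VLEN]; } vec_t;

/* Two-input permute modelled on VEC_PERM_EXPR: selector entries
   0..VLEN-1 pick from A, VLEN..2*VLEN-1 pick from B.  */
static vec_t
vec_perm (vec_t a, vec_t b, const int sel[VLEN])
{
  vec_t r;
  for (int i = 0; i < VLEN; i++)
    {
      assert (sel[i] >= 0 && sel[i] < 2 * VLEN);
      r.lane[i] = sel[i] < VLEN ? a.lane[sel[i]] : b.lane[sel[i] - VLEN];
    }
  return r;
}

/* De-interleave field J of a load group of size G held in the G
   contiguous vectors IN[0..G-1].  Lane I of the result comes from
   position J + G*I of the concatenated inputs.  Uses G-1 permutes.  */
static vec_t
extract_field (const vec_t *in, int g, int j)
{
  assert (g >= 2 && j >= 0 && j < g);
  int sel[VLEN];

  /* Step 1: fill every lane whose source lies in IN[0] or IN[1];
     other lanes get a placeholder that a later step overwrites.  */
  for (int i = 0; i < VLEN; i++)
    {
      int src = j + g * i;
      sel[i] = src < 2 * VLEN ? src : i;
    }
  vec_t t = vec_perm (in[0], in[1], sel);

  /* Step S: keep lanes already filled, pull in lanes sourced from IN[S].  */
  for (int s = 2; s < g; s++)
    {
      for (int i = 0; i < VLEN; i++)
        {
          int src = j + g * i;
          if (src >= s * VLEN && src < (s + 1) * VLEN)
            sel[i] = VLEN + (src - s * VLEN);
          else
            sel[i] = i; /* keep lane I of T */
        }
      t = vec_perm (t, in[s], sel);
    }
  return t;
}
```

Because J + G*I increases with I and never exceeds G*VLEN-1, each lane is filled exactly once by the step that owns its source vector, so the same code handles G = 3 and G = 7 alike. The naive chain costs G-1 shuffles per field and G*(G-1) for the whole group (six for G = 3), which is one reason the per-target permute costs matter when deciding whether vectorizing such a group pays off.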

Other than that, the patch has to wait for stage 1 to open again,
of course.  It is also missing a testcase.

Btw, do you have a copyright assignment on file with the FSF covering
work on GCC?

Thanks,
Richard.

> ChangeLog:
> 
> 2014-02-11  Evgeny Stupachenko  <evstu...@gmail.com>
> 
>         * target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
>         * tree-vect-data-refs.c (vect_grouped_store_supported): New
>         check for stores group of length 3.
>         (vect_permute_store_chain): New permutations for stores group of
>         length 3.
>         (vect_grouped_load_supported): New check for loads group of length 3.
>         (vect_permute_load_chain): New permutations for loads group of length 3.
>         * tree-vect-stmts.c (vect_model_store_cost): New cost vec_perm_shuffle
>         for the new permutations.
>         (vect_model_load_cost): Ditto.
>         * config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
>         vec_perm_shuffle cost as equivalent of vec_perm cost.
>         * config/arm/arm.c: Ditto.
>         * config/rs6000/rs6000.c: Ditto.
>         * config/spu/spu.c: Ditto.
>         * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow byte
>         shuffle on some x86 architectures.
>         * config/i386/i386.h (processor_costs): Defining pshuffb cost.
>         * config/i386/i386.c (processor_costs): Adding pshuffb cost.
>         (ix86_builtin_vectorization_cost): Adding cost for the new permutations.
>         Fixing cost for other permutations.
>         (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>         slow (TARGET_SLOW_PHUFFB).
>         (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>         Adding new shuffle cost only when byte shuffle is expected.
>         Fixing cost model for Silvermont.
> 
> Thanks,
> Evgeny
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
