On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:

> Hi,
>
> The patch gives an expected 3 times gain for the test case in the PR52252
> (and even 6 times for AVX2).
> It passes make check and bootstrap on x86.
> spec2000/spec2006 got no regressions/gains on x86.
>
> Is this patch ok?

I've worked on generalizing the permutation support in light of the
availability of the generic shuffle support in the IL, but hit some
roadblocks in the way code generation works for group loads with
permutations (I don't remember if I posted all patches).

This patch seems to touch a slightly different place, but it again
special-cases a specific permutation. Why's that? Why can't we
support groups of size 7, for example? So - can this be generalized
to support arbitrary non-power-of-two load/store groups?

Other than that, the patch has to wait for stage1 to open again, of
course. And it is missing a testcase.

Btw, do you have a copyright assignment on file with the FSF covering
work on GCC?

Thanks,
Richard.

> ChangeLog:
>
> 2014-02-11  Evgeny Stupachenko  <evstu...@gmail.com>
>
>     * target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
>     * tree-vect-data-refs.c (vect_grouped_store_supported): New
>     check for stores group of length 3.
>     (vect_permute_store_chain): New permutations for stores group of
>     length 3.
>     (vect_grouped_load_supported): New check for loads group of length 3.
>     (vect_permute_load_chain): New permutations for loads group of
>     length 3.
>     * tree-vect-stmts.c (vect_model_store_cost): New cost
>     vec_perm_shuffle for the new permutations.
>     (vect_model_load_cost): Ditto.
>     * config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
>     vec_perm_shuffle cost as equivalent of vec_perm cost.
>     * config/arm/arm.c: Ditto.
>     * config/rs6000/rs6000.c: Ditto.
>     * config/spu/spu.c: Ditto.
>     * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
>     byte shuffle on some x86 architectures.
>     * config/i386/i386.h (processor_costs): Defining pshuffb cost.
>     * config/i386/i386.c (processor_costs): Adding pshuffb cost.
>     (ix86_builtin_vectorization_cost): Adding cost for the new
>     permutations.
>     Fixing cost for other permutations.
>     (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>     slow (TARGET_SLOW_PHUFFB).
>     (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>     Adding new shuffle cost only when byte shuffle is expected.
>     Fixing cost model for Silvermont.
>
> Thanks,
> Evgeny
>

--
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer