On Wed, Oct 9, 2013 at 7:11 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> This is a follow-up to the recent patch that fixed constant permute
> control vectors for little endian.  When the control vector is constant,
> we can adjust the constant and use a vperm without increasing code size.
> When the control vector is unknown, however, we have to generate two
> additional instructions to subtract each element of the control vector
> from 31 (equivalently, from -1, since only 5 bits are pertinent).  This
> patch adds the additional code generation.
>
> There are two main paths to the affected permutes:  via the known
> pattern vec_perm<mode>, and via an altivec builtin.  The builtin path
> causes a little difficulty because there's no way to dispatch a builtin
> to two different insns for BE and LE.  I solved this by adding two new
> unspecs for the builtins (UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X).  The
> insns for the builtins are changed from a define_insn to a
> define_insn_and_split.  We create the _X forms at expand time and later
> split them into the correct sequences for BE and LE, using the "real"
> UNSPEC_VPERM and UNSPEC_VPERM_UNS to generate the vperm instruction.
>
> For the path via the known pattern, I added a new routine in rs6000.c in
> similar fashion to the solution for the constant control vector case.
>
> When the permute control vector is a rotate vector loaded by lvsl or
> lvsr, we can generate the desired control vector more cheaply by simply
> changing to use the opposite instruction.  We are already doing that
> when expanding an unaligned load.  The changes in vector.md avoid
> undoing that effort by circumventing the subtract-from-splat (going
> straight to the UNSPEC_VPERM).
>
> I bootstrapped and tested this for big endian on
> powerpc64-unknown-linux-gnu with no new regressions.  I did the same for
> little endian on powerpc64le-unknown-linux-gnu.  Here the results were
> slightly mixed:  the changes fix 32 test failures, but expose an
> unrelated bug in 9 others when -mvsx is permitted on LE (not currently
> allowed).  The bug is a missing permute for a vector load in the
> unaligned vector load logic that will be fixed in a subsequent patch.
>
> Is this okay for trunk?
>
> Thanks,
> Bill
>
>
> 2013-10-09  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>       * config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm
>       directly to circumvent subtract from splat{31} workaround.
>       * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
>       prototype.
>       * config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
>       * config/rs6000/altivec.md (define_c_enum "unspec"): Add
>       UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
>       (altivec_vperm_<mode>): Convert to define_insn_and_split to
>       separate big and little endian logic.
>       (*altivec_vperm_<mode>_internal): New define_insn.
>       (altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
>       separate big and little endian logic.
>       (*altivec_vperm_<mode>_uns_internal): New define_insn.
>       (vec_permv16qi): Add little endian logic.

Okay.

Thanks, David
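
For readers following the thread, the control-vector adjustment in the quoted message (subtract each element from 31, or equivalently from -1 since only 5 bits matter) can be checked with a small scalar model in C. This is only a sketch and not code from the patch: every function name below is hypothetical, the vectors are modelled as plain 16-byte arrays, and the model also swaps the two data operands before the permute, which the message does not spell out and is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of vperm in big-endian element numbering: each result
   byte selects one of the 32 bytes of x:y using only the low 5 bits
   of the matching control byte.  */
static void vperm_be(uint8_t d[16], const uint8_t x[16],
                     const uint8_t y[16], const uint8_t c[16])
{
    uint8_t tmp[16];
    for (int j = 0; j < 16; j++) {
        unsigned sel = c[j] & 31u;
        tmp[j] = sel < 16 ? x[sel] : y[sel - 16];
    }
    memcpy(d, tmp, 16);
}

/* Reference result for a little-endian permute, with every array
   indexed in little-endian element order.  */
static void perm_le_reference(uint8_t d[16], const uint8_t a[16],
                              const uint8_t b[16], const uint8_t c[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned sel = c[i] & 31u;
        d[i] = sel < 16 ? a[sel] : b[sel - 16];
    }
}

/* LE element i sits in BE byte position 15 - i of the register.  */
static void reverse16(uint8_t out[16], const uint8_t in[16])
{
    for (int j = 0; j < 16; j++)
        out[j] = in[15 - j];
}

/* Modelled LE sequence: subtract the control bytes from 31, then
   permute with the data operands swapped (assumption of this sketch).  */
static void perm_le_via_vperm(uint8_t d[16], const uint8_t a[16],
                              const uint8_t b[16], const uint8_t c[16])
{
    uint8_t ra[16], rb[16], rc[16], rd[16];
    reverse16(ra, a);
    reverse16(rb, b);
    reverse16(rc, c);
    for (int j = 0; j < 16; j++)
        rc[j] = (uint8_t)(31 - rc[j]);
    vperm_be(rd, rb, ra, rc);
    reverse16(d, rd);
}

/* Returns 1 if "31 - c" and "-1 - c" (that is, ~c) agree in the low
   5 bits for every possible control byte.  */
static int complement_matches_subtract(void)
{
    for (unsigned c = 0; c < 256; c++)
        if (((31u - c) & 31u) != (~c & 31u))
            return 0;
    return 1;
}

/* Returns 1 if the modelled sequence matches the reference for a
   range of control vectors, including bytes above 31.  */
static int le_sequence_matches_reference(void)
{
    uint8_t a[16], b[16], c[16], want[16], got[16];
    for (int i = 0; i < 16; i++) {
        a[i] = (uint8_t)(0xA0 + i);
        b[i] = (uint8_t)(0xB0 + i);
    }
    for (unsigned seed = 0; seed < 256; seed++) {
        for (int i = 0; i < 16; i++)
            c[i] = (uint8_t)(seed * 7 + i * 13);
        perm_le_reference(want, a, b, c);
        perm_le_via_vperm(got, a, b, c);
        if (memcmp(want, got, 16) != 0)
            return 0;
    }
    return 1;
}
```

Calling complement_matches_subtract() and le_sequence_matches_reference() from a test driver and asserting that both return 1 exercises the model. It says nothing about the RTL the patch actually emits; it only illustrates why a 5-bit complement of the control vector is enough.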