On Fri, Jan 23, 2015 at 12:23 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> The POWER8 processor greatly improves performance of unaligned vector
> loads and stores. Except for certain corner cases the compiler can't
> readily track, an unaligned vector load or store performs equivalently
> to an aligned one.
>
> To exploit this in the auto-vectorizer requires two changes. The simple
> one is to change the cost model to reflect the cheaper cost for POWER8
> versus previous processors. Additionally, we want to avoid generating
> the masked-load sequence (load/load/lvsl/vperm) used to force alignment
> of unaligned loads. Of course we must be careful to still use this
> sequence if -mno-vsx is selected for POWER8 for some reason.
>
> (Note that POWER7 supported unaligned vector memory references, but for
> best performance we have chosen to use the masked-load sequence for that
> processor. This is no longer optimal for POWER8.)
>
> The code changes in the rs6000 back end are simple enough, but
> unfortunately there is quite a bit of test case fallout. There are two
> predicates in target-supports.exp that need adjustment:
> * vect_no_align, which returns 1 iff the target plus current options
> does not support a vector alignment mechanism; and
> * vect_hw_misalign, which returns 1 iff the target supports a
> misaligned vector access.
>
> Unlike previous processors, for POWER8+VSX we want both of these
> predicates to return 1. In the former case, it isn't that we don't
> support a vector alignment mechanism, but under no circumstances do we
> want to use it when we have a misaligned vector access instruction that
> performs well.
>
> As a result of these changes, many loops will now auto-vectorize on P8
> that would not on P7. Unfortunately, this causes many tests to fail,
> even with the changes to target-supports.exp.
> The primary reason is
> that the tests are testing vect_no_align in many places, where the
> correct thing to test is vect_no_align && !vect_hw_misalign. That is,
> the test condition should fire not just when there isn't a vector
> alignment mechanism, but only when the target also doesn't support a
> direct misaligned vector memory access.
>
> The reason this "shortcut" has worked up till now is that the set of
> targets for which vect_no_align and vect_hw_misalign both return 1 has
> been empty. Thus vect_no_align has been functionally equivalent to
> vect_no_align && !vect_hw_misalign. Happily, this means it is also safe
> to make that substitution in the failing tests without affecting other
> targets.
>
> So, this patch contains three parts:
> * Changes to rs6000.c and vector.md for the vectorization support;
> * Changes to target-supports.exp to reflect POWER8's characteristics;
> and
> * Numerous changes to fix test cases to make them pass/fail correctly
> for POWER8.
>
> I've tested this on POWER8 BE, POWER8 LE, and POWER7 BE, with no
> regressions. A handful of existing POWER8 failures are also corrected
> as a happy side effect.
>
> Since we're in stage 4, I obviously need to hold off till the next
> release, but pending that, will this be ok for trunk? After it burns in
> I would like to backport it to 4.8, 4.9, and 5.
>
> Thanks!
>
> Bill
>
>
> [gcc]
>
> 2015-01-23 Bill Schmidt <wschm...@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.c (rs6000_builtin_mask_for_load): Return 0
> for POWER8 so that the vectorizer will use direct unaligned loads.
> (rs6000_builtin_support_vector_misalignment): Always return true
> for VSX + POWER8.
> (rs6000_builtin_vectorization_cost): Cost of unaligned loads and
> stores on VSX + POWER8 is almost always the same as the cost of an
> aligned load or store, so model it that way.

The processor test should be centralized in option_override instead of
explicitly testing POWER8. We don't want to have to hunt for and update
multiple, explicit processor tuning references if a potential future
processor with the same characteristics is supported.

> * config/rs6000/vector.md (movmisalign<mode>): Misaligned loads
> and stores are always permissible for VSX + POWER8.

Why test POWER8 separately in this pattern instead of setting
TARGET_ALLOW_MOVMISALIGN for POWER8, if the user did not explicitly
invoke the option?

>
> [gcc/testsuite]
>
> 2015-01-23 Bill Schmidt <wschm...@linux.vnet.ibm.com>
>
> * gcc.dg/vect/bb-slp-24.c: Exclude test for POWER8.
> * gcc.dg/vect/bb-slp-25.c: Likewise.
> * gcc.dg/vect/bb-slp-29.c: Likewise.
> * gcc.dg/vect/bb-slp-32.c: Replace vect_no_align with
> vect_no_align && { ! vect_hw_misalign }.
> * gcc.dg/vect/bb-slp-9.c: Likewise.
> * gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Exclude test for
> vect_hw_misalign.
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Likewise.
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust tests to
> account for POWER8, where peeling for alignment is not needed.
> * gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Replace
> vect_no_align with vect_no_align && { ! vect_hw_misalign }.
> * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Likewise.
> * gcc.dg/vect/no-scevccp-outer-6-global.c: Likewise.
> * gcc.dg/vect/no-scevccp-outer-6.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-43.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-57.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-61.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-depend-1.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-depend-2.c: Likewise.
> * gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
> * gcc.dg/vect/pr16105.c: Likewise.
> * gcc.dg/vect/pr20122.c: Likewise.
> * gcc.dg/vect/pr33804.c: Likewise.
> * gcc.dg/vect/pr33953.c: Likewise.
> * gcc.dg/vect/pr56787.c: Likewise.
> * gcc.dg/vect/pr58508.c: Likewise.
> * gcc.dg/vect/slp-25.c: Likewise.
> * gcc.dg/vect/vect-105-bit-array.c: Likewise.
> * gcc.dg/vect/vect-105.c: Likewise.
> * gcc.dg/vect/vect-27.c: Likewise.
> * gcc.dg/vect/vect-29.c: Likewise.
> * gcc.dg/vect/vect-33.c: Exclude unaligned access test for
> POWER8.
> * gcc.dg/vect/vect-42.c: Replace vect_no_align with vect_no_align
> && { ! vect_hw_misalign }.
> * gcc.dg/vect/vect-44.c: Likewise.
> * gcc.dg/vect/vect-48.c: Likewise.
> * gcc.dg/vect/vect-50.c: Likewise.
> * gcc.dg/vect/vect-52.c: Likewise.
> * gcc.dg/vect/vect-56.c: Likewise.
> * gcc.dg/vect/vect-60.c: Likewise.
> * gcc.dg/vect/vect-72.c: Likewise.
> * gcc.dg/vect/vect-75-big-array.c: Likewise.
> * gcc.dg/vect/vect-75.c: Likewise.
> * gcc.dg/vect/vect-77-alignchecks.c: Likewise.
> * gcc.dg/vect/vect-77-global.c: Likewise.
> * gcc.dg/vect/vect-78-alignchecks.c: Likewise.
> * gcc.dg/vect/vect-78-global.c: Likewise.
> * gcc.dg/vect/vect-93.c: Likewise.
> * gcc.dg/vect/vect-95.c: Likewise.
> * gcc.dg/vect/vect-96.c: Likewise.
> * gcc.dg/vect/vect-cond-1.c: Likewise.
> * gcc.dg/vect/vect-cond-3.c: Likewise.
> * gcc.dg/vect/vect-cond-4.c: Likewise.
> * gcc.dg/vect/vect-cselim-1.c: Likewise.
> * gcc.dg/vect/vect-multitypes-1.c: Likewise.
> * gcc.dg/vect/vect-multitypes-3.c: Likewise.
> * gcc.dg/vect/vect-multitypes-4.c: Likewise.
> * gcc.dg/vect/vect-multitypes-6.c: Likewise.
> * gcc.dg/vect/vect-nest-cycle-1.c: Likewise.
> * gcc.dg/vect/vect-nest-cycle-2.c: Likewise.
> * gcc.dg/vect/vect-outer-3a-big-array.c: Likewise.
> * gcc.dg/vect/vect-outer-3a.c: Likewise.
> * gcc.dg/vect/vect-outer-5.c: Likewise.
> * gcc.dg/vect/vect-outer-fir-big-array.c: Likewise.
> * gcc.dg/vect/vect-outer-fir-lb-big-array.c: Likewise.
> * gcc.dg/vect/vect-outer-fir-lb.c: Likewise.
> * gcc.dg/vect/vect-outer-fir.c: Likewise.
> * gcc.dg/vect/vect-peel-3.c: Likewise.
> * gcc.dg/vect/vect-peel-4.c: Likewise.
> * gcc.dg/vect/vect-pre-interact.c: Likewise.
> * gcc.target/powerpc/vsx-vectorize-2.c: Exclude test for POWER8.
> * gcc.target/powerpc/vsx-vectorize-4.c: Likewise.
> * gcc.target/powerpc/vsx-vectorize-6.c: Likewise.
> * gcc.target/powerpc/vsx-vectorize-7.c: Likewise.
> * gfortran.dg/vect/vect-2.f90: Replace vect_no_align with
> vect_no_align && { ! vect_hw_misalign }.
> * gfortran.dg/vect/vect-3.f90: Likewise.
> * gfortran.dg/vect/vect-4.f90: Likewise.
> * gfortran.dg/vect/vect-5.f90: Likewise.
> * lib/target-supports.exp (check_effective_target_vect_no_align):
> Return 1 for POWER8.
> (check_effective_target_vect_hw_misalign): Return 1 for POWER8.

This is a reasonable change, but please ask a vect maintainer like Richi
or a testsuite maintainer to approve.

Thanks, David
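
For illustration only, the centralization requested above could look
roughly like the standalone sketch below. It is not rs6000 code: the
struct, its fields, the enum, and the derived flag
efficient_unaligned_vsx are hypothetical stand-ins for the real option
state, and the three functions only model the decisions discussed in
this thread (the option_override step, the misalignment support query,
and whether the load/load/lvsl/vperm realignment sequence is still
wanted).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real rs6000 option state.  */
enum processor { PROC_POWER7, PROC_POWER8 };

struct target_opts
{
  enum processor cpu;
  bool vsx;                      /* false models -mno-vsx */
  bool allow_movmisalign;        /* models TARGET_ALLOW_MOVMISALIGN */
  bool movmisalign_explicit;     /* user set the option explicitly */
  bool efficient_unaligned_vsx;  /* hypothetical derived flag */
};

/* The only place that names a specific processor.  A future processor
   with the same characteristics would be added here and nowhere else.  */
static void
option_override (struct target_opts *o)
{
  assert (o != NULL);
  o->efficient_unaligned_vsx = o->vsx && o->cpu == PROC_POWER8;

  /* Default the movmisalign setting only if the user did not choose it.  */
  if (o->efficient_unaligned_vsx && !o->movmisalign_explicit)
    o->allow_movmisalign = true;
}

/* Consumers test the derived flag, never the processor.  */
static bool
support_vector_misalignment (const struct target_opts *o)
{
  return o->efficient_unaligned_vsx;
}

/* True when the realignment sequence should still be generated,
   e.g. on POWER7, or on POWER8 with -mno-vsx.  */
static bool
use_mask_for_load (const struct target_opts *o)
{
  return !o->efficient_unaligned_vsx;
}
```

With this shape, the movmisalign<mode> pattern would only need to test
allow_movmisalign, and an explicit user setting is left untouched.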