Aaron Sawdey has been running the GCC testsuite on the power9 simulator and he noticed that:
gcc.c-torture/execute/pr68532.c does not run, and opened bug 78243 for this failure. Now, if you compile pr68532 with normal options (-O2/-O3 and -mcpu=power8 or -mcpu=power9) it works because with the normal powerpc vectorization costs, the vectorize only generates scalar code. However, the options in the PR explicitly turns on -fno-vect-cost-model, which forces the loop to be be vectorized. In doing so, it generates pretty bad code. The vectorizer generates a vector add loop, and at the end it does a vector reduction to get the total added. When -mcpu=power9 is used it generates a VEXTRACTUH instruction to extract the HImode from the V8HImode vector. Unfortunately, on little endian (with little endian element ordering), it gets the wrong element. This patch fixes that problem. I did a bootstrap and regression test, and there were no regressions. I ran the test on the power9 simulator for both big endian and little endian options and it passed. I also ran the following executable tests from the testsuite which exercise vector init, set, and extract for each of the basic types: vec-init-1.c element type: int vec-init-2.c element type: long vec-init-4.c element type: short vec-init-5.c element type: signed char vec-init-8.c element type: float vec-init-9.c element type: double All tests passed on both little endian and big endian simulator runs. Can I check this patch into the trunk? 2016-11-10 Michael Meissner <meiss...@linux.vnet.ibm.com> PR target/78243 * config/rs6000/vsx.md (vsx_extract_<mode>_p9): Correct the element order for little endian ordering. * config/rs6000/altivec.md (reduc_plus_scal_<mode>): Use VECTOR_ELT_ORDER_BIG and not BYTES_BIG_ENDIAN to adjust element number. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (revision 242048) +++ gcc/config/rs6000/vsx.md (revision 242049) @@ -2542,10 +2542,13 @@ (define_insn "vsx_extract_<mode>_p9" "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_VEXTRACTUB && TARGET_VSX_SMALL_INTEGER" { - /* Note, the element number has already been adjusted for endianness, so we - don't have to adjust it here. */ - int unit_size = GET_MODE_UNIT_SIZE (<MODE>mode); - HOST_WIDE_INT offset = unit_size * INTVAL (operands[2]); + HOST_WIDE_INT elt = INTVAL (operands[2]); + HOST_WIDE_INT elt_adj = ((!VECTOR_ELT_ORDER_BIG) + ? (GET_MODE_NUNITS (<MODE>mode) - 1 - elt) + : elt); + + HOST_WIDE_INT unit_size = GET_MODE_UNIT_SIZE (<MODE>mode); + HOST_WIDE_INT offset = unit_size * elt_adj; operands[2] = GEN_INT (offset); if (unit_size == 4) Index: gcc/config/rs6000/altivec.md =================================================================== --- gcc/config/rs6000/altivec.md (revision 242048) +++ gcc/config/rs6000/altivec.md (revision 242049) @@ -2785,7 +2785,7 @@ (define_expand "reduc_plus_scal_<mode>" rtx vtmp1 = gen_reg_rtx (V4SImode); rtx vtmp2 = gen_reg_rtx (<MODE>mode); rtx dest = gen_lowpart (V4SImode, vtmp2); - int elt = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (<MODE>mode) - 1 : 0; + int elt = VECTOR_ELT_ORDER_BIG ? GET_MODE_NUNITS (<MODE>mode) - 1 : 0; emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); emit_insn (gen_altivec_vsum4s<VI_char>s (vtmp1, operands[1], vzero));