https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87064

--- Comment #16 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #13)
> So, both the following patches should fix it IMHO, but no idea which one if
> any is right.
> With
> --- gcc/config/rs6000/vsx.md.jj       2019-01-01 12:37:44.305529527 +0100
> +++ gcc/config/rs6000/vsx.md  2019-01-18 18:07:37.194899062 +0100
> @@ -4356,7 +4356,9 @@
>    ""
>    [(const_int 0)]
>  {
> -  rtx hi = gen_highpart (DFmode, operands[1]);
> +  rtx hi = (BYTES_BIG_ENDIAN
> +         ? gen_highpart (DFmode, operands[1])
> +         : gen_lowpart (DFmode, operands[1]));
>    rtx lo = (GET_CODE (operands[2]) == SCRATCH)
>           ? gen_reg_rtx (DFmode)
>           : operands[2];
> 
> the assembly changes:
> --- reduction-3.s1    2019-01-18 18:05:14.313229730 +0100
> +++ reduction-3.s2    2019-01-18 18:10:20.617233358 +0100
> @@ -27,7 +27,7 @@ MAIN__._omp_fn.0:
>       addi 9,9,16
>       bdnz .L2
>        # vec_extract to same register
> -     lfd 12,-8(1)
> +     lfd 12,-16(1)
>       xsmaxdp 0,12,0
>       stfd 0,0(10)
>       blr
> with:
> --- gcc/config/rs6000/vsx.md.jj       2019-01-01 12:37:44.305529527 +0100
> +++ gcc/config/rs6000/vsx.md  2019-01-18 18:16:30.680186709 +0100
> @@ -4361,7 +4361,9 @@
>           ? gen_reg_rtx (DFmode)
>           : operands[2];
>  
> -  emit_insn (gen_vsx_extract_v2df (lo, operands[1], const1_rtx));
> +  emit_insn (gen_vsx_extract_v2df (lo, operands[1],
> +                                BYTES_BIG_ENDIAN
> +                                ? const1_rtx : const0_rtx));
>    emit_insn (gen_<VEC_reduc_rtx>df3 (operands[0], hi, lo));
>    DONE;
>  }

This is what looks right to me.  This code all pre-dates little-endian support,
and I think we missed changing the element to be extracted in this spot.  There
is probably something wrong with _v4sf_scalar as well -- the gen_vsx_xxsldwi_v4sf
call probably needs to be adjusted for little-endian too, but I have a hard time
following this code and I'm not certain.
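
To illustrate why the extracted element matters, here is a rough sketch in
plain C, not GCC code.  It assumes that vector elements are numbered in memory
order and that the "high part" of a two-element double vector is element 0 on
big-endian and element 1 on little-endian.  The names highpart_index,
reduce_max_old and reduce_max_fixed are hypothetical and exist only for this
illustration.

```c
#include <stdbool.h>

/* Hypothetical model: memory-order index of the element that the
   "high part" of a 2 x double vector refers to (assumption, see above). */
static int highpart_index(bool big_endian) { return big_endian ? 0 : 1; }

static double max2(double a, double b) { return a > b ? a : b; }

/* Models the unpatched splitter: "lo" is always element 1. */
static double reduce_max_old(const double v[2], bool big_endian) {
    double hi = v[highpart_index(big_endian)];
    double lo = v[1];
    return max2(hi, lo);
}

/* Models the second patch: "lo" is element 1 on big-endian and
   element 0 on little-endian, so it never coincides with "hi". */
static double reduce_max_fixed(const double v[2], bool big_endian) {
    double hi = v[highpart_index(big_endian)];
    double lo = v[big_endian ? 1 : 0];
    return max2(hi, lo);
}
```

Under these assumptions the old variant reads element 1 twice on little-endian
and never looks at element 0, which would match the kind of wrong reduction
result reported here.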

Bill

> the assembly changes:
> --- reduction-3.s1    2019-01-18 18:05:14.313229730 +0100
> +++ reduction-3.s3    2019-01-18 18:17:18.977397458 +0100
> @@ -26,7 +26,7 @@ MAIN__._omp_fn.0:
>       xxpermdi 0,0,0,2
>       addi 9,9,16
>       bdnz .L2
> -      # vec_extract to same register
> +     xxpermdi 0,0,0,3
>       lfd 12,-8(1)
>       xsmaxdp 0,12,0
>       stfd 0,0(10)
> 
> So just judging from this exact testcase, the first patch seems to be more
> efficient, though still unsure about that, because it goes through memory in
> either case, wouldn't it be better to emit a xxpermdi from 0 to 12 that
> swaps the two elements instead of loading it from memory?
