On Fri, Sep 17, 2021 at 08:35:57AM +0200, Uros Bizjak via Gcc-patches wrote:
> > > On Wed, Sep 15, 2021 at 10:10 AM <[email protected]> wrote:
> > > >
> > > > From: "H.J. Lu" <[email protected]>
> > > >
> > > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> > > TARGET_USE_VECTOR_CONVERTS when
> > > > handling avx_partial_xmm_update attribute. Don't convert AVX partial
> > > > XMM register update if vector packed SSE conversion should be used.
> > > >
> > > > gcc/
> > > >
> > > > PR target/101900
> > > > * config/i386/i386-features.c (remove_partial_avx_dependency):
> > > > Check TARGET_USE_VECTOR_FP_CONVERTS and
> > > TARGET_USE_VECTOR_CONVERTS
> > > > before generating vxorps.
> > > >
> > > > gcc/
> > > >
> > > > PR target/101900
> > > > * testsuite/gcc.target/i386/pr101900-1.c: New test.
> > > > * testsuite/gcc.target/i386/pr101900-2.c: Likewise.
> > > > * testsuite/gcc.target/i386/pr101900-3.c: Likewise.
> > > > ---
> > > > gcc/config/i386/i386-features.c | 21 ++++++++++++++++++---
> > > > gcc/testsuite/gcc.target/i386/pr101900-1.c | 18 ++++++++++++++++++
> > > > gcc/testsuite/gcc.target/i386/pr101900-2.c | 18 ++++++++++++++++++
> > > > gcc/testsuite/gcc.target/i386/pr101900-3.c | 19 +++++++++++++++++++
> > > > 4 files changed, 73 insertions(+), 3 deletions(-) create mode 100644
> > > > gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-2.c
> > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-3.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-features.c
> > > > b/gcc/config/i386/i386-features.c index 5a99ea7c046..ae5ea02a002
> > > > 100644
> > > > --- a/gcc/config/i386/i386-features.c
> > > > +++ b/gcc/config/i386/i386-features.c
> > > > @@ -2210,15 +2210,30 @@ remove_partial_avx_dependency (void)
> > > > != AVX_PARTIAL_XMM_UPDATE_TRUE)
> > > > continue;
> > > >
> > > > - if (!v4sf_const0)
> > > > - v4sf_const0 = gen_reg_rtx (V4SFmode);
> > > > -
> > > > /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
> > > > SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
> > > > vec_merge with subreg. */
> > > > rtx src = SET_SRC (set);
> > > > rtx dest = SET_DEST (set);
> > > > machine_mode dest_mode = GET_MODE (dest);
> > > > + machine_mode src_mode;
> > > > +
> > > > + if (TARGET_USE_VECTOR_FP_CONVERTS)
> > > > + {
> > > > + src_mode = GET_MODE (XEXP (src, 0));
> > > > + if (src_mode == E_SFmode || src_mode == E_DFmode)
> > > > + continue;
> > > > + }
> > > > +
> > > > + if (TARGET_USE_VECTOR_CONVERTS)
> > > > + {
> > > > + src_mode = GET_MODE (XEXP (src, 0));
> > > > + if (src_mode == E_SImode || src_mode == E_DImode)
> > > > + continue;
> > > > + }
> > > > +
> > > > + if (!v4sf_const0)
> > > > + v4sf_const0 = gen_reg_rtx (V4SFmode);
> > >
> > > Please better move initialization of src_mode to the top of the new hunk,
> > > like:
> > >
> > > machine_mode src_mode = GET_MODE (XEXP (src, 0)); switch (src_mode) {
> > > case E_SFmode:
> > > case E_DFmode:
> > > if (TARGET_USE_VECTOR_FP_CONVERTS)
> > > continue;
> > > break;
> > > case E_SImode:
> > > case E_DImode:
> > > if (TARGET_USE_VECTOR_CONVERTS)
> > > continue;
> > > break;
> > > default:
> > > break;
> > > }
> > >
> > > or something like the above.
> >
> > Done, thanks for your good advice, I also rebased patch 4/4, since it is
> > based on patch 3/4.
The above change broke
+FAIL: gcc.target/i386/avx512f-vscalefpd-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512f-vscalefpd-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512f-vscalefpd-2.c compilation failed to
produce executable
+FAIL: gcc.target/i386/avx512f-vscalefps-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512f-vscalefps-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512f-vscalefps-2.c compilation failed to
produce executable
+FAIL: gcc.target/i386/avx512f-vscalefss-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512f-vscalefss-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512f-vscalefss-2.c compilation failed to
produce executable
+FAIL: gcc.target/i386/avx512vl-vscalefpd-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512vl-vscalefpd-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512vl-vscalefpd-2.c compilation failed to
produce executable
+FAIL: gcc.target/i386/avx512vl-vscalefps-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512vl-vscalefps-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512vl-vscalefps-2.c compilation failed to
produce executable
when configured with --enable-checking=yes,rtl,extra, the error is:
during RTL pass: rpad
/home/jakub/src/gcc/gcc/testsuite/gcc.target/i386/avx512f-vscalefpd-2.c:57:1:
internal compiler error: RTL check: expected elt 0 type 'e' or 'u', have 'E'
(rtx unspec) in remove_partial_avx_dependency, at
config/i386/i386-features.c:2219
0x77541d rtl_check_failed_type2(rtx_def const*, int, int, int, char const*,
int, char const*)
../../gcc/rtl.c:898
0x84e731 remove_partial_avx_dependency
../../gcc/config/i386/i386-features.c:2219
0x84e731 execute
../../gcc/config/i386/i386-features.c:2389
This is on
2219 machine_mode src_mode = GET_MODE (XEXP (src, 0));
and src is:
(unspec:DF [
(mem:DF (plus:DI (reg/f:DI 149)
(reg:DI 103 [ ivtmp.65 ])) [3 MEM[(double *)&src2 + ivtmp.65_65
* 1]+0 S8 A64])
(const_int 9 [0x9])
] UNSPEC_ROUND)
- whole insn
(insn 55 54 56 5 (set (reg:DF 99 [ _50 ])
(unspec:DF [
(mem:DF (plus:DI (reg/f:DI 149)
(reg:DI 103 [ ivtmp.65 ])) [3 MEM[(double *)&src2 +
ivtmp.65_65 * 1]+0 S8 A64])
(const_int 9 [0x9])
] UNSPEC_ROUND))
"/home/jakub/src/gcc/gcc/testsuite/gcc.target/i386/avx512f-vscalefpd-2.c":19:28
1076 {sse4_1_rounddf2}
(nil))
so XEXP (src, 0) can't be used in that case.
Looking at insns with avx_partial_xmm_update attribute, it seems
src is either FLOAT_EXTEND/FLOAT_TRUNCATE/FLOAT/UNSIGNED_FLOAT and
in that case it looks like a conversion and has different modes,
or it is UNSPEC (UNSPEC_{RCP,RSQRT,ROUND}) or SQRT and in that case it doesn't.
Jakub