On Tue, Feb 4, 2020 at 1:30 PM Jakub Jelinek <[email protected]> wrote:
>
> On Tue, Feb 04, 2020 at 12:24:10PM +0100, Uros Bizjak wrote:
> > > A && is missing in the split condition to inherit TARGET_AVX.
> >
> > Also, you don't need to emit "#" in output template. This is just for
> > extra safety, we can live without. Please see e.g. "*tzcnt<mode>_1".
>
> I was worried that -fipa-ra then would give wrong answers.
> -O2 -mabi=ms -mavx512f -fno-schedule-insns2 I was worried about isn't a
> problem though,
> because while the split4 pass is disabled, the split3 pass is scheduled
> instead before
> regstack.
> I can do
> -O2 -mabi=ms -mavx512f -fdisable-rtl-split4
> or
> -O2 -mabi=ms -mavx512f -fno-schedule-insns2 -fdisable-rtl-split3
> though and then we ICE, with the patch as is with just && epilogue_completed
> ...
> the ICE is
> cc1: note: disable pass rtl-split4 for functions in the range of [0,
> 4294967295]
> pr92190.c: In function ‘foo’:
> pr92190.c:19:1: error: could not split insn
> 19 | }
> | ^
> (insn:TI 23 18 10 2 (parallel [
> (unspec_volatile [
> (const_int 0 [0])
> ] UNSPECV_VZEROUPPER)
> (clobber (reg:V2DI 20 xmm0))
> (clobber (reg:V2DI 21 xmm1))
> (clobber (reg:V2DI 22 xmm2))
> (clobber (reg:V2DI 23 xmm3))
> (clobber (reg:V2DI 24 xmm4))
> (clobber (reg:V2DI 25 xmm5))
> (set (reg:V2DI 26 xmm6)
> (reg:V2DI 26 xmm6))
> (clobber (reg:V2DI 27 xmm7))
> (clobber (reg:V2DI 44 xmm8))
> (clobber (reg:V2DI 45 xmm9))
> (clobber (reg:V2DI 46 xmm10))
> (clobber (reg:V2DI 47 xmm11))
> (clobber (reg:V2DI 48 xmm12))
> (clobber (reg:V2DI 49 xmm13))
> (clobber (reg:V2DI 50 xmm14))
> (clobber (reg:V2DI 51 xmm15))
> ]) "pr92190.c":17:3 4895 {*avx_vzeroupper}
> (nil))
> because as the splitter is written, it modifies the insn in place and so
> later in try_split
> if (INSN_P (insn_last)
> && rtx_equal_p (PATTERN (insn_last), pat))
> return trial;
> rtx_equal_p is true because of the modification in place.
> Now, if I slightly modify the patch, so that it does
> operands[0] = gen_rtx_PARALLEL (VOIDmode, vec);
> instead of
> XVEC (operands[0], 0) = vec;
> and thus doesn't modify in place, it will ICE in a different spot:
> pr92190.c: In function ‘foo’:
> pr92190.c:19:1: internal compiler error: in final_scan_insn_1, at final.c:3078
> 19 | }
> | ^
> i.e. on
> /* If we have a length attribute, this instruction should have
> been split in shorten_branches, to ensure that we would have
> valid length info for the splitees. */
> gcc_assert (!HAVE_ATTR_length);
> Though, I guess that is not really specific to this patch, by
> -fdisable-rtl-split{3,4,5} we effectively disable all possibilities of
> splitting before final, so anything that isn't split before will ICE that
> way.
> So, if we wanted to remove that class of ICEs, we should just fatal_error if
> somebody disables the last splitter pass before expansion and
> HAVE_ATTR_length is defined.
>
> Here is the patch with && before epilogue_completed, no "#" and that avoids
> modification of the insn in place. We don't ICE, just could generate wrong
> code with -fipa-ra, but user asked for that with -fdisable-rtl-split{3,4,5},
> so guess that is ok.
>
> 2020-02-04 Jakub Jelinek <[email protected]>
>
> PR target/92190
> * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper): Only
> include sets and not clobbers in the vzeroupper pattern.
> * config/i386/sse.md (*avx_vzeroupper): Change from define_insn to
> define_insn_and_split, split if epilogue_completed and not all
> xmm0-xmm15
> registers are mentioned in the pattern and add clobbers for the
> missing
> registers at that point.
>
> * gcc.target/i386/pr92190.c: New test.
As Richard advised, let's put this safety stuff back. Usually, in
i386.md, these kind of splitters are implemented as two patterns, one
(define_insn_and_split) having "#", and the other (define_insn) with a
real insn. My opinion is, that this separation avoids confusion as
much as possible.
Uros.
> --- gcc/config/i386/i386-features.c.jj 2020-02-04 11:40:58.755611428 +0100
> +++ gcc/config/i386/i386-features.c 2020-02-04 11:51:33.602148491 +0100
> @@ -1764,29 +1764,32 @@ convert_scalars_to_vector (bool timode_p
>
> (set (reg:V2DF R) (reg:V2DF R))
>
> - which preserves the low 128 bits but clobbers the upper bits.
> - For a dead register we just use:
> -
> - (clobber (reg:V2DF R))
> -
> - which invalidates any previous contents of R and stops R from becoming
> - live across the vzeroupper in future. */
> + which preserves the low 128 bits but clobbers the upper bits. */
>
> static void
> ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
> {
> rtx pattern = PATTERN (insn);
> unsigned int nregs = TARGET_64BIT ? 16 : 8;
> - rtvec vec = rtvec_alloc (nregs + 1);
> - RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> + unsigned int npats = nregs;
> for (unsigned int i = 0; i < nregs; ++i)
> {
> unsigned int regno = GET_SSE_REGNO (i);
> + if (!bitmap_bit_p (live_regs, regno))
> + npats--;
> + }
> + if (npats == 0)
> + return;
> + rtvec vec = rtvec_alloc (npats + 1);
> + RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> + for (unsigned int i = 0, j = 0; i < nregs; ++i)
> + {
> + unsigned int regno = GET_SSE_REGNO (i);
> + if (!bitmap_bit_p (live_regs, regno))
> + continue;
> rtx reg = gen_rtx_REG (V2DImode, regno);
> - if (bitmap_bit_p (live_regs, regno))
> - RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> - else
> - RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> + ++j;
> + RTVEC_ELT (vec, j) = gen_rtx_SET (reg, reg);
> }
> XVEC (pattern, 0) = vec;
> df_insn_rescan (insn);
> --- gcc/config/i386/sse.md.jj 2020-02-04 11:40:58.813610563 +0100
> +++ gcc/config/i386/sse.md 2020-02-04 13:21:50.088340875 +0100
> @@ -19815,11 +19815,38 @@ (define_expand "avx_vzeroupper"
> [(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> "TARGET_AVX")
>
> -(define_insn "*avx_vzeroupper"
> +(define_insn_and_split "*avx_vzeroupper"
> [(match_parallel 0 "vzeroupper_pattern"
> [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> "TARGET_AVX"
> "vzeroupper"
> + "&& epilogue_completed
> + && XVECLEN (operands[0], 0) != (TARGET_64BIT ? 16 : 8) + 1"
> + [(match_dup 0)]
> +{
> + /* For IPA-RA purposes, make it clear the instruction clobbers
> + even XMM registers not mentioned explicitly in the pattern. */
> + unsigned int nregs = TARGET_64BIT ? 16 : 8;
> + unsigned int npats = XVECLEN (operands[0], 0);
> + rtvec vec = rtvec_alloc (nregs + 1);
> + RTVEC_ELT (vec, 0) = XVECEXP (operands[0], 0, 0);
> + for (unsigned int i = 0, j = 1; i < nregs; ++i)
> + {
> + unsigned int regno = GET_SSE_REGNO (i);
> + if (j < npats
> + && REGNO (SET_DEST (XVECEXP (operands[0], 0, j))) == regno)
> + {
> + RTVEC_ELT (vec, i + 1) = XVECEXP (operands[0], 0, j);
> + j++;
> + }
> + else
> + {
> + rtx reg = gen_rtx_REG (V2DImode, regno);
> + RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> + }
> + }
> + operands[0] = gen_rtx_PARALLEL (VOIDmode, vec);
> +}
> [(set_attr "type" "sse")
> (set_attr "modrm" "0")
> (set_attr "memory" "none")
> --- gcc/testsuite/gcc.target/i386/pr92190.c.jj 2020-02-04 11:51:33.608148402
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr92190.c 2020-02-04 11:51:33.608148402
> +0100
> @@ -0,0 +1,19 @@
> +/* PR target/92190 */
> +/* { dg-do compile { target { *-*-linux* && lp64 } } } */
> +/* { dg-options "-mabi=ms -O2 -mavx512f" } */
> +
> +typedef char VC __attribute__((vector_size (16)));
> +typedef int VI __attribute__((vector_size (16 * sizeof 0)));
> +VC a;
> +VI b;
> +void bar (VI);
> +void baz (VC);
> +
> +void
> +foo (void)
> +{
> + VC k = a;
> + VI n = b;
> + bar (n);
> + baz (k);
> +}
>
>
> Jakub
>