https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64286
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Igor Zamyatin from comment #1)
> Perhaps something like below to restrict ree for such cases?
>
> diff --git a/gcc/ree.c b/gcc/ree.c
> index 3376901..92370ea 100644
> --- a/gcc/ree.c
> +++ b/gcc/ree.c
> @@ -1004,6 +1004,11 @@ add_removable_extension (const_rtx expr, rtx_insn *insn,
>    struct df_link *defs, *def;
>    ext_cand *cand;
>
> +  if (!SCALAR_INT_MODE_P (GET_MODE (dest))
> +      && (GET_MODE_UNIT_PRECISION (mode) !=
> +         GET_MODE_UNIT_PRECISION (GET_MODE (XEXP (src, 0)))))
> +    return;
> +
>    /* First, make sure we can get all the reaching definitions.  */
>    defs = get_defs (insn, XEXP (src, 0), NULL);
>    if (!defs)

I think your patch is too restrictive.
Consider -O2 -mavx2:

typedef char __v16qi __attribute__((__vector_size__(16)));
typedef int __m128i __attribute__((__vector_size__(16)));
__m128i bar (__m128i);
typedef int __m256i __attribute__((__vector_size__(32)));
__m256i v;

void
foo (char *p)
{
  __m128i a = (__m128i)__builtin_ia32_loaddqu (p);
  __m128i ps1 = bar (a);
  v = (__m256i) __builtin_ia32_pmovzxbw256 ((__v16qi) a);
}

Here, there is:

(insn 19 9 11 2 (set (reg:V16QI 22 xmm1 [92])
        (mem/c:V16QI (plus:DI (reg/f:DI 6 bp)
                (const_int -32 [0xffffffffffffffe0])) [2 %sfp+-16 S16 A128])) pr64286.i:12 1185 {*movv16qi_internal}
     (nil))
(insn 11 19 13 2 (set (reg:V16HI 22 xmm1 [orig:93 D.2299 ] [93])
        (zero_extend:V16HI (reg:V16QI 22 xmm1 [92]))) pr64286.i:12 3826 {avx2_zero_extendv16qiv16hi2}
     (nil))

and there is no reason to restrict it.
I also don't understand the
GET_MODE_UNIT_PRECISION != GET_MODE_UNIT_PRECISION
test, do you know about SIGN_EXTEND/ZERO_EXTEND where the unit precision is the
same?  That wouldn't be an extension.
The important difference between vectors and scalars is that for scalars the
lowpart subreg of the zero/sign extended value is still the original value,
while for vectors that is not the case.  So, for vectors you can REE optimize
them only if all the uses are the same extension (zero vs. sign, and to the
same mode).

Therefore, supposedly for non-scalar modes (i.e. vector ones; modes other than
scalar int and vector int hopefully don't have zero/sign_extend) I think what
should be done is bail out if any of the defs has any uses that are not the
sign or zero extension, respectively, that has been found.
We have there the:

  /* Second, make sure the reaching definitions don't feed another and
     different extension.  FIXME: this obviously can be improved.  */
  for (def = defs; def; def = def->next)
    if ((idx = def_map[INSN_UID (DF_REF_INSN (def->ref))])
        && (cand = &(*insn_list)[idx - 1])
        && cand->code != code)
      {
        if (dump_file)
          {
            fprintf (dump_file, "Cannot eliminate extension:\n");
            print_rtl_single (dump_file, insn);
            fprintf (dump_file, " because of other extension\n");
          }
        return;
      }

loop, perhaps for the vector modes we could add
  else if (!SCALAR_INT_MODE_P (...) && idx == 0)
and in that case look using DU chains (which are supposedly computed) if any
uses of it other than the current insn are not a sign/zero extension at all,
or are a different extension, or extend to a different mode than the current
instruction.  In that case record some magic value in def_map (e.g. -1U) and
later treat that magic def_map value as a sign that we should give up
(disregard that extension).
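
To make the scalar vs. vector lowpart difference concrete, here is a minimal
standalone sketch in plain C.  It is not GCC code: the function names are
hypothetical, the "vector" is just a byte array, and a little-endian lane
layout (as on x86) is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: zero-extend a 16-bit value to 32 bits, then read the
   low 16 bits back, as a lowpart subreg would.  */
uint16_t
scalar_lowpart_after_zext (uint16_t x)
{
  uint32_t wide = (uint32_t) x;
  return (uint16_t) (wide & 0xffffu);
}

/* Vector model: zero-extend four 8-bit lanes to four 16-bit lanes, stored
   as little-endian bytes (assumption), then read the low four bytes back
   as if they were still the original four 8-bit lanes.  */
void
vector_lowpart_after_zext (const uint8_t in[4], uint8_t low[4])
{
  uint8_t wide_bytes[8];

  for (int i = 0; i < 4; i++)
    {
      wide_bytes[2 * i] = in[i];    /* low byte of 16-bit lane i */
      wide_bytes[2 * i + 1] = 0;    /* high byte is the zero extension */
    }
  for (int i = 0; i < 4; i++)
    low[i] = wide_bytes[i];
}

/* Self-check of the two models.  */
void
check_lowpart_models (void)
{
  uint8_t in[4] = { 1, 2, 3, 4 }, low[4];

  /* Scalar: the lowpart is still the original value.  */
  assert (scalar_lowpart_after_zext (0xbeef) == 0xbeef);

  /* Vector: the low bytes interleave data with zeros, so lanes 1..3
     no longer hold the original values.  */
  vector_lowpart_after_zext (in, low);
  assert (low[0] == 1 && low[1] == 0 && low[2] == 2 && low[3] == 0);
}
```

In this model a use that reads the original narrow vector out of the extended
register would see { 1, 0, 2, 0 } rather than { 1, 2, 3, 4 }, which is why
every use of the def has to be the same extension before the extension can be
merged into the def.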