On Mon, Jan 29, 2024 at 11:24:58AM +0100, Richard Biener wrote: > The following expands .VEC_SET and .VEC_EXTRACT instruction selection > to global hard registers, not only automatic variables (possibly) > promoted to registers. This can avoid some ICEs later and create > better code. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > OK? > > Thanks, > Richard. > > PR middle-end/113622 > * gimple-isel.cc (gimple_expand_vec_set_extract_expr): > Also allow DECL_HARD_REGISTER variables. > > * gcc.target/i386/pr113622-1.c: New testcase. > --- > gcc/gimple-isel.cc | 3 ++- > gcc/testsuite/gcc.target/i386/pr113622-1.c | 12 ++++++++++++ > 2 files changed, 14 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr113622-1.c > > diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc > index 7e2392ecd38..e94f292dd38 100644 > --- a/gcc/gimple-isel.cc > +++ b/gcc/gimple-isel.cc > @@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun, > machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0)); > machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref)); > > - if (auto_var_in_fn_p (view_op0, fun->decl) > + if ((auto_var_in_fn_p (view_op0, fun->decl) > + || DECL_HARD_REGISTER (view_op0)) > && !TREE_ADDRESSABLE (view_op0) > && ((!is_extract && can_vec_set_var_idx_p (outermode)) > || (is_extract
All we know here from the earlier checks is DECL_P (view_op0), but DECL_HARD_REGISTER uses VAR_DECL_CHECK, shouldn't this be || (VAR_P (view_op0) && DECL_HARD_REGISTER (view_op0))) instead? > diff --git a/gcc/testsuite/gcc.target/i386/pr113622-1.c > b/gcc/testsuite/gcc.target/i386/pr113622-1.c > new file mode 100644 > index 00000000000..2d6cb3c89a8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr113622-1.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mavx512f -w" } */ > + > +typedef float __attribute__ ((vector_size (64))) vec; > +register vec a asm("zmm2"), b asm("zmm0"), c asm("zmm1"); I'd feel better if this used say zmm5, zmm6, zmm7 or something similar so that it doesn't clash with some of the implicitly used SSE registers, but on the other side still fit into 8 SSE registers which ia32 has access to. > + > +void > +test (void) > +{ > + for (int i = 0; i < 8; i++) > + c[i] = a[i] < b[i] ? 0.1 : 0.2; > +} Otherwise LGTM. Jakub