On Mon, Jan 29, 2024 at 11:24:58AM +0100, Richard Biener wrote:
> The following expands .VEC_SET and .VEC_EXTRACT instruction selection
> to global hard registers, not only automatic variables (possibly)
> promoted to registers.  This can avoid some ICEs later and create
> better code.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>       PR middle-end/113622
>       * gimple-isel.cc (gimple_expand_vec_set_extract_expr):
>       Also allow DECL_HARD_REGISTER variables.
> 
>       * gcc.target/i386/pr113622-1.c: New testcase.
> ---
>  gcc/gimple-isel.cc                         |  3 ++-
>  gcc/testsuite/gcc.target/i386/pr113622-1.c | 12 ++++++++++++
>  2 files changed, 14 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113622-1.c
> 
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index 7e2392ecd38..e94f292dd38 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun,
>        machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0));
>        machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref));
>  
> -      if (auto_var_in_fn_p (view_op0, fun->decl)
> +      if ((auto_var_in_fn_p (view_op0, fun->decl)
> +        || DECL_HARD_REGISTER (view_op0))
>         && !TREE_ADDRESSABLE (view_op0)
>         && ((!is_extract && can_vec_set_var_idx_p (outermode))
>             || (is_extract

All we know here from the earlier checks is DECL_P (view_op0), but
DECL_HARD_REGISTER uses VAR_DECL_CHECK, shouldn't this be
           || (VAR_P (view_op0) && DECL_HARD_REGISTER (view_op0)))
instead?

> diff --git a/gcc/testsuite/gcc.target/i386/pr113622-1.c 
> b/gcc/testsuite/gcc.target/i386/pr113622-1.c
> new file mode 100644
> index 00000000000..2d6cb3c89a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr113622-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512f -w" } */
> +
> +typedef float __attribute__ ((vector_size (64))) vec;
> +register vec a asm("zmm2"), b asm("zmm0"), c asm("zmm1");

I'd feel better if this used say zmm5, zmm6, zmm7 or something similar
so that it doesn't clash with some of the implicitly used SSE
registers, but on the other side still fit into 8 SSE registers
which ia32 has access to.

> +
> +void
> +test (void)
> +{
> +  for (int i = 0; i < 8; i++)
> +    c[i] = a[i] < b[i] ? 0.1 : 0.2;
> +}

Otherwise LGTM.

        Jakub

Reply via email to