On Fri, Mar 11, 2022 at 8:43 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> After accounting for the GPR -> XMM move cost for vec_construct, the
> base cost needs adjusting so that those moves are not costed twice.
> This also lowers the cost when such a move is not necessary.
>
> This fixes the observed 538.imagick_r and 525.x264_r regressions
> for me on Zen2 with -Ofast -march=native.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?
LGTM.
>
> Thanks,
> Richard.
>
> 2022-03-11  Richard Biener  <rguent...@suse.de>
>
>         PR target/104762
>         * config/i386/i386.cc (ix86_builtin_vectorization_cost): Do not
>         cost the first lane of SSE pieces as inserts for vec_construct.
> ---
>  gcc/config/i386/i386.cc | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 4121f986221..23bedea92bd 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22597,16 +22597,21 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>
>        case vec_construct:
>         {
> -         /* N element inserts into SSE vectors.  */
> -         int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> +         int n = TYPE_VECTOR_SUBPARTS (vectype);
> +         /* N - 1 element inserts into an SSE vector, the possible
> +            GPR -> XMM move is accounted for in add_stmt_cost.  */
> +         if (GET_MODE_BITSIZE (mode) <= 128)
> +           return (n - 1) * ix86_cost->sse_op;
>           /* One vinserti128 for combining two SSE vectors for AVX256.  */
> -         if (GET_MODE_BITSIZE (mode) == 256)
> -           cost += ix86_vec_cost (mode, ix86_cost->addss);
> +         else if (GET_MODE_BITSIZE (mode) == 256)
> +           return ((n - 2) * ix86_cost->sse_op
> +                   + ix86_vec_cost (mode, ix86_cost->addss));
>           /* One vinserti64x4 and two vinserti128 for combining SSE
>              and AVX256 vectors to AVX512.  */
>           else if (GET_MODE_BITSIZE (mode) == 512)
> -           cost += 3 * ix86_vec_cost (mode, ix86_cost->addss);
> -         return cost;
> +           return ((n - 4) * ix86_cost->sse_op
> +                   + 3 * ix86_vec_cost (mode, ix86_cost->addss));
> +         gcc_unreachable ();
>         }
>
>        default:
> --
> 2.34.1
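
For illustration only, here is a minimal standalone sketch of the
arithmetic the hunk above changes.  It is not GCC code: SSE_OP and
VEC_ADDSS are hypothetical constants standing in for ix86_cost->sse_op
and ix86_vec_cost (mode, ix86_cost->addss), which in the real backend
depend on the tuning target and the mode, and old_cost/new_cost are
made-up names.

```c
#include <assert.h>

/* Hypothetical stand-ins (assumptions, not real tuning values) for
   ix86_cost->sse_op and ix86_vec_cost (mode, ix86_cost->addss).  */
enum { SSE_OP = 1, VEC_ADDSS = 3 };

/* Cost before the patch: one insert per element, plus the cost of
   combining the 128-bit pieces for wider vectors.  */
static int
old_cost (int n, int bits)
{
  int cost = n * SSE_OP;
  if (bits == 256)
    cost += VEC_ADDSS;
  else if (bits == 512)
    cost += 3 * VEC_ADDSS;
  return cost;
}

/* Cost after the patch: the first lane of each 128-bit piece is no
   longer counted as an insert (1, 2 or 4 pieces respectively).  */
static int
new_cost (int n, int bits)
{
  if (bits <= 128)
    return (n - 1) * SSE_OP;
  if (bits == 256)
    return (n - 2) * SSE_OP + VEC_ADDSS;
  /* The patch ends in gcc_unreachable () for any other size.  */
  assert (bits == 512);
  return (n - 4) * SSE_OP + 3 * VEC_ADDSS;
}
```

With these assumed constants, a 4-element 128-bit construct drops from
4 to 3, an 8-element 256-bit one from 11 to 9, and a 16-element 512-bit
one from 25 to 21; the reduction equals the number of 128-bit pieces
times the insert cost.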



-- 
BR,
Hongtao
