On Thu, Aug 15, 2024 at 11:14 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch follows up on the previous patch to fix PR target/116275 by
> improving the code STV (ultimately) generates for highpart sign extensions
> like (x<<8)>>8.  The arithmetic right shift is able to take advantage of
> the available common subexpressions from the preceding left shift.
>
> Hence previously with -O2 -m32 -mavx -mno-avx512vl we'd generate:
>
>         vpsllq  $8, %xmm0, %xmm0
>         vpsrad  $8, %xmm0, %xmm1
>         vpsrlq  $8, %xmm0, %xmm0
>         vpblendw        $51, %xmm0, %xmm1, %xmm0
>
> But with improved splitting, we now generate three instructions:
>
>         vpslld  $8, %xmm1, %xmm0
>         vpsrad  $8, %xmm0, %xmm0
>         vpblendw        $51, %xmm1, %xmm0, %xmm0
>
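[Illustrative note, not part of the patch or its test case.] A scalar sketch of why the three-instruction sequence above is sufficient: (x<<8)>>8 leaves the low 32 bits of each 64-bit element unchanged, so only the high 32-bit half needs the shift pair, and the low half can be blended back from the original. The function names below are hypothetical, and the code assumes GCC's behaviour for signed conversions and arithmetic right shifts of negative values (guaranteed from C++20 onwards).

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: sign-extend the low 56 bits of x, i.e. (x<<8)>>8.
// The left shift is done on an unsigned value to avoid signed overflow.
inline std::int64_t highpart_sext_ref(std::int64_t x)
{
    return static_cast<std::int64_t>(static_cast<std::uint64_t>(x) << 8) >> 8;
}

// Per-32-bit-lane model of the vpslld/vpsrad/vpblendw sequence.
inline std::int64_t highpart_sext_lanes(std::int64_t x)
{
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    const std::uint32_t lo = static_cast<std::uint32_t>(ux);        // kept by the blend
    const std::uint32_t hi = static_cast<std::uint32_t>(ux >> 32);

    const std::uint32_t shl = hi << 8;                              // models vpslld $8
    const std::int32_t  sar = static_cast<std::int32_t>(shl) >> 8;  // models vpsrad $8

    // Models vpblendw $51: low dword from the original, high dword shifted.
    const std::uint64_t res =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(sar)) << 32) | lo;
    return static_cast<std::int64_t>(res);
}

// Small self-check that the two formulations agree on a few inputs.
inline void self_check()
{
    const std::int64_t samples[] = {
        0, 1, -1, 0x0080000000000001LL, 0x7F12345689ABCDEFLL
    };
    for (std::int64_t s : samples)
        assert(highpart_sext_ref(s) == highpart_sext_lanes(s));
}
```

The low dword never depends on the shifts, which is what lets the blend take it straight from the unshifted input register.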
> This patch also implements Uros' suggestion that the pre-reload
> splitter could introduce a new pseudo to hold the intermediate
> value, potentially helping reload with register allocation.  This
> applies when not performing the above optimization, i.e. on TARGET_XOP.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-08-15  Roger Sayle  <ro...@nextmovesoftware.com>
>             Uros Bizjak  <ubiz...@gmail.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*extendv2di2_highpart_stv_noavx512vl): Split
>         to an improved implementation on !TARGET_XOP.  On TARGET_XOP, use
>         a new pseudo for the intermediate to simplify register allocation.
>
> gcc/testsuite/ChangeLog
>         * g++.target/i386/pr116275-2.C: New test case.

LGTM.

Thanks,
Uros.
