On Sat, 24 Aug 2024 at 17:11, Roger Sayle <ro...@nextmovesoftware.com>
wrote:

>
> This patch tweaks timode_scalar_chain::compute_convert_gain to better
> reflect the expansion of V1TImode arithmetic right shifts by the i386
> backend.  The comment "see ix86_expand_v1ti_ashiftrt" appears after
> "case ASHIFTRT" in compute_convert_gain, and the changes below attempt
> to better match the logic used there.
>
> The original motivating example is:
>
> __int128 m1;
> void foo()
> {
>   m1 = (m1 << 8) >> 8;
> }
>
> which, with -O2 -mavx2, we fail to convert to vector form because the
> cost of the arithmetic right shift is overestimated.
>
>   Instruction gain -16 for     7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
>   Total gain: -3
>   Chain #1 conversion is not profitable
>
> This reports that the ASHIFTRT is four instructions worse using
> vectors than in scalar form, which is incorrect: the AVX2 expansion
> of this shift only requires three instructions (and the scalar form
> requires two).
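The gain figures above are in GCC's cost units, where rtl.h defines COSTS_N_INSNS(N) as ((N) * 4), so -16 corresponds to four instructions. The sketch below reproduces that arithmetic under a simplifying assumption: that a single instruction's gain is just the scalar cost minus the vector cost, each counted in whole instructions. The helper insn_gain is hypothetical and is not the actual code in compute_convert_gain.

```c
/* Same definition as GCC's rtl.h: one instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: gain of converting one instruction, assumed to be
   the scalar cost minus the vector cost, both counted in instructions.  */
static int insn_gain(int scalar_insns, int vector_insns)
{
  return COSTS_N_INSNS(scalar_insns) - COSTS_N_INSNS(vector_insns);
}

/* Convert a gain in cost units back into a whole number of instructions.  */
static int gain_in_insns(int gain)
{
  return gain / COSTS_N_INSNS(1);
}
```

With the corrected counts (two scalar instructions, three AVX2 instructions) the sketch gives insn_gain(2, 3) == -4, matching the "Instruction gain -4" line that follows. Under the same simplified model, the old gain of -16 is what a vector cost of six instructions would produce, and gain_in_insns(-16) == -4 is the "four instructions worse" stated above.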
>
> With more accurate costs in timode_scalar_chain::compute_convert_gain
> we now see (with -O2 -mavx2):
>
>   Instruction gain -4 for     7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
>   Total gain: 9
>   Converting chain #1...
>
> which results in:
>
> foo:    vmovdqa m1(%rip), %xmm0
>         vpslldq $1, %xmm0, %xmm0
>         vpsrad  $8, %xmm0, %xmm1
>         vpsrldq $1, %xmm0, %xmm0
>         vpblendd        $7, %xmm0, %xmm1, %xmm0
>         vmovdqa %xmm0, m1(%rip)
>         ret
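One way to convince oneself that this five-instruction sequence computes the same value as the scalar (m1 << 8) >> 8 is to model it dword by dword in plain C. The sketch below is an illustration, not GCC code. The names scalar_ref and avx2_model are hypothetical. It assumes GCC or Clang on a 64-bit target, since it relies on the __int128 extension, an arithmetic right shift for signed values, and modular conversion of out-of-range unsigned values to signed ones. The blend follows Intel's VPBLENDD definition, in which a set mask bit selects the dword from the second source operand (%xmm0 here, in AT&T order).

```c
#include <stdint.h>

typedef __int128 i128;
typedef unsigned __int128 u128;

/* Scalar reference for foo(): m1 = (m1 << 8) >> 8, i.e. sign-extend the
   low 120 bits.  The left shift is done unsigned to avoid signed-overflow
   UB; the signed right shift is arithmetic under GCC and Clang.  */
static i128 scalar_ref(i128 m1)
{
  return (i128)((u128)m1 << 8) >> 8;
}

/* Dword-by-dword model of the AVX2 sequence emitted for foo().  */
static i128 avx2_model(i128 m1)
{
  u128 x = (u128)m1 << 8;   /* vpslldq $1: whole register left one byte */
  u128 lsr = x >> 8;        /* vpsrldq $1: whole register right one byte, zero fill */
  u128 result = 0;

  for (int i = 0; i < 4; i++) {
    /* vpsrad $8: arithmetic right shift of each signed 32-bit dword of x. */
    int32_t dword = (int32_t)(uint32_t)(x >> (32 * i));
    uint32_t sra = (uint32_t)(dword >> 8);
    uint32_t srl = (uint32_t)(lsr >> (32 * i));
    /* vpblendd $7: mask bits 0-2 are set, so dwords 0-2 come from the
       logical byte shift (%xmm0) and dword 3 from the arithmetic shift
       (%xmm1).  */
    uint32_t lane = ((7 >> i) & 1) ? srl : sra;
    result |= (u128)lane << (32 * i);
  }
  return (i128)result;
}
```

The low three dwords only need the logical byte shift, because bits 0-95 of the result are bits 0-95 of m1 unchanged. Only the top dword needs sign extension from bit 119, which is why a single vpsrad followed by a blend is enough. Comparing avx2_model(v) with scalar_ref(v) over a handful of values (0, -1, 1 << 119, and so on) should report no mismatches.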
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  No new testcase (yet), as the code for both the
> vector and scalar forms of the above function is still suboptimal,
> so code generation is in flux, but this improvement should be a step
> in the right direction.  Ok for mainline?
>
>
> 2024-08-24  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.cc (compute_convert_gain)
>         <case ASHIFTRT>: Update to match ix86_expand_v1ti_ashiftrt.
>
TARGET_AVX2 always implies TARGET_SSE4_1, so there is no need to OR them
together.
>

OK with above change.

Thanks,
Uros.

