Hi Jakub,
On 27 Apr 23:34, Jakub Jelinek wrote:
> Hi!
>
> While AVX512F doesn't contain EVEX encoded vround{ss,sd,ps,pd} instructions,
> it contains vrndscale* which performs the same thing if bits [4:7] of the
> immediate are zero.
>
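
To make the immediate-encoding point above concrete, here is a minimal scalar sketch in plain C of how the two instructions relate. The function names (model_round, model_rndscale, round_by_mode) are hypothetical and are not part of the patch or of GCC. The model assumes imm8[1:0] selects the rounding mode and imm8[7:4] gives the number of fraction bits kept by vrndscale*; it ignores imm8[2] (use the MXCSR rounding mode), imm8[3] (precision-exception suppression), signed zeros, NaNs, and values that do not fit in a long long.

```c
/* Rounding-mode field, imm8[1:0]:
   0 = nearest (ties to even), 1 = down, 2 = up, 3 = toward zero.
   Assumption: x fits in a long long; signed zero is not preserved.  */
static double round_by_mode(double x, unsigned imm8)
{
    double t = (double)(long long)x;   /* truncate toward zero */
    double d = x - t;                  /* fraction, same sign as x */

    switch (imm8 & 3u) {
    case 0:                            /* nearest, ties to even */
        if (d > 0.5 || (d == 0.5 && ((long long)t & 1)))
            return t + 1.0;
        if (d < -0.5 || (d == -0.5 && ((long long)t & 1)))
            return t - 1.0;
        return t;
    case 1:                            /* toward -infinity */
        return t > x ? t - 1.0 : t;
    case 2:                            /* toward +infinity */
        return t < x ? t + 1.0 : t;
    default:                           /* toward zero */
        return t;
    }
}

/* Hypothetical model of vroundsd: round x to an integral value.  */
static double model_round(double x, unsigned imm8)
{
    return round_by_mode(x, imm8);
}

/* Hypothetical model of vrndscalesd: keep M = imm8[7:4] fraction bits,
   i.e. compute round(x * 2^M) / 2^M.  */
static double model_rndscale(double x, unsigned imm8)
{
    unsigned m = (imm8 >> 4) & 0xFu;
    double scale = (double)(1u << m);  /* 2^M, M in 0..15 */
    return round_by_mode(x * scale, imm8) / scale;
}
```

With imm8[7:4] == 0 the scale factor is 1, so model_rndscale reduces to model_round for every rounding mode, which is the property the patch relies on.
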
> For _mm*_round_{ps,pd} we actually already emit vrndscale* for -mavx512f
> instead of vround* unconditionally (because
> <avx512>_rndscale<mode><mask_name><round_saeonly_name>
> instruction has the same RTL as <sse4_1>_round<ssemodesuffix><avxsizesuffix>
> and the former, enabled for TARGET_AVX512F, comes first).  For the scalar
> cases (thus __builtin_round* or _mm*_round_s{s,d}), however, the patterns we
> have don't allow extended registers and thus we end up with unnecessary moves
> if the inputs and/or outputs are or could be most effectively allocated
> in the xmm16+ registers.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
Your patch is OK.
>
> 2016-04-27 Jakub Jelinek <[email protected]>
>
> * config/i386/i386.md (sse4_1_round<mode>2): Add avx512f alternative.
> * config/i386/sse.md (sse4_1_round<ssescalarmodesuffix>): Likewise.
>
> * gcc.target/i386/avx-vround-1.c: New test.
> * gcc.target/i386/avx-vround-2.c: New test.
> * gcc.target/i386/avx512vl-vround-1.c: New test.
> * gcc.target/i386/avx512vl-vround-2.c: New test.
--
Thanks, K