On 14.06.2023 10:10, Hongtao Liu wrote:
> On Wed, Jun 14, 2023 at 1:59 PM Jan Beulich via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> There's no reason to constrain this to AVX512VL, as the wider operation
>> is not usable for more narrow operands only when the possible memory
> But this may require more resources (on the AMD znver4 processor a zmm
> instruction will also be split into 2 uops, right?). And on some Intel
> processors (SKX/CLX) there will be a frequency reduction.

I'm afraid I don't follow: Largely the same AVX512 code would be
generated when passing -mavx512vl, so how can power/performance
considerations matter here? All I'm doing here (and in a few more
patches I'm still in the process of testing) is relax when AVX512
insns can actually be used (reducing the copying between registers
and/or the number of insns needed). My understanding on the Intel
side is that it only matters whether AVX512 insns are used, not
what vector length they are. You may be right about znver4, though.
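
For readers following along, here is a minimal sketch of why copysign can be a single ternary-logic operation: it is a bitwise select, taking the sign bit from one input and every other bit from the other. The sketch emulates a VPTERNLOG-style truth-table lookup on 32-bit scalars. The helper names (ternlog32, select_imm8, copysign_ternlog) and the operand order are assumptions made for illustration only. This is not the code GCC emits, and the actual insn pattern may order its operands (and hence choose its immediate) differently.

```c
#include <stdint.h>
#include <string.h>

/* Sign-bit mask for an IEEE-754 binary32 value. */
#define SIGN_MASK 0x80000000u

/* Emulate a VPTERNLOG-style lookup: at every bit position the three
   input bits form the index (a << 2 | b << 1 | c) into the 8-bit
   truth table imm8.  Hypothetical helper, scalar only.  */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2)
                     | (((b >> i) & 1u) << 1)
                     | ((c >> i) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << i;
    }
    return r;
}

/* Build the truth table for the bitwise select "c ? b : a".  */
static uint8_t select_imm8(void)
{
    uint8_t imm = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        unsigned a = (idx >> 2) & 1u, b = (idx >> 1) & 1u, c = idx & 1u;
        if (c ? b : a)
            imm |= (uint8_t)(1u << idx);
    }
    return imm;
}

/* copysign as one three-input operation: magnitude bits come from
   MAG, the sign bit comes from SIGN, selected by SIGN_MASK.  */
static float copysign_ternlog(float mag, float sign)
{
    uint32_t m, s, r;
    float out;
    memcpy(&m, &mag, sizeof m);
    memcpy(&s, &sign, sizeof s);
    r = ternlog32(m, s, SIGN_MASK, select_imm8());
    memcpy(&out, &r, sizeof out);
    return out;
}
```

With this operand order the select table works out to 0xD8. The point of the sketch is only that mask, magnitude and sign combine in one step rather than in a separate AND, ANDN and OR.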

Nevertheless I agree ...

> If it needs to be done, it is better guarded with
> !TARGET_PREFER_AVX256; at least when the micro-architecture is
> AVX256_OPTIMAL or the user explicitly uses -mprefer-vector-width=256,
> we don't want to produce any zmm instruction by surprise. (Although
> -mprefer-vector-width=256 is meant for the auto-vectorizer, backend
> codegen also uses it in such cases, e.g. *movsf_internal alternative 5
> uses zmm only when TARGET_AVX512F && !TARGET_PREFER_AVX256.)

... that respecting such overrides is probably desirable, so I'll
adjust.
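
A rough model of the guard being discussed, written as a standalone predicate. The struct, enum and function names are hypothetical and only mirror the TARGET_AVX512F, TARGET_AVX512VL and TARGET_PREFER_AVX256 conditions named above. This is not GCC source, and the eventual patch may express the condition differently.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags named in the thread.  */
struct target_flags {
    bool avx512f;        /* models TARGET_AVX512F */
    bool avx512vl;       /* models TARGET_AVX512VL */
    bool prefer_avx256;  /* models TARGET_PREFER_AVX256 */
};

enum ternlog_width {
    TERNLOG_NONE,    /* do not use the ternary-logic form */
    TERNLOG_NATIVE,  /* use the operand's own 128/256-bit width */
    TERNLOG_ZMM      /* widen the operation to a 512-bit register */
};

/* Decide how a narrow operand could use the ternary-logic insn.  */
static enum ternlog_width ternlog_width_for(const struct target_flags *t)
{
    if (!t->avx512f)
        return TERNLOG_NONE;
    if (t->avx512vl)
        return TERNLOG_NATIVE;  /* narrow encodings exist, no widening */
    if (!t->prefer_avx256)
        return TERNLOG_ZMM;     /* widening allowed only without the override */
    return TERNLOG_NONE;        /* respect -mprefer-vector-width=256 */
}
```

Under this model, a target with AVX512F but without AVX512VL gets the widened form unless a 256-bit preference is in effect, which is the override mentioned above.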

Jan

>> source is a non-broadcast one. This way even the scalar copysign<mode>3
>> can benefit from the operation being a single-insn one (leaving aside
>> moves which the compiler decides to insert for unclear reasons, and
>> leaving aside the fact that bcst_mem_operand() is too restrictive for
>> broadcast to be embedded right into VPTERNLOG*).
>>
>> Along with this also request value duplication in
>> ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
>> excess space allocation in .rodata.*, filled with zeros which are never
>> read.
>>
>> gcc/
>>
>>         * config/i386/i386-expand.cc (ix86_expand_copysign): Request
>>         value duplication by ix86_build_signbit_mask() when AVX512F and
>>         not HFmode.
>>         * config/i386/sse.md (*<avx512>_vternlog<mode>_all): Convert to
>>         2-alternative form. Adjust "mode" attribute. Add "enabled"
>>         attribute.
>>         (*<avx512>_vpternlog<mode>_1): Relax to just TARGET_AVX512F.
>>         (*<avx512>_vpternlog<mode>_2): Likewise.
>>         (*<avx512>_vpternlog<mode>_3): Likewise.
