https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112992

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>:

https://gcc.gnu.org/g:6a67fdcb3f0cc8be47b49ddd246d0c50c3770800

commit r14-7026-g6a67fdcb3f0cc8be47b49ddd246d0c50c3770800
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Tue Jan 9 08:28:42 2024 +0000

    i386: PR target/112992: Optimize mode for broadcast of constants.

    When initializing vectors by broadcasting integer constants, the
    compiler is free to select the most appropriate vector mode in which
    to perform the broadcast, as long as the resulting vector has an
    identical bit pattern.  This patch takes advantage of that freedom.
    For example, the following constants are all equivalent:
    V4SImode {0x01010101, 0x01010101, 0x01010101, 0x01010101 }
    V8HImode {0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 }
    V16QImode {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, ... 0x01 }
    So instruction sequences that construct any of these can be used to
    construct the others (with a suitable cast/SUBREG).
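
A minimal sketch of the equivalence described above, in portable C rather
than GCC's RTL.  The helper names (broadcast_u8, broadcast_u16,
broadcast_u32, broadcasts_match) are hypothetical and are not part of the
patch; they only model filling a 128-bit vector from elements of three
different widths and comparing the resulting bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: fill a 16-byte buffer the way a broadcast of
   one element of the given width would fill a 128-bit vector.  */
static void broadcast_u8(unsigned char out[16], uint8_t v) {
    for (int i = 0; i < 16; i++)
        memcpy(out + i, &v, sizeof v);       /* models V16QImode */
}

static void broadcast_u16(unsigned char out[16], uint16_t v) {
    for (int i = 0; i < 8; i++)
        memcpy(out + 2 * i, &v, sizeof v);   /* models V8HImode */
}

static void broadcast_u32(unsigned char out[16], uint32_t v) {
    for (int i = 0; i < 4; i++)
        memcpy(out + 4 * i, &v, sizeof v);   /* models V4SImode */
}

/* Returns 1 when all three broadcasts produce the same 128 bits.  */
static int broadcasts_match(void) {
    unsigned char q[16], h[16], s[16];
    broadcast_u8(q, 0x01);
    broadcast_u16(h, 0x0101);
    broadcast_u32(s, 0x01010101u);
    return memcmp(q, h, sizeof q) == 0 && memcmp(h, s, sizeof h) == 0;
}
```

Because every byte of each constant is the same, the comparison does not
depend on byte order.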

    On x86_64, it turns out that broadcasts of SImode constants are preferred,
    as DImode constants often require a longer movabs instruction, and
    HImode and QImode broadcasts require multiple uops on some architectures.
    Hence, SImode is always the shortest/fastest implementation, or tied for it.
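
The arithmetic behind choosing an SImode constant can be sketched as
follows.  This is an illustration only, not the code in i386-expand.cc:
the function names are hypothetical, and plain C integers stand in for
the RTL constants that the real implementation manipulates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a 64-bit broadcast constant can be replaced by a
   32-bit one only if its two 32-bit halves are identical.  Returns 1
   and stores the 32-bit constant on success, 0 otherwise.  */
static int narrow_u64_to_u32(uint64_t c, uint32_t *narrow) {
    uint32_t lo = (uint32_t)c;
    uint32_t hi = (uint32_t)(c >> 32);
    if (lo != hi)
        return 0;
    *narrow = lo;
    return 1;
}

/* Hypothetical: widen a narrow broadcast constant by duplicating it
   into the upper half of the next wider element.  */
static uint16_t widen_u8_to_u16(uint8_t c) {
    return (uint16_t)(((unsigned)c << 8) | c);
}

static uint32_t widen_u16_to_u32(uint16_t c) {
    return ((uint32_t)c << 16) | c;
}
```

For instance, narrow_u64_to_u32 applied to 0x000c000c000c000c yields
0x000c000c, and widening the byte 0x0c twice yields 0x0c0c0c0c.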

    Examples of this improvement can be seen in the testsuite.

    gcc.target/i386/pr102021.c:
    Before:
       0:   48 b8 0c 00 0c 00 0c    movabs $0xc000c000c000c,%rax
       7:   00 0c 00
       a:   62 f2 fd 28 7c c0       vpbroadcastq %rax,%ymm0
      10:   c3                      retq

    After:
       0:   b8 0c 00 0c 00          mov    $0xc000c,%eax
       5:   62 f2 7d 28 7c c0       vpbroadcastd %eax,%ymm0
       b:   c3                      retq

    and
    gcc.target/i386/pr90773-17.c:
    Before:
       0:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 7 <foo+0x7>
       7:   b8 0c 00 00 00          mov    $0xc,%eax
       c:   62 f2 7d 08 7a c0       vpbroadcastb %eax,%xmm0
      12:   62 f1 7f 08 7f 02       vmovdqu8 %xmm0,(%rdx)
      18:   c7 42 0f 0c 0c 0c 0c    movl   $0xc0c0c0c,0xf(%rdx)
      1f:   c3                      retq

    After:
       0:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 7 <foo+0x7>
       7:   b8 0c 0c 0c 0c          mov    $0xc0c0c0c,%eax
       c:   62 f2 7d 08 7c c0       vpbroadcastd %eax,%xmm0
      12:   62 f1 7f 08 7f 02       vmovdqu8 %xmm0,(%rdx)
      18:   c7 42 0f 0c 0c 0c 0c    movl   $0xc0c0c0c,0xf(%rdx)
      1f:   c3                      retq

    where according to Agner Fog's instruction tables vpbroadcastd is
    slightly faster than vpbroadcastb on some microarchitectures, for
    example Knights Landing.

    2024-01-09  Roger Sayle  <ro...@nextmovesoftware.com>
                Hongtao Liu  <hongtao....@intel.com>

    gcc/ChangeLog
            PR target/112992
            * config/i386/i386-expand.cc
            (ix86_convert_const_wide_int_to_broadcast): Allow call to
            ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
            (ix86_broadcast_from_constant): Revert recent change; Return a
            suitable MEMREF independently of mode/target combinations.
            (ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
            to decide whether expansion is possible/preferable.  Only try
            forcing DImode constants to memory (and trying again) if calling
            ix86_expand_vector_init_duplicate fails with a DImode immediate
            constant.
            (ix86_expand_vector_init_duplicate) <case E_V2DImode>: Try using
            V4SImode for suitable immediate constants.
            <case E_V4DImode>: Try using V8SImode for suitable constants.
            <case E_V4HImode>: Fail for CONST_INT_P, i.e. use constant pool.
            <case E_V2HImode>: Likewise.
            <case E_V8HImode>: For CONST_INT_P try using V4SImode via widen.
            <case E_V16QImode>: For CONST_INT_P try using V8HImode via widen.
            <label widen>: Handle CONST_INTs via simplify_binary_operation.
            Allow recursive calls to ix86_expand_vector_init_duplicate to fail.
            <case E_V16HImode>: For CONST_INT_P try V8SImode via widen.
            <case E_V32QImode>: For CONST_INT_P try V16HImode via widen.
            (ix86_expand_vector_init): Move try using a broadcast for all_same
            with ix86_expand_vector_init_duplicate before using constant pool.

    gcc/testsuite/ChangeLog
            * gcc.target/i386/auto-init-8.c: Update test case.
            * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
            * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
            * gcc.target/i386/avx512fp16-13.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
            * gcc.target/i386/pr100865-1.c: Likewise.
            * gcc.target/i386/pr100865-10a.c: Likewise.
            * gcc.target/i386/pr100865-10b.c: Likewise.
            * gcc.target/i386/pr100865-2.c: Likewise.
            * gcc.target/i386/pr100865-3.c: Likewise.
            * gcc.target/i386/pr100865-4a.c: Likewise.
            * gcc.target/i386/pr100865-4b.c: Likewise.
            * gcc.target/i386/pr100865-5a.c: Likewise.
            * gcc.target/i386/pr100865-5b.c: Likewise.
            * gcc.target/i386/pr100865-9a.c: Likewise.
            * gcc.target/i386/pr100865-9b.c: Likewise.
            * gcc.target/i386/pr102021.c: Likewise.
            * gcc.target/i386/pr90773-17.c: Likewise.