https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79745

            Bug ID: 79745
           Summary: vec_init<> expander misses V2TImode with AVX and
                    V2OImode and V2TImode with AVX512
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
            Blocks: 65832
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

The AVX case pessimizes CPUv6 x264 when vectorized with 256-bit vectors.  The
vectorizer tries to load the two 128-bit halves and build a 256-bit vector
from them (the halves are separated by a gap) via

              /* Avoid emitting a constructor of vector elements by performing
                 the loads using an integer type of the same size,
                 constructing a vector of those and then re-interpreting it
                 as the original vector type.  This works around the fact
                 that the vec_init optab was only designed for scalar
                 element modes and thus expansion goes through memory.
                 This avoids a huge runtime penalty due to the general
                 inability to perform store forwarding from smaller stores
                 to a larger load.  */
              unsigned lsize
                = group_size * TYPE_PRECISION (TREE_TYPE (vectype));
              enum machine_mode elmode = mode_for_size (lsize, MODE_INT, 0);
              enum machine_mode vmode = mode_for_vector (elmode,
                                                         nunits / group_size);
              /* If we can't construct such a vector fall back to
                 element loads of the original vector type.  */
              if (VECTOR_MODE_P (vmode)
                  && optab_handler (vec_init_optab, vmode) != CODE_FOR_nothing)
                {
                  nloads = nunits / group_size;
                  lnel = group_size;
                  ltype = build_nonstandard_integer_type (lsize, 1);
                  lvectype = build_vector_type (ltype, nloads);
                }
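
For illustration, the construction the vectorizer is attempting can be
written by hand with GNU C vector extensions.  This is only a sketch, not a
reduced testcase from x264: the helper name load_two_halves and its
parameters are made up, and it assumes an x86-64 target where
unsigned __int128 is available, so that the two-element integer vector
corresponds to V2TImode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Original vector type: eight 32-bit ints, 256 bits in total.  */
typedef int32_t v8si __attribute__ ((vector_size (32)));

/* Same-size vector of two 128-bit integers (V2TImode on x86-64).  */
typedef unsigned __int128 v2ti __attribute__ ((vector_size (32)));

_Static_assert (sizeof (v2ti) == sizeof (v8si), "vector sizes must match");

/* Hypothetical helper: load four ints at P and four more at P + 4 + GAP,
   then combine them into one 256-bit vector stored to *OUT.  */
static void
load_two_halves (const int32_t *p, size_t gap, v8si *out)
{
  unsigned __int128 lo, hi;

  /* Each half is loaded as a single 128-bit integer.  */
  memcpy (&lo, p, sizeof lo);
  memcpy (&hi, p + 4 + gap, sizeof hi);

  /* Construct the two-element integer vector ...  */
  v2ti tmp = { lo, hi };

  /* ... and re-interpret it as the original vector type.  */
  *out = (v8si) tmp;
}
```

Compiling something like this with -O2 -mavx and inspecting the assembly
should show whether the { lo, hi } constructor is expanded directly or
goes through a stack temporary.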


See also PR65832 which is broader.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65832
[Bug 65832] Inefficient vector construction
