https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121284

            Bug ID: 121284
           Summary: [14/15/16 Regression] unnecessary memory operations on
                    vector conversion from double to int
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mkretz at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Test case (https://compiler-explorer.com/z/nedfsshjM):

using int8 [[gnu::vector_size(32)]] = int;
using int16 [[gnu::vector_size(64)]] = int;
using double8 [[gnu::vector_size(64)]] = double;

auto g(double8 x, double8 y) {
  return __builtin_convertvector(
      __builtin_shufflevector(x, y, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                              13, 14, 15),
      int16);
}

auto g2(double8 x, double8 y) {
  return __builtin_shufflevector(__builtin_convertvector(x, int8),
                                 __builtin_convertvector(y, int8), 0, 1, 2, 3,
                                 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
}

Compile with -O2 -march=skylake-avx512.

g and g2 are equivalent functions. GCC 13.4 produces the expected result for g:

        vcvttpd2dq      ymm1, zmm1
        vcvttpd2dq      ymm0, zmm0
        vinserti64x4    zmm0, zmm0, ymm1, 0x1
        ret

g2 isn't too bad: not perfect, but it doesn't regress. Starting with GCC 14,
however, g is translated into an unnecessary sequence of stores, shuffles and
loads before the conversion instructions.

Similar issue for AVX2 (https://compiler-explorer.com/z/EGoW4x5Wc):

using int4 [[gnu::vector_size(16)]] = int;
using int8 [[gnu::vector_size(32)]] = int;
using double4 [[gnu::vector_size(32)]] = double;

auto g(double4 x, double4 y) {
  return __builtin_convertvector(
      __builtin_shufflevector(x, y, 0, 1, 2, 3, 4, 5, 6, 7), int8);
}

auto g2(double4 x, double4 y) {
  return __builtin_shufflevector(__builtin_convertvector(x, int4),
                                 __builtin_convertvector(y, int4), 0, 1, 2, 3,
                                 4, 5, 6, 7);
}


Similar issue for SSE4 (https://compiler-explorer.com/z/nra3KoKjK):

using int2 [[gnu::vector_size(8)]] = int;
using int4 [[gnu::vector_size(16)]] = int;
using double2 [[gnu::vector_size(16)]] = double;

auto g(double2 x, double2 y) {
  return __builtin_convertvector(__builtin_shufflevector(x, y, 0, 1, 2, 3),
                                 int4);
}

auto g2(double2 x, double2 y) {
  return __builtin_shufflevector(__builtin_convertvector(x, int2),
                                 __builtin_convertvector(y, int2), 0, 1, 2, 3);
}
