https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121284
Bug ID: 121284
Summary: [14/15/16 Regression] unnecessary memory operations on vector conversion from double to int
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: mkretz at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

Test case (https://compiler-explorer.com/z/nedfsshjM):

    using int8 [[gnu::vector_size(32)]] = int;
    using int16 [[gnu::vector_size(64)]] = int;
    using double8 [[gnu::vector_size(64)]] = double;

    auto g(double8 x, double8 y)
    {
      return __builtin_convertvector(__builtin_shufflevector(x, y, 0, 1, 2, 3, 4, 5, 6, 7,
                                                             8, 9, 10, 11, 12, 13, 14, 15),
                                     int16);
    }

    auto g2(double8 x, double8 y)
    {
      return __builtin_shufflevector(__builtin_convertvector(x, int8),
                                     __builtin_convertvector(y, int8),
                                     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    }

Compile with -O2 -march=skylake-avx512. g and g2 are equivalent functions. GCC 13.4 produces the expected result for g:

    vcvttpd2dq ymm1, zmm1
    vcvttpd2dq ymm0, zmm0
    vinserti64x4 zmm0, zmm0, ymm1, 0x1
    ret

The code generated for g2 is not too bad, though not optimal, and it has not regressed. Starting with GCC 14, however, g is compiled into an unnecessary sequence of stores, shuffles, and loads before the conversion instructions.
A similar issue occurs with AVX2 (https://compiler-explorer.com/z/EGoW4x5Wc):

    using int4 [[gnu::vector_size(16)]] = int;
    using int8 [[gnu::vector_size(32)]] = int;
    using double4 [[gnu::vector_size(32)]] = double;

    auto g(double4 x, double4 y)
    {
      return __builtin_convertvector(
          __builtin_shufflevector(x, y, 0, 1, 2, 3, 4, 5, 6, 7), int8);
    }

    auto g2(double4 x, double4 y)
    {
      return __builtin_shufflevector(__builtin_convertvector(x, int4),
                                     __builtin_convertvector(y, int4),
                                     0, 1, 2, 3, 4, 5, 6, 7);
    }

and with SSE4 (https://compiler-explorer.com/z/nra3KoKjK):

    using int2 [[gnu::vector_size(8)]] = int;
    using int4 [[gnu::vector_size(16)]] = int;
    using double2 [[gnu::vector_size(16)]] = double;

    auto g(double2 x, double2 y)
    {
      return __builtin_convertvector(__builtin_shufflevector(x, y, 0, 1, 2, 3), int4);
    }

    auto g2(double2 x, double2 y)
    {
      return __builtin_shufflevector(__builtin_convertvector(x, int2),
                                     __builtin_convertvector(y, int2),
                                     0, 1, 2, 3);
    }