https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92188

            Bug ID: 92188
           Summary: Cannot merge memory write for
                    _mm_cvtps_ph/_mm256_cvtps_ph and x86-64
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fredrik987 at gmail dot com
  Target Milestone: ---

Created attachment 47089
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47089&action=edit
Test code

For the following code, GCC does not merge the memory write into vcvtps2ph and
emits a separate store.

void test1(__m128i *x, const __m256 *y)
{
    // Cannot merge memory write
    *x = _mm256_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION);
}

  ...
  vcvtps2ph $4, %ymm0, %xmm0
  vmovaps %xmm0, (%rdi)
  ...

A workaround is to change the output type to __v8hi:

void test2(__v8hi *x, const __m256 *y)
{
    // Memory write merged
    *x = (__v8hi)_mm256_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION);
}

  ...
  vcvtps2ph $4, %ymm0, (%rdi)
  ...

However, this workaround does not work for the 128-bit variant of vcvtps2ph.

void test4(__v4hi *x, const __m128 *y)
{
    // Cannot merge memory write
    *x = (__v4hi)(((__v2di)_mm_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION))[0]);
}

  ...
  vcvtps2ph $4, %xmm0, %xmm0
  vmovq %xmm0, (%rdi)
  ...

The opposite problem exists for, e.g., _mm256_extracti128_si256: the memory
write is normally merged, but not when the output type is __v8hi.

void test6(__v8hi *x, const __m256i *y)
{
    // Cannot merge memory write
    *x = (__v8hi)_mm256_extracti128_si256(*y, 1);
}

  ...
  vextracti128 $0x1, %ymm0, %xmm0
  vmovaps %xmm0, (%rdi)
  ...

It would be good if all variants behaved the same, with the memory write merged.

I compiled with "-O3 -march=core-avx2" on Compiler Explorer.
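
The sketch below is one possible self-contained way to confirm that the __m128i store (test1) and the __v8hi workaround (test2) write identical bytes, so that they differ only in code generation. It is not the attached test code. The names store_as_m128i, store_as_v8hi, have_avx_f16c and variants_agree are hypothetical, and the target attributes and the CPUID check are assumptions added so the file builds and runs without -march=core-avx2. It checks stored values only; whether the store is merged still has to be read from the -S output.

```c
#include <assert.h>
#include <cpuid.h>
#include <immintrin.h>
#include <string.h>

/* Store pattern from test1 in the report: destination type is __m128i. */
__attribute__((target("avx,f16c")))
static void store_as_m128i(__m128i *x, const __m256 *y)
{
    *x = _mm256_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION);
}

/* Store pattern from test2 in the report: destination type is __v8hi. */
__attribute__((target("avx,f16c")))
static void store_as_v8hi(__v8hi *x, const __m256 *y)
{
    *x = (__v8hi)_mm256_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION);
}

/* Hypothetical helper: returns 1 if AVX is usable and CPUID reports F16C. */
static int have_avx_f16c(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__builtin_cpu_supports("avx"))
        return 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 29) & 1;  /* CPUID leaf 1, ECX bit 29: F16C */
}

/* Returns 1 if both variants store the same 16 bytes, 0 if they differ,
   and -1 if the CPU cannot run the conversion at all. */
int variants_agree(void)
{
    float in[8] __attribute__((aligned(32))) =
        { 0.0f, 1.0f, -2.0f, 0.5f, 65504.0f, 1e-8f, 3.14159f, -1234.5f };
    __m128i a;
    __v8hi b;

    assert(sizeof a == sizeof b);
    if (!have_avx_f16c())
        return -1;

    store_as_m128i(&a, (const __m256 *)in);
    store_as_v8hi(&b, (const __m256 *)in);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Calling variants_agree() from a small main and compiling the same file with -O3 -S gives both the value check and the assembly for the two store patterns side by side.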
