https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92188
Bug ID: 92188
Summary: Cannot merge memory write for _mm_cvtps_ph/_mm256_cvtps_ph and x86-64
Product: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: fredrik987 at gmail dot com
Target Milestone: ---
Created attachment 47089
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47089&action=edit
Test code
For the following code, GCC does not fold the memory write into vcvtps2ph; it
emits a separate vmovaps store instead:
void test1(__m128i *x, const __m256 *y)
{
    // Cannot merge memory write
    *x = _mm256_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION);
}
...
vcvtps2ph $4, %ymm0, %xmm0
vmovaps %xmm0, (%rdi)
...
A workaround is to change the output type to __v8hi:
void test2(__v8hi *x, const __m256 *y)
{
    // Memory write merged
    *x = (__v8hi)_mm256_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION);
}
...
vcvtps2ph $4, %ymm0, (%rdi)
...
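To make clear that the cast in test2 only changes the type seen by the compiler and not the stored data, here is a minimal sketch that runs both functions on the same input and compares the 16 bytes they write. The helper names stores_match and stores_match_impl are hypothetical and not part of the attached test code. The per-function target attributes stand in for -march=core-avx2, and the runtime check assumes that a CPU reporting AVX2 also provides F16C.

```c
#include <assert.h>
#include <immintrin.h>
#include <string.h>

/* test1 and test2 as in the report. The target attribute replaces the
   -march=core-avx2 command-line flag for this sketch only. */
__attribute__((target("avx,f16c")))
void test1(__m128i *x, const __m256 *y)
{
    *x = _mm256_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION);
}

__attribute__((target("avx,f16c")))
void test2(__v8hi *x, const __m256 *y)
{
    *x = (__v8hi)_mm256_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION);
}

/* Hypothetical helper: runs both variants on one input vector and
   compares the bytes each of them stored. */
static __attribute__((target("avx,f16c"))) int stores_match_impl(void)
{
    __m256 in = _mm256_setr_ps(0.0f, 1.0f, -2.0f, 0.5f,
                               3.25f, -8.0f, 100.0f, 65504.0f);
    __m128i a;
    __v8hi b;

    assert(sizeof a == sizeof b);   /* both outputs are 16 bytes wide */
    test1(&a, &in);
    test2(&b, &in);

    /* 1.0f converts exactly to the half-precision bit pattern 0x3C00. */
    return memcmp(&a, &b, sizeof a) == 0 && b[1] == 0x3C00;
}

/* Returns 1 if both variants stored identical bytes, 0 if they differ,
   and -1 if the CPU cannot run the test.
   Assumption: a CPU that reports AVX2 also implements F16C. */
int stores_match(void)
{
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    return stores_match_impl();
}
```

Calling stores_match() from main should return 1 on hardware that supports these instructions, so the two functions differ only in the code generated for the store.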
However, the same kind of cast does not help for the 128-bit variant of
vcvtps2ph:
void test4(__v4hi *x, const __m128 *y)
{
    // Cannot merge memory write
    *x = (__v4hi)(((__v2di)_mm_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION))[0]);
}
...
vcvtps2ph $4, %xmm0, %xmm0
vmovq %xmm0, (%rdi)
...
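For reference, the memory form of the 128-bit vcvtps2ph writes 64 bits, which is exactly what test4 stores: the low quadword of the conversion result, holding four half-precision values. The sketch below checks that property at run time. The helper names low_half_matches and low_half_matches_impl are hypothetical, the target attributes stand in for -march=core-avx2, and the runtime check assumes that a CPU reporting AVX2 also provides F16C.

```c
#include <assert.h>
#include <immintrin.h>
#include <string.h>

/* test4 as in the report. The target attribute replaces the
   -march=core-avx2 command-line flag for this sketch only. */
__attribute__((target("avx,f16c")))
void test4(__v4hi *x, const __m128 *y)
{
    *x = (__v4hi)(((__v2di)_mm_cvtps_ph(*y, _MM_FROUND_CUR_DIRECTION))[0]);
}

/* Hypothetical helper: compares the 8 bytes stored by test4 with the
   low 8 bytes of the full 128-bit conversion result. */
static __attribute__((target("avx,f16c"))) int low_half_matches_impl(void)
{
    __m128 in = _mm_setr_ps(1.0f, -2.0f, 0.5f, 3.25f);
    __m128i full = _mm_cvtps_ph(in, _MM_FROUND_CUR_DIRECTION);
    __v4hi out;

    assert(sizeof out == 8);        /* four 16-bit halves */
    test4(&out, &in);

    /* 1.0f converts exactly to the half-precision bit pattern 0x3C00. */
    return memcmp(&out, &full, sizeof out) == 0 && out[0] == 0x3C00;
}

/* Returns 1 if test4 stored the low quadword of the result, 0 if not,
   and -1 if the CPU cannot run the test.
   Assumption: a CPU that reports AVX2 also implements F16C. */
int low_half_matches(void)
{
    if (!__builtin_cpu_supports("avx2"))
        return -1;
    return low_half_matches_impl();
}
```

A return value of 1 from low_half_matches() means a single 64-bit store of the conversion result would write the same bytes as the vcvtps2ph plus vmovq pair shown above.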
The opposite problem exists for e.g. _mm256_extracti128_si256: its memory write
is normally merged, but not when the output type is __v8hi.
void test6(__v8hi *x, const __m256i *y)
{
    // Cannot merge memory write
    *x = (__v8hi)_mm256_extracti128_si256(*y, 1);
}
...
vextracti128 $0x1, %ymm0, %xmm0
vmovaps %xmm0, (%rdi)
...
It would be good if all variants behaved the same, with the memory write merged
into the instruction.
I compiled with "-O3 -march=core-avx2" (using Compiler Explorer).