Hi,
I've been trying to write plain C that GCC will auto-vectorize on
Intel x86 machines (without using the intrinsics in immintrin.h), but
I can't get it to emit the concise `vpmovzxbd` or similar
instructions.
I need to convert 8 `uint8_t` elements to `int32_t` and print the
result. With the `_mm256_cvtepu8_epi32` intrinsic from immintrin.h,
the code is as follows:
int immintrin () {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);
    __v8si b = (__v8si)_mm256_cvtepu8_epi32(*(__m128i *)(a + offset));
    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
}
Compiled with -mavx2 -O3, this produces concise and efficient
instructions. (You can see it here: https://godbolt.org/z/8ojzdav47)
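One side note on the intrinsic version: the cast `*(__m128i *)(a + offset)` reads a full 16-byte, 16-byte-aligned type from an address that is not aligned, although only the low 8 bytes are needed. A minimal sketch of the same conversion with an explicit 8-byte load is shown below. The function name `widen8_avx2` is made up for illustration, and the `target("avx2")` attribute is only there so the snippet builds without -mavx2; the caller is assumed to have checked that the CPU supports AVX2. Whether the load gets folded into `vpmovzxbd` itself depends on the GCC version.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero-extend 8 bytes at src into 8 int32_t values.
 * target("avx2") enables AVX2 for this function only (an assumption made
 * so that the file also compiles without -mavx2). */
__attribute__((target("avx2")))
void widen8_avx2(const uint8_t *src, int32_t out[8])
{
    assert(src != NULL && out != NULL);

    /* Load exactly 8 bytes into the low half of an XMM register;
     * _mm_loadl_epi64 has no alignment requirement. */
    __m128i lo = _mm_loadl_epi64((const __m128i *)src);

    /* Zero-extend the 8 low bytes to eight 32-bit lanes. */
    __m256i wide = _mm256_cvtepu8_epi32(lo);

    /* Unaligned store, so out does not need 32-byte alignment. */
    _mm256_storeu_si256((__m256i *)out, wide);
}
```

Calling `widen8_avx2(a + offset, out)` and printing `out[0]` through `out[7]` would stand in for the loop over `b[i]`.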
But if I replace the intrinsic with a for-loop or with GCC's
`__builtin_convertvector`, I do not get the same result. The code is
as follows:
typedef uint8_t v8qiu __attribute__ ((__vector_size__ (8)));
int forloop () {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);
    v8qiu av = *(v8qiu *)(a + offset);
    __v8si b = {};
    for (int i = 0; i < 8; i++) {
        b[i] = (a + offset)[i];
    }
    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
}
int builtin_cvt () {
    int size = 10000, offset = 3;
    uint8_t* a = malloc(sizeof(char) * size);
    v8qiu av = *(v8qiu *)(a + offset);
    __v8si b = __builtin_convertvector(av, __v8si);
    for (int i = 0; i < 8; i++) {
        printf("%d\n", b[i]);
    }
}
Both functions compile to long, redundant instruction sequences that
are much harder to read than the output of calling
`_mm256_cvtepu8_epi32` directly. (You can see it at the same link:
https://godbolt.org/z/8ojzdav47)
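For reference, here is a minimal sketch of the `__builtin_convertvector` variant with the pointer casts replaced by `memcpy`, so the 8-byte load and the 32-byte store carry no alignment or aliasing assumptions. The names `u8x8`, `i32x8` and `widen8_generic` are made up for illustration. This only pins down the semantics (zero-extension of each byte); it is not a claim that GCC will select `vpmovzxbd` for it, which may depend on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector typedefs using the GCC/Clang vector extension. */
typedef uint8_t u8x8 __attribute__((vector_size(8)));
typedef int32_t i32x8 __attribute__((vector_size(32)));

/* Hypothetical helper: widen 8 unsigned bytes at src to 8 int32_t values. */
void widen8_generic(const uint8_t *src, int32_t out[8])
{
    assert(src != NULL && out != NULL);

    /* memcpy expresses an unaligned 8-byte load without a pointer cast. */
    u8x8 narrow;
    memcpy(&narrow, src, sizeof narrow);

    /* Element-wise conversion; uint8_t -> int32_t zero-extends. */
    i32x8 wide = __builtin_convertvector(narrow, i32x8);

    /* Copy the 32-byte result out, again with no alignment requirement. */
    memcpy(out, &wide, sizeof wide);
}
```

Compiling this with -O3 -mavx2 and comparing the output against the intrinsic version on Compiler Explorer would show whether a given GCC release handles it any better than the functions above.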
My question is: how should I write the source code so that GCC emits
assembly similar to what a direct call to `_mm256_cvtepu8_epi32`
produces?
Or would it be easier to modify the GIMPLE directly? As far as I can
tell, GIMPLE has no expression or interface that corresponds directly
to `vpmovzxbd`.
Thanks
Hanke Zhang