Hi,
For the code below, x86_64 is able to vectorize the loop.
#define LEN 32000
__attribute__((aligned(32))) float a[LEN], b[LEN], c[LEN];
void test()
{
    for (int i = 0; i < LEN; i++) {
        if (b[i] > (float)0.) {
            a[i] = b[i];
        }
    }
}
x86_64 ASM:
.L2:
        vmovaps    b(%rax), %ymm0
        vcmpltps   %ymm0, %ymm2, %ymm1     ⇐ set mask
        vmaskmovps %ymm0, %ymm1, a(%rax)   ⇐ store b[i] to a[i] where the mask is true
        addq       $32, %rax
        cmpq       $128000, %rax
        jne        .L2
On AArch64, we have BIT and the vector floating-point compare
instructions (FCMLT, FCMGT).
Is it possible to vectorize the loop like this?
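        ldr   q1, [x1]                  ⇐ load b[i..i+3]
        ldr   q2, [x2]                  ⇐ load a[i..i+3] (x2 assumed to point to a)
        fcmgt v0.4s, v1.4s, #0.0        ⇐ set mask where b[i] > 0
        bit   v2.16b, v1.16b, v0.16b    ⇐ select
        str   q2, [x2]                  ⇐ store back to a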
The BIT instruction accepts only the 8B or 16B arrangement. But my
assumption is that it copies bit by bit, and that the compare sets all
bits of the corresponding element to 1 if the condition is true and to 0
otherwise.
If so, we can use BIT for any mode.
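To make that assumption concrete, here is a minimal scalar C sketch of what the compare and BIT would do for one 32-bit lane. The helper names (lane_mask_gt_zero, lane_bit_insert, select_copy) are made up for illustration; they are not intrinsics or compiler internals, and the code only emulates the semantics as I understand them rather than using real NEON instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Emulates one lane of a vector compare against zero:
   all bits set if x > 0, all bits clear otherwise. */
static uint32_t lane_mask_gt_zero(float x)
{
    return (x > 0.0f) ? UINT32_MAX : 0u;
}

/* Emulates BIT on one lane: where a mask bit is 1, take the bit from src;
   where it is 0, keep the bit already in dst. */
static float lane_bit_insert(float dst, float src, uint32_t mask)
{
    uint32_t d, s;
    memcpy(&d, &dst, sizeof d);   /* reinterpret the float bits safely */
    memcpy(&s, &src, sizeof s);
    d = (d & ~mask) | (s & mask);
    memcpy(&dst, &d, sizeof d);
    return dst;
}

/* Hypothetical helper: the loop from test() rewritten as
   load / compare / select / store for each element. */
static void select_copy(float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++) {
        uint32_t mask = lane_mask_gt_zero(b[i]);
        a[i] = lane_bit_insert(a[i], b[i], mask);
    }
}
```

Since the insert is purely bitwise, the element size only matters for how the mask is generated. Note that this form stores to every a[i], including elements whose value is unchanged, whereas vmaskmovps only writes the selected elements.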
Regards,
Venkat.