On 2023/2/25 at 3:24 AM, Richard Henderson wrote:
On 2/23/23 21:24, gaosong wrote:
I was wrong, the instruction is to sign-extend the odd or even
elements of the vector before the operation, not to sign-extend the
result.
E.g.
vaddwev_h_b vd, vj, vk
vd->H[i] = SignExtend(vj->B[2i]) + SignExtend(vk->B[2i]);
vaddwev_w_h vd, vj, vk
vd->W[i] = SignExtend(vj->H[2i]) + SignExtend(vk->H[2i]);
vaddwev_d_w vd, vj, vk
vd->D[i] = SignExtend(vj->W[2i]) + SignExtend(vk->W[2i]);
vaddwev_q_d vd, vj, vk
vd->Q[i] = SignExtend(vj->D[2i]) + SignExtend(vk->D[2i]);
Ok, good example.
Sorry, my description was not comprehensive.
vaddwev_w_h vd, vj, vk
...
for i in range(4):
vd->W[i] = SignExtend(vj->H[2i], 32) + SignExtend(vk->H[2i], 32);
...
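As a cross-check for the pseudocode above, here is a minimal scalar sketch of the same operation in plain C. The function name ref_vaddwev_w_h and the array-based vector layout are hypothetical, chosen only for illustration; it assumes a 128-bit vector with element 0 in the least significant bits.

```c
#include <stdint.h>

/*
 * Hypothetical scalar model of vaddwev_w_h (not QEMU code).
 * vj and vk each hold eight 16-bit elements; vd receives four 32-bit sums.
 */
static void ref_vaddwev_w_h(int32_t vd[4], const int16_t vj[8],
                            const int16_t vk[8])
{
    for (int i = 0; i < 4; i++) {
        /* Converting int16_t to int32_t sign-extends the even element. */
        vd[i] = (int32_t)vj[2 * i] + (int32_t)vk[2 * i];
    }
}
```

With every even element of vj set to 0xfffe (that is, -2) and vk set to zero, each 32-bit result in this model is 0xfffffffe.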
static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a,
TCGv_vec b)
{
TCGv_vec t1 = tcg_temp_new_vec_matching(a);
TCGv_vec t2 = tcg_temp_new_vec_matching(b);
int halfbits = 4 << vece;
/* Sign-extend even elements from a */
tcg_gen_dupi_vec(vece, t1, MAKE_64BIT_MASK(0, halfbits));
tcg_gen_and_vec(vece, a, a, t1);
No need to mask off these bits...
I am not sure, but the result is not correct, which is strange.
For example, with the following code,
where vece is MO_32:
static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
    TCGv_vec t1 = tcg_temp_new_vec_matching(a);
    TCGv_vec t2 = tcg_temp_new_vec_matching(b);
    int halfbits = 4 << vece;
    tcg_gen_shli_vec(vece, t1, a, halfbits);
    tcg_gen_shri_vec(vece, t1, t1, halfbits);
    tcg_gen_shli_vec(vece, t2, b, halfbits);
    tcg_gen_shri_vec(vece, t2, t2, halfbits);
    tcg_gen_add_vec(vece, t, t1, t2);
    tcg_temp_free_vec(t1);
    tcg_temp_free_vec(t2);
}
...
op[MO_16];
{
.fniv = gen_vaddwev_s,
.fno = gen_helper_vaddwev_w_h,
.opt_opc = vecop_list,
.vece = MO_32
},
...
TRANS(vaddwev_w_h, gvec_vvv, MO_16, gvec_vaddwev_s)
input : 0x ffff fffe ffff fffe ffff fffe ffff fffe + 0
output : 0x 0000 fffe 0000 fffe 0000 fffe 0000 fffe
the correct result is 0x fffffffe fffffffe fffffffe fffffffe.
Thanks.
Song Gao
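One plausible reading of the output reported above, offered as an assumption rather than a confirmed diagnosis: the test code uses tcg_gen_shri_vec, a logical right shift, where the original patch used tcg_gen_sari_vec, an arithmetic right shift. A minimal scalar sketch of the two shift sequences on a single 32-bit lane follows. The helper names lane_shl_shr and lane_shl_sar are hypothetical, and the arithmetic shift is emulated with unsigned arithmetic so that the result does not depend on how a given compiler shifts negative values.

```c
#include <stdint.h>

/* Shift left, then logical shift right: zero-extends the low half. */
static uint32_t lane_shl_shr(uint32_t x, unsigned halfbits)
{
    return (x << halfbits) >> halfbits;
}

/* Shift left, then arithmetic shift right: sign-extends the low half. */
static uint32_t lane_shl_sar(uint32_t x, unsigned halfbits)
{
    uint32_t v = x << halfbits;
    uint32_t r = v >> halfbits;
    /* Replicate the sign bit of v into the vacated upper bits. */
    uint32_t fill = (v & 0x80000000u) ? ~(UINT32_MAX >> halfbits) : 0u;
    return r | fill;
}
```

For the lane value 0xfffffffe with halfbits = 16, lane_shl_shr yields 0x0000fffe, which matches the reported output, while lane_shl_sar yields 0xfffffffe, which matches the expected result.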
tcg_gen_shli_vec(vece, a, a, halfbits);
... because they shift out here anyway.
tcg_gen_sari_vec(vece, a, a, halfbits);
/* Sign-extend even elements from b */
tcg_gen_dupi_vec(vece, t2, MAKE_64BIT_MASK(0, halfbits));
tcg_gen_and_vec(vece, b, b, t2);
tcg_gen_shli_vec(vece, b, b, halfbits);
tcg_gen_sari_vec(vece, b, b, halfbits);
tcg_gen_add_vec(vece, t, a, b);
tcg_temp_free_vec(t1);
tcg_temp_free_vec(t2);
}
Otherwise this looks good.