On 08/23/2018 08:21 AM, Peter Maydell wrote: > On 9 August 2018 at 05:21, Richard Henderson > <[email protected]> wrote: >> The 16-byte load only uses 16 predicate bits. But while >> reusing the other load infrastructure, we find other bits >> that are set and trigger an assert. To avoid this and >> retain the assert, zero-extend the predicate that we pass >> to the LD1 helper. >> >> Reported-by: Laurent Desnogues <[email protected]> >> Signed-off-by: Richard Henderson <[email protected]> >> --- >> target/arm/translate-sve.c | 25 +++++++++++++++++++++++-- >> 1 file changed, 23 insertions(+), 2 deletions(-) >> >> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c >> index d27bc8c946..bef6b8242d 100644 >> --- a/target/arm/translate-sve.c >> +++ b/target/arm/translate-sve.c >> @@ -4765,12 +4765,33 @@ static void do_ldrq(DisasContext *s, int zt, int pg, >> TCGv_i64 addr, int msz) >> unsigned vsz = vec_full_reg_size(s); >> TCGv_ptr t_pg; >> TCGv_i32 desc; >> + int poff; >> >> /* Load the first quadword using the normal predicated load helpers. */ >> desc = tcg_const_i32(simd_desc(16, 16, zt)); >> - t_pg = tcg_temp_new_ptr(); >> >> - tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg)); >> + poff = pred_full_reg_offset(s, pg); >> + if (vsz > 16) { >> + /* >> + * Zero-extend the first 16 bits of the predicate into a temporary. >> + * This avoids triggering an assert making sure we don't have bits >> + * set within a predicate beyond VQ, but we have lowered VQ to 1 >> + * for this load operation. >> + */ >> + TCGv_i64 tmp = tcg_temp_new_i64(); >> +#ifdef HOST_WORDS_BIGENDIAN >> + poff += 6; >> +#endif >> + tcg_gen_ld16u_i64(tmp, cpu_env, poff); >> + >> + poff = offsetof(CPUARMState, vfp.preg_tmp); >> + tcg_gen_st_i64(tmp, cpu_env, poff); >> + tcg_temp_free_i64(tmp); >> + } >> + >> + t_pg = tcg_temp_new_ptr(); >> + tcg_gen_addi_ptr(t_pg, cpu_env, poff); > > Reviewed-by: Peter Maydell <[email protected]> > > The bigendian #ifdef in the middle of the code is a little > ugly, though -- I don't suppose it's possible to avoid it > (or abstract it away) somehow?
Adding a helper function for this one use didn't seem worthwhile. But I certainly can duplicate the form of vec_reg_offset if you prefer. r~
