On Fri, Aug 7, 2020 at 8:04 AM Richard Henderson < [email protected]> wrote:
> On 8/6/20 3:46 AM, [email protected] wrote:
> > +static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz, bool is_ldst)
> >  {
> > -    return simd_maxsz(desc) << vext_lmul(desc);
> > +    /*
> > +     * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
> > +     * so vlen in bytes (vlenb) is encoded as maxsz.
> > +     */
> > +    uint32_t vlenb = simd_maxsz(desc);
> > +
> > +    if (is_ldst) {
> > +        /*
> > +         * Vector load/store instructions have the EEW encoded
> > +         * directly in the instructions. The maximum vector size is
> > +         * calculated with EMUL rather than LMUL.
> > +         */
> > +        uint32_t eew = ctzl(esz);
> > +        uint32_t sew = vext_sew(desc);
> > +        uint32_t lmul = vext_lmul(desc);
> > +        int32_t emul = eew - sew + lmul;
> > +        uint32_t emul_r = emul < 0 ? 0 : emul;
> > +        return 1 << (ctzl(vlenb) + emul_r - ctzl(esz));
>
> As I said before, the is_ldst instructions should put the EEW and EMUL values
> into the SEW and LMUL desc fields, so that this does not need to be
> special-cased at all.
>

I've added a vext_get_emul() helper function in trans_rvv.inc.c:

    static uint8_t vext_get_emul(DisasContext *s, uint8_t eew)
    {
        int8_t lmul = sextract32(s->lmul, 0, 3);
        int8_t emul = ctzl(eew) - (s->sew + 3) + lmul; // may remove ctzl() if eew is already log2(eew)
        return emul < 0 ? 0 : emul;
    }

and pass emul as the LMUL field in VDATA so that it can be reused by
vext_max_elems() in vector_helper.c:

    uint8_t emul = vext_get_emul(s, eew);
    data = FIELD_DP32(data, VDATA, LMUL, emul);

I've also removed the code that passes the SEW field in VDATA, as I think SEW
is no longer required in the updated vext_max_elems() (see below).

> > +    /* Return VLMAX */
> > +    return 1 << (ctzl(vlenb) + vext_lmul(desc) - ctzl(esz));
>
> This is overly complicated.
>
> (1) 1 << ctzl(vlenb) == vlenb.
> (2) I'm not sure why esz is not already a log2 number.
>

esz is passed in from, e.g., the GEN_VEXT_LD_STRIDE() macro:

    #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)                        \
    void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
                      target_ulong stride, CPURISCVState *env,              \
                      uint32_t desc)                                        \
    {                                                                       \
        uint32_t vm = vext_vm(desc);                                        \
        vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
                         sizeof(ETYPE), GETPC(), MMU_DATA_LOAD);            \
    }

    GEN_VEXT_LD_STRIDE(vlse8_v, int8_t, lde_b)

Here esz is computed as sizeof(ETYPE), so the possible values are 1, 2, 4 and 8
(bytes, not log2). vext_max_elems() is then called from, e.g.,
vext_ldst_stride():

    uint32_t max_elems = vext_max_elems(desc, esz);

I can add another parameter to the macro and pass a hard-coded log2(esz)
value if that is preferable to using ctzl(). Or is there a more elegant way
to get log2(esz)?

> This ought to look more like
>
>     int scale = lmul - esz;
>     return (scale < 0
>             ? vlenb >> -scale
>             : vlenb << scale);
>

Thanks for pointing these out in detail. I've changed the code as follows,
based on your suggestion:

    static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz)
    {
        /*
         * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
         * so vlen in bytes (vlenb) is encoded as maxsz.
         */
        uint32_t vlenb = simd_maxsz(desc);

        /* Return VLMAX */
        int scale = vext_lmul(desc) - ctzl(esz); // may remove ctzl() if esz is already log2(esz)
        return scale < 0 ? vlenb >> -scale : vlenb << scale;
    }

>
> r~
>

Thanks for the review.

Frank Chang
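
For reference, the scale-based VLMAX computation discussed above can be
sketched as a standalone model. This is only an illustration, not QEMU code:
vlmax_model() and log2_pow2() are hypothetical names, vlenb and lmul are
passed in directly instead of being decoded from desc, and log2_pow2() stands
in for ctzl() under the assumption that esz is a non-zero power of two.

```c
#include <stdint.h>

/*
 * Hypothetical helper: log2 of a non-zero power of two.
 * Plays the role of ctzl(esz) for esz = 1, 2, 4, 8.
 */
static int log2_pow2(uint32_t x)
{
    int n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical standalone model of the scale-based VLMAX formula.
 *   vlenb: vector register length in bytes (a power of two)
 *   lmul:  log2 of the register group multiplier (negative if fractional)
 *   esz:   element size in bytes (1, 2, 4 or 8)
 */
static uint32_t vlmax_model(uint32_t vlenb, int lmul, uint32_t esz)
{
    int scale = lmul - log2_pow2(esz);

    /* Shift right for a negative scale, left otherwise. */
    return scale < 0 ? vlenb >> -scale : vlenb << scale;
}
```

With vlenb = 16 (VLEN = 128 bits), this gives 16 elements for 1-byte elements
at LMUL = 1, 32 elements for 4-byte elements at LMUL = 8, and 2 elements for
8-byte elements at LMUL = 1, which matches VLEN * LMUL / SEW in each case.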
