On Fri, Aug 7, 2020 at 8:04 AM Richard Henderson <
[email protected]> wrote:

> On 8/6/20 3:46 AM, [email protected] wrote:
> > +static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz, bool is_ldst)
> >  {
> > -    return simd_maxsz(desc) << vext_lmul(desc);
> > +    /*
> > +     * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
> > +     * so vlen in bytes (vlenb) is encoded as maxsz.
> > +     */
> > +    uint32_t vlenb = simd_maxsz(desc);
> > +
> > +    if (is_ldst) {
> > +        /*
> > +         * Vector load/store instructions have the EEW encoded
> > +         * directly in the instructions. The maximum vector size is
> > +         * calculated with EMUL rather than LMUL.
> > +         */
> > +        uint32_t eew = ctzl(esz);
> > +        uint32_t sew = vext_sew(desc);
> > +        uint32_t lmul = vext_lmul(desc);
> > +        int32_t emul = eew - sew + lmul;
> > +        uint32_t emul_r = emul < 0 ? 0 : emul;
> > +        return 1 << (ctzl(vlenb) + emul_r - ctzl(esz));
>
> As I said before, the is_ldst instructions should put the EEW and EMUL values
> into the SEW and LMUL desc fields, so that this does not need to be
> special-cased at all.
>

I have added a vext_get_emul() helper function in trans_rvv.inc.c:

> static uint8_t vext_get_emul(DisasContext *s, uint8_t eew)
> {
>     int8_t lmul = sextract32(s->lmul, 0, 3);
>     int8_t emul = ctzl(eew) - (s->sew + 3) + lmul;  // may remove ctzl() if eew is already log2(eew)
>     return emul < 0 ? 0 : emul;
> }

The result is passed as the LMUL field in VDATA so that vext_max_elems()
in vector_helper.c can reuse it:

> uint8_t emul = vext_get_emul(s, eew);
> data = FIELD_DP32(data, VDATA, LMUL, emul);
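
The arithmetic behind the helper can be checked in isolation. The following is only a standalone sketch, not the QEMU code: the function name emul_log2() and its parameters are hypothetical, and it assumes all three inputs are already log2 values (EEW and SEW in bits, LMUL as a signed exponent). It follows the relation EMUL = (EEW / SEW) * LMUL and clamps fractional results to 0 in the same way as the helper above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the EMUL calculation.
 * All arguments are log2 values:
 *   log2_eew  - log2 of the effective element width in bits (8 bits -> 3)
 *   log2_sew  - log2 of the selected element width in bits
 *   log2_lmul - signed log2 of LMUL (LMUL=1/2 -> -1, LMUL=8 -> 3)
 * Returns log2(EMUL); a fractional EMUL is clamped to 0 (EMUL=1),
 * mirroring the clamp in vext_get_emul().
 */
static int8_t emul_log2(uint8_t log2_eew, uint8_t log2_sew, int8_t log2_lmul)
{
    assert(log2_eew >= 3 && log2_eew <= 6);
    assert(log2_sew >= 3 && log2_sew <= 6);

    int emul = (int)log2_eew - (int)log2_sew + log2_lmul;
    return emul < 0 ? 0 : (int8_t)emul;
}
```

For example, EEW=64, SEW=16, LMUL=1 gives emul_log2(6, 4, 0) == 2, i.e. EMUL=4, while EEW=8, SEW=32, LMUL=1 would be EMUL=1/4 and is clamped to 0.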

I also removed the code that passes the SEW field in VDATA, as I think SEW
is no longer needed by the updated vext_max_elems() (see below).


>
> > +        /* Return VLMAX */
> > +        return 1 << (ctzl(vlenb) + vext_lmul(desc) - ctzl(esz));
>
> This is overly complicated.
>
> (1) 1 << ctzl(vlenb) == vlenb.
> (2) I'm not sure why esz is not already a log2 number.
>

esz is passed in from e.g. the GEN_VEXT_LD_STRIDE() macro:

> #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)                        \
> void HELPER(NAME)(void *vd, void * v0, target_ulong base,               \
>                   target_ulong stride, CPURISCVState *env,              \
>                   uint32_t desc)                                        \
> {                                                                       \
>     uint32_t vm = vext_vm(desc);                                        \
>     vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,      \
>                      sizeof(ETYPE), GETPC(), MMU_DATA_LOAD);            \
> }
>
> GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b)

It is computed as sizeof(ETYPE), so its value is 1, 2, 4 or 8.
vext_max_elems() is then called from e.g. vext_ldst_stride():

> uint32_t max_elems = vext_max_elems(desc, esz);

I can add another parameter to the macro and pass a hard-coded log2(esz)
value if that is better than using ctzl().
Or is there a more elegant way to get log2(esz)?
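
One possible shape for this, purely as an illustration: since ETYPE is known at macro-expansion time, log2 of its size can be folded into a constant expression instead of being computed with ctzl() at run time. Both LOG2_SIZEOF() and log2_esz() below are hypothetical names, not existing QEMU helpers, and the sketch assumes element sizes of 1, 2, 4 or 8 bytes only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical macro: log2 of sizeof(type), assuming the type is
 * 1, 2, 4 or 8 bytes wide. sizeof() is a constant expression, so the
 * compiler can fold the whole thing to a constant.
 */
#define LOG2_SIZEOF(type)           \
    (sizeof(type) == 1 ? 0 :        \
     sizeof(type) == 2 ? 1 :        \
     sizeof(type) == 4 ? 2 : 3)

/*
 * Hypothetical run-time equivalent for a power-of-two byte count,
 * written without compiler builtins so it stays portable.
 */
static uint32_t log2_esz(uint32_t esz)
{
    assert(esz != 0 && (esz & (esz - 1)) == 0);

    uint32_t n = 0;
    while (esz > 1) {
        esz >>= 1;
        n++;
    }
    return n;
}
```

With something like this, the macro could pass LOG2_SIZEOF(ETYPE) where it currently passes sizeof(ETYPE), at the cost of every user of esz having to expect a log2 value.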


>
> This ought to look more like
>
>   int scale = lmul - esz;
>   return (scale < 0
>           ? vlenb >> -scale
>           : vlenb << scale);
>
>
Thanks for pointing these out in detail.
I have changed the code as below, following your suggestion.

> static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz)
> {
>     /*
>      * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
>      * so vlen in bytes (vlenb) is encoded as maxsz.
>      */
>     uint32_t vlenb = simd_maxsz(desc);
>
>     /* Return VLMAX */
>     int scale = vext_lmul(desc) - ctzl(esz);  // may remove ctzl() if esz is already log2(esz)
>     return scale < 0 ? vlenb >> -scale : vlenb << scale;
> }
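
To sanity-check the scale-based formula outside of QEMU, here is a minimal standalone sketch. The function name vlmax() and its parameters are hypothetical; it assumes vlenb is a power of two, that LMUL arrives as a signed log2 value, and that the element size is already log2(bytes).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical standalone VLMAX calculation:
 *   vlenb     - vector register length in bytes (power of two)
 *   log2_lmul - signed log2 of LMUL (or EMUL for loads/stores)
 *   log2_esz  - log2 of the element size in bytes
 * VLMAX = vlenb * LMUL / esz, done as a single shift.
 */
static uint32_t vlmax(uint32_t vlenb, int log2_lmul, uint32_t log2_esz)
{
    assert(vlenb != 0 && (vlenb & (vlenb - 1)) == 0);

    int scale = log2_lmul - (int)log2_esz;
    return scale < 0 ? vlenb >> -scale : vlenb << scale;
}
```

For VLEN=128 (vlenb=16), LMUL=2 and 4-byte elements, scale is 1 - 2 = -1 and vlmax(16, 1, 2) returns 8, which matches LMUL * VLEN / SEW = 2 * 128 / 32.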


>
> r~
>

Thanks for the review.
Frank Chang
