On 6/7/21 9:57 AM, Peter Maydell wrote:
+#define DO_VDUP(OP, ESIZE, TYPE, H)                                    \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t val)    \
+    {                                                                  \
+        TYPE *d = vd;                                                  \
+        uint16_t mask = mve_element_mask(env);                         \
+        unsigned e;                                                    \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {             \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);         \
+            d[H(e)] &= ~bytemask;                                      \
+            d[H(e)] |= (val & bytemask);                               \
+        }                                                              \
+        mve_advance_vpt(env);                                          \
+    }
+
+DO_VDUP(vdupb, 1, uint8_t, H1)
+DO_VDUP(vduph, 2, uint16_t, H2)
+DO_VDUP(vdupw, 4, uint32_t, H4)
Hmm. I think the masking should be done at either uint32_t or uint64_t. Doing
it byte-by-byte is wasteful.
Whether you want to do the replication in tcg (I can export gen_dup_i32 from
tcg-op-gvec.c) and have one helper, or do the replication here, is up to you; e.g.
static void do_vdup(CPUARMState *env, void *vd, uint64_t val);

void HELPER(mve_vdupb)(CPUARMState *env, void *vd, uint32_t val)
{
    do_vdup(env, vd, dup_const(MO_8, val));
}
r~