On 7/29/21 1:14 AM, Peter Maydell wrote:
+ r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)], d[H##ESIZE(e)], \
+ 0, fpst); \
+ mergemask(&d[H##ESIZE(e)], r, mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+#define DO_VFMS16(N, M, D, F, S) float16_muladd(float16_chs(N), M, D, F, S)
+#define DO_VFMS32(N, M, D, F, S) float32_muladd(float32_chs(N), M, D, F, S)
+
+DO_VFMA(vfmah, 2, uint16_t, float16_muladd)
+DO_VFMA(vfmas, 4, uint32_t, float32_muladd)
+DO_VFMA(vfmsh, 2, uint16_t, DO_VFMS16)
+DO_VFMA(vfmss, 4, uint32_t, DO_VFMS32)
Here's where I think passing float16/float32 as the type will pay off, with
    r = n[H##ESIZE(e)];
    if (CHS) {
        r = TYPE##_chs(r);
    }
    r = TYPE##_muladd(r, m[...], d[...], 0, fpst);
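
To make the shape concrete, here is a minimal standalone sketch of that idea, not the actual helper. It assumes the macro takes the softfloat type name plus a compile-time CHS flag. The float32 typedef, float32_chs and float32_muladd below are simplified hypothetical stand-ins for the softfloat functions (the stand-in muladd is not fused), and the mask/mergemask, H##ESIZE indexing and mve_advance_vpt parts of the real loop are left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the softfloat API, for illustration only. */
typedef uint32_t float32;

/* Flip the IEEE sign bit. */
static float32 float32_chs(float32 a)
{
    return a ^ 0x80000000u;
}

/* Stand-in for a * b + c; unlike real softfloat this is NOT fused. */
static float32 float32_muladd(float32 a, float32 b, float32 c,
                              int flags, void *fpst)
{
    float fa, fb, fc, fr;
    float32 r;

    (void)flags;
    (void)fpst;
    memcpy(&fa, &a, sizeof(fa));
    memcpy(&fb, &b, sizeof(fb));
    memcpy(&fc, &c, sizeof(fc));
    fr = (float)((double)fa * fb + fc);
    memcpy(&r, &fr, sizeof(r));
    return r;
}

/*
 * TYPE selects TYPE##_chs and TYPE##_muladd by token pasting, so the
 * separate DO_VFMS16/DO_VFMS32 wrapper macros are no longer needed.
 * CHS is a constant, so the compiler can drop the dead branch.
 */
#define DO_VFMA(OP, TYPE, CHS)                                  \
    void OP(TYPE *d, const TYPE *n, const TYPE *m,              \
            int elems, void *fpst)                              \
    {                                                           \
        for (int e = 0; e < elems; e++) {                       \
            TYPE r = n[e];                                      \
            if (CHS) {                                          \
                r = TYPE##_chs(r);                              \
            }                                                   \
            d[e] = TYPE##_muladd(r, m[e], d[e], 0, fpst);       \
        }                                                       \
    }

DO_VFMA(vfmas, float32, 0)
DO_VFMA(vfmss, float32, 1)
```

A float16 instantiation would look the same, given matching float16_chs and float16_muladd.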
r~