On 9/3/25 15:01, Max Chou wrote:
+#define OPMVX_VQDOTQ(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2) \
+static void do_##NAME(void *vd, target_long s1, void *vs2, int i) \
+{ \
+ int idx; \
+ T1 r1; \
+ T2 r2; \
+ TX1 *r1_buf = (TX1 *)&s1; \
+ TX2 *r2_buf = (TX2 *)vs2 + HD(i); \
+ TD acc = ((TD *)vd)[HD(i)]; \
+ \
+ for (idx = 0; idx < 4; ++idx) { \
+ r1 = *((T1 *)r1_buf + HS1(idx)); \
+ r2 = *((T2 *)r2_buf + HS2(idx)); \
+ acc += r1 * r2; \
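One could argue for a missing widening cast to TD here, i.e. "acc += (TD)r1 * r2". You
got away with it because the only uses happen to have small inputs and "int" sized
outputs, so C arithmetic promotion already computes the product at sufficient width.
Any wider instantiation would overflow in the multiply before the accumulate.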
You can also move the declarations of idx, r1 and r2 into the loop.
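
To illustrate both points, here is a minimal standalone sketch. It is not QEMU code:
the DEFINE_DOT4_* macros and the dot4_* functions are made-up names, the operands are
passed as plain arrays rather than through HD/HS1/HS2, and it assumes a 32-bit "int".
The unsigned 32-bit case is used only so that the overflow is well defined.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the loop under review: a 4-element dot
 * product accumulated into TD. The temporaries are declared inside
 * the loop rather than at the top of the function.
 */

/* Product formed at whatever width C promotion gives r1 * r2. */
#define DEFINE_DOT4_PLAIN(NAME, TD, T1, T2)                      \
static TD NAME(TD acc, const T1 *a, const T2 *b)                 \
{                                                                \
    for (int idx = 0; idx < 4; ++idx) {                          \
        T1 r1 = a[idx];                                          \
        T2 r2 = b[idx];                                          \
        acc += r1 * r2;                                          \
    }                                                            \
    return acc;                                                  \
}

/* Product explicitly widened to TD before the multiply. */
#define DEFINE_DOT4_WIDE(NAME, TD, T1, T2)                       \
static TD NAME(TD acc, const T1 *a, const T2 *b)                 \
{                                                                \
    for (int idx = 0; idx < 4; ++idx) {                          \
        T1 r1 = a[idx];                                          \
        T2 r2 = b[idx];                                          \
        acc += (TD)r1 * r2;                                      \
    }                                                            \
    return acc;                                                  \
}

/* Small inputs, int-sized accumulator: promotion to int is enough. */
DEFINE_DOT4_PLAIN(dot4_s8_plain, int32_t, int8_t, int8_t)
DEFINE_DOT4_WIDE(dot4_s8_wide, int32_t, int8_t, int8_t)

/* 32-bit inputs, 64-bit accumulator: without the cast the product wraps. */
DEFINE_DOT4_PLAIN(dot4_u32_plain, uint64_t, uint32_t, uint32_t)
DEFINE_DOT4_WIDE(dot4_u32_wide, uint64_t, uint32_t, uint32_t)
```

With 8-bit inputs the two variants agree, since each product is computed as "int"
either way. With a = b = { 0x10000, 0, 0, 0 } the plain 32-bit variant multiplies in
32 bits and accumulates 0, while the widened variant accumulates 2^32.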
r~