On 9/3/25 15:01, Max Chou wrote:
+#define OPMVV_VQDOTQ(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2) \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i) \
+{ \
+ int idx; \
+ T1 r1; \
+ T2 r2; \
+ TX1 *r1_buf = (TX1 *)vs1 + HD(i); \
+ TX2 *r2_buf = (TX2 *)vs2 + HD(i); \
+ TD acc = ((TD *)vd)[HD(i)]; \
+ \
+ for (idx = 0; idx < 4; ++idx) { \
+ r1 = (TD)(*((T1 *)r1_buf + HS1(idx))); \
+ r2 = (TD)(*((T2 *)r2_buf + HS2(idx))); \
+ acc += r1 * r2; \
Incorrect typing or casting, take your pick: r1 and r2 are declared
as T1 and T2, so the (TD) casts are discarded again on assignment.
I suggest:
    for (int idx = 0; idx < 4; ++idx) {
        T1 r1 = ((T1 *)r1_buf)[HS1(idx)];
        T2 r2 = ((T2 *)r2_buf)[HS2(idx)];
        acc += (TD)r1 * (TD)r2;
    }
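
For reference, a minimal stand-alone sketch of the widen-at-the-multiply
pattern, assuming T1 = int8_t, T2 = uint8_t and TD = int32_t. The name
dot4_su is made up for illustration, and the host-endian fixups
(HD/HS1/HS2) are left out, so this is not the helper itself:

```c
#include <stdint.h>

/*
 * Hypothetical model of one destination element: four signed 8-bit
 * values times four unsigned 8-bit values, accumulated into a 32-bit
 * lane. Assumes the accumulator does not overflow; plain C leaves
 * signed overflow undefined.
 */
static int32_t dot4_su(int32_t acc, const int8_t *a, const uint8_t *b)
{
    for (int idx = 0; idx < 4; ++idx) {
        int8_t r1 = a[idx];     /* load at the source width (T1) */
        uint8_t r2 = b[idx];    /* load at the source width (T2) */

        /* Widen each operand to the accumulator type, then multiply. */
        acc += (int32_t)r1 * (int32_t)r2;
    }
    return acc;
}
```

With 8-bit sources the usual integer promotions already widen both
operands to int, so as far as I can tell the original computes the same
values; the explicit casts at the multiply just make the intended width
visible and keep the macro correct if it is reused with wider T1/T2.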
r~
