Re: [PATCH] target/arm: Fix 32-bit SMOPA

2024-03-05 Thread Michael Tokarev
05.03.2024 05:31, Richard Henderson:
> The while the 8-bit input elements are sequential in the input vector,

Nitpick: "The while the"

/mjt

Re: [PATCH] target/arm: Fix 32-bit SMOPA

2024-03-05 Thread Richard Henderson
On 3/4/24 23:19, Philippe Mathieu-Daudé wrote:
> +    uint32_t *za_row = &za[H4(tile_vslice_index(row))];
> +    uint32_t n = zn[H4(row)];
> +
> +    for (col = 0; col < oprsz; ++col) {
> +        uint8_t pb = pm[H1(col >> 1)] >> ((col & 1) * 4);
> +        uint32_t *a = &za_row[col];

S

Re: [PATCH] target/arm: Fix 32-bit SMOPA

2024-03-05 Thread Philippe Mathieu-Daudé
Hi Richard,

On 5/3/24 03:31, Richard Henderson wrote:
> The while the 8-bit input elements are sequential in the input vector,
> the 32-bit output elements are not sequential in the output matrix.
> Do not attempt to compute 2 32-bit outputs at the same time.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: 23a5e38

[PATCH] target/arm: Fix 32-bit SMOPA

2024-03-04 Thread Richard Henderson
The while the 8-bit input elements are sequential in the input vector,
the 32-bit output elements are not sequential in the output matrix.
Do not attempt to compute 2 32-bit outputs at the same time.

Cc: qemu-sta...@nongnu.org
Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
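
For readers following the thread, the point of the commit message can be sketched in a few lines of C. This is only an illustration, not the QEMU helper: dot4_s8 and smopa32_sketch are hypothetical names, predication is left out, and the accumulator is modelled as a flat row-major array rather than QEMU's real ZA storage. Each 32-bit output accumulates the products of four signed 8-bit pairs, and the output for the next row sits a whole row stride away, so each output is computed on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accumulate the four signed 8-bit products held in
 * the byte lanes of n and m into a single 32-bit accumulator. */
static uint32_t dot4_s8(uint32_t n, uint32_t m, uint32_t acc)
{
    for (int i = 0; i < 4; i++) {
        /* Extract byte lane i and sign-extend it portably. */
        int32_t a = (int32_t)((n >> (8 * i)) & 0xff);
        int32_t b = (int32_t)((m >> (8 * i)) & 0xff);
        if (a >= 128) {
            a -= 256;
        }
        if (b >= 128) {
            b -= 256;
        }
        acc += (uint32_t)(a * b);   /* wraps modulo 2^32 */
    }
    return acc;
}

/* Hypothetical sketch: za is a dim x dim row-major matrix (an assumption
 * made for illustration). One 32-bit output is produced per iteration;
 * za[row][col] and za[row + 1][col] are dim elements apart, so two
 * outputs are never packed into one wider store. */
static void smopa32_sketch(uint32_t *za, const uint32_t *zn,
                           const uint32_t *zm, size_t dim)
{
    for (size_t row = 0; row < dim; row++) {
        for (size_t col = 0; col < dim; col++) {
            uint32_t *a = &za[row * dim + col];
            *a = dot4_s8(zn[row], zm[col], *a);
        }
    }
}
```

With zn = {0x04030201, 0x000000ff} and zm = {0x01010101, 0x00000002}, the first row of the result comes from the lanes 1, 2, 3, 4 and the second row from the single lane -1.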