On 12/16/22 18:31, 钟居哲 wrote:
Register allocation (RA) doesn't affect the assembler checks, since I
relax the registers in the assembler checks.
Each assembler check has its own goal. For example,
code like this:
+void foo2 (void * restrict in, void * restrict out, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      vuint16mf4_t v = *(vuint16mf4_t*)(in + i);
+      *(vuint16mf4_t*)(out + i) = v;
+    }
+}
Assembler check:
scan-assembler-times
{vsetvli\s+(?:ra|[sgtf]p|t[0-6]|s[0-9]|s10|s11|a[0-7]),\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]\s+\.L[0-9]\:\s+vle16\.v\s+(?:v[0-9]|v[1-2][0-9]|v3[0-1]),0\s*\((?:ra|[sgtf]p|t[0-6]|s[0-9]|s10|s11|a[0-7])\
I don't care which vector register is used, since I relax the register in
the assembler check: (?:v[0-9]|v[1-2][0-9]|v3[0-1]) matches any vector register
v0-v31.
I also relax the scalar register: (?:ra|[sgtf]p|t[0-6]|s[0-9]|s10|s11|a[0-7])
matches any of x1-x31 by ABI name (zero, i.e. x0, is not in the alternation).
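As a quick sanity check of the two relaxed register patterns, here is a minimal Python sketch. It assumes Python's `re` module is a fair stand-in for the Tcl regexp engine that DejaGnu uses for these constructs; the helper name `accepts` and the register name lists are made up for illustration.

```python
import re

# Relaxed register patterns, copied verbatim from the test's regexp.
VREG = r"(?:v[0-9]|v[1-2][0-9]|v3[0-1])"
XREG = r"(?:ra|[sgtf]p|t[0-6]|s[0-9]|s10|s11|a[0-7])"


def accepts(pattern, name):
    """Return True if the pattern matches the whole register name."""
    return re.fullmatch(pattern, name) is not None


# All 32 vector registers.
vector_names = [f"v{i}" for i in range(32)]

# RISC-V scalar ABI names other than "zero".
scalar_names = (
    ["ra", "sp", "gp", "tp", "fp"]
    + [f"t{i}" for i in range(7)]
    + [f"s{i}" for i in range(12)]
    + [f"a{i}" for i in range(8)]
)

all_vectors_ok = all(accepts(VREG, n) for n in vector_names)
all_scalars_ok = all(accepts(XREG, n) for n in scalar_names)
zero_ok = accepts(XREG, "zero")
```

With these definitions every name in both lists is accepted, while `zero` and an out-of-range name such as `v32` are rejected.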
The only strict check is to make sure the vsetvl is hoisted outside the loop,
meaning the vsetvl is located before the label .L[0-9]:
vsetvli\s+(?:ra|[sgtf]p|t[0-6]|s[0-9]|s10|s11|a[0-7]),\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]\s+\.L[0-9]
You can see the last part of the regexp is \s+\.L[0-9], which makes sure the
VSETVL pass successfully does the optimization that hoists the vsetvl
instruction outside the loop.
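To illustrate what the trailing \s+\.L[0-9] buys, here is a small Python sketch that runs the regexp prefix quoted above against two assembly fragments. Both fragments are hypothetical, hand-written for illustration rather than taken from real compiler output, and Python's `re` is assumed to behave like the Tcl engine for these constructs.

```python
import re

XREG = r"(?:ra|[sgtf]p|t[0-6]|s[0-9]|s10|s11|a[0-7])"

# Prefix of the test's regexp: a vsetvli followed directly by a loop label.
HOISTED = (
    r"vsetvli\s+" + XREG
    + r",\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]\s+\.L[0-9]"
)

# Hypothetical output where the vsetvli sits in front of the loop label.
hoisted_asm = "\n".join([
    "    vsetvli a5,zero,e16,mf4,ta,ma",
    ".L3:",
    "    vle16.v v1,0(a0)",
    "    vse16.v v1,0(a1)",
    "    addi    a0,a0,1",
    "    addi    a1,a1,1",
    "    bne     a0,a4,.L3",
])

# Hypothetical output where the vsetvli stayed inside the loop body.
in_loop_asm = "\n".join([
    ".L3:",
    "    vsetvli a5,zero,e16,mf4,ta,ma",
    "    vle16.v v1,0(a0)",
    "    vse16.v v1,0(a1)",
    "    addi    a0,a0,1",
    "    addi    a1,a1,1",
    "    bne     a0,a4,.L3",
])

hoisted_found = re.search(HOISTED, hoisted_asm) is not None
in_loop_found = re.search(HOISTED, in_loop_asm) is not None
```

The pattern matches only the first fragment: in the second, the vsetvli is followed by vle16.v rather than by a label, so the check fails regardless of which scalar register was allocated.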
I tried to use check-function-bodies, but it fails since it cannot recognize
the label, which is the most important part for such cases.
Ah, I should have looked at those regexps closer. Understood about the
checking for hoisting the vsetvl. Though it makes me wonder if we'd be
better off dumping information out of the vsetvl pass.
In the case of hoisting we could dump the loop nest of the original
evaluation block and the loop nest of the new vsetvl location, then we
scan for those in the vsetvl pass dump. While it doesn't check the
assembly code, it's probably just as good if not better.
Consider that as an alternative. But I'm not going to insist on it. I
just know we've had a lot of trouble through the years where assembly
code changes slightly, causing test failures. So I try to avoid too much
assembly scanning if it can be avoided. Often the easiest way to get
the same basic effect is to dump at the transformation point and scan
for those markers in the dump file.
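A rough sketch of what scanning for such a marker could look like, written in Python purely for illustration. The marker format ("vsetvl hoist: from bb N (loop depth D) to bb M (loop depth E)") is an assumption invented here, not something the vsetvl pass is known to print, and `hoisted_out_of_loop` is a hypothetical helper.

```python
import re

# Hypothetical marker line; the real pass dump format may differ entirely.
MARKER = re.compile(
    r"vsetvl hoist: from bb (\d+) \(loop depth (\d+)\) "
    r"to bb (\d+) \(loop depth (\d+)\)"
)


def hoisted_out_of_loop(dump_text):
    """Return True if any marker shows a move to a shallower loop depth."""
    for m in MARKER.finditer(dump_text):
        old_depth = int(m.group(2))
        new_depth = int(m.group(4))
        if new_depth < old_depth:
            return True
    return False


# Hypothetical dump contents for the foo2 example.
good_dump = "\n".join([
    ";; Function foo2",
    "vsetvl hoist: from bb 4 (loop depth 1) to bb 2 (loop depth 0)",
])

# Hypothetical dump where the vsetvl moved but stayed at the same depth.
bad_dump = "\n".join([
    ";; Function foo2",
    "vsetvl hoist: from bb 4 (loop depth 1) to bb 5 (loop depth 1)",
])

good_result = hoisted_out_of_loop(good_dump)
bad_result = hoisted_out_of_loop(bad_dump)
```

In the testsuite the equivalent would presumably be a dump-scanning directive (for example scan-rtl-dump) matching the marker text, which stays stable when register allocation or instruction scheduling changes the assembly.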
Jeff