> From: Dipesh Sharma <[email protected]>
> Sent: Friday, June 26, 2026 8:16 PM
>
One thing I would like to mention before the comments: we have
a patch series that supports "pseudo" tmm registers. It does not
implement tmm register allocation, but handles the tiles in a way
similar to ARM's SME. We use internal patterns instead of inline
assembly wrappers to support the ACE ISAs. This lets the compiler
know the dependencies between insns, and may make it more
convenient to extend to real tmm register allocation later. Would
you like me to send out the patches so you can have a look?
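
To illustrate the dependency point, here is a minimal sketch (not taken
from either patch series). TILE_MOVROW_TEMPLATE and template_names_reg
are hypothetical names used only for this example; the macro mirrors just
the AT&T half of the template in the quoted _tile_setrowi. With the
wrapper approach the tile number is pasted into the template string by
the preprocessor, so the tile appears only as text in the template and is
not an asm operand the compiler could track.

```c
#include <string.h>

/* Hypothetical helper, for illustration only: the AT&T half of the asm
   template used by the quoted _tile_setrowi macro.  #dst turns the tile
   number into a string literal that is concatenated into the template.  */
#define TILE_MOVROW_TEMPLATE(dst) "tilemovrow\t%1, %0, %%tmm" #dst

/* Hypothetical helper: return 1 if REG is spelled out in template TMPL.  */
static int
template_names_reg (const char *tmpl, const char *reg)
{
  return strstr (tmpl, reg) != NULL;
}
```

For example, TILE_MOVROW_TEMPLATE (3) is the string
"tilemovrow\t%1, %0, %%tmm3", while the operand list of the quoted macro
only carries src1 and imm8.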
> diff --git a/gcc/config/i386/aceintrin.h b/gcc/config/i386/aceintrin.h
> new file mode 100644
> index 00000000000..b9292520ba7
> --- /dev/null
> +++ b/gcc/config/i386/aceintrin.h
> +#define _tile_setrowi(dst, src1, imm8) \
> + __asm__ volatile \
> + ("{tilemovrow\t%1, %0, %%tmm"#dst"|tilemovrow\ttmm"#dst", %0, %1}" \
> + :: "v" (src1), "i" (imm8))
> +
> +#define _tile_setrow(dst, src1, r32) \
> + __asm__ volatile \
> + ("{tilemovrow\t%1, %0, %%tmm"#dst"|tilemovrow\ttmm"#dst", %0, %1}" \
> + :: "v" (src1), "r" (r32))
I agree that we need to separate the intrin names from legacy AMX to avoid
confusion. But load/store intrins tend to use insert/extract in their names.
Could you switch to those? Similarly for the bsr intrins.
> +
> +#define _tile_setcoli(dst, src1, imm8) \
> + __asm__ volatile \
> + ("{tilemovcol\t%1, %0, %%tmm"#dst"|tilemovcol\ttmm"#dst", %0, %1}" \
> + :: "v" (src1), "i" (imm8))
> +
> +#define _tile_setcol(dst, src1, r32) \
> + __asm__ volatile \
> + ("{tilemovcol\t%1, %0, %%tmm"#dst"|tilemovcol\ttmm"#dst", %0, %1}" \
> + :: "v" (src1), "r" (r32))
> +
> +#define _bsr_movf(src1, src2) \
> + __asm__ volatile \
> + ("{bsrmovf\t%1, %0, %%bsr0|bsrmovf\tbsr0, %0, %1}" \
> + :: "v" (src1), "v" (src2))
We may need to use _bsr0_* in case there are more BSRs coming in.
> +
> +#define _tile_top2bf16ps(dst, src1, src2) \
> + __asm__ volatile \
> + ("{top2bf16ps\t%1, %0, %%tmm"#dst"|top2bf16ps\ttmm"#dst", %0, %1}" \
> + :: "v" (src1), "v" (src2))
We need to omit the "t" in the intrin name to keep the previous convention.
Furthermore, outer product is much like dot product, and I believe the legacy
AMX intrin names should have followed the dot product naming convention, but
they do not. We could use this opportunity to correct that in ACE.
Could we change to:
_tile_op2bf16_ps
_tile_op4bssd_epi32
_tile_op4mxbhf8_ps
etc.?
> +
> +#ifdef __DISABLE_ACE_V1__
> +#undef __DISABLE_ACE_V1__
> +#pragma GCC pop_options
> +#endif /* __DISABLE_ACE_V1__ */
> +
I just realized that the ISAs reused from AMX-AVX512 are in neither
tileintrin.h nor aceintrin.h. How are you going to handle that?
BTW, if we are going to use different names, this is a chance to correct
their naming. It is not correct currently.
Thx,
Haochen