Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-26 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 6:30 PM H.J. Lu wrote:
>
> On Sun, Feb 25, 2024 at 8:25 PM H.J. Lu wrote:
> >
> > On Sun, Feb 25, 2024 at 7:03 PM Hongtao Liu wrote:
> > >
> > > On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu wrote:
> > > >
> > > > On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu wrote:
> > > > >

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-26 Thread H.J. Lu
On Sun, Feb 25, 2024 at 8:25 PM H.J. Lu wrote:
>
> On Sun, Feb 25, 2024 at 7:03 PM Hongtao Liu wrote:
> >
> > On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu wrote:
> > >
> > > On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu wrote:
> > > >
> > > > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote:
> > > > >

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread H.J. Lu
On Sun, Feb 25, 2024 at 7:03 PM Hongtao Liu wrote:
>
> On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu wrote:
> >
> > On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu wrote:
> > >
> > > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote:
> > > >
> > > > ldtilecfg and sttilecfg take a 64-byte (512-bit) memory block. Wit

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu wrote:
>
> On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu wrote:
> >
> > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote:
> > >
> > > ldtilecfg and sttilecfg take a 64-byte (512-bit) memory block. With
> > > _tile_loadconfig implemented as
> > >
> > > extern __inline

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread H.J. Lu
On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu wrote:
>
> On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote:
> >
> > ldtilecfg and sttilecfg take a 64-byte (512-bit) memory block. With
> > _tile_loadconfig implemented as
> >
> > extern __inline void
> > __attribute__((__gnu_inline__, __always_inline__, __artifi

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu wrote:
>
> ldtilecfg and sttilecfg take a 64-byte (512-bit) memory block. With
> _tile_loadconfig implemented as
>
> extern __inline void
> __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> _tile_loadconfig (const void *__config)
> {
>   __asm__ v

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongyu Wang
Thanks for fixing this! I didn't notice that the pointer conversion could cause this issue... Would it be possible to use a local array, like

  char a[64] = (char *)p
  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (a));

If not, for the two patterns we can use "m" instead of "jm" as APX supports EGPR extension for

[PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread H.J. Lu
ldtilecfg and sttilecfg take a 64-byte (512-bit) memory block. With
_tile_loadconfig implemented as

extern __inline void
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_tile_loadconfig (const void *__config)
{
  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void **)__config)));
}
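
For readers following the thread, here is a small sketch of why the cast in the "m" operand matters. It is not taken from the patch. The struct tile_config_block and both helper functions are hypothetical names made up for this illustration, and no AMX instruction is executed. It only compares the size of the object each cast names, on the assumption that the compiler treats an "m" operand as covering an object of the operand's type.

```c
#include <stddef.h>

/* Hypothetical 64-byte type standing in for the tile configuration
   block; this name is an assumption, not taken from the patch.  */
struct tile_config_block
{
  unsigned char bytes[64];
};

/* Size of the object named by the original operand expression,
   *((const void **)config): a single pointer.  sizeof does not
   evaluate its operand, so nothing is dereferenced here.  */
static size_t
operand_size_pointer_cast (const void *config)
{
  return sizeof (*(const void **) config);
}

/* Size of the object named when the pointer is cast to the 64-byte
   block type instead.  */
static size_t
operand_size_block_cast (const void *config)
{
  return sizeof (*(const struct tile_config_block *) config);
}
```

On a typical 64-bit target the first helper returns 8 and the second returns 64, so under that assumption the original operand describes only the first pointer-sized part of the configuration block.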