LuoYuanke added a comment.

> I think that point was not really clear during the discussion. Using `load 
> <256 x i32>` to lower `__tile_loadd() ` would indeed be incorrect. But I 
> don't think that's happening at the moment, at least going from a simple 
> example https://gcc.godbolt.org/z/KT5rczn8j

The `load/store <256 x i32>` is generated by front-end, because in C language 
tile is a vector <256 x i32>. The `load/store <256 x i32>` is transformed to 
`llvm.x86.tileloadd64.internal/llvm.x86.tilestored64.internal` in 
`lib/Target/X86/X86LowerAMXType.cpp` if the load result is to be an operand of 
amx intrinsics or the store value is returned from amx intrinsics.

>   void foo() {
>     tilea = __builtin_ia32_tileloadd64_internal(16, 64, buf, 64);
>   }
>
> is lowered to
>
>   define dso_local void @foo() #0 {
>     %1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* 
> getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64)
>     %2 = bitcast x86_amx %1 to <256 x i32>
>     store <256 x i32> %2, <256 x i32>* @tilea, align 64
>     ret void
>   }
>
> So we emit an intrinsic to do the strided load and the result is stored to 
> continuous memory, which is what the type `_tile1024i` requires. What's not 
> modeled correctly is the conversion between the result of 
> `@llvm.x86.tileloadd64.internal` and the `store`. It needs to be transferred 
> in a flat vector.

Yes.  I agree that it needs to be transferred in a flat vector.

> Whether we should have `x86_amx` in the first place is a separate question I 
> think. Having a builtin type that does not work properly with fundamental 
> instructions like `load` or `store` seems prone for errors (what instructions 
> actually work with `x86_amx`? Do binary operators work?). Perhaps it would be 
> possible and sufficient to have the intrinsics use an opaque type instead of 
> a builtin type, like

We only support tileload, tilestore, tilezero, tiletdp (dot product) 
instructions/intrinsics for `x86_amx`. Is there any opaque type example llvm 
source code for builtin? This example has some error at 
https://gcc.godbolt.org/z/ar6WhjTMz.

>   %my_x86_amx = type opaque
>   
>   define %my_x86_amx @foo(%my_x86_amx %x) {
>     ret %my_x86_amx %x
>   }
>
> But I think we should address those 2 issues separately and fix the biggest 
> problem (mis-use of `bitcast`) first, perhaps followed up by verifier rules 
> rejecting `x86_amx` from un-suitable instructions and go from there.

I may further implement this patch and transform/eliminate 
@llvm.x86.vector.amx.cast in `lib/Target/X86/X86LowerAMXType.cpp` which is 
before codegen. There is some effort to implement it, but I'd like to take a 
try.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99152/new/

https://reviews.llvm.org/D99152

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to