[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

2025-04-30 Thread Alan Li via cfe-commits
@@ -444,17 +444,40 @@ def ROCDL_ds_read_tr6_b96 : ROCDL_LDS_Read_Tr_IntrOp<"ds.read.tr6.b96">;
 def ROCDL_ds_read_tr16_b64 : ROCDL_LDS_Read_Tr_IntrOp<"ds.read.tr16.b64">;
 //===-===//
-// Global load to LDS int

2025-04-29 Thread Alan Li via cfe-commits
lialan wrote:
> > I still think we need an intrinsic here because a load + an addtid store
> > can be scheduled much different from the asynchronous "gather to LDS" - and
> > because we don't want this load/store to not be optimized
> > IMO the intrinsic should only be added as a last resort i
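To make the quoted scheduling argument concrete, here is a minimal Python sketch, under simplifying assumptions, showing that a per-lane load followed by an "addtid"-style store and a direct gather to LDS agree only in their final result. Global memory is modeled as a list, LDS as a dict keyed by byte offset, and the addtid store is assumed to place lane i's value at `lds_base + i * elem_size`. The function names are hypothetical, and nothing here reflects the actual intrinsic's operands or the backend's lowering.

```python
# Hypothetical, simplified model of one wavefront writing to LDS.
# Assumptions: global memory is a list indexed by element, LDS is a dict
# mapping byte offset -> value, and each element is elem_size bytes.

def load_then_addtid_store(global_mem, lane_addrs, lds, lds_base, elem_size=4):
    """Two-step form: every lane first loads into a register, then an
    addtid-style store writes lane i's value to lds_base + i * elem_size."""
    regs = [global_mem[a] for a in lane_addrs]  # step 1: per-lane load into registers
    for lane, value in enumerate(regs):         # step 2: separate store, addressed by lane id
        lds[lds_base + lane * elem_size] = value
    return lds

def gather_to_lds(global_mem, lane_addrs, lds, lds_base, elem_size=4):
    """One-step form: data moves from global memory straight into LDS,
    with no intermediate register values exposed to the program."""
    for lane, a in enumerate(lane_addrs):
        lds[lds_base + lane * elem_size] = global_mem[a]
    return lds

if __name__ == "__main__":
    global_mem = [10, 20, 30, 40, 50]
    lane_addrs = [4, 0, 2, 1]  # a 4-lane "wave" for illustration only
    two_step = load_then_addtid_store(global_mem, lane_addrs, {}, 64)
    one_step = gather_to_lds(global_mem, lane_addrs, {}, 64)
    print(two_step == one_step)  # True
```

Both functions leave identical LDS contents, but the two-step form has a visible intermediate (`regs`) and two separate operations that can be scheduled independently, which is the distinction the quoted message draws against a single asynchronous gather. The model says nothing about timing or latency.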