@@ -444,17 +444,40 @@ def ROCDL_ds_read_tr6_b96 :
ROCDL_LDS_Read_Tr_IntrOp<"ds.read.tr6.b96">;
def ROCDL_ds_read_tr16_b64 : ROCDL_LDS_Read_Tr_IntrOp<"ds.read.tr16.b64">;
//===-===//
-// Global load to LDS int
lialan wrote:
> > I still think we need an intrinsic here because a load + an addtid store
> > can be scheduled much differently from the asynchronous "gather to LDS" - and
> > because we don't want this load/store to not be optimized
>
> IMO the intrinsic should only be added as a last resort i