@adavis
Sorry, PE: physical execution unit (i.e. just a generic name for a CPU, accelerator, etc.). Thanks for the clarifying explanation.

I think you should follow [this discussion](https://discuss.tvm.apache.org/t/pre-rfc-additional-target-hooks/10430) on splitting the BYOC lowering and code generation steps apart. This is something we're working on, but don't have yet.

I almost suggested you implement a CUDA-like codegen which inherits from [CodeGenC](https://github.com/apache/tvm/blob/main/src/target/source/codegen_c.h), but which can generate the C++ primitives you want. I don't think this would ultimately work out well, because you'd need to model the C++ target as a separate device, and I don't think scheduling would fall back on the host device properly. You might be able to get that to work if you hack at it enough, but it likely wouldn't be upstreamable in that form.

Outside of the TVM C++ runtime, we are missing a specification for interacting with DMA, so as you mentioned, it needs to happen via `call_extern` for now. Some initial discussion of heterogeneous compute with the AOT/C runtime is happening [here](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951/12). I expect we will resolve this in the near future, but it needs an RFC and community feedback before it can be added properly. I'll be sure to loop you in as that work gets traction.

If you can share, it would be helpful to understand the interface your DMA engine presents and whether you've been successful using the built-in TVM prefetch.

-Andrew