@adavis 

Sorry, PE stands for physical execution unit (just a generic name for a CPU, 
accelerator, etc.).

Thanks for the clarifying explanation. I think you should follow [this 
discussion](https://discuss.tvm.apache.org/t/pre-rfc-additional-target-hooks/10430)
 on splitting the BYOC lowering and code generation steps apart. This is 
something we're working on, but don't have yet.

I almost suggested you implement a CUDA-like codegen which inherits from 
[CodeGenC](https://github.com/apache/tvm/blob/main/src/target/source/codegen_c.h)
 but generates the C++ primitives you want. I don't think this would 
ultimately work out well, because you'd need to model the C++ target as a 
separate device, and I don't think scheduling would fall back on the host 
device properly. You might be able to get it to work if you hack at it enough, 
but it likely wouldn't be upstreamable in that form.

Outside of the TVM C++ runtime, we are missing a specification for interacting 
with DMA, so as you mentioned it needs to happen via `call_extern` for now. 
Some initial discussion of heterogeneous compute with the AOT/C runtime is 
happening 
[here](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951/12).
 I expect we will resolve this in the near future, but it needs an RFC and 
community feedback before it can be added properly. I'll be sure to loop you 
in as that work gets traction. If you can share, it would be helpful to 
understand the interface your DMA engine presents and whether you've been 
successful using the built-in TVM prefetch.
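
To make the `call_extern` route a bit more concrete, here is a rough sketch of 
the C side, assuming the DMA driver is wrapped in a plain C function that the 
generated code can call by name. As far as I know, the C codegen emits an 
ordinary function call for a `call_extern` node, so the shim only needs to 
export a symbol. `my_dma_copy` and `add_one_kernel` are hypothetical names 
(not TVM APIs), the kernel is hand-written rather than real TVM output, and 
the `memcpy` is only a host stand-in for the actual transfer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical DMA shim (not a TVM API). A real version would program the
 * DMA engine; this stand-in uses memcpy so the sketch runs on a host.
 * Returns 0 on success, -1 on invalid arguments. */
int32_t my_dma_copy(void* dst, const void* src, int32_t num_bytes) {
  if (dst == NULL || src == NULL || num_bytes < 0) {
    return -1;
  }
  memcpy(dst, src, (size_t)num_bytes);
  return 0;
}

/* Hand-written stand-in for an operator body: stage the input into a local
 * buffer through the shim, then compute out[i] = in[i] + 1. */
int32_t add_one_kernel(const float* in, float* out, int32_t n) {
  float staging[64]; /* assumed size of the local staging buffer */
  if (n < 0 || n > 64) {
    return -1;
  }
  if (my_dma_copy(staging, in, n * (int32_t)sizeof(float)) != 0) {
    return -1;
  }
  for (int32_t i = 0; i < n; ++i) {
    out[i] = staging[i] + 1.0f;
  }
  return 0;
}
```

Keeping the shim behind a single C symbol means the same TIR could later be 
retargeted to a real DMA interface by swapping only the shim's body.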

-Andrew




