Many thanks, `tkonolige`! I think this is a good excuse for me to learn more about the internals of the TVM runtime and profiling.
I've started by making a simple C++ deployment of `matmul_add`, with the goal of using it to implement Option 1. I am following the basic structure of `apps/howto_deploy` ([link](https://github.com/apache/tvm/tree/main/apps/howto_deploy)) for my example. Basically, I want to get it working in C++ first. After that I'll try to make a nice Python wrapper, which means breaking through all the layers of abstraction in between.

I've been reading through the PAPI and Profiler code, and have already learned a lot. [The definition of the Profiler](https://github.com/apache/tvm/blob/main/include/tvm/runtime/profiling.h#L337) gives this example usage:

```c++
Device cpu, gpu;
Profiler prof({cpu, gpu});
my_gpu_kernel();  // do a warmup iteration
prof.Start();
prof.StartCall("my_gpu_kernel", gpu);
my_gpu_kernel();
prof.StopCall();
prof.StartCall("my_cpu_function", cpu);
my_cpu_function();
prof.StopCall();
prof.Stop();
std::cout << prof.Report()->AsTable() << std::endl;  // print profiling report
```

I am trying something similar, using the PAPI collector as the metric collector. This might be the right way to go:

```c++
tvm::Device dev = {kDLCPU, 0};
tvm::Map<tvm::runtime::profiling::DeviceWrapper, tvm::Array<tvm::String>> metrics({
    {kDLCPU,
     {"perf::CYCLES", "perf::STALLED-CYCLES-FRONTEND", "perf::STALLED-CYCLES-BACKEND",
      "perf::INSTRUCTIONS", "perf::CACHE-MISSES"}},
    {kDLCUDA, {"cuda:::event:elapsed_cycles_sm:device=0"}}});
tvm::runtime::profiling::MetricCollector papi_collector =
    tvm::runtime::profiling::CreatePAPIMetricCollector(metrics);
std::cout << "papi_collector created" << std::endl;

tvm::runtime::profiling::Profiler prof =
    tvm::runtime::profiling::Profiler({dev}, {papi_collector});
std::cout << "Profiler created" << std::endl;

f(A, B, C, out);  // warmup
std::cout << "Warmup performed" << std::endl;

prof.Start();
prof.StartCall("matmul_add_dyn", dev);
f(A, B, C, out);
prof.StopCall();
```

My main issue right now is the initialiser of `metrics`, which `CreatePAPIMetricCollector` requires.
It's not clear to me how to get the typing right, and I can't find anywhere else in the codebase that uses `Map<DeviceWrapper, Array<String>>`. [My code is here](https://github.com/Wheest/tvm-papi-single-op). It can be cloned into `tvm/apps` and run with `./run_example.sh`, and the PAPI example is compiled with `make papi`. Any pointers on that line?