Many thanks, `tkonolige`. I think this is a good excuse for me to learn more 
about the internals of the TVM runtime and profiling.

I've started by making a simple C++ deployment of `matmul_add`, with the 
goal of using it to implement Option 1.

I am following the basic structure of `apps/howto_deploy` 
([link](https://github.com/apache/tvm/tree/main/apps/howto_deploy)) for my 
example.

Basically, I want to get it working in C++ before I try to make a nice Python 
wrapper and break through all the layers of abstraction that would involve.

I've been reading through the PAPI and Profiler code, and have already learned 
a lot.  I see in [the definition of the 
Profiler](https://github.com/apache/tvm/blob/main/include/tvm/runtime/profiling.h#L337)
 the example usage:

```
Device cpu, gpu;
Profiler prof({cpu, gpu});
my_gpu_kernel(); // do a warmup iteration
prof.Start();
prof.StartCall("my_gpu_kernel", gpu);
my_gpu_kernel();
prof.StopCall();
prof.StartCall("my_cpu_function", cpu);
my_cpu_function();
prof.StopCall();
prof.Stop();
std::cout << prof.Report << std::endl; // print profiling report
```

I am trying something similar, which might be the right way to go, using the 
PAPI collector as the metric collector:


```c++
tvm::Device dev = {kDLCPU, 0};
tvm::Map<tvm::runtime::profiling::DeviceWrapper, tvm::Array<tvm::String>> metrics({
    {kDLCPU,
     {"perf::CYCLES", "perf::STALLED-CYCLES-FRONTEND", "perf::STALLED-CYCLES-BACKEND",
      "perf::INSTRUCTIONS", "perf::CACHE-MISSES"}},
    {kDLCUDA, {"cuda:::event:elapsed_cycles_sm:device=0"}}});


tvm::runtime::profiling::MetricCollector papi_collector =
    tvm::runtime::profiling::CreatePAPIMetricCollector(metrics);

std::cout << "papi_collector created" << std::endl;

tvm::runtime::profiling::Profiler prof =
    tvm::runtime::profiling::Profiler({dev}, {papi_collector});
std::cout << "Profiler created" << std::endl;
f(A, B, C, out); // warmup
std::cout << "Warmup performed" << std::endl;
prof.Start();
prof.StartCall("matmul_add_dyn", dev);
f(A, B, C, out);
prof.StopCall();
```

My main issue right now is the initialiser of `metrics`, which 
`CreatePAPIMetricCollector` requires. It's not clear to me how to get the 
types right.

I can't find anywhere else in the codebase that uses `Map<DeviceWrapper, 
Array<String>>`.
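
To separate the typing question from TVM itself, here is a minimal standalone sketch of what I suspect is going on. Everything in it is a stand-in: `DeviceType`, `Device`, `DeviceWrapper`, `Metrics` and `BuildMetrics` are hypothetical names built on `std::map` and `std::vector`, not the TVM containers. It assumes that the real `DeviceWrapper` can only be constructed explicitly from a device, which I have not confirmed, so a bare device-type enum would not convert to the key type inside a braced initialiser.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-ins for the TVM/DLPack types (hypothetical, for illustration only).
enum DeviceType { kCPU = 1, kCUDA = 2 };  // plays the role of the device-type enum
struct Device {                           // plays the role of tvm::Device
  DeviceType device_type;
  int device_id;
};

// Stand-in for DeviceWrapper. ASSUMPTION: the constructor is explicit, so
// neither an enum value nor a Device converts to it implicitly.
struct DeviceWrapper {
  Device device;
  explicit DeviceWrapper(Device d) : device(d) {}
  // Ordering so the wrapper can be used as a std::map key.
  bool operator<(const DeviceWrapper& other) const {
    if (device.device_type != other.device.device_type) {
      return device.device_type < other.device.device_type;
    }
    return device.device_id < other.device.device_id;
  }
};

using Metrics = std::map<DeviceWrapper, std::vector<std::string>>;

Metrics BuildMetrics() {
  Metrics metrics;
  // Mirrors my failing initialiser; does not compile with an explicit constructor:
  //   Metrics bad({{kCPU, {"perf::CYCLES"}}});
  // Wrapping each key explicitly and inserting one entry at a time does compile.
  metrics[DeviceWrapper(Device{kCPU, 0})] = {
      "perf::CYCLES", "perf::STALLED-CYCLES-FRONTEND", "perf::STALLED-CYCLES-BACKEND",
      "perf::INSTRUCTIONS", "perf::CACHE-MISSES"};
  metrics[DeviceWrapper(Device{kCUDA, 0})] = {"cuda:::event:elapsed_cycles_sm:device=0"};
  assert(metrics.size() == 2);  // one entry per device
  return metrics;
}
```

If that assumption holds for the TVM types, wrapping each key explicitly and filling the map entry by entry might be what the `metrics` initialiser needs, but I'd appreciate confirmation.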

I have [my code here](https://github.com/Wheest/tvm-papi-single-op), which can 
be cloned into `tvm/apps` and run with `./run_example.sh`. The PAPI example is 
compiled with `make papi`.

Any pointers on that line?




