In our application, we use UMA to offload specific operations (e.g. 
`conv2d`) to a custom accelerator. We also use USMP to specify two 
`WorkspaceMemoryPools` called `l2_mem` and `act_mem`. `l2_mem` is accessible by 
`Target("c")`, while `act_mem` is accessible by both `Target("c")` and 
`Target("accel")`. Using a Relay pass, we add a few layout transforms and extra 
custom operations to ensure compatibility between operations that run on the 
`C` and `accel` backends. 

Certain operations require inputs/outputs to be in specific memory pools:
* `accel_input_fetcher()`: Input -> `l2_mem`, output -> `act_mem`
* `accel_conv2d()`: Input and output -> `act_mem`

Currently, the generated code for a network with one conv layer looks like this:
```C
// default_lib1.c
TVM_DLL int32_t tvmgen_default___tvm_main__(int8_t* data_buffer_var, int8_t* output_buffer_var, uint8_t* act_mem_0_var, uint8_t* l2_mem_1_var, uint8_t* wei_mem_2_var) {
  void* constant_0_let = (&(wei_mem_2_var[0]));
  void* sid_1_let = (&(l2_mem_1_var[0]));    // >>>> L2_MEM - OK
  void* sid_3_let = (&(l2_mem_1_var[0]));    // >>>> L2_MEM - NOT OK - Would like it to be in ACT_MEM
  if (tvmgen_default_fused_layout_transform(data_buffer_var, sid_1_let, ...) != 0 ) return -1;
  if (tvmgen_default_accel_main_0(sid_1_let, constant_0_let, sid_3_let, ...) != 0 ) return -1;
  if (tvmgen_default_fused_layout_transform_strided_slice(sid_3_let, output_buffer_var, ...) != 0 ) return -1;
  return 0;
}

// default_lib2.c
TVM_DLL int32_t tvmgen_default_accel_main_0(int8_t* accel_0_i0, int8_t* tvm_var_extract_const_0, int8_t* accel_conv2d, uint8_t* act_mem_6_var, uint8_t* l2_mem_7_var, uint8_t* wei_mem_8_var) {
  void* input_fetcher_let = (&(act_mem_6_var[0]));    // >>>> ACT_MEM - OK
  accel_input_fetcher(accel_0_i0, input_fetcher_let, ...);
  accel_conv2d(input_fetcher_let, tvm_var_extract_const_0, accel_conv2d, ...);
  return 0;
}
```
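
To state the mismatch precisely, here is a small plain-Python sketch of the constraint check I have in mind. It is only an illustrative model and does not use any TVM API: the dictionaries and the `find_violations` helper are hypothetical names, and the placements are copied by hand from the generated code above.

```python
# Illustrative model only: plain Python, not TVM API.
# Pool each operation requires for its input and output buffers.
OP_REQUIREMENTS = {
    "accel_input_fetcher": {"input": "l2_mem", "output": "act_mem"},
    "accel_conv2d": {"input": "act_mem", "output": "act_mem"},
}

# Pool each buffer currently lands in, read off the generated C code.
PLACEMENTS = {
    "sid_1_let": "l2_mem",
    "input_fetcher_let": "act_mem",
    "sid_3_let": "l2_mem",
}

# (operation, role, buffer) uses in the single-conv network.
USES = [
    ("accel_input_fetcher", "input", "sid_1_let"),
    ("accel_input_fetcher", "output", "input_fetcher_let"),
    ("accel_conv2d", "input", "input_fetcher_let"),
    ("accel_conv2d", "output", "sid_3_let"),
]


def find_violations(uses, placements, requirements):
    """Return (buffer, actual_pool, required_pool) for every misplaced buffer."""
    violations = []
    for op, role, buf in uses:
        required = requirements[op][role]
        actual = placements[buf]
        if actual != required:
            violations.append((buf, actual, required))
    return violations


violations = find_violations(USES, PLACEMENTS, OP_REQUIREMENTS)
for buf, actual, required in violations:
    print(f"{buf}: placed in {actual}, required in {required}")
# prints "sid_3_let: placed in l2_mem, required in act_mem"
```

The only violation is `sid_3_let`, which is allocated in the "main" function rather than in the offloaded one.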

I've tried adding a `tir_pass` that captures `tir.Allocate` ops and adds the 
"candidate_memory_pools" annotation. However, since I'm registering the 
`tir_pass` using UMA's `register_tir_pass()`, it only triggers for the 
offloaded function (in `default_lib2.c`), so I only capture the `tir.Allocate` 
for the `input_fetcher_let` buffer. Ideally, I would like to capture the buffer 
allocations in the "main" function as well.

How can I proceed? Is there a way to achieve what I need?




