Hi TVM community,

I am facing the following problem: I have pruned a 2D convolution model and now want to quantize it with TVM. Since int8 quantization takes advantage of the dp4a primitive, the number of input channels must be divisible by ic_block_factor, which is 4. However, after pruning, my network's channel counts are no longer divisible by 4, which results in an error at this check:

https://github.com/apache/tvm/blob/5fa1c6dae0903f4dc31d39d42fcf582190ac1a68/python/tvm/topi/cuda/conv2d_int8.py#L91
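
For reference, the failing check is roughly the following (paraphrased from the linked file, so variable names may not match the source exactly):

```python
# Paraphrased from python/tvm/topi/cuda/conv2d_int8.py around the linked line;
# the real code takes the channel count from the conv2d workload's shape.
ic_block_factor = 4

assert in_channels % ic_block_factor == 0, (
    "Number of input channels should be multiple of %d" % ic_block_factor
)
```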

I would like to zero-pad the weights along the input channel axis so the counts become multiples of 4 again, but I am not really familiar with how TVM implements these kernels, so I am unsure where such padding should be applied. A sketch of what I have in mind follows.
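
To make the idea concrete, here is a minimal sketch of the kind of padding I mean, assuming the source model is in PyTorch with plain (groups == 1) Conv2d layers; the helper name is my own and nothing here is TVM API:

```python
import torch
import torch.nn.functional as F

def pad_conv_in_channels(conv: torch.nn.Conv2d, multiple: int = 4) -> torch.nn.Conv2d:
    """Zero-pad a Conv2d's input channels up to a multiple of `multiple`.

    The padded weight slices are zero, so whatever the extra input channels
    contain is multiplied by zero and the layer's output is unchanged.
    """
    assert conv.groups == 1, "grouped convolutions need different handling"
    pad = (-conv.in_channels) % multiple
    if pad == 0:
        return conv
    new_conv = torch.nn.Conv2d(
        conv.in_channels + pad, conv.out_channels, conv.kernel_size,
        conv.stride, conv.padding, conv.dilation, conv.groups,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        # Weight layout is (out_channels, in_channels, kH, kW); F.pad pads
        # from the last dim backwards, so (0, 0, 0, 0, 0, pad) appends `pad`
        # zero slices along the in_channels axis (dim 1).
        new_conv.weight.copy_(F.pad(conv.weight, (0, 0, 0, 0, 0, pad)))
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv
```

For the shapes to line up, the producing layer's output channels would of course have to be padded to the same count, and presumably the analogous oc_block_factor constraint matters on the output side, which is exactly the part where I don't know how this interacts with TVM's quantization.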

Can you suggest an approach to solving this problem?