Hi TVM community,
I am facing the following problem: I have pruned a 2D convolutional model and now I want to quantize it with TVM. Since int8 quantization takes advantage of the dp4a primitive, the number of input channels must be divisible by `ic_block_factor`, which is 4. However, because my network is pruned, the channel counts are no longer divisible by 4, and compilation fails at this assertion: https://github.com/apache/tvm/blob/5fa1c6dae0903f4dc31d39d42fcf582190ac1a68/python/tvm/topi/cuda/conv2d_int8.py#L91

I would like to zero-pad the input/weight channels so they meet this constraint, but I am not very familiar with how TVM implements its kernels. Could you suggest an approach to solve this problem?
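To make the question concrete, here is a rough sketch of the kind of padding I have in mind, done directly on the weight arrays with NumPy before the model is handed to TVM. The helper name and the OIHW layout are just assumptions for illustration; since the CUDA int8 schedule also appears to block the output channels, the sketch pads both dimensions:

```python
import numpy as np

IC_BLOCK_FACTOR = 4  # dp4a consumes 4 int8 values per instruction

def pad_conv2d_weight(weight, oc_axis=0, ic_axis=1):
    """Zero-pad a conv2d weight (OIHW layout assumed) so the input- and
    output-channel dimensions become multiples of IC_BLOCK_FACTOR.

    Zero padding keeps the layer numerically equivalent: the extra input
    channels multiply zero weights, and the extra output channels only
    produce all-zero feature maps.
    """
    def pad_amount(n, m=IC_BLOCK_FACTOR):
        return (m - n % m) % m

    pad_width = [(0, 0)] * weight.ndim
    pad_width[ic_axis] = (0, pad_amount(weight.shape[ic_axis]))
    pad_width[oc_axis] = (0, pad_amount(weight.shape[oc_axis]))
    return np.pad(weight, pad_width, mode="constant", constant_values=0)

# Example: a pruned layer left with 61 output and 30 input channels
w = np.random.randn(61, 30, 3, 3).astype("float32")
print(pad_conv2d_weight(w).shape)  # (64, 32, 3, 3)
```

My worry is that padding one layer's output channels forces me to grow the next layer's input channels (and any bias/batch-norm parameters) to match, so doing this consistently across the whole network by hand seems error-prone. Is there a better way, e.g. letting TVM handle this padding internally in the conv2d int8 compute/schedule?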