For rasp3 and rasp4, we saw a 1.3x-1.5x speedup going from FP32 to Int8.
The work behind the link comparing QNNPACK and TVM has not been upstreamed yet. If I understand correctly, it will be some time before the authors of that work are able to upstream it. There are also some differences in the underlying design, which might delay getting to that performance.

Regarding int16, we observed that LLVM can generate good enough code with int16 instead of int8 for rasp3/4, so we upcast the datatype to int16 (the exceptions are Intel Cascade Lake and Nvidia devices). Once we write a better schedule for int8 datatypes, we can remove the upcasting.
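
To make the int16 upcasting concrete, here is a rough NumPy sketch of the idea: widen the int8 operands to int16 before multiplying, then accumulate in int32. This is only an illustration of the arithmetic, not the actual TVM lowering or schedule; the function name `dot_int8_upcast` and the sample values are made up for the example.

```python
import numpy as np

def dot_int8_upcast(a_i8, b_i8):
    """Dot product of two int8 vectors using int16 operands (hypothetical helper)."""
    # Widen the int8 inputs to int16 before any arithmetic.
    a_i16 = a_i8.astype(np.int16)
    b_i16 = b_i8.astype(np.int16)
    # Any product of two int8 values fits in int16 (largest is -128 * -128 = 16384).
    prod_i16 = a_i16 * b_i16
    # Accumulate in int32 so the running sum does not overflow.
    return int(prod_i16.sum(dtype=np.int32))

a = np.array([-128, 127, 5], dtype=np.int8)
b = np.array([-128, 127, -3], dtype=np.int8)
result = dot_int8_upcast(a, b)
```

The values stay within the int8 range the whole time; only the storage type of the operands is wider, which is why the upcast can be dropped later without changing results once a good int8 schedule exists.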