Do you guys know if we get the same speedups as with 2D conv networks, e.g. ResNet-50 quantized to 8-bit vs. FP32/FP16? Is the full range of optimizations available for this?
For comparison, TensorRT doesn't support 8-bit quantization (or other optimizations) for 3D operations, so the speedup there is tiny compared to FP16.