Hi, I forked https://github.com/GaryYuyjl/incubator-tvm/tree/int4tensorcore for int4 computation with Tensor Cores. I found that packing int4 values into int32 on the CPU took too much time, so I moved the packing step into the conv2d compute & schedule and got good results. However, the packing still takes up at least 30% of the total convolution time, which may be because my compute & schedule code is not well written. Do you have any suggestions for packing the data more efficiently?
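
For reference, this is roughly the kind of packing compute/schedule I mean (a simplified sketch with placeholder shapes and names, not my exact code):

```python
import tvm
from tvm import te

# Placeholder shapes: pack the innermost dimension (a multiple of 8) of an
# int8 tensor that holds int4 values into int32 words, 8 values per word.
n, k = 64, 256
data = te.placeholder((n, k), dtype="int8", name="data")

def pack_int4(i, j):
    # Combine 8 consecutive 4-bit values into one 32-bit word.
    word = tvm.tir.const(0, "int32")
    for s in range(8):
        val = data[i, j * 8 + s].astype("int32") & 0xF
        word = word | (val << (4 * s))
    return word

packed = te.compute((n, k // 8), pack_int4, name="packed_int4")

sch = te.create_schedule(packed.op)
# Fuse the axes and parallelize (CPU) or bind to threads (GPU) so the
# packing step does not dominate the convolution time.
fused = sch[packed].fuse(*packed.op.axis)
# sch[packed].parallel(fused)  # CPU
print(tvm.lower(sch, [data, packed], simple_mode=True))
```

The packing itself is memory-bound, so my question is mostly about how to schedule it (or fuse it with the conv2d stage) so it is not a separate expensive pass over the data.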