Hi, I forked https://github.com/GaryYuyjl/incubator-tvm/tree/int4tensorcore 
for int4 computation with Tensor Cores. I found that packing int4 values into 
int32 on the CPU costs too much time, so I moved the packing step into the 
conv2d compute & schedule and got good results. However, the packing still 
takes up at least 30% of the total convolution time, possibly because my 
compute & schedule code is poorly written.
Do you have any suggestions for packing the data efficiently?
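
For context, the packing stage I'm describing is roughly the following TE sketch (simplified and illustrative, not the exact code from my branch; the NHWC layout, the shapes, and the helper name `pack_int4_to_int32` are just assumptions for the example):

```python
import tvm
from tvm import te

def pack_int4_to_int32(data):
    """Pack eight int4 values (stored one per int8 element) along the
    channel axis into a single int32 word, 4 bits per value."""
    n, h, w, c = data.shape

    def _pack(nn, hh, ww, cc):
        word = tvm.tir.const(0, "int32")
        for lane in range(8):
            # Mask each value to its low 4 bits and OR it into its slot.
            val = data[nn, hh, ww, cc * 8 + lane].astype("int32") & 0xF
            word = word | (val << (4 * lane))
        return word

    return te.compute((n, h, w, c // 8), _pack, name="data_packed")

# Usage: pack an NHWC int8-holding-int4 tensor before / inside the conv2d stage.
A = te.placeholder((1, 56, 56, 64), dtype="int8", name="A")
A_packed = pack_int4_to_int32(A)          # shape (1, 56, 56, 8), dtype int32
s = te.create_schedule(A_packed.op)
print(tvm.lower(s, [A, A_packed], simple_mode=True))
```

The idea is that expressing the packing as a `te.compute` stage lets it be inlined or `compute_at`-ed into the conv2d schedule instead of running as a separate CPU pass, but the packing stage still dominates as described above.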

---
[Visit Topic](https://discuss.tvm.ai/t/rfc-discuss-tvm-v0-7-roadmap/5159/38) to 
respond.
