> If we have time, we could investigate why we couldn't achieve 252GFlops even 
> more. Only 73% hardware efficiency means we have much work could dive.

252 Gops/s is a reasonable target, since it corresponds to ~90% hardware 
efficiency. FBGEMM and MKL-DNN currently reach this number. The current PR 
gets 205.6 Gops/s (73% efficiency) because it does not yet fully utilize the 
accumulation built into the `vpdpbusd` instruction.
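
To make the numbers above concrete, here is a rough Python sketch. It derives the peak throughput implied by "205.6 Gops/s at 73%" (an inference from the figures in this comment, not a measured value) and emulates one 32-bit lane of `vpdpbusd` to show the fused accumulation. The helper name `vpdpbusd_lane` and the sample inputs are made up for illustration; this is not the code generated by the PR.

```python
# Figures quoted in this thread.
MEASURED_GOPS = 205.6
MEASURED_EFFICIENCY = 0.73
TARGET_GOPS = 252.0

# Peak implied by the measured numbers (an inference, not a benchmark result).
implied_peak_gops = MEASURED_GOPS / MEASURED_EFFICIENCY
target_efficiency = TARGET_GOPS / implied_peak_gops


def wrap_int32(value):
    """Wrap a Python int to signed 32-bit, like the non-saturating instruction."""
    return (value + 2**31) % 2**32 - 2**31


def vpdpbusd_lane(acc, a_u8, b_s8):
    """Emulate one 32-bit lane: acc + sum of four (unsigned 8-bit * signed 8-bit)."""
    assert len(a_u8) == len(b_s8) == 4
    assert all(0 <= x <= 255 for x in a_u8)
    assert all(-128 <= y <= 127 for y in b_s8)
    return wrap_int32(acc + sum(x * y for x, y in zip(a_u8, b_s8)))


# Hypothetical data: two chunks of four elements along the reduction axis.
a_chunks = [[1, 2, 3, 4], [5, 6, 7, 8]]
b_chunks = [[1, -1, 1, -1], [2, 2, -2, -2]]

# Fused: the running sum is passed in as the accumulator operand each step.
fused = 0
for a, b in zip(a_chunks, b_chunks):
    fused = vpdpbusd_lane(fused, a, b)

# Unfused: start each dot product from zero, then pay for a separate add.
unfused = 0
for a, b in zip(a_chunks, b_chunks):
    partial = vpdpbusd_lane(0, a, b)
    unfused = wrap_int32(unfused + partial)

print(f"implied peak: {implied_peak_gops:.1f} Gops/s")      # 281.6
print(f"252 Gops/s efficiency: {target_efficiency:.1%}")    # 89.5%
print(fused, unfused)                                       # same value twice
```

Both loops give the same result, but the unfused form needs an extra add per step. That is one plausible reading of "not fully utilizing the accumulation", not a confirmed diagnosis of the generated code.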

-- 
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/pull/3388#issuecomment-518429450
