I got very close to matching PyTorch's bmm on Vega 20 (Radeon VII) and to about 
1.5x of it on the 1080Ti for the 1024 example (with fixed dims).

One of the limiting factors on the path ahead is, of course, the "-1" issue in 
the output configurations.

Best regards

Thomas




