Did you mean the number of cycles for a single GEMM instruction? AFAIK, it
depends on the implementation. For the Chisel-based implementation, a single
GEMM instruction takes 4 cycles to complete, because the design has stages
that prepare the data stream before the GEMM itself executes.
---
Before starting work on an actual FPGA, you could evaluate with TSIM
(cycle-accurate simulation) in TVM. For experiments with vision models, please
refer to [Deploy Pretrained Vision Model from MxNet on
VTA](https://tvm.apache.org/docs/vta/tutorials/frontend/deploy_classification.html).
---
I'm not sure how far it goes, but I think the PR [Add µTVM Zephyr
support + QEMU regression test
#6603](https://github.com/apache/incubator-tvm/pull/6603) should be helpful for
evaluating µTVM on RISC-V.
---
Did you mean LOG_BLOCK=4 or just BLOCK=4?
If LOG_BLOCK=4, then BLOCK_IN and BLOCK_OUT are both 2^4 = 16. Therefore, a
single GEMM instruction performs 16 × 16 multiply-accumulate operations (MACs),
that is, 256 MACs per GEMM instruction. By my calculation:
256 MACs/cycle × 0.142 GHz = 36.352 GOps (counting one MAC as one op)
Not
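The peak-throughput arithmetic for LOG_BLOCK=4 can be reproduced with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, it assumes BATCH = 1 (LOG_BATCH = 0), that BLOCK_IN = BLOCK_OUT = 2^LOG_BLOCK, and that the GEMM core completes one instruction's worth of MACs every cycle, which is the same assumption behind the 36.352 GOps figure.

```python
def vta_peak_gops(log_block: int, clock_ghz: float, log_batch: int = 0) -> float:
    """Peak GEMM throughput in GOps, counting one MAC as one op (hypothetical helper).

    Assumes the GEMM core retires BATCH x BLOCK_IN x BLOCK_OUT MACs per cycle,
    with BLOCK_IN = BLOCK_OUT = 2**log_block and BATCH = 2**log_batch.
    """
    batch = 1 << log_batch           # BATCH = 2^LOG_BATCH
    block = 1 << log_block           # BLOCK_IN = BLOCK_OUT = 2^LOG_BLOCK
    macs_per_cycle = batch * block * block
    # MACs/cycle * Gcycles/s = GMACs/s, reported here as GOps.
    return macs_per_cycle * clock_ghz


print(f"{vta_peak_gops(log_block=4, clock_ghz=0.142):.3f} GOps")  # prints "36.352 GOps"
```

If you prefer to count each MAC as two operations (one multiply, one add), double the result.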