Did you mean LOG_BLOCK=4 or just BLOCK=4?
If LOG_BLOCK=4, then BLOCK_IN and BLOCK_OUT are both 2^4 = 16. A single GEMM instruction therefore performs a 16x16 array of multiply-accumulate (MAC) operations, that is, 256 MACs per instruction. By my calculation, counting each MAC as one operation and assuming one GEMM completes per cycle at 142 MHz: 256 MACs * 0.142 GHz = 36.352 GOps.

Note that in the current Chisel implementation, a single GEMM takes 4 cycles/stages to complete, so we might see a performance regression there.
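
For reference, here is a minimal Python sketch of the arithmetic above. The function name `peak_gops` and the parameters `ops_per_mac` and `cycles_per_gemm` are my own illustrative names, not part of VTA or TVM. The sketch assumes one GEMM completes every `cycles_per_gemm` cycles; whether the 4 cycles/stages in the Chisel implementation actually reduce throughput depends on whether the stages are pipelined, which I have not verified.

```python
def peak_gops(log_block: int, freq_ghz: float,
              ops_per_mac: int = 1, cycles_per_gemm: int = 1) -> float:
    """Estimate peak GOps for a square BLOCK_IN x BLOCK_OUT GEMM unit.

    All parameter names are hypothetical and used only for illustration.
    """
    block = 1 << log_block            # BLOCK_IN = BLOCK_OUT = 2**LOG_BLOCK
    macs_per_gemm = block * block     # 16 x 16 = 256 MACs when LOG_BLOCK=4
    # Ops per cycle, scaled by clock frequency in GHz, gives GOps.
    return macs_per_gemm * ops_per_mac * freq_ghz / cycles_per_gemm


# One GEMM per cycle at 142 MHz, each MAC counted as one operation.
print(round(peak_gops(4, 0.142), 3))

# Assumed worst case: a non-pipelined GEMM that occupies 4 cycles.
print(round(peak_gops(4, 0.142, cycles_per_gemm=4), 3))
```

Setting `ops_per_mac=2` gives the figure under the other common convention, where the multiply and the add are counted as separate operations.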