Thanks for the wonderful library. It's a pleasure to work with it.
I am a bit puzzled by the code generated for a matrix multiplication. I see neither 256-bit SIMD vectors nor FMA instructions, although I specify the target as `llvm -mcpu=alderlake -mattr=+avx2 -num-cores=16`. What could be the reason? My code is:

```python
import tvm
import numpy as np
from tvm import meta_schedule as ms
from tvm.script import tir as T


# Naive 1024x1024x1024 float32 matmul expressed in TVMScript.
@tvm.script.ir_module
class MyModule:
    @T.prim_func
    def main(
        A: T.Buffer[(1024, 1024), "float32"],
        B: T.Buffer[(1024, 1024), "float32"],
        C: T.Buffer[(1024, 1024), "float32"],
    ):
        T.func_attr({"global_symbol": "main", "tir.noalias": True})
        for i, j, k in T.grid(1024, 1024, 1024):
            with T.block("C"):
                vi, vj, vk = T.axis.remap("SSR", [i, j, k])
                with T.init():
                    C[vi, vj] = 0.0
                C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vk, vj]


dtype = "float32"
a_np = np.random.rand(1024, 1024).astype(dtype)
b_np = np.random.rand(1024, 1024).astype(dtype)
a_nd = tvm.nd.array(a_np)
b_nd = tvm.nd.array(b_np)
c_nd = tvm.nd.empty((1024, 1024), dtype=dtype)

target = "llvm -mcpu=alderlake -mattr=+avx2 -num-cores=16"

# Tune with MetaSchedule and compile the best schedule found.
database = ms.tune_tir(
    mod=MyModule,
    target=target,
    max_trials_global=64,
    num_trials_per_iter=64,
    work_dir="./tune_tmp",
)
sch_tuned = ms.tir_integration.compile_tir(database, MyModule, target=target)
print(sch_tuned.mod.script())

# Build and dump the generated assembly for inspection.
lib = tvm.build(sch_tuned.mod, target="llvm")
with open("/tmp/my_module.S", "w") as f:
    f.write(lib.get_source("asm"))
```

Looking at the assembly, I see only 128-bit (`xmm`) vectors being used, and separate `mulps` and `addps` instructions instead of fused multiply-adds. What can I do to improve the codegen?
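One thing I noticed while writing this up (just a guess on my part, I have not confirmed it is the cause): the final `tvm.build` call passes the bare `"llvm"` target, which as far as I understand falls back to a generic x86-64 CPU model and drops the `-mcpu=alderlake -mattr=+avx2` flags used during tuning. A small variant I plan to try, reusing the tuned target string:

```python
# Rebuild with the full target string used during tuning, so LLVM keeps
# the alderlake/+avx2 feature flags instead of a generic baseline.
# (The output path is arbitrary; a separate file just for comparison.)
lib = tvm.build(sch_tuned.mod, target=target)
with open("/tmp/my_module_full_target.S", "w") as f:
    f.write(lib.get_source("asm"))
```

Would that explain the missing 256-bit vectors and FMAs, or is something else going on?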