[quote="JosseVanDelm, post:1, topic:9335"]
```
Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1, strides=[2,1])
Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1, strides=[2,1])
Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1, strides=[2,1])
```
[/quote]

Hi, you specified the wrong strides. Try:

```python
Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1, strides=[4, 1])
Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1, strides=[4, 1])
Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1, strides=[4, 1])
```

The original IR is:

```python
primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {C: Buffer(C_2: Pointer(float32), float32, [4, 4], []),
             B: Buffer(B_2: Pointer(float32), float32, [4, 4], []),
             A: Buffer(A_2: Pointer(float32), float32, [4, 4], [])}
  buffer_map = {A_1: A, B_1: B, C_1: C} {
  for (i.outer: int32, 0, 2) {
    for (j.outer: int32, 0, 2) {
      for (i.inner: int32, 0, 2) {
        for (j.inner: int32, 0, 2) {
          C_2[((((i.outer*8) + (i.inner*4)) + (j.outer*2)) + j.inner)] = ((float32*)A_2[((((i.outer*8) + (i.inner*4)) + (j.outer*2)) + j.inner)] + (float32*)B_2[((((i.outer*8) + (i.inner*4)) + (j.outer*2)) + j.inner)])
        }
      }
    }
  }
}
```

As you can see, the coefficient of `i.inner` is 4 and the coefficient of `j.inner` is 1. These two axes form the tensorized region, so the strides declared for the intrinsic's buffers must be `[4, 1]`. They are verified in the buffer-binding step of the StorageFlatten pass.
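
To double-check the stride argument without running TVM, here is a minimal plain-Python sketch. It copies the flattened index expression from the IR and measures how far the offset moves when `i.inner` or `j.inner` advances by one. The helper names `flat_index` and `tile_strides` are made up for this illustration and are not part of TVM.

```python
def flat_index(i_outer, j_outer, i_inner, j_inner):
    """Flattened offset taken from the IR for the 4x4 row-major buffer."""
    return i_outer * 8 + i_inner * 4 + j_outer * 2 + j_inner


def tile_strides(i_outer=0, j_outer=0):
    """Element strides of the 2x2 tile along (i.inner, j.inner).

    Hypothetical helper: each stride is the change in the flattened
    offset when the corresponding inner loop variable goes from 0 to 1.
    """
    base = flat_index(i_outer, j_outer, 0, 0)
    stride_i = flat_index(i_outer, j_outer, 1, 0) - base
    stride_j = flat_index(i_outer, j_outer, 0, 1) - base
    return [stride_i, stride_j]


if __name__ == "__main__":
    # Every tile position should report the same strides.
    for io in range(2):
        for jo in range(2):
            print((io, jo), tile_strides(io, jo))
```

Each of the four tiles reports the same pair of strides, because a tile is a window into the full 4x4 buffer: moving down one row inside the tile still skips a whole row of the parent buffer. Strides of `[2, 1]` would only describe a standalone, compact 2x2 buffer.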