In my understanding, autotvm implements operators based on opreation and 
schedule defination.
 For example, there are pack and no pack implementations in x86/dense.py. 
In the process of autotuning, the implementation-defined configs are fed into 
the actual measurement, but No such correctness verification similar to the 
tvm.testing.assert_allclose() interface was found in the entire tuning process.
So I completely extracted the pack and nopack in x86/dense.py and checked the 
result of tvm.testing.assert_allclose(), and found:
1. If the segmented axis is implemented in opreation, the tile size must be 
divisible, otherwise the calculation result cannot be guaranteed to be correct;
2. Other axes that are not defined in the opreation in advance can be divisible 
by not necessarily;
3. Next, I observed the lower difference between them, and found that the code 
generated with the tile size that cannot be divisible in 1 is wrong;
4. So I think this should be a code generation bug.


For example, I test x86/dense.py(nopack implement)
M N K = 1 1000 512 and Tm Tn Tk = 1 100 10
```
produce compute {
  parallel (y.outer.x.outer.fused, 0, 10) {
    produce compute {
      for (z.y.fused.init, 0, 100) {
        compute[ramp(((y.outer.x.outer.fused*1000) + (z.y.fused.init*10)), 1, 
10)] = x10(0f)
      }
      for (k, 0, 51) {
        for (z.y.fused, 0, 100) {
          compute[ramp(((y.outer.x.outer.fused*1000) + (z.y.fused*10)), 1, 10)] 
= (compute[ramp(((y.outer.x.outer.fused*1000) + (z.y.fused*10)), 1, 10)] + 
(data[ramp((k*10), 1, 10)]*weight[ramp((((y.outer.x.outer.fused*51200) + 
(z.y.fused*512)) + (k*10)), 1, 10)]))
        }
      }
    }
    for (x.inner, 0, 100) {
      compute[((y.outer.x.outer.fused*100) + x.inner)] = 0f
      for (kk, 0, 10) {
        compute[((y.outer.x.outer.fused*100) + x.inner)] = 
(compute[((y.outer.x.outer.fused*100) + x.inner)] + 
compute[(((y.outer.x.outer.fused*1000) + (x.inner*10)) + kk)])
      }
    }
  }
}

Traceback (most recent call last):

  File "OpsGemm/gemm_v3_scheduling.py", line 388, in <module>
    buildandevaluation(s, data, weight, out, a, bt, ct, ctx, ct_np)

  File "OpsGemm/gemm_v3_scheduling.py", line 44, in buildandevaluation
    tvm.testing.assert_allclose(c.asnumpy(), c_np, rtol=1e-5)

  File "/root/tvm/python/tvm/testing.py", line 29, in assert_allclose
    np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol, 
verbose=True)

  File 
"/anaconda3/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 
1452, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)

  File 
"/anaconda3/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 
789, in assert_array_compare
    raise AssertionError(msg)

AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-07

(mismatch 100.0%)
 x: array([128.66817 , 119.130806, 129.70555 , 126.0419  , 126.232285,
       129.39488 , 128.2362  , 124.842926, 128.9357  , 126.89033 ,
       132.58101 , 128.5313  , 129.82468 , 129.89973 , 125.16623 ,...
 y: array([128.91081 , 119.22253 , 130.08371 , 126.27693 , 126.305466,
       129.55656 , 128.48164 , 124.97862 , 129.08452 , 127.26959 ,
       132.87903 , 128.80084 , 129.95262 , 130.14275 , 125.569626,...

```
 if M N K = 1 1000 512 and Tm Tn Tk = 1 100 16
```
produce compute {
  parallel (y.outer.x.outer.fused, 0, 10) {
    produce compute {
      for (z.y.fused.init, 0, 100) {
        compute[ramp(((y.outer.x.outer.fused*1600) + (z.y.fused.init*16)), 1, 
16)] = x16(0f)
      }
      for (k, 0, 32) {
        for (z.y.fused, 0, 100) {
          compute[ramp(((y.outer.x.outer.fused*1600) + (z.y.fused*16)), 1, 16)] 
= (compute[ramp(((y.outer.x.outer.fused*1600) + (z.y.fused*16)), 1, 16)] + 
(data[ramp((k*16), 1, 16)]*weight[ramp((((y.outer.x.outer.fused*51200) + 
(z.y.fused*512)) + (k*16)), 1, 16)]))
        }
      }
    }
    for (x.inner, 0, 100) {
      compute[((y.outer.x.outer.fused*100) + x.inner)] = 0f
      for (kk, 0, 16) {
        compute[((y.outer.x.outer.fused*100) + x.inner)] = 
(compute[((y.outer.x.outer.fused*100) + x.inner)] + 
compute[(((y.outer.x.outer.fused*1600) + (x.inner*16)) + kk)])
      }
    }
  }
}

time: 0.000062

```

 if M N K = 1 1000 512 and Tm Tn Tk = 1 23 16
```
produce compute {
  parallel (y.outer.x.outer.fused, 0, 44) {
    produce compute {
      for (z.y.fused.init, 0, 23) {
        compute[ramp(((y.outer.x.outer.fused*368) + (z.y.fused.init*16)), 1, 
16)] = x16(0f)
      }
      for (k, 0, 32) {
        for (z.y.fused, 0, 23) {
          if (likely((((y.outer.x.outer.fused*23) + z.y.fused) < 1000))) {
            compute[ramp(((y.outer.x.outer.fused*368) + (z.y.fused*16)), 1, 
16)] = (compute[ramp(((y.outer.x.outer.fused*368) + (z.y.fused*16)), 1, 16)] + 
(data[ramp((k*16), 1, 16)]*weight[ramp((((y.outer.x.outer.fused*11776) + 
(z.y.fused*512)) + (k*16)), 1, 16)]))
          }
        }
      }
    }
    for (x.inner, 0, 23) {
      if (likely((((y.outer.x.outer.fused*23) + x.inner) < 1000))) {
        compute[((y.outer.x.outer.fused*23) + x.inner)] = 0f
      }
      for (kk, 0, 16) {
        if (likely((((y.outer.x.outer.fused*23) + x.inner) < 1000))) {
          if (likely((((y.outer.x.outer.fused*23) + x.inner) < 1000))) {
            compute[((y.outer.x.outer.fused*23) + x.inner)] = 
(compute[((y.outer.x.outer.fused*23) + x.inner)] + 
compute[(((y.outer.x.outer.fused*368) + (x.inner*16)) + kk)])
          }
        }
      }
    }
  }
}

time: 0.000958

```
(ps:In order to observe the lower code, I closed all unroll operations)

Is this what I missed? And how does autotvm ensure the correctness of the 
results in the autotuning process?





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/how-to-verify-the-correctness-of-different-schedule-and-tile-size-in-autotvm/8143/1)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/4d37445b7b155ceff55f827f2db7b69d67271750c46d9ec080008a827d1edac7).

Reply via email to