When you tensorize a computation, you must write a tensor intrinsic. I realized
that we can just use call_extern or call_intrin directly, as the test cases
do. Why do we need tensorize at all, given that it performs a lot of
verification between the intrinsic and the body? That causes me some trouble when
I think that's because of the function TransformShape in the file
data_layout.cc. It is used to split a shape, for example from NCHW to NCHWc.
But when you want to convert your output shape from NCHW to NHWC, for example,
it raises an error. So you need to add your own shape transform function.
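
To make the idea concrete, here is a minimal sketch of what such a custom shape transform could look like for the pure-permutation case (NCHW to NHWC). It is plain Python, not the code from data_layout.cc; the name `transform_shape` and its signature are my own assumptions for illustration, and it deliberately rejects split layouts such as NCHWc.

```python
def transform_shape(shape, src_layout, dst_layout):
    """Reorder `shape` from src_layout to dst_layout.

    Hypothetical helper (not a TVM API): handles only pure axis
    permutations such as NCHW -> NHWC, not splits such as NCHW -> NCHWc.
    """
    for layout in (src_layout, dst_layout):
        # Only primal (upper-case, unique) axes are accepted here.
        if not (layout.isalpha() and layout.isupper()):
            raise ValueError("unsupported layout (sub-axes?): %s" % layout)
        if len(set(layout)) != len(layout):
            raise ValueError("duplicate axis in layout: %s" % layout)
    if len(shape) != len(src_layout):
        raise ValueError("shape rank does not match source layout")
    if sorted(src_layout) != sorted(dst_layout):
        raise ValueError("layouts must contain the same axes")
    # For each destination axis, pick the extent of that axis in the source.
    return tuple(shape[src_layout.index(axis)] for axis in dst_layout)


nhwc_shape = transform_shape((1, 3, 224, 224), "NCHW", "NHWC")
```

Here `nhwc_shape` comes out as `(1, 224, 224, 3)`. The real fix would have to live next to TransformShape in C++, but the index mapping is the same.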
---