We have found a simple workaround for concatenating 2D tensors, which is currently our most common use case. When the last axis is unrolled, LLVM is able to generate vectorized code, and the performance is even better than the C code in Caffe2. For benchmark numbers, see https://gist.github.com/ajtulloch/d3b47517721c71c09375fd76f387e718 from @ajtulloch.
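
To make the idea concrete, here is a minimal pure-Python sketch of what unrolling the last axis buys for a 2D concat. It is not the TVM schedule itself, and the names `column_plan` and `concat_last_axis` are hypothetical, introduced only for illustration. The assumption is that once the last axis is unrolled, the choice of which input each output column reads from becomes a constant, so the loop body is a fixed list of copies with no per-element branch, which is the kind of straight-line code a compiler can vectorize.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def column_plan(widths: Sequence[int]) -> List[Tuple[int, int]]:
    """Map each output column to (input index, column inside that input).

    Assumption for illustration: with the last axis unrolled, this mapping
    is known ahead of time, so no branch is needed per element.
    """
    plan = []
    for src, width in enumerate(widths):
        for col in range(width):
            plan.append((src, col))
    return plan


def concat_last_axis(inputs: Sequence[Matrix]) -> Matrix:
    """Concatenate 2D row-major matrices along their last axis."""
    if not inputs:
        raise ValueError("need at least one input")
    rows = len(inputs[0])
    if any(len(m) != rows for m in inputs):
        raise ValueError("all inputs must have the same number of rows")
    widths = [len(m[0]) if rows else 0 for m in inputs]
    plan = column_plan(widths)  # computed once, outside the row loop
    out = []
    for i in range(rows):
        # Stand-in for the unrolled body: one fixed copy per output column.
        out.append([inputs[src][i][col] for src, col in plan])
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0], [6.0]]
result = concat_last_axis([a, b])
```

The plan is built once outside the row loop, which mirrors how an unrolled inner axis leaves only the outer row loop at run time.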