We have found a simple workaround for concatenating 2D tensors (currently our
most common use case). When the last axis is unrolled, LLVM is able to generate
vectorized code, and the performance is even better than the C code in Caffe2.
For benchmark numbers, see
https://gist.github.com/ajtulloch/d3b47517721c71c09375fd76f387e718 from
@ajtulloch.
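
To make the loop structure concrete, here is a rough plain-Python sketch (not TVM code, and not the actual schedule used for the benchmark). It assumes the concat is expressed as a per-element select on the last axis, which is one plausible reading of why unrolling helps: once the last axis is unrolled, each column index is a constant, so the select can be resolved ahead of time and each row reduces to two contiguous copies. The function names `concat_select` and `concat_unrolled` are made up for illustration.

```python
def concat_select(a, b):
    """Concatenate two 2D lists along the last axis with a per-element branch."""
    rows, wa, wb = len(a), len(a[0]), len(b[0])
    out = [[0] * (wa + wb) for _ in range(rows)]
    for i in range(rows):
        for j in range(wa + wb):
            # The branch on j is evaluated for every output element.
            out[i][j] = a[i][j] if j < wa else b[i][j - wa]
    return out


def concat_unrolled(a, b):
    """Same result, with the last axis split at the known boundary wa."""
    rows, wa, wb = len(a), len(a[0]), len(b[0])
    out = [[0] * (wa + wb) for _ in range(rows)]
    for i in range(rows):
        # Assumption: with the last axis unrolled, the branch disappears
        # and each row becomes two branch-free contiguous copies.
        out[i][:wa] = a[i]
        out[i][wa:] = b[i]
    return out
```

Both functions return the same rows; the second form only shows the kind of branch-free inner body that a compiler could plausibly turn into vector loads and stores.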