[Apache TVM Discuss] [Development/pre-RFC] [RFC][TOP][BYOC] Intel LIBXSMM Integration

2021-12-18 Thread Zhuwenxi via Apache TVM Discuss
We're actually comparing with MKL, rather than oneDNN. The MKL version we used is from the latest oneAPI package. --- [Visit Topic](https://discuss.tvm.apache.org/t/rfc-top-byoc-intel-libxsmm-integration/11688/14) to respond.

[Apache TVM Discuss] [Development/pre-RFC] [RFC][Tensorize] Add "reduce_last" property for TensorIntrin to support activation fusion

2021-11-03 Thread Zhuwenxi via Apache TVM Discuss
**Motivation:** The existing `TensorIntrin` has "reduce_init" and "reduce_update" to support tensorization at reduce_axis == 0 and reduce_axis > 0 respectively, which already covers many cases well. However, support for activation fusion is still missing, because it lacks the facili
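As context for the proposal above, here is a minimal sketch, assuming a hypothetical vector dot-product intrinsic backed by made-up extern calls (`vec_dot_init`, `vec_dot_update`), of how today's `TensorIntrin` only exposes a body plus "reduce_init"/"reduce_update" hooks; nothing fires only on the final reduce step, which is what an activation such as ReLU would need:

```python
import tvm
from tvm import te


def vec_dot_intrin(length):
    """Declare a toy dot-product TensorIntrin (illustrative only)."""
    a = te.placeholder((length,), name="a", dtype="float32")
    b = te.placeholder((length,), name="b", dtype="float32")
    k = te.reduce_axis((0, length), name="k")
    c = te.compute((1,), lambda _: te.sum(a[k] * b[k], axis=k), name="c")

    def intrin_func(ins, outs):
        aa, bb = ins
        cc = outs[0]

        def _emit(call_name, *args):
            # Wrap a call to a (hypothetical) external kernel as a TIR stmt.
            ib = tvm.tir.ir_builder.create()
            ib.emit(tvm.tir.call_extern("int32", call_name, *args))
            return ib.get()

        body = _emit("vec_dot_update", cc.access_ptr("w"),
                     aa.access_ptr("r"), bb.access_ptr("r"), length)
        reduce_init = _emit("vec_dot_init", cc.access_ptr("w"))
        reduce_update = _emit("vec_dot_update", cc.access_ptr("w"),
                              aa.access_ptr("r"), bb.access_ptr("r"), length)
        # Only (body, reduce_init, reduce_update) can be returned today;
        # there is no hook that runs once after the final reduce step,
        # so an activation such as ReLU cannot be fused into the intrinsic.
        return body, reduce_init, reduce_update

    return te.decl_tensor_intrin(c.op, intrin_func)
```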

[Apache TVM Discuss] [Development] [TOPI][CUDA] "scatter_nd" has very poor performance on CUDA backend (>1000x slower than hand-written CUDA code)

2021-07-08 Thread Zhuwenxi via Apache TVM Discuss
# Problem Statement The existing CUDA "[scatter_nd](https://github.com/apache/tvm/blob/main/python/tvm/topi/cuda/scatter.py#L726)" op (which is written in TIR) has 2 problems that block me from deploying it to real-world GPU devices: 1. There is an integer overflow bug in its TIR implementation,
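To make the overflow concern concrete, here is a back-of-the-envelope illustration (the tensor shape is made up, not taken from the report) of how a flattened element offset can exceed the int32 range when index arithmetic is done in 32-bit:

```python
# Hypothetical large tensor; any shape whose element count exceeds
# 2**31 - 1 will wrap around if offsets are computed in int32.
shape = (8, 1024, 1024, 512)

num_elements = 1
for dim in shape:
    num_elements *= dim

last_offset = num_elements - 1          # offset of the final element
print(last_offset)                      # 4294967295
print(last_offset > 2**31 - 1)          # True: int32 index math would overflow
```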

[Apache TVM Discuss] [Development] [Tensorize] Support "reduce_last" for TensorIntrin

2021-07-07 Thread Zhuwenxi via Apache TVM Discuss
@jwfromm @Huyuwei @yzhliu @FrozenGene Comments are welcome! --- [Visit Topic](https://discuss.tvm.apache.org/t/tensorize-support-reduce-last-for-tensorintrin/10392/3) to respond.

[Apache TVM Discuss] [Development] [Tensorize] Support "reduce_last" for TensorIntrin

2021-07-05 Thread Zhuwenxi via Apache TVM Discuss
@tqchen since this feature would change the tensorize APIs, I suppose I shouldn't send a PR directly. Could you connect me with someone who's interested in helping review the proposal? --- [Visit Topic](https://discuss.tvm.apache.org/t/tensorize-support-reduce-last-for-tensorintrin/10392/2) to respond.

[Apache TVM Discuss] [Development] [Tensorize] Support "reduce_last" for TensorIntrin

2021-07-04 Thread Zhuwenxi via Apache TVM Discuss
Hi, all. The existing `TensorIntrin` supports "reduce_init" and "reduce_body", which covers most cases, and that is very good. However, when I tried to implement a tensor intrinsic like "matmul_with_relu", the current TensorIntrin was not sufficient to describe it. The TIR I'm looking for is som
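For readers following along, a rough sketch in TE (shapes and names are illustrative, not from the post) of the "matmul_with_relu" pattern being described; the point is that the ReLU only applies after the reduction over `k` has fully finished, which is exactly the step the existing hooks cannot express inside a single intrinsic:

```python
import tvm
from tvm import te

# Illustrative sizes only.
M, N, K = 64, 64, 64
A = te.placeholder((M, K), name="A", dtype="float32")
B = te.placeholder((K, N), name="B", dtype="float32")
k = te.reduce_axis((0, K), name="k")

# Plain matmul: expressible with reduce_init / reduce_update today.
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# The activation is only meaningful once the reduction is complete,
# i.e. on the last reduce step, hence the proposed "reduce_last" hook.
D = te.compute(
    (M, N),
    lambda i, j: te.max(C[i, j], tvm.tir.const(0, "float32")),
    name="relu",
)
```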

[Apache TVM Discuss] [Development] [AutoScheduler] Do we have plan to support auto schedule ExternOp?

2021-07-01 Thread Zhuwenxi via Apache TVM Discuss
@mbrookhart Makes sense, thank you! --- [Visit Topic](https://discuss.tvm.apache.org/t/autoscheduler-do-we-have-plan-to-support-auto-schedule-externop/10346/9) to respond.

[Apache TVM Discuss] [Development] [AutoScheduler] Do we have plan to support auto schedule ExternOp?

2021-06-30 Thread Zhuwenxi via Apache TVM Discuss
Thank you, @comaniac. @jroesch @mbrookhart @ritwikdas54 I noticed you've participated in implementing the ops above (found via git blame :stuck_out_tongue_winking_eye:); could you explain a little bit about why TIR was used instead of TE? --- [Visit Topic](https://discuss.tvm.apache.org/t/autosc

[Apache TVM Discuss] [Development] [AutoScheduler] Do we have plan to support auto schedule ExternOp?

2021-06-30 Thread Zhuwenxi via Apache TVM Discuss
@comaniac By the way, do you know the specific reason why ops like "scatter" are implemented using TIR instead of TE? From my quick count, at least 13 ops in Relay use TIR as their implementation: 1. argwhere 2. non_max_suppression 3. scanop 4. scatter 5. scatter_nd
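For context on the TIR-vs-TE distinction being asked about, here is a hedged sketch (toy op, illustrative names) contrasting a declarative `te.compute` definition with one whose body is emitted imperatively through `tvm.tir.ir_builder` and wrapped in `te.extern`, which is the style ops like scatter use:

```python
import tvm
from tvm import te

n = te.var("n")
src = te.placeholder((n,), name="src", dtype="float32")

# TE style: the compute rule is declarative, so schedule primitives
# (and the auto-scheduler) can rewrite the loop structure.
doubled_te = te.compute((n,), lambda i: src[i] * 2.0, name="doubled_te")


# TIR/ir_builder style: the loop nest is written out explicitly and the
# result is wrapped as an ExternOp, which schedulers treat as a black box.
def _double_ir(src_buf, dst_buf, length):
    ib = tvm.tir.ir_builder.create()
    src_ptr = ib.buffer_ptr(src_buf)
    dst_ptr = ib.buffer_ptr(dst_buf)
    with ib.for_range(0, length, name="i") as i:
        dst_ptr[i] = src_ptr[i] * 2.0
    return ib.get()


doubled_tir = te.extern(
    (n,),
    [src],
    lambda ins, outs: _double_ir(ins[0], outs[0], n),
    name="doubled_tir",
    dtype="float32",
)
```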

[Apache TVM Discuss] [Development] [AutoScheduler] Do we have plan to support auto schedule ExternOp?

2021-06-29 Thread Zhuwenxi via Apache TVM Discuss
Thank you @comaniac, I really appreciate it! The reason "...because they are written in TIR instead of TE" does make sense to me. And I agree that for the "scatter" case the improvement would be small. I guess Relay's default schedule is probably good enough for my case. --- [Visit Topic](https:

[Apache TVM Discuss] [Development] [AutoScheduler] Do we have plan to support auto schedule ExternOp?

2021-06-29 Thread Zhuwenxi via Apache TVM Discuss
Hi all. I just noticed that AutoScheduler lacks support for ExternOp; currently it supports ComputeOp only. I understand that it is non-trivial to auto-schedule an op with external function calls, however there are a bunch of topi ops whose algorithms are purely written with tensor ex
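As a quick illustration of the ComputeOp-only situation described above, a minimal sketch (toy workload, made-up names) of how a purely `te.compute`-based definition is fed to the auto-scheduler today; a DAG containing a `te.extern` (ExternOp) node is not something the search will transform:

```python
import tvm
from tvm import te, auto_scheduler


@auto_scheduler.register_workload
def add_one(n):
    # Purely tensor-expression based, so it lowers to a ComputeOp.
    x = te.placeholder((n,), name="x", dtype="float32")
    y = te.compute((n,), lambda i: x[i] + 1.0, name="y")
    return [x, y]


task = auto_scheduler.SearchTask(
    func=add_one, args=(1024,), target=tvm.target.Target("llvm")
)
# task.tune(...) searches over schedules for the ComputeOp "y";
# an ExternOp appearing in the same DAG would simply be left untouched.
```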