We're actually comparing with MKL rather than oneDNN. The MKL version we used is from the latest oneAPI package.
---
[Visit
Topic](https://discuss.tvm.apache.org/t/rfc-top-byoc-intel-libxsmm-integration/11688/14)
to respond.
**Motivation:**
The existing `TensorIntrin` has "reduce_init" and "reduce_update" to support tensorization of the reduce_axis == 0 and reduce_axis > 0 cases specifically, which already covers many cases well. However, support for activation fusion is still missing, because it lacks a facility for describing what should happen after the last reduction step.
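For context, here is a minimal sketch (assuming the classic `te.decl_tensor_intrin` tensorize flow; `intrin_gemv`, `gemv_reset`, and `gemv_update` are hypothetical names, not from this proposal) of how the existing reduce_init / reduce_update pair is declared today, and where the missing post-reduction hook would have to fit:

```python
import tvm
from tvm import te


def intrin_gemv(m, k):
    # Declared computation the intrinsic will match: c[i] = sum_r a[r] * b[i, r]
    a = te.placeholder((k,), name="a")
    b = te.placeholder((m, k), name="b")
    r = te.reduce_axis((0, k), name="r")
    c = te.compute((m,), lambda i: te.sum(a[r] * b[i, r], axis=r), name="c")

    def intrin_func(ins, outs):
        aa, bb = ins
        cc = outs[0]

        def emit(fname, *args):
            # Wrap a single extern call into a TIR statement.
            ib = tvm.tir.ir_builder.create()
            ib.emit(tvm.tir.call_extern("int32", fname, *args))
            return ib.get()

        body = emit("gemv_update", cc.access_ptr("w"),
                    aa.access_ptr("r"), bb.access_ptr("r"), m, k)
        reduce_init = emit("gemv_reset", cc.access_ptr("w"), m)
        reduce_update = emit("gemv_update", cc.access_ptr("w"),
                             aa.access_ptr("r"), bb.access_ptr("r"), m, k)
        # The (body, reduce_init, reduce_update) triple is what tensorize consumes;
        # there is no slot for work that should run after the last reduction step.
        return body, reduce_init, reduce_update

    return te.decl_tensor_intrin(c.op, intrin_func)
```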
# Problem Statement
The existing CUDA "[scatter_nd](https://github.com/apache/tvm/blob/main/python/tvm/topi/cuda/scatter.py#L726)" op (which is written in TIR) has two problems that block me from deploying it to real-world GPU devices:
1. There is an integer overflow bug in its TIR implementation (see the sketch below),
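To illustrate the class of bug, here is a standalone sketch (not the actual scatter_nd code; the shape is made up): a flattened offset computed in int32 wraps around once the element count passes 2**31 - 1, while the same arithmetic in int64 stays correct.

```python
import numpy as np

# Hypothetical output shape with more than 2**31 - 1 elements.
shape = (3, 1024, 1024, 1024)
stride0 = np.int32(shape[1] * shape[2] * shape[3])  # 2**30, still fits in int32

row, col = np.int32(2), np.int32(5)
flat_i32 = row * stride0 + col                      # wraps around (numpy warns on overflow)
flat_i64 = np.int64(row) * np.int64(stride0) + np.int64(col)

print(flat_i32)  # negative garbage offset
print(flat_i64)  # 2147483653, the correct flat index
```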
@jwfromm @Huyuwei @yzhliu @FrozenGene Comments are welcome!
---
[Visit
Topic](https://discuss.tvm.apache.org/t/tensorize-support-reduce-last-for-tensorintrin/10392/3)
to respond.
@tqchen Since this feature would change the tensorize APIs, I suppose I shouldn't send a PR directly. Could you connect me with someone who's interested, to help review the proposal?
---
[Visit
Topic](https://discuss.tvm.apache.org/t/tensorize-support-reduce-last-for-tensorintrin/10392/2)
to respond.
Hi, all.
The existing `TensorIntrin` supports "reduce_init" and "reduce_body", which already covers most cases and is very good. However, when I was trying to implement a tensor intrinsic like "matmul_with_relu", the current TensorIntrin was not sufficient to describe it.
The TIR I'm looking for is something along these lines.
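A hedged sketch in TE of the fused pattern being described (shapes, names, and the exact form are illustrative, not taken from the original post): the relu lives in a second stage after the reduction, so a single intrinsic covering both stages needs a hook for the step right after the final reduction iteration.

```python
import tvm
from tvm import te

M, N, K = 64, 64, 64
A = te.placeholder((M, K), name="A", dtype="float32")
B = te.placeholder((K, N), name="B", dtype="float32")
k = te.reduce_axis((0, K), name="k")

# Stage 1: the reduction that reduce_init / reduce_update already describe.
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# Stage 2: the activation that should fire once per output element, but only
# after the final reduction step -- the part current TensorIntrin cannot express.
D = te.compute(
    (M, N),
    lambda i, j: te.max(C[i, j], tvm.tir.const(0.0, "float32")),
    name="D",
)
```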
@mbrookhart Makes sense, thank you!
---
[Visit
Topic](https://discuss.tvm.apache.org/t/autoscheduler-do-we-have-plan-to-support-auto-schedule-externop/10346/9)
to respond.
Thank you, @comaniac.
@jroesch @mbrookhart @ritwikdas54 I noticed you've participated in implementing the ops above (found via git blame :stuck_out_tongue_winking_eye:). Could you explain a little about why TIR was used instead of TE?
---
@comaniac By the way, do you know the specific reason why ops like "scatter" are implemented in TIR instead of TE?
According to my quick count, there are at least 13 ops in Relay that use TIR for their implementation:
1. argwhere
2. non_max_suppression
3. scanop
4. scatter
5. scatter_nd
Thank you @comaniac, really appreciate it!
The reason "...because they are written in TIR instead of TE" does make sense to me. And I agree that for the "scatter" case the improvement would be small. I guess Relay's default schedule is probably good enough for my case.
---
Hi all.
I just noticed that AutoScheduler lacks support for ExternOp; currently it supports ComputeOp only.
I understand that it is non-trivial to auto-schedule an op with external function calls. However, there are a bunch of topi ops whose algorithms are written purely with tensor expressions.