I spent a lot of time optimizing the sort/argsort kernel for GPUs. We get
pretty good performance on GPUs from multiple vendors, competitive with those
vendors' hand-tuned libraries.
If these TIR kernels are well optimized, they shouldn't end up being the
bottleneck in models.
---
TE is a limited declarative programming model; it's not possible to write
operations that do data-dependent indexing in TE.
Anything that's sort/scatter related needs to be written directly in the more
imperative TIR.
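
To make the distinction concrete, here is a rough plain-Python sketch. It is not TVM code: `compute` and `scatter` are hypothetical helpers written only for illustration. It assumes the key difference is that a declarative compute defines each output element from its own index, while a scatter decides where to write based on values read at run time.

```python
# Plain-Python illustration only; none of these helpers are TVM APIs.

def compute(length, fcompute):
    """Declarative style: output[i] is defined purely as a function of i."""
    return [fcompute(i) for i in range(length)]


def scatter(data, indices, updates):
    """Imperative style: the write location is read from `indices` at run time."""
    out = list(data)
    for i in range(len(indices)):
        # The store target depends on the contents of `indices`, not just on i.
        out[indices[i]] = updates[i]
    return out


a = [1, 2, 3, 4]
doubled = compute(len(a), lambda i: a[i] * 2)
scattered = scatter([0, 0, 0, 0], [2, 0, 3], [10, 20, 30])
```

In the first case every output position is known up front; in the second, the loop body has to perform a store whose address comes from the data, which is the kind of thing that needs an imperative formulation.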
---
[Visit
Topic](https://discuss.tvm.apache.org/t/autoscheduler-do-we
I've been looking at the PR and some of the discussion, and I thought I'd bring
my thoughts back to this RFC, since it seems like a better place for broader
design thoughts.
First, thanks for the RFC, @ziheng. There is definitely way too much
boilerplate in TVM right now, and finding ways to st
Agreed, 100%. I've been using a `yapf` configuration file I got from @comaniac,
but I'm happy to standardize.
---
[Visit
Topic](https://discuss.tvm.apache.org/t/rfc-introduce-automatic-formatting-of-python-code/7843/2)
to respond.