I am not super familiar with the Unity direction, but keeping BYOC sounds like
a good idea. I don't know if this is how it's supposed to be used, but I am
using it as a "catch-all" way to extend TVM. I'm currently adding some custom
OpenCL kernels for depthwise conv2d: the way that I am planning
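For reference, a minimal sketch of what the Relay-side BYOC hookup for this could look like; the `my_ocl` codegen name and the depthwise predicate are placeholders, and on the Relax/Unity side the equivalent flow is pattern-based partitioning (`FuseOpsByPattern` + `RunCodegen`) instead:

```python
import tvm
from tvm import relay

# Mark depthwise nn.conv2d calls as supported by a hypothetical external
# codegen named "my_ocl"; both the name and the predicate are placeholders.
@tvm.ir.register_op_attr("nn.conv2d", "target.my_ocl")
def _depthwise_conv2d(expr):
    attrs = expr.attrs
    # Rough check for this sketch: treat grouped convolutions as depthwise.
    return int(attrs.groups) > 1

def partition_for_my_ocl(mod):
    # Standard BYOC partitioning pipeline: annotate, merge, partition.
    seq = tvm.transform.Sequential([
        relay.transform.AnnotateTarget("my_ocl"),
        relay.transform.MergeCompilerRegions(),
        relay.transform.PartitionGraph(),
    ])
    return seq(mod)
```

A matching codegen still has to be registered under the same name (e.g. a `relay.ext.my_ocl` packed function) so that the partitioned functions actually get compiled into your OpenCL kernels.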
As long as LLM workloads are still composed of tensor programs, TVM just has
to position itself as a general tensor program compiler more than as an ML
compiler. The tensor expression and Ansor projects look perfectly suited for
this.
Thanks!
I'm not familiar with the BitBLAS project. Please correct me if I am wrong: in
the code you showed, the IRModule pass that retrieves the threadblock
dimensions is
[get_annotated_device_mod](https://github.com/microsoft/BitBLAS/blob/2f6d316be9f9d70f2845c2f319ac2f348d0cd6a6/bitblas/uti
@varunnaw Good point. In my project we use this approach to retrieve
attributes, including the dynamic shared memory size and block/grid
information, which might be helpful to you.
https://github.com/microsoft/BitBLAS/blob/main/bitblas/builder/wrapper/tir.py#L64-L80
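For anyone else looking for this: one way to recover the launch dimensions yourself is to walk a lowered TIR function and collect its `thread_extent` attributes. A minimal sketch with a made-up toy kernel (not necessarily how BitBLAS does it internally):

```python
import tvm
from tvm import te

# Toy kernel with explicit thread bindings; shapes and split factor are made up.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")
s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=128)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))

lowered = tvm.lower(s, [A, B])
launch_dims = {}

def _collect(node):
    # Block/grid extents are recorded as AttrStmt nodes with key "thread_extent".
    # Dynamic shared memory shows up as Allocate nodes whose buffer has storage
    # scope "shared.dyn" and can be collected in the same visitor.
    if isinstance(node, tvm.tir.AttrStmt) and node.attr_key == "thread_extent":
        launch_dims[node.node.thread_tag] = int(node.value)

tvm.tir.stmt_functor.post_order_visit(lowered["main"].body, _collect)
print(launch_dims)  # e.g. {'blockIdx.x': 8, 'threadIdx.x': 128}
```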
## Why is this important?
One suggestion that I have for TVM is to add a cleaner exit from the stack.
For example, for OpenCL/CUDA targets, what do I do if I just want the
generated kernels?
Note: there is a way to print the source for OpenCL, but unfortunately I have not
found a way to get the work group / threadblock sizes.
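In case it helps, the generated device source at least can be dumped from the built runtime module. A small sketch assuming the classic TE + `tvm.build` flow and a TVM build with the OpenCL codegen enabled:

```python
import tvm
from tvm import te

# Toy elementwise kernel; shapes and the split factor are arbitrary.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))

# Requires USE_OPENCL in the TVM build; target="cuda" works the same way.
mod = tvm.build(s, [A, B], target="opencl")

# The host module wraps the device module that holds the generated kernel source.
print(mod.imported_modules[0].get_source())
```

Getting the work group / threadblock sizes still needs a separate step, e.g. reading the `thread_extent` attributes from the lowered TIR as in the earlier sketch.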
LLMs are fundamentally transforming the paradigm of ML deployment and
compilation. Simultaneously, the increasing complexity of ML optimization
pipelines has rendered many legacy components inadequate for meeting rapidly
evolving requirements.
On the other hand, the open-source community faces
That is right. In such a case, we will need to ensure downstream projects are
structured to depend on the same libtvm. So both projectA and projectB depend
on the same upstream TVM (via an include dependency), but also build new
optimization transformations on top.
That does mean we need to restructure
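As a quick sanity check on the "same libtvm" point, something like this can at least confirm which TVM package and build a given process loaded; downstream projects would then be built against exactly that installation (the names below are only illustrative):

```python
import tvm

# Confirm which Python package and which build of libtvm this process loaded.
# Downstream projects (projectA, projectB, a tuner such as welder, ...) should
# build and run against exactly this installation rather than carrying a fork.
info = dict(tvm.support.libinfo().items())
print("tvm python package:", tvm.__file__)
print("libtvm commit     :", info.get("GIT_COMMIT_HASH", "unknown"))
```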
@tqchen, thanks! This is exactly what we are expecting. However, last time I
tried to bring my own tuner into `mlc-llm`, I encountered an issue:
```python
import tvm  # upstream TVM
relax_mod = relax_transform(relax_mod)  # upstream Relax passes (placeholder)
import welder  # downstream tuner shipped as a separate project
relax_mod = welder.tune(relax_mod)
# something bad happened
```
Thanks @LeiWang1999, I think the main goal here would be to ensure that the IR
remains a common shared part. Different projects can have their own
transformations and leverage the main code-base. That would enable us to reuse
different tuners and transformations of the IR out of the box.
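To make that concrete, a downstream transformation can be written purely against the shared IR and pass infrastructure; a minimal sketch with made-up names:

```python
import tvm

# A downstream project can ship its own IRModule-level transformation while
# depending only on the upstream IR and pass infrastructure.
@tvm.transform.module_pass(opt_level=0, name="MyProjectTune")
class MyProjectTune:
    def transform_module(self, mod, ctx):
        # Inspect or rewrite functions in `mod` here. The only contract is
        # IRModule in, IRModule out, so it composes with upstream passes.
        return mod

# Downstream and upstream passes can then run in a single pipeline.
pipeline = tvm.transform.Sequential([MyProjectTune()])
```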
Completely agree with these perspectives. Another observation I have is that
projects developed on top of TVM are often not straightforward; they typically
require hacking the underlying TVM code. For example, in the Ladder project
(based on Welder), we added support for MFMA and HIP code generation
Over the past year, the community has worked hard to bring in and transition to
a more flexible and productive flow for ML compilers. One lesson we learned is
that it is hard to build a silver bullet for everything. Additionally, given
the amount of time and energy contributed by community volunteers