
# Introduction

The TVM community has worked since the last release to deliver the following 
exciting new improvements!

The main tags are below (**bold text marks a tag with lots of progress**): Relax 
(especially the PyTorch frontend), CUDA, etc.

Please visit the full listing of commits for a complete view: 
[v0.20.dev0...v0.20.0.rc0](https://github.com/apache/tvm/compare/v0.20.dev0...v0.20.0.rc0).

### Community

None.

### RFCs

None.

### Adreno
 * [#17608](https://github.com/apache/tvm/pull/17608) - [WINDOWS] Windows build 
dependencies for Adreno target

### BugFix
 * [#17761](https://github.com/apache/tvm/pull/17761) - [FIX][RELAX] fix fusion 
of transpose + matmul with constant weight
 * [#17762](https://github.com/apache/tvm/pull/17762) - [Fix] Fix OpenCL header 
in attention utils
 * [#17711](https://github.com/apache/tvm/pull/17711) - [Fix][dlight] add an 
explicit reduction loop check in Reduce
 * [#17697](https://github.com/apache/tvm/pull/17697) - [Fix] Include 
`<chrono>` for `std::chrono`
 * [#17677](https://github.com/apache/tvm/pull/17677) - Declare build backend 
for python package
 * [#17598](https://github.com/apache/tvm/pull/17598) - [TIR][FIX] update 
FlopEstimator to include missing nodes
 * [#17601](https://github.com/apache/tvm/pull/17601) - [Flashinfer][Fix] fix 
missing args in flashinfer test
 * [#17607](https://github.com/apache/tvm/pull/17607) - [FIX][TVMC] Fix the 
mixed precision conversion pipeline

### CI
 * [#17687](https://github.com/apache/tvm/pull/17687) - Update images to 
20250226-223225-63bc315f
 * [#17680](https://github.com/apache/tvm/pull/17680) - update images to 
20250225-035137-aeadc31c
 * [#17675](https://github.com/apache/tvm/pull/17675) - [skip ci] Update github 
tvmbot
 * [#17635](https://github.com/apache/tvm/pull/17635) - Cleanup legacy files
 * [#17634](https://github.com/apache/tvm/pull/17634) - [skip ci] Improve build 
time
 * [#17629](https://github.com/apache/tvm/pull/17629) - [skip ci] Robustify CI 
for SPOT failure
 * [#17620](https://github.com/apache/tvm/pull/17620) - Unpin pytest-profiling
 * [#17621](https://github.com/apache/tvm/pull/17621) - [skip ci] Remove legacy 
CI runners protection
 * [#17619](https://github.com/apache/tvm/pull/17619) - [Refactor] Remove legacy 
frontend tests

### Dlight
 * [#17754](https://github.com/apache/tvm/pull/17754) - Fix general reduction 
rule to support non-last reduction axis
 * [#17663](https://github.com/apache/tvm/pull/17663) - [CPU] Add CPU Backend 
Support for GEMV Optimization

### Docker
 * [#17691](https://github.com/apache/tvm/pull/17691) - Fix ml_dtypes downgrade 
issue introduced by TensorFlow
 * [#17686](https://github.com/apache/tvm/pull/17686) - Update ml_dtypes to 
0.5.1+
 * [#17676](https://github.com/apache/tvm/pull/17676) - Use Torch GPU on gpu 
device
 * [#17648](https://github.com/apache/tvm/pull/17648) - Tensorflow (aka TFLite) 
upgrade to 2.18.0
 * [#17643](https://github.com/apache/tvm/pull/17643) - Update ml_dtypes version
 * [#17638](https://github.com/apache/tvm/pull/17638) - [skip ci] Update 
ml_dtypes version
 * [#17617](https://github.com/apache/tvm/pull/17617) - Tensorflow upgrade to 
2.18.0

### Docs
 * [#17650](https://github.com/apache/tvm/pull/17650) - Update README
 * [#17611](https://github.com/apache/tvm/pull/17611) - Download 3rd party 
embeds to local files
 * [#17604](https://github.com/apache/tvm/pull/17604) - Update README

### MetaSchedule
 * [#17104](https://github.com/apache/tvm/pull/17104) - Adding post 
optimization in MetaSchedule to Improve Scheduling

### OpenCL & CLML
 * [#17571](https://github.com/apache/tvm/pull/17571) - [OPENCL][TEXTURE] 
Improved texture memory planning

### Relax
 * [#17814](https://github.com/apache/tvm/pull/17814) - [PyTorch] Add 
stack.default and sum.default to exported programs translator
 * [#17820](https://github.com/apache/tvm/pull/17820) - [PyTorch] Add support 
for broadcast_to, narrow ops
 * [#17822](https://github.com/apache/tvm/pull/17822) - [PyTorch] Cleanup tests 
for ExportedProgram frontend
 * [#17806](https://github.com/apache/tvm/pull/17806) - [PyTorch] Add Softplus 
Op Support for Exported Program and FX graph
 * [#17817](https://github.com/apache/tvm/pull/17817) - [PyTorch] Support 
dynamic shapes in ExportedProgram frontend
 * [#17813](https://github.com/apache/tvm/pull/17813) - [PyTorch] Improve 
ExportedProgram frontend by supporting `unflatten.int`, `hardtanh_.default`, 
`dropout_.default`, `silu_.default`, `add_.Tensor` and `relu_.default`
 * [#17812](https://github.com/apache/tvm/pull/17812) - [PyTorch] Support 
argsort, topk ops for ExportedProgram importer
 * [#17810](https://github.com/apache/tvm/pull/17810) - [PyTorch] Add support 
for argsort, sort, topk ops
 * [#17809](https://github.com/apache/tvm/pull/17809) - [PyTorch] Delete 
duplicate converter function `_to`
 * [#17807](https://github.com/apache/tvm/pull/17807) - [PyTorch] Fix torch 2.6 
compatibility issues
 * [#17797](https://github.com/apache/tvm/pull/17797) - [Pytorch] Update SELU 
Implementation Using Decomposed Core-Level Ops
 * [#17802](https://github.com/apache/tvm/pull/17802) - [Pytorch] support for 
arange in exported programs translator
 * [#17801](https://github.com/apache/tvm/pull/17801) - [PyTorch] Support 
where, cumprod and reciprocal ops for ExportedProgram importer
 * [#17790](https://github.com/apache/tvm/pull/17790) - [PyTorch] Add support 
for index_select
 * [#17786](https://github.com/apache/tvm/pull/17786) - [PyTorch] Support 
softshrink op for ExportedProgram
 * [#17788](https://github.com/apache/tvm/pull/17788) - [PyTorch] Add support 
for where, cumprod and reciprocal ops
 * [#17785](https://github.com/apache/tvm/pull/17785) - [PyTorch] Support prod, 
std and var ops for ExportedProgram importer
 * [#17778](https://github.com/apache/tvm/pull/17778) - [PyTorch] Support log2, 
log10 and log1p ops for ExportedProgram importer
 * [#17772](https://github.com/apache/tvm/pull/17772) - [PyTorch] Add support 
for prod, std and var ops
 * [#17766](https://github.com/apache/tvm/pull/17766) - [PyTorch] Add support 
for log2, log10 and log1p ops
 * [#17760](https://github.com/apache/tvm/pull/17760) - [PyTorch] Add support 
for lerp, select and clone ops
 * [#17751](https://github.com/apache/tvm/pull/17751) - [PyTorch] Support 
one_hot, empty_like ops for ExportedProgram importer
 * [#17747](https://github.com/apache/tvm/pull/17747) - [PyTorch] Support flip, 
gather, take ops for ExportedProgram importer
 * [#17738](https://github.com/apache/tvm/pull/17738) - [PyTorch] Support elu, 
celu, selu ops for ExportedProgram importer
 * [#17726](https://github.com/apache/tvm/pull/17726) - [PyTorch] Add support 
for numel, empty_like and one_hot ops
 * [#17707](https://github.com/apache/tvm/pull/17707) - [PyTorch] Add support 
for gather, flip and take ops
 * [#17702](https://github.com/apache/tvm/pull/17702) - [PyTorch] Add support 
for celu, selu, is_floating_point ops
 * [#17694](https://github.com/apache/tvm/pull/17694) - [PyTorch] Add support 
for elu, hardtanh ops
 * [#17689](https://github.com/apache/tvm/pull/17689) - [PyTorch] Support 
several binary ops for ExportedProgram importer
 * [#17672](https://github.com/apache/tvm/pull/17672) - [PyTorch] Refactor 
binary ops tests
 * [#17679](https://github.com/apache/tvm/pull/17679) - [PyTorch] Support 
several unary ops for ExportedProgram importer
 * [#17668](https://github.com/apache/tvm/pull/17668) - [PyTorch] Add support 
for and_, lshift, min, or_, rshift, xor ops
 * [#17664](https://github.com/apache/tvm/pull/17664) - [PyTorch] Add support 
for ge, gt, le, mod, ne ops
 * [#17659](https://github.com/apache/tvm/pull/17659) - [PyTorch] Add support 
for bitwise_not, isfinite, isinf, isnan, logical_not, sign and square ops
 * [#17622](https://github.com/apache/tvm/pull/17622) - [PyTorch] Add support 
for abs, ceil, erf, floor, log ops and refactor unary tests
 * [#17566](https://github.com/apache/tvm/pull/17566) - [ONNX] Add prim 
expression support to Neg converter and update Arange converter to use 
relax.op.arange
 * [#17642](https://github.com/apache/tvm/pull/17642) - [ONNX] Replace 
topi.split with relax.op.split in the ONNX frontend
 * [#17674](https://github.com/apache/tvm/pull/17674) - [KVCache] PagedKVCache 
refactor, FlashInfer JIT and MLA integration
 * [#17618](https://github.com/apache/tvm/pull/17618) - [KVCache] TIR attention 
kernel support for MLA
 * [#17615](https://github.com/apache/tvm/pull/17615) - [KVCache] Add KV Cache 
for CPU Runtime
 * [#17616](https://github.com/apache/tvm/pull/17616) - [Runtime][KVCache] 
Initial interface setup for MLA
 * [#17782](https://github.com/apache/tvm/pull/17782) - [Frontend] Support 
max/min in frontend op interface
 * [#17758](https://github.com/apache/tvm/pull/17758) - Allow ingesting 
tensor.chunk() from exported torch program
 * [#17781](https://github.com/apache/tvm/pull/17781) - Enable bfloat16 for 
softmax struct-info inference
 * [#17752](https://github.com/apache/tvm/pull/17752) - Batch norm correctness 
in eval mode
 * [#17774](https://github.com/apache/tvm/pull/17774) - check for tensor_meta 
in exported_program_translator
 * [#17757](https://github.com/apache/tvm/pull/17757) - Tensor.split with 
uneven tensors
 * [#17749](https://github.com/apache/tvm/pull/17749) - Move TIR backend to 
gpu_generic
 * [#17725](https://github.com/apache/tvm/pull/17725) - Ingest Tensor.clamp 
from torch export
 * [#17724](https://github.com/apache/tvm/pull/17724) - Add support to ingest 
Tensor.expand_as()
 * [#17723](https://github.com/apache/tvm/pull/17723) - Add torch exported 
program ingestion capability for Tensor.detach(), Tensor.copy_, and 
aten.lift_fresh_copy
 * [#17721](https://github.com/apache/tvm/pull/17721) - Allow ingesting 
Upsample module from torch.export either using Size or Scale Factor argument
 * [#17722](https://github.com/apache/tvm/pull/17722) - Allow ingesting 
vector_norm from torch.export
 * [#17728](https://github.com/apache/tvm/pull/17728) - ingest 
Tensor.contiguous from torch export
 * [#17700](https://github.com/apache/tvm/pull/17700) - Fix tree attention for 
Qwen2-1.5 models
 * [#17682](https://github.com/apache/tvm/pull/17682) - Add support for func 
attr inheritance in SplitLayoutRewritePreproc
 * [#17654](https://github.com/apache/tvm/pull/17654) - [BYOC] OpenCLML offload 
support for Relax
 * [#17633](https://github.com/apache/tvm/pull/17633) - Pipeline file 
reorganization
 * [#17626](https://github.com/apache/tvm/pull/17626) - Initial setup of relax 
backend pipeline
 * [#17568](https://github.com/apache/tvm/pull/17568) - [PASS] Convert layout 
pass and ops enhanced to support sub indexing
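
Many of the PyTorch entries above extend the Relax ExportedProgram importer 
(`tvm.relax.frontend.torch.from_exported_program`). Below is a rough, minimal 
sketch of how such an import typically looks; the toy model and the ops it uses 
are illustrative assumptions, not taken from any specific PR above:

```python
# Minimal sketch: importing a torch.export program into Relax.
# The model is illustrative only; op coverage depends on the converters
# listed in this section.
import torch
from torch.export import export

from tvm.relax.frontend.torch import from_exported_program


class SmallModel(torch.nn.Module):
    def forward(self, x):
        # elu and topk are among the ops added to the importer in this release
        y = torch.nn.functional.elu(x)
        return torch.topk(y, k=2, dim=-1).values


# Export with torch.export, then translate into a Relax IRModule
exported = export(SmallModel(), (torch.randn(4, 8),))
mod = from_exported_program(exported)
mod.show()
```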

### Runtime
 * [#17614](https://github.com/apache/tvm/pull/17614) - [CLML] Profiling 
options enabled for CLML
 * [#17570](https://github.com/apache/tvm/pull/17570) - [OPENCL] Bugfix

### TIR
 * [#17799](https://github.com/apache/tvm/pull/17799) - Fix reduce buffer 
allocation position
 * [#17783](https://github.com/apache/tvm/pull/17783) - [REFACTOR] Remove legacy 
tir::any
 * [#17706](https://github.com/apache/tvm/pull/17706) - Minor fix for default 
GPU schedule
 * [#17579](https://github.com/apache/tvm/pull/17579) - [SoftwarePipeline] 
Ensure pipeline epilogue and prologue do not overlap
 * [#17584](https://github.com/apache/tvm/pull/17584) - [LoopPartition] 
enforcement on loop partition control

### TVMC
 * [#17606](https://github.com/apache/tvm/pull/17606) - Bug fix

### cuda & cutlass & tensorrt
 * [#17789](https://github.com/apache/tvm/pull/17789) - [CUTLASS] Add blockwise 
scale gemm/bmm kernels
 * [#17741](https://github.com/apache/tvm/pull/17741) - [Codegen][CUDA] Fix 
codegen of cast among vector bfloat16, fp8 and fp4
 * [#17708](https://github.com/apache/tvm/pull/17708) - [CUDA] FP4 cast and 
reinterpret support
 * [#17639](https://github.com/apache/tvm/pull/17639) - [CUDA] Remove htanh 
from unsupported math ops for CUDA 12.8
 * [#16950](https://github.com/apache/tvm/pull/16950) - [Codegen, CUDA] Add FP8 
Tensor Core Codegen

### web
 * [#17695](https://github.com/apache/tvm/pull/17695) - [WASM] Update wasm 
include in accordance to kv cache revamp

### Misc
 * [#17796](https://github.com/apache/tvm/pull/17796) - [Cublas] Added support 
for bfloat16 while dispatching to cublas kernels
 * [#17763](https://github.com/apache/tvm/pull/17763) - [Flashinfer] Added jit 
flow for sampling kernel
 * [#17811](https://github.com/apache/tvm/pull/17811) - [NFC] Fix `explict` typo
 * [#17780](https://github.com/apache/tvm/pull/17780) - [3rdparty] Enable 
bfloat16 for custom allreduce kernel
 * [#17784](https://github.com/apache/tvm/pull/17784) - [REFACTOR] Phase out 
StackVM
 * [#17750](https://github.com/apache/tvm/pull/17750) - BugFix: Relax comment
 * [#17748](https://github.com/apache/tvm/pull/17748) - [Codegen] Support 
codegen for vectorized tir.ShuffleNode
 * [#17743](https://github.com/apache/tvm/pull/17743) - Fix: Change variable i 
to x in split operation in cross_compilation_and_rpc.py
 * [#17730](https://github.com/apache/tvm/pull/17730) - [Attention] Added 
caching for flashinfer binaries during JIT
 * [#17733](https://github.com/apache/tvm/pull/17733) - [Refactor] Clean up 
Relay references in the codebase
 * [#17739](https://github.com/apache/tvm/pull/17739) - [BF16] Support 
ndarray.asnumpy() for bfloat16 tensors natively using ml_dtypes
 * [#17734](https://github.com/apache/tvm/pull/17734) - Remove Google Analytics
 * [#17731](https://github.com/apache/tvm/pull/17731) - [IR] Compact Functor 
vtable
 * [#17736](https://github.com/apache/tvm/pull/17736) - Fix typos in comments 
and strings
 * [#17670](https://github.com/apache/tvm/pull/17670) - [DataType] BF16 Support
 * [#17727](https://github.com/apache/tvm/pull/17727) - [FFI] Fix dynamic FFI 
index to ensure compatibility
 * [#17718](https://github.com/apache/tvm/pull/17718) - [Refactor] Migrate 
build API to `tvm.compile`
 * [#17714](https://github.com/apache/tvm/pull/17714) - [FFI] Phase out ctypes 
fallback in favor of cython
 * [#17716](https://github.com/apache/tvm/pull/17716) - Fix the 
get_target_compute_version for sm >= 100
 * [#17710](https://github.com/apache/tvm/pull/17710) - [Refactor] Introduce 
base Executable class and `tvm.compile` interface
 * [#17713](https://github.com/apache/tvm/pull/17713) - [REFACTOR] Cleanup 
legacy relay runtime data structures
 * [#17712](https://github.com/apache/tvm/pull/17712) - [DataType] Rename FP8 
dtypes to standard names
 * [#17703](https://github.com/apache/tvm/pull/17703) - Fix typos in multiple 
files
 * [#17693](https://github.com/apache/tvm/pull/17693) - updated the assert in 
BindParams to allow tvm.relax.Constant
 * [#17701](https://github.com/apache/tvm/pull/17701) - [Refactor] Remove 
legacy TE schedule tag
 * [#17683](https://github.com/apache/tvm/pull/17683) - [MSC] Remove relay
 * [#17688](https://github.com/apache/tvm/pull/17688) - Fix 
relax.ccl.scatter_from_worker0 assert
 * [#17630](https://github.com/apache/tvm/pull/17630) - [Codegen] FP4 support
 * [#17685](https://github.com/apache/tvm/pull/17685) - [REFACTOR] Cleanup 
legacy TE-based passes
 * [#17681](https://github.com/apache/tvm/pull/17681) - [REFACTOR] Followup 
cleanup of relay phase out
 * [#17678](https://github.com/apache/tvm/pull/17678) - Bump 
3rdparty/cutlass_fpA_intB_gemm
 * [#17669](https://github.com/apache/tvm/pull/17669) - [REFACTOR] Allow target 
dependent default tir pipeline dispatch in tir.build()
 * [#17665](https://github.com/apache/tvm/pull/17665) - [REFACTOR] move build 
flow from C++ to Python
 * [#17624](https://github.com/apache/tvm/pull/17624) - Added support for 
normal MLA kernel
 * [#17641](https://github.com/apache/tvm/pull/17641) - Pick up vector length 
from 'zvlXXXb' (RVV) mattr for riscv
 * [#17666](https://github.com/apache/tvm/pull/17666) - [Refactor] Improve 
TargetHasSVE function with optional target handling
 * [#17661](https://github.com/apache/tvm/pull/17661) - [Refactor] Phase out 
python dependency `decorator`
 * [#17662](https://github.com/apache/tvm/pull/17662) - [REFACTOR] Phase out 
te.Schedule c++ components
 * [#17660](https://github.com/apache/tvm/pull/17660) - [REFACTOR] Phase out 
relay c++ components 
 * [#17655](https://github.com/apache/tvm/pull/17655) - Upgrading onnx and 
onnxrt versions
 * [#17657](https://github.com/apache/tvm/pull/17657) - Update argument order 
for relax.op.pad to make it round-trippable
 * [#17658](https://github.com/apache/tvm/pull/17658) - [REFACTOR] Phase out 
te.schedule python components 
 * [#17653](https://github.com/apache/tvm/pull/17653) - Update images to 
20250214-034537-bd1411f8
 * [#17656](https://github.com/apache/tvm/pull/17656) - [REFACTOR] Phase out 
relay python components
 * [#17649](https://github.com/apache/tvm/pull/17649) - [Refactor] Phase out 
python dependency attrs
 * [#17644](https://github.com/apache/tvm/pull/17644) - Bump rollup from 2.79.1 
to 2.79.2 in /web
 * [#17637](https://github.com/apache/tvm/pull/17637) - [PYTHON] Build cython 
by default
 * [#17631](https://github.com/apache/tvm/pull/17631) - Handle vector width 
(VLEN) for RISCV arches
 * [#17613](https://github.com/apache/tvm/pull/17613) - Bug Fix: Removed unused 
code
 * [#17585](https://github.com/apache/tvm/pull/17585) - [Relay] Disable 
InferType if it was already done and there were no changes after the previous pass
 * [#17605](https://github.com/apache/tvm/pull/17605) - [Refactor] Phase out 
legacy example apps
 * [#17603](https://github.com/apache/tvm/pull/17603) - [Refactor] Phase out 
legacy docs
 * [#17513](https://github.com/apache/tvm/pull/17513) - [GRAPH RT] Additional 
API support
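
Several of the refactors above (e.g. #17710, #17718, #17665) converge on a single 
`tvm.compile` entry point that replaces the separate Relax/TIR build calls. A 
minimal sketch of the new flow for a Relax module, assuming the default pipeline 
and using an illustrative model built with the `relax.frontend.nn` API:

```python
# Minimal sketch of the unified tvm.compile entry point; exact defaults
# and pipeline options are assumptions here.
import tvm
from tvm import relax
from tvm.relax.frontend import nn


class TinyMLP(nn.Module):
    """Illustrative model, not taken from the release notes."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    def forward(self, x: nn.Tensor):
        return self.fc(x)


# Export the nn.Module into a Relax IRModule plus its parameters
mod, params = TinyMLP().export_tvm(
    spec={"forward": {"x": nn.spec.Tensor((1, 8), "float32")}}
)

# tvm.compile replaces the separate relax.build / tir.build entry points
ex = tvm.compile(mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())
# vm["forward"] can now be called with an input tensor and the exported params
```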



