# Introduction

The TVM community has worked since the last release to deliver the following exciting new improvements!
The main tags are listed below (**bold text marks a tag with lots of progress**): **Relax** (especially the PyTorch frontend), CUDA, and more. Please visit the full listing of commits for a complete view: [v0.20.dev0...v0.20.0.rc0](https://github.com/apache/tvm/compare/v0.20.dev0...v0.20.0.rc0).

### Community

None.

### RFCs

None.

### Adreno

* [#17608](https://github.com/apache/tvm/pull/17608) - [WINDOWS] Windows build dependencies for Adreno target

### BugFix

* [#17761](https://github.com/apache/tvm/pull/17761) - [FIX][RELAX] Fix fusion of transpose + matmul when the weight is constant
* [#17762](https://github.com/apache/tvm/pull/17762) - [Fix] Fix OpenCL header in attention utils
* [#17711](https://github.com/apache/tvm/pull/17711) - [Fix][dlight] Add an explicit reduction loop check in Reduce
* [#17697](https://github.com/apache/tvm/pull/17697) - [Fix] Include `<chrono>` for `std::chrono`
* [#17677](https://github.com/apache/tvm/pull/17677) - Declare build backend for the Python package
* [#17598](https://github.com/apache/tvm/pull/17598) - [TIR][FIX] Update FlopEstimator to include missing nodes
* [#17601](https://github.com/apache/tvm/pull/17601) - [Flashinfer][Fix] Fix missing args in FlashInfer test
* [#17607](https://github.com/apache/tvm/pull/17607) - [FIX][TVMC] Fix the mixed-precision conversion pipeline

### CI

* [#17687](https://github.com/apache/tvm/pull/17687) - Update images to 20250226-223225-63bc315f
* [#17680](https://github.com/apache/tvm/pull/17680) - Update images to 20250225-035137-aeadc31c
* [#17675](https://github.com/apache/tvm/pull/17675) - [skip ci] Update GitHub tvmbot
* [#17635](https://github.com/apache/tvm/pull/17635) - Clean up legacy files
* [#17634](https://github.com/apache/tvm/pull/17634) - [skip ci] Improve build time
* [#17629](https://github.com/apache/tvm/pull/17629) - [skip ci] Robustify CI against spot-instance failures
* [#17620](https://github.com/apache/tvm/pull/17620) - Unpin pytest-profiling
* [#17621](https://github.com/apache/tvm/pull/17621) - [skip ci] Remove legacy CI runners protection
* [#17619](https://github.com/apache/tvm/pull/17619) - [Refactor] Remove legacy frontend tests

### Dlight

* [#17754](https://github.com/apache/tvm/pull/17754) - Fix general reduction rule to support a non-last reduction axis
* [#17663](https://github.com/apache/tvm/pull/17663) - [CPU] Add CPU backend support for GEMV optimization (see the sketch below)
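
For context on how these rules are consumed, here is a minimal, hedged sketch of applying dlight's default schedule rules to a TIR module under a GPU target. The toy module and rule selection are illustrative, and the CPU GEMV rule from #17663 is expected to plug into the same `ApplyDefaultSchedule` mechanism:

```python
import tvm
from tvm import dlight as dl
from tvm.script import tir as T

@tvm.script.ir_module
class Mod:
    @T.prim_func
    def sum_rows(A: T.Buffer((64, 128), "float32"), B: T.Buffer((64,), "float32")):
        # Row-wise reduction: B[i] = sum_k A[i, k]
        for i, k in T.grid(64, 128):
            with T.block("sum"):
                vi, vk = T.axis.remap("SR", [i, k])
                with T.init():
                    B[vi] = T.float32(0)
                B[vi] = B[vi] + A[vi, vk]

# Each rule pattern-matches a kind of workload (reduction, GEMV, ...) and
# schedules it; functions no rule matches fall through to the next rule.
with tvm.target.Target("cuda"):
    scheduled = dl.ApplyDefaultSchedule(
        dl.gpu.Reduction(),
        dl.gpu.GeneralReduction(),
        dl.gpu.Fallback(),
    )(Mod)
```
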
### Docker

* [#17691](https://github.com/apache/tvm/pull/17691) - Fix ml_dtypes downgrade issue introduced by TensorFlow
* [#17686](https://github.com/apache/tvm/pull/17686) - Update ml_dtypes to 0.5.1+
* [#17676](https://github.com/apache/tvm/pull/17676) - Use Torch GPU build on GPU devices
* [#17648](https://github.com/apache/tvm/pull/17648) - TensorFlow (and TFLite) upgrade to 2.18.0
* [#17643](https://github.com/apache/tvm/pull/17643) - Update ml_dtypes version
* [#17638](https://github.com/apache/tvm/pull/17638) - [skip ci] Update ml_dtypes version
* [#17617](https://github.com/apache/tvm/pull/17617) - TensorFlow upgrade to 2.18.0

### Docs

* [#17650](https://github.com/apache/tvm/pull/17650) - Update README
* [#17611](https://github.com/apache/tvm/pull/17611) - Download third-party embeds to local files
* [#17604](https://github.com/apache/tvm/pull/17604) - Update README

### MetaSchedule

* [#17104](https://github.com/apache/tvm/pull/17104) - Add post-optimization in MetaSchedule to improve scheduling

### OpenCL & CLML

* [#17571](https://github.com/apache/tvm/pull/17571) - [OPENCL][TEXTURE] Improved texture memory planning

### Relax

* [#17814](https://github.com/apache/tvm/pull/17814) - [PyTorch] Add stack.default and sum.default to the ExportedProgram translator
* [#17820](https://github.com/apache/tvm/pull/17820) - [PyTorch] Add support for broadcast_to and narrow ops
* [#17822](https://github.com/apache/tvm/pull/17822) - [PyTorch] Clean up tests for ExportedProgram frontend
* [#17806](https://github.com/apache/tvm/pull/17806) - [PyTorch] Add Softplus op support for ExportedProgram and FX graph
* [#17817](https://github.com/apache/tvm/pull/17817) - [PyTorch] Support dynamic shapes in ExportedProgram frontend
* [#17813](https://github.com/apache/tvm/pull/17813) - [PyTorch] Improve ExportedProgram frontend by supporting `unflatten.int`, `hardtanh_.default`, `dropout_.default`, `silu_.default`, `add_.Tensor` and `relu_.default`
* [#17812](https://github.com/apache/tvm/pull/17812) - [PyTorch] Support argsort, topk ops for ExportedProgram importer
* [#17810](https://github.com/apache/tvm/pull/17810) - [PyTorch] Add support for argsort, sort, topk ops
* [#17809](https://github.com/apache/tvm/pull/17809) - [PyTorch] Delete duplicate converter function `_to`
* [#17807](https://github.com/apache/tvm/pull/17807) - [PyTorch] Fix torch 2.6 compatibility issues
* [#17797](https://github.com/apache/tvm/pull/17797) - [PyTorch] Update SELU implementation using decomposed core-level ops
* [#17802](https://github.com/apache/tvm/pull/17802) - [PyTorch] Support arange in the ExportedProgram translator
* [#17801](https://github.com/apache/tvm/pull/17801) - [PyTorch] Support where, cumprod and reciprocal ops for ExportedProgram importer
* [#17790](https://github.com/apache/tvm/pull/17790) - [PyTorch] Add support for index_select
* [#17786](https://github.com/apache/tvm/pull/17786) - [PyTorch] Support softshrink op for ExportedProgram
* [#17788](https://github.com/apache/tvm/pull/17788) - [PyTorch] Add support for where, cumprod and reciprocal ops
* [#17785](https://github.com/apache/tvm/pull/17785) - [PyTorch] Support prod, std and var ops for ExportedProgram importer
* [#17778](https://github.com/apache/tvm/pull/17778) - [PyTorch] Support log2, log10 and log1p ops for ExportedProgram importer
* [#17772](https://github.com/apache/tvm/pull/17772) - [PyTorch] Add support for prod, std and var ops
* [#17766](https://github.com/apache/tvm/pull/17766) - [PyTorch] Add support for log2, log10 and log1p ops
* [#17760](https://github.com/apache/tvm/pull/17760) - [PyTorch] Add support for lerp, select and clone ops
* [#17751](https://github.com/apache/tvm/pull/17751) - [PyTorch] Support one_hot, empty_like ops for ExportedProgram importer
* [#17747](https://github.com/apache/tvm/pull/17747) - [PyTorch] Support flip, gather, take ops for ExportedProgram importer
* [#17738](https://github.com/apache/tvm/pull/17738) - [PyTorch] Support elu, celu, selu ops for ExportedProgram importer
* [#17726](https://github.com/apache/tvm/pull/17726) - [PyTorch] Add support for numel, empty_like and one_hot ops
* [#17707](https://github.com/apache/tvm/pull/17707) - [PyTorch] Add support for gather, flip and take ops
* [#17702](https://github.com/apache/tvm/pull/17702) - [PyTorch] Add support for celu, selu, is_floating_point ops
* [#17694](https://github.com/apache/tvm/pull/17694) - [PyTorch] Add support for elu, hardtanh ops
* [#17689](https://github.com/apache/tvm/pull/17689) - [PyTorch] Support several binary ops for ExportedProgram importer
* [#17672](https://github.com/apache/tvm/pull/17672) - [PyTorch] Refactor binary ops tests
* [#17679](https://github.com/apache/tvm/pull/17679) - [PyTorch] Support several unary ops for ExportedProgram importer
* [#17668](https://github.com/apache/tvm/pull/17668) - [PyTorch] Add support for and_, lshift, min, or_, rshift, xor ops
* [#17664](https://github.com/apache/tvm/pull/17664) - [PyTorch] Add support for ge, gt, le, mod, ne ops
* [#17659](https://github.com/apache/tvm/pull/17659) - [PyTorch] Add support for bitwise_not, isfinite, isinf, isnan, logical_not, sign and square ops
* [#17622](https://github.com/apache/tvm/pull/17622) - [PyTorch] Add support for abs, ceil, erf, floor, log ops and refactor unary tests
* [#17566](https://github.com/apache/tvm/pull/17566) - [ONNX] Add prim expression support to the Neg converter and update the Arange converter to use relax.op.arange
* [#17642](https://github.com/apache/tvm/pull/17642) - [ONNX] Replace topi.split with relax.op.split in the ONNX frontend
* [#17674](https://github.com/apache/tvm/pull/17674) - [KVCache] PagedKVCache refactor, FlashInfer JIT and MLA integration
* [#17618](https://github.com/apache/tvm/pull/17618) - [KVCache] TIR attention kernel support for MLA
* [#17615](https://github.com/apache/tvm/pull/17615) - [KVCache] Add KV cache for CPU runtime
* [#17616](https://github.com/apache/tvm/pull/17616) - [Runtime][KVCache] Initial interface setup for MLA
* [#17782](https://github.com/apache/tvm/pull/17782) - [Frontend] Support max/min in frontend op interface
* [#17758](https://github.com/apache/tvm/pull/17758) - Allow ingesting tensor.chunk() from an exported torch program
* [#17781](https://github.com/apache/tvm/pull/17781) - Enable bfloat16 for softmax struct-info inference
* [#17752](https://github.com/apache/tvm/pull/17752) - Batch norm correctness in eval mode
* [#17774](https://github.com/apache/tvm/pull/17774) - Check for tensor_meta in exported_program_translator
* [#17757](https://github.com/apache/tvm/pull/17757) - Tensor.split with uneven tensors
* [#17749](https://github.com/apache/tvm/pull/17749) - Move TIR backend to gpu_generic
* [#17725](https://github.com/apache/tvm/pull/17725) - Ingest Tensor.clamp from torch export
* [#17724](https://github.com/apache/tvm/pull/17724) - Add support to ingest Tensor.expand_as()
* [#17723](https://github.com/apache/tvm/pull/17723) - Add torch ExportedProgram ingestion capability for Tensor.detach(), Tensor.copy_, and aten.lift_fresh_copy
* [#17721](https://github.com/apache/tvm/pull/17721) - Allow ingesting the Upsample module from torch.export using either the Size or the Scale Factor argument
* [#17722](https://github.com/apache/tvm/pull/17722) - Allow ingesting vector_norm from torch.export
* [#17728](https://github.com/apache/tvm/pull/17728) - Ingest Tensor.contiguous from torch export
* [#17700](https://github.com/apache/tvm/pull/17700) - Fix tree attention for Qwen2-1.5 models
* [#17682](https://github.com/apache/tvm/pull/17682) - Add support for func attr inheritance in SplitLayoutRewritePreproc
* [#17654](https://github.com/apache/tvm/pull/17654) - [BYOC] OpenCLML offload support for Relax
* [#17633](https://github.com/apache/tvm/pull/17633) - Pipeline file reorganization
* [#17626](https://github.com/apache/tvm/pull/17626) - Initial setup of the Relax backend pipeline
* [#17568](https://github.com/apache/tvm/pull/17568) - [PASS] Convert-layout pass and ops enhanced to support sub-indexing
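
Given how much of this release lands in the PyTorch ExportedProgram frontend, a minimal, hedged sketch of the import flow may help. It assumes `tvm.relax.frontend.torch.from_exported_program` as the entry point and uses a toy model exercising a couple of the ops listed above:

```python
import torch
from torch.export import export

from tvm.relax.frontend.torch import from_exported_program

class TinyModel(torch.nn.Module):
    def forward(self, x):
        # relu and topk are among the ops covered by the importer work above.
        return torch.topk(torch.relu(x), k=2).values

# Export with torch.export, then translate the ExportedProgram to Relax.
exported = export(TinyModel(), args=(torch.randn(4, 8),))
mod = from_exported_program(exported)
mod.show()  # print the imported Relax IRModule
```
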
### Runtime

* [#17614](https://github.com/apache/tvm/pull/17614) - [CLML] Profiling options enabled for CLML
* [#17570](https://github.com/apache/tvm/pull/17570) - [OPENCL] Bug fix

### TIR

* [#17799](https://github.com/apache/tvm/pull/17799) - Fix reduce buffer allocation position
* [#17783](https://github.com/apache/tvm/pull/17783) - [REFACTOR] Remove legacy tir::any
* [#17706](https://github.com/apache/tvm/pull/17706) - Minor fix for default GPU schedule
* [#17579](https://github.com/apache/tvm/pull/17579) - [SoftwarePipeline] Ensure pipeline epilogue and prologue do not overlap
* [#17584](https://github.com/apache/tvm/pull/17584) - [LoopPartition] Enforce loop partition control

### TVMC

* [#17606](https://github.com/apache/tvm/pull/17606) - Bug fix
### cuda & cutlass & tensorrt

* [#17789](https://github.com/apache/tvm/pull/17789) - [CUTLASS] Add blockwise-scale GEMM/BMM kernels
* [#17741](https://github.com/apache/tvm/pull/17741) - [Codegen][CUDA] Fix codegen of casts among vector bfloat16, FP8 and FP4
* [#17708](https://github.com/apache/tvm/pull/17708) - [CUDA] FP4 cast and reinterpret support
* [#17639](https://github.com/apache/tvm/pull/17639) - [CUDA] Remove htanh from unsupported math ops for CUDA 12.8
* [#16950](https://github.com/apache/tvm/pull/16950) - [Codegen, CUDA] Add FP8 Tensor Core codegen (see the sketch below)
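
As a rough illustration of the FP8 codegen path, here is a hedged TVMScript sketch of a cast kernel. The kernel itself is illustrative, and the `float8_e4m3fn` dtype string assumes the standardized names introduced by #17712 (listed under Misc below):

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def cast_fp16_to_fp8(A: T.Buffer((128,), "float16"),
                     B: T.Buffer((128,), "float8_e4m3fn")):
    # One thread per element; the FP16 -> FP8 cast is emitted by the CUDA codegen.
    for tx in T.thread_binding(128, thread="threadIdx.x"):
        B[tx] = T.Cast("float8_e4m3fn", A[tx])
```
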
### web

* [#17695](https://github.com/apache/tvm/pull/17695) - [WASM] Update wasm include in accordance with the KV cache revamp

### Misc

* [#17796](https://github.com/apache/tvm/pull/17796) - [Cublas] Add bfloat16 support when dispatching to cuBLAS kernels
* [#17763](https://github.com/apache/tvm/pull/17763) - [Flashinfer] Add JIT flow for sampling kernel
* [#17811](https://github.com/apache/tvm/pull/17811) - [NFC] Fix `explict` typo
* [#17780](https://github.com/apache/tvm/pull/17780) - [3rdparty] Enable bfloat16 for custom allreduce kernel
* [#17784](https://github.com/apache/tvm/pull/17784) - [REFACTOR] Phase out StackVM
* [#17750](https://github.com/apache/tvm/pull/17750) - BugFix: Relax comment
* [#17748](https://github.com/apache/tvm/pull/17748) - [Codegen] Support codegen for vectorized tir.ShuffleNode
* [#17743](https://github.com/apache/tvm/pull/17743) - Fix: Change variable `i` to `x` in split operation in cross_compilation_and_rpc.py
* [#17730](https://github.com/apache/tvm/pull/17730) - [Attention] Add caching for FlashInfer binaries during JIT
* [#17733](https://github.com/apache/tvm/pull/17733) - [Refactor] Clean up Relay references in the codebase
* [#17739](https://github.com/apache/tvm/pull/17739) - [BF16] Support `ndarray.asnumpy()` on bfloat16 tensors natively using ml_dtypes (see the sketch after this list)
* [#17734](https://github.com/apache/tvm/pull/17734) - Remove Google Analytics
* [#17731](https://github.com/apache/tvm/pull/17731) - [IR] Compact Functor vtable
* [#17736](https://github.com/apache/tvm/pull/17736) - Fix typos in comments and strings
* [#17670](https://github.com/apache/tvm/pull/17670) - [DataType] BF16 support
* [#17727](https://github.com/apache/tvm/pull/17727) - [FFI] Fix dynamic FFI index to ensure compatibility
* [#17718](https://github.com/apache/tvm/pull/17718) - [Refactor] Migrate build API to `tvm.compile` (see the usage sketch after this list)
* [#17714](https://github.com/apache/tvm/pull/17714) - [FFI] Phase out ctypes fallback in favor of Cython
* [#17716](https://github.com/apache/tvm/pull/17716) - Fix get_target_compute_version for sm >= 100
* [#17710](https://github.com/apache/tvm/pull/17710) - [Refactor] Introduce base Executable class and `tvm.compile` interface
* [#17713](https://github.com/apache/tvm/pull/17713) - [REFACTOR] Clean up legacy Relay runtime data structures
* [#17712](https://github.com/apache/tvm/pull/17712) - [DataType] Rename FP8 dtypes to standard names
* [#17703](https://github.com/apache/tvm/pull/17703) - Fix typos in multiple files
* [#17693](https://github.com/apache/tvm/pull/17693) - Update the assert in BindParams to allow tvm.relax.Constant
* [#17701](https://github.com/apache/tvm/pull/17701) - [Refactor] Remove legacy TE schedule tag
* [#17683](https://github.com/apache/tvm/pull/17683) - [MSC] Remove Relay
* [#17688](https://github.com/apache/tvm/pull/17688) - Fix relax.ccl.scatter_from_worker0 assert
* [#17630](https://github.com/apache/tvm/pull/17630) - [Codegen] FP4 support
* [#17685](https://github.com/apache/tvm/pull/17685) - [REFACTOR] Clean up legacy TE-based passes
* [#17681](https://github.com/apache/tvm/pull/17681) - [REFACTOR] Follow-up cleanup of the Relay phase-out
* [#17678](https://github.com/apache/tvm/pull/17678) - Bump 3rdparty/cutlass_fpA_intB_gemm
* [#17669](https://github.com/apache/tvm/pull/17669) - [REFACTOR] Allow target-dependent default TIR pipeline dispatch in tir.build()
* [#17665](https://github.com/apache/tvm/pull/17665) - [REFACTOR] Move build flow from C++ to Python
* [#17624](https://github.com/apache/tvm/pull/17624) - Add support for normal MLA kernel
* [#17641](https://github.com/apache/tvm/pull/17641) - Pick up vector length from 'zvlXXXb' (RVV) mattr for RISC-V
* [#17666](https://github.com/apache/tvm/pull/17666) - [Refactor] Improve TargetHasSVE function with optional target handling
* [#17661](https://github.com/apache/tvm/pull/17661) - [Refactor] Phase out Python dependency `decorator`
* [#17662](https://github.com/apache/tvm/pull/17662) - [REFACTOR] Phase out te.Schedule C++ components
* [#17660](https://github.com/apache/tvm/pull/17660) - [REFACTOR] Phase out Relay C++ components
* [#17655](https://github.com/apache/tvm/pull/17655) - Upgrade onnx and onnxruntime versions
* [#17657](https://github.com/apache/tvm/pull/17657) - Update argument order for relax.op.pad to make it round-trippable
* [#17658](https://github.com/apache/tvm/pull/17658) - [REFACTOR] Phase out te.schedule Python components
* [#17653](https://github.com/apache/tvm/pull/17653) - Update images to 20250214-034537-bd1411f8
* [#17656](https://github.com/apache/tvm/pull/17656) - [REFACTOR] Phase out Relay Python components
* [#17649](https://github.com/apache/tvm/pull/17649) - [Refactor] Phase out Python dependency `attrs`
* [#17644](https://github.com/apache/tvm/pull/17644) - Bump rollup from 2.79.1 to 2.79.2 in /web
* [#17637](https://github.com/apache/tvm/pull/17637) - [PYTHON] Build Cython by default
* [#17631](https://github.com/apache/tvm/pull/17631) - Handle vector width (VLEN) for RISC-V arches
* [#17613](https://github.com/apache/tvm/pull/17613) - Bug fix: Remove unused code
* [#17585](https://github.com/apache/tvm/pull/17585) - [Relay] Skip InferType if it has already run and the previous pass made no changes
* [#17605](https://github.com/apache/tvm/pull/17605) - [Refactor] Phase out legacy example apps
* [#17603](https://github.com/apache/tvm/pull/17603) - [Refactor] Phase out legacy docs
* [#17513](https://github.com/apache/tvm/pull/17513) - [GRAPH RT] Additional API support
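
Several Misc entries (#17710, #17718) converge on a unified `tvm.compile` entry point that returns a base Executable. Here is a minimal, hedged sketch of the intended usage; the toy module, target string, and exact call shape are assumptions based on those PR titles:

```python
import numpy as np
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class Mod:
    @R.function
    def main(x: R.Tensor((4,), "float32")) -> R.Tensor((4,), "float32"):
        return R.add(x, x)

# tvm.compile dispatches on the module contents and returns an Executable.
ex = tvm.compile(Mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())
y = vm["main"](tvm.nd.array(np.ones((4,), "float32")))
print(y.numpy())  # [2. 2. 2. 2.]
```
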
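And for the bfloat16 interop in #17739, a hedged sketch of round-tripping a bfloat16 tensor through NumPy; it assumes `tvm.nd.array` accepts ml_dtypes-typed arrays after the BF16 work (#17670, #17739):

```python
import numpy as np
import ml_dtypes
import tvm

# Build a bfloat16 NumPy array (ml_dtypes registers the dtype with NumPy).
x_np = np.ones((4,), dtype=ml_dtypes.bfloat16)
x = tvm.nd.array(x_np)

# Per #17739, converting back should yield a native bfloat16 NumPy array
# rather than requiring a manual reinterpret through uint16.
y_np = x.numpy()
print(y_np.dtype)  # bfloat16
```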