# Introduction

The TVM community has worked since the v0.8 release to deliver many exciting features and improvements. v0.9.0 is the first release on the new [quarterly release](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0067-quarterly-releases.md) schedule and includes many highlights, such as:
* [MetaSchedule's full implementation](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0005-meta-schedule-autotensorir.md)
* [ARM cascading scheduler](https://github.com/apache/tvm-rfcs/pull/37) for Arm Ethos(TM)-U NPUs
* [Collage](https://github.com/apache/tvm-rfcs/pull/62), which brings tuning to BYOC
* Several microTVM improvements
* [Dynamic shape optimization](https://github.com/apache/tvm-rfcs/pull/72)
* New `tvm.relay.build` parameters: `runtime=` and `executor=`
* AOT: support for the C++ runtime (with `llvm` and `c` targets only) and support for host-driven AOT in the C runtime
* Hexagon RPC support
  * Testing via Hexagon SDK simulator and on device via Snapdragon-based HDK boards and phones
  * AOT and USMP support
  * Threading
  * Initial op support
* MLF: support for multiple modules in a single MLF artifact
* And many more!

See the lists of RFCs and PRs included in v0.9.0 below, as well as [the full change list](https://github.com/apache/tvm/compare/v0.8.0...v0.9.0.rc0).

## RFCs

These RFCs have been merged in [apache/tvm-rfcs](https://github.com/apache/tvm-rfcs) since the last release.
* [[RFC] TUNIP: TVMScript Unified Printer (#74)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0074-tvmscript-unified-printer.md) ([`48d47c5`](https://github.com/apache/tvm-rfcs/commit/48d47c526fbb509115adf5bbaf8699129f4fe2be))
* [[RFC][Backend] RFC-CSI-NN2-Integration (#75)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0075_RISC-V_CSI-NN2_Intergration.md) ([`cfcf114`](https://github.com/apache/tvm-rfcs/commit/cfcf11464003af4c3201345a24ab380952fed944))
* [[RFC] Introducing DeclBuffer (#70)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0070-introducing-decl-buffer.md) ([`87ff1fa`](https://github.com/apache/tvm-rfcs/commit/87ff1facd55c0a7cef45efdf2b7548ee299e8e06))
* [[RFC][MLF] Model Library Format with Multiple Modules (#76)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0076_mlf_with_multiple_modules.md) ([`f47c6ad`](https://github.com/apache/tvm-rfcs/commit/f47c6ad660edf2c97e54279910e8b60b2bdf9ada))
* [[RFC] UMA Universal Modular Accelerator Interface (#60)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0060_UMA_Unified_Modular_Accelerator_Interface.md) ([`6990e13`](https://github.com/apache/tvm-rfcs/commit/6990e1363c96945b6c9d19dc2d331306d2be2a17))
* [[RFC] DietCode: An Auto-Scheduler for Dynamic Tensor Programs (#72)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0072-dynamic-autoscheduler.md) ([`a518000`](https://github.com/apache/tvm-rfcs/commit/a518000cbc82e53321f526d2090c7c8067607391))
* [[RFC] Quarterly Releases (#67)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0067-quarterly-releases.md) ([`70293c7`](https://github.com/apache/tvm-rfcs/commit/70293c7f910ad70f4744d26295d6c4f6fe0366c5))
* [RFC-BYOC-DNNL-Integration (#73)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0069-byoc-onednn-integration.md) ([`7aed0ca`](https://github.com/apache/tvm-rfcs/commit/7aed0ca0e1e636aebcfe7fd10585f4c7e1893674))
* [[RFC] Relay Next Roadmap (#69)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0069-relax-roadmap.md) ([`ac15f2a`](https://github.com/apache/tvm-rfcs/commit/ac15f2a7bd77b106636ae5e4d8357a05d7267ef5))
* [RFC: clarifying buffer declaration and access (#63)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0063-clarifying-buffer-declaration-and-access.md) ([`de4fe97`](https://github.com/apache/tvm-rfcs/commit/de4fe97ac0ab9d8fe9d4309a380a5ea278aa1493))
* [Inclusive Language RFC (#68)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0068-inclusive-language.md) ([`4203bd2`](https://github.com/apache/tvm-rfcs/commit/4203bd2da4ffc98eb70731a05fcadc478679a461))
* [[USMP] Adding U4 usecase (#65)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0009_Unified_Static_Memory_Planning.md) ([`b9e246f`](https://github.com/apache/tvm-rfcs/commit/b9e246f6d1715176f5de10fe1e7e382b88bacabe))
* [Collage RFC (#62)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md) ([`23250f5`](https://github.com/apache/tvm-rfcs/commit/23250f59cfc89673b7efe1d8f13192ed8381824d))
* [Replace codeowners with more relevant automation (#58)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0058-replace-codeowners.md) ([`540c1f8`](https://github.com/apache/tvm-rfcs/commit/540c1f80631b5d9ee2ae9ff827edc73240fe1b09))
* [[RFC][TIR] Layout transformations on buffer access (#39)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md) ([`b675ef8`](https://github.com/apache/tvm-rfcs/commit/b675ef8e74351f298b8f3bc4ae944eaf3cbbe430))
* [Module Based Model Runtime for AOT (#46)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0046-module-based-model-runtime-for-aot.md) ([`d9dd6eb`](https://github.com/apache/tvm-rfcs/commit/d9dd6eb5e522ff169fe3bf4f5b9c5c3e84d95145))
* [@slow test RFC (#55)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0055-slow-tests.md) ([`9b6203a`](https://github.com/apache/tvm-rfcs/commit/9b6203a743693ea65eef66137d48abfc75d79d77))
* [[RFC][Roadmap] TVM Continuous Integration & Testing Roadmap (#54)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0054-ci-testing-roadmap.md) ([`41e5ba0`](https://github.com/apache/tvm-rfcs/commit/41e5ba078b830299692c844e5ab3e5193d3d6129))
* [Bring `PackedFunc` into TVM Object System (#51)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0051-PackedFunc-as-Object.md) ([`2e0de6c`](https://github.com/apache/tvm-rfcs/commit/2e0de6cbdae162c17f0842c32a4eb797cbd2c1ea))
* [[RFC][OpenCLML] OpenCLML integration as BYOC (#52)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0052-OpenCLML-integratio-as-BYOC.md) ([`f5ef65f`](https://github.com/apache/tvm-rfcs/commit/f5ef65fd0f04178af266ed109206ba3f011d8eaa))
* [Introduce the Arm(R) Ethos(TM)-U Cascading Scheduler (#37)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0037-arm-ethosu-cascading-scheduler.md) ([`f9fa824`](https://github.com/apache/tvm-rfcs/commit/f9fa8246ab9c95afa7025c2bd045b4e715c75103))
* [[RFC][Roadmap] microTVM roadmap (#53)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0053-microtvm-roadmap.md) ([`1b14456`](https://github.com/apache/tvm-rfcs/commit/1b1445647039f0241f9ff0268de538329b48b3d6))
* [Add Managed Jenkins Infrastructure for TVM RFC (#49)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0049-managed-jenkins-infrastructure-for-tvm.md) ([`a3a7d2c`](https://github.com/apache/tvm-rfcs/commit/a3a7d2c083e2384cfad158300ba2a22551ed4433))
* [TVM Roadmap RFC (#50)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0050-roadmaps.md) ([`263335f`](https://github.com/apache/tvm-rfcs/commit/263335f905afae55e161a23e77b1c19da274e795))
* [[RFC] Integrate LIBXSMM with TVM (#47)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0046-Intel-LIBXSMM-integration.md) ([`1a3d4f1`](https://github.com/apache/tvm-rfcs/commit/1a3d4f13bf9c2ffb2baf420daeae460901fe79c7))
* [[RELAY][AST] Add virtual device as a first class field to Relay expressions (#45)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0045-first-class-virtual-device.md) ([`67c39d2`](https://github.com/apache/tvm-rfcs/commit/67c39d2a0766165e0c090286130109173a86dbe2))

## What's Changed

Note that this list is not comprehensive of all PRs and discussions since v0.8. Please visit the full listing of commits for a complete view: https://github.com/apache/tvm/compare/v0.8.0...v0.9.0.rc0.

### AOT
* #11208 - Calculate used memory at the callsite of primitive functions
* #11365 - Fix function number datatype from char to uint16_t
* #11091 - Enable A-Normal Form in the AOT executor
* #10753 - Support LLVM backend with C++ runtime
* #10518 - Use python temporary directory for AOT tests
* #10337 - BugFix of workspace calculation
* #10282 - [runtime] Add Metadata classes for AOTExecutor
* #9501 - [3/3][DeviceAPI] Wire up cpacked Device API context
* #9500 - [2/3][DeviceAPI] Add Hooks for Activate/Deactivate/Open/Close
* #9395 - [1/3][DeviceAPI] Connecting devices structure to relevant operators

### BYOC
* #11474 - Two helper passes for external codegen using RelayToTIR custom pass machinery
* #11144 - Remove support for run-time linked-params from codegen
* #10590 - Add order to functions in C Codegen
* #11638 - [DNNL][CBLAS] Unifies all MKLDNN/DNNL to DNNL
* #11619 - RelayToTIR custom codegen passes can still depend on dynamic shape functions
* DNNL - #11902, #11642, #11513, #11571, #11560, #11345, #11111, #10837, #10421, #9995, #9797
* TensorRT - #11923, #11203, #10759, #10772, #10388
* CMSIS-NN - #11732, #11625, #10939, #11013, #10817, #10563, #10224, #10148, #10100, #9338, #9531, #9409, #9331
* OpenCLML - #10243
* CUTLASS - #11631, #10185, #10177, #10110, #10036, #9899, #9820, #9800, #9795, #9746, #9737, #9698, #9595, #9571
* CUDNN - #10997, #9986, #9948
* ACL - #10801
* PTX - #10855, #10339, #9909
* CUBLAS - #10826, #10820

### CI
* #11313 - Refactor of tvm.testing.requires_* annotations
* #11666 - Enable pylint for tests/python/ci
* #11657 - Apply linting rules to AOT tests
* #11380 - Restructure Jenkinsfile
* Automation - #11813, #11775, #11480, #11437, #10833, #10056, #9973, #9934
* User experience improvements - #11470, #11329, #11553, #11497, #11051, #10933, #10960, #10525, #10425, #10322, #10121, #9971, #9554, #9752, #9556
* Reduce CI runtime - #11402, #11349, #11258, #11132, #10946, #10743, #10359
* Code cleanups - #10968, #10740

### Frontends
* PaddlePaddle - #11537, #9724, #9564
* TFLite - #10915, #10566
* Oneflow - #11321, #11036, #8790
* PyTorch - #11190, #10504, #10184, #10091
* ONNX - #10949, #9438, #9186, #9493, #9475
* Keras - #7006

### Hexagon
* #11549 - Initial clip operator for Hexagon
* #11834 - Add op resize2d for hexagon
* #11559 - Softmax slice op initial version
* #11529 - Slice ops added - add, subtract, multiply
* #11720 - [testing] add max_pool2d benchmark
* #11417 - Implement avg_pool2d slice op
* #11653 - Add HexagonThreadManager
* #11547 - Run single RPC server on Android in each testing session
* #11490 - [testing] add TVMScript elemwise-add
* #11400 - [testing] refactor benchmark-table code
* #11277 - moves conftest.py to tvm.contrib.hexagon so outside repos can access the testing fixtures
* #11319 - Add unit tests for Hexagon Device API
* #11279 - Add USMP tests
* #11283 - Update Readme
* #11239 - capture gtest output and return over FFI
* #11175 - Add schedule and test for conv2d_transpose_nchw
* #11018 - [Runtime] Add QuRT thread pool backend
* #11145 - Add support for on-device unit testing using gtest
* #11138 - Add test for depthwise conv2d schedule
* #11016 - Add test for registered schedules
* #11104 - Add mobilenet test
* #11090 - Delete offload runtime, move files to right places
* #11065 - AoT with LLVM Codegen on Hexagon
* #11025 - Deprecate USE_HEXAGON_DEVICE, introduce USE_HEXAGON
* #10604 - HVX scheduling and benchmarking of TE element-wise add
* #10905 - [LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout
* #10907 - Move aot/graph_executor interactions into launcher
* #10919 - Register basic strategies and schedules for common operators
* #10904 - Add unit tests executing 2-d VTCM usage
* #10910 - Refactor to keep HexagonBuffer private to the device api
* #10908 - [LLVM][CodeGen] Make CodeGenHexagon a subclass of CodeGenCPU
* #10878 - Generalized HexagonBuffer::CopyTo/CopyFrom
* #10846 - Support both 1-d and 2-d VTCM allocations
* #10581 - Improved ergonomics of HexagonLauncher in unit tests
* #10616 - Refactor tvm.contrib.hexagon, NFC
* #10612 - Deprecate SDK 3.x, rewrite HexagonSDK.cmake
* #10586 - Codegen for 2d Load/Store
* #10558 - Generalize builtin for Nd memory alloc with storage scope and add lowering for VTCM / Hexagon
* #10543 - [Runtime][PipelineExecutor] Add the pipeline internal forwarding logic
* #10507 - Add doc on TVM - Hexagon RPC flow
* #10520 - Resolve breakage in test_hexagon/test_cache_read_write
* #10311 - [runtime] AOTExecutor implementation for C Codegen
* #10454 - Allow execution on target or simulator from HexagonLauncher
* #10365 - Lower cache_read and cache_write to Hexagon DMA via tensorize
* #10361 - RPC server/client for simulator
* #10302 - [CI] Add Hexagon Tests to pipeline
* #10263 - [Docker] Add docker file and scripts
* #10227 - Refactor Hexagon.cmake
* #10217 - Adding support for Hexagon User DMA Engine
* #10068 - Update hexagon API build instruction and cleanup hexagon_proxy_rpc
* #9970 - Do not auto-build apps when building TVM
* #9736 - Add unit tests for HexagonBuffer
* #9525 - Add Hexagon VTCM and discontiguous allocation support
* #9631 - Add RPC Mechanism for Hexagon
* #9473 - cleanup Hexagon conv2d tests

### MetaSchedule
* #11884 - Postproc: Rewrite-Layout
* #11848 - [OpStrategy] Support MetaSchedule Layout
* #11845 - [Relay][Pass] Meta-Schedule-Layout-Rewrite
* #11758 - [Runtime] Enhance Runner RandomFill
* #11683 - Distributed Measurement
* #11751 - [Minor] Organize Testing Scripts
* #11735 - Modify Profiler Timers
* #11727 - Developer Ergonomics Enhancement II
* #11692 - Apply-History-Best Task Filtering
* #11486 - Add Profiler Support For Tuning Efficiency Optimization
* #11680 - JSONDatabase Utilities
* #11641 - Generate MetaSchedule Dataset
* #11622 - Developer Ergonomics Enhancement
* #11604 - Resolve dependencies between header files
* #11587 - Add Testing Script with ONNX Support
* #11590 - Evo Independence from TaskScheduler
* #11534 - No explicit unrolling for spatial PrimFunc
* #11512 - Enable Task Filtering
* #11177 - AutoBind rule and MutateThreadBinding
* #11157 - Logging Interface Unification
* #11088 - Auto tensorization for CPU / GPU dot product
* #10986 - [Refactor] Introduce TuneConfig
* #11020 - [Metaschedule, Refactor] Move MultiLevelTilingNode decl to a header
* #10927 - [Refactor] Clarify Integration Logic
* #10876 - Add utility API to ease using manual schedules
* #10885 - [BugFix] Fix skipped tests
* #10366 - Add Gradient Based Task Scheduler
* #10823 - Fine-Grained Rewrite Unbound Block
* #10793 - Add demonstration of selectively tuning relay ops with TIR schedules
* #10811 - Support grouping in the cost model
* #10810 - Extract task weights during task extraction
* #10782 - [TIR] Estimate TIR FLOPs
* #10776 - Misc updates for tuning end-to-end workloads
* #10689 - Upstream the leftover changes
* #10648 - [Meta Schedule] Refactor meta schedule testing utils
* #10578 - New relay backend for meta schedule task extraction
* #10534 - Bug Fix for Relay Integration
* #10501 - Update scripts for subgraph tuning
* #10497 - Refactor testing workloads
* #10461 - Enable AutoTVM-style template-based search space
* #10368 - Fix Cyclic Dependency in PyClass Family
* #10403 - Arithmetic analysis
* #10367 - Update Tuning Interfaces
* #10079 - [M4a] User-API: Tune-TE/TIR/Relay
* #10081 - [M4a] Rewrite-Cooperative-Fetch
* #10055 - [M4b] Testcases for TensorRT builder/runner
* #10092 - [M4a] Mutator: Mutate-Tile-Size
* #10096 - [M4a] Mutator: Mutate Parallel
* #10071 - [M4a] PostProcessor: Rewrite-Parallel-Vectorize-Unroll
* #10043 - [M4a] Schedule Rule: Multi-Level-Tiling
* #10045 - Mutator: Mutate-Unroll
* #10033 - [M4a] Schedule Rule: Parallelize-Vectorize-Unroll
* #10027 - [M4a] PostProcessor: Rewrite-Unbound-Block
* #10028 - Mutator: Mutate-Compute-Location
* #9997 - [M4a] PostProcessor: Disallow-Dynamic-Loop
* #9994 - [M4a] Schedule Rule: Cross-Thread-Reduction
* #10013 - [M4a] PostProcessor: Rewrite Reduction Block
* #9975 - [M4a] Schedule Rule: Add-RFactor
* #9945 - [M4a] PostProcessor: Verify-GPU-Code
* #9940 - [M4a] Schedule Rule: Random-Compute-Location
* #9943 - [M4a] Schedule Rule: Auto-Inline
* #9860 - [M3c] Add Per-Store-Feature
* #9859 - [M3c] XGB-based Cost Model
* #9836 - [M4a] Add EvolutionarySearch Search Strategy
* #9799 - [M4a] Add ReplayFunc Search Strategy
* #9789 - [M3c] Update TuneContext, TaskScheduler & Search Strategy Design
* #9780 - [M3c] Add More Measure Callbacks
* #9761 - [M4a] Add ScheduleRule class & PostOrderApply space generator
* #9760 - [M3c] Random Feature Extractor

### MicroTVM
* #11741 - Refactor RVM scripts and fix DNS network issue
* #11472 - [ARM] Add tests for arm schedules
* #11634 - Update pyproject to python3.7
* Zephyr support - #11650
* RPC - #11227, #10967

### Relay
* #11825 - [relay][pass] add split infer shape with convert op layout pass
* #11674 - Finish implementations of WithFields
* #11481 - IndexedGraph improvements in preparation for Collage
* #11432 - Plumb external codegen target via Target.current()
* #11494 - [Pass] Add MaxPool, AvgPool to FoldExplicitPadding
* #11183 - Add unidirectional sequence lstm
* #11442 - Add 'static_library' runtime::Module
* #11413 - [Topi] Support for FP16 ERF on CPU
* #11382 - Finish support for list-of-targets
* #11386 - [Tests] Replace the Relay interpreter with the VM in the op tests
* #11224 - Support i16, f16 scalars in Relay text
* #11337 - Fix eltwise alter op layout for broadcast axis
* #11199 - Flexible shape dispatch transformation
* #11173 - Support 'external codegen targets'
* #10996 - Add FlattenAtrousConv transformation
* #10871 - [CUDNN] Add cuDNN as a Relay partitioning target (BYOC)
* #10787 - [Pass][Bugfix] Disable re-use of non-flat buffers in StorageRewrite
* #10378 - [FQ2I] Add leaky relu to FQ2I
* #10400 - RelayViz graphviz renderer
* #10352 - [VIRTUALDEVICE] Change syntax for device planning and store parameter virtual devices in virtual_device_ field
* #10310 - [ARM_CPU] Conv2d int8 intrinsic for cortex-A72
* #10085 - RelayViz interface and terminal ast-dump
* #10239 - Add a conversion of individual operations in FQ2I pass
* #10236 - [Refactor] Clean up type relations that are declared as template for no reason
* #10156 - Fix broadcast InferCorrectLayout
* #10026 - [VM] Relay VM memory liveness/lifetime analysis
* #10089 - [Pass] Add a relay pass to extract fake quantized ops
* #9690 - Change function constructors to WithFields
* #10069 - [DefuseOps pass] bug fix: To support function body types other…
* #9954 - Add `conv2d_backward_weight` op (without topi)
* #9838 - [FoldScaleAxis] Support dense and bias_add op in fold scale axis
* #9816 - Add sliding_window operator
* #9874 - Add a JSON converter for 0.7 -> 0.8 and 0.8 -> 0.9
* #9735 - [AMP][Pass][Typing] Add faster type inference
* #9723 - [Frontend] Add Span filling for frontends to Relay
* #9749 - Fix invalid shape function for "copy" operator
* #9759 - s/SEScope/VirtualDevice/g
* #9734 - Support large constants saved/loaded outside of VM executable
* #9613 - Re-run PlanDevices after LowerTE to flow new memory scope constraints
* #9693 - PlanDevices supports 'free' on_device annotations
* #9641 - [AST] Add virtual_device as a first class field in Relay
* #9483 - Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}
* #9569 - WithFields method for Call, Function, Var, TupleGetItem, If, Let, RefCreate, RefRead, RefWrite, Match, and Clause
* #9533 - WithFields for Tuples
* #9550 - Prepare for switching VM to LowerTEPass
* #9542 - Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc
* #9352 - [TVMC] Introduce executor and runtime parameters
* #9457 - Add the Arm(R) Ethos(TM)-U NPU identity operator
* #9326 - Switch PlanDevices pass to be w.r.t. SEScopes instead of DLDeviceTypes
* QNN - #11228, #10718, #10086, #10053, #9637, #9982

### Runtime
* #11334 - [PipelineExecutor] Add graph manually splitting logic into the unit test
* #11133 - [PipelineExecutor] Refactor PipelineExecutor.py and add cross compile support for pipeline executor
* #11172 - Move WrapTimeEvaluator from RPC to profiling, NFC
* #10990 - [PipelineExecutor] Add forwarding queue logic for set input
* #10953 - [Vulkan] Add RGP support to TVM for vulkan device
* #10723 - [PipelineExecutor] Getting the asynchronous output
* #10283 - AOTExecutor implementation and c target code-generator
* #9802 - [ThreadPool] Refactor affinity function and support CPU affinity list setting
* #10234 - [Pipeline Executor] multiple threads management and the data forwarding notification mechanism
* #10326 - Improved log information with function signature
* #10032 - [PackedFunc] Bring `PackedFunc` into TVM Object System
* #10082 - [PipelineExecutor] Pipeline Executor Sequential execution
* #10010 - [PipelineExecutor] Add Pipeline Executor Interface
* #9846 - [Pipeline executor] Global parameters group name and runtime modules parameters map
* #9889 - [GraphExecutor] Add API `get_input_info` to graph_executor
* #9751 - [Pipeline Executor] Add the map logic of global input and subgraph input
### TE
* #11589 - Support schedulable TIR compute definitions in TOPI
* #11341 - Optimized version of concatenation layer
* #10561 - [TECompiler] Decouple TE compute and schedule lowering in ScheduleBuilder

### TIR
* #11592 - HoistExpression, generalization of HoistIfThenElse
* #11870 - [Pass] Remove-Weight-Layout-Rewrite-Block
* #11740 - [TIR, analysis] Add GetAutoTensorizeMappingInfo to generate transforms for auto tensorization
* #11585 - Add preserve-unit-iters
* #11677 - Register CUDA WMMA tensor intrinsics
* #11658 - [TIR, CUDA] Add pass to replace global to shared memory copy with cp.async
* #11624 - [Schedule] Allow named block and buffer arguments in Schedule
* #11628 - [PASS] Refactor a couple of TIR passes - BindTarget, AnnotateEntryFunc, Filter, LowerInitBlock
* #11574 - CSE pass: Restrict the equivalence to be decided by a normal form - avoids comparison of terms
* #11575 - Schedule Primitive: Add-Unit-Loop
* #11515 - Add schedule primitive ReIndex
* #11524 - [Arith] Additional Simplifications Inside Conditionals
* #11485 - Add schedule primitive TransformBlockLayout
* #11495 - [Software pipeline] Fix hardcoded index in `access_ptr` rewriting, add a GPU test with depth 4
* #11269 - [Schedule] Transform layout quality of life
* #11355 - Support tensorization using ldmatrix + MMA
* #11289 - [Schedule] Allowed typing.Tuple in tir.schedule._type_checker
* #11317 - Support affine expressions as indices in reverse compute inline
* #11235 - [Arith] Implemented padded inverses in IndexMap
* #11238 - [ROOFLINE] Calculate roofline from existing TIR PrimFunc
* #11225 - Add schedule primitive SetAxisSeparator
* #11110 - Get read/write access precisely for opaque access
* #11106 - Enhance software pipeline validation and fix predicate of epilogue
* #10843 - StmtFunctor RenewDefs
* #11075 - Add function to tile a block according to a given tensor intrinsic
* #11050 - Utility function to decide loop mapping for auto tensorization
* #11009 - [ROCM] DP4A intrinsic support for TE/TIR
* #10925 - VNNI and ARM dot product intrinsic for tensorization
* #10887 - [Schedule] Relax reorder primitive's affine binding check
* #10732 - [Analysis] Add SuggestIndexMap for layout rewriting
* #10538 - [Schedule] Transform layout
* #10638 - Change the behavior of read/write region analysis for reduction blocks
* #10705 - Use local complete block and local reduction block to identify compact dataflow
* #10671 - Tuple Reduction Support in CreatePrimFunc
* #9727 - [TE] Implement layout transformations, non-flat memory buffers
* #10405 - [TensorIR] Update VerifyGPU
* #10401 - [TensorIR] Renormalize split pattern
* #10112 - [TIR, Relay] improve bfloat16 support
* #8509 - Tir constants integration into compilation pipeline
* #9996 - add support for multi-blocking layout and their transformation
* #10066 - Add software pipelining
* #10207 - Support sub warp reduction for CUDA target
* #9482 - Implementation of Common Subexpression Elimination for TIR
* #9527 - Allow compute_at create block predicate for non-trivial bounds and support floordiv pattern
* #10158 - [Schedule] Update compact_dataflow constraint
* #9871 - [Schedule] Blockize and Tensorize
* #10016 - [BugFix] Fix cross-thread reduction when single reduction loop with predicate
* #9880 - Encode conditional accesses info into block read/write regions
* #9699 - Affine utility support iter lowerbound and diagnostics
* #9742 - [Schedule] Add Annotate/Unannotate primitive
* #9738 - [TensorIR] Primitive "SetScope"
* #9743 - [Schedule] Analysis functions to check if compute_inline and com…
* #9689 - Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs
* #9559 - [TensorIR][UX] Type annotation-based runtime type checking
* #9444 - Add a 'rolling_buffer' scheduling primitive
* #9360 - [TensorIR] Cross-Thread Reduction

### TOPI
* #11531 - TE implementation of LSTM using scan
* #11161 - Add Adreno GPU target and topi supporting textures with dynamically allocated textures
* #10332 - VNNI support for batch matmul
* #9873 - Add support for grouped conv3d
* #10230 - VNNI support for int8 dense
* #10098 - [Op] 5 ops can accept unsigned integers as indices
* #9832 - Support grouped conv1d
* #9694 - Add generic batch norm
* #9233 - Cortex-M DSP support

### TVMScript
* #11308 - Represent ramp as index slice
* #10099 - Support T.buffer_decl using data pointer from Let/Allocate
* #9680 - Improve printer for TIR syntax sugar
* #9492 - Add syntax sugar for T.handle and T.match_buffer
* #9620 - Add for loop syntax sugar
* #9543 - Misc error message improvements
* #9505 - [Fix] Add type hints for more uncovered cases

### USMP
* #11015 - U3 use case
* #10189 - Adding support for U1 usecase for constant pools
* #10785 - Adding support for U4 usecase
* #10193 - adding support for U2 and U3 usecases
* #10005 - Add performance characteristics to PoolInfo
* #9565 - [TIR] Integrating USMP to AoT Executor
* #9704 - Hill Climb allocator
* #9418 - [TIR] adding the pass to convert to pool offsets
* #9649 - [TIR] Augmenting the algo interface with memory pressure
* #9214 - [TIR] Greedy memory planning algorithm
* #8468 - [TIR] Added buffer info extraction pass

### microNPU
* #11468 - Optimize separate padding operation for conv2d
* #11453 - Add transform matrices and part matcher to identity op
* #11410 - add E2E tests with cascader without striping
* #11288 - Expose compute cycle annotations to TIR lowering
* #10959 - Add a pass to reorder copy and compute nodes
* #10509 - Add various options to the cascader
* #11263 - Adding an option to enable striping
* #10251 - Add support for conv2d running on two cores on U65
* #10862 - Integrate the cascader
* #10344 - Integrate rolling buffers in Arm(R) Ethos(TM)-U
* #10824 - Some housekeeping in the test_ethosu folder
* #10763 - Tweak a layout transform matrix
* #10725 - Add a pass to move allocate nodes to the outer scope
* #10695 - Determine block configs using the cascader
* #10599 - Refactor Relay to TIR hook
* #10508 - Improve cascader memory transfer estimates
* #10345 - Add support for TFLite FULLY_CONNECTED
* #10254 - Introduce a pass to remove redundant identity operations
* #10062 - [5] Convert Proposals to te.Schedules
* #9959 - [4] Add the cascader Proposal generator
* #10022 - enable USMP
* #10127 - Add support for LeakyReLU
* #10004 - Add FreeRTOS variant of NPU demo
* #10060 - Refactor type inference data type checks
* #9960 - Add support for pack and unpack
* #10143 - Fix layout assignment in layout optimizer pass
* #9890 - [3] Plan generation for the cascader
* #9855 - Add support for transpose convolution
* #9841 - Add support for nearest neighbor and bilinear upsampling
* #9951 - Removing constant args from PrimFunc
* #9929 - Refactor base address determination to codegen
* #9910 - Add support for requantize
* #9831 - Move optimization passes to be a module pass and ensure they are running
* #9785 - [2d] Add more Part matchers to cascader
* #9778 - [2c] Add performance modelling to cascader
* #9471 - [2b] Create CascaderGraphs from TE graphs
* #9469 - [2a] Add CascaderGraph for cascading analysis
* #9621 - Add support for SPLIT and SPLIT_V
* #9508 - Update Conv2D Tests to Use TF API to Gen Test Cases
* #9627 - Add support for SIGMOID
* #9589 - Add support for TFLite concatenate
* #9623 - Refactor codegen tests
* #9561 - Add NHWC -> NHCWB16 layout transformation pass
* #9576 - Mean legalization support
* #9597 - Move the compilation to use Target Hooks
* #9458 - [1] Add affine analysis structures for the cascader
* #9547 - Add the infrastructure for lookup table and TANH
* #9521 - Support binary elementwise with non-4D inputs
* #9560 - Fix incorrectly calculated stride when converting NHWC to NHCWB16
* #9530 - Add unary elementwise operator infrastructure with ABS
* #9514 - Adding rounding mode attribute to operators
* #9515 - Allow constants to be given as input to an operator

### microTVM
* #11250 - [ARM] Add Relay tests for conv2d registered schedules
* #11232 - [rpc] Implemented rpc logging
* #11044 - Add support for host-driven AoT Executor
* #11043 - Better version handling for Arduino
* #10555 - Enable micro tvmc tutorial testing in CI
* #10194 - [RVM] Add scripts for automated build and testing
* #10144 - TVMCon 2021 Zephyr Demo with CMSIS-NN
* #10024 - [tvmc] Add TVMC Micro tutorial for Zephyr
* #9684 - Fix zephyr/test_zephyr_armv7m test
* #9584 - [TVMC] Add TVMC test for Arduino and Zephyr
* #9526 - Add minimal forwarding RPC server for host driven python execution on Hexagon
* Zephyr support - #11362, #10138

### Misc
* #11465 - Add cooldown interval logic for the profiling functional
* #11888 - [LLVM] Include LLVM headers in files that use them, not in llvm_common.h
* #11646 - [Arith] Simplification of ceil, log2, and left_shift
* #11464 - [MLF] Add support for multiple modules in Model Library Format
* #11632 - [AutoTVM][Autoscheduler] Default build funcs inherit PassContext
* #11543 - [OpenCL] Implement conv2d_winograd algorithm for Adreno
* #11287 - [Arith] Merge surjective/non-surjective iter mapping detections
* #11393 - Add utility to replace direct call to pytest.main
* #11252 - [ROOFLINE] Roofline analysis over RPC
* #11000 - [Graph Debugger] Expose way to benchmark individual nodes
* #10794 - bump PyTorch version to 1.11
* #10821 - [REFACTOR] Remove legacy nnvm folder
* #10798 - [Arith] Remove diagnostic ctx argument from DetectIterMap
* #10567 - [Refactor] Reduced repetition in CodeGenLLVM's buffer access
* #10455 - [AUTO_SCHEDULER] Add feature extraction directly from PrimFunc
* #7401 - RFC: initial stab at TorchScript fallback
* #10391 - [vulkan] Add integer dot product (4xint8, 4xuint8) tensorization for the vulkan SPIR-V target
* #10293 - [VirtualMachine] new method allowing to set one input tensor by its index or name
* #10191 - Generate correct output tensor names in C Interface API
* #9276 - Parameterize test_link_params
* #9808 - [Rust] Update Rust bindings
* #9553 - [PROFILING] Add ability to profile a single function_profiling
* #9611 - [CMAKE] Automatically detect newly added source files
* #9544 - [Target] enable -arch=sm_xx for assigning cuda target arch and deprecate autotvm.measure.set_cuda_target_arch api
* Profiler - #11530, #11066
* Docs - #10921, #11403, #10774, #10912, #9633, #9906, #9534, #9307, #9654, #9580
* Android - #11241
* ETHOSN - #11261, #10486, #10018, #9596
* TVMC - #11012, #10962, #10722, #9817, #9529, #9229

---

View it on GitHub: https://github.com/apache/tvm/releases/tag/v0.9.0.rc0