# Introduction

The TVM community has worked since the v0.8 release to deliver many exciting features and improvements. v0.9.0 is the first release on the new [quarterly release](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0067-quarterly-releases.md) schedule and includes many highlights, such as:
* [MetaSchedule's full implementation](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0005-meta-schedule-autotensorir.md)
* [ARM cascading scheduler](https://github.com/apache/tvm-rfcs/pull/37) for Arm Ethos(TM)-U NPUs
* [Collage](https://github.com/apache/tvm-rfcs/pull/62), which brings tuning to BYOC
* Several microTVM improvements
* [Dynamic shape optimization](https://github.com/apache/tvm-rfcs/pull/72)
* New `tvm.relay.build` parameters: `runtime=` and `executor=`
* AOT: support for the C++ runtime (with `llvm` and `c` targets only) and support for host-driven AOT in the C runtime
* Hexagon RPC support
  * Testing via Hexagon SDK simulator and on device via Snapdragon-based HDK boards and phones
  * AOT and USMP support
  * Threading
  * Initial op support
* MLF: support for multiple modules in a single MLF artifact
* And many more!

See the lists of RFCs and PRs included in v0.9.0 below, as well as [the full change list](https://github.com/apache/tvm/compare/v0.8.0...v0.9.0.rc0).

## RFCs

These RFCs have been merged in [apache/tvm-rfcs](https://github.com/apache/tvm-rfcs) since the last release.
* [[RFC] TUNIP: TVMScript Unified Printer (#74)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0074-tvmscript-unified-printer.md) ([`48d47c5`](https://github.com/apache/tvm-rfcs/commit/48d47c526fbb509115adf5bbaf8699129f4fe2be))
* [[RFC][Backend] RFC-CSI-NN2-Integration (#75)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0075_RISC-V_CSI-NN2_Intergration.md) ([`cfcf114`](https://github.com/apache/tvm-rfcs/commit/cfcf11464003af4c3201345a24ab380952fed944))
* [[RFC] Introducing DeclBuffer (#70)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0070-introducing-decl-buffer.md) ([`87ff1fa`](https://github.com/apache/tvm-rfcs/commit/87ff1facd55c0a7cef45efdf2b7548ee299e8e06))
* [[RFC][MLF] Model Library Format with Multiple Modules (#76)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0076_mlf_with_multiple_modules.md) ([`f47c6ad`](https://github.com/apache/tvm-rfcs/commit/f47c6ad660edf2c97e54279910e8b60b2bdf9ada))
* [[RFC] UMA Universal Modular Accelerator Interface (#60)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0060_UMA_Unified_Modular_Accelerator_Interface.md) ([`6990e13`](https://github.com/apache/tvm-rfcs/commit/6990e1363c96945b6c9d19dc2d331306d2be2a17))
* [[RFC] DietCode: An Auto-Scheduler for Dynamic Tensor Programs (#72)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0072-dynamic-autoscheduler.md) ([`a518000`](https://github.com/apache/tvm-rfcs/commit/a518000cbc82e53321f526d2090c7c8067607391))
* [[RFC] Quarterly Releases (#67)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0067-quarterly-releases.md) ([`70293c7`](https://github.com/apache/tvm-rfcs/commit/70293c7f910ad70f4744d26295d6c4f6fe0366c5))
* [RFC-BYOC-DNNL-Integration (#73)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0069-byoc-onednn-integration.md) ([`7aed0ca`](https://github.com/apache/tvm-rfcs/commit/7aed0ca0e1e636aebcfe7fd10585f4c7e1893674))
* [[RFC] Relay Next Roadmap (#69)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0069-relax-roadmap.md) ([`ac15f2a`](https://github.com/apache/tvm-rfcs/commit/ac15f2a7bd77b106636ae5e4d8357a05d7267ef5))
* [RFC: clarifying buffer declaration and access (#63)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0063-clarifying-buffer-declaration-and-access.md) ([`de4fe97`](https://github.com/apache/tvm-rfcs/commit/de4fe97ac0ab9d8fe9d4309a380a5ea278aa1493))
* [Inclusive Language RFC (#68)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0068-inclusive-language.md) ([`4203bd2`](https://github.com/apache/tvm-rfcs/commit/4203bd2da4ffc98eb70731a05fcadc478679a461))
* [[USMP] Adding U4 usecase (#65)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0009_Unified_Static_Memory_Planning.md) ([`b9e246f`](https://github.com/apache/tvm-rfcs/commit/b9e246f6d1715176f5de10fe1e7e382b88bacabe))
* [Collage RFC (#62)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md) ([`23250f5`](https://github.com/apache/tvm-rfcs/commit/23250f59cfc89673b7efe1d8f13192ed8381824d))
* [Replace codeowners with more relevant automation (#58)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0058-replace-codeowners.md) ([`540c1f8`](https://github.com/apache/tvm-rfcs/commit/540c1f80631b5d9ee2ae9ff827edc73240fe1b09))
* [[RFC][TIR] Layout transformations on buffer access (#39)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md) ([`b675ef8`](https://github.com/apache/tvm-rfcs/commit/b675ef8e74351f298b8f3bc4ae944eaf3cbbe430))
* [Module Based Model Runtime for AOT (#46)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0046-module-based-model-runtime-for-aot.md) ([`d9dd6eb`](https://github.com/apache/tvm-rfcs/commit/d9dd6eb5e522ff169fe3bf4f5b9c5c3e84d95145))
* [@slow test RFC (#55)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0055-slow-tests.md) ([`9b6203a`](https://github.com/apache/tvm-rfcs/commit/9b6203a743693ea65eef66137d48abfc75d79d77))
* [[RFC][Roadmap] TVM Continuous Integration & Testing Roadmap (#54)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0054-ci-testing-roadmap.md) ([`41e5ba0`](https://github.com/apache/tvm-rfcs/commit/41e5ba078b830299692c844e5ab3e5193d3d6129))
* [Bring `PackedFunc` into TVM Object System (#51)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0051-PackedFunc-as-Object.md) ([`2e0de6c`](https://github.com/apache/tvm-rfcs/commit/2e0de6cbdae162c17f0842c32a4eb797cbd2c1ea))
* [[RFC][OpenCLML] OpenCLML integration as BYOC (#52)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0052-OpenCLML-integratio-as-BYOC.md) ([`f5ef65f`](https://github.com/apache/tvm-rfcs/commit/f5ef65fd0f04178af266ed109206ba3f011d8eaa))
* [Introduce the Arm(R) Ethos(TM)-U Cascading Scheduler (#37)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0037-arm-ethosu-cascading-scheduler.md) ([`f9fa824`](https://github.com/apache/tvm-rfcs/commit/f9fa8246ab9c95afa7025c2bd045b4e715c75103))
* [[RFC][Roadmap] microTVM roadmap (#53)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0053-microtvm-roadmap.md) ([`1b14456`](https://github.com/apache/tvm-rfcs/commit/1b1445647039f0241f9ff0268de538329b48b3d6))
* [Add Managed Jenkins Infrastructure for TVM RFC (#49)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0049-managed-jenkins-infrastructure-for-tvm.md) ([`a3a7d2c`](https://github.com/apache/tvm-rfcs/commit/a3a7d2c083e2384cfad158300ba2a22551ed4433))
* [TVM Roadmap RFC (#50)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0050-roadmaps.md) ([`263335f`](https://github.com/apache/tvm-rfcs/commit/263335f905afae55e161a23e77b1c19da274e795))
* [[RFC] Integrate LIBXSMM with TVM (#47)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0046-Intel-LIBXSMM-integration.md) ([`1a3d4f1`](https://github.com/apache/tvm-rfcs/commit/1a3d4f13bf9c2ffb2baf420daeae460901fe79c7))
* [[RELAY][AST] Add virtual device as a first class field to Relay expressions (#45)](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0045-first-class-virtual-device.md) ([`67c39d2`](https://github.com/apache/tvm-rfcs/commit/67c39d2a0766165e0c090286130109173a86dbe2))

## What's Changed

Note that this list is not comprehensive of all PRs and discussions since v0.8. Please visit the full listing of commits for a complete view: https://github.com/apache/tvm/compare/v0.8.0...v0.9.0.rc0.

### AOT
* #11208 - Calculate used memory at the callsite of primitive functions
* #11365 - Fix function number datatype from char to uint16_t
* #11091 - Enable A-Normal Form in the AOT executor
* #10753 - Support LLVM backend with C++ runtime
* #10518 - Use python temporary directory for AOT tests
* #10337 - BugFix of workspace calculation
* #10282 - [runtime] Add Metadata classes for AOTExecutor
* #9501 - [3/3][DeviceAPI] Wire up cpacked Device API context
* #9500 - [2/3][DeviceAPI] Add Hooks for Activate/Deactivate/Open/Close
* #9395 - [1/3][DeviceAPI] Connecting devices structure to relevant operators

### BYOC
* #11474 - Two helper passes for external codegen using RelayToTIR custom pass machinery
* #11144 - Remove support for run-time linked-params from codegen
* #10590 - Add order to functions in C Codegen
* #11638 - [DNNL][CBLAS] Unifies all MKLDNN/DNNL to DNNL
* #11619 - RelayToTIR custom codegen passes can still depend on dynamic shape functions
* DNNL - #11902, #11642, #11513, #11571, #11560, #11345, #11111, #10837, #10421, #9995, #9797
* TensorRT - #11923, #11203, #10759, #10772, #10388
* CMSIS-NN - #11732, #11625, #10939, #11013, #10817, #10563, #10224, #10148, #10100, #9338, #9531, #9409, #9331
* OpenCLML - #10243
* CUTLASS - #11631, #10185, #10177, #10110, #10036, #9899, #9820, #9800, #9795, #9746, #9737, #9698, #9595, #9571
* CUDNN - #10997, #9986, #9948
* ACL - #10801
* PTX - #10855, #10339, #9909
* CUBLAS - #10826, #10820

### CI
* #11313 - Refactor of tvm.testing.requires_* annotations
* #11666 - Enable pylint for tests/python/ci
* #11657 - Apply linting rules to AOT tests
* #11380 - Restructure Jenkinsfile
* Automation - #11813, #11775, #11480, #11437, #10833, #10056, #9973, #9934
* User experience improvements - #11470, #11329, #11553, #11497, #11051, #10933, #10960, #10525, #10425, #10322, #10121, #9971, #9554, #9752, #9556
* Reduce CI runtime - #11402, #11349, #11258, #11132, #10946, #10743, #10359
* Code cleanups - #10968, #10740

### Frontends
* PaddlePaddle - #11537, #9724, #9564
* TFLite - #10915, #10566
* Oneflow - #11321, #11036, #8790
* PyTorch - #11190, #10504, #10184, #10091
* ONNX - #10949, #9438, #9186, #9493, #9475
* Keras - #7006

### Hexagon
* #11549 - Initial clip operator for Hexagon
* #11834 - Add op resize2d for hexagon
* #11559 - Softmax slice op initial version
* #11529 - Slice ops added - add, subtract, multiply
* #11720 - [testing] add max_pool2d benchmark
* #11417 - Implement avg_pool2d slice op
* #11653 - Add HexagonThreadManager
* #11547 - Run single RPC server on Android in each testing session
* #11490 - [testing] add TVMScript elemwise-add
* #11400 - [testing] refactor benchmark-table code
* #11277 - moves conftest.py to tvm.contrib.hexagon so outside repos can access the testing fixtures
* #11319 - Add unit tests for Hexagon Device API
* #11279 - Add USMP tests
* #11283 - Update Readme
* #11239 - capture gtest output and return over FFI
* #11175 - Add schedule and test for conv2d_transpose_nchw
* #11018 - [Runtime] Add QuRT thread pool backend
* #11145 - Add support for on-device unit testing using gtest
* #11138 - Add test for depthwise conv2d schedule
* #11016 - Add test for registered schedules
* #11104 - Add mobilenet test
* #11090 - Delete offload runtime, move files to right places
* #11065 - AoT with LLVM Codegen on Hexagon
* #11025 - Deprecate USE_HEXAGON_DEVICE, introduce USE_HEXAGON
* #10604 - HVX scheduling and benchmarking of TE element-wise add
* #10905 - [LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout
* #10907 - Move aot/graph_executor interactions into launcher
* #10919 - Register basic strategies and schedules for common operators
* #10904 - Add unit tests executing 2-d VTCM usage
* #10910 - Refactor to keep HexagonBuffer private to the device api
* #10908 - [LLVM][CodeGen] Make CodeGenHexagon a subclass of CodeGenCPU
* #10878 - Generalized HexagonBuffer::CopyTo/CopyFrom
* #10846 - Support both 1-d and 2-d VTCM allocations
* #10581 - Improved ergonomics of HexagonLauncher in unit tests
* #10616 - Refactor tvm.contrib.hexagon, NFC
* #10612 - Deprecate SDK 3.x, rewrite HexagonSDK.cmake
* #10586 - Codegen for 2d Load/Store
* #10558 - Generalize builtin for Nd memory alloc with storage scope and add lowering for VTCM / Hexagon
* #10543 - [Runtime][PipelineExecutor] Add the pipeline internal forwarding logic
* #10507 - Add doc on TVM - Hexagon RPC flow
* #10520 - Resolve breakage in test_hexagon/test_cache_read_write
* #10311 - [runtime] AOTExecutor implementation for C Codegen
* #10454 - Allow execution on target or simulator from HexagonLauncher
* #10365 - Lower cache_read and cache_write to Hexagon DMA via tensorize
* #10361 - RPC server/client for simulator
* #10302 - [CI] Add Hexagon Tests to pipeline
* #10263 - [Docker] Add docker file and scripts
* #10227 - Refactor Hexagon.cmake
* #10217 - Adding support for Hexagon User DMA Engine
* #10068 - Update hexagon API build instruction and cleanup hexagon_proxy_rpc
* #9970 - Do not auto-build apps when building TVM
* #9736 - Add unit tests for HexagonBuffer
* #9525 - Add Hexagon VTCM and discontiguous allocation support
* #9631 - Add RPC Mechanism for Hexagon
* #9473 - cleanup Hexagon conv2d tests

### MetaSchedule
* #11884 - Postproc: Rewrite-Layout
* #11848 - [OpStrategy] Support MetaSchedule Layout
* #11845 - [Relay][Pass] Meta-Schedule-Layout-Rewrite
* #11758 - [Runtime] Enhance Runner RandomFill
* #11683 - Distributed Measurement
* #11751 - [Minor] Organize Testing Scripts
* #11735 - Modify Profiler Timers
* #11727 - Developer Ergonomics Enhancement II
* #11692 - Apply-History-Best Task Filtering
* #11486 - Add Profiler Support For Tuning Efficiency Optimization
* #11680 - JSONDatabase Utilities
* #11641 - Generate MetaSchedule Dataset
* #11622 - Developer Ergonomics Enhancement
* #11604 - Resolve dependencies between header files
* #11587 - Add Testing Script with ONNX Support
* #11590 - Evo Independence from TaskScheduler
* #11534 - No explicit unrolling for spatial PrimFunc
* #11512 - Enable Task Filtering
* #11177 - AutoBind rule and MutateThreadBinding
* #11157 - Logging Interface Unification
* #11088 - Auto tensorization for CPU / GPU dot product
* #10986 - [Refactor] Introduce TuneConfig
* #11020 - [Metaschedule, Refactor] Move MultiLevelTilingNode decl to a header
* #10927 - [Refactor] Clarify Integration Logic
* #10876 - Add utility API to ease using manual schedules
* #10885 - [BugFix] Fix skipped tests
* #10366 - Add Gradient Based Task Scheduler
* #10823 - Fine-Grained Rewrite Unbound Block
* #10793 - Add demonstration of selectively tuning relay ops with TIR schedules
* #10811 - Support grouping in the cost model
* #10810 - Extract task weights during task extraction
* #10782 - [TIR] Estimate TIR FLOPs
* #10776 - Misc updates for tuning end-to-end workloads
* #10689 - Upstream the leftover changes
* #10648 - [Meta Schedule] Refactor meta schedule testing utils
* #10578 - New relay backend for meta schedule task extraction
* #10534 - Bug Fix for Relay Integration
* #10501 - Update scripts for subgraph tuning
* #10497 - Refactor testing workloads
* #10461 - Enable AutoTVM-style template-based search space
* #10368 - Fix Cyclic Dependency in PyClass Family
* #10403 - Arithmetic analysis
* #10367 - Update Tuning Interfaces
* #10079 - [M4a] User-API: Tune-TE/TIR/Relay
* #10081 - [M4a] Rewrite-Cooperative-Fetch
* #10055 - [M4b] Testcases for TensorRT builder/runner
* #10092 - [M4a] Mutator: Mutate-Tile-Size
* #10096 - [M4a] Mutator: Mutate Parallel
* #10071 - [M4a] PostProcessor: Rewrite-Parallel-Vectorize-Unroll
* #10043 - [M4a] Schedule Rule: Multi-Level-Tiling
* #10045 - Mutator: Mutate-Unroll
* #10033 - [M4a] Schedule Rule: Parallelize-Vectorize-Unroll
* #10027 - [M4a] PostProcessor: Rewrite-Unbound-Block
* #10028 - Mutator: Mutate-Compute-Location
* #9997 - [M4a] PostProcessor: Disallow-Dynamic-Loop
* #9994 - [M4a] Schedule Rule: Cross-Thread-Reduction
* #10013 - [M4a] PostProcessor: Rewrite Reduction Block
* #9975 - [M4a] Schedule Rule: Add-RFactor
* #9945 - [M4a] PostProcessor: Verify-GPU-Code
* #9940 - [M4a] Schedule Rule: Random-Compute-Location
* #9943 - [M4a] Schedule Rule: Auto-Inline
* #9860 - [M3c] Add Per-Store-Feature
* #9859 - [M3c] XGB-based Cost Model
* #9836 - [M4a] Add EvolutionarySearch Search Strategy
* #9799 - [M4a] Add ReplayFunc Search Strategy
* #9789 - [M3c] Update TuneContext, TaskScheduler & Search Strategy Design
* #9780 - [M3c] Add More Measure Callbacks
* #9761 - [M4a] Add ScheduleRule class & PostOrderApply space generator
* #9760 - [M3c] Random Feature Extractor

### MicroTVM
* #11741 - Refactor RVM scripts and fix DNS network issue
* #11472 - [ARM] Add tests for arm schedules
* #11634 - Update pyproject to python3.7
* Zephyr support - #11650
* RPC - #11227, #10967

### Relay
* #11825 - [relay][pass] add split infer shape with convert op layout pass
* #11674 - Finish implementations of WithFields
* #11481 - IndexedGraph improvements in preparation for Collage
* #11432 - Plumb external codegen target via Target.current()
* #11494 - [Pass] Add MaxPool, AvgPool to FoldExplicitPadding
* #11183 - Add unidirectional sequence lstm
* #11442 - Add 'static_library' runtime::Module
* #11413 - [Topi] Support for FP16 ERF on CPU
* #11382 - Finish support for list-of-targets
* #11386 - [Tests] Replace the Relay interpreter with the VM in the op tests
* #11224 - Support i16, f16 scalars in Relay text
* #11337 - Fix eltwise alter op layout for broadcast axis
* #11199 - Flexible shape dispatch transformation
* #11173 - Support 'external codegen targets'
* #10996 - Add FlattenAtrousConv transformation
* #10871 - [CUDNN] Add cuDNN as a Relay partitioning target (BYOC)
* #10787 - [Pass][Bugfix] Disable re-use of non-flat buffers in StorageRewrite
* #10378 - [FQ2I] Add leaky relu to FQ2I
* #10400 - RelayViz graphviz renderer
* #10352 - [VIRTUALDEVICE] Change syntax for device planning and store parameter virtual devices in virtual_device_ field
* #10310 - [ARM_CPU] Conv2d int8 intrinsic for cortex-A72
* #10085 - RelayViz interface and terminal ast-dump
* #10239 - Add a conversion of individual operations in FQ2I pass
* #10236 - [Refactor] Clean up type relations that are declared as template for no reason
* #10156 - Fix broadcast InferCorrectLayout
* #10026 - [VM] Relay VM memory liveness/lifetime analysis
* #10089 - [Pass] Add a relay pass to extract fake quantized ops
* #9690 - Change function constructors to WithFields
* #10069 - [DefuseOps pass] bug fix: To support function body types other…
* #9954 - Add `conv2d_backward_weight` op (without topi)
* #9838 - [FoldScaleAxis] Support dense and bias_add op in fold scale axis
* #9816 - Add sliding_window operator
* #9874 - Add a JSON converter for 0.7 -> 0.8 and 0.8 -> 0.9
* #9735 - [AMP][Pass][Typing] Add faster type inference
* #9723 - [Frontend] Add Span filling for frontends to Relay
* #9749 - Fix invalid shape function for "copy" operator
* #9759 - s/SEScope/VirtualDevice/g
* #9734 - Support large constants saved/loaded outside of VM executable
* #9613 - Re-run PlanDevices after LowerTE to flow new memory scope constraints
* #9693 - PlanDevices supports 'free' on_device annotations
* #9641 - [AST] Add virtual_device as a first class field in Relay
* #9483 - Switch the VM to use the LowerTE pass instead of TECompiler::{Lower,LowerShapeFunc}
* #9569 - WithFields method for Call, Function, Var, TupleGetItem, If, Let, RefCreate, RefRead, RefWrite, Match, and Clause
* #9533 - WithFields for Tuples
* #9550 - Prepare for switching VM to LowerTEPass
* #9542 - Prepare DeadCodeElimination for running post LowerTEPass/ManifestAlloc
* #9352 - [TVMC] Introduce executor and runtime parameters
* #9457 - Add the Arm(R) Ethos(TM)-U NPU identity operator
* #9326 - Switch PlanDevices pass to be w.r.t. SEScopes instead of DLDeviceTypes
* QNN - #11228, #10718, #10086, #10053, #9637, #9982

### Runtime
* #11334 - [PipelineExecutor] Add graph manually splitting logic into the unit test
* #11133 - [PipelineExecutor] Refactor PipelineExecutor.py and add cross compile support for pipeline executor
* #11172 - Move WrapTimeEvaluator from RPC to profiling, NFC
* #10990 - [PipelineExecutor] Add forwarding queue logic for set input
* #10953 - [Vulkan] Add RGP support to TVM for vulkan device
* #10723 - [PipelineExecutor] Getting the asynchronous output
* #10283 - AOTExecutor implementation and c target code-generator
* #9802 - [ThreadPool] Refactor affinity function and support CPU affinity list setting
* #10234 - [Pipeline Executor] multiple threads management and the data forwarding notification mechanism
* #10326 - Improved log information with function signature
* #10032 - [PackedFunc] Bring `PackedFunc` into TVM Object System
* #10082 - [PipelineExecutor] Pipeline Executor Sequential execution
* #10010 - [PipelineExecutor] Add Pipeline Executor Interface
* #9846 - [Pipeline executor] Global parameters group name and runtime modules parameters map
* #9889 - [GraphExecutor] Add API `get_input_info` to graph_executor
* #9751 - [Pipeline Executor] Add the map logic of global input and subgraph input
### TE
* #11589 - Support schedulable TIR compute definitions in TOPI
* #11341 - Optimized version of concatenation layer
* #10561 - [TECompiler] Decouple TE compute and schedule lowering in ScheduleBuilder

### TIR
* #11592 - HoistExpression, generalization of HoistIfThenElse
* #11870 - [Pass] Remove-Weight-Layout-Rewrite-Block
* #11740 - [TIR, analysis] Add GetAutoTensorizeMappingInfo to generate transforms for auto tensorization
* #11585 - Add preserve-unit-iters
* #11677 - Register CUDA WMMA tensor intrinsics
* #11658 - [TIR, CUDA] Add pass to replace global to shared memory copy with cp.async
* #11624 - [Schedule] Allow named block and buffer arguments in Schedule
* #11628 - [PASS] Refactor a couple of TIR passes - BindTarget, AnnotateEntryFunc, Filter, LowerInitBlock
* #11574 - CSE pass: Restrict the equivalence to be decided by a normal form - avoids comparison of terms
* #11575 - Schedule Primitive: Add-Unit-Loop
* #11515 - Add schedule primitive ReIndex
* #11524 - [Arith] Additional Simplifications Inside Conditionals
* #11485 - Add schedule primitive TransformBlockLayout
* #11495 - [Software pipeline] Fix hardcoded index in `access_ptr` rewriting, add a GPU test with depth 4
* #11269 - [Schedule] Transform layout quality of life
* #11355 - Support tensorization using ldmatrix + MMA
* #11289 - [Schedule] Allowed typing.Tuple in tir.schedule._type_checker
* #11317 - Support affine expressions as indices in reverse compute inline
* #11235 - [Arith] Implemented padded inverses in IndexMap
* #11238 - [ROOFLINE] Calculate roofline from existing TIR PrimFunc
* #11225 - Add schedule primitive SetAxisSeparator
* #11110 - Get read/write access precisely for opaque access
* #11106 - Enhance software pipeline validation and fix predicate of epilogue
* #10843 - StmtFunctor RenewDefs
* #11075 - Add function to tile a block according to a given tensor intrinsic
* #11050 - Utility function to decide loop mapping for auto tensorization
* #11009 - [ROCM] DP4A intrinsic support for TE/TIR
* #10925 - VNNI and ARM dot product intrinsic for tensorization
* #10887 - [Schedule] Relax reorder primitive's affine binding check
* #10732 - [Analysis] Add SuggestIndexMap for layout rewriting
* #10538 - [Schedule] Transform layout
* #10638 - Change the behavior of read/write region analysis for reduction blocks
* #10705 - Use local complete block and local reduction block to identify compact dataflow
* #10671 - Tuple Reduction Support in CreatePrimFunc
* #9727 - [TE] Implement layout transformations, non-flat memory buffers
* #10405 - [TensorIR] Update VerifyGPU
* #10401 - [TensorIR] Renormalize split pattern
* #10112 - [TIR, Relay] improve bfloat16 support
* #8509 - Tir constants integration into compilation pipeline
* #9996 - add support for multi-blocking layout and their transformation
* #10066 - Add software pipelining
* #10207 - Support sub warp reduction for CUDA target
* #9482 - Implementation of Common Subexpression Elimination for TIR
* #9527 - Allow compute_at create block predicate for non-trivial bounds and support floordiv pattern
* #10158 - [Schedule] Update compact_dataflow constraint
* #9871 - [Schedule] Blockize and Tensorize
* #10016 - [BugFix] Fix cross-thread reduction when single reduction loop with predicate
* #9880 - Encode conditional accesses info into block read/write regions
* #9699 - Affine utility support iter lowerbound and diagnostics
* #9742 - [Schedule] Add Annotate/Unannotate primitive
* #9738 - [TensorIR] Primitive "SetScope"
* #9743 - [Schedule] Analysis functions to check if compute_inline and com…
* #9689 - Allow memory (aka storage) scopes to be retrieved/applied to PrimFuncs
* #9559 - [TensorIR][UX] Type annotation-based runtime type checking
* #9444 - Add a 'rolling_buffer' scheduling primitive
* #9360 - [TensorIR] Cross-Thread Reduction

### TOPI
* #11531 - TE implementation of LSTM using scan
* #11161 - Add Adreno GPU target and topi supporting textures with dynamically allocated textures
* #10332 - VNNI support for batch matmul
* #9873 - Add support for grouped conv3d
* #10230 - VNNI support for int8 dense
* #10098 - [Op] 5 ops can accept unsigned integers as indices
* #9832 - Support grouped conv1d
* #9694 - Add generic batch norm
* #9233 - Cortex-M DSP support

### TVMScript
* #11308 - Represent ramp as index slice
* #10099 - Support T.buffer_decl using data pointer from Let/Allocate
* #9680 - Improve printer for TIR syntax sugar
* #9492 - Add syntax sugar for T.handle and T.match_buffer
* #9620 - Add for loop syntax sugar
* #9543 - Misc error message improvements
* #9505 - [Fix] Add type hints for more uncovered cases

### USMP
* #11015 - U3 use case
* #10189 - Adding support for U1 usecase for constant pools
* #10785 - Adding support for U4 usecase
* #10193 - adding support for U2 and U3 usecases
* #10005 - Add performance characteristics to PoolInfo
* #9565 - [TIR] Integrating USMP to AoT Executor
* #9704 - Hill Climb allocator
* #9418 - [TIR] adding the pass to convert to pool offsets
* #9649 - [TIR] Augmenting the algo interface with memory pressure
* #9214 - [TIR] Greedy memory planning algorithm
* #8468 - [TIR] Added buffer info extraction pass

### microNPU
* #11468 - Optimize separate padding operation for conv2d
* #11453 - Add transform matrices and part matcher to identity op
* #11410 - add E2E tests with cascader without striping
* #11288 - Expose compute cycle annotations to TIR lowering
* #10959 - Add a pass to reorder copy and compute nodes
* #10509 - Add various options to the cascader
* #11263 - Adding an option to enable striping
* #10251 - Add support for conv2d running on two cores on U65
* #10862 - Integrate the cascader
* #10344 - Integrate rolling buffers in Arm(R) Ethos(TM)-U
* #10824 - Some housekeeping in the test_ethosu folder
* #10763 - Tweak a layout transform matrix
* #10725 - Add a pass to move allocate nodes to the outer scope
* #10695 - Determine block configs using the cascader
* #10599 - Refactor Relay to TIR hook
* #10508 - Improve cascader memory transfer estimates
* #10345 - Add support for TFLite FULLY_CONNECTED
* #10254 - Introduce a pass to remove redundant identity operations
* #10062 - [5] Convert Proposals to te.Schedules
* #9959 - [4] Add the cascader Proposal generator
* #10022 - enable USMP
* #10127 - Add support for LeakyReLU
* #10004 - Add FreeRTOS variant of NPU demo
* #10060 - Refactor type inference data type checks
* #9960 - Add support for pack and unpack
* #10143 - Fix layout assignment in layout optimizer pass
* #9890 - [3] Plan generation for the cascader
* #9855 - Add support for transpose convolution
* #9841 - Add support for nearest neighbor and bilinear upsampling
* #9951 - Removing constant args from PrimFunc
* #9929 - Refactor base address determination to codegen
* #9910 - Add support for requantize
* #9831 - Move optimization passes to be a module pass and ensure they are running
* #9785 - [2d] Add more Part matchers to cascader
* #9778 - [2c] Add performance modelling to cascader
* #9471 - [2b] Create CascaderGraphs from TE graphs
* #9469 - [2a] Add CascaderGraph for cascading analysis
* #9621 - Add support for SPLIT and SPLIT_V
* #9508 - Update Conv2D Tests to Use TF API to Gen Test Cases
* #9627 - Add support for SIGMOID
* #9589 - Add support for TFLite concatenate
* #9623 - Refactor codegen tests
* #9561 - Add NHWC -> NHCWB16 layout transformation pass
* #9576 - Mean legalization support
* #9597 - Move the compilation to use Target Hooks
* #9458 - [1] Add affine analysis structures for the cascader
* #9547 - Add the infrastructure for lookup table and TANH
* #9521 - Support binary elementwise with non-4D inputs
* #9560 - Fix incorrectly calculated stride when converting NHWC to NHCWB16
* #9530 - Add unary elementwise operator infrastructure with ABS
* #9514 - Adding rounding mode attribute to operators
* #9515 - Allow constants to be given as input to an operator

### microTVM
* #11250 - [ARM] Add Relay tests for conv2d registered schedules
* #11232 - [rpc] Implemented rpc logging
* #11044 - Add support for host-driven AoT Executor
* #11043 - Better version handling for Arduino
* #10555 - Enable micro tvmc tutorial testing in CI
* #10194 - [RVM] Add scripts for automated build and testing
* #10144 - TVMCon 2021 Zephyr Demo with CMSIS-NN
* #10024 - [tvmc] Add TVMC Micro tutorial for Zephyr
* #9684 - Fix zephyr/test_zephyr_armv7m test
* #9584 - [TVMC] Add TVMC test for Arduino and Zephyr
* #9526 - Add minimal forwarding RPC server for host driven python execution on Hexagon
* Zephyr support - #11362, #10138

### Misc
* #11465 - Add cooldown interval logic for the profiling functional
* #11888 - [LLVM] Include LLVM headers in files that use them, not in llvm_common.h
* #11646 - [Arith] Simplification of ceil, log2, and left_shift
* #11464 - [MLF] Add support for multiple modules in Model Library Format
* #11632 - [AutoTVM][Autoscheduler] Default build funcs inherit PassContext
* #11543 - [OpenCL] Implement conv2d_winograd algorithm for Adreno
* #11287 - [Arith] Merge surjective/non-surjective iter mapping detections
* #11393 - Add utility to replace direct call to pytest.main
* #11252 - [ROOFLINE] Roofline analysis over RPC
* #11000 - [Graph Debugger] Expose way to benchmark individual nodes
* #10794 - bump PyTorch version to 1.11
* #10821 - [REFACTOR] Remove legacy nnvm folder
* #10798 - [Arith] Remove diagnostic ctx argument from DetectIterMap
* #10567 - [Refactor] Reduced repetition in CodeGenLLVM's buffer access
* #10455 - [AUTO_SCHEDULER] Add feature extraction directly from PrimFunc
* #7401 - RFC: initial stab at TorchScript fallback
* #10391 - [vulkan] Add integer dot product (4xint8, 4xuint8) tensorization for the vulkan SPIR-V target
* #10293 - [VirtualMachine] new method allowing to set one input tensor by its index or name
* #10191 - Generate correct output tensor names in C Interface API
* #9276 - Parameterize test_link_params
* #9808 - [Rust] Update Rust bindings
* #9553 - [PROFILING] Add ability to profile a single function_profiling
* #9611 - [CMAKE] Automatically detect newly added source files
* #9544 - [Target] enable -arch=sm_xx for assigning cuda target arch and deprecate autotvm.measure.set_cuda_target_arch api
* Profiler - #11530, #11066
* Docs - #10921, #11403, #10774, #10912, #9633, #9906, #9534, #9307, #9654, #9580
* Android - #11241
* ETHOSN - #11261, #10486, #10018, #9596
* TVMC - #11012, #10962, #10722, #9817, #9529, #9229

---

View it on GitHub: https://github.com/apache/tvm/releases/tag/v0.9.0.rc0