This is an automated email from the ASF dual-hosted git repository.

ruihangl pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
     new 1d02ac3f2c [Docs] Add Relax VM architecture documentation (#19389)
1d02ac3f2c is described below

commit 1d02ac3f2c305f9d5b83fb7094492cdd65dad8cc
Author: Shushi Hong <[email protected]>
AuthorDate: Sun Apr 12 11:46:25 2026 -0400

    [Docs] Add Relax VM architecture documentation (#19389)
    
    - Add a dedicated architecture document (`docs/arch/relax_vm.rst`) for
      the Relax Virtual Machine, covering the compilation pipeline (Relax IR →
      VMCodeGen → VMExecutable), the 4-opcode instruction set and encoding
      format, the register-based execution model (VMFrame, RunLoop, VMClosure
      dispatch), built-in operations, serialization, and the Python-level
      interface (direct invocation, stateful API, profiling, instrumentation).
    - Refactor the inline VM section in `docs/arch/index.rst` into a brief
      summary with a cross-reference to the new page, and add `relax_vm` to
      the toctree.
---
 docs/arch/index.rst    |  29 ++--
 docs/arch/relax_vm.rst | 441 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 453 insertions(+), 17 deletions(-)

diff --git a/docs/arch/index.rst b/docs/arch/index.rst
index 17b26a5ddc..9479d22948 100644
--- a/docs/arch/index.rst
+++ b/docs/arch/index.rst
@@ -243,23 +243,18 @@ Relax Virtual Machine
 
 Relax defines *what* to compute — it is a graph-level IR that describes the 
operators and dataflow
 of a model. The Relax Virtual Machine (VM) handles *how* to run it — it is the 
runtime component
-that executes the compiled result. During compilation, ``tvm.compile()`` 
invokes ``VMCodeGen`` to
-translate Relax functions into a compact bytecode representation. The 
resulting ``VMExecutable``
-bundles the bytecode together with a constant pool and per-function metadata, 
and can be serialized
-to disk for deployment.
-
-The VM uses a register-based interpreter with an intentionally minimal 
instruction set — only four
-opcodes: ``Call``, ``Ret``, ``Goto``, and ``If``. The VM itself performs no 
mathematical computation;
-it only orchestrates control flow (function calls, conditional branches, 
loops). The actual
-compute-intensive work — matrix multiplications, convolutions, and other 
operators — is carried out
-by TIR functions that have been compiled down to native GPU/CPU kernels, or by 
external libraries
-such as cuBLAS and cuDNN. The VM dispatches to them through the PackedFunc 
mechanism. Internally the
-VM recognizes three function kinds: *PackedFunc* for external C/C++ functions, 
*VMFunc* for
-bytecode-interpreted Relax functions, and *VMTIRFunc* for compiled TIR kernels.
-
-On the Python side, users interact with the VM through 
``relax.VirtualMachine(executable, device)``,
-which provides both a direct invocation interface and a stateful set-input / 
invoke / get-output
-interface suitable for RPC-based remote execution.
+that executes the compiled result. The VM uses a register-based interpreter 
with only four opcodes
+(``Call``, ``Ret``, ``Goto``, ``If``) and performs no mathematical computation 
itself — it
+orchestrates control flow while dispatching actual work to compiled TIR 
kernels or external
+libraries.
+
+See :ref:`relax-vm-arch` for the full architecture documentation, including 
the compilation
+pipeline, instruction set details, execution model, and Python interface.
+
+.. toctree::
+   :maxdepth: 1
+
+   relax_vm
 
 Disco: Distributed Runtime
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/arch/relax_vm.rst b/docs/arch/relax_vm.rst
new file mode 100644
index 0000000000..dddee57b36
--- /dev/null
+++ b/docs/arch/relax_vm.rst
@@ -0,0 +1,441 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+.. _relax-vm-arch:
+
+Relax Virtual Machine
+=====================
+
+Relax defines *what* to compute — it is a graph-level IR that describes the 
operators and dataflow
+of a model. The Relax Virtual Machine (VM) handles *how* to run it — it is the 
runtime component
+that executes the compiled result. This document explains the VM architecture 
in detail, covering
+the compilation pipeline from Relax IR to bytecode, the instruction set, the 
execution model, and
+the Python-level user interface.
+
+Overview
+--------
+
+The end-to-end flow from model to execution is:
+
+1. **Relax IR** — a high-level computational graph (``relax.Function`` inside 
an ``IRModule``).
+2. **Compilation** — ``tvm.compile()`` applies the Relax transformation 
pipeline, then invokes
+   ``VMCodeGen`` to translate each Relax function into bytecode instructions.
+3. **Linking** — TIR functions are compiled to native kernels (via LLVM, CUDA, 
etc.); the bytecode,
+   constant pool, and compiled kernels are packaged together into a 
``VMExecutable``.
+4. **Execution** — at runtime, a ``VirtualMachine`` loads the executable, 
initializes devices and
+   memory allocators, and runs the bytecode.
+
+.. code-block:: text
+
+   IRModule (Relax + TIR)
+        │
+        ▼  relax_pipeline (FuseOps, LegalizeOps, ...)
+   IRModule (optimized)
+        │
+        ▼  VMCodeGen
+   ExecBuilder (bytecode) + IRModule (TIR only)
+        │                        │
+        │                        ▼  tirx.build()
+        │                   runtime.Module (native kernels)
+        │                        │
+        ▼  VMLink               ▼
+   VMExecutable ◄───────── linked together
+        │
+        ▼  VirtualMachine(exec, device)
+   Runtime execution
+
+
+Compilation: From Relax IR to Bytecode
+--------------------------------------
+
+Build entry point
+~~~~~~~~~~~~~~~~~
+
+The main entry point is ``tvm.compile()`` (which delegates to 
``relax.build()`` in
+``python/tvm/relax/vm_build.py``):
+
+.. code-block:: python
+
+   import tvm
+   from tvm import relax
+   from tvm.script import relax as R
+
+   @tvm.script.ir_module
+   class MyModule:
+       @R.function
+       def main(x: R.Tensor((3, 4), "float32")):
+           return R.add(x, x)
+
+   target = tvm.target.Target("llvm")
+   ex = tvm.compile(MyModule, target)
+
+Internally, ``relax.build()`` performs these steps:
+
+1. Apply the **Relax pipeline** (``relax.get_pipeline("default")``), which 
includes operator
+   legalization, fusion, buffer planning, and other graph-level passes.
+2. Create an ``ExecBuilder`` and run **VMCodeGen** 
(``src/relax/backend/vm/codegen_vm.cc``),
+   which walks each ``relax.Function`` and emits bytecode instructions. The 
Relax functions are
+   removed from the IRModule; only TIR functions remain.
+3. Compile the remaining TIR functions to native code via ``tirx.build()``.
+4. **Link** the bytecode executable with the compiled native module using 
``VMLink``, producing
+   a ``VMExecutable``.
+
+Two execution modes are supported:
+
+- ``exec_mode="bytecode"`` (default): Relax functions are interpreted by the 
VM's bytecode
+  dispatch loop.
+- ``exec_mode="compiled"``: Relax functions are compiled into TIR functions 
(``VMTIRCodeGen``)
+  that directly manipulate the register file, bypassing the interpreter loop. 
This avoids
+  dispatch overhead but produces more code.
+
+Bytecode generation
+~~~~~~~~~~~~~~~~~~~
+
+The ``CodeGenVM`` class (``src/relax/backend/vm/codegen_vm.cc``) is an 
``ExprFunctor`` that visits
+each Relax expression and emits instructions through the ``ExecBuilder``:
+
+- Each ``relax.Var`` is mapped to a register.
+- Function parameters occupy registers 0 through N-1.
+- Each binding in a ``SeqExpr`` generates one or more instructions; the result 
is stored in a
+  new register.
+- Function calls (``R.call_tir``, ``R.call_packed``, operator calls) become 
``Call`` instructions.
+- Conditional expressions (``relax.If``, written as Python ``if`` in TVMScript) become an ``If``
+  instruction that guards the true branch, with a ``Goto`` at the end of the true branch to
+  jump over the false branch.
+- The function body ends with a ``Ret`` instruction.
+
+
+Instruction Set
+---------------
+
+The VM uses a **register-based** architecture with an intentionally minimal 
instruction set.
+There are only four opcodes:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 15 30 55
+
+   * - Opcode
+     - Fields
+     - Semantics
+   * - ``Call``
+     - ``dst``, ``func_idx``, ``num_args``, ``args[]``
+     - Call function ``func_idx`` with the given arguments; store the result in register ``dst``.
+   * - ``Ret``
+     - ``result``
+     - Return the value in register ``result`` to the caller.
+   * - ``Goto``
+     - ``pc_offset``
+     - Jump forward or backward by ``pc_offset`` instructions.
+   * - ``If``
+     - ``cond``, ``false_offset``
+     - If register ``cond`` is nonzero, fall through (pc++); otherwise jump by ``false_offset``.
+
+The VM itself performs **no mathematical computation**. All actual work — 
matrix multiplications,
+convolutions, elementwise operations — is carried out by compiled TIR kernels 
or external
+libraries (cuBLAS, cuDNN, etc.), dispatched through ``Call`` instructions.
+
+Instruction encoding
+~~~~~~~~~~~~~~~~~~~~
+
+Each instruction argument (``Instruction::Arg``) is a 64-bit word encoded as:
+
+- **Bits [63:56]** — ``ArgKind`` (8 bits): ``kRegister`` (0), ``kImmediate`` 
(1), ``kConstIdx`` (2),
+  or ``kFuncIdx`` (3).
+- **Bits [55:0]** — value (56 bits, sign-extended).
+
+Two special register values exist:
+
+- ``kVoidRegister``: indicates "no destination" (the return value is 
discarded).
+- ``kVMRegister``: refers to the VM context pointer itself, passed as the 
first argument to
+  closures.
+
+The instruction stream is stored as a flat ``vector<ExecWord>`` 
(``instr_data``) with an offset
+table (``instr_offset``) for random access.
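+
+As a rough illustration of this layout (not the actual C++ code in ``bytecode.h``), the
+packing and sign extension can be sketched in plain Python. The helper names ``pack_arg``
+and ``unpack_arg`` are hypothetical, and the word is treated as an unsigned Python integer;
+only the bit layout and the ``ArgKind`` values are taken from the description above.
+
+.. code-block:: python
+
+   KIND_BITS = 8
+   VALUE_BITS = 64 - KIND_BITS            # 56
+   VALUE_MASK = (1 << VALUE_BITS) - 1
+
+   # ArgKind values as listed above.
+   K_REGISTER, K_IMMEDIATE, K_CONST_IDX, K_FUNC_IDX = 0, 1, 2, 3
+
+   def pack_arg(kind: int, value: int) -> int:
+       """Pack a kind and a signed 56-bit value into one 64-bit word."""
+       if not 0 <= kind < (1 << KIND_BITS):
+           raise ValueError("kind must fit in 8 bits")
+       if not -(1 << (VALUE_BITS - 1)) <= value < (1 << (VALUE_BITS - 1)):
+           raise ValueError("value must fit in 56 signed bits")
+       return (kind << VALUE_BITS) | (value & VALUE_MASK)
+
+   def unpack_arg(word: int) -> tuple[int, int]:
+       """Recover (kind, value), sign-extending the low 56 bits."""
+       kind = (word >> VALUE_BITS) & ((1 << KIND_BITS) - 1)
+       value = word & VALUE_MASK
+       if value >= 1 << (VALUE_BITS - 1):  # sign bit of the 56-bit field is set
+           value -= 1 << VALUE_BITS
+       return kind, value
+
+   word = pack_arg(K_IMMEDIATE, -1)
+   print(unpack_arg(word))                # (1, -1)
+
+A negative immediate survives the round trip because the value field is sign-extended on
+decode, which is what lets relative jump offsets point backward.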
+
+
+Executable
+----------
+
+A ``VMExecutable`` (``include/tvm/runtime/vm/executable.h``) bundles 
everything needed for
+execution:
+
+- **Function table** (``func_table``): a ``vector<VMFuncInfo>`` describing 
every function. Each
+  entry records the function's kind, name, instruction range (``start_instr`` 
to ``end_instr``),
+  number of arguments, register file size, and parameter names.
+- **Constant pool** (``constants``): model weights, shape tuples, and other 
compile-time constants.
+- **Bytecode** (``instr_data`` + ``instr_offset``): the instruction stream.
+- **Imported modules**: the compiled TIR kernels and external libraries.
+
+Function kinds
+~~~~~~~~~~~~~~
+
+The VM recognizes three function kinds (``VMFuncInfo::FuncKind``):
+
+.. list-table::
+   :header-rows: 1
+   :widths: 20 80
+
+   * - Kind
+     - Description
+   * - ``kPackedFunc``
+     - An external C/C++ function looked up from imported modules or the global PackedFunc
+       registry. Examples: ``vm.builtin.alloc_shape_heap``, ``vm.builtin.match_shape``.
+   * - ``kVMFunc``
+     - A bytecode-interpreted Relax function. The VM interprets its instructions in ``RunLoop()``.
+   * - ``kVMTIRFunc``
+     - A Relax function compiled to a TIR function (``exec_mode="compiled"``). Found in
+       imports under the name ``__vmtir__<func_name>``. Called directly with register file
+       pointers, bypassing the interpreter loop.
+
+Serialization
+~~~~~~~~~~~~~
+
+The executable supports binary serialization for deployment:
+
+.. code-block:: python
+
+   # Save
+   ex.export_library("model.so")
+
+   # Load
+   loaded = tvm.runtime.load_module("model.so")
+   vm = relax.VirtualMachine(loaded, tvm.cuda())
+
+The binary format starts with a magic number (``0xD225DE2F4214151E``) and a version string
+(currently ``"0.14"``), followed by four sections: globals (the function table), memory scopes,
+constant pool, and bytecode. ``AsText()`` and ``AsPython()`` provide human-readable
+representations for debugging.
+
+
+Runtime Execution
+-----------------
+
+VM initialization
+~~~~~~~~~~~~~~~~~
+
+At runtime, a ``VirtualMachine`` is created and initialized:
+
+.. code-block:: python
+
+   import tvm
+   from tvm.relax import VirtualMachine
+
+   # exec_module: a compiled or loaded VMExecutable
+   vm = VirtualMachine(exec_module, tvm.cuda())
+
+Under the hood:
+
+1. **LoadExecutable**: the bytecode and metadata are loaded from the 
``VMExecutable``.
+2. **Init**: devices and memory allocators are set up. Each device gets an 
``Allocator``
+   (either ``NAIVE_ALLOCATOR`` or ``POOLED_ALLOCATOR``, defaulting to pooled). 
A CPU device
+   is always added for shape computations.
+3. **InitFuncPool**: the function pool is populated — ``kPackedFunc`` entries 
are resolved from
+   imports or the global registry; ``kVMFunc`` and ``kVMTIRFunc`` entries are 
wrapped in
+   ``VMClosure`` objects.
+4. **Constant pool**: model constants are loaded and optionally transferred to 
the target device.
+
+The bytecode dispatch loop
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a ``kVMFunc`` is invoked, the VM enters ``InvokeBytecode()``:
+
+1. A new ``VMFrame`` is pushed onto the call stack. Each frame contains:
+
+   - A **register file** (``vector<ffi::Any>``) — type-erased slots that can 
hold tensors,
+     shapes, closures, or any TVM object. The size is determined at compile 
time
+     (``VMFuncInfo::register_file_size``).
+   - The **return program counter** — where to resume after the function 
returns.
+   - The **caller's return register** — which register in the parent frame 
receives the result.
+
+2. Function arguments are written to registers 0..N-1.
+3. The program counter (``pc_``) is set to the function's ``start_instr``.
+4. ``RunLoop()`` executes instructions until a ``Ret`` is encountered:
+
+   - **Call**: resolve arguments (from registers, immediates, constant pool, 
or function pool),
+     invoke the target function via ``InvokeClosurePacked()``, store the 
result in ``dst``.
+   - **Ret**: read the return value from the specified register, write the 
result to the
+     caller's return register, and return from ``RunLoop()`` (the frame is 
popped by an RAII
+     guard when ``InvokeBytecode()`` exits).
+   - **Goto**: adjust ``pc_`` by the offset.
+   - **If**: check the condition register; if nonzero, fall through; otherwise 
jump by
+     ``false_offset``.
+
+The dispatch loop is implemented in ``src/runtime/vm/vm.cc`` 
(``VirtualMachineImpl::RunLoop``).
+
+.. code-block:: text
+
+   Frame Stack              Register File (per frame)
+   ┌─────────────┐          ┌────┬────┬────┬─────┬────┐
+   │  Frame 2    │ ───────► │ R0 │ R1 │ R2 │ ... │ Rn │
+   ├─────────────┤          └────┴────┴────┴─────┴────┘
+   │  Frame 1    │ ───────► [register file]
+   ├─────────────┤
+   │  Frame 0    │ ───────► [register file]
+   └─────────────┘
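+
+The loop above can be approximated in a few lines of plain Python. This is only a toy
+model under simplifying assumptions: instructions are tuples rather than encoded words,
+there is a single frame, and ``run_loop``, ``FUNCS``, and ``PROGRAM`` are hypothetical
+names that do not exist in TVM. It mirrors the ``pc`` updates described for each opcode.
+
+.. code-block:: python
+
+   def run_loop(program, funcs, args, num_registers):
+       """Interpret tuple-encoded instructions until a Ret is reached."""
+       registers = [None] * num_registers   # register file of the single frame
+       registers[: len(args)] = args        # arguments occupy registers 0..N-1
+       pc = 0
+       while True:
+           op, *fields = program[pc]
+           if op == "Call":
+               dst, func_idx, arg_regs = fields
+               registers[dst] = funcs[func_idx](*(registers[r] for r in arg_regs))
+               pc += 1
+           elif op == "Ret":
+               return registers[fields[0]]
+           elif op == "Goto":
+               pc += fields[0]              # relative jump
+           elif op == "If":
+               cond, false_offset = fields
+               pc += 1 if registers[cond] else false_offset
+           else:
+               raise ValueError(f"unknown opcode {op!r}")
+
+   # Function pool: all real work happens in the callees, not in the loop.
+   FUNCS = [lambda x: x > 0, lambda x: -x, lambda x: x]
+
+   # r2 = r0 if r0 > 0 else -r0
+   PROGRAM = [
+       ("Call", 1, 0, [0]),   # 0: r1 = is_positive(r0)
+       ("If", 1, 3),          # 1: true -> pc 2, false -> pc 4
+       ("Call", 2, 2, [0]),   # 2: r2 = copy(r0)
+       ("Goto", 2),           # 3: skip the false branch -> pc 5
+       ("Call", 2, 1, [0]),   # 4: r2 = negate(r0)
+       ("Ret", 2),            # 5: return r2
+   ]
+
+   print(run_loop(PROGRAM, FUNCS, [3], 3))    # 3
+   print(run_loop(PROGRAM, FUNCS, [-4], 3))   # 4
+
+Both branches write to the same register before ``Ret``, which is the role the copy
+builtin plays when results of if/else branches are merged.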
+
+VMClosure and function dispatch
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Functions in the VM are stored in a ``func_pool_`` indexed by function table 
position.
+``kVMFunc`` and ``kVMTIRFunc`` entries are wrapped as ``VMClosure`` objects, 
while ``kPackedFunc``
+entries are stored as plain ``ffi::Function``. A ``VMClosure`` stores:
+
+- ``func_name``: the function's string name.
+- ``impl``: a ``ffi::Function`` that takes the VM context pointer as its first 
argument, followed
+  by the actual parameters.
+
+When the VM encounters a ``Call`` instruction, it looks up the function in 
``func_pool_`` by
+index and dispatches via ``InvokeClosurePacked()``. If the target is a 
``VMClosure``, the VM
+pointer is prepended to the arguments and ``impl`` is invoked. If it is a plain
+``ffi::Function``, it is called directly.
+
+``VMClosure::BindLastArgs`` enables partial application — it creates a new 
function with
+some arguments pre-bound at the end, useful for implementing captured closures 
in Relax.
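+
+The dispatch rule and the effect of binding trailing arguments can be sketched in plain
+Python. Everything here (``Closure``, ``invoke_closure_packed``, ``bind_last_args``,
+``ToyVM``, ``scaled_add``) is a hypothetical stand-in chosen for illustration, not the
+TVM API; the sketch only assumes the behavior described in the paragraphs above.
+
+.. code-block:: python
+
+   from dataclasses import dataclass
+   from typing import Any, Callable
+
+   @dataclass
+   class Closure:
+       """Toy stand-in for VMClosure: impl expects the VM as its first argument."""
+       func_name: str
+       impl: Callable[..., Any]
+
+   def invoke_closure_packed(vm, target, *args):
+       """Prepend the VM for closures; call plain functions directly."""
+       if isinstance(target, Closure):
+           return target.impl(vm, *args)
+       return target(*args)
+
+   def bind_last_args(func, *last_args):
+       """Return a function that appends last_args to every call."""
+       def bound(*first_args):
+           return func(*first_args, *last_args)
+       return bound
+
+   class ToyVM:
+       """Placeholder for the VM context object."""
+
+   def scaled_add(vm, x, y, scale):
+       # The vm argument is unused in this toy function.
+       return (x + y) * scale
+
+   vm = ToyVM()
+   closure = Closure("scaled_add", bind_last_args(scaled_add, 10))
+   print(invoke_closure_packed(vm, closure, 1, 2))   # 30
+   print(invoke_closure_packed(vm, max, 1, 2))       # 2
+
+Binding at the end keeps the VM pointer in the first position, so a bound closure can
+still be dispatched through the same path as an unbound one.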
+
+Built-in operations
+~~~~~~~~~~~~~~~~~~~
+
+The VM relies on several built-in PackedFuncs (registered in 
``src/runtime/vm/builtin.cc``)
+for runtime support:
+
+- ``vm.builtin.alloc_shape_heap``: allocate workspace for symbolic shape 
computations.
+- ``vm.builtin.match_shape``: validate tensor shapes against expected patterns 
at runtime,
+  supporting assertions (``kAssertEqualToImm``, ``kAssertEqualToLoad``), 
storing symbolic
+  dimensions to the shape heap (``kStoreToHeap``), or no-ops (``kNoOp``).
+- ``vm.builtin.make_shape``: construct shape tuples from immediates or 
heap-loaded values.
+- ``vm.builtin.match_prim_value``: validate primitive values (e.g., integers) 
against expected
+  patterns.
+- ``vm.builtin.copy``: copy a value into a register. Used in several codegen 
scenarios:
+  materializing non-register arguments (immediates, constants) into registers, 
ensuring each
+  variable binding gets its own register, and merging results from if/else 
branches.
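+
+To make the shape-matching actions more concrete, here is a toy model in plain Python.
+It assumes one ``(code, operand)`` action per dimension and uses a Python list as the
+shape heap; ``MatchCode`` and this ``match_shape`` function are hypothetical, the enum
+values are arbitrary, and the real builtin's calling convention is not shown here.
+
+.. code-block:: python
+
+   from enum import Enum, auto
+
+   class MatchCode(Enum):
+       # Names follow the list above; the numeric values are arbitrary here.
+       ASSERT_EQUAL_TO_IMM = auto()
+       ASSERT_EQUAL_TO_LOAD = auto()
+       STORE_TO_HEAP = auto()
+       NO_OP = auto()
+
+   def match_shape(shape, heap, actions):
+       """Apply one (code, operand) action per dimension of shape."""
+       if len(shape) != len(actions):
+           raise ValueError("rank mismatch")
+       for dim, (code, operand) in zip(shape, actions):
+           if code is MatchCode.ASSERT_EQUAL_TO_IMM and dim != operand:
+               raise ValueError(f"expected {operand}, got {dim}")
+           if code is MatchCode.ASSERT_EQUAL_TO_LOAD and dim != heap[operand]:
+               raise ValueError(f"expected {heap[operand]}, got {dim}")
+           if code is MatchCode.STORE_TO_HEAP:
+               heap[operand] = dim
+
+   heap = [0, 0]
+   # Pattern (n, 4): remember n in heap slot 0, require the second dim to be 4.
+   match_shape((8, 4), heap, [(MatchCode.STORE_TO_HEAP, 0),
+                              (MatchCode.ASSERT_EQUAL_TO_IMM, 4)])
+   # Pattern (n, m): n must equal the stored value; m goes to heap slot 1.
+   match_shape((8, 16), heap, [(MatchCode.ASSERT_EQUAL_TO_LOAD, 0),
+                               (MatchCode.STORE_TO_HEAP, 1)])
+   print(heap)   # [8, 16]
+
+Storing a symbolic dimension on first sight and asserting against it later is how a
+shared symbol such as ``n`` can be kept consistent across several tensors.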
+
+
+Python Interface
+----------------
+
+Users interact with the VM through ``tvm.relax.VirtualMachine``:
+
+.. code-block:: python
+
+   import tvm
+   from tvm import relax
+   import numpy as np
+
+   # Compile
+   ex = tvm.compile(MyModule, target="llvm")
+
+   # Create VM
+   vm = relax.VirtualMachine(ex, tvm.cpu())
+
+   # Direct invocation
+   inp = tvm.runtime.tensor(np.random.rand(3, 4).astype("float32"))
+   result = vm["main"](inp)
+
+   # Stateful interface (useful for RPC)
+   vm.set_input("main", inp)
+   vm.invoke_stateful("main")
+   output = vm.get_outputs("main")
+
+Key methods:
+
+- ``vm["func_name"](*args)`` — direct invocation, returns the result.
+- ``vm.set_input()`` / ``vm.invoke_stateful()`` / ``vm.get_outputs()`` — 
stateful interface
+  that avoids sending output over the wire, useful for RPC-based remote 
execution.
+- ``vm.save_function(func_name, saved_name, *args)`` — pre-bind arguments for 
repeated calls,
+  reducing dictionary lookup overhead during benchmarking.
+- ``vm.time_evaluator(func_name, dev)`` — returns a timing function following 
the same convention
+  as ``tvm.runtime.Module.time_evaluator``.
+- ``vm.profile(func_name, *args)`` — returns a per-operator profiling report 
(requires
+  ``profile=True`` at VM construction).
+- ``vm.set_instrument(func)`` — register an instrumentation callback that is 
invoked before/after
+  every ``Call`` instruction. The callback can return 
``VMInstrumentReturnKind.SKIP_RUN`` to
+  skip the call.
+
+Profiling and instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The VM supports two levels of observability:
+
+**Profiling** via ``VirtualMachine(exec, dev, profile=True)``:
+
+.. code-block:: python
+
+   vm = relax.VirtualMachine(ex, tvm.cuda(), profile=True)
+   report = vm.profile("main", inp)
+   print(report)
+
+This produces a ``tvm.runtime.profiling.Report`` with per-operator timing 
breakdown.
+
+**Instrumentation** via ``set_instrument()``:
+
+.. code-block:: python
+
+   from tvm.relax import VMInstrumentReturnKind
+
+   def my_instrument(func, func_symbol, before_run, ret_value, *args):
+       if before_run:
+           print(f"About to call: {func_symbol}")
+       return VMInstrumentReturnKind.NO_OP
+
+   vm.set_instrument(my_instrument)
+   vm["main"](inp)
+
+The instrument function is called before and after every ``Call`` instruction, 
receiving the
+function object, its symbol name, a flag indicating before/after, the return 
value (only valid
+after), and all arguments.
+
+
+Inspecting Bytecode
+-------------------
+
+The executable provides text and Python representations of the compiled 
bytecode:
+
+.. code-block:: python
+
+   ex = tvm.compile(MyModule, target="llvm")
+   print(ex.as_text())    # Human-readable instruction listing
+   print(ex.as_python())  # Equivalent Python program
+   print(ex.stats())      # Summary statistics
+
+These are invaluable for debugging compilation issues — they show exactly 
which functions
+are called, in what order, and how registers are used.
+
+
+Source Code Map
+---------------
+
+.. list-table::
+   :header-rows: 1
+   :widths: 45 55
+
+   * - Path
+     - Contents
+   * - ``include/tvm/runtime/vm/bytecode.h``
+     - Instruction, Opcode, and Arg definitions
+   * - ``include/tvm/runtime/vm/executable.h``
+     - VMExecutable, VMFuncInfo, serialization
+   * - ``include/tvm/runtime/vm/vm.h``
+     - VirtualMachine base class, VMClosure
+   * - ``src/runtime/vm/vm.cc``
+     - VirtualMachineImpl, RunLoop, InvokeBytecode
+   * - ``src/runtime/vm/executable.cc``
+     - Serialization/deserialization, text output
+   * - ``src/runtime/vm/builtin.cc``
+     - Built-in operations (shape matching, allocation)
+   * - ``src/relax/backend/vm/codegen_vm.cc``
+     - CodeGenVM: Relax IR → bytecode
+   * - ``src/relax/backend/vm/codegen_vm_tir.cc``
+     - VMTIRCodeGen: Relax IR → compiled TIR
+   * - ``python/tvm/runtime/vm.py``
+     - Python VirtualMachine wrapper
+   * - ``python/tvm/relax/vm_build.py``
+     - ``relax.build()`` and VMExecutable Python class
