This is an automated email from the ASF dual-hosted git repository.

ruihangl pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
     new 628b394ed7 [Docs] Add Disco distributed runtime architecture overview (#19357)
628b394ed7 is described below

commit 628b394ed779a518e1e3aaeb0866d0884d5abadb
Author: Shushi Hong <[email protected]>
AuthorDate: Mon Apr 6 12:02:52 2026 -0400

    [Docs] Add Disco distributed runtime architecture overview (#19357)
    
    Add Disco distributed runtime architecture overview
---
 docs/arch/index.rst | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/docs/arch/index.rst b/docs/arch/index.rst
index 90c0b83c26..f46e374724 100644
--- a/docs/arch/index.rst
+++ b/docs/arch/index.rst
@@ -248,6 +248,31 @@ On the Python side, users interact with the VM through ``relax.VirtualMachine(ex
 which provides both a direct invocation interface and a stateful set-input / invoke / get-output
 interface suitable for RPC-based remote execution.
 
+Disco: Distributed Runtime
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Disco is TVM's distributed runtime for executing models across multiple devices. When a model is
+too large to fit on a single GPU, the ``relax.distributed`` module annotates how tensors should be
+partitioned and placed across a mesh of devices at compile time. Disco then takes over at runtime:
+it manages a group of workers, dispatches the compiled program to all of them simultaneously, and
+coordinates inter-device communication through collective operations such as allreduce, allgather,
+broadcast, and scatter.
+
+The central abstraction is the ``Session``, which owns the workers and exposes an SPMD-style
+programming interface. Every object that lives on workers is represented by a ``DRef``: a
+distributed reference that maps to a concrete value on each worker. When the controller invokes a
+``DPackedFunc`` through the session, all workers execute the same PackedFunc call synchronously,
+each operating on its own local shard. Compiled VM modules can be loaded into a session as
+``DModule`` objects and called in the same fashion. The session also provides collective
+primitives backed by NCCL or RCCL, so that workers can exchange partial results without routing
+data through the controller.
+
+Three session backends cover different deployment topologies. ``ThreadedSession`` spawns workers
+as threads within a single process; this is the most common choice for multi-GPU inference on a
+single machine. ``ProcessSession`` launches workers as separate OS processes connected by pipes,
+providing stronger isolation. ``SocketSession`` extends the model to multi-node clusters by
+connecting workers across machines via TCP sockets.
+
 tvm/node
 --------
 The node module adds additional features on top of the `runtime::Object` for IR data structures.

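The Session/DRef model the new section describes can be illustrated with a small self-contained sketch. This is a toy SPMD analogue built on Python threads, not TVM's actual Disco API; the ``ToySession`` and ``DRef`` classes here are illustrative inventions standing in for ``Session``, ``DRef``, and the NCCL/RCCL-backed collectives.

```python
# Toy illustration of the SPMD Session/DRef pattern (NOT TVM's actual API):
# a controller issues one call, and every worker runs it on its own shard.
import threading


class DRef:
    """A distributed reference: holds one local value per worker."""

    def __init__(self, values):
        self.values = values  # values[i] is worker i's shard


class ToySession:
    """Hypothetical SPMD session: each call runs the same function on all workers."""

    def __init__(self, num_workers):
        self.num_workers = num_workers

    def call(self, func, *args):
        # Every worker applies `func`, substituting its own shard for DRef args.
        results = [None] * self.num_workers

        def worker(i):
            local = [a.values[i] if isinstance(a, DRef) else a for a in args]
            results[i] = func(*local)

        threads = [threading.Thread(target=worker, args=(i,))
                   for i in range(self.num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return DRef(results)

    def allreduce(self, dref):
        # Collective: every worker ends up with the sum of all shards,
        # mimicking the allreduce primitive the section mentions.
        total = sum(dref.values)
        return DRef([total] * self.num_workers)


sess = ToySession(num_workers=4)
shards = DRef([1, 2, 3, 4])                   # one shard per worker
squared = sess.call(lambda x: x * x, shards)  # SPMD call on every worker
reduced = sess.allreduce(squared)             # all workers now hold 1+4+9+16
```

The real runtime differs in the important ways: shards live on separate GPUs (or machines), the collective exchange happens over NCCL/RCCL rather than in controller memory, and the three backends change only where workers run, not the programming model.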