This is an automated email from the ASF dual-hosted git repository.
ruihangl pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git
The following commit(s) were added to refs/heads/main by this push:
new 628b394ed7 [Docs] Add Disco distributed runtime architecture overview (#19357)
628b394ed7 is described below
commit 628b394ed779a518e1e3aaeb0866d0884d5abadb
Author: Shushi Hong <[email protected]>
AuthorDate: Mon Apr 6 12:02:52 2026 -0400
[Docs] Add Disco distributed runtime architecture overview (#19357)
Add Disco distributed runtime architecture overview
---
docs/arch/index.rst | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/docs/arch/index.rst b/docs/arch/index.rst
index 90c0b83c26..f46e374724 100644
--- a/docs/arch/index.rst
+++ b/docs/arch/index.rst
@@ -248,6 +248,31 @@ On the Python side, users interact with the VM through ``relax.VirtualMachine(ex
 which provides both a direct invocation interface and a stateful set-input / invoke / get-output
 interface suitable for RPC-based remote execution.
+Disco: Distributed Runtime
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Disco is TVM's distributed runtime for executing models across multiple devices. When a model is
+too large to fit on a single GPU, the ``relax.distributed`` module annotates how tensors should be
+partitioned and placed across a mesh of devices at compile time. Disco then takes over at runtime:
+it manages a group of workers, dispatches the compiled program to all of them simultaneously, and
+coordinates inter-device communication through collective operations such as allreduce, allgather,
+broadcast, and scatter.
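The collective step described above can be illustrated without NCCL or RCCL. The toy allreduce below (plain Python threads, names and structure are my own, not TVM's implementation) shows the essential property: after the operation, every worker holds the full reduced value, with no round trip through the controller.

```python
import threading

# Conceptual sketch (not the NCCL/RCCL-backed implementation): each worker
# holds a partial result, and an allreduce makes the reduced value visible
# to every worker.

def allreduce_sum(shards):
    """Toy allreduce: every worker ends up with the sum of all shards."""
    num_workers = len(shards)
    result = [None] * num_workers
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        barrier.wait()              # wait until all partial results exist
        result[rank] = sum(shards)  # each worker reduces for itself
        barrier.wait()              # all workers now hold the reduced value

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(allreduce_sum([1, 2, 3, 4]))  # every worker sees 10
```

A real backend avoids the all-to-all read by using ring or tree reduction over device interconnects, but the observable contract is the same.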
+
+The central abstraction is the ``Session``, which owns the workers and exposes an SPMD-style
+programming interface. Every object that lives on workers is represented by a ``DRef``, a
+distributed reference that maps to a concrete value on each worker. When the controller invokes a
+``DPackedFunc`` through the session, all workers execute the same PackedFunc call synchronously,
+each operating on its own local shard. Compiled VM modules can be loaded into a session as
+``DModule`` objects and called in the same fashion. The session also provides collective
+primitives backed by NCCL or RCCL, so that workers can exchange partial results without routing
+data through the controller.
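A minimal sketch of this dispatch model may help. The classes below borrow Disco's vocabulary (``Session``, ``DRef``) but are toys, not TVM's actual API: one controller-side call fans the same function out to every worker, and the result comes back as a reference holding one value per worker.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of SPMD dispatch; class and method names only mirror Disco's
# concepts and are not the real TVM interfaces.

class DRef:
    """A distributed reference: one local value per worker."""
    def __init__(self, values):
        self.values = values  # values[rank] lives on worker `rank`

class Session:
    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.pool = ThreadPoolExecutor(max_workers=num_workers)

    def call(self, func, *args):
        """Every worker executes the same call, each on its own shard of
        any DRef argument, yielding a new DRef."""
        def run(rank):
            local = [a.values[rank] if isinstance(a, DRef) else a
                     for a in args]
            return func(rank, *local)
        futures = [self.pool.submit(run, r) for r in range(self.num_workers)]
        return DRef([f.result() for f in futures])

sess = Session(num_workers=2)
shards = DRef([[1, 2], [3, 4]])  # a tensor split across two workers
doubled = sess.call(lambda rank, x: [2 * v for v in x], shards)
print(doubled.values)  # [[2, 4], [6, 8]]
```

The key design point this illustrates is that the controller never touches shard data; it only moves references, which is what keeps the dispatch path cheap.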
+
+Three session backends cover different deployment topologies. ``ThreadedSession`` spawns workers
+as threads within a single process; this is the most common choice for multi-GPU inference on a
+single machine. ``ProcessSession`` launches workers as separate OS processes connected by pipes,
+providing stronger isolation. ``SocketSession`` extends the model to multi-node clusters by
+connecting workers across machines via TCP sockets.
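A process-backed session can be pictured as a worker message loop driven over a duplex pipe. The sketch below is a toy (the opcodes and message shapes are invented, not TVM's wire protocol), but it suggests why the backends compose: replacing the pipe with a TCP socket is essentially what extends the same controller/worker model across machines.

```python
import multiprocessing as mp

# Toy ProcessSession-style backend: each worker is a separate OS process
# running a message loop, driven by the controller over a duplex pipe.
# Opcodes and payload shapes here are illustrative only.

def worker_loop(conn, rank):
    """Run until shutdown: receive (opcode, payload), reply with a result."""
    while True:
        op, payload = conn.recv()
        if op == "shutdown":
            break
        if op == "scale":  # operate on this worker's local shard
            conn.send([rank, [2 * v for v in payload]])

if __name__ == "__main__":
    parent, child = mp.Pipe()
    proc = mp.Process(target=worker_loop, args=(child, 0))
    proc.start()
    parent.send(("scale", [1, 2, 3]))
    print(parent.recv())  # [0, [2, 4, 6]]
    parent.send(("shutdown", None))
    proc.join()
```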
+
 tvm/node
 --------
 The node module adds additional features on top of the `runtime::Object` for IR data structures.