This is an automated email from the ASF dual-hosted git repository.
tlopex pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git
The following commit(s) were added to refs/heads/main by this push:
new b9ced1a078 [BugFix][Relax] Select target-specific pipeline in tvm.compile when GPU target is provided (#19384)
b9ced1a078 is described below
commit b9ced1a07855c391ab31ca9277dbfe8f5ad49511
Author: Soowon Jeong <[email protected]>
AuthorDate: Sun Apr 12 05:04:30 2026 +0900
[BugFix][Relax] Select target-specific pipeline in tvm.compile when GPU target is provided (#19384)
## Problem
`relax.build()` (exposed as `tvm.compile`) with `relax_pipeline="default"` always resolved to `default_build_pipeline`, regardless of the target.
`default_build_pipeline` does not include DLight scheduling — it is a target-agnostic lowering pipeline. On CUDA, this left TIR functions generated from ops like `Clip`/`ReLU6` without thread bindings, causing `VerifyMemory` to fail:
```
Memory verification failed: Variable `X` is directly accessed by host memory
(it is not contained in a thread environment or in the function arguments).
Did you forget to bind?
```
## Fix
When `relax_pipeline="default"` and the target is a GPU target (`"gpu" in target.keys`), use `relax.get_default_pipeline(target)`, which includes target-aware DLight scheduling. Fall back to `default_build_pipeline` if no target-specific pipeline is registered.
CPU targets (`llvm`, `c`) continue to use `default_build_pipeline` unchanged. The CPU-specific pipeline adds `FuseOps`/`FuseTIR`/`FoldConstant` on top, which can dead-code-eliminate `call_pure_packed` calls whose results are unused — correct per pure semantics, but a separate concern from this fix.
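The fallback behavior described above can be sketched in plain Python. This is a minimal, self-contained illustration: `FakeTarget`, the pipeline-name strings, and the two stub lookup functions are hypothetical stand-ins, not TVM API — the real code calls `relax.get_default_pipeline()` and `relax.get_pipeline()` on actual `tvm.target.Target` objects.

```python
class FakeTarget:
    """Hypothetical stand-in for tvm.target.Target; only exposes .keys."""

    def __init__(self, keys):
        self.keys = keys


def get_default_pipeline(target):
    # Stub: pretend only CUDA has a registered target-specific pipeline.
    if "cuda" in target.keys:
        return "cuda_pipeline_with_dlight"
    raise ValueError("no target-specific pipeline registered")


def get_pipeline(name):
    # Stub for the generic, target-agnostic lookup.
    return "default_build_pipeline"


def select_pipeline(relax_pipeline, target):
    """Mirror of the fix: prefer a target-specific pipeline on GPU,
    falling back to the generic pipeline when none is registered."""
    is_gpu = target is not None and "gpu" in target.keys
    if relax_pipeline == "default" and is_gpu:
        try:
            return get_default_pipeline(target)
        except (ValueError, AttributeError):
            return get_pipeline(relax_pipeline)  # fall back
    return get_pipeline(relax_pipeline)


print(select_pipeline("default", FakeTarget(["cuda", "gpu"])))
# -> cuda_pipeline_with_dlight
print(select_pipeline("default", FakeTarget(["llvm", "cpu"])))
# -> default_build_pipeline
```

Note that a GPU target without a registered pipeline (the `ValueError` path) still compiles via the generic pipeline rather than erroring out, which preserves the pre-fix behavior for such targets.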
---
python/tvm/relax/backend/cpu_generic/pipeline.py | 5 ++++-
python/tvm/relax/vm_build.py | 16 +++++++++++++++-
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/python/tvm/relax/backend/cpu_generic/pipeline.py b/python/tvm/relax/backend/cpu_generic/pipeline.py
index dc078ee25d..d0b819cea7 100644
--- a/python/tvm/relax/backend/cpu_generic/pipeline.py
+++ b/python/tvm/relax/backend/cpu_generic/pipeline.py
@@ -22,7 +22,10 @@ from tvm import relax
def library_dispatch_passes(target: tvm.target.Target):  # pylint: disable=unused-argument
"""The default library dispatch passes for CPU backend."""
- return []
+ return [
+ relax.backend.DispatchSampling(),
+ relax.backend.DispatchSortScan(),
+ ]
def legalize_passes(target: tvm.target.Target):  # pylint: disable=unused-argument
diff --git a/python/tvm/relax/vm_build.py b/python/tvm/relax/vm_build.py
index 68592d67f8..adc0f7ad83 100644
--- a/python/tvm/relax/vm_build.py
+++ b/python/tvm/relax/vm_build.py
@@ -248,7 +248,21 @@ def build(
if relax_pipeline is not None:
if isinstance(relax_pipeline, str):
- relax_pipeline = relax.get_pipeline(relax_pipeline)
+ # For GPU targets, prefer the target-specific pipeline which
+ # includes DLight scheduling. Without it, TIR functions generated
+ # from ops like Clip/ReLU6 lack thread bindings and fail
+ # VerifyMemory. CPU targets continue to use the generic pipeline
+ # since the CPU-specific pipeline applies fusion passes that can
+ # incorrectly remove call_pure_packed calls whose results are
+ # unused but whose side effects are relied upon.
+ _is_gpu = target is not None and "gpu" in target.keys
+ if relax_pipeline == "default" and _is_gpu:
+ try:
+ relax_pipeline = relax.get_default_pipeline(target)
+ except (ValueError, AttributeError):
+ relax_pipeline = relax.get_pipeline(relax_pipeline)
+ else:
+ relax_pipeline = relax.get_pipeline(relax_pipeline)
if target is None:
mod = relax_pipeline(mod)
else: