LudovicoYIN opened a new pull request, #18671: URL: https://github.com/apache/tvm/pull/18671
### Motivation

InjectPTXLDG32 rewrites BufferStore when encountering if_then_else, but it only initializes temporary buffers when an Allocate node exists. For functions without Allocate, this leads to uninitialized buffers and a hard segfault during compilation.

In addition, the PTX-only pass can run on CPU/LLVM targets when tir.ptx_ldg32=1, injecting PTX intrinsics that are invalid for non-CUDA codegen.

This PR ensures temporary buffers are created even when no Allocate exists, and skips InjectPTXLDG32 on non-CUDA targets, preventing segfaults and invalid PTX intrinsics on CPU.

### Changes

- Ensure temp buffers are created when the rewrite path is taken without Allocate
- Insert allocations at the function level when needed
- Guard InjectPTXLDG32 so it only runs on CUDA targets
- Add tests for CUDA (insertion) and CPU (skip) behavior

### Testing

test_tir_transform_inject_ptx_ldg32.py

### Fixes

- https://github.com/apache/tvm/issues/18612
- https://github.com/apache/tvm/issues/18617
- https://github.com/apache/tvm/issues/18599
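
### Illustration (toy model, not the PR's code)

The control flow described above can be sketched in plain Python. This is only a rough model of the two fixes and does not use or reproduce TVM's API: `ToyFunc`, `inject_ptx_ldg32_sketch`, the buffer names `addr` and `predicate`, and the `alloc_site` labels are all hypothetical names chosen for illustration. Where the temporaries go when an Allocate node already exists is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToyFunc:
    """Hypothetical stand-in for a TIR function; not a TVM class."""
    target_kind: str                   # e.g. "cuda" or "llvm"
    has_allocate: bool                 # body contains an Allocate node
    has_if_then_else_store: bool       # a BufferStore uses if_then_else
    temp_buffers: Tuple[str, ...] = ()
    alloc_site: str = ""               # where the temp buffers were placed
    intrinsics: Tuple[str, ...] = ()


def inject_ptx_ldg32_sketch(func: ToyFunc) -> ToyFunc:
    """Toy model of the guarded pass described in this PR."""
    # Fix 2: the pass is PTX-only, so non-CUDA targets are returned untouched.
    if func.target_kind != "cuda":
        return func
    # No if_then_else store means the rewrite path is never taken.
    if not func.has_if_then_else_store:
        return func
    # Fix 1: create the temp buffers whenever the rewrite path is taken,
    # instead of only when an Allocate node was visited.
    temps = ("addr", "predicate")      # hypothetical buffer names
    # Without an Allocate node, the allocations go at the function level.
    # Placement next to an existing Allocate is an assumption of this sketch.
    site = "existing_allocate" if func.has_allocate else "function_level"
    return replace(
        func,
        temp_buffers=temps,
        alloc_site=site,
        intrinsics=func.intrinsics + ("ptx_ldg32",),
    )


# A CUDA function without Allocate still gets its temp buffers.
cuda_out = inject_ptx_ldg32_sketch(ToyFunc("cuda", False, True))
# An LLVM function is skipped entirely, so no PTX intrinsic is injected.
cpu_in = ToyFunc("llvm", False, True)
cpu_out = inject_ptx_ldg32_sketch(cpu_in)
```

In this model the CUDA case corresponds to the "insertion" test and the LLVM case to the "skip" test listed under Changes.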
