hliao added a comment.

This is an experimental, demo-only patch, written in my spare time, that eliminates the private memory usage in https://godbolt.org/z/EPPn6h. The attachment F14026286: sample.tar.xz <https://reviews.llvm.org/F14026286> includes the IR, PTX, and SASS (sm_60) output for both the reference and the new code. In the new code, the aggregate argument is loaded in SASS through the `LDC` instruction instead of `MOV`, because its address is non-static. I don't have sm_60 hardware to verify that. Could you try it on real hardware?
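For reference, here is a minimal sketch of the kind of code involved: an aggregate passed by value to a kernel and indexed with a value known only at run time. The names `Coeffs`, `pick`, and `select` are hypothetical and chosen for illustration; this is not the code behind the godbolt link, which may differ.

```cpp
// When compiled as CUDA, the kernel below is included; as plain C++,
// only the host-side reference function is compiled.
#if defined(__CUDACC__) || defined(__CUDA__)
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical aggregate, passed by value.
struct Coeffs {
  float v[8];
};

// The index is only known at run time, so the address of the selected
// element is not static. Masking keeps the index within bounds.
HD inline float pick(Coeffs c, int i) { return c.v[i & 7]; }

#if defined(__CUDACC__) || defined(__CUDA__)
// Hypothetical kernel: `c` is an aggregate kernel argument, and each
// thread reads one element of it using a dynamic index.
__global__ void select(Coeffs c, const int *idx, float *out) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  out[tid] = pick(c, idx[tid]);
}
#endif
```

The dynamic index is the part that matters here: with a constant index the element address would be static, which is the case the `MOV` form in the reference output corresponds to.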
BTW, according to the PTX ISA document, the parameter space is read-only for input parameters and write-only for output parameters. If that's right, even non-kernel functions may require a similar change, since those semantics differ from the language model, where the argument variable can be modified in the function body.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91590/new/

https://reviews.llvm.org/D91590

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits