jchlanda added a comment.

In D118977#3297465 <https://reviews.llvm.org/D118977#3297465>, @tra wrote:

>> They all require PTX 7.0, SM_80.
>
> According to
> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma
> only `fma.relu` and `bf16*` variants require ptx70/sm80:
>
>   PTX ISA Notes
>   Introduced in PTX ISA version 4.2.
>
>   fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} introduced in PTX ISA version 7.0.
>
>   Target ISA Notes
>   Requires sm_53 or higher.
>
>   fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} require sm_80 or higher.

My bad, sorry. Fixed now.
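For reference, a minimal sketch of how the quoted constraints play out on the CUDA side. The helper name `fma_relu_f16x2` is hypothetical and not part of this patch; it only illustrates that the `.relu` form needs an sm_80 guard, while the plain `f16x2` fma has been available since sm_53.

  #include <cuda_fp16.h>

  // Hypothetical helper, not from D118977: fused multiply-add on an f16x2
  // pair, using the .relu variant only where the PTX ISA permits it.
  __device__ __half2 fma_relu_f16x2(__half2 a, __half2 b, __half2 c) {
  #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    // fma.relu.{f16, f16x2} requires sm_80 / PTX ISA 7.0.
    unsigned d;
    asm("fma.rn.relu.f16x2 %0, %1, %2, %3;"
        : "=r"(d)
        : "r"(*reinterpret_cast<const unsigned *>(&a)),
          "r"(*reinterpret_cast<const unsigned *>(&b)),
          "r"(*reinterpret_cast<const unsigned *>(&c)));
    return *reinterpret_cast<__half2 *>(&d);
  #else
    // Plain fma.f16x2 (here via __hfma2) only needs sm_53 / PTX ISA 4.2;
    // emulate the relu clamp with an sm_53-safe compare-and-mask, since
    // max.f16x2 is itself an sm_80 / PTX 7.0 instruction.
    __half2 d = __hfma2(a, b, c);
    return __hmul2(d, __hgt2(d, __float2half2_rn(0.0f)));
  #endif
  }

The guard uses `__CUDA_ARCH__ >= 800` because, per the Target ISA Notes quoted above, only the `.relu` (and `bf16*`) variants are restricted to sm_80; everything else in the instruction family works from sm_53.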
>> They all require PTX 7.0, SM_80. > > According to > https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma > only `fma.relu` and `bf16*` variants require ptx70/sm80: > > PTX ISA Notes > Introduced in PTX ISA version 4.2. > > fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} introduced in PTX ISA > version 7.0. > > Target ISA Notes > Requires sm_53 or higher. > > fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} require sm_80 or higher. My bad, sorry. Fixed now. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D118977/new/ https://reviews.llvm.org/D118977 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits