hfinkel added inline comments.
================ Comment at: cfe/trunk/include/clang/Basic/LangOptions.def:220 +/// \brief FP_CONTRACT mode (on/off/fast). +ENUM_LANGOPT(DefaultFPContractMode, FPContractModeKind, 2, FPC_Off, "FP contraction type") LANGOPT(NoBitFieldTypeAlign , 1, 0, "bit-field type alignment") ---------------- anemet wrote: > hfinkel wrote: > > anemet wrote: > > > hfinkel wrote: > > > > yaxunl wrote: > > > > > hfinkel wrote: > > > > > > yaxunl wrote: > > > > > > > anemet wrote: > > > > > > > > yaxunl wrote: > > > > > > > > > anemet wrote: > > > > > > > > > > yaxunl wrote: > > > > > > > > > > > This change seemed to cause some performance degradations > > > > > > > > > > > in some OpenCL applications. > > > > > > > > > > > > > > > > > > > > > > This option used to be on by default. Now it is off by > > > > > > > > > > > default. > > > > > > > > > > > > > > > > > > > > > > There are applications do separated compile/link/codegen > > > > > > > > > > > stages. In the codegen stage, clang is invoked with .bc > > > > > > > > > > > as input, therefore this option is off by default, > > > > > > > > > > > whereas it was on by default before this change. > > > > > > > > > > > > > > > > > > > > > > Is there any reason not to keep the original behavior? > > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > This change seemed to cause some performance degradations > > > > > > > > > > > in some OpenCL applications. > > > > > > > > > > > > > > > > > > > > > > This option used to be on by default. Now it is off by > > > > > > > > > > > default. > > > > > > > > > > > > > > > > > > > > It's always been off. It was set to fast for CUDA which > > > > > > > > > > should still be the case. See the changes further down on > > > > > > > > > > the patch. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > There are applications do separated compile/link/codegen > > > > > > > > > > > stages. In the codegen stage, clang is invoked with .bc > > > > > > > > > > > as input, therefore this option is off by default, > > > > > > > > > > > whereas it was on by default before this change. > > > > > > > > > > > > > > > > > > > > > > Is there any reason not to keep the original behavior? > > > > > > > > > > > > > > > > > > > > Sorry but I am not sure what changed, can you elaborate on > > > > > > > > > > what you're doing and how things used to work for you? > > > > > > > > > Before your change, -ffp-contract was a codegen option > > > > > > > > > defined as > > > > > > > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > ENUM_CODEGENOPT(FPContractMode, FPContractModeKind, 2, FPC_On) > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > therefore the default value as on. After your change, it > > > > > > > > > becomes off by default. > > > > > > > > No. -ffp-contract=on was a FE option and -ffp-contract=fast was > > > > > > > > a backend option. I still don't understand your use-case. Can > > > > > > > > you include a small testcase how this used to work before? > > > > > > > -ffp-contract=on used to be a codegen option before this change. > > > > > > > In CodeGen/BackendUtil.cpp, before this change: > > > > > > > > > > > > > > ``` > > > > > > > switch (CodeGenOpts.getFPContractMode()) { > > > > > > > switch (LangOpts.getDefaultFPContractMode()) { > > > > > > > case LangOptions::FPC_Off: > > > > > > > Options.AllowFPOpFusion = llvm::FPOpFusion::Strict; > > > > > > > break; > > > > > > > case LangOptions::FPC_On: > > > > > > > Options.AllowFPOpFusion = llvm::FPOpFusion::Standard; > > > > > > > break; > > > > > > > case LangOptions::FPC_Fast: > > > > > > > Options.AllowFPOpFusion = llvm::FPOpFusion::Fast; > > > > > > > break; > > > > > > > } > > > > > > > ``` > > > > > > > By default, -fp-contract=on, which results in > > > > > > > llvm::FPOpFusion::Standard in the backend. This options allows > > > > > > > backend to translate llvm.fmuladd to FMA machine instructions. > > > > > > > > > > > > > > After this change, it becomes: > > > > > > > > > > > > > > ``` > > > > > > > switch (LangOpts.getDefaultFPContractMode()) { > > > > > > > case LangOptions::FPC_Off: > > > > > > > Options.AllowFPOpFusion = llvm::FPOpFusion::Strict; > > > > > > > break; > > > > > > > case LangOptions::FPC_On: > > > > > > > Options.AllowFPOpFusion = llvm::FPOpFusion::Standard; > > > > > > > break; > > > > > > > case LangOptions::FPC_Fast: > > > > > > > Options.AllowFPOpFusion = llvm::FPOpFusion::Fast; > > > > > > > break; > > > > > > > } > > > > > > > ``` > > > > > > > now by default -ffp-contract=off, which results in > > > > > > > llvm::FPOpFusion::Strict in the backend. This option disables > > > > > > > backend translating llvm.fmuladd to FMA machine instructions in > > > > > > > certain situations. > > > > > > > > > > > > > > A simple lit test is as follows: > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > ; RUN: %clang_cc1 -triple amdgcn -S -o - %s| FileCheck %s > > > > > > > > > > > > > > define amdgpu_kernel void @f(double addrspace(1)* nocapture %out, > > > > > > > double %a, double %b, double %c) local_unnamed_addr #0 { > > > > > > > entry: > > > > > > > ; CHECK: v_fma_f64 > > > > > > > %0 = tail call double @llvm.fmuladd.f64(double %b, double %c, > > > > > > > double %a) > > > > > > > store double %0, double addrspace(1)* %out, align 8, !tbaa !6 > > > > > > > ret void > > > > > > > } > > > > > > > > > > > > > > declare double @llvm.fmuladd.f64(double, double, double) #1 > > > > > > > > > > > > > > attributes #0 = { nounwind > > > > > > > "correctly-rounded-divide-sqrt-fp-math"="false" > > > > > > > "disable-tail-calls"="false" "less-precise-fpmad"="false" > > > > > > > "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" > > > > > > > "no-jump-tables"="false" "no-nans-fp-math"="false" > > > > > > > "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" > > > > > > > "stack-protector-buffer-size"="8" > > > > > > > "target-features"="+fp64-fp16-denormals,-fp32-denormals" > > > > > > > "unsafe-fp-math"="false" "use-soft-float"="false" } > > > > > > > attributes #1 = { nounwind readnone } > > > > > > > > > > > > > > ``` > > > > > > > The test passes before this change and fails after this change. > > > > > > I'm missing something here. We're calling for OpenCL: > > > > > > > > > > > > Opts.setDefaultFPContractMode(LangOptions::FPC_On); > > > > > > > > > > > > which seems like it should change the result returned by > > > > > > getDefaultFPContractMode(). Why does it not? > > > > > > > > > > > The input to this test is LLVM IR, so > > > > > `Opts.setDefaultFPContractMode(LangOptions::FPC_On)` is not executed. > > > > Seems like this is another aspect of the LTO problem. All of these > > > > things need to be function attributes. This is also true to make LTO > > > > work. > > > > > > > @hfinkel, what is the actual point of un-fusing the intrinsic in the BE > > > with FPOpFusion::Strict? > > > > > > I don't know if targets use the target-independent intrinsic to implement > > > builtin support but if yes this would override the user choosing of FMA > > > by directly specifying the builtin. > > > @hfinkel, what is the actual point of un-fusing the intrinsic in the BE > > > with FPOpFusion::Strict? > > > > I think this was added based on a different conception of how frontends > > would use the intrinsic: that they'd always add it where the > > language-semantics allow, and then the backend would either never use FMAs > > (strict), use FMAs where profitable (standard), etc. > > > > > I don't know if targets use the target-independent intrinsic to implement > > > builtin support but if yes this would override the user choosing of FMA > > > by directly specifying the builtin. > > > > No target should use fmuladd to implement the builtin support for a > > target-specific intrinsic. It should use the fma intrinsic would always > > maps to an fma. > > I think this was added based on a different conception of how frontends > > would use the intrinsic: that they'd always add it where the > > language-semantics allow, and then the backend would either never use FMAs > > (strict), use FMAs where profitable (standard), etc. > > So would you support disabling un-fusion on Strict? That should solve > Yaxun's problem. > > Also my mid-term goal is to completely remove the FPOpFusion target option in > favor of the explicit representation in the IR: either the FMF or the direct > use of the intrinsic. At that point the behavior would have to change anyway. > > > No target should use fmuladd to implement the builtin support for a > > target-specific intrinsic. It should use the fma intrinsic would always > > maps to an fma. > > Could you please rephrase this, I am not sure I understand? > > So would you support disabling un-fusion on Strict? That should solve Yaxun's > problem. Why don't you just have the frontend never set 'Strict' and always set' Standard' (or 'Fast')? That seems like a reasonable option to me given how Clang generates IR. > Also my mid-term goal is to completely remove the FPOpFusion target option in > favor of the explicit representation in the IR: either the FMF or the direct > use of the intrinsic. Sounds good. > Could you please rephrase this, I am not sure I understand? Hrmm, maybe I misunderstood the question. Were you asking whether any target intrinsics, __builtin_<my target prefix>_<some fma thing>(a, b, c) would be lowered in terms of fmuladd instead of fma? Repository: rL LLVM https://reviews.llvm.org/D31167 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits