tra added a subscriber: scanon.
tra added a comment.

Things are even more interesting: -ffp-contract=fast is *not* what this change does. :-)
We have two places where FP instructions can be fused: in clang and in the LLVM back-end. Clang fuses a multiply and an add into the llvm.fmuladd intrinsic if -ffp-contract=on (the default) and DefaultFPContract=1 (which, for some reason, is only set for OpenCL); the back-end then decides whether it's profitable to emit a fused operation or not. NVPTX does emit fmad.

Compare this to -ffp-contract=fast, which actually *disables* fusing in clang and instead allows the LLVM back-end to fuse wherever it sees fit (as opposed to fusing only the llvm.fmuladd intrinsics). It may potentially fuse any suitable multiply/add pair, not only those vetted by the front-end.

Currently there's no way to enable front-end fusing via the command line unless you compile OpenCL source. With this patch in place, for CUDA compilation we can pick either no fusing (-ffp-contract=off), controlled fusing by the front-end (-ffp-contract=on), or more aggressive fusing by the back-end (-ffp-contract=fast). Setting DefaultFPContract=1 for CUDA seems to be the least evil: it's somewhat limited in scope, and it still gives us a way to disable fusing completely or make it more aggressive if needed.

Perhaps @scanon and @hfinkel can weigh in.

http://reviews.llvm.org/D20341

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits