tra added a subscriber: scanon.
tra added a comment.

Things are even more interesting. -ffp-contract=fast is *not* what this change 
does. :-)

We have two places where we can fuse FP instructions -- in clang and in the
LLVM back-end.
Clang fuses add+mul into the llvm.fmuladd intrinsic if -ffp-contract=on (the
default) and DefaultFPContract=1 (which is only set for OpenCL for some
reason). The back-end then decides whether it's profitable to emit a fused
operation or not. NVPTX does emit fmad.
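
To make concrete why the choice matters numerically, here is a minimal host-side
C++ sketch (not part of the patch) comparing a separately rounded multiply+add
with a fused one. The function names and input values are hypothetical, picked
only so that the two results differ; std::fma stands in for whatever fused
instruction a back-end would emit.

```cpp
#include <cmath>

// Multiply, round, then add: two roundings. Storing the product in a volatile
// is meant to keep the compiler from contracting the pair into an FMA,
// whatever -ffp-contract setting is in effect.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // product is rounded to double here
    return product + c;
}

// Fused multiply-add: a*b + c is computed exactly and rounded once.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Illustrative inputs: the exact product (1 + 2^-27) * (1 - 2^-27) is
// 1 - 2^-54, which is not representable as a double and rounds to 1.0.
double demo_a() { return 1.0 + std::ldexp(1.0, -27); }
double demo_b() { return 1.0 - std::ldexp(1.0, -27); }
```

With c = -1.0 the unfused version returns 0.0 because the low bits of the
product are lost before the add, while the fused version returns -2^-54.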

Compare this to -ffp-contract=fast, which actually *disables* fusing in clang
and instead allows the LLVM back-end to do fusing wherever it sees fit (as
opposed to 'fuse intrinsics only'). It may potentially fuse any suitable
multiply/add pair, not only those vetted by the front-end.
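
A rough sketch of the kind of source pattern where the two modes can differ.
This assumes, as I understand clang's behavior, that the front-end forms
llvm.fmuladd only when the multiply and the add appear in the same expression.
The function names are made up for illustration.

```cpp
// Multiply and add in one expression. Assumption: with front-end fusing
// enabled, clang can emit llvm.fmuladd here and leave the final decision
// to the back-end.
double single_expression(double a, double b, double c) {
    return a * b + c;
}

// Same arithmetic split across two statements. Assumption: the front-end
// does not form llvm.fmuladd here, so fusing can only happen if the
// back-end is allowed to fuse any multiply/add pair (-ffp-contract=fast).
double split_statements(double a, double b, double c) {
    double t = a * b;
    return t + c;
}
```

Which of the two actually ends up fused depends on the target and the flags in
effect, so neither function has a single guaranteed result across builds.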

Currently there's no way to enable front-end fusing via the command line,
unless you compile OpenCL source. With this patch in place, for CUDA
compilation we can pick either no fusing, controlled fusing by the front-end,
or more aggressive fusing by the back-end.

Setting DefaultFPContract=1 for CUDA seems to be the least evil -- it's 
somewhat controlled in scope and gives us a way to disable fusing completely or 
make it more aggressive if it's needed.

Perhaps @scanon and @hfinkel can weigh in.


http://reviews.llvm.org/D20341



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
