Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-19 Thread Artem Belevich via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rL270094: [CUDA] Enable fusing FP ops (-ffp-contract=fast) for CUDA by default. (authored by tra). Changed prior to commit: http://reviews.llvm.org/D20341?vs=57541&id=57833#toc Repository: rL LLVM htt

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-19 Thread Artem Belevich via cfe-commits
tra added a subscriber: chandlerc. tra added a comment. Short version of offline discussion with @chandlerc : Default of -ffp-contract=fast for CUDA is fine. http://reviews.llvm.org/D20341 ___ cfe-commits mailing list cfe-commits@lists.llvm.org htt

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-19 Thread Justin Lebar via cfe-commits
jlebar accepted this revision. jlebar added a comment. This revision is now accepted and ready to land. Well, if the CUDA documentation says so...let's do it. :) Thanks for your patience, everyone. http://reviews.llvm.org/D20341 ___ cfe-commits m

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-18 Thread Hal Finkel via cfe-commits
hfinkel added a comment. In http://reviews.llvm.org/D20341#432586, @jlebar wrote: > > But people also don't expect IEEE compliance on GPUs > > > Is that true? Yes. > You have a lot more experience with this than I do, but my observation of > nvidia's hardware is that it's moved to add *more*

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-18 Thread Artem Belevich via cfe-commits
tra added a comment. I don't think using FMA throws away IEEE compliance. IEEE 784-2008 says: > A language standard should also define, and require implementations to > provide, attributes that allow and > disallow value-changing optimizations, separately or collectively, for a > block. Thes

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Justin Lebar via cfe-commits
jlebar added a comment. > But people also don't expect IEEE compliance on GPUs Is that true? You have a lot more experience with this than I do, but my observation of nvidia's hardware is that it's moved to add *more* IEEE compliance as it's matured. For example, older hardware didn't suppor

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Artem Belevich via cfe-commits
tra updated this revision to Diff 57541. tra added a comment. Added test case. Is there a better way to test that correct options are passed to back-end? This test resorts to checking assembly generated by back-end which is way too far away from what actually needs testing. http://reviews.llvm

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Artem Belevich via cfe-commits
tra updated this revision to Diff 57540. tra added a comment. Changed default to -ffp-contract=fast. http://reviews.llvm.org/D20341 Files: lib/Frontend/CompilerInvocation.cpp Index: lib/Frontend/CompilerInvocation.cpp === --- li

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Artem Belevich via cfe-commits
tra added a comment. OK. Consensus seems to be that -ffp-contract=fast is the way to go. I'll update the patch. I've just checked Steve's example with nvcc and indeed it fused mul+add. http://reviews.llvm.org/D20341 ___ cfe-commits mailing list cfe

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Hal Finkel via cfe-commits
hfinkel added a comment. In http://reviews.llvm.org/D20341#432525, @tra wrote: > In http://reviews.llvm.org/D20341#432494, @hfinkel wrote: > > > > > > > > > That having been said, is this change the equivalent of -ffp-contract=fast > > or -ffp-contract=on? I think it is the latter and we want

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Steve Canon via cfe-commits
scanon added a comment. `-ffp-contract=on` obeys the semantics of C's FP_CONTRACT pragma. In particular, it will not fuse: float m = x*y; float a = m + z; Whereas you probably want that to fuse for your purposes. `-ffp-contract=fast` seems more in line with your needs. http://reviews.l

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Artem Belevich via cfe-commits
tra added a comment. In http://reviews.llvm.org/D20341#432494, @hfinkel wrote: > > That having been said, is this change the equivalent of -ffp-contract=fast or > -ffp-contract=on? I think it is the latter and we want the former (i.e. where > we let the backend be as aggressive as possible

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Artem Belevich via cfe-commits
tra added a subscriber: scanon. tra added a comment. Things are even more interesting. -ffp-contract=fast is *not* what this change does. :-) We have two places where we can fuse FP instructions -- in clang and in LLVM back-end. Clang fuses add+mul into llvm.fmuladd intrinsic if -ffp-contract=o

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Hal Finkel via cfe-commits
hfinkel added a subscriber: hfinkel. hfinkel added a comment. In http://reviews.llvm.org/D20341#432461, @jlebar wrote: > I am not sure we want this? Although it matches nvcc, it does not match our > floating-point behavior for C++ in general -- it makes us non-IEEE-whatever > compliant by defa

Re: [PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Justin Lebar via cfe-commits
jlebar added a comment. I am not sure we want this? Although it matches nvcc, it does not match our floating-point behavior for C++ in general -- it makes us non-IEEE-whatever compliant by default. Although I agree that if we don't do this, lots of people are not going to pass -fp-contract=fa

[PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

2016-05-17 Thread Artem Belevich via cfe-commits
tra created this revision. tra added reviewers: jlebar, jingyue. tra added a subscriber: cfe-commits. This matches default nvcc behavior and gives substantial performance boost on GPU where fmad is much cheaper compared to add+mul. http://reviews.llvm.org/D20341 Files: lib/Frontend/Compiler