This revision was automatically updated to reflect the committed changes.
Closed by commit rL270094: [CUDA] Enable fusing FP ops (-ffp-contract=fast) for
CUDA by default. (authored by tra).
Changed prior to commit:
http://reviews.llvm.org/D20341?vs=57541&id=57833#toc
Repository:
rL LLVM
htt
tra added a subscriber: chandlerc.
tra added a comment.
Short version of offline discussion with @chandlerc : Default of
-ffp-contract=fast for CUDA is fine.
http://reviews.llvm.org/D20341
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
htt
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.
Well, if the CUDA documentation says so...let's do it. :) Thanks for your
patience, everyone.
http://reviews.llvm.org/D20341
___
cfe-commits m
hfinkel added a comment.
In http://reviews.llvm.org/D20341#432586, @jlebar wrote:
> > But people also don't expect IEEE compliance on GPUs
>
>
> Is that true?
Yes.
> You have a lot more experience with this than I do, but my observation of
> nvidia's hardware is that it's moved to add *more*
tra added a comment.
I don't think using FMA throws away IEEE compliance.
IEEE 784-2008 says:
> A language standard should also define, and require implementations to
> provide, attributes that allow and
> disallow value-changing optimizations, separately or collectively, for a
> block. Thes
jlebar added a comment.
> But people also don't expect IEEE compliance on GPUs
Is that true? You have a lot more experience with this than I do, but my
observation of nvidia's hardware is that it's moved to add *more* IEEE
compliance as it's matured. For example, older hardware didn't suppor
tra updated this revision to Diff 57541.
tra added a comment.
Added test case.
Is there a better way to test that correct options are passed to back-end?
This test resorts to checking assembly generated by back-end which is way too
far away from what actually needs testing.
http://reviews.llvm
tra updated this revision to Diff 57540.
tra added a comment.
Changed default to -ffp-contract=fast.
http://reviews.llvm.org/D20341
Files:
lib/Frontend/CompilerInvocation.cpp
Index: lib/Frontend/CompilerInvocation.cpp
===
--- li
tra added a comment.
OK. Consensus seems to be that -ffp-contract=fast is the way to go. I'll update
the patch.
I've just checked Steve's example with nvcc and indeed it fused mul+add.
http://reviews.llvm.org/D20341
___
cfe-commits mailing list
cfe
hfinkel added a comment.
In http://reviews.llvm.org/D20341#432525, @tra wrote:
> In http://reviews.llvm.org/D20341#432494, @hfinkel wrote:
>
> >
>
>
>
>
> > That having been said, is this change the equivalent of -ffp-contract=fast
> > or -ffp-contract=on? I think it is the latter and we want
scanon added a comment.
`-ffp-contract=on` obeys the semantics of C's FP_CONTRACT pragma. In
particular, it will not fuse:
float m = x*y;
float a = m + z;
Whereas you probably want that to fuse for your purposes. `-ffp-contract=fast`
seems more in line with your needs.
http://reviews.l
tra added a comment.
In http://reviews.llvm.org/D20341#432494, @hfinkel wrote:
>
> That having been said, is this change the equivalent of -ffp-contract=fast or
> -ffp-contract=on? I think it is the latter and we want the former (i.e. where
> we let the backend be as aggressive as possible
tra added a subscriber: scanon.
tra added a comment.
Things are even more interesting. -ffp-contract=fast is *not* what this change
does. :-)
We have two places where we can fuse FP instructions -- in clang and in LLVM
back-end.
Clang fuses add+mul into llvm.fmuladd intrinsic if -ffp-contract=o
hfinkel added a subscriber: hfinkel.
hfinkel added a comment.
In http://reviews.llvm.org/D20341#432461, @jlebar wrote:
> I am not sure we want this? Although it matches nvcc, it does not match our
> floating-point behavior for C++ in general -- it makes us non-IEEE-whatever
> compliant by defa
jlebar added a comment.
I am not sure we want this? Although it matches nvcc, it does not match our
floating-point behavior for C++ in general -- it makes us non-IEEE-whatever
compliant by default.
Although I agree that if we don't do this, lots of people are not going to pass
-fp-contract=fa
tra created this revision.
tra added reviewers: jlebar, jingyue.
tra added a subscriber: cfe-commits.
This matches default nvcc behavior and gives substantial performance boost on
GPU where fmad is much cheaper compared to add+mul.
http://reviews.llvm.org/D20341
Files:
lib/Frontend/Compiler
16 matches
Mail list logo