This revision was automatically updated to reflect the committed changes.
Closed by commit rC321326: [CUDA] More fixes for __shfl_* intrinsics. (authored
by tra, committed by ).
Changed prior to commit:
https://reviews.llvm.org/D41521?vs=127950&id=127962#toc
Repository:
rC Clang
https://rev
tra added a comment.
Added to my todo list. There are a few more gaps that I want to test in order to
make sure we don't regress on compatibility with older CUDA versions while
changing these wrappers.
https://reviews.llvm.org/D41521
___
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.
Since this is tricky and we've seen it affecting user code, do you think it's a
bad idea to add tests to the test-suite?
https://reviews.llvm.org/D41521
___
tra created this revision.
tra added a reviewer: jlebar.
Herald added a subscriber: sanjoy.
- __shfl_{up,down}* now use `unsigned int` for the third parameter.
- Added [unsigned] long overloads for non-sync shuffles. This augments r319908,
  which added long overloads for sync shuffles.
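
To make the two points above concrete, here is a minimal host-side C++ sketch, not the actual wrapper code from the patch. It models a warp as a plain array and assumes a `long` overload could forward to either the `int` or the `long long` variant depending on `sizeof(long)`. The names `Warp`, `kWarpSize` and `sim_shfl_down` are hypothetical and exist only for illustration. The third parameter is `unsigned`, mirroring the first bullet.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the device-side warpSize.
constexpr unsigned kWarpSize = 32;

// Hypothetical host-side model of a warp: one value per lane.
template <typename T>
using Warp = std::array<T, kWarpSize>;

// 32-bit building block. Lane `lane` reads from lane `lane + delta` if that
// lane lies in the same `width`-sized segment; otherwise it keeps its own
// value. `delta` is unsigned, so a negative offset cannot be expressed.
inline int sim_shfl_down(const Warp<int>& vals, unsigned lane,
                         unsigned delta, int width = kWarpSize) {
  assert(lane < kWarpSize && width > 0 &&
         width <= static_cast<int>(kWarpSize));
  const unsigned w = static_cast<unsigned>(width);
  const unsigned pos = lane % w;  // position inside the segment
  if (delta >= w - pos) return vals[lane];  // source lane is out of range
  return vals[lane + delta];
}

// 64-bit variant built from two 32-bit shuffles (low and high halves).
inline long long sim_shfl_down(const Warp<long long>& vals, unsigned lane,
                               unsigned delta, int width = kWarpSize) {
  Warp<int> lo{}, hi{};
  for (unsigned i = 0; i < kWarpSize; ++i) {
    const std::uint64_t bits = static_cast<std::uint64_t>(vals[i]);
    lo[i] = static_cast<int>(static_cast<std::uint32_t>(bits));
    hi[i] = static_cast<int>(static_cast<std::uint32_t>(bits >> 32));
  }
  const std::uint64_t l =
      static_cast<std::uint32_t>(sim_shfl_down(lo, lane, delta, width));
  const std::uint64_t h =
      static_cast<std::uint32_t>(sim_shfl_down(hi, lane, delta, width));
  return static_cast<long long>((h << 32) | l);
}

// Assumed shape of a `long` overload: forward to whichever existing variant
// has the same size, since `long` is 32 or 64 bits depending on the platform.
inline long sim_shfl_down(const Warp<long>& vals, unsigned lane,
                          unsigned delta, int width = kWarpSize) {
  static_assert(sizeof(long) == sizeof(long long) ||
                    sizeof(long) == sizeof(int),
                "unexpected size of long");
  if (sizeof(long) == sizeof(long long)) {
    Warp<long long> wide{};
    for (unsigned i = 0; i < kWarpSize; ++i) wide[i] = vals[i];
    return static_cast<long>(sim_shfl_down(wide, lane, delta, width));
  }
  Warp<int> narrow{};
  for (unsigned i = 0; i < kWarpSize; ++i)
    narrow[i] = static_cast<int>(vals[i]);
  return static_cast<long>(sim_shfl_down(narrow, lane, delta, width));
}
```

Without a `long` overload, a call with a `long` argument would have to pick between the `int` and `long long` variants, which is presumably the kind of gap in user code that the patch addresses. An `unsigned long` overload would follow the same pattern.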
https://reviews.llv