This revision was automatically updated to reflect the committed changes.
tra marked 2 inline comments as done.
Closed by commit rC337587: [CUDA] Provide integer SIMD functions for CUDA-9.2
(authored by tra, committed by ).
Changed prior to commit:
https://reviews.llvm.org/D49274?vs=156397&id=1
bkramer accepted this revision.
bkramer added a comment.
This revision is now accepted and ready to land.
lg
https://reviews.llvm.org/D49274
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-c
tra marked 2 inline comments as done.
tra added a comment.
Ben, PTAL.
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:1080
+ unsigned int r;
+ asm("vabsdiff2.u32.u32.u32.sat %0,%1,%2,0;" : "=r"(r) : "r"(__a), "r"(__b));
+ return r;
bkramer wrot
tra updated this revision to Diff 156397.
tra added a comment.
Fixed the issues pointed out by bkramer@.
Apparently .sat does not matter for the vabsdiff instruction with unsigned operands.
My tests were also missing __vabsssN.
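
A minimal host-side sketch of why saturation is a no-op here, assuming .sat clamps each lane to that lane's unsigned range. The helper name ref_absdiffu2 and its saturate flag are hypothetical and are not part of the patch; the sketch models only the two-halfword unsigned case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model (not from the patch): absolute difference of
// two packed 32-bit values, treated as two unsigned 16-bit lanes.
static uint32_t ref_absdiffu2(uint32_t a, uint32_t b, bool saturate) {
  uint32_t result = 0;
  for (int lane = 0; lane < 2; ++lane) {
    uint32_t x = (a >> (16 * lane)) & 0xFFFFu;
    uint32_t y = (b >> (16 * lane)) & 0xFFFFu;
    // |x - y| for two values in [0, 0xFFFF] is itself in [0, 0xFFFF].
    uint32_t d = x > y ? x - y : y - x;
    assert(d <= 0xFFFFu);
    // Assumed meaning of saturation: clamp to the lane maximum.
    // Given the bound above, this branch can never change d.
    if (saturate && d > 0xFFFFu)
      d = 0xFFFFu;
    result |= d << (16 * lane);
  }
  return result;
}
```

Calling the helper with saturate set to true or false yields the same result for every input, which is consistent with the observation above for unsigned operands. Signed variants are a different matter, because the difference of two signed lanes can exceed the signed lane range.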
https://reviews.llvm.org/D49274
Files:
clang/lib/Headers/__clang_cu
tra updated this revision to Diff 156386.
tra added a comment.
Fixed inline asm syntax.
Added a workaround for the bug in __vmaxs2() discovered during testing.
I've got a set of tests for these functions that I'll add to test-suite shortly.
AFAICT this implementation matches NVIDIA's bit-for-bit.
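
A bit-exact comparison of this kind needs a host-side reference to check device results against. The following is a rough sketch of what such a reference might look like for __vmaxs2(), assuming its documented meaning of a per-halfword signed maximum. The name ref_vmaxs2 is hypothetical; this is not the test code mentioned above and it does not reproduce the workaround in the patch.

```cpp
#include <cstdint>

// Hypothetical host-side reference (not from the patch): per-halfword signed
// maximum of two packed 32-bit values.
static uint32_t ref_vmaxs2(uint32_t a, uint32_t b) {
  uint32_t result = 0;
  for (int lane = 0; lane < 2; ++lane) {
    uint32_t x = (a >> (16 * lane)) & 0xFFFFu;
    uint32_t y = (b >> (16 * lane)) & 0xFFFFu;
    // Flipping the sign bit maps signed 16-bit order onto unsigned order,
    // which avoids implementation-defined narrowing casts.
    uint32_t xb = x ^ 0x8000u;
    uint32_t yb = y ^ 0x8000u;
    uint32_t m = xb > yb ? x : y;
    result |= m << (16 * lane);
  }
  return result;
}
```

A test could then run the device function over a set of inputs, copy the results back, and compare each one against ref_vmaxs2() for exact equality. Inputs that mix positive and negative halfwords, such as 0x80007FFF, are the ones most likely to expose lane or sign handling mistakes.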
tra added a comment.
I'm in the middle of writing the tests for these as it's very easy to mess
things up. I'll update the patch once I run it through the tests.
Another problem with the patch in the current form is that these instructions
apparently do not accept immediate arguments. PTX is a
bkramer accepted this revision.
bkramer added inline comments.
This revision is now accepted and ready to land.
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:1080
+ unsigned int r;
+ asm("vabsdiff2.u32.u32.u32.sat %0,%1,%2,0;" : "=r"(r) : "r"(__a), "r"(__b));
+
tra created this revision.
tra added reviewers: jlebar, bkramer.
Herald added subscribers: bixia, sanjoy.
CUDA-9.2 made all integer SIMD functions into compiler builtins,
so clang no longer has access to the implementation of these
functions in either headers or libdevice and has to provide
its ow