[PATCH] D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called

2018-08-09 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

+tra in the hopes that perhaps he's comfortable reviewing this (sorry that I'm 
not).


Repository:
  rC Clang

https://reviews.llvm.org/D47757



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D46993: [CUDA] Make std::min/max work when compiling in C++14 mode with a C++11 stdlib.

2018-05-16 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
jlebar added a reviewer: rsmith.
Herald added a subscriber: sanjoy.

https://reviews.llvm.org/D46993

Files:
  clang/lib/Headers/cuda_wrappers/algorithm


Index: clang/lib/Headers/cuda_wrappers/algorithm
===
--- clang/lib/Headers/cuda_wrappers/algorithm
+++ clang/lib/Headers/cuda_wrappers/algorithm
@@ -24,28 +24,36 @@
 #ifndef __CLANG_CUDA_WRAPPERS_ALGORITHM
 #define __CLANG_CUDA_WRAPPERS_ALGORITHM
 
-// This header defines __device__ overloads of std::min/max, but only if we're
-// <= C++11.  In C++14, these functions are constexpr, and so are implicitly
-// __host__ __device__.
+// This header defines __device__ overloads of std::min/max.
 //
-// We don't support the initializer_list overloads because
-// initializer_list::begin() and end() are not __host__ __device__ functions.
+// Ideally we'd declare these functions only if we're <= C++11.  In C++14,
+// these functions are constexpr, and so are implicitly __host__ __device__.
 //
-// When compiling in C++14 mode, we could force std::min/max to have different
-// implementations for host and device, by declaring the device overloads
-// before the constexpr overloads appear.  We choose not to do this because
-
-//  a) why write our own implementation when we can use one from the standard
-// library? and
-//  b) libstdc++ is evil and declares min/max inside a header that is included
-// *before* we include <algorithm>.  So we'd have to unconditionally
-// declare our __device__ overloads of min/max, but that would pollute
-// things for people who choose not to include <algorithm>.
+// However, the compiler being in C++14 mode does not imply that the standard
+// library supports C++14.  There is no macro we can test to check that the
+// stdlib has constexpr std::min/max.  Thus we have to unconditionally define
+// our device overloads.
+//
+// A host+device function cannot be overloaded, and a constexpr function
+// implicitly becomes host device if there's no explicitly host or device
+// overload preceding it.  So the simple thing to do would be to declare our
+// device min/max overloads, and then #include_next <algorithm>.  This way our
+// device overloads would come first, and so if we have a C++14 stdlib, its
+// min/max won't become host+device and conflict with our device overloads.
+//
+// But that also doesn't work.  libstdc++ is evil and declares std::min/max in
+// an internal header that is included *before* <algorithm>.  Thus by the time
+// we're inside of this file, std::min/max may already have been declared, and
+// thus we can't prevent them from becoming host+device if they're constexpr.
+//
+// Therefore we perpetrate the following hack: We mark our __device__ overloads
+// with __attribute__((enable_if(true, ""))).  This causes the signature of the
+// function to change without changing anything else about it.  (Except that
+// overload resolution will prefer it over the __host__ __device__ version
+// rather than considering them equally good).
 
 #include_next <algorithm>
 
-#if __cplusplus <= 201103L
-
 // We need to define these overloads in exactly the namespace our standard
 // library uses (including the right inline namespace), otherwise they won't be
 // picked up by other functions in the standard library (e.g. functions in
@@ -60,24 +68,28 @@
 #endif
 
 template <class __T, class __Cmp>
+__attribute__((enable_if(true, "")))
 inline __device__ const __T &
 max(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__a, __b) ? __b : __a;
 }
 
 template <class __T>
+__attribute__((enable_if(true, "")))
 inline __device__ const __T &
 max(const __T &__a, const __T &__b) {
   return __a < __b ? __b : __a;
 }
 
 template <class __T, class __Cmp>
+__attribute__((enable_if(true, "")))
 inline __device__ const __T &
 min(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__b, __a) ? __b : __a;
 }
 
 template <class __T>
+__attribute__((enable_if(true, "")))
 inline __device__ const __T &
 min(const __T &__a, const __T &__b) {
   return __a < __b ? __a : __b;
@@ -92,5 +104,4 @@
 } // namespace std
 #endif
 
-#endif // __cplusplus <= 201103L
 #endif // __CLANG_CUDA_WRAPPERS_ALGORITHM


[PATCH] D46994: [test-suite] Test CUDA in C++14 mode with C++11 stdlibs.

2018-05-16 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
jlebar added a reviewer: tra.
Herald added subscribers: llvm-commits, mgorny, sanjoy.

Previously (https://reviews.llvm.org/D46993) std::min/max didn't work in C++14
mode with a C++11 stdlib; we'd assumed that the compiler being in std=c++14
mode implied a stdlib in C++14 mode.


Repository:
  rT test-suite

https://reviews.llvm.org/D46994

Files:
  External/CUDA/CMakeLists.txt
  External/CUDA/algorithm.cu
  External/CUDA/cmath.cu
  External/CUDA/complex.cu

Index: External/CUDA/complex.cu
===
--- External/CUDA/complex.cu
+++ External/CUDA/complex.cu
@@ -7,25 +7,29 @@
 //
 //===--===//
 
-#include 
-#include 
-#include 
-
 // These are loosely adapted from libc++'s tests.  In general, we don't care a
 // ton about verifying the return types or results we get, on the assumption
 // that our standard library is correct. But we care deeply about calling every
 // overload of every function (so that we verify that everything compiles).
 //
 // We do care about the results of complex multiplication / division, since
 // these use code we've written.
 
+#include 
+
 // These tests are pretty annoying to write without C++11, so we require that.
 // In addition, these tests currently don't compile with libc++, because of the
 // issue in https://reviews.llvm.org/D25403.
 //
 // TODO: Once that issue is resolved, take out !defined(_LIBCPP_VERSION) here.
-#if __cplusplus >= 201103L && !defined(_LIBCPP_VERSION)
+//
+// In addition, these tests don't work in C++14 mode with pre-C++14 versions of
+// libstdc++ (compile errors in <complex>).
+#if __cplusplus >= 201103L && !defined(_LIBCPP_VERSION) && \
+(__cplusplus < 201402L || STDLIB_VERSION >= 2014)
 
+#include 
+#include 
 #include 
 
 template 
@@ -69,7 +73,7 @@
 }
 
 __device__ void test_literals() {
-#if __cplusplus >= 201402L
+#if __cplusplus >= 201402L && STDLIB_VERSION >= 2014
   using namespace std::literals::complex_literals;
 
   {
Index: External/CUDA/cmath.cu
===
--- External/CUDA/cmath.cu
+++ External/CUDA/cmath.cu
@@ -1145,7 +1145,7 @@
 assert(std::hypot(3.f, 4.) == 5);
 assert(std::hypot(3.f, 4.f) == 5);
 
-#if TEST_STD_VER > 14
+#if __cplusplus >= 201703L && STDLIB_VERSION >= 2017
 static_assert((std::is_same::value), "");
 static_assert((std::is_same::value), "");
 static_assert((std::is_same::value), "");
@@ -1158,8 +1158,8 @@
 static_assert((std::is_same::value), "");
 static_assert((std::is_same::value), "");
 
-assert(std::hypot(2,3,6) == 7);
-assert(std::hypot(1,4,8) == 9);
+assert(std::hypot(2, 3, 6) == 7);
+assert(std::hypot(1, 4, 8) == 9);
 #endif
 }
 
Index: External/CUDA/algorithm.cu
===
--- External/CUDA/algorithm.cu
+++ External/CUDA/algorithm.cu
@@ -27,7 +27,7 @@
 // initializer_lists until C++14, when it gets these for free from the standard
 // library (because they're constexpr).
 __device__ void cpp14_tests() {
-#if __cplusplus >= 201402L
+#if __cplusplus >= 201402L && STDLIB_VERSION >= 2014
   assert(std::greater<int>()(1, 0));
   assert(std::min({5, 1, 10}) == 1);
   assert(std::max({5, 1, 10}, std::less<int>()) == 10);
Index: External/CUDA/CMakeLists.txt
===
--- External/CUDA/CMakeLists.txt
+++ External/CUDA/CMakeLists.txt
@@ -316,26 +316,31 @@
   set(_Std_LDFLAGS -std=${_Std})
   foreach(_GccPath IN LISTS GCC_PATHS)
 get_version(_GccVersion ${_GccPath})
-# libstdc++ seems not to support C++14 before version 5.0.
-if(${_Std} STREQUAL "c++14" AND ${_GccVersion} VERSION_LESS "5.0")
-  continue()
-endif()
 set(_Gcc_Suffix "libstdc++-${_GccVersion}")
 # Tell clang to use libstdc++ and where to find it.
 set(_Stdlib_CPPFLAGS -stdlib=libstdc++ -gcc-toolchain ${_GccPath})
 set(_Stdlib_LDFLAGS  -stdlib=libstdc++)
 # Add libstdc++ as link dependency.
 set(_Stdlib_Libs libstdcxx-${_GccVersion})
 
+# libstdc++ seems not to support C++14 before version 5.0.  We still
+# want to run in C++14 mode with old libstdc++s to test compiler C++14
+# with stdlib C++11, but we add a -D so that our tests can detect this.
+if (${_GccVersion} VERSION_LESS "5.0")
+  list(APPEND _Stdlib_CPPFLAGS -DSTDLIB_VERSION=2011)
+else()
+  list(APPEND _Stdlib_CPPFLAGS -DSTDLIB_VERSION=2014)
+endif()
+
 create_cuda_test_variant(${_Std} "${_Cuda_Suffix}-${_Std_Suffix}-${_Gcc_Suffix}")
   endforeach()
 
   if(HAVE_LIBCXX)
 	# Same as above, but for libc++
 	# Tell clang to use libc++
 	# We also need to add compiler's include path for cxxabi.h
 	get_filename_component(_compiler_path ${CMAKE_CXX_COMPILER} DIRECTORY)
-	set(_Stdlib_CPPFLAGS -std

[PATCH] D46995: [test-suite] Enable CUDA complex tests with libc++ now that D25403 is resolved.

2018-05-16 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
jlebar added a reviewer: tra.
Herald added subscribers: llvm-commits, sanjoy.
Herald added a reviewer: EricWF.

Repository:
  rT test-suite

https://reviews.llvm.org/D46995

Files:
  External/CUDA/complex.cu


Index: External/CUDA/complex.cu
===
--- External/CUDA/complex.cu
+++ External/CUDA/complex.cu
@@ -18,15 +18,10 @@
 #include 
 
 // These tests are pretty annoying to write without C++11, so we require that.
-// In addition, these tests currently don't compile with libc++, because of the
-// issue in https://reviews.llvm.org/D25403.
-//
-// TODO: Once that issue is resolved, take out !defined(_LIBCPP_VERSION) here.
 //
 // In addition, these tests don't work in C++14 mode with pre-C++14 versions of
 // libstdc++ (compile errors in <complex>).
-#if __cplusplus >= 201103L && !defined(_LIBCPP_VERSION) && \
-(__cplusplus < 201402L || STDLIB_VERSION >= 2014)
+#if __cplusplus >= 201103L && (__cplusplus < 201402L || STDLIB_VERSION >= 2014)
 
 #include 
 #include 




[PATCH] D46993: [CUDA] Make std::min/max work when compiling in C++14 mode with a C++11 stdlib.

2018-05-17 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

Thank you for the review!


https://reviews.llvm.org/D46993





[PATCH] D46993: [CUDA] Make std::min/max work when compiling in C++14 mode with a C++11 stdlib.

2018-05-17 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rC332619: [CUDA] Make std::min/max work when compiling in 
C++14 mode with a C++11 stdlib. (authored by jlebar, committed by ).

Changed prior to commit:
  https://reviews.llvm.org/D46993?vs=147224&id=147330#toc

Repository:
  rC Clang

https://reviews.llvm.org/D46993

Files:
  lib/Headers/cuda_wrappers/algorithm


Index: lib/Headers/cuda_wrappers/algorithm
===
--- lib/Headers/cuda_wrappers/algorithm
+++ lib/Headers/cuda_wrappers/algorithm
@@ -24,28 +24,36 @@
 #ifndef __CLANG_CUDA_WRAPPERS_ALGORITHM
 #define __CLANG_CUDA_WRAPPERS_ALGORITHM
 
-// This header defines __device__ overloads of std::min/max, but only if we're
-// <= C++11.  In C++14, these functions are constexpr, and so are implicitly
-// __host__ __device__.
+// This header defines __device__ overloads of std::min/max.
 //
-// We don't support the initializer_list overloads because
-// initializer_list::begin() and end() are not __host__ __device__ functions.
+// Ideally we'd declare these functions only if we're <= C++11.  In C++14,
+// these functions are constexpr, and so are implicitly __host__ __device__.
 //
-// When compiling in C++14 mode, we could force std::min/max to have different
-// implementations for host and device, by declaring the device overloads
-// before the constexpr overloads appear.  We choose not to do this because
-
-//  a) why write our own implementation when we can use one from the standard
-// library? and
-//  b) libstdc++ is evil and declares min/max inside a header that is included
-// *before* we include <algorithm>.  So we'd have to unconditionally
-// declare our __device__ overloads of min/max, but that would pollute
-// things for people who choose not to include <algorithm>.
+// However, the compiler being in C++14 mode does not imply that the standard
+// library supports C++14.  There is no macro we can test to check that the
+// stdlib has constexpr std::min/max.  Thus we have to unconditionally define
+// our device overloads.
+//
+// A host+device function cannot be overloaded, and a constexpr function
+// implicitly becomes host device if there's no explicitly host or device
+// overload preceding it.  So the simple thing to do would be to declare our
+// device min/max overloads, and then #include_next <algorithm>.  This way our
+// device overloads would come first, and so if we have a C++14 stdlib, its
+// min/max won't become host+device and conflict with our device overloads.
+//
+// But that also doesn't work.  libstdc++ is evil and declares std::min/max in
+// an internal header that is included *before* <algorithm>.  Thus by the time
+// we're inside of this file, std::min/max may already have been declared, and
+// thus we can't prevent them from becoming host+device if they're constexpr.
+//
+// Therefore we perpetrate the following hack: We mark our __device__ overloads
+// with __attribute__((enable_if(true, ""))).  This causes the signature of the
+// function to change without changing anything else about it.  (Except that
+// overload resolution will prefer it over the __host__ __device__ version
+// rather than considering them equally good).
 
 #include_next <algorithm>
 
-#if __cplusplus <= 201103L
-
 // We need to define these overloads in exactly the namespace our standard
 // library uses (including the right inline namespace), otherwise they won't be
 // picked up by other functions in the standard library (e.g. functions in
@@ -60,24 +68,28 @@
 #endif
 
 template <class __T, class __Cmp>
+__attribute__((enable_if(true, "")))
 inline __device__ const __T &
 max(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__a, __b) ? __b : __a;
 }
 
 template <class __T>
+__attribute__((enable_if(true, "")))
 inline __device__ const __T &
 max(const __T &__a, const __T &__b) {
   return __a < __b ? __b : __a;
 }
 
 template <class __T, class __Cmp>
+__attribute__((enable_if(true, "")))
 inline __device__ const __T &
 min(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__b, __a) ? __b : __a;
 }
 
 template <class __T>
+__attribute__((enable_if(true, "")))
 inline __device__ const __T &
 min(const __T &__a, const __T &__b) {
   return __a < __b ? __a : __b;
@@ -92,5 +104,4 @@
 } // namespace std
 #endif
 
-#endif // __cplusplus <= 201103L
 #endif // __CLANG_CUDA_WRAPPERS_ALGORITHM


Index: lib/Headers/cuda_wrappers/algorithm
===
--- lib/Headers/cuda_wrappers/algorithm
+++ lib/Headers/cuda_wrappers/algorithm
@@ -24,28 +24,36 @@
 #ifndef __CLANG_CUDA_WRAPPERS_ALGORITHM
 #define __CLANG_CUDA_WRAPPERS_ALGORITHM
 
-// This header defines __device__ overloads of std::min/max, but only if we're
-// <= C++11.  In C++14, these functions are constexpr, and so are implicitly
-// __host__ __device__.
+// This header defines __device__ overloads of std::min/max.
 //
-// We don't support the initializer_list overloads because
-// initializer_list::begin() and end() are

[PATCH] D46782: [CUDA] Allow "extern __shared__ Foo foo[]" within anon. namespaces.

2018-05-17 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rC332621: [CUDA] Allow "extern __shared__ Foo foo[]" 
within anon. namespaces. (authored by jlebar, committed by ).
Herald added a subscriber: cfe-commits.

Changed prior to commit:
  https://reviews.llvm.org/D46782?vs=147216&id=147331#toc

Repository:
  rC Clang

https://reviews.llvm.org/D46782

Files:
  include/clang/AST/Decl.h
  lib/AST/Decl.cpp
  lib/Sema/Sema.cpp
  test/SemaCUDA/extern-shared.cu


Index: include/clang/AST/Decl.h
===
--- include/clang/AST/Decl.h
+++ include/clang/AST/Decl.h
@@ -1456,6 +1456,11 @@
 
   void setDescribedVarTemplate(VarTemplateDecl *Template);
 
+  // Is this variable known to have a definition somewhere in the complete
+  // program? This may be true even if the declaration has internal linkage and
+  // has no definition within this source file.
+  bool isKnownToBeDefined() const;
+
   // Implement isa/cast/dyncast/etc.
   static bool classof(const Decl *D) { return classofKind(D->getKind()); }
   static bool classofKind(Kind K) { return K >= firstVar && K <= lastVar; }
Index: test/SemaCUDA/extern-shared.cu
===
--- test/SemaCUDA/extern-shared.cu
+++ test/SemaCUDA/extern-shared.cu
@@ -1,10 +1,10 @@
-// RUN: %clang_cc1 -fsyntax-only -verify %s
-// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -fsyntax-only -Wundefined-internal -verify %s
+// RUN: %clang_cc1 -fsyntax-only -Wundefined-internal -fcuda-is-device -verify %s
 
-// RUN: %clang_cc1 -fsyntax-only -fcuda-rdc -verify=rdc %s
-// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -fcuda-rdc -verify=rdc %s
-// These declarations are fine in separate compilation mode:
-// rdc-no-diagnostics
+// RUN: %clang_cc1 -fsyntax-only -Wundefined-internal -fcuda-rdc -verify=rdc %s
+// RUN: %clang_cc1 -fsyntax-only -Wundefined-internal -fcuda-is-device -fcuda-rdc -verify=rdc %s
+
+// Most of these declarations are fine in separate compilation mode.
 
 #include "Inputs/cuda.h"
 
@@ -26,3 +26,18 @@
 extern __shared__ int global; // expected-error {{__shared__ variable 'global' cannot be 'extern'}}
 extern __shared__ int global_arr[]; // ok
 extern __shared__ int global_arr1[1]; // expected-error {{__shared__ variable 'global_arr1' cannot be 'extern'}}
+
+// Check that, iff we're not in rdc mode, extern __shared__ can appear in an
+// anonymous namespace / in a static function without generating a warning
+// about a variable with internal linkage but no definition
+// (-Wundefined-internal).
+namespace {
+extern __shared__ int global_arr[]; // rdc-warning {{has internal linkage but is not defined}}
+__global__ void in_anon_ns() {
+  extern __shared__ int local_arr[]; // rdc-warning {{has internal linkage but is not defined}}
+
+  // Touch arrays to generate the warning.
+  local_arr[0] = 0;  // rdc-note {{used here}}
+  global_arr[0] = 0; // rdc-note {{used here}}
+}
+} // namespace
Index: lib/AST/Decl.cpp
===
--- lib/AST/Decl.cpp
+++ lib/AST/Decl.cpp
@@ -2432,6 +2432,23 @@
   getASTContext().setTemplateOrSpecializationInfo(this, Template);
 }
 
+bool VarDecl::isKnownToBeDefined() const {
+  const auto &LangOpts = getASTContext().getLangOpts();
+  // In CUDA mode without relocatable device code, variables of form 'extern
+  // __shared__ Foo foo[]' are pointers to the base of the GPU core's shared
+  // memory pool.  These are never undefined variables, even if they appear
+  // inside of an anon namespace or static function.
+  //
+  // With CUDA relocatable device code enabled, these variables don't get
+  // special handling; they're treated like regular extern variables.
+  if (LangOpts.CUDA && !LangOpts.CUDARelocatableDeviceCode &&
+      hasExternalStorage() && hasAttr<CUDASharedAttr>() &&
+      isa<IncompleteArrayType>(getType()))
+    return true;
+
+  return hasDefinition();
+}
+
 MemberSpecializationInfo *VarDecl::getMemberSpecializationInfo() const {
   if (isStaticDataMember())
 // FIXME: Remove ?
Index: lib/Sema/Sema.cpp
===
--- lib/Sema/Sema.cpp
+++ lib/Sema/Sema.cpp
@@ -653,6 +653,11 @@
   !isExternalWithNoLinkageType(VD) &&
   !VD->getMostRecentDecl()->isInline())
 continue;
+
+  // Skip VarDecls that lack formal definitions but which we know are in
+  // fact defined somewhere.
+      if (VD->isKnownToBeDefined())
+        continue;
 }
 
 Undefined.push_back(std::make_pair(ND, UndefinedUse.second));



[PATCH] D46995: [test-suite] Enable CUDA complex tests with libc++ now that D25403 is resolved.

2018-05-17 Thread Justin Lebar via Phabricator via cfe-commits
jlebar marked an inline comment as done.
jlebar added inline comments.



Comment at: External/CUDA/complex.cu:24
 // libstdc++ (compile errors in <complex>).
-#if __cplusplus >= 201103L && !defined(_LIBCPP_VERSION) && \
-(__cplusplus < 201402L || STDLIB_VERSION >= 2014)
+#if __cplusplus >= 201103L && (__cplusplus < 201402L || STDLIB_VERSION >= 2014)
 

tra wrote:
> Is this specific to c++14 only, or will we have similar conditions for 
> c++17,20, etc?
> Perhaps we could express library version requirements as `STDLIB_VERSION >= 
> (__cplusplus / 100)` ?
> I'm OK with either way.
> 
> 
I think it's specific to c++14 -- or at least, it's not necessarily a general
problem.  The other benchmarks work with a C++14 compiler plus a C++11 stdlib --
it's just <complex> that gives us problems in the particular gcc versions we
happen to use.


Repository:
  rT test-suite

https://reviews.llvm.org/D46995





[PATCH] D46994: [test-suite] Test CUDA in C++14 mode with C++11 stdlibs.

2018-05-17 Thread Justin Lebar via Phabricator via cfe-commits
jlebar marked an inline comment as done.
jlebar added a comment.

Thanks for the reviews, Art.  Submitting with this change...


Repository:
  rT test-suite

https://reviews.llvm.org/D46994





[PATCH] D46994: [test-suite] Test CUDA in C++14 mode with C++11 stdlibs.

2018-05-17 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL332659: [test-suite] Test CUDA in C++14 mode with C++11 
stdlibs. (authored by jlebar, committed by ).

Changed prior to commit:
  https://reviews.llvm.org/D46994?vs=147225&id=147383#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D46994

Files:
  test-suite/trunk/External/CUDA/CMakeLists.txt
  test-suite/trunk/External/CUDA/algorithm.cu
  test-suite/trunk/External/CUDA/cmath.cu
  test-suite/trunk/External/CUDA/complex.cu

Index: test-suite/trunk/External/CUDA/complex.cu
===
--- test-suite/trunk/External/CUDA/complex.cu
+++ test-suite/trunk/External/CUDA/complex.cu
@@ -7,25 +7,29 @@
 //
 //===--===//
 
-#include 
-#include 
-#include 
-
 // These are loosely adapted from libc++'s tests.  In general, we don't care a
 // ton about verifying the return types or results we get, on the assumption
 // that our standard library is correct. But we care deeply about calling every
 // overload of every function (so that we verify that everything compiles).
 //
 // We do care about the results of complex multiplication / division, since
 // these use code we've written.
 
+#include 
+
 // These tests are pretty annoying to write without C++11, so we require that.
 // In addition, these tests currently don't compile with libc++, because of the
 // issue in https://reviews.llvm.org/D25403.
 //
 // TODO: Once that issue is resolved, take out !defined(_LIBCPP_VERSION) here.
-#if __cplusplus >= 201103L && !defined(_LIBCPP_VERSION)
+//
+// In addition, these tests don't work in C++14 mode with pre-C++14 versions of
+// libstdc++ (compile errors in <complex>).
+#if __cplusplus >= 201103L && !defined(_LIBCPP_VERSION) && \
+(__cplusplus < 201402L || STDLIB_VERSION >= 2014)
 
+#include 
+#include 
 #include 
 
 template 
@@ -69,7 +73,7 @@
 }
 
 __device__ void test_literals() {
-#if __cplusplus >= 201402L
+#if __cplusplus >= 201402L && STDLIB_VERSION >= 2014
   using namespace std::literals::complex_literals;
 
   {
Index: test-suite/trunk/External/CUDA/CMakeLists.txt
===
--- test-suite/trunk/External/CUDA/CMakeLists.txt
+++ test-suite/trunk/External/CUDA/CMakeLists.txt
@@ -316,28 +316,33 @@
   set(_Std_LDFLAGS -std=${_Std})
   foreach(_GccPath IN LISTS GCC_PATHS)
 get_version(_GccVersion ${_GccPath})
-# libstdc++ seems not to support C++14 before version 5.0.
-if(${_Std} STREQUAL "c++14" AND ${_GccVersion} VERSION_LESS "5.0")
-  continue()
-endif()
 set(_Gcc_Suffix "libstdc++-${_GccVersion}")
 # Tell clang to use libstdc++ and where to find it.
 set(_Stdlib_CPPFLAGS -stdlib=libstdc++ -gcc-toolchain ${_GccPath})
 set(_Stdlib_LDFLAGS  -stdlib=libstdc++)
 # Add libstdc++ as link dependency.
 set(_Stdlib_Libs libstdcxx-${_GccVersion})
 
+# libstdc++ seems not to support C++14 before version 5.0.  We still
+# want to run in C++14 mode with old libstdc++s to test compiler C++14
+# with stdlib C++11, but we add a -D so that our tests can detect this.
+if (${_GccVersion} VERSION_LESS "5.0")
+  list(APPEND _Stdlib_CPPFLAGS -DSTDLIB_VERSION=2011)
+else()
+  list(APPEND _Stdlib_CPPFLAGS -DSTDLIB_VERSION=2014)
+endif()
+
 create_cuda_test_variant(${_Std} "${_Cuda_Suffix}-${_Std_Suffix}-${_Gcc_Suffix}")
   endforeach()
 
   if(HAVE_LIBCXX)
-	# Same as above, but for libc++
-	# Tell clang to use libc++
-	# We also need to add compiler's include path for cxxabi.h
-	get_filename_component(_compiler_path ${CMAKE_CXX_COMPILER} DIRECTORY)
-	set(_Stdlib_CPPFLAGS -stdlib=libc++ -I${_compiler_path}/../include/c++-build)
-	set(_Stdlib_LDFLAGS  -stdlib=libc++)
-	set(_Stdlib_Libs libcxx)
+# Same as above, but for libc++
+# Tell clang to use libc++
+# We also need to add compiler's include path for cxxabi.h
+get_filename_component(_compiler_path ${CMAKE_CXX_COMPILER} DIRECTORY)
+set(_Stdlib_CPPFLAGS -stdlib=libc++ -I${_compiler_path}/../include/c++-build -DSTDLIB_VERSION=2017)
+set(_Stdlib_LDFLAGS  -stdlib=libc++)
+set(_Stdlib_Libs libcxx)
 create_cuda_test_variant(${_Std} "${_Cuda_Suffix}-${_Std_Suffix}-libc++")
   endif()
 endforeach()
Index: test-suite/trunk/External/CUDA/cmath.cu
===
--- test-suite/trunk/External/CUDA/cmath.cu
+++ test-suite/trunk/External/CUDA/cmath.cu
@@ -1145,7 +1145,7 @@
 assert(std::hypot(3.f, 4.) == 5);
 assert(std::hypot(3.f, 4.f) == 5);
 
-#if TEST_STD_VER > 14
+#if __cplusplus >= 201703L && STDLIB_VERSION >= 2017
 static_assert((std::is_same::value), "");
 static_assert((std::is_same::value), "");
 static_assert(

[PATCH] D46995: [test-suite] Enable CUDA complex tests with libc++ now that D25403 is resolved.

2018-05-17 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
jlebar marked an inline comment as done.
Closed by commit rL332660: [test-suite] Enable CUDA complex tests with libc++ 
now that D25403 is resolved. (authored by jlebar, committed by ).

Repository:
  rL LLVM

https://reviews.llvm.org/D46995

Files:
  test-suite/trunk/External/CUDA/complex.cu


Index: test-suite/trunk/External/CUDA/complex.cu
===
--- test-suite/trunk/External/CUDA/complex.cu
+++ test-suite/trunk/External/CUDA/complex.cu
@@ -18,15 +18,10 @@
 #include 
 
 // These tests are pretty annoying to write without C++11, so we require that.
-// In addition, these tests currently don't compile with libc++, because of the
-// issue in https://reviews.llvm.org/D25403.
-//
-// TODO: Once that issue is resolved, take out !defined(_LIBCPP_VERSION) here.
 //
 // In addition, these tests don't work in C++14 mode with pre-C++14 versions of
 // libstdc++ (compile errors in <complex>).
-#if __cplusplus >= 201103L && !defined(_LIBCPP_VERSION) && \
-(__cplusplus < 201402L || STDLIB_VERSION >= 2014)
+#if __cplusplus >= 201103L && (__cplusplus < 201402L || STDLIB_VERSION >= 2014)
 
 #include 
 #include 




[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining

2018-05-18 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

I defer to Art on this one.


Repository:
  rC Clang

https://reviews.llvm.org/D47070





[PATCH] D38188: [CUDA] Fix names of __nvvm_vote* intrinsics.

2017-09-25 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Should we add tests to the test-suite?  Or are these already caught by the
existing tests we have?


https://reviews.llvm.org/D38188





[PATCH] D38191: [NVPTX] added match.{any, all}.sync instructions, intrinsics & builtins.

2017-09-25 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added inline comments.



Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:419
+TARGET_BUILTIN(__nvvm_match_any_sync_i64, "WiUiWi", "", "ptx60")
+// These return a pair {value, predicate} which requires custom lowering.
+TARGET_BUILTIN(__nvvm_match_all_sync_i32p, "UiUiUii*", "", "ptx60")

Nit, non-restrictive "which" should get a comma.  :)



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9603
+Value *Pred = Builder.CreateSExt(Builder.CreateExtractValue(ResultPair, 1),
+ PredOutPtr.getElementType());
+Builder.CreateStore(Pred, PredOutPtr);

Doing sext i1 -> i32 is going to cause us to store 0 or -1 in the pred 
(right?).  The CUDA docs say

> Predicate pred is set to true if all threads in mask have the same value of 
> value; otherwise the predicate is set to false.

I'd guess that "true" probably means 1 (i.e. zext i1 -> i32) rather than -1, 
although, I guess we have to check.


https://reviews.llvm.org/D38191





[PATCH] D38468: [CUDA] Fix name of __activemask()

2017-10-02 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Thank you for the fix!


https://reviews.llvm.org/D38468





[PATCH] D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70

2017-10-11 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9726
+  case NVPTX::BI__hmma_m16n16k16_ld_c_f16:
+case NVPTX::BI__hmma_m16n16k16_ld_c_f32:{
+Address Dst = EmitPointerWithAlignment(E->getArg(0));

weird indentation?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9733
+  return nullptr;
+bool isColMajor = isColMajorArg.getZExtValue();
+unsigned IID;

Urg, this isn't a bool?  Do we want it to be?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9761
+
+//auto EltTy = cast(Src->getType())->getElementType();
+// Returned are 8 16x2 elements.

Accidentally left over?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9762
+//auto EltTy = cast(Src->getType())->getElementType();
+// Returned are 8 16x2 elements.
+for (unsigned i = 0; i < NumResults; ++i) {

s/8/NumElements/?
s/16/f16/?

Maybe it would be better to write it as "Return value has type [[f16 x 2] x 
NumResults]."?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9784
+unsigned IID;
+unsigned NumResults = 8;
+// PTX Instructions (and LLVM instrinsics) are defined for slice _d_, yet

Nit, at this point it's probably better to assign NumResults in each branch, 
since there are only two.  clang should make sure that we don't accidentally 
use it uninitialized.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9786
+// PTX Instructions (and LLVM instrinsics) are defined for slice _d_, yet
+// for some reason nvcc buildtins are using _c_.
+switch(BuiltinID) {

s/are using/use/



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9800
+}
+Function * Intrinsic = CGM.getIntrinsic(IID);
+llvm::Type *ParamType = Intrinsic->getFunctionType()->getParamType(1);

spacing.  (Probably just worth clang-formatting this and the other patch.)



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9802
+llvm::Type *ParamType = Intrinsic->getFunctionType()->getParamType(1);
+SmallVector Values;
+Values.push_back(Builder.CreatePointerCast(Dst, VoidPtrTy));

Nit, we know that there won't ever be more than 8 elements...


https://reviews.llvm.org/D38742





[PATCH] D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70

2017-10-11 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:9733
+  return nullptr;
+bool isColMajor = isColMajorArg.getZExtValue();
+unsigned IID;

tra wrote:
> jlebar wrote:
> > Urg, this isn't a bool?  Do we want it to be?
> There are no explicit declarations for these builtins in CUDA headers. 
> Callers of these builtins pass 0/1 and corresponding intrinsic described in 
> [[ 
> http://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#nvvm-intrin-warp-level-matrix-ld
>  | NVVM-IR spec ]] shows the argument type as i32, so I've made the type 
> integer in clang. 
> 
> 
sgtm


https://reviews.llvm.org/D38742





[PATCH] D38816: Convert clang::LangAS to a strongly typed enum

2017-10-11 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

My only regret is that I have but one +1 to give to this patch.




Comment at: include/clang/Basic/AddressSpaces.h:51
 
+namespace LanguageAS {
 /// The type of a lookup table which maps from language-specific address spaces

I wonder if you need this namespace?  LangAS right next to LanguageAS reads 
strangely to me -- "what's the difference?".

I guess you'd need to rename Map and fromTargetAS, but the other two members 
are probably OK?


Repository:
  rL LLVM

https://reviews.llvm.org/D38816





[PATCH] D38816: Convert clang::LangAS to a strongly typed enum

2017-10-11 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> The only reason I added this namespace is that I wasn't sure whether having 
> those functions in the clang namespace is acceptable.

Maybe someone else will object, or suggest an existing namespace they should be 
in.  FWIW I think it's fine.

> Not quite sure what to call the functions though. langASFromTargetAS?

sgtm!


Repository:
  rL LLVM

https://reviews.llvm.org/D38816





[PATCH] D39005: [OpenMP] Clean up variable and function names for NVPTX backend

2017-10-17 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

This has been tried twice before, see https://reviews.llvm.org/D29883 and 
https://reviews.llvm.org/D17738.  I'm as unhappy about this as anyone, and 
personally I don't have any preference about how we try to solve it.  But I 
think we shouldn't check this in without hearing the objections from those past 
attempts.


Repository:
  rL LLVM

https://reviews.llvm.org/D39005





[PATCH] D39005: [OpenMP] Clean up variable and function names for NVPTX backend

2017-10-17 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> I'd be interested to get the ball rolling in regard to coming up with a fix 
> for this. I see some suggestions in past patches. Some help/clarification 
> would be much appreciated.

Happy to help, but I'm not sure what to offer beyond the link in Art's previous 
comment.


Repository:
  rL LLVM

https://reviews.llvm.org/D39005





[PATCH] D39005: [OpenMP] Clean up variable and function names for NVPTX backend

2017-10-18 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> The first question that comes to mind is what is the link between data layout 
> and name mangling conventions?

I pulled up http://llvm.org/doxygen/classllvm_1_1DataLayout.html and searched 
for "mangling" -- presumably this is what they were referring to.  We also 
don't need to speculate, rnk still works on LLVM.  :)


Repository:
  rL LLVM

https://reviews.llvm.org/D39005





[PATCH] D49763: [CUDA] Call atexit() for CUDA destructor early on.

2018-07-24 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/lib/CodeGen/CGCUDANV.cpp:379
+  // Create destructor and register it with atexit() the way NVCC does it. 
Doing
+  // it during regular destructor phase worked in CUDA before 9.2 but results 
in
+  // double-free in 9.2.

the regular destructor phase



Comment at: clang/lib/CodeGen/CGCUDANV.cpp:380
+  // it during regular destructor phase worked in CUDA before 9.2 but results 
in
+  // double-free in 9.2.
+  if (llvm::Function *CleanupFn = makeModuleDtorFunction()) {

a double-free


https://reviews.llvm.org/D49763





[PATCH] D43602: [CUDA] Added missing functions.

2018-02-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

For my information, how are we verifying that we've caught everything?


https://reviews.llvm.org/D43602





[PATCH] D41521: [CUDA] fixes for __shfl_* intrinsics.

2017-12-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Since this is tricky and we've seen it affecting user code, do you think it's a 
bad idea to add tests to the test-suite?


https://reviews.llvm.org/D41521





[PATCH] D41788: [DeclPrinter] Fix two cases that crash clang -ast-print.

2018-01-08 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

I strongly approve of fixing these crashes, but I don't think I can say with 
confidence whether this change is correct.


https://reviews.llvm.org/D41788





[PATCH] D47804: [CUDA] Replace 'nv_weak' attributes in CUDA headers with 'weak'.

2018-06-05 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

What could possibly go wrong.


https://reviews.llvm.org/D47804





[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-11 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
jlebar added a reviewer: rsmith.
Herald added a subscriber: sanjoy.

Fixes PR37753: min/max can't be called from __host__ __device__
functions in C++14 mode.

Testcase in a separate test-suite commit.


https://reviews.llvm.org/D48036

Files:
  clang/lib/Headers/cuda_wrappers/algorithm


Index: clang/lib/Headers/cuda_wrappers/algorithm
===
--- clang/lib/Headers/cuda_wrappers/algorithm
+++ clang/lib/Headers/cuda_wrappers/algorithm
@@ -69,28 +69,28 @@
 
template <class __T, class __Cmp>
 __attribute__((enable_if(true, "")))
-inline __device__ const __T &
+inline __host__ __device__ const __T &
 max(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__a, __b) ? __b : __a;
 }
 
template <class __T>
 __attribute__((enable_if(true, "")))
-inline __device__ const __T &
+inline __host__ __device__ const __T &
 max(const __T &__a, const __T &__b) {
   return __a < __b ? __b : __a;
 }
 
template <class __T, class __Cmp>
 __attribute__((enable_if(true, "")))
-inline __device__ const __T &
+inline __host__ __device__ const __T &
 min(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__b, __a) ? __b : __a;
 }
 
template <class __T>
 __attribute__((enable_if(true, "")))
-inline __device__ const __T &
+inline __host__ __device__ const __T &
 min(const __T &__a, const __T &__b) {
   return __a < __b ? __a : __b;
 }




[PATCH] D48037: [CUDA] Add tests to ensure that std::min/max can be called from __host__ __device__ functions.

2018-06-11 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
jlebar added a reviewer: rsmith.
Herald added subscribers: llvm-commits, sanjoy.

Tests for https://reviews.llvm.org/D48036 / PR37753.


Repository:
  rT test-suite

https://reviews.llvm.org/D48037

Files:
  External/CUDA/algorithm.cu


Index: External/CUDA/algorithm.cu
===
--- External/CUDA/algorithm.cu
+++ External/CUDA/algorithm.cu
@@ -17,10 +17,16 @@
 __device__ void min() {
   assert(std::min(0, 1) == 0);
 }
+__host__ __device__ void min_hd() {
+  assert(std::min(0, 1) == 0);
+}
 
 __device__ void max() {
   assert(std::max(0, 1) == 1);
 }
+__host__ __device__ void max_hd() {
+  assert(std::max(0, 1) == 1);
+}
 
 // Clang has device-side shims implementing std::min and std::max for scalars
 // starting in C++11, but doesn't implement minimax or std::min/max on
@@ -39,10 +45,27 @@
 #endif
 }
 
+// Same tests as cpp14_tests, but from a host-device context.
+__host__ __device__ void cpp14_tests_hd() {
+#if __cplusplus >= 201402L && STDLIB_VERSION >= 2014
+  assert(std::greater<int>()(1, 0));
+  assert(std::min({5, 1, 10}) == 1);
+  assert(std::max({5, 1, 10}, std::less<int>()) == 10);
+
+  assert(std::minmax(1, 0).first == 0);
+  assert(std::minmax(1, 0).second == 1);
+  assert(std::minmax({0, 10, -10, 100}, std::less<int>()).first == -10);
+  assert(std::minmax({0, 10, -10, 100}, std::less<int>()).second == 100);
+#endif
+}
+
 __global__ void kernel() {
   min();
+  min_hd();
   max();
+  max_hd();
   cpp14_tests();
+  cpp14_tests_hd();
 }
 
 int main() {
@@ -52,6 +75,11 @@
 printf("CUDA error %d\n", (int)err);
 return 1;
   }
+
+  min_hd();
+  max_hd();
+  cpp14_tests_hd();
+
   printf("Success!\n");
   return 0;
 }




[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-13 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> Last comment in the bug pointed out that those overloads should be constexpr 
> in c++14. Maybe in a separate patch, though.

Yeah, would prefer to do it in a separate patch.  It's possible that having 
constexpr min/max in C++14 mode *without a C++14 standard library* will cause 
problems.  (Don't mean to FUD it -- we should try.  I just would like to be 
able to roll them back separately.  :)


https://reviews.llvm.org/D48036





[PATCH] D48151: [CUDA] Make __host__/__device__ min/max overloads constexpr in C++14.

2018-06-13 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
jlebar added reviewers: rsmith, tra.
Herald added a subscriber: sanjoy.

Tests in a separate change to the test-suite.


https://reviews.llvm.org/D48151

Files:
  clang/lib/Headers/cuda_wrappers/algorithm


Index: clang/lib/Headers/cuda_wrappers/algorithm
===
--- clang/lib/Headers/cuda_wrappers/algorithm
+++ clang/lib/Headers/cuda_wrappers/algorithm
@@ -67,34 +67,43 @@
 #endif
 #endif
 
+#pragma push_macro("_CPP14_CONSTEXPR")
+#if __cplusplus >= 201402L
+#define _CPP14_CONSTEXPR constexpr
+#else
+#define _CPP14_CONSTEXPR
+#endif
+
template <class __T, class __Cmp>
 __attribute__((enable_if(true, "")))
-inline __host__ __device__ const __T &
+inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 max(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__a, __b) ? __b : __a;
 }
 
template <class __T>
 __attribute__((enable_if(true, "")))
-inline __host__ __device__ const __T &
+inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 max(const __T &__a, const __T &__b) {
   return __a < __b ? __b : __a;
 }
 
template <class __T, class __Cmp>
 __attribute__((enable_if(true, "")))
-inline __host__ __device__ const __T &
+inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 min(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__b, __a) ? __b : __a;
 }
 
template <class __T>
 __attribute__((enable_if(true, "")))
-inline __host__ __device__ const __T &
+inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 min(const __T &__a, const __T &__b) {
   return __a < __b ? __a : __b;
 }
 
+#pragma pop_macro("_CPP14_CONSTEXPR")
+
 #ifdef _LIBCPP_END_NAMESPACE_STD
 _LIBCPP_END_NAMESPACE_STD
 #else




[PATCH] D48152: [CUDA] Add tests that, in C++14 mode, min/max are constexpr.

2018-06-13 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
jlebar added reviewers: rsmith, tra.
Herald added a subscriber: llvm-commits.

Repository:
  rT test-suite

https://reviews.llvm.org/D48152

Files:
  External/CUDA/algorithm.cu


Index: External/CUDA/algorithm.cu
===
--- External/CUDA/algorithm.cu
+++ External/CUDA/algorithm.cu
@@ -42,6 +42,8 @@
   assert(std::minmax(1, 0).second == 1);
   assert(std::minmax({0, 10, -10, 100}, std::less<int>()).first == -10);
   assert(std::minmax({0, 10, -10, 100}, std::less<int>()).second == 100);
+  constexpr auto min = std::min(1, 2);
+  constexpr auto max = std::max(1, 2);
 #endif
 }
 
@@ -56,6 +58,8 @@
   assert(std::minmax(1, 0).second == 1);
   assert(std::minmax({0, 10, -10, 100}, std::less<int>()).first == -10);
   assert(std::minmax({0, 10, -10, 100}, std::less<int>()).second == 100);
+  constexpr auto min = std::min(1, 2);
+  constexpr auto max = std::max(1, 2);
 #endif
 }
 


 


[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-13 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

In https://reviews.llvm.org/D48036#1131279, @tra wrote:

> Ack.


Patches sent (see dependency chain in phab).


https://reviews.llvm.org/D48036





[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-15 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

@rsmith friendly ping on this one.


https://reviews.llvm.org/D48036





[PATCH] D48151: [CUDA] Make __host__/__device__ min/max overloads constexpr in C++14.

2018-06-15 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

@rsmith friendly ping on this review.


https://reviews.llvm.org/D48151





[PATCH] D48151: [CUDA] Make __host__/__device__ min/max overloads constexpr in C++14.

2018-06-15 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

In https://reviews.llvm.org/D48151#1133954, @rsmith wrote:

> LGTM


Thank you for the review, Richard.

Will check this in once the whole stack is ready -- just need 
https://reviews.llvm.org/D48036.


https://reviews.llvm.org/D48151





[PATCH] D57487: [CUDA] Propagate detected version of CUDA to cc1

2019-01-30 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/include/clang/Basic/Cuda.h:108
+enum class CudaFeature {
+  CUDA_USES_NEW_LAUNCH,
+};

Should this enum be documented?



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:3475
+  if (IsCuda) {
+// We need to figure out which CUDA version we're compiling for as that
+// determines how we load and launch GPU kernels.

nit, s/as/, as/


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57487/new/

https://reviews.llvm.org/D57487





[PATCH] D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+.

2019-01-30 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

LGTM, mostly nits.




Comment at: clang/include/clang/Sema/Sema.h:10316
 
+  /// Returns the name of the launch configuration function.
+  std::string getCudaConfigureFuncName() const;

Could we be a little less vague, what exactly is the launch-configuration 
function?  (Could be as simple as adding `e.g. cudaFooBar()`.)



Comment at: clang/lib/CodeGen/CGCUDANV.cpp:201
 
-void CGNVCUDARuntime::emitDeviceStubBody(CodeGenFunction &CGF,
- FunctionArgList &Args) {
+// CUDA 9.0+ uses new way to launch kernels. Parameters are packed in local
+// array and kernels are launched using cudaLaunchKernel().

nit `in a local array`



Comment at: clang/lib/CodeGen/CGCUDANV.cpp:212
+  VoidPtrTy, CharUnits::fromQuantity(16), "kernel_args",
+  llvm::ConstantInt::get(SizeTy, std::max(1UL, Args.size(;
+  // Store pointers to the arguments in a locally allocated launch_args.

Nit, s/`1UL`/`uint64{1}`/ or size_t, whatever this function takes.  As-is we're 
baking in the assumption that unsigned long is the same as the type returned by 
Args.size(), which isn't necessarily true.

As an alternative, you could do `std::max(1, Args.size())` or whatever 
the appropriate type is.



Comment at: clang/lib/CodeGen/CGCUDANV.cpp:239
+CGM.Error(CGF.CurFuncDecl->getLocation(),
+  "Can't find declaration for cudaLaunchKernel()"); // FIXME.
+return;

Unfixed FIXME?



Comment at: clang/lib/CodeGen/CGCUDANV.cpp:260
+  /*isVarArg=*/false),
+  "__cudaPopCallConfiguration");
+

I see lots of references to `__cudaPushCallConfiguration`, but this is the only 
reference I see to `__cudaPopCallConfiguration`.  Is this a typo?  Also are we 
supposed to emit matching push and pop function calls?  Kind of weird to do one 
without the other...



Comment at: clang/lib/CodeGen/CGCUDANV.cpp:266
+  // Emit the call to cudaLaunch
+
+  llvm::Value *Kernel = CGF.Builder.CreatePointerCast(CGF.CurFn, VoidPtrTy);

Whitespace nit, maybe move this whitespace line before the comment?



Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:429
 
+// CUDA runtime uses undocumented function to access kernel launch
+// configuration. We need to provide our own declaration for it here.

s/undocumented function/this undocumented function/?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57488/new/

https://reviews.llvm.org/D57488





[PATCH] D59647: [CUDA][HIP] Warn shared var initialization

2019-03-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar requested changes to this revision.
jlebar added a comment.
This revision now requires changes to proceed.

I agree with Art.  The fact that nvcc allows this is broken.

If you want a flag that makes this error a warning, that might work for me.  
The flag should probably say "unsafe" or "I promise I will not complain when 
this breaks me" or something to that effect.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59647/new/

https://reviews.llvm.org/D59647





[PATCH] D59647: [CUDA][HIP] Warn shared var initialization

2019-03-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> By default it is still treated as error, therefore no behavior change of 
> clang.

Oh, I see, you already did what I'd suggested.  :)

That's better.  I think this needs to be made *much scarier* though.  "Maybe 
race condition" doesn't capture the danger here -- you can very quickly get UB.

Maybe Richard has thoughts on whether we should allow broken things like this.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59647/new/

https://reviews.llvm.org/D59647





[PATCH] D59900: [Sema] Fix a crash when nonnull checking

2019-03-27 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

I uh...  I also think this is an @rsmith question, I have no idea.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59900/new/

https://reviews.llvm.org/D59900





[PATCH] D61458: [hip] Relax CUDA call restriction within `decltype` context.

2019-05-02 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a subscriber: rsmith.
jlebar added a comment.

Here's one for you:

  __host__ float bar();
  __device__ int bar();
  __host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of `foo`?  :)

I don't believe the right answer is, "float when compiling for host, int when 
compiling for device."

I'd be happy if we said this was an error, so long as it's well-defined what 
exactly we're disallowing.  But I bet @rsmith can come up with substantially 
more evil testcases than this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61458/new/

https://reviews.llvm.org/D61458





[PATCH] D61458: [hip] Relax CUDA call restriction within `decltype` context.

2019-05-03 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> At [nvcc] from CUDA 10, that's not acceptable, as we are declaring two 
> functions that differ only in return type. It seems CUDA attributes do not 
> contribute to the function signature. clang is quite different here.

Yes, this is an intentional and more relaxed semantics in clang.  It's also 
sort of the linchpin of our mixed-mode compilation strategy, which is very 
different from nvcc's source-to-source splitting strategy.

Back in the day you could trick nvcc into allowing host/device overloading on 
same-signature functions by slapping a `template` on one or both of them.  
Checking just now it seems they fixed this, but I suspect there are still dark 
corners where nvcc relies on effectively the same behavior as we get in clang 
via true overloading.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61458/new/

https://reviews.llvm.org/D61458





[PATCH] D51809: [CUDA][HIP] Fix assertion in LookupSpecialMember

2018-09-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar requested changes to this revision.
jlebar added subscribers: timshen, rsmith.
jlebar added a comment.
This revision now requires changes to proceed.

Sorry for missing tra's ping earlier, I get a lot of HIP email traffic that's 
99% inactionable by me, so I didn't notice my username in tra's earlier email.

@tra, @timshen, and I debugged this IRL for a few hours this afternoon.  The 
result of this is that we don't think the fix in this patch is correct.

Here's what we think is happening.

When clang sees `using A::A` inside of `B`, it has to check whether this 
constructor is legal in `B`.  An example of where this constructor would *not* 
be legal is something like:

  struct NoDefaultConstructor { NoDefaultConstructor() = delete; };
  struct A { A(const int& x) {} }
  struct B {
using A::A;
NoDefaultConstructor c;
  };

The reason this `using A::A` is not legal here is because the `using` statement 
is equivalent to writing

  B(const int& x) : A(x) {}

but this constructor is not legal, because `NoDefaultConstructor` is not 
default-constructible, and a constructor for `B` must explicitly initialize all 
non-default-initializable members.

Here is the code that checks whether the `using` statement is legal:

  
https://github.com/llvm-project/llvm-project-20170507/blob/51b65eeeab0d24268783d6246fd949d9a16e10e8/clang/lib/Sema/SemaDeclCXX.cpp#L11018

This code is kind of a lie!  `DerivedCtor` is the constructor `B(const int& x) 
: A(x) {}` that we've created in response to the `using` declaration.  Notice 
that it's not a default constructor!  In fact, it's not even a special member 
(i.e. it's not a default constructor, copy constructor, move constructor, 
etc.).  But notice that we pass `CXXDefaultConstructor`, and we call the 
function `ShouldDeleteSpecialMember`!

The reason we don't tell the truth here seems to be out of convenience.  To 
determine whether we should delete the new constructor on `B`, it seems like we 
are trying to ask: Would a default constructor on `B` be legal, ignoring the 
fact that `A` has to be explicitly initialized?  That is, the new constructor 
we're creating is just like a default constructor on `B`, except that first it 
initializes `A`.  So we're trying to reuse the default constructor logic.

But eventually our tricks and dishonesty catch up to us, here in CUDA code.  
This patch fixes one instance where we do the wrong thing and hit an assertion, 
but who knows if the code is right in general; simply adding on another layer 
of hack does not seem like the right approach to us.

cc @rsmith


https://reviews.llvm.org/D51809





[PATCH] D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors

2018-10-06 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added inline comments.
This revision is now accepted and ready to land.



Comment at: lib/Sema/SemaDeclCXX.cpp:7231
+if (ICI)
+  CSM = getSpecialMember(MD);
+

LGTM, but perhaps we should use a new variable instead of modifying `CSM` in 
case someone adds code beneath this branch?


https://reviews.llvm.org/D51809





[PATCH] D57771: [CUDA] Add basic support for CUDA-10.1

2019-02-05 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/lib/CodeGen/CGCUDANV.cpp:620
+
+// CUDA version requires calling __cudaRegisterFatBinaryEnd(Handle);
+if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),

Was confused by this sentence for a while.  Maybe change to "Check whether cuda 
version requires "?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57771/new/

https://reviews.llvm.org/D57771





[PATCH] D37539: [CUDA] Add device overloads for non-placement new/delete.

2017-09-06 Thread Justin Lebar via Phabricator via cfe-commits
jlebar marked an inline comment as done.
jlebar added inline comments.



Comment at: clang/lib/Headers/cuda_wrappers/new:79
+}
+__device__ void operator delete[](void *ptr, std::size_t sz) CUDA_NOEXCEPT {
+  ::operator delete(ptr);

tra wrote:
> Is std::size_t intentional here?  You use __SIZE_TYPE__ everywhere else.
Fixed, thanks.


https://reviews.llvm.org/D37539





[PATCH] D37539: [CUDA] Add device overloads for non-placement new/delete.

2017-09-06 Thread Justin Lebar via Phabricator via cfe-commits
jlebar updated this revision to Diff 114104.
jlebar marked an inline comment as done.
jlebar added a comment.

Address review comments.


https://reviews.llvm.org/D37539

Files:
  clang/lib/Headers/cuda_wrappers/new


Index: clang/lib/Headers/cuda_wrappers/new
===
--- clang/lib/Headers/cuda_wrappers/new
+++ clang/lib/Headers/cuda_wrappers/new
@@ -26,22 +26,71 @@
 
 #include_next <new>
 
-// Device overrides for placement new and delete.
 #pragma push_macro("CUDA_NOEXCEPT")
 #if __cplusplus >= 201103L
 #define CUDA_NOEXCEPT noexcept
 #else
 #define CUDA_NOEXCEPT
 #endif
 
+// Device overrides for non-placement new and delete.
+__device__ inline void *operator new(__SIZE_TYPE__ size) {
+  if (size == 0) {
+size = 1;
+  }
+  return ::malloc(size);
+}
+__device__ inline void *operator new(__SIZE_TYPE__ size,
+ const std::nothrow_t &) CUDA_NOEXCEPT {
+  return ::operator new(size);
+}
+
+__device__ inline void *operator new[](__SIZE_TYPE__ size) {
+  return ::operator new(size);
+}
+__device__ inline void *operator new[](__SIZE_TYPE__ size,
+   const std::nothrow_t &) {
+  return ::operator new(size);
+}
+
+__device__ inline void operator delete(void* ptr) CUDA_NOEXCEPT {
+  if (ptr) {
+::free(ptr);
+  }
+}
+__device__ inline void operator delete(void *ptr,
+   const std::nothrow_t &) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+
+__device__ inline void operator delete[](void* ptr) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+__device__ inline void operator delete[](void *ptr,
+                                         const std::nothrow_t &) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+
+// Sized delete, C++14 only.
+#if __cplusplus >= 201402L
+__device__ void operator delete(void *ptr, __SIZE_TYPE__ size) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+__device__ void operator delete[](void *ptr, __SIZE_TYPE__ size) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+#endif
+
+// Device overrides for placement new and delete.
 __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
   return __ptr;
 }
 __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
   return __ptr;
 }
 __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
 __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
+
 #pragma pop_macro("CUDA_NOEXCEPT")
 
 #endif // include guard



[PATCH] D37540: [CUDA] Tests for device-side overloads of non-placement new/delete.

2017-09-06 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL312682: [CUDA] Tests for device-side overloads of 
non-placement new/delete. (authored by jlebar).

Changed prior to commit:
  https://reviews.llvm.org/D37540?vs=114094&id=114109#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D37540

Files:
  test-suite/trunk/External/CUDA/CMakeLists.txt
  test-suite/trunk/External/CUDA/new.cu
  test-suite/trunk/External/CUDA/new.reference_output


Index: test-suite/trunk/External/CUDA/CMakeLists.txt
===
--- test-suite/trunk/External/CUDA/CMakeLists.txt
+++ test-suite/trunk/External/CUDA/CMakeLists.txt
@@ -53,6 +53,7 @@
   create_one_local_test(cmath cmath.cu)
   create_one_local_test(complex complex.cu)
   create_one_local_test(math_h math_h.cu)
+  create_one_local_test(new new.cu)
   create_one_local_test(empty empty.cu)
   create_one_local_test(printf printf.cu)
   create_one_local_test(future future.cu)
Index: test-suite/trunk/External/CUDA/new.reference_output
===
--- test-suite/trunk/External/CUDA/new.reference_output
+++ test-suite/trunk/External/CUDA/new.reference_output
@@ -0,0 +1,2 @@
+Success!
+exit 0
Index: test-suite/trunk/External/CUDA/new.cu
===
--- test-suite/trunk/External/CUDA/new.cu
+++ test-suite/trunk/External/CUDA/new.cu
@@ -0,0 +1,69 @@
+// Check that operator new and operator delete work.
+
+#include 
+#include 
+#include 
+
+__device__ void global_new() {
+  void* x = ::operator new(42);
+  assert(x != NULL);
+  ::operator delete(x);
+
+  x = ::operator new(42, std::nothrow);
+  assert(x != NULL);
+  ::operator delete(x, std::nothrow);
+
+  x = ::operator new[](42);
+  assert(x != NULL);
+  ::operator delete[](x);
+
+  x = ::operator new[](42, std::nothrow);
+  assert(x != NULL);
+  ::operator delete[](x, std::nothrow);
+}
+
+__device__ void sized_delete() {
+#if __cplusplus >= 201402L
+  void* x = ::operator new(42);
+  assert(x != NULL);
+  ::operator delete(x, 42);
+
+  x = ::operator new[](42);
+  assert(x != NULL);
+  ::operator delete[](x, 42);
+#endif
+}
+
+__device__ void int_new() {
+  int* x = new int();
+  assert(*x == 0);
+  delete x;
+}
+
+struct Foo {
+  __device__ Foo() : x(42) {}
+  int x;
+};
+__device__ void class_new() {
+  Foo* foo = new Foo();
+  assert(foo->x == 42);
+  delete foo;
+}
+
+__global__ void kernel() {
+  global_new();
+  sized_delete();
+  int_new();
+  class_new();
+}
+
+int main() {
+  kernel<<<32, 32>>>();
+  cudaError_t err = cudaDeviceSynchronize();
+  if (err != cudaSuccess) {
+printf("CUDA error %d\n", (int)err);
+return 1;
+  }
+  printf("Success!\n");
+  return 0;
+}



[PATCH] D37539: [CUDA] Add device overloads for non-placement new/delete.

2017-09-06 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL312681: [CUDA] Add device overloads for non-placement 
new/delete. (authored by jlebar).

Changed prior to commit:
  https://reviews.llvm.org/D37539?vs=114104&id=114108#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D37539

Files:
  cfe/trunk/lib/Headers/cuda_wrappers/new


Index: cfe/trunk/lib/Headers/cuda_wrappers/new
===
--- cfe/trunk/lib/Headers/cuda_wrappers/new
+++ cfe/trunk/lib/Headers/cuda_wrappers/new
@@ -26,22 +26,71 @@
 
 #include_next <new>
 
-// Device overrides for placement new and delete.
 #pragma push_macro("CUDA_NOEXCEPT")
 #if __cplusplus >= 201103L
 #define CUDA_NOEXCEPT noexcept
 #else
 #define CUDA_NOEXCEPT
 #endif
 
+// Device overrides for non-placement new and delete.
+__device__ inline void *operator new(__SIZE_TYPE__ size) {
+  if (size == 0) {
+size = 1;
+  }
+  return ::malloc(size);
+}
+__device__ inline void *operator new(__SIZE_TYPE__ size,
+ const std::nothrow_t &) CUDA_NOEXCEPT {
+  return ::operator new(size);
+}
+
+__device__ inline void *operator new[](__SIZE_TYPE__ size) {
+  return ::operator new(size);
+}
+__device__ inline void *operator new[](__SIZE_TYPE__ size,
+   const std::nothrow_t &) {
+  return ::operator new(size);
+}
+
+__device__ inline void operator delete(void* ptr) CUDA_NOEXCEPT {
+  if (ptr) {
+::free(ptr);
+  }
+}
+__device__ inline void operator delete(void *ptr,
+   const std::nothrow_t &) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+
+__device__ inline void operator delete[](void* ptr) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+__device__ inline void operator delete[](void *ptr,
+                                         const std::nothrow_t &) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+
+// Sized delete, C++14 only.
+#if __cplusplus >= 201402L
+__device__ void operator delete(void *ptr, __SIZE_TYPE__ size) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+__device__ void operator delete[](void *ptr, __SIZE_TYPE__ size) CUDA_NOEXCEPT {
+  ::operator delete(ptr);
+}
+#endif
+
+// Device overrides for placement new and delete.
 __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
   return __ptr;
 }
 __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
   return __ptr;
 }
 __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
 __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
+
 #pragma pop_macro("CUDA_NOEXCEPT")
 
 #endif // include guard



[PATCH] D37548: [CUDA] When compilation fails, print the compilation mode.

2017-09-06 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
Herald added a subscriber: sanjoy.

That is, instead of "1 error generated", we now say "1 error generated
when compiling for sm_35".

This (partially) solves a usability footgun wherein e.g. users call a
function that's only defined on sm_60 when compiling for sm_35, and they
get an unhelpful error message.


https://reviews.llvm.org/D37548

Files:
  clang/lib/Frontend/CompilerInstance.cpp
  clang/test/SemaCUDA/error-includes-mode.cu


Index: clang/test/SemaCUDA/error-includes-mode.cu
===
--- /dev/null
+++ clang/test/SemaCUDA/error-includes-mode.cu
@@ -0,0 +1,7 @@
+// RUN: not %clang_cc1 -fsyntax-only %s 2>&1 | FileCheck --check-prefix HOST %s
+// RUN: not %clang_cc1 -triple nvptx-unknown-unknown -target-cpu sm_35 \
+// RUN:   -fcuda-is-device -fsyntax-only %s 2>&1 | FileCheck --check-prefix SM35 %s
+
+// HOST: 1 error generated when compiling for host
+// SM35: 1 error generated when compiling for sm_35
+error;
Index: clang/lib/Frontend/CompilerInstance.cpp
===
--- clang/lib/Frontend/CompilerInstance.cpp
+++ clang/lib/Frontend/CompilerInstance.cpp
@@ -1003,8 +1003,17 @@
   OS << " and ";
 if (NumErrors)
   OS << NumErrors << " error" << (NumErrors == 1 ? "" : "s");
-if (NumWarnings || NumErrors)
-  OS << " generated.\n";
+if (NumWarnings || NumErrors) {
+  OS << " generated";
+  if (getLangOpts().CUDA) {
+if (!getLangOpts().CUDAIsDevice) {
+  OS << " when compiling for host";
+} else {
+  OS << " when compiling for " << getTargetOpts().CPU;
+}
+  }
+  OS << ".\n";
+}
   }
 
   if (getFrontendOpts().ShowStats) {




[PATCH] D37576: [CUDA] Added rudimentary support for CUDA-9 and sm_70.

2017-09-07 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Looks great.


https://reviews.llvm.org/D37576





[PATCH] D37548: [CUDA] When compilation fails, print the compilation mode.

2017-09-07 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL312736: [CUDA] When compilation fails, print the compilation 
mode. (authored by jlebar).

Changed prior to commit:
  https://reviews.llvm.org/D37548?vs=114112&id=114222#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D37548

Files:
  cfe/trunk/lib/Frontend/CompilerInstance.cpp
  cfe/trunk/test/SemaCUDA/error-includes-mode.cu


Index: cfe/trunk/test/SemaCUDA/error-includes-mode.cu
===
--- cfe/trunk/test/SemaCUDA/error-includes-mode.cu
+++ cfe/trunk/test/SemaCUDA/error-includes-mode.cu
@@ -0,0 +1,7 @@
+// RUN: not %clang_cc1 -fsyntax-only %s 2>&1 | FileCheck --check-prefix HOST %s
+// RUN: not %clang_cc1 -triple nvptx-unknown-unknown -target-cpu sm_35 \
+// RUN:   -fcuda-is-device -fsyntax-only %s 2>&1 | FileCheck --check-prefix SM35 %s
+
+// HOST: 1 error generated when compiling for host
+// SM35: 1 error generated when compiling for sm_35
+error;
Index: cfe/trunk/lib/Frontend/CompilerInstance.cpp
===
--- cfe/trunk/lib/Frontend/CompilerInstance.cpp
+++ cfe/trunk/lib/Frontend/CompilerInstance.cpp
@@ -1003,8 +1003,17 @@
   OS << " and ";
 if (NumErrors)
   OS << NumErrors << " error" << (NumErrors == 1 ? "" : "s");
-if (NumWarnings || NumErrors)
-  OS << " generated.\n";
+if (NumWarnings || NumErrors) {
+  OS << " generated";
+  if (getLangOpts().CUDA) {
+if (!getLangOpts().CUDAIsDevice) {
+  OS << " when compiling for host";
+} else {
+  OS << " when compiling for " << getTargetOpts().CPU;
+}
+  }
+  OS << ".\n";
+}
   }
 
   if (getFrontendOpts().ShowStats) {




[PATCH] D37906: [CUDA] Work around a new quirk in CUDA9 headers.

2017-09-15 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

This is a bit of a Chesterton's Fence -- do we know why they're doing this?

I guess it's probably going to be OK because our overriding semantics will make 
it OK, and our test-suite tests (should) exercise all of math.h.  But I'm still 
a little worried about it.


https://reviews.llvm.org/D37906





[PATCH] D37906: [CUDA] Work around a new quirk in CUDA9 headers.

2017-09-15 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

> BTW, this change essentially augments the job that the "#undef GNUC" above 
> used to do in older CUDA versions. CUDA9 replaced GNUC with _GLIBCXX_MATH_H 
> in CUDA-9 in some places.

Ah, that's right.  Okay then.  :)


https://reviews.llvm.org/D37906





[PATCH] D38090: [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.

2017-09-20 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added inline comments.



Comment at: clang/lib/Headers/__clang_cuda_intrinsics.h:161
+#endif // __CUDA_VERSION >= 9000 && (!defined(__CUDA_ARCH__) || __CUDA_ARCH__ >=
+   // 300)
+

Nit, better linebreaking in the comment?



Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:3744
+  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
+[IntrNoMem], "llvm.nvvm.shfl.sync.down.i32">,
+  GCCBuiltin<"__nvvm_shfl_sync_down_i32">;

IntrConvergent?


https://reviews.llvm.org/D38090





[PATCH] D38113: OpenCL: Assume functions are convergent

2017-09-20 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

LGTM for the changes other than the test (I don't read opencl).


https://reviews.llvm.org/D38113





[PATCH] D38147: [CUDA] Fixed order of words in the names of shfl builtins.

2017-09-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

Naturally they're different orders in the PTX and CUDA.  :)


https://reviews.llvm.org/D38147





[PATCH] D38113: OpenCL: Assume functions are convergent

2017-09-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> The problem of adding this attribute conservatively for all functions is that 
> it prevents some optimizations to happen.

function-attrs removes the convergent attribute from anything it can prove does 
not call a convergent function.

I agree this is a nonoptimal solution.  A better way would be to assume that 
any cuda/opencl function is convergent and then figure out what isn't.  This 
would let you generate correct cuda/opencl code in a front-end without worrying 
about this attribute.

One problem with this approach is, suppose you call an external function, whose 
body llvm cannot see.  We need some way to mark this function as 
not-convergent, so that its callers can also be inferred to be not convergent.  
LLVM currently only has a "convergent" attribute.  In the absence of a new 
"not-convergent" attribute, the only way we can tell LLVM that this external 
function is not convergent is to leave off the attribute.  But then this means 
we assume all functions without the convergent attribute are not convergent, 
and thus we have to add the attribute everywhere, as this patch does.

OTOH if we added a not-convergent attribute, we'd have to have rules about what 
happens if both attributes are on a function, and everywhere that checked 
whether a function was convergent would become significantly more complicated.  
I'm not sure that's worthwhile.
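
For concreteness, a small (hypothetical) CUDA-flavored sketch of why the conservative default is needed:

```cuda
// extern_fn's body is invisible to LLVM; with only a `convergent` attribute
// (and no `not-convergent`), the sole way to mark it safe is to omit the
// attribute -- so unknown calls must be assumed convergent, and so must
// their callers.
__device__ void extern_fn();

__device__ int helper(bool p, int v) {
  __syncthreads();   // convergent: all threads of the block must reach it
  extern_fn();       // unknown body => conservatively convergent
  return p ? v : 0;  // sinking the barrier under `p` would be a miscompile
}
```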


https://reviews.llvm.org/D38113





[PATCH] D38113: OpenCL: Assume functions are convergent

2017-09-22 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> Yes, that's why if it would be responsibility of the kernel developer to 
> specify this explicitly we could avoid this complications in the compiler. 
> But if we add it into the language now we still need to support the 
> correctness for the code written with the earlier standards. And also it adds 
> the complexity to the programmer to make sure it's specified correctly. But I 
> think it is still worth discussing with the spec committee.

To me this seems like a small complication in the compiler to avoid an 
extremely easy bug for users to write.  But, not my language.  :)

> The deduction of convergent is indeed tricky. So if there is any function in 
> the CFG path which is marked as convergent ( or "non-convergent") this will 
> have to be back propagated to the callers unless we force to explicitly 
> specify it but it would be too error prone for the kernel writers I guess.

This probably isn't the right forum to discuss proposals to change the LLVM IR 
spec.  But if you want to propose something like this, please cc me on the 
thread, I probably have opinions.  :)

> Btw, what is the advantage of having "non-convergent" instead and why is the 
> deduction of convergent property more complicated with it?

The advantage of switching LLVM IR to non-convergent would be that front-ends 
wouldn't have the bug that arsenm is fixing here.  "Unadorned" IR would be 
correct.  And, in the absence of external or unanalyzable indirect calls, you'd 
get the same performance as we get today even if you had no annotations.

The complexity I was referring to occurs if you add the non-convergent 
attribute and keep the convergent attr.  I don't think we want that.

But I'm not really proposing a change to the convergent attribute in LLVM IR -- 
it's probably better to leave it as-is, since we all understand how it works, 
it ain't broke.


https://reviews.llvm.org/D38113





[PATCH] D56033: [CUDA] Treat extern global variable shadows same as regular extern vars.

2018-12-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/test/CodeGenCUDA/device-stub.cu:51
 
+// external device-side variables with definitiions should generate
+// definitions for the shadows.

definiitions


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56033/new/

https://reviews.llvm.org/D56033





[PATCH] D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter

2019-01-08 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

Without reading the patch in detail (sorry) but looking mainly at the testcase: 
It looks like we're not checking how overloading and `__host__ __device__` 
functions play into this.  Maybe there are some additional edge-cases to 
explore/check.

Just some examples:

Will we DTRT and parse the `bar` call as calling the `__device__` overload of `bar` in
  __host__ void bar() {}
  __device__ int bar() { return 0; }
  __host__ __device__ void foo() { int x = bar(); }
  template <void (*devF)()> __global__ void kernel() { devF(); }
  
  kernel<foo>();

?  Also will we know that we don't have to codegen `foo` for host (so `foo` is 
actually able to do things that only device functions can)?

Another one: How should the following template be instantiated?

  __host__ constexpr int n() { return 0; }
  __device__ constexpr int n() { return 1; }
  template <int x> __global__ void kernel() {}
  
  kernel<n()>

Presumably the call to `n` should be the host one?  That seems correct to me, 
but then it's pretty odd that a function pointer template argument would point 
to a *device* function.  Maybe that's the right thing, but I bet I can come up 
with something weird, like:

  __host__ void bar() {}
  __device__ int bar() { return 0; }
  __device__ auto baz() -> decltype(foo()) {} // which n() does it call?  
Presumably host, but:
  __device__ auto baz() -> decltype(bar()) {}  // does baz return void or int?  
Presumably...the device one, int?

Now mix in templates and sizeof and...yeah.  Rife for opportunities.  :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56411/new/

https://reviews.llvm.org/D56411





[PATCH] D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter

2019-01-08 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

  __host__ void bar() {}
  __device__ int bar() { return 0; }
  __host__ __device__ void foo() { int x = bar(); }
  template <void (*devF)()> __global__ void kernel() { devF(); }
  
  kernel<foo>();



> we DTRT for this case. Here __host__ bar needs to return int since foo() 
> expects that. will add a test for that.

`__host__ bar()` should not need to return int if `foo` is inline (or 
templated), because then we should never codegen `foo` for host.  I guess my 
question is, we should be sure that `kernel()` does not force an 
inline/templated `foo` to be codegen'ed for host.  (Sorry that wasn't more 
clear before.)

> I think n() should be resolved in the containing function context. n itself 
> is not template argument. the result of n() is.

Yes, that's a fair way to think about it.  It just is a bit weird that in this 
context `&n` refers to one function but `n()` refers to another.  Maybe that's 
unavoidable.  :shrug:

  __host__ void bar() {}
  __device__ int bar() { return 0; }
  __device__ auto baz() -> decltype(foo()) {} // which n() does it call?  
Presumably host, but:
  __device__ auto baz() -> decltype(bar()) {}  // does baz return void or int?  
Presumably...the device one, int?
  Now mix in templates and sizeof and...yeah. Rife for opportunities. :)

> I think this example is different from the issue which this patch tries to 
> address.

Agreed.

> Therefore I tend to suggest we keep things as they are, i.e., bar is 
> host/device resolved in its containing function context.

I'm not sure what is the containing function context in these examples, since 
all of the definitions don't have a containing function.

Currently `baz()` returns void, but it sort of seems to me like the decltype 
should morally be executed within a `__device__` context?

Anyway I know much of this is a distraction from your patch.  So long as we 
have `__host__ __device__` tests I'm happy here.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56411/new/

https://reviews.llvm.org/D56411





[PATCH] D45827: [CUDA] Enable CUDA compilation with CUDA-9.2

2018-04-19 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Well that was unusually easy...


https://reviews.llvm.org/D45827





[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-25 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

@rsmith friendly ping on this one.


https://reviews.llvm.org/D48036





[PATCH] D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called

2018-06-25 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> @jlebar, is the change I made to call-host-fn-from-device.cu correct?

I don't think so -- that's a change in overloading behavior afaict.


Repository:
  rC Clang

https://reviews.llvm.org/D47757





[PATCH] D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called

2018-06-25 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

In https://reviews.llvm.org/D47757#1142886, @ahatanak wrote:

> I mean ToT clang (without my patch applied) seems to select the non-sized 
> host version 'T::operator delete(void*)'.


OK, if this is just making an error out of something which previously silently 
didn't work (and should result in a compile error further down the line when we 
try to and can't resolve that function), then this is totally fine.


Repository:
  rC Clang

https://reviews.llvm.org/D47757





[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-29 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> Looks right to me (other than the missing constexpr in C++14 onwards). Though 
> this is subtle enough that I suspect the only way to know for sure is to try 
> it.

Thanks a lot, Richard.  FTR the missing constexpr is in 
https://reviews.llvm.org/D48151.


https://reviews.llvm.org/D48036





[PATCH] D48036: [CUDA] Make min/max shims host+device.

2018-06-29 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL336025: [CUDA] Make min/max shims host+device. (authored by 
jlebar, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D48036?vs=150790&id=153593#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D48036

Files:
  cfe/trunk/lib/Headers/cuda_wrappers/algorithm


Index: cfe/trunk/lib/Headers/cuda_wrappers/algorithm
===
--- cfe/trunk/lib/Headers/cuda_wrappers/algorithm
+++ cfe/trunk/lib/Headers/cuda_wrappers/algorithm
@@ -69,28 +69,28 @@
 
 template <class __T, class __Cmp>
 __attribute__((enable_if(true, "")))
-inline __device__ const __T &
+inline __host__ __device__ const __T &
 max(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__a, __b) ? __b : __a;
 }
 
 template <class __T>
 __attribute__((enable_if(true, "")))
-inline __device__ const __T &
+inline __host__ __device__ const __T &
 max(const __T &__a, const __T &__b) {
   return __a < __b ? __b : __a;
 }
 
 template <class __T, class __Cmp>
 __attribute__((enable_if(true, "")))
-inline __device__ const __T &
+inline __host__ __device__ const __T &
 min(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__b, __a) ? __b : __a;
 }
 
 template <class __T>
 __attribute__((enable_if(true, "")))
-inline __device__ const __T &
+inline __host__ __device__ const __T &
 min(const __T &__a, const __T &__b) {
   return __a < __b ? __a : __b;
 }




[PATCH] D48151: [CUDA] Make __host__/__device__ min/max overloads constexpr in C++14.

2018-06-29 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL336026: [CUDA] Make __host__/__device__ min/max overloads 
constexpr in C++14. (authored by jlebar, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D48151?vs=151248&id=153594#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D48151

Files:
  cfe/trunk/lib/Headers/cuda_wrappers/algorithm


Index: cfe/trunk/lib/Headers/cuda_wrappers/algorithm
===
--- cfe/trunk/lib/Headers/cuda_wrappers/algorithm
+++ cfe/trunk/lib/Headers/cuda_wrappers/algorithm
@@ -67,34 +67,43 @@
 #endif
 #endif
 
+#pragma push_macro("_CPP14_CONSTEXPR")
+#if __cplusplus >= 201402L
+#define _CPP14_CONSTEXPR constexpr
+#else
+#define _CPP14_CONSTEXPR
+#endif
+
 template <class __T, class __Cmp>
 __attribute__((enable_if(true, "")))
-inline __host__ __device__ const __T &
+inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 max(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__a, __b) ? __b : __a;
 }
 
 template <class __T>
 __attribute__((enable_if(true, "")))
-inline __host__ __device__ const __T &
+inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 max(const __T &__a, const __T &__b) {
   return __a < __b ? __b : __a;
 }
 
 template <class __T, class __Cmp>
 __attribute__((enable_if(true, "")))
-inline __host__ __device__ const __T &
+inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 min(const __T &__a, const __T &__b, __Cmp __cmp) {
   return __cmp(__b, __a) ? __b : __a;
 }
 
 template <class __T>
 __attribute__((enable_if(true, "")))
-inline __host__ __device__ const __T &
+inline _CPP14_CONSTEXPR __host__ __device__ const __T &
 min(const __T &__a, const __T &__b) {
   return __a < __b ? __a : __b;
 }
 
+#pragma pop_macro("_CPP14_CONSTEXPR")
+
 #ifdef _LIBCPP_END_NAMESPACE_STD
 _LIBCPP_END_NAMESPACE_STD
 #else




[PATCH] D48152: [CUDA] Add tests that, in C++14 mode, min/max are constexpr.

2018-06-29 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL336030: [CUDA] Add tests that, in C++14 mode, min/max are 
constexpr. (authored by jlebar, committed by ).

Repository:
  rL LLVM

https://reviews.llvm.org/D48152

Files:
  test-suite/trunk/External/CUDA/algorithm.cu


Index: test-suite/trunk/External/CUDA/algorithm.cu
===
--- test-suite/trunk/External/CUDA/algorithm.cu
+++ test-suite/trunk/External/CUDA/algorithm.cu
@@ -42,6 +42,8 @@
   assert(std::minmax(1, 0).second == 1);
+  assert(std::minmax({0, 10, -10, 100}, std::less<int>()).first == -10);
+  assert(std::minmax({0, 10, -10, 100}, std::less<int>()).second == 100);
+  constexpr auto min = std::min(1, 2);
+  constexpr auto max = std::max(1, 2);
 #endif
 }
 
@@ -56,6 +58,8 @@
   assert(std::minmax(1, 0).second == 1);
   assert(std::minmax({0, 10, -10, 100}, std::less<int>()).first == -10);
   assert(std::minmax({0, 10, -10, 100}, std::less<int>()).second == 100);
+  constexpr auto min = std::min(1, 2);
+  constexpr auto max = std::max(1, 2);
 #endif
 }
 




[PATCH] D48037: [CUDA] Add tests to ensure that std::min/max can be called from __host__ __device__ functions.

2018-06-29 Thread Justin Lebar via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL336029: [CUDA] Add tests to ensure that std::min/max can be 
called from __host__… (authored by jlebar, committed by ).

Repository:
  rL LLVM

https://reviews.llvm.org/D48037

Files:
  test-suite/trunk/External/CUDA/algorithm.cu


Index: test-suite/trunk/External/CUDA/algorithm.cu
===
--- test-suite/trunk/External/CUDA/algorithm.cu
+++ test-suite/trunk/External/CUDA/algorithm.cu
@@ -17,10 +17,16 @@
 __device__ void min() {
   assert(std::min(0, 1) == 0);
 }
+__host__ __device__ void min_hd() {
+  assert(std::min(0, 1) == 0);
+}
 
 __device__ void max() {
   assert(std::max(0, 1) == 1);
 }
+__host__ __device__ void max_hd() {
+  assert(std::max(0, 1) == 1);
+}
 
 // Clang has device-side shims implementing std::min and std::max for scalars
 // starting in C++11, but doesn't implement minmax or std::min/max on
@@ -39,10 +45,27 @@
 #endif
 }
 
+// Same tests as cpp14_tests, but from a host-device context.
+__host__ __device__ void cpp14_tests_hd() {
+#if __cplusplus >= 201402L && STDLIB_VERSION >= 2014
+  assert(std::greater<int>()(1, 0));
+  assert(std::min({5, 1, 10}) == 1);
+  assert(std::max({5, 1, 10}, std::less<int>()) == 10);
+
+  assert(std::minmax(1, 0).first == 0);
+  assert(std::minmax(1, 0).second == 1);
+  assert(std::minmax({0, 10, -10, 100}, std::less<int>()).first == -10);
+  assert(std::minmax({0, 10, -10, 100}, std::less<int>()).second == 100);
+#endif
+}
+
 __global__ void kernel() {
   min();
+  min_hd();
   max();
+  max_hd();
   cpp14_tests();
+  cpp14_tests_hd();
 }
 
 int main() {
@@ -52,6 +75,11 @@
 printf("CUDA error %d\n", (int)err);
 return 1;
   }
+
+  min_hd();
+  max_hd();
+  cpp14_tests_hd();
+
   printf("Success!\n");
   return 0;
 }




[PATCH] D83893: [CUDA][HIP] Always defer diagnostics for wrong-sided reference

2020-07-15 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

tra and I talked offline and I...think this makes sense.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83893/new/

https://reviews.llvm.org/D83893





[PATCH] D85236: [CUDA] Work around a bug in rint() caused by a broken implementation provided by CUDA.

2020-08-04 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

LGTM, and can we write a test in the test-suite?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85236/new/

https://reviews.llvm.org/D85236



[PATCH] D40453: Add the nvidia-cuda-toolkit Debian package path to search path

2017-11-28 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

I defer to tra on this.


https://reviews.llvm.org/D40453





[PATCH] D40673: Add _Float128 as alias to __float128 to enable compilations on Fedora27/glibc2-26

2017-11-30 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

LGTM for the CUDA test changes.


https://reviews.llvm.org/D40673





[PATCH] D121259: [clang] Fix CodeGenAction for LLVM IR MemBuffers

2022-03-08 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Congrats on your first patch!

Can Daniele or Shangwu land this for you, or do you need me to?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121259/new/

https://reviews.llvm.org/D121259



[PATCH] D119207: [CUDA][SPIRV] Convert CUDA kernels to SPIR-V kernels

2022-02-07 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> [CUDA][SPIRV] Convert CUDA kernels to SPIR-V kernels

Rephrase this?  This patch is about kernel *arguments*, right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119207/new/

https://reviews.llvm.org/D119207



[PATCH] D119207: [CUDA][SPIRV] Convert CUDA kernels to SPIR-V kernels

2022-02-08 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added inline comments.



Comment at: clang/lib/CodeGen/TargetInfo.cpp:10322
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+  if (getContext().getLangOpts().CUDAIsDevice) {
 // Coerce pointer arguments with default address space to CrossWorkGroup

I am surprised by this change.  Is the language mode HIP only when compiling 
for device?  Or are you intentionally changing the behavior in HIP mode?

Same in SPIR.h


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119207/new/

https://reviews.llvm.org/D119207



[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-17 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

In D119207#3327476, @shangwuyao wrote:

> Thanks for the review, if it looks good, can we get this to land now? 
> Otherwise more comments are welcome!

I'll land this for you!

At some point you should get commit access yourself, Shangwu.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119207/new/

https://reviews.llvm.org/D119207



[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-17 Thread Justin Lebar via Phabricator via cfe-commits
This revision was not accepted when it landed; it landed in state "Needs 
Review".
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9de4fc0f2d3b: [CUDA][SPIRV] Assign global address space to 
CUDA kernel arguments (authored by shangwuyao, committed by jlebar).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119207/new/

https://reviews.llvm.org/D119207

Files:
  clang/lib/Basic/Targets/SPIR.h
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDASPIRV/kernel-argument.cu


Index: clang/test/CodeGenCUDASPIRV/kernel-argument.cu
===
--- /dev/null
+++ clang/test/CodeGenCUDASPIRV/kernel-argument.cu
@@ -0,0 +1,17 @@
+// Tests CUDA kernel arguments get global address space when targeting SPIR-V.
+
+// REQUIRES: clang-driver
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv32 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// RUN: %clang -emit-llvm --cuda-device-only --offload=spirv64 \
+// RUN:   -nocudalib -nocudainc %s -o %t.bc -c 2>&1
+// RUN: llvm-dis %t.bc -o %t.ll
+// RUN: FileCheck %s --input-file=%t.ll
+
+// CHECK: define spir_kernel void @_Z6kernelPi(i32 addrspace(1)* noundef %output.coerce)
+
+__attribute__((global)) void kernel(int* output) { *output = 1; }
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -10320,10 +10320,10 @@
 }
 
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
-  if (getContext().getLangOpts().HIP) {
+  if (getContext().getLangOpts().CUDAIsDevice) {
 // Coerce pointer arguments with default address space to CrossWorkGroup
-// pointers for HIPSPV. When the language mode is HIP, the SPIRTargetInfo
-// maps cuda_device to SPIR-V's CrossWorkGroup address space.
+// pointers for HIPSPV/CUDASPV. When the language mode is HIP/CUDA, the
+// SPIRTargetInfo maps cuda_device to SPIR-V's CrossWorkGroup address space.
 llvm::Type *LTy = CGT.ConvertType(Ty);
 auto DefaultAS = getContext().getTargetAddressSpace(LangAS::Default);
 auto GlobalAS = getContext().getTargetAddressSpace(LangAS::cuda_device);
Index: clang/lib/Basic/Targets/SPIR.h
===
--- clang/lib/Basic/Targets/SPIR.h
+++ clang/lib/Basic/Targets/SPIR.h
@@ -144,16 +144,16 @@
 // FIXME: SYCL specification considers unannotated pointers and references
 // to be pointing to the generic address space. See section 5.9.3 of
 // SYCL 2020 specification.
-// Currently, there is no way of representing SYCL's and HIP's default
+// Currently, there is no way of representing SYCL's and HIP/CUDA's default
 // address space language semantic along with the semantics of embedded C's
 // default address space in the same address space map. Hence the map needs
 // to be reset to allow mapping to the desired value of 'Default' entry for
-// SYCL and HIP.
+// SYCL and HIP/CUDA.
 setAddressSpaceMap(
 /*DefaultIsGeneric=*/Opts.SYCLIsDevice ||
-// The address mapping from HIP language for device code is only defined
-// for SPIR-V.
-(getTriple().isSPIRV() && Opts.HIP && Opts.CUDAIsDevice));
+// The address mapping from HIP/CUDA language for device code is only
+// defined for SPIR-V.
+(getTriple().isSPIRV() && Opts.CUDAIsDevice));
   }
 
   void setSupportedOpenCLOpts() override {



[PATCH] D119207: [CUDA][SPIRV] Assign global address space to CUDA kernel arguments

2022-02-17 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

  commit 9de4fc0f2d3b60542956f7e5254951d049edeb1f (HEAD -> main, origin/main, 
origin/HEAD)
  Author: Shangwu Yao 
  Date:   Thu Feb 17 09:38:06 2022 -0800
  
  [CUDA][SPIRV] Assign global address space to CUDA kernel arguments
  
  This patch converts CUDA pointer kernel arguments with default address 
space to
  CrossWorkGroup address space (__global in OpenCL). This is because 
Generic or
  Function (OpenCL's private) is not supported as storage class for kernel 
pointer types.
  
  Differential Revision: https://reviews.llvm.org/D119207


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119207/new/

https://reviews.llvm.org/D119207



[PATCH] D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars.

2020-09-24 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

I know it comes in a separate change, but can we add a check to the test-suite?




Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:381
+__device__ inline __cuda_builtin_threadIdx_t::operator dim3() const {
+  return {x, y, z};
+}

This is a C++11-ism (right?).  Do we support compiling without C++11?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88250/new/

https://reviews.llvm.org/D88250



[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-09-25 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

wha... As you know, `const` doesn't mean anything, that can be const-casted 
away.  And then you'll be able to observe that this nominally-static variable 
is just a normal variable.

Since this doesn't make sense and contradicts their documentation, I'm tempted 
to say this should only apply to the nvidia headers.  Is that technically 
possible?  And then we file a bug against nvidia and/or ask Bryce?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88345/new/

https://reviews.llvm.org/D88345



[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-09-28 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

OK, backing up, what are the semantics of `static` on `__constant__`, 
`__device__`, and `__shared__`?

- My understanding is that `__shared__` behaves the same whether or not it's 
static.  It's not equivalent to `namespace a { __shared__ int c = 4; }`, 
because that's illegal.
- Does `__constant__` behave the same whether or not it's static?  A static 
`__constant__` is equivalent to `namespace a { __constant__ int c = 4; }`, and 
a non-static `__constant__` is *also* equivalent to that?
- And `__device__` does not behave the same whether or not it's static?  In 
function scope `__device__ int x;` is a variable local to the thread.  Whereas 
in global scope `__device__ int x;` is a global variable that lives in device 
memory (?).  In function scope `static __device__ int x;` is equivalent to 
`static int x;` which is equivalent to `int x;` in namespace scope?

Should we mandate that you initialize `static __constant__` variables in 
function scope?  That is, if you write `static __constant__ int x;` in a 
function, then x is always uninitialized (right)?  You should do `static 
__constant__ int x = 42;`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88345/new/

https://reviews.llvm.org/D88345

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-09-28 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

OK, now I'm starting to understand this change...

Before, in function scope, we allow static const/non-const `__shared__`, and 
allow static const so long as it's not `__device__` or `__constant__`.

- `static` -> error?  (I understood us saying above that it is, but now that I 
read the code, isn't it saying it's an error?)
- `static const` -> allowed
- `static __device__` -> error
- `static const __device__` -> error
- `static __constant__` -> error
- `static const __constant__` -> error

After, in function scope, the rule is, allow static const/non-const 
`__shared__` or anything that's `static const`.

- `static` -> error, must be const
- `static const` -> allowed
- `static __device__` -> error, must be const
- `static const __device__` -> allowed
- `static __constant__` -> error, must be const
- `static const __constant__` -> allowed

I guess my question when I write out this table is, why shouldn't it be like 
this?

- `static` -> allowed
- `static const` -> allowed
- `static __device__` -> allowed
- `const static __device__` -> allowed
- `static __constant__` -> error, must be const
- `const static __constant__` -> allowed

This makes some sense to me because we're saying, "`__constant__` must be 
const", otherwise, anything goes.

Or here's another way of thinking about it.  You're saying that `static` and 
`static __device__` in function scope are the same as a `__device__` variable 
in block scope.  And a `__device__` variable in block scope doesn't have to be 
const (right?).  So why the extra restriction on function-scope static?




Comment at: clang/lib/Sema/SemaDecl.cpp:13161
   // without device memory qualifier is implemented, therefore only static
   // const variable without device memory qualifier is allowed.
   [&]() {

Update comment?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88345/new/

https://reviews.llvm.org/D88345



[PATCH] D90409: [HIP] Math Headers to use type promotion

2020-11-03 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> LGTM. I think the change would make sense for CUDA, too. @jlebar - WDYT?

I agree that the C and C++ standard libraries should behave the same in CUDA 
mode and host mode!

But if doing so would make our behavior different than nvcc's, maybe we could 
emit a warning or something?  Like, "this code you wrote maybe for nvcc is 
going to do something different with clang."


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90409/new/

https://reviews.llvm.org/D90409



[PATCH] D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters.

2020-11-17 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

I am legit excited about this if we could figure out how to make it work, but I 
don't have anything to add beyond what tra said.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91590/new/

https://reviews.llvm.org/D91590



[PATCH] D91807: [CUDA] Unbreak CUDA compilation with -std=c++20

2020-11-19 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

How fun.  :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91807/new/

https://reviews.llvm.org/D91807



[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-09-28 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> It should. I did mention in a previous comment that "Looks like the
> const-ness check should not be there, either. I need to revise the patch."

Heh, okay.  Sorry I missed that, somehow this patch was confusing to me.

> Except that NVCC allows non-const __constant__, too. Generally speaking, C++ 
> does not care about the attributes. While technically __constant__ is not 
> changeable from the device code, not specifying const is a missed 
> optimization/diagnostic opportunity, but not an error per se. It does not 
> affect how the variable is emitted, but rather what user can do with it and 
> that's beyond the scope of this patch. I don't think it warrants a hard 
> error. A warning, perhaps, that non-const __constant__ is probably an error?

Sure, that makes sense to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88345/new/

https://reviews.llvm.org/D88345



[PATCH] D88668: [CUDA] Add support for 11.1

2020-10-01 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> It looks like 11.1 doesn't have a version.txt file

Yikes, this is a problem if we can't tell the difference between CUDA versions!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88668/new/

https://reviews.llvm.org/D88668



[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-10-02 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

Hey, I'm leaving on a vacation tomorrow and didn't have a chance to get to
this review today.

Is that ok?  I'm not bringing my work laptop, but I could look at it on my
personal laptop.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88345/new/

https://reviews.llvm.org/D88345



[PATCH] D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables.

2020-10-13 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added inline comments.
This revision is now accepted and ready to land.



Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8163
 "%select{__device__|__global__|__host__|__host__ __device__}0 functions">;
-def err_cuda_nonglobal_constant : Error<"__constant__ variables must be global">;
+def err_cuda_nonstatic_constdev: Error<"__constant__ and __device__ are not allowed on non-static local variables">;
 def err_cuda_ovl_target : Error<

`__device__` is not allowed on non-static function-scope variables?

This appears to be more restrictive than we were before.  I want to check, are 
we OK with the possibility that this will break user code?  
https://gcc.godbolt.org/z/Y85GKe works with clang, though not with nvcc.

I notice that we even allow `__device__ int x;` in `__host__ __device__` 
functions, which is...questionable.  :)  https://gcc.godbolt.org/z/GjjMGx

I'm OK matching the nvcc behavior here and accepting user breakage so long as 
we're intentional about it.  Possibly should be called out in relnotes?



Comment at: clang/lib/Sema/SemaDeclAttr.cpp:4353
  const auto *VD = cast<VarDecl>(D);
-  if (!VD->hasGlobalStorage()) {
-S.Diag(AL.getLoc(), diag::err_cuda_nonglobal_constant);
+  if (VD->hasLocalStorage()) {
+S.Diag(AL.getLoc(), diag::err_cuda_nonstatic_constdev);

So just to check, in our new world, `__constant__` variables don't have to be 
const.  That matches nvcc, fine.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88345/new/

https://reviews.llvm.org/D88345



[PATCH] D89832: [CUDA] Extract CUDA version from cuda.h if version.txt is not found

2020-10-22 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.

LGTM modulo emankov's comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89832/new/

https://reviews.llvm.org/D89832



[PATCH] D95754: [clang] Print 32 candidates on the first failure, with -fshow-overloads=best.

2021-01-30 Thread Justin Lebar via Phabricator via cfe-commits
jlebar created this revision.
jlebar requested review of this revision.
Herald added a project: clang.

Previously, -fshow-overloads=best always showed 4 candidates.  The
problem is, when this isn't enough, you're kind of up a creek; the only
option available is to recompile with different flags.  This can be
quite expensive!

With this change, we try to strike a compromise.  The *first* error with
more than 4 candidates will show up to 32 candidates.  All further
errors continue to show only 4 candidates.

The hope is that this way, users will have *some chance* of making
forward progress, without facing unbounded amounts of error spam.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D95754

Files:
  clang/include/clang/Basic/Diagnostic.h
  clang/lib/Sema/Sema.cpp
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCXX/ambiguous-conversion-show-overload.cpp
  clang/test/SemaCXX/overloaded-builtin-operators.cpp

Index: clang/test/SemaCXX/overloaded-builtin-operators.cpp
===
--- clang/test/SemaCXX/overloaded-builtin-operators.cpp
+++ clang/test/SemaCXX/overloaded-builtin-operators.cpp
@@ -195,8 +195,7 @@
 
 void test_dr425(A a) {
   (void)(1.0f * a); // expected-error{{ambiguous}} \
-// expected-note 4{{candidate}} \
-// expected-note {{remaining 8 candidates omitted; pass -fshow-overloads=all to show them}}
+// expected-note 12{{candidate}}
 }
 
 // pr5432
Index: clang/test/SemaCXX/ambiguous-conversion-show-overload.cpp
===
--- clang/test/SemaCXX/ambiguous-conversion-show-overload.cpp
+++ clang/test/SemaCXX/ambiguous-conversion-show-overload.cpp
@@ -10,9 +10,20 @@
   S(signed int*);
 };
 void f(const S& s);
-void g() {
-  f(0);
-}
+
+// First call to f emits all candidates.  Second call emits just the first 4.
+void g() { f(0); }
+// CHECK: {{conversion from 'int' to 'const S' is ambiguous}}
+// CHECK-NEXT: {{candidate constructor}}
+// CHECK-NEXT: {{candidate constructor}}
+// CHECK-NEXT: {{candidate constructor}}
+// CHECK-NEXT: {{candidate constructor}}
+// CHECK-NEXT: {{candidate constructor}}
+// CHECK-NEXT: {{candidate constructor}}
+// CHECK-NEXT: {{candidate constructor}}
+// CHECK-NEXT: {{candidate constructor}}
+
+void h() { f(0); }
 // CHECK: {{conversion from 'int' to 'const S' is ambiguous}}
 // CHECK-NEXT: {{candidate constructor}}
 // CHECK-NEXT: {{candidate constructor}}
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -10354,18 +10354,15 @@
  const PartialDiagnostic &PDiag) const {
   S.Diag(CaretLoc, PDiag)
 << Ambiguous.getFromType() << Ambiguous.getToType();
-  // FIXME: The note limiting machinery is borrowed from
-  // OverloadCandidateSet::NoteCandidates; there's an opportunity for
-  // refactoring here.
-  const OverloadsShown ShowOverloads = S.Diags.getShowOverloads();
   unsigned CandsShown = 0;
   AmbiguousConversionSequence::const_iterator I, E;
   for (I = Ambiguous.begin(), E = Ambiguous.end(); I != E; ++I) {
-if (CandsShown >= 4 && ShowOverloads == Ovl_Best)
+if (CandsShown >= S.Diags.getNumOverloadCandidatesToShow())
   break;
 ++CandsShown;
 S.NoteOverloadCandidate(I->first, I->second);
   }
+  S.Diags.noteNumOverloadCandidatesShown(CandsShown);
   if (I != E)
 S.Diag(SourceLocation(), diag::note_ovl_too_many_candidates) << int(E - I);
 }
@@ -11643,7 +11640,7 @@
 (Cand.Function->template hasAttr<CUDADeviceAttr>() &&
  Cand.Function->template hasAttr<CUDAHostAttr>());
 });
-DeferHint = WrongSidedCands.size();
+DeferHint = !WrongSidedCands.empty();
   }
   return DeferHint;
 }
@@ -11676,10 +11673,8 @@
   for (; I != E; ++I) {
 OverloadCandidate *Cand = *I;
 
-// Set an arbitrary limit on the number of candidate functions we'll spam
-// the user with.  FIXME: This limit should depend on details of the
-// candidate list.
-if (CandsShown >= 4 && ShowOverloads == Ovl_Best) {
+if (CandsShown >= S.Diags.getNumOverloadCandidatesToShow() &&
+ShowOverloads == Ovl_Best) {
   break;
 }
 ++CandsShown;
@@ -11708,6 +11703,10 @@
 }
   }
 
+  // Inform S.Diags that we've shown an overload set with N elements.  This may
+  // inform the future value of S.Diags.getNumOverloadCandidatesToShow().
+  S.Diags.noteNumOverloadCandidatesShown(CandsShown);
+
   if (I != E)
 S.Diag(OpLoc, diag::note_ovl_too_many_candidates,
shouldDeferDiags(S, Args, OpLoc))
Index: clang/lib/Sema/Sema.cpp
===
--- clang/lib/Sema/Sema.cpp
+++ clang/lib/Sema/Sema.cpp
@@ -2291,9 +2291,7 @@
   int SuppressedOverloads = 0;
   for (UnresolvedSetImpl::iterator It = Overloads.begin(),
  

[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-09-21 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> One alternative would be to use run-time dispatch, but, given that texture 
> lookup is a single instruction, the overhead would be 
> substantial-to-prohibitive.

I guess I'm confused...  Is the parameter value that we're "overloading" on 
usually/always a constant?  In that case, there's no overhead with runtime 
dispatch.  Or, is it not a constant?  In which case, how does nvcc generate a 
single instruction for this idiom at all?

But then I see `switch` statements in the code, so now I'm extra confused.  :)

Overall, I am unsure of why we need all of this magic.  We can rely on LLVM to 
optimize away constant integer comparisons, and also even comparisons between 
string literals.

What specifically would be inefficient if this were a series of "real" 
overloaded functions, with none of the macros, templates, or builtins?  
(Assuming efficiency is the concern here?)




Comment at: clang/lib/AST/ExprConstant.cpp:11097
 
+static int EvaluateTextureOp(const CallExpr *E) {
+  // Sorted list of known operations stuuported by '__nv_tex_surf_handler'

Write a comment explaining what this function does?

(It seems to...translate a string into an integer?  If so, to me, it's strange 
that it uses a sorted list for this because...what if I add another function?  
Won't that mess up all the numbers?  Anyway, to be clarified in the comment.)

Now that I read more, I see that you don't care about this being a stable 
mapping etc etc...

I don't really get why this has to be a builtin at all, though.  If it's always 
a string literal, a simple strcmp will do the job, LLVM can optimize this?  And 
I'm almost sure you can assert that the char* is always a string literal, so 
you can guarantee that it's always optimized away.



Comment at: clang/lib/AST/ExprConstant.cpp:11098
+static int EvaluateTextureOp(const CallExpr *E) {
+  // Sorted list of known operations stuuported by '__nv_tex_surf_handler'
+  static constexpr StringRef TextureOps[] = {"__isurf1DLayeredread",

stuuported



Comment at: clang/lib/AST/ExprConstant.cpp:11209
+  const StringLiteral *S =
  dyn_cast<StringLiteral>(E->getArg(0)->IgnoreParenCasts());
+  auto I = llvm::lower_bound(TextureOps, S->getString());

how do we know the arg is a string constant?  Looking at the builtin def it 
doesn't seem that we enforce it there.



Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:12
+#ifndef __CUDA__
+#error "This file is for CUDA __compilation only."
+#endif

is `__compilation` intentional?  (Maybe search-and-replace bug?)



Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:41
+
+namespace {
+

what are you trying to accomplish with an anon ns inside a header?



Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:41
+
+namespace {
+

jlebar wrote:
> what are you trying to accomplish with an anon ns inside a header?
I know you wrote it in the commit message, but this file could really use 
comments, otherwise I'm afraid you are going to be the only human being on the 
planet who can edit this...

For starters, it seems that the purpose of this file is to define the 
__nv_tex_surf_handler "function" -- is that right?



Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:57-58
+template <> struct __FT {
+  using __bt = float;
+  using __ft = float4;
+};

I have no idea what bt and ft are supposed to stand for.  "fetch type" and ...? 
 But __FT stands for "fundamental type" per the comment?

Oh, I found it later, "base type".

I'm all for brevity, but would `__base_ty` and `__fetch_ty` be too long?



Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:90
+// Derived base/fetch types for N-element vectors.
+template  struct __FT {
+  using __bt = decltype(__T::x);

There are only a limited number of these.  Could we assert that __T is one of 
the expected vector types, just for readability and maybe to help the next 
person who tries to edit this?



Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:91
+template  struct __FT {
+  using __bt = decltype(__T::x);
+  using __ft = typename __FT<__bt>::__ft;

this is c++11-only.  Which, you know what, fine by me.  But might be worth an 
explicit #error at least?



Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:92
+  using __bt = decltype(__T::x);
+  using __ft = typename __FT<__bt>::__ft;
+};

I think this is also C++11 syntax


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089


[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-09-22 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

> Depending on which particular operation is used, the arguments vary, too.

So something like

  T __nv_tex_surf_handler(name, arg1) {
switch (name) {
  ...
  default:
panic();
}
  }
  
  T __nv_tex_surf_handler(name, arg1, arg2) {
switch(...) { ... }
  }

and so on?

> I'd need to push strcmp-based runtime dispatch down to the implementation of 
> the texture lookups with the same operand signature.

Agree.

> That's harder to generalize, as I'd have to implement string-based dispatch 
> for quite a few subsets of the operations -- basically for each variant of 
> cartesian product of {dimensionality, Lod, Level, Sparse}.



> Another downside is that the string comparison code will result in functions 
> being much larger than necessary. Probably not a big thing overall, but why 
> add overhead that would be paid for by all users and which does not buy us 
> anything?

If it didn't buy us anything, I'd agree.  The thing I'm concerned about is 
readability of this code.  Which, if we want to tie it back to users, affects 
our ability to catch bugs in this implementation.

> Having one trivial compiler builtin that simplifies things a lot is a better 
> trade-off, IMO.

Ah, maybe I wasn't clear then.  I'm not actually super-concerned with the 
compiler builtin.  It'd be nice to get rid of it if there's a clean way to do 
so, but if we don't, that's ok.  Basically, the builtin is just for changing 
`strcmp(x, "foo")` into `builtin(x) == builtin("foo")`.  Fine.

What I'm more concerned with is the spaghetti of macros here to do something as 
simple as a series of overloaded functions.  It seems like a premature 
optimization, and I don't feel confident I can check it for bugs.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089



[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-09-24 Thread Justin Lebar via Phabricator via cfe-commits
jlebar accepted this revision.
jlebar added a comment.

Okay, I give up on the phab interface.  It's unreadable with all the existing
comments and lint errors.

Hope you don't mind comments this way.  I'm just going to put it all in a giant
code block so it doesn't get wrapped or whatever.

  +// __nv_tex_surf_handler() provided by this header as a macro.
  +#define __nv_tex_surf_handler(__op, __ptr, ...)                              \
  +  __cuda_tex::__tex_fetch<__cuda_tex::__Tag<__cuda_tex::__tex_op_hash(__op)>>( \
  +  __ptr, __VA_ARGS__)
  
  ::__cuda_tex
  
  +// Put all functions into anonymous namespace so they have internal linkage.
  
  Say a little more?  Specifically, you want anon ns because this is device code
  and it has to work even without being linked.
  
  (Also, are you sure that plain `inline` doesn't do the right thing?  Like, we
  have lots of CUDA headers that are `inline`'ed without all being in an anon
  ns.)
  
  +// First, we need a perfect hash function and a few constexpr helper functions
  +// for converting a string literal into a numeric value which can be used to
  +// parametrize a template. We can not use string literals for that as that would
  +// require C++20.
  +//
  +// The hash function was generated with 'gperf' and then manually converted into
  +// its constexpr equivalent.
  +//
  +// NOTE: the perfect hashing scheme comes with inherent self-test. If the hash
  +// function has a collision for any of the texture operations, the compilation
  +// will fail due to an attempt to redefine a tag with the same value. If the
  +// header compiles, then the hash function is good enough for the job.

  I guess if it has a self-test then that's fine.  Though is this really better
  than a series of `if` statements with strcmp?  I guess I am scared of this kind
  of thing because I did it once in ccache.  I thought I was very clever and got
  a good speedup.  1 year later I found out I'd broken handling of __DATE__ and
  __TIME__.  o.O




Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:26
+#define __nv_tex_surf_handler(__op, __ptr, ...)                               \
+  __cuda_tex::__tex_fetch<__cuda_tex::__Tag<__cuda_tex::__tex_op_hash(__op)>>( \
+  __ptr, __VA_ARGS__)

`::__cuda_tex` (appears twice)



Comment at: clang/lib/Headers/__clang_cuda_texture_intrinsics.h:53
+
+// Put all functions into anonymous namespace so they have internal linkage.
+namespace {

Write a little more?  This looks super-suspicious, but you need it specifically 
because these are *device* functions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089



[PATCH] D110089: [CUDA] Implement experimental support for texture lookups.

2021-09-24 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

Presumably as a separate commit we should add tests to the test_suite 
repository to ensure that this at least still compiles with different versions 
of CUDA?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089



[PATCH] D95754: [clang] Print 32 candidates on the first failure, with -fshow-overloads=best.

2021-02-13 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

Not sure who can review this, but looking through blame it seems like maybe 
@aaronpuchert?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95754/new/

https://reviews.llvm.org/D95754



  1   2   3   >