[PATCH] D62046: [OpenMP][bugfix] Add missing math functions variants for log and abs.

2019-08-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I'm not sure about this diff. I think it's breaking  and . 
Raised bug https://bugs.llvm.org/show_bug.cgi?id=42972


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62046/new/

https://reviews.llvm.org/D62046



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I'm not sure copying the crtbegin/crtend mechanism from the early days of C 
runtime is ideal. Since the data is stored in a common section anyway, please 
could we rename it to __omp_offloading_entries in which case the linker will 
provide start/end symbols automatically? That removes the two object files and 
the link order dependency which is a hazard to bitcode libraries.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

> OpenMP linker script is known to cause problems for gold and lld linkers on 
> Linux and it will also cause problems for Windows enabling in future

What are the known problems with the linker script? I'm wondering if they can 
be resolved without the overhead of introducing a new tool.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D64943#173 , @ABataev wrote:

> In D64943#158 , @JonChesterfield 
> wrote:
>
> > > OpenMP linker script is known to cause problems for gold and lld linkers 
> > > on Linux and it will also cause problems for Windows enabling in future
> >
> > What are the known problems with the linker script? I'm wondering if they 
> > can be resolved without the overhead of introducing a new tool.
>
>
> They just do not support linker script. And, thus, cannot be used for 
> offloading. Only `ld` supports it.


In what respect? I've used linker scripts with both gold and lld, and both 
instances of --help text claim to support them. In the case of lld, a very 
complicated script hit a few internal errors, but I believe they've all been 
fixed since.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D64943#179 , @ABataev wrote:

> In D64943#178 , @JonChesterfield 
> wrote:
>
> > In D64943#173 , @ABataev wrote:
> >
> > > In D64943#158 , 
> > > @JonChesterfield wrote:
> > >
> > > > > OpenMP linker script is known to cause problems for gold and lld 
> > > > > linkers on Linux and it will also cause problems for Windows enabling 
> > > > > in future
> > > >
> > > > What are the known problems with the linker script? I'm wondering if 
> > > > they can be resolved without the overhead of introducing a new tool.
> > >
> > >
> > > They just do not support linker script. And, thus, cannot be used for 
> > > offloading. Only `ld` supports it.
> >
> >
> > In what respect? I've used linker scripts with both gold and lld, and both 
> > instances of --help text claim to support them. In the case of lld, a very 
> > complicated script hit a few internal errors, but I believe they've all 
> > been fixed since.
>
>
> Hmm, I tried it with gold some time ago and it just did not work for me. The 
> linking failed with diagnostics that some of the commands in the script are 
> unknown.


The problem turns out to be the 'insert before' statement. ld and lld support 
it, gold does not. According to 
https://bugzilla.redhat.com/show_bug.cgi?id=927573, the recommended workaround 
is essentially that implemented in this differential.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D64943#1666849 , @sdmitriev wrote:

> In D64943#136 , @JonChesterfield 
> wrote:
>
> > I'm not sure copying the crtbegin/crtend mechanism from the early days of C 
> > runtime is ideal. Since the data is stored in a common section anyway, 
> > please could we rename it to __omp_offloading_entries in which case the 
> > linker will provide start/end symbols automatically?
>
>
> Well, I never said that it is an ideal solution, but it is a known mechanism 
> that works well in many cases and can also be reused for the offloading entry 
> table.
>  I do not fully understand your suggestion for renaming entries section, how 
> it will help with providing start/end symbols for the entries. Can you please 
> provide more details?


Given a custom elf section with a C identifier as a name, the linker will 
provide definitions of `__start_name`/`__stop_name` to satisfy unresolved 
symbols. I don't believe this occurs if the section name is not a C identifier, 
e.g. contains a period. So unless I've misinterpreted the purpose of the two 
object files, they can be removed in exchange for renaming the section.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

> Hm, I was not aware of this Linux linker feature, thanks a lot for the 
> explanation! I see only one problem with using it as a replacement for the 
> begin/end objects – it looks like `__start_name`/`__stop_name` symbols are 
> created with `default` visibility instead of `hidden`. I guess it will cause 
> problems for offload programs that use shared libraries because DSO’s 
> `__start_name`/`__stop_name` symbols will be preempted by the executable’s 
> symbols and that is not what we want. Is there any way to change this 
> behavior?

Declaring the symbol as `__attribute__((__visibility__("hidden")))` just works 
as far as I can tell. The linker still provides the right definition, objdump 
says it's hidden.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I'm on board with getting rid of the linker script. Gold's limited support for 
that seems conclusive.

I believe the current script does two things:
1/ takes a binary and embeds it in a section named 
.omp_offloading.amdgcn-amd-amdhsa
2/ provides start/end symbols for that section and for .omp_offloading.entries.

2/ is discussed above.
1/ can be implemented as a call to (llvm-)objcopy

> If binary is used as the value for --input-target, the input file will be 
> embedded as a data section in an ELF relocatable object, with symbols 
> _binary__start, _binary__end, and 
> _binary__size representing the start, end and size of the data, 
> where  is the path of the input file as specified on the command 
> line with non-alphanumeric characters converted to _.

I think dropping the linker script means that cmake will need to invoke an 
extra executable. As far as I can see, that tool can be objcopy instead of 
clang-offload-wrapper.

Does this diff mix getting rid of the linker script with other changes? E.g. it 
looks like the metadata generation is moving from clang to the new tool, but 
that seems orthogonal to dropping the linker script.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

>> Does this diff mix getting rid of the linker script with other changes? E.g. 
>> it looks like the metadata generation is moving from clang to the new tool, 
>> but that seems orthogonal to dropping the linker script.
> 
> Metadata is still generated by the clang, there are no changes in this area. 
> What is moving to a wrapper tool is the generation of the offload 
> registration code. Let me just attach the slides that I presented on the 
> inter company meeting were the proposal was discussed. It'll probably answer 
> most of your questions. F9983224: openmp_linker_script.pdf 
> 

It does indeed, thanks. I see the motivation for delaying offload registration 
code. I'm pretty sure that is indeed orthogonal to removing the linker script.

How would you feel about using objcopy to embed the device binary?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

> I see some problems with using llvm-objcopy for that. First issue is that 
> symbols created by llvm-objcopy for embedded data depend on the input file 
> name. As you know these symbols are referenced from the offload registration 
> code that is currently added to an object by the clang at compile time. I not 
> sure how you can guarantee that symbol names will match.

That seems solvable by renaming the input file / passing a string to clang.

> And another, more important problem is that it won't work on Windows because 
> llvm-objcopy produces ELF object according to the description.

objcopy works with coff in the meantime, and we already need a bunch of unix 
tools to build llvm on windows.

> Anyway I am going to change entries section name to "omp_offloading_entries", 
> remove omptargetbegin.o/omptargetend.o and upload the revised patch.

Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-25 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D64943#1682452 , @hfinkel wrote:

> This LGTM. I'm happy that this is a design improvement over the current 
> scheme. @JonChesterfield , @ABataev , any further comments?


This patch mixes two concerns. 
1/ Remove the linker script
2/ Change generation of offload registration code

These should be separate patches. I think the linker script removal would then 
be uncontentious.

It'll be easier to consider the offload registration changes without the linker 
script changes. That's a more complicated design space. In particular, this 
change is motivated by supporting additional platforms, and I don't see how 
offload registration is related to that.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script

2019-09-25 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

The three way split looks great, thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64943/new/

https://reviews.llvm.org/D64943



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D68166: [Clang][OpenMP Offload] Add new tool for wrapping offload device binaries

2019-10-02 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I still don't understand what advantage the standalone tool has over renaming 
the file to `omp_offloading` and then using `objcopy -I binary`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68166/new/

https://reviews.llvm.org/D68166



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D68166: [Clang][OpenMP Offload] Add new tool for wrapping offload device binaries

2019-10-02 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I think this patch is a behaviour change. Currently, the target binary is 
embedded in the host binary at link time. With this change, the contents of the 
binary are embedded in bitcode which is subsequently fed into the link. If 
indeed so, that seems strictly better - code in the host that cares about the 
size of the bitcode now has it available at opt time, instead of at link time. 
The target specific nastiness objcopy would introduce is neatly sidestepped.

This change takes N binaries (that I think need to be for different triples, or 
the loop doesn't work) and puts them in separate section-annotated bitcode 
arrays. Equivalent behaviour would result from calling the tool once per binary 
and passing the N results onward, e.g. to llvm-link.

The functionality of 'take a binary and embed it in bitcode as a const array' 
is likely to be useful outside of openmp. I've done similar things in the past 
in non-portable fashion. Aside from the section and symbol names, I don't think 
there's anything specific to openmp in the tool.

How would you feel about simplifying the tool to work on one file at a time, 
with an interface that takes the host target (could default to whatever is 
running the tool) and a string for section name, which generates some bitcode 
containing that file as a const array plus start/end symbols derived from the 
section name? The change would involve deleting the multiple file handling and 
renaming OffloadTargets to SectionName or similar.

clang-offload-wrapper than becomes binary-to-bitcode-embedder (or better, names 
are hard), with the intent that projects outside of the openmp target offload 
compiler could use it.




Comment at: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp:84
+  void createImages(ArrayRef Binaries) {
+for (const BinaryDesc &Bin : Binaries) {
+  StringRef SectionName = SS.save(".omp_offloading." + Bin.second);

I don't think this works for multiple binaries with the same target triple. 
They'll all be put in the same section and there will be duplicate symbols for 
start/end.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68166/new/

https://reviews.llvm.org/D68166



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D68166: [Clang][OpenMP Offload] Add new tool for wrapping offload device binaries

2019-10-02 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

> The tool indeed does not have anything specific to OpenMP at this step, but 
> that will change...

That makes sense to me, thanks.

I think we're going to have some trouble adapting this to our build as there's 
already a standalone tool that runs at link time. Overall dropping the linker 
script is probably worth the integration headache.




Comment at: clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp:84
+  void createImages(ArrayRef Binaries) {
+for (const BinaryDesc &Bin : Binaries) {
+  StringRef SectionName = SS.save(".omp_offloading." + Bin.second);

sdmitriev wrote:
> JonChesterfield wrote:
> > I don't think this works for multiple binaries with the same target triple. 
> > They'll all be put in the same section and there will be duplicate symbols 
> > for start/end.
> Adding the same target triple to the list of OpenMP targets more than once is 
> not supported, so such use case isn't viable:
> 
> ```
> bash-4.2$ clang -fopenmp 
> -fopenmp-targets=x86_64-pc-linux-gnu,x86_64-pc-linux-gnu test.c
> clang-10: warning: The OpenMP offloading target 'x86_64-pc-linux-gnu' is 
> similar to target 'x86_64-pc-linux-gnu' already specified - will be ignored. 
> [-Wopenmp-target]
> bash-4.2$ 
> ```
> 
> But in any case I am going to remove the code which passes offload target 
> triples to the wrapper tool in the last part of D64943 because they will not 
> be needed for creating wrapper bit-code. As you know start/end symbols are 
> referenced from the offload registration code only, so, moving offload 
> registration code to the wrapper bit-code eliminates the need to create 
> global start/end symbols with predefined names derived from the triple.
That's true. It seems a shame that we can embed at most one device binary per 
architecture into the host, but that's an existing limitation.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68166/new/

https://reviews.llvm.org/D68166



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D68166: [Clang][OpenMP Offload] Add new tool for wrapping offload device binaries

2019-10-09 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

The direction is good and I believe all the feedback from D64943 
 has already been incorporated. LGTM, thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68166/new/

https://reviews.llvm.org/D68166



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69494: OpenMP: Add helper function for convergent runtime calls

2019-10-27 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a reviewer: grokos.
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

LGTM.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69494/new/

https://reviews.llvm.org/D69494



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D80804: [AMDGPU] Expose llvm atomic inc/dec instructions as clang builtins for AMDGPU target

2020-05-29 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Thanks for this. I like the refactor to share code with amdgcn_fence. Agreed 
with Matt's points above.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80804/new/

https://reviews.llvm.org/D80804



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D80858: [HIP] Support accessing static device variable in host code

2020-06-01 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

The value is based on llvm::sys::Process::GetRandomNumber(). So unless one 
provides a build-system-derived uuid for every compilation unit, recompiling 
identical source will yield an observably different binary.

The distinction between 'unique' and 'random' is significant for anyone 
depending on repeatable binary output, so this patch should probably rename 
'unique' to 'random' everywhere.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80858/new/

https://reviews.llvm.org/D80858



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84476: Make hip math headers easier to use from C

2020-07-23 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield created this revision.
JonChesterfield added a reviewer: yaxunl.
Herald added a reviewer: jdoerfert.
Herald added subscribers: cfe-commits, sstefan1.
Herald added a project: clang.

Make hip math headers easier to use from C

Motivation is a step towards using the hip math headers to implement math.h
for openmp, which needs to work with C as well as C++. NFC for C++ code.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D84476

Files:
  clang/lib/Headers/__clang_hip_libdevice_declares.h
  clang/lib/Headers/__clang_hip_math.h


Index: clang/lib/Headers/__clang_hip_math.h
===
--- clang/lib/Headers/__clang_hip_math.h
+++ clang/lib/Headers/__clang_hip_math.h
@@ -95,8 +95,10 @@
 }
 
 // BEGIN FLOAT
+#ifdef _cplusplus
 __DEVICE__
 inline float abs(float __x) { return __ocml_fabs_f32(__x); }
+#endif
 __DEVICE__
 inline float acosf(float __x) { return __ocml_acos_f32(__x); }
 __DEVICE__
@@ -251,7 +253,7 @@
   uint32_t sign : 1;
 } bits;
 
-static_assert(sizeof(float) == sizeof(ieee_float), "");
+static_assert(sizeof(float) == sizeof(struct ieee_float), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -553,8 +555,10 @@
 // END FLOAT
 
 // BEGIN DOUBLE
+#ifdef _cplusplus
 __DEVICE__
 inline double abs(double __x) { return __ocml_fabs_f64(__x); }
+#endif
 __DEVICE__
 inline double acos(double __x) { return __ocml_acos_f64(__x); }
 __DEVICE__
@@ -712,7 +716,7 @@
   uint32_t exponent : 11;
   uint32_t sign : 1;
 } bits;
-static_assert(sizeof(double) == sizeof(ieee_double), "");
+static_assert(sizeof(double) == sizeof(struct ieee_double), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -1178,6 +1182,7 @@
   return std::max(__arg1, __arg2);
 }
 
+#ifdef _cplusplus
 __DEVICE__
 inline float pow(float __base, int __iexp) { return powif(__base, __iexp); }
 
@@ -1188,6 +1193,7 @@
 inline _Float16 pow(_Float16 __base, int __iexp) {
   return __ocml_pown_f16(__base, __iexp);
 }
+#endif
 
 #pragma pop_macro("__DEF_FUN1")
 #pragma pop_macro("__DEF_FUN2")
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,7 +10,9 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#ifdef __cplusplus
 extern "C" {
+#endif
 
 // BEGIN FLOAT
 __device__ __attribute__((const)) float __ocml_acos_f32(float);
@@ -316,7 +318,7 @@
 __device__ inline __2f16
 __llvm_amdgcn_rcp_2f16(__2f16 __x) // Not currently exposed by ROCDL.
 {
-  return __2f16{__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)};
+  return (__2f16){__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)};
 }
 __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16);
@@ -325,6 +327,8 @@
 __device__ __attribute__((const)) __2f16 __ocml_trunc_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_pown_2f16(__2f16, __2i16);
 
+#ifdef __cplusplus
 } // extern "C"
+#endif
 
 #endif // __CLANG_HIP_LIBDEVICE_DECLARES_H__


Index: clang/lib/Headers/__clang_hip_math.h
===
--- clang/lib/Headers/__clang_hip_math.h
+++ clang/lib/Headers/__clang_hip_math.h
@@ -95,8 +95,10 @@
 }
 
 // BEGIN FLOAT
+#ifdef _cplusplus
 __DEVICE__
 inline float abs(float __x) { return __ocml_fabs_f32(__x); }
+#endif
 __DEVICE__
 inline float acosf(float __x) { return __ocml_acos_f32(__x); }
 __DEVICE__
@@ -251,7 +253,7 @@
   uint32_t sign : 1;
 } bits;
 
-static_assert(sizeof(float) == sizeof(ieee_float), "");
+static_assert(sizeof(float) == sizeof(struct ieee_float), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -553,8 +555,10 @@
 // END FLOAT
 
 // BEGIN DOUBLE
+#ifdef _cplusplus
 __DEVICE__
 inline double abs(double __x) { return __ocml_fabs_f64(__x); }
+#endif
 __DEVICE__
 inline double acos(double __x) { return __ocml_acos_f64(__x); }
 __DEVICE__
@@ -712,7 +716,7 @@
   uint32_t exponent : 11;
   uint32_t sign : 1;
 } bits;
-static_assert(sizeof(double) == sizeof(ieee_double), "");
+static_assert(sizeof(double) == sizeof(struct ieee_double), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -1178,6 +1182,7 @@
   return std::max(__arg1, __arg2);
 }
 
+#ifdef _cplusplus
 __DEVICE__
 inline float pow(float __base, int __iexp) { return powif(__base, __iexp); }
 
@@ -1188,6 +1193,7 @@
 inline _Float16 pow(_Float16 __base, int __iexp) {
   return __ocml_pown_f16(__base, __iexp);
 }
+#endif
 
 #pragma pop_macro("__DEF_FUN1")
 #pragma pop_macro("__DEF_FUN2")
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice

[PATCH] D84476: Make hip math headers easier to use from C

2020-07-24 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 280524.
JonChesterfield added a comment.

- Fix missing underscores


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84476/new/

https://reviews.llvm.org/D84476

Files:
  clang/lib/Headers/__clang_hip_libdevice_declares.h
  clang/lib/Headers/__clang_hip_math.h


Index: clang/lib/Headers/__clang_hip_math.h
===
--- clang/lib/Headers/__clang_hip_math.h
+++ clang/lib/Headers/__clang_hip_math.h
@@ -95,8 +95,10 @@
 }
 
 // BEGIN FLOAT
+#ifdef __cplusplus
 __DEVICE__
 inline float abs(float __x) { return __ocml_fabs_f32(__x); }
+#endif
 __DEVICE__
 inline float acosf(float __x) { return __ocml_acos_f32(__x); }
 __DEVICE__
@@ -251,7 +253,7 @@
   uint32_t sign : 1;
 } bits;
 
-static_assert(sizeof(float) == sizeof(ieee_float), "");
+static_assert(sizeof(float) == sizeof(struct ieee_float), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -553,8 +555,10 @@
 // END FLOAT
 
 // BEGIN DOUBLE
+#ifdef __cplusplus
 __DEVICE__
 inline double abs(double __x) { return __ocml_fabs_f64(__x); }
+#endif
 __DEVICE__
 inline double acos(double __x) { return __ocml_acos_f64(__x); }
 __DEVICE__
@@ -712,7 +716,7 @@
   uint32_t exponent : 11;
   uint32_t sign : 1;
 } bits;
-static_assert(sizeof(double) == sizeof(ieee_double), "");
+static_assert(sizeof(double) == sizeof(struct ieee_double), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -1178,6 +1182,7 @@
   return std::max(__arg1, __arg2);
 }
 
+#ifdef __cplusplus
 __DEVICE__
 inline float pow(float __base, int __iexp) { return powif(__base, __iexp); }
 
@@ -1188,6 +1193,7 @@
 inline _Float16 pow(_Float16 __base, int __iexp) {
   return __ocml_pown_f16(__base, __iexp);
 }
+#endif
 
 #pragma pop_macro("__DEF_FUN1")
 #pragma pop_macro("__DEF_FUN2")
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,7 +10,9 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#ifdef __cplusplus
 extern "C" {
+#endif
 
 // BEGIN FLOAT
 __device__ __attribute__((const)) float __ocml_acos_f32(float);
@@ -316,7 +318,7 @@
 __device__ inline __2f16
 __llvm_amdgcn_rcp_2f16(__2f16 __x) // Not currently exposed by ROCDL.
 {
-  return __2f16{__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)};
+  return (__2f16){__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)};
 }
 __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16);
@@ -325,6 +327,8 @@
 __device__ __attribute__((const)) __2f16 __ocml_trunc_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_pown_2f16(__2f16, __2i16);
 
+#ifdef __cplusplus
 } // extern "C"
+#endif
 
 #endif // __CLANG_HIP_LIBDEVICE_DECLARES_H__


Index: clang/lib/Headers/__clang_hip_math.h
===
--- clang/lib/Headers/__clang_hip_math.h
+++ clang/lib/Headers/__clang_hip_math.h
@@ -95,8 +95,10 @@
 }
 
 // BEGIN FLOAT
+#ifdef __cplusplus
 __DEVICE__
 inline float abs(float __x) { return __ocml_fabs_f32(__x); }
+#endif
 __DEVICE__
 inline float acosf(float __x) { return __ocml_acos_f32(__x); }
 __DEVICE__
@@ -251,7 +253,7 @@
   uint32_t sign : 1;
 } bits;
 
-static_assert(sizeof(float) == sizeof(ieee_float), "");
+static_assert(sizeof(float) == sizeof(struct ieee_float), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -553,8 +555,10 @@
 // END FLOAT
 
 // BEGIN DOUBLE
+#ifdef __cplusplus
 __DEVICE__
 inline double abs(double __x) { return __ocml_fabs_f64(__x); }
+#endif
 __DEVICE__
 inline double acos(double __x) { return __ocml_acos_f64(__x); }
 __DEVICE__
@@ -712,7 +716,7 @@
   uint32_t exponent : 11;
   uint32_t sign : 1;
 } bits;
-static_assert(sizeof(double) == sizeof(ieee_double), "");
+static_assert(sizeof(double) == sizeof(struct ieee_double), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -1178,6 +1182,7 @@
   return std::max(__arg1, __arg2);
 }
 
+#ifdef __cplusplus
 __DEVICE__
 inline float pow(float __base, int __iexp) { return powif(__base, __iexp); }
 
@@ -1188,6 +1193,7 @@
 inline _Float16 pow(_Float16 __base, int __iexp) {
   return __ocml_pown_f16(__base, __iexp);
 }
+#endif
 
 #pragma pop_macro("__DEF_FUN1")
 #pragma pop_macro("__DEF_FUN2")
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,7 +10,9 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#ifdef __cplusplus
 extern "C" {
+#endif
 
 // BEGIN FLOAT
 __device__ __attribute__

[PATCH] D84476: Make hip math headers easier to use from C

2020-07-24 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield marked 4 inline comments as done.
JonChesterfield added inline comments.



Comment at: clang/lib/Headers/__clang_hip_math.h:561
 inline double abs(double __x) { return __ocml_fabs_f64(__x); }
+#endif
 __DEVICE__

yaxunl wrote:
> jdoerfert wrote:
> > Nit: You mix the C and C++ math declarations in this file, while possible, 
> > I somehow thing the cuda_{cmath/math} split is nicer.
> right
Open to me implementing that in a later patch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84476/new/

https://reviews.llvm.org/D84476



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84476: Make hip math headers easier to use from C

2020-07-24 Thread Jon Chesterfield via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG679158e662aa: Make hip math headers easier to use from C 
(authored by JonChesterfield).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84476/new/

https://reviews.llvm.org/D84476

Files:
  clang/lib/Headers/__clang_hip_libdevice_declares.h
  clang/lib/Headers/__clang_hip_math.h


Index: clang/lib/Headers/__clang_hip_math.h
===
--- clang/lib/Headers/__clang_hip_math.h
+++ clang/lib/Headers/__clang_hip_math.h
@@ -95,8 +95,10 @@
 }
 
 // BEGIN FLOAT
+#ifdef __cplusplus
 __DEVICE__
 inline float abs(float __x) { return __ocml_fabs_f32(__x); }
+#endif
 __DEVICE__
 inline float acosf(float __x) { return __ocml_acos_f32(__x); }
 __DEVICE__
@@ -251,7 +253,7 @@
   uint32_t sign : 1;
 } bits;
 
-static_assert(sizeof(float) == sizeof(ieee_float), "");
+static_assert(sizeof(float) == sizeof(struct ieee_float), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -553,8 +555,10 @@
 // END FLOAT
 
 // BEGIN DOUBLE
+#ifdef __cplusplus
 __DEVICE__
 inline double abs(double __x) { return __ocml_fabs_f64(__x); }
+#endif
 __DEVICE__
 inline double acos(double __x) { return __ocml_acos_f64(__x); }
 __DEVICE__
@@ -712,7 +716,7 @@
   uint32_t exponent : 11;
   uint32_t sign : 1;
 } bits;
-static_assert(sizeof(double) == sizeof(ieee_double), "");
+static_assert(sizeof(double) == sizeof(struct ieee_double), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -1178,6 +1182,7 @@
   return std::max(__arg1, __arg2);
 }
 
+#ifdef __cplusplus
 __DEVICE__
 inline float pow(float __base, int __iexp) { return powif(__base, __iexp); }
 
@@ -1188,6 +1193,7 @@
 inline _Float16 pow(_Float16 __base, int __iexp) {
   return __ocml_pown_f16(__base, __iexp);
 }
+#endif
 
 #pragma pop_macro("__DEF_FUN1")
 #pragma pop_macro("__DEF_FUN2")
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,7 +10,9 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#ifdef __cplusplus
 extern "C" {
+#endif
 
 // BEGIN FLOAT
 __device__ __attribute__((const)) float __ocml_acos_f32(float);
@@ -316,7 +318,7 @@
 __device__ inline __2f16
 __llvm_amdgcn_rcp_2f16(__2f16 __x) // Not currently exposed by ROCDL.
 {
-  return __2f16{__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)};
+  return (__2f16){__llvm_amdgcn_rcp_f16(__x.x), __llvm_amdgcn_rcp_f16(__x.y)};
 }
 __device__ __attribute__((const)) __2f16 __ocml_rint_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_rsqrt_2f16(__2f16);
@@ -325,6 +327,8 @@
 __device__ __attribute__((const)) __2f16 __ocml_trunc_2f16(__2f16);
 __device__ __attribute__((const)) __2f16 __ocml_pown_2f16(__2f16, __2i16);
 
+#ifdef __cplusplus
 } // extern "C"
+#endif
 
 #endif // __CLANG_HIP_LIBDEVICE_DECLARES_H__


Index: clang/lib/Headers/__clang_hip_math.h
===
--- clang/lib/Headers/__clang_hip_math.h
+++ clang/lib/Headers/__clang_hip_math.h
@@ -95,8 +95,10 @@
 }
 
 // BEGIN FLOAT
+#ifdef __cplusplus
 __DEVICE__
 inline float abs(float __x) { return __ocml_fabs_f32(__x); }
+#endif
 __DEVICE__
 inline float acosf(float __x) { return __ocml_acos_f32(__x); }
 __DEVICE__
@@ -251,7 +253,7 @@
   uint32_t sign : 1;
 } bits;
 
-static_assert(sizeof(float) == sizeof(ieee_float), "");
+static_assert(sizeof(float) == sizeof(struct ieee_float), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -553,8 +555,10 @@
 // END FLOAT
 
 // BEGIN DOUBLE
+#ifdef __cplusplus
 __DEVICE__
 inline double abs(double __x) { return __ocml_fabs_f64(__x); }
+#endif
 __DEVICE__
 inline double acos(double __x) { return __ocml_acos_f64(__x); }
 __DEVICE__
@@ -712,7 +716,7 @@
   uint32_t exponent : 11;
   uint32_t sign : 1;
 } bits;
-static_assert(sizeof(double) == sizeof(ieee_double), "");
+static_assert(sizeof(double) == sizeof(struct ieee_double), "");
   } __tmp;
 
   __tmp.bits.sign = 0u;
@@ -1178,6 +1182,7 @@
   return std::max(__arg1, __arg2);
 }
 
+#ifdef __cplusplus
 __DEVICE__
 inline float pow(float __base, int __iexp) { return powif(__base, __iexp); }
 
@@ -1188,6 +1193,7 @@
 inline _Float16 pow(_Float16 __base, int __iexp) {
   return __ocml_pown_f16(__base, __iexp);
 }
+#endif
 
 #pragma pop_macro("__DEF_FUN1")
 #pragma pop_macro("__DEF_FUN2")
Index: clang/lib/Headers/__clang_hip_libdevice_declares.h
===
--- clang/lib/Headers/__clang_hip_libdevice_declares.h
+++ clang/lib/Headers/__clang_hip_libdevice_declares.h
@@ -10,7 +10,9 @@
 #ifndef __CLANG_HIP_LIBDEVICE_DECLARES_H__
 #define __CLANG_HIP_LIBDEVICE_DECLARES_H__
 
+#ifdef __cplusplu

[PATCH] D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot

2020-07-26 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

A macro for wavefront size would make targeting gfx10 for openmp easier.

We currently use an int32_t for nvptx and an int64_t for amdgcn in various 
runtime function interfaces. I'd like to be able to set the latter based on 
said macro.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82087/new/

https://reviews.llvm.org/D82087



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84743: [Clang][AMDGCN] Universal device offloading macros header

2020-07-28 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a reviewer: arsenm.
JonChesterfield added a comment.
Herald added a subscriber: wdng.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84743/new/

https://reviews.llvm.org/D84743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D80917: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 2

2020-07-28 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h:66
+  /// for NVPTX.
+  GV_Warp_Size_32,
+  /// The number of bits required to represent the max number of threads in 
warp

What's the point of warp_size_32? It's always set to 32 and seems redundant 
with warp_size



Comment at: llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h:68
+  /// The number of bits required to represent the max number of threads in 
warp
+  GV_Warp_Size_Log2,
+  /// GV_Warp_Size * GV_Slot_Size,

log2 of a compile time constant would be computed at compile time, we don't 
need this field



Comment at: llvm/include/llvm/Frontend/OpenMP/OMPGridValues.h:76
+  /// (~0u >> (GV_Warp_Size - GV_Warp_Size_Log2))
+  GV_Warp_Size_Log2_Mask,
+  // An alternative to the heavy data sharing infrastructure that uses global

As above, this can be computed at compile time, e.g. from that expression


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80917/new/

https://reviews.llvm.org/D80917

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D84743: [Clang][AMDGCN] Universal device offloading macros header

2020-07-29 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

We probably do want a macro to indicate 'compiling for amdgcn as the device 
half of a combined host+device language'. I'm having a tough time with the 
control flow in this header so we probably want tests to check the overall 
behaviour is as intended. E.g. static assert + various language modes.

The header should be obviously implemention only so we can change it later. 
Maybe also provide an unset header and keep them out of application scope 
entirely to begin with. That's the advantage over the otherwise simpler design 
of clang always setting them.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84743/new/

https://reviews.llvm.org/D84743

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/Sema/SemaChecking.cpp:1888
+// Check valididty of memory ordering as per C11 / C++11's memody model.
+if (ord < static_cast(llvm::AtomicOrderingCABI::acquire) ||
+  ord > static_cast(llvm::AtomicOrderingCABI::seq_cst)) {

I think I'd write this as a switch over the enum instead of a ranged compare.

It'll codegen to the same thing, but we'll get warnings if more values are 
introduced to the enum and things will keep working (here, anyway) if the 
values are reordered.



Comment at: clang/test/CodeGenHIP/builtin_memory_fence.cpp:9
+  // CHECK: fence syncscope("workgroup") seq_cst
+  __builtin_memory_fence(__ATOMIC_SEQ_CST,  "workgroup");
+  

saiislam wrote:
> sameerds wrote:
> > Orderings like `__ATOMIC_SEQ_CST` are defined for C/C++ memory models. They 
> > should not be used with the new builtin because this new builtin does not 
> > follow any specific language model. For user convenience, the right thing 
> > to do is to introduce new tokens in the Clang preprocessor, similar to the 
> > `__ATOMIC_*` tokens. The convenient shortcut is to just tell the user to 
> > supply numerical values by looking at the LLVM source code.
> > 
> > From llvm/Support/AtomicOrdering.h, note how the numerical value for 
> > `__ATOMIC_SEQ_CST` is 5, but the numerical value for the LLVM 
> > SequentiallyConsistent ordering is 7. The numerical value 5 refers to the 
> > LLVM ordering "release". So, if the implementation were correct, this line 
> > should result in the following unexpected LLVM IR:
> >   fence syncscope("workgroup") release
> As you pointed out, the range of acquire to sequentiallly consistent memory 
> orders for llvm::AtomicOrdering is [4, 7], while for llvm::AtomicOrderingCABI 
> is [2, 5]. Enums of C ABI was taken to ensure easy of use for the users who 
> are familiar with C/C++ standard memory model. It allows them to use macros 
> like __ATOMIC_ACQUIRE etc.
> Clang CodeGen of the builtin internally maps C ABI ordering to llvm atomic 
> ordering.
What language, implemented in clang, do you have in mind that reusing the 
existing __ATOMIC_* macros would be incorrect for?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13651
+  llvm::getConstantStringInfo(Scope, scp);
+  SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
+

sameerds wrote:
> saiislam wrote:
> > sameerds wrote:
> > > This seems to be creating a new ID for any arbitrary string passed as 
> > > sync scope. This should be validated against 
> > > LLVMContext::getSyncScopeNames(). 
> > As the FE is not aware about specific target and implementation of sync 
> > scope for that target, getSyncScopeNames() here returns llvm'sdefault sync 
> > scopes, which only supports singlethreaded and system as valid scopes. 
> > Validity checking of memory scope string is being intentionally left for 
> > the later stages which deal with the generated IR.
> That's pretty strange. At this point, Clang should know what the target is, 
> and it should have a chance to update the list of sync scopes somewhere. 
> @arsenm, do you see a way around this?
There is already sufficient IR level checking for the string at the instruction 
level. Warning in clang as well could be a nicer user experience, but that 
seems low priority to me.



Comment at: clang/test/CodeGenHIP/builtin_memory_fence.cpp:9
+  // CHECK: fence syncscope("workgroup") seq_cst
+  __builtin_memory_fence(__ATOMIC_SEQ_CST,  "workgroup");
+  

sameerds wrote:
> JonChesterfield wrote:
> > saiislam wrote:
> > > sameerds wrote:
> > > > Orderings like `__ATOMIC_SEQ_CST` are defined for C/C++ memory models. 
> > > > They should not be used with the new builtin because this new builtin 
> > > > does not follow any specific language model. For user convenience, the 
> > > > right thing to do is to introduce new tokens in the Clang preprocessor, 
> > > > similar to the `__ATOMIC_*` tokens. The convenient shortcut is to just 
> > > > tell the user to supply numerical values by looking at the LLVM source 
> > > > code.
> > > > 
> > > > From llvm/Support/AtomicOrdering.h, note how the numerical value for 
> > > > `__ATOMIC_SEQ_CST` is 5, but the numerical value for the LLVM 
> > > > SequentiallyConsistent ordering is 7. The numerical value 5 refers to 
> > > > the LLVM ordering "release". So, if the implementation were correct, 
> > > > this line should result in the following unexpected LLVM IR:
> > > >   fence syncscope("workgroup") release
> > > As you pointed out, the range of acquire to sequentiallly consistent 
> > > memory orders for llvm::AtomicOrdering is [4, 7], while for 
> > > llvm::AtomicOrderingCABI is [2, 5]. Enums of C ABI was taken to ensure 
> > > easy of use for the users who are familiar with C/C++ standard memory 
> > > model. It allows them to use macros like __ATOMIC_ACQUIRE etc.
> > > Clang CodeGen of the builtin internally maps C ABI ordering to llvm 
> > > atomic ordering.
> > What language, implemented in clang, do you have in mind that reusing the 
> > existing __ATOMIC_* macros would be incorrect for?
> I think we agreed that this builtin exposes the LLVM fence exactly. That 
> would mean it takes arguments defined by LLVM. If you are implementing 
> something different from that, then it first needs to be specified properly. 
> Perhaps you could say that this is a C ABI compatible builtin, that happens 
> to take target specific scopes? That should cover OpenCL whose scope enum is 
> designed to be compatible with C.
> 
> Whatever it is that you are trying to implement here, it definitely does not 
> expose a raw LLVM fence.
The llvm fence, in text form, uses a symbol for the memory scope. Not an enum.

This symbol is set using these macros for the existing atomic builtins. Using 
an implementation detail of clang instead is strictly worse, by layering and by 
precedent.

ABI is not involved here. Nor is OpenCl.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/test/CodeGenHIP/builtin_memory_fence.cpp:9
+  // CHECK: fence syncscope("workgroup") seq_cst
+  __builtin_memory_fence(__ATOMIC_SEQ_CST,  "workgroup");
+  

sameerds wrote:
> JonChesterfield wrote:
> > sameerds wrote:
> > > JonChesterfield wrote:
> > > > saiislam wrote:
> > > > > sameerds wrote:
> > > > > > Orderings like `__ATOMIC_SEQ_CST` are defined for C/C++ memory 
> > > > > > models. They should not be used with the new builtin because this 
> > > > > > new builtin does not follow any specific language model. For user 
> > > > > > convenience, the right thing to do is to introduce new tokens in 
> > > > > > the Clang preprocessor, similar to the `__ATOMIC_*` tokens. The 
> > > > > > convenient shortcut is to just tell the user to supply numerical 
> > > > > > values by looking at the LLVM source code.
> > > > > > 
> > > > > > From llvm/Support/AtomicOrdering.h, note how the numerical value 
> > > > > > for `__ATOMIC_SEQ_CST` is 5, but the numerical value for the LLVM 
> > > > > > SequentiallyConsistent ordering is 7. The numerical value 5 refers 
> > > > > > to the LLVM ordering "release". So, if the implementation were 
> > > > > > correct, this line should result in the following unexpected LLVM 
> > > > > > IR:
> > > > > >   fence syncscope("workgroup") release
> > > > > As you pointed out, the range of acquire to sequentiallly consistent 
> > > > > memory orders for llvm::AtomicOrdering is [4, 7], while for 
> > > > > llvm::AtomicOrderingCABI is [2, 5]. Enums of C ABI was taken to 
> > > > > ensure easy of use for the users who are familiar with C/C++ standard 
> > > > > memory model. It allows them to use macros like __ATOMIC_ACQUIRE etc.
> > > > > Clang CodeGen of the builtin internally maps C ABI ordering to llvm 
> > > > > atomic ordering.
> > > > What language, implemented in clang, do you have in mind that reusing 
> > > > the existing __ATOMIC_* macros would be incorrect for?
> > > I think we agreed that this builtin exposes the LLVM fence exactly. That 
> > > would mean it takes arguments defined by LLVM. If you are implementing 
> > > something different from that, then it first needs to be specified 
> > > properly. Perhaps you could say that this is a C ABI compatible builtin, 
> > > that happens to take target specific scopes? That should cover OpenCL 
> > > whose scope enum is designed to be compatible with C.
> > > 
> > > Whatever it is that you are trying to implement here, it definitely does 
> > > not expose a raw LLVM fence.
> > The llvm fence, in text form, uses a symbol for the memory scope. Not an 
> > enum.
> > 
> > This symbol is set using these macros for the existing atomic builtins. 
> > Using an implementation detail of clang instead is strictly worse, by 
> > layering and by precedent.
> > 
> > ABI is not involved here. Nor is OpenCl.
> The `__ATOMIC_*` symbols in Clang quite literally represent the C/C++ ABI. 
> See the details in AtomicOrdering.h and InitPreprocessor.cpp. I am not 
> opposed to specifying that the builtin expects these symbols, but then it is 
> incorrect to say that the builtin exposes the raw LLVM builtin. It is a 
> C-ABI-compatible builtin that happens to take target-specific scope as a 
> string argument. And that would also make it an overload of the already 
> existing builting __atomic_fence().
I don't know what you mean by "raw",  but am guessing you're asking for 
documentation for the intrinsic. Said documentation should indeed be added for 
this builtin - it'll probably be in a tablegen file.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/test/CodeGenHIP/builtin_memory_fence.cpp:9
+  // CHECK: fence syncscope("workgroup") seq_cst
+  __builtin_memory_fence(__ATOMIC_SEQ_CST,  "workgroup");
+  

JonChesterfield wrote:
> sameerds wrote:
> > JonChesterfield wrote:
> > > sameerds wrote:
> > > > JonChesterfield wrote:
> > > > > saiislam wrote:
> > > > > > sameerds wrote:
> > > > > > > Orderings like `__ATOMIC_SEQ_CST` are defined for C/C++ memory 
> > > > > > > models. They should not be used with the new builtin because this 
> > > > > > > new builtin does not follow any specific language model. For user 
> > > > > > > convenience, the right thing to do is to introduce new tokens in 
> > > > > > > the Clang preprocessor, similar to the `__ATOMIC_*` tokens. The 
> > > > > > > convenient shortcut is to just tell the user to supply numerical 
> > > > > > > values by looking at the LLVM source code.
> > > > > > > 
> > > > > > > From llvm/Support/AtomicOrdering.h, note how the numerical value 
> > > > > > > for `__ATOMIC_SEQ_CST` is 5, but the numerical value for the LLVM 
> > > > > > > SequentiallyConsistent ordering is 7. The numerical value 5 
> > > > > > > refers to the LLVM ordering "release". So, if the implementation 
> > > > > > > were correct, this line should result in the following unexpected 
> > > > > > > LLVM IR:
> > > > > > >   fence syncscope("workgroup") release
> > > > > > As you pointed out, the range of acquire to sequentiallly 
> > > > > > consistent memory orders for llvm::AtomicOrdering is [4, 7], while 
> > > > > > for llvm::AtomicOrderingCABI is [2, 5]. Enums of C ABI was taken to 
> > > > > > ensure easy of use for the users who are familiar with C/C++ 
> > > > > > standard memory model. It allows them to use macros like 
> > > > > > __ATOMIC_ACQUIRE etc.
> > > > > > Clang CodeGen of the builtin internally maps C ABI ordering to llvm 
> > > > > > atomic ordering.
> > > > > What language, implemented in clang, do you have in mind that reusing 
> > > > > the existing __ATOMIC_* macros would be incorrect for?
> > > > I think we agreed that this builtin exposes the LLVM fence exactly. 
> > > > That would mean it takes arguments defined by LLVM. If you are 
> > > > implementing something different from that, then it first needs to be 
> > > > specified properly. Perhaps you could say that this is a C ABI 
> > > > compatible builtin, that happens to take target specific scopes? That 
> > > > should cover OpenCL whose scope enum is designed to be compatible with 
> > > > C.
> > > > 
> > > > Whatever it is that you are trying to implement here, it definitely 
> > > > does not expose a raw LLVM fence.
> > > The llvm fence, in text form, uses a symbol for the memory scope. Not an 
> > > enum.
> > > 
> > > This symbol is set using these macros for the existing atomic builtins. 
> > > Using an implementation detail of clang instead is strictly worse, by 
> > > layering and by precedent.
> > > 
> > > ABI is not involved here. Nor is OpenCl.
> > The `__ATOMIC_*` symbols in Clang quite literally represent the C/C++ ABI. 
> > See the details in AtomicOrdering.h and InitPreprocessor.cpp. I am not 
> > opposed to specifying that the builtin expects these symbols, but then it 
> > is incorrect to say that the builtin exposes the raw LLVM builtin. It is a 
> > C-ABI-compatible builtin that happens to take target-specific scope as a 
> > string argument. And that would also make it an overload of the already 
> > existing builting __atomic_fence().
> I don't know what you mean by "raw",  but am guessing you're asking for 
> documentation for the intrinsic. Said documentation should indeed be added 
> for this builtin - it'll probably be in a tablegen file.
I will try to stop using builtin and intrinsic as synonyms.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/test/CodeGenHIP/builtin_memory_fence.cpp:1
 // REQUIRES: amdgpu-registered-target
 // RUN: %clang_cc1 %s -x hip -emit-llvm -O0 -o - \

Codegen test should be under CodeGen and/or CodeGenCXX


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77774: [OpenMP] Allow to go first in C++-mode in target regions

2020-04-09 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

The cmath/math.h story makes me sad, but this is a good workaround. Thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D4/new/

https://reviews.llvm.org/D4



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77390: Fix __builtin_amdgcn_workgroup_size_x/y/z return type

2020-04-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/test/CodeGenOpenCL/builtins-amdgcn.cl:541
switch (d) {
-   case 0: *out = __builtin_amdgcn_workgroup_size_x(); break;
+   case 0: *out = __builtin_amdgcn_workgroup_size_x() + 1; break;
case 1: *out = __builtin_amdgcn_workgroup_size_y(); break;

This looks unrelated to the return type change


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77390/new/

https://reviews.llvm.org/D77390



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77918: [OpenMP] Avoid crash in preparation for diagnose of unsupported type

2020-04-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77918/new/

https://reviews.llvm.org/D77918



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D80897: [OpenMP] Initial support for std::complex in target regions

2020-07-02 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

I think this change is good. The library story is a bit difficult, but 
fundamentally openmp needs a shim of some sort to map target math functions 
onto the libm of the underlying device.

For nvptx, that's the cuda library. Amdgcn has math functions and may need 
another shim to map them to libm.

include_next is nasty, but that's the existing pattern for some library headers.




Comment at: clang/test/Headers/Inputs/include/complex:10
+// Taken from libc++
+template 
+class complex {

Can we #include from libc++ instead? Needs some cmake to skip the test if the 
library is unavailable but spares duplicating this class


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80897/new/

https://reviews.llvm.org/D80897



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83121: [AMDGPU] Change Clang AMDGCN atomic inc/dec builtins to take unsigned values

2020-07-03 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Thanks for this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83121/new/

https://reviews.llvm.org/D83121



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D80858: [CUDA][HIP] Support accessing static device variable in host code

2020-07-06 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/AST/ASTContext.cpp:10068
+isa(D) && cast(D)->isFileVarDecl() &&
+cast(D)->getStorageClass() == SC_Static) {
+  return GVA_StrongExternal;

yaxunl wrote:
> rjmccall wrote:
> > Are you sure this doesn't apply to e.g. local statics?  Can't you have 
> > kernel lambdas, or am I confusing HIP with another language?
> function-scope static var in a device function is only visible to the device 
> function. Host code cannot access it, therefore no need to externalize it.
This doesn't sound right. An inline function can return a pointer to a function 
scope static variable, e.g. to implement a singleton in a header file.  I think 
host code can then access said variable.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80858/new/

https://reviews.llvm.org/D80858



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments

2020-07-07 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I'm not sure we have a consensus on api stability. Usually llvm allows mixing 
libraries and compilers from different sources, e.g. the various libunwind or 
compiler-rt vs libgcc.  Libomptarget in general appears to be considered fixed 
and has external users (intel, maybe gcc).

The device runtime would be ill served by this default. This is the only openmp 
device runtime library which works with llvm. It's statically linked, usually 
as bitcode when performance is important. The code used to handle target 
offloading for nvptx is spread across the compiler and the runtime, probably 
not optimally.

I'm not familiar with the gcc-nvptx-openmp implementation. Reading through 
https://gcc.gnu.org/wiki/Offloading strongly suggests a totally independent 
implementation to this one. I don't think gcc can be using this runtime library 
for nvptx. It definitely doesn't for amdgcn. Proprietary compilers might be 
using this code, but we can have no duty of care to toolchains that use this 
code without telling us they're doing so.

Therefore the only backwards/forwards compatibility we can strive for is 
between different versions of clang and the device runtime. That seems 
potentially useful - one could use a release clang binary while working on the 
deviceRTL or vice versa. It's an expensive developer convenience though.

We would pay with things like the API rot fixed above. Introducing a faster 
lowering for an openmp construct would mean a redundant path through clang and 
some version checking to guess which runtime library we're targeting, which is 
not presently versioned. Likewise moving code between compiler and runtime 
becomes much more expensive to implement. Getting it wrong is inevitable given 
our test coverage.

I think the project is much better served by assuming that the runtime library 
used by clang is the one from the same hash in the monorepo. That leaves us 
free to fix debt and improve performance, at the price of needing to build 
clang from (near to) trunk while developing the rtl.

Perhaps we can embrace API stability later on, once we have things like 
versioning and a solid optimisation pipeline in place, especially if gcc wants 
to use the deviceRTL for nvptx. Now is too early.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83268/new/

https://reviews.llvm.org/D83268



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments

2020-07-07 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Aside from the API stability concern this looks uncontentious. Removes dead 
arguments, generally makes things simpler. Thus LGTM.

@Hahnfeld @ABataev - are you sufficiently persuaded that preserving the current 
interface is not worth the development cost?




Comment at: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:42
-  /// *outlined_function, int16_t
-  /// IsOMPRuntimeInitialized);
   OMPRTL_NVPTX__kmpc_kernel_prepare_parallel,

ABataev wrote:
> I think, instead the optimizer can try to detect if the runtime library is 
> used by the kernel and switch this flag to `0` if no runtime calls are used 
> in the kernel. For non-SPMD mode in most cases, the runtime is required, but 
> in some cases, it can be disabled.
If we can detect that no runtime calls are used, we should be able to do better 
than passing a different argument. E.g. delete some setup calls.

Failing that, if we want to pass an argument which says 'actually don't do any 
work', it shouldn't be the same argument used to check whether the runtime has 
been initialised.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83268/new/

https://reviews.llvm.org/D83268



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments

2020-07-07 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D83268#2135938 , @ABataev wrote:

> > @Hahnfeld @ABataev - are you sufficiently persuaded that preserving the 
> > current interface is not worth the development cost?
>
> No, I'm not. Long before that, we relied on the API stability and already 
> have some runtime calls marked as deprecated. Especially taking into account, 
> that libomp can be built separately.


Yes, the existing v# naming and deprecated comments should also go.

What can libomp be built by separately? Nvcc and gcc don't use this runtime. 
That leaves us with downstream proprietary compilers derived from clang that 
are already stuck carrying extensive compatibility patches and usually ship as 
one large toolchain blob which only needs to be internally self consistent.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83268/new/

https://reviews.llvm.org/D83268



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83349: [OpenMP][NFC] Remove unused and untested code from the device runtime

2020-07-07 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Applied to the amdgcn implementation. Compiles fine, tests all passing. Seems 
likely that this lot really is dead.

Interesting that this removes*_data_sharing_environment. I think some of the 
allocated objects will be more obviously dead after this patch.

Love it. Thanks! We can have this as soon as we hit consensus on dropping the 
API stability aspiration


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83349/new/

https://reviews.llvm.org/D83349



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83268: [OpenMP][NFC] Remove unused (always fixed) arguments

2020-07-07 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

__kmpc_spmd_kernel_init is always called with RequiresDataSharing == 0
Specifically, it's only called from clang, and emitSPMDEntryHeader 
unconditionally passes zero to it

I.e. I think there's more stuff that can be cleaned up in the theme of the 
above, suggest in later patches


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83268/new/

https://reviews.llvm.org/D83268



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83492: [OpenMP] Use common interface to access GPU Grid Values

2020-07-09 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Changing to getGridValue would be useful for sharing parts of this with amdgcn.

The aomp toolchain handles codegen for amdgcn by adding if (isAMDGCN) to this 
file. Until such time as tregions obsoletes this code, I think we should go 
with layers instead of scattered conditionals.

I.e. rename CGOpenMPRuntimeNVPTX to CGOpenMPRuntimeGPU which contains code that 
is common to nvptx and amdgcn. That probably uses getGridValue() as a way to 
abstract over minor differences. Derive CGOpenMPRuntimeAMDGCN and 
CGOpenMPRuntimeNVPTX from CGOpenMPRuntimeGPU to implement (virtual) functions 
which are different between the two.




Comment at: clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:628
 static llvm::Value *getNVPTXWarpSize(CodeGenFunction &CGF) {
+  if (CGF.getTarget().getTriple().isAMDGCN()) {
+CGBuilderTy &Bld = CGF.Builder;

This looks unrelated to using the constants. Amdgcn doesn't have an 
nvvm_read_ptx_sreg_warpsize so does need a different means of accessing the 
wave size, but that's not directly related to using OMPGridValues


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83492/new/

https://reviews.llvm.org/D83492



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83591: [OpenMP][CUDA] Fix std::complex in GPU regions

2020-07-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Fine by me. Let's get nvptx working properly in tree now and work out how to 
wire up amdgcn subsequently. I'm sure a reasonable abstraction will present 
itself.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83591/new/

https://reviews.llvm.org/D83591



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83591: [OpenMP][CUDA] Fix std::complex in GPU regions

2020-07-10 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D83591#2145437 , @jdoerfert wrote:

> I did not know they are using __clang_cuda headers. (Site note, we should 
> rename them then.)


I also did not know that. I am repeatedly caught out by things named 'cuda', 
'nvptx' or '__nv' being used by amdgpu.

Perhaps we should refactor the __clang_cuda_* headers to make the distinctions 
between cuda, hip, openmp-nvptx, openmp-amdgcn clear(er).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83591/new/

https://reviews.llvm.org/D83591



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83723: [OpenMP] Generalize CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU

2020-07-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

This is indeed the direction I had in mind. Broad strokes look right. I 
probably wouldn't notice an accidental change amidst the whitespace reshuffle. 
Very happy to read through line by line if you can split the whitespace change 
out.




Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.h:377
   /// Choose a default value for the dist_schedule clause.
-  void getDefaultDistScheduleAndChunk(CodeGenFunction &CGF,
-  const OMPLoopDirective &S, OpenMPDistScheduleClauseKind &ScheduleKind,
-  llvm::Value *&Chunk) const override;
+  void
+  getDefaultDistScheduleAndChunk(CodeGenFunction &CGF,

It's worth avoiding whitespace-only changes in a large diff, even when it 
brings the code in line with clang-format's rules. Signal to noise is 
challenging enough without it.

Please would you leave the whitespace-only changes out? I usually open the diff 
in a friendly GUI and eyeball each segment to see if it can be dropped.

Feel free to fix the whitespace before or after.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83723/new/

https://reviews.llvm.org/D83723



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83723: [OpenMP] Generalize CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU

2020-07-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp:39
+  // return constant compile-time target-specific warp size
+  unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
+  return Bld.getInt32(WarpSize);

ABataev wrote:
> This is new functionality, better to move it in a separate patch, and this 
> one mark as NFC.
Works for me. This patch shows how the per-target parts are intended to be 
done. First patch being totally NFC seems good.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83723/new/

https://reviews.llvm.org/D83723



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83281: [OpenMP] Allow traits for the OpenMP context selector `isa`

2020-07-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Agreed on tests. I like the mechanism - passing a string through to the backend 
as a way to dispatch between isa properties looks cleanly extensible. We 
probably do want to emit a warning when the backend claims it doesn't know 
anything about said string as it'll be prone to typos.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83281/new/

https://reviews.llvm.org/D83281



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83832: [OpenMP] Provide a flag to disable safety checks for GPU optimizations

2020-07-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I think there's an unfortunate interaction with link time optimisation here. If 
there are external regions, but their code is combined with llvm-link before 
codegen, then a user could reasonably assume this flag is safe.

Would it would be correct to compile the individual source assuming there may 
be external uses, then llvm-link the source, then run the pass assuming there 
are no external uses? That's of interest to the amdgcn case as we can 
(currently) assume the whole program is available towards the end of the 
compilation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83832/new/

https://reviews.llvm.org/D83832



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83832: [OpenMP] Provide a flag to disable safety checks for GPU optimizations

2020-07-15 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

I think this is good. It's dangerous, but it's also undocumented and has unsafe 
in the name.

I should be able to use this to sidestep limitations in the amdgpu function 
pointer implementation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83832/new/

https://reviews.llvm.org/D83832



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78759: Add Statically Linked Libraries

2020-06-28 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

This appears to have been committed without addressing all the comments or 
waiting for an acceptance from someone outside of our organisation. That 
doesn't seem right - am I missing part of the thread here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78759/new/

https://reviews.llvm.org/D78759



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics

2020-06-30 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

This patch declares the clang builtin as acting on signed values, but the IR 
intrinsic maps to an instruction which does an unsigned comparison. We don't 
have ISA support for a signed comparison equivalent. Addition is the same 
operation on signed or unsigned integers, but signed integer comparison is not 
equivalent to unsigned integer comparison.

  // 32bit
   tmp = MEM[ADDR];
   MEM[ADDR] = (tmp >= DATA) ? 0 : tmp + 1; // unsigned
  compare
   RETURN_DATA = tmp.

The builtins should be changed to take unsigned values, optionally making that 
clear from the naming scheme, perhaps  `__amdgcn_builtin_atomic_dec_u32`.

Apologies for not reviewing this the first time around.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80804/new/

https://reviews.llvm.org/D80804



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-17 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

The tests look good, but I can't see the implementation in this diff. Maybe a 
file missing from the patch? Can be hard to tell with phabricator, the error 
may be at my end.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78495: [nfc] Accept addrspacecast allocas in InitTempAlloca

2020-04-20 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield created this revision.
JonChesterfield added reviewers: rjmccall, aaron.ballman, ABataev, jdoerfert, 
arsenm.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

[nfc] Accept addrspacecast allocas in InitTempAlloca
Changes the precondition to be slightly more permissive. Useful for amdgcn where
allocas are created with a cast to an address space.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78495

Files:
  clang/lib/CodeGen/CGExpr.cpp


Index: clang/lib/CodeGen/CGExpr.cpp
===
--- clang/lib/CodeGen/CGExpr.cpp
+++ clang/lib/CodeGen/CGExpr.cpp
@@ -125,8 +125,12 @@
 }
 
 void CodeGenFunction::InitTempAlloca(Address Var, llvm::Value *Init) {
-  assert(isa(Var.getPointer()));
-  auto *Store = new llvm::StoreInst(Init, Var.getPointer());
+  auto *Alloca = Var.getPointer();
+  assert(isa(Alloca) ||
+ (isa(Alloca) &&
+  isa(
+  cast(Alloca)->getPointerOperand(;
+  auto *Store = new llvm::StoreInst(Init, Alloca);
   Store->setAlignment(Var.getAlignment().getAsAlign());
   llvm::BasicBlock *Block = AllocaInsertPt->getParent();
   Block->getInstList().insertAfter(AllocaInsertPt->getIterator(), Store);


Index: clang/lib/CodeGen/CGExpr.cpp
===
--- clang/lib/CodeGen/CGExpr.cpp
+++ clang/lib/CodeGen/CGExpr.cpp
@@ -125,8 +125,12 @@
 }
 
 void CodeGenFunction::InitTempAlloca(Address Var, llvm::Value *Init) {
-  assert(isa(Var.getPointer()));
-  auto *Store = new llvm::StoreInst(Init, Var.getPointer());
+  auto *Alloca = Var.getPointer();
+  assert(isa(Alloca) ||
+ (isa(Alloca) &&
+  isa(
+  cast(Alloca)->getPointerOperand(;
+  auto *Store = new llvm::StoreInst(Init, Alloca);
   Store->setAlignment(Var.getAlignment().getAsAlign());
   llvm::BasicBlock *Block = AllocaInsertPt->getParent();
   Block->getInstList().insertAfter(AllocaInsertPt->getIterator(), Store);
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78495: [nfc] Accept addrspacecast allocas in InitTempAlloca

2020-04-20 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D78495#1992404 , @arsenm wrote:

> Needs test?


I'm not sure how to write said test. How do we normally hit asserts from the 
clang test suite?

This fires a lot in the openmp on amdgcn downstream branch, but I'm happy 
carrying this as a local patch until the rest of the clang change can be put up 
for review if preferred.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78495/new/

https://reviews.llvm.org/D78495



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78495: [nfc] Accept addrspacecast allocas in InitTempAlloca

2020-04-20 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield abandoned this revision.
JonChesterfield added a comment.

No problem. This isn't on the live path - the function is mostly called from 
openmp codegen and clang doesn't target openmp/amdgcn just yet. I'll roll this 
change into the codegen patch to enable that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78495/new/

https://reviews.llvm.org/D78495



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Amdgcn specific is fine by me. Hopefully that unblocks this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp:5
+
+void test_amdgcn_fence_failure() {
+

arsenm wrote:
> Does this really depend on C++? Can it use OpenCL like the other builtin 
> tests?This also belongs in a Sema* test directory since it's checking an error
Making it opencl-only would force some of the openmp runtime to be written in 
opencl, which is not presently the case. Currently that library is written in a 
dialect of hip, but there's a plan to implement it in openmp instead.

I'd much rather this builtin work from any language, instead of tying it to 
opencl, as that means one can use it from openmp target regions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75917/new/

https://reviews.llvm.org/D75917



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D46472: [HIP] Support offloading by linker script

2020-04-23 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.
Herald added a reviewer: jdoerfert.



Comment at: cfe/trunk/lib/Driver/ToolChains/CommonArgs.cpp:1329
+  // Get the HIP offload tool chain.
+  auto *HIPTC = static_cast(
+  C.getSingleOffloadToolChain());

Should this be `toolchains::HipToolChain`?


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D46472/new/

https://reviews.llvm.org/D46472



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85735: [OpenMP] Context selector extensions for template functions

2020-08-11 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Nice. What makes it an extension? 5.0 / 2.3.5 claims "and where variant-func-id 
is the name of a function variant that is either a base language identifier or, 
for C++, a template-id." which suggests this could be always-on




Comment at: clang/lib/Sema/SemaOpenMP.cpp:5875
+// TODO: Verify types for templates eventually.
+if (!UDeclTy->isDependentType()) {
+  QualType NewType = Context.mergeFunctionTypes(

tabs!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85735/new/

https://reviews.llvm.org/D85735

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85879: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU

2020-08-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I think this is reasonable. It's unfortunate to have isnan return bool or int 
depending on the system headers, but considering we have that in a language 
that doesn't mangle the return type into the name the workaround seems OK.

I think `#define isnan()` in a system header will clobber the text inside the 
variant region. Perhaps we want some #ifdef isnan #undef isnan logic, or at 
least #ifdef isnan #warning?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85879/new/

https://reviews.llvm.org/D85879

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85877: [OpenMP] Support nested OpenMP context selectors (declare variant)

2020-08-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Thanks! Probably good to implement this just at the time math headers needs it 
as that gives us a real world example of the change working.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85877/new/

https://reviews.llvm.org/D85877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85878: [OpenMP] Context selector extensions for return value overloading

2020-08-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

If I recall correctly, &foo with variants of foo returns a pointer to the base. 
If we have no base, and disable_implicit_base, what does &foo yield? It should 
probably be a compilation error with some descriptive message


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85878/new/

https://reviews.llvm.org/D85878

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74361: [Clang] Undef attribute for global variables

2020-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D74361#2218143 , @erichkeane wrote:

> Also, see this bug:  This crashes immediately when used on a template 
> instantiation: https://bugs.llvm.org/show_bug.cgi?id=47169

Thanks for the bug report! Template instantiations are missing from the test 
cases, I will go debugging.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74361/new/

https://reviews.llvm.org/D74361

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74361: [Clang] Undef attribute for global variables

2020-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D74361#2218258 , @erichkeane wrote:

> I did a little debugging, and the problem is the template itself isn't a 
> complete type

That's clear cut then, thanks. This patch was limited to trivially 
constructible types, and we don't know whether the type is trivially 
constructible if it's incomplete. Thus it does require complete types and we're 
missing a check in sema


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74361/new/

https://reviews.llvm.org/D74361

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74361: [Clang] Undef attribute for global variables

2020-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D74361#2218294 , @erichkeane wrote:

> Yep!  Declaring a global variable that isn't 'extern' with an incomplete type 
> is disallowed anyway, so if you call RequireCompleteType, you're likely just 
> diagnosing that early.
>
> You MIGHT have to do some work to allow:
>
>   struct S;
>   extern S foo __attribute__((loader_uninitialized));

I'll add that to the tests. Looking OK so far, added a check to SemaDecl and 
the crash is gone. No fix needed for C. Will have a patch up soon


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74361/new/

https://reviews.llvm.org/D74361

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85990: [Clang] Fix BZ47169, loader_uninitialized on incomplete types

2020-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield created this revision.
JonChesterfield added reviewers: erichkeane, aaron.ballman, rjmccall.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.
JonChesterfield requested review of this revision.

[Clang] Fix BZ47169, loader_uninitialized on incomplete types

Reported by @erichkeane. Fix proposed by @erichkeane works, tests included.
Bug introduced in D74361 . Crash was on 
querying a CXXRecordDecl for
hasTrivialDefaultConstructor on an incomplete type. Fixed by calling
RequireCompleteType in the right place.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D85990

Files:
  clang/lib/Sema/SemaDecl.cpp
  clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
  clang/test/Sema/attr-loader-uninitialized.c
  clang/test/Sema/attr-loader-uninitialized.cpp


Index: clang/test/Sema/attr-loader-uninitialized.cpp
===
--- clang/test/Sema/attr-loader-uninitialized.cpp
+++ clang/test/Sema/attr-loader-uninitialized.cpp
@@ -9,6 +9,11 @@
 extern int external_rejected __attribute__((loader_uninitialized));
 // expected-error@-1 {{variable 'external_rejected' cannot be declared both 
'extern' and with the 'loader_uninitialized' attribute}}
 
+struct S;
+extern S incomplete_external_rejected __attribute__((loader_uninitialized));
+// expected-error@-1 {{variable has incomplete type 'S'}}
+// expected-note@-3 {{forward declaration of 'S'}}
+
 int noargs __attribute__((loader_uninitialized(0)));
 // expected-error@-1 {{'loader_uninitialized' attribute takes no arguments}}
 
@@ -58,3 +63,12 @@
 
 nontrivial needs_trivial_ctor __attribute__((loader_uninitialized));
 // expected-error@-1 {{variable with 'loader_uninitialized' attribute must 
have a trivial default constructor}}
+
+struct Incomplete;
+Incomplete incomplete __attribute__((loader_uninitialized));
+// expected-error@-1 {{variable has incomplete type 'Incomplete'}}
+// expected-note@-3 {{forward declaration of 'Incomplete'}}
+
+struct Incomplete s_incomplete __attribute__((loader_uninitialized));
+// expected-error@-1 {{variable has incomplete type 'struct Incomplete'}}
+// expected-note@-7 {{forward declaration of 'Incomplete'}}
Index: clang/test/Sema/attr-loader-uninitialized.c
===
--- clang/test/Sema/attr-loader-uninitialized.c
+++ clang/test/Sema/attr-loader-uninitialized.c
@@ -10,6 +10,11 @@
 extern int external_rejected __attribute__((loader_uninitialized));
 // expected-error@-1 {{variable 'external_rejected' cannot be declared both 
'extern' and with the 'loader_uninitialized' attribute}}
 
+struct S;
+extern struct S incomplete_external_rejected 
__attribute__((loader_uninitialized));
+// expected-error@-1 {{variable has incomplete type 'struct S'}}
+// expected-note@-3 {{forward declaration of 'struct S'}}
+
 int noargs __attribute__((loader_uninitialized(0)));
 // expected-error@-1 {{'loader_uninitialized' attribute takes no arguments}}
 
@@ -35,3 +40,8 @@
 
 extern __attribute__((visibility("hidden"))) int extern_hidden 
__attribute__((loader_uninitialized));
 // expected-error@-1 {{variable 'extern_hidden' cannot be declared both 
'extern' and with the 'loader_uninitialized' attribute}}
+
+struct Incomplete;
+struct Incomplete incomplete __attribute__((loader_uninitialized));
+// expected-error@-1 {{variable has incomplete type 'struct Incomplete'}}
+// expected-note@-3 {{forward declaration of 'struct Incomplete'}}
Index: clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
===
--- clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
+++ clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
@@ -28,3 +28,15 @@
 // Defining as arr2[] [[clang..]] raises the error: attribute cannot be 
applied to types
 // CHECK: @arr2 = global [4 x double] undef
 double arr2 [[clang::loader_uninitialized]] [4];
+
+template struct templ{T * t;};
+
+// CHECK: @templ_int = global %struct.templ undef, align 8
+templ templ_int [[clang::loader_uninitialized]];
+
+// CHECK: @templ_trivial = global %struct.templ.0 undef, align 8
+templ templ_trivial [[clang::loader_uninitialized]];
+
+// CHECK: @templ_incomplete = global %struct.templ.1 undef, align 8
+struct incomplete;
+templ templ_incomplete [[clang::loader_uninitialized]];
Index: clang/lib/Sema/SemaDecl.cpp
===
--- clang/lib/Sema/SemaDecl.cpp
+++ clang/lib/Sema/SemaDecl.cpp
@@ -12476,6 +12476,11 @@
 }
 
 if (!Var->isInvalidDecl() && RealDecl->hasAttr()) 
{
+  if (RequireCompleteType(Var->getLocation(), Var->getType(),
+  diag::err_typecheck_decl_incomplete_type)) {
+Var->setInvalidDecl();
+return;
+  }
   if (CXXRecordDecl *RD = Var->getType()->getAsCXXRecordDecl()) {
 if (!RD->hasTrivialDefaultConstructor()) {
   Diag(Va

[PATCH] D74361: [Clang] Undef attribute for global variables

2020-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Fix proposed at D85990 . Thanks for the 
detailed bug report!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74361/new/

https://reviews.llvm.org/D74361

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85990: [Clang] Fix BZ47169, loader_uninitialized on incomplete types

2020-08-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/test/Sema/attr-loader-uninitialized.cpp:14
+extern S incomplete_external_rejected __attribute__((loader_uninitialized));
+// expected-error@-1 {{variable has incomplete type 'S'}}
+// expected-note@-3 {{forward declaration of 'S'}}

erichkeane wrote:
> Should this give the 'cannot be declared 'extern' and with the 
> 'loader_uninitialized' attribute?' error instead?  it seems to make more 
> sense there, and is more specific.
It probably should, yes. I'll see whether I can hit that diagnostic instead.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85990/new/

https://reviews.llvm.org/D85990

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85878: [OpenMP] Context selector extensions for return value overloading

2020-08-16 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Link error seems reasonable to me. Thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85878/new/

https://reviews.llvm.org/D85878

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85875: [OpenMP][FIX] Do not crash trying to print a missing (demangled) user condition

2020-08-16 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Seems OK to me as a local fix. I'm not sure about encoding 'missing condition' 
as 0, but that's a preexisting choice.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85875/new/

https://reviews.llvm.org/D85875

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

We can do this but should expect an increase in code size from having multiple 
internalised copies of the same function. There may be an incidental benefit if 
we can specialise some functions to the call site without additional cloning. 
Address of the same functions from different TUs will be inequal, which is 
wrong, but probably doesn't matter in practice.

It does have the major advantage that mlink-builtin-bitcode patches up the 
invalid IR on the fly, which is likely easier than changing the device libs or 
making IR mcpu-agnostic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.h:58
+  llvm::SmallVector
+  getHIPDeviceLibs(const llvm::opt::ArgList &Args) const override;
+

Why hip device libs? There's a common set, plus a hip.bc plus a opencl.bc. I'd 
expect us to want only the common set. Hip.bc shouldn't have non-hip stuff in 
it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-12 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.h:58
+  llvm::SmallVector
+  getHIPDeviceLibs(const llvm::opt::ArgList &Args) const override;
+

jhuber6 wrote:
> JonChesterfield wrote:
> > Why hip device libs? There's a common set, plus a hip.bc plus a opencl.bc. 
> > I'd expect us to want only the common set. Hip.bc shouldn't have non-hip 
> > stuff in it.
> Existing virtual function, just re-used it.
Rename it rocm perhaps?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-13 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

march for openmp, mcpu for hip seems ok. Notably llc needs to be told this as 
well, using mcpu, which may be an issue for save-temps


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D133726: [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU

2022-09-14 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

I don't like this but will concede it's quicker than changing device libs to 
contain IR that doesn't have to be patched on the fly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133726/new/

https://reviews.llvm.org/D133726

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D122352: [OpenMP] Do not create offloading entries for internal or hidden symbols

2022-03-23 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Sounds good to me too, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122352/new/

https://reviews.llvm.org/D122352

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D122504: [OpenMP] Make Ctor / Dtor functions have external visibility

2022-03-25 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

Nice, thanks. Wonder if we want protected visibility as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122504/new/

https://reviews.llvm.org/D122504

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D122069: [Object] Add binary format for bundling offloading metadata

2022-03-29 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added reviewers: grokos, ABataev, ronlieb, tianshilei1992.
JonChesterfield added a comment.

Added some reviewers. I'd much prefer this used an existing binary format, DIY 
is prone to errors and maintenance hassles down the road. Don't care as much 
about which format as about it being one with an existing, tested 
implementation and ideally existing inspection tools.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122069/new/

https://reviews.llvm.org/D122069

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D131307: [Clang] Allow downgrading to a warning the diagnostic for setting a non fixed enum to a value outside the range of the enumeration values

2022-08-09 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Did some digging here. The function hsa_agent_get_info takes an argument of 
type hsa_agent_info_t, which has declared values in the range [0 24]. The 
implementation of that (in amd_gpu_agent fwiw) casts that argument to a size_t 
and then switches on it, checking those declared values and a bunch of 
extensions. This is used to provide vendor extensions through a vendor-agnostic 
interface.

This seems to be a case where C and C++ have diverged. As far as I can tell, C 
thinks an enum is an int, and anything that fits in an int can be stored in one 
and retrieved later. C23 lets one specify the underlying type. C++ evidently 
thinks the value stored must be within [min max] of the declaration, which is 
at least more flexible than requiring it be one in the declaration.

So I think the fix here is to change hsa_agent_info_t to include 
`HSA_AGENT_INFO_UNUSED_INCREASE_RANGE_OF_TYPE = INT32_MAX` so the vendor 
extensions remain accessible. It's a header that is usable from C and C++ so it 
needs to do something conforming to both. Does that sound right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131307/new/

https://reviews.llvm.org/D131307

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106070: [HIP] Remove workaround in __clang_hip_runtime_wrapper.h

2021-07-15 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

OK from the openmp gpu side as far as I can tell. This is probably another 
instance of where we really wanted _OPENMP_TARGET or similar.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106070/new/

https://reviews.llvm.org/D106070

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105221: [openmp][nfc] Simplify macros guarding math complex headers

2021-07-18 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

@fodinabor?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105221/new/

https://reviews.llvm.org/D105221

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105221: [openmp][nfc] Simplify macros guarding math complex headers

2021-07-18 Thread Jon Chesterfield via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG3e649f8ef187: [openmp][nfc] Simplify macros guarding math 
complex headers (authored by JonChesterfield).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105221/new/

https://reviews.llvm.org/D105221

Files:
  clang/lib/Headers/openmp_wrappers/complex
  clang/lib/Headers/openmp_wrappers/complex.h


Index: clang/lib/Headers/openmp_wrappers/complex.h
===
--- clang/lib/Headers/openmp_wrappers/complex.h
+++ clang/lib/Headers/openmp_wrappers/complex.h
@@ -17,7 +17,6 @@
 // We require math functions in the complex builtins below.
 #include 
 
-#define __CUDA__
 #define __OPENMP_NVPTX__
 #include <__clang_cuda_complex_builtins.h>
 #undef __OPENMP_NVPTX__
Index: clang/lib/Headers/openmp_wrappers/complex
===
--- clang/lib/Headers/openmp_wrappers/complex
+++ clang/lib/Headers/openmp_wrappers/complex
@@ -17,7 +17,6 @@
 // We require std::math functions in the complex builtins below.
 #include 
 
-#define __CUDA__
 #define __OPENMP_NVPTX__
 #include <__clang_cuda_complex_builtins.h>
 #undef __OPENMP_NVPTX__
@@ -26,9 +25,6 @@
 // Grab the host header too.
 #include_next 
 
-
-#ifdef __cplusplus
-
 // If we are compiling against libc++, the macro _LIBCPP_STD_VER should be set
 // after including  above. Since the complex header we use is a
 // simplified version of the libc++, we don't need it in this case. If we
@@ -48,5 +44,3 @@
 #pragma omp end declare variant
 
 #endif
-
-#endif


Index: clang/lib/Headers/openmp_wrappers/complex.h
===
--- clang/lib/Headers/openmp_wrappers/complex.h
+++ clang/lib/Headers/openmp_wrappers/complex.h
@@ -17,7 +17,6 @@
 // We require math functions in the complex builtins below.
 #include 
 
-#define __CUDA__
 #define __OPENMP_NVPTX__
 #include <__clang_cuda_complex_builtins.h>
 #undef __OPENMP_NVPTX__
Index: clang/lib/Headers/openmp_wrappers/complex
===
--- clang/lib/Headers/openmp_wrappers/complex
+++ clang/lib/Headers/openmp_wrappers/complex
@@ -17,7 +17,6 @@
 // We require std::math functions in the complex builtins below.
 #include 
 
-#define __CUDA__
 #define __OPENMP_NVPTX__
 #include <__clang_cuda_complex_builtins.h>
 #undef __OPENMP_NVPTX__
@@ -26,9 +25,6 @@
 // Grab the host header too.
 #include_next 
 
-
-#ifdef __cplusplus
-
 // If we are compiling against libc++, the macro _LIBCPP_STD_VER should be set
 // after including  above. Since the complex header we use is a
 // simplified version of the libc++, we don't need it in this case. If we
@@ -48,5 +44,3 @@
 #pragma omp end declare variant
 
 #endif
-
-#endif
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D102107: [OpenMP] Codegen aggregate for outlined function captures

2021-07-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a subscriber: ronlieb.
JonChesterfield added a comment.

@ronlieb bisected amdgpu crashing to this too, rocm 'veccopy' case tries to 
dereference 0. Might be the same failure mode as the above or a different one, 
the hsa error reporting is quite coarse grained.

Suggest we pull this and try to fix it up before reapplying


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102107/new/

https://reviews.llvm.org/D102107

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106301: [OpenMP] Disable trap before unreachable for OpenMP device jobs

2021-07-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

What's the problem with emitting llvm.trap in various unreachable places? 
Wondering if it also affects translating assert to an llvm.trap


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106301/new/

https://reviews.llvm.org/D106301

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106301: [OpenMP] Disable trap before unreachable for OpenMP device jobs

2021-07-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

In D106301#2888170 , @jdoerfert wrote:

> llvm.trap is preserved, thus branches to an llvm.trap are preserved.

That's interesting. Consistent with IR in general,

  template  int test(int x) {
if (x < 42) {
  return x;
} else {
  if (Trap)
__builtin_trap();
  __builtin_unreachable();
}
  }
  
  extern "C" {
  int trap(int x) { return test(x); }
  int none(int x) { return test(x); }
  }

`=>`

  define i32 @trap(i32 returned %0) {
%2 = icmp slt i32 %0, 42
br i1 %2, label %4, label %3
  
  3:; preds = %1
tail call void @llvm.trap() #3
unreachable
  
  4:; preds = %1
ret i32 %0
  }
  
  define i32 @none(i32 returned %0)  {
%2 = icmp slt i32 %0, 42
tail call void @llvm.assume(i1 %2) #3
ret i32 %0
  }

So yes, we'll get faster codegen if we are willing to throw away traps followed 
by unreachable code.

If that's a legitimate transform to do, it seems like something we should do in 
instcombine, instead of a separate pass. I.e. fold `trap, unreachable` to 
`unreachable`.

Can we do that instead?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106301/new/

https://reviews.llvm.org/D106301

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106301: [OpenMP] Disable trap before unreachable for OpenMP device jobs

2021-07-19 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I'm not sure about that - we could tie instcombine to -O0 or some similar proxy 
for debugging ve performance - but I'm practice it's fairly likely that most 
traps are compiler inserted so it probably works out the same.

Conditional instcombine would let us remove much of the current logic for 
conditionally inserting traps which seems a win for implementation complexity.

Doesn't matter much for this patch, if D106299 
 lands then sure, let's switch it on for 
openmp GPU. If it goes the instcombine route then we don't need to toggle a 
switch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106301/new/

https://reviews.llvm.org/D106301

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-20 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

D105221  so LGTM too


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104904/new/

https://reviews.llvm.org/D104904

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-21 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield updated this revision to Diff 360447.
JonChesterfield added a comment.

- rebase on main


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104904/new/

https://reviews.llvm.org/D104904

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Headers/__clang_hip_cmath.h
  clang/lib/Headers/__clang_hip_math.h
  clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
  clang/lib/Headers/openmp_wrappers/cmath
  clang/lib/Headers/openmp_wrappers/math.h
  clang/test/Headers/Inputs/include/algorithm
  clang/test/Headers/Inputs/include/cstdlib
  clang/test/Headers/Inputs/include/utility
  clang/test/Headers/amdgcn_openmp_device_math.c
  clang/test/Headers/openmp_device_math_isnan.cpp

Index: clang/test/Headers/openmp_device_math_isnan.cpp
===
--- clang/test/Headers/openmp_device_math_isnan.cpp
+++ clang/test/Headers/openmp_device_math_isnan.cpp
@@ -21,14 +21,14 @@
 double math(float f, double d) {
   double r = 0;
   // INT_RETURN: call i32 @__nv_isnanf(float
-  // AMD_INT_RETURN: call i32 @_{{.*}}isnanf(float
+  // AMD_INT_RETURN: call i32 @__ocml_isnan_f32(float
   // BOOL_RETURN: call i32 @__nv_isnanf(float
-  // AMD_BOOL_RETURN: call zeroext i1 @_{{.*}}isnanf(float
+  // AMD_BOOL_RETURN: call i32 @__ocml_isnan_f32(float
   r += std::isnan(f);
   // INT_RETURN: call i32 @__nv_isnand(double
-  // AMD_INT_RETURN: call i32 @_{{.*}}isnand(double
+  // AMD_INT_RETURN: call i32 @__ocml_isnan_f64(double
   // BOOL_RETURN: call i32 @__nv_isnand(double
-  // AMD_BOOL_RETURN: call zeroext i1 @_{{.*}}isnand(double
+  // AMD_BOOL_RETURN: call i32 @__ocml_isnan_f64(double
   r += std::isnan(d);
   return r;
 }
Index: clang/test/Headers/amdgcn_openmp_device_math.c
===
--- /dev/null
+++ clang/test/Headers/amdgcn_openmp_device_math.c
@@ -0,0 +1,51 @@
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include -x c -fopenmp -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -x c -fopenmp -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - | FileCheck %s --check-prefixes=CHECK-C,CHECK
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include -x c++ -fopenmp -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -x c++ -fopenmp -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - | FileCheck %s --check-prefixes=CHECK-CPP,CHECK
+
+#ifdef __cplusplus
+#include 
+#else
+#include 
+#endif
+
+void test_math_f64(double x) {
+// CHECK-LABEL: define {{.*}}test_math_f64
+#pragma omp target
+  {
+// CHECK: call double @__ocml_sin_f64
+double l1 = sin(x);
+// CHECK: call double @__ocml_cos_f64
+double l2 = cos(x);
+// CHECK: call double @__ocml_fabs_f64
+double l3 = fabs(x);
+  }
+}
+
+void test_math_f32(float x) {
+// CHECK-LABEL: define {{.*}}test_math_f32
+#pragma omp target
+  {
+// CHECK-C: call double @__ocml_sin_f64
+// CHECK-CPP: call float @__ocml_sin_f32
+float l1 = sin(x);
+// CHECK-C: call double @__ocml_cos_f64
+// CHECK-CPP: call float @__ocml_cos_f32
+float l2 = cos(x);
+// CHECK-C: call double @__ocml_fabs_f64
+// CHECK-CPP: call float @__ocml_fabs_f32
+float l3 = fabs(x);
+  }
+}
+void test_math_f32_suffix(float x) {
+// CHECK-LABEL: define {{.*}}test_math_f32_suffix
+#pragma omp target
+  {
+// CHECK: call float @__ocml_sin_f32
+float l1 = sinf(x);
+// CHECK: call float @__ocml_cos_f32
+float l2 = cosf(x);
+// CHECK: call float @__ocml_fabs_f32
+float l3 = fabsf(x);
+  }
+}
Index: clang/test/Headers/Inputs/include/utility
===
--- /dev/null
+++ clang/test/Headers/Inputs/include/utility
@@ -0,0 +1,2 @@
+#pragma once
+
Index: clang/test/Headers/Inputs/include/cstdlib
===
--- clang/test/Headers/Inputs/include/cstdlib
+++ clang/test/Headers/Inputs/include/cstdlib
@@ -21,9 +21,13 @@
 inline long long
 abs(long long __x) { return __builtin_llabs (__x); }
 
+// amdgcn already provides definition of fabs
+#ifndef __AMDGCN__
 float fabs(float __x) { return __builtin_fabs(__x); }
+#endif
 
 float abs(

[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-21 Thread Jon Chesterfield via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG968899ad9cf1: [OpenMP][AMDGCN] Initial math headers support 
(authored by pdhaliwal, committed by JonChesterfield).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104904/new/

https://reviews.llvm.org/D104904

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Headers/__clang_hip_cmath.h
  clang/lib/Headers/__clang_hip_math.h
  clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
  clang/lib/Headers/openmp_wrappers/cmath
  clang/lib/Headers/openmp_wrappers/math.h
  clang/test/Headers/Inputs/include/algorithm
  clang/test/Headers/Inputs/include/cstdlib
  clang/test/Headers/Inputs/include/utility
  clang/test/Headers/amdgcn_openmp_device_math.c
  clang/test/Headers/openmp_device_math_isnan.cpp

Index: clang/test/Headers/openmp_device_math_isnan.cpp
===
--- clang/test/Headers/openmp_device_math_isnan.cpp
+++ clang/test/Headers/openmp_device_math_isnan.cpp
@@ -21,14 +21,14 @@
 double math(float f, double d) {
   double r = 0;
   // INT_RETURN: call i32 @__nv_isnanf(float
-  // AMD_INT_RETURN: call i32 @_{{.*}}isnanf(float
+  // AMD_INT_RETURN: call i32 @__ocml_isnan_f32(float
   // BOOL_RETURN: call i32 @__nv_isnanf(float
-  // AMD_BOOL_RETURN: call zeroext i1 @_{{.*}}isnanf(float
+  // AMD_BOOL_RETURN: call i32 @__ocml_isnan_f32(float
   r += std::isnan(f);
   // INT_RETURN: call i32 @__nv_isnand(double
-  // AMD_INT_RETURN: call i32 @_{{.*}}isnand(double
+  // AMD_INT_RETURN: call i32 @__ocml_isnan_f64(double
   // BOOL_RETURN: call i32 @__nv_isnand(double
-  // AMD_BOOL_RETURN: call zeroext i1 @_{{.*}}isnand(double
+  // AMD_BOOL_RETURN: call i32 @__ocml_isnan_f64(double
   r += std::isnan(d);
   return r;
 }
Index: clang/test/Headers/amdgcn_openmp_device_math.c
===
--- /dev/null
+++ clang/test/Headers/amdgcn_openmp_device_math.c
@@ -0,0 +1,51 @@
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include -x c -fopenmp -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -x c -fopenmp -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - | FileCheck %s --check-prefixes=CHECK-C,CHECK
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include -x c++ -fopenmp -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -x c++ -fopenmp -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - | FileCheck %s --check-prefixes=CHECK-CPP,CHECK
+
+#ifdef __cplusplus
+#include 
+#else
+#include 
+#endif
+
+void test_math_f64(double x) {
+// CHECK-LABEL: define {{.*}}test_math_f64
+#pragma omp target
+  {
+// CHECK: call double @__ocml_sin_f64
+double l1 = sin(x);
+// CHECK: call double @__ocml_cos_f64
+double l2 = cos(x);
+// CHECK: call double @__ocml_fabs_f64
+double l3 = fabs(x);
+  }
+}
+
+void test_math_f32(float x) {
+// CHECK-LABEL: define {{.*}}test_math_f32
+#pragma omp target
+  {
+// CHECK-C: call double @__ocml_sin_f64
+// CHECK-CPP: call float @__ocml_sin_f32
+float l1 = sin(x);
+// CHECK-C: call double @__ocml_cos_f64
+// CHECK-CPP: call float @__ocml_cos_f32
+float l2 = cos(x);
+// CHECK-C: call double @__ocml_fabs_f64
+// CHECK-CPP: call float @__ocml_fabs_f32
+float l3 = fabs(x);
+  }
+}
+void test_math_f32_suffix(float x) {
+// CHECK-LABEL: define {{.*}}test_math_f32_suffix
+#pragma omp target
+  {
+// CHECK: call float @__ocml_sin_f32
+float l1 = sinf(x);
+// CHECK: call float @__ocml_cos_f32
+float l2 = cosf(x);
+// CHECK: call float @__ocml_fabs_f32
+float l3 = fabsf(x);
+  }
+}
Index: clang/test/Headers/Inputs/include/utility
===
--- /dev/null
+++ clang/test/Headers/Inputs/include/utility
@@ -0,0 +1,2 @@
+#pragma once
+
Index: clang/test/Headers/Inputs/include/cstdlib
===
--- clang/test/Headers/Inputs/include/cstdlib
+++ clang/test/Headers/Inputs/include/cstdlib
@@ -21,9 +21,13 @@
 inline long long
 abs(long long __x) { return __builtin_l

[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-21 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Landed on @pdhaliwal's behalf. My expectation is that this patch mostly works 
and the rough edges can be cleaned up once ocml is linked in and we can more 
easily run more applications through it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104904/new/

https://reviews.llvm.org/D104904

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-21 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Thanks! Will take a look. Feel free to revert, I'll do so shortly if noone 
beats me to it


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104904/new/

https://reviews.llvm.org/D104904

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-21 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

cstdlib test header contains

  // amdgcn already provides definition of fabs
  #ifndef __AMDGCN__
  float fabs(float __x) { return __builtin_fabs(__x); }
  #endif

If I delete or invert the ifndef

> $HOME/llvm-build/llvm/lib/clang/13.0.0/include/__clang_hip_cmath.h:660:9: 
> error: target of using declaration conflicts with declaration already in scope
>  using ::fabs;
>  when included from openmp_wrappers/cmath

If I delete the definition,

> $HOME/llvm-project/clang/test/Headers/Inputs/include/cstdlib:29:31: error: 
> use of undeclared identifier 'fabs'
> when included from openmp_wrappers/__clang_openmp_device_functions.h

Current conclusion is that we cannot work around the presence/absence of fabs 
in the cstdlib test file, we have to do something in the real headers such that 
the test file does the right thing


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104904/new/

https://reviews.llvm.org/D104904

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106542: [OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments.

2021-07-22 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Nice, thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106542/new/

https://reviews.llvm.org/D106542

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106793: [OpenMP] Add a driver flag to enable the new device runtime library

2021-07-26 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

Can we call this something other than new? We don't tend to remove command line 
arguments and this won't make much sense once it's the only runtime.

I'd be inclined to add an argument called 'use_legacy_runtime' or similar, 
which defaults to true


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106793/new/

https://reviews.llvm.org/D106793

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106793: [OpenMP] Add a driver flag to enable the new device runtime library

2021-07-26 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:229
+ options::OPT_fno_openmp_target_new_runtime, false))
+BitcodeSuffix = "new-amdgcn-" + GPUArch;
+  else

Likewise here, how about amdgcn-legacy-. Taking advantage of the monorepo + no 
guarantees that mix&match clang and devicertl works.

Side note, someone should probably rename the amdgcn devicertl to amdgpu, since 
the gfx10 stuff is the 'rdna' arch instead of the 'gcn' one.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106793/new/

https://reviews.llvm.org/D106793

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106870: [OpenMP] Multi architecture compilation support

2021-07-27 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

There seems to be a bunch of different things in this patch.

There's some driver plumbing to compile for more than one arch (presumably by 
calling the target compiler N times). That's a great feature, I want to build 
an application bthat can run on nvptx or amdgpu. Probably need a test case 
showing that combination.

Then there's a bunch of stuff to do with 'requirements', but it's not clear 
what that is.

Finally there's some stuff where libomptarget dlopens itself then spawns 
amdgpu-arch. I can't tell why we would want to do that.

My guess was that each arch would get its own section in the host executable 
containing a code object and each host plugin would be responsible for 
indicating whether it could do anything with a given code object. That should 
work out of the box for machines with only one offloading arch available, and 
need some work around device_id to handle multiple ones.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106870/new/

https://reviews.llvm.org/D106870

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105981: [AMDGPU][OpenMP] Support linking of math libraries

2021-07-28 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added a comment.

I don't like that this pulls in ockl automatically but don't think that's a 
blocker. OK on my side, @yaxunl?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105981/new/

https://reviews.llvm.org/D105981

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105981: [AMDGPU][OpenMP] Support linking of math libraries

2021-07-28 Thread Jon Chesterfield via Phabricator via cfe-commits
JonChesterfield added inline comments.



Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:831-860
   auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);
   const StringRef CanonArch = llvm::AMDGPU::getArchNameAMDGCN(Kind);
   std::string LibDeviceFile = RocmInstallation.getLibDeviceFile(CanonArch);
   if (LibDeviceFile.empty()) {
 getDriver().Diag(diag::err_drv_no_rocm_device_lib) << 1 << GpuArch;
 return;
   }

yaxunl wrote:
> I think we'd better absorb this part into the newly added function 
> getCommonDeviceLibOptions so that we have a centralized location for 
> determining device libs. We could use offload kind of the toolchain to 
> differentiate between OpenCL/HIP/OpenMP.
getCommonBitcodeLibs is called by opencl with some other set of constraints 
around argument names.

Persuading opencl to use the same arguments, getting rid of some of the files, 
doing things with aliasing, or however else we want to dice this problem is 
separable from linking the bitcode into openmp and can be left for a later 
patch. Using a common path for HIP and OpenMP seems a step in the right 
direction.

It might take quite a long time to reach consensus on how to deduplicate the 
two remaining copies, which I'd guess is why they were copy/pasted to begin 
with.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105981/new/

https://reviews.llvm.org/D105981

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   >