[clang] [Offload] Move HIP and CUDA to new driver by default (PR #84420)

2024-03-11 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/84420 From 677e374d1a0ca87d734c03aa2e97e73510e04e4e Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Thu, 7 Mar 2024 15:48:00 -0600 Subject: [PATCH] [Offload] Move HIP and CUDA to new driver by default Summary: This

[clang] [Offload] Move HIP and CUDA to new driver by default (PR #84420)

2024-03-12 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Do you mean the SPIR-V target (backend)? I have not followed this area of > work closely. What is missing or what exactly needs to be supported by the > SPIR-V target? Any help or pointers would be greatly appreciated! I believe there was some work to port SYCL to work with th

[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-23 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > @jhuber6 , looks like these changes break the following builds > > * https://lab.llvm.org/buildbot/#/builders/235/builds/5630 > > * https://lab.llvm.org/buildbot/#/builders/232/builds/19808 > > > there are a lot of CMake error messages started with > > ``` > CMake Er

[clang] [libc] [llvm] [openmp] [libc] Rework the GPU build to be a regular target (PR #81921)

2024-02-23 Thread Joseph Huber via cfe-commits
jhuber6 wrote: @vvereschaka Should be fixed now. https://github.com/llvm/llvm-project/pull/81921 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Append target search paths for direct offloading compilation (PR #82699)

2024-02-23 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/82699

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/81058

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 commented: Some nits, mostly just formatting and naming that hasn't been updated. I agree overall that we should just put this in some canonical form and rely on other LLVM passes to take care of things like inlining. Eager to have this functionality in, so hopefully

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits
@@ -0,0 +1,701 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-24 Thread Joseph Huber via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-25 Thread Joseph Huber via cfe-commits
@@ -1306,14 +1306,50 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); } __DEVICE__ double min(double __x, double __y) { return __builtin_fmin(__x, __y); } +// Define host min/max functions. + #if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) -_

[clang] [HIP] fix host min/max in header (PR #82956)

2024-02-26 Thread Joseph Huber via cfe-commits
@@ -1306,14 +1306,50 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); } __DEVICE__ double min(double __x, double __y) { return __builtin_fmin(__x, __y); } +// Define host min/max functions. + #if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) -_

[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-02-28 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/83282 Summary: One recurring problem we have with the OpenMP libraries is that they potentially conflict with ones found on the system. This occurs when there are two copies and one is used for linking that it no

[clang] [Clang] Add 'CLANG_ALLOW_IMPLICIT_RPATH' to enable toolchain use of -rpath (PR #82004)

2024-02-28 Thread Joseph Huber via cfe-commits
jhuber6 wrote: So, I'm wondering if we could do a clang configuration file based solution for this. The problem that I see now is that we'd like to make some clang configuration files only active for a certain language. I think we already have OS specific files and target specific files, so it

[clang] [llvm] [HIP] Support compressing bundle by LZMA (PR #83297)

2024-02-28 Thread Joseph Huber via cfe-commits
jhuber6 wrote: This seems to be adding an entirely new compression scheme to LLVM. I feel like that should be a separate patch and the part where we make HIP use it is a follow-up. https://github.com/llvm/llvm-project/pull/83297

[clang] [llvm] [HIP] Support compressing bundle by LZMA (PR #83306)

2024-02-28 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/83306

[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-02-28 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/83282

[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Hi @jhuber6, @MaskRay > > We are having some problems with this patch on a server where the file > /lib64/libomptarget-nvptx-sm_52.bc exists. The test case that fails is > clang/test/Driver/openmp-offload-gpu.c. > > **Problem 1** I think one problem is related to this check l

[clang] [clang][AMDGPU] Don't define feature macros on host code (PR #83558)

2024-03-01 Thread Joseph Huber via cfe-commits
jhuber6 wrote: This was the original behavior of my patch, but I reverted it because it broke all the HIP headers that were unintentionally relying on this. Has that been resolved? https://github.com/llvm/llvm-project/pull/83558

[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Problem 1 can be solved by flipping the order. But Problem 2 would remain as > it doesn't depend on the order. Honestly, we should just remove the second test. We just treat these things as libraries and it doesn't make sense for a test to ensure that `-lstdc++` doesn't exist

[clang] [OpenMP] Fix test after updating library search paths (PR #83573)

2024-03-01 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/83573 Summary: We still use this bitcode library in one case, the NVPTX non-LTO build. The patch updated the search paths to treat it the same as other libraries, which unintentionally prioritized system paths over LIBR

[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Problem 1 can be solved by flipping the order. But Problem 2 would remain as > it doesn't depend on the order. https://github.com/llvm/llvm-project/pull/83573 I made a patch to fix it. https://github.com/llvm/llvm-project/pull/83282

[clang] [OpenMP] Fix test after updating library search paths (PR #83573)

2024-03-01 Thread Joseph Huber via cfe-commits
@@ -101,17 +101,6 @@ /// ### -/// Check that the warning is thrown when the libomptarget bitcode library is not found. -/// Libomptarget requires sm_52 or newer so an sm_52 bitcode library should never

[clang] [OpenMP] Fix test after updating library search paths (PR #83573)

2024-03-01 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/83573

[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > ``` > yeluo@epyc-server:/soft/llvm/main-20240301/lib$ ls libomp* -l > lrwxrwxrwx 1 yeluo yeluo 34 Mar 1 11:18 libomptarget.rtl.amdgpu.so -> > libomptarget.rtl.amdgpu.so.19.0git > -r--r--r-- 1 yeluo yeluo 67532024 Mar 1 11:04 > libomptarget.rtl.amdgpu.so.19.0git > lrwxrw

[clang] [openmp] [OpenMP] Respect LLVM per-target install directories (PR #83282)

2024-03-01 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > It seems being installed twice both under `lib` and > `lib/x86_64-unknown-linux-gnu`. files are the identical as diff show nothing. Makes sense, like `add_llvm_library` is implicitly installing it there, then our subsequent `install` call is doing it again. I wonder if there's

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/83906 Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the valu
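
The encoding described in this summary can be sketched in plain C. This is only an illustration under assumptions: the snippet is cut off before it says which values are concatenated or in what order, so the helper names (`pack_fpenv`, `fpenv_lo`, `fpenv_hi`) and the choice of two 32-bit halves are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: concatenate two 32-bit values into one 64-bit
 * environment word. Which hardware values the halves hold, and which half
 * goes where, is an assumption; the summary above is truncated. */
static uint64_t pack_fpenv(uint32_t lo, uint32_t hi) {
  return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Recover the low 32 bits of the packed environment word. */
static uint32_t fpenv_lo(uint64_t env) { return (uint32_t)(env & 0xFFFFFFFFu); }

/* Recover the high 32 bits of the packed environment word. */
static uint32_t fpenv_hi(uint64_t env) { return (uint32_t)(env >> 32); }
```

Packing and unpacking are exact inverses here, so a get followed by a set would round-trip both halves unchanged in this model.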

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Joseph Huber via cfe-commits
jhuber6 wrote: Note that this patch is not quite ready to land. I encountered issues when working with `s_setreg`. The listing of `SOPK` instructions should have this as an instruction that takes a 16-bit zero extended immediate value. However, this was apparently not the case for the `s_setre

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/83906 From d7e20596434636753610ceb4326ddc1116f0bdce Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 1 Mar 2024 15:28:32 -0600 Subject: [PATCH] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' Summary:

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Joseph Huber via cfe-commits
@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo pattern=[]> : SOPK_Pseudo < pattern>; def S_SETREG_B32 : S_SETREG_B32_Pseudo < - [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> { + [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> { jhub

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/83906

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Joseph Huber via cfe-commits
@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo pattern=[]> : SOPK_Pseudo < pattern>; def S_SETREG_B32 : S_SETREG_B32_Pseudo < - [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> { + [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> { jhub

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/83906 From e349a9d436cdb99f0d9fb8d6df772a600ca0ea94 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 1 Mar 2024 15:28:32 -0600 Subject: [PATCH] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' Summary:

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-04 Thread Joseph Huber via cfe-commits
@@ -1122,7 +1122,7 @@ class S_SETREG_B32_Pseudo pattern=[]> : SOPK_Pseudo < pattern>; def S_SETREG_B32 : S_SETREG_B32_Pseudo < - [(int_amdgcn_s_setreg (i32 SIMM16bit:$simm16), i32:$sdst)]> { + [(int_amdgcn_s_setreg (i32 timm:$simm16), i32:$sdst)]> { jhub

[clang] [AMDGPU] Introduce 'amdgpu_num_workgroups_{xyz}' builtin (PR #83927)

2024-03-04 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/83927 Summary: The AMDGPU target was originally designed with OpenCL in mind. The first versions only provided the grid size, which is the total number of threads in the execution context. In order to get the number of
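
As a rough illustration of the relationship this summary starts to describe, the number of workgroups along one dimension can be derived from the grid size and the workgroup size. The sketch below is plain host-side C, not the builtin from the patch; the function name `num_workgroups` and the use of ceiling division (to cover a grid that is not an exact multiple of the workgroup size) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of workgroups along one dimension, given the
 * total number of threads (grid size) and the threads per workgroup.
 * Ceiling division is an assumption; when the grid size is an exact multiple
 * of the workgroup size it reduces to a plain division. */
static uint32_t num_workgroups(uint32_t grid_size, uint32_t workgroup_size) {
  assert(workgroup_size != 0 && "workgroup size must be non-zero");
  /* Written this way to avoid overflow in grid_size + workgroup_size - 1. */
  return grid_size / workgroup_size + (grid_size % workgroup_size != 0);
}
```

For example, a grid of 1024 threads with 256 threads per workgroup gives 4 workgroups in this model.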

[clang] [AMDGPU] Introduce 'amdgpu_num_workgroups_{xyz}' builtin (PR #83927)

2024-03-04 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/83927 From 56059fdb5a0e22f8c7dcce6642899fdccf77a55b Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 4 Mar 2024 17:27:28 -0600 Subject: [PATCH] [AMDGPU] Introduce 'amdgpu_num_workgroups_{xyz}' builtin Summary:

[clang] [clang] Add cuda-path arguments to failing test (PR #84008)

2024-03-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 commented: We had a lot that were like this previously. Guessing this one slipped through because of the `zlib` requirement. Does this work with `-nogpulib` instead? Usually easier than passing the dummy CUDA path. https://github.com/llvm/llvm-project/pull/84008

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/83906 From 7808b8a0f4ab70733ebff4a6b8793f4918d0107b Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 1 Mar 2024 15:28:32 -0600 Subject: [PATCH] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' Summary:

[clang] [clang] Add cuda-path arguments to failing test (PR #84008)

2024-03-05 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > It definitely doesn't work for the "pure" CUDA invocations, it still finds my > local installation and complains. It might work for the OpenMP invocations, > but hard to tell for me on a system with CUDA installed. As it's a `.cu` test > after all, I think I would prefer the u

[clang] [clang] Add cuda-path arguments to failing test (PR #84008)

2024-03-05 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > Might need `-nogpulib -nogpuinc` in those cases, we do that in other `.cu` > > files in the test suite. > > No, I already tried that, it doesn't work for me. All > `clang/test/Driver/*.cu` that supply `-nocudainc` also pass `--cuda-path`... The only reason it will fail with

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-05 Thread Joseph Huber via cfe-commits
@@ -325,6 +325,9 @@ BUILTIN(__builtin_amdgcn_read_exec_hi, "Ui", "nc") BUILTIN(__builtin_amdgcn_endpgm, "v", "nr") +BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n") jhuber6 wrote: There's no builtin as far as I'm aware. I think there might be some pragmas ho

[clang] [clang] Add cuda-path arguments to failing test (PR #84008)

2024-03-05 Thread Joseph Huber via cfe-commits
jhuber6 wrote: In any case, it's not really important and this works. I'm mostly just curious why it doesn't seem to work as I would expect since there might be something to fix. https://github.com/llvm/llvm-project/pull/84008

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/83906 From 169f8914270725bd94b14a20f5f91005ce59f494 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 1 Mar 2024 15:28:32 -0600 Subject: [PATCH] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' Summary:

[clang] [clang] Add cuda-path arguments to failing test (PR #84008)

2024-03-05 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > The invocations in `clang/test/Driver/cuda-omp-unsupported-debug-options.cu` > don't pass `-emit-llvm` but `-###`. > > ``` > > cat /dev/null | ./bin/clang -### -x cuda - -nogpulib -nogpuinc -c && echo > $? > ``` > > should error with recent CUDA installations because of `sm_

[clang] [CUDA] Correctly set CUDA default architecture (PR #84017)

2024-03-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/84017 Summary: We already had a special CUDA default that better tracked the state as of modern CUDA installations. Recently this was bumped up to `sm_52`, but there was a location that wasn't respecting this. Fix that.

[clang] [clang] Add cuda-path arguments to failing test (PR #84008)

2024-03-05 Thread Joseph Huber via cfe-commits
jhuber6 wrote: Found it https://github.com/llvm/llvm-project/pull/84017. https://github.com/llvm/llvm-project/pull/84008

[clang] [clang] Add cuda-path arguments to failing test (PR #84008)

2024-03-05 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Ok, but that still doesn't change the fact that the Clang driver will search > for a system-wide CUDA installation unless passed `--cuda-path`... We also do this with the GCC toolchain, the issue is whether or not there's an error if it didn't find it. Doing `clang -v` will al

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-05 Thread Joseph Huber via cfe-commits
@@ -839,6 +839,18 @@ unsigned test_wavefrontsize() { return __builtin_amdgcn_wavefrontsize(); } +// CHECK-LABEL test_get_fpenv( jhuber6 wrote: Is this related to the DX10 clamp / traps potentially being disabled? Or is this an LLVM concept. https://github

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-05 Thread Joseph Huber via cfe-commits
@@ -839,6 +839,18 @@ unsigned test_wavefrontsize() { return __builtin_amdgcn_wavefrontsize(); } +// CHECK-LABEL test_get_fpenv( jhuber6 wrote: Hm, I'm not sure. I feel like this is just letting the user access the hardware directly which has a different us

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/83906 From 1cb734f3df298a34d76f7c9ee059dff84ba50c10 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Fri, 1 Mar 2024 15:28:32 -0600 Subject: [PATCH] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' Summary:

[clang] [llvm] [AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (PR #83906)

2024-03-05 Thread Joseph Huber via cfe-commits
@@ -839,6 +839,18 @@ unsigned test_wavefrontsize() { return __builtin_amdgcn_wavefrontsize(); } +// CHECK-LABEL test_get_fpenv( jhuber6 wrote: Added a sema check. https://github.com/llvm/llvm-project/pull/83906

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > Right now if you specify target-cpu you get target-cpu attributes, which is > > what we don't want. > > I'm fine handling 'generic' in a special way under the hood and not > specifying target-CPU. > > My concern is about user-facing interface. Command line options must be

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80035 Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into ano

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: Rework of https://github.com/llvm/llvm-project/pull/79660 to handle old behavior of these being defined for the host. https://github.com/llvm/llvm-project/pull/80035

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
@@ -175,6 +175,8 @@ Predefined Macros - Defined when the GPU default stream is set to per-thread mode. * - ``HIP_API_PER_THREAD_DEFAULT_STREAM`` - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated. + * - ``__AMDGCN_WAVEFRONT_SIZE__``

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80035 From f606aaa9c711d2ece6b1600160a61232abb69eb4 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 08:46:14 -0600 Subject: [PATCH 1/2] [AMDGPU] Do not emit arch dependent macros with unspecified c

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
@@ -175,6 +175,8 @@ Predefined Macros - Defined when the GPU default stream is set to per-thread mode. * - ``HIP_API_PER_THREAD_DEFAULT_STREAM`` - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated. + * - ``__AMDGCN_WAVEFRONT_SIZE__``

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/80035

[clang] 626fe71 - [Clang] Fix test failing on systems without ROCm installed

2024-01-30 Thread Joseph Huber via cfe-commits
Author: Joseph Huber Date: 2024-01-30T13:17:02-06:00 New Revision: 626fe71fa5ed79cbd41b7b29582560d7adb1220e URL: https://github.com/llvm/llvm-project/commit/626fe71fa5ed79cbd41b7b29582560d7adb1220e DIFF: https://github.com/llvm/llvm-project/commit/626fe71fa5ed79cbd41b7b29582560d7adb1220e.diff

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > This seems to break tests: http://45.33.8.238/linux/129493/step_7.txt > > Please take a look and revert for now if it takes a while to fix. Is it still broken? I pushed a fix because I'm pretty sure the problem was not passing `-nogpulib` `-nogpuinc` so the test runs on machin

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > i.e. it helped with Clang :: Preprocessor/predefined-arch-macros.c but not > with: > > Failed Tests (2): Clang :: Driver/amdgpu-macros.cl Clang :: > Driver/target-id-macros.cl Thanks, seeing it locally now. I'll try to fix it quick and revert if it's not working soon. https

[clang] 6fecfbc - [AMDGPU] Correctly exclude the HIP host from arch macros

2024-01-30 Thread Joseph Huber via cfe-commits
Author: Joseph Huber Date: 2024-01-30T13:45:01-06:00 New Revision: 6fecfbc7b62f54bd633e83c22630d7c2a3e5741e URL: https://github.com/llvm/llvm-project/commit/6fecfbc7b62f54bd633e83c22630d7c2a3e5741e DIFF: https://github.com/llvm/llvm-project/commit/6fecfbc7b62f54bd633e83c22630d7c2a3e5741e.diff

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > i.e. it helped with Clang :: Preprocessor/predefined-arch-macros.c but not > with: > > Failed Tests (2): Clang :: Driver/amdgpu-macros.cl Clang :: > Driver/target-id-macros.cl Pushed a fix, `check-clang` passes on my machine now. Let me know if it's still broken. https://gi

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80066 Summary: The standard GPU compilation process embeds each intermediate object file into the host file at the `.llvm.offloading` section so it can be linked later. We also use a special section called something lik

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-30 Thread Joseph Huber via cfe-commits
jhuber6 wrote: This is related to the discussions at the https://github.com/llvm/llvm-project/issues/77018 issue. https://github.com/llvm/llvm-project/pull/80066

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79873 From 35e12c3d83f3be93618805ffaf05e3424689f32f Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 11:08:04 -0600 Subject: [PATCH 1/2] [NVPTX] Allow compiling LLVM-IR without `-march` set Summary:

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/79873 From 35e12c3d83f3be93618805ffaf05e3424689f32f Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Mon, 29 Jan 2024 11:08:04 -0600 Subject: [PATCH 1/3] [NVPTX] Allow compiling LLVM-IR without `-march` set Summary:

[clang] [NVPTX] Allow compiling LLVM-IR without `-march` set (PR #79873)

2024-01-30 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/79873

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH] [LinkerWrapper] Support relocatable linking for offloading Summar

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
@@ -181,5 +181,6 @@ __attribute__((visibility("protected"), used)) int x; // RUN: --linker-path=/usr/bin/ld.lld -- -r --whole-archive %t.a --no-whole-archive \ // RUN: %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=RELOCATABLE-LINK jhuber6 wrote: Added

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > So, the idea is to carry two separate embedded offloading sections -- one for > already fully linked GPU executables, and another for GPU objects to be > linked at the final link stage. > It's more or less doing `-fno-gpu-rdc` on a subset of files. So you can do GPU linking

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/80066

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH 1/2] [LinkerWrapper] Support relocatable linking for offloading S

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH 1/3] [LinkerWrapper] Support relocatable linking for offloading S

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > Supporting such mixed mode opens an interesting set of issues we may need to > consider going forward: > > who/where/how runs initializers in the fully linked parts? I'm assuming you're talking about GPU-side constructors? I don't think the CUDA runtime supports those, but Op

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80183 Summary: Currently we cannot compile `__builtin_amdgcn_ballot_w64` on non-wave64 targets even though it is valid. This is relevant for making library code that can handle both without needing to check the wavefron
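
To illustrate why a 64-bit ballot is convenient for library code that should not care about the wavefront size, here is a host-side model in plain C. It does not call `__builtin_amdgcn_ballot_w64`; `model_ballot`, `count_active`, and the lane-predicate array are hypothetical stand-ins, and the assumption that lanes beyond the wavefront size contribute zero bits is mine rather than something stated in the snippet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model of a ballot: bit i of the result is set when lane i's
 * predicate is true. wave_size is expected to be 32 or 64; lanes at or past
 * wave_size contribute zero bits (an assumption of this sketch). */
static uint64_t model_ballot(const bool *pred, unsigned wave_size) {
  uint64_t mask = 0;
  for (unsigned lane = 0; lane < wave_size && lane < 64; ++lane)
    if (pred[lane])
      mask |= (uint64_t)1 << lane;
  return mask;
}

/* Count active lanes from the 64-bit mask. The same code serves both wave
 * sizes because the mask type does not change. */
static unsigned count_active(uint64_t mask) {
  unsigned n = 0;
  for (; mask != 0; mask &= mask - 1) /* clear the lowest set bit */
    ++n;
  return n;
}
```

In this model, callers only ever handle a `uint64_t`, so no wavefront-size check is needed at the call site.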

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > I'm assuming you're talking about GPU-side constructors? I don't think the > > CUDA runtime supports those, but OpenMP runs them when the image is loaded, > > so it would handle both independently. > > Yes. I'm thinking of the expectations from a C++ user standpoint, and thi

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-01-31 Thread Joseph Huber via cfe-commits
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") //===--===// TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32") -TARGET_BUILTIN(__builtin_amdgcn_ba

[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > > the idea is that it would be the desired effect if someone went out of > > their way to do this GPU subset linking thing. > > That would only be true when someone owns the whole build. That will not be > the case in practice. A large enough project is usually a bunch of libr

[clang] [HIP] fix HIP detection for /usr (PR #80190)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 approved this pull request. Do we have any tests for this kind of stuff? We really should have some mock ROCm installation in one of the `Inputs/` directories and then do `--rocm-path=` or something. https://github.com/llvm/llvm-project/pull/80190

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH 1/4] [LinkerWrapper] Support relocatable linking for offloading S

[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80066 >From af382e03e41ef679c35a6126a1b131a7a8a28360 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 30 Jan 2024 15:34:22 -0600 Subject: [PATCH 1/5] [LinkerWrapper] Support relocatable linking for offloading S

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-01-31 Thread Joseph Huber via cfe-commits
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") //===--===// TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32") -TARGET_BUILTIN(__builtin_amdgcn_ba

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-01-31 Thread Joseph Huber via cfe-commits
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") //===--===// TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32") -TARGET_BUILTIN(__builtin_amdgcn_ba

[clang] [HIP] fix HIP detection for /usr (PR #80190)

2024-01-31 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/80190

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Joseph Huber via cfe-commits
jhuber6 wrote: > After this change is there any value in having two different builtins? You > could just have one that always returns 64 bits. I personally think it would be better to just have the one, but I figured that decision was made earlier and changing it would break backwards compatibility. ht

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Joseph Huber via cfe-commits
@@ -4,13 +4,10 @@ // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s +// expected-no-diagnostics + typedef unsigned long ulong; void test_ba

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/80183 >From 26b75cdba1aebc881e52dc82ca61e1082ef67a5e Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Wed, 31 Jan 2024 13:18:04 -0600 Subject: [PATCH] [AMDGPU] Allow w64 ballot to be used on w32 targets Summary: Curr

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Joseph Huber via cfe-commits
@@ -4,13 +4,10 @@ // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s +// expected-no-diagnostics + typedef unsigned long ulong; void test_ba

[llvm] [openmp] [clang] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-02 Thread Joseph Huber via cfe-commits
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy &Device) { Entry.size) != OFFLOAD_SUCCESS) REPORT("Failed to write symbol for USM %s\n", Entry.name); } -} else { +} else if (Entry.addr) { --

[llvm] [openmp] [clang] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-02 Thread Joseph Huber via cfe-commits
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy &Device) { Entry.size) != OFFLOAD_SUCCESS) REPORT("Failed to write symbol for USM %s\n", Entry.name); } -} else { +} else if (Entry.addr) { --

[openmp] [clang] [llvm] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-02 Thread Joseph Huber via cfe-commits
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy &Device) { Entry.size) != OFFLOAD_SUCCESS) REPORT("Failed to write symbol for USM %s\n", Entry.name); } -} else { +} else if (Entry.addr) { --

[openmp] [clang] [llvm] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-02 Thread Joseph Huber via cfe-commits
@@ -199,7 +199,7 @@ static int initLibrary(DeviceTy &Device) { Entry.size) != OFFLOAD_SUCCESS) REPORT("Failed to write symbol for USM %s\n", Entry.name); } -} else { +} else if (Entry.addr) { --

[openmp] [clang] [llvm] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-05 Thread Joseph Huber via cfe-commits
@@ -6872,35 +6883,6 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(StringRef HostFilePath) { loadOffloadInfoMetadata(*M.get()); } -Function *OpenMPIRBuilder::createRegisterRequires(StringRef Name) { jhuber6 wrote: Thanks for the heads up. Do you know if

[clang] [llvm] [openmp] [mlir] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-05 Thread Joseph Huber via cfe-commits
@@ -6872,35 +6883,6 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(StringRef HostFilePath) { loadOffloadInfoMetadata(*M.get()); } -Function *OpenMPIRBuilder::createRegisterRequires(StringRef Name) { jhuber6 wrote: That looks a little weird, the `i32` val

[llvm] [openmp] [clang] [mlir] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-05 Thread Joseph Huber via cfe-commits
@@ -6872,35 +6883,6 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(StringRef HostFilePath) { loadOffloadInfoMetadata(*M.get()); } -Function *OpenMPIRBuilder::createRegisterRequires(StringRef Name) { jhuber6 wrote: I encoded the fact that this is a "requi

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/80183

[clang] d172286 - [Clang] Make AMDGPU OpenCL tests require AMD registered target

2024-02-05 Thread Joseph Huber via cfe-commits
Author: Joseph Huber Date: 2024-02-05T09:08:31-06:00 New Revision: d1722868d34a69df8466b72098176f54a7af8823 URL: https://github.com/llvm/llvm-project/commit/d1722868d34a69df8466b72098176f54a7af8823 DIFF: https://github.com/llvm/llvm-project/commit/d1722868d34a69df8466b72098176f54a7af8823.diff

[mlir] [clang] [llvm] [openmp] [OpenMP] Remove `register_requires` global constructor (PR #80460)

2024-02-05 Thread Joseph Huber via cfe-commits
@@ -6872,35 +6883,6 @@ void OpenMPIRBuilder::loadOffloadInfoMetadata(StringRef HostFilePath) { loadOffloadInfoMetadata(*M.get()); } -Function *OpenMPIRBuilder::createRegisterRequires(StringRef Name) { jhuber6 wrote: It was a very obvious problem. I mixed u

[clang] [AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (PR #80741)

2024-02-05 Thread Joseph Huber via cfe-commits
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/80741 Summary: The backend supports the wavefrontsize intrinsic, and suggests that it is tied to a corresponding clang builtin, but it is not actually present. This simply adds it in so it can be used from clang. This a
