[PATCH] D101630: [HIP] Fix device-only compilation

2021-06-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment. In D101630#2799472, @tra wrote: > In D101630#2799425, @yaxunl wrote: > >> But how do we control emitting LLVM IR with or without bundle? `-emit-llvm >> -emit-gpu-object` or `-emit-llvm -

2021-06-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D101630#2799425, @yaxunl wrote: > But how do we control emitting LLVM IR with or without bundle? `-emit-llvm > -emit-gpu-object` or `-emit-llvm -emit-gpu-bundle`? `-emit-*` is usually for > specifying a specific file type. Hmm.

2021-06-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment. In D101630#2799409, @tra wrote: > In D101630#2798975, @yaxunl wrote: > >> For sure we will need -fgpu-bundle-device-output to control bundling of >> intermediate files. Then adding -emit

2021-06-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D101630#2798975, @yaxunl wrote: > For sure we will need -fgpu-bundle-device-output to control bundling of > intermediate files. Then adding -emit-gpu-object and -emit-gpu-bundle may be > redundant and can cause confusion. What i

2021-06-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment. In D101630#2792160, @tra wrote: > In D101630#2792052, @yaxunl wrote: > >> I think for intermediate outputs e.g. preprocessor expansion, IR, and >> assembly, probably it makes sense not t

2021-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D101630#2792052, @yaxunl wrote: > I think for intermediate outputs e.g. preprocessor expansion, IR, and > assembly, probably it makes sense not to bundle by default. Agreed. > However, for default action (emitting object), we n
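The default both reviewers agree on here (do not bundle intermediate device outputs) can be modelled as a small decision function. This is a minimal Python sketch, not clang driver code: the output-kind strings `preprocessed`, `llvm-ir`, `assembly` and `object` and the function name are hypothetical, and the object-file default is a caller-supplied parameter because the excerpt is cut off before stating it.

```python
# Intermediate device outputs named in the exchange above
# (preprocessor expansion, LLVM IR, assembly). Names are hypothetical.
INTERMEDIATE_KINDS = {"preprocessed", "llvm-ir", "assembly"}


def bundle_by_default(output_kind: str, object_default: bool) -> bool:
    """Hypothetical helper: default bundling decision for one output kind.

    Intermediate outputs are not bundled by default. The default for
    object output is supplied by the caller, since the comment does not
    settle it in the visible excerpt.
    """
    if output_kind in INTERMEDIATE_KINDS:
        return False
    if output_kind == "object":
        return object_default
    raise ValueError(f"unknown output kind: {output_kind!r}")


# Example: assembly is never bundled by default, whatever the object policy.
print(bundle_by_default("assembly", object_default=True))
```

Keeping the object default as an explicit parameter makes the open question in the thread visible instead of silently picking an answer.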

2021-06-01 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment. In D101630#2791734, @tra wrote: > In D101630#2787714, @yaxunl wrote: > >> How does nvcc --genco behave when there are multiple GPU arch's? Does it >> output a fat binary containing multi

2021-06-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D101630#2787714, @yaxunl wrote: > How does nvcc --genco behave when there are multiple GPU arch's? Does it > output a fat binary containing multiple ISA's? Also, does it support > device-only compilation for intermediate outputs

2021-05-28 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment. In D101630#202, @tra wrote: > In D101630#2777346, @yaxunl wrote: > >> In D101630#2748513, @tra wrote: >> >>> How about this: >>> If the user

2021-05-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: echristo. tra added a comment. In D101630#2777346, @yaxunl wrote: > In D101630#2748513, @tra wrote: > >> How about this: >> If the user explicitly specified `--cuda-host-only` or `--cuda-

2021-05-24 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 347400. yaxunl added a comment. fixed option. bundle output if users specify output by -o or -E CHANGES SINCE LAST ACTION https://reviews.llvm.org/D101630/new/ https://reviews.llvm.org/D101630 Files: clang/include/clang/Driver/Options.td clang/lib/Dri
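The rule in the Diff 347400 summary ("bundle output if users specify output by -o or -E") reduces to a check over the command line. The following is a minimal Python sketch of that rule only; the function name is hypothetical, and it assumes the separate `-o <file>` spelling, whereas a real driver would also have to handle joined forms.

```python
from typing import Sequence


def should_bundle_device_output(argv: Sequence[str]) -> bool:
    """Hypothetical model of the Diff 347400 summary: bundle the device
    output if the user named it with -o or requested -E.

    Assumption: only the separate "-o <file>" spelling is recognized.
    """
    explicit_output = "-o" in argv
    preprocess_only = "-E" in argv
    return explicit_output or preprocess_only


# Example: device-only assembly written to a named file is bundled.
print(should_bundle_device_output(["-S", "-o", "out.s", "a.hip"]))
```

Matching whole tokens (rather than prefixes) avoids misreading options such as `--offload-arch=...` as `-o`.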

2021-05-24 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked an inline comment as done. yaxunl added a comment. In D101630#2748513, @tra wrote: > How about this: > If the user explicitly specified `--cuda-host-only` or `--cuda-device-only`, > then by default only allow producing the natural output f

2021-05-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D101630#2744861, @yaxunl wrote: > [snip] it is the convention for compiler to have one output. > The compilation is like a pipeline. If we break it into stages, users would > expect to use the output from one stage as input for

2021-05-07 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment. In D101630#2733761, @tra wrote: > In D101630#2730273, @yaxunl wrote: > >> How about an option -fhip-bundle-device-output. If it is on, device output >> is bundled no matter how many GPU

2021-05-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:977 NegFlag>; +defm hip_bundle_device_output : BoolFOption<"hip-bundle-device-output", EmptyKPM, DefaultTrue, + PosFlag, jansvoboda11 wrote: > The TableGen marshalling infrastructur

2021-05-04 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added inline comments. Comment at: clang/include/clang/Driver/Options.td:977 NegFlag>; +defm hip_bundle_device_output : BoolFOption<"hip-bundle-device-output", EmptyKPM, DefaultTrue, + PosFlag, The TableGen marshalling infrastructure (`BoolFOpti

2021-05-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: jdoerfert. tra added a comment. In D101630#2730273, @yaxunl wrote: > How about an option -fhip-bundle-device-output. If it is on, device output is > bundled no matter how many GPU arch there are. By default it is on. +1 to the o
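The proposed `-fhip-bundle-device-output` semantics (on by default, and bundling regardless of how many GPU archs are given) can be sketched as a flag resolver. This is a minimal Python illustration, not the patch's implementation: it assumes the option also gets a `-fno-hip-bundle-device-output` negative form with the usual last-one-wins rule, and the helper and constant names are hypothetical.

```python
from typing import Iterable

# Flag spellings: the positive form is from the proposal; the negative
# form is an assumption based on the usual -f/-fno- pairing.
POS = "-fhip-bundle-device-output"
NEG = "-fno-hip-bundle-device-output"


def bundle_device_output(argv: Iterable[str], default: bool = True) -> bool:
    """Hypothetical resolver: the last occurrence of the flag wins.

    The number of --offload-arch values is deliberately not an input:
    under the proposal, the flag alone decides whether to bundle.
    """
    result = default
    for arg in argv:
        if arg == POS:
            result = True
        elif arg == NEG:
            result = False
    return result


# Example: a later negative flag overrides an earlier positive one.
print(bundle_device_output([POS, NEG]))
```

Leaving the arch count out of the signature mirrors the point of the proposal: with a single arch the output format is the same as with several, so downstream tools see a consistent file type.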

2021-05-01 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 342182. yaxunl edited the summary of this revision. yaxunl added a comment. Herald added a subscriber: dang. added option -fhip-bundle-device-output CHANGES SINCE LAST ACTION https://reviews.llvm.org/D101630/new/ https://reviews.llvm.org/D101630 Files:

2021-04-30 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment. In D101630#2729975, @tra wrote: > What will happen with this patch in the following scenarios: > > - `--offload-arch=A -S -o out.s` > - `--offload-arch=A --offload-arch=B -S -o out.s` > > I would expect the first case to produce a

2021-04-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. What will happen with this patch in the following scenarios:

- `--offload-arch=A -S -o out.s`
- `--offload-arch=A --offload-arch=B -S -o out.s`

I would expect the first case to produce a plain text assembly file. With this patch the second case will produce a bundle. With s
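The two scenarios above differ only in the number of offload archs. A minimal Python sketch of the outcomes as stated in the comment (plain text assembly for one arch, a bundle for two or more) follows; the function name and return strings are hypothetical labels, not driver output.

```python
def expected_device_output(num_archs: int) -> str:
    """Hypothetical: what `-S -o out.s` yields for a given number of
    offload archs, per the scenarios in the comment above."""
    if num_archs < 1:
        raise ValueError("num_archs must be positive")
    # One arch: a plain text assembly file. More than one: a bundle.
    return "plain-assembly" if num_archs == 1 else "bundle"


for n in (1, 2):
    print(n, expected_device_output(n))
```

This arch-count-dependent format is exactly what the later `-fhip-bundle-device-output` proposal tries to avoid.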

2021-04-30 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment. In D101630#2729573, @tra wrote: > CUDA compilation currently errors out if `-o` is used when more than one > output would be produced. > E.g. > > % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 > --cuda-path=$H

2021-04-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. CUDA compilation currently errors out if `-o` is used when more than one output would be produced. E.g. % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2 zz.cu -c -E #... preprocessed output from host and 2 GPU compilati
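The CUDA behavior described here (reject `-o` when more than one output would be produced) can be modelled by counting one output per GPU arch plus one for the host. This is a minimal Python sketch under that counting assumption; the function names are hypothetical and the error text is illustrative, not clang's diagnostic.

```python
def count_outputs(num_gpu_archs: int, include_host: bool = True) -> int:
    """Outputs of a CUDA compile: one per GPU arch, plus one for the host."""
    return num_gpu_archs + (1 if include_host else 0)


def check_explicit_output(explicit_o: bool, num_gpu_archs: int,
                          include_host: bool = True) -> int:
    """Hypothetical model of the check: reject -o when more than one
    output would be produced. Returns the number of outputs."""
    n = count_outputs(num_gpu_archs, include_host)
    if explicit_o and n > 1:
        # Illustrative message only, not clang's exact diagnostic text.
        raise ValueError(f"-o given but {n} output files would be produced")
    return n


# The -E example above (sm_60 and sm_70, no -o): host + 2 GPU outputs.
print(check_explicit_output(False, 2))
```

Under this model, adding `-o` to the two-arch example would be rejected, while a single-arch device-only compile (`include_host=False`) would be allowed.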

2021-04-30 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl created this revision. yaxunl added a reviewer: tra. yaxunl requested review of this revision. When clang compiles a HIP program with -E, there are multiple output files for host and different GPU archs. Clang uses clang-offload-bundler to bundle them as one output file. Currently clang do