hfinkel added a comment.

This is much closer to what I had in mind, thanks! Now I think we're in a 
position to make this work for more than just the CUDA target. It looks like 
the added code is now:
1. Remove -march from the translated arguments (because any existing -march 
would apply only for the host compilation).
2. Process the -Xopenmp-target flags and add those arguments to the list.
3. If we don't have an -march in the translated arguments, then add 
-march=sm_20 so that there's a suitable default (noting that this default must 
be higher than the regular CUDA default).

I propose the following:

(1) is good, but should be more general. It is not just the host's -march that 
should not apply to any arbitrary toolchain, but any of the -m<foo> options. 
You should remove all options that are in the m_Group options group (which, as 
noted in Options.td, are "Target-dependent compilation options"). I believe 
that you can iterate over them all using something like:

  for (const Arg *A : Args.filtered(options::OPT_m_Group)) {

and that might help. This should be in toolchain-independent code, and I'd 
prefer that we always remove these options whenever the host and target 
toolchain differ, but leave them when they're the same.

(2) is good, but, along with (1), should be in toolchain-independent code. I 
recommend that we add a new member function to ToolChain, called, to make a 
specific suggestion, TranslateOpenMPTargetArgs, and put the logic from (1) and 
(2) in  this function. Then, we can augment Compilation::getArgsForToolChain to 
do something like the following:

  const DerivedArgList &
  Compilation::getArgsForToolChain(const ToolChain *TC, StringRef BoundArch,
                                   Action::OffloadKind DeviceOffloadKind) {
    if (!TC)
      TC = &DefaultToolChain;
  
    DerivedArgList *&Entry = TCArgs[{TC, BoundArch, DeviceOffloadKind}];
    if (!Entry) {
      // First, translate OpenMP toolchain arguments provided via the 
-Xopenmp-toolchain flags.
      Entry = TranslateOpenMPTargetArgs(*TranslatedArgs, BoundArch, 
DeviceOffloadKind);
      if (!Entry)
        Entry = TranslatedArgs;
  
      Entry = TC->TranslateArgs(*Entry, BoundArch, DeviceOffloadKind);
      if (!Entry)
        Entry = TranslatedArgs;
    }
  
    return *Entry;
  }

And then (3) we leave as it is (where it is).



================
Comment at: lib/Driver/ToolChains/Cuda.cpp:480
+    if (MArchList.empty())
+      // Default compute capability for CUDA toolchain is sm_20.
+      DAL->AddJoinedArg(nullptr,
----------------
This default is only for OpenMP, right? Please explain in the comment why this 
is the default for OpenMP.


https://reviews.llvm.org/D34784



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to