jlebar added a comment. In http://reviews.llvm.org/D16082#324138, @tra wrote:
> Make sure it works with -save-temps and -fintegrated-as/-fno-integrated-as. > They tend to throw wrenches into pipeline construction. Thanks. All of them worked except -fintegrated-as, which was causing us not to invoke ptxas. Fixed, and added a test. ================ Comment at: lib/Driver/Driver.cpp:1380 @@ +1379,3 @@ + C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN), + /* GpuArchName = */ nullptr, + /* AtTopLevel = */ false); ---------------- tra wrote: > So, you're treating GpuArchName==nullptr as a special case of DeviceAction > for fatbin? > Perhaps it warrants its own CudaDeviceLinkAction? It's a special case either way, but there are enough places that look for a CudaDeviceAction that I *think* this special case is cleaner than having a bunch of if (CudaDeviceAction || CudaDeviceLinkAction)s floating around. ================ Comment at: lib/Driver/Driver.cpp:1620-1625 @@ -1602,3 +1619,8 @@ case phases::Assemble: - return C.MakeAction<AssembleJobAction>(Input, types::TY_Object); + // Assembling generates an object file, except when we're assembling CUDA, + // in which case we get a cubin file. + return C.MakeAction<AssembleJobAction>( + std::move(Input), TC.getTriple().getOS() == llvm::Triple::CUDA + ? types::TY_CUDA_CUBIN + : types::TY_Object); } ---------------- tra wrote: > cubin *is* an ELF object file. Do we really need a new type here or can we > get by with TY_Object? Got rid of the extra type. Note that this means that our intermediate objs will be named foo.o instead of foo.cubin, which isn't optimal, but I agree that it's simpler without the extra type. ================ Comment at: lib/Driver/Driver.cpp:1880 @@ +1879,3 @@ + // retrieve the Input's GPU arch. + II.setAction(A); + return II; ---------------- echristo wrote: > Weren't you just adding it as part of the InputInfo constructor in 16078? Yes, but we're doing something slightly different here. Suppose you have a CudaDeviceAction --> BackendAction. What do you want the InputInfo's Action to be? Without this change, it's the BackendAction. But we really want it to be the CudaDeviceAction. Thus this line. I updated the comment in an attempt to clarify. ================ Comment at: lib/Driver/ToolChains.cpp:4230 @@ -4224,3 +4229,3 @@ // Skip this argument unless the architecture matches BoundArch - if (A->getValue(0) != StringRef(BoundArch)) + if (!BoundArch || A->getValue(0) != StringRef(BoundArch)) continue; ---------------- tra wrote: > You may as well move it out of the loop and return early if BoundArch is > nullptr. Tried this and discovered it's actually subtly different in a way that, thankfully, affects one of the tests I added. if (!BoundArch) continue applies only if A->getOption().matches(options::OPT_Xarch__). So we can't hoist it into an early return. ================ Comment at: lib/Driver/Tools.cpp:10677-10678 @@ +10676,4 @@ + std::string Arch = A->getGpuArchName(); + // We need to name pass an Arch of the form "sm_XX" for cubin files and + // "compute_XX" for ptx. CudaDeviceAction's getGpuArchName() is guaranteed + // to match "sm_XX". ---------------- tra wrote: > First line does not parse. > > In general compute_XX does not necessarily match sm_XX. > [[ http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#ptxas-options | > ptxas options ]] says that > > > Allowed values for this option: compute_20, compute_30, compute_35, > > compute_50, compute_52; and sm_20, sm_21, sm_30, sm_32, sm_35, sm_50 and > > sm_52 > > Note that there's no compute_21, compute_32. You'll need sm_XX -> compute_YY > map. > > Oh goodness, how awful. Thank you for catching that. Fixed. http://reviews.llvm.org/D16082 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits