tra added a reviewer: echristo.
tra added a subscriber: echristo.
tra added a comment.

@echristo Eric, any thoughts/concerns on the overall direction for the driver?

@yaxunl One concern I have is diagnostics. When the jobs are executed in 
parallel, I assume their output will all go to standard error and will be 
interleaved randomly across all parallel compilations. Figuring out what went 
wrong will be hard. Ideally we'd want to collect the output of each individual 
sub-command and print it only once that job has finished, so there's no 
confusion about the source of an error.
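Roughly what I have in mind, as a minimal sketch: redirect each sub-command's 
stderr to a temporary file and replay it atomically on completion. This leans 
on LLVM's Support APIs from memory (signatures approximate); `runBuffered` is 
a hypothetical helper, not something in this patch:

```cpp
// Sketch only: buffer a sub-command's stderr and emit it in one shot, so
// output from parallel jobs never interleaves mid-message.
#include "llvm/ADT/SmallString.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/Program.h"
#include "llvm/Support/raw_ostream.h"
#include <mutex>
#include <optional>

static std::mutex DiagMutex; // serializes the final replay to stderr

int runBuffered(llvm::StringRef Program, llvm::ArrayRef<llvm::StringRef> Args) {
  llvm::SmallString<128> ErrPath;
  if (llvm::sys::fs::createTemporaryFile("subcmd-stderr", "txt", ErrPath))
    return -1; // couldn't create the capture file

  // Leave stdin/stdout alone; send stderr to the temporary file.
  std::optional<llvm::StringRef> Redirects[] = {std::nullopt, std::nullopt,
                                                llvm::StringRef(ErrPath)};
  int RC = llvm::sys::ExecuteAndWait(Program, Args, /*Env=*/std::nullopt,
                                     Redirects);

  // Replay the captured diagnostics atomically, then clean up.
  if (auto Buf = llvm::MemoryBuffer::getFile(ErrPath)) {
    std::lock_guard<std::mutex> Lock(DiagMutex);
    llvm::errs() << (*Buf)->getBuffer();
  }
  llvm::sys::fs::remove(ErrPath);
  return RC;
}
```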

> It is observed that device code compilation takes most of the compilation
> time when clang compiles CUDA/HIP programs, since device code usually
> contains complicated computation code. Often such code is highly coupled,
> which results in a few large source files that become bottlenecks for a
> whole project. Things get worse when such code is compiled for multiple GPU
> archs, since clang compiles for each GPU arch sequentially. In practice, it
> is common to compile for more than 5 GPU archs.

I think this change will only help relatively small local builds with a few 
relatively large CUDA/HIP files. We talked internally about parallelizing 
CUDA builds in the past and concluded that it's not very useful in practice, 
at least for us. We have a lot of non-CUDA files to compile, too, and that 
usually provides enough work for the build to hide the long CUDA 
compilations. Distributed builds (and, I'd guess, local ones too) often 
assume one compilation per CPU, so launching multiple parallel 
sub-compilations for each top-level job may not be that helpful in practice 
beyond manual compilation of one file. That said, the change would be a nice 
improvement for quick rebuilds where only one or a few CUDA files need to be 
recompiled. However, in that case being able to get comprehensible error 
messages would also be very important.

Overall I'm on the fence about this change. It may be more trouble than it's 
worth.



================
Comment at: clang/include/clang/Driver/Job.h:77
+  /// Dependent actions
+  llvm::SmallVector<const Action *, 4> DependentActions;
+
----------------
Nit: using a pointer as a key means sub-compilations will be scheduled in a 
different order from run to run, which may make build results differ from 
run to run.

I can't think of a realistic scenario yet. One case where it may make a 
difference is the generation of dependency files.
We currently leak some output file name flags to device-side compilations. 
E.g. `-fsyntax-only -MD -MF foo.d` will write foo.d for each sub-compilation. 
At best we'll end up with the result of whichever sub-compilation finished 
last. At worst we'll end up with corrupt output. In this case it's the 
output-argument leak that's the problem, but I suspect there may be other 
cases where execution order is observable.
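
To illustrate the hazard (a toy sketch, not code from the patch; `Action` 
below is a stand-in struct, not clang's): DenseMap hashes the pointer value 
itself, so with ASLR its iteration order can change on every run, while 
`llvm::MapVector` keyed on the same pointers iterates in insertion order and 
stays deterministic.

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/Support/raw_ostream.h"

struct Action { int Id; }; // stand-in for clang::driver::Action

int main() {
  Action A{0}, B{1}, C{2};

  // Iteration order depends on where A/B/C land in memory, which varies
  // from run to run, so anything derived from it is non-reproducible.
  llvm::DenseMap<const Action *, int> Unstable;
  Unstable[&A] = 0;
  Unstable[&B] = 1;
  Unstable[&C] = 2;

  // Same pointer keys, but iteration follows insertion order, which is
  // stable across runs.
  llvm::MapVector<const Action *, int> Stable;
  Stable[&A] = 0;
  Stable[&B] = 1;
  Stable[&C] = 2;

  for (const auto &KV : Stable)
    llvm::errs() << KV.first->Id << " "; // always prints: 0 1 2
  llvm::errs() << "\n";
  return 0;
}
```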


================
Comment at: clang/lib/Driver/Compilation.cpp:284-286
+      }
+      }
+    }
----------------
Indentation seems to be off. Run through clang-format?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582


