tra added inline comments.
================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:433
} else {
- // If no -O was passed, pass -O0 to ptxas -- no opt flag should correspond
- // to no optimizations, but ptxas's default is -O3.
- CmdArgs.push_back("-O0");
+ // If no -O was passed, pass -O3 to ptxas -- this makes ptxas's
+ // optimization level the same as the ptxjitcompiler.
----------------
hdelan wrote:
> tra wrote:
> > I think this would be contrary to the expectation that the lack of `-O` in
> > clang means `do not optimize`, which generally applies to the whole
> > compilation chain, including the assembler. Matching whatever NVIDIA tools do
> > is an insufficient reason for breaking this assumption, IMO.
> >
> > If you do want to run optimized ptxas on unoptimized PTX, you can use
> > `-Xcuda-ptxas -O3`.
> I think for the average user, consistency across the `ptxjitcompiler` and
> `ptxas` is far more important than assuming that no `-O` means no
> optimization. I think most users will assume that no `-O` means that
> whatever tools are being used will take their default optimization level, which
> in the case of clang is `-O0` and in the case of `ptxas` is `-O3`.
>
> We have had a few bugs with `ptxas`/`ptxjitcompiler` at higher optimization
> levels, which were quite hard to pin down since offline `ptxas` and
> `ptxjitcompiler` were using different optimisation levels, making bugs appear
> in one and not the other. Of course we are aware of this now but this
> inconsistency can result in bugs that are difficult to diagnose. Having
> consistency between the `ptxjitcompiler` and `ptxas` is therefore of
> practical benefit. If we leave it as is, with `ptxas` defaulting to `-O0`,
> the benefit is purely semantic and not practical.
> I think for the average user, consistency across the ptxjitcompiler and ptxas
> is far more important than assuming that no -O means no optimization.
The default is intended to provide the fewest surprises for the most users.
There are more users of clang as a CUDA compiler overall than users who care
about consistency with ptxjitcompiler. My point is that improvements for a
subset of users should be balanced against usability in the common case. In
this case the benefit does not justify the downsides, IMO.

Please add me as a reviewer when the patch is ready for public review and we'll
discuss it with the wider LLVM community.
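
To make the two defaults under discussion concrete, here is a minimal stand-alone sketch of how the `-O` level for `ptxas` could be chosen. It is not the code in `Cuda.cpp`: the function name `ptxasOptFlag`, the `noFlagDefault` parameter, and the mapping of explicit levels are assumptions made for illustration. Only the no-`-O` case (`-O0` today, `-O3` in this patch) comes from the diff above.

```cpp
#include <optional>
#include <string>

// Hypothetical model of the choice being debated; the real driver works on
// parsed argument objects rather than strings.
// clangOpt:      text following "-O" on the clang command line ("0", "2",
//                "s", ...), or std::nullopt when no -O flag was given.
// noFlagDefault: level handed to ptxas when no -O flag was given
//                ("0" matches the existing code, "3" matches the patch).
std::string ptxasOptFlag(const std::optional<std::string> &clangOpt,
                         const std::string &noFlagDefault = "0") {
  if (!clangOpt)
    return "-O" + noFlagDefault;

  const std::string &level = *clangOpt;
  // Assumed mapping: numeric levels 0 to 3 pass through unchanged.
  if (level == "0" || level == "1" || level == "2" || level == "3")
    return "-O" + level;
  // Assumed mapping: anything else (for example "s" or "z") falls back to -O2.
  return "-O2";
}
```

In this model `ptxasOptFlag(std::nullopt)` returns `-O0`, and the patch corresponds to calling it with `noFlagDefault` set to `"3"`. Users who want that behavior without changing the default can already pass `-Xcuda-ptxas -O3`.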
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D116583/new/
https://reviews.llvm.org/D116583