[clang] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands (PR #131932)
https://github.com/rafl created https://github.com/llvm/llvm-project/pull/131932 So that things like --use-cc="ccache gcc" work. Fixes #26594. Also use the slightly simpler shellwords instead of quotewords. Cc-ing the Clang static analyzer maintainers @haoNoQ, @Xazax-hun, and @steakhal. >From bebd0d1245abc93397b388e098fd5f2ccdc196db Mon Sep 17 00:00:00 2001 From: Florian Ragwitz Date: Tue, 18 Mar 2025 15:35:42 -0700 Subject: [PATCH] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands So that things like --use-cc="ccache gcc" work. Fixes #26594. Also use the slightly simpler shellwords instead of quotewords. --- clang/tools/scan-build/libexec/ccc-analyzer | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/clang/tools/scan-build/libexec/ccc-analyzer b/clang/tools/scan-build/libexec/ccc-analyzer index 74f812aef8fdf..655ded4b102be 100755 --- a/clang/tools/scan-build/libexec/ccc-analyzer +++ b/clang/tools/scan-build/libexec/ccc-analyzer @@ -63,6 +63,7 @@ sub SearchInPath { } my $Compiler; +my @CompilerArgs; my $Clang; my $DefaultCCompiler; my $DefaultCXXCompiler; @@ -89,7 +90,7 @@ if (`uname -s` =~ m/Darwin/) { } if ($FindBin::Script =~ /c\+\+-analyzer/) { - $Compiler = $ENV{'CCC_CXX'}; + ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CXX'}); if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCXXCompiler; } $Clang = $ENV{'CLANG_CXX'}; @@ -98,7 +99,7 @@ if ($FindBin::Script =~ /c\+\+-analyzer/) { $IsCXX = 1 } else { - $Compiler = $ENV{'CCC_CC'}; + ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CC'}); if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCCompiler; } $Clang = $ENV{'CLANG'}; @@ -199,7 +200,7 @@ sub GetCCArgs { die "could not find clang line\n" if (!defined $line); # Strip leading and trailing whitespace characters. $line =~ s/^\s+|\s+$//g; - my @items = quotewords('\s+', 0, $line); + my @items = shellwords($line); my $cmd = shift @items; die "cannot find 'clang' in 'clang' command\n" if (!($cmd =~ /clang/ || basename($cmd) =~ /llvm/)); # If this is the llvm-driver the internal command will look like "llvm clang ...". @@ -462,9 +463,9 @@ my $Output; my %Uniqued; # Forward arguments to gcc. -my $Status = system($Compiler,@ARGV); +my $Status = system($Compiler,@CompilerArgs,@ARGV); if (defined $ENV{'CCC_ANALYZER_LOG'}) { - print STDERR "$Compiler @ARGV\n"; + print STDERR "$Compiler @CompilerArgs @ARGV\n"; } if ($Status) { exit($Status >> 8); } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands (PR #131932)
rafl wrote: Thanks for your feedback, Balázs! > When I looked at this patch and I had the impression that `CCC_CXX` is > basically what one should use as `CXX` or `CMAKE_CXX_COMPILER`, which is just > a path to a binary - without any arguments. The author of the original code this change is modifying seemed to make the same assumption, but I believe this is not true. All of the following treats CC/CXX as a command with potential arguments: ``` CC="ccache gcc" ./configure make CC="ccache gcc" CXX="ccache g++" cmake ... ``` `CMAKE__COMPILER` also [supports potential arguments](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER.html): ``` cmake ... -DCMAKE_C_COMPILER='qcc;--arg1;--arg2' ``` The fact that all of these tools will consult `PATH` to resolve the compiler executable when it's not a fully-qualified path further supports that it's a command as opposed to a path to an executable. > This patch would turn it into a combination of what other tools interpret as > `CXX` and `CXX_FLAGS`, did I get this right? This would mean we would deviate > from the conventions of other tools, making it less ergonomic. Your understanding of what the proposed change does is correct. However, most if not all common build tools I've worked with treat these kinds of options as commands, and that's what users seem to expect as well, as per #26594. This behaviour is very ergonomic, especially when you're truly swapping out the "compiler", such as when replacing "gcc" with "ccache gcc" or similar, without having to write wrapper executables, which is what's currently required by `scan-build` in these situations. In many situations it's possible to achieve the same goal using a combination of `CC` and `CFLAGS`, but this is often less desirable as it can quickly run into issues with accidentally overwriting `CFLAGS` from other sources or having to merge multiple `CFLAGS` correctly. > If we don't currently have a way to pass extra `CXX` flags, we should > consider adding something like that instead of hacking the `CCC_CXX` option. > WDYT? To me, following the established convention of `CC`/`CXX` being commands as opposed to paths to executables seems like the most straightforward approach, but I'm certainly open to alternatives if you see any practical downsides to this. https://github.com/llvm/llvm-project/pull/131932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands (PR #131932)
https://github.com/rafl updated https://github.com/llvm/llvm-project/pull/131932 >From 3bdcfbbb56e80c1724a6e46597d0141c118c68cc Mon Sep 17 00:00:00 2001 From: Florian Ragwitz Date: Tue, 18 Mar 2025 15:35:42 -0700 Subject: [PATCH 1/5] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands So that things like --use-cc="ccache gcc" work. Fixes #26594. Also use the slightly simpler shellwords instead of quotewords. --- clang/tools/scan-build/libexec/ccc-analyzer | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/clang/tools/scan-build/libexec/ccc-analyzer b/clang/tools/scan-build/libexec/ccc-analyzer index 74f812aef8fdf..655ded4b102be 100755 --- a/clang/tools/scan-build/libexec/ccc-analyzer +++ b/clang/tools/scan-build/libexec/ccc-analyzer @@ -63,6 +63,7 @@ sub SearchInPath { } my $Compiler; +my @CompilerArgs; my $Clang; my $DefaultCCompiler; my $DefaultCXXCompiler; @@ -89,7 +90,7 @@ if (`uname -s` =~ m/Darwin/) { } if ($FindBin::Script =~ /c\+\+-analyzer/) { - $Compiler = $ENV{'CCC_CXX'}; + ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CXX'}); if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCXXCompiler; } $Clang = $ENV{'CLANG_CXX'}; @@ -98,7 +99,7 @@ if ($FindBin::Script =~ /c\+\+-analyzer/) { $IsCXX = 1 } else { - $Compiler = $ENV{'CCC_CC'}; + ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CC'}); if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCCompiler; } $Clang = $ENV{'CLANG'}; @@ -199,7 +200,7 @@ sub GetCCArgs { die "could not find clang line\n" if (!defined $line); # Strip leading and trailing whitespace characters. $line =~ s/^\s+|\s+$//g; - my @items = quotewords('\s+', 0, $line); + my @items = shellwords($line); my $cmd = shift @items; die "cannot find 'clang' in 'clang' command\n" if (!($cmd =~ /clang/ || basename($cmd) =~ /llvm/)); # If this is the llvm-driver the internal command will look like "llvm clang ...". @@ -462,9 +463,9 @@ my $Output; my %Uniqued; # Forward arguments to gcc. -my $Status = system($Compiler,@ARGV); +my $Status = system($Compiler,@CompilerArgs,@ARGV); if (defined $ENV{'CCC_ANALYZER_LOG'}) { - print STDERR "$Compiler @ARGV\n"; + print STDERR "$Compiler @CompilerArgs @ARGV\n"; } if ($Status) { exit($Status >> 8); } >From 6806e42d3c76455be44d6d6bbf8e5041dd331bed Mon Sep 17 00:00:00 2001 From: Florian Ragwitz Date: Fri, 21 Mar 2025 06:04:45 -0700 Subject: [PATCH 2/5] [clang][scan-build] Minor simplification/refactor of environment detection I want to adjust the logic for CCC_CC and CCC_CXX defaults, so not repeating that logic twice is convenient. As a nice side-effect, we end up with fewer globals and less code. Contains no behavioural changes, except for checking for /usr/bin/xcrun even if we're not on OSX when there's an -isysroot parameter, which should be fine. We also don't call uname -s twice. --- clang/tools/scan-build/libexec/ccc-analyzer | 80 + 1 file changed, 35 insertions(+), 45 deletions(-) diff --git a/clang/tools/scan-build/libexec/ccc-analyzer b/clang/tools/scan-build/libexec/ccc-analyzer index 655ded4b102be..9166ae61f12c1 100755 --- a/clang/tools/scan-build/libexec/ccc-analyzer +++ b/clang/tools/scan-build/libexec/ccc-analyzer @@ -62,53 +62,42 @@ sub SearchInPath { return 0; } -my $Compiler; -my @CompilerArgs; -my $Clang; -my $DefaultCCompiler; -my $DefaultCXXCompiler; -my $IsCXX; -my $AnalyzerTarget; - -# If on OSX, use xcrun to determine the SDK root. -my $UseXCRUN = 0; - -if (`uname -s` =~ m/Darwin/) { - $DefaultCCompiler = 'clang'; - $DefaultCXXCompiler = 'clang++'; - # Older versions of OSX do not have xcrun to - # query the SDK location. - if (-x "/usr/bin/xcrun") { -$UseXCRUN = 1; - } -} elsif (`uname -s` =~ m/(FreeBSD|OpenBSD)/) { - $DefaultCCompiler = 'cc'; - $DefaultCXXCompiler = 'c++'; -} else { - $DefaultCCompiler = 'gcc'; - $DefaultCXXCompiler = 'g++'; -} - -if ($FindBin::Script =~ /c\+\+-analyzer/) { - ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CXX'}); - if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCXXCompiler; } - - $Clang = $ENV{'CLANG_CXX'}; - if (!defined $Clang || ! -x $Clang) { $Clang = 'clang++'; } - - $IsCXX = 1 +{ + my ($DefaultCCompiler, $DefaultCXXCompiler); + + my $os = `uname -s`; + if ($os =~ m/Darwin/) { +$DefaultCCompiler = 'clang'; +$DefaultCXXCompiler = 'clang++'; + } elsif ($os =~ m/(FreeBSD|OpenBSD)/) { +$DefaultCCompiler = 'cc'; +$DefaultCXXCompiler = 'c++'; + } else { +$DefaultCCompiler = 'gcc'; +$DefaultCXXCompiler = 'g++'; + } + + sub DetermineCompiler { +my ($is_cxx) = @_; +my $default = $is_cxx ? $DefaultCXXCompiler : $DefaultCCompiler; +my $opt = $ENV{$is_cxx ? 'CCC_CXX' : 'CCC_CC'}; +return $default unless defined $opt; +my ($comp, @args) =
[clang] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands (PR #131932)
rafl wrote: > I'd prefer option 2, because why else would we have a default compiler if > that wasn't used in some workflows. A warning could never hurt. `--use-cc`/`--use-c++`/`CCC_CC`/`CCC_CXX` are optional, so the default would still be used when those options are not specified, which might be the majority of use-cases. I've added a few additional commits: * 6806e42d3c76 includes some minor simplifications and deduplicates the logic we're talking about without changing behaviour. * e101c97b60f2 implements option 2, as that was your preference. * b086d3ecad7e implements my preferred option 3. I'd happy to drop this commit if we wanna stick with option 2 instead. * 805396fa1ec1 brings the handling of `CLANG`/`CLANG_CXX` in line with `CCC_CC`/`CCC_CXX` if we end up going for option 3. We'd also drop this if we go for option 2. https://github.com/llvm/llvm-project/pull/131932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands (PR #131932)
rafl wrote: Thanks for having another look @balazs-benics-sonarsource. While we wait for a potential second opinion, I was curious about what you think of the current behaviour of silently defaulting to `$Default{CC,CXX}Compiler` if `CCC_{CC,CXX}` isn't executable and can't be resolved to an executable within `PATH`. I originally found this surprising when `CC="ccache gcc"` would be effectively ignored. Would it be worth changing this while we're at it? Would you have any preference between these options? 1. keeping it as it is 2. warning when ignoring an option and falling back to a default 3. providing an obvious error My personal preference is option 3. The only downside to this which I see is potentially breaking some existing usage, but it'd do so in an obvious and easy to fix way way. I'm not sure if that's acceptable in the context of this project. Option 2 would be my second choice, with Option 1 being the least preferable. https://github.com/llvm/llvm-project/pull/131932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands (PR #131932)
@@ -51,63 +51,40 @@ sub silent_system { # Compiler command setup. ##===--===## -# Search in the PATH if the compiler exists -sub SearchInPath { -my $file = shift; -foreach my $dir (split (':', $ENV{PATH})) { -if (-x "$dir/$file") { -return 1; -} -} -return 0; -} - -my $Compiler; -my $Clang; -my $DefaultCCompiler; -my $DefaultCXXCompiler; -my $IsCXX; -my $AnalyzerTarget; - -# If on OSX, use xcrun to determine the SDK root. -my $UseXCRUN = 0; - -if (`uname -s` =~ m/Darwin/) { - $DefaultCCompiler = 'clang'; - $DefaultCXXCompiler = 'clang++'; - # Older versions of OSX do not have xcrun to - # query the SDK location. - if (-x "/usr/bin/xcrun") { -$UseXCRUN = 1; - } -} elsif (`uname -s` =~ m/(FreeBSD|OpenBSD)/) { - $DefaultCCompiler = 'cc'; - $DefaultCXXCompiler = 'c++'; -} else { - $DefaultCCompiler = 'gcc'; - $DefaultCXXCompiler = 'g++'; -} - -if ($FindBin::Script =~ /c\+\+-analyzer/) { - $Compiler = $ENV{'CCC_CXX'}; - if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCXXCompiler; } - - $Clang = $ENV{'CLANG_CXX'}; - if (!defined $Clang || ! -x $Clang) { $Clang = 'clang++'; } - - $IsCXX = 1 +{ + my ($DefaultCCompiler, $DefaultCXXCompiler); + + my $os = `uname -s`; + if ($os =~ m/Darwin/) { +$DefaultCCompiler = 'clang'; +$DefaultCXXCompiler = 'clang++'; + } elsif ($os =~ m/(FreeBSD|OpenBSD)/) { +$DefaultCCompiler = 'cc'; +$DefaultCXXCompiler = 'c++'; + } else { +$DefaultCCompiler = 'gcc'; +$DefaultCXXCompiler = 'g++'; + } + + sub DetermineCompiler { +my ($is_cxx) = @_; +my $default = $is_cxx ? $DefaultCXXCompiler : $DefaultCCompiler; +my $opt = $ENV{$is_cxx ? 'CCC_CXX' : 'CCC_CC'}; +return defined $opt ? shellwords($opt) : $default; + } } -else { - $Compiler = $ENV{'CCC_CC'}; - if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCCompiler; } - - $Clang = $ENV{'CLANG'}; - if (!defined $Clang || ! -x $Clang) { $Clang = 'clang'; } - $IsCXX = 0 +sub DetermineClang { + my ($is_cxx) = @_; + my $default = $is_cxx ? 'clang++' : 'clang'; + my $opt = $ENV{$is_cxx ? 'CLANG_CXX' : 'CLANG'}; + return defined $opt ? $opt : $default; } -$AnalyzerTarget = $ENV{'CLANG_ANALYZER_TARGET'}; +my $IsCXX = $FindBin::Script =~ /c\+\+-analyzer/; +my ($Compiler, @CompilerArgs) = DetermineCompiler($IsCXX); rafl wrote: @ziqingluo-90 that's exactly right. `DetermineCompiler` returns a list of values (potentially a singleton list), and the call-site unpacks that return value into the first list element `$Compiler` and the rest of the list `@CompilerArgs` (which might be empty). The actual parsing is done using shellwords from [Text::ParseWords](https://metacpan.org/pod/Text::ParseWords). If you'd like to experiment how it parses certain inputs, you can do something like this, which prints out the parsed list, one item per line: ``` $ CCC_CC='foo "bar baz" \"quux' perl -MText::ParseWords -le'print for shellwords $ENV{CCC_CC}' foo bar baz "quux ``` https://github.com/llvm/llvm-project/pull/131932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands (PR #131932)
@@ -51,63 +51,40 @@ sub silent_system { # Compiler command setup. ##===--===## -# Search in the PATH if the compiler exists -sub SearchInPath { -my $file = shift; -foreach my $dir (split (':', $ENV{PATH})) { -if (-x "$dir/$file") { -return 1; -} -} -return 0; -} - -my $Compiler; -my $Clang; -my $DefaultCCompiler; -my $DefaultCXXCompiler; -my $IsCXX; -my $AnalyzerTarget; - -# If on OSX, use xcrun to determine the SDK root. -my $UseXCRUN = 0; - -if (`uname -s` =~ m/Darwin/) { - $DefaultCCompiler = 'clang'; - $DefaultCXXCompiler = 'clang++'; - # Older versions of OSX do not have xcrun to - # query the SDK location. - if (-x "/usr/bin/xcrun") { -$UseXCRUN = 1; - } -} elsif (`uname -s` =~ m/(FreeBSD|OpenBSD)/) { - $DefaultCCompiler = 'cc'; - $DefaultCXXCompiler = 'c++'; -} else { - $DefaultCCompiler = 'gcc'; - $DefaultCXXCompiler = 'g++'; -} - -if ($FindBin::Script =~ /c\+\+-analyzer/) { - $Compiler = $ENV{'CCC_CXX'}; - if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCXXCompiler; } - - $Clang = $ENV{'CLANG_CXX'}; - if (!defined $Clang || ! -x $Clang) { $Clang = 'clang++'; } - - $IsCXX = 1 +{ + my ($DefaultCCompiler, $DefaultCXXCompiler); + + my $os = `uname -s`; + if ($os =~ m/Darwin/) { +$DefaultCCompiler = 'clang'; +$DefaultCXXCompiler = 'clang++'; + } elsif ($os =~ m/(FreeBSD|OpenBSD)/) { +$DefaultCCompiler = 'cc'; +$DefaultCXXCompiler = 'c++'; + } else { +$DefaultCCompiler = 'gcc'; +$DefaultCXXCompiler = 'g++'; + } + + sub DetermineCompiler { +my ($is_cxx) = @_; +my $default = $is_cxx ? $DefaultCXXCompiler : $DefaultCCompiler; +my $opt = $ENV{$is_cxx ? 'CCC_CXX' : 'CCC_CC'}; +return defined $opt ? shellwords($opt) : $default; + } } -else { - $Compiler = $ENV{'CCC_CC'}; - if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCCompiler; } - - $Clang = $ENV{'CLANG'}; - if (!defined $Clang || ! -x $Clang) { $Clang = 'clang'; } - $IsCXX = 0 +sub DetermineClang { + my ($is_cxx) = @_; rafl wrote: I'm with you on the mix of different styles in this code being far from ideal. I did briefly consider cleaning those things up a bit, but decided to not extend the scope of this changeset further and to perhaps save those kinds of changes for a separate pull-request if there was any interest in that. Here's how I arrived at the names currently used in this pull-request: * Functions Almost all existing functions use UpperCamelCase, so I stuck with that even though it's not at all typical for the majority of modern(-ish) Perl code * Variables These were also quite inconsistent, but I stuck with common modern(-ish) practices, where most regular lexical variables use `$snake_case`, constants use `$ALL_CAPS`, and "globals" use `$UpperCamelCase` as per https://metacpan.org/dist/perl/view/pod/perlstyle.pod, the old Perl Best Practices book, and established community conventions. I'd find it much more jarring if a function was to potentially shadow a variable which reads like a `$Global` from a surrounding scope. I don't feel particularly strongly about it, and would appreciate your input on the following suggestions for how to best move forwards: 1. Just call the parameter `$IsCXX` in the functions as suggested. This defies the expectations of many Perl programmers when it comes to naming, but so does most of the rest of the program. 2. Keep things as they are. This is consistent with modern practices. 3. Call the local `$is_cxx` variable something else (but not `$IsCXX`), or rename the "global" `$IsCXX`. To not potentially confuse people unfamiliar with Perl or its current common practices. 4. Just use the "global" `$IsCXX` directly. Reducing some of the global state to make it slightly easier to reason about those individual functions seemed nice, but given how much of it there is anyway, maybe it doesn't even matter in this context. Thanks for your review and your feedback on the above! https://github.com/llvm/llvm-project/pull/131932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][scan-build] Treat --use-cc and --use-c++ as shell commands (PR #131932)
https://github.com/rafl edited https://github.com/llvm/llvm-project/pull/131932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits