[clang] clang: Relax LangOpts checks when lexing quoted numbers during preprocessing (PR #95798)

Jan Svoboda via cfe-commits Thu, 21 Nov 2024 12:59:49 -0800

================
@@ -2068,7 +2068,8 @@ bool Lexer::LexNumericConstant(Token &Result, const char 
*CurPtr) {
   }
 
   // If we have a digit separator, continue.
-  if (C == '\'' && (LangOpts.CPlusPlus14 || LangOpts.C23)) {
+  if (C == '\'' &&
+      (LangOpts.CPlusPlus14 || LangOpts.C23 || ParsingPreprocessorDirective)) {
----------------
jansvoboda11 wrote:


> I feel like another issue is that just accepting separators uncoditionally in 
> preprocessor directives (i.e. even outside of dependency scanning) would 
> technically be a language extension... doesn’t that mean that we’d 
> technically have to add a compatibility warning? Because we’re usually pretty 
> strict about doing that for things that are effectively a compiler extension.

Sure, but we don't have to expose that language extension to users, it can 
remain internal to the compiler.

> Also, I’m not really that familiar w/ the dependency scanner, but I’m still 
> confused as to why passing down the `LangOpts` isn’t an option: in #93753, it 
> was pointed out that that would impair caching the results of the scan—but 
> isn’t it already the case that e.g. what files something depends on is 
> dependent on the language standard? E.g. what if I write:
> 
> ```c++
> #if __cplusplus >= WhateverTheValueForC++17IsAgain
> #    include "foo.hh"
> #else
> #    include "bar.hh"
> #endif
> ```
> 
> Does it just note that the file might depend on both `foo.hh` and `bar.hh`? 
> Because if so, can’t we just set the language standard in the lang opts for 
> the dependency scanner to the latest e.g. C++ standard? Because that would 
> also fix this issue.

The scanner works in two steps:

1. The virtual file system "minimizes" source files into sequence of tokens 
that are relevant for dependency discovery. Your code fragment would be 
transformed into something like `[if, ident, gte, ident, then, include, string, 
else, include, string, endif]`. 
2. The scanner instance takes those tokens and the TU `LanguageOptions` and 
runs full preprocessor on those tokens. This means that the condition should 
evaluate correctly.

Having (1) be context-free is important for performance.

On high level, the goal of #93753 is to respect `LanguageOptions` in the 
scanner. My argument is that we should be able to do so by representing tokens 
in the stream produced by (1) such that we're still able to emit context-aware, 
`LanguageOptions`-driven diagnostics in (2).

That solves the issue outlined by #93753 without impacting the performance.

> I mean, we don’t need a language option for _every_ feature either, but 
> adding them on an as-needed basis doesn’t seem completely untenable.

+1

https://github.com/llvm/llvm-project/pull/95798
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] clang: Relax LangOpts checks when lexing quoted numbers during preprocessing (PR #95798)

Reply via email to