tahonermann added inline comments.
================ Comment at: clang/test/SemaCXX/static-assert-cxx26.cpp:304 +static_assert('\u{9}' == (char)1, ""); // expected-error {{failed}} \ + // expected-note {{evaluates to ''\t' (0x09, 9) == '<U+0001>' (0x01, 1)'}} +static_assert((char8_t)-128 == (char8_t)-123, ""); // expected-error {{failed}} \ ---------------- cor3ntin wrote: > tahonermann wrote: > > cor3ntin wrote: > > > tahonermann wrote: > > > > Is the expected note up to date? I don't see code that would generate > > > > the `<U+0001>` output. Am I just missing it? Since U+0001 is a valid, > > > > though non-printable, character, I would expect more `'\u0001'`. > > > See elsewhere in the discussion. this formating is pre existing and > > > managed at the DiagnosticEngine level (pushEscapedString). the reason > > > it's not `\u0001` is 1/ to avoid reusing c++ syntactic elements for > > > something that comes from diagnostics and is not represented as an > > > escaped sequence in source 2/ `\u00011` is unreadable, and `\U000000001` > > > is also not helpful :) > > > > > Thanks for the explanation. I'm not sure that I agree with the rationale > > for (1) though. We're already putting the value in single quotes and > > representing some values with escapes in many of these cases when the value > > isn't produced by an escape sequence (or even a character/string literal); > > why exclude `\uXXXX`? I agree with the rationale for (2); we could use > > `'\u{1}'` in that case. > FYI afaik the notation in clang predates the existence of \u{} by a few > years, and follow Unicode notation > (https://unicode.org/mail-arch/unicode-ml/y2005-m11/0060.html). > Oldest instance seems to be > https://github.com/llvm/llvm-project/commit/77091b167fd959e1ee0c4dad4ec44de43b6c95db > - i followed suite when reworking the generic escaping mechanism all string > fed to diagnostics go through. > > I don't care about changing the syntax, but i do hope we are consistent. > Ultimately what we are trying to do is to designate a unicode codepoint and > whether we do it through C++ syntax or not probably does not matter much as > long as it's clear, delimited and consistent! I think the substitution of `<U+XXXX>` by the diagnostic engine itself is perfectly fine and good; particularly when it has no context to suggest a different presentation. In this particular case, where the character is being presented using C++ syntax as a character literal, I would prefer that C++ syntax be used consistently. From an implementation standpoint, I'm suggesting that `WriteCharValueForDiagnostic()` be modified such that, if `escapeCStyle<EscapeChar::Single>()` returns an empty string, that the character be presented in `'\u{XXXX}'` form if the character is one that would otherwise be substituted by the diagnostic engine (e.g., if `isPrintable()` is false). Note that this would be restricted to `char` values <= 0x7F; larger values could still be passed through as invalid code units that the diagnostic engine would then render as, e.g., `'<FC>'`. ================ Comment at: clang/test/SemaCXX/static-assert.cpp:287 + static_assert((char16_t)L'ゆ' == L"C̵̭̯̠̎͌ͅť̺"[1], ""); // expected-error {{failed}} \ + // expected-note {{evaluates to ''ゆ' (0x3086) == '̵' (0x335)'}} + static_assert(L"\/"[1] == u'\xFFFD', ""); // expected-error {{failed}} \ ---------------- cor3ntin wrote: > hubert.reinterpretcast wrote: > > hubert.reinterpretcast wrote: > > > cor3ntin wrote: > > > > hubert.reinterpretcast wrote: > > > > > The C++23 escaped string formatting facility would not generate a > > > > > trailing combining character like this. I recommend following suit. > > > > > > > > > > Info on U+0335: > > > > > https://util.unicode.org/UnicodeJsps/character.jsp?a=0335 > > > > > > > > > This is way outside the scope of the patch. The diagnostic output > > > > facility has no understanding of combining characters or graphemes and > > > > do not attempt to match std::print. It probably would be an improvement > > > > but this patch is not trying to modify how all diagnostics are printed. > > > > (all of that logic is in Diagnostic.cpp) > > > This patch is pushing the envelope of what appears in diagnostics. One > > > can also argue that someone writing > > > ``` > > > static_assert(false, "\u0301"); > > > ``` > > > gets what they deserve, but that case does not have a big problem anyway > > > (because the provided message text appears after `: `). > > > > > > This patch increases the exposure of the diagnostic output facility to > > > input that it does not handle well. I disagree that it is outside the > > > scope of this patch to insist that it does not generate such inputs to > > > the diagnostic output facility (even if a possible solution is to modify > > > the diagnostic output facility first). > > @cor3ntin, do you have status quo examples for how grapheme-extending > > characters that are not already "problematic" in their original context are > > emitted in diagnostics in contexts where they are? > Are you looking for that sort of examples? https://godbolt.org/z/c79xWr7Me > That shows that clang has no understanding of graphemes gcc and MSVC get that case "right" (probably by accident). https://godbolt.org/z/Tjd6xnEon CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155610/new/ https://reviews.llvm.org/D155610 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits