Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

Jason Merrill via Gcc-patches Wed, 15 Feb 2023 10:39:44 -0800

On 9/26/22 15:27, Lewis Hyatt wrote:

On Wed, Jun 15, 2022 at 03:06:16PM -0400, Lewis Hyatt wrote:

On Tue, Jun 14, 2022 at 05:26:49PM -0400, Lewis Hyatt wrote:

Hello-


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902

The attached patch resolves PR preprocessor/103902 as described in the patch
message inline below. bootstrap + regtest all languages was successful on
x86-64 Linux, with no new failures:

FAIL 103 103
PASS 542338 542371
UNSUPPORTED 15247 15250
UNTESTED 136 136
XFAIL 4166 4166
XPASS 17 17

Please let me know if it looks OK?

A few questions I have:

- A difference introduced with this patch is that after lexing something
like `operator ""_abc', then `_abc' is added to the identifier hash map,
whereas previously it was not. I feel like this must be OK because with the
optional space as in `operator "" _abc', it would be added with or without the
patch.

- The behavior of `#pragma GCC poison' is not consistent (including prior to
   my patch). I tried to make it more so but there is still one thing I want to
   ask about. Leaving aside extended characters for now, the inconsistency is
   that currently the poison is only checked, when the suffix appears as a
   standalone token.

   #pragma GCC poison _X
   bool operator ""_X (unsigned long long);   //accepted before the patch,
                                              //rejected after it
   bool operator "" _X (unsigned long long);  //rejected either before or after
   const char * operator ""_X (const char *, unsigned long); //accepted before,
                                                             //rejected after
   const char * operator "" _X (const char *, unsigned long); //rejected either

   const char * s = ""_X; //accepted before the patch, rejected after it
   const bool b = 1_X; //accepted before or after ****

I feel like after the patch, the behavior is the expected behavior for all
cases but the last one. Here, we allow the poisoned identifier because it's
not lexed as an identifier, it's lexed as part of a pp-number. Does it seem OK
like this or does it need to be addressed?


Sorry, that version actually did not handle the case of -Wc++11-compat in
c++98 mode correctly. This updated version fixes that and adds the missing
test coverage for that, if you could please review this one instead?

By the way, the pipermail archive seems to permanently mangle UTF-8 in inline
attachments. I attached the patch also gzipped to address that for the
archive, since the new testcases do use non-ASCII characters.

Thanks for taking a look!


Hello-

May I please ping this patch again? Joseph suggested that it would be best if
a C++ maintainer has a look at it. This is one of just a few places left where
we don't handle UTF-8 properly in libcpp, it would be really nice to get them
fixed up if there is time to review this patch. Thanks!

https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html

I re-attached it here as it required some trivial rebasing on top of recently
pushed changes. As before, I also attached the gzipped version so that the
UTF-8 testcases show up OK in the online archive, in case that's still an
issue. Thanks for taking a look!


Thank you for the patch, sorry it slipped off my radar.

This patch fixes it by adding a new function scan_cur_identifier() that can be
used to lex an identifier while in the middle of lexing another token. It is
somewhat duplicative of the code in lex_identifier(), which handles the normal
case, but I think there's no good way to avoid that without pessimizing the
usual case, since lex_identifier() takes advantage of the fact that the first
character of the identifier has already been analyzed.


So could you analyze the first character and then call lex_identifier?

With scan_cur_identifier(), we do also correctly warn about bidi and
normalization issues in the extended identifiers comprising the suffix, and we
check for poisoned identifiers there as well.

Hmm, I don't think we want the check for poisoned identifiers; a suffixis not a name. That goes for the other diagnostics inidentifier_diagnostics_on_lex, as well. At the meeting last week thecommittee decided to deprecate the declaration with a space to clarifythis distinction.

+             if (!accum.accum)
+               create_literal2 (pfile, token, base,
+                                suffix_begin - base,
+                                NODE_NAME (sr.node),
+                                NODE_LEN (sr.node),
+                                type);
+             else
+               {
+                 accum.create_literal2 (pfile, token, base,
+                                        suffix_begin - base,
+                                        NODE_NAME (sr.node),
+                                        NODE_LEN (sr.node),
+                                        type);
+                 _cpp_release_buff (pfile, accum.first);
+               }


How about always calling accum.create_literal2?

Jason

Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

Reply via email to