[PATCH] D135366: [clang][Interp] Implement String- and CharacterLiterals

Tom Honermann via Phabricator via cfe-commits Wed, 12 Oct 2022 11:26:16 -0700

tahonermann added inline comments.


================
Comment at: clang/test/AST/Interp/arrays.cpp:143
+
+};
----------------
tbaeder wrote:
> cor3ntin wrote:
> > tahonermann wrote:
> > > cor3ntin wrote:
> > > > tbaeder wrote:
> > > > > cor3ntin wrote:
> > > > > > tahonermann wrote:
> > > > > > > As others already noted, additional testing of multicharacter 
> > > > > > > literals and UCNs (including named universal characters like 
> > > > > > > `\N{LATIN_CAPITAL_LETTER_E}` would be beneficial. Some tests of 
> > > > > > > character escapes like `\t` wouldn't hurt either.
> > > > > > > 
> > > > > > > Clang does not yet support use of `-fexec-charset` to set the 
> > > > > > > literal encoding (execution character set) to anything other than 
> > > > > > > UTF-8 though work on that has been done (see D93031). If such 
> > > > > > > work was completed, it would be useful to run some tests against 
> > > > > > > a non-UTF-8 encoding. Maybe next year.
> > > > > > Yes, wide **multicharacter** literals, that's was important 
> > > > > > information missing, thanks for spotting that.
> > > > > > 
> > > > > > I have mixed feeling about adding tests for escape sequences.  
> > > > > > Their replacement doesn't happen during constant evaluation.
> > > > > > We shouldn't replicate the lexing tests here.
> > > > > > 
> > > > > > but we should compare string literal with byte values. Testing a 
> > > > > > string literal against another one doesn't ensure the code units 
> > > > > > are correct if both are equally miss evaluated.
> > > > > > 
> > > > > > Also we could add explicit tests for null termination here as they 
> > > > > > are added as part of evaluation in theory - but then again that's 
> > > > > > also something clang does earlier.
> > > > > > 
> > > > > > If we want we could consider enabling the byte code interpreter on 
> > > > > > the existing lexing test files, i actually think that's the better 
> > > > > > way to deal with the escape sequences tests.
> > > > > I changed the first test that inspects all characters of a string to 
> > > > > comparing with integers instead. Do you have a suggestion for what 
> > > > > lexing tests to enable the constant interpreter in?
> > > > I think good candidates are
> > > > 
> > > > Lexer/char-escapes.c
> > > > Lexer/char-escapes-delimited.c
> > > > Lexer/char-literal.cpp
> > > Of those, only `Lexer/char-escapes.c` does much validation of literal 
> > > values. I prefer the approach Timm has already taken relative to those 
> > > tests.
> > > 
> > > It looks like we don't have an existing set of Sema tests for character 
> > > and string literals. How about we move this test under `clang/test/Sema`? 
> > > That would be the appropriate place to exercise values relative to 
> > > `-fexec-charset` support for non-UTF-8 encodings in the future. If that 
> > > sounds amenable, then I'd like the test split to exercise character and 
> > > string literals separately.
> > > 
> > > The character literal tests don't really belong in a test named 
> > > `arrays.cpp` as is.
> > > Of those, only Lexer/char-escapes.c does much validation of literal 
> > > values. I prefer the approach Timm has already taken relative to those 
> > > tests.
> > 
> > We can do both, I was not arguing against the test we have here, I'm happy 
> > with those :)
> > I'm opposed to duplicate tests for escape sequences here.  using the new 
> > interpreter on tests that already exist (in addition of the existing run 
> > commands) is pretty easy and cheap to do.
> > 
> > I don't have opinions how the new interpreter tests are organized.
> > Ideally we would have a complete set of test that is equally suitable for 
> > both constexpr engines, but maybe that's something that does not need to be 
> > done as part of this PR 
> I've added a new run line to `test/Lexer/char-escapes.c`, which works fine. 
> To summarize, the plan is to add a new test to `clang/test/Sema`? Or split 
> the char tests out from `arrays.cpp` and add some for multicharacter char 
> sequences?
I'm a bit uncertain regarding the purpose of tests in the `Interp` directory as 
opposed to tests under the `Sema*` directories. Basically, I'm unsure why we 
wouldn't want just one set of tests that exercise the semantics of the 
language, at least with regard to constant evaluation. If the interpreter is 
also intended to (eventually) interpret full programs (e.g., evaluating 
`main()`), I think such tests would be appropriate for an `Interp` directory. 
But I say that having not closely followed the development and previous reviews 
of the new interpreter.

At a minimum, I would like to see the character and string literal testing 
pulled out of `arrays.cpp`. For the rest, I'll defer to others that have been 
more involved in these reviews.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135366/new/

https://reviews.llvm.org/D135366

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D135366: [clang][Interp] Implement String- and CharacterLiterals

Reply via email to