On 7/24/24 11:55 AM, Jakub Jelinek wrote:
On Tue, Jul 23, 2024 at 09:38:15PM -0400, Jason Merrill wrote:
Thanks.

but please see
https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
comment and whether we really want the preprocessor to preprocess it for
C++ as (or as-if)
static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned 
char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
i.e. 9 tokens per byte rather than 2, or
(unsigned char)127,(unsigned char)69,...
or
((unsigned char)127),((unsigned char)69),...
etc.

The discussion at that link suggests that the author is planning to propose
removing the cast.

Yeah, just wanted to mention it for completeness, that the earlier
libcpp patch implements what is planned and would need to change if
something different is accepted into C++26.  And also state that I'd
strongly prefer preprocessing compatibility between C and C++ here.
Sure, I know 'a' is char in C++ and int in C, but there simply isn't
any compact form to denote unsigned char literals right now.

Let's call this variable old_raw_data_ptr for clarity, here and in
reshape_init_class.

Done.

-      elt_init = reshape_init_r (elt_type, d,
-                                /*first_initializer_p=*/NULL_TREE,
-                                complain);
+      if (TREE_CODE (d->cur->value) == RAW_DATA_CST
+         && (TREE_CODE (elt_type) == INTEGER_TYPE
+             || (TREE_CODE (elt_type) == ENUMERAL_TYPE
+                 && TYPE_CONTEXT (TYPE_MAIN_VARIANT (elt_type)) == std_node
+                 && strcmp (TYPE_NAME_STRING (TYPE_MAIN_VARIANT (elt_type)),
+                            "byte") == 0))

Maybe is_byte_access_type?  Or finally factor out a function to test
specifically for std::byte, it's odd that we don't have one yet.

Used is_byte_access_type (though next to the INTEGER_TYPE check, because
even signed char can and should be handled, and the CHAR_BIT test should
remain too (it is testing the host char precision against the target one).

@@ -7158,6 +7244,7 @@ reshape_init_class (tree type, reshape_i
             is initialized by the designated-initializer-list { D }, where D
             is the designated- initializer-clause naming a member of the
             anonymous union member."  */
+         gcc_checking_assert (TREE_CODE (d->cur->value) != RAW_DATA_CST);

Is there a test of trying to use #embed as a designated initializer?  I
don't see one.

I was thinking about those, then tested with -std=c++26 and saw it all
rejected, so didn't include anything.
But it seems we accept mixing [0] = 1, 1 in C++17 and older, it is just
rejected in C++20 and later, so the test is useful.  Of course even
when using
[0] =
#embed __FILE__
only the initial designator is provided and then the rest if more than
one doesn't have them.
Anyway, found an ICE on
int a[] = { [0] =
#embed __FILE__
};
and fixed that too.

+      if (tree raw_init = cp_maybe_split_raw_data (d))
+       return raw_init;
         d->cur++;
         return init;

This split-or-++ pattern seems to repeat a lot in reshape_init_r, could we
factor it out to avoid problems with people forgetting one or the other?
Maybe consume_init (d) or d->consume_init ()?

Done that (though, taking also init argument so that it can be used more
cleanly); the functions always start with init = d->cur->value; but
then can change what init is in some cases (but never in the RAW_DATA_CST
case).  Only 3 spots could use that though, another one is a recovery one
which needs to skip over perhaps multiple entries, and another one would
crash if d->cur is incremented before the has_designator_problem call.

Lightly tested on x86_64-linux, of course would test it fully, but I guess
there will be other changes requested in the 4 other patches...

--- gcc/cp/constexpr.cc.jj      2024-07-17 23:36:01.164308696 +0200
+++ gcc/cp/constexpr.cc 2024-07-24 15:49:26.881791327 +0200
@@ -4113,6 +4136,11 @@ find_array_ctor_elt (tree ary, tree dind
        {
          if (i < end)
            return i;
+         tree value = (*elts)[end - 1].value;
+         if (TREE_CODE (value) == RAW_DATA_CST
+             && wi::to_widest (dindex) < (wi::to_widest (cindex)

Wouldn't to_offset be a more optimal choice for the index math in this patch?

--- gcc/cp/typeck2.cc.jj        2024-07-17 11:36:47.836900075 +0200
+++ gcc/cp/typeck2.cc   2024-07-24 15:49:26.882791314 +0200
@@ -1310,6 +1310,40 @@ digest_init_r (tree type, tree init, int
         a parenthesized list.  */
        if (nested && !(flags & LOOKUP_AGGREGATE_PAREN_INIT))
        flags |= LOOKUP_NO_NARROWING;
+      if (TREE_CODE (init) == RAW_DATA_CST && !TYPE_UNSIGNED (type))
+       {
+         tree ret = init;
+         if ((flags & LOOKUP_NO_NARROWING) || warn_conversion)
+           for (unsigned int i = 0;
+                i < (unsigned) RAW_DATA_LENGTH (init); ++i)
+             if (((const signed char *)
+                  RAW_DATA_POINTER (init))[i] < 0)

How about macros or inline functions to produce the value of a raw data element as signed or unsigned char, to use instead of writing the casts everywhere?

--- gcc/cp/decl.cc.jj   2024-07-18 09:20:31.682542944 +0200
+++ gcc/cp/decl.cc      2024-07-24 17:36:44.331860969 +0200
@@ -6839,6 +6843,7 @@ is_direct_enum_init (tree type, tree ini
        && TREE_CODE (init) == CONSTRUCTOR
        && CONSTRUCTOR_IS_DIRECT_INIT (init)
        && CONSTRUCTOR_NELTS (init) == 1
+      && TREE_CODE (CONSTRUCTOR_ELT (init, 0)->value) != RAW_DATA_CST

The pattern before this change appears in a lot of other places as well; perhaps we want a function to identify a CONSTRUCTOR with a single non-designated initializer, that would also exclude the other cases of a single constructor_elt actually initializing multiple elements that categorize_ctor_elements_1 handles, like RANGE_EXPR and STRING_CST.

The patch is OK with whichever of the above changes you'd like to make.

Jason

Reply via email to