[Bug preprocessor/104147] [9/10/11/12 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens

jakub at gcc dot gnu.org via Gcc-bugs Mon, 31 Jan 2022 06:08:44 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, comparing
#define X(x,y)  x y
#define STR_(x) #x
#define STR(x)  STR_(x)
STR(X(Y,Y))
with
#define X(x,y)  x y
#define STR_(x) #x
#define STR(x)  STR_(x)
#define Y()
STR(X(Y,Y))
what I can see is that, in the latter, on the third call to
funlike_invocation_p for the Y macro, the code reads two padding tokens before
finding a token that is neither padding nor CPP_OPEN_PAREN:
static _cpp_buff *
funlike_invocation_p (cpp_reader *pfile, cpp_hashnode *node,
                      _cpp_buff **pragma_buff, unsigned *num_args)
{
  const cpp_token *token, *padding = NULL;

  for (;;)
    {
      token = cpp_get_token (pfile);
      if (token->type != CPP_PADDING)
        break;
      if (padding == NULL
          || (!(padding->flags & PREV_WHITE) && token->val.source == NULL))
        padding = token;
    }
The first padding token is pfile->avoid_paste and the second padding token is
one created by padding_token:
(gdb) p *padding
$128 = {src_loc = 0, type = CPP_PADDING, flags = 0, val = {node = {node = 0x0, spelling = 0x0}, source = 0x0, str = {len = 0, text = 0x0}, macro_arg = {arg_no = 0, spelling = 0x0},
    token_no = 0, pragma = 0}}
(gdb) p *token
$129 = {src_loc = 258816, type = CPP_PADDING, flags = 0, val = {node = {node = 0x7fffea249db0, spelling = 0x0}, source = 0x7fffea249db0, str = {len = 3928268208, text = 0x0},
    macro_arg = {arg_no = 3928268208, spelling = 0x0}, token_no = 3928268208, pragma = 3928268208}}
(gdb) p *token->val.source
$130 = {src_loc = 242688, type = CPP_MACRO_ARG, flags = 1, val = {node = {node = 0x7fff00000002, spelling = 0x7fffea38e2e8}, source = 0x7fff00000002, str = {len = 2,
      text = 0x7fffea38e2e8 "\260\260\070\352\377\177"}, macro_arg = {arg_no = 2, spelling = 0x7fffea38e2e8}, token_no = 2, pragma = 2}}
So, both CPP_PADDING tokens have flags of 0, but the second padding token has
a non-NULL val.source, and the token it points to has PREV_WHITE set.
Now, because of the above condition, padding ends up being the first of the
two padding tokens, i.e. pfile->avoid_paste.
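
As a reading aid, the selection can be replayed outside cpplib. This is only
a sketch: struct tok, the TOK_* enumerators, select_padding and
kept_padding_label are made-up stand-ins, not cpplib's real types or
functions, and only the condition inside the loop is taken from the excerpt
above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for cpplib's token types, not the real definitions.  */
enum tok_type { TOK_PADDING, TOK_MACRO_ARG, TOK_NAME };
#define PREV_WHITE 1u

struct tok
{
  enum tok_type type;
  unsigned flags;
  const struct tok *source;  /* models val.source of a padding token */
  const char *label;         /* illustration only */
};

/* Applies the condition from the quoted loop to the leading run of
   padding tokens and returns the one that would be kept, or NULL.  */
const struct tok *
select_padding (const struct tok *toks, size_t n)
{
  const struct tok *padding = NULL;

  for (size_t i = 0; i < n && toks[i].type == TOK_PADDING; i++)
    if (padding == NULL
        || (!(padding->flags & PREV_WHITE) && toks[i].source == NULL))
      padding = &toks[i];
  return padding;
}

/* The sequence from the gdb session: avoid_paste, then a padding token
   whose source has PREV_WHITE set, then the name.  */
const char *
kept_padding_label (void)
{
  static const struct tok macro_arg
    = { TOK_MACRO_ARG, PREV_WHITE, NULL, "macro_arg" };
  static const struct tok toks[] = {
    { TOK_PADDING, 0, NULL, "avoid_paste" },
    { TOK_PADDING, 0, &macro_arg, "padding_token" },
    { TOK_NAME, 0, NULL, "Y" },
  };
  const struct tok *kept = select_padding (toks, sizeof toks / sizeof toks[0]);

  assert (kept != NULL);
  return kept->label;  /* "avoid_paste" */
}
```

In this model the second padding token never replaces the first one, because
its source is non-NULL, which matches what the debugger shows.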
The code later will do:
  if (token->type != CPP_EOF || token == &pfile->endarg)
    {
      _cpp_backup_tokens (pfile, 1);
      if (padding)
        _cpp_push_token_context (pfile, NULL, padding, 1);
    }
so it backs up a single token (a CPP_NAME) and pushes a new context containing
just the avoid_paste token.
Later on cpp_get_token_1 is called and first hits:
      else if (!reached_end_of_context (context))
        {
          consume_next_token_from_context (pfile, &result,
                                           &virt_loc);
and so the avoid_paste token is read. The next time, reached_end_of_context is
true, so it will:
          if (pfile->context->c.macro)
            ++num_expanded_macros_counter;
          _cpp_pop_context (pfile);
          if (pfile->state.in_directive)
            continue;
          result = &pfile->avoid_paste;
and read avoid_paste again.
Now, if Y is not a macro, the tokens which will be read are the original ones,
i.e. one avoid_paste followed by the padding_token one.
Now, stringify_arg is called later on, and it ignores CPP_PADDING tokens,
except when they have val.source set:
      if (token->type == CPP_PADDING)
        {
          if (source == NULL
              || (!(source->flags & PREV_WHITE)
                  && token->val.source == NULL))
            source = token->val.source;
          continue;
        }
...
      /* Leading white space?  */
      if (dest - 1 != BUFF_FRONT (pfile->u_buff))
        {
          if (source == NULL)
            source = token;
          if (source->flags & PREV_WHITE)
            *dest++ = ' ';
        }
      source = NULL;
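
To make the effect of this excerpt easier to follow, here is a small
standalone model of its whitespace decision. It is a sketch under
assumptions: struct tok, the TOK_* enumerators and the three functions are
made up for illustration, the output buffer handling is simplified, and only
the two conditions are copied from stringify_arg. The two sequences are the
ones described in this comment: avoid_paste followed by the padding_token
token, and avoid_paste read twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for cpplib's token types, not the real definitions.  */
enum tok_type { TOK_PADDING, TOK_MACRO_ARG, TOK_NAME };
#define PREV_WHITE 1u

struct tok
{
  enum tok_type type;
  unsigned flags;
  const struct tok *source;  /* models val.source; used for padding only */
  const char *spelling;      /* used for non-padding tokens only */
};

/* Concatenates the spellings into OUT (capacity CAP), inserting a space
   according to the same rules as the quoted excerpt.  Returns OUT.  */
char *
model_stringify (const struct tok *toks, size_t n, char *out, size_t cap)
{
  const struct tok *source = NULL;
  size_t len = 0;

  assert (cap > 0);
  for (size_t i = 0; i < n; i++)
    {
      const struct tok *token = &toks[i];

      if (token->type == TOK_PADDING)
        {
          if (source == NULL
              || (!(source->flags & PREV_WHITE) && token->source == NULL))
            source = token->source;
          continue;
        }

      /* Leading white space?  Never before the first spelled token.  */
      if (len != 0)
        {
          if (source == NULL)
            source = token;
          if ((source->flags & PREV_WHITE) && len + 1 < cap)
            out[len++] = ' ';
        }
      source = NULL;

      size_t sl = strlen (token->spelling);
      if (len + sl < cap)
        {
          memcpy (out + len, token->spelling, sl);
          len += sl;
        }
    }
  out[len] = '\0';
  return out;
}

/* Y is not a macro: avoid_paste, then padding with a PREV_WHITE source.  */
char *
stringify_plain_case (char *out, size_t cap)
{
  const struct tok macro_arg = { TOK_MACRO_ARG, PREV_WHITE, NULL, NULL };
  const struct tok toks[] = {
    { TOK_NAME, 0, NULL, "Y" },
    { TOK_PADDING, 0, NULL, NULL },
    { TOK_PADDING, 0, &macro_arg, NULL },
    { TOK_NAME, 0, NULL, "Y" },
  };
  return model_stringify (toks, 4, out, cap);  /* "Y Y" */
}

/* Y is a function-like macro: avoid_paste is seen twice instead.  */
char *
stringify_funlike_case (char *out, size_t cap)
{
  const struct tok toks[] = {
    { TOK_NAME, 0, NULL, "Y" },
    { TOK_PADDING, 0, NULL, NULL },
    { TOK_PADDING, 0, NULL, NULL },
    { TOK_NAME, 0, NULL, "Y" },
  };
  return model_stringify (toks, 4, out, cap);  /* "YY" */
}
```

With a 16-byte buffer, the first helper produces "Y Y" and the second "YY",
which is the difference between the two testcases.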
So, when Y is not a macro, we set source to the CPP_MACRO_ARG token, which has
PREV_WHITE set on it, and so emit a space in between. When Y is a
function-like macro, source isn't set from the CPP_PADDING tokens but instead
from the actual CPP_NAME token after them, which doesn't have PREV_WHITE, and
so we don't emit a space there.
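
Both cases can also be put into a single translation unit to observe the
difference from user code. This is a sketch: the names str_plain,
str_funlike, strings_agree and check_plain_case are made up, and it assumes
that defining Y() after the first use behaves the same as the two separate
testcases above. What the second string contains depends on whether the
compiler in use is affected.

```c
#include <assert.h>
#include <string.h>

#define X(x,y)  x y
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Y is not a macro at this point.  */
const char *const str_plain = STR(X(Y,Y));

/* The same invocation once Y is a function-like macro.  Y is never
   followed by '(' here, so it is not expanded.  */
#define Y()
const char *const str_funlike = STR(X(Y,Y));

/* Nonzero if both stringifications are identical, as they should be.  */
int
strings_agree (void)
{
  return strcmp (str_plain, str_funlike) == 0;
}

/* The case where Y is not a macro keeps the space.  */
void
check_plain_case (void)
{
  assert (strcmp (str_plain, "Y Y") == 0);
}
```

If strings_agree returns 0, the space between the two Y tokens was lost in
the second stringification.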
Now, if the order of the CPP_PADDING tokens doesn't matter, perhaps we want
funlike_invocation_p to use the padding token with non-NULL val.source in
preference to the avoid_paste token rather than the other way around, but I'm
not sure what that would break.  Also, I'm not really sure how the PREV_WHITE
flags on the CPP_PADDING tokens come into play with this.
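
Just to illustrate the idea, here is the alternative preference in a
standalone model. This is purely hypothetical: it is not a patch against
cpplib, struct tok and pick_padding_preferring_source are made-up names, and
it deliberately ignores the PREV_WHITE flags on the padding tokens
themselves, which is exactly the part that is unclear above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for cpplib's token types, not the real definitions.  */
enum tok_type { TOK_PADDING, TOK_MACRO_ARG, TOK_NAME };
#define PREV_WHITE 1u

struct tok
{
  enum tok_type type;
  unsigned flags;
  const struct tok *source;  /* models val.source of a padding token */
};

/* HYPOTHETICAL rule: among the leading padding tokens, keep the first
   one, but let a token with a non-NULL source replace a kept token
   whose source is NULL.  Returns the index kept, or -1 if TOKS does
   not start with a padding token.  */
int
pick_padding_preferring_source (const struct tok *toks, size_t n)
{
  int kept = -1;

  assert (toks != NULL || n == 0);
  for (size_t i = 0; i < n && toks[i].type == TOK_PADDING; i++)
    if (kept < 0
        || (toks[kept].source == NULL && toks[i].source != NULL))
      kept = (int) i;
  return kept;
}
```

For the sequence seen in the debugger (avoid_paste first, then the padding
token with a source) this model returns index 1, and with the two padding
tokens swapped it returns index 0, so the result no longer depends on their
order. Whether the real code can do this without breaking something else is
the open question.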

[Bug preprocessor/104147] [9/10/11/12 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens