Hi!

In libcpp, we have quite a lot of state on the token flags, some
related to the stuff that comes before the token (e.g.
PREV_FALLTHROUGH, PREV_WHITE and STRINGIFY_ARG), others related to the
stuff that comes after the token (e.g. PASTE_LEFT, SP_DIGRAPH, SP_PREV_WHITE).
Unfortunately, with the current __VA_OPT__ handling that information is
lost, because it is on the __VA_OPT__ or closing ) tokens that we are always
DROPing.

The following patch attempts to fix various issues, including some ICEs,
by introducing 3 new states, two of them are alternatives to INCLUDE used
for the very first token after __VA_OPT__( , where we want to take into
account also flags from the __VA_OPT__ token, and before the closing )
token where we want to use flags from the closing ) token.  Plus
PADDING which is used for the case where there are no varargs and __VA_OPT__
is supposed to fold into a placemarker, or for the case of __VA_OPT__(),
which is similar to that, in both cases we need to take into account in
those cases both flags from __VA_OPT__ and from the closing ).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This is just a partial fix, one thing this patch doesn't change is that
the standard says that __VA_OPT__ ( contents ) should be treated as
parameter, which means that #__VA_OPT__ ( contents ) should stringify it,
which we right now reject.  My preprocessor knowledge is too limited to
handle this right myself, including all the corner cases, e.g. one can have
#define f(x, ...) #__VA_OPT__(#x x ## x) etc..  I presume
m_flags = token->flags & (PREV_FALLTHROUGH | PREV_WHITE);
could be changed into:
m_flags = token->flags & (PREV_FALLTHROUGH | PREV_WHITE | STRINGIFY_ARG);
and when handling the PADDING result from update, we could just emit a 
"" token, but for INCLUDE_FIRST with this we'd need something complex,
probably a new routine similar to stringify_arg to some extent.

I've also cross-checked the libcpp implementation with this patch against
trunk clang which apparently also implements __VA_OPT__ now, on the
testcases included here the output is the same and on their
macro_vaopt_expand.cpp testcase, if I remove all tests that test
#__VA_OPT__ ( contents ) handling which we just reject now, there are still
some differences:
$ /usr/src/llvm/obj8/bin/clang++  -E /tmp/macro_vaopt_expand.cpp -std=c++2a > 
/tmp/1
$ ~/src/gcc/obj20/gcc/cc1plus -quiet -E /tmp/macro_vaopt_expand.cpp -std=c++2a 
> /tmp/2
diff -up /tmp/1 /tmp/2
-4: f(0 )
+4: f(0)
-6: f(0, a )
-7: f(0, a )
+6: f(0, a)
+7: f(0, a)
-9: TONG C ( ) B ( ) "A()"
+9: HT_A() C ( ) B ( ) "A()"
-16: S foo ;
+16: S foo;
-26: B1
-26_1: B1
+26: B 1
+26_1: B 1
-27: B11
-27_1: BexpandedA0 11
-28: B11
+27: B 11
+27_1: BA0 11
+28: B 11

Perhaps some of the whitespace changes aren't significant, but 9:, and
2[678]{,_1}: are significantly different.
9: is
#define LPAREN ( 
#define A() B LPAREN )
#define B() C LPAREN )
#define HT_B() TONG
#define F(x, ...) HT_ ## __VA_OPT__(x x A()  #x)
9: F(A(),1)

Thoughts on what is right and why?
Similarly for expansion on the last token from __VA_OPT__ when followed
by ##, like:
#define m1 (
#define f16() f17 m1 )
#define f17() f18 m1 )
#define f18() m2 m1 )
#define m3f17() g
#define f19(x, ...) m3 ## __VA_OPT__(x x f16() #x)
#define f20(x, ...) __VA_OPT__(x x)##m4()
#define f21() f17
#define f17m4() h
t25 f19 (f16 (), 1);
t26 f20 (f21 (), 2);

E.g. 26: is:
#define F(a,...)  __VA_OPT__(B a ## a) ## 1
26: F(,1)
I really wonder why clang emits B1 in that case, there
is no ## in between B and a, so those are 2 separate tokens
separated by whitespace, even when a ## a is a placemarker.
Does that mean __VA_OPT__ should throw away all the placemarkers
and return the last non-placemarker token for the ## handling?

Can somebody please take the rest over?

BTW, Tom, perhaps you should update your MAINTAINERS entry email address...

2018-01-10  Jakub Jelinek  <ja...@redhat.com>

        PR preprocessor/83063
        PR preprocessor/83708
        * macro.c (enum macro_arg_token_kind): Fix comment typo.
        (vaopt_state): Add m_flags field, reorder m_last_was_paste before
        m_state.
        (vaopt_state::vaopt_state): Adjust for the above changes.
        (vaopt_state::update_flags): Add INCLUDE_FIRST, INCLUDE_LAST and
        PADDING.
        (vaopt_state::update): Add limit argument, update m_flags member,
        return INCLUDE_FIRST instead of INCLUDE on the first and INCLUDE_LAST
        instead of INCLUDE on the last token between __VA_OPT__( and ).
        Returnn PADDING instead of DROP on the closing ) if no tokens were
        INCLUDE*.  Comment spelling fixes.
        (vaopt_state::flags): New method.
        (replace_args): Adjust vaopt_state::update caller.  Fix handling
        of __VA_OPT__ preceded or followed by ## and ensure no paste after
        last __VA_OPT__ token if not followed by ##.  Formatting fix.
        (create_iso_definition): Adjust vaopt_state::update caller.

        * c-c++-common/cpp/va-opt-2.c: New test.
        * c-c++-common/cpp/va-opt-3.c: New test.

--- libcpp/macro.c.jj   2018-01-03 10:42:55.938763447 +0100
+++ libcpp/macro.c      2018-01-09 17:27:23.603191796 +0100
@@ -51,7 +51,7 @@ struct macro_arg
 enum macro_arg_token_kind {
   MACRO_ARG_TOKEN_NORMAL,
   /* This is a macro argument token that got transformed into a string
-     litteral, e.g. #foo.  */
+     literal, e.g. #foo.  */
   MACRO_ARG_TOKEN_STRINGIFIED,
   /* This is a token resulting from the expansion of a macro
      argument that was itself a macro.  */
@@ -105,10 +105,11 @@ class vaopt_state {
     : m_pfile (pfile),
     m_allowed (any_args),
     m_variadic (is_variadic),
-    m_state (0),
     m_last_was_paste (false),
+    m_state (0),
     m_paste_location (0),
-    m_location (0)
+    m_location (0),
+    m_flags (0)
   {
   }
 
@@ -116,13 +117,16 @@ class vaopt_state {
   {
     ERROR,
     DROP,
-    INCLUDE
+    INCLUDE,
+    INCLUDE_FIRST,
+    INCLUDE_LAST,
+    PADDING
   };
 
   /* Given a token, update the state of this tracker and return a
      boolean indicating whether the token should be be included in the
      expansion.  */
-  update_type update (const cpp_token *token)
+  update_type update (const cpp_token *token, const cpp_token *limit)
   {
     /* If the macro isn't variadic, just don't bother.  */
     if (!m_variadic)
@@ -139,6 +143,7 @@ class vaopt_state {
          }
        ++m_state;
        m_location = token->src_loc;
+       m_flags = token->flags & (PREV_FALLTHROUGH | PREV_WHITE);
        return DROP;
       }
     else if (m_state == 1)
@@ -155,6 +160,7 @@ class vaopt_state {
       }
     else if (m_state >= 2)
       {
+       update_type ret = INCLUDE;
        if (m_state == 2 && token->type == CPP_PASTE)
          {
            cpp_error_at (m_pfile, CPP_DL_ERROR, token->src_loc,
@@ -165,7 +171,10 @@ class vaopt_state {
           case we see a close paren immediately after the open
           paren.  */
        if (m_state == 2)
-         ++m_state;
+         {
+           ++m_state;
+           ret = INCLUDE_FIRST;
+         }
 
        bool was_paste = m_last_was_paste;
        m_last_was_paste = false;
@@ -191,10 +200,27 @@ class vaopt_state {
                    return ERROR;
                  }
 
-               return DROP;
+               if (ret == INCLUDE_FIRST)
+                 {
+                   m_flags |= token->flags & (PASTE_LEFT | SP_DIGRAPH
+                                              | SP_PREV_WHITE);
+                   return PADDING;
+                 }
+               return m_allowed ? DROP : PADDING;
              }
          }
-       return m_allowed ? INCLUDE : DROP;
+       if (limit
+           && m_state == 3
+           && token + 1 < limit
+           && token[1].type == CPP_CLOSE_PAREN)
+         {
+           if (m_allowed && ret != INCLUDE_FIRST)
+             m_flags = 0;
+           m_flags |= token[1].flags & (PASTE_LEFT | SP_DIGRAPH
+                                        | SP_PREV_WHITE);
+           ret = INCLUDE_LAST;
+         }
+       return m_allowed ? ret : DROP;
       }
 
     /* Nothing to do with __VA_OPT__.  */
@@ -211,6 +237,12 @@ class vaopt_state {
     return m_state == 0;
   }
 
+  /* Return any flags that should be ored to the current token.  */
+  unsigned short flags ()
+  {
+    return m_flags;
+  }
+
  private:
 
   /* The cpp_reader.  */
@@ -220,6 +252,9 @@ class vaopt_state {
   bool m_allowed;
   /* True if the macro is variadic.  */
   bool m_variadic;
+  /* If true, the previous token was ##.  This is used to detect when
+     a paste occurs at the end of the sequence.  */
+  bool m_last_was_paste;
 
   /* The state variable:
      0 means not parsing
@@ -228,14 +263,14 @@ class vaopt_state {
      >= 3 means looking for ")", the number encodes the paren depth.  */
   int m_state;
 
-  /* If true, the previous token was ##.  This is used to detect when
-     a paste occurs at the end of the sequence.  */
-  bool m_last_was_paste;
   /* The location of the paste token.  */
   source_location m_paste_location;
 
   /* Location of the __VA_OPT__ token.  */
   source_location m_location;
+
+  /* The flags from the __VA_OPT__ token and closing ).  */
+  unsigned short m_flags;
 };
 
 /* Macro expansion.  */
@@ -1817,9 +1852,9 @@ replace_args (cpp_reader *pfile, cpp_has
           multiple tokens. This is to save memory at the expense of
           accuracy.
 
-          Suppose we have #define SQARE(A) A * A
+          Suppose we have #define SQUARE(A) A * A
 
-          And then we do SQARE(2+3)
+          And then we do SQUARE(2+3)
 
           Then the tokens 2, +, 3, will have the same location,
           saying they come from the expansion of the argument A.  */
@@ -1831,16 +1866,70 @@ replace_args (cpp_reader *pfile, cpp_has
   i = 0;
   vaopt_state vaopt_tracker (pfile, macro->variadic,
                             args[macro->paramc - 1].count > 0);
+  unsigned short prev_flags = 0;
   for (src = macro->exp.tokens; src < limit; src++)
     {
       unsigned int arg_tokens_count;
       macro_arg_token_iter from;
       const cpp_token **paste_flag = NULL;
       const cpp_token **tmp_token_ptr;
+      unsigned short flags = src->flags;
 
       /* __VA_OPT__ handling.  */
-      if (vaopt_tracker.update (src) != vaopt_state::INCLUDE)
-       continue;
+      vaopt_state::update_type vaopt_state = vaopt_tracker.update (src, limit);
+      if (vaopt_state != vaopt_state::INCLUDE)
+       {
+         if (vaopt_state == vaopt_state::PADDING)
+           {
+             flags = vaopt_tracker.flags ();
+             if ((flags & PASTE_LEFT) && (prev_flags & PASTE_LEFT))
+               continue;
+             flags &= ~PASTE_LEFT;
+             cpp_token *t = (cpp_token *) padding_token (pfile, src);
+             unsigned index = expanded_token_index (pfile, macro, src, i);
+             t->flags |= flags;
+             /* Allocate a virtual location for the padding token and
+                append the token and its location to BUFF and
+                VIRT_LOCS.   */
+             tokens_buff_add_token (buff, virt_locs, t,
+                                    t->src_loc, t->src_loc,
+                                    map, index);
+             prev_flags = flags;
+             continue;
+           }
+         else if (vaopt_state == vaopt_state::INCLUDE_FIRST
+                  || vaopt_state == vaopt_state::INCLUDE_LAST)
+           {
+             unsigned short vaopt_flags = vaopt_tracker.flags ();
+             if (src->type != CPP_MACRO_ARG)
+               {
+                 cpp_token *t = _cpp_temp_token (pfile);
+                 t->type = src->type;
+                 t->val = src->val;
+                 t->flags = flags | vaopt_flags;
+                 unsigned index = expanded_token_index (pfile, macro, src, i);
+                 tokens_buff_add_token (buff, virt_locs, t,
+                                        t->src_loc, t->src_loc, map, index);
+                 i += 1;
+                 prev_flags = t->flags;
+                 /* Avoid paste on RHS, __VA_OPT__(c)d or
+                    __VA_OPT__(c)__VA_OPT__(d).  */
+                 if (!(vaopt_flags & PASTE_LEFT)
+                     && vaopt_state == vaopt_state::INCLUDE_LAST)
+                   {
+                     const cpp_token *t = &pfile->avoid_paste;
+                     tokens_buff_add_token (buff, virt_locs,
+                                            t, t->src_loc, t->src_loc,
+                                            NULL, 0);
+                   }
+                 continue;
+               }
+             else
+               flags |= vaopt_flags;
+           }
+         else
+           continue;
+       }
 
       if (src->type != CPP_MACRO_ARG)
        {
@@ -1852,6 +1941,7 @@ replace_args (cpp_reader *pfile, cpp_has
                                 src->src_loc, src->src_loc,
                                 map, index);
          i += 1;
+         prev_flags = flags;
          continue;
        }
 
@@ -1866,7 +1956,7 @@ replace_args (cpp_reader *pfile, cpp_has
         below is to initialize the iterator depending on the type of
         tokens the macro argument has.  It also does some adjustment
         related to padding tokens and some pasting corner cases.  */
-      if (src->flags & STRINGIFY_ARG)
+      if (flags & STRINGIFY_ARG)
        {
          arg_tokens_count = 1;
          macro_arg_token_iter_init (&from,
@@ -1875,7 +1965,7 @@ replace_args (cpp_reader *pfile, cpp_has
                                     MACRO_ARG_TOKEN_STRINGIFIED,
                                     arg, &arg->stringified);
        }
-      else if (src->flags & PASTE_LEFT)
+      else if (flags & PASTE_LEFT)
        {
          arg_tokens_count = arg->count;
          macro_arg_token_iter_init (&from,
@@ -1884,7 +1974,7 @@ replace_args (cpp_reader *pfile, cpp_has
                                     MACRO_ARG_TOKEN_NORMAL,
                                     arg, arg->first);
        }
-      else if (src != macro->exp.tokens && (src[-1].flags & PASTE_LEFT))
+      else if (prev_flags & PASTE_LEFT)
        {
          int num_toks;
          arg_tokens_count = arg->count;
@@ -1936,7 +2026,7 @@ replace_args (cpp_reader *pfile, cpp_has
 
       /* Padding on the left of an argument (unless RHS of ##).  */
       if ((!pfile->state.in_directive || pfile->state.directive_wants_padding)
-         && src != macro->exp.tokens && !(src[-1].flags & PASTE_LEFT))
+         && src != macro->exp.tokens && !(prev_flags & PASTE_LEFT))
        {
          const cpp_token *t = padding_token (pfile, src);
          unsigned index = expanded_token_index (pfile, macro, src, i);
@@ -1960,9 +2050,9 @@ replace_args (cpp_reader *pfile, cpp_has
                 save extra memory while tracking macro expansion
                 locations.  So in that case here is what we do:
 
-                Suppose we have #define SQARE(A) A * A
+                Suppose we have #define SQUARE(A) A * A
 
-                And then we do SQARE(2+3)
+                And then we do SQUARE(2+3)
 
                 Then the tokens 2, +, 3, will have the same location,
                 saying they come from the expansion of the argument
@@ -1970,9 +2060,9 @@ replace_args (cpp_reader *pfile, cpp_has
 
              So that means we are going to ignore the COUNT tokens
              resulting from the expansion of the current macro
-             arugment. In other words all the ARG_TOKENS_COUNT tokens
+             argument. In other words all the ARG_TOKENS_COUNT tokens
              resulting from the expansion of the macro argument will
-             have the index I. Normally, each of those token should
+             have the index I.  Normally, each of those token should
              have index I+J.  */
              unsigned token_index = i;
              unsigned index;
@@ -1989,9 +2079,9 @@ replace_args (cpp_reader *pfile, cpp_has
 
          /* With a non-empty argument on the LHS of ##, the last
             token should be flagged PASTE_LEFT.  */
-         if (src->flags & PASTE_LEFT)
-           paste_flag =
-             (const cpp_token **) tokens_buff_last_token_ptr (buff);
+         if (flags & PASTE_LEFT)
+           paste_flag
+             = (const cpp_token **) tokens_buff_last_token_ptr (buff);
        }
       else if (CPP_PEDANTIC (pfile) && ! CPP_OPTION (pfile, c99)
               && ! macro->syshdr && ! cpp_in_system_header (pfile))
@@ -2021,7 +2111,7 @@ replace_args (cpp_reader *pfile, cpp_has
                     NODE_NAME (node), src->val.macro_arg.arg_no);
 
       /* Avoid paste on RHS (even case count == 0).  */
-      if (!pfile->state.in_directive && !(src->flags & PASTE_LEFT))
+      if (!pfile->state.in_directive && !(flags & PASTE_LEFT))
        {
          const cpp_token *t = &pfile->avoid_paste;
          tokens_buff_add_token (buff, virt_locs,
@@ -2035,7 +2125,7 @@ replace_args (cpp_reader *pfile, cpp_has
          cpp_token *token = _cpp_temp_token (pfile);
          token->type = (*paste_flag)->type;
          token->val = (*paste_flag)->val;
-         if (src->flags & PASTE_LEFT)
+         if (flags & PASTE_LEFT)
            token->flags = (*paste_flag)->flags | PASTE_LEFT;
          else
            token->flags = (*paste_flag)->flags & ~PASTE_LEFT;
@@ -2043,6 +2133,7 @@ replace_args (cpp_reader *pfile, cpp_has
        }
 
       i += arg_tokens_count;
+      prev_flags = flags;
     }
 
   if (track_macro_exp)
@@ -3304,7 +3395,7 @@ create_iso_definition (cpp_reader *pfile
            }
        }
 
-      if (vaopt_tracker.update (token) == vaopt_state::ERROR)
+      if (vaopt_tracker.update (token, NULL) == vaopt_state::ERROR)
        return false;
 
       following_paste_op = (token->type == CPP_PASTE);
--- gcc/testsuite/c-c++-common/cpp/va-opt-2.c.jj        2018-01-09 
13:06:56.721345854 +0100
+++ gcc/testsuite/c-c++-common/cpp/va-opt-2.c   2018-01-09 17:13:35.759095064 
+0100
@@ -0,0 +1,41 @@
+/* PR preprocessor/83063 */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" { target c } } */
+/* { dg-options "-std=c++2a" { target c++ } } */
+
+#define f1(...) int b##__VA_OPT__(c)
+#define f2(...) int __VA_OPT__(c)##d
+#define f3(...) int e##__VA_OPT__()
+#define f4(...) int __VA_OPT__()##f
+#define f5(...) int g##__VA_OPT__(h)##i
+#define f6(...) int j##__VA_OPT__()##k
+#define f7(...) int l##__VA_OPT__()
+#define f8(...) int __VA_OPT__()##m
+#define f9(...) int n##__VA_OPT__()##o
+#define f10(x, ...) int x##__VA_OPT__(x)
+#define f11(x, ...) int __VA_OPT__(x)##x
+#define f12(x, ...) int x##__VA_OPT__(x)##x
+f1 (1, 2, 3);
+f1 ();
+f2 (1, 2);
+f2 ();
+f3 (1);
+f4 (2);
+f5 (6, 7);
+f5 ();
+f6 (8);
+f7 ();
+f8 ();
+f9 ();
+f10 (p, 5, 6);
+f10 (p);
+f11 (q, 7);
+f11 (q);
+f12 (r, 1, 2, 3, 4, 5);
+f12 (r);
+
+int
+main ()
+{
+  return bc + b + cd + d + e + f + ghi + gi + jk + l + m + no + pp + p + qq + 
q + rrr + rr;
+}
--- gcc/testsuite/c-c++-common/cpp/va-opt-3.c.jj        2018-01-09 
17:22:32.609158440 +0100
+++ gcc/testsuite/c-c++-common/cpp/va-opt-3.c   2018-01-09 17:20:14.985142201 
+0100
@@ -0,0 +1,69 @@
+/* PR preprocessor/83063 */
+/* PR preprocessor/83708 */
+/* { dg-do preprocess } */
+/* { dg-options "-std=gnu99" { target c } } */
+/* { dg-options "-std=c++2a" { target c++ } } */
+
+#define f1(...) b##__VA_OPT__(c)
+#define f2(...) __VA_OPT__(c)##d
+#define f3(...) e##__VA_OPT__()
+#define f4(...) __VA_OPT__()##f
+#define f5(...) g##__VA_OPT__(h)##i
+#define f6(...) j##__VA_OPT__()##k
+#define f7(...) l##__VA_OPT__()
+#define f8(...) __VA_OPT__()##m
+#define f9(...) n##__VA_OPT__()##o
+#define f10(x, ...) x##__VA_OPT__(x)
+#define f11(x, ...) __VA_OPT__(x)##x
+#define f12(x, ...) x##__VA_OPT__(x)##x
+#define f13(...) __VA_OPT__(a)__VA_OPT__(b)c
+#define f14(a, b, c, ...) __VA_OPT__(a)__VA_OPT__(b)c
+#define f15(a, b, c, ...) __VA_OPT__(a b)__VA_OPT__(b c)a/**/__VA_OPT__(c a)a
+t1 f1 (1, 2, 3);
+/* { dg-final { scan-file va-opt-3.i "t1 bc;" } } */
+t2 f1 ();
+/* { dg-final { scan-file va-opt-3.i "t2 b;" } } */
+t3 f2 (1, 2);
+/* { dg-final { scan-file va-opt-3.i "t3 cd;" } } */
+t4 f2 ();
+/* { dg-final { scan-file va-opt-3.i "t4 d;" } } */
+t5 f3 (1);
+/* { dg-final { scan-file va-opt-3.i "t5 e;" } } */
+t6 f4 (2);
+/* { dg-final { scan-file va-opt-3.i "t6 f;" } } */
+t7 f5 (6, 7);
+/* { dg-final { scan-file va-opt-3.i "t7 ghi;" } } */
+t8 f5 ();
+/* { dg-final { scan-file va-opt-3.i "t8 gi;" } } */
+t9 f6 (8);
+/* { dg-final { scan-file va-opt-3.i "t9 jk;" } } */
+t10 f7 ();
+/* { dg-final { scan-file va-opt-3.i "t10 l;" } } */
+t11 f8 ();
+/* { dg-final { scan-file va-opt-3.i "t11 m;" } } */
+t12 f9 ();
+/* { dg-final { scan-file va-opt-3.i "t12 no;" } } */
+t13 f10 (p, 5, 6);
+/* { dg-final { scan-file va-opt-3.i "t13 pp;" } } */
+t14 f10 (p);
+/* { dg-final { scan-file va-opt-3.i "t14 p;" } } */
+t15 f11 (q, 7);
+/* { dg-final { scan-file va-opt-3.i "t15 qq;" } } */
+t16 f11 (q);
+/* { dg-final { scan-file va-opt-3.i "t16 q;" } } */
+t17 f12 (r, 1, 2, 3, 4, 5);
+/* { dg-final { scan-file va-opt-3.i "t17 rrr;" } } */
+t18 f12 (r);
+/* { dg-final { scan-file va-opt-3.i "t18 rr;" } } */
+t19 f13 (1, 2);
+/* { dg-final { scan-file va-opt-3.i "t19 a b c;" } } */
+t20 f13 ();
+/* { dg-final { scan-file va-opt-3.i "t20 c;" } } */
+t21 f14 (3, 4, 5, 2);
+/* { dg-final { scan-file va-opt-3.i "t21 3 4 5;" } } */
+t22 f14 (3, 4, 5);
+/* { dg-final { scan-file va-opt-3.i "t22 5;" } } */
+t23 f15 (6, 7, 8, 9);
+/* { dg-final { scan-file va-opt-3.i "t23 6 7 7 8 6 8 6 6;" } } */
+t24 f15 (6, 7, 8);
+/* { dg-final { scan-file va-opt-3.i "t24 6 6;" } } */

        Jakub

Reply via email to