Subject smells like a patch splitting opportunity: a trivial patch
adding const, followed by the more interesting patch passing around the
lookahead.  Or the other way round.

In fact, I just did this to help me review the interesting patch: the
trivial patch adds const to JSONToken *.  I can split in my tree.

Paolo Bonzini <[email protected]> writes:

> Pass the lookahead token down to the various functions implementing the
> recursive descent, instead of first peeking and then getting the token
> again multiple times.
>
> The main purpose of this patch is to switch the argument passing style
> for parse_* functions to something more desirable for a push parser,
> which gets one and exactly one token at a time.

Peeking at the lookahead is just fine in a traditional parser, but
you're preparing the way for a push parser.  Got it.

>                                                  However, there are
> some minor improvement in code size and a bugfix as well.
>
> In particular, because parse_array() and parse_object() can assume
> that the opening bracket/brace is not anymore in the token stream, it
> is now apparent that the first entry of an array incorrectly used the
> '[' character (stored in "token") as its location.

For a value of "apparent".  Not sure I would've seen it without your
heads up.

I believe you're talking about

        obj = parse_value(ctxt);
        if (obj == NULL) {
            parse_error(ctxt, token, "expecting value");
            goto out;
        }

where @token indeed holds the array's '['.

This isn't the only spot where we pass a bad @token argument to
parse_error().  For instance:

        parse_error(ctxt, NULL, "premature EOI");

These bad arguments are all masked by parse_error() completely ignoring
its argument!  We tell users what's wrong with the input, but telling
them where it's wrong has been TODO since the initial commit in 2009[*].
I see your PATCH 5 finally takes care of it.

Losely related: how the parser recovers from syntax errors.  Consider
parse_pair():

    key_obj = parse_value(ctxt);

If @key_obj, parse_value() successfully parsed a value into @key_obj.

Else, it reported an error.

    key = qobject_to(QString, key_obj);

If @key, parse_value() recognized a string.

Else, it either recognized something else, or reported an error.

    if (!key) {
        parse_error(ctxt, peek, "key is not a string in object");

This error makes sense if it recognized something else.

It doesn't when it reported an error.  Works, sort of, because
parse_error() keeps only the first error, and silently ignores
subsequent ones.

Robust parsers handle errors explicitly, they don't blindly continue,
throwing away any subsequent errors.  Moreover, I find these misleading
errors grating when I read the code.

But I digress.

        goto out;
    }

>                                                     parse_pair, for
> comparison, handled it correctly:
>
>     if (!key) {
>         parse_error(ctxt, peek, "key is not a string in object");
>         goto out;
>     }
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
>  qobject/json-parser.c | 125 +++++++++++++++++++-----------------------
>  1 file changed, 55 insertions(+), 70 deletions(-)
>
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index 7483e582fea..eabc9f8358c 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -49,13 +49,13 @@ typedef struct JSONParserContext {
>   * 4) deal with premature EOI
>   */
>  
> -static QObject *parse_value(JSONParserContext *ctxt);
> +static QObject *parse_value(JSONParserContext *ctxt, const JSONToken *token);
>  
>  /**
>   * Error handler
>   */
>  static void G_GNUC_PRINTF(3, 4) parse_error(JSONParserContext *ctxt,
> -                                           JSONToken *token, const char 
> *msg, ...)
> +                                           const JSONToken *token, const 
> char *msg, ...)

checkpatch points out

    0001-json-parser-pass-around-lookahead-token-constify.patch:48: WARNING: 
line over 80 characters

Easy enough to avoid.  Both

   static void G_GNUC_PRINTF(3, 4) parse_error(JSONParserContext *ctxt,
                                               const JSONToken *token,
                                               const char *msg, ...)

and

   static void G_GNUC_PRINTF(3, 4)
   parse_error(JSONParserContext *ctxt, const JSONToken *token,
               const char *msg, ...)

would be easier on my eyes.

More of the same below, not flagging again.

>  {
>      va_list ap;
>      char message[1024];
> @@ -126,7 +126,7 @@ static int cvt4hex(const char *s)
>   * - Invalid Unicode characters are rejected.
>   * - Control characters \x00..\x1F are rejected by the lexer.
>   */
> -static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
> +static QString *parse_string(JSONParserContext *ctxt, const JSONToken *token)
>  {
>      const char *ptr = token->str;
>      GString *str;
> @@ -235,42 +235,29 @@ out:
>      return NULL;
>  }
>  
> -/* Note: the token object returned by parser_context_peek_token or
> - * parser_context_pop_token is deleted as soon as parser_context_pop_token
> - * is called again.
> +/* Note: the token object returned by parser_context_pop_token is
> + * deleted as soon as parser_context_pop_token is called again.
>   */
> -static JSONToken *parser_context_pop_token(JSONParserContext *ctxt)
> +static const JSONToken *parser_context_pop_token(JSONParserContext *ctxt)
>  {
>      g_free(ctxt->current);
>      ctxt->current = g_queue_pop_head(ctxt->buf);
>      return ctxt->current;
>  }
>  
> -static JSONToken *parser_context_peek_token(JSONParserContext *ctxt)
> -{
> -    return g_queue_peek_head(ctxt->buf);
> -}
> -
>  /**
>   * Parsing rules
>   */
> -static int parse_pair(JSONParserContext *ctxt, QDict *dict)
> +static int parse_pair(JSONParserContext *ctxt, const JSONToken *token, QDict 
> *dict)
>  {
>      QObject *key_obj = NULL;
>      QString *key;
>      QObject *value;
> -    JSONToken *peek, *token;
>  
> -    peek = parser_context_peek_token(ctxt);
> -    if (peek == NULL) {
> -        parse_error(ctxt, NULL, "premature EOI");
> -        goto out;
> -    }
> -
> -    key_obj = parse_value(ctxt);
> +    key_obj = parse_value(ctxt, token);
>      key = qobject_to(QString, key_obj);
>      if (!key) {
> -        parse_error(ctxt, peek, "key is not a string in object");
> +        parse_error(ctxt, token, "key is not a string in object");
>          goto out;
>      }
>  

@peek isn't exactly a great name, but @token tells me even less.
@first_token?

> @@ -285,7 +272,13 @@ static int parse_pair(JSONParserContext *ctxt, QDict 
> *dict)
>          goto out;
>      }
>  
> -    value = parse_value(ctxt);
> +    token = parser_context_pop_token(ctxt);

Assignment to parameter.  Not a fan.

If we rename the parameter, we can keep the local variable @token for
use here.

> +    if (token == NULL) {
> +        parse_error(ctxt, NULL, "premature EOI");
> +        goto out;
> +    }
> +
> +    value = parse_value(ctxt, token);
>      if (value == NULL) {
>          parse_error(ctxt, token, "Missing value in dict");
>          goto out;
> @@ -309,21 +302,18 @@ out:
>  static QObject *parse_object(JSONParserContext *ctxt)
>  {
>      QDict *dict = NULL;
> -    JSONToken *token, *peek;
> -
> -    token = parser_context_pop_token(ctxt);
> -    assert(token && token->type == JSON_LCURLY);
> +    const JSONToken *token;

Before the patch, parse_object() parses the entire object, as its name
indicates, with the precondition that the lookahead is '{'.

After the patch, it parses the part of the object after the '{'.  Hmm.

We could pass it the first token, like we do for all the other parsing
functions except this one and parse_array(), and keep the assertion.
I'd prefer that, but it's a matter of taste.

>  
>      dict = qdict_new();
>  
> -    peek = parser_context_peek_token(ctxt);
> -    if (peek == NULL) {
> +    token = parser_context_pop_token(ctxt);
> +    if (token == NULL) {
>          parse_error(ctxt, NULL, "premature EOI");
>          goto out;
>      }
>  
> -    if (peek->type != JSON_RCURLY) {
> -        if (parse_pair(ctxt, dict) == -1) {
> +    if (token->type != JSON_RCURLY) {
> +        if (parse_pair(ctxt, token, dict) == -1) {
>              goto out;
>          }
>  
> @@ -339,7 +329,13 @@ static QObject *parse_object(JSONParserContext *ctxt)
>                  goto out;
>              }
>  
> -            if (parse_pair(ctxt, dict) == -1) {
> +            token = parser_context_pop_token(ctxt);
> +            if (token == NULL) {
> +                parse_error(ctxt, NULL, "premature EOI");
> +                goto out;
> +            }
> +
> +            if (parse_pair(ctxt, token, dict) == -1) {
>                  goto out;
>              }
>  
> @@ -349,8 +345,6 @@ static QObject *parse_object(JSONParserContext *ctxt)
>                  goto out;
>              }
>          }
> -    } else {
> -        (void)parser_context_pop_token(ctxt);
>      }
>  
>      return QOBJECT(dict);
> @@ -363,23 +357,20 @@ out:
>  static QObject *parse_array(JSONParserContext *ctxt)
>  {
>      QList *list = NULL;
> -    JSONToken *token, *peek;
> -
> -    token = parser_context_pop_token(ctxt);
> -    assert(token && token->type == JSON_LSQUARE);
> +    const JSONToken *token;

Likewise.

>  
>      list = qlist_new();
>  
> -    peek = parser_context_peek_token(ctxt);
> -    if (peek == NULL) {
> +    token = parser_context_pop_token(ctxt);
> +    if (token == NULL) {
>          parse_error(ctxt, NULL, "premature EOI");
>          goto out;
>      }
>  
> -    if (peek->type != JSON_RSQUARE) {
> +    if (token->type != JSON_RSQUARE) {
>          QObject *obj;
>  
> -        obj = parse_value(ctxt);
> +        obj = parse_value(ctxt, token);
>          if (obj == NULL) {
>              parse_error(ctxt, token, "expecting value");
>              goto out;
> @@ -399,7 +390,13 @@ static QObject *parse_array(JSONParserContext *ctxt)
>                  goto out;
>              }
>  
> -            obj = parse_value(ctxt);
> +            token = parser_context_pop_token(ctxt);
> +            if (token == NULL) {
> +                parse_error(ctxt, NULL, "premature EOI");
> +                goto out;
> +            }
> +
> +            obj = parse_value(ctxt, token);
>              if (obj == NULL) {
>                  parse_error(ctxt, token, "expecting value");
>                  goto out;
> @@ -413,8 +410,6 @@ static QObject *parse_array(JSONParserContext *ctxt)
>                  goto out;
>              }
>          }
> -    } else {
> -        (void)parser_context_pop_token(ctxt);
>      }
>  
>      return QOBJECT(list);
> @@ -424,11 +419,8 @@ out:
>      return NULL;
>  }
>  
> -static QObject *parse_keyword(JSONParserContext *ctxt)
> +static QObject *parse_keyword(JSONParserContext *ctxt, const JSONToken 
> *token)
>  {
> -    JSONToken *token;
> -
> -    token = parser_context_pop_token(ctxt);
>      assert(token && token->type == JSON_KEYWORD);
>  
>      if (!strcmp(token->str, "true")) {
> @@ -442,11 +434,8 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
>      return NULL;
>  }
>  
> -static QObject *parse_interpolation(JSONParserContext *ctxt)
> +static QObject *parse_interpolation(JSONParserContext *ctxt, const JSONToken 
> *token)
>  {
> -    JSONToken *token;
> -
> -    token = parser_context_pop_token(ctxt);
>      assert(token && token->type == JSON_INTERP);
>  
>      if (!strcmp(token->str, "%p")) {
> @@ -478,11 +467,8 @@ static QObject *parse_interpolation(JSONParserContext 
> *ctxt)
>      return NULL;
>  }
>  
> -static QObject *parse_literal(JSONParserContext *ctxt)
> +static QObject *parse_literal(JSONParserContext *ctxt, const JSONToken 
> *token)
>  {
> -    JSONToken *token;
> -
> -    token = parser_context_pop_token(ctxt);
>      assert(token);
>  
>      switch (token->type) {
> @@ -530,29 +516,21 @@ static QObject *parse_literal(JSONParserContext *ctxt)
>      }
>  }
>  
> -static QObject *parse_value(JSONParserContext *ctxt)
> +static QObject *parse_value(JSONParserContext *ctxt, const JSONToken *token)

If we name parse_pair()'s new argument @first_token, we should name this
one the same.

>  {
> -    JSONToken *token;
> -
> -    token = parser_context_peek_token(ctxt);
> -    if (token == NULL) {
> -        parse_error(ctxt, NULL, "premature EOI");
> -        return NULL;
> -    }
> -
>      switch (token->type) {
>      case JSON_LCURLY:
>          return parse_object(ctxt);
>      case JSON_LSQUARE:
>          return parse_array(ctxt);
>      case JSON_INTERP:
> -        return parse_interpolation(ctxt);
> +        return parse_interpolation(ctxt, token);
>      case JSON_INTEGER:
>      case JSON_FLOAT:
>      case JSON_STRING:
> -        return parse_literal(ctxt);
> +        return parse_literal(ctxt, token);
>      case JSON_KEYWORD:
> -        return parse_keyword(ctxt);
> +        return parse_keyword(ctxt, token);
>      default:
>          parse_error(ctxt, token, "expecting value");
>          return NULL;
> @@ -575,8 +553,15 @@ QObject *json_parser_parse(GQueue *tokens, va_list *ap, 
> Error **errp)
>  {
>      JSONParserContext ctxt = { .buf = tokens, .ap = ap };
>      QObject *result;
> +    const JSONToken *token;
>  
> -    result = parse_value(&ctxt);
> +    token = parser_context_pop_token(&ctxt);
> +    if (token == NULL) {
> +        parse_error(&ctxt, NULL, "premature EOI");
> +        return NULL;
> +    }
> +
> +    result = parse_value(&ctxt, token);
>      assert(ctxt.err || g_queue_is_empty(ctxt.buf));
>  
>      error_propagate(errp, ctxt.err);

       while (!g_queue_is_empty(ctxt.buf)) {
           parser_context_pop_token(&ctxt);

This is the only call of parser_context_pop_token() that isn't
immediately followed by "if it returned null, report "premature EOI".
I went "let's factor this out", then realized the next patch replaces
all this.  Nevermind!

       }
       g_free(ctxt.current);

       return result;
   }


Summary:

* parse_pair() and parse_value() now receive the first token as
  argument.  Popping the first token moves to their callers.

* parse_keyword(), parse_interpolation(), and parse_literal() now
  receive the token as argument.  Popping it moves to their callers.
  parse_string() works like this even before the patch, and is not
  changed.

* parse_object() and parse_array() now parse the part after the object's
  / array's first token.

Works.



[*] Reminds me of "Ed is generous enough to flag errors, yet prudent
enough not to overwhelm the novice with verbosity."


Reply via email to