[PATCH 2/6] Bug: shellexp: Fix off-by-one assignment in read_token_word()

Michael Witten Mon, 28 Feb 2011 13:08:22 -0800

This fixes a parsing bug that would probably never be of trouble in
practice; it is an edge case of edge cases. Nevertheless, bash should
be able to handle it without corrupting memory.


Consider the following:

  $ TOKEN_DEFAULT_INITIAL_SIZE=496   # parse.y:1282
  $ n=TOKEN_DEFAULT_INITIAL_SIZE     # notational convenience
  $ unset BASH_ENV
  $ { for ((i=0;i<n-5;++i)); do printf 'a'; done; printf '${a}\177'; } | bash

That is, giving bash input composed of TOKEN_DEFAULT_INITIAL_SIZE `a'
characters followed by `${a}' followed by CTLNUL followed by EOF.

The environment variable `BASH_ENV' is unset to inhibit the reading of any
startup script, which might otherwise alter the particular case that would
trigger the bug (due to changes in buffer allocations [see below]).

The bash program eventually calls read_token_word(), which, under the
given input, is essentially equivalent to the following:

  static int
  read_token_word (character)
       int character;
  {
    /* The value for YYLVAL when a WORD is read. */
    WORD_DESC *the_word;

    /* Index into the token that we are building. */
    int token_index;

    /* DOLLAR_PRESENT becomes non-zero if we see a `$'. */
    int dollar_present;

    if (token_buffer_size < TOKEN_DEFAULT_INITIAL_SIZE)
      token = (char *)xrealloc (token, token_buffer_size = 
TOKEN_DEFAULT_INITIAL_SIZE);

    token_index = 0;
    dollar_present = 0;

    for (;;)
      {
        if (character == EOF)
          goto got_token;

        if (character == '$')
          {
             /****** simulate peek_char and parse_matched_pair() ******/
             shell_getc (1);
             shell_getc (1);
             /*********************************************************/

             token[491] = character;
             token[492] = '{';
             strcpy (token + 493, "a}");
             token_index = 495;
             dollar_present = 1;
             goto next_character;
          }

        if (character == CTLESC || character == CTLNUL)
          {
            token[495] = CTLESC;
            token_index = 496;    /* assignment below: token[497] = CTLNUL; */
          }

        dollar_present |= character == '$';

        token[token_index++] = character;

        RESIZE_MALLOCED_BUFFER (token, token_index, 1, token_buffer_size,
                                TOKEN_DEFAULT_GROW_SIZE);

      next_character:
        character = shell_getc (1);
      }   /* end for (;;) */

  got_token:

    token[token_index] = '\0';

    the_word = (WORD_DESC *)xmalloc (sizeof (WORD_DESC));
    the_word->word = (char *)xmalloc (1 + token_index);
    the_word->flags = 0;
    strcpy (the_word->word, token);
    if (dollar_present)
      the_word->flags |= W_HASDOLLAR;

    yylval.word = the_word;

    return WORD;
  }

Before the `$' is reached, the `for(;;)' loop looks essentially like this:

  for (;;)
      token[token_index++] = shell_getc (1);

That is, the `a' characters are copied into the token array.

When the `$' is read, `token_index' has the value:

  TOKEN_DEFAULT_INITIAL_SIZE-5 = 491

and the following code is executed:

  if (character == '$')
    {
       /****** simulate peek_char and parse_matched_pair() ******/
       shell_getc (1);
       shell_getc (1);
       /*********************************************************/

       token[491] = character;
       token[492] = '{';
       strcpy (token + 493, "a}");
       token_index = 495;
       dollar_present = 1;
       goto next_character;
    }

The value of `token_index' represents the number of characters that have
already been stored in `token'; in this case, 495 characters have been
stored. As per the `goto', the next character is read:

  next_character:
    character = shell_getc (1);

This reads the `\177' character, CTLNUL. which is handled with this code:

  if (character == CTLESC || character == CTLNUL)
    {
      token[495] = CTLESC;
      token_index=496;      /* assignment below: token[497] = CTLNUL; */
    }

  token[token_index++] = character;

Thus, the last assignment is equivalent to:

  token[496] = character;
  token_index=497;

which is, of course, one past the end of `token', because there are only
496 indicies for `token' according to its current allocation (the maximum
`token' index is 495).

The most "conservative" fix is the one presented in this patch:

  Allocate more space than is necessary by imposing
  an unnecessarily strict requirement, thereby "masking"
  the bug.

In my opinion, this is wrongheaded, but it only requires the change of
one character and it's exactly what other code paths in read_token_word()
[currently] do, probably by mistake. :-P

Signed-off-by: Michael Witten <mfwit...@gmail.com>
---
 parse.y |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/parse.y b/parse.y
index 6c095f5..55874aa 100644
--- a/parse.y
+++ b/parse.y
@@ -4473,7 +4473,7 @@ read_token_word (character)
                ttok = parse_matched_pair (cd, '[', ']', &ttoklen, 0);
              if (ttok == &matched_pair_error)
                return -1;              /* Bail immediately. */
-             RESIZE_MALLOCED_BUFFER (token, token_index, ttoklen + 2,
+             RESIZE_MALLOCED_BUFFER (token, token_index, ttoklen + 3,
                                      token_buffer_size,
                                      TOKEN_DEFAULT_GROW_SIZE);
              token[token_index++] = character;
-- 
1.7.4.22.g14b16.dirty

[PATCH 2/6] Bug: shellexp: Fix off-by-one assignment in read_token_word()

Reply via email to