This fixes a parsing bug that would probably never be of trouble in practice; it is an edge case of edge cases. Nevertheless, bash should be able to handle it without corrupting memory.
Consider the following: $ TOKEN_DEFAULT_INITIAL_SIZE=496 # parse.y:1282 $ n=TOKEN_DEFAULT_INITIAL_SIZE # notational convenience $ unset BASH_ENV $ { for ((i=0;i<n-5;++i)); do printf 'a'; done; printf '${a}\177'; } | bash That is, giving bash input composed of TOKEN_DEFAULT_INITIAL_SIZE `a' characters followed by `${a}' followed by CTLNUL followed by EOF. The environment variable `BASH_ENV' is unset to inhibit the reading of any startup script, which might otherwise alter the particular case that would trigger the bug (due to changes in buffer allocations [see below]). The bash program eventually calls read_token_word(), which, under the given input, is essentially equivalent to the following: static int read_token_word (character) int character; { /* The value for YYLVAL when a WORD is read. */ WORD_DESC *the_word; /* Index into the token that we are building. */ int token_index; /* DOLLAR_PRESENT becomes non-zero if we see a `$'. */ int dollar_present; if (token_buffer_size < TOKEN_DEFAULT_INITIAL_SIZE) token = (char *)xrealloc (token, token_buffer_size = TOKEN_DEFAULT_INITIAL_SIZE); token_index = 0; dollar_present = 0; for (;;) { if (character == EOF) goto got_token; if (character == '$') { /****** simulate peek_char and parse_matched_pair() ******/ shell_getc (1); shell_getc (1); /*********************************************************/ token[491] = character; token[492] = '{'; strcpy (token + 493, "a}"); token_index = 495; dollar_present = 1; goto next_character; } if (character == CTLESC || character == CTLNUL) { token[495] = CTLESC; token_index = 496; /* assignment below: token[497] = CTLNUL; */ } dollar_present |= character == '$'; token[token_index++] = character; RESIZE_MALLOCED_BUFFER (token, token_index, 1, token_buffer_size, TOKEN_DEFAULT_GROW_SIZE); next_character: character = shell_getc (1); } /* end for (;;) */ got_token: token[token_index] = '\0'; the_word = (WORD_DESC *)xmalloc (sizeof (WORD_DESC)); the_word->word = (char *)xmalloc (1 + token_index); the_word->flags = 0; strcpy (the_word->word, token); if (dollar_present) the_word->flags |= W_HASDOLLAR; yylval.word = the_word; return WORD; } Before the `$' is reached, the `for(;;)' loop looks essentially like this: for (;;) token[token_index++] = shell_getc (1); That is, the `a' characters are copied into the token array. When the `$' is read, `token_index' has the value: TOKEN_DEFAULT_INITIAL_SIZE-5 = 491 and the following code is executed: if (character == '$') { /****** simulate peek_char and parse_matched_pair() ******/ shell_getc (1); shell_getc (1); /*********************************************************/ token[491] = character; token[492] = '{'; strcpy (token + 493, "a}"); token_index = 495; dollar_present = 1; goto next_character; } The value of `token_index' represents the number of characters that have already been stored in `token'; in this case, 495 characters have been stored. As per the `goto', the next character is read: next_character: character = shell_getc (1); This reads the `\177' character, CTLNUL. which is handled with this code: if (character == CTLESC || character == CTLNUL) { token[495] = CTLESC; token_index=496; /* assignment below: token[497] = CTLNUL; */ } token[token_index++] = character; Thus, the last assignment is equivalent to: token[496] = character; token_index=497; which is, of course, one past the end of `token', because there are only 496 indicies for `token' according to its current allocation (the maximum `token' index is 495). The most "conservative" fix is the one presented in this patch: Allocate more space than is necessary by imposing an unnecessarily strict requirement, thereby "masking" the bug. In my opinion, this is wrongheaded, but it only requires the change of one character and it's exactly what other code paths in read_token_word() [currently] do, probably by mistake. :-P Signed-off-by: Michael Witten <mfwit...@gmail.com> --- parse.y | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/parse.y b/parse.y index 6c095f5..55874aa 100644 --- a/parse.y +++ b/parse.y @@ -4473,7 +4473,7 @@ read_token_word (character) ttok = parse_matched_pair (cd, '[', ']', &ttoklen, 0); if (ttok == &matched_pair_error) return -1; /* Bail immediately. */ - RESIZE_MALLOCED_BUFFER (token, token_index, ttoklen + 2, + RESIZE_MALLOCED_BUFFER (token, token_index, ttoklen + 3, token_buffer_size, TOKEN_DEFAULT_GROW_SIZE); token[token_index++] = character; -- 1.7.4.22.g14b16.dirty