"James K. Lowden" <jklow...@schemamania.org> writes:

> On Sat, 15 Feb 2025 23:32:37 -0500
> David Malcolm <dmalc...@redhat.com> wrote:
>
> In defense of lack of free(3) ...
>
>> > +const char *
>> > +esc( size_t len, const char input[] ) {
>> > +  static char spaces[] = "([,;]?[[:space:]])+";
>> > +  static char spaceD[] = "(\n {6}D" "|" "[,;]?[[:space:]])+";
>> > +  static char buffer[64 * 1024];
>> > +  char *p = buffer;
>> > +  const char *eoinput = input + len;
>> > +
>> > +  const char *spacex = is_reference_format()? spaceD : spaces;
>> > +
>> > +  for( const char *s=input; *s && s < eoinput; s++ ) {
>> > +    *p = '\0';
>> > +    gcc_assert( size_t(p - buffer) < sizeof(buffer) - 4 );
>
> overflow guarded here
>
>> > +    switch(*s) {
>> > +    case '^': case '$':
>> > +    case '(': case ')':
>> > +    case '*': case '+': case '?':
>> > +    case '[': case ']':
>> > +    case '{': case '}':
>> > +    case '|':
>> > +    case '.':
>> > +      *p++ = '\\';
>> > +      *p++ = *s;
>> > +      break;
>> > +    case '\\':
>> > +      *p++ = '[';
>> > +      *p++ = *s;
>> > +      *p++ = ']';
>> > +      break;
>> > +
>> > +    case ';': case ',':
>> > +      if( ! (s+1 < eoinput && s[1] == 0x20) ) {
>> > +        *p++ = *s;
>> > +        break;
>> > +      }
>> > +      __attribute__((fallthrough));
>> > +    case 0x20: case '\n':
>> > +      gcc_assert(p + sizeof(spacex) < buffer + sizeof(buffer));
>
> and overflow guarded here, the only place where more than 4 characters can be 
> inserted into the buffer.  
>
>> > +      p = stpcpy( p, spacex );
>> > +      while( s+1 < eoinput && is_separator_space(s+1)) {
>> > +        s++;
>> > +      }
>> > +      break;
>> > +    default:
>> > +      *p++ = *s;
>> > +      break;
>> > +    }
>> > +  }
>> > +  *p = '\0';
> ...
>> > +  return xstrdup(buffer);
>> > +}
>> 
>> Has a fixed size 64k buffer; doesn't seem to have proper overflow
>> handling. Could use a pretty_printer to accumulate chars.
>
> Thank you for these comments.  Let me see if I can alleviate your concerns.  
>
> This function is called from exactly one place, where the file-reader,
> lexio, parses a REPLACE directive.  The COBOL input says "REPLACE X BY
> Y" subject to some constraints.  Because we're using regex to find X,
> and because X might be any arbitrary string, the esc() function
> escapes regex metacharacters prior to executing the regex.
>
> IMHO it's unlikely the resulting regex input will exceed 64 KB.  It's
> unlikely to be even 64 bytes.  (Usually X is a COBOL identifier,
> limited to 64 bytes by ISO.)  Granted, a fixed maximum is a
> limitation.  But I put it to you: which is more likely?  For a regex
> to exceed 64 KB, or for heap allocation to fail to return adequate
> memory?  If there's some crazy input, I'd rather die at 64 KB than
> consume gigabytes of swap on the way to crashing.
>
> (As a matter of fact, the function returns a copy of the used portion
> of the static buffer, an unnecessary allocation.  The caller soon
> replaces it, and could have used a pointer into that static buffer.)
>
> If there is a built-in function in gcc to escape a regex, that would be 
> preferable to this gnarly code.  Is that what you mean by "pretty printer"?  
>
>> Returns a allocated buffer as "const char *", which should be just a
>> "char  *".
>
> The caller has no business writing to the allocated buffer, which is input to 
> the regex call.  Caller and called agree on const.  Isn't that what "const" 
> is for?  

If it's allocated, it should be freed at some point. If it's const,
freeing it looks wrong.

>
> --jkl

Reply via email to