"James K. Lowden" <jklow...@schemamania.org> writes: > On Sat, 15 Feb 2025 23:32:37 -0500 > David Malcolm <dmalc...@redhat.com> wrote: > > In defense of lack of free(3) ... > >> > +const char * >> > +esc( size_t len, const char input[] ) { >> > + static char spaces[] = "([,;]?[[:space:]])+"; >> > + static char spaceD[] = "(\n {6}D" "|" "[,;]?[[:space:]])+"; >> > + static char buffer[64 * 1024]; >> > + char *p = buffer; >> > + const char *eoinput = input + len; >> > + >> > + const char *spacex = is_reference_format()? spaceD : spaces; >> > + >> > + for( const char *s=input; *s && s < eoinput; s++ ) { >> > + *p = '\0'; >> > + gcc_assert( size_t(p - buffer) < sizeof(buffer) - 4 ); > > overflow guarded here > >> > + switch(*s) { >> > + case '^': case '$': >> > + case '(': case ')': >> > + case '*': case '+': case '?': >> > + case '[': case ']': >> > + case '{': case '}': >> > + case '|': >> > + case '.': >> > + *p++ = '\\'; >> > + *p++ = *s; >> > + break; >> > + case '\\': >> > + *p++ = '['; >> > + *p++ = *s; >> > + *p++ = ']'; >> > + break; >> > + >> > + case ';': case ',': >> > + if( ! (s+1 < eoinput && s[1] == 0x20) ) { >> > + *p++ = *s; >> > + break; >> > + } >> > + __attribute__((fallthrough)); >> > + case 0x20: case '\n': >> > + gcc_assert(p + sizeof(spacex) < buffer + sizeof(buffer)); > > and overflow guarded here, the only place where more than 4 characters can be > inserted into the buffer. > >> > + p = stpcpy( p, spacex ); >> > + while( s+1 < eoinput && is_separator_space(s+1)) { >> > + s++; >> > + } >> > + break; >> > + default: >> > + *p++ = *s; >> > + break; >> > + } >> > + } >> > + *p = '\0'; > ... >> > + return xstrdup(buffer); >> > +} >> >> Has a fixed size 64k buffer; doesn't seem to have proper overflow >> handling. Could use a pretty_printer to accumulate chars. > > Thank you for these comments. Let me see if I can alleviate your concerns. > > This function is called from exactly one place, where the file-reader, > lexio, parses a REPLACE directive. The COBOL input says "REPLACE X BY > Y" subject to some constraints. Because we're using regex to find X, > and because X might be any arbitrary string, the esc() function > escapes regex metacharacters prior to executing the regex. > > IMHO it's unlikely the resulting regex input will exceed 64 KB. It's > unlikely to be even 64 bytes. (Usually X is a COBOL identifier, > limited to 64 bytes by ISO.) Granted, a fixed maximum is a > limitation. But I put it to you: which is more likely? For a regex > to exceed 64 KB, or for heap allocation to fail to return adequate > memory? If there's some crazy input, I'd rather die at 64 KB than > consume gigabytes of swap on the way to crashing. > > (As a matter of fact, the function returns a copy of the used portion > of the static buffer, an unnecessary allocation. The caller soon > replaces it, and could have used a pointer into that static buffer.) > > If there is a built-in function in gcc to escape a regex, that would be > preferable to this gnarly code. Is that what you mean by "pretty printer"? > >> Returns a allocated buffer as "const char *", which should be just a >> "char *". > > The caller has no business writing to the allocated buffer, which is input to > the regex call. Caller and called agree on const. Isn't that what "const" > is for?
If it's allocated, it should be freed at some point. If it's const, freeing it looks wrong. > > --jkl