Folks, While following up on: http://thread.gmane.org/gmane.comp.printing.groff.general/12353/focus=12368
I've identified a potential buffer overrun in the above file; (it *will* overrun, if function ps_get_line() reads a maximum length line of 255 input characters -- the maximum allowed by the DSC, *excluding* line terminators -- because do_ps_file() allocates a buffer of exactly 255 bytes, which ps_get_line() will happily fill completely, before gratuitously appending a LF plus a NUL string terminator, for a potential overrun to 257 bytes). The attached patch fixes this, in addition to refactoring ps_get_line() itself, to avoid some look-ahead/push-back I/O overhead when processing input files with CR-only line terminators, and to more readily support a (future) alternative psbb processing path, within do_ps_file(), to accommodate retrieval of /MediaBox bounds from PDF files. Okay to commit? I believe I've matched the existing style of code layout, (which isn't entirely to my personal taste), but is the comment style acceptable? (I've annotated my changes considerably more comprehensively than the original). -- Regards, Keith.
# HG changeset patch # Parent 7fde8642cc78b42ba10b622cbfdff7a6737c5412 ChangeLog: ????-??-?? Keith Marshall <keith.d.marsh...@ntlworld.com> Refactor psbb line input function; avoid a buffer overrun. * src/roff/troff/input.cpp (ps_get_line): Refactor, to avoid the overhead of character look-ahead and push-back on CR stream input. Add new `dscopt' parameter, in place of internal `err' variable; update all call references, passing value of... (DSC_LINE_MAX_ENFORCE): ...this new manifest constant; define it. (DSC_LINE_MAX_IGNORED): Likewise; currently unused, but intended for future use as an alternative to `DSC_LINE_MAX_ENFORCE'. (DSC_LINE_MAX_CHECKED): New manifest constant; used internally only. (PS_LINE_MAX): Manifest constant, renamed for notional consistency... (DSC_LINE_MAX): ...to this; defined value remains as 255. (do_ps_file): Increase stack allocation for `buf' char array; former allocation of PS_LINE_MAX (now DSC_LINE_MAX) bytes exposed a potential buffer overrun, after reading DSC_LINE_MAX bytes; two additional bytes are required, to accommodate a terminating LF and NUL. Add `dscopt' parameter, with value `DSC_LINE_MAX_ENFORCE', in each of three calls to `ps_get_line()'. diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp --- a/src/roff/troff/input.cpp +++ b/src/roff/troff/input.cpp @@ -6040,44 +6040,97 @@ int parse_bounding_box(char *p, bounding } bb->llx = bb->lly = bb->urx = bb->ury = 0; return 0; } -// This version is taken from psrm.cpp - -#define PS_LINE_MAX 255 +// This version is adapted from that in psrm.cpp + cset white_space("\n\r \t"); -int ps_get_line(char *buf, FILE *fp, const char* filename) -{ - int c = getc(fp); - if (c == EOF) { - buf[0] = '\0'; - return 0; - } - int i = 0; - int err = 0; - while (c != '\r' && c != '\n' && c != EOF) { - if ((c < 0x1b && !white_space(c)) || c == 0x7f) - error("invalid input character code %1 in `%2'", int(c), filename); - else if (i < PS_LINE_MAX) - buf[i++] = c; - else if (!err) { - err = 1; - error("PostScript file `%1' is non-conforming " - "because length of line exceeds 255", filename); - } - c = getc(fp); - } - buf[i++] = '\n'; - buf[i] = '\0'; - if (c == '\r') { - c = getc(fp); - if (c != EOF && c != '\n') - ungetc(c, fp); - } - return 1; +/* Maximum input line length, for DSC conformance, and options to + * control how it will be enforced; caller should select either of + * DSC_LINE_MAX_IGNORED, to allow partial line collection spread + * across multiple calls, or DSC_LINE_MAX_ENFORCE, to truncate + * excess length lines at the DSC limit. + * + * Note that DSC_LINE_MAX_CHECKED is reserved for internal use by + * ps_get_line(), and should not be specified by any caller; also, + * handling of DSC_LINE_MAX_IGNORED is currently unimplemented. + */ +#define DSC_LINE_MAX 255 +#define DSC_LINE_MAX_IGNORED -1 +#define DSC_LINE_MAX_ENFORCE 0 +#define DSC_LINE_MAX_CHECKED 1 + +static +/********** + * ps_get_line(): collect an input record from a PostScript file. + * + * Inputs: + * buf pointer to caller's input buffer. + * fp FILE stream pointer, whence input is read. + * filename name of input file, (for diagnostic use only). + * dscopt DSC_LINE_MAX_ENFORCE or DSC_LINE_MAX_IGNORED. + * + * Returns (+ve) number of input characters stored into caller's + * buffer, or zero at end of input stream. + */ +int ps_get_line(char *buf, FILE *fp, const char* filename, int dscopt) +{ + int c, count = 0; + static int lastc = EOF; + do { + /* Collect input characters into caller's buffer, until we + * encounter a line terminator, or end of file... + */ + while (((c = getc(fp)) != '\n') && (c != '\r') && (c != EOF)) { + if ((((lastc = c) < 0x1b) && !white_space(c)) || (c == 0x7f)) + /* + * ...rejecting any which may be designated as invalid. + */ + error("invalid input character code %1 in `%2'", int(c), filename); + + /* On reading a valid input character, and when there is + * room in caller's buffer... + */ + else if (count < DSC_LINE_MAX) + /* + * ...store it. + */ + buf[count++] = c; + + /* We have a valid input character, but it will not fit + * into caller's buffer; if enforcing DSC conformity... + */ + else if (dscopt == DSC_LINE_MAX_ENFORCE) { + /* + * ...diagnose and truncate. + */ + dscopt = DSC_LINE_MAX_CHECKED; + error("PostScript file `%1' is non-conforming " + "because length of line exceeds 255", filename); + } + } + /* Reading LF may be a special case: when it immediately + * follows a CR which terminated the preceding input line, + * we deem it to complete a CRLF terminator for the already + * collected preceding line; discard it, and restart input + * collection for the current line. + */ + } while ((lastc == '\r') && ((lastc = c) == '\n')); + + /* For each collected input line, record its actual terminator, + * substitute our preferred LF terminator... + */ + if (((lastc = c) != EOF) || (count > 0)) + buf[count++] = '\n'; + + /* ...and append the required C-string (NUL) terminator, before + * returning the actual count of input characters stored. + */ + buf[count] = '\0'; + return count; } inline void assign_registers(int llx, int lly, int urx, int ury) { llx_reg_contents = llx; @@ -6088,23 +6141,23 @@ inline void assign_registers(int llx, in void do_ps_file(FILE *fp, const char* filename) { bounding_box bb; int bb_at_end = 0; - char buf[PS_LINE_MAX]; + char buf[2 + DSC_LINE_MAX]; llx_reg_contents = lly_reg_contents = urx_reg_contents = ury_reg_contents = 0; - if (!ps_get_line(buf, fp, filename)) { + if (!ps_get_line(buf, fp, filename, DSC_LINE_MAX_ENFORCE)) { error("`%1' is empty", filename); return; } if (strncmp("%!PS-Adobe-", buf, 11) != 0) { error("`%1' is not conforming to the Document Structuring Conventions", filename); return; } - while (ps_get_line(buf, fp, filename) != 0) { + while (ps_get_line(buf, fp, filename, DSC_LINE_MAX_ENFORCE) != 0) { // in header comments, `%X' (`X' any printable character except // whitespace) is possible too if (buf[0] == '%') { if (strncmp(buf + 1, "%EndComments", 12) == 0) break; @@ -6140,11 +6193,11 @@ void do_ps_file(FILE *fp, const char* fi if (offset > 32768 || fseek(fp, -offset, 2) == -1) { last_try = 1; if (fseek(fp, 0L, 0) == -1) break; } - while (ps_get_line(buf, fp, filename) != 0) { + while (ps_get_line(buf, fp, filename, DSC_LINE_MAX_ENFORCE) != 0) { if (buf[0] == '%' && buf[1] == '%') { if (!had_trailer) { if (strncmp(buf + 2, "Trailer", 7) == 0) had_trailer = 1; }