Hi Branden, G. Branden Robinson wrote on Sun, Nov 07, 2021 at 03:06:05AM -0500:
> gbranden pushed a commit to branch master > in repository groff. > > commit 5b7fee5d6392edf90dc1f0fa7d013f36fea5964c > Author: G. Branden Robinson <g.branden.robin...@gmail.com> > AuthorDate: Sun Nov 7 09:40:14 2021 +1100 > > [libgroff,pic]: Use `strsave()`, not `strdup()`. That's a bad idea. The opposite should be done, see below for multiple reasons. > * src/libs/libgroff/font.cpp (font::load_desc): Do it. Also emit > shorter diagnostic if `strsave()` returned a null pointer. > * src/preproc/pic/troff.cpp (troff_output::set_location): Do it. > > groff has for 30+ years used its own alternative to strdup() called > strsave(). The reason for this is not clear to me $ man strdup [...] HISTORY A strdup() macro was first used in the 4.1cBSD debugger, dbx. It was rewritten as a C function for the 4.3BSD inetd(8) and first appeared in the C library of 4.3BSD-Reno. So even though strdup(3) was invented for BSD in 1982, it wasn't available in the C library (even in BSD) before 1990. Other systems may have adopted it even later. I dimly remember having heard that some implementations were buggy in the early 1990ies (hard to believe given how simple the function is - but see below...), and a quick web search actually turned up a hilarious, pre-historic bug report on very ancient 32bit Microsoft Windows: https://fltk-bugs.easysw.narkive.com/ se0lMtx0/fltk-bugs-high-str-1045-use-of-strdup-highly-dangerous When James Clark started the groff project in 1989, he definitely couldn't rely on strdup(3) yet. But it is needless to say that all the above concerns have become irrelecant more than a decade ago, or maybe even two decades ago. POSIX (SUSv2/XPG5) requires strdup(3) since at least 1997. > but the code has been consistent. This change fixes `strdup()`s > I introduced in ee0942bd (2 November) and 423e3c0b (28 September), > respectively. That's definitely making matters worse rather than better: 1. strsave() contains a severe bug. If malloc(3) fails, the function segfaults. So every time you call strsave(), that's a place where groff(1) may crash in a non-deterministic manner. 2. strsave() contains a highly dangerous idiom. When called with a NULL pointer, it does *NOT* crash (as it should!) but instead returns to the caller. That is very dangerous because it can hide bugs in the calling code. Even though the caller is obviously in an invalid state handling invalid pointers, it will prod on doing god knows what. Executing more code in an uncontrolled manner after it is already known that something went wrong is never a good idea. 3. Using your own replacement functions for Standard C and/or POSIX functions is a bad idea because: 3.1. It makes the code harder to read and audit for experienced programmers. Instead of being able to use their knowledge of standards, they have to figure out what your private functions do and how they are meant to be used, slowing down the process of understanding the code and making it more likely that auditors miss bugs when inspecting the code and that maintainers introduce new bugs when trying to change and improve the code. 3.2. Apart from the above observations that rolling your own breeds bugs in the *calling* code, again and again, i have seen that people trying to re-invent standard functionality do so badly und have both design errors and implementation errors in their home-grown stuff. If you doubt my words, see items 1 and 2 above... So, i think the groff codebase should be audited for all uses of strsave() - because every one of them is a bug site. The auditor should understand what every call does, replace it with strdup(3), and test the code afterwards. This needs to be done with the brain switched on and *NOT* with Coccinelle and not in in some other mechanical way because it is likely that many call sites are buggy, too, and that different changes are needed at different sites - for example, some sites will likely need a NULL check while others will not. See below for two examples of why this cannot be done mechanically because the calling code is likely just as wrong as the function strsave() itself. > --- > ChangeLog | 8 ++++++++ > src/libs/libgroff/font.cpp | 11 ++++++++--- > src/preproc/pic/troff.cpp | 2 +- > 3 files changed, 17 insertions(+), 4 deletions(-) > > diff --git a/ChangeLog b/ChangeLog > index 9758a40..f349e1e 100644 > --- a/ChangeLog > +++ b/ChangeLog > @@ -1,5 +1,13 @@ > 2021-11-07 G. Branden Robinson <g.branden.robin...@gmail.com> > > + [libgroff,pic]: Use `strsave()`, not `strdup()`. > + > + * src/libs/libgroff/font.cpp (font::load_desc): Do it. Also > + emit shorter diagnostic if `strsave()` returned a null pointer. > + * src/preproc/pic/troff.cpp (troff_output::set_location): Do it. > + > +2021-11-07 G. Branden Robinson <g.branden.robin...@gmail.com> > + > * src/libs/libgroff/fontfile.cpp (font::open_file): Don't open > user-specified font file names with slashes in them; i.e., don't > traverse directories outside the configured font path. Also > diff --git a/src/libs/libgroff/font.cpp b/src/libs/libgroff/font.cpp > index 2168afa..b7960b7 100644 > --- a/src/libs/libgroff/font.cpp > +++ b/src/libs/libgroff/font.cpp > @@ -1144,7 +1144,7 @@ bool font::load_desc() if (0 == p) { t.error("'papersize' directive requires an argument"); > return false; > } So at this point we know that p is not a NULL pointer. > bool found_paper = false; > - char *savedp = strdup(p); As long as strdup was used, at this point, savedp could be NULL if malloc(3) failed. > + char *savedp = strsave(p); Now that strsave() is used, the logic changed - and i doubt you were aware of that when changing the code. Now savedp can only be NULL if p is NULL, but we know p is not NULL (or else, calling strdup(3) would have been a bug in the first place), so savedp can no longer be NULL. Unless you go fix the bug in strsave(), in which case savedp will be NULL again after malloc(3) failure. See how home- grown stuff complicates things for no benefit whatsoever? > while (p) { > double unscaled_paperwidth, unscaled_paperlength; > if (scan_papersize(p, &papersize, &unscaled_paperlength, > @@ -1157,8 +1157,13 @@ bool font::load_desc() > p = strtok(0, WS); > } > if (!found_paper) { > - t.error("unable to determine a paper format from '%1'", savedp); > - free(savedp); > + if (0 == savedp) > + t.error("unable to determine a paper format"); This is wrong for two reasons. First, savedp cannot be NULL, see above. Second, even if it could, the error message is extremely misleading. The actual problem is memory exhaustion, and you go and tell the user "unable to determine a paper format"? You are sending that poor guy on a wild goose chase! Read on, another bug is visible in this patch... > + else { > + t.error("unable to determine a paper format from '%1'", > + savedp); > + free(savedp); > + } > return false; > } > free(savedp); > diff --git a/src/preproc/pic/troff.cpp b/src/preproc/pic/troff.cpp > index 3ccd681..810067a 100644 > --- a/src/preproc/pic/troff.cpp > +++ b/src/preproc/pic/troff.cpp > @@ -560,7 +560,7 @@ void troff_output::set_location(const char *s, int n) > printf(".lf %d\n", n); > else { > printf(".lf %d %s\n", n, s); > - last_filename = strdup(s); > + last_filename = strsave(s); Error checking is obviously missing here. As long as strdup(3) was used, the error was totally obvious, and no auditor could possibly have missed it even when half asleep: unchecked malloc(3). With strsave(3), the error is much less obvious, and the consequences of the bug changed: before, the program prodded on with invalid data, having stored a NULL pointer in last_filename. Now, it segfaults right away. But the bug survived, it just changed colour. I think i said so before: the groff codebase is teeming with bugs. Look at anything, and it is almost certain that you will instantly find some obvious bugs. Unfortunately, i don't have the time to do the badly needed audit myself. Yours, Ingo > } > }