[Sorry, Ralph--it's another doorstop of a message. If it's any consolation, I tried to ensure that all 10 external references were authoritative.]
Hi, Ingo! At 2021-11-08T01:28:19+0100, Ingo Schwarze wrote: > Not so for me; i learned C with a Microsoft Visual Studio compiler > in the late 1980ies. Even though i was coming from the > https://en.wikipedia.org/wiki/HP_9800_series#HPL > programming language, which is not exactly the most beautiful language > either, the experience with the Microsoft Visual Studio C compiler was > frustrating to the point that after that, i avoided the C language for > about a decade and used FORTRAN 77 almost exclusively until the end of > the 1990ies, In my first university programming class we spent about 6 weeks on FORTRAN 77 and 10 weeks on C. I _hated_ FORTRAN's (as it was cased then, before Fortran 90 came along and brought many appurtenances of civilization) fixed-form source and the clumsy I/O handling, but otherwise found the language unobjectionable. My project teammates were all electrical engineers and they were glad that I, an aerospace major, was happy to lead on the Fortran programming project that semester. My first impression of C was also mixed; I saw a lot to like but was alarmed by how easy it was to blow your foot off, and highly irritated by the all but useless compiler diagnostics. It's only relatively recently that I learned[1] that, historically, Unix people have endured terrible diagnostics from compilers of many different sorts of languages because the combination of lex and yacc makes it difficult (I do not say impossible) to produce good ones. When you write a recursive descent parser, you the programmer have a lot of information about the parser's state at any given line of your code, so you can write highly contextual diagnostic messages that are helpful to the user. GNU troff's own input parser is an excellent example of this; _almost_ all input parsing is in input.cpp, but the expression parser is over in number.cpp and when its functions get called, much context is thrown away, so our diagnostic messages are markedly worse in that area. > I very much deplore now that i never used the opportunity to try a > real C compiler when i worked extensively on HP-200 workstations > in the mid-1980ies... :-/ Not long ago I deliberately wrote some bad C for the Version 7 Ritchie compiler under SIMH. I was horrified. You seem to have a meticulous predisposition--coping with the Wild West of pre-ANSI C compilers might have maddened you. From my light sampling I'm pretty sure it would have maddened me. This is the compiler that invented terms that no one had ever heard of before to tell the user what was wrong. (Nowadays, we take "lvalue" almost for granted.) [using strdup() by instinct] > That was a *very* reasonable thing to do; be proud of it rather > than ashamed! Thanks! > [...] > > At 2021-11-07T14:57:28+0100, Ingo Schwarze wrote: > > >> 2. strsave() contains a highly dangerous idiom. When called with > >> a NULL pointer, it does *NOT* crash (as it should!) but instead > >> returns to the caller. > > > I'm not quite with you here. That's what strdup() and malloc() > > themselves do. > > I think you misunderstood what i tried to say. If you call > strdup(NULL), the result is unspecified by POSIX. Implementations > usually segfault, and that is what i think they should indeed do. In > any case, calling strdup(NULL) is a severe bug in any program. Agreed. > If the intention of strsave() was to replace strdup(), then > the correct implementation would have been > > size_t sz = strlen(s) + 1; > char *p = malloc(sz); > if (p != NULL) > memcpy(p, s, sz); > return p: I've committed a similar change[2]. > That does not make any sense to me. Calling strdup(NULL) is a > programming bug. You must not attempt to write code to handle > programming bugs. The purpose of error handling in code is to handle > run-time failures (like unsuitable input data or memory exhaustion), > not to handle programming bugs. Yes...you might have overlooked that in my addition of strdup() to font.cpp, its argument had already been checked for nullity since it was returned by strtok()[3] and had to be validated for that reason. > Not really. Proper error handling in well-designed application code > absolutely does not need C++ exceptions. Hmm. Now that I've switched off groff's own allocator by default[4], I'm going to _have_ to handle the `Bad_alloc` exceptions that the language's `new` operator throws. Stroustrup bangs the exception drum pretty hard in Chapter 14 of the special edition of _The C++ Programming Language_, and while my own instincts are fairly aligned with yours, exceptions are an idiomatic C++ feature of long pedigree, and the language runtime assumes you're either going to handle them or be content to have your program SIGABRT. > It is true that POSIX allows printf(3) to fail with ENOMEM, but unless > you use %ls, that's a theoretical issue. I don't think real-world > implementations allocate memory if you refrain from relying on wide > character to multibyte character conversions. Why should they? I don't trust implementors of *printf() on random systems, and moreover, this deep in the groff diagnostic handling function stack, we don't really need anything that *printf() is offering us. While I'm throwing out heretical opinions, I'll venture that K&R erred when they taught the whole world their Hello, World program. It should have called `puts()` instead. Having promoted the language in fulsome terms for its smallness and terseness, they advised every novice to the language to reach for an unnecessarily big tool on their first day. (If this was done to direct comparison shoppers' attention to C's contrast with FORTRAN 77 in the area of formatted output, it was a shrewd move. But I lament the cost.) Ralph pointed out that *puts() implementors might pointlessly allocate memory, too, and that's fair, up to a point--if you have anything less than complete trust in a library, you'll have to do things yourself. But in my opinion *puts() has no _excuse_ to ever allocate memory except from the stack. So I think it's a good discipline to avoid *printf() in contexts where we might be bailing out of the program after a memory allocation failure. Maybe I'm too scarred by writing signal handlers... > > Right away I see that `do_error_with_file_and_line`[...], > > I think printing the names of source code files, the names of > functions in the source code, or line numbers of call sites in source > code files is bad practice. Such information is totally irrelevant > for normal users, it is confusing and distracting, and it just makes > the program look as if it were written for nerds rather than for > users. It does not even help developers. If you are hunting bugs, > sometimes the functions to look at are obvious anyway if you know the > code base in question, and in other cases, you usually need a debugger > anyway. I think you misunderstand. The "file" and "line" here are not of the groff source program, but of the input file being processed. That's why they're passed in as arguments[10]. (Even `source_filename` is not what you might think; it is used in diagnostics produced by some output drivers to map the formatted output back to the document that was used to produce it.) In my view it can indeed be helpful to know where in the user-supplied input you were when memory was exhausted, because that input might be to blame. While testing this stuff I amused myself with scenarios like this. $ mkdir -p build/font/devhogg && printf 'res 240\npapersize A' \ >| build/font/devhogg/DESC && dd if=/dev/zero bs=1M count=4 \ | tr '\000' 'B' >> build/font/devhogg/DESC And then turned "groff -Thogg" loose in a shell where "ulimit -v" had been used to force a OOM condition. (I am a little appalled that groff requires 16MB just to get started.) > > and with a few slight changes, can be invoked without itself, > > I don't think i understand what you mean by "without itself". Sorry, that was a botch. I was trying to say: Right away I see that `do_error_with_file_and_line`[...], if called carefully enough through the layers of diagnostic functions above it, and with a few slight changes, can be invoked without itself dynamically allocating memory, nor letting the functions it calls (like `errprint()`) make recourse to dynamic memory allocation. I've since pushed a change to try to achieve this[5], though in fairness to Ralph, we remain at the mercy of a libc whose fputs or fputc mallocs a meg of memory. > I don't see yet what in errprint() might allocate memory. Of course, > src/libs/libgroff/itoa.c is a shitshow (even if it is correct, which i > did not check). Just using printf("%i%u") would be so much clearer, > cleaner, and reduce the risk of bugs. It _seems_ to be all right. I'd be thrilled if someone wanted to write a unit test for it. Of course I made a typo in the commit message. The document, now ratified by WG14[7], that will add strdup to C23, is N2353[8]. > > I see a few tasks arising from your mail. > > > > 1. Add an Autoconf check for strdup(). > > No, please don't. Just use strdup(3), period. Seriously, POSIX > has been requiring it for almost 25 years now. Sorry, already done--it was easy and low impact[6]. > > 2. Revert the commit, leaving only 125 strsave() call sites. > > Heh. :-) > > But do fix the bugs in the code touched by that commit. ;-) I believe I have now[9]. > > 3. Revise do_error_with_file_and_line() to not use *printf(). > > Why? It does not use %ls. So what exactly is the problem? > > You don't really fear that > > fprintf(stderr, "%s:", FOO); > > might somehow end up in malloc(3), or do you? > > Of course, using fputs(3) would be fine, too. Yes, I have that fear, so that's what I did--discussed above. > > 4. Test return value from strdup(); call fatal() if it's a null > > pointer. > > Right. > The right idiom probably is: > fatal("%1", strerror(errno)); > unless you wan to clean up all that overengineered mess in and around > the file src/libs/libgroff/error.cpp and use standard facilities > instead. > > Note that > fatal("strdup: %1", strerror(errno)); > would be pointless. I have no intention of exposing the name of the library function that failed in a user-facing diagnostic. It unnecessary ties our hands as implementors, for one thing. For another, many users will either not understand or not care what we mean. I followed my usual practice of writing a uniquely worded, human-readable diagnostic message which suffices all by itself to locate the line of groff source that threw it (provided you also know the version of groff in use, but that goes just as well for the "srcfile:lineno" message variety you dislike). However, right now these helpful diagnostics will _not be reached_ unless an invariant (don't pass strdup() a null pointer) is already violated. The only failure mode for strdup is allocation failure and since we don't have an exception handler, the C++ runtime will signal itself and kill us first. This returns us to the dilemma I described at length above. We can either: * Use a nonstandard allocator; or * Handle exceptions. Exceptions are so deeply embedded into ISO C++98, and I suppose _all_ C++ compilers that will ever touch this version of groff, that there is no third choice. Does someone have a better imagination? > It is completely irrelevant for the user whether > it was malloc(3) or strdup(3) or whatever other allocating function > that failed. I agree with this part. > All that matters is "Cannot allocate memory". I reiterate that for an input processor like groff, knowing where you were in the input when things went pear-shaped _can_ be helpful. Admittedly, in old-fashioned scenarios like a time-sharing machine, you might suffer OOM errors through no fault of your own. Regards, Branden [1] https://minnie.tuhs.org/pipermail/tuhs/2021-July/023937.html [2] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=39745aee1c07cf0fb0b21cf0bacfc9946fd9e9f0 [3] https://git.savannah.gnu.org/cgit/groff.git/commit/src/libs/libgroff/font.cpp?id=ee0942bdecdef59b202d84dc2a754d04288c9abd [4] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=5151526cfd1aec0222ec2b03ddc725ac9eb803d0 [5] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=9a03816123eff2f840db444243e5ba0acd0995b4 [6] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=22a27624ecd9799497104ba1b0856a7264e338b8 [7] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf [8] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2353.htm [9] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=f6671243d873b147947125970895f34ca5201cb5 [10] https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/error.cpp#n33
signature.asc
Description: PGP signature