Re: Unsolicited opinions on C (was: [PATCH v2 0/3] Use countof() instead of its pattern)

Alejandro Colomar Thu, 18 Sep 2025 17:56:51 -0700

Hi Branden,

On Thu, Sep 18, 2025 at 06:44:02PM -0500, G. Branden Robinson wrote:
> Hi Alex,
> 
> Letting off some steam here after going an exhausting ten rounds with
> "asciification" in groff, a process that has consumed the month to date.
> 
> At 2025-09-19T00:24:23+0200, Alejandro Colomar wrote:
> > On Thu, Sep 18, 2025 at 04:32:34PM -0500, G. Branden Robinson wrote:
> > > > When 202601 is out, you'll get streq() and memeq(), and I'll send
> > > > patches for them.  :)
> > > 
> > > Looking forward to that--those should have been in libc in the
> > > 1980s!
> > 
> > Heh!  There are still people in the C Committee who don't like them.
> > Some false purists think that only system calls and other magic
> > functions should be in the standard,
> 
> That seems to me an odd stance for the C committee itself to take, given
> that the standard doesn't provide abstractions for operating system
> services except in an extremely minimal sense; something similar to what
> MS-DOS 1.0 offered (a file system, but one with no hierarchy, just a big
> flat file store with no directories and no file types except "regular",
> except you could open those in "text" or "binary" "modes").


Well, committee members are not required to be self-consistent.  And
most often, they are not.

> > and that convenience functions should go in external libraries (that
> > would exclude every string.h function from libc except for memset(3)

I wanted to say memcpy(3), actually.

> > because of its magic aliasing properties; insane, IMO).
> 
> I think a stronger argument for standardizing memset(3) and memcpy(3) is
> that in early days, the C language itself provided _no_ facility for
> copying anything that wasn't a primitive type.

Technically, we've always had loops and unsigned char.  All the string
library is, is a glorified set of wrappers around loops and chars.
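
To make that concrete, here is a minimal sketch of a memcpy(3)-like
wrapper written with nothing but a loop and unsigned char.  The name
copy_bytes() is hypothetical, and the sketch assumes the two buffers do
not overlap.

```c
#include <stddef.h>

/*
 * Hypothetical name.  Copies n bytes from src to dst, one unsigned char
 * at a time, and returns dst.  The buffers must not overlap.
 */
void *
copy_bytes(void *restrict dst, const void *restrict src, size_t n)
{
        unsigned char        *d = dst;
        const unsigned char  *s = src;

        for (size_t i = 0; i < n; i++)
                d[i] = s[i];

        return dst;
}
```

A real libc would use something much faster than a byte loop, but the
observable behavior is the same.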

I love those wrappers, but find it funny that other people seem to
prefer to see those in external libraries.  Even for implementing libc
you _want_ those, so it's quite cheap to actually provide them to users.

In fact, if gnulib accepted so easily the additions of streq() and
memeq(), it's because they improved the gnulib source code itself.  They
considered them worth it even before having external users.  I wonder if
those voting against having useful functions in <string.h> have ever
tried implementing a libc themselves.

>  If you wanted to copy a
> struct, you had to do it field by field.  I think it was ANSI C that
> made structure copying (by assignment of rvalues to lvalues, both of the
> same struct type) part of the language proper.  A while back Doug and I
> had an exchange where we mused that prior to this, everything you could
> do with a statement (apart from a function call, of course), mapped to a
> bounded and small set of machine instructions in pretty much any ISA.
> That's a nice property for "racing the beam"-style programming and other
> hard real-time problems, but not as much use for general applications.
> 
> That conversation reminded me of the Intel 8080, which had no hardware
> multiplier and no block-memory move/copy instructions.  Everything you
> could do on that machine you could reliably cycle-count prior to
> assembly.  But the Z80, which still had no multiplier but _did_ have
> instructions that could walk up to the entire 64KB address space, fuzzed
> that line a little bit.  (It was still deterministic if the range given
> to instructions like LDIR was what we today call a "constexpr".)
> 
> With multiplication (and of course division), you don't know how many
> cycles you're going to need, and many years later (or maybe right away
> at NSA, FSK, and MSS) people figured out how to use such indeterminacy in
> speed of instruction retirement to exfiltrate secrets.
> 
> Anyway, the Z80 started to eat Intel's lunch.  That made them very
> angry, so they hurried the 8086 to market to punish the entire world,
> at which they've succeeded brilliantly for decades.  How dare the free
> market not lavish one company exclusively with rewards?
> 
> > Some others just think libc functions should have some complexity;
> > adding simple wrappers seemingly doesn't make them feel proud of
> > inventing useless crap; it doesn't make them look smart.  The
> > committee is really something out of a comedy.  And there are others.
> > I could tell stories...
> 
> Why not both?  Why not offer useful primitives _and_ APIs that hide
> complexity in favor of making commonly undertaken operations
> straightforward to perform?

The funny thing is, I'm proposing both.

I have proposed things such as aprintf() --which is essentially Plan9's
smprint(2) but renamed--:
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3630.txt>

I have also proposed functions for checking prefix and suffix:
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3612.txt>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3613.txt>

And I have proposed streq() and memeq():
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3611.txt>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3617.txt>

(among several others).

And aprintf() has high chances of being approved; the prefix and suffix
ones have slightly lower chances, but still some; and streq() faces
strong opposition.  Votes in the last meeting were:

        10 Yes; 16 No; 7 Abstentions

Can you imagine a world where libc provides a function to check if some
string is a prefix of another string, but doesn't provide a function for
comparing for equality?  Because it's too simple to be worth committee
time?
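
For a sense of how small these functions are, here is a sketch that
assumes the obvious definitions in terms of strcmp(3) and memcmp(3).
The exact signatures and wording in the proposals may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the two strings are equal. */
static inline bool
streq(const char *a, const char *b)
{
        return strcmp(a, b) == 0;
}

/* True if the first n bytes of both buffers are equal. */
static inline bool
memeq(const void *a, const void *b, size_t n)
{
        return memcmp(a, b, n) == 0;
}
```

The gain is at the call site: "if (streq(a, b))" reads as a yes/no
question, whereas "if (!strcmp(a, b))" is easy to get backwards.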

> I am reminded of the tired old argument between those who advocate an
> argumentless cat(1) and those who don't.  I think that's really an
> argument between people who want command-line tools that go straight to
> system calls and exercise the kernel with few confounding factors, and
> those who, ya know, actually want to use cat(1) to _do_ something, like
> stitch files together or look at their contents.
> 
> And we _should_ let people have thin wrappers around kernel services if
> they want them.  That helps everybody understand what those services
> are, advertises what they do and don't provide, and eases evaluation of
> the kernel's interface design and performance.
> 
> Both systems programmers and application developers are real people.  If
> you want your language used by both, you must serve the needs of both.
> 
> I think a similar question is at the root of our mild disagreement over
> the respective merits of memset(3) and bzero(3).  I think the former is
> a proper thing to have; it's a nigh-essential service for a language
> runtime to offer.  But you're right that most people developing
> applications want memory cleared to zeros several nines of the time.
> 
> > Ironically, they added memccpy(3) in C23, and it has 0 users in the
> > real world.  That one was probably introduced because it made the
> > committee look smart, because they arrived first at discovering a
> > function they thought useful (hint: it's not).  Too bad that
> > memccpy(3) is as dangerous as strncpy(3).
> 
> It doesn't seem stupid to me; it's a _generalization_ of strncpy().

It is not; it's a different thing.  The problem with memccpy(3) is not
about the kind of input or output, it's about having off-by-one bugs in
its design.

A correct generalized design for memccpy(3), would have been... let's
call it mempcpyc().  I'll describe the semantics, and how it's a better
design:

        mempcpyc(dst, src, c, n):
                Copy bytes from src to dst, until either c is found in
                src (and it is not copied), or n bytes have been copied.
                If c is not found, a null pointer is returned.  If it is
                found, a pointer to one after the last copied byte is
                returned.
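
A minimal sketch of those semantics, built on memchr(3) and memcpy(3).
The function does not exist anywhere; the parameter types are my
assumption, modeled on memccpy(3).

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy bytes from src to dst until c is found in the first n bytes of
 * src (c itself is not copied), or until n bytes have been copied.
 * Returns a pointer to one past the last copied byte if c was found,
 * or NULL if it was not.
 */
void *
mempcpyc(void *restrict dst, const void *restrict src, int c, size_t n)
{
        const unsigned char  *end;
        size_t               len;

        end = memchr(src, c, n);
        len = (end != NULL) ? (size_t) (end - (const unsigned char *) src) : n;

        memcpy(dst, src, len);

        if (end == NULL)
                return NULL;
        return (unsigned char *) dst + len;
}
```

When the delimiter is found, fewer than n bytes were copied, so the
returned pointer is still inside an n-byte dst and can be written to.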

The main difference with memccpy(3) is not copying the delimiter.  This
is crucial, because in many of the very few legitimate users of this
API, that byte is not wanted.  memccpy(3) forces the user to remove it
manually, which is error-prone.

Compare:

A)  Don't want to copy the delimiter.

A.1)    p = memccpy(dst, src, ':', countof(src));
        if (p == NULL)
                goto fail;
        p[-1] = '\0';

        (you can see this in FreeBSD[1])

A.2)    p = mempcpyc(dst, src, ':', countof(src));
        if (p == NULL)
                goto fail;
        *p = '\0';

Every time you write a -1, there are chances that you'll make a mistake.

B)  Want to copy the delimiter.

B.1)    p = memccpy(dst, src, ':', countof(src));
        if (p == NULL)
                goto fail;
        *p = '\0';

B.2)    p = mempcpyc(dst, src, ':', countof(src));
        if (p == NULL)
                goto fail;
        p = stpcpy(p, ":");

In case B, both are similarly okay.  In case A, mempcpyc() is clearly
better.

If you try to implement a string-copying function with memccpy(3),
you'll find there are several places where you have chances of making an
off-by-one bug.
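
As an illustration, here is a sketch of a truncating string copy built
on memccpy(3).  The name copy_trunc() and its return convention are
hypothetical; the comments mark where the off-by-one adjustments are
needed.

```c
#define _XOPEN_SOURCE 700  /* memccpy(3) is an XSI function before C23 */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper.  Copies the string src into dst, which has room
 * for dsize bytes, truncating if necessary.  Returns a pointer to the
 * terminating null byte in dst, or NULL if the string was truncated
 * (or if dsize is 0, in which case nothing is written).
 */
char *
copy_trunc(char *restrict dst, const char *restrict src, size_t dsize)
{
        char  *p;

        if (dsize == 0)
                return NULL;

        /* memccpy(3) copies the '\0' too, and returns one past it. */
        p = memccpy(dst, src, '\0', dsize);
        if (p == NULL) {
                dst[dsize - 1] = '\0';  /* first -1: terminate by hand */
                return NULL;
        }
        return p - 1;                   /* second -1: step back onto '\0' */
}
```

That is two manual -1 adjustments and a special case for a zero-sized
buffer in a dozen lines of code.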

>  Who
> says all memory buffers look like C strings?

I don't.  But the committee sold memccpy(3) as a function for copying
C strings safely.  memccpy(3) is probably the least safe function for
copying strings.  If they hadn't promoted it as such, it would be fine.
But the committee didn't add it for its uses as a niche memory-copying
function.

But if you want this for copying strings, it would be cheap to make sure
the function writes the null terminator afterwards, and also to make it
read a C string.  We could call that higher level API stpcpyc().

A.3)    p = stpcpyc(dst, src, ':');
        if (p == NULL)
                goto fail;

B.3)    p = stpcpyc(dst, src, ':');
        if (p == NULL)
                goto fail;
        p = stpcpy(p, ":");
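
A minimal sketch of what stpcpyc() could look like.  It takes no size
parameter, so it assumes dst is large enough, as stpcpy(3) does.
Leaving dst untouched when the delimiter is missing is my assumption;
the description above does not specify that case.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the string src into dst up to, but not including, the first
 * occurrence of c, and null-terminate dst.  Returns a pointer to the
 * terminating null byte in dst, or NULL if c does not occur in src
 * (in which case dst is not modified).
 */
char *
stpcpyc(char *restrict dst, const char *restrict src, int c)
{
        const char  *end;
        size_t      len;

        end = strchr(src, c);
        if (end == NULL)
                return NULL;

        len = (size_t) (end - src);
        memcpy(dst, src, len);
        dst[len] = '\0';
        return dst + len;
}
```

The returned pointer is where any further text should be appended.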

>  groff's own under-
> documented distinction between these--groff's "strings" are really
> arbitrary memory buffers that can contain interior nulls, and its
> "symbol" type a pretty close match to a C string--has led me to
> appreciate the virtues of making strong and clear contrasts here.
> 
> > Will they ever realize it has no users and that they promoted a
> > function that is unsafe and now starts being used by innocent
> > programmers?
> 
> Better, I think, would be to come up with a label or name for these
> "primitives", and segregate their header files and, insofar as is
> practical, their symbol names in the function name space, which
> resembles the MS-DOS 1.0 file store.

I have a proposal for this (or similar):
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3671.txt>

It doesn't add new headers, because that's realistically not going to
happen.  But it separates the specification of <string.h> into three
subclauses: one for str*(), one for mem*(), and one for strn*(); which
correspond to C strings, bytes, and miscellaneous.

A large part of the committee doesn't see the value in that.  They would
rather remove the entire string library, and replace it by something
along the lines of Annex K.  (Remember Annex K is still in the standard,
and the committee strongly opposes its removal.)

> > Probably not; that's a problem for the next generation of committee
> > members; they'll retire before the fallout.
> 
> Like physics, I guess it progresses one funeral at a time!
> 
> > > wonder by how many orders of magnitude string (in)equality
> > > comparisons exceed string collation order comparisons.
> > 
> > I have numbers in my laptop.  I developed a patch for glibc adding
> > these APIs and then replacing every possible use within glibc itself.
> > When I use my laptop tomorrow, I can check the remaining strcmp()
> > calls compared to streq().  I remember having looked at the ratio, but
> > don't remember the numbers.  I think it was in the hundreds of
> > equality calls per each sorting call.
> 
> If you'd asked me to bet, I'd have wagered at least 2 orders of
> magnitude, yeah.  You probably could have bluffed me into 3.  ;-)

I think it's around 2.5.  3 would still be a fair guess.

> > Well; even during the initial period, the unfamiliarity isn't worse
> > than inventing your own name.  After all, you need to invent a name.
> > :)
> 
> Yes.  It's just that groff is over that hill now.
> 
> > > I don't disagree with the migration; it just seems like an
> > > "eventually" thing to me.
> > 
> > If you have some window of time where you'd apply it, I can have the
> > patch ready for that window.  I guess once you decide to apply it it's
> > a matter of running git-am(1), and forgetting about it.  It should be
> > a moment when your local queue of patches is small, to reduce your
> > rebasing work.  But being a trivial (yet large) patch, it's not
> > something I see very problematic.
> 
> Right, and if another committer wants to shepherd the change through, I
> won't put any stop energy on it, except...

If I had the keys, I'd happily do it.  :-)
(I think I prefer not having the keys; too much responsibility.)

> > The major blocker is bumping gnulib; just let me know when you'll do
> > that.
> 
> ...for that, which is a kick I'd prefer to execute in a release
> management capacity.  But I reckon right after a kick to the 2025-07
> gnulib tag, or right after the 1.24 release are both good times.

Fine; just let me know when you bump it, and I'll rebase my patch.

(And if it's so late that gnulib 202601 is out by then, I'll send some
extra patches.)

> > > Cool!  Ritchie's rolling over in his grave to see C approaching full
> > > language support for container iterators like this.  :P
> > 
> > Actually, I think this is something that was originally devised by
> > K&R, and I'm just filling the gaps.  I can't see another reason they
> > allowed using array notation in parameters, if they didn't want them
> > to behave like that.
> 
> I read recently that a classic old bit of weirdness/cleverness that has
> been widely, but perversely celebrated in C, namely the synonymy of
> `a[5]` and `5[a]`, is slated for the chopping block.  I think I saw
> something about it in a recent GCC commit--something about the rules for
> array decay changing.

Yes, the committee is finally killing that.  One of the few good things
it does.
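
For anyone who hasn't run into it, here is a small sketch of the
synonymy as it stands in C up to C23, where E1[E2] is defined as
(*((E1)+(E2))).  The helper name same_element() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper.  True if a[i], i[a], and *(a + i) all name the
 * same element.  This holds because E1[E2] is defined as
 * (*((E1)+(E2))), and the addition commutes.
 */
bool
same_element(const int *a, size_t i)
{
        return &a[i] == &i[a] && &a[i] == a + i;
}
```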

> The grognards are going to void their bowels about that one.  The
> synonymy doesn't _mean_ anything--all it is is a reflection of the
> symmetry (or maybe commutativity is a better word) of assembly language
> expressions in ISAs that support indexed addressing modes (which is
> every machine I've personally encountered).
> 
> There's no deep meaning to the synonymy, it's confusing to learners, and
> it offers yet another vector for the construction of obfuscated code.
> 
> I have little time for people who boast about the virtues of programming
> in assembly ("portable" or otherwise) while seeming to actually do
> precious little of it.

Cheers,
Alex

[1]

        This is the one and only legitimate call to memccpy(3) I've
        ever seen in the wild.  Everything else was in tests, or in
        code written after C23 standardized it (and that code is often
        just misusing it as a poor man's string-copying function that
        truncates).

        $ grep -rh -B3 -A15 memccpy bin/sh/parser.c 
                        if (fmt[0] != '}') {
                                char *end;

                                end = memccpy(tfmt, fmt, '}', sizeof(tfmt));
                                if (end == NULL) {
                                        /*
                                         * Format too long or no '}', so
                                         * ignore "\D{" altogether.
                                         * The loop will do i++, but nothing
                                         * was written to ps, so do i-- here.
                                         * Rewind fmt for similar reason.
                                         */
                                        i--;
                                        fmt--;
                                        break;
                                }
                                *--end = '\0'; /* Ignore the copy of '}'. */
                                fmt += end - tfmt;
                        }

        As you can see, the fact that it copies the delimiter is
        a design mistake.


-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
