On Fri, Jun 20, 2025 at 01:45:28AM +0200, Alejandro Colomar wrote: > On Fri, Jun 20, 2025 at 01:38:14AM +0200, Alejandro Colomar wrote: > > Hey Eric! > > > > Thanks a lot for the detailed reply! Comments below. > > > > On Thu, Jun 19, 2025 at 01:31:06PM -0500, Eric Blake wrote: > > > On Wed, Jun 18, 2025 at 09:04:02PM +0200, Alejandro Colomar wrote: > > > > Hi Rich, Elliott, > > > > > > > > On Wed, Jun 18, 2025 at 12:35:50PM -0400, Rich Felker wrote: > > > > > On Wed, Jun 18, 2025 at 11:20:54AM -0400, enh wrote: > > > > > > On Tue, Jun 17, 2025 at 5:58 PM Alejandro Colomar <a...@kernel.org> > > > > > > wrote: > > > > > > > > > > > > > > Hi Elliott, Florian, > > > > > > > > > > > > > > glibc and Bionic are non-conforming to POSIX.1-2024. The fix > > > > > > > that we're > > > > > > > proposing would make them conforming. Does conformance to > > > > > > > POSIX.1-2024 > > > > > > > mean something to you? > > > > > > > > > > > > not when POSIX screwed up and made a change that made most of the > > > > > > existing implementations non-conformant, no. that sounds like a > > > > > > POSIX > > > > > > bug to me... > > > > > > > > Not most. Only two POSIX implementations, plus Windows. And the > > > > solution is easy: fix the implementations. There have been no > > > > regression reports in gnulib since we fixed it last year. > > > > > > Speaking as someone who participated in the POSIX standardization > > > process, I'm trying to pinpoint exactly which statements of which > > > versions of which standards you are claiming as nonconformance. > > > > > > First, a disclaimer: because this thread has been very vocal, I > > > brought the topic up in today's Austin Group meeting. 
The members of > > > the group on the phone call remember _specifically_ trying to permit > > > existing glibc behavior (where realloc(p, 0) does NOT allocate), while > > > still juggling competing wording from the C standards, although we will > > > be the first to admit that we would not be surprised if the resulting > > > efforts are still not clear enough to be unambiguous. I mentioned in > > > the meeting that I would attempt to follow up on these threads to see > > > what, if anything, the Austin Group may need to do to assist in the > > > discussion. > > > > Thanks! I'm in Denver for the Open Source Summit and LSS. If you'll > > be around next week, we can have a chat in person, which might be more > > useful. I'd like to have a long conversation about this. > > > > > Next, my overarching question. Is this about "realloc(non_null, 0)", > > > "realloc(NULL, 0)", or both? As the two are very distinct, I want to > > > make sure we are talking about the same usage patterns. For the rest > > > of this email, I'm assuming that your complaints are solely about > > > "realloc(non_null, 0)" - if I'm wrong, it may change the analysis done > > > below. > > > > Yup, it's about realloc(non_null, 0). r(NULL,0) is fine. > > > > > Now, on to some code archeology. In today's glibc source code, I see > > > this telling comment in malloc/malloc.c, making it clear that glibc > > > folks are aware that realloc(non_null, 0) has two useful behaviors, > > > and that glibc picks the behavior that does NOT behave consistently > > > with malloc(0), because of back-compat guarantees: > > > > > > /* > > > The REALLOC_ZERO_BYTES_FREES macro controls the behavior of realloc (p, > > > 0) > > > when p is nonnull. If the macro is nonzero, the realloc call returns > > > NULL; > > > otherwise, the call returns what malloc (0) would. In either case, > > > p is freed. Glibc uses a nonzero REALLOC_ZERO_BYTES_FREES, which > > > implements common historical practice.
> > > > > > ISO C17 says the realloc call has implementation-defined behavior, > > > and it might not even free p. > > > */ > > > > That comment is wrong. "common historical practice" is that realloc(3) > > is consistent with malloc(3). This is true since the days of Unix V7. > > I don't know what they were referring to. Maybe the behavior introduced > > in SysVr2's -lmalloc which was later standardized in the SVID by AT&T? > > That was never common, since all existing default (-lc) realloc(3) > > implementations behaved as if realloc(p, 1). You had to use the > > -lmalloc library to get it to return NULL and free the object. > > > > See <https://nabijaczleweli.xyz/content/blogn_t/017-malloc0.html>. > > > > > More reading: https://www.austingroupbugs.net/view.php?id=400 shows > > > where earlier POSIX missed that C90 to C99 changed what was permitted > > > (and apparently in a way to render glibc's implementation > > > non-conforming), and that's part of what drove the POSIX folks to ask > > > the C standard to improve the wording. POSIX 2024 is based on C17, > > > but Nick Stoughton was regularly communicating between both C and > > > POSIX groups on what wording(s) were being floated around, in order to > > > try and make it so that glibc would not have to change behavior, but > > > at the same time trying to make it possible for applications to be > > > able to make wise runtime decisions on how to use realloc that would > > > not leak memory or risk dereferencing a NULL pointer if not careful. > > > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=12547 shows where > > > glibc has, in the past, refused to change behavior on the grounds that > > > the standards were buggy. If the standards are still buggy, the best > > > course of action is to open a bug against them. > > > > All standards since C89 have been buggy.
If you are pedantic reading > > C89, the BSDs and all the historic implementations back to the original > > Unix V7 are non-conforming: > > > > <https://port70.net/~nsz/c/c89/c89-draft.html#4.10.3.4> > > > > Which says: > > > > | If size is zero and ptr is not a null pointer, the object it points to > > | is freed. > > > > It's not clear whether this means that the whole action of realloc(p,0) > > is to free(3) the pointer, or if it can also allocate a new object. > > Under the former interpretation, the standard is at odds with reality. > > Under the latter interpretation, I'd interpret it as saying that > > realloc(p,0) cannot fail (and thus must free(p)), which would be an > > interesting guarantee. I guess we'll never know what was the intended > > reading. > > > > C99 changed the specification, probably because of how ambiguous it was. > > > > glibc was also buggy, as it differed from every other Unix-like system. > > All Unix systems behaved as if free(p) and malloc(n). glibc is the only > > one that didn't follow this obvious consistency rule. > > > > So, both are bogus. > > > > > Also relevant are these documents > > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2438.htm > > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf > > > > > > > > > (like i said, i care greatly about actual shipping code. a standard > > > > > > is > > > > > > interesting for green-field stuff, but when it's at odds with > > > > > > reality > > > > > > it's often worse to try to adapt than just ignore the > > > > > > stupidity/report > > > > > > the bug and get it changed back.) > > > > > > > > It's ironic that the standard should have never said that, because prior > > > > to the existence of ANSI C and POSIX, all existing systems behaved like > > > > the current POSIX specification. 
It was a consequence of the horrible > > > > wording of the standards, that glibc was written so badly, by following > > > > a bogus specification, when it should have been made compatible with the > > > > existing systems. > > > > > > POSIX was originally released in 1988, before C90. glibc 1.0 came out > > > in 1992. I am not sure when glibc first cared about whether trending > > > towards POSIX compliance mattered, although I do know that in the > > > early days, Ulrich would very adamantly argue along the lines of > > > (paraphrased) "if the standards don't match common sense, then we > > > don't care about the standards". > > > > It seems Ulrich didn't follow that in this case. I don't know who wrote > > the original realloc(3) in glibc. Was it RMS? It would be interesting > > to know how they came up with that implementation. If anyone knows who > > wrote it and why, please CC them. > > > > I don't have a copy of POSIX.1-1988, nor of any other POSIX.1 before > > POSIX.1-2001. What do they say for realloc(3)? > > > > > > Thus, this is a historical bug in ISO C, POSIX, which at least has been > > > > finally fixed in POSIX. > > > > > > The fact that the wording has changed across multiple versions of C > > > and POSIX is indeed evidence that getting a specification that people > > > are happy with is difficult. What is harder is the decision of > > > whether the bug is in the standard (for not documenting reality) or in > > > the implementations (for not doing what the standard rightfully > > > requests), or even both.
And "what people are happy with" differs on > > > who you ask - wording that permits disparate libc behavior is nicer to > > > the libraries (they don't have to change) but meaner to application > > > writers (the construct is not portable, so it is safer to avoid the > > > construct altogether rather than worry about which libraries have > > > which behaviors); whereas wording that locks down behavior is nicer to > > > applications (if I write this, it should work regardless of platform, > > > and if it doesn't, the standard exists as leverage to get libc fixed) > > > but meaner to libraries (forcing the library to version its symbols to > > > change behavior for newer standards while still providing ABI > > > stability guarantees to older apps that depend on the old behavior is > > > not cheap). > > > > As can be seen from the change in gnulib, the only possible issues from > > migrating from the current glibc behavior to the musl behavior is a few > > leaks in cases where the programmer calls realloc(p,0) ignoring the > > return value. Those leaks would leak 0 bytes plus the metadata. > > > > A solution for those leaks would be to add a diagnostic for calls to > > realloc(3) where the return value is unused. And even if those aren't > > fully solved, they're leaks of a few bytes. There's nothing that should > > cause real issues. > > > > But the glibc maintainers mentioned that they're investigating about it > > in distros, so I guess we'll eventually have the results of their > > investigation. 
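The "avoid the construct altogether" option mentioned above can be sketched as a small wrapper that never passes size 0 to realloc(3). This is only an illustration under stated assumptions: the name realloc_min1() is hypothetical (it is not taken from glibc, musl, or gnulib), and it simply asks for 1 byte whenever 0 is requested, which is the "as if realloc(p, 1)" behavior described earlier in the thread for the historical -lc implementations.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * realloc_min1() is a hypothetical helper, not part of any libc.
 *
 * It never calls realloc(3) with size 0, so its behavior does not
 * depend on how a given C or POSIX version specifies realloc(p, 0).
 *
 * On success, it returns a non-null pointer that the caller must
 * eventually pass to free(3).  On failure, it returns NULL and, as
 * with any failed realloc(3) of a non-zero size, the old object is
 * left untouched.
 */
void *
realloc_min1(void *p, size_t size)
{
	return realloc(p, size ? size : 1);
}
```

A caller that used realloc(p, 0) as another spelling of free(p) and ignored the result would leak 1 byte plus metadata through this wrapper, which is the same kind of small leak discussed above.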
> > > > > The sad fact of the matter is that _because_ there are so many > > > differences in opinions, the C23 action of making realloc(p,0) > > > undefined is probably the simplest course that could be agreed on > > > (don't ever do that in your code, because you can't guarantee the > > > results), but simultaneously annoying to end users (because it is > > > undefined, rather than implementation-defined or unspecified, a > > > compiler can "optimize" your code to do WHATEVER IT WANTS - which > > > really means you CANNOT ever reliably call realloc(p,0) if your > > > compiler is aiming for C23). > > > > Indeed. I think the move from C17 to C23 was good. > > > > The issue with C17 is that it is very similar to POSIX.1-2008, but since > > ISO C doesn't require that errno is set when the pointer is not freed, > > it is impossible to portably determine if the input pointer was freed > > after realloc(p,0). This is not an issue in POSIX.1, though, since it > > can and does require that errno is set if the input pointer is not > > freed. > > Self correction: POSIX.1-2008 .. POSIX.1-2024 does allow setting errno > and freeing the input pointer, as Paul Eggert reminded. AIX does this. > This is brain damaged, and makes it also impossible to portably > determine whether the pointer was freed after realloc(p,0). > > Thus, declaring it UB in POSIX.1 would also be an improvement.
Hmm, self-correction correction: AIX sets errno to EINVAL, not ENOMEM,
and thus it is still possible to portably handle this, I guess?  I don't
promise this works, but it could work:

	errno = 0;  // glibc returns NULL without setting errno
	new = realloc(old, 0);
	if (new == NULL) {
		if (errno == ENOMEM) {
			// Allocation failed; old was not freed.
			free(old);
			goto fail;
		}
		// errno is EINVAL (AIX) or still 0 (glibc); old was freed.
		goto fail;
	}
	free(new);

> > > > > Because it was impossible to determine whether r(p,0) has freed p after > > returning NULL in C17, it was effectively UB. So, I consider C23 to be > > a minor change from C17, and one which clarifies that it is UB, because > > it already was before. > > > > POSIX.1 is not limited by this limitation of ISO C. > > > > > On top of that, the POSIX standard usually defers to a (fixed version) > > > of C, but does have the liberty to impose well-defined behavior even > > > where the corresponding C standard left things undefined (for example, > > > POSIX 2017 was able to demand that a POSIX system can cast function > > > pointers to and from void* in order to implement dlsym(), even though > > > C99 said that was undefined). Put another way, just because C23 has > > > changed realloc(p,0) to be undefined does NOT require a future version > > > of POSIX to do likewise when it finally moves to a newer C than C17. > > > But at the same time, POSIX is unlikely to make things strict if it > > > risks alienating existing implementations; if glibc changes behavior, > > > that would go a long way towards POSIX changing wording to be > > > stricter. > > > > Indeed; I think POSIX.1 doesn't need to make this undefined, and > > shouldn't. > > > > > > BTW, the same text is present in POSIX.1-2017. It was changed in a TC, > > > > following bug <https://www.austingroupbugs.net/view.php?id=400>. > > > > > > > > The motivation, from what I can read there, seems to be that C99 already > > > > made POSIX.1 non-conforming, and this fix was intended to conform to > > > > C99. > > > > > > > > Indeed, glibc is non-conforming to C99 too.
Although, I don't like the > > > > wording from C99, either; it allows weird stuff: it allows an > > > > implementation where malloc(0) returns NULL and realloc(p,0) non-null > > > > (so, the opposite of glibc). > > > > > > > > C11 is essentially identical to C99 in that regard, so glibc is also > > > > non-conforming to C11. > > > > > > > > C17 changed to something very weird. It seems to me that glibc is > > > > conforming again to C17, but it also seems to me that it's impossible to > > > > write code that uses realloc(p,0) in a portable way with this > > > > specification. I think it's a good thing that C23 removed that crap. > > > > > > > > > > > > Here's a summary of conformance to standards: > > > > > > > > glibc conforms to: > > > > - SysVr4 > > > > - ISO C89 > > > > > > I don't (currently) have a copy of C89 handy in front of me to quote > > > chapter and verse for this one, beyond what was already quoted in > > > Austin Group bug 400. > > > > Here's the draft, in various formats: > > > > <https://port70.net/~nsz/c/c89/> > > > > > > - ISO C17 > > > > > > This one I _can_ quote. 7.22.3.4: > > > > > > "If size is zero and memory for the new object is not allocated, it is > > > implementation-defined whether the old object is deallocated." > > > > > > glibc documents that "realloc(non_null, 0)" deallocates non_null. > > > Therefore it is compliant. But that wording is still unfriendly to > > > users - there is no way to programmatically query the runtime what > > > behavior the implementation defined. > > > > > > - XPG4 > > > > > > > > glibc doesn't conform to: > > > > - SysIII > > > > - SysV > > > > - SysVr2 > > > > - SysVr3 > > > > - SVID Issue 2 > > > > - SVID Issue 3 > > > > - The X/Open System V Specification > > > > - ISO C99 > > > > > > Here, section 7.20.3.4 is relevant. In there, I see wording "If ptr is > > > a null pointer, the realloc function behaves like the malloc function > > > for the specified size." 
but NO wording about when ptr is non-null but > > > size is 0. As best I can tell, silence on the part of C99 means that > > > the standard is unspecified, and therefore glibc can do whatever it > > > wants and still claim compliance. But I'm open to correction if you > > > can quote the exact statement for why you claim glibc is non-compliant > > > here. > > > > <https://port70.net/~nsz/c/c99/n1256.html#7.20.3.4> > > > > I'll quote the entire text: > > > > Description > > > > 2 The realloc function > > deallocates the old object pointed to by ptr and > > returns a pointer to a new object that > > has the size specified by size. > > [...] // talks about the contents > > > > 3 If ptr is a null pointer, [...]. > > Otherwise, > > if ptr does not match a pointer earlier returned by > > the calloc, malloc, [...], the behavior is undefined. > > If memory for the new object cannot be allocated, > > the old object is not deallocated and its value is unchanged. > > > > Returns > > > > 4 The realloc function returns a pointer to the new object > > (which may have the same value as a pointer to the old object), > > or a null pointer if the new object could not be allocated. > > > > IMO, paragraph 4 rules out the possibility of returning a null pointer > > on success. > > > > Also, while it doesn't specify what happens in the case of size 0 > > explicitly, it mentions in paragraph 2 what happens for all sizes: > > it returns a pointer to a new object that has the size specified by > > size --which in this case is 0 bytes--. > > > > This wording of C99 was relatively good, and fixes the problems from > > C89 which had turned all historical implementations into > > non-conformance. C99 seems to restore the common historical behavior of > > realloc(3), turning glibc non-conforming as a consequence. > > > > > > - ISO C11 > > > > > > This wording appears to match C99. > > > > Agree. 
> > > > > > - POSIX.1-2001 > > > > > > This one defers to C89 anywhere that it is not explicitly documenting > > > with CX shading. > > > > Ahh, I had thought it would defer to C99 because it's older, but I guess > > it's like POSIX.1-2024 that doesn't defer to C23. Thanks! Then I stand > > corrected, and glibc conforms to POSIX.1-2001. > > > > > It adds CX shading to document the use of > > > errno=ENOMEM on allocation failures, but otherwise omits shading when > > > it states: > > > > > > "If size is 0 and ptr is not a null pointer, the object pointed to is > > > freed." > > > > > > which sounds like glibc behavior. But without double-checking C89, it > > > is hard to say whether POSIX accidentally diverged from C89 in > > > allowing glibc as compliant. > > > > > > > - POSIX.1-2008 > > > > > > This version of POSIX defers to C99, but still states "If size is 0 > > > and ptr is not a null pointer, the object pointed to is freed." > > > > I don't have a copy of POSIX.1-2008, but I assume the text is identical > > to POSIX.1-2001, except that it now defers to C99. Since C99 rules out > > the possibility of returning a null pointer on success (7.20.3.4p4), > > and POSIX.1-2008 doesn't seem to have shaded text to extend it, it is > > bound by the C99 restriction. The allowances provided by POSIX.1-2008 > > are invalidated as unintentional. > > > > > without CX shading, even though C99 does NOT have the same wording as > > > C89. You could argue that this statement should be ignored since it > > > lacks CX shading and does not match any statement in C99. > > > > Indeed. > > > > > But even > > > so, unless you can demonstrate chapter-and-verse how glibc fails to > > > comply with C99, you also have a hard time convincing me that glibc > > > does not comply with POSIX 2008. And this issue was why Austin Group > > > bug 400 was created. > > > > I already mentioned it above, but I'll copy for completeness: > > > > n1256::7.20.3.4p4.
> > > > <https://port70.net/~nsz/c/c99/n1256.html#7.20.3.4p4> > > > > The realloc function returns a pointer to the new object > > (which may have the same value as a pointer to the old object), > > or a null pointer if the new object could not be allocated. > > > > This seems to preclude the possibility of returning NULL on success. > > > > Also, this sentence is complemented by n1256::7.20.3.4p3, last sentence: > > > > If memory for the new object cannot be allocated, > > the old object is not deallocated and its value is unchanged. > > > > This sentence rules that if the implementation could consider that their > > returning a null pointer is because they decide that they can't > > allocate 0 bytes (this would be a valid interpretation), then they are > > forced to leave the pointer not deallocated. glibc frees the object, > > and thus it is not complying with this, and we must consider that glibc > > has succeeded in the allocation, which brings us back to p4. > > > > > No mention of POSIX.1-2013? > > > > I didn't have a copy of that. Thanks! I'll add it to the list of > > non-conforming standards. > > > > > But just in case you're keeping track, > > > that is the version where Bug 400 was applied, and the text changed > > > to: "If the size of the space requested is zero, the behavior shall be > > > implementation-defined: either a null pointer is returned, or the > > > behavior shall be as if the size were some non-zero value, except that > > > the returned pointer shall not be used to access an object." But it > > > also has the problem that it requires "If size is 0, either: A null > > > pointer shall be returned <CX>and errno set to an > > > implementation-defined value</CX>. ..." > > > > > > which glibc does NOT comply with. realloc(non_null,0) returns NULL > > > _without_ setting errno, precisely because it DID free the object > > > successfully. 
This requirement in POSIX 2013 is an explicit extension > > > not mentioned in C99, AND it was quickly pointed out that it forbids > > > glibc behavior, so: > > > > > > > - POSIX.1-2017 > > > > > > This one additionally applies Bug 526 and 688, to try and clean up > > > wording differences from C99, in particular clarifying whether errno > > > has to be set when "realloc(non_null, 0)" frees a pointer: > > > > > > https://www.austingroupbugs.net/view.php?id=526 > > > https://www.austingroupbugs.net/view.php?id=688 > > > > > > where the wording is once again relaxed to "If the size of the space > > > requested is zero, the behavior shall be implementation-defined: > > > either a null pointer is returned, or the behavior shall be as if the > > > size were some non-zero value, except that the behavior is undefined > > > if the returned pointer is used to access an object. ... If size is 0, > > > either: A null pointer shall be returned <CX>and, if ptr is not a null > > > pointer, errno shall be set to an implementation-defined value</CX>." > > > > > > which should once again allow glibc to be deemed compliant. At the > > > same time, the Austin Group was trying to get C17 fixed; that fix > > > turned out to be ugly, so the C committee tried again in C23. > > > > The last sentence clearly states that if size is 0 and ptr is non-null > > and a null pointer is returned, then *errno shall be set*. glibc > > doesn't set errno, and thus does not conform. Can you please clarify > > how you consider glibc's behavior to comply with that last sentence from > > your quote? For completeness, the sentence I'm talking about is > > > > If size is 0, > > either: A null pointer shall be returned <CX>and, if ptr is not a null > > pointer, errno shall be set to an implementation-defined value</CX>." > > > > > > - POSIX.1-2024 > > > > > > Here, the standard defers to C17 rather than C99, but adds a lot more > > > CX shading.
Given the changes between C99 and C17, POSIX tried to > > > match. Unfortunately, the DESCRIPTION section lost any mention of > > > non_null pointer plus 0 size, leaving only the RETURN VALUE section, > > > which now uses the wording entirely in CX shading: > > > > > > "If size is 0, or either nelem or elsize is 0, either: • A null > > > pointer shall be returned and, if ptr is not a null pointer, errno > > > shall be set to [EINVAL]. • A pointer to the allocated space shall be > > > returned, and the memory object pointed to by ptr shall be freed. The > > > application shall ensure that the pointer is not used to access an > > > object." > > > > > > Despite the efforts of the Austin Group to not break back-compat, that > > > one clearly looks like glibc is not compliant (glibc returns NULL and > > > does NOT set errno to EINVAL). And if I recall the conversations, we > > > knew at the time of POSIX 2024 that C23 would be marking > > > realloc(non_null, 0) as undefined behavior, and wanted to capture that > > > directly rather than depending on C17, but we may have failed in our > > > efforts. > > > > TBH, I think POSIX.1-2024 has a decent specification. I'd prefer the > > one from C99 and C11, but it is decent. > > > > > > Conformance to POSIX.1-2001 and POSIX.1-2008 is not clear. While glibc > > > > conforms to the wording of these standards, these standards have the > > > > following header in the realloc(3) specification: > > > > > > > > The functionality described on this reference page is aligned > > > > with the ISO C standard. Any conflict between the requirements > > > > described here and the ISO C standard is unintentional. This > > > > volume of IEEE Std 1003.1-2001 defers to the ISO C standard. > > > > > > > > Which means that POSIX's permissive wording is unintentional, and the > > > > ISO C99 wording is the one that matters, so glibc is non-conforming. > > > > > > The conflicts are unintentional only when <CX> shading is not > > > explicitly present.
> > > > And POSIX.1-2001 .. POSIX.1-2008 doesn't have any CX shading. > > > > > > (I didn't mention C23, since it's UB, so anything conforms.) > > > > > > > > > > > > Have a lovely day! > > > > Alex > > > > > > > > -- > > > > <https://www.alejandro-colomar.es/> > > > > > > If you've managed to make it this far, congratulations. We probably > > > still need to open bugs against POSIX to have POSIX-2024-TC1 improve > > > any ambiguous wording, and taking into account whatever the C > > > committee may decide to do with Alejandro's proposals for post-C23 > > > behaviors, and whether glibc is willing to make realloc(non_null, 0) > > > allocate in the same manner as malloc(0) rather than being a hidden > > > call to free(). > > > > As always, I have trouble with using the Austin group interface. If > > you're in Denver, this is another thing you could help me with. :) > > > > > I don't know if I answered all of your questions, or raised even more, > > > but you have your work cut out for you before declaring the man pages > > > good enough. > > > > Again, thanks a lot!! > > > > > > Have a lovely day! > > Alex > > > > -- > > <https://www.alejandro-colomar.es/> > > > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/>