Hi!

Here's a new revision addressing the suggestions by Eric in v4.  I've
added a new subsection in the Rationale explaining why not go the other
way around, as people keep suggesting that every now and then.

I'll submit this version to the C Committee today, since there seems to
be more consensus now, and the recent iterations have seen only minor
wording improvements, but no major changes.  Of course, we can continue
improving the paper, though.  Please suggest any improvements you may
consider appropriate.

It would be good to have explicit replies by glibc maintainers about it,
so that the C Committee understands better what the maintainers think
about it.  I've got word from some committee members that if I can
convince the maintainers, they'll vote for standardizing it.  So, it
would be great if people could emit 'Acked-by:' tags, or otherwise
explain their position.


Have a lovely day!
Alex

---
Name
        alx-0029r5 - Restore the traditional realloc(3) specification

Principles
        -  Uphold the character of the language
        -  Keep the language small and simple
        -  Facilitate portability
        -  Avoid ambiguities
        -  Pay attention to performance
        -  Codify existing practice to address evident deficiencies
        -  Do not prefer any implementation over others
        -  Ease migration to newer language editions
        -  Avoid quiet changes
        -  Enable secure programming

Category
        Remove UB.

Author
        Alejandro Colomar <a...@kernel.org>

        Cc: <bug-gnulib@gnu.org>
        Cc: <m...@lists.openwall.com>
        Cc: <libc-al...@sourceware.org>
        Cc: наб <nabijaczlew...@nabijaczleweli.xyz>
        Cc: Douglas McIlroy <douglas.mcil...@dartmouth.edu>
        Cc: Paul Eggert <egg...@cs.ucla.edu>
        Cc: Robert Seacord <rcseac...@gmail.com>
        Cc: Elliott Hughes <e...@google.com>
        Cc: Bruno Haible <br...@clisp.org>
        Cc: JeanHeyd Meneide <phdoftheho...@gmail.com>
        Cc: Rich Felker <dal...@libc.org>
        Cc: Adhemerval Zanella Netto <adhemerval.zane...@linaro.org>
        Cc: Joseph Myers <josmy...@redhat.com>
        Cc: Florian Weimer <fwei...@redhat.com>
        Cc: Andreas Schwab <sch...@suse.de>
        Cc: Thorsten Glaser <t...@mirbsd.de>
        Cc: Eric Blake <ebl...@redhat.com>
        Cc: Vincent Lefevre <vinc...@vinc17.net>
        Cc: Mark Harris <mark....@gmail.com>
        Cc: Collin Funk <collin.fu...@gmail.com>
        Cc: Wilco Dijkstra <wilco.dijks...@arm.com>
        Cc: DJ Delorie <d...@redhat.com>
        Cc: Cristian Rodríguez <crist...@rodriguez.im>
        Cc: Siddhesh Poyarekar <siddh...@gotplt.org>
        Cc: Sam James <s...@gentoo.org>
        Cc: Mark Wielaard <m...@klomp.org>
        Cc: "Maciej W. Rozycki" <ma...@redhat.com>
        Cc: Martin Uecker <ma.uec...@gmail.com>
        Cc: Christopher Bazley <chris.bazley.w...@gmail.com>
        Cc: <es...@obsession.se>
        Cc: Daniel Krügler <daniel.krueg...@googlemail.com>
        Cc: Kees Cook <keesc...@chromium.org>
        Cc: Valdis Klētnieks <valdis.kletni...@vt.edu>

History
        <https://www.alejandro-colomar.es/src/alx/alx/wg14/alx-0029.git/>

        r0 (2025-06-17):
        -  Initial draft.

        r1 (2025-06-20):
        -  Full rewrite after the recent glibc discussion.

        r2 (2025-06-21):
        -  Remove CC.  Add CC.
        -  wfix.
        -  Drop quote.
        -  Add a few more principles
        -  Clarify why ENOMEM is used in this proposal, and make it
           optional.
        -  Mention exceptional leak in code checking (size != 0).
        -  Clarify that part of the description of realloc can be
           editorially removed after this change.

        r3 (2025-06-23):
        -  Fix diff missing line.
        -  Remove ENOMEM from the proposal.
        -  Clarify that ENOMEM should be retained by platforms already
           using it.
        -  Add mention that LLVM's address sanitizer will catch the leak
           mentioned in r2.
        -  Add links to real bugs (including an RCE bug).

        r4 (2025-06-24):
        -  Use a better link for the WhatsApp RCE.
        -  s/Description/Rationale/
        -  wfix
        -  Mention that glibc <2.1.1 had the BSD behavior.
        -  Add footnote that realloc(3) may fail while shrinking.

        r5 (2025-06-26):
        -  It was glibc 2.1.1 that broke it, not glibc 2.2.
        -  wfix
        -  Mention in the footnote that the pointer may change.
        -  Document why not go the other way around.  It was explained
           several times during discussion, but people keep suggesting
           it.

See also
        <https://nabijaczleweli.xyz/content/blogn_t/017-malloc0.html>
        <https://sourceware.org/pipermail/libc-alpha/1999-April/000956.html>
        <https://inbox.sourceware.org/libc-alpha/20241019014002.3684656-1-siddh...@sourceware.org/T/#u>
        <https://inbox.sourceware.org/libc-alpha/qukfe5yxycbl5v7ooskvqdnm3au3orohbx4babfltegi47iyly@or6dgf7akeqv/T/#u>
        <https://github.com/bminor/glibc/commit/7c2b945e1fd64e0a5a4dbd6ae6592a7314dcd4b5>
        <https://github.com/llvm/llvm-project/issues/113065>
        <https://www.austingroupbugs.net/view.php?id=400>
        <https://www.austingroupbugs.net/view.php?id=526>
        <https://www.austingroupbugs.net/view.php?id=688>
        <https://sourceware.org/bugzilla/show_bug.cgi?id=12547>
        <https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm>
        <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n868.htm>
        <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2438.htm>
        <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf>
        <https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/realloc.html>
        <https://pubs.opengroup.org/onlinepubs/9699919799.2013edition/functions/realloc.html>
        <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120744>
        <https://lore.kernel.org/lkml/20220213182443.4037039-1-keesc...@chromium.org/>
        <https://awakened1712.github.io/hacking/hacking-whatsapp-gif-rce/>
        <https://gbhackers.com/whatsapp-double-free-vulnerability/>

Rationale
        The specification of realloc(3) has been problematic since the
        very first standards, even before ISO C.  The wording has
        changed significantly over time, in strained attempts to permit
        implementations to return a null pointer when the requested
        size is zero.  This originated from the intent of banning
        zero-sized objects from the language in C89, but in retrospect
        that never worked well, as we can see from the fallout.

        None of the specifications have been good, and C23 finally gave
        up and made it undefined behavior.

        The problem is not only theoretical.  Programmers don't know how
        to use realloc(3) correctly, and have written weird code in
        their attempts.  This has resulted in a lot of nonsensical code
        in configure scripts[1], and even bugs in actual programs[2].

        [1] <https://codesearch.debian.net/search?q=%5Cbrealloc%5B+%5Ct%5D*%5B%28%5D%5B%5E%2C%5D*%2C%5B+%5Ct%5D0%5B%29%5D&literal=0>
        [2] <https://lore.kernel.org/lkml/20220213182443.4037039-1-keesc...@chromium.org/>

        In some cases, this nonsensical code has resulted in RCEs[3].

        [3] <https://awakened1712.github.io/hacking/hacking-whatsapp-gif-rce/>

        However, it doesn't need to be like that.  The traditional
        implementation of realloc(3), present in Unix V7, inherited by
        the BSDs, and currently available in a range of systems,
        including musl libc, doesn't have any issues.  glibc --which
        uses an independent implementation rather than a Unix
        derivative-- also had this behavior originally; it changed to
        the current behavior in 1999 (glibc 2.1.1), only for
        compatibility with C89.  Ironically, C99 was released soon
        after and removed the text that glibc was trying to comply
        with.  C99 also introduced some new text that was very
        confusing, and one of its interpretations would make the new
        glibc behavior non-conforming.

        Code written for platforms returning a null pointer can be
        migrated to platforms returning non-null, without significant
        issues.

        There are two kinds of code that call realloc(p,0).  One
        hard-codes the 0, and is used as a replacement of free(p).  This
        code ignores the return value, since it's unimportant.  This
        code currently produces a leak of 0 bytes plus associated
        metadata on platforms such as musl libc, where it returns a
        non-null pointer.  However, assuming that there are programs
        written with the knowledge that they won't ever be run on such
        platforms, we should take care of that, and make sure they don't
        leak.  A way of accomplishing this would be to recommend that
        implementations issue a diagnostic when realloc(3) is called
        with a hardcoded zero.  This is only an informal recommendation
        made by this proposal, as this is a matter of QoI, and the
        standard shouldn't say anything about it.  This would prevent
        this class of minor leaks.

        Moreover, in glibc, realloc(p,0) may return non-null, in the
        case where p is NULL, so code must already take that into
        account, and thus code that simply takes realloc(p,0) as a
        synonym of free(p) is already leaky, as free(NULL) is a no-op,
        but realloc(NULL,0) allocates 0 bytes.
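
        As a rough illustration of that asymmetry, the sketch below
        probes what realloc(NULL,0) does on the platform it runs on.
        The helper name is hypothetical, and the answer is
        implementation-defined, since realloc(NULL,0) behaves like
        malloc(0).  Code that uses realloc(p,0) as a synonym of free(p)
        leaks whenever p is a null pointer and the probe reports true.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical probe, not part of any libc.  realloc(NULL, 0)
 * behaves like malloc(0), so the result is implementation-defined.
 */
static bool
realloc_null_0_allocates(void)
{
        void  *p;
        bool  allocated;

        p = realloc(NULL, 0);
        allocated = (p != NULL);
        free(p);  /* free(NULL) is a no-op, so this is always safe */

        return allocated;
}
```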

        The other kind of code is in algorithms that realloc(3) an
        arbitrary size, which might eventually be zero.  This gets more
        complex.

        Here's the code that should be written for AIX or glibc:

                errno = 0;
                new = realloc(old, size);
                if (new == NULL) {
                        if (errno == ENOMEM)
                                free(old);
                        goto fail;
                }
                ...
                free(new);

        Failing to check for ENOMEM on these platforms before freeing
        the old pointer would result in a double-free.  If the program
        decides to continue using the old pointer instead of freeing it,
        it would result in a use-after-free.

        On the platforms where realloc(p,0) returns non-null, such as
        the BSDs or musl libc, it is simpler to handle it:

                new = realloc(old, size);
                if (new == NULL) {  // errno is ENOMEM
                        free(old);
                        goto fail;
                }
                ...
                free(new);

        Whenever the result is a null pointer, these platforms are
        reporting an ENOMEM error, and thus it is superfluous to check
        errno there.

        Most code is written in this way, even if run on platforms
        returning a null pointer.  This is because most programmers are
        just unaware of this problem.  Part of the reason is also that
        returning a non-null pointer with zero bytes is the natural
        extension of the behavior, which is what programmers intuitively
        expect from libc; that is, if realloc(p,3) allocates 3 bytes,
        r(p,2) allocates two bytes, and r(p,1) allocates one byte, it is
        natural by induction to expect that r(p,0) will allocate zero
        bytes.  Most algorithms naturally extend to 0 just fine, and
        special casing 0 is artificial.

        If the realloc(3) specification were changed to require that
        realloc(p,0) returns non-null on success, and that realloc(p,0)
        only fails when out-of-memory (and assuming the implementations
        will continue setting errno to ENOMEM), then code written for
        AIX or glibc would continue working just fine, since the errno
        check would be redundant with the null check.  Simply, the
        conditional (errno == ENOMEM) would always be true when
        (new == NULL).

        Then, there are non-POSIX platforms that don't set ENOMEM.  On
        those platforms, code might do this:

                new = realloc(old, size);
                if (new == NULL) {
                        if (size != 0)
                                free(old);
                        goto fail;
                }
                ...
                free(new);

        That code would continue working with this proposal, except for
        a very rare corner case, in which it would leak.  In the normal
        case, (size != 0) would always be true under (new == NULL),
        because a reallocation of 0 bytes would almost always succeed,
        and thus not return a null pointer under this proposal.
        However, in some cases, the system might not find space even for
        the small metadata needed for a 0-byte allocation.  In such a
        case, the (size != 0) conditional would prevent deallocating
        'old', and thus cause a memory leak.  This case is exceptional
        enough that it shouldn't stop us from fixing realloc(3).
        Anyway, on an out-of-memory case, the program is likely to
        terminate rather soon, so the issue is even less likely to have
        an impact on any existing programs.  Also, LLVM's address
        sanitizer will soon be able to catch such a leak:
        <https://github.com/llvm/llvm-project/issues/113065>

        This proposal makes handling of realloc(3) as straightforward as
        one would expect, with only two states: success or error.  There
        are no in-between states.

        The resulting wording in the standard is also much simpler, as
        it doesn't need to define so many special cases.

        For consistency, all the other allocation functions are updated
        to both return a null pointer on error, and use consistent
        wording.

    Why not go the other way around?
        Some people keep asking why not go the other way around: why not
        force the BSDs and musl to return a null pointer if size is 0.
        This would result in double-free and use-after-free bugs, which
        can result in RCE vulnerabilities (remote code execution), which
        is clearly unacceptable.

        Consider this code, which is the usual code for calling
        realloc(3) in such systems:

                new = realloc(old, size);
                if (new == NULL) {
                        free(old);
                        goto fail;
                }
                ...
                free(new);

        If realloc(p,0) returned a null pointer and freed the old
        block, then the third line would be a double-free bug.
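
        Since a real double free is undefined behavior, the following
        is only a toy model of the scenario, not a reproduction of any
        libc.  All names in it (struct block, model_free,
        model_realloc_null_on_0, usual_caller) are hypothetical.  It
        tracks a single block and counts the second free instead of
        performing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of one heap block.  None of these names are real libc APIs. */
struct block {
        void  *mem;
        bool  live;
        int   double_frees;
};

/* Like free(3), but counts a second free instead of corrupting the heap. */
static void
model_free(struct block *b)
{
        if (!b->live) {
                b->double_frees++;
                return;
        }
        free(b->mem);
        b->mem = NULL;
        b->live = false;
}

/* Models the rejected semantics: size 0 frees the block, returns NULL. */
static void *
model_realloc_null_on_0(struct block *b, size_t size)
{
        void  *new;

        if (size == 0) {
                model_free(b);
                return NULL;
        }
        new = realloc(b->mem, size);
        if (new != NULL)
                b->mem = new;
        return new;
}

/* The usual caller from the text.  Returns the double frees it caused. */
static int
usual_caller(size_t size)
{
        struct block  b = { malloc(16), true, 0 };
        void          *new;

        if (b.mem == NULL)
                return -1;

        new = model_realloc_null_on_0(&b, size);
        if (new == NULL) {
                model_free(&b);  /* free(old); */
                return b.double_frees;
        }
        model_free(&b);  /* free(new); */
        return b.double_frees;
}
```

        In this model, usual_caller(0) records one double free, while
        a nonzero size records none.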

Prior art
    gnulib
        gnulib provides the realloc-posix module, which aims to wrap the
        system realloc(3) and reallocarray(3) functions so that they
        behave in a POSIX-complying manner.

        It previously behaved like glibc.  After I reported that it was
        non-conforming to POSIX, we discussed the best way forward,
        which we agreed was the same direction that this paper is
        proposing now for C2y.  The implementation was changed in

                gnulib.git d884e6fc4a60 (2024-11-04; "realloc-posix: realloc (..., 0) now returns nonnull")

        There have been no regression reports since then, as we
        expected.

    Unix V7, BSD
        The proposed behavior is the one endorsed by Doug McIlroy, the
        author of the original implementation of realloc(3) in Unix V7,
        and also present in the BSDs.

    glibc <= 2.1
        glibc was implemented originally to return non-null.  It was
        only in 1999, and purely to comply with the standards --with no
        requests by users to do so--, that the glibc maintainers decided
        to switch to the current behavior.

Design decisions
        This proposal needs two changes, which can be applied all at once,
        or in separate steps.

        The first step would make realloc(p,s) be consistent with
        free(p) and malloc(s), including when p is a null pointer, when
        s is zero, and also when both corner cases happen at the same
        time.  This change would already turn the implementations where
        malloc(0) returns non-null into the end goal we have.  This
        would require changes to (at least) the following
        implementations: glibc, Bionic, Windows.

        The second step would be to require that malloc(0) returns a
        non-null pointer.  This would require changes to (at least) the
        following implementations: AIX.

        This proposal has merged all steps into a single proposal.
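
        The combined end state can be sketched as a reference model
        built only from malloc(3) and free(3).  This is an
        illustration under assumptions, not how any libc implements
        realloc(3): model_realloc is a hypothetical name, the old size
        is passed explicitly because ISO C offers no way to query it,
        and the (size ? size : 1) expression stands in for the second
        step, in which malloc(0) returns a non-null pointer.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of the proposed semantics.  On failure, the old
 * object is left untouched; on success, it is freed.
 */
static void *
model_realloc(void *ptr, size_t old_size, size_t size)
{
        void  *new;

        new = malloc(size ? size : 1);  /* models non-null malloc(0) */
        if (new == NULL)
                return NULL;

        if (ptr != NULL)
                memcpy(new, ptr, old_size < size ? old_size : size);
        free(ptr);  /* free(NULL) is a no-op */

        return new;
}
```

        With this model, a null return always means failure, for any
        combination of a null ptr and a zero size.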

Future directions
        This proposal, by specifying realloc(3) as-if by calling
        free(3) and malloc(3), makes redundant several mentions of
        realloc(3) next to either free(3) or malloc(3) in the standard.
        We could remove them in this proposal, or clean that up in a
        separate (mostly editorial) proposal.  Let's keep it for a
        future proposal for now.

Caveats
    n?n:1
        Code written today should be careful, in case it can run on
        older systems that are not fixed to comply with this stricter
        specification.  Thus, code written today should call realloc(3)
        similarly to this:

                realloc(p, n?n:1);

        When all existing implementations are fixed to comply with this
        stricter specification, that workaround can be removed.
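
        One way to keep the workaround in a single place is a small
        wrapper, sketched below.  The name realloc_nz is hypothetical
        and not part of any libc.  Because the wrapper never passes a
        size of 0 to realloc(3), a null return always means that the
        allocation failed and that the old pointer is still valid.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not part of any libc.  It never requests 0
 * bytes, so it behaves the same on current implementations and on
 * ones that follow the stricter specification.
 */
static void *
realloc_nz(void *p, size_t n)
{
        return realloc(p, n ? n : 1);
}
```

        Callers can then use the simple two-state error handling shown
        in the Rationale, and the wrapper body can be reduced to a
        plain realloc(3) call once the workaround is no longer needed.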

    ENOMEM
        Existing implementations that set errno to ENOMEM must continue
        doing so when the input pointer is not freed.  If they didn't,
        code that is currently portable to all POSIX systems

                errno = 0;
                new = realloc(old, size);
                if (new == NULL) {
                        if (errno == ENOMEM)
                                free(old);
                        goto fail;
                }
                ...
                free(new);

        would leak on error.

        Since it is currently impossible to write code that is
        portable to arbitrary C17 systems, this is not an issue in
        ISO C.

                -  New code written for C2y will only need to check for
                   NULL to detect errors.

                -  Code written for specific C17 and older platforms
                   that don't set errno will continue to work for those
                   specific platforms.

                -  Code written for POSIX.1-2024 and older platforms
                   will continue working on POSIX C2y platforms,
                   assuming that POSIX will continue mandating ENOMEM.

                -  Code written for POSIX.1-2024 and older will not be
                   able to be run on non-POSIX C2y platforms, but that
                   could be expected.

        The only important thing is that platforms that did set ENOMEM
        should continue setting it, to avoid introducing leaks.

Proposed wording
        Based on N3550.

    7.25.4.1  Memory management functions :: General
        @@ p1
        ...
         If the size of the space requested is zero,
        -the behavior is implementation-defined:
        -either
        -a null pointer is returned to indicate the error,
        -or
         the behavior is as if the size were some nonzero value,
         except that the returned pointer shall not be used
         to access an object.

    7.25.4.2  The aligned_alloc function
        @@ Returns, p3
         The <b>aligned_alloc</b> function returns
        -either
        -a null pointer
        -or
        -a pointer to the allocated space.
        +a pointer to the allocated space
        +on success.
        +If
        +the space cannot be allocated,
        +a null pointer is returned.

    7.25.4.3  The calloc function
        @@ Returns, p3
         The <b>calloc</b> function returns
        -either
         a pointer to the allocated space
        +on success.
        -or a null pointer
        -if
        +If
         the space cannot be allocated
         or if the product <tt>nmemb * size</tt>
        -would wraparound <b>size_t</b>.
        +would wraparound <b>size_t</b>,
        +a null pointer is returned.

    7.25.4.7  The malloc function
        @@ Returns, p3
         The <b>malloc</b> function returns
        -either
        -a null pointer
        -or
        -a pointer to the allocated space.
        +a pointer to the allocated space
        +on success.
        +If
        +the space cannot be allocated,
        +a null pointer is returned.

    7.25.4.8  The realloc function
        @@ Description, p2
         The <b>realloc</b> function
         deallocates the old object pointed to by <tt>ptr</tt>
        +as if by a call to <b>free</b>,
         and returns a pointer to a new object
        -that has the size specified by <tt>size</tt>.
        +that has the size specified by <tt>size</tt>
        +as if by a call to <b>malloc</b>.
         The contents of the new object
         shall be the same as that of the old object prior to deallocation,
         up to the lesser of the new and old sizes.
         Any bytes in the new object
         beyond the size of the old object
         have unspecified values.

        @@ p3
         If <tt>ptr</tt> is a null pointer,
         the <b>realloc</b> function behaves
         like the <b>malloc</b> function for the specified size.
         Otherwise,
         if <tt>ptr</tt> does not match a pointer
         earlier returned by a memory management function,
         or
         if the space has been deallocated
         by a call to the <b>free</b> or <b>realloc</b> function,
        ## We can probably remove all of the above, because of the
        ## behavior now being defined as-if by calls to malloc(3) and
        ## free(3).  But let's do that editorially in a separate change.
        -or
        -if the size is zero,
        ## We're defining the behavior.
         the behavior is undefined.
         If
        -memory for the new object is not allocated,
        +the space cannot be allocated,
        ## Editorial; for consistency with the wording of the other functions.
         the old object is not deallocated
         and its value is unchanged.
        +XXX)

        @@ New footnote XXX
        +XXX)
        +While atypical,
        +<b>realloc</b> may fail
        +or return a different pointer
        +for a call that shrinks the block of memory.

        @@ Returns, p4
         The <b>realloc</b> function returns
         a pointer to the new object
         (which can have the same value
        -as a pointer to the old object),
        +as a pointer to the old object)
        +on success.
        -or
        +If
        +space cannot be allocated,
         a null pointer
        -if the new object has not been allocated.
        +is returned.

-- 
<https://www.alejandro-colomar.es/>
