https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119693
Bug ID: 119693
Summary: GCC assumes wrong bounds for strlen() return value, causing bogus bounds check elimination
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: jannh at google dot com
CC: msebor at gcc dot gnu.org
Target Milestone: ---

GCC, on 32-bit x86, assumes that the return value of strlen() is smaller than 0x7ffffffe:

```
$ cat gcc-strlen-test-simple.c
#include <string.h>

int test1(const char *p) {
  return __builtin_constant_p(strlen(p) < 0x7ffffffd);
}
int test2(const char *p) {
  return __builtin_constant_p(strlen(p) < 0x7ffffffe);
}
$ gcc -Wall -Wextra -m32 -c -O3 gcc-strlen-test-simple.c
$ objdump -d -Mintel gcc-strlen-test-simple.o
[...]
00000000 <test1>:
   0:	31 c0                	xor    eax,eax
   2:	c3                   	ret
[...]
00000010 <test2>:
  10:	b8 01 00 00 00       	mov    eax,0x1
  15:	c3                   	ret
$ gcc --version
gcc (Debian 14.2.0-3+build4) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

I guess this assumption was probably introduced in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78153 .

I think that this assumption is wrong for two reasons.

First: The assumption that strlen(...) returns a value smaller than 0x7ffffffe due to ptrdiff_t constraints implies an assumption that the size of an allocation is **smaller than** 0x7fffffff bytes. That's off by one: if an object has a size of 0x7fffffff bytes, ptrdiff_t can still represent the result of subtracting the object's start address from the object's one-past-the-end pointer. glibc malloc's checked_request2size() in https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c also allows sizes up to (and including) PTRDIFF_MAX.
So the following sample program, when built for 32-bit x86 and invoked with argument 0x7ffffffe, cleanly reports an error when built with clang, while it segfaults when built with gcc:

```
$ cat gcc-strlen-test.c
#include <err.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

static char hexdigit(unsigned char val) {
  if (val < 10)
    return '0' + val;
  return 'a' + (val - 10);
}

__attribute__((noinline)) char *hex_func(const char *str) {
  size_t str_len = strlen(str);
  size_t encoded_hex_len;
  if (__builtin_mul_overflow(str_len, 2, &encoded_hex_len))
    errx(1, "multiplication overflowed");
  size_t encoded_len;
  if (__builtin_add_overflow(encoded_hex_len, 4/*"hex:"*/ + 1/*NUL*/, &encoded_len))
    errx(1, "addition overflow");
  char *outbuf = malloc(encoded_len);
  if (outbuf == NULL)
    err(1, "malloc outbuf");
  outbuf[0] = 'h';
  outbuf[1] = 'e';
  outbuf[2] = 'x';
  outbuf[3] = ':';
  for (size_t i = 0; i < str_len; i++) {
    outbuf[4+i*2+0] = hexdigit((unsigned char)str[i] & 0xf);
    outbuf[4+i*2+1] = hexdigit((unsigned char)str[i] >> 4);
  }
  outbuf[4+str_len*2] = '\0';
  return outbuf;
}

int main(int argc, char **argv) {
  if (argc != 2)
    errx(1, "invocation");
  unsigned long len = strtoul(argv[1], NULL, 0);
  size_t len_with_nullbyte;
  if (__builtin_add_overflow(len, 1, &len_with_nullbyte))
    errx(1, "add 1 for nullbyte");
  char *str = malloc(len_with_nullbyte);
  if (!str)
    err(1, "malloc");
  memset(str, 'A', len);
  str[len] = '\0';
  puts(hex_func(str));
}
$ clang -m32 -O2 -o gcc-strlen-test gcc-strlen-test.c -ggdb
$ ./gcc-strlen-test 0x7ffffffe
gcc-strlen-test: addition overflow
$ gcc -m32 -O2 -o gcc-strlen-test gcc-strlen-test.c -ggdb
$ ./gcc-strlen-test 0x7ffffffe
Segmentation fault (core dumped)
$
```

The second, more nitpicky reason is that, from what I can tell, nothing in C99 or in N3220 actually specifies that objects must be small enough for ptrdiff_t.
N3220 section "6.5.7 Additive operators" paragraph 10 says "If the result is not representable in an object of that type, the behavior is undefined" - so I think you're allowed to have bigger objects as long as you don't do pointer arithmetic on them.