https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119693

            Bug ID: 119693
           Summary: GCC assumes wrong bounds for strlen() return value,
                    causing bogus bounds check elimination
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jannh at google dot com
                CC: msebor at gcc dot gnu.org
  Target Milestone: ---

GCC, on 32-bit x86, assumes that the return value of strlen() is smaller than
0x7ffffffe:

```
$ cat gcc-strlen-test-simple.c
#include <string.h>
int test1(const char *p) { return __builtin_constant_p(strlen(p) < 0x7ffffffd);
}
int test2(const char *p) { return __builtin_constant_p(strlen(p) < 0x7ffffffe);
}
$ gcc -Wall -Wextra -m32 -c -O3 gcc-strlen-test-simple.c
$ objdump -d -Mintel gcc-strlen-test-simple.o
[...]
00000000 <test1>:
   0:   31 c0                   xor    eax,eax
   2:   c3                      ret
[...]

00000010 <test2>:
  10:   b8 01 00 00 00          mov    eax,0x1
  15:   c3                      ret
$ gcc --version
gcc (Debian 14.2.0-3+build4) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
I guess this assumption was probably introduced in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78153 .

I think that this assumption is wrong for two reasons.


First: The assumption that strlen(...) returns a value smaller than 0x7ffffffe
due to ptrdiff_t constraints implies an assumption that the size of an
allocation is **smaller than** 0x7fffffff bytes. That's off-by-one - if an
object has a size of 0x7fffffff bytes, ptrdiff_t can still represent the result
of subtracting the object's start address from the object's one-behind-the-end
pointer. glibc malloc's checked_request2size() in
https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c also allows
sizes up to (and including) PTRDIFF_MAX.

So the following sample program, when built for 32-bit x86 and invoked with
argument 0x7ffffffe, cleanly reports an error when built with clang while it
segfaults when built with gcc:
```
$ cat gcc-strlen-test.c
#include <err.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

static char hexdigit(unsigned char val) {
  if (val < 10)
    return '0' + val;
  return 'a' + val;
}

__attribute__((noinline))
char *hex_func(const char *str) {
  size_t str_len = strlen(str);
  size_t encoded_hex_len;
  if (__builtin_mul_overflow(str_len, 2, &encoded_hex_len))
    errx(1, "multiplication overflowed");
  size_t encoded_len;
  if (__builtin_add_overflow(encoded_hex_len, 4/*"hex:"*/ + 1/*NUL*/,
&encoded_len))
    errx(1, "addition overflow");
  char *outbuf = malloc(encoded_len);
  if (outbuf == NULL)
    err(1, "malloc outbuf");
  outbuf[0] = 'h';
  outbuf[1] = 'e';
  outbuf[2] = 'x';
  outbuf[3] = ':';
  for (size_t i = 0; i < str_len; i++) {
    outbuf[4+i*2+0] = hexdigit((unsigned char)str[i] & 0xf);
    outbuf[4+i*2+1] = hexdigit((unsigned char)str[i] >> 4);
  }
  outbuf[4+str_len*2] = '\0';
  return outbuf;
}

int main(int argc, char **argv) {
  if (argc != 2)
    errx(1, "invocation");
  unsigned long len = strtoul(argv[1], NULL, 0);
  size_t len_with_nullbyte;
  if (__builtin_add_overflow(len, 1, &len_with_nullbyte))
    errx(1, "add 1 for nullbyte");
  char *str = malloc(len_with_nullbyte);
  if (!str)
    err(1, "malloc");
  memset(str, 'A', len);
  str[len] = '\0';

  puts(hex_func(str));
}
$ clang -m32 -O2 -o gcc-strlen-test gcc-strlen-test.c -ggdb
$ ./gcc-strlen-test 0x7ffffffe
gcc-strlen-test: addition overflow
$ gcc -m32 -O2 -o gcc-strlen-test gcc-strlen-test.c -ggdb
$ ./gcc-strlen-test 0x7ffffffe
Segmentation fault (core dumped)
$
```

The second more nitpicky reason is that, from what I can tell, nothing in C99
or in N3220 actually specifies that objects must be small enough for ptrsize_t.
N3220 section "6.5.7 Additive operators" paragraph 10 says "If the result is
not representable in an object of that type, the behavior is undefined" - so I
think you're allowed to have bigger objects as long as you don't do pointer
arithmetic on them.

Reply via email to