Using size_t to crash on off-by-one errors (was: size_t vs long.)

Alejandro Colomar via Gcc Wed, 23 Nov 2022 12:09:12 -0800

Hi,

On 11/18/22 00:04, Alejandro Colomar wrote:

The main advantage of this code compared to the equivalent ssize_t or ptrdiff_t or idx_t code is that if you somehow write an off-by-one error, and manage to access the array at [-1], if i is unsigned you'll access [SIZE_MAX], which will definitely crash your program.
That's not true on the vast majority of today's platforms, which don't have subscript checking, and for which a[-1] is treated the same way a[SIZE_MAX] is. On my platform (Fedora 36 x86-64) the same machine code is generated for 'a' and 'b' for the following C code.
   #include <stdint.h>
   int a(int *p) { return p[-1]; }
   int b(int *p) { return p[SIZE_MAX]; }
Hmm, this seems to be true on my platform (amd64), per the experiment I just did:

$ cat s.c
#include <sys/types.h>

char
f(char *p, ssize_t i)
{
     return p[i];
}
$ cat u.c
#include <stddef.h>

char
f(char *p, size_t i)
{
     return p[i];
}
$ cc -Wall -Wextra -Werror -S -O3 s.c u.c
$ diff -u u.s s.s
--- u.s    2022-11-17 23:41:47.773805041 +0100
+++ s.s    2022-11-17 23:41:47.761805265 +0100
@@ -1,15 +1,15 @@
-    .file    "u.c"
+    .file    "s.c"
      .text
      .p2align 4
      .globl    f
      .type    f, @function
  f:
-.LFB0:
+.LFB6:
      .cfi_startproc
      movzbl    (%rdi,%rsi), %eax
      ret
      .cfi_endproc
-.LFE0:
+.LFE6:
      .size    f, .-f
      .ident    "GCC: (Debian 12.2.0-9) 12.2.0"
      .section    .note.GNU-stack,"",@progbits


This seems to violate the standard, doesn't it?
The [] operator doesn't have a type of its own, and its index argument should keep whatever type it has after the default promotions. If I pass a size_t, the index is unsigned, and that should be preserved: the access should land at a high offset, and from the function definition alone the compiler has no way to know whether that element exists. The extremes of -1 and SIZE_MAX might not be the best example, since the pointer would need to be 0 for [SIZE_MAX] to be accessible; but if you replace them with -RANDOM and (size_t)-RANDOM, then the compiler definitely needs to generate different code, yet it doesn't.
I'm guessing this is an optimization by GCC, based on knowing that we will never come close to using the whole 64-bit address space. If we use int and unsigned, things change:
$ cat s.c
char
f(char *p, int i)
{
     return p[i];
}
$ cat u.c
char
f(char *p, unsigned i)
{
     return p[i];
}
$ cc -Wall -Wextra -Werror -S -O3 s.c u.c
$ diff -u u.s s.s
--- u.s    2022-11-17 23:44:54.446318186 +0100
+++ s.s    2022-11-17 23:44:54.434318409 +0100
@@ -1,4 +1,4 @@
-    .file    "u.c"
+    .file    "s.c"
      .text
      .p2align 4
      .globl    f
@@ -6,7 +6,7 @@
  f:
  .LFB0:
      .cfi_startproc
-    movl    %esi, %esi
+    movslq    %esi, %rsi
      movzbl    (%rdi,%rsi), %eax
      ret
      .cfi_endproc
I'm guessing that GCC doesn't make that assumption here, and I guess the unsigned version would crash, while the signed version would cause nasal demons. Anyway, now that I'm here, I'll test it:
$ cat s.c
[[gnu::noipa]]
char
f(char *p, int i)
{
     return p[i];
}

int main(void)
{
     int i = -1;
     char c[4];

     return f(c, i);
}
$ cc -Wall -Wextra -Werror -O3 s.c
$ ./a.out
$ echo $?
0


$ cat u.c
[[gnu::noipa]]
char
f(char *p, unsigned i)
{
     return p[i];
}

int main(void)
{
     unsigned i = -1;
     char c[4];

     return f(c, i);
}
$ cc -Wall -Wextra -Werror -O3 u.c
$ ./a.out
Segmentation fault
I get this SEGV difference consistently. I CCed gcc@ in case they consider this to be something they want to address. Maybe the optimization is important for size_t-sized indices, but if it is not, I'd prefer getting the SEGV for SIZE_MAX.
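
The movl/movslq difference in the int/unsigned diff above comes down to how the 32-bit index is widened before it is added to the 64-bit pointer. A minimal sketch of just that widening, assuming 32-bit int and unsigned as on amd64; widen_unsigned and widen_signed are hypothetical helper names that model the two instructions, not anything GCC provides:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: int and unsigned are 32 bits wide, as on amd64. */
static_assert(sizeof(unsigned) == 4, "sketch assumes 32-bit unsigned");

/* unsigned index: zero-extended to 64 bits, like "movl %esi, %esi". */
static uint64_t
widen_unsigned(unsigned i)
{
    return (uint64_t)i;
}

/* int index: sign-extended to 64 bits, like "movslq %esi, %rsi". */
static uint64_t
widen_signed(int i)
{
    return (uint64_t)(int64_t)i;
}
```

For an index of -1, the zero-extended offset is 0xFFFFFFFF, just under 4 GiB past the pointer and probably unmapped, while the sign-extended offset is all ones and wraps around to one byte before the pointer, which for a stack array is usually still mapped. That would be consistent with the SEGV appearing only in the unsigned build.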

After some thought, of course the compiler can't produce any different code, since pointers are 64 bits. A different story would be if pointers were 128 bits, but that might cause its own issues; should sizes still be 64 bits, or 128 bits? Maybe using a configurable size_t would be interesting for debugging.
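
The reason the size_t and ssize_t versions cannot differ can be sketched with plain unsigned arithmetic. This assumes a flat address space in which size_t is as wide as a pointer (true on amd64); char_elem_address is a hypothetical helper that only models the address computation for a char array and never dereferences anything:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: size_t and uintptr_t have the same width (64 bits on amd64). */
static_assert(sizeof(uintptr_t) == sizeof(size_t),
              "sketch assumes pointer-sized size_t");

/* Models the address of p[i] for a char *p whose value is base.
 * Unsigned addition wraps modulo 2^N, N being the pointer width. */
static uintptr_t
char_elem_address(uintptr_t base, size_t i)
{
    return base + (uintptr_t)i;
}
```

With base 0x1000, an index of SIZE_MAX yields 0xFFF, exactly the address that an index of -1 would name. Once the index is as wide as the pointer, the wraparound makes the two indistinguishable, so there is no different code left for the compiler to emit.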

Anyway, it's good to know that tweaking size_t to be 32 bits in some debug builds might help catch some off-by-one errors.
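
A rough sketch of what such a debug build could look like, assuming the project indexes through its own typedef rather than through size_t directly (size_t itself cannot realistically be redefined). The names DEBUG_NARROW_INDEX, dbg_index_t, last_index and byte_offset are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch; a real build would pass -DDEBUG_NARROW_INDEX=1
 * on the command line instead of defining it here. */
#define DEBUG_NARROW_INDEX 1

#if DEBUG_NARROW_INDEX
typedef uint32_t dbg_index_t;   /* narrower than a 64-bit pointer */
#else
typedef size_t dbg_index_t;     /* normal builds: pointer-sized index */
#endif

/* Deliberately buggy for n == 0: the subtraction underflows. */
static dbg_index_t
last_index(dbg_index_t n)
{
    return n - 1;
}

/* Byte offset from the start of a char array for index i. */
static uint64_t
byte_offset(dbg_index_t i)
{
    return (uint64_t)i;
}
```

In the narrow configuration, last_index(0) lands 4 GiB - 1 bytes past the start of the array instead of one byte before it, so a stray access is much more likely to hit unmapped memory. It is still not guaranteed to fault, since something may happen to be mapped at that distance.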


Cheers,

Alex

--
<http://www.alejandro-colomar.es/>
