[Bug c/87392] New: UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

roscaeugeniu at gmail dot com Sat, 22 Sep 2018 15:52:33 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87392


            Bug ID: 87392
           Summary: UBSAN behavior on left-shifting 1 into the sign bit is
                    dependent on C standard
           Product: gcc
           Version: 8.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: roscaeugeniu at gmail dot com
  Target Milestone: ---

Dear GCC community,

Porting GCC UBSAN support from Linux kernel to U-Boot [1], we noticed [2] that
UBSAN complains about (0xfffff << 12) in U-Boot (saying "left shift of 1048575
by 12 places cannot be represented in type 'int'"), but doesn't complain about
the same piece of code in Linux kernel.

A simpler test-case with the same behavior is (1 << 31). The latter is commonly
used in the world of firmware, bootloaders and kernels. To give an example,
`git grep -E '1[ ]*<<[ ]*31' | wc -l` reports:
- 532 occurrences in U-Boot (v2018.09)
- 558 occurrences in coreboot (4.8)
- 1453 occurrences in Linux kernel (v4.19-rc4)

One could argue that the developers should just write (1UL << 31) to silent the
UBSAN. However, the problem is not limited to left-shifting constants, but also
applies to left-shifting 'unsigned char' variables (which become signed due to
promotion) [3]. To fix the latter, a cast to unsigned int is required prior to
left-shifting. But that, once again, only applies to non-Linux kernel code.

Doing some investigation why (1 << 31) is reported as undefined behavior in
U-Boot/coreboot, but not in Linux kernel, it turns out that UBSAN behavior
varies depending on the '-std=' option passed to gcc. For the record:
- Coreboot uses '-std=gnu11' since
https://github.com/coreboot/coreboot/commit/acbb70b810c6e
- U-Boot uses '-std=gnu11' since
http://git.denx.de/?p=u-boot.git;a=commitdiff;h=fa893990e9b5
- Linux uses '-std=gnu89' since
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51b97e354ba9

### Compiling with '-std=gnu89'

My experiments show that the sample code [6], compiled  with '-std=gnu89
-fsanitize=shift -lubsan' [7], triggers no UBSAN warning. This seems consistent
and independent on the gcc version [5] and on the signed integer overflow
options (-fwrapv/-fstrict-overflow/-fno-strict-overflow).

### Compiling with '-std=gnu11'

The same sample code [6], compiled with '-std=gnu11 -fsanitize=shift -lubsan'
[8] does trigger the UBSAN warning [9] by default. The newer gcc versions
6.4.0, 7.3.0, 8.1.0 (unlike 5.5.0 and 4.9.4) silent the warning if the code is
compiled with -fwrapv. GCC 8.1.0 (unlike the other versions) also silents the
warning with '-fno-strict-overflow' option alone (presumably due to the fact
that gcc-8 maps -fno-strict-overflow to "-fwrapv -fwrapv-pointer", according to
[10]).

All in all, what can be seen is that:
- The '-std=gnu89' UBSAN consistently refuses to report (1 << 31) as an issue.
- The '-std=gnu11' UBSAN, by default, reports (1 << 31) as "left-shifting into
sign bit" warning [9]. There are gcc options to silent [9], but these are
highly specific on the compiler version (hence can't be taken as a generic
solution).

Going forward and looking into the C standard itself, we see a difference
between how C89 and C11/C18 define the "left shift" operators:

C89 draft [11] says in §3.3.7:

------8<------
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the value of
the result is E1 multiplied by the quantity, 2 raised to the power E2,
reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1
otherwise. (The constants ULONG_MAX and UINT_MAX are defined in the
header <limits.h> .)
------8<------

Both C11 [12] and C18 [13] drafts say in §6.5.7p3:

------8<------
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are filled with zeros. If E1 has an unsigned type, the value of the result
is E1 × 2^E2, reduced modulo one more than the maximum value representable
in the result type. If E1 has a signed type and nonnegative value, and
E1 × 2^E2 is representable in the result type, then that is the resulting
value; otherwise, the behavior is undefined.
------8<------

Based on my reading, none of the above paragraphs legitimates left-shifting
into the sign bit. Still, what we see in our test results is that the
'-std=gnu89' UBSAN never complains about (1 << 31), while the '-std=gnu11'
UBSAN pretty much always does.

So, here come the questions:
- Could you please explain the UBSAN inconsistency when compiled with
'-std=gnu89' and '-std=gnu11'?
- Assuming projects like U-Boot and coreboot prefer to compile with
'-std=gnu11' (and not to roll back to ANSI C), what is your recommendation on
handling the "left-shifting 1 into the sign bit" warnings, given that (as shown
above) neither -fwrapv nor -fno-strict-overflow heal these warnings
consistently across the compiler versions.

Thanks and regards,
Eugeniu.


### References

[1] https://patchwork.ozlabs.org/cover/962307/
[2] https://patchwork.ozlabs.org/patch/962305/#1980409
[3] Left-shifting into the sign bit, when left operand is an 'unsigned char'
variable:
    - http://www.open-std.org/pipermail/ub/2013-October/000300.html
    - https://patchwork.ozlabs.org/patch/962298/
[4] https://mail.coreboot.org/pipermail/coreboot/2018-February/086146.html
[5] GCC versions tried:
    - gcc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0
    - gcc (Ubuntu 7.3.0-21ubuntu1~16.04) 7.3.0
    - gcc (Ubuntu 6.4.0-17ubuntu1~16.04) 6.4.0 20180424
    - gcc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
    - gcc (Ubuntu 4.9.4-2ubuntu1~16.04) 4.9.4
[6] #include <stdio.h> 
    int main(void) { printf("%d\n", 1 << 31); return 0; }
[7] gcc -std=gnu89 -fsanitize=shift -c sample.c && gcc sample.o -lubsan -o
sample && ./sample
[8] gcc -std=gnu11 -fsanitize=shift -c sample.c && gcc sample.o -lubsan -o
sample && ./sample
[9] sample.c:2:35: runtime error: left shift of 1 by 31 places cannot be
represented in type 'int'
[10] https://gcc.gnu.org/gcc-8/changes.html
[11] https://port70.net/~nsz/c/c89/c89-draft.html#3.3.7
[12] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
[13]
http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf

[Bug c/87392] New: UBSAN behavior on left-shifting 1 into the sign bit is dependent on C standard

Reply via email to