https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702
Bug ID: 69702
Summary: excessive stack usage with -fprofile-arcs
Product: gcc
Version: 5.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: arnd at linaro dot org
Target Milestone: ---

Created attachment 37604
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37604&action=edit
standalone test case extracted from Linux kernel

With gcc versions 4.9 or higher, the stack usage of some functions in the Linux kernel has grown to the point where we risk a stack overflow. Only 8 KB or 16 KB of stack is available per thread.

When building an ARM kernel with "gcc -fprofile-arcs -Wframe-larger-than=1024", I get at least these warnings in some configurations. I don't get them without -fprofile-arcs:

drivers/isdn/isdnhdlc.c:629:1: error: the frame size of 1152 bytes is larger than 1024 bytes
drivers/media/common/saa7146/saa7146_hlp.c:464:1: error: the frame size of 1040 bytes is larger than 1024 bytes
drivers/mtd/chips/cfi_cmdset_0020.c:651:1: error: the frame size of 1040 bytes is larger than 1024 bytes
drivers/net/wireless/ath/ath6kl/main.c:495:1: error: the frame size of 1200 bytes is larger than 1024 bytes
drivers/net/wireless/ath/ath9k/ar9003_aic.c:434:1: error: the frame size of 1208 bytes is larger than 1024 bytes
drivers/video/fbdev/riva/riva_hw.c:426:1: error: the frame size of 1248 bytes is larger than 1024 bytes
lib/lz4/lz4hc_compress.c:514:1: error: the frame size of 2464 bytes is larger than 1024 bytes

The lz4hc_compress.c file is a good example. It has the worst stack usage and works as a test case outside of the kernel. I have reduced this file to a standalone .c file that can optionally be compiled into an executable program (lz4 compression from stdin to stdout). The code originally comes from www.lz4.org but has been adapted for use in Linux.
Compile with:

gcc -O2 -Wall -Wno-pointer-sign -Wframe-larger-than=200 -fprofile-arcs -c lz4hc_compress.c

The same problem happens on all architectures. For example, gcc-4.9.3 gives this stack usage in bytes, with -fprofile-arcs and without it ("normal"):

Target                  -fprofile-arcs  normal
aarch64-linux-gcc                 1136     112
alpha-linux-gcc                   1008     304
am33_2.0-linux-gcc                1280      84
arm-linux-gnueabi-gcc             1080     112
cris-linux-gcc                     828     100
frv-linux-gcc                      904     104
hppa64-linux-gcc                   944     248
hppa-linux-gcc                     824      92
i386-linux-gcc                     824     108
m32r-linux-gcc                     908     136
microblaze-linux-gcc               832      88
mips64-linux-gcc                   864     192
mips-linux-gcc                     792     120
powerpc64-linux-gcc                800      96
powerpc-linux-gcc                  808      56
s390-linux-gcc                     832     112
sh3-linux-gcc                      824     128
sparc64-linux-gcc                  896     192
sparc-linux-gcc                    824     104
x86_64-linux-gcc                   912     192
xtensa-linux-gcc                   816     128

With gcc-4.8.1, the numbers are much lower:

Target                  -fprofile-arcs  normal
arm-linux-gnueabi-gcc              184     104
x86_64-linux-gcc                   224     192

The object file has also grown noticeably. Without -fprofile-arcs it is around 3000 bytes with any version. With -fprofile-arcs it reaches 10300 bytes with gcc-5.3.1 but only 6941 bytes with gcc-4.8. Runtime speed does not appear to be affected much: -fprofile-arcs adds less than 20% overhead, which seems reasonable.

I have tested ARM cross-compilers from version 4.9.3 through 5.3.1, and all of them show similar problematic behavior. Versions 4.6 through 4.8.3 are OK.