https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818
Bug ID: 81818
Summary: aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3 (memory-hog)
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: andrewm.roberts at sky dot com
Target Milestone: ---

Created attachment 41973
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41973&action=edit
System-independent test program to demonstrate the issue.

I've run into problems building both gcc itself and my application on aarch64. The system was running out of memory, and the compiler was aborting with an ICE:

g++: internal compiler error: Killed (program cc1plus)

I've raised the issue on the gcc list ("gcc behavior on memory exhaustion"), and people are looking at producing a better error message indicating that out of memory may be the cause. The remaining issue is what can be done about the memory usage of the aarch64 version of gcc, which is much worse than on arm and x64.

--------------------------------------------------------------------------
gcc on aarch64 uses 3x the memory of arm, and is 2.2x slower at compiling.
This is apparent at -Os, -O2 and -O3.
--------------------------------------------------------------------------

I've cut my program down and made it system-independent (testmap.cpp, attached). The program consists of two functions, one of which populates a multimap with 2400 inserts. Basically it's just doing:

#include <map>

typedef std::multimap<unsigned int, std::string> EnumMap_t;
static EnumMap_t EnumMap;
...
EnumMap.insert(EnumMap_t::value_type(0u, "0"));
EnumMap.insert(EnumMap_t::value_type(1u, "1"));
...
EnumMap.insert(EnumMap_t::value_type(2399u, "2399"));
...

I've built this across x64 (Ryzen), arm (Raspberry Pi 3), and aarch64 (Raspberry Pi 3 and Odroid-C2). x64 is on Fedora; the rest are on Arch Linux ARM. The Raspberry Pis have 1 GB RAM, the Odroid 2 GB, and the x64 machine 32 GB.
Compiling this single file exhausts most of the RAM on the Raspberry Pi, and thus any parallel builds fail, or slow right down if a swap file is used.

I've attached log files for builds at -O0, -O1, -O2, -O3 and -Os on all the systems, using gcc 5.4.0, 6.4.0, 7.2.0rc2 and 8.0.0 snapshot. Here is a summary of the results. All built using:

gcc -Ox -c testmap.cpp

where -Ox is one of -O0, -Os, -O1, -O2, -O3.

Memory Usage (KB)

-O0          5.4.0   6.4.0   7.2rc2  8.0.0
x64          223676  223688  223736  223728
arm          156204  156336  156336  156292
pi aarch64   224324  224596  224424  224572
od aarch64   217492  217604  217492  217540

-Os          5.4.0   6.4.0   7.2rc2  8.0.0
x64          392448  392512  392688  392680
arm          205724  205792  205896  205664
pi aarch64   422520  422636  422208  422604  <= Higher than x64, 2x arm
od aarch64   416776  416260  416684  416708  <= Higher than x64, 2x arm

-O1          5.4.0   6.4.0   7.2rc2  8.0.0
x64          394596  394568  394352  394232
arm          319976  319896  319900  319840
pi aarch64   393944  393996  393836  394000
od aarch64   391636  391652  391636  391640

-O2          5.4.0   6.4.0   7.2rc2  8.0.0
x64          628816  628972  628772  628896
arm          267832  267860  267716  267836
pi aarch64   815260  784288  799196  812504  <= Higher than x64, 3x arm
od aarch64   813252  813068  813052  813084  <= Higher than x64, 3x arm

-O3          5.4.0   6.4.0   7.2rc2  8.0.0
x64          629284  629472  629116  629236
arm          266364  266264  266240  266412
pi aarch64   724168  723760  724000  724148  <= Higher than x64, 2.7x arm
od aarch64   718628  718388  718608  718608  <= Higher than x64, 2.7x arm

It's a similar story with compile times.
I'll just compare apples with apples here (identical hardware, just arm vs aarch64 distribution/compiler):

Compile Times (min:sec)

-Os          5.4.0    6.4.0    7.2rc2   8.0.0
arm          3:05.82  3:06.41  3:03.30  3:05.58
pi aarch64   5:59.43  6:07.95  6:04.69  5:55.98  <= 2.0x arm

-O3          5.4.0    6.4.0    7.2rc2   8.0.0
arm          2:14.83  2:15.77  2:14.87  2:15.94
pi aarch64   5:02.46  5:02.44  5:02.47  5:02.46  <= 2.2x arm

Both arm and aarch64 versions are using the same binutils:

GNU ld (GNU Binutils) 2.28.0.20170506

I built the compilers myself, using the same options for all versions:

ARM:
/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0 --program-suffix= --disable-werror --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-gnu-indirect-function --enable-lto --with-isl --enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu --disable-libstdcxx-pch --enable-install-libiberty --disable-multilib --disable-libssp --host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20170806 (experimental) (GCC)

AARCH64:
/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/aarch64-unknown-linux-gnu/8.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0 --program-suffix= --disable-werror --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-gnu-indirect-function --enable-lto --with-isl --enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu --disable-libstdcxx-pch --enable-install-libiberty --disable-multilib --enable-shared --enable-clocale=gnu --with-arch-directory=aarch64 --enable-multiarch --disable-libssp --host=aarch64-unknown-linux-gnu --build=aarch64-unknown-linux-gnu --with-arch=armv8-a --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20170806 (experimental) (GCC)

I'm happy to test additional things if you would like to suggest them, for example testing individual optimizations. Advice on how to proceed would be appreciated.