https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

            Bug ID: 81818
           Summary: aarch64  uses 2-3x memory and 2x time of arm at -Os,
                    -O2, -O3 (memory-hog)
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

Created attachment 41973
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41973&action=edit
System independent test program to demonstrate the issue.

I've run into problems building both gcc itself and my application on
aarch64. The system was running out of memory, and the compiler was
aborting with an ICE:

g++: internal compiler error: Killed (program cc1plus)

I've raised the issue on the gcc mailing list ("gcc behavior on memory
exhaustion"), and people are looking at producing a better error message that
indicates running out of memory may be the cause.

The remaining issue is what can be done about the memory usage of the aarch64
version of gcc, which appears much worse than on arm and x64.

--------------------------------------------------------------------------
gcc on aarch64 uses up to 3x the memory of arm and takes up to 2.2x as long
to compile. This is apparent at -Os, -O2 and -O3.
--------------------------------------------------------------------------

I've cut my program down and made it system independent (testmap.cpp,
attached).
The program consists of two functions, one of which populates a multimap
with 2400 inserts. Basically it's just doing:

#include <map>
#include <string>
typedef std::multimap<unsigned int, std::string> EnumMap_t;
static EnumMap_t EnumMap;
// ...
EnumMap.insert(EnumMap_t::value_type(0u, "0"));
EnumMap.insert(EnumMap_t::value_type(1u, "1"));
// ...
EnumMap.insert(EnumMap_t::value_type(2399u, "2399"));
// ...

I've built this on x64 (Ryzen), arm (Raspberry Pi 3) and aarch64 (Raspberry
Pi 3 and Odroid-C2). The x64 machine runs Fedora; the others run Arch Linux ARM.
The Raspberry Pis have 1 GB of RAM, the Odroid 2 GB and the x64 machine 32 GB.

Compiling this single file exhausts most of the RAM on the Raspberry Pi, so
any parallel build fails, or slows to a crawl if a swap file is used.

I've attached log files for builds at -O0, -O1, -O2, -O3 and -Os on all the
systems, using gcc 5.4.0, 6.4.0, 7.2.0rc2 and an 8.0.0 snapshot.

Here is a summary of the results. All builds used:

gcc -Ox -c testmap.cpp

where -Ox is one of -O0, -Os, -O1, -O2 or -O3.

Memory usage (KB)
-O0        5.4.0   6.4.0   7.2rc2  8.0.0
x64        223676  223688  223736  223728
arm        156204  156336  156336  156292
pi aarch64 224324  224596  224424  224572
od aarch64 217492  217604  217492  217540

-Os        5.4.0   6.4.0   7.2rc2  8.0.0
x64        392448  392512  392688  392680
arm        205724  205792  205896  205664
pi aarch64 422520  422636  422208  422604 <= Higher than x64, 2x arm
od aarch64 416776  416260  416684  416708 <= Higher than x64, 2x arm

-O1        5.4.0   6.4.0   7.2rc2  8.0.0
x64        394596  394568  394352  394232
arm        319976  319896  319900  319840
pi aarch64 393944  393996  393836  394000
od aarch64 391636  391652  391636  391640

-O2        5.4.0   6.4.0   7.2rc2  8.0.0
x64        628816  628972  628772  628896
arm        267832  267860  267716  267836
pi aarch64 815260  784288  799196  812504  <= Higher than x64, 3x arm
od aarch64 813252  813068  813052  813084  <= Higher than x64, 3x arm

-O3        5.4.0   6.4.0   7.2rc2  8.0.0
x64        629284  629472  629116  629236
arm        266364  266264  266240  266412
pi aarch64 724168  723760  724000  724148  <= Higher than x64, 2.7x arm
od aarch64 718628  718388  718608  718608  <= Higher than x64, 2.7x arm

It's a similar story with compile times (shown as min:sec). Here I'll only compare
apples with apples (identical hardware, differing only in the arm vs aarch64
distribution/compiler):

-Os        5.4.0   6.4.0   7.2rc2  8.0.0
arm        3:05.82 3:06.41 3:03.30 3:05.58
pi aarch64 5:59.43 6:07.95 6:04.69 5:55.98 <= 2.0x arm

-O3        5.4.0   6.4.0   7.2rc2  8.0.0
arm        2:14.83 2:15.77 2:14.87 2:15.94
pi aarch64 5:02.46 5:02.44 5:02.47 5:02.46 <= 2.2x arm

Both arm and aarch64 versions are using the same binutils:
GNU ld (GNU Binutils) 2.28.0.20170506

I built the compilers myself, using the same options for all versions:

ARM:
/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --host=armv7l-unknown-linux-gnueabihf
--build=armv7l-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard
--with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20170806 (experimental) (GCC)

AARCH64:
/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/aarch64-unknown-linux-gnu/8.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--enable-shared --enable-clocale=gnu --with-arch-directory=aarch64
--enable-multiarch --disable-libssp --host=aarch64-unknown-linux-gnu
--build=aarch64-unknown-linux-gnu --with-arch=armv8-a --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20170806 (experimental) (GCC)

I'm happy to test additional things if you would like to suggest them, for
example testing individual optimizations. Advice on how to proceed would
be appreciated.
