On 19/12/18 09:10, Tangnianyao (ICT) wrote:
> Greetings All,
> I am dealing with compile optimization comparison between arm64 and intel
> platform, with g++ (version 4.9.4).
>
> Compile the following c++ code,
>
> uint32 Witness::getEntityVolatileDataUpdateFlags(Entity* otherEntity)
> {
>     uint32 flags = UPDATE_FLAG_NULL;
>
>     if (otherEntity->controlledBy() && pEntity_->id() ==
>         otherEntity->controlledBy()->id())
>         return flags;
>
>     ...
> }
>
> with successive load memory operations at the entry of a function, where the
> latter load memory operation has a dependency on the former one.
> Compiling on the intel x86-64 platform, we find that g++ will put one load
> memory instruction in front of the push stack instructions of the function
> call. It can save some time waiting for the former load to finish on an
> out-of-order processor. We use these optimization options: O1,
> -fpartial-inlining, -fschedule-insns2, -ftree-pre.
> On the arm64 platform, we use the same optimization options to compile the
> same code and find that there are no similar results. No load memory
> instruction is put before the push stack instructions. Nor do we get that
> result using O2, O3, or Ofast to compile on arm64.
>
> Do we have a similar compiling optimization on arm64 g++?
> If yes, which optimization options can I use?
>

I think this is the wrong list - you probably want the gcc-help list.

Have you tried using a more recent version of gcc? AArch64 support was new in gcc 4.8 - major new architectures usually take a few versions before the compiler generates well-optimised code for them.
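
If you do try a newer gcc (or post to gcc-help), a reduced, self-contained test case makes the comparison much easier. The following is only a sketch of what that might look like. The layouts of Entity and Witness, the helper restOfFunction(), the constant UPDATE_FLAG_PLACEHOLDER and the value of UPDATE_FLAG_NULL are all made-up stand-ins, because the original post elides them. The noinline attribute is a gcc extension, used here so that the tail call cannot be inlined away.

```cpp
#include <cassert>
#include <cstdint>

using uint32 = std::uint32_t;

constexpr uint32 UPDATE_FLAG_NULL = 0;         // assumed value
constexpr uint32 UPDATE_FLAG_PLACEHOLDER = 1;  // hypothetical, not in the post

// Hypothetical stand-in for the real Entity class.
struct Entity {
    uint32 id_;
    Entity* controlledBy_;
    uint32 id() const { return id_; }
    Entity* controlledBy() const { return controlledBy_; }
};

// Hypothetical placeholder for the elided "..." part of the function.
// Kept out of line so the caller still has a real call to set up.
__attribute__((noinline))
uint32 restOfFunction(const Entity* self, const Entity* other) {
    return (self != other) ? UPDATE_FLAG_PLACEHOLDER : UPDATE_FLAG_NULL;
}

// Hypothetical stand-in for the real Witness class.
struct Witness {
    Entity* pEntity_;
    uint32 getEntityVolatileDataUpdateFlags(Entity* otherEntity);
};

uint32 Witness::getEntityVolatileDataUpdateFlags(Entity* otherEntity) {
    uint32 flags = UPDATE_FLAG_NULL;

    // Dependent loads at function entry: the controller pointer must be
    // loaded before its id can be loaded.
    if (otherEntity->controlledBy() &&
        pEntity_->id() == otherEntity->controlledBy()->id())
        return flags;

    return restOfFunction(pEntity_, otherEntity);
}

// Small sanity check so the reduced case can also be run, not just compiled.
void checkReducedCase() {
    Entity self{1, nullptr};
    Entity controlled{2, &self};     // controlled by the witness's entity
    Entity uncontrolled{3, nullptr};
    Witness w{&self};
    assert(w.getEntityVolatileDataUpdateFlags(&controlled) == UPDATE_FLAG_NULL);
    assert(w.getEntityVolatileDataUpdateFlags(&uncontrolled) ==
           UPDATE_FLAG_PLACEHOLDER);
}
```

Compiling this with -S and the same options on each target, then comparing where the first loads land relative to the prologue, should show whether a newer gcc behaves differently on AArch64.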