On 19/12/18 09:10, Tangnianyao (ICT) wrote:
> Greetings All,
> I am comparing compiler optimizations between the arm64 and Intel
> platforms, using g++ (version 4.9.4).
>
> Consider the following C++ code,
>
> uint32 Witness::getEntityVolatileDataUpdateFlags(Entity* otherEntity)
> {
>     uint32 flags = UPDATE_FLAG_NULL;
>
>     if (otherEntity->controlledBy() &&
>         pEntity_->id() == otherEntity->controlledBy()->id())
>         return flags;
>
>     ...
> }
>
> which performs successive loads at the entry of the function, where the
> second load depends on the result of the first.
> When compiling on an Intel x86-64 platform, we find that g++ places one load
> instruction before the stack push instructions of the function prologue.
> On an out-of-order processor this can save some of the time spent waiting
> for the first load to complete. We used the optimization options -O1,
> -fpartial-inlining, -fschedule-insns2 and -ftree-pre.
> On arm64, we used the same optimization options to compile the same code
> and did not get a similar result: no load instruction is placed before the
> stack push instructions. Nor do we get that result when compiling on arm64
> with -O2, -O3 or -Ofast.
>
> Is there a similar optimization in g++ for arm64?
> If so, which optimization options should I use?
>
I think this is the wrong list; you probably want the gcc-help list.
Have you tried using a more recent version of gcc? AArch64 support was new
in gcc 4.8, and major new architectures usually take a few releases before
they acquire well-optimised code generation.