On 19/12/18 09:10, Tangnianyao (ICT) wrote:
> Greetings All,
> I am comparing compiler optimization between the arm64 and Intel
> platforms, using g++ (version 4.9.4).
> 
> We compile the following C++ code,
> 
> uint32 Witness::getEntityVolatileDataUpdateFlags(Entity* otherEntity)
> {
>          uint32 flags = UPDATE_FLAG_NULL;
> 
> 
>          if (otherEntity->controlledBy() &&
>              pEntity_->id() == otherEntity->controlledBy()->id())
>                    return flags;
> 
>          ...
> }
> 
> The code begins with successive memory loads at function entry, where the
> later load depends on the earlier one.
> On the Intel x86-64 platform, we find that g++ places one load
> instruction ahead of the stack push instructions at function entry. This
> can hide some of the time spent waiting for the earlier load to finish on
> an out-of-order processor. We use the optimization options -O1,
> -fpartial-inlining, -fschedule-insns2, and -ftree-pre.
> On the arm64 platform, we compile the same code with the same options
> and do not see a similar result: no load instruction is placed before
> the stack push instructions. Compiling with -O2, -O3, or -Ofast on arm64
> does not produce that result either.
> 
> Is there a similar optimization in g++ for arm64?
> If so, which optimization options should I use?
> 

I think this is the wrong list - you probably want the gcc-help list.

Have you tried using a more recent version of gcc?  AArch64 was new in
gcc 4.8 - major new architectures usually take a few versions before
the compiler generates well-optimised code for them.
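
When asking on gcc-help, a reduced, self-contained test case usually makes it easier to compare compiler versions and targets. The following is only a sketch: the original class definitions were not posted, so the Entity and Witness layouts, the value of UPDATE_FLAG_NULL, and the UPDATE_FLAG_OTHER placeholder for the elided part of the function are all assumptions made for illustration.

```cpp
#include <cstdint>

typedef std::uint32_t uint32;

// Assumed values; the real constants were not shown in the original post.
static const uint32 UPDATE_FLAG_NULL = 0;
static const uint32 UPDATE_FLAG_OTHER = 1;  // hypothetical stand-in for the elided logic

// Hypothetical minimal layout, just enough to reproduce the dependent loads.
struct Entity {
    int id_;
    Entity* controlledBy_;

    int id() const { return id_; }
    Entity* controlledBy() const { return controlledBy_; }
};

struct Witness {
    Entity* pEntity_;

    uint32 getEntityVolatileDataUpdateFlags(Entity* otherEntity);
};

// noinline keeps the function out of line so its entry sequence
// is visible in the generated assembly.
__attribute__((noinline))
uint32 Witness::getEntityVolatileDataUpdateFlags(Entity* otherEntity)
{
    uint32 flags = UPDATE_FLAG_NULL;

    // Dependent loads: otherEntity->controlledBy_ must be loaded
    // before the id_ of the entity it points to can be loaded.
    if (otherEntity->controlledBy() &&
        pEntity_->id() == otherEntity->controlledBy()->id())
        return flags;

    return UPDATE_FLAG_OTHER;
}
```

Compiling this with -S and the same options on each target (for example -O1 -fpartial-inlining -fschedule-insns2 -ftree-pre) gives assembly output that can be compared directly. A function this small may need no stack pushes at all, so the real function body may have to be partly restored before the difference shows up.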

