Greetings All,
I am comparing compiler optimizations between the arm64 and Intel x86-64
platforms, using g++ (version 4.9.4).

We compile the following C++ code:

uint32 Witness::getEntityVolatileDataUpdateFlags(Entity* otherEntity)
{
         uint32 flags = UPDATE_FLAG_NULL;

         if (otherEntity->controlledBy() &&
             pEntity_->id() == otherEntity->controlledBy()->id())
                   return flags;

         ...
}

This function has successive memory loads at its entry, where the later load
depends on the result of the earlier one.
When compiling for the Intel x86-64 platform, we find that g++ places one load
instruction ahead of the stack push instructions at function entry. On an
out-of-order processor, this can hide some of the time spent waiting for the
earlier load to finish.
We use these optimization options: -O1, -fpartial-inlining, -fschedule-insns2,
-ftree-pre.
On the arm64 platform, we use the same optimization options to compile the same
code and do not see a similar result: no load instruction is placed before the
stack push instructions. We do not get that result with -O2, -O3, or -Ofast on
arm64 either.
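
For anyone who wants to look at the generated assembly, a self-contained
version might look like the sketch below. The Entity and Witness definitions,
the uint32 typedef, the flag values, and UPDATE_FLAG_PLACEHOLDER are all
assumptions made up for illustration, not the real classes, and the elided
part of the function is replaced by a placeholder, so the prologue the
compiler emits for it may differ from what we see with the real code.

```cpp
#include <stdint.h>

typedef uint32_t uint32;                      // assumed typedef

const uint32 UPDATE_FLAG_NULL = 0;            // assumed value
const uint32 UPDATE_FLAG_PLACEHOLDER = 1;     // hypothetical, stands in for the elided code

// Hypothetical minimal Entity: only the two accessors used by the function.
class Entity
{
public:
         Entity(uint32 id, Entity* controller)
                   : id_(id), controlledBy_(controller) {}

         uint32 id() const { return id_; }
         Entity* controlledBy() const { return controlledBy_; }

private:
         uint32 id_;
         Entity* controlledBy_;
};

// Hypothetical minimal Witness holding the pEntity_ pointer.
class Witness
{
public:
         explicit Witness(Entity* pEntity) : pEntity_(pEntity) {}

         uint32 getEntityVolatileDataUpdateFlags(Entity* otherEntity);

private:
         Entity* pEntity_;
};

uint32 Witness::getEntityVolatileDataUpdateFlags(Entity* otherEntity)
{
         uint32 flags = UPDATE_FLAG_NULL;

         // Dependent loads: otherEntity->controlledBy_ is loaded first, and
         // the id_ load through that pointer cannot start until it completes.
         if (otherEntity->controlledBy() &&
             pEntity_->id() == otherEntity->controlledBy()->id())
                   return flags;

         // Placeholder for the elided remainder of the real function.
         flags |= UPDATE_FLAG_PLACEHOLDER;
         return flags;
}
```

Saved as, say, witness.cpp (a hypothetical file name), this can be compiled to
assembly with g++ -O1 -fpartial-inlining -fschedule-insns2 -ftree-pre -S
witness.cpp on each platform to compare where the loads land.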

Does g++ have a similar optimization on arm64?
If yes, which optimization options can I use?

Thanks,
-Nianyao Tang
