http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55147



--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-10-31 16:07:11 UTC ---

For the testcase from this PR it actually creates better assembly (compared to
the result with the #c1 patch; without that patch the code is both longer and
wrong).  That is because when bswapdi is split too late, nothing takes
advantage of the fact that only 32 bits of the result are used.
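
To illustrate the "only 32 bits of the result are used" case, here is a
minimal sketch.  It is not the testcase from this PR, and the function names
are hypothetical.  It only shows that truncating a 64-bit byte swap to its low
32 bits is equivalent to byte-swapping the high 32 bits of the input, which is
the simplification a late split can miss:

```c
#include <stdint.h>

/* Hypothetical illustration, not the PR testcase: only the low 32 bits
   of the 64-bit byte swap are kept.  */
uint32_t
low_half_of_bswap64 (uint64_t x)
{
  return (uint32_t) __builtin_bswap64 (x);
}

/* Equivalent form: byte-swap just the high 32 bits of the input.  */
uint32_t
bswap32_of_high_half (uint64_t x)
{
  return __builtin_bswap32 ((uint32_t) (x >> 32));
}
```

Both functions return the same value for every input, so a single 32-bit
byte swap is enough once the optimizers can see that the upper half of the
result is unused.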



For

unsigned long long
f1 (unsigned long long *p, int i)
{
  return __builtin_bswap64 (p[i]);
}

unsigned long long
f2 (unsigned long long p)
{
  return __builtin_bswap64 (p);
}

void
f3 (unsigned long long *p, int i, unsigned long long q)
{
  p[i] = __builtin_bswap64 (q);
}

void
f4 (unsigned long long *p, int i, unsigned long long *q)
{
  p[i] = __builtin_bswap64 (q[i]);
}



it creates the same number of insns with the same quality (just slightly
different RA decisions/scheduling) for f1-f3, but for f4 it creates slightly
worse code without bswapdi2: with bswapdi2, f4 needs just one call-saved
register, without it two, supposedly because both bswap insns are scheduled
together.
