http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58879

            Bug ID: 58879
           Summary: PPC: Missed opportunity to use lwbrx
           Product: gcc
           Version: 4.7.3
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marcus at mc dot pp.se

Hi.

Please consider the following function, compiled on PPC (32 bit):

#include <stdint.h>

uint32_t swap32(uint32_t *in)
{
#if 1
  uint8_t a[] = {
    ((*in) & (uint32_t)0xff000000UL)>>24,
    ((*in) & (uint32_t)0x00ff0000UL)>>16,
    ((*in) & (uint32_t)0x0000ff00UL)>>8,
    (*in) & (uint32_t)0x000000ffUL,
  };
#else
  const uint8_t *a = (uint8_t *)in;
#endif
  uint32_t r =
    (a[0]) |
    (a[1] << 8) |
    (a[2] << 16) |
    (a[3] << 24);

  return r;
}

With the code in the #if branch, GCC emits a single lwbrx instruction.
With the code in the #else branch, which computes the same value on a
big-endian target, it does not: I get a sequence of lbz, slwi and or
instructions instead.
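
To make the comparison easier to reproduce, here is a sketch with the two
branches split into separate functions. The names swap32_masks, swap32_bytes,
word_with_bytes and check_variants are hypothetical and not part of the
original test case. The casts to uint32_t are an addition as well: they keep
the shifts from operating on a promoted int. On a big-endian target both
functions return the same value; on a little-endian host the second one
returns the word unchanged, so the checks are written to hold on either.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Variant A (the "#if 1" branch): split the loaded word with masks, then
// put the bytes back in reverse order. This is a byte swap of *in.
uint32_t swap32_masks(const uint32_t *in)
{
  const uint8_t a[4] = {
    static_cast<uint8_t>((*in & 0xff000000u) >> 24),
    static_cast<uint8_t>((*in & 0x00ff0000u) >> 16),
    static_cast<uint8_t>((*in & 0x0000ff00u) >> 8),
    static_cast<uint8_t>( *in & 0x000000ffu),
  };
  return  static_cast<uint32_t>(a[0])        |
         (static_cast<uint32_t>(a[1]) << 8)  |
         (static_cast<uint32_t>(a[2]) << 16) |
         (static_cast<uint32_t>(a[3]) << 24);
}

// Variant B (the "#else" branch): read the four bytes in memory order and
// assemble them as a little-endian value.
uint32_t swap32_bytes(const uint32_t *in)
{
  const uint8_t *a = reinterpret_cast<const uint8_t *>(in);
  return  static_cast<uint32_t>(a[0])        |
         (static_cast<uint32_t>(a[1]) << 8)  |
         (static_cast<uint32_t>(a[2]) << 16) |
         (static_cast<uint32_t>(a[3]) << 24);
}

// Hypothetical helper: build a word whose in-memory bytes are b0..b3.
uint32_t word_with_bytes(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
  const uint8_t b[4] = { b0, b1, b2, b3 };
  uint32_t w;
  std::memcpy(&w, b, sizeof w);
  return w;
}

// Checks that hold whatever the byte order of the host.
void check_variants()
{
  uint32_t x = 0x11223344u;
  assert(swap32_masks(&x) == 0x44332211u);
  uint32_t w = word_with_bytes(0x44, 0x33, 0x22, 0x11);
  assert(swap32_bytes(&w) == 0x11223344u);
}
```

Compiling each function on its own with optimisation enabled and inspecting
the assembly should show whether the lwbrx is used for one, both or neither.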

Why the uint8_t pointer?  Well, my real code is a C++ template containing the
following:

  uint8_t data[nBytes];
  T getValue() const {
    T v = 0;
    int i;
    for (i=0; i<nBytes; i++)
      v |= data[i]<<(i*8);
    return v;
  }

If nBytes happens to be 4 in a particular instantiation of the template, then
this collapses beautifully into a single movl instruction on AMD64.  So I think
I'm not being totally unreasonable in hoping for a lwbrx on PPC (or lwz, if
-mlittle is in effect), provided strict alignment is not required of course.
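
For anyone who wants to try this, a minimal self-contained sketch of such a
template follows. The struct name ByteField, the template parameter list, the
static_assert and check_byte_field are assumptions made for illustration; only
the data member and the getValue() loop come from the fragment above. The
sketch also assumes T is an unsigned integer type, and it widens each byte to
T before shifting, which the fragment above does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper around the getValue() loop.
template <typename T, std::size_t nBytes>
struct ByteField {
  static_assert(nBytes <= sizeof(T), "T must be wide enough for nBytes");

  uint8_t data[nBytes];

  // Reassemble the stored bytes as a little-endian value of type T.
  T getValue() const {
    T v = 0;
    for (std::size_t i = 0; i < nBytes; i++)
      v |= static_cast<T>(static_cast<T>(data[i]) << (i * 8));
    return v;
  }
};

// The 4-byte instantiation is the one hoped to become a single load.
void check_byte_field()
{
  ByteField<uint32_t, 4> f = {{ 0x78, 0x56, 0x34, 0x12 }};
  assert(f.getValue() == 0x12345678u);
  ByteField<uint16_t, 2> g = {{ 0xcd, 0xab }};
  assert(g.getValue() == 0xabcdu);
}
```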

I don't know how difficult it would be to make this work, but given that byte
array reassembly -> word load already works on AMD64, and reverse order
reassembly already can give a lwbrx at least _sometimes_ on PPC, it seems like
it would be feasible at least.  And it would be a neat trick to get efficient
code from portable source, without a lot of #ifdefs and __builtin_whatevers. 
:-)
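
For contrast, this is roughly what the non-portable alternative looks like
next to the portable loop. It is only a sketch: the function names are
hypothetical, and it assumes a GCC-compatible compiler that provides
__builtin_bswap32 and the __BYTE_ORDER__ / __ORDER_BIG_ENDIAN__ macros, and a
target that is either big- or little-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable form: assemble the value byte by byte and leave the choice of
// instruction to the compiler.
uint32_t load_le32_portable(const uint8_t *p)
{
  return  static_cast<uint32_t>(p[0])        |
         (static_cast<uint32_t>(p[1]) << 8)  |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Compiler-specific form: load the word with memcpy, then swap it by hand
// on big-endian targets.
uint32_t load_le32_builtin(const uint8_t *p)
{
#if !defined(__BYTE_ORDER__)
#error "this sketch relies on the __BYTE_ORDER__ macro"
#endif
  uint32_t w;
  std::memcpy(&w, p, sizeof w);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  w = __builtin_bswap32(w);
#endif
  return w;
}

// Both forms should agree on every input.
void check_loads_agree()
{
  const uint8_t b[4] = { 0x78, 0x56, 0x34, 0x12 };
  assert(load_le32_portable(b) == 0x12345678u);
  assert(load_le32_builtin(b) == load_le32_portable(b));
}
```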

Thanks for listening

  // Marcus
