http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58879
Bug ID: 58879
Summary: PPC: Missed opportunity to use lwbrx
Product: gcc
Version: 4.7.3
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: marcus at mc dot pp.se

Hi.

Please consider the following function, compiled on PPC (32 bit):

    uint32_t swap32(uint32_t *in)
    {
    #if 1
      uint8_t a[] = {
        ((*in) & (uint32_t)0xff000000UL)>>24,
        ((*in) & (uint32_t)0x00ff0000UL)>>16,
        ((*in) & (uint32_t)0x0000ff00UL)>>8,
        (*in) & (uint32_t)0x000000ffUL,
      };
    #else
      const uint8_t *a = (uint8_t *)in;
    #endif
      uint32_t r = (a[0]) | (a[1] << 8) | (a[2] << 16) | (a[3] << 24);
      return r;
    }

With the code in the #if branch, this results in a single lwbrx
instruction. However, with the code in the #else branch it does not
(I get lbz + slwi + or instead).

Why the uint8_t pointer? Well, my real code is a C++ template
containing the following:

    uint8_t data[nBytes];

    T getValue() const
    {
      T v = 0;
      int i;
      for (i=0; i<nBytes; i++)
        v |= data[i]<<(i*8);
      return v;
    }

If nBytes happens to be 4 in a particular instantiation of the
template, then this collapses beautifully into a single movl
instruction on AMD64. So I think I'm not being totally unreasonable
in hoping for an lwbrx on PPC (or lwz, if -mlittle is in effect),
provided strict alignment is not required of course.

I don't know how difficult it would be to make this work, but given
that byte array reassembly -> word load already works on AMD64, and
reverse order reassembly already can give an lwbrx at least _sometimes_
on PPC, it seems like it would be feasible at least. And it would be
a neat trick to get efficient code from portable source, without a lot
of #ifdefs and __builtin_whatevers. :-)

Thanks for listening

// Marcus
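
For anyone who wants to try the template case, here is a minimal self-contained sketch built around the getValue() fragment. The wrapper name ByteField and its template parameter list are assumptions made for illustration; only data[] and getValue() come from the report. It also casts each byte to T before shifting, which the fragment in the report does not do, so that instantiations wider than int stay well-defined.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper type: the name ByteField and the <T, nBytes>
// parameter list are assumptions, not taken from the original code.
template <typename T, std::size_t nBytes>
struct ByteField {
    std::uint8_t data[nBytes];

    // Reassemble a value from bytes stored least significant first.
    T getValue() const
    {
        T v = 0;
        for (std::size_t i = 0; i < nBytes; i++) {
            // Cast before shifting (a deviation from the report) so the
            // shift is done in T rather than in a promoted int.
            v |= static_cast<T>(data[i]) << (i * 8);
        }
        return v;
    }
};

// Explicit instantiation of the 4-byte case discussed in the report,
// so the generated code can be inspected with -S.
template struct ByteField<std::uint32_t, 4>;
```

Compiling this with optimization and -S shows what the 4-byte instantiation turns into on a given target. Whether the loop becomes a single load depends on the target and the compiler version.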