https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119702
Avinash Jayakar <avinashd at linux dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avinashd at linux dot ibm.com

--- Comment #2 from Avinash Jayakar <avinashd at linux dot ibm.com> ---
I am looking into this issue. As Peter mentioned, the following issue

    2) vextsb2d is not required as PowerPC has a modulo shift. It does not
       matter if additional bytes are set with shift amount.

is no longer present on trunk.

I just wanted to understand the optimization opportunity a little better.
This change is mainly to optimize code size rather than execution time,
right? I think a splat and shift performs about the same as an add. I ran
a small benchmark with the 2 variants and saw very little difference in
actual execution time.

Here is the synthetic benchmark:

    int main()
    {
      unsigned long long a[2];
      a[0] = 1;
      a[1] = 2;
      for (long i = 0; i < 1e10; i++)
        lshift1((unsigned long long *)&a);
      printf("%llu\n", a[1]); // don't optimize away the loop
    }

Should the same behaviour also apply to the following code?

    1. a[0] *= 2; a[1] *= 2;
    2. a[0] += a[0]; a[1] += a[1];

All of these emit the same left shift by 1 instruction with current GCC
trunk.
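
For reference, the three source forms being compared could be written out
roughly as below. This is only a sketch: the function names (lshift1_shift,
lshift1_mul, lshift1_add, check_variants) are made up for illustration, and
the body of the shift variant is an assumption, since this comment does not
show the definition of lshift1. Because the element type is unsigned, all
three forms compute the same value modulo 2^64, which is why the compiler is
free to pick one instruction sequence for all of them.

```c
#include <assert.h>

/* Assumed shape of the shift variant: shift both elements left by 1. */
void lshift1_shift(unsigned long long *a)
{
  a[0] <<= 1;
  a[1] <<= 1;
}

/* Variant 1 from the comment: multiply by 2. */
void lshift1_mul(unsigned long long *a)
{
  a[0] *= 2;
  a[1] *= 2;
}

/* Variant 2 from the comment: add each element to itself. */
void lshift1_add(unsigned long long *a)
{
  a[0] += a[0];
  a[1] += a[1];
}

/* Hypothetical helper: run all three on the same input and check that the
   results agree. Unsigned overflow wraps, so this also holds when the top
   bit is set. */
void check_variants(unsigned long long x0, unsigned long long x1)
{
  unsigned long long s[2] = { x0, x1 };
  unsigned long long m[2] = { x0, x1 };
  unsigned long long d[2] = { x0, x1 };

  lshift1_shift(s);
  lshift1_mul(m);
  lshift1_add(d);

  assert(s[0] == m[0] && m[0] == d[0]);
  assert(s[1] == m[1] && m[1] == d[1]);
}
```

Compiling the three functions with -O2 -S and comparing the assembly is one
way to check, on a given compiler revision, whether they really produce the
same code.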