In my change on November 24th (adding support for scalar floating point values in Altivec registers), there was a regression when Spec 2000 was compiled for 32-bit big endian power7 systems. The rs6000_legitimize_reload_address function generated reg+offset (LO_SUM) addresses for scalar values, and in some 32-bit cases the compiler then used these addresses to load constants. This patch no longer creates the optimized address for scalar types that can go in Altivec registers, since those registers have no d-form (reg+offset) addressing. Without the 'optimized' address, reload falls back to trying other options, and it eventually generates an lfd instruction with an offset instead of an lxsdx.
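To make the new condition concrete, here is a minimal, hypothetical C model of the decision. The names model_mode, model_reg_addr and model_lo_sum_offset_ok are invented for illustration and are not GCC's real data structures. The per-mode scalar_in_vmx_p values are assumed, not taken from the rs6000 port, and TFmode alone stands in for the other modes the real condition already excludes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's machine modes; illustration only.  */
enum model_mode
{
  MODEL_DFmode,
  MODEL_SFmode,
  MODEL_DImode,
  MODEL_TFmode,
  MODEL_NUM_MODES
};

/* Hypothetical per-mode address info, loosely shaped like reg_addr[].  */
struct model_addr_info
{
  bool scalar_in_vmx_p;   /* Can a scalar of this mode live in an Altivec register?  */
};

/* Assumed values, chosen for illustration only.  */
static const struct model_addr_info model_reg_addr[MODEL_NUM_MODES] = {
  [MODEL_DFmode] = { true },
  [MODEL_SFmode] = { false },
  [MODEL_DImode] = { false },
  [MODEL_TFmode] = { false },
};

/* Return true if reload may be offered a LO_SUM (reg+offset) address for a
   constant of mode M.  Modes that may be allocated to Altivec registers are
   rejected, because those registers have no d-form (reg+offset) loads; TFmode
   models the exclusions that existed before the patch.  */
static bool
model_lo_sum_offset_ok (enum model_mode m)
{
  assert (m < MODEL_NUM_MODES);
  return !model_reg_addr[m].scalar_in_vmx_p && m != MODEL_TFmode;
}
```

With these assumed settings, model_lo_sum_offset_ok (MODEL_DFmode) returns false, so reload would not be offered the LO_SUM address and would fall back to its other alternatives.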
I have bootstrapped these patches on big endian power7, big endian power8, and little endian power8 systems, and there were no regressions. Is the patch OK to install?

[gcc]
2014-12-01  Michael Meissner  <meiss...@linux.vnet.ibm.com>

        PR target/64019
        * config/rs6000/rs6000.c (rs6000_legitimize_reload_address): Do
        not create LO_SUM address for constant addresses if the type can
        go in Altivec registers.

[gcc/testsuite]
2014-12-01  Michael Meissner  <meiss...@linux.vnet.ibm.com>

        PR target/64019
        * gcc.target/powerpc/pr64019.c: New file.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 218090)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -7593,7 +7593,11 @@ rs6000_legitimize_reload_address (rtx x,
 	 naturally aligned.  Since we say the address is good here, we
 	 can't disable offsets from LO_SUMs in mem_operand_gpr.
 	 FIXME: Allow offset from lo_sum for other modes too, when
-	 mem is sufficiently aligned.  */
+	 mem is sufficiently aligned.
+
+	 Also disallow this if the type can go in VMX/Altivec registers, since
+	 those registers do not have d-form (reg+offset) address modes.  */
+      && !reg_addr[mode].scalar_in_vmx_p
       && mode != TFmode
       && mode != TDmode
       && (mode != TImode || !TARGET_VSX_TIMODE)
Index: gcc/testsuite/gcc.target/powerpc/pr64019.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr64019.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr64019.c	(revision 0)
@@ -0,0 +1,71 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-O2 -ffast-math -mcpu=power7" } */
+
+#include <math.h>
+
+typedef struct
+{
+  double x, y, z;
+  double q, a, b, mass;
+  double vx, vy, vz, vw, dx, dy, dz;
+}
+ATOM;
+int
+u_f_nonbon (lambda)
+     double lambda;
+{
+  double r, r0, xt, yt, zt;
+  double lcutoff, cutoff, get_f_variable ();
+  double rdebye;
+  int inbond, inangle, i;
+  ATOM *a1, *a2, *bonded[10], *angled[10];
+  ATOM *(*use)[];
+  int uselist (), nuse, used;
+  ATOM *cp, *bp;
+  int a_number (), inbuffer;
+  double (*buffer)[], xx, yy, zz, k;
+  int invector, atomsused, ii, jj, imax;
+  double (*vector)[];
+  ATOM *(*atms)[];
+  double dielectric;
+  rdebye = cutoff / 2.;
+  dielectric = get_f_variable ("dielec");
+  imax = a_number ();
+  for (jj = 1; jj < imax; jj++, a1 = bp)
+    {
+      if ((*use)[used] == a1)
+	{
+	  used += 1;
+	}
+      while ((*use)[used] != a1)
+	{
+	  for (i = 0; i < inbuffer; i++)
+	    {
+	    }
+	  xx = a1->x + lambda * a1->dx;
+	  yy = a1->y + lambda * a1->dy;
+	  zz = a1->z + lambda * a1->dz;
+	  for (i = 0; i < inbuffer; i++)
+	    {
+	      xt = xx - (*buffer)[3 * i];
+	      yt = yy - (*buffer)[3 * i + 1];
+	      zt = zz - (*buffer)[3 * i + 2];
+	      r = xt * xt + yt * yt + zt * zt;
+	      r0 = sqrt (r);
+	      xt = xt / r0;
+	      zt = zt / r0;
+	      k =
+		-a1->q * (*atms)[i]->q * dielectric * exp (-r0 / rdebye) *
+		(1. / (rdebye * r0) + 1. / r);
+	      k += a1->a * (*atms)[i]->a / r / r0 * 6;
+	      k -= a1->b * (*atms)[i]->b / r / r / r0 * 12;
+	      (*vector)[3 * i] = xt * k;
+	      (*vector)[3 * i + 1] = yt * k;
+	      (*vector)[3 * i + 2] = zt * k;
+	    }
+	}
+    }
+}
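The distinction the new rs6000.c comment relies on can be sketched as two effective-address helpers, assuming the Power ISA definitions for D-form loads such as lfd (EA = (RA|0) + EXTS(D)) and indexed reg+reg loads such as lxsdx (EA = (RA|0) + (RB)). The function names and the gpr array are hypothetical, and the sketch ignores the truncation of the effective address to 32 bits in 32-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* D-form, e.g. lfd FRT,D(RA): EA = (RA|0) + EXTS(D), where D is a signed
   16-bit displacement encoded in the instruction itself.  */
static uint64_t
ea_d_form (const uint64_t gpr[32], unsigned ra, int16_t d)
{
  assert (ra < 32);
  uint64_t base = (ra == 0) ? 0 : gpr[ra];   /* RA = 0 means "use 0".  */
  return base + (uint64_t) (int64_t) d;      /* Sign-extend D.  */
}

/* Indexed form, e.g. lxsdx XT,RA,RB: EA = (RA|0) + (RB).  There is no
   displacement field, so an offset must first be placed in a register.  */
static uint64_t
ea_x_form (const uint64_t gpr[32], unsigned ra, unsigned rb)
{
  assert (ra < 32 && rb < 32);
  uint64_t base = (ra == 0) ? 0 : gpr[ra];
  return base + gpr[rb];
}
```

A LO_SUM address such as "base register plus constant offset" maps directly onto ea_d_form, but an lxsdx-style load can only express it once the offset has been moved into a second register.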