------- Comment #43 from pthaugen at gcc dot gnu dot org 2008-04-30 18:49 ------- Created an attachment (id=15553) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15553&action=view) Testcase
I tried a mainline with the latest patch. While we no longer have problems with the prior testcases, there is no improvement for leslie3d on ppc64. I can still double the performance of the benchmark by specifying --param max-aliased-vops=10000. Including a new trimmed down testcase from the benchmark where I'm still seeing poor code when max-aliased-vops is not increased, compiled with 'gfortran -m32 -O2'. Refer to the first nested loop in procedure FLUXK(): DO I = I1, I2 QS(I) = WAV(I,J,K) * ZAREA END DO Base: .L150: lwz 0,24(18) # <variable>.stride, <variable>.stride lwz 9,36(18) # <variable>.stride, <variable>.stride lwz 11,12(18) # <variable>.stride, <variable>.stride lwz 10,4(18) # wav.offset, wav.offset mullw 0,17,0 # tmp660, ivtmp.602, <variable>.stride lwz 8,0(18) # wav.data, wav.data mullw 9,30,9 # tmp666, ivtmp.590, <variable>.stride mullw 11,6,11 # tmp670, i, <variable>.stride add 0,0,9 # tmp672, tmp660, tmp666 addi 6,6,1 # i, i, add 0,0,11 # tmp673, tmp672, tmp670 add 0,0,10 # tmp674, tmp673, wav.offset slwi 0,0,3 # tmp676, tmp674, lfdx 0,8,0 #, tmp678 fmul 0,0,8 # tmp679, tmp678, zarea.64 stfdx 0,15,7 #* ivtmp.589, tmp679 addi 7,7,8 # ivtmp.589, ivtmp.589, bdnz .L150 # With --param max-aliased-vops=1000: .L150: lfd 0,0(11) #* ivtmp.599, tmp739 add 11,11,30 # ivtmp.599, ivtmp.599, D.2783 fmul 0,0,8 # tmp740, tmp739, zarea.64 stfdx 0,22,9 #* ivtmp.602, tmp740 addi 9,9,8 # ivtmp.602, ivtmp.602, bdnz .L150 # -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921