On Wed, Jun 15, 2011 at 11:06 PM, Fang, Changpeng <changpeng.f...@amd.com> wrote:
>>I have no problems on -mtune=Bulldozer.  But I object -mtune=generic
>>change and did suggest a different approach for -mtune=generic.
>
> Something must have been broken for the unaligned load splitting in generic
> mode.
>
> While we lose 1.3% on CFP2006 in geomean by splitting unaligned loads for
> -mtune=bdver1, splitting unaligned loads in generic mode is KILLING us:
>
> For 459.GemsFDTD (ref) on Bulldozer,
> -Ofast -mavx -mno-avx256-split-unaligned-load:  480s
> -Ofast -mavx                                 : 2527s
>
> So, splitting unaligned loads results in the program to run 5~6 times slower!
>
> For 434.zeusmp train run
> -Ofast -mavx -mno-avx256-split-unaligned-load: 32.5s
> -Ofast -mavx                                 :  106s
>
> Other tests are on-going!

I suspect that the split loads get further split into mov[lh]ps pieces?
We do that for SSE moves with generic tuning at least IIRC.

Richard.

>
> Changpeng.
>
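
As a quick sanity check on the "5~6 times slower" figure, the slowdown factors implied by the quoted timings can be worked out as below. This is only an illustrative Python sketch: the dictionary layout and the `slowdown` helper are made up here for the calculation, while the numbers themselves are the ones reported in the quoted message.

```python
# Wall-clock times in seconds, copied from the quoted message
# (Bulldozer, -Ofast -mavx, with and without -mno-avx256-split-unaligned-load).
# The key names "no_split" / "split" are hypothetical labels, not GCC terms.
timings = {
    "459.GemsFDTD (ref)": {"no_split": 480.0, "split": 2527.0},
    "434.zeusmp (train)": {"no_split": 32.5, "split": 106.0},
}


def slowdown(no_split, split):
    """Return how many times slower the split-load build runs."""
    return split / no_split


factors = {
    name: slowdown(t["no_split"], t["split"]) for name, t in timings.items()
}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x slower with split unaligned loads")
```

With the quoted numbers this gives roughly 5.3x for 459.GemsFDTD and roughly 3.3x for 434.zeusmp, so the GemsFDTD figure matches the "5~6 times" estimate while zeusmp is hit less severely.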