On Wed, Jun 15, 2011 at 11:06 PM, Fang, Changpeng
<changpeng.f...@amd.com> wrote:
>>I have no problem with -mtune=Bulldozer.  But I object to the -mtune=generic
>>change and did suggest a different approach for -mtune=generic.
>
> Something must be broken in the unaligned load splitting for generic
> mode.
>
> While splitting unaligned loads for -mtune=bdver1 costs us 1.3% in the
> CFP2006 geomean, splitting unaligned loads in generic mode is KILLING us:
>
> For 459.GemsFDTD (ref) on Bulldozer:
>   -Ofast -mavx -mno-avx256-split-unaligned-load:   480s
>   -Ofast -mavx:                                   2527s
>
> So splitting unaligned loads makes the program run more than 5 times slower!
>
> For 434.zeusmp (train run):
>   -Ofast -mavx -mno-avx256-split-unaligned-load:  32.5s
>   -Ofast -mavx:                                    106s
>
> Other tests are on-going!
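
As a quick check on the figures quoted above, the slowdown ratios follow directly from the reported wall-clock times. This small Python sketch is only an illustration (the names `TIMINGS` and `slowdown` are hypothetical, and the numbers are exactly the ones in the message); it puts GemsFDTD at roughly 5.3x and zeusmp at roughly 3.3x slower when unaligned loads are split.

```python
# Wall-clock times in seconds, copied from the message above.
# "no_split" = -Ofast -mavx -mno-avx256-split-unaligned-load
# "split"    = -Ofast -mavx (splitting enabled)
TIMINGS = {
    "459.GemsFDTD (ref)": {"no_split": 480.0, "split": 2527.0},
    "434.zeusmp (train)": {"no_split": 32.5, "split": 106.0},
}


def slowdown(no_split: float, split: float) -> float:
    """Return how many times longer the split-load build takes."""
    return split / no_split


for name, t in TIMINGS.items():
    # Prints 5.26x for GemsFDTD and 3.26x for zeusmp.
    print(f"{name}: {slowdown(t['no_split'], t['split']):.2f}x slower with splitting")
```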

I suspect that the split loads get further split into mov[lh]ps pieces?
We do that for SSE moves with generic tuning at least IIRC.

Richard.

>
> Changpeng.