> >> > > We would like to propose changing AVX generic mode tuning to
> >> > > generate 128-bit AVX instead of 256-bit AVX.
> >> >
> >> > You indicate a 3% reduction on bulldozer with avx256.
> >> > How does avx128 compare to -mno-avx -msse4.2?
> >>
> >> We see these % differences going from SSE42 to AVX128 to AVX256 on
> >> Bulldozer with "-mtune=generic -Ofast".
> >> (Positive is improvement, negative is degradation)
> >>
> >> Bulldozer:
> >>                  AVX128/SSE42   AVX256/AVX-128
> >> 410.bwaves          -1.4%          -1.4%
> >> 416.gamess          -1.1%           0.0%
> >> 433.milc             0.5%          -2.4%
> >> 434.zeusmp           9.7%          -2.1%
> >> 435.gromacs          5.1%           0.5%
> >> 436.cactusADM        8.2%         -23.8%
> >> 437.leslie3d         8.1%           0.4%
> >> 444.namd             3.6%           0.0%
> >> 447.dealII          -1.4%          -0.4%
> >> 450.soplex          -0.4%          -0.4%
> >> 453.povray           0.0%          -1.5%
> >> 454.calculix        15.7%          -8.3%
> >> 459.GemsFDTD         4.9%           1.4%
> >> 465.tonto            1.3%          -0.6%
> >> 470.lbm              0.9%           0.3%
> >> 481.wrf              7.3%          -3.6%
> >> 482.sphinx3          5.0%          -9.8%
> >> SPECFP               3.8%          -3.2%
> >>
> >> > Will the next AMD generation have a useable avx256?
> >> > I'm not keen on the idea of generic mode being tuned
> >> > for a single processor revision that maybe shouldn't
> >> > actually be using avx at all.
> >>
> >> We see a substantial gain in several SPECFP benchmarks going from SSE42
> >> to AVX128 on Bulldozer.
> >> IMHO, accomplishing even a 5% gain in an individual benchmark takes a
> >> hardware company several man-months.
> >> The loss with AVX256 for Bulldozer is much more significant than the
> >> gain for SandyBridge.
> >> While the general trend in the industry is a move toward AVX256, for
> >> now we would be disadvantaging Bulldozer with this choice.
> >>
> >> We have several customers who use -mtune=generic, and it is the default
> >> unless a user explicitly overrides it with -mtune=native. They are the
> >> ones who want to experiment with the latest ISA using gcc, but want to keep
> >> their ISA selection and tuning agnostic on x86/64. IMHO, it is with
> >> these customers in mind that generic was introduced in the first place.
> >
> > Since stage 1 closure is around the corner, just wanted to ping to
> > see if the maintainers have made up their mind on this one.
> >
> > AVX-128 is an improvement over SSE42 for Bulldozer, and AVX-256 wipes
> > out pretty much all of that gain in generic mode.
> >
> > Until there is a convergence on AVX-256 for x86/64, we would like to
> > propose having generic generate avx-128 by default and have a user
> > override to avx-256 manually when it is known to benefit performance.
>
> Did somebody spend the time analyzing why CactusADM shows so much of a
> difference? With the recent improvements in vectorizing for AVX, did
> you re-do the measurements with a recent trunk?
>
> I don't think disabling avx-256 by default is a good idea until we
> understand why these numbers happen and are convinced we cannot fix
> this by proper cost modeling.
We have observed cases where 256-bit AVX code is slower than 128-bit AVX code on Bulldozer. This is because, internally, the front end, data paths, etc. of Bulldozer are designed to be optimal for 128-bit AVX. Throwing densely packed 256-bit code at the pipeline can congest the front end, causing stalls and hence slowdowns. We expect cactus, calculix and sphinx, the three benchmarks with the biggest avx-256 gaps, to be suffering from the same effect. In general, the hardware design engineers recommend running 128-bit AVX code on Bulldozer.

Given the underlying hardware design, software tuning can't really change the results here. Any further analysis of cactus would be a cycle sink at our end, and we may not even be able to discuss the details on a public mailing list.

x86/64 has not yet converged on avx-256, and generic mode should reflect that.

Posting the re-measurements on trunk for cactus, calculix and sphinx on Bulldozer:

                 AVX128/SSE42   AVX256/AVX-128
436.cactusADM       10%            -30%
454.calculix        14.7%           -6%
482.sphinx3          7%             -9%

All positive % above are improvements; all negative % are degradations.

I will post re-measurements for all of SPEC with the latest trunk as soon as I have them.

Thoughts?

Thanks,
Harsha
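
P.S. For anyone who wants to try the three-way comparison locally, here is a minimal sketch. The kernel below is only an illustration, not taken from any of the benchmarks, and I am assuming the existing -mprefer-avx128 option is the right knob for getting avx-128 code generation while keeping the AVX encodings:

/* saxpy.c - illustrative kernel only, not from any SPEC source. */
#define N 4096

void
saxpy (float *__restrict y, const float *__restrict x, float a)
{
  int i;
  /* A straightforward a*x[i] + y[i] loop that the auto-vectorizer
     turns into packed SSE or AVX code at -Ofast.  */
  for (i = 0; i < N; i++)
    y[i] = a * x[i] + y[i];
}

Compile it three ways and compare the inner loop in the generated assembly:

  gcc -Ofast -mtune=generic -mno-avx -msse4.2 -S saxpy.c      # SSE4.2 baseline
  gcc -Ofast -mtune=generic -mavx -mprefer-avx128 -S saxpy.c  # AVX encodings, 128-bit vectors
  gcc -Ofast -mtune=generic -mavx -S saxpy.c                  # 256-bit AVX, today's generic behavior

Whether the loop body in saxpy.s uses xmm (128-bit) or ymm (256-bit) registers shows which mode the vectorizer picked.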