------- Comment #7 from rguenth at gcc dot gnu dot org  2009-12-01 16:46 -------
Just reverting rev. 154688 and using the training set gets us from

464.h264ref -- 228 -- S

to

464.h264ref -- 170 -- S

at -O3 -ffast-math -funroll-loops -march=native (-march=k8-sse3 -msahf
--param l1-cache-size=64 --param l1-cache-line-size=64
--param l2-cache-size=512 -mtune=k8).

After the patch the oprofile looks like:

CPU: AMD64 processors, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
2241755  51.1739  SetupFastFullPelSearch
442008   10.0900  BlockMotionSearch
271429    6.1961  SubPelBlockMotionSearch
269337    6.1483  FastPelY_14
170961    3.9026  UMVLine16Y_11
159311    3.6367  SetupLargerBlocks
155556    3.5510  SATD
127230    2.9044  FastLine16Y_11
72328     1.6511  dct_luma
69728     1.5917  getNonAffNeighbour

All but the *Y_1[14] functions are inside mv-search.c; just re-compiling that
file is enough to reproduce the issue.

After reverting it:

CPU: AMD64 processors, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1223269  36.8289  SetupFastFullPelSearch
450306   13.5573  BlockMotionSearch
320395    9.6461  SubPelBlockMotionSearch
245175    7.3815  FastPelY_14
160694    4.8380  SetupLargerBlocks
155353    4.6772  SATD
153292    4.6152  UMVLine16Y_11
79531     2.3944  FastLine16Y_11
72455     2.1814  dct_luma
70733     2.1296  getNonAffNeighbour
42828     1.2894  UMVPelY_14
34396     1.0356  UnifiedOneForthPix

-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.5 Regression] rev        |[4.5 Regression] rev 154688
                   |15458[78] regress           |regress 464.h264ref peak 20%
                   |464.h264ref peak 20%        |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42216
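As a rough cross-check of the numbers quoted in this comment, the Python sketch below computes the run-time change and the per-symbol sample difference between the two oprofile listings. It assumes the middle field of the `464.h264ref -- N -- S` lines is run time in seconds. The dictionaries and the helpers `relative_slowdown` and `symbol_deltas` are hypothetical illustrations, not part of oprofile or the SPEC tools. Only symbols present in both listings are compared, because the patched listing stops at getNonAffNeighbour.

```python
# Hypothetical cross-check of the figures quoted in comment #7 of GCC PR 42216.
# All numbers are copied from the comment; names here are illustrative only.

# Assumption: the middle field of "464.h264ref -- N -- S" is run time in seconds.
TIME_PATCHED = 228
TIME_REVERTED = 170

# oprofile samples per symbol (CPU_CLK_UNHALTED, count 100000), with rev. 154688.
PATCHED = {
    "SetupFastFullPelSearch": 2241755,
    "BlockMotionSearch": 442008,
    "SubPelBlockMotionSearch": 271429,
    "FastPelY_14": 269337,
    "UMVLine16Y_11": 170961,
    "SetupLargerBlocks": 159311,
    "SATD": 155556,
    "FastLine16Y_11": 127230,
    "dct_luma": 72328,
    "getNonAffNeighbour": 69728,
}

# Samples per symbol after reverting rev. 154688.
REVERTED = {
    "SetupFastFullPelSearch": 1223269,
    "BlockMotionSearch": 450306,
    "SubPelBlockMotionSearch": 320395,
    "FastPelY_14": 245175,
    "SetupLargerBlocks": 160694,
    "SATD": 155353,
    "UMVLine16Y_11": 153292,
    "FastLine16Y_11": 79531,
    "dct_luma": 72455,
    "getNonAffNeighbour": 70733,
    "UMVPelY_14": 42828,
    "UnifiedOneForthPix": 34396,
}


def relative_slowdown(slow, fast):
    """Fractional increase of `slow` over `fast` (0.5 means 50% longer)."""
    return (slow - fast) / fast


def symbol_deltas(patched, reverted):
    """Extra samples with the patch, per symbol present in both listings.

    Returns (symbol, patched - reverted) pairs sorted from the largest
    increase to the largest decrease.
    """
    common = patched.keys() & reverted.keys()
    deltas = {sym: patched[sym] - reverted[sym] for sym in common}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    slowdown = relative_slowdown(TIME_PATCHED, TIME_REVERTED)
    print(f"run time: {TIME_PATCHED} s vs {TIME_REVERTED} s (+{slowdown:.1%})")
    for sym, delta in symbol_deltas(PATCHED, REVERTED):
        print(f"{sym:24s} {delta:+9d} samples")
```

Run as a script, it reports a run time about 34.1% longer with the patch, and SetupFastFullPelSearch heads the delta list with +1018486 samples, far ahead of any other symbol.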