------- Comment #7 from hjl dot tools at gmail dot com 2009-09-01 13:20 -------
Realigning the incoming stack for the vectorizer has a very limited impact on performance. Here are the differences with -m32 -O3 -msse2 -mfpmath=sse -ffast-math -funroll-loops before and after my patch:

400.perlbench         -0.384615%
401.bzip2              0%
403.gcc               -0.362319%
429.mcf               -0.813008%
445.gobmk              0.921659%
456.hmmer              0.549451%
458.sjeng             -0.438596%
462.libquantum         0%
464.h264ref            0%
471.omnetpp           -0.478469%
473.astar             -0.645161%
483.xalancbmk         -0.727273%
SPECint(R)_base2006   -0.411523%

410.bwaves            -0.406504%
416.gamess             0%
433.milc              -1.36986%
434.zeusmp            -0.44843%
435.gromacs            0%
436.cactusADM          0%
437.leslie3d          -0.888889%
444.namd               1.20482%
447.dealII            -0.350877%
450.soplex            -0.31746%
453.povray             0.458716%
454.calculix           0%
459.GemsFDTD           0%
465.tonto              0%
470.lbm                0%
481.wrf                0.480769%
482.sphinx3            0.940439%
SPECfp(R)_base2006     0%

The patch won't change the generated code if the vectorizer isn't enabled. Its benefits outweigh its drawbacks.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41156