Hi! I can't approve this, but for what it's worth it looks fine to me.
Bill

On 12/11/19 6:31 AM, Kewen.Lin wrote:
Hi,

We found that the vectorization cost modeling of scalar COND_EXPR is a bit off on rs6000. One typical case is 548.exchange2_r: -Ofast -mcpu=power9 -mrecip -fvect-cost-model=unlimited outperforms -Ofast -mcpu=power9 -mrecip (whose default is -fvect-cost-model=dynamic) by 1.94%.

A scalar COND_EXPR is normally expanded into compare + branch or compare + isel, either of which should be priced higher than a simple FXU operation. This patch adds an additional vectorization cost for scalar COND_EXPR on top of builtin_vectorization_cost.

Reasons for choosing an additional cost of 2 over other values:
1) We tried candidate values from 1 to 5, and 2 measured best on Power9.
2) From a latency point of view, compare takes 3 cycles and isel takes 2 on Power9, which is 2.5 times a simple FXU instruction (cost 1 in the current modeling), so 2 is close.
3) It also gives good SPEC2017 ratios on Power8.

The SPEC2017 performance evaluation on Power9 with explicit unrolling shows a +2.35% gain for 548.exchange2_r but a -1.99% degradation for 526.blender_r; the others are trivial. On further investigation of 526.blender_r, the assembly of the 10 hottest functions is unchanged, so the impact is likely due to side effects. SPECINT geomean +0.16%, SPECFP geomean -0.16% (mainly due to blender_r).

Without explicit unrolling, 548.exchange2_r gains +1.78% and the others are trivial. SPECINT geomean +0.19%, SPECFP geomean +0.06%.

The SPEC2017 performance evaluation on Power8 shows a +1.32% gain for 500.perlbench_r and a +2.03% gain for 511.povray_r; the others are trivial. SPECINT geomean +0.08%, SPECFP geomean +0.18%.

Bootstrapped and regression-tested on powerpc64le-linux-gnu. Is it OK for trunk?

BR,
Kewen

---
gcc/ChangeLog

2019-12-11  Kewen Lin  <li...@gcc.gnu.org>

	* config/rs6000/rs6000.c (adjust_vectorization_cost): New function.
	(rs6000_add_stmt_cost): Call adjust_vectorization_cost and update
	stmt_cost.
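To make the described mechanism concrete, here is a minimal standalone C sketch of the idea: a base per-statement cost, plus an extra cost of 2 applied only to scalar COND_EXPR, scaled by the statement count. It does not use GCC's real internals. The enums, struct stmt_info, and the constant base cost of 1 are hypothetical simplifications; the function names echo the ChangeLog (the real hook is rs6000_add_stmt_cost), but the signatures here are invented for illustration.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for GCC internals.  These are
   NOT GCC's real declarations; they only model the shape of the change.  */
enum stmt_kind { SCALAR_STMT, VECTOR_STMT };
enum rhs_code { PLUS_EXPR, COND_EXPR };

struct stmt_info
{
  enum stmt_kind kind;  /* scalar or vector statement being costed */
  enum rhs_code code;   /* operation on the right-hand side */
};

/* Assumed base cost: every statement costs 1, mirroring the
   "simple FXU instruction takes cost 1" modeling in the patch text.  */
static int
builtin_vectorization_cost (const struct stmt_info *s)
{
  (void) s;
  return 1;
}

/* Extra cost for a scalar COND_EXPR, which expands to compare + branch
   or compare + isel.  2 is the value the patch reports as best on
   Power9; every other statement gets no adjustment.  */
static int
adjust_vectorization_cost (const struct stmt_info *s)
{
  if (s->kind == SCALAR_STMT && s->code == COND_EXPR)
    return 2;
  return 0;
}

/* Cost of COUNT copies of statement S: base cost plus the adjustment,
   scaled by COUNT.  */
static int
add_stmt_cost (int count, const struct stmt_info *s)
{
  assert (count >= 0);
  int stmt_cost = builtin_vectorization_cost (s);
  stmt_cost += adjust_vectorization_cost (s);
  return count * stmt_cost;
}
```

Under these assumptions, add_stmt_cost (4, &cond) for four scalar COND_EXPR statements gives 4 × (1 + 2) = 12, while four scalar PLUS_EXPR statements or four vector COND_EXPR statements give 4. Raising only the scalar side makes the scalar loop look more expensive, which is why the dynamic cost model would then lean toward vectorizing loops that contain conditionals.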