On Tue, Oct 1, 2013 at 12:16 PM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> On 09/30/2013 07:16 PM, Ian Romanick wrote:
>> On 09/11/2013 10:00 PM, Chia-I Wu wrote:
>>> From: Chia-I Wu <o...@lunarg.com>
>>>
>>> Replicate the gradient of the top-left pixel to the other three pixels in
>>> the subspan, the same way DDY is implemented. Before, different gradients
>>> were used for pixels in the top row and pixels in the bottom row.
>>>
>>> This change results in a less accurate approximation. However, it improves
>>> the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at
>>> 95.0% confidence) on Haswell. No noticeable image quality difference
>>> observed.
>>>
>>> No piglit gpu.tests regressions.
>>>
>>> I failed to come up with an explanation for the performance difference. The
>>> change does not make a difference on Ivy Bridge either. If anyone has the
>>> insight, please kindly enlighten me. Performance differences may also be
>>> observed on other games that call textureGrad and dFdx.
>>
>> After all the experiments and discussions with the hardware guys, let's
>> go ahead and do this. We should do a couple things, however.
>>
>> 1. Disable the optimization if the application explicitly sets
>> GL_FRAGMENT_SHADER_DERIVATIVE_HINT to GL_NICEST.
>
> Urgh...I always hate adding more state-dependent recompiles...
>
> To accomplish this, you'll have to:
> - Add a new high_quality_derivatives flag to brw_wm_prog_key.
> - In brw_wm_populate_key, add:
>      /* _NEW_HINT */
>      key->high_quality_derivatives =
>         ctx->Hint.FragmentShaderDerivative == GL_NICEST;
> - Add the _NEW_HINT dependency to brw_wm_prog's dirty flags.
>
>> 2. Add a driconf option, as suggested by Chris, to disable the optimization.
>
> ...which means changing the key setup to:
>
> if (brw->disable_derivative_optimization) {
>    key->high_quality_derivatives =
>       ctx->Hint.FragmentShaderDerivative != GL_FASTEST;
> } else {
>    key->high_quality_derivatives =
>       ctx->Hint.FragmentShaderDerivative == GL_NICEST;
> }
>
> and, in brw_fs_precompile, setting
>
> key->high_quality_derivatives = brw->disable_derivative_optimization;

Thanks for the instructions. I've sent an updated patch that incorporates
all of your and Ian's comments.

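For anyone reading along, the key setup quoted above boils down to a small truth table over the driconf option and the derivative hint. Below is a minimal standalone sketch of it, not the driver code: the `DerivativeHint` enum and the free function `high_quality_derivatives()` are hypothetical stand-ins for `ctx->Hint.FragmentShaderDerivative` and the `brw_wm_prog_key` field, and the enum values are the standard GL_DONT_CARE / GL_FASTEST / GL_NICEST constants.

```cpp
#include <cassert>

// Hypothetical stand-in for the GL hint values (same numeric values as
// GL_DONT_CARE, GL_FASTEST and GL_NICEST).
enum DerivativeHint : unsigned {
   HINT_DONT_CARE = 0x1100,
   HINT_FASTEST   = 0x1101,
   HINT_NICEST    = 0x1102,
};

// Returns what the quoted key setup would store in
// key->high_quality_derivatives for a given option/hint pair.
constexpr bool
high_quality_derivatives(bool disable_derivative_optimization,
                         DerivativeHint hint)
{
   if (disable_derivative_optimization) {
      // Optimization disabled via driconf: only an explicit GL_FASTEST
      // selects the coarser derivatives.
      return hint != HINT_FASTEST;
   }
   // Default: only an explicit GL_NICEST selects the accurate derivatives.
   return hint == HINT_NICEST;
}

// Compile-time check of the two GL_DONT_CARE cases, which are the only
// ones the driconf option changes.
static_assert(!high_quality_derivatives(false, HINT_DONT_CARE), "");
static_assert(high_quality_derivatives(true, HINT_DONT_CARE), "");
```

The driconf option only changes the outcome for GL_DONT_CARE; an explicit GL_FASTEST or GL_NICEST from the application behaves the same either way.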
>
> This all seems pretty awful to me...but I guess there's not really any
> getting around it. If the register had worked out, we could've just
> added a Hint() driver hook that programmed it appropriately. But alas.
>
>> 3. Use the same DDX / DDY calculation on all platforms.
>>
>> 4. Update the commit message and the comment in the code with the
>> explanation of the optimization (the HSW sample_d instruction does some
>> optimizations if the same LOD is used for all pixels, etc.).
>>
>>> Signed-off-by: Chia-I Wu <o...@lunarg.com>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 17 +++++++++++++----
>>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> index bfb3d33..c0d24a0 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> @@ -564,16 +564,25 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>>>  void
>>>  fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
>>>  {
>>> +   /* approximate with ((ss0.tr - ss0.tl)x4 (ss1.tr - ss1.tl)x4) on Haswell,
>>> +    * which gives much better performance when the result is used with
>>> +    * sample_d
>>> +    */
>>> +   unsigned vstride = (brw->is_haswell) ? BRW_VERTICAL_STRIDE_4 :
>>> +                                          BRW_VERTICAL_STRIDE_2;
>>> +   unsigned width = (brw->is_haswell) ? BRW_WIDTH_4 :
>>> +                                        BRW_WIDTH_2;
>>> +
>>>     struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
>>>                                   BRW_REGISTER_TYPE_F,
>>> -                                 BRW_VERTICAL_STRIDE_2,
>>> -                                 BRW_WIDTH_2,
>>> +                                 vstride,
>>> +                                 width,
>>>                                   BRW_HORIZONTAL_STRIDE_0,
>>>                                   BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>>     struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
>>>                                   BRW_REGISTER_TYPE_F,
>>> -                                 BRW_VERTICAL_STRIDE_2,
>>> -                                 BRW_WIDTH_2,
>>> +                                 vstride,
>>> +                                 width,
>>>                                   BRW_HORIZONTAL_STRIDE_0,
>>>                                   BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>>     brw_ADD(p, dst, src0, negate(src1));
>>>
>>
>

-- 
o...@lunarg.com
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
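
To illustrate what the region change in the patch does, here is a small CPU-side sketch that models how each of the 8 channels picks its source element under a `<vstride; width, hstride>` region. It is only an illustration under stated assumptions: the SIMD8 register is assumed to hold two subspans laid out as tl, tr, bl, br each (matching the ss0/ss1 notation in the patch comment), and `region_index()` and `ddx()` are hypothetical helpers, not i965 functions.

```cpp
#include <array>
#include <cassert>

// Element read by channel `ch` for a source region <vstride; width, hstride>
// that starts at sub-register `subnr` (all values in units of elements).
constexpr int
region_index(int subnr, int vstride, int width, int hstride, int ch)
{
   return subnr + (ch / width) * vstride + (ch % width) * hstride;
}

// Models brw_ADD(p, dst, src0, negate(src1)) from generate_ddx, with src0
// starting at element 1 and src1 at element 0, both with hstride 0.
// replicate_top_left selects the <4;4,0> regions of the patch instead of
// the original <2;2,0> regions.
std::array<float, 8>
ddx(const std::array<float, 8> &src, bool replicate_top_left)
{
   const int vstride = replicate_top_left ? 4 : 2;
   const int width = replicate_top_left ? 4 : 2;

   std::array<float, 8> dst{};
   for (int ch = 0; ch < 8; ch++) {
      const float right = src[region_index(1, vstride, width, 0, ch)];
      const float left = src[region_index(0, vstride, width, 0, ch)];
      dst[ch] = right - left;
   }
   return dst;
}
```

With `<2;2,0>` each subspan gets two gradients, (tr - tl) for the top row and (br - bl) for the bottom row; with `<4;4,0>` all four pixels of a subspan get (tr - tl), which is the replication described in the commit message.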