Re: [Mesa-dev] [PATCH] i965/hsw: approximate DDX with a uniform value across a subspan

Chia-I Wu Mon, 30 Sep 2013 23:32:54 -0700

On Tue, Oct 1, 2013 at 12:16 PM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> On 09/30/2013 07:16 PM, Ian Romanick wrote:
>> On 09/11/2013 10:00 PM, Chia-I Wu wrote:
>>> From: Chia-I Wu <o...@lunarg.com>
>>>
>>> Replicate the gradient of the top-left pixel to the other three pixels in
>>> the subspan, as is already done for DDY.  Before, different gradients were
>>> used for pixels in the top row and pixels in the bottom row.
>>>
>>> This change results in a less accurate approximation.  However, it improves
>>> the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at
>>> 95.0% confidence) on Haswell.  No noticeable image quality difference
>>> observed.
>>>
>>> No piglit gpu.tests regressions.
>>>
>>> I could not come up with an explanation for the performance difference.  The
>>> change does not make a difference on Ivy Bridge.  If anyone has any
>>> insight, please kindly enlighten me.  Performance differences may also be
>>> observed on other games that call textureGrad and dFdx.
>>
>> After all the experiments and discussions with the hardware guys, let's
>> go ahead and do this.  We should do a couple of things, however.
>>
>> 1. Disable the optimization if the application explicitly sets
>> GL_FRAGMENT_SHADER_DERIVATIVE_HINT to GL_NICEST.
>
> Urgh...I always hate adding more state-dependent recompiles...
>
> To accomplish this, you'll have to:
> - Add a new high_quality_derivatives flag to brw_wm_prog_key.
> - In brw_wm_populate_key, add:
>   /* _NEW_HINT */
>   key->high_quality_derivatives =
>      ctx->Hint.FragmentShaderDerivative == GL_NICEST;
> - Add the _NEW_HINT dependency to brw_wm_prog's dirty flags.
>
>> 2. Add a driconf option, as suggested by Chris, to disable the optimization.
>
> ...which means changing the key setup to:
>
>   if (brw->disable_derivative_optimization) {
>      key->high_quality_derivatives =
>         ctx->Hint.FragmentShaderDerivative != GL_FASTEST;
>   } else {
>      key->high_quality_derivatives =
>         ctx->Hint.FragmentShaderDerivative == GL_NICEST;
>   }
>
> and, in brw_fs_precompile, setting
>
> key->high_quality_derivatives = brw->disable_derivative_optimization;

Thanks for the instructions.  I've sent an updated patch with all of
your and Ian's comments incorporated.


>
> This all seems pretty awful to me...but I guess there's not really any
> getting around it.  If the register had worked out, we could've just
> added a Hint() driver hook that programmed it appropriately.  But alas.
>
>> 3. Use the same DDX / DDY calculation on all platforms.
>>
>> 4. Update the commit message and the comment in the code with the
>> explanation of the optimization (the HSW sample_d instruction does some
>> optimizations if the same LOD is used for all pixels, etc.).
>>
>>> Signed-off-by: Chia-I Wu <o...@lunarg.com>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 17 +++++++++++++----
>>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> index bfb3d33..c0d24a0 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> @@ -564,16 +564,25 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>>>  void
>>>  fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
>>>  {
>>> +   /* approximate with ((ss0.tr - ss0.tl)x4 (ss1.tr - ss1.tl)x4) on Haswell,
>>> +    * which gives much better performance when the result is used with
>>> +    * sample_d
>>> +    */
>>> +   unsigned vstride = (brw->is_haswell) ? BRW_VERTICAL_STRIDE_4 :
>>> +                                          BRW_VERTICAL_STRIDE_2;
>>> +   unsigned width = (brw->is_haswell) ? BRW_WIDTH_4 :
>>> +                                        BRW_WIDTH_2;
>>> +
>>>     struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
>>>                               BRW_REGISTER_TYPE_F,
>>> -                             BRW_VERTICAL_STRIDE_2,
>>> -                             BRW_WIDTH_2,
>>> +                             vstride,
>>> +                             width,
>>>                               BRW_HORIZONTAL_STRIDE_0,
>>>                               BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>>     struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
>>>                               BRW_REGISTER_TYPE_F,
>>> -                             BRW_VERTICAL_STRIDE_2,
>>> -                             BRW_WIDTH_2,
>>> +                             vstride,
>>> +                             width,
>>>                               BRW_HORIZONTAL_STRIDE_0,
>>>                               BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>>     brw_ADD(p, dst, src0, negate(src1));
>>>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>



-- 
o...@lunarg.com