fs: fix pull constant load component selection for doubles

Francisco Jerez Wed, 11 May 2016 13:47:12 -0700

Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:

> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:
>> 
>> > 
>> > From: Iago Toral Quiroga <ito...@igalia.com>
>> > 
>> > UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4
>> > starting at a
>> > constant offset that is 16-byte aligned. If we need to access an
>> > unaligned
>> > offset we emit a load with an aligned offset and use the remaining
>> > constant
>> > offset to select the component into the vec4 result that we are
>> > interested
>> > in. This component must be computed in units of the type size,
>> > since that
>> > is what fs_reg::set_smear expects.
>> > 
>> > This patch does this change in the two places where we use this
>> > message:
>> > In demote_pull_constants when we lower uniform access with constant
>> > offset
>> > into the pull constant buffer and in UBO loads with constant
>> > offset.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp     | 3 ++-
>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++-
>> >  2 files changed, 5 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index 0e69be8..dff13ea 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads()
>> >           inst->src[i].file = VGRF;
>> >           inst->src[i].nr = dst.nr;
>> >           inst->src[i].reg_offset = 0;
>> > -         inst->src[i].set_smear(pull_index & 3);
>> > +         unsigned type_slots = MAX2(1, type_sz(inst->dst.type) /
>> > 4);
>> > +         inst->src[i].set_smear((pull_index & 3) / type_slots);
>> >  
>> This cannot be right, why should we care what the destination type of
>> the instruction is while lowering a uniform source?  Also I don't
>> think
>> the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1
>> you'll force type_slots to 1 and end up interpreting the pull_index
>> in
>> the wrong units.  How about:
>> 
>> > 
>> >           inst->src[i].set_smear((pull_index & 3) * 4 /
>> >                                  type_sz(inst->src[i].type));
>> > 
>
> OK
>
>> >           brw_mark_surface_used(prog_data, index);
>> >        }
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > index 4cd219a..532ca65 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const
>> > fs_builder &bld, nir_intrinsic_instr *instr
>> >           bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
>> > packed_consts,
>> >                    surf_index, const_offset_reg);
>> >  
>> > +         unsigned component_base =
>> > +            (const_offset->u32[0] % 16) / MAX2(1,
>> > type_sz(dest.type));
>> Rather than dividing by the type size only to let set_smear multiply
>> by
>> the type size again, I think it would be cleaner to do something
>> like:
>> 
>> > 
>> >           const fs_reg consts = byte_offset(packed_consts,
>> > const_offset->u32[0] % 16);
>> > 
>> >           for (unsigned i = 0; i < instr->num_components; i++) {
>> then here:
>> 
>> > 
>> >              bld.MOV(offset(dest, bld, i), component(consts, i));
>> and then remove the rest of the loop.
>> 
>
> I am having troubles with adapting patch 13/23 to this way because the
> following assert in component() is failing for some tests:
>     
>     assert(reg.subreg_offset == 0);
>


Ouch, that seems pretty broken, let's fix it (see attachment).

> consts.subreg is not zero thanks to byte_offset() call.
>
> So I prefer to go to a mixed solution: keep set_smear() usage, then:
>
>    bld.MOV(offset(dest, bld, i), packed_consts);
>
> and remove the rest of the loop.
>
> Sam
>
>> > 
>> > -            packed_consts.set_smear(const_offset->u32[0] % 16 / 4
>> > + i);
>> > +            packed_consts.set_smear(component_base + i);
>> >  
>> >              /* The std140 packing rules don't allow vectors to
>> > cross 16-byte
>> >               * boundaries, and a reg is 32 bytes.

From 242fa33e55396630a8385794ffeab5ea6cb6462c Mon Sep 17 00:00:00 2001
From: Francisco Jerez <curroje...@riseup.net>
Date: Wed, 11 May 2016 12:54:26 -0700
Subject: [PATCH] i965/fs: Fix and document component().

This fixes a number of bugs of component() by reimplementing it in
terms of horiz_offset(): Handling of base registers starting at a
non-zero subreg_offset, handling of strided registers and overflow of
subreg_offset into reg_offset.
---
 src/mesa/drivers/dri/i965/brw_ir_fs.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index e4f20f4..57ee816 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -122,12 +122,14 @@ horiz_offset(fs_reg reg, unsigned delta)
    return reg;
 }
 
+/**
+ * Get the scalar channel of \p reg given by \p idx and replicate it to all
+ * channels of the result.
+ */
 static inline fs_reg
 component(fs_reg reg, unsigned idx)
 {
-   assert(reg.subreg_offset == 0);
-   reg.subreg_offset = idx * type_sz(reg.type);
+   reg = horiz_offset(reg, idx);
    reg.stride = 0;
    return reg;
 }
-- 
2.7.3

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 12/23] i965/fs: fix pull constant load component selection for doubles

Reply via email to