Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote: >> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >> >> > >> > From: Iago Toral Quiroga <ito...@igalia.com> >> > >> > UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 >> > starting at a >> > constant offset that is 16-byte aligned. If we need to access an >> > unaligned >> > offset we emit a load with an aligned offset and use the remaining >> > constant >> > offset to select the component into the vec4 result that we are >> > interested >> > in. This component must be computed in units of the type size, >> > since that >> > is what fs_reg::set_smear expects. >> > >> > This patch does this change in the two places where we use this >> > message: >> > In demote_pull_constants when we lower uniform access with constant >> > offset >> > into the pull constant buffer and in UBO loads with constant >> > offset. >> > --- >> > src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++- >> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++- >> > 2 files changed, 5 insertions(+), 2 deletions(-) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs.cpp >> > index 0e69be8..dff13ea 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> > @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads() >> > inst->src[i].file = VGRF; >> > inst->src[i].nr = dst.nr; >> > inst->src[i].reg_offset = 0; >> > - inst->src[i].set_smear(pull_index & 3); >> > + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) / >> > 4); >> > + inst->src[i].set_smear((pull_index & 3) / type_slots); >> > >> This cannot be right, why should we care what the destination type of >> the instruction is while lowering a uniform source? Also I don't >> think >> the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1 >> you'll force type_slots to 1 and end up interpreting the pull_index >> in >> the wrong units. How about: >> >> > >> > inst->src[i].set_smear((pull_index & 3) * 4 / >> > type_sz(inst->src[i].type)); >> > > > OK > >> > brw_mark_surface_used(prog_data, index); >> > } >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > index 4cd219a..532ca65 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const >> > fs_builder &bld, nir_intrinsic_instr *instr >> > bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, >> > packed_consts, >> > surf_index, const_offset_reg); >> > >> > + unsigned component_base = >> > + (const_offset->u32[0] % 16) / MAX2(1, >> > type_sz(dest.type)); >> Rather than dividing by the type size only to let set_smear multiply >> by >> the type size again, I think it would be cleaner to do something >> like: >> >> > >> > const fs_reg consts = byte_offset(packed_consts, >> > const_offset->u32[0] % 16); >> > >> > for (unsigned i = 0; i < instr->num_components; i++) { >> then here: >> >> > >> > bld.MOV(offset(dest, bld, i), component(consts, i)); >> and then remove the rest of the loop. >> > > I am having troubles with adapting patch 13/23 to this way because the > following assert in component() is failing for some tests: > > assert(reg.subreg_offset == 0); >
Ouch, that seems pretty broken, let's fix it (see attachment). > consts.subreg is not zero thanks to byte_offset() call. > > So I prefer to go to a mixed solution: keep set_smear() usage, then: > > bld.MOV(offset(dest, bld, i), packed_consts); > > and remove the rest of the loop. > > Sam > >> > >> > - packed_consts.set_smear(const_offset->u32[0] % 16 / 4 >> > + i); >> > + packed_consts.set_smear(component_base + i); >> > >> > /* The std140 packing rules don't allow vectors to >> > cross 16-byte >> > * boundaries, and a reg is 32 bytes.
From 242fa33e55396630a8385794ffeab5ea6cb6462c Mon Sep 17 00:00:00 2001 From: Francisco Jerez <curroje...@riseup.net> Date: Wed, 11 May 2016 12:54:26 -0700 Subject: [PATCH] i965/fs: Fix and document component(). This fixes a number of bugs of component() by reimplementing it in terms of horiz_offset(): Handling of base registers starting at a non-zero subreg_offset, handling of strided registers and overflow of subreg_offset into reg_offset. --- src/mesa/drivers/dri/i965/brw_ir_fs.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h index e4f20f4..57ee816 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h @@ -122,12 +122,14 @@ horiz_offset(fs_reg reg, unsigned delta) return reg; } +/** + * Get the scalar channel of \p reg given by \p idx and replicate it to all + * channels of the result. + */ static inline fs_reg component(fs_reg reg, unsigned idx) { - assert(reg.subreg_offset == 0); - reg.subreg_offset = idx * type_sz(reg.type); + reg = horiz_offset(reg, idx); reg.stride = 0; return reg; } -- 2.7.3
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev