On 11/05/16 22:46, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > >> On Tue, 2016-05-10 at 21:06 -0700, Francisco Jerez wrote: >>> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >>> >>>> >>>> From: Iago Toral Quiroga <ito...@igalia.com> >>>> >>>> UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 >>>> starting at a >>>> constant offset that is 16-byte aligned. If we need to access an >>>> unaligned >>>> offset we emit a load with an aligned offset and use the remaining >>>> constant >>>> offset to select the component into the vec4 result that we are >>>> interested >>>> in. This component must be computed in units of the type size, >>>> since that >>>> is what fs_reg::set_smear expects. >>>> >>>> This patch does this change in the two places where we use this >>>> message: >>>> In demote_pull_constants when we lower uniform access with constant >>>> offset >>>> into the pull constant buffer and in UBO loads with constant >>>> offset. >>>> --- >>>> src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++- >>>> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++- >>>> 2 files changed, 5 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp >>>> index 0e69be8..dff13ea 100644 >>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >>>> @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads() >>>> inst->src[i].file = VGRF; >>>> inst->src[i].nr = dst.nr; >>>> inst->src[i].reg_offset = 0; >>>> - inst->src[i].set_smear(pull_index & 3); >>>> + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) / >>>> 4); >>>> + inst->src[i].set_smear((pull_index & 3) / type_slots); >>>> >>> This cannot be right, why should we care what the destination type of >>> the instruction is while lowering a uniform source? Also I don't >>> think >>> the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1 >>> you'll force type_slots to 1 and end up interpreting the pull_index >>> in >>> the wrong units. How about: >>> >>>> >>>> inst->src[i].set_smear((pull_index & 3) * 4 / >>>> type_sz(inst->src[i].type)); >>>> >> >> OK >> >>>> brw_mark_surface_used(prog_data, index); >>>> } >>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>>> index 4cd219a..532ca65 100644 >>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>>> @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const >>>> fs_builder &bld, nir_intrinsic_instr *instr >>>> bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, >>>> packed_consts, >>>> surf_index, const_offset_reg); >>>> >>>> + unsigned component_base = >>>> + (const_offset->u32[0] % 16) / MAX2(1, >>>> type_sz(dest.type)); >>> Rather than dividing by the type size only to let set_smear multiply >>> by >>> the type size again, I think it would be cleaner to do something >>> like: >>> >>>> >>>> const fs_reg consts = byte_offset(packed_consts, >>>> const_offset->u32[0] % 16); >>>> >>>> for (unsigned i = 0; i < instr->num_components; i++) { >>> then here: >>> >>>> >>>> bld.MOV(offset(dest, bld, i), component(consts, i)); >>> and then remove the rest of the loop. >>> >> >> I am having troubles with adapting patch 13/23 to this way because the >> following assert in component() is failing for some tests: >> >> assert(reg.subreg_offset == 0); >> > > Ouch, that seems pretty broken, let's fix it (see attachment). >
Oh thanks! That fixes my problems. I am going to pick your patch and add it to our version 2 of the branch. Sam >> consts.subreg is not zero thanks to byte_offset() call. >> >> So I prefer to go to a mixed solution: keep set_smear() usage, then: >> >> bld.MOV(offset(dest, bld, i), packed_consts); >> >> and remove the rest of the loop. >> >> Sam >> >>>> >>>> - packed_consts.set_smear(const_offset->u32[0] % 16 / 4 >>>> + i); >>>> + packed_consts.set_smear(component_base + i); >>>> >>>> /* The std140 packing rules don't allow vectors to >>>> cross 16-byte >>>> * boundaries, and a reg is 32 bytes. > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev