On Saturday, September 20, 2014 01:56:11 PM Jason Ekstrand wrote: > On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <[email protected]> wrote: > > > On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <[email protected]> > > wrote: > > > --- > > > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24 > > ++++++++++++++++++++++- > > > src/mesa/drivers/dri/i965/intel_screen.h | 5 +++++ > > > 2 files changed, 28 insertions(+), 1 deletion(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > > > index b42f1e5..41e8855 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp > > > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int > > reg_width) > > > /* Compute the total number of registers across all classes. */ > > > int ra_reg_count = 0; > > > for (int i = 0; i < class_count; i++) { > > > + screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count; > > > ra_reg_count += base_reg_count - (class_sizes[i] - 1); > > > } > > > + screen->wm_reg_sets[index].class_first_reg[class_count] = > > ra_reg_count; > > > > > > uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count); > > > struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count); > > > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling) > > > } > > > > > > setup_payload_interference(g, payload_node_count, > > first_payload_node); > > > - if (brw->gen >= 7) > > > + if (brw->gen >= 7) { > > > setup_mrf_hack_interference(g, first_mrf_hack_node); > > > > > > + foreach_in_list(fs_inst, inst, &instructions) { > > > + /* When we do send-from-GRF for FB writes, we need to ensure > > that > > > + * the last write instruction sends from a high register. > > This is > > > + * because the vertex fetcher wants to start filling the low > > > + * payload registers while the pixel data port is still > > working on > > > + * writing out the memory. If we don't do this, we get > > rendering > > > + * artifacts. > > > + * > > > + * We could just do "something high". Instead, we just pick > > the > > > + * highest register that works. > > > + */ > > > + if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) { > > > + int size = virtual_grf_sizes[inst->src[0].reg]; > > > + int reg = screen->wm_reg_sets[rsi].class_first_reg[size] - > > 1; > > > + ra_set_node_reg(g, inst->src[0].reg, reg); > > > + break; > > > + } > > > + } > > > + } > > > + > > > if (dispatch_width > 8) { > > > /* In 16-wide dispatch we have an issue where a compressed > > > * instruction is actually two instructions executed > > simultaneiously. > > > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h > > b/src/mesa/drivers/dri/i965/intel_screen.h > > > index 945f6f5..8bc6abd 100644 > > > --- a/src/mesa/drivers/dri/i965/intel_screen.h > > > +++ b/src/mesa/drivers/dri/i965/intel_screen.h > > > @@ -90,6 +90,11 @@ struct intel_screen > > > int classes[16]; > > > > > > /** > > > + * Array of the first ra reg in eahch ra class. > > > > Typo. > > > > Thanks > > > > > > > + */ > > > + int class_first_reg[17]; > > > + > > > + /** > > > * Mapping for register-allocated objects in *regs to the first > > > * GRF for that object. > > > */ > > > -- > > > 2.1.0 > > > > > > _______________________________________________ > > > mesa-dev mailing list > > > [email protected] > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > I think it would be less confusing if instead of introducing a > > "class_first_reg" array you added a class_last_reg array and then did > > class_last_reg[size - 1] - iiuc, you're relying on the fact that the > > first reg of one class minus one is the last reg of the previous > > class, which isn't obvious and it takes a while to realize what's > > going on. > > > > That's fair. I can change that easily enough. It doesn't really matter > that much. I just kind of liked the fact that last_reg[n] - last_reg[n-1] > == class_size. But it doesn't matter and what you suggested would be far > more obvious. > > > > BTW, just so you know, I don't think it's generally a good idea to set > > more than one node to be pre-allocated to the same register, since > > things will blow up if those nodes interfere. It's probably OK here > > since I can't think of a way that an input register preallocated to > > the highest register would interfere with the input to the fb write, > > > > Yes, I'm aware of the problem. However, the only things we preallocate are > payload registers which only ever go at the beginning and the final FB > write which goes at the end. As long as we don't start preallocating more > things, I think we'll be fine. Also, isn't there some cap on how many > payload registers we can have? If so then that already solves our problem.
Here are the hardware limits (IVB PRM Volume 2 Part 1): - At most 64 registers of push constants [3DSTATE_CONSTANT(Body)] - At most 32 varyings [SF Output Attributes], at two registers each [pg 349]: 64 registers of varyings. - At most 64 registers of other payload data in SIMD32, or at most 33 registers of other payload data in SIMD16. Of course, that's a completely pessimal case...
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ mesa-dev mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/mesa-dev
