Jason Ekstrand <[email protected]> writes: > On Mon, Feb 16, 2015 at 10:40 AM, Francisco Jerez <[email protected]> > wrote: > >> Jason Ekstrand <[email protected]> writes: >> >> > On Feb 16, 2015 9:34 AM, "Francisco Jerez" <[email protected]> >> wrote: >> >> >> >> Jason Ekstrand <[email protected]> writes: >> >> >> >> > On Feb 16, 2015 8:35 AM, "Francisco Jerez" <[email protected]> >> > wrote: >> >> >> >> >> >> The round-robin allocation strategy is expected to decrease the >> amount >> >> >> of false dependencies created by the register allocator and give the >> >> >> post-RA scheduling pass more freedom to move instructions around. On >> >> >> the other hand it has the disadvantage of increasing fragmentation >> and >> >> >> decreasing the number of equally-colored nearby nodes, what increases >> >> >> the likelihood of failure in presence of optimistically colorable >> >> >> nodes. >> >> >> >> >> >> This patch disables the round-robin strategy for optimistically >> >> >> colorable nodes. These typically arise in situations of high >> register >> >> >> pressure or for registers with large live intervals, in both cases >> the >> >> >> task of the instruction scheduler shouldn't be constrained >> excessively >> >> >> by the dense packing of those nodes, and a spill (or on Intel >> hardware >> >> >> a fall-back to SIMD8 mode) is invariably worse than a slightly less >> >> >> optimal scheduling. >> >> > >> >> > Actually, that's not true. Matt was doing some experiments recently >> > with a >> >> > noise shader from synmark and the difference between our 2nd and 3rd >> > choice >> >> > schedulers is huge. In that test he disabled the third choice >> scheduler >> >> > and the result was a shader that spilled 6 or 8 times but ran >> something >> >> > like 30% faster. We really need to do some more experimentation with >> >> > scheduling and figure out better heuristics than "SIMD16 is always >> > faster" >> >> > and "spilling is bad". 
>> >> > >> >> >> >> Yes, I'm aware of rare corner cases like that where e.g. SIMD16 leads to >> >> higher cache thrashing than SIMD8 leading to decreased overall >> >> performance, and a case where a shader SIMD16 *with* spills has better >> >> performance than the SIMD8 version of the same shader without spills. >> >> >> >> In any case it's not the register allocator's business to implement such >> >> heuristics, and that's not an argument against the register allocator >> >> trying to make a more efficient use of the register file. >> > >> > The primary point I was trying to make is that scheduling *does* matter. >> > It matters a lot. In fact, Matt and i have talked about throwing away >> the >> > SIMD16 program if it ends up using the pessimal schedulong algorithm. >> > Throwing scheduling to the wind just to gain a few SIMD16 programs is >> > probably not a good trade-off. >> > >> In my experience the exact opposite observation has been far more >> common. Running SIMD16 vs SIMD8 has a larger impact on performance than >> the way you end up scheduling things post-regalloc. Actually even if >> you end up causing some unmet instruction dependencies by the way >> instructions are scheduled post-regalloc, the EU can context-switch to >> service the next available thread almost for free when a thread stalls >> on some dependency. Also the fact that you're doing SIMD16 itself makes >> post-regalloc scheduling less important because it naturally has an >> effect in hiding latency. >> >> My intuition is that the huge performance improvement Matt observed by >> disabling the third scheduling heuristic is more likely to have been >> caused by a decrease in the amount of cache thrashing caused by the fact >> that he was running less channels concurrently rather than by the >> scheduling heuristic itself. Matt, did you rule out that possibility? >> >> The other thing is this patch has an effect on the allocation strategy >> for optimistically colorable nodes *only*. 
We're already heavily >> constrained by register pressure when we get to that point, and assuming >> allocation succeeds the post-regalloc scheduler is going to have little >> room for maneuvering anyway. >> >> > It could be that this is an good idea, but it's going to take more than >> > hand-waved theories about register allocation one shader not spilling to >> > convince me. Do you actually know what it did to scheduling? It >> wouldn't >> > be hard to hack up the driver and shader-db to collect that information. >> > >> 44 shaders going SIMD16 seems like a strong enough argument to me. >> Could you be more precise about what additional information you want me >> to collect? >> > > How many shaders go from the first scheduling method to the second or to > the third. In other words some sort of metric on which shaders are > "helped" or "hurt" in their scheduling.
OK, I hacked the driver to output the scheduling heuristic that had been
used when we allocated registers successfully for the program via
KHR_debug and then ran shader-db before and after applying this patch.
Before this patch:
Heuristic      SIMD8  SIMD16
PRE            18924   18598
PRE_NON_LIFO      72      38
PRE_LIFO           8       3
After this patch:
Heuristic      SIMD8  SIMD16
PRE            18939   18643
PRE_NON_LIFO      57      37
PRE_LIFO           8       3
So it actually *decreases* the number of shaders falling back to the
latency-insensitive heuristics because the register allocator is more
likely to succeed with the PRE heuristic.
>
>
>> > --Jason
>> >
>> >> >> Shader-db results on the i965 driver:
>> >> >>
>> >> >> total instructions in shared programs: 5488539 -> 5488489 (-0.00%)
>> >> >> instructions in affected programs: 1121 -> 1071 (-4.46%)
>> >> >> helped: 1
>> >> >> HURT: 0
>> >> >> GAINED: 49
>> >> >> LOST: 5
>> >> >> ---
>> >> >> src/util/register_allocate.c | 22 +++++++++++++++++++++-
>> >> >> 1 file changed, 21 insertions(+), 1 deletion(-)
>> >> >>
>> >> >> diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
>> >> >> index af7a20c..d63d8eb 100644
>> >> >> --- a/src/util/register_allocate.c
>> >> >> +++ b/src/util/register_allocate.c
>> >> >> @@ -168,6 +168,12 @@ struct ra_graph {
>> >> >>
>> >> >>    unsigned int *stack;
>> >> >>    unsigned int stack_count;
>> >> >> +
>> >> >> +   /**
>> >> >> +    * Tracks the start of the set of optimistically-colored registers
>> >> >> +    * in the stack.
>> >> >> +    */
>> >> >> +   unsigned int stack_optimistic_start;
>> >> >> };
>> >> >>
>> >> >> /**
>> >> >> @@ -454,6 +460,7 @@ static void
>> >> >> ra_simplify(struct ra_graph *g)
>> >> >> {
>> >> >>    bool progress = true;
>> >> >> +   unsigned int stack_optimistic_start = ~0;
>> >> >>    int i;
>> >> >>
>> >> >>    while (progress) {
>> >> >> @@ -483,12 +490,16 @@ ra_simplify(struct ra_graph *g)
>> >> >>
>> >> >>       if (!progress && best_optimistic_node != ~0U) {
>> >> >>          decrement_q(g, best_optimistic_node);
>> >> >> +         stack_optimistic_start =
>> >> >> +            MIN2(stack_optimistic_start, g->stack_count);
>> >> >>          g->stack[g->stack_count] = best_optimistic_node;
>> >> >>          g->stack_count++;
>> >> >>          g->nodes[best_optimistic_node].in_stack = true;
>> >> >>          progress = true;
>> >> >>       }
>> >> >>    }
>> >> >> +
>> >> >> +   g->stack_optimistic_start = stack_optimistic_start;
>> >> >> }
>> >> >>
>> >> >> /**
>> >> >> @@ -542,7 +553,16 @@ ra_select(struct ra_graph *g)
>> >> >>       g->nodes[n].reg = r;
>> >> >>       g->stack_count--;
>> >> >>
>> >> >> -      if (g->regs->round_robin)
>> >> >> +      /* Rotate the starting point except for optimistically colorable
>> >> >> +       * nodes.  The likelihood that we will succeed at allocating
>> >> >> +       * optimistically colorable nodes is highly dependent on the way
>> >> >> +       * that the previous nodes popped off the stack are laid out.  The
>> >> >> +       * round-robin strategy increases the fragmentation of the
>> >> >> +       * register file and decreases the number of nearby nodes assigned
>> >> >> +       * to the same color, which increases the likelihood of spilling
>> >> >> +       * with respect to the dense packing strategy.
>> >> >> +       */
>> >> >> +      if (g->regs->round_robin &&
>> >> >> +          g->stack_count <= g->stack_optimistic_start)
>> >> >>          start_search_reg = r + 1;
>> >> >> }
>> >> >>
>> >> >> --
>> >> >> 2.1.3
>> >> >>
>> >> >> _______________________________________________
>> >> >> mesa-dev mailing list
>> >> >> [email protected]
>> >> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
