On Apr 13, 2016 7:57 AM, "Connor Abbott" <[email protected]> wrote: > > On Wed, Apr 13, 2016 at 3:24 AM, Iago Toral <[email protected]> wrote: > > On Tue, 2016-04-12 at 13:16 -0400, Connor Abbott wrote: > >> I'm not sure I'm so comfortable with this. For one, this is the sort > >> of thing the backend can (and should) do. Why should be be adding > >> stuff at the NIR level to help the backend's copy propagation pass? > >> And perhaps more importantly, why should we be adding stuff at the NIR > >> level to workaround the lack of spilling support for doubles? I know > >> it's more difficult to implement than normal spilling, but it just > >> feels dirty to workaround it because some tests happen to use too many > >> registers. > > > > My thoughts for this were that NIR is producing suboptimal output for > > this case: it is unpacking and repacking an entire 64-bit float when it > > only really needs to operate on the high 32-bit, so fixing that in NIR, > > where the issue is introduced, instead of having each backend fix it, > > seemed to make more sense to me. > > > > I am not sure that backends can easily fix this either: what they are > > going to see is a nir_op_pack_double_2x32_split operation, they don't > > have any context to tell if the low 32-bit argument is actually > > different from what is already in the register, other than emitting code > > to check for that which kind of defeats the purpose of the optimization. > > I suppose they could still see if the instruction that generated the low > > 32-bit value is actually an unpack of the same register and do the right > > thing in that case. Would you like that solution better? > > > > In any case, I don't think this is something for copy-propagation to > > handle this is really not about copy-propagating a value, it is about > > not needing that value to begin with. Register pressure comes from > > needing to have the low 32-bit value alive until we recombine it in the > > 64-bit value. This patch makes it so that the low 32-bit is not alive at > > any moment, we don't need to care about it at any point, and that is why > > it helps reduce register pressure.
I think we have two fundamental problems here. One is that spilling is broken. The second is that NIR is generating code that the backend can't handle without spilling. I can totally believe that the later is happening given that, thanks to the flag, a the backend can't move two SELs past each other when it schedules and we use SEL very heavily in the double lowering. In this particular case, I do think we have a coalescing problem more than a need for a new opcode. Thanks to SSA you could just as easily see a pack where one source is an unpack and do what you're doing with this "replace the top bits" opcode in fs/vec4_nir without adding a special opcode for it. Therefore, even if this little optimization helps, I think the opcode is unneeded. I also think it's probably solvable as Connor says with better coalescing. > Think of what the backend is going to turn this into, though. Say you > have a sequence like: > > hi = unpack_double_2x32_split_y foo > > new_hi = ... > > lo = unpack_double_2x32_split_x foo > bar = pack_double_2x32_split lo, new_hi > > then this gets transformed to, in the backend: > > hi = foo (stride 2, offset 1) > > new_hi = .. > > lo = foo (stride 2, offset 0) > bar (stride 2, offset 0) = lo (stride 2, offset 0) # <- notice this > bar (stride 2, offset 1) hi (stride 2, offset 1) > > Now the move from lo to bar has the same stride, and the same offset, > and the same offset, and bar obviously doesn't conflict with lo, so > register coalescing can combine lo and bar. This will result in: > > hi = foo (stride 2, offset 1) > > new_hi = .. > > bar (stride 2, offset 0) = foo (stride 2, offset 0) > bar (stride 2, offset 1) hi (stride 2, offset 1) > > This is the same code (with the same register pressure) as if you had > used pack_double_2x32_split_y. In fact, in the bugzilla bug you gave > me an example of a shader where this happens, and I pointed it out to > you. I think all you need to do to fix your problem is move down the > unpack_2x32_split_x to right before the pack, since the backend isn't > smart enough to reorder it. > > In any case, I think that we should get spilling to work on doubles regardless. > > > > > Jason also mentioned that what we should do is fix the spilling code. I > > agree that we should do that, but I think the underlying issue here is > > not that, but that NIR's output is leading to inefficient code, which is > > then leading to unnecessary spilling in some cases. If we fix the > > spilling then we can live with the problem, but the problem is still > > there: we are still having live values that we don't need taking some > > register space and leading to unnecessary spilling. > > > > In any case, I understand why you don't like the patch, so I am fine > > with holding it back and focus on fixing spilling code first. FWIW, I > > already had a stab at that, but I run into something that I didn't quite > > figure out at that time: > > > > https://lists.freedesktop.org/archives/mesa-dev/2016-March/109370.html > > I didn't notice this fly by, but you're not doing the spilling right. > For things where the stride * type_size is 64, you can't decompose it > into two scratch reads/writes, since then the exec mask will be wrong. > You only want to read/write a channel if the exec mask is enabled for > it. You have to use a different message to spill doubles, since the > normal one (the scattered read/write) assumes the exec mask channels > correspond to 32-bit chunks. Jason would know more about this. > > > > > I still have plans to get back to that though. > > > > Iago > > > >> Jason, what do you think? > >> > >> On Tue, Apr 12, 2016 at 4:05 AM, Samuel Iglesias Gonsálvez > >> <[email protected]> wrote: > >> > From: Iago Toral Quiroga <[email protected]> > >> > > >> > This is useful when we only need to modify the high 32-bit chunk of a double. > >> > This is a common case, because this is the part that encodes the exponent > >> > which we manipulate in some double lowering passes. Although we can accomplish > >> > the same by using pack_double_2x32, this new opcode is better for register > >> > pressure, since we don't have to unpack both parts of the double and keep > >> > the low 32-bits around until we can recombine the new exponent. > >> > > >> > We will use this opcode in the set_exponent() function of the double lowering > >> > pass and with that we will fix spilling issues in some dmat4 divide > >> > piglit tests on Intel. > >> > --- > >> > src/compiler/nir/nir_opcodes.py | 14 ++++++++++++++ > >> > 1 file changed, 14 insertions(+) > >> > > >> > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py > >> > index 9f62e08..f92dd8d 100644 > >> > --- a/src/compiler/nir/nir_opcodes.py > >> > +++ b/src/compiler/nir/nir_opcodes.py > >> > @@ -276,6 +276,20 @@ di.i2 = src0.y; > >> > dst.x = di.u64; > >> > """) > >> > > >> > +opcode("pack_double_2x32_split_y", 0, tuint64, [0, 0], [tuint64, tuint32], "", """ > >> > +union { > >> > + uint64_t u64; > >> > + struct { > >> > + uint32_t i1; > >> > + uint32_t i2; > >> > + }; > >> > +} di; > >> > + > >> > +di.u64 = src0; > >> > +di.i1 = src1; > >> > +dst = di.u64; > >> > +""") > >> > + > >> > unop_horiz("unpack_double_2x32", 2, tuint32, 1, tuint64, """ > >> > union { > >> > uint64_t u64; > >> > -- > >> > 2.5.0 > >> > > >> > _______________________________________________ > >> > mesa-dev mailing list > >> > [email protected] > >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > >> _______________________________________________ > >> mesa-dev mailing list > >> [email protected] > >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > >
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
