On Thu, Apr 14, 2016 at 1:50 AM, Iago Toral <[email protected]> wrote:
> On Wed, 2016-04-13 at 08:29 -0700, Jason Ekstrand wrote:
> >
> > On Apr 13, 2016 7:57 AM, "Connor Abbott" <[email protected]> wrote:
> > >
> > > On Wed, Apr 13, 2016 at 3:24 AM, Iago Toral <[email protected]> wrote:
> > > > On Tue, 2016-04-12 at 13:16 -0400, Connor Abbott wrote:
> > > >> I'm not sure I'm so comfortable with this. For one, this is the sort
> > > >> of thing the backend can (and should) do. Why should we be adding
> > > >> stuff at the NIR level to help the backend's copy propagation pass?
> > > >> And perhaps more importantly, why should we be adding stuff at the
> > > >> NIR level to work around the lack of spilling support for doubles? I
> > > >> know it's more difficult to implement than normal spilling, but it
> > > >> just feels dirty to work around it because some tests happen to use
> > > >> too many registers.
> > > >
> > > > My thoughts for this were that NIR is producing suboptimal output for
> > > > this case: it is unpacking and repacking an entire 64-bit float when
> > > > it only really needs to operate on the high 32 bits, so fixing that
> > > > in NIR, where the issue is introduced, instead of having each backend
> > > > fix it, seemed to make more sense to me.
> > > >
> > > > I am not sure that backends can easily fix this either: what they are
> > > > going to see is a nir_op_pack_double_2x32_split operation, and they
> > > > don't have any context to tell whether the low 32-bit argument is
> > > > actually different from what is already in the register, other than
> > > > emitting code to check for that, which kind of defeats the purpose of
> > > > the optimization. I suppose they could still see if the instruction
> > > > that generated the low 32-bit value is actually an unpack of the same
> > > > register and do the right thing in that case. Would you like that
> > > > solution better?
> > > >
> > > > In any case, I don't think this is something for copy propagation to
> > > > handle; this is really not about copy-propagating a value, it is
> > > > about not needing that value to begin with. Register pressure comes
> > > > from needing to have the low 32-bit value alive until we recombine it
> > > > into the 64-bit value. This patch makes it so that the low 32 bits
> > > > are never alive at all, we don't need to care about them at any
> > > > point, and that is why it helps reduce register pressure.
> >
> > I think we have two fundamental problems here. One is that spilling is
> > broken. The second is that NIR is generating code that the backend can't
> > handle without spilling. I can totally believe that the latter is
> > happening given that, thanks to the flag, the backend can't move two
> > SELs past each other when it schedules, and we use SEL very heavily in
> > the double lowering.
> >
> > In this particular case, I do think we have a coalescing problem more
> > than a need for a new opcode. Thanks to SSA you could just as easily see
> > a pack where one source is an unpack and do what you're doing with this
> > "replace the top bits" opcode in fs/vec4_nir without adding a special
> > opcode for it. Therefore, even if this little optimization helps, I
> > think the opcode is unneeded. I also think it's probably solvable as
> > Connor says with better coalescing.
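For concreteness, the operation everyone is describing -- replace only the
high 32 bits of a double's bit pattern -- is tiny. Here is a standalone C
sketch of the two forms being compared (function names are made up; this is
not the lowering or backend code, and it assumes the usual low/high split
used by pack_double_2x32_split):

#include <stdint.h>

/* Today's lowering: unpack the low half, then repack it with the new
 * high half.  'lo' is a separate value that stays live until the pack. */
uint64_t replace_hi_unpack_repack(uint64_t v, uint32_t new_hi)
{
   uint32_t lo = (uint32_t)v;                       /* unpack_double_2x32_split_x */
   return (uint64_t)lo | ((uint64_t)new_hi << 32);  /* pack_double_2x32_split     */
}

/* What the proposed "replace the top bits" opcode would compute: the low
 * half is never materialized as a value of its own. */
uint64_t replace_hi_direct(uint64_t v, uint32_t new_hi)
{
   return (v & 0xffffffffull) | ((uint64_t)new_hi << 32);
}

The register-pressure argument is only about the first form keeping 'lo'
alive until the pack; the computed result is the same either way.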
> Yes, this is what I suggested as an alternative to doing this in the
> backend in my original reply to Connor. There is a small caveat here
> though: we would have to assume that if the low 32-bit operand comes from
> an unpack opcode, it is an unpack of the same value. We can't verify this
> precisely because of SSA.

I don't follow. If you have

ssa_1 = unpack(ssa_0)
ssa_2 = stuff
ssa_3 = pack_split(ssa_2, ssa_1.y)

that should be exactly equivalent to

ssa_2 = stuff
ssa_3 = pack_split_y(ssa_0, ssa_2)

You can easily detect it and just do the slightly simpler thing.

> It is probably okay to assume this, since I don't imagine situations
> where we want to unpack a double and repack its low 32 bits into a
> different double value, but it is a bit of a hack :)

Sounds like we have hacks all round :-)

I don't know if you're doing this, but one thing that might help
substantially with your spilling problems is if we run lower_alu_to_scalar
before we lower double ops.

Is there a branch I can play with that has these spilling problems? If so,
what tests do I need to look at? I'm planning to start playing with making
spilling work better and would like to fix it for the fp64 case while I'm
at it.

> > > Think of what the backend is going to turn this into, though. Say you
> > > have a sequence like:
> > >
> > > hi = unpack_double_2x32_split_y foo
> > >
> > > new_hi = ...
> > >
> > > lo = unpack_double_2x32_split_x foo
> > > bar = pack_double_2x32_split lo, new_hi
> > >
> > > then this gets transformed to, in the backend:
> > >
> > > hi = foo (stride 2, offset 1)
> > >
> > > new_hi = ...
> > >
> > > lo = foo (stride 2, offset 0)
> > > bar (stride 2, offset 0) = lo (stride 2, offset 0) # <- notice this
> > > bar (stride 2, offset 1) = hi (stride 2, offset 1)
> > >
> > > Now the move from lo to bar has the same stride and the same offset,
> > > and bar obviously doesn't conflict with lo, so register coalescing can
> > > combine lo and bar. This will result in:
> > >
> > > hi = foo (stride 2, offset 1)
> > >
> > > new_hi = ...
> > >
> > > bar (stride 2, offset 0) = foo (stride 2, offset 0)
> > > bar (stride 2, offset 1) = hi (stride 2, offset 1)
> > >
> > > This is the same code (with the same register pressure) as if you had
> > > used pack_double_2x32_split_y. In fact, in the bugzilla bug you gave me
> > > an example of a shader where this happens, and I pointed it out to you.
> > > I think all you need to do to fix your problem is move the
> > > unpack_double_2x32_split_x down to right before the pack, since the
> > > backend isn't smart enough to reorder it.
> > >
> > > In any case, I think that we should get spilling to work on doubles
> > > regardless.
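To spell out the condition Connor is relying on above, here is a toy model
(names made up, not the actual i965 register coalescing pass) of when a copy
like "bar (stride 2, offset 0) = lo (stride 2, offset 0)" can be dropped and
the two registers merged: both sides must touch the same lanes (same stride
and sub-register offset) and the registers must not interfere.

#include <stdbool.h>

struct region {
   int vgrf;     /* virtual register number */
   int stride;   /* element stride, e.g. 2 for one half of a 64-bit value */
   int offset;   /* sub-register offset, in 32-bit units */
};

/* 'interferes' would come from liveness analysis; it is a parameter here. */
bool can_coalesce_copy(struct region dst, struct region src, bool interferes)
{
   return dst.stride == src.stride &&
          dst.offset == src.offset &&
          !interferes;
}

Under that model the "lo = foo" / "bar = lo" pair above collapses exactly as
Connor shows, which is why the split_y opcode shouldn't be needed to solve
the register-pressure problem.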
> > > > Jason also mentioned that what we should do is fix the spilling
> > > > code. I agree that we should do that, but I think the underlying
> > > > issue here is not that, but that NIR's output is leading to
> > > > inefficient code, which is then leading to unnecessary spilling in
> > > > some cases. If we fix the spilling then we can live with the problem,
> > > > but the problem is still there: we still have live values that we
> > > > don't need taking up register space and leading to unnecessary
> > > > spilling.
> > > >
> > > > In any case, I understand why you don't like the patch, so I am fine
> > > > with holding it back and focusing on fixing the spilling code first.
> > > > FWIW, I already had a stab at that, but I ran into something that I
> > > > didn't quite figure out at that time:
> > > >
> > > > https://lists.freedesktop.org/archives/mesa-dev/2016-March/109370.html
> > >
> > > I didn't notice this fly by, but you're not doing the spilling right.
> > > For things where the stride * type_size is 64, you can't decompose it
> > > into two scratch reads/writes, since then the exec mask will be wrong.
> > > You only want to read/write a channel if the exec mask is enabled for
> > > it. You have to use a different message to spill doubles, since the
> > > normal one (the scattered read/write) assumes the exec mask channels
> > > correspond to 32-bit chunks. Jason would know more about this.
> > >
> > > > I still have plans to get back to that though.
> > > >
> > > > Iago
> > > >
> > > >> Jason, what do you think?
> > > >>
> > > >> On Tue, Apr 12, 2016 at 4:05 AM, Samuel Iglesias Gonsálvez
> > > >> <[email protected]> wrote:
> > > >> > From: Iago Toral Quiroga <[email protected]>
> > > >> >
> > > >> > This is useful when we only need to modify the high 32-bit chunk
> > > >> > of a double. This is a common case, because this is the part that
> > > >> > encodes the exponent, which we manipulate in some double lowering
> > > >> > passes. Although we can accomplish the same by using
> > > >> > pack_double_2x32, this new opcode is better for register pressure,
> > > >> > since we don't have to unpack both parts of the double and keep
> > > >> > the low 32 bits around until we can recombine the new exponent.
> > > >> >
> > > >> > We will use this opcode in the set_exponent() function of the
> > > >> > double lowering pass and with that we will fix spilling issues in
> > > >> > some dmat4 divide piglit tests on Intel.
> > > >> > ---
> > > >> >  src/compiler/nir/nir_opcodes.py | 14 ++++++++++++++
> > > >> >  1 file changed, 14 insertions(+)
> > > >> >
> > > >> > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> > > >> > index 9f62e08..f92dd8d 100644
> > > >> > --- a/src/compiler/nir/nir_opcodes.py
> > > >> > +++ b/src/compiler/nir/nir_opcodes.py
> > > >> > @@ -276,6 +276,20 @@ di.i2 = src0.y;
> > > >> >  dst.x = di.u64;
> > > >> >  """)
> > > >> >
> > > >> > +opcode("pack_double_2x32_split_y", 0, tuint64, [0, 0], [tuint64, tuint32], "", """
> > > >> > +union {
> > > >> > +   uint64_t u64;
> > > >> > +   struct {
> > > >> > +      uint32_t i1;
> > > >> > +      uint32_t i2;
> > > >> > +   };
> > > >> > +} di;
> > > >> > +
> > > >> > +di.u64 = src0;
> > > >> > +di.i1 = src1;
> > > >> > +dst = di.u64;
> > > >> > +""")
> > > >> > +
> > > >> >  unop_horiz("unpack_double_2x32", 2, tuint32, 1, tuint64, """
> > > >> >  union {
> > > >> >     uint64_t u64;
> > > >> > --
> > > >> > 2.5.0
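Going back to the spilling question: here is a tiny standalone model (not
i965 code) of the exec-mask mismatch Connor describes, assuming the layout
where SIMD8 channel c's 64-bit value occupies dwords 2c and 2c+1 of the
register pair, while a dword scattered read/write gates dword d with exec
bit d:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
   uint8_t  exec_mask = 0x0f;   /* channels 0-3 enabled */
   uint16_t needed = 0;         /* dwords belonging to enabled 64-bit channels */
   uint16_t written = 0;        /* dwords a naive pair of scattered writes hits */

   for (int c = 0; c < 8; c++) {
      if (exec_mask & (1 << c))
         needed |= (uint16_t)(3u << (2 * c));      /* dwords 2c and 2c+1 */
   }

   for (int d = 0; d < 8; d++) {
      if (exec_mask & (1 << d)) {
         written |= (uint16_t)(1u << d);           /* write of the first GRF  */
         written |= (uint16_t)(1u << (d + 8));     /* write of the second GRF */
      }
   }

   printf("needed 0x%04x, written 0x%04x\n", needed, written);
   /* prints: needed 0x00ff, written 0x0f0f */
   return 0;
}

In general the two masks don't line up, which is why the normal scattered
message can't simply be issued twice to spill a 64-bit-per-channel register.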
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
