Connor Abbott <[email protected]> writes:

> Hi all,
>
> While working on FP64 for i965, there's an issue that I thought of
> with the vec4 backend that I'm not sure how to resolve. From what I
> understand, the execmask works the same way in Align16 mode as Align1
> mode, except that you only use the first 8 channels in practice for
> SIMD4x2, and the first four channels are always the same as well as
> the last 4 channels. But this doesn't work for 64-bit things, since
> there we only operate on 4 components at the same time, so it's more
> like SIMD2x2. For example, imagine that only the second vertex is
> currently enabled at the moment. Then the execmask looks like
> 00001111, and if we do something like:
>
> mul(4) g24<1>DF g12<4,4,1>DF g13<4,4,1>DF { align16 };
>
> then all 4 channels will be disabled, which is not what we want.
>

AFAIUI this shouldn't be a problem.  In align16 mode each component of an
instruction with double-precision execution type maps to *two* bits of
the execmask instead of one (one for each 32-bit half), which is
compensated by each logical thread having two components instead of
four, so in your example [assuming 00001111 is little-endian notation
and you actually do 'mul(8)' ;)] the x and y components of the first
logical thread will be disabled while the x and y components of the
second logical thread will be enabled.
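To make the two readings of the execmask concrete, here is a small Python model of the example above. It is only a sketch of the argument in this thread, not something checked against hardware or the PRM. The function names are made up for illustration, and the rule that a DF component is enabled only when both of its bits are set is an assumption of the sketch (in SIMD4x2 the two bits of a pair always agree, so the choice does not matter there).

```python
def parse_execmask(mask_le):
    """Parse an 8-character execmask string written with channel 0 first."""
    if len(mask_le) != 8 or any(c not in "01" for c in mask_le):
        raise ValueError("expected 8 characters of '0' or '1'")
    return [c == "1" for c in mask_le]


def df_enables_one_bit_per_component(mask_le, exec_size=4):
    """Reading from the question: DF component i is gated by execmask bit i."""
    bits = parse_execmask(mask_le)
    return bits[:exec_size]


def df_enables_two_bits_per_component(mask_le):
    """Reading from the reply: each DF component is gated by a pair of bits.

    Returns {(thread, component): enabled} for the two logical threads.
    Assumption of this sketch: a component is enabled only when both bits
    of its pair are set.
    """
    bits = parse_execmask(mask_le)
    enables = {}
    for thread in range(2):
        for index, component in enumerate("xy"):
            low = thread * 4 + index * 2  # first of the two bits for this component
            enables[(thread, component)] = bits[low] and bits[low + 1]
    return enables


if __name__ == "__main__":
    # Only the second vertex is enabled, as in the example.
    print(df_enables_one_bit_per_component("00001111"))
    print(df_enables_two_bits_per_component("00001111"))
```

Under the first reading a mul(4) sees four disabled channels. Under the second, a mul(8) leaves x and y of the first logical thread disabled and x and y of the second enabled.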
> I think the first thing to do is to write a piglit test that tests
> this case, since currently all the arb_gpu_shader_fp64 tests only use
> uniforms. We need a test that uses non-uniform control flow that
> triggers the case described above. Once we do that, and if we
> determine there's actually a problem, then we need to figure out how
> to solve it.. The ideas I had were:
>

I guess a piglit test would be nice, but you're unlikely to have to do
much about it. ;)

> 1. make every FP64 thing use WE_all. This isn't actually too bad at
> the moment, since our notion of interference already assumes
> (more-or-less) that everything is WE_all, but it prevents us from
> improving it in the future with FP64 things. Unfortunately, it also
> means that we can't use writemasks since setting WE_all makes the EU
> ignore the writemask, so we'll have to do some trickery to get things
> with only 1 channel enabled to work correctly.
>
> 2. Use the NibCtrl field, and split each FP64 operation into 2.
> Unfortunately, this field only appeared on gen8, and the PRM only says
> it works for SIMD4 operations, whereas we need it to work for SIMD2
> operations, although there's a chance it'll actually work for SIMD2 as
> well. This lets us potentially do better register allocation, but it
> might not work and even if it does it won't work for gen7.
>

NibCtrl is Gen7+ actually.  I believe that indeed has a good chance of
working for Align16 2-wide DF instructions but I don't know for sure
offhand.

> #1 sounds like the better solution for now, but who knows... maybe the
> HW people magically made it work already, and I'm not aware or they
> didn't document it.
>
> Connor
> _______________________________________________
> mesa-dev mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
