Right now we rely on the code at the bottom of brw_set_dest to set the correct
execution size for anything that does not operate on a full SIMD register
(dst.width < BRW_EXECUTE_8). However, this won't work with fp64, where operands
are twice as big and we see instructions with a horizontal width of 4 that
still require an execution size of 8. We cannot fix this by simply checking the
type of the operands involved and skip the automatic execsize adjustment when
they are doubles because we can also operate on doubles as integers (for pack
and unpack operations for example).
Connor and I have been discussing this and we see two options:
1) We fix all the instructions that do not operate on doubles to set the
correct execution size so they don't depend on the code in brw_set_dest any
more, then remove that code from brw_set_dest so it does not break fp64.
However, this involves a lot of changes throughout the driver and across all
hardware platforms... it looks rather scary, so I think we probably don't want
to go down this path.
2) Since we only produce double instructions with a width of 4, we can limit
the change to these instructions. This means that we only automatically adjust
execsize in brw_set_dest for instructions where dst.width < BRW_EXECUTE_4 and
we fix driver code that emits non-double instructions with a width of 4 to set
the execsize they need explicitly. For gen7/8 it seems that only a handful of
changes are required for this according to piglit.
This RFC series implements 2). As mentioned, piglit seems happy with the
changes, but it might be a good idea to run this through Jenkins to make sure
that we are not missing anything.
My original idea was to expand the fix to cover other gens as well, but
unfortunately, this would involve many more changes. For example, ILK's
strip-and-fans or clipping modules are full with code that mixes width8 and
width4 instructions (I have ~10 patches that fix this that involve changing the
default execution size to 4 in the sf module), gen6 also needs a few changes
and I would not be surprised if gen4 required more. Gen9 might require changes
too. Another problem with this is that we don't have gen4/9 hardware available,
so trying this would also require involvement from other people with access to
this hardware.
So considering that we are not targetting fp64 support on gens < 7, at least
for now, it seems that implementing 2) but special-case it for gen7 and gen8
only, leaving other generations intact is a resoanble way to go for now.
Opinions?
Iago Toral Quiroga (5):
i965/eu: set correct execution size in brw_NOP
i965/fs: set execution size for SEND messages in
generate_uniform_pull_constant_load_gen7
i965/eu: set execution size for SEND message in
brw_send_indirect_message
i965: set correct execsize for MOVS with a width of 4 in
brw_find_live_channel
i965: Skip execution size adjustment for instructions of width 4
src/mesa/drivers/dri/i965/brw_eu_emit.c | 21 ++++++++++++++++++++-
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 ++
2 files changed, 22 insertions(+), 1 deletion(-)
--
2.1.4
_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev