Francisco Jerez <[email protected]> writes: > Neil Roberts <[email protected]> writes: > >> Are you sure this patch is necessary? The documentation for the multiply >> instruction on BDW+ says: >> >> SourceType : *D >> DestinationType : *D >> Project : EXCLUDE(CHV) >> >> This to me implies that it should work on BXT because it doesn't say >> EXCLUDE(BXT). >> > > In fact both CHV and BXT *can* support 32x32 multiplication, that's not > really the reason that motivated this patch -- The problem is that 32x32 > multiplication has QWORD execution type which has several restrictions > on CHV and BXT, the annoying one is: > > "CHV, BXT > > When source or destination datatype is 64b or operation is integer DWord > multiply, regioning in Align1 must follow these rules: > > Source and Destination horizontal stride must be aligned to the same > qword. > > Example: > [...] > // mul (4) r10.0<2>:d r11.0<8;4,2>:d r12.0<8;4,2>:d // Source and > Destination stride must be 2 since the execution type is Qword." > > So it sounds like we could use the native 32x32 multiply with some > additional effort to pass the arguments with the correct stride, but > it's not clear to me whether the solution would be any more better than > splitting up the multiply into 32x16 multiplies, so doing the same as in > CHV and pre-Gen8 seemed like the most KISS solution for now. > >> I made a little test case and ran it on my BXT and it seems to work even >> without this patch. I looked at the assembler source and it is >> definitely doing a MUL instruction with D types for the dst and two >> sources. >> >> https://github.com/bpeel/piglit/blob/test-integer-multiply/tests/general/mult32.c >> > Hmm, I guess the docs could be wrong, but I'm not sure what the > consequences would be of violating the alignment rule, I guess the > failure may be non-deterministic. What stepping of BXT did you test > this on? > I just tried this on the BXT simulator, and it doesn't seem to be too happy about multiplies breaking the alignment rule:
| .aub:0x0> CEuExecUnit::CDecodeInst::RegionCheck>Instruction is:
| mul (16) r3.0<1>:d r2.3<0;1,0>:d 0xe1ac99a1:d { Align1, N1,
NoCompact }.
| For 64-bit Align1 operation or multiplication of dwords in this
| device, destination horizontal stride must be aligned to qword.
|
| Fail
>> Regards,
>> - Neil
>>
>> Francisco Jerez <[email protected]> writes:
>>
>>> AFAIK BXT has the same annoying alignment limitation as CHV on the
>>> source register regions of 32x32 bit MULs, give it the same treatment.
>>> ---
>>> src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index 244f299..fc9f007 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -3126,9 +3126,9 @@ fs_visitor::lower_integer_multiplication()
>>> bool progress = false;
>>>
>>> /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit operation
>>> - * directly, but Cherryview cannot.
>>> + * directly, but CHV/BXT cannot.
>>> */
>>> - if (devinfo->gen >= 8 && !devinfo->is_cherryview)
>>> + if (devinfo->gen >= 8 && !devinfo->is_cherryview &&
>>> !devinfo->is_broxton)
>>> return false;
>>>
>>> foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
>>> --
>>> 2.4.6
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> [email protected]
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/mesa-dev
