Samuel Iglesias Gonsálvez <[email protected]> writes: > On Tue, 2017-01-17 at 13:26 -0800, Francisco Jerez wrote: >> Samuel Iglesias Gonsálvez <[email protected]> writes: >> >> > From: "Juan A. Suarez Romero" <[email protected]> >> > >> > When converting a DF to F, we set dst stride to 2, to fulfil >> > alignment >> > restrictions. >> > >> > But in IVB/BYT, this is not necessary, as each DF conversion >> > already >> > writes 2 F, the first one the real value, and the second one a 0. >> > That >> > is, IVB/BYT already set stride = 2 implicitly, so we must set it to >> > 1 >> > explicitly to avoid ending up with stride = 4. >> > >> > v2: >> > - Fix typo (Matt) >> > --- >> > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++++++++++ >> > 1 file changed, 10 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> > index 45881e3ec95..487f2e90224 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> > @@ -1629,6 +1629,16 @@ fs_generator::generate_code(const cfg_t >> > *cfg, int dispatch_width) >> > inst->src[i].type != BRW_REGISTER_TYPE_UD || >> > !inst->src[i].negate); >> > } >> > + /* When converting from DF->F, we set destination's stride >> > as 2 as an >> > + * alignment requirement. But in IVB/BYT, each DF implicitly >> > writes 2 F, >> > + * being the first one the converted value. So we don't need >> > to >> > + * explicitly set stride 2, but 1. >> > + */ >> > + if (devinfo->gen == 7 && !devinfo->is_haswell && >> > + type_sz(inst->src[0].type) > type_sz(inst->dst.type)) { >> >> This should be exec_type_size(inst) rather than >> type_sz(inst->src[0].type). >> > > Actually it is going to be the same as this is going to catch only DF > -> 32-bit data type conversions. But yeah, you are right. > Are you sure? AFAICT the few lines of this patch were going to be executed for *all* instructions, you didn't even have a check for the execution type of the instruction being FP64...
>> > + assert(inst->dst.stride == 2 || inst->dst.stride == 1); >> > + inst->dst.stride = 1; >> > + } >> >> This is modifying the IR, please don't. >> > > Right, I am going to do the change in brw_eu_emit.c inside the > brw_MOV() function that Matt added in other patch and also when > emitting the MOV indirect. > >> Also I don't think the above has the same semantics as a destination >> region with stride 2... AFAIUI IVB will just write garbage into the >> odd >> channels when the destination type is narrower than a DF which is >> really >> not what a strided move is supposed to do. If that's the case it >> would >> probably be safer to add a new F64TO32 virtual opcode for type >> conversions and assert(inst->dst.stride == 1) here... >> > > I think adding a virtual opcode for this change is likely too much. I > will keep the aforementioned changes. However, I don't have a strong > opinion against it: if you prefer the virtual opcode, we can add it now > or even later as a follow-up patch. > TBH I'm not particularly fond of the idea of adding a virtual conversion opcode for IVB either, but if you don't, I hope you have some other plan to deal with the subtle kind of breakage this could cause... E.g. some future, seemingly unrelated and obviously correct (because it's the codegen pass that will be breaking a contract with this patch) optimizer improvement could rely on the (seemingly obvious) assumption that strided destination regions don't corrupt any of the skipped components (honestly I'm not fully convinced that we don't rely on this already), which would probably work fine except on IVB for this infrequently used feature, and even then the kind of failure mode is likely to be non-deterministic and may not be caught by the simplest test-cases. Some alternatives to adding a virtual opcode: - Continue using regular moves with destination stride 2 as you're doing here, but call the contents of the skipped components undefined after the instruction is executed. Add some simple code to the FS validator pass to warn us if we ever break this assumption -- The most naive implementation of this could just make sure that the destination VGRF of any strided narrowing conversion is only ever written once on IVB (which could conceivably give false positives but never false negatives). - Same as above, but instead of the validation pass, write a legalization pass to fix up invalid strided conversions to use a temporary instead of the real destination which would then copied into the real register (this may have benefits of its own because the same logic could potentially by used to get rid of an amount of cruft from the compiler back-end intended to address execution type alignment restrictions of the destination and sources of various instructions, which gets particularly annoying on CHV/BXT). > Sam > >> > dst = brw_reg_from_fs_reg(compiler->devinfo, inst, >> > &inst->dst, compressed); >> > >> > -- >> > 2.11.0 >> > >> > _______________________________________________ >> > mesa-dev mailing list >> > [email protected] >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
