fs: Get 64-bit indirect moves working on IVB.

Samuel Iglesias Gonsálvez Sun, 12 Feb 2017 22:15:45 -0800

On Fri, 2017-02-10 at 10:44 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <[email protected]> writes:
> 
> > On Thu, 2017-02-09 at 18:28 -0800, Francisco Jerez wrote:
> > > Francisco Jerez <[email protected]> writes:
> > > 
> > > > ---
> > > > This replaces "[PATCH v2 09/20] i965/fs: indirect addressing
> > > > with
> > > > doubles is not supported in IVB/BYT".
> > > > 
> > > 
> > > Note that some of the fp64 indirect addressing test-cases still
> > > fail
> > > on
> > > IVB even with this patch applied, but the reason doesn't seem to
> > > have
> > > anything to do with indirect addressing.  Apparently the
> > > remaining
> > > failures are caused by illegal code like:
> > > 
> > > > sel(8)          g30<1>D         g9<0,2,1>DF     g8.3<0,2,1>DF  
> > > >  {
> > > > align1 1Q };
> > > 
> > > The problem is that the SEL instruction doesn't support most
> > > datatype
> > > conversions, so this leads to data corruption on at least IVB.
> > > According to the hardware docs, the only valid destination type
> > > of
> > > SEL
> > > is DF if the execution type is DF, and F (or HF on CHV+) if the
> > > execution type is F.  Integer 16 or 32-bit execution types seem
> > > to
> > > allow
> > > converting to other single-precision integer types.
> > > 
> > 
> > Some weeks ago I saw this issue while working in a solution for
> > IVB's
> > MOV INDIRECT. That SEL is added by opt_peephole_sel():
> > 
> > https://lists.freedesktop.org/archives/mesa-dev/2017-January/142023
> > .htm
> > l
> > 
> > However,  I did it very restrictive by just don't allowing any
> > conversion between different data type sizes. I am going to improve
> > that patch to include all the allowed cases.
> > 
> 
> I wouldn't worry too much about the couple of supported type
> conversions, it doesn't seem terribly important, my concern with your
> patch is that it does seem to allow several type conversions which
> are
> invalid, like D->F, F->D, DF->Q, etc.
>


Right. I can just disallowing all the conversions (src.type !=
dst.type) in that pass... but I prefer the solutions suggested below.

> Aside from that I'm not particularly confident that your patch will
> fix
> all potential sources of invalid conversions.  E.g. do we know that
> no
> other optimization pass will ever change the destination type of the
> SEL
> assuming that the initial type was correct?  What about other
> instructions with double-to-single or single-to-double type
> conversion
> restrictions? (A quick look at the PRMs would likely reveal
> instructions
> with similar restrictions) -- I don't think we can answer these
> questions without auditing the rest of the compiler back-end (while
> doing this in a legalization pass would be pretty much obviously
> correct).  Alternatively you could fix the EU validator to recognize
> this and other cases of invalid conversions, which would make sure we
> get an assertion failure in the likely case that we've missed
> something
> so we can find the problem quickly instead of spending hours
> debugging
> non-deterministic data corruption issues.  I'm okay either way,
> validation or legalization pass, do it as you like.
> 

OK, thanks.

Sam

> > > I'm not sure why we haven't seen this cause massive breakage on
> > > HSW+,
> > > but I think we need some sort of legalization pass to do the type
> > > conversion in a separate MOV for any instruction with an invalid
> > > destination conversion.  You may be able to do it within the d2x
> > > pass
> > > but then it would make sense to rename it to something more
> > > appropriate
> > > like destination conversion lowering (lower_cvt? :P).
> > > 
> > 
> > :)
> > 
> > > In addition there seem to be other issues with your d2x lowering
> > > code
> > > (I'm bringing this up here because you don't seem to have sent
> > > the
> > > d2x
> > > changes for review to the mailing list yet) -- It removes the
> > > original
> > > instruction and creates a new one with the same opcode and first
> > > few
> > > sources, which will miscompile the program if the original
> > > instruction
> > > had a larger number of sources or any other instruction controls
> > > set.  I
> > > suggest you modify the original instruction in-place to point it
> > > to
> > > the
> > > temporary you've allocated, and then copy the data into the
> > > original
> > > destination by using a strided move.
> > > 
> > 
> > OK.
> > 
> > Thanks,
> > 
> > Sam
> > 
> > > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27
> > > > ++++++++++++++++++++++++--
> > > >  1 file changed, 25 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > index ea4a3fe1399..0e2dbe23571 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > @@ -440,7 +440,7 @@ fs_generator::generate_mov_indirect(fs_inst
> > > > *inst,
> > > >        brw_MOV(p, dst, reg);
> > > >     } else {
> > > >        /* Prior to Broadwell, there are only 8 address
> > > > registers.
> > > > */
> > > > -      assert(inst->exec_size == 8 || devinfo->gen >= 8);
> > > > +      assert(inst->exec_size <= 8 || devinfo->gen >= 8);
> > > >  
> > > >        /* We use VxH indirect addressing, clobbering a0.0
> > > > through
> > > > a0.7. */
> > > >        struct brw_reg addr = vec8(brw_address_reg(0));
> > > > @@ -478,7 +478,30 @@
> > > > fs_generator::generate_mov_indirect(fs_inst
> > > > *inst,
> > > >         * code, using it saves us 0 instructions and would
> > > > require
> > > > quite a bit
> > > >         * of case-by-case work.  It's just not worth it.
> > > >         */
> > > > -      brw_ADD(p, addr, indirect_byte_offset,
> > > > brw_imm_uw(imm_byte_offset));
> > > > +      if (devinfo->gen >= 8 || devinfo->is_haswell ||
> > > > type_sz(reg.type) < 8) {
> > > > +         brw_ADD(p, addr, indirect_byte_offset,
> > > > brw_imm_uw(imm_byte_offset));
> > > > +      } else {
> > > > +         /* IVB reads two address register components per
> > > > channel
> > > > for
> > > > +          * indirectly addressed 64-bit sources, so we need to
> > > > initialize
> > > > +          * adjacent address components to consecutive dwords
> > > > of
> > > > the source
> > > > +          * region by emitting two separate ADD
> > > > instructions.  Found
> > > > +          * empirically.
> > > > +          */
> > > > +         assert(inst->exec_size <= 4);
> > > > +         brw_push_insn_state(p);
> > > > +         brw_set_default_exec_size(p, cvt(inst->exec_size) -
> > > > 1);
> > > > +
> > > > +         brw_ADD(p, spread(addr, 2), indirect_byte_offset,
> > > > +                 brw_imm_uw(imm_byte_offset));
> > > > +         brw_inst_set_no_dd_clear(devinfo, brw_last_inst,
> > > > true);
> > > > +
> > > > +         brw_ADD(p, spread(suboffset(addr, 1), 2),
> > > > indirect_byte_offset,
> > > > +                 brw_imm_uw(imm_byte_offset + 4));
> > > > +         brw_inst_set_no_dd_check(devinfo, brw_last_inst,
> > > > true);
> > > > +
> > > > +         brw_pop_insn_state(p);
> > > > +      }
> > > > +
> > > >        struct brw_reg ind_src = brw_VxH_indirect(0, 0);
> > > >  
> > > >        brw_inst *mov = brw_MOV(p, dst, retype(ind_src,
> > > > dst.type));
> > > > -- 
> > > > 2.11.0
> > > > 
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > [email protected]
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev

signature.asc
Description: This is a digitally signed message part

_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCHv3 09/20] i965/fs: Get 64-bit indirect moves working on IVB.

Reply via email to