compiler: validate region restrictions for half-float conversions

Francisco Jerez Fri, 01 Mar 2019 19:15:32 -0800

Iago Toral <[email protected]> writes:

> On Fri, 2019-03-01 at 09:39 +0100, Iago Toral wrote:
>> On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > Iago Toral <[email protected]> writes:
>> > 
>> > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > > Iago Toral <[email protected]> writes:
>> > > > 
>> > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > > > > Iago Toral Quiroga <[email protected]> writes:
>> > > > > > 
>> > > > > > > ---
>> > > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
>> > > > > > > ++++++++++++-
>> > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > > ++++++++++++++++++++++++
>> > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > > > > 
>> > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > general_restrictions_based_on_operand_types(const struct
>> > > > > > > gen_device_info *devinf
>> > > > > > >         exec_type_size == 8 && dst_type_size == 4)
>> > > > > > >        dst_type_size = 8;
>> > > > > > >  
>> > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > +    *
>> > > > > > > +    *    "There is no direct conversion from HF to DF or
>> > > > > > > DF to
>> > > > > > > HF.
>> > > > > > > +    *     There is no direct conversion from HF to Q/UQ
>> > > > > > > or
>> > > > > > > Q/UQ to
>> > > > > > > HF."
>> > > > > > > +    */
>> > > > > > > +   enum brw_reg_type src0_type =
>> > > > > > > brw_inst_src0_type(devinfo,
>> > > > > > > inst);
>> > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > > BRW_OPCODE_MOV
>> > > > > > > &&
>> > > > > > 
>> > > > > > Why is only the MOV instruction handled here and
>> > > > > > below?  Aren't
>> > > > > > other
>> > > > > > instructions able to do implicit conversions?  Probably
>> > > > > > means
>> > > > > > you
>> > > > > > need
>> > > > > > to deal with two sources rather than one.
>> > > > > 
>> > > > > This comes from the programming notes of the MOV instruction
>> > > > > (Volume
>> > > > > 2a, Command Reference - Instructions - MOV), so it is
>> > > > > described
>> > > > > specifically for the MOV instruction. I should probably have
>> > > > > made
>> > > > > this
>> > > > > clear in the comment.
>> > > > > 
>> > > > 
>> > > > Maybe the one above is specified in the MOV page only, probably
>> > > > due
>> > > > to
>> > > > an oversight (If these restrictions were really specific to the
>> > > > MOV
>> > > > instruction, what would prevent you from implementing such
>> > > > conversions
>> > > > through a different instruction?  E.g. "ADD dst:df, src:hf, 0"
>> > > > which
>> > > > would be substantially more efficient than what you're doing in
>> > > > PATCH
>> > > > 02)
>> > > 
>> > > Instructions that take HF can only be strictly HF or mix F and HF
>> > > (mixed mode float), with MOV being the only exception. That means
>> > > that
>> > > any instruction like the one you use above are invalid. Maybe we
>> > > should
>> > > validate explicitly that instructions that use HF are strictly HF
>> > > or
>> > > mixed-float mode only?
>> > > 
>> > 
>> > So you're acknowledging that the conversions checked above are
>> > illegal
>> > whether the instruction is a MOV or something else.  Where are we
>> > preventing instructions other than MOV with such conversions from
>> > being
>> > accepted by the validator?
>> 
>> We aren't, because the validator is not checking, in general, for
>> accepted type combinations for specific instructions anywhere as far
>> as
>> I can see. If we want to start doing this with HF conversions
>> specifically, I guess it is fine, but in case it is not clear, I
>> think
>> it won't bring any immediate benefits with the
>> VK_KHR_shader_float16_int8 implementation since this series only ever
>> emits conversions involving HF operands via MOV instructions, which
>> is
>> why I thought that validating that no direct MOV conversions from
>> DF/Q
>> types ever happen was useful, since we have code in the driver to
>> handle this scenario.
>> 
>> > > >  but I see other restriction checks in this patch which are
>> > > > certainly specified in the generic regioning restrictions page
>> > > > and
>> > > > you're limiting to the MOV instruction...
>> > > 
>> > > There are two rules below:
>> > > 
>> > > 1. The one about conversions between integer and half-float.
>> > > Again,
>> > > these can only happen through MOV for the same reasons, so I
>> > > think
>> > > this
>> > > one should be fine.
>> > > 
>> > 
>> > Why do you think that can only happen through a MOV
>> > instruction?  The
>> > hardware spec lists the following as a valid example in the
>> > register
>> > region restrictions page:
>> > 
>> > > add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
>> 
>> I thought these instructions were not valid. I looked again at the
>> BDW
>> PRM and indeed, it lists that HF destination for ADD is only allowed
>> with HF source. The same for all other instructions that can take HF
>> except (S)MOV. However, I have just now realized that the SKL+ PRM is
>> different and it allows ADD to do this, but as far as I can see, ADD
>> is
>> the only exception: you cannot do this with MUL, MAD or MATH for
>> example. I am not sure what to make of this, but I guess that if we
>> want to validate this it doesn't hurt to be safe and assume that we
>> can
>> have more instructions doing implicit conversions.
>> 
>> If course, there is the fact that we never emit these kind of
>> conversions anyway, at least not with this series (we don't even use
>> mixed-float mode) but I guess it is fine to add that validation if
>> only
>> to be safe in the future if we ever decide to start using the extra
>> flexibility allowed by the hardware.
>
> I am just now realizing that in Align16 mode, we always expect
> destination horizontal stride to be exactly 1, no matter the type. This
> means that Align16 mixed float instructions with HF destination
> (implicit conversion from F->HF) would also require this (actually the
> docs make this requirement explicit for mixed float instructions
> anyway). This would make it impossible to satisfy the rule for DWord
> alignment on F->HF conversions for implicit conversions in mixed float
> mode instructions. I guess these conversion restrictions are formulated
> specifically for Align1 and should be skipped for Align16, even if that
> is not explicitly stated in the docs. Does that make sense?
>


Yes, that's what I would assume.  The hardware spec is seriously
contradictory indeed...

>> 
>> > > 2. The one about word destinations (of which we are only really
>> > > implementing conversions from F->HF). Here the rule is more
>> > > generic
>> > > and
>> > > I agree that expanding this to include any other mixed float mode
>> > > instruction would make sense. However, validation for mixed float
>> > > mode
>> > > has its own set rules, some of which are incompatible with the
>> > > general
>> > > region restrictions being validated here, so I think it is
>> > > inconvenient
>> > > to try and do that validation  here (see patch 36 and then patch
>> > > 37).
>> > > What I would propose, if you agree, is that we only implement
>> > > this
>> > > for
>> > > MOV here, and then for mixed float mode instructions, we
>> > > implement
>> > > the
>> > > more generic version of this check (that would then go in patch
>> > > 37).
>> > > How does that sound?
>> > > 
>> > 
>> > I still don't understand why you want to implement the same
>> > restriction
>> > twice, once for MOV and once for all other mixed-mode
>> > instructions.  How
>> > is that more convenient?
>> 
>> The restrictions based on operand types that are checked in the
>> validator are specific to Byte or cases where the execution type is
>> larger than the destination stride, for which mixed float has a
>> different set of restrictions.
>> 
>> For example, for mixed float we have this rule:
>> 
>> "In Align1, destination stride can be smaller than execution type"
>> 
>> Which overrides this other from "General restrictions based on
>> operand
>> types":
>> 
>> "Destination stride must be equal to the ratio of the sizes of the
>> execution data type to the destination type"
>> 
>> So I thought that it would make things easier to keep all
>> restrictions
>> for mixed float separate and make sure that we skipped mixed float
>> instructions in general_restrictions_based_on_operand_types() so we
>> avoid having to add code to skip individual general restrictions that
>> that are overriden for mixed float mode anyway.
>> 
>> Anyway, I'll try to rework the patches to do more generic validation
>> of
>> HF conversions like you ask and see what kind of code we end up with.
>> 
>> > > > > > > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
>> > > > > > > type_sz(src0_type) == 8) ||
>> > > > > > > +             (dst_type_size == 8 && src0_type ==
>> > > > > > > BRW_REGISTER_TYPE_HF)),
>> > > > > > > +            "There are no direct conversion between 64-
>> > > > > > > bit
>> > > > > > > types
>> > > > > > > and HF");
>> > > > > > > +
>> > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > +    *
>> > > > > > > +    *   "Conversion between Integer and HF (Half Float)
>> > > > > > > must
>> > > > > > > be
>> > > > > > > +    *    DWord-aligned and strided by a DWord on the
>> > > > > > > destination."
>> > > > > > > +    *
>> > > > > > > +    * But this seems to be expanded on CHV and SKL+ by:
>> > > > > > > +    *
>> > > > > > > +    *   "There is a relaxed alignment rule for word
>> > > > > > > destinations.
>> > > > > > > When
>> > > > > > > +    *    the destination type is word (UW, W, HF),
>> > > > > > > destination
>> > > > > > > data types
>> > > > > > > +    *    can be aligned to either the lowest word or the
>> > > > > > > second
>> > > > > > > lowest
>> > > > > > > +    *    word of the execution channel. This means the
>> > > > > > > destination
>> > > > > > > data
>> > > > > > > +    *    words can be either all in the even word
>> > > > > > > locations or
>> > > > > > > all
>> > > > > > > in the
>> > > > > > > +    *    odd word locations."
>> > > > > > > +    *
>> > > > > > > +    * We do not implement the second rule as is though,
>> > > > > > > since
>> > > > > > > empirical testing
>> > > > > > > +    * shows inconsistencies:
>> > > > > > > +    *   - It suggests that packed 16-bit is not allowed,
>> > > > > > > which
>> > > > > > > is
>> > > > > > > not true.
>> > > > > > > +    *   - It suggests that conversions from Q/DF to W
>> > > > > > > (which
>> > > > > > > need
>> > > > > > > to be 64-bit
>> > > > > > > +    *     aligned on the destination) are not possible,
>> > > > > > > which
>> > > > > > > is
>> > > > > > > not true.
>> > > > > > > +    *   - It suggests that conversions from 16-bit
>> > > > > > > executions
>> > > > > > > types to W need
>> > > > > > > +    *     to be 32-bit aligned, which doesn't seem to be
>> > > > > > > necessary.
>> > > > > > > +    *
>> > > > > > > +    * So from this rule we only validate the implication
>> > > > > > > that
>> > > > > > > conversion from
>> > > > > > > +    * F to HF needs to be DWord aligned too (in BDW this
>> > > > > > > is
>> > > > > > > limited to
>> > > > > > > +    * conversions from integer types).
>> > > > > > > +    */
>> > > > > > > +   bool is_half_float_conversion =
>> > > > > > > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV
>> > > > > > > &&
>> > > > > > > +       dst_type != src0_type &&
>> > > > > > > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
>> > > > > > > BRW_REGISTER_TYPE_HF);
>> > > > > > > +
>> > > > > > > +   if (is_half_float_conversion) {
>> > > > > > > +      assert(devinfo->gen >= 8);
>> > > > > > > +
>> > > > > > > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
>> > > > > > > brw_reg_type_is_integer(src0_type)) ||
>> > > > > > > +          (brw_reg_type_is_integer(dst_type) &&
>> > > > > > > src0_type
>> > > > > > > ==
>> > > > > > > BRW_REGISTER_TYPE_HF)) {
>> > > > > > > +         ERROR_IF(dst_stride * dst_type_size != 4,
>> > > > > > > +                  "Conversions between integer and half-
>> > > > > > > float
>> > > > > > > must
>> > > > > > > be strided "
>> > > > > > > +                  "by a DWord on the destination");
>> > > > > > > +
>> > > > > > > +         unsigned subreg =
>> > > > > > > brw_inst_dst_da1_subreg_nr(devinfo,
>> > > > > > > inst);
>> > > > > > > +         ERROR_IF(subreg % 4 != 0,
>> > > > > > > +                  "Conversions between integer and half-
>> > > > > > > float
>> > > > > > > must
>> > > > > > > be aligned "
>> > > > > > > +                  "to a DWord on the destination");
>> > > > > > > +      } else if ((devinfo->is_cherryview || devinfo->gen
>> > > > > > > > = 9)
>> > > > > > > 
>> > > > > > > &&
>> > > > > > > +                 dst_type == BRW_REGISTER_TYPE_HF) {
>> > > > > > > +         ERROR_IF(dst_stride != 2,
>> > > > > > > +                  "Conversions to HF must have either
>> > > > > > > all
>> > > > > > > words in
>> > > > > > > even word "
>> > > > > > > +                  "locations or all words in odd word
>> > > > > > > locations");
>> > > > > > > +      }
>> > > > > > > +
>> > > > > > > +   } else if (exec_type_size > dst_type_size) {
>> > > > > > >        if (!(dst_type_is_byte &&
>> > > > > > > inst_is_raw_move(devinfo,
>> > > > > > > inst)))
>> > > > > > > {
>> > > > > > >           ERROR_IF(dst_stride * dst_type_size !=
>> > > > > > > exec_type_size,
>> > > > > > >                    "Destination stride must be equal to
>> > > > > > > the
>> > > > > > > ratio
>> > > > > > > of the sizes "
>> > > > > > > diff --git a/src/intel/compiler/test_eu_validate.cpp
>> > > > > > > b/src/intel/compiler/test_eu_validate.cpp
>> > > > > > > index 73300b23122..1557b6d2452 100644
>> > > > > > > --- a/src/intel/compiler/test_eu_validate.cpp
>> > > > > > > +++ b/src/intel/compiler/test_eu_validate.cpp
>> > > > > > > @@ -848,6 +848,128 @@ TEST_P(validation_test,
>> > > > > > > byte_destination_relaxed_alignment)
>> > > > > > >     }
>> > > > > > >  }
>> > > > > > >  
>> > > > > > > +TEST_P(validation_test, half_float_conversion)
>> > > > > > > +{
>> > > > > > > +   static const struct {
>> > > > > > > +      enum brw_reg_type dst_type;
>> > > > > > > +      enum brw_reg_type src_type;
>> > > > > > > +      unsigned dst_stride;
>> > > > > > > +      unsigned dst_subnr;
>> > > > > > > +      bool expected_result;
>> > > > > > > +   } inst[] = {
>> > > > > > > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
>> > > > > > > expected_result)  \
>> > > > > > > +      {                                                 
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         BRW_REGISTER_TYPE_##dst_type,                  
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         BRW_REGISTER_TYPE_##src_type,                  
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         BRW_HORIZONTAL_STRIDE_##dst_stride,            
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         dst_subnr,                                     
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         expected_result,                               
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +      }
>> > > > > > > +
>> > > > > > > +      /* MOV to half-float destination (F source handled
>> > > > > > > later) */
>> > > > > > > +      INST(HF,  B, 1, 0, false),
>> > > > > > > +      INST(HF,  W, 1, 0, false),
>> > > > > > > +      INST(HF, HF, 1, 0, true),
>> > > > > > > +      INST(HF, HF, 1, 2, true),
>> > > > > > > +      INST(HF,  D, 1, 0, false),
>> > > > > > > +      INST(HF,  Q, 1, 0, false),
>> > > > > > > +      INST(HF, DF, 1, 0, false),
>> > > > > > > +      INST(HF,  B, 2, 0, true),
>> > > > > > > +      INST(HF,  B, 2, 2, false),
>> > > > > > > +      INST(HF,  W, 2, 0, true),
>> > > > > > > +      INST(HF,  W, 2, 2, false),
>> > > > > > > +      INST(HF, HF, 2, 0, true),
>> > > > > > > +      INST(HF, HF, 2, 2, true),
>> > > > > > > +      INST(HF,  D, 2, 0, true),
>> > > > > > > +      INST(HF,  D, 2, 2, false),
>> > > > > > > +      INST(HF,  Q, 2, 0, false),
>> > > > > > > +      INST(HF, DF, 2, 0, false),
>> > > > > > > +      INST(HF,  B, 4, 0, false),
>> > > > > > > +      INST(HF,  W, 4, 0, false),
>> > > > > > > +      INST(HF, HF, 4, 0, true),
>> > > > > > > +      INST(HF, HF, 4, 2, true),
>> > > > > > > +      INST(HF,  D, 4, 0, false),
>> > > > > > > +      INST(HF,  Q, 4, 0, false),
>> > > > > > > +      INST(HF, DF, 4, 0, false),
>> > > > > > > +
>> > > > > > > +      /* MOV from half-float source */
>> > > > > > > +      INST( B, HF, 1, 0, false),
>> > > > > > > +      INST( W, HF, 1, 0, false),
>> > > > > > > +      INST( D, HF, 1, 0, true),
>> > > > > > > +      INST( D, HF, 1, 4, true),
>> > > > > > > +      INST( F, HF, 1, 0, true),
>> > > > > > > +      INST( F, HF, 1, 4, true),
>> > > > > > > +      INST( Q, HF, 1, 0, false),
>> > > > > > > +      INST(DF, HF, 1, 0, false),
>> > > > > > > +      INST( B, HF, 2, 0, false),
>> > > > > > > +      INST( W, HF, 2, 0, true),
>> > > > > > > +      INST( W, HF, 2, 2, false),
>> > > > > > > +      INST( D, HF, 2, 0, false),
>> > > > > > > +      INST( F, HF, 2, 0, true),
>> > > > > > > +      INST( F, HF, 2, 2, true),
>> > > > > > > +      INST( B, HF, 4, 0, true),
>> > > > > > > +      INST( B, HF, 4, 1, false),
>> > > > > > > +      INST( W, HF, 4, 0, false),
>> > > > > > > +
>> > > > > > > +#undef INST
>> > > > > > > +   };
>> > > > > > > +
>> > > > > > > +   if (devinfo.gen < 8)
>> > > > > > > +      return;
>> > > > > > > +
>> > > > > > > +   for (unsigned i = 0; i < sizeof(inst) /
>> > > > > > > sizeof(inst[0]);
>> > > > > > > i++) {
>> > > > > > > +      if (!devinfo.has_64bit_types &&
>> > > > > > > +          (type_sz(inst[i].src_type) == 8 ||
>> > > > > > > type_sz(inst[i].dst_type) == 8)) {
>> > > > > > > +         continue;
>> > > > > > > +      }
>> > > > > > > +
>> > > > > > > +      brw_MOV(p, retype(g0, inst[i].dst_type),
>> > > > > > > retype(g0,
>> > > > > > > inst[i].src_type));
>> > > > > > > +
>> > > > > > > +      brw_inst_set_exec_size(&devinfo, last_inst,
>> > > > > > > BRW_EXECUTE_4);
>> > > > > > > +
>> > > > > > > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > > > > inst[i].dst_stride);
>> > > > > > > +      brw_inst_set_dst_da1_subreg_nr(&devinfo,
>> > > > > > > last_inst,
>> > > > > > > inst[i].dst_subnr);
>> > > > > > > +
>> > > > > > > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
>> > > > > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
>> > > > > > > BRW_VERTICAL_STRIDE_4);
>> > > > > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
>> > > > > > > BRW_WIDTH_2);
>> > > > > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
>> > > > > > > BRW_HORIZONTAL_STRIDE_2);
>> > > > > > > +      } else {
>> > > > > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
>> > > > > > > BRW_VERTICAL_STRIDE_4);
>> > > > > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
>> > > > > > > BRW_WIDTH_4);
>> > > > > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
>> > > > > > > BRW_HORIZONTAL_STRIDE_1);
>> > > > > > > +      }
>> > > > > > > +
>> > > > > > > +      EXPECT_EQ(inst[i].expected_result, validate(p));
>> > > > > > > +
>> > > > > > > +      clear_instructions(p);
>> > > > > > > +   }
>> > > > > > > +
>> > > > > > > +   /* Conversion from F to HF has Dword destination
>> > > > > > > alignment
>> > > > > > > restriction
>> > > > > > > +    * on CHV and SKL+ only
>> > > > > > > +    */
>> > > > > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
>> > > > > > > +              retype(g0, BRW_REGISTER_TYPE_F));
>> > > > > > > +
>> > > > > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > > > > BRW_HORIZONTAL_STRIDE_1);
>> > > > > > > +
>> > > > > > > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
>> > > > > > > +      EXPECT_FALSE(validate(p));
>> > > > > > > +   } else {
>> > > > > > > +      assert(devinfo.gen == 8);
>> > > > > > > +      EXPECT_TRUE(validate(p));
>> > > > > > > +   }
>> > > > > > > +   clear_instructions(p);
>> > > > > > > +
>> > > > > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
>> > > > > > > +              retype(g0, BRW_REGISTER_TYPE_F));
>> > > > > > > +
>> > > > > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > > > > BRW_HORIZONTAL_STRIDE_2);
>> > > > > > > +
>> > > > > > > +   EXPECT_TRUE(validate(p));
>> > > > > > > +   clear_instructions(p);
>> > > > > > > +}
>> > > > > > > +
>> > > > > > >  TEST_P(validation_test,
>> > > > > > > vector_immediate_destination_alignment)
>> > > > > > >  {
>> > > > > > >     static const struct {
>> > > > > > > -- 
>> > > > > > > 2.17.1
>> > > > > > > 
>> > > > > > > _______________________________________________
>> > > > > > > mesa-dev mailing list
>> > > > > > > [email protected]
>> > > > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

Reply via email to