fs: Introduce FS IR builder.

Francisco Jerez Fri, 05 Jun 2015 13:43:25 -0700

Matt Turner <[email protected]> writes:

> On Thu, Jun 4, 2015 at 9:04 AM, Francisco Jerez <[email protected]> wrote:
>> The purpose of this change is threefold: First, it improves the
>> modularity of the compiler back-end by separating the functionality
>> required to construct an i965 IR program from the rest of the visitor
>> god-object, what in turn will reduce the coupling between other
>> components and the visitor allowing a more modular design.  This patch
>> doesn't yet remove the equivalent functionality from the visitor
>> classes, that task will be undertaken by a separate series, as it
>> involves major back-end surgery.
>>
>> Second, it improves consistency between the scalar and vector
>> back-ends.  The FS and VEC4 builders can both be used to generate
>> scalar code with a compatible interface or they can be used to
>> generate natural vector width code -- 1 or 4 components respectively.
>>
>> Third, the approach to IR construction is somewhat different to what
>> the visitor classes currently do.  All parameters affecting code
>> generation (execution size, half control, point in the program where
>> new instructions are inserted, etc.) are encapsulated in a stand-alone
>> object rather than being quasi-global state (yes, anything defined in
>> one of the visitor classes is effectively global due to the tight
>> coupling with virtually everything else in the compiler back-end).
>> This object is lightweight and can be copied, mutated and passed
>> around, making helper IR-building functions more flexible because they
>> can now simply take a builder object as argument and will inherit its
>> IR generation properties in exactly the same way that a discrete
>> instruction would from the same builder object.
>>
>> The emit_typed_write() function from my image-load-store branch is an
>> example that illustrates the usefulness of the latter point: Due to
>> hardware limitations the function may have to split the untyped
>> surface message in 8-wide chunks.  That means that the several
>> functions called to help with the construction of the message payload
>> are themselves required to set the execution width and half control
>> correctly on the instructions they emit, and to allocate all registers
>> with half the default width.  With the previous approach this would
>> require the used helper functions to be aware of the parameters that
>> might differ from the default state and explicitly set the instruction
>> bits accordingly.  With the new approach they would get a modified
>> builder object as argument that would influence all instructions
>> emitted by the helper function as if it were the default state.
>>
>> Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD()
>> method.  It doesn't actually emit any instructions, they are simply
>> created and inserted into an exec_list which is returned for the
>> caller to emit at some location of the program.  This sort of two-step
>> emission becomes unnecessary with the builder interface because the
>> insertion point is one more of the code generation parameters which
>> are part of the builder object.  The caller can simply pass
>> VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the
>> location of the program where the effect of the constant load is
>> desired.  This two-step emission (which pervades the compiler back-end
>> and is in most cases redundant) goes away: E.g. ADD() now actually
>> adds two registers rather than just creating an ADD instruction in
>> memory, emit(ADD()) is no longer necessary.
>>
>> v2: Drop scalarizing VEC4 builder.
>> v3: Take a backend_shader as constructor argument.  Improve handling
>>     of debug annotations and execution control flags.
>> ---
>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 659 
>> +++++++++++++++++++++++++++++
>>  2 files changed, 660 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/brw_fs_builder.h
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 3f852cd..93f336e 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -42,6 +42,7 @@ i965_FILES = \
>>         brw_ff_gs.c \
>>         brw_ff_gs_emit.c \
>>         brw_ff_gs.h \
>> +       brw_fs_builder.h \
>>         brw_fs_channel_expressions.cpp \
>>         brw_fs_cmod_propagation.cpp \
>>         brw_fs_combine_constants.cpp \
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>> new file mode 100644
>> index 0000000..e528180
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>> @@ -0,0 +1,659 @@
>> +/* -*- c++ -*- */
>> +/*
>> + * Copyright © 2010-2015 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>> DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef BRW_FS_BUILDER_H
>> +#define BRW_FS_BUILDER_H
>> +
>> +#include "brw_ir_fs.h"
>> +#include "brw_shader.h"
>> +#include "brw_context.h"
>> +
>> +namespace brw {
>> +   /**
>> +    * Toolbox to assemble an FS IR program out of individual instructions.
>> +    *
>> +    * This object is meant to have an interface consistent with
>> +    * brw::vec4_builder.  They cannot be fully interchangeable because
>> +    * brw::fs_builder generates scalar code while brw::vec4_builder 
>> generates
>> +    * vector code.
>> +    */
>> +   class fs_builder {
>> +   public:
>> +      /** Type used in this IR to represent a source of an instruction. */
>> +      typedef fs_reg src_reg;
>> +
>> +      /** Type used in this IR to represent the destination of an 
>> instruction. */
>> +      typedef fs_reg dst_reg;
>> +
>> +      /** Type used in this IR to represent an instruction. */
>> +      typedef fs_inst instruction;
>> +
>> +      /**
>> +       * Construct an fs_builder appending instructions at the end of
>> +       * the instruction list of \p shader.  \p dispatch_width gives
>> +       * the native execution width of the program.
>> +       */
>> +      fs_builder(backend_shader *shader,
>> +                 unsigned dispatch_width) :
>> +         shader(shader), block(NULL),
>> +         cursor((exec_node *)&shader->instructions.tail),
>> +         _dispatch_width(dispatch_width),
>> +         _group(0),
>> +         force_writemask_all(false),
>> +         annotation()
>> +      {
>> +      }
>> +
>> +      /**
>> +       * Construct an fs_builder that inserts instructions before \p cursor 
>> in
>> +       * basic block \p block, inheriting other code generation parameters
>> +       * from this.
>> +       */
>> +      fs_builder
>> +      at(bblock_t *block, exec_node *cursor) const
>> +      {
>> +         fs_builder bld = *this;
>> +         bld.block = block;
>> +         bld.cursor = cursor;
>> +         return bld;
>> +      }
>> +
>> +      /**
>> +       * Construct a builder specifying the default SIMD width and group of
>> +       * channel enable signals, inheriting other code generation parameters
>> +       * from this.
>> +       *
>> +       * \p n gives the default SIMD width, \p i gives the slot group used 
>> for
>> +       * predication and control flow masking in multiples of \p n channels.
>> +       */
>> +      fs_builder
>> +      group(unsigned n, unsigned i) const
>> +      {
>> +         assert(n <= dispatch_width() &&
>> +                i < dispatch_width() / n);
>> +         fs_builder bld = *this;
>> +         bld._dispatch_width = n;
>> +         bld._group += i * n;
>> +         return bld;
>> +      }
>> +
>> +      /**
>> +       * Alias for group() with width equal to eight.
>> +       */
>> +      fs_builder
>> +      half(unsigned i) const
>> +      {
>> +         return group(8, i);
>> +      }
>> +
>> +      /**
>> +       * Construct a builder with per-channel control flow execution masking
>> +       * disabled if \p b is true.  If control flow execution masking is
>> +       * already disabled this has no effect.
>> +       */
>> +      fs_builder
>> +      exec_all(bool b = true) const
>> +      {
>> +         fs_builder bld = *this;
>> +         if (b)
>> +            bld.force_writemask_all = true;
>> +         return bld;
>> +      }
>> +
>> +      /**
>> +       * Construct a builder with the given debug annotation info.
>> +       */
>> +      fs_builder
>> +      annotate(const char *str, const void *ir = NULL) const
>> +      {
>> +         fs_builder bld = *this;
>> +         bld.annotation.str = str;
>> +         bld.annotation.ir = ir;
>> +         return bld;
>> +      }
>> +
>> +      /**
>> +       * Get the SIMD width in use.
>> +       */
>> +      unsigned
>> +      dispatch_width() const
>> +      {
>> +         return _dispatch_width;
>> +      }
>> +
>> +      /**
>> +       * Allocate a virtual register of natural vector size (one for this 
>> IR)
>> +       * and SIMD width.  \p n gives the amount of space to allocate in
>> +       * dispatch_width units (which is just enough space for one logical
>> +       * component in this IR).
>> +       */
>> +      dst_reg
>> +      vgrf(enum brw_reg_type type, unsigned n = 1) const
>> +      {
>> +         return dst_reg(GRF, shader->alloc.allocate(
>> +                           DIV_ROUND_UP(n * type_sz(type) * 
>> dispatch_width(),
>> +                                        REG_SIZE)),
>> +                        type, dispatch_width());
>> +      }
>> +
>> +      /**
>> +       * Create a null register of floating type.
>> +       */
>> +      dst_reg
>> +      null_reg_f() const
>> +      {
>> +         return dst_reg(retype(brw_null_vec(dispatch_width()),
>> +                               BRW_REGISTER_TYPE_F));
>> +      }
>> +
>> +      /**
>> +       * Create a null register of signed integer type.
>> +       */
>> +      dst_reg
>> +      null_reg_d() const
>> +      {
>> +         return dst_reg(retype(brw_null_vec(dispatch_width()),
>> +                               BRW_REGISTER_TYPE_D));
>> +      }
>> +
>> +      /**
>> +       * Create a null register of unsigned integer type.
>> +       */
>> +      dst_reg
>> +      null_reg_ud() const
>> +      {
>> +         return dst_reg(retype(brw_null_vec(dispatch_width()),
>> +                               BRW_REGISTER_TYPE_UD));
>> +      }
>> +
>> +      /**
>> +       * Get the mask of SIMD channels enabled by dispatch and not yet
>> +       * disabled by discard.
>> +       */
>> +      src_reg
>> +      sample_mask_reg() const
>> +      {
>> +         const bool uses_kill =
>> +            (shader->stage == MESA_SHADER_FRAGMENT &&
>> +             ((brw_wm_prog_data *)shader->stage_prog_data)->uses_kill);
>> +         return (shader->stage != MESA_SHADER_FRAGMENT ? src_reg(0xffff) :
>> +                 uses_kill ? brw_flag_reg(0, 1) :
>> +                 retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD));
>> +      }
>> +
>> +      /**
>> +       * Insert an instruction into the program.
>> +       */
>> +      instruction *
>> +      emit(const instruction &inst) const
>> +      {
>> +         return emit(new(shader->mem_ctx) instruction(inst));
>> +      }
>> +
>> +      /**
>> +       * Create and insert a nullary control instruction into the program.
>> +       */
>> +      instruction *
>> +      emit(enum opcode opcode) const
>> +      {
>> +         return emit(instruction(opcode, dispatch_width()));
>> +      }
>> +
>> +      /**
>> +       * Create and insert a nullary instruction into the program.
>> +       */
>> +      instruction *
>> +      emit(enum opcode opcode, const dst_reg &dst) const
>> +      {
>> +         return emit(instruction(opcode, dst));
>> +      }
>> +
>> +      /**
>> +       * Create and insert a unary instruction into the program.
>> +       */
>> +      instruction *
>> +      emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0) 
>> const
>> +      {
>> +         switch (opcode) {
>> +         case SHADER_OPCODE_RCP:
>> +         case SHADER_OPCODE_RSQ:
>> +         case SHADER_OPCODE_SQRT:
>> +         case SHADER_OPCODE_EXP2:
>> +         case SHADER_OPCODE_LOG2:
>> +         case SHADER_OPCODE_SIN:
>> +         case SHADER_OPCODE_COS:
>> +            return fix_math_instruction(
>> +               emit(instruction(opcode, dst.width, dst,
>> +                                fix_math_operand(src0))));
>> +
>> +         default:
>> +            return emit(instruction(opcode, dst.width, dst, src0));
>> +         }
>> +      }
>> +
>> +      /**
>> +       * Create and insert a binary instruction into the program.
>> +       */
>> +      instruction *
>> +      emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
>> +           const src_reg &src1) const
>> +      {
>> +         switch (opcode) {
>> +         case SHADER_OPCODE_POW:
>> +         case SHADER_OPCODE_INT_QUOTIENT:
>> +         case SHADER_OPCODE_INT_REMAINDER:
>> +            return fix_math_instruction(
>> +               emit(instruction(opcode, dst.width, dst,
>> +                                fix_math_operand(src0),
>> +                                fix_math_operand(src1))));
>> +
>> +         default:
>> +            return emit(instruction(opcode, dst.width, dst, src0, src1));
>> +
>> +         }
>> +      }
>> +
>> +      /**
>> +       * Create and insert a ternary instruction into the program.
>> +       */
>> +      instruction *
>> +      emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
>> +           const src_reg &src1, const src_reg &src2) const
>> +      {
>> +         switch (opcode) {
>> +         case BRW_OPCODE_BFE:
>> +         case BRW_OPCODE_BFI2:
>> +         case BRW_OPCODE_MAD:
>> +         case BRW_OPCODE_LRP:
>> +            return emit(instruction(opcode, dst.width, dst,
>> +                                    fix_3src_operand(src0),
>> +                                    fix_3src_operand(src1),
>> +                                    fix_3src_operand(src2)));
>> +
>> +         default:
>> +            return emit(instruction(opcode, dst.width, dst, src0, src1, 
>> src2));
>> +         }
>> +      }
>> +
>> +      /**
>> +       * Insert a preallocated instruction into the program.
>> +       */
>> +      instruction *
>> +      emit(instruction *inst) const
>> +      {
>> +         assert(inst->exec_size == dispatch_width() ||
>> +                force_writemask_all);
>> +         assert(_group == 0 || _group == 8);
>> +
>> +         inst->force_sechalf = (_group == 8);
>> +         inst->force_writemask_all = force_writemask_all;
>> +         inst->annotation = annotation.str;
>> +         inst->ir = annotation.ir;
>> +
>> +         if (block)
>> +            static_cast<instruction *>(cursor)->insert_before(block, inst);
>> +         else
>> +            cursor->insert_before(inst);
>> +
>> +         return inst;
>> +      }
>> +
>> +      /**
>> +       * Select \p src0 if the comparison of both sources with the given
>> +       * conditional mod evaluates to true, otherwise select \p src1.
>> +       *
>> +       * Generally useful to get the minimum or maximum of two values.
>> +       */
>> +      void
>> +      emit_minmax(const dst_reg &dst, const src_reg &src0,
>> +                  const src_reg &src1, brw_conditional_mod mod) const
>
> This is unused, after Jason's commit:
>
> commit 78644ffc4d341deb431145108f0b2d377e59b61e
> Author: Jason Ekstrand <[email protected]>
> Date:   Wed May 20 10:35:34 2015 -0700
>
>     i965/fs: Remove the ir_visitor code
>
>     Now that everything is running through NIR, this is all dead.
>
>     Reviewed-by: Matt Turner <[email protected]>
>     Reviewed-by: Kenneth Graunke <[email protected]>
>


I was planning to use it in my ARB_shader_image_load_store series.  It
may also make sense to fix brw_fs_nir.cpp to use it, as it would
probably simplify the code slightly.  I can make the changes as part of
PATCH 32 if you like.

>> +      {
>> +         if (shader->devinfo->gen >= 6) {
>> +            set_condmod(mod, SEL(dst, fix_unsigned_negate(src0),
>> +                                 fix_unsigned_negate(src1)));
>> +         } else {
>> +            CMP(null_reg_d(), src0, src1, mod);
>> +            set_predicate(BRW_PREDICATE_NORMAL,
>> +                          SEL(dst, src0, src1));
>> +         }
>> +      }
>> +
>> +      /**
>> +       * Copy any live channel from \p src to the first channel of \p dst.
>> +       */
>> +      void
>> +      emit_uniformize(const dst_reg &dst, const src_reg &src) const
>> +      {
>> +         const fs_builder ubld = exec_all();
>> +         const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
>> +
>> +         ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, component(chan_index, 
>> 0));
>> +         ubld.emit(SHADER_OPCODE_BROADCAST, component(dst, 0),
>> +                   src, component(chan_index, 0));
>> +      }
>> +
>> +      /**
>> +       * Assorted arithmetic ops.
>> +       * @{
>> +       */
>> +#define ALU1(op)                                        \
>> +      instruction *                                     \
>> +      op(const dst_reg &dst, const src_reg &src0) const \
>> +      {                                                 \
>> +         return emit(BRW_OPCODE_##op, dst, src0);       \
>> +      }
>> +
>> +#define ALU2(op)                                                        \
>> +      instruction *                                                     \
>> +      op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) 
>> const \
>> +      {                                                                 \
>> +         return emit(BRW_OPCODE_##op, dst, src0, src1);                 \
>> +      }
>> +
>> +#define ALU2_ACC(op)                                                    \
>> +      instruction *                                                     \
>> +      op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) 
>> const \
>> +      {                                                                 \
>> +         instruction *inst = emit(BRW_OPCODE_##op, dst, src0, src1);    \
>> +         inst->writes_accumulator = true;                               \
>> +         return inst;                                                   \
>> +      }
>> +
>> +#define ALU3(op)                                                        \
>> +      instruction *                                                     \
>> +      op(const dst_reg &dst, const src_reg &src0, const src_reg &src1,  \
>> +         const src_reg &src2) const                                     \
>> +      {                                                                 \
>> +         return emit(BRW_OPCODE_##op, dst, src0, src1, src2);           \
>> +      }
>> +
>> +      ALU2(ADD)
>> +      ALU2_ACC(ADDC)
>> +      ALU2(AND)
>> +      ALU2(ASR)
>> +      ALU2(AVG)
>> +      ALU3(BFE)
>> +      ALU2(BFI1)
>> +      ALU3(BFI2)
>> +      ALU1(BFREV)
>> +      ALU1(CBIT)
>> +      ALU2(CMPN)
>> +      ALU3(CSEL)
>> +      ALU2(DP2)
>> +      ALU2(DP3)
>> +      ALU2(DP4)
>> +      ALU2(DPH)
>> +      ALU1(F16TO32)
>> +      ALU1(F32TO16)
>> +      ALU1(FBH)
>> +      ALU1(FBL)
>> +      ALU1(FRC)
>> +      ALU2(LINE)
>> +      ALU1(LZD)
>> +      ALU2(MAC)
>> +      ALU2_ACC(MACH)
>> +      ALU3(MAD)
>> +      ALU1(MOV)
>> +      ALU2(MUL)
>> +      ALU1(NOT)
>> +      ALU2(OR)
>> +      ALU2(PLN)
>> +      ALU1(RNDD)
>> +      ALU1(RNDE)
>> +      ALU1(RNDU)
>> +      ALU1(RNDZ)
>> +      ALU2(SAD2)
>> +      ALU2_ACC(SADA2)
>> +      ALU2(SEL)
>> +      ALU2(SHL)
>> +      ALU2(SHR)
>> +      ALU2_ACC(SUBB)
>> +      ALU2(XOR)
>> +
>> +#undef ALU3
>> +#undef ALU2_ACC
>> +#undef ALU2
>> +#undef ALU1
>> +      /** @} */
>> +
>> +      /**
>> +       * CMP: Sets the low bit of the destination channels with the result
>> +       * of the comparison, while the upper bits are undefined, and updates
>> +       * the flag register with the packed 16 bits of the result.
>> +       */
>> +      instruction *
>> +      CMP(const dst_reg &dst, const src_reg &src0, const src_reg &src1,
>> +          brw_conditional_mod condition) const
>> +      {
>> +         /* Take the instruction:
>> +          *
>> +          * CMP null<d> src0<f> src1<f>
>> +          *
>> +          * Original gen4 does type conversion to the destination type
>> +          * before comparison, producing garbage results for floating
>> +          * point comparisons.
>> +          *
>> +          * The destination type doesn't matter on newer generations,
>> +          * so we set the type to match src0 so we can compact the
>> +          * instruction.
>> +          */
>> +         return set_condmod(condition,
>> +                            emit(BRW_OPCODE_CMP, retype(dst, src0.type),
>> +                                 fix_unsigned_negate(src0),
>> +                                 fix_unsigned_negate(src1)));
>> +      }
>> +
>> +      /**
>> +       * Gen4 predicated IF.
>> +       */
>> +      instruction *
>> +      IF(brw_predicate predicate) const
>> +      {
>> +         instruction *inst = emit(BRW_OPCODE_IF);
>> +         return set_predicate(predicate, inst);
>> +      }
>> +
>> +      /**
>> +       * Gen6 IF with embedded comparison.
>> +       */
>> +      instruction *
>> +      IF(const src_reg &src0, const src_reg &src1,
>> +         brw_conditional_mod condition) const
>> +      {
>> +         assert(shader->devinfo->gen == 6);
>> +         return set_condmod(condition,
>> +                            emit(BRW_OPCODE_IF,
>> +                                 null_reg_d(),
>> +                                 fix_unsigned_negate(src0),
>> +                                 fix_unsigned_negate(src1)));
>
> I don't see that we were calling resolve_ud_negate() on
> fs_visitor::emit_if_gen6.
>
Right, but shouldn't any instruction comparing unsigned values which are
potentially negated call resolve_ud_negate()?

> In fact, we've removed the function entirely (a known regression in
> the switch to NIR). I'd drop this for now. It's unused as well.
>
If there are still plans to restore this functionality, I would rather
leave it alone rather than remove it now for somebody else to have to
rewrite it later on, because it's less work and doesn't seem to hurt.

>> +      }
>> +
>> +      /**
>> +       * Emit a linear interpolation instruction.
>> +       */
>> +      instruction *
>> +      LRP(const dst_reg &dst, const src_reg &x, const src_reg &y,
>> +          const src_reg &a) const
>> +      {
>> +         if (shader->devinfo->gen >= 6) {
>> +            /* The LRP instruction actually does op1 * op0 + op2 * (1 - 
>> op0), so
>> +             * we need to reorder the operands.
>> +             */
>> +            return emit(BRW_OPCODE_LRP, dst, a, y, x);
>> +
>> +         } else {
>> +            /* We can't use the LRP instruction.  Emit x*(1-a) + y*a. */
>> +            const dst_reg y_times_a = vgrf(dst.type);
>> +            const dst_reg one_minus_a = vgrf(dst.type);
>> +            const dst_reg x_times_one_minus_a = vgrf(dst.type);
>> +
>> +            MUL(y_times_a, y, a);
>> +            ADD(one_minus_a, negate(a), src_reg(1.0f));
>> +            MUL(x_times_one_minus_a, x, src_reg(one_minus_a));
>> +            return ADD(dst, src_reg(x_times_one_minus_a), 
>> src_reg(y_times_a));
>> +         }
>> +      }
>> +
>> +      /**
>> +       * Collect a number of registers in a contiguous range of registers.
>> +       */
>> +      instruction *
>> +      LOAD_PAYLOAD(const dst_reg &dst, const src_reg *src,
>> +                   unsigned sources, unsigned header_size) const
>> +      {
>> +         assert(dst.width % 8 == 0);
>> +         instruction *inst = emit(instruction(SHADER_OPCODE_LOAD_PAYLOAD,
>> +                                              dst.width, dst, src, 
>> sources));
>> +         inst->header_size = header_size;
>> +
>> +         for (unsigned i = 0; i < header_size; i++)
>> +            assert(src[i].file != GRF ||
>> +                   src[i].width * type_sz(src[i].type) == 32);
>> +         inst->regs_written = header_size;
>> +
>> +         for (unsigned i = header_size; i < sources; ++i)
>> +            assert(src[i].file != GRF ||
>> +                   src[i].width == dst.width);
>> +         inst->regs_written += (sources - header_size) * (dst.width / 8);
>> +
>> +         return inst;
>> +      }
>> +
>> +      backend_shader *shader;
>> +
>> +   private:
>> +      /**
>> +       * Workaround for negation of UD registers.  See comment in
>> +       * fs_generator::generate_code() for more details.
>> +       */
>> +      src_reg
>> +      fix_unsigned_negate(const src_reg &src) const
>> +      {
>> +         if (src.type == BRW_REGISTER_TYPE_UD &&
>> +             src.negate) {
>> +            dst_reg temp = vgrf(BRW_REGISTER_TYPE_UD);
>> +            MOV(temp, src);
>> +            return src_reg(temp);
>> +         } else {
>> +            return src;
>> +         }
>> +      }
>> +
>> +      /**
>> +       * Workaround for source register modes not supported by the ternary
>> +       * instruction encoding.
>> +       */
>> +      src_reg
>> +      fix_3src_operand(const src_reg &src) const
>> +      {
>> +         if (src.file == GRF || src.file == UNIFORM || src.stride > 1) {
>> +            return src;
>> +         } else {
>> +            dst_reg expanded = vgrf(src.type);
>> +            MOV(expanded, src);
>> +            return expanded;
>> +         }
>> +      }
>> +
>> +      /**
>> +       * Workaround for source register modes not supported by the math
>> +       * instruction.
>> +       */
>> +      src_reg
>> +      fix_math_operand(const src_reg &src) const
>> +      {
>> +         /* Can't do hstride == 0 args on gen6 math, so expand it out. We
>> +          * might be able to do better by doing execsize = 1 math and then
>> +          * expanding that result out, but we would need to be careful with
>> +          * masking.
>> +          *
>> +          * Gen6 hardware ignores source modifiers (negate and abs) on math
>> +          * instructions, so we also move to a temp to set those up.
>> +          *
>> +          * Gen7 relaxes most of the above restrictions, but still can't 
>> use IMM
>> +          * operands to math
>> +          */
>> +         if ((shader->devinfo->gen == 6 &&
>> +              (src.file == IMM || src.file == UNIFORM ||
>> +               src.abs || src.negate)) ||
>> +             (shader->devinfo->gen == 7 && src.file == IMM)) {
>> +            const dst_reg tmp = vgrf(src.type);
>> +            MOV(tmp, src);
>> +            return tmp;
>> +         } else {
>> +            return src;
>> +         }
>> +      }
>> +
>> +      /**
>> +       * Workaround other weirdness of the math instruction.
>> +       */
>> +      instruction *
>> +      fix_math_instruction(instruction *instr) const
>
> Is there a reason this is 'instr' instead of 'inst'? If not, let's
> rename it 'inst'.
>
Nope, I'll fix that.

Thanks.

>> +      {
>> +         if (shader->devinfo->gen < 6) {
>> +            instr->base_mrf = 2;
>> +            instr->mlen = instr->sources * dispatch_width() / 8;
>> +
>> +            if (instr->sources > 1) {
>> +               /* From the Ironlake PRM, Volume 4, Part 1, Section 6.1.13
>> +                * "Message Payload":
>> +                *
>> +                * "Operand0[7].  For the INT DIV functions, this operand is 
>> the
>> +                *  denominator."
>> +                *  ...
>> +                * "Operand1[7].  For the INT DIV functions, this operand is 
>> the
>> +                *  numerator."
>> +                */
>> +               const bool is_int_div = instr->opcode != SHADER_OPCODE_POW;
>> +               const fs_reg src0 = is_int_div ? instr->src[1] : 
>> instr->src[0];
>> +               const fs_reg src1 = is_int_div ? instr->src[0] : 
>> instr->src[1];
>> +
>> +               instr->resize_sources(1);
>> +               instr->src[0] = src0;
>> +
>> +               at(block, instr).MOV(fs_reg(MRF, instr->base_mrf + 1, 
>> src1.type,
>> +                                           dispatch_width()), src1);
>> +            }
>> +         }
>> +
>> +         return instr;
>> +      }

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/38] i965/fs: Introduce FS IR builder.

Reply via email to