----- Original Message -----
> On Fri, Feb 17, 2012 at 9:46 PM, Jose Fonseca <[email protected]>
> wrote:
> > Dave,
> >
> > Ideally there should be only one lp_build_mod() which will invoke
> > LLVMBuildSRem or LLVMBuildURem depending on the value of
> > bld->type.sign.  The point being that this allows the same code
> > generation logic to seemingly target any type without having to
> > worry too much which target it is targeting.
> 
> Yeah I agree with this for now, but I'm starting to think a lot of
> this stuff is redundant now that I've looked at what Tom has done.
> 
> The thing is TGSI doesn't have that many crazy options where you are
> going to be targeting instructions at the wrong type, and wrapping
> all the basic LLVM interfaces with an extra type layer seems to me
> like a waste of time in the long term.

So far llvmpipe's TGSI->LLVM IR translation has only been targeting floating-point 
SIMD instructions.

But the truth is that many simple fragment shaders can be partially done with 
8-bit and 16-bit SIMD integers, if values are represented as 8-bit and 16-bit 
unorms. The throughput for these will be much higher: not only can we squeeze 
more elements into each register, the instructions also take fewer cycles, and 
the hardware has several arithmetic units.

The point of those lp_build_xxx functions is to handle this transparently. See, 
e.g., how lp_build_mul handles fixed point. Currently this is only used for 
blending, but the hope is to eventually use it on TGSI translation of simple 
fragment shaders.

This may not be the case for desktop GPUs, but I've also heard that some 
low-power devices have shader engines with 8-bit unorm units.

But of course, not all opcodes can be done correctly in those representations, 
and URem/SRem might not be ones we care about.

> I'm happy for now to finish the integer support in the same style as
> the current code, but I think moving forward afterwards it might be
> worth investigating a more direct instruction emission scheme.

If you want to invoke LLVMBuildURem/LLVMBuildSRem directly from the TGSI 
translation I'm fine with it. We can always generalize later.

> Perhaps
> Tom can comment also from his experience.

BTW, Tom, I just now noticed that there are two action versions for add:

/* TGSI_OPCODE_ADD (CPU Only) */
static void
add_emit_cpu(
   const struct lp_build_tgsi_action * action,
   struct lp_build_tgsi_context * bld_base,
   struct lp_build_emit_data * emit_data)
{
   emit_data->output[emit_data->chan] = lp_build_add(&bld_base->base,
                                   emit_data->args[0], emit_data->args[1]);
}

/* TGSI_OPCODE_ADD */
static void
add_emit(
   const struct lp_build_tgsi_action * action,
   struct lp_build_tgsi_context * bld_base,
   struct lp_build_emit_data * emit_data)
{
   emit_data->output[emit_data->chan] = LLVMBuildFAdd(
                                bld_base->base.gallivm->builder,
                                emit_data->args[0], emit_data->args[1], "");
}

Why is this necessary? lp_build_add will already call LLVMBuildFAdd internally 
as appropriate.

Is this because some of the functions in lp_bld_arit.c will emit x86 
intrinsics? If so then a "no-x86-intrinsic" flag in the build context would 
achieve the same effect with less code duplication.

If possible I'd prefer a single version of these actions. If not, then I'd 
prefer to have them split: lp_build_action_cpu.c and lp_build_action_gpu.c.

Jose
_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev