----- Original Message ----- > On Mon, Feb 20, 2012 at 01:50:43PM -0800, Jose Fonseca wrote: > > > > > > ----- Original Message ----- > > > > > > > > > ----- Original Message ----- > > > > On Sat, Feb 18, 2012 at 4:20 AM, Jose Fonseca > > > > <[email protected]> > > > > wrote: > > > > > ----- Original Message ----- > > > > >> On Fri, Feb 17, 2012 at 9:46 PM, Jose Fonseca > > > > >> <[email protected]> > > > > >> wrote: > > > > >> > Dave, > > > > >> > > > > > >> > Ideally there should be only one lp_build_mod() which will > > > > >> > invoke > > > > >> > LLVMBuildSRem or LLVMBuildURem depending on the value of > > > > >> > bld->type.sign. The point being that this allows the same > > > > >> > code > > > > >> > generation logic to seemingly target any type without > > > > >> > having > > > > >> > to > > > > >> > worry too much which target it is targeting. > > > > >> > > > > >> Yeah I agree with this for now, but I'm starting to think a > > > > >> lot > > > > >> of > > > > >> this stuff is redunant once I looked at what Tom has done. > > > > >> > > > > >> The thing is TGSI doesn't have that many crazy options where > > > > >> you > > > > >> are > > > > >> going to be targetting instructions at the wrong type, and > > > > >> wrapping > > > > >> all the basic llvm interfaces with an extra type layer seems > > > > >> to > > > > >> me > > > > >> long term like a waste of time. > > > > > > > > > > So far llvmpipe's TGSI->LLVM IR has only been targetting > > > > > floating > > > > > point SIMD instructions. > > > > > > > > > > But truth is that many simple fragment shaders can be > > > > > partially > > > > > done with 8bit and 16bit SIMD integers, if values are > > > > > represented > > > > > in 8bit unorm and 16 bit unorms. The throughput for these > > > > > will be > > > > > much higher, as not only we can squeeze more elements, they > > > > > take > > > > > less cycles, and the hardware has several arithmetic units. > > > > > > > > > > The point of those lp_build_xxx functions is to handle this > > > > > transparently. See, e.g., how lp_build_mul handles fixed > > > > > point. > > > > > Currently this is only used for blending, but the hope is to > > > > > eventually use it on TGSI translation of simple fragment > > > > > shaders. > > > > > > > > > > Maybe not the case for the desktop GPUs, but I also heard > > > > > that > > > > > some > > > > > low powered devices have shader engines w/ 8bit unorms. > > > > > > > > > > But of course, not all opcodes can be done correctly: and > > > > > URem/SRem > > > > > might not be one we care. > > > > > > > > > >> I'm happy for now to finish the integer support in the same > > > > >> style > > > > >> as > > > > >> the current code, but I think moving forward afterwards it > > > > >> might > > > > >> be > > > > >> worth investigating a more direct instruction emission > > > > >> scheme. > > > > > > > > > > If you wanna invoke LLVMBuildURem/LLVMBuildSRem directly from > > > > > tgsi > > > > > translation I'm fine with it. We can always generalize > > > > > > > > > >> Perhaps > > > > >> Tom can comment also from his experience. > > > > > > > > > > BTW, Tom, I just now noticed that there are two action > > > > > versions > > > > > for > > > > > add: > > > > > > > > > > /* TGSI_OPCODE_ADD (CPU Only) */ > > > > > static void > > > > > add_emit_cpu( > > > > > const struct lp_build_tgsi_action * action, > > > > > struct lp_build_tgsi_context * bld_base, > > > > > struct lp_build_emit_data * emit_data) > > > > > { > > > > > emit_data->output[emit_data->chan] = > > > > > lp_build_add(&bld_base->base, > > > > > emit_data->args[0], > > > > > emit_data->args[1]); > > > > > } > > > > > > > > > > /* TGSI_OPCODE_ADD */ > > > > > static void > > > > > add_emit( > > > > > const struct lp_build_tgsi_action * action, > > > > > struct lp_build_tgsi_context * bld_base, > > > > > struct lp_build_emit_data * emit_data) > > > > > { > > > > > emit_data->output[emit_data->chan] = LLVMBuildFAdd( > > > > > bld_base->base.gallivm->builder, > > > > > emit_data->args[0], > > > > > emit_data->args[1], ""); > > > > > } > > > > > > > > > > Why is this necessary? lp_build_add will already call > > > > > LLVMBuildFAdd > > > > > internally as appropriate. > > > > > > > > > > Is this because some of the functions in lp_bld_arit.c will > > > > > emit > > > > > x86 intrinsics? If so then a "no-x86-intrinsic" flag in the > > > > > build > > > > > context would achieve the same effect with less code > > > > > duplication. > > > > > > > > > > If possible I'd prefer a single version of these actions. If > > > > > not, > > > > > then I'd prefer have them split: lp_build_action_cpu.c and > > > > > lp_build_action_gpu. > > > > > > > > Yes, this is why a split them up. I can add that flag and > > > > merge > > > > the > > > > actions together. > > > > > > That would be nice. Thanks. > > > > Tom, actually I've been looking more at the code, thinking about > > this, and I'm not so sure what's best anymore. > > > > I'd appreciate your honest answer: do you think the stuff in > > lp_bld_arit.[ch] of any use for GPUs in general (or AMD's in > > particular), or is it just an hinderance? > > > > As I said before, for CPUs, this abstraction is useful, to allow to > > convert TGSI (and other fixed function state) -> fixed point SIMD > > instructions, which yield the highest throughput on CPUs. Because > > LLVM native types are not expressive enough for fixed function, > > etc. > > > > But if this is useless for GPUs (i.e, if LLVM's native types are > > sufficient), then we can make this abstraction a CPU only thing. > > > > I don't think the lp_bld_arit.c functions are really useful for GPUs, > and I don't rely on any of them in the R600 backend. Also, I was > looking > through those functions again and the problem is more than just x86 > intrinsics. Some of them assume vector types, which I don't use at > all.
Does that mean that the R600 generates/consumes only scalar expressions? > So, maybe it is best to keep them separate. Yes, it seems so. Does anybody else working or planning in writting TGSI -> LLVM translation pass has other thoughts? If not, I'll eventually split the helpers in two kinds: - generic helpers that operate directly with LLVMTypeRef and LLVMValueRef - native SIMD abstraction, that operates with lp_build_type, for fixed point, etc. For the record, this is what you seem to use so far: $ git grep '#include.*gallivm' src/gallium/drivers/r600/ src/gallium/drivers/r600/r600_llvm.c:#include "gallivm/lp_bld_const.h" src/gallium/drivers/r600/r600_llvm.c:#include "gallivm/lp_bld_intr.h" src/gallium/drivers/r600/r600_llvm.c:#include "gallivm/lp_bld_gather.h" src/gallium/drivers/r600/r600_llvm.h:#include "gallivm/lp_bld_tgsi.h" Jose _______________________________________________ mesa-dev mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/mesa-dev
