[Mesa-dev] [PATCH] gallivm: fix lp_build_compare_ext

2015-07-03 Thread sroland
From: Roland Scheidegger The expansion should always be to the same width as the input arguments no matter what, since these functions should work with any bit width of the arguments (the sext is a no-op on any sane simd architecture). Thus, fix the caller expecting differently. This fixes https
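
As a scalar illustration of the point above (not gallivm code; the function names are hypothetical), a lane-wise compare produces a 1-bit result that is sign-extended to the operand width. The mask is then all ones or all zeros in exactly as many bits as the inputs, whatever that width is, and can feed a bitwise select directly:

```c
#include <stdint.h>

/* One 16-bit SIMD lane: sign-extend the 1-bit compare result to 16 bits,
 * giving 0x0000 or 0xffff. */
static int16_t cmp_gt_ext16(int16_t a, int16_t b)
{
   return (int16_t)-(a > b); /* 1 -> -1 (all ones), 0 -> 0 */
}

/* The same for a 32-bit lane: the mask width follows the argument width. */
static int32_t cmp_gt_ext32(int32_t a, int32_t b)
{
   return -(int32_t)(a > b);
}

/* Bitwise select that relies on the mask having the operands' width. */
static int32_t select32(int32_t mask, int32_t x, int32_t y)
{
   return (x & mask) | (y & ~mask);
}
```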

[Mesa-dev] [PATCH 2/4] radeon/r200: mark state atoms as dirty after blits

2015-07-11 Thread sroland
From: Roland Scheidegger Blit submits lots of packets which are usually handled by state atoms, so these must be dirtied. Not sure if this fixes anything, but it was a concern raised by bug 51658 (with this all issues there seen as actual bugs should be fixed, with the exception of the patch to u

[Mesa-dev] [PATCH 1/4] r200: fix fbo rendering by disabling optimized texture format chooser

2015-07-11 Thread sroland
From: Roland Scheidegger It is rather unfortunate that we don't know if a texture is going to be used as a rt later, and we lack the means to do something about a format chosen which we can't render to directly, so disable this and always choose a renderable format for rgba8 textures. This addresses

[Mesa-dev] [PATCH 3/4] radeon: fix some potential big endian issues

2015-07-11 Thread sroland
From: Roland Scheidegger The formats chosen (both by the texture format chooser and by fbo storage allocation) are different for big endian not just for rgba8 but also lower bit width formats (why I don't actually know). Even the function to test for renderable formats used different formats, however the ac

[Mesa-dev] [PATCH 4/4] r200: fix some potential big endian issues

2015-07-11 Thread sroland
From: Roland Scheidegger The formats chosen (both by the texture format chooser and by fbo storage allocation) are different for big endian not just for rgba8 but also lower bit width formats (why I don't actually know). Even the function to test for renderable formats used different formats, however the ac

[Mesa-dev] [PATCH] mesa: fix up some texture error checks

2015-07-16 Thread sroland
From: Roland Scheidegger In particular, we were incorrectly accepting s3tc (and lots of others) for CompressedTexSubImage3D (but not CompressedTexImage3D) calls with 3d targets. At this time, the only allowed formats for these calls are the bptc ones, since none of the specific extensions allow i

[Mesa-dev] [PATCH 4/4] gallivm: fix tex offsets with mirror repeat linear

2015-10-22 Thread sroland
From: Roland Scheidegger Can't see why anyone would ever want to use this, but it was clearly broken. This fixes the piglit texwrap offset test using this combination. --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git

[Mesa-dev] [PATCH 2/4] softpipe: fix using non-zero layer in non-array view from array resource

2015-10-22 Thread sroland
From: Roland Scheidegger For vertex/geometry shader sampling, this is the same as for llvmpipe - just use the original resource target. For fragment shader sampling though (which does not use first-layer based mip offsets) adjust the sampling code to use first_layer in the non-array cases. While

[Mesa-dev] [PATCH 1/4] llvmpipe: fix using non-zero layer in non-array view from array resource

2015-10-22 Thread sroland
From: Roland Scheidegger Just need to use resource target not view target when calculating first-layer based mip offsets. (This is a gl specific problem since d3d10 does not distinguish between non-array and array resources at either the resource or the view level, only at the shader level.) Fixes

[Mesa-dev] [PATCH 3/4] gallivm: fix sampling with texture offsets in SoA path

2015-10-22 Thread sroland
From: Roland Scheidegger When using nearest filtering and clamp / clamp to edge wrapping results could be wrong for negative offsets. Fix this by adding the offset before doing the conversion to int coords (could also use floor instead of trunc int conversion but probably more complex on "typical

[Mesa-dev] [PATCH] gallivm: disable f16c when not using AVX

2015-10-23 Thread sroland
From: Roland Scheidegger f16c intrinsic can only be emitted when AVX is used. So when we disable AVX due to forcing 128bit vectors we must not use this intrinsic (depending on llvm version, this worked previously because llvm used AVX even when we didn't tell it to, however I've seen this fail wi

[Mesa-dev] [PATCH 1/2] radeon: fix bgrx8/xrgb8 blits

2015-11-12 Thread sroland
From: Roland Scheidegger Since d21320f6258b2e1780a15c1ca718963d8a15ca18 the same txformat table entries are used for "normal" texturing as well as for blits. However, I forgot to put in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing path can't hit them because the radeo

[Mesa-dev] [PATCH 2/2] r200: fix bgrx8/xrgb8 blits

2015-11-12 Thread sroland
From: Roland Scheidegger Since 779cabfc7d022de8b7b9bc7fdac0caffa8646c51 the same txformat table entries are used for "normal" texturing as well as for blits. However, I forgot to put in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing path can't hit them because the radeo

[Mesa-dev] [PATCH] gallium/docs: fix docs wrt ARL/ARR/FLR

2015-01-29 Thread sroland
From: Roland Scheidegger since the address reg holds integer values, ARL/ARR do an implicit float-to-int conversion, so clarify that. Thus it is also incorrect to say that FLR really does the same as ARL. --- src/gallium/docs/source/tgsi.rst | 18 -- 1 file changed, 8 insertions(

[Mesa-dev] [PATCH] mesa: don't enable NV_fragment_program_option with swrast

2015-02-14 Thread sroland
From: Roland Scheidegger Since dropping some NV_fragment_program opcodes (commits 868f95f1da74cf6dd7468cba1b56664aad585ccb, a3688d686f147f4252d19b298ae26d4ac72c2e08) we can no longer parse all opcodes necessary for this extension, leading to bugs (https://bugs.freedesktop.org/show_bug.cgi?id=869

[Mesa-dev] [PATCH 2/2] gallium/auxiliary: optimize rgb9e5 helper some more

2015-08-09 Thread sroland
From: Roland Scheidegger I used this as some testing ground for investigating some compiler bits initially (e.g. lrint calls etc.), figured I could do much better in the end just for fun... This is mathematically equivalent, but uses some tricks to avoid doubles and also replaces some float math

[Mesa-dev] [PATCH 1/2] gallium/auxiliary: optimize rgb9e5 helper a bit

2015-08-09 Thread sroland
From: Roland Scheidegger This code (lifted straight from the extension) was doing things the most inefficient way you could think of. This drops some of the more expensive float operations, in particular - int-cast floors (pointless, values always positive) - 2 raised to (signed) integers (replac
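
For reference, the conversion being optimized is the shared-exponent packing from GL_EXT_texture_shared_exponent. The sketch below is not the patch's code. It assumes the extension's constants (5-bit exponent with bias 15, three 9-bit mantissas) and shows, in plain C, floors done as int casts because the values are never negative, plus, as an assumption about what replaces the pow() calls, powers of two assembled directly from IEEE-754 exponent bits. All helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define RGB9E5_EXP_BIAS      15
#define RGB9E5_MANTISSA_BITS 9
#define MAX_RGB9E5           65408.0f /* (511 / 512) * 2^16 */

/* 2^n built from IEEE-754 bits instead of pow(); valid for -126 <= n <= 127. */
static float pow2i(int n)
{
   uint32_t bits = (uint32_t)(n + 127) << 23;
   float f;
   memcpy(&f, &bits, sizeof f);
   return f;
}

/* floor(log2(x)) read from the exponent field, for x >= 0.
 * Zero and denormals give -127, which the caller clamps anyway. */
static int floor_log2(float x)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof bits);
   return (int)((bits >> 23) & 0xff) - 127;
}

/* Clamp to [0, MAX_RGB9E5]; NaN fails x > 0 and becomes 0. */
static float clamp_rgb9e5(float x)
{
   return x > 0.0f ? (x < MAX_RGB9E5 ? x : MAX_RGB9E5) : 0.0f;
}

static uint32_t float3_to_rgb9e5_sketch(const float rgb[3])
{
   float rc = clamp_rgb9e5(rgb[0]);
   float gc = clamp_rgb9e5(rgb[1]);
   float bc = clamp_rgb9e5(rgb[2]);
   float maxrgb = rc > gc ? rc : gc;
   maxrgb = maxrgb > bc ? maxrgb : bc;

   int e = floor_log2(maxrgb);
   if (e < -RGB9E5_EXP_BIAS - 1)
      e = -RGB9E5_EXP_BIAS - 1;
   int exp_shared = e + 1 + RGB9E5_EXP_BIAS; /* 0..31 */

   float denom = pow2i(exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS);
   /* Values are non-negative, so a plain int cast acts as floor. */
   int maxm = (int)(maxrgb / denom + 0.5f);
   if (maxm == (1 << RGB9E5_MANTISSA_BITS)) {
      denom *= 2.0f;
      exp_shared += 1;
   }

   uint32_t rm = (uint32_t)(rc / denom + 0.5f);
   uint32_t gm = (uint32_t)(gc / denom + 0.5f);
   uint32_t bm = (uint32_t)(bc / denom + 0.5f);
   /* Layout: exponent in bits 27..31, then b, g, r mantissas downwards. */
   return ((uint32_t)exp_shared << 27) | (bm << 18) | (gm << 9) | rm;
}
```

For example, (1, 1, 1) packs to exponent 16 with all three mantissas at 256, since 256 * 2^(16 - 15 - 9) = 1.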

[Mesa-dev] [PATCH] draw: initialize shader inputs

2016-10-11 Thread sroland
From: Roland Scheidegger This should make the code more robust if a shader tries to use inputs which aren't defined by the vertex element layout (which usually shouldn't happen). No piglit change. --- src/gallium/auxiliary/draw/draw_llvm.c | 7 +++ 1 file changed, 7 insertions(+) diff --gi

[Mesa-dev] [PATCH] draw: improve vertex fetch

2016-10-11 Thread sroland
From: Roland Scheidegger The per-element fetch has quite a few calculations which are constant; these can be moved outside both the per-element and the main shader loop (llvm can mostly figure out on its own that they're constant, however this can have a significant compile time cost). Similarly, i

[Mesa-dev] [PATCH] draw: improved handling of undefined inputs

2016-10-13 Thread sroland
From: Roland Scheidegger Previous attempts to zero initialize all inputs were not really optimal (though no performance impact was measurable). In fact this is not really necessary, since we know the max number of inputs used. Instead, just generate fetch for up to max inputs used by the shader,

[Mesa-dev] [PATCH] gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf

2016-10-13 Thread sroland
From: Roland Scheidegger Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too. --- src/gallium/auxiliary/gallivm/lp_bld_init.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/gallium/auxiliary

[Mesa-dev] [PATCH] draw: improve vertex fetch (v2)

2016-10-14 Thread sroland
From: Roland Scheidegger The per-element fetch has quite a few calculations which are constant; these can be moved outside both the per-element and the main shader loop (llvm can mostly figure out on its own that they're constant, however this can have a significant compile time cost). Similarly, i

[Mesa-dev] [PATCH 1/2] llvmpipe: fix depth clamping wrt reversed near/far values

2016-08-14 Thread sroland
From: Roland Scheidegger This wasn't handled before (the result was that no matter what value got clamped, it always ended up as the near value in this case) (if clamping actually happened). Fix this by using the util helper for that (the math is otherwise "mostly" the same, mostly because there
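
A minimal sketch of clamping that does not depend on the near/far order (clamp_depth_sketch is a hypothetical name, not the util helper the patch uses): take the smaller and larger of the two depth range values first, then clamp against those.

```c
/* Clamp z to the depth range, whether or not near_val <= far_val. */
static float clamp_depth_sketch(float z, float near_val, float far_val)
{
   float lo = near_val < far_val ? near_val : far_val;
   float hi = near_val < far_val ? far_val : near_val;
   return z < lo ? lo : (z > hi ? hi : z);
}
```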

[Mesa-dev] [PATCH] llvmpipe: fix issues with depth clamp

2016-08-14 Thread sroland
From: Roland Scheidegger We only did depth clamp when the value was written from the fs. This is very wrong both for d3d10 and GL, and only passed the corresponding piglit test due to pure luck (it no longer does with the enhanced test). Also, interpolation clamped values to 1.0 always, which can

[Mesa-dev] [PATCH] gallivm: Use native packs and unpacks for the lerps

2016-10-17 Thread sroland
From: Roland Scheidegger For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine wit

[Mesa-dev] [PATCH] draw: use vectorized calculations for fetch

2016-10-31 Thread sroland
From: Roland Scheidegger Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the mul could actually be o

[Mesa-dev] [PATCH 1/2] draw: fix undefined input handling some more...

2016-11-02 Thread sroland
From: Roland Scheidegger Previous fixes were incomplete - some code still iterated through the number of elements provided by velem layout instead of the number stored in the key (which is the same as the number defined by the vs). And also actually accessed the elements from the layout directly

[Mesa-dev] [PATCH 2/2] draw: use vectorized calculations for fetch

2016-11-02 Thread sroland
From: Roland Scheidegger Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the mul could actually be o

[Mesa-dev] [PATCH 1/2] gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function

2016-11-03 Thread sroland
From: Roland Scheidegger This is used by shader umul_hi/imul_hi functions (and soon by draw). It's actually useful separating this out on its own, however the real reason for doing it is because we're using an optimized sse2 version, since the code llvm generates is atrocious (since there's no wi
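
A sketch of what a 32x32->64bit "lohi" multiply computes for four lanes, written in C with SSE2 intrinsics and a scalar fallback. This is not the gallivm implementation; it only illustrates the usual SSE2 approach: pmuludq (_mm_mul_epu32) multiplies only the even 32-bit lanes, so the odd lanes are shifted down and multiplied separately, and the halves are then shuffled back into a lo and a hi vector. Only the unsigned case is shown, and mul_32_lohi_u4 is a hypothetical name.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Multiply four pairs of unsigned 32-bit lanes, returning the low and high
 * 32 bits of each 64-bit product. */
static void mul_32_lohi_u4(const uint32_t a[4], const uint32_t b[4],
                           uint32_t lo[4], uint32_t hi[4])
{
#ifdef __SSE2__
   __m128i va = _mm_loadu_si128((const __m128i *)a);
   __m128i vb = _mm_loadu_si128((const __m128i *)b);
   /* pmuludq: 64-bit products of lanes 0 and 2 -> [lo0, hi0, lo2, hi2]. */
   __m128i even = _mm_mul_epu32(va, vb);
   /* Move lanes 1 and 3 into the even slots -> [lo1, hi1, lo3, hi3]. */
   __m128i odd = _mm_mul_epu32(_mm_srli_epi64(va, 32), _mm_srli_epi64(vb, 32));
   /* Reorder both to [lo, lo, hi, hi], then interleave the halves. */
   even = _mm_shuffle_epi32(even, _MM_SHUFFLE(3, 1, 2, 0));
   odd = _mm_shuffle_epi32(odd, _MM_SHUFFLE(3, 1, 2, 0));
   _mm_storeu_si128((__m128i *)lo, _mm_unpacklo_epi32(even, odd));
   _mm_storeu_si128((__m128i *)hi, _mm_unpackhi_epi32(even, odd));
#else
   for (int i = 0; i < 4; i++) {
      uint64_t p = (uint64_t)a[i] * b[i];
      lo[i] = (uint32_t)p;
      hi[i] = (uint32_t)(p >> 32);
   }
#endif
}
```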

[Mesa-dev] [PATCH 2/2] draw: use vectorized calculations for fetch

2016-11-03 Thread sroland
From: Roland Scheidegger Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the mul could actually be o

[Mesa-dev] [PATCH 2/3] draw: finally optimize bool clip mask generation

2016-11-12 Thread sroland
From: Roland Scheidegger lp_build_any_true_range is just what we need, though it will only produce optimal code with sse41 (ptest + set) - but even without it on 64bit x86 the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's going to be roughly the same as before. While here

[Mesa-dev] [PATCH 1/3] draw: use vectorized calculations for fetch (v2)

2016-11-12 Thread sroland
From: Roland Scheidegger Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the mul could actually be o

[Mesa-dev] [PATCH 3/3] draw: simplify vsplit elts code a bit

2016-11-12 Thread sroland
From: Roland Scheidegger vsplit_get_base_idx explicitly returned idx 0 and set the ofbit in case of overflow. We'd then check the ofbit and use idx 0 instead of looking it up. This was necessary because DRAW_GET_IDX used to return DRAW_MAX_FETCH_IDX and not 0 in case of overflows. However, this i

[Mesa-dev] [PATCH 2/5] draw: use same argument order for jit draw linear / elts functions

2016-11-13 Thread sroland
From: Roland Scheidegger This is a bit simpler. Mostly to make it easier to unify the paths later... --- src/gallium/auxiliary/draw/draw_llvm.c | 48 ++ src/gallium/auxiliary/draw/draw_llvm.h | 8 ++-- .../draw/draw_pt_fetch_shade_pipeline_llvm.c

[Mesa-dev] draw: simplify overflow handling, unify elts and linear jit code

2016-11-13 Thread sroland
Overflow handling is simplified quite a bit both in jit code and vsplit paths (basically just let things wrap around everywhere). This seems to be good enough for all apis. Also, elts and linear jit code is unified since the differences are minimal (even more so at the end of the series). The cost

[Mesa-dev] [PATCH 1/5] draw: drop unnecessary index overflow handling from vsplit code

2016-11-13 Thread sroland
From: Roland Scheidegger This was kind of strange, since it replaced indices which were only overflowing due to bias with MAX_UINT. This would cause an overflow later in the shader, except if stride was 0, however the vertex id would be essentially random then (-1 + eltBias). No test cared about

[Mesa-dev] [PATCH 5/5] draw: drop some overflow computations

2016-11-13 Thread sroland
From: Roland Scheidegger It turns out that no one actually cares if the address computations overflow, be it the stride mul or the offset adds. Wrap around seems to be explicitly permitted even by some other API (which is a _very_ surprising result, as these overflow computations were added just f

[Mesa-dev] [PATCH 3/5] draw: unify linear and elts draw jit functions

2016-11-13 Thread sroland
From: Roland Scheidegger The code for elts and linear paths was nearly 100% identical by now - with the elts path simply having some additional gather for the elements in the main loop (with some additional small differences before the main loop). Hence nuke the separate functions and decide thi

[Mesa-dev] [PATCH 4/5] draw: simplify fetch some more

2016-11-13 Thread sroland
From: Roland Scheidegger Don't keep the ofbit. This is just a minor simplification, just adjust the buffer size so that there will always be an overflow if buffers aren't valid to fetch from. Also, get rid of control flow from the instanced path too. Not worried about performance, but it's simple

[Mesa-dev] [PATCH] glsl: fix ldexp lowering if bitfield insert lowering is also requested

2016-12-03 Thread sroland
From: Roland Scheidegger Trivial, this just resurrects the code which was there once upon a time (the code can't lower instructions generated in the lowering pass there, and even if it could it would probably be suboptimal). This fixes piglit mesa_shader_integer_functions fs-ldexp.shader_test and

[Mesa-dev] [PATCH 3/3] gallivm: optimize gather a bit, by using supplied destination type

2016-12-03 Thread sroland
From: Roland Scheidegger By using a dst_type in the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is a 4x32bit vector, gather will no longer do a ZExt with a 96bit scalar value to 128bit, but just fetch the 96bit

[Mesa-dev] [PATCH 1/3] util: (trivial) ETC1 meets the criteria for fitting into unorm8

2016-12-03 Thread sroland
From: Roland Scheidegger Just like other similar compressed formats. --- src/gallium/auxiliary/util/u_format.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/auxiliary/util/u_format.c b/src/gallium/auxiliary/util/u_format.c index 72dd60f..3d28190 100644 --- a/src/gallium/a

[Mesa-dev] [PATCH 2/3] gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soa

2016-12-03 Thread sroland
From: Roland Scheidegger Note that we really want to _never_ reach the bottom of the function, which resorts to AoS fetch. Half floats can be handled just like other formats which fit into 32bit vectors (so, only 1x16 and 2x16 formats, albeit with more channels things are not THAT bad), with mini
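
For reference, the per-value math behind a 16-bit float fetch is a binary16 to binary32 conversion. A scalar sketch (half_to_float_sketch is hypothetical; gallivm works on vectors, and the f16c vcvtph2ps instruction does the same conversion in hardware):

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE-754 binary16 value to binary32. */
static float half_to_float_sketch(uint16_t h)
{
   uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
   uint32_t e = (h >> 10) & 0x1fu;
   uint32_t mant = h & 0x3ffu;
   uint32_t bits;
   float f;

   if (e == 0x1f) {
      /* Inf / NaN: force the float exponent to all ones, keep the payload. */
      bits = sign | 0x7f800000u | (mant << 13);
   } else if (e == 0) {
      /* Zero / denormal: value is mant * 2^-24, exact in binary32. */
      f = (float)mant * (1.0f / 16777216.0f);
      memcpy(&bits, &f, sizeof bits);
      bits |= sign;
   } else {
      /* Normal: rebias the exponent from 15 to 127. */
      bits = sign | ((e + 112u) << 23) | (mant << 13);
   }
   memcpy(&f, &bits, sizeof f);
   return f;
}
```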

[Mesa-dev] [PATCH] main: allow NEAREST_MIPMAP_NEAREST for stencil texturing

2016-12-05 Thread sroland
From: Roland Scheidegger As per GL 4.5 rules, which fixed a spec mistake in GL_ARB_stencil_texturing. The extension spec wasn't updated, but just allow it with older GL versions as well, hoping there aren't any crazy tests which want to see an error there... (Compile tested only.) Reported by Jó

[Mesa-dev] [PATCH 1/6] gallivm: (trivial) handle non-aligned fetch for lp_build_fetch_rgba_soa

2016-12-11 Thread sroland
From: Roland Scheidegger soa fetch so far always assumed that data was aligned. However, we want to use this for vertex fetch, and data might not be aligned there, so handle it in this path too (basically just pass the alignment through to other functions). (It looks like it wouldn't work for

[Mesa-dev] [PATCH 6/6] draw: use SoA fetch, not AoS one

2016-12-11 Thread sroland
From: Roland Scheidegger Now that there's some SoA fetch which never falls back, we should usually get results which are better or at least not worse (something like rgba32f will stay the same). I suppose though it might be worse in some cases where the format doesn't require conversion (e.g. rg3

[Mesa-dev] [PATCH 5/6] gallivm: generalize the compressed format soa fetch a bit

2016-12-11 Thread sroland
From: Roland Scheidegger This can now handle rgtc (unorm) too - this path no longer handles plain formats, but that's unnecessary since they now all have their proper SoA unpack (this will still be dog-slow though due to the actual fetch being per-pixel util fallbacks). --- src/gallium/auxiliary/galli

[Mesa-dev] [PATCH 4/6] gallivm: provide soa fetch path handling formats with more than 32bit

2016-12-11 Thread sroland
From: Roland Scheidegger This previously always fell back to AoS conversion. Even for 4-float formats (which is the optimal case by far for that fallback case) this was suboptimal, since it meant the conversion couldn't be done with 256bit vectors. While this may still only be partly possible for

[Mesa-dev] [PATCH 2/6] gallivm: optimize SoA AoS fallback fetch path a little

2016-12-11 Thread sroland
From: Roland Scheidegger We should do transpose, not extract/insert, at least with a "sufficient" number of channels (for 4 channels, extract/insert shuffles generated otherwise look truly terrifying). Albeit we shouldn't fall back to that so often in any case. --- src/gallium/auxiliary/gallivm/lp_

[Mesa-dev] [PATCH 3/6] gallivm: optimize gather a bit, by using supplied destination type

2016-12-11 Thread sroland
From: Roland Scheidegger By using a dst_type in the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is a 4x32bit vector, gather will no longer do a ZExt with a 96bit scalar value to 128bit, but just fetch the 96bit

[Mesa-dev] [PATCH 1/4] llvmpipe: (trivial) minimally simplify mask construction

2016-12-20 Thread sroland
From: Roland Scheidegger simd instruction sets usually have comparisons for equal, not unequal. So use a different comparison against the mask itself - which also means we don't need an all-zero as well as an all-one (for the pxor) reg. Also add code to avoid scalar expansion of i1 values which we

[Mesa-dev] [PATCH 2/4] gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto

2016-12-20 Thread sroland
From: Roland Scheidegger If we only feed one source vector at a time, we cannot use pack intrinsics (as we only have a 64bit destination dst vector). lp_bld_conv_auto is specifically designed to alter the length and number of destination vectors, so this works just fine (if we use single source v

[Mesa-dev] [PATCH 4/4] gallivm: implement aos unpack (to unorm8) for small unorm formats

2016-12-20 Thread sroland
From: Roland Scheidegger Using bit replication. This path now resembles something which might make sense. (The logic was mostly copied from llvmpipe fs backend.) I am not convinced though it is actually faster than SoA sampling (actually I'm quite certain it's always a loss with AVX). With SoA it
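
Bit replication in scalar form, as a sketch (unorm_n_to_unorm8 is a hypothetical helper, not the gallivm code): an n-bit value is moved to the top of the byte and its own bits are repeated into the vacated low bits, e.g. (v << 3) | (v >> 2) for 5 bits. This maps 0 to 0 and the n-bit maximum to 255, and closely approximates v * 255 / (2^n - 1) in between.

```c
#include <stdint.h>

/* Expand an n-bit unorm value (1 <= n <= 8, v < 2^n) to 8 bits by placing
 * it in the top bits and repeating it downwards until the byte is full. */
static uint8_t unorm_n_to_unorm8(uint32_t v, unsigned n)
{
   uint32_t r = 0;
   int shift = 8 - (int)n;
   while (shift > -(int)n) {
      r |= shift >= 0 ? v << shift : v >> -shift;
      shift -= (int)n;
   }
   return (uint8_t)r;
}
```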

[Mesa-dev] [PATCH 3/4] gallivm: optimize lp_build_unpack_arith_rgba_aos slightly

2016-12-20 Thread sroland
From: Roland Scheidegger This code uses a vector shift which has to be emulated on x86 unless there's AVX2. Luckily in some cases we can actually avoid the shift altogether, so do that. Also make sure we hit the fast lp_build_conv() path when applicable, albeit that's quite the hack... That said,

[Mesa-dev] [PATCH 1/4] llvmpipe: use scalar load instead of vectors for small vectors in fs backend

2016-12-21 Thread sroland
From: Roland Scheidegger llvm has _huge_ problems trying to load things like <4 x i8> vectors and stitching such loads together to form 128bit vectors. My understanding of the problem is that the type legalizer tries to extend that to really a <4 x i32> vector and not a <16 x i8> vector with the

[Mesa-dev] [PATCH 2/4] llvmpipe: use alpha from already converted color if possible

2016-12-21 Thread sroland
From: Roland Scheidegger For rgbx formats, there is no point in doing alpha conversion again (and with different transpose even, so llvm can't eliminate it). Albeit it looks like there are some minimal changes needed in the blend code (found by code inspection, no test seemed to complain) if we do t

[Mesa-dev] [PATCH 3/4] gallivm: generalize 4x4f->1x16ub special case conversion

2016-12-21 Thread sroland
From: Roland Scheidegger This special packing path can be easily extended to handle not just float->unorm8 but also float->snorm8 and uint32->uint8 and int32->int8 (i.e. all interesting cases for llvmpipe fs backend code). The packing parts all stay the same (only the last step packing will be si

[Mesa-dev] [PATCH 4/4] llvmpipe: do transpose/untwiddle after conversion for 8bit formats

2016-12-21 Thread sroland
From: Roland Scheidegger Generally we should do transpose after conversion, if the format has fewer than 32 bits per channel (if it has 32 bits, conversion is going to be a no-op anyway...). This is obviously because there are fewer vectors to deal with. Though the advantage for 16 bit formats isn't t

[Mesa-dev] [PATCH] gallivm: ignore rho approximation for cube maps

2013-09-30 Thread sroland
From: Roland Scheidegger There are two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps when doing rho appr
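
To put numbers on the error figures above, this sketch compares an exact rho (the length of the longer derivative vector) with one common approximation, the largest absolute derivative component. Using that particular approximation is an assumption for illustration, not necessarily what gallivm computes, and the function names are hypothetical. Working with squared lengths avoids the sqrt: the exact value is at most sqrt(2) times the approximation, i.e. exact^2 <= 2 * approx^2. Since lod = log2(rho), a factor of sqrt(2) is half a mip level, while a factor of 2 is a full one.

```c
/* Absolute value without relying on libm. */
static float absf(float x)
{
   return x < 0.0f ? -x : x;
}

/* Squared exact rho: squared length of the longer derivative vector. */
static float rho_exact_sq(float dsdx, float dtdx, float dsdy, float dtdy)
{
   float rx = dsdx * dsdx + dtdx * dtdx;
   float ry = dsdy * dsdy + dtdy * dtdy;
   return rx > ry ? rx : ry;
}

/* Approximate rho: largest absolute derivative component (no sqrt). */
static float rho_approx(float dsdx, float dtdx, float dsdy, float dtdy)
{
   float m = absf(dsdx);
   if (absf(dtdx) > m) m = absf(dtdx);
   if (absf(dsdy) > m) m = absf(dsdy);
   if (absf(dtdy) > m) m = absf(dtdy);
   return m;
}
```

The worst case is a derivative along the diagonal, e.g. (1, 1): the exact length is sqrt(2) while the approximation gives 1.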

[Mesa-dev] [PATCH 1/3] gallivm: ignore rho approximation for cube maps

2013-10-03 Thread sroland
From: Roland Scheidegger There are two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps when doing rho appr

[Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps

2013-10-03 Thread sroland
From: Roland Scheidegger They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives there's at least some te

[Mesa-dev] [PATCH 3/3] gallivm: kill old per-quad face selection code

2013-10-03 Thread sroland
From: Roland Scheidegger Not used for ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). --- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 7

[Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps

2013-10-04 Thread sroland
From: Roland Scheidegger They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives there's at least some te

[Mesa-dev] [PATCH 1/3] gallivm: ignore rho approximation for cube maps

2013-10-04 Thread sroland
From: Roland Scheidegger There are two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps when doing rho appr

[Mesa-dev] [PATCH 3/3] gallivm: kill old per-quad face selection code

2013-10-04 Thread sroland
From: Roland Scheidegger Not used for ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). v2: also get rid of 3 helper functions no longer used.

[Mesa-dev] [PATCH] softpipe: fix seamless cube filtering

2013-10-10 Thread sroland
From: Roland Scheidegger Fix coord wrapping (and face selection too) in case of edges. Unfortunately, the coord wrapping is way more complicated than what the code did, as it depends on the face and the direction where the texel falls off the face (the logic needed to get this right in fact seems

[Mesa-dev] [PATCH] llvmpipe: increase fs shader variant instruction cache limit by factor 4

2013-10-11 Thread sroland
From: Roland Scheidegger The previous limit of 128*1024 was reported on IRC to cause frequent recompiles due to shader variant thrashing in some apps, leading to noticeable lags. Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less impossible to reach, since even simp

[Mesa-dev] [PATCH 1/2] gallivm: implement seamless cube filtering

2013-10-18 Thread sroland
From: Roland Scheidegger For seamless cube filtering it is necessary to determine new faces and new coords per sample. The logic for this is _seriously_ complex (what needs to happen is very "asymmetric" wrt face, x/y under/overflow), further complicated by the fact that if the 4 samples are in a

[Mesa-dev] [PATCH 2/2] llvmpipe: enable seamless cube filtering

2013-10-18 Thread sroland
From: Roland Scheidegger --- src/gallium/drivers/llvmpipe/lp_screen.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index 723e40e..4c81022 100644 --- a/src/gallium/drivers/llvmpipe/lp_sc

[Mesa-dev] [PATCH] gallivm: implement fully accurate corner filtering for seamless cube maps

2013-10-21 Thread sroland
From: Roland Scheidegger d3d10 requires that cube corners are filtered with accurate weights (that is, the weight of the non-existing corner texel should be evenly distributed to the other 3 texels). OpenGL does not require this (but recommends it). This requires us to use different filtering cod
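
A minimal sketch of the weight adjustment described above, assuming the four bilinear weights are already computed and it is known which texel of the 2x2 footprint does not exist (redistribute_corner_weight is hypothetical; the real code also has to select faces and coordinates per sample):

```c
/* Bilinear weights for the 2x2 footprint, indexed [row][col]. At a cube
 * corner one texel does not exist; give a third of its weight to each of
 * the other three so the weights still sum to 1. */
static void redistribute_corner_weight(float w[2][2], int missing_row,
                                       int missing_col)
{
   float share = w[missing_row][missing_col] / 3.0f;
   for (int r = 0; r < 2; r++)
      for (int c = 0; c < 2; c++)
         w[r][c] = (r == missing_row && c == missing_col) ? 0.0f
                                                          : w[r][c] + share;
}
```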

[Mesa-dev] [PATCH] gallivm: implement fully accurate corner filtering for seamless cube maps

2013-10-23 Thread sroland
From: Roland Scheidegger d3d10 requires that cube corners are filtered with accurate weights (that is, the weight of the non-existing corner texel should be evenly distributed to the other 3 texels). OpenGL does not require this (but recommends it). This requires us to use different filtering cod

[Mesa-dev] [PATCH] gallium: kill off PIPE_FORMAT_Z32_UNORM with extreme prejudice

2013-10-24 Thread sroland
From: Roland Scheidegger This format, while still supported in OpenGL (but optional) and glx, is just causing major nuisance everywhere and needs special code in some places, because things like 1 << depth_bits don't work. It is also the reason why we chose (just like in GL) depth clear values as
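
To illustrate why 32 depth bits are awkward: in C, shifting a 32-bit unsigned value by 32 is undefined behavior, so an expression like (1u << depth_bits) - 1 breaks exactly for a Z32 format. A sketch of a width-safe alternative for the maximum depth value (depth_max_value is a hypothetical helper):

```c
#include <stdint.h>

/* Largest value of an n-bit unsigned depth format, 1 <= bits <= 32.
 * Shifting an all-ones word right keeps every shift count below 32. */
static uint32_t depth_max_value(unsigned bits)
{
   return 0xffffffffu >> (32u - bits);
}
```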

[Mesa-dev] [PATCH] llvmpipe: fix bogus layer clamping in setup

2013-10-25 Thread sroland
From: Roland Scheidegger The layer coming from GS needs to be clamped (not sure if that's actually the correct error behavior but we need something) as the number can be higher than the amount of layers in the fb. However, this code was using the layer calculation from the scene, and this was act

[Mesa-dev] [PATCH] gallivm: optimize lp_build_minify for sse

2013-11-05 Thread sroland
From: Roland Scheidegger SSE can't handle true vector shifts (with variable shift count), so llvm is turning them into a mess of extracts, scalar shifts and inserts. It is however possible to emulate them in lp_build_minify with float muls, which should be way faster (saves over 20 instructions p
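
A scalar sketch of the idea, assuming the minified size is max(1, size >> level) and that the emulation multiplies by 2^-level in float and truncates (pow2_neg and minify_via_mul are hypothetical names; the patch's exact formulation is not shown above). The power of two is assembled from exponent bits, which needs only a shift by a constant; for sizes below 2^24 the float product is exact, so truncation gives the same result as the variable shift.

```c
#include <stdint.h>
#include <string.h>

/* 2^-level from exponent bits; only a constant shift is needed.
 * Valid for 0 <= level <= 126. */
static float pow2_neg(int level)
{
   uint32_t bits = (uint32_t)(127 - level) << 23;
   float f;
   memcpy(&f, &bits, sizeof f);
   return f;
}

/* max(1, size >> level) without a variable shift, for 0 <= size < 2^24:
 * the product is exact and truncation of a non-negative value is floor. */
static int minify_via_mul(int size, int level)
{
   int s = (int)((float)size * pow2_neg(level));
   return s > 1 ? s : 1;
}
```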

[Mesa-dev] [PATCH] gallivm: fix indirect addressing of inputs

2013-11-06 Thread sroland
From: Roland Scheidegger We weren't adding the soa offsets when constructing the indices for the gather functions. That meant that we were always returning the data in the first element. (Copied straight from the same fix for temps.) While here fix up a couple of broken comments in the fetch func

[Mesa-dev] [PATCH] gallivm: deduplicate some indirect register address code

2013-11-06 Thread sroland
From: Roland Scheidegger There's only one minor functional change, for immediates the pixel offsets are no longer added since the values are all the same for all elements in any case (it might be better if those weren't stored as soa vectors in the first place maybe). --- src/gallium/auxiliary/g

[Mesa-dev] [PATCH] gallivm, llvmpipe: fix float->srgb conversion to handle NaNs

2013-11-11 Thread sroland
From: Roland Scheidegger d3d10 requires us to convert NaNs to zero for any float->int conversion. We don't really do that but mostly seems to work. In particular I suspect the very common float->unorm8 path only really passes because it relies on sse2 pack intrinsics which just happen to work by
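
The NaN rule can be shown on the simpler float->unorm8 case (not the srgb path this patch changes; float_to_unorm8_nan_safe is a hypothetical name). In C, converting NaN to an integer type is undefined behavior, so the value has to be filtered first; an ordered comparison such as x > 0.0f is false for NaN, which routes it to 0:

```c
#include <stdint.h>

/* float -> unorm8 with NaN and negatives mapped to 0, values >= 1 to 255. */
static uint8_t float_to_unorm8_nan_safe(float x)
{
   /* Ordered comparisons are false for NaN, so NaN takes this branch. */
   if (!(x > 0.0f))
      return 0;
   if (x >= 1.0f)
      return 255;
   return (uint8_t)(x * 255.0f + 0.5f);
}
```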

[Mesa-dev] [PATCH] llvmpipe: clean up state setup code a bit

2013-11-12 Thread sroland
From: Roland Scheidegger In particular get rid of home-grown vector helpers which didn't add much. And while here fix formatting a bit. No functional change. --- src/gallium/drivers/llvmpipe/lp_state_setup.c | 183 + 1 file changed, 66 insertions(+), 117 deletions(-) di

[Mesa-dev] [PATCH] llvmpipe: calculate more accurate interpolation value at origin

2013-11-20 Thread sroland
From: Roland Scheidegger Some rounding errors could crop up when calculating a0. Use a more accurate method (barycentric interpolation essentially) to fix this, though to fix the REAL problem (which is that our interpolation will give very bad results with small triangles far away from the origin

[Mesa-dev] [PATCH] llvmpipe: increase number of queries which can be binned simultaneously to 64

2014-06-12 Thread sroland
From: Roland Scheidegger Gallium (but not OpenGL) does allow nesting of queries, but there's no limit specified (d3d10 has no limit either). Nevertheless, for practical purposes we need some limit in llvmpipe, otherwise we'd need more complex handling of queries as we need to keep track of all b

[Mesa-dev] [PATCH] gallivm: fix SCALED -> NORM conversions

2014-06-17 Thread sroland
From: Roland Scheidegger Such conversions (which are most likely rather pointless in practice) were resulting in shifts with negative shift counts and shifts with counts the same as the bit width. This was always undefined in llvm, the code generated was rather horrendous but happened to work. So

[Mesa-dev] [PATCH] gallivm: set mcpu when initializing llvm execution engine

2014-06-18 Thread sroland
From: Roland Scheidegger Previously llvm detected cpu features automatically when the execution engine was created (based on host cpu). This is no longer the case, which meant llvm was then not able to emit some of the intrinsics we used as we didn't specify any sse attributes (only on avx suppor

[Mesa-dev] [PATCH] draw: (trivial) fix clamping of viewport index

2014-06-23 Thread sroland
From: Roland Scheidegger The old logic would let all negative values go through unclamped, with potentially disastrous results (probably trying to fetch viewport values from random memory locations). GL has undefined rendering for vp indices outside valid range but that's a bit too undefined... (
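A common way to reject negative and too-large indices with one comparison is to compare them as unsigned values. The sketch below is hypothetical (not the draw module code); mapping out-of-range indices to viewport 0 is an assumption here, chosen as one safe value since GL leaves the rendering undefined.

```c
/*
 * Hypothetical helper: clamp a viewport index coming from a shader
 * output to [0, num_viewports), using viewport 0 for anything outside.
 */
static unsigned clamp_viewport_index(int idx, unsigned num_viewports)
{
    /* The cast turns every negative index into a value >= 2^31, so one
     * unsigned comparison rejects negative and too-large indices alike.
     * Comparing only against the upper bound as signed integers would
     * let negative values through. */
    return (unsigned)idx < num_viewports ? (unsigned)idx : 0u;
}
```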

[Mesa-dev] [PATCH] softpipe: use last_level from sampler view, not from the resource

2014-06-25 Thread sroland
From: Roland Scheidegger The last_level from the sampler view may be limited by the state tracker to a value lower than what the base texture provides. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=80541. --- src/gallium/drivers/softpipe/sp_tex_sample.c | 39 ++-- 1
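The point of the fix is which bounds the sampler clamps against. A hypothetical sketch, assuming a simplified `view_level_range` struct standing in for the level range of a gallium sampler view (not the softpipe code):

```c
/*
 * Hypothetical stand-in for the level range of a sampler view. The state
 * tracker may restrict it to a subset of the levels the resource has.
 */
struct view_level_range {
    unsigned first_level; /* first level exposed by the view */
    unsigned last_level;  /* last level exposed by the view */
};

/* Clamp a computed mip level (an absolute level index in the resource)
 * to the view's range, not to the resource's own last level. */
static unsigned clamp_level_to_view(int level, const struct view_level_range *view)
{
    if (level < (int)view->first_level)
        return view->first_level;
    if (level > (int)view->last_level)
        return view->last_level;
    return (unsigned)level;
}
```

With a view covering levels 1..3, a computed level of 5 is clamped to 3 even if the resource itself has more levels.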

[Mesa-dev] [PATCH 6/6] llvmpipe: get rid of llvmpipe_get_texture_tile_linear

2014-07-01 Thread sroland
From: Roland Scheidegger Because the layout is always linear this didn't really do much any longer - at some point this triggered per-tile swizzled->linear conversion. The x/y coords were ignored too. Apart from triggering conversion, this also invoked alloc_image_data(), which could only actuall

[Mesa-dev] [PATCH 3/6] llvmpipe: allocate regular texture memory upfront

2014-07-01 Thread sroland
From: Roland Scheidegger The deferred allocation doesn't really make much sense anymore, since we no longer allocate swizzled/linear memory in chunks, nor per level / slice. This means we could fail resource creation a bit more (could already fail in theory anyway) but should not fail

[Mesa-dev] [PATCH 2/6] llvmpipe: get rid of linear_img struct

2014-07-01 Thread sroland
From: Roland Scheidegger Just use a tex_data pointer directly - the description was no longer correct either. --- src/gallium/drivers/llvmpipe/lp_setup.c | 2 +- src/gallium/drivers/llvmpipe/lp_state_sampler.c | 2 +- src/gallium/drivers/llvmpipe/lp_texture.c | 39 ++

[Mesa-dev] [PATCH 4/6] llvmpipe: get rid of llvmpipe_get_texture_image_all

2014-07-01 Thread sroland
From: Roland Scheidegger Once used for invoking swizzled->linear conversion for all needed images. But we now have a single allocation for all images in a resource, thus looping through all slices is rather pointless, and conversion doesn't happen either. Also simplify the sampling setup code to use

[Mesa-dev] [PATCH 5/6] llvmpipe: get rid of llvmpipe_get_texture_image

2014-07-01 Thread sroland
From: Roland Scheidegger The only caller left used it only for non display target textures, hence it was really the same as llvmpipe_get_texture_image_address - it also had a usage flag but this was ignored anyway. --- src/gallium/drivers/llvmpipe/lp_texture.c | 48 +-

[Mesa-dev] [PATCH 1/6] llvmpipe: (trivial) rename linear_mip_offsets to mip_offsets

2014-07-01 Thread sroland
From: Roland Scheidegger Since switching to non-swizzled rendering we only have "normal", aka linear, offsets. --- src/gallium/drivers/llvmpipe/lp_setup.c | 2 +- src/gallium/drivers/llvmpipe/lp_state_sampler.c | 2 +- src/gallium/drivers/llvmpipe/lp_texture.c | 6 +++--- src/galli

[Mesa-dev] [PATCH] target-helpers: don't use designated initializers

2014-07-01 Thread sroland
From: Roland Scheidegger It looks like since ce1a1372280d737a1b85279995529206586ae480 they are now included in more places, in particular even for things buildable with msvc, and hence those break the build. --- src/gallium/auxiliary/target-helpers/inline_drm_helper.h | 8 1 file changed, 4

[Mesa-dev] [PATCH] gallivm: fix up out-of-bounds level when using conformant out-of-bound behavior

2014-07-29 Thread sroland
From: Roland Scheidegger When using (d3d10) conformant out-of-bound behavior for texel fetching (currently always enabled) the level still needs to be set to a safe value even though the offset in the end won't get used because the level is used to look up the mip offset itself and the actual str
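The reason the level must be made safe even when the result is discarded: SIMD code fetches for all lanes and masks afterwards, so an out-of-range level is still used as an index into the mip offset table. A hypothetical scalar sketch of that pattern for a 1D texture (names and data layout are assumptions, not gallivm code); it relies on d3d10 returning zero for out-of-bounds texel fetches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 1D texel fetch that, like SIMD code, always performs the
 * memory access and only masks the result afterwards. */
static uint32_t fetch_texel_1d(const uint32_t *texels,
                               const unsigned *mip_offsets,
                               const unsigned *widths,
                               unsigned num_levels,
                               int level, int x)
{
    bool oob = level < 0 || (unsigned)level >= num_levels;
    /* The result for an out-of-bounds level is thrown away below, but the
     * level still indexes mip_offsets[] and widths[], so it has to be
     * replaced with a safe value first. */
    unsigned safe_level = oob ? 0u : (unsigned)level;
    oob = oob || x < 0 || (unsigned)x >= widths[safe_level];
    unsigned safe_x = oob ? 0u : (unsigned)x;
    uint32_t texel = texels[mip_offsets[safe_level] + safe_x];
    /* Out-of-bounds fetches return zero. */
    return oob ? 0u : texel;
}
```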

[Mesa-dev] [PATCH 2/3] llvmpipe: integrate memory allocation into llvmpipe_texture_layout

2014-07-31 Thread sroland
From: Roland Scheidegger Seems pointless to just duplicate some of the calculations (the calculation of actual memory used compared to what was predicted in llvmpipe_texture_layout actually could have differed slightly in some cases due to different alignment rules used though this should have be
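Computing offsets and total size in one pass removes the chance of the allocation size disagreeing with the layout because of different alignment rules. A hypothetical sketch for a 2D (array) texture kept in a single allocation; the 64-byte per-level alignment and all names are assumptions for illustration, not llvmpipe's actual values.

```c
#include <stddef.h>

/* Assumed per-level alignment, for illustration only. */
#define LEVEL_ALIGN 64u

static size_t align_up(size_t v, size_t a)
{
    return (v + a - 1) / a * a;
}

/* Hypothetical layout: fills mip_offsets[] and returns the total size,
 * so the number used for the allocation is the one the layout computed.
 * num_levels is assumed to be valid for the given width and height. */
static size_t layout_texture(unsigned width, unsigned height, unsigned layers,
                             unsigned num_levels, unsigned bytes_per_pixel,
                             size_t *mip_offsets)
{
    size_t total = 0;
    for (unsigned level = 0; level < num_levels; level++) {
        /* Minify each dimension, never going below one texel. */
        unsigned w = (width >> level) ? (width >> level) : 1u;
        unsigned h = (height >> level) ? (height >> level) : 1u;
        mip_offsets[level] = total;
        total += align_up((size_t)w * h * bytes_per_pixel * layers, LEVEL_ALIGN);
    }
    return total;
}
```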

[Mesa-dev] [PATCH 1/3] llvmpipe: get rid of impossible code in alloc_image_data

2014-07-31 Thread sroland
From: Roland Scheidegger Only used for non display target resources. --- src/gallium/drivers/llvmpipe/lp_texture.c | 39 +++ 1 file changed, 13 insertions(+), 26 deletions(-) diff --git a/src/gallium/drivers/llvmpipe/lp_texture.c b/src/gallium/drivers/llvmpipe/lp_te

[Mesa-dev] [PATCH 3/3] llvmpipe: don't store number of layers per level

2014-07-31 Thread sroland
From: Roland Scheidegger This could be recalculated, though it turns out the only use of it after resource allocation is for calculating whole resource size (for scene size accounting, though that isn't quite ideal either). Thus, instead just store the whole resource size and drop it (saving a co

[Mesa-dev] [PATCH 1/3] llvmpipe: implement support for cube map arrays

2014-08-01 Thread sroland
From: Roland Scheidegger This just covers the resource side of things, not the actual sampling. Here things are trivial as cube map arrays are identical to 2d arrays in all respects. --- src/gallium/drivers/llvmpipe/lp_screen.c| 3 ++- src/gallium/drivers/llvmpipe/lp_setup.c | 6
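Why the resource side is trivial: a cube map array with N cubes can be stored exactly like a 2D array with 6 * N layers, each cube taking six consecutive layers. A hypothetical sketch of that mapping, assuming faces are numbered 0..5 in the usual +X, -X, +Y, -Y, +Z, -Z order:

```c
/* Hypothetical helpers: map (cube index, face) to a 2D array layer. */
static unsigned cube_array_layer(unsigned cube, unsigned face)
{
    return cube * 6u + face;
}

/* Number of 2D array layers needed to hold num_cubes cubes. */
static unsigned cube_array_layer_count(unsigned num_cubes)
{
    return num_cubes * 6u;
}
```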

[Mesa-dev] [PATCH 2/3] gallivm: fix cube map array (and cube map shadow with bias) handling

2014-08-01 Thread sroland
From: Roland Scheidegger In particular need to handle TEX2/TXB2/TXL2 opcodes (cube map shadow could already have used TXB2 which clearly couldn't have worked, despite that no piglit change), and add a bunch more switch cases. The actual sampling code still won't handle cube map arrays. --- src/g

[Mesa-dev] [PATCH 3/3] gallium/docs: Document TEX2/TXL2/TXB2 instructions and fix up other tex doc

2014-08-01 Thread sroland
From: Roland Scheidegger Add documentation for TEX2/TXL2/TXB2 tgsi opcodes. Also, the texture opcode documentation wasn't very accurate so fix this up a bit. --- src/gallium/docs/source/tgsi.rst | 127 +++ 1 file changed, 102 insertions(+), 25 deletions(-) di

[Mesa-dev] [PATCH] util/u_format_s3tc: handle srgb formats correctly.

2013-07-16 Thread sroland
From: Roland Scheidegger Instead of just ignoring the srgb/linear conversions, simply call the corresponding conversion functions, for all of pack/unpack/fetch, both for float and unorm8 versions (though some don't make a whole lot of sense, i.e. unorm8/unorm8 srgb/linear combinations). Refactore
