From: Roland Scheidegger
The expansion should always be to the same width as the input arguments
no matter what, since these functions should work with any bit width of
the arguments (the sext is a no-op on any sane simd architecture).
Thus, fix the caller expecting differently.
This fixes https
From: Roland Scheidegger
Blit submits lots of packets which are usually handled by state atoms, so
these must be dirtied.
Not sure if this fixes anything, but it was a concern raised by bug 51658
(with this all issues there seen as actual bugs should be fixed, with the
exception of the patch to u
From: Roland Scheidegger
It is rather unfortunate that we don't know if a texture is going to be used
as a rt later, and we lack the means to do something about a format chosen
which we can't render to directly, so disable this and always chose renderable
format for rgba8 textures.
This addresses
From: Roland Scheidegger
The formats chosen (both by texture format choser, fbo storage allocation)
are different for big endian not just for rgba8 but also lower bit width
formats (why I don't actually know). Even the function to test for renderable
formats used different formats, however the ac
From: Roland Scheidegger
The formats chosen (both by texture format choser, fbo storage allocation)
are different for big endian not just for rgba8 but also lower bit width
formats (why I don't actually know). Even the function to test for renderable
formats used different formats, however the ac
From: Roland Scheidegger
In particular, we were incorrectly accepting s3tc (and lots of others)
for CompressedTexSubImage3D (but not CompressedTexImage3D) calls with 3d
targets. At this time, the only allowed formats for these calls are the
bptc ones, since none of the specific extensions allow i
From: Roland Scheidegger
Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.
---
src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git
From: Roland Scheidegger
For vertex/geometry shader sampling, this is the same as for llvmpipe - just
use the original resource target.
For fragment shader sampling though (which does not use first-layer based mip
offsets) adjust the sampling code to use first_layer in the non-array cases.
While
From: Roland Scheidegger
Just need to use resource target not view target when calculating
first-layer based mip offsets. (This is a gl specific problem since
d3d10 does not distinguish between non-array and array resources neither
at the resource nor view level, only at the shader level.)
Fixes
From: Roland Scheidegger
When using nearest filtering and clamp / clamp to edge wrapping results could
be wrong for negative offsets. Fix this by adding the offset before doing
the conversion to int coords (could also use floor instead of trunc int
conversion but probably more complex on "typical
From: Roland Scheidegger
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail wi
From: Roland Scheidegger
Since d21320f6258b2e1780a15c1ca718963d8a15ca18 the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeo
From: Roland Scheidegger
Since 779cabfc7d022de8b7b9bc7fdac0caffa8646c51 the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeo
From: Roland Scheidegger
since the address reg holds integer values, ARL/ARR do an implicit float-to-int
conversion, so clarify that. Thus it is also incorrect to say that FLR really
does the same as ARL.
---
src/gallium/docs/source/tgsi.rst | 18 --
1 file changed, 8 insertions(
From: Roland Scheidegger
Since dropping some NV_fragment_program opcodes (commits
868f95f1da74cf6dd7468cba1b56664aad585ccb,
a3688d686f147f4252d19b298ae26d4ac72c2e08)
we can no longer parse all opcodes necessary for this extension, leading
to bugs (https://bugs.freedesktop.org/show_bug.cgi?id=869
From: Roland Scheidegger
I used this as some testing ground for investigating some compiler
bits initially (e.g. lrint calls etc.), figured I could do much better
in the end just for fun...
This is mathematically equivalent, but uses some tricks to avoid
doubles and also replaces some float math
From: Roland Scheidegger
This code (lifted straight from the extension) was doing things the most
inefficient way you could think of.
This drops some of the more expensive float operations, in particular
- int-cast floors (pointless, values always positive)
- 2 raised to (signed) integers (replac
From: Roland Scheidegger
This should make the code more robust if a shader tries to use inputs which
aren't defined by the vertex element layout (which usually shouldn't happen).
No piglit change.
---
src/gallium/auxiliary/draw/draw_llvm.c | 7 +++
1 file changed, 7 insertions(+)
diff --gi
From: Roland Scheidegger
The per-element fetch has quite some calculations which are constant,
these can be moved outside both the per-element as well as the main
shader loop (llvm can figure out it's constant mostly on its own, however
this can have a significant compile time cost).
Similarly, i
From: Roland Scheidegger
Previous attempts to zero initialize all inputs were not really optimal
(though no performance impact was measurable). In fact this is not really
necessary, since we know the max number of inputs used.
Instead, just generate fetch for up to max inputs used by the shader,
From: Roland Scheidegger
Compilation to actual machine code can easily take as much time as the
optimization passes on the IR if not more, so print this out too.
---
src/gallium/auxiliary/gallivm/lp_bld_init.c | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/gallium/auxiliary
From: Roland Scheidegger
The per-element fetch has quite some calculations which are constant,
these can be moved outside both the per-element as well as the main
shader loop (llvm can figure out it's constant mostly on its own, however
this can have a significant compile time cost).
Similarly, i
From: Roland Scheidegger
This wasn't handled before (the result was that no matter what value got
clamped, it always ended up as the near value in this case) (if clamping
actually happened).
Fix this by using the util helper for that (the math is otherwise "mostly"
the same, mostly because there
From: Roland Scheidegger
We only did depth clamp when the value was written from the fs.
This is very wrong both for d3d10 and GL, and only passed the
corresponding piglit test due to pure luck (it no longer does
with the enhanced test).
Also, interpolation clamped values to 1.0 always, which can
From: Roland Scheidegger
For the texturing packs, things looked pretty terrible. For every
lerp, we were repacking the values, and while those look sort of cheap
with 128bit, with 256bit we end up with 2 of them instead of just 1 but
worse, plus 2 extracts too (the unpack, however, works fine wit
From: Roland Scheidegger
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be o
From: Roland Scheidegger
Previous fixes were incomplete - some code still iterated through the number
of elements provided by velem layout instead of the number stored in the key
(which is the same as the number defined by the vs). And also actually
accessed the elements from the layout directly
From: Roland Scheidegger
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be o
From: Roland Scheidegger
This is used by shader umul_hi/imul_hi functions (and soon by draw).
It's actually useful separating this out on its own, however the real
reason for doing it is because we're using an optimized sse2 version,
since the code llvm generates is atrocious (since there's no wi
From: Roland Scheidegger
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be o
From: Roland Scheidegger
lp_build_any_true_range is just what we need, though it will only produce
optimal code with sse41 (ptest + set) - but even without it on 64bit x86
the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's
going to be roughly the same as before.
While here
From: Roland Scheidegger
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be o
From: Roland Scheidegger
vsplit_get_base_idx explicitly returned idx 0 and set the ofbit
in case of overflow. We'd then check the ofbit and use idx 0 instead of
looking it up. This was necessary because DRAW_GET_IDX used to return
DRAW_MAX_FETCH_IDX and not 0 in case of overflows.
However, this i
From: Roland Scheidegger
This is a bit simpler. Mostly to make it easier to unify the paths later...
---
src/gallium/auxiliary/draw/draw_llvm.c | 48 ++
src/gallium/auxiliary/draw/draw_llvm.h | 8 ++--
.../draw/draw_pt_fetch_shade_pipeline_llvm.c
Overflow handling is simplified quite a bit both in jit code and vsplit
paths (basically just let things wrap around everywhere). This seems to
be good enough for all apis.
Also, elts and linear jit code is unified since the differences are minimal
(even more so at the end of the series). The cost
From: Roland Scheidegger
This was kind of strange, since it replaced indices which were only
overflowing due to bias with MAX_UINT. This would cause an overflow later
in the shader, except if stride was 0, however the vertex id would be
essentially random then (-1 + eltBias). No test cared about
From: Roland Scheidegger
It turns out that noone actually cares if the address computations overflow,
be it the stride mul or the offset adds.
Wrap around seems to be explicitly permitted even by some other API (which
is a _very_ surprising result, as these overflow computations were added just
f
From: Roland Scheidegger
The code for elts and linear paths was nearly 100% identical by now - with
the elts path simply having some additional gather for the elements in the
main loop (with some additional small differences before the main loop).
Hence nuke the separate functions and decide thi
From: Roland Scheidegger
Don't keep the ofbit. This is just a minor simplification, just adjust
the buffer size so that there will always be an overflow if buffers aren't
valid to fetch from.
Also, get rid of control flow from the instanced path too. Not worried about
performance, but it's simple
From: Roland Scheidegger
Trivial, this just resurrects the code which was there once upon a time
(the code can't lower instructions generated in the lowering pass there,
and even if it could it would probably be suboptimal).
This fixes piglit mesa_shader_integer_functions fs-ldexp.shader_test and
From: Roland Scheidegger
By using a dst_type in the the gather interface, gather has some more
knowledge about how values should be fetched.
E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather
will no longer do a ZExt with a 96bit scalar value to 128bit, but
just fetch the 96bit
From: Roland Scheidegger
Just like other similar compressed formats.
---
src/gallium/auxiliary/util/u_format.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/src/gallium/auxiliary/util/u_format.c
b/src/gallium/auxiliary/util/u_format.c
index 72dd60f..3d28190 100644
--- a/src/gallium/a
From: Roland Scheidegger
Note that we really want to _never_ reach the bottom of the function, which
resorts to AoS fetch.
Half floats can be handled just like other formats which fit into 32bit
vectors (so, only 1x16 and 2x16 formats, albeit with more channels things
are not THAT bad), with mini
From: Roland Scheidegger
As per GL 4.5 rules, which fixed a spec mistake in GL_ARB_stencil_texturing.
The extension spec wasn't updated, but just allow it with older GL versions
as well, hoping there aren't any crazy tests which want to see an error
there... (Compile tested only.)
Reported by Jó
From: Roland Scheidegger
soa fetch so far always assumed that data was aligned. However, we want to
use this for vertex fetch, and data might not be aligned there, so handle
it in this path too (basically just pass through alignment through to other
functions). (It looks like it wouldn't work for
From: Roland Scheidegger
Now that there's some SoA fetch which never falls back, we should usually get
results which are better or at least not worse (something like rgba32f will
stay the same). I suppose though it might be worse in some cases where the
format doesn't require conversion (e.g. rg3
From: Roland Scheidegger
This can now handle rgtc (unorm) too - this path no longer handles plain
formats, but that's unnecessary they now all have their proper SoA unpack
(this will still be dog-slow though due to the actual fetch being per-pixel
util fallbacks).
---
src/gallium/auxiliary/galli
From: Roland Scheidegger
This previously always fell back to AoS conversion. Even for 4-float formats
(which is the optimal case by far for that fallback case) this was suboptimal,
since it meant the conversion couldn't be done with 256bit vectors. While this
may still only be partly possible for
From: Roland Scheidegger
We should do transpose, not extract/insert, at least with "sufficient" amount
of channels (for 4 channels, extract/insert shuffles generated otherwise look
truly terrifying). Albeit we shouldn't fallback to that so often in any case.
---
src/gallium/auxiliary/gallivm/lp_
From: Roland Scheidegger
By using a dst_type in the the gather interface, gather has some more
knowledge about how values should be fetched.
E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather
will no longer do a ZExt with a 96bit scalar value to 128bit, but
just fetch the 96bit
From: Roland Scheidegger
simd instruction sets usually have comparisons for equal, not unequal.
So use a different comparison against the mask itself - which also means
we don't need a all-zero as well as a all-one (for the pxor) reg.
Also add code to avoid scalar expansion of i1 values which we
From: Roland Scheidegger
If we only feed one source vector at a time, we cannot use pack intrinsics
(as we only have a 64bit destination dst vector). lp_bld_conv_auto is
specifically designed to alter the length and number of destination vectors,
so this works just fine (if we use single source v
From: Roland Scheidegger
Using bit replication. This path now resembles something which might make
sense. (The logic was mostly copied from llvmpipe fs backend.)
I am not convinced though it is actually faster than SoA sampling (actually
I'm quite certain it's always a loss with AVX).
With SoA it
From: Roland Scheidegger
This code uses a vector shift which has to be emulated on x86 unless
there's AVX2. Luckily in some cases we can actually avoid the shift
altogether, so do that.
Also make sure we hit the fast lp_build_conv() path when applicable,
albeit that's quite the hack...
That said,
From: Roland Scheidegger
llvm has _huge_ problems trying to load things like <4 x i8> vectors and
stitching such loads together to form 128bit vectors. My understanding
of the problem is that the type legalizer tries to extend that to
really a <4 x i32> vector and not a <16 x i8> vector with the
From: Roland Scheidegger
For rgbx formats, there is no point in doing alpha conversion again (and
with different tranpose even, so llvm can't eliminate it).
Albeit it looks like there's some minimal changes needed in the blend code
(found by code inspection, no test seemed to complain) if we do t
From: Roland Scheidegger
This special packing path can be easily extended to handle not just
float->unorm8 but also float->snorm8 and uint32->uint8 and int32->int8
(i.e. all interesting cases for llvmpipe fs backend code).
The packing parts all stay the same (only the last step packing will
be si
From: Roland Scheidegger
Generally we should do tranpose after conversion, if the format has less than
32 bits per channel (if it has 32 bits, conversion is going to be a no-op
anyway...). This is obviously because there's less vectors to deal with.
Though the advantage for 16 bit formats isn't t
From: Roland Scheidegger
There's two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho appr
From: Roland Scheidegger
There's two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho appr
From: Roland Scheidegger
They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some te
From: Roland Scheidegger
Not used since ages, and it wouldn't work at all with explicit derivatives now
(not that it did before as it ignored them but now the code would just use
the derivs pre-projected which would be quite random numbers).
---
src/gallium/auxiliary/gallivm/lp_bld_sample.c | 7
From: Roland Scheidegger
They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some te
From: Roland Scheidegger
There's two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho appr
From: Roland Scheidegger
Not used since ages, and it wouldn't work at all with explicit derivatives now
(not that it did before as it ignored them but now the code would just use
the derivs pre-projected which would be quite random numbers).
v2: also get rid of 3 helper functions no longer used.
From: Roland Scheidegger
Fix coord wrapping (and face selection too) in case of edges.
Unfortunately, the coord wrapping is way more complicated than what
the code did, as it depends on the face and the direction where the
texel falls off the face (the logic needed to get this right in fact
seems
From: Roland Scheidegger
The previous limit of of 128*1024 was reported to cause frequent recompiles
in some apps due to shader variant thrashing on IRC in some apps leading
to noticeable lags.
Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less impossible
to reach, since even simp
From: Roland Scheidegger
For seamless cube filtering it is necessary to determine new faces and new
coords per sample. The logic for this is _seriously_ complex (what needs
to happen is very "asymmetric" wrt face, x/y under/overflow), further
complicated by the fact that if the 4 samples are in a
From: Roland Scheidegger
---
src/gallium/drivers/llvmpipe/lp_screen.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c
b/src/gallium/drivers/llvmpipe/lp_screen.c
index 723e40e..4c81022 100644
--- a/src/gallium/drivers/llvmpipe/lp_sc
From: Roland Scheidegger
d3d10 requires that cube corners are filtered with accurate weights (that
is, the weight of the non-existing corner texel should be evenly distributed
to the other 3 texels). OpenGL does not require this (but recommends it).
This requires us to use different filtering cod
From: Roland Scheidegger
d3d10 requires that cube corners are filtered with accurate weights (that
is, the weight of the non-existing corner texel should be evenly distributed
to the other 3 texels). OpenGL does not require this (but recommends it).
This requires us to use different filtering cod
From: Roland Scheidegger
This format, while still supported in OpenGL (but optional) and glx, is just
causing major nuisance everywhere and needs special code in some places,
because things like 1 << depth_bits don't work.
It is also the reason why we chose (just like in GL) depth clear values as
From: Roland Scheidegger
The layer coming from GS needs to be clamped (not sure if that's actually
the correct error behavior but we need something) as the number can be higher
than the amount of layers in the fb. However, this code was using the layer
calculation from the scene, and this was act
From: Roland Scheidegger
SSE can't handle true vector shifts (with variable shift count),
so llvm is turning them into a mess of extracts, scalar shifts and inserts.
It is however possible to emulate them in lp_build_minify with float muls,
which should be way faster (saves over 20 instructions p
From: Roland Scheidegger
We weren't adding the soa offsets when constructing the indices
for the gather functions. That meant that we were always returning
the data in the first element.
(Copied straight from the same fix for temps.)
While here fix up a couple of broken comments in the fetch func
From: Roland Scheidegger
There's only one minor functional change, for immediates the pixel offsets
are no longer added since the values are all the same for all elements in
any case (it might be better if those weren't stored as soa vectors in the
first place maybe).
---
src/gallium/auxiliary/g
From: Roland Scheidegger
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that but mostly seems to work. In particular I suspect the
very common float->unorm8 path only really passes because it relies on sse2
pack intrinsics which just happen to work by
From: Roland Scheidegger
In particular get rid of home-grown vector helpers which didn't add much.
And while here fix formatting a bit. No functional change.
---
src/gallium/drivers/llvmpipe/lp_state_setup.c | 183 +
1 file changed, 66 insertions(+), 117 deletions(-)
di
From: Roland Scheidegger
Some rounding errors could crop up when calculating a0. Use a more accurate
method (barycentric interpolation essentially) to fix this, though to fix
the REAL problem (which is that our interpolation will give very bad results
with small triangles far away from the origin
From: Roland Scheidegger
Gallium (but not OpenGL) does allow nesting of queries, but there's no
limit specified (d3d10 has no limit neither). Nevertheless, for practical
purposes we need some limit in llvmpipe, otherwise we'd need more complex
handling of queries as we need to keep track of all b
From: Roland Scheidegger
Such conversions (which are most likely rather pointless in practice) were
resulting in shifts with negative shift counts and shifts with counts the same
as the bit width. This was always undefined in llvm, the code generated was
rather horrendous but happened to work.
So
From: Roland Scheidegger
Such conversions (which are most likely rather pointless in practice) were
resulting in shifts with negative shift counts and shifts with counts the same
as the bit width. This was always undefined in llvm, the code generated was
rather horrendous but happened to work.
So
From: Roland Scheidegger
Previously llvm detected cpu features automatically when the execution engine
was created (based on host cpu). This is no longer the case, which meant llvm
was then not able to emit some of the intrinsics we used as we didn't specify
any sse attributes (only on avx suppor
From: Roland Scheidegger
The old logic would let all negative values go through unclamped, with
potentially disastrous results (probably trying to fetch viewport values
from random memory locations). GL has undefined rendering for vp indices
outside valid range but that's a bit too undefined...
(
From: Roland Scheidegger
The last_level from the sampler view may be limited by the state tracker
to a value lower than what the base texture provides.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=80541.
---
src/gallium/drivers/softpipe/sp_tex_sample.c | 39 ++--
1
From: Roland Scheidegger
Because the layout is always linear this didn't really do much any longer -
at some point this triggered per-tile swizzled->linear conversion. The x/y
coords were ignored too.
Apart from triggering conversion, this also invoked alloc_image_data(), which
could only actuall
From: Roland Scheidegger
The deferred allocation doesn't really make much sense anymore, since we no
longer allocate swizzled/linear memory in chunks and not per level / slice
neither.
This means we could fail resource creation a bit more (could already fail in
theory anyway) but should not fail
From: Roland Scheidegger
Just use a tex_data pointer directly - the description was no longer correct
neither.
---
src/gallium/drivers/llvmpipe/lp_setup.c | 2 +-
src/gallium/drivers/llvmpipe/lp_state_sampler.c | 2 +-
src/gallium/drivers/llvmpipe/lp_texture.c | 39 ++
From: Roland Scheidegger
Once used for invoking swizzled->linear conversion for all needed images.
But we now have a single allocation for all images in a resource, thus looping
through all slices is rather pointless, conversion doesn't happen neither.
Also simplify the sampling setup code to use
From: Roland Scheidegger
The only caller left used it only for non display target textures,
hence it was really the same as llvmpipe_get_texture_image_address - it
also had a usage flag but this was ignored anyway.
---
src/gallium/drivers/llvmpipe/lp_texture.c | 48 +-
From: Roland Scheidegger
Since switching to non-swizzled rendering we only have "normal", aka linear,
offsets.
---
src/gallium/drivers/llvmpipe/lp_setup.c | 2 +-
src/gallium/drivers/llvmpipe/lp_state_sampler.c | 2 +-
src/gallium/drivers/llvmpipe/lp_texture.c | 6 +++---
src/galli
From: Roland Scheidegger
it looks since ce1a1372280d737a1b85279995529206586ae480 they are now included
in more places, in particular even for things buildable with msvc, and hence
those break the build.
---
src/gallium/auxiliary/target-helpers/inline_drm_helper.h | 8
1 file changed, 4
From: Roland Scheidegger
When using (d3d10) conformant out-of-bound behavior for texel fetching
(currently always enabled) the level still needs to be set to a safe value
even though the offset in the end won't get used because the level is used
to look up the mip offset itself and the actual str
From: Roland Scheidegger
Seems pointless to just duplicate some of the calculations (the calculation
of actual memory used compared to what was predicted in llvmpipe_texture_layout
actually could have differed slightly in some cases due to different alignment
rules used though this should have be
From: Roland Scheidegger
Only used for non display target resources.
---
src/gallium/drivers/llvmpipe/lp_texture.c | 39 +++
1 file changed, 13 insertions(+), 26 deletions(-)
diff --git a/src/gallium/drivers/llvmpipe/lp_texture.c
b/src/gallium/drivers/llvmpipe/lp_te
From: Roland Scheidegger
This could be recalculated, though it turns out the only use of it after
resource allocation is for calculating whole resource size (for scene size
accounting though that isn't quite ideal neither). Thus, instead just store
the whole resource size and drop it (saving a co
From: Roland Scheidegger
This just covers the resource side of things, not the actual sampling.
Here things are trivial as cube map arrays are identical to 2d arrays in
all respects.
---
src/gallium/drivers/llvmpipe/lp_screen.c| 3 ++-
src/gallium/drivers/llvmpipe/lp_setup.c | 6
From: Roland Scheidegger
In particular need to handle TEX2/TXB2/TXL2 opcodes (cube map shadow
could already have used TXB2 which clearly couldn't have worked, despite
that no piglit change), and add a bunch more switch cases.
The actual sampling code still won't handle cube map arrays.
---
src/g
From: Roland Scheidegger
Add documentation for TEX2/TXL2/TXB2 tgsi opcodes. Also, the texture opcode
documentation wasn't very accurate so fix this up a bit.
---
src/gallium/docs/source/tgsi.rst | 127 +++
1 file changed, 102 insertions(+), 25 deletions(-)
di
From: Roland Scheidegger
Instead of just ignoring the srgb/linear conversions, simply call the
corresponding conversion functions, for all of pack/unpack/fetch,
both for float and unorm8 versions (though some don't make a whole
lot of sense, i.e. unorm8/unorm8 srgb/linear combinations).
Refactore
1 - 100 of 640 matches
Mail list logo