On 26/06/17 20:06, Nicolai Hähnle wrote:
On 26.06.2017 11:56, Timothy Arceri wrote:
On 26/06/17 19:27, Nicolai Hähnle wrote:
On 25.06.2017 03:31, Timothy Arceri wrote:
---
src/mesa/state_tracker/st_atom_constbuf.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/mesa/state_tracker/st_atom_constbuf.c
b/src/mesa/state_tracker/st_atom_constbuf.c
index 7f17546..eb8f6b3 100644
--- a/src/mesa/state_tracker/st_atom_constbuf.c
+++ b/src/mesa/state_tracker/st_atom_constbuf.c
@@ -83,21 +83,27 @@ void st_upload_constants(struct st_context *st,
struct gl_program *prog)
/* Make all bindless samplers/images bound texture/image
units resident in
* the context.
*/
st_make_bound_samplers_resident(st, prog);
st_make_bound_images_resident(st, prog);
/* update constants */
if (params && params->NumParameters) {
struct pipe_constant_buffer cb;
- const uint paramBytes = params->NumParameters *
sizeof(GLfloat) * 4;
+
+ /* We need to align to 4 as the driver will expect to load
four values
+ * regardless of how many values the last param has. It's
safe to do the
+ * align as the param list always allocates extra memory.
+ */
+ const uint paramBytes =
+ align((params->NumParameterValues * sizeof(GLfloat)) + 1,
4);
This looks broken.
First of all, I would say that it shouldn't actually be required. A
driver that supports packed uniforms should only load as much as the
shader actually requests.
I recall you telling me on IRC that radeonsi needs to load vec4
chunks for efficiency. Maybe I misinterpreted what you were saying?
Right, we should try to merge loads in the backend, because loading a
vec4 (or even up to 16 consecutive constant!) in a single instruction
is faster than 4 or 16 loads of a single constant, assuming register
pressure isn't an issue. FWIW, we're not actually doing that right now.
However, there's no reason for a vec4 load when you only need one or
two constants, as would be the case when loading from the end of the
constant buffer. So there's no reason to pad the buffer.
Ok, thanks for the clarification. I think I can see how this would work
currently, you would need a swizzle xxxx to handle a single component
passed the last block of 4.
Cheers,
Nicolai
Second, even if it were needed, the code doesn't do what it says.
:P I made some *simplifications* here at some point, not sure what I
was thinking at the time.
Cheers,
Nicolai
/* Update the constants which come from fixed-function
state, such as
* transformation matrices, fog factors, etc. The rest of
the values in
* the parameters list are explicitly set by the user with
glUniform,
* glProgramParameter(), etc.
*/
if (params->StateFlags)
_mesa_load_state_parameters(st->ctx, params);
_mesa_shader_write_subroutine_indices(st->ctx, stage);
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev