Re: [Mesa-dev] More questions from a newbie

2013-09-20 Thread Rogovin, Kevin
 Hi all,

>I'm not aware of any public web servers that serve up the doxygen 
>documentation right now.  If you can find one (or set up one, or convince the 
>freedesktop.org >maintainers to set up one), I would 
>be in favor of putting overview style documentation in doxygen format.  
>Otherwise I think it would be better to add overview style >documentation to 
>the existing docs/ directory in html form, because that's what people see if 
>they go to http://www.mesa3d.org/.

There is a link to the doxygen magicks, over at "Developer Topics -> Source 
Code" and then a link as the last word in the text " .. documentation here". 
The link is dead; however, the link to bufferobj.c works.

I really think the docs should be "part of the source code" to help guarantee 
they do not rot; who would I contact of http://www.mesa3d.org so that the link 
to the source code would work? It will mean that whoever maintains the site 
will need to run #cd doxygen && make, which I hope would not be a big deal.

>If we can find or set up a public server serving up the latest doxygen 
>documentation (and that's a big "if"), then I'd be in favor of adding group 
>tags so that module >categorization is in the sources.  But I don't understand 
>the benefit of having two separate doxygen outputs (one for headers and one 
>for all code).  The fact is, the only >consumers of the doxygen output are 
>going to be mesa developers and people who are interested in understanding the 
>source code.  That argues for one piece of >doxygen output that contains all 
>the code.  Besides, we currently have zero public doxygen builds of Mesa; I'd 
>far prefer to take the incremental step of changing this >to one, and seeing 
>how it goes for a while, rather than jumping to two and risking confusing 
>people.

I agree with you, two doxygen runs is silly since Mesa is an implementation of 
documented and speced API; thus the only doxygen documentation would include 
sources that are linked so that a developer can quickly navigate. As for the 
how, this is what I am thinking of doing:

1.   First just tweak the .doxy files found in doxygen (they have that 
various directories are processed by doxygen separately so that the module 
thing is kind of done by directory already). However, write the overview text 
(new files) in each of the main directories. The text would be in a \mainpage 
header. Since each directory is a separate run, it will then present the 
overview text of that directory when clicked. If a directory already has a main 
page then just edit that.

2.   Once the documents look good and people are happy with AND there is 
update from the websites to make and host the doxygen stuff, then redo the 
doxygen files a bit to one doxygen file and then add group tags to the headers.

The 2nd step is not really needed if people are happy with the results of 
(1)... and skipping (2) has one positive side-effect, one can then, if one 
chooses, organize the sources into sub-groups of a main directory. Given that 
step 1 is not even started, I think figure out 2 if 1 gets done as needed.

>There's a lot of code in core Mesa that is shared by all drivers.  Even if 
>your documentation efforts only wind up affecting i965 code and shared code, 
>they will still >benefit everyone.  I think the best way to address this is to 
>take the "patches welcome" attitude--if you set up a public doxygen server, 
>and get the documentation into >good enough shape that it is useful, either 
>the gallium folks will want to contribute (in which case they can, by updating 
>comments in the gallium code), or they won't >(in which case they still will 
>benefit from the doxygen documentation in core Mesa).  Either way it's a win.

I'll try to get something set up; but to make progress faster and now, I'll 
start reading and then writing those \mainpage texts for each main directory. 
I'll start posting the text here once I've written something that has some info 
it. Along those lines, which is preferred by those doing the reading and 
checking posting the text as a patch or just the text itself?

Best Regards
-Kevin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/4] RFC: gl_shader cache

2013-09-20 Thread Tapani Pälli
Hello;

This is RFC for shader cache implementation, goal is to optimize application
startup time. The implementation is based on idea that Kenneth had on a Mesa
branch for memory based cache. These patches implement a 'behind the scenes'
cache for shader compiler storing gl_shader struct to the disk. Later on my
goal is to implement OES_get_program_binary extension using some of this
implementation as the basis.

There are already number of TODO's and issues mentioned in commit msgs,
please check these first. Then, there are probably bugs and missing things
that I'm not aware of, please let me know about any of these. One big TODO
is to rewrite the serialization to happen to memory area first (not directly
to the disk), this will make serialization more robust with more centralized
error handling but also makes it more usable for OES_get_program_binary
implementation. I haven't paid much attention to the portability of this code
so I would be interested to know also how well this would work with windows
and others.

One big TODO/goal is to move this all to happen after linking to cache
all shaders in a gl program. This will give far better optimization for
startup time. Currently rough approx 1/4 time is spent during compilation
but much more after linking and optimizations done after linking (this is
based on analysis done with particular app and callgrind). I have already
somewhat promising results from this cache, for example compiling L4D2
shaders and shader-db runs are faster..

On my SNB desktop machine shader-db takes ~2.4secs without cache, with cache
it takes ~1.8secs (~2.7 when cache gets generated) (values are avg from 100
runs compiling all the shaders). Shader-db creates 266 blobs eating 46M of
disk space when using cache.

Any comments greatly appreciated!

Tapani Pälli (4):
  glsl: add ir_cache class for IR serialization (WIP)
  mesa: gl_shader_cache class (WIP)
  glsl: export populate_symbol_table
  glsl: use gl_shader_cache when compiling (WIP)

 src/glsl/Makefile.sources |   2 +
 src/glsl/glsl_parser_extras.cpp   |  40 ++
 src/glsl/ir_cache.h   | 535 +
 src/glsl/ir_cache_serialize.cpp   | 603 
 src/glsl/ir_cache_unserialize.cpp | 969 ++
 src/glsl/linker.cpp   |   2 +-
 src/glsl/linker.h |   3 +
 src/mesa/Makefile.sources |   1 +
 src/mesa/main/context.c   |   2 +
 src/mesa/main/mtypes.h|   4 +
 src/mesa/main/shadercache.cpp | 198 
 src/mesa/main/shadercache.h   |  68 +++
 12 files changed, 2426 insertions(+), 1 deletion(-)
 create mode 100644 src/glsl/ir_cache.h
 create mode 100644 src/glsl/ir_cache_serialize.cpp
 create mode 100644 src/glsl/ir_cache_unserialize.cpp
 create mode 100644 src/mesa/main/shadercache.cpp
 create mode 100644 src/mesa/main/shadercache.h

-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] glsl: export populate_symbol_table

2013-09-20 Thread Tapani Pälli
will be used by compiler code when loading shaders from the cache

Signed-off-by: Tapani Pälli 
---
 src/glsl/linker.cpp | 2 +-
 src/glsl/linker.h   | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 8a143fd..538676b 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -838,7 +838,7 @@ interstage_cross_validate_uniform_blocks(struct 
gl_shader_program *prog)
 /**
  * Populates a shaders symbol table with all global declarations
  */
-static void
+void
 populate_symbol_table(gl_shader *sh)
 {
sh->symbols = new(sh) glsl_symbol_table;
diff --git a/src/glsl/linker.h b/src/glsl/linker.h
index 8a0027d..58ff8e1 100644
--- a/src/glsl/linker.h
+++ b/src/glsl/linker.h
@@ -26,6 +26,9 @@
 #ifndef GLSL_LINKER_H
 #define GLSL_LINKER_H
 
+void
+populate_symbol_table(gl_shader *sh);
+
 extern bool
 link_function_calls(gl_shader_program *prog, gl_shader *main,
gl_shader **shader_list, unsigned num_shaders);
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] mesa: gl_shader_cache class (WIP)

2013-09-20 Thread Tapani Pälli
Patch introduces gl_shader_cache class that uses ir_cache
to save and load shaders from the disk.

Known issues / bugs:

- lightsmark textures black, unigine and glb2.7 texturing
  work just fine though .. something is broken there (?)

- warsow going crazy at 'move_push_constants_to_pull_constants',
  (int pull_constant_loc[this->uniforms];), something broken

TODO

- recursive cache directory generation
- env variable to enable/disable cache
- create new exec_list directly to the given shader (not
  having to copy it later as it takes extra time)

Signed-off-by: Tapani Pälli 
---
 src/mesa/Makefile.sources |   1 +
 src/mesa/main/context.c   |   2 +
 src/mesa/main/mtypes.h|   4 +
 src/mesa/main/shadercache.cpp | 198 ++
 src/mesa/main/shadercache.h   |  68 +++
 5 files changed, 273 insertions(+)
 create mode 100644 src/mesa/main/shadercache.cpp
 create mode 100644 src/mesa/main/shadercache.h

diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index 122ea8e..f5ef82a 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -41,6 +41,7 @@ MAIN_FILES = \
$(SRCDIR)main/feedback.c \
$(SRCDIR)main/ffvertex_prog.c \
$(SRCDIR)main/ff_fragment_shader.cpp \
+   $(SRCDIR)main/shadercache.cpp \
$(SRCDIR)main/fog.c \
$(SRCDIR)main/formatquery.c \
$(SRCDIR)main/formats.c \
diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 58f42cc..901a652 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -131,6 +131,7 @@
 #include "program/prog_print.h"
 #include "math/m_matrix.h"
 #include "main/dispatch.h" /* for _gloffset_COUNT */
+#include "shadercache.h"
 
 #ifdef USE_SPARC_ASM
 #include "sparc/sparc.h"
@@ -774,6 +775,7 @@ init_attrib_groups(struct gl_context *ctx)
_mesa_init_rastpos( ctx );
_mesa_init_scissor( ctx );
_mesa_init_shader_state( ctx );
+   _mesa_init_shader_cache( ctx );
_mesa_init_stencil( ctx );
_mesa_init_transform( ctx );
_mesa_init_transform_feedback( ctx );
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index c88c1c6..9716b16 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -72,6 +72,7 @@ struct gl_attrib_node;
 struct gl_list_extensions;
 struct gl_meta_state;
 struct gl_program_cache;
+struct gl_shader_cache;
 struct gl_texture_object;
 struct gl_context;
 struct st_context;
@@ -2006,6 +2007,9 @@ struct gl_vertex_program_state
/** Cache of fixed-function programs */
struct gl_program_cache *Cache;
 
+   /** Cache of GLSL shaders */
+   struct gl_shader_cache *ShaderCache;
+
GLboolean _Overriden;
 };
 
diff --git a/src/mesa/main/shadercache.cpp b/src/mesa/main/shadercache.cpp
new file mode 100644
index 000..fd19c20
--- /dev/null
+++ b/src/mesa/main/shadercache.cpp
@@ -0,0 +1,198 @@
+
+/* -*- c++ -*- */
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "imports.h"
+#include "shadercache.h"
+#include "ir_cache.h"
+
+/**
+ * How it works:
+ *
+ * Shader cache uses ir_cache class to serialize instructions
+ * in a binary form to ~/.cache/mesa directory. When loading, a
+ * new IR exec_list is constructed from loaded data and new shader
+ * created and returned.
+ */
+
+
+void gl_shader_cache::init_cache_path()
+{
+   const char *tmp = "/tmp";
+   const char *cache_root = getenv("XDG_CACHE_DIR");
+   if (!cache_root)
+  cache_root = getenv("HOME");
+   if (!cache_root)
+  cache_root = tmp;
+
+   _mesa_snprintf(cache_path, MAX_CACHE_PATHLEN,
+  "%s/.cache/mesa", cache_root);
+
+   struct stat stat_info;
+
+   /**
+* FIXME construct the whole path, now assumes ~/.cache
+*/
+   if(stat(cache_path, &stat_info) != 0) {
+  if(mkdir(cache_path, 0775))
+ fprintf(stderr, "%s: error creating shader cache [%s]\n",
+

[Mesa-dev] [PATCH 4/4] glsl: use gl_shader_cache when compiling (WIP)

2013-09-20 Thread Tapani Pälli
Patch takes gl_shader_cache in to use. When compiling a shader, we
first search the cache if it exists already and can be used from there.
If cache did not exist, we attempt to cache the shader after compilation.

Signed-off-by: Tapani Pälli 
---
 src/glsl/glsl_parser_extras.cpp | 40 
 1 file changed, 40 insertions(+)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index cac5a18..cb546f7 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -30,6 +30,7 @@ extern "C" {
 #include "main/context.h"
 #include "main/shaderobj.h"
 }
+#include "main/shadercache.h"
 
 #include "ralloc.h"
 #include "ast.h"
@@ -37,6 +38,7 @@ extern "C" {
 #include "glsl_parser.h"
 #include "ir_optimization.h"
 #include "loop_analysis.h"
+#include "linker.h"
 
 /**
  * Format a short human-readable description of the given GLSL version.
@@ -1456,6 +1458,40 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct 
gl_shader *shader,
state->error = glcpp_preprocess(state, &source, &state->info_log,
  &ctx->Extensions, ctx);
 
+   /* check if we have same shader cached */
+   struct gl_shader *existing =
+  ctx->VertexProgram.ShaderCache->find(shader, source, state);
+   if (existing) {
+  /* use cached shader, clone ir list, populate symbol table */
+  shader->CompileStatus = GL_TRUE;
+  shader->InfoLog = ralloc_strdup(shader, "cached shader");
+  shader->Version = existing->Version;
+  shader->Type = existing->Type;
+  shader->IsES = existing->IsES;
+
+  /**
+   * NOTE - following should not be needed and should be removed
+   * as takes a lot of time. Problem with this is that we might currently
+   * bail out while reading cache and decide to use the original. This
+   * should be decided/known already when writing cache.
+   */
+  ralloc_free(shader->ir);
+  shader->ir = new(shader) exec_list;
+  clone_ir_list(shader, shader->ir, existing->ir);
+  ralloc_free(existing->ir);
+  ralloc_free(existing);
+
+  populate_symbol_table(shader);
+
+  memcpy(shader->builtins_to_link, state->builtins_to_link,
+ sizeof(shader->builtins_to_link[0]) * state->num_builtins_to_link);
+  shader->num_builtins_to_link = state->num_builtins_to_link;
+
+  _mesa_debug(ctx, "used a shader from cache\n");
+  return;
+   }
+
+
if (!state->error) {
  _mesa_glsl_lexer_ctor(state, source);
  _mesa_glsl_parse(state);
@@ -1524,6 +1560,10 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct 
gl_shader *shader,
/* Retain any live IR, but trash the rest. */
reparent_ir(shader->ir, shader->ir);
 
+   /* attempt to cache this shader */
+   if (ctx->VertexProgram.ShaderCache->cache(shader, source, state) == 0)
+  _mesa_debug(ctx, "cached a shader\n");
+
ralloc_free(state);
 }
 
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/gen7.5: Fix missing Shader Channel Select entries on Haswell

2013-09-20 Thread Abdiel Janulgue
Probably non-intentional, but the SURFACE_STATE setup refactoring
for buffer surfaces had missed the scs bits when creating constant
surface states.

Fixes broken GLB 2.5 on Haswell where the knight's textures are missing

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index 6938b1a..7571cbf 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -249,6 +249,13 @@ gen7_emit_buffer_surface_state(struct brw_context *brw,
 
surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS);
 
+   if (brw->is_haswell) {
+  surf[7] |= (SET_FIELD(HSW_SCS_RED,   GEN7_SURFACE_SCS_R) |
+  SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) |
+  SET_FIELD(HSW_SCS_BLUE,  GEN7_SURFACE_SCS_B) |
+  SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A));
+   }
+
/* Emit relocation to surface contents */
if (bo) {
   drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] llvmpipe: Fix rendering to PIPE_FORMAT_R10G10B10A2_UNORM.

2013-09-20 Thread jfonseca
From: José Fonseca 

We must take rounding in consideration when re-scaling to narrow
normalized channels, such as 2-bit normalized alpha.
---
 src/gallium/drivers/llvmpipe/lp_state_fs.c | 84 +++---
 1 file changed, 78 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index 07e6685..875a3cf 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
@@ -876,7 +876,22 @@ lp_blend_type_from_format_desc(const struct 
util_format_description *format_desc
 
 
 /**
- * Scale a normalized value from src_bits to dst_bits
+ * Scale a normalized value from src_bits to dst_bits.
+ *
+ * The exact calculation is
+ *
+ *dst = iround(src * dst_mask / src_mask)
+ *
+ *  or with integer rounding
+ *
+ *dst = src * (2*dst_mask + sign(src)*src_mask) / (2*src_mask)
+ *
+ *  where
+ *
+ *src_mask = (1 << src_bits) - 1
+ *dst_mask = (1 << dst_bits) - 1
+ *
+ * but we try to avoid division and multiplication through shifts.
  */
 static INLINE LLVMValueRef
 scale_bits(struct gallivm_state *gallivm,
@@ -889,11 +904,68 @@ scale_bits(struct gallivm_state *gallivm,
LLVMValueRef result = src;
 
if (dst_bits < src_bits) {
-  /* Scale down by LShr */
-  result = LLVMBuildLShr(builder,
- src,
- lp_build_const_int_vec(gallivm, src_type, 
src_bits - dst_bits),
- "");
+  int delta_bits = src_bits - dst_bits;
+
+  if (delta_bits <= dst_bits) {
+ /*
+  * Approximate the rescaling with a single shift.
+  *
+  * This gives the wrong rounding.
+  */
+
+ result = LLVMBuildLShr(builder,
+src,
+lp_build_const_int_vec(gallivm, src_type, 
delta_bits),
+"");
+
+  } else {
+ /*
+  * Try more accurate rescaling.
+  */
+
+ /*
+  * Drop the least significant bits to make space for the 
multiplication.
+  *
+  * XXX: A better approach would be to use a wider integer type as 
intermediate.  But
+  * this is enough to convert alpha from 16bits -> 2 when rendering to
+  * PIPE_FORMAT_R10G10B10A2_UNORM.
+  */
+ result = LLVMBuildLShr(builder,
+src,
+lp_build_const_int_vec(gallivm, src_type, 
dst_bits),
+"");
+
+
+ result = LLVMBuildMul(builder,
+   result,
+   lp_build_const_int_vec(gallivm, src_type, (1LL 
<< dst_bits) - 1),
+   "");
+
+ /*
+  * Add a rounding term before the division.
+  *
+  * TODO: Handle signed integers too.
+  */
+ if (!src_type.sign) {
+result = LLVMBuildAdd(builder,
+  result,
+  lp_build_const_int_vec(gallivm, src_type, 
(1LL << (delta_bits - 1))),
+  "");
+ }
+
+ /*
+  * Approximate the division by src_mask with a src_bits shift.
+  *
+  * Given the src has already been shifted by dst_bits, all we need
+  * to do is to shift by the difference.
+  */
+
+ result = LLVMBuildLShr(builder,
+result,
+lp_build_const_int_vec(gallivm, src_type, 
delta_bits),
+"");
+  }
+
} else if (dst_bits > src_bits) {
   /* Scale up bits */
   int db = dst_bits - src_bits;
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] draw/clip: don't emit so many empty triangles

2013-09-20 Thread Jose Fonseca


- Original Message -
> Am 19.09.2013 20:43, schrieb Zack Rusin:
> > Compress empty triangles (don't emit more than one in a row) and
> > never emit empty triangles if we already generated a triangle
> > covering a non-null area. We can't skip all null-triangles
> > because c_primitives expects ones that were generated from vertices
> > exactly at the clipping-plane, to be emitted.
> > 
> > Signed-off-by: Zack Rusin 
> > ---
> >  src/gallium/auxiliary/draw/draw_pipe_clip.c | 39
> >  +
> >  1 file changed, 39 insertions(+)
> > 
> > diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c
> > b/src/gallium/auxiliary/draw/draw_pipe_clip.c
> > index 0f90bfd..2d6df81 100644
> > --- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
> > +++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
> > @@ -209,6 +209,29 @@ static void interp( const struct clip_stage *clip,
> > }
> >  }
> >  
> > +/**
> > + * Checks whether the specifed triangle is empty and if it is returns
> specified
> 
> > + * true, otherwise returns false.
> > + * Triangle is considered null/empty if it's area is qual to zero.
> equal
> 
> 
> 
> > + */
> > +static INLINE boolean
> > +is_tri_null(struct draw_context *draw, const struct prim_header *header)
> > +{
> > +   const unsigned pos_attr = draw_current_shader_position_output(draw);
> > +   float x1 = header->v[1]->data[pos_attr][0] -
> > header->v[0]->data[pos_attr][0];
> > +   float y1 = header->v[1]->data[pos_attr][1] -
> > header->v[0]->data[pos_attr][1];
> > +   float z1 = header->v[1]->data[pos_attr][2] -
> > header->v[0]->data[pos_attr][2];
> > +
> > +   float x2 = header->v[2]->data[pos_attr][0] -
> > header->v[0]->data[pos_attr][0];
> > +   float y2 = header->v[2]->data[pos_attr][1] -
> > header->v[0]->data[pos_attr][1];
> > +   float z2 = header->v[2]->data[pos_attr][2] -
> > header->v[0]->data[pos_attr][2];
> > +
> > +   float vx = y1 * z2 - z1 * y2;
> > +   float vy = x1 * z2 - z1 * x2;
> > +   float vz = x1 * y2 - y1 * x2;
> > +


> > +   return (vx*vx  + vy*vy + vz*vz) == 0.f;

You can simplify this with

  return vz == 0.0f && vy == 0.0f && vz == 0.0f

> > +}
> 
> Hmm I think the usual calculation for area would be:
> 
>dx01 = x[0] - x[1];
>dy01 = y[0] - y[1];
> 
>dx20 = x[2] - x[0];
>dy20 = y[2] - y[0];
> 
>area = dx01 * dy20 - dx20 * dy01;

This is the projection of the aera in the xy plane, ie, the same a `vz` in 
Zack's patch.
> 
> Do you really need to factor in z for figuring out if the area is null
> here? In other words I'm wondering if the ordinary face culling stage
> wouldn't just do the job if enabled (as even without face culling it
> will calculate this area and toss out the prim in case it's zero).
> I guess though the problem is that cull stage is _after_ clip.

It's a matter of convention, so it really depends on what the tests expect to 
see in the statistics.

Otherwise series looks good to me.

Jose
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] implement NV_vdpau_interop v2

2013-09-20 Thread Christian König
From: Christian König 

v2: Actually implement interop between the gallium
state tracker and the VDPAU backend.

Signed-off-by: Christian König 
---
 src/gallium/include/state_tracker/vdpau_interop.h |  49 
 src/gallium/state_trackers/vdpau/ftab.c   |  31 ++-
 src/gallium/state_trackers/vdpau/output.c |  11 +
 src/gallium/state_trackers/vdpau/surface.c|  21 ++
 src/gallium/state_trackers/vdpau/vdpau_private.h  |   6 +
 src/mapi/glapi/gen/Makefile.am|   1 +
 src/mapi/glapi/gen/NV_vdpau_interop.xml   |  69 ++
 src/mapi/glapi/gen/gl_API.xml |   2 +
 src/mapi/glapi/gen/gl_genexec.py  |   1 +
 src/mesa/Makefile.sources |   4 +-
 src/mesa/main/dd.h|  14 ++
 src/mesa/main/extensions.c|   1 +
 src/mesa/main/mtypes.h|   9 +
 src/mesa/main/vdpau.c | 279 ++
 src/mesa/main/vdpau.h |  72 ++
 src/mesa/state_tracker/st_context.c   |   3 +
 src/mesa/state_tracker/st_extensions.c|   1 +
 src/mesa/state_tracker/st_vdpau.c | 170 +
 src/mesa/state_tracker/st_vdpau.h |  42 
 19 files changed, 777 insertions(+), 9 deletions(-)
 create mode 100644 src/gallium/include/state_tracker/vdpau_interop.h
 create mode 100644 src/mapi/glapi/gen/NV_vdpau_interop.xml
 create mode 100644 src/mesa/main/vdpau.c
 create mode 100644 src/mesa/main/vdpau.h
 create mode 100644 src/mesa/state_tracker/st_vdpau.c
 create mode 100644 src/mesa/state_tracker/st_vdpau.h

diff --git a/src/gallium/include/state_tracker/vdpau_interop.h 
b/src/gallium/include/state_tracker/vdpau_interop.h
new file mode 100644
index 000..4069d20
--- /dev/null
+++ b/src/gallium/include/state_tracker/vdpau_interop.h
@@ -0,0 +1,49 @@
+/**
+ *
+ * Copyright 2013 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **/
+
+/*
+ * Authors:
+ *  Christian König 
+ *
+ */
+
+#ifndef _VDPAU_INTEROP_H_
+#define _VDPAU_INTEROP_H_
+
+/* driver specific functions for NV_vdpau_interop */
+
+#define VDP_FUNC_ID_BASE_DRIVER 0x2000
+#define VDP_FUNC_ID_VIDEO_SURFACE_GALLIUM (VDP_FUNC_ID_BASE_DRIVER + 0)
+#define VDP_FUNC_ID_OUTPUT_SURFACE_GALLIUM (VDP_FUNC_ID_BASE_DRIVER + 1)
+
+struct pipe_surface;
+struct pipe_video_buffer;
+
+typedef struct pipe_video_buffer *VdpVideoSurfaceGallium(uint32_t surface);
+typedef struct pipe_surface *VdpOutputSurfaceGallium(uint32_t surface);
+
+#endif /* _VDPAU_INTEROP_H_ */
diff --git a/src/gallium/state_trackers/vdpau/ftab.c 
b/src/gallium/state_trackers/vdpau/ftab.c
index 81d16ec..2c84554 100644
--- a/src/gallium/state_trackers/vdpau/ftab.c
+++ b/src/gallium/state_trackers/vdpau/ftab.c
@@ -26,6 +26,9 @@
  **/
 
 #include 
+
+#include "util/u_memory.h"
+
 #include "vdpau_private.h"
 
 static void* ftab[67] =
@@ -104,19 +107,31 @@ static void* ftab_winsys[1] =
&vlVdpPresentationQueueTargetCreateX11  /* 
VDP_FUNC_ID_PRESENTATION_QUEUE_TARGET_CREATE_X11 */
 };
 
+static void* ftab_driver[2] =
+{
+   &vlVdpVideoSurfaceGallium, /* VDP_FUNC_ID_SURFACE_GALLIUM */
+   &vlVdpOutputSurfaceGallium /* VDP_FUNC_ID_OUTPUT_SURFACE_GALLIUM */
+};
+
 boolean vlGetFuncFTAB(VdpFuncId function_id, void **func)
 {
assert(func);
+   *func = NULL;
+
if (function_id < VDP_FUNC_ID_BASE_WINSYS) {
-  if (function_id > 66)
- return FALSE;
-  *func = ftab[function_id];
-   }
-   else {
+  if (function_id < Elements(ftab))
+ *func = ftab[function_id];
+

Re: [Mesa-dev] [PATCH 0/4] RFC: gl_shader cache

2013-09-20 Thread Christoph Brill
2013/9/20 Tapani Pälli 

> Hello;
>
> This is RFC for shader cache implementation, goal is to optimize
> application
> startup time. The implementation is based on idea that Kenneth had on a
> Mesa
> branch for memory based cache. These patches implement a 'behind the
> scenes'
> cache for shader compiler storing gl_shader struct to the disk. Later on my
> goal is to implement OES_get_program_binary extension using some of this
> implementation as the basis.
>
> There are already number of TODO's and issues mentioned in commit msgs,
> please check these first. Then, there are probably bugs and missing things
> that I'm not aware of, please let me know about any of these. One big TODO
> is to rewrite the serialization to happen to memory area first (not
> directly
> to the disk), this will make serialization more robust with more
> centralized
> error handling but also makes it more usable for OES_get_program_binary
> implementation. I haven't paid much attention to the portability of this
> code
> so I would be interested to know also how well this would work with windows
> and others.
>
> One big TODO/goal is to move this all to happen after linking to cache
> all shaders in a gl program. This will give far better optimization for
> startup time. Currently rough approx 1/4 time is spent during compilation
> but much more after linking and optimizations done after linking (this is
> based on analysis done with particular app and callgrind). I have already
> somewhat promising results from this cache, for example compiling L4D2
> shaders and shader-db runs are faster..
>
> On my SNB desktop machine shader-db takes ~2.4secs without cache, with
> cache
> it takes ~1.8secs (~2.7 when cache gets generated) (values are avg from 100
> runs compiling all the shaders). Shader-db creates 266 blobs eating 46M of
> disk space when using cache.
>

This sounds that we will need a configure option to disable the cache on
platforms with
a low amount of disk space (thinking of phones/embedded devices here).


> Any comments greatly appreciated!
>
> Tapani Pälli (4):
>   glsl: add ir_cache class for IR serialization (WIP)
>   mesa: gl_shader_cache class (WIP)
>   glsl: export populate_symbol_table
>   glsl: use gl_shader_cache when compiling (WIP)
>
>  src/glsl/Makefile.sources |   2 +
>  src/glsl/glsl_parser_extras.cpp   |  40 ++
>  src/glsl/ir_cache.h   | 535 +
>  src/glsl/ir_cache_serialize.cpp   | 603 
>  src/glsl/ir_cache_unserialize.cpp | 969
> ++
>  src/glsl/linker.cpp   |   2 +-
>  src/glsl/linker.h |   3 +
>  src/mesa/Makefile.sources |   1 +
>  src/mesa/main/context.c   |   2 +
>  src/mesa/main/mtypes.h|   4 +
>  src/mesa/main/shadercache.cpp | 198 
>  src/mesa/main/shadercache.h   |  68 +++
>  12 files changed, 2426 insertions(+), 1 deletion(-)
>  create mode 100644 src/glsl/ir_cache.h
>  create mode 100644 src/glsl/ir_cache_serialize.cpp
>  create mode 100644 src/glsl/ir_cache_unserialize.cpp
>  create mode 100644 src/mesa/main/shadercache.cpp
>  create mode 100644 src/mesa/main/shadercache.h
>
> --
> 1.8.1.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-20 Thread Paul Berry
On 17 September 2013 19:54, Chia-I Wu  wrote:

> Hi Paul,
>
> On Mon, Sep 16, 2013 at 3:46 PM, Chia-I Wu  wrote:
> > On Sat, Sep 14, 2013 at 5:15 AM, Paul Berry 
> wrote:
> >> On 12 September 2013 22:06, Chia-I Wu  wrote:
> >>>
> >>> From: Chia-I Wu 
> >>>
> >>> Consider only the top-left and top-right pixels to approximate DDX in a
> >>> 2x2
> >>> subspan, unless the application or the user requests a more accurate
> >>> approximation.  This results in a less accurate approximation.
>  However,
> >>> it
> >>> improves the performance of Xonotic with Ultra settings by 24.3879% +/-
> >>> 0.832202% (at 95.0% confidence) on Haswell.  No noticeable image
> quality
> >>> difference observed.
> >>>
> >>> No piglit gpu.tests regressions (tested with v1)
> >>>
> >>> I failed to come up with an explanation for the performance
> difference, as
> >>> the
> >>> change does not affect Ivy Bridge.  If anyone has the insight, please
> >>> kindly
> >>> enlighten me.  Performance differences may also be observed on other
> games
> >>> that call textureGrad and dFdx.
> >>>
> >>> v2: Honor GL_FRAGMENT_SHADER_DERIVATIVE_HINT and add a drirc option.
> >>> Update
> >>> comments.
> >>
> >>
> >> I'm not entirely comfortable making a change that has a known negative
> >> impact on computational accuracy (even one that leads to such an
> impressive
> >> performance improvement) when we don't have any theories as to why the
> >> performance improvement happens, or why the improvement doesn't apply
> to Ivy
> >> Bridge.  In my experience, making changes to the codebase without
> >> understanding why they improve things almost always leads to
> improvements
> >> that are brittle, since it's likely that the true source of the
> improvement
> >> is a coincidence that will be wiped out by some future change (or won't
> be
> >> relevant to client programs other than this particular benchmark).
>  Having a
> >> theory as to why the performance improvement happens would help us be
> >> confident that we're applying the right fix under the right
> circumstances.
> > That is how I feel as I've mentioned.  I am really glad to have the
> > discussion.  I have done some experiments actually.  It is just that
> > those experiments only tell me what theories are likely to be wrong.
> > They could not tell me if a theory is right.
> Do the experiments make sense to you?  What other experiments do you
> want to see conducted?
>
> It could be hard to get direct proof without knowing the internal working..
>

Sorry for the slow reply.  We had some internal discussions with the
hardware architects about this, and it appears that the first theory is
correct: Haswell has an optimization in its sample_d processing which
allows it to assume that all pixels in a 2x2 subspan will resolve to the
same LOD provided that all the gradients in the 2x2 subspan are
sufficiently similar to each other.  There's a register called SAMPLER_MODE
which determines how similar the gradients have to be in order to trigger
the optimization.  It can be set to values between 0 and 0x1f, where 0 (the
default) means "only trigger the optimization if the gradients are exactly
equal" and 0x1f means "trigger the optimization as frequently as
possible".  Obviously triggering the optimization more often reduces the
quality of the rendered output slightly, because it forces all pixels
within a 2x2 subspan to sample from the same LOD.

We believe that setting this register to 0x1f should produce an equivalent
speed-up to your patch, without sacrificing the quality of d/dx when it is
used for other (non-sample_d) purposes.  This approach would have the
additional advantage that the benefit would apply to any shader that uses
the sample_d message, regardless of whether or not that shader uses d/dx
and d/dy to compute its gradients.

Would you mind trying this register to see if it produces an equivalent
performance benefit in both your micro-benchmark and Xonotic with Ultra
settings?  The register is located at address 07028h in register space MMIO
0/2/0.  When setting it, the upper 16 bits are a write mask, so to set the
register to 0 you would store 0x001f, and to set it to 0x1f you would
store 0x001f001f.

Since the SAMPLER_MODE setting allows us to trade off quality vs
performance, we're also interested to know whether a value less than 0x1f
is sufficient to produce the performance improvement in Xonotic--it would
be nice if we could find a "sweet spot" for this setting that produces the
performance improvement we need without sacrificing too much quality.

Finally, do you have any ability to see whether the Windows driver sets
this register, and if so what it sets it to?  That would provide some nice
confirmation that we aren't barking up the wrong tree here.


As a follow-up task, I'm planning to write a patch that improves the
quality of our d/dy calculation to be comparable to d/dx.  Based on our
current understanding of what's going on, I suspect that my patch may have
a s

Re: [Mesa-dev] [PATCH] implement NV_vdpau_interop v2

2013-09-20 Thread Marek Olšák
On Fri, Sep 20, 2013 at 4:34 PM, Christian König
 wrote:
> From: Christian König 
>
> v2: Actually implement interop between the gallium
> state tracker and the VDPAU backend.
>
> Signed-off-by: Christian König 
> ---
>  src/gallium/include/state_tracker/vdpau_interop.h |  49 
>  src/gallium/state_trackers/vdpau/ftab.c   |  31 ++-
>  src/gallium/state_trackers/vdpau/output.c |  11 +
>  src/gallium/state_trackers/vdpau/surface.c|  21 ++
>  src/gallium/state_trackers/vdpau/vdpau_private.h  |   6 +
>  src/mapi/glapi/gen/Makefile.am|   1 +
>  src/mapi/glapi/gen/NV_vdpau_interop.xml   |  69 ++
>  src/mapi/glapi/gen/gl_API.xml |   2 +
>  src/mapi/glapi/gen/gl_genexec.py  |   1 +
>  src/mesa/Makefile.sources |   4 +-
>  src/mesa/main/dd.h|  14 ++
>  src/mesa/main/extensions.c|   1 +
>  src/mesa/main/mtypes.h|   9 +
>  src/mesa/main/vdpau.c | 279 
> ++
>  src/mesa/main/vdpau.h |  72 ++
>  src/mesa/state_tracker/st_context.c   |   3 +
>  src/mesa/state_tracker/st_extensions.c|   1 +
>  src/mesa/state_tracker/st_vdpau.c | 170 +
>  src/mesa/state_tracker/st_vdpau.h |  42 
>  19 files changed, 777 insertions(+), 9 deletions(-)
>  create mode 100644 src/gallium/include/state_tracker/vdpau_interop.h
>  create mode 100644 src/mapi/glapi/gen/NV_vdpau_interop.xml
>  create mode 100644 src/mesa/main/vdpau.c
>  create mode 100644 src/mesa/main/vdpau.h
>  create mode 100644 src/mesa/state_tracker/st_vdpau.c
>  create mode 100644 src/mesa/state_tracker/st_vdpau.h
>
> diff --git a/src/gallium/include/state_tracker/vdpau_interop.h 
> b/src/gallium/include/state_tracker/vdpau_interop.h
> new file mode 100644
> index 000..4069d20
> --- /dev/null
> +++ b/src/gallium/include/state_tracker/vdpau_interop.h
> @@ -0,0 +1,49 @@
> +/**
> + *
> + * Copyright 2013 Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> + * "Software"), to deal in the Software without restriction, including
> + * without limitation the rights to use, copy, modify, merge, publish,
> + * distribute, sub license, and/or sell copies of the Software, and to
> + * permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> + * next paragraph) shall be included in all copies or substantial portions
> + * of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
> + * IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR
> + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> + *
> + **/
> +
> +/*
> + * Authors:
> + *  Christian König 
> + *
> + */
> +
> +#ifndef _VDPAU_INTEROP_H_
> +#define _VDPAU_INTEROP_H_
> +
> +/* driver specific functions for NV_vdpau_interop */
> +
> +#define VDP_FUNC_ID_BASE_DRIVER 0x2000
> +#define VDP_FUNC_ID_VIDEO_SURFACE_GALLIUM (VDP_FUNC_ID_BASE_DRIVER + 0)
> +#define VDP_FUNC_ID_OUTPUT_SURFACE_GALLIUM (VDP_FUNC_ID_BASE_DRIVER + 1)
> +
> +struct pipe_surface;
> +struct pipe_video_buffer;
> +
> +typedef struct pipe_video_buffer *VdpVideoSurfaceGallium(uint32_t surface);
> +typedef struct pipe_surface *VdpOutputSurfaceGallium(uint32_t surface);
> +
> +#endif /* _VDPAU_INTEROP_H_ */
> diff --git a/src/gallium/state_trackers/vdpau/ftab.c 
> b/src/gallium/state_trackers/vdpau/ftab.c
> index 81d16ec..2c84554 100644
> --- a/src/gallium/state_trackers/vdpau/ftab.c
> +++ b/src/gallium/state_trackers/vdpau/ftab.c
> @@ -26,6 +26,9 @@
>   **/
>
>  #include 
> +
> +#include "util/u_memory.h"
> +
>  #include "vdpau_private.h"
>
>  static void* ftab[67] =
> @@ -104,19 +107,31 @@ static void* ftab_winsys[1] =
> &vlVdpPresentationQueueTargetCreateX11  /* 
> VDP_FUNC_ID_PRESENTATION_QUEUE_TARGET_CREATE_X11 */
>  };
>
> +static void* ftab_driver[2] =
> +{
> +   &vlVdpVideoSurfaceGallium, /* VDP_FUNC_ID_SURFACE_GALLIUM */
> +   &vlVdpOutputSurfaceGallium /* VDP_FUNC_ID_OUTPUT_SURFACE_GALLIUM */
> +};
> +
>  boolean vlGetFuncFTAB(VdpFuncId function_id, vo

Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-20 Thread Ian Romanick
On 09/20/2013 09:50 AM, Paul Berry wrote:
> On 17 September 2013 19:54, Chia-I Wu  > wrote:
> 
> Hi Paul,
> 
> On Mon, Sep 16, 2013 at 3:46 PM, Chia-I Wu  > wrote:
> > On Sat, Sep 14, 2013 at 5:15 AM, Paul Berry
> mailto:stereotype...@gmail.com>> wrote:
> >> On 12 September 2013 22:06, Chia-I Wu  > wrote:
> >>>
> >>> From: Chia-I Wu mailto:o...@lunarg.com>>
> >>>
> >>> Consider only the top-left and top-right pixels to approximate
> DDX in a
> >>> 2x2
> >>> subspan, unless the application or the user requests a more accurate
> >>> approximation.  This results in a less accurate approximation.
>  However,
> >>> it
> >>> improves the performance of Xonotic with Ultra settings by
> 24.3879% +/-
> >>> 0.832202% (at 95.0% confidence) on Haswell.  No noticeable image
> quality
> >>> difference observed.
> >>>
> >>> No piglit gpu.tests regressions (tested with v1)
> >>>
> >>> I failed to come up with an explanation for the performance
> difference, as
> >>> the
> >>> change does not affect Ivy Bridge.  If anyone has the insight,
> please
> >>> kindly
> >>> enlighten me.  Performance differences may also be observed on
> other games
> >>> that call textureGrad and dFdx.
> >>>
> >>> v2: Honor GL_FRAGMENT_SHADER_DERIVATIVE_HINT and add a drirc option.
> >>> Update
> >>> comments.
> >>
> >>
> >> I'm not entirely comfortable making a change that has a known
> negative
> >> impact on computational accuracy (even one that leads to such an
> impressive
> >> performance improvement) when we don't have any theories as to
> why the
> >> performance improvement happens, or why the improvement doesn't
> apply to Ivy
> >> Bridge.  In my experience, making changes to the codebase without
> >> understanding why they improve things almost always leads to
> improvements
> >> that are brittle, since it's likely that the true source of the
> improvement
> >> is a coincidence that will be wiped out by some future change (or
> won't be
> >> relevant to client programs other than this particular
> benchmark).  Having a
> >> theory as to why the performance improvement happens would help us be
> >> confident that we're applying the right fix under the right
> circumstances.
> > That is how I feel as I've mentioned.  I am really glad to have the
> > discussion.  I have done some experiments actually.  It is just that
> > those experiments only tell me what theories are likely to be wrong.
> > They could not tell me if a theory is right.
> Do the experiments make sense to you?  What other experiments do you
> want to see conducted?
> 
> It could be hard to get direct proof without knowing the internal
> working..
> 
> 
> Sorry for the slow reply.  We had some internal discussions with the
> hardware architects about this, and it appears that the first theory is
> correct: Haswell has an optimization in its sample_d processing which
> allows it to assume that all pixels in a 2x2 subspan will resolve to the
> same LOD provided that all the gradients in the 2x2 subspan are
> sufficiently similar to each other.  There's a register called
> SAMPLER_MODE which determines how similar the gradients have to be in
> order to trigger the optimization.  It can be set to values between 0
> and 0x1f, where 0 (the default) means "only trigger the optimization if
> the gradients are exactly equal" and 0x1f means "trigger the
> optimization as frequently as possible".  Obviously triggering the
> optimization more often reduces the quality of the rendered output
> slightly, because it forces all pixels within a 2x2 subspan to sample
> from the same LOD.
> 
> We believe that setting this register to 0x1f should produce an
> equivalent speed-up to your patch, without sacrificing the quality of
> d/dx when it is used for other (non-sample_d) purposes.  This approach
> would have the additional advantage that the benefit would apply to any
> shader that uses the sample_d message, regardless of whether or not that
> shader uses d/dx and d/dy to compute its gradients.
> 
> Would you mind trying this register to see if it produces an equivalent
> performance benefit in both your micro-benchmark and Xonotic with Ultra
> settings?  The register is located at address 07028h in register space
> MMIO 0/2/0.  When setting it, the upper 16 bits are a write mask, so to
> set the register to 0 you would store 0x001f, and to set it to 0x1f
> you would store 0x001f001f.
> 
> Since the SAMPLER_MODE setting allows us to trade off quality vs
> performance, we're also interested to know whether a value less than
> 0x1f is sufficient to produce the performance improvement in Xonotic--it
> would be nice if we could find a "sweet

Re: [Mesa-dev] [PATCH] llvmpipe: Fix rendering to PIPE_FORMAT_R10G10B10A2_UNORM.

2013-09-20 Thread Roland Scheidegger
Am 20.09.2013 13:59, schrieb jfons...@vmware.com:
> From: José Fonseca 
> 
> We must take rounding in consideration when re-scaling to narrow
> normalized channels, such as 2-bit normalized alpha.
> ---
>  src/gallium/drivers/llvmpipe/lp_state_fs.c | 84 
> +++---
>  1 file changed, 78 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
> b/src/gallium/drivers/llvmpipe/lp_state_fs.c
> index 07e6685..875a3cf 100644
> --- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
> +++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
> @@ -876,7 +876,22 @@ lp_blend_type_from_format_desc(const struct 
> util_format_description *format_desc
>  
>  
>  /**
> - * Scale a normalized value from src_bits to dst_bits
> + * Scale a normalized value from src_bits to dst_bits.
> + *
> + * The exact calculation is
> + *
> + *dst = iround(src * dst_mask / src_mask)
> + *
> + *  or with integer rounding
> + *
> + *dst = src * (2*dst_mask + sign(src)*src_mask) / (2*src_mask)
> + *
> + *  where
> + *
> + *src_mask = (1 << src_bits) - 1
> + *dst_mask = (1 << dst_bits) - 1
> + *
> + * but we try to avoid division and multiplication through shifts.
>   */
>  static INLINE LLVMValueRef
>  scale_bits(struct gallivm_state *gallivm,
> @@ -889,11 +904,68 @@ scale_bits(struct gallivm_state *gallivm,
> LLVMValueRef result = src;
>  
> if (dst_bits < src_bits) {
> -  /* Scale down by LShr */
> -  result = LLVMBuildLShr(builder,
> - src,
> - lp_build_const_int_vec(gallivm, src_type, 
> src_bits - dst_bits),
> - "");
> +  int delta_bits = src_bits - dst_bits;
> +
> +  if (delta_bits <= dst_bits) {
> + /*
> +  * Approximate the rescaling with a single shift.
> +  *
> +  * This gives the wrong rounding.
> +  */
> +
> + result = LLVMBuildLShr(builder,
> +src,
> +lp_build_const_int_vec(gallivm, src_type, 
> delta_bits),
> +"");
> +
> +  } else {
> + /*
> +  * Try more accurate rescaling.
> +  */
> +
> + /*
> +  * Drop the least significant bits to make space for the 
> multiplication.
> +  *
> +  * XXX: A better approach would be to use a wider integer type as 
> intermediate.  But
> +  * this is enough to convert alpha from 16bits -> 2 when rendering 
> to
> +  * PIPE_FORMAT_R10G10B10A2_UNORM.
> +  */
> + result = LLVMBuildLShr(builder,
> +src,
> +lp_build_const_int_vec(gallivm, src_type, 
> dst_bits),
> +"");
> +
> +
> + result = LLVMBuildMul(builder,
> +   result,
> +   lp_build_const_int_vec(gallivm, src_type, 
> (1LL << dst_bits) - 1),
> +   "");
> +
> + /*
> +  * Add a rounding term before the division.
> +  *
> +  * TODO: Handle signed integers too.
> +  */
> + if (!src_type.sign) {
> +result = LLVMBuildAdd(builder,
> +  result,
> +  lp_build_const_int_vec(gallivm, src_type, 
> (1LL << (delta_bits - 1))),
> +  "");
> + }
> +
> + /*
> +  * Approximate the division by src_mask with a src_bits shift.
> +  *
> +  * Given the src has already been shifted by dst_bits, all we need
> +  * to do is to shift by the difference.
> +  */
> +
> + result = LLVMBuildLShr(builder,
> +result,
> +lp_build_const_int_vec(gallivm, src_type, 
> delta_bits),
> +"");
> +  }
> +
> } else if (dst_bits > src_bits) {
>/* Scale up bits */
>int db = dst_bits - src_bits;
> 

Looks good to me, though that up/downscaling for A2 looks a bit
complicated but we can't avoid it.

Roland
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] mesa: remove stale comment

2013-09-20 Thread Dylan Noblesmith
This line stopped making sense in the great sed
replace of commit f9995b30756140724f41daf963fa06167912be7f
---
 src/mesa/main/mtypes.h |2 --
 1 file changed, 2 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 6d700ec..3c8b018 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3475,8 +3475,6 @@ struct gl_uniform_buffer_binding
  * OpenGL state is contained in this structure.
  * Think of this as a base class from which device drivers will derive
  * sub classes.
- *
- * The struct gl_context typedef names this structure.
  */
 struct gl_context
 {
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] mesa: remove outdated comment

2013-09-20 Thread Dylan Noblesmith
No such argument exists since this commit:

commit 92f3fca0ea429dcf07123e63447449db53308266
Author: Ian Romanick 
AuthorDate: Sun Aug 21 17:23:58 2011 -0700
Commit: Ian Romanick 
CommitDate: Tue Aug 23 14:52:09 2011 -0700

mesa: Remove target parameter from dd_function_table::BufferSubData
---
 src/mesa/main/bufferobj.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
index b22340f..312ffb7 100644
--- a/src/mesa/main/bufferobj.c
+++ b/src/mesa/main/bufferobj.c
@@ -440,7 +440,6 @@ _mesa_buffer_data( struct gl_context *ctx, GLenum
target, GLsizeiptrARB size,
  * Note that all GL error checking will have been done already.
  *
  * \param ctx GL context.
- * \param target  Buffer object target on which to operate.
  * \param offset  Offset of the first byte to be modified.
  * \param sizeSize, in bytes, of the data range.
  * \param dataPointer to the data to store in the buffer object.
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] mesa: remove handcounted magic number

2013-09-20 Thread Dylan Noblesmith
Also make it a compile-time error with STATIC_ASSERT.
---
 src/mesa/main/teximage.c |   33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index b719fc8..7a1d808 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -661,22 +661,29 @@ _mesa_delete_texture_image(struct gl_context *ctx,
 GLboolean
 _mesa_is_proxy_texture(GLenum target)
 {
+   unsigned i;
+   static const GLenum targets[] = {
+  GL_PROXY_TEXTURE_1D,
+  GL_PROXY_TEXTURE_2D,
+  GL_PROXY_TEXTURE_3D,
+  GL_PROXY_TEXTURE_CUBE_MAP,
+  GL_PROXY_TEXTURE_RECTANGLE,
+  GL_PROXY_TEXTURE_1D_ARRAY,
+  GL_PROXY_TEXTURE_2D_ARRAY,
+  GL_PROXY_TEXTURE_CUBE_MAP_ARRAY,
+  GL_PROXY_TEXTURE_2D_MULTISAMPLE,
+  GL_PROXY_TEXTURE_2D_MULTISAMPLE_ARRAY
+   };
/*
-* NUM_TEXTURE_TARGETS should match number of terms below, except
there's no
+* NUM_TEXTURE_TARGETS should match number of terms above, except
there's no
 * proxy for GL_TEXTURE_BUFFER and GL_TEXTURE_EXTERNAL_OES.
 */
-   assert(NUM_TEXTURE_TARGETS == 10 + 2);
-
-   return (target == GL_PROXY_TEXTURE_1D ||
-   target == GL_PROXY_TEXTURE_2D ||
-   target == GL_PROXY_TEXTURE_3D ||
-   target == GL_PROXY_TEXTURE_CUBE_MAP_ARB ||
-   target == GL_PROXY_TEXTURE_RECTANGLE_NV ||
-   target == GL_PROXY_TEXTURE_1D_ARRAY_EXT ||
-   target == GL_PROXY_TEXTURE_2D_ARRAY_EXT ||
-   target == GL_PROXY_TEXTURE_CUBE_MAP_ARRAY ||
-   target == GL_PROXY_TEXTURE_2D_MULTISAMPLE ||
-   target == GL_PROXY_TEXTURE_2D_MULTISAMPLE_ARRAY);
+   STATIC_ASSERT(NUM_TEXTURE_TARGETS == Elements(targets) + 2);
+
+   for (i = 0; i < Elements(targets); ++i)
+  if (target == targets[i])
+ return GL_TRUE;
+   return GL_FALSE;
 }


-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH] i965/fs: Improve accuracy of dFdy().

2013-09-20 Thread Paul Berry
Previously, we computed dFdy() using the following instruction:

  add(8) dst<1>F src<4,4,0)F -src.2<4,4,0>F { align1 1Q }

That had the disadvantage that it computed the same value for all 4
pixels of a 2x2 subspan, which meant that it was less accurate than
dFdx().  This patch changes it to the following instruction:

  add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }

Which has comparable accuracy to dFdx().

Unfortunately, for some reason the SIMD16 version of this instruction:

  add(16) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1H }

Doesn't seem to work reliably (presumably the hardware designers never
validated the combination of align16 mode with compressed
instructions), so we unroll it to:

  add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
  add(8) (dst+1)<1>F (src+1)<4,4,1>.xyxyF -(src+1)<4,4,1>.zwzwF { align16 2Q }
---

I'm sending this patch out as an RFC because it increases instruction
count on SIMD16 shaders.  I believe it's worth it to get the increased
accuracy.  Also I believe the cost is minimal, since we're replacing a
single add(16) with two add(8)'s, so the total number of instructions
dispatched by the EU is the same.  But I'd be interested in hearing
others' opinions.

Also, we shouldn't land this patch until we've come to a resolution on
"i965/hsw: compute DDX in a subspan based only on top row".

I'll be following up shortly with a piglit test that demonstrates the
accuracy improvement.

 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 34 +++---
 src/mesa/drivers/dri/i965/brw_reg.h|  1 +
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 7ce42c4..7cb1f45 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -556,10 +556,8 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
  *
  * For DDX, it ends up being easy: width = 2, horiz=0 gets us the same result
  * for each pair, and vertstride = 2 jumps us 2 elements after processing a
- * pair. But for DDY, it's harder, as we want to produce the pairs swizzled
- * between each other.  We could probably do it like ddx and swizzle the right
- * order later, but bail for now and just produce
- * ((ss0.tl - ss0.bl)x4 (ss1.tl - ss1.bl)x4)
+ * pair.  For DDY, we need to use ALIGN16 mode since it's capable of doing the
+ * appropriate swizzling.
  */
 void
 fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg 
src)
@@ -591,18 +589,36 @@ fs_generator::generate_ddy(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
 BRW_REGISTER_TYPE_F,
 BRW_VERTICAL_STRIDE_4,
 BRW_WIDTH_4,
-BRW_HORIZONTAL_STRIDE_0,
-BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
-   struct brw_reg src1 = brw_reg(src.file, src.nr, 2,
+BRW_HORIZONTAL_STRIDE_1,
+BRW_SWIZZLE_XYXY, WRITEMASK_XYZW);
+   struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
 BRW_REGISTER_TYPE_F,
 BRW_VERTICAL_STRIDE_4,
 BRW_WIDTH_4,
-BRW_HORIZONTAL_STRIDE_0,
-BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
+BRW_HORIZONTAL_STRIDE_1,
+BRW_SWIZZLE_ZWZW, WRITEMASK_XYZW);
+   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_16);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
if (negate_value)
   brw_ADD(p, dst, src1, negate(src0));
else
   brw_ADD(p, dst, src0, negate(src1));
+   if (dispatch_width == 16) {
+  /* For some reason, instruction compression doesn't seem to work
+   * properly with ALIGN16 mode, so when doing a 16-wide dispatch, just
+   * manually unroll to two instructions.
+   */
+  brw_set_compression_control(p, BRW_COMPRESSION_2NDHALF);
+  src0 = sechalf(src0);
+  src1 = sechalf(src1);
+  dst = sechalf(dst);
+  if (negate_value)
+ brw_ADD(p, dst, src1, negate(src0));
+  else
+ brw_ADD(p, dst, src0, negate(src1));
+   }
+   brw_pop_insn_state(p);
 }
 
 void
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index 6df3366..3ee3543 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -77,6 +77,7 @@ extern "C" {
 #define BRW_SWIZZLE_  BRW_SWIZZLE4(2,2,2,2)
 #define BRW_SWIZZLE_  BRW_SWIZZLE4(3,3,3,3)
 #define BRW_SWIZZLE_XYXY  BRW_SWIZZLE4(0,1,0,1)
+#define BRW_SWIZZLE_ZWZW  BRW_SWIZZLE4(2,3,2,3)
 
 static inline bool
 brw_is_single_value_swizzle(int swiz)
-- 
1.8.4

___

Re: [Mesa-dev] [PATCH] i965/gen7.5: Fix missing Shader Channel Select entries on Haswell

2013-09-20 Thread Kenneth Graunke
On 09/20/2013 03:56 AM, Abdiel Janulgue wrote:
> Probably non-intentional, but the SURFACE_STATE setup refactoring
> for buffer surfaces had missed the scs bits when creating constant
> surface states.
> 
> Fixes broken GLB 2.5 on Haswell where the knight's textures are missing
> 
> Signed-off-by: Abdiel Janulgue 
> ---
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index 6938b1a..7571cbf 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -249,6 +249,13 @@ gen7_emit_buffer_surface_state(struct brw_context *brw,
>  
> surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS);
>  
> +   if (brw->is_haswell) {
> +  surf[7] |= (SET_FIELD(HSW_SCS_RED,   GEN7_SURFACE_SCS_R) |
> +  SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) |
> +  SET_FIELD(HSW_SCS_BLUE,  GEN7_SURFACE_SCS_B) |
> +  SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A));
> +   }
> +
> /* Emit relocation to surface contents */
> if (bo) {
>drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
> 

Crud, I must've forgotten to test this.  Ah, and it probably matters
because we do pull constants via the sampler, rather than the data port.

Thanks for the fix.

Reviewed-by: Kenneth Graunke 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-20 Thread Kenneth Graunke
On 09/20/2013 08:30 AM, Ian Romanick wrote:
> On 09/20/2013 09:50 AM, Paul Berry wrote:
[snip]
>> Since the SAMPLER_MODE setting allows us to trade off quality vs
>> performance, we're also interested to know whether a value less than
>> 0x1f is sufficient to produce the performance improvement in Xonotic--it
>> would be nice if we could find a "sweet spot" for this setting that
>> produces the performance improvement we need without sacrificing too
>> much quality.
> 
> How about if we just give a driconf option to adjust it.  Then gamers
> can make their own choice.  For applications where it know it makes a
> big difference, we can provide a default non-0 value in the system driconf.

Because you can't (yet) program registers from userspace unless you're root.

I would like to use the same default value as Windows.  I'm fine with
making it tunable beyond that.

--Ken
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] implement NV_vdpau_interop v2

2013-09-20 Thread Christian König

Am 20.09.2013 17:39, schrieb Marek Olšák:

On Fri, Sep 20, 2013 at 4:34 PM, Christian König
 wrote:

[SNIP]

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 34615e3..7070812 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -332,6 +332,7 @@ static const struct extension extension_table[] = {
 { "GL_NV_texture_barrier",  o(NV_texture_barrier), 
 GL, 2009 },
 { "GL_NV_texture_env_combine4", 
o(NV_texture_env_combine4), GLL,1999 },
 { "GL_NV_texture_rectangle",o(NV_texture_rectangle),   
 GLL,2000 },
+   { "GL_NV_vdpau_interop",o(NV_vdpau_interop),
GLL,2010 },

GLL means the extension will only be exposed in a legacy context
(GL<=3.0). I think you meant GL instead.


Ah, thx. I already wondered what the difference is.

Any more comments? I never worked with that part of the stack, so the 
code is completely new to me.


Christian.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] mesa: remove stale comment

2013-09-20 Thread Ian Romanick
How did you send these patches?  They came formatted... strangely.
Saving the e-mail as a file an using git-am resulted in a commit message:

commit 97351aa26b74c33a102af8bf1e6bcfcd706d7a10
Author: Dylan Noblesmith 
Date:   Fri Sep 20 11:55:19 2013 -0400

mesa: remove stale comment

--001a11c23df6b47ea604e6d2b370
Content-Type: text/plain; charset=ISO-8859-1

This line stopped making sense in the great sed
replace of commit f9995b30756140724f41daf963fa06167912be7f

On 09/20/2013 10:55 AM, Dylan Noblesmith wrote:
> 
> This line stopped making sense in the great sed
> replace of commit f9995b30756140724f41daf963fa06167912be7f
> ---
>  src/mesa/main/mtypes.h |2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 6d700ec..3c8b018 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3475,8 +3475,6 @@ struct gl_uniform_buffer_binding
>   * OpenGL state is contained in this structure.
>   * Think of this as a base class from which device drivers will derive
>   * sub classes.
> - *
> - * The struct gl_context typedef names this structure.
>   */
>  struct gl_context
>  {
> -- 
> 1.7.9.5
> 
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] mesa: Add core support for the GL_AMD_performance_monitor extension.

2013-09-20 Thread Kenneth Graunke
On 09/20/2013 07:55 AM, Brian Paul wrote:
> On Thu, Sep 19, 2013 at 5:27 PM, Kenneth Graunke  
> wrote:
[snip]
>> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
>> index 6d700ec..70dba6e 100644
>> --- a/src/mesa/main/mtypes.h
>> +++ b/src/mesa/main/mtypes.h
>> @@ -1778,6 +1778,89 @@ struct gl_transform_feedback_state
>>
>>
>>  /**
>> + * A "performance monitor" as described in AMD_performance_monitor.
>> + */
>> +struct gl_perf_monitor_object
>> +{
>> +   GLboolean Active;
>> +
>> +   /**
>> +* A list of groups with currently active counters.
>> +*
>> +* ActiveGroups[g] == n if there are n counters active from group 'g'.
>> +*/
>> +   unsigned *ActiveGroups;
> 
> GLuint?
> 

I chose 'unsigned' over 'GLuint' here since it's only used internally,
and never exposed by the API.  But I can switch if you prefer that.

>> +
>> +   /**
>> +* An array of bitsets, subscripted by group ID, then indexed by counter 
>> ID.
>> +*
>> +* Checking whether counter 'c' in group 'g' is active can be done via:
>> +*
>> +*BITSET_TEST(ActiveCounters[g], c)
>> +*/
>> +   GLuint **ActiveCounters;
> 
> GLbitfield?

The type here is actually BITSET_WORD (from bitset.h), which is a
#define for GLuint.  Including bitset.h from mtypes.h led to all kinds
of problems, so I just used GLuint.

It seems like we could do something better, but I'm not sure what.

> 
>> +};
>> +
>> +
>> +union gl_perf_monitor_counter_value
>> +{
>> +   float f;
>> +   uint64_t u64;
>> +   uint32_t u32;
>> +};
>> +
>> +
>> +struct gl_perf_monitor_counter
>> +{
>> +   /** Human readable name for the counter. */
>> +   const char *Name;
>> +
>> +   /**
>> +* Data type of the counter.  Valid values are FLOAT, UNSIGNED_INT,
>> +* UNSIGNED_INT64_AMD, and PERCENTAGE_AMD.
>> +*/
>> +   GLenum Type;
>> +
>> +   /** Minimum counter value. */
>> +   union gl_perf_monitor_counter_value Minimum;
>> +
>> +   /** Maximum counter value. */
>> +   union gl_perf_monitor_counter_value Maximum;
>> +};
>> +
>> +
>> +struct gl_perf_monitor_group
>> +{
>> +   /** Human readable name for the group. */
>> +   const char *Name;
>> +
>> +   /**
>> +* Maximum number of counters in this group which can be active at the
>> +* same time.
>> +*/
>> +   GLint MaxActiveCounters;
> 
> GLuint?

That would make sense, but for some reason the AMD_performance_monitor
extension exposes this value as a GLint:

void GetPerfMonitorCountersAMD(uint group, int *numCounters,
   int *maxActiveCounters, sizei countersSize,
   uint *counters)

I figured it should probably match...

>> +
>> +   /** Array of counters within this group. */
>> +   const struct gl_perf_monitor_counter *Counters;
>> +   GLint NumCounters;
>> +};
>> +
>> +
>> +/**
>> + * Context state for AMD_performance_monitor.
>> + */
>> +struct gl_perf_monitor_state
>> +{
>> +   /** Array of performance monitor groups (indexed by group ID) */
>> +   const struct gl_perf_monitor_group *Groups;
>> +   GLint NumGroups;
> 
> GLuint?

Likewise, the extension exposes this as a GLint:

void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize,
 uint *groups)

I don't know why...GLuint would have made more sense.  Of course, nobody
is going to have enough groups for it to make a difference :)

> 
>> +
>> +   /** The table of all performance monitors. */
>> +   struct _mesa_HashTable *Monitors;
>> +};
>> +
>> +
>> +/**
>>   * Names of the various vertex/fragment program register files, etc.
>>   *
>>   * NOTE: first four tokens must fit into 2 bits (see t_vb_arbprogram.c)
>> @@ -3153,6 +3236,7 @@ struct gl_extensions
>> GLboolean EXT_vertex_array_bgra;
>> GLboolean OES_standard_derivatives;
>> /* vendor extensions */
>> +   GLboolean AMD_performance_monitor;
>> GLboolean AMD_seamless_cubemap_per_texture;
>> GLboolean AMD_vertex_shader_layer;
>> GLboolean APPLE_object_purgeable;
>> @@ -3618,6 +3702,8 @@ struct gl_context
>>
>> struct gl_transform_feedback_state TransformFeedback;
>>
>> +   struct gl_perf_monitor_state PerfMonitor;
>> +
>> struct gl_buffer_object *CopyReadBuffer; /**< GL_ARB_copy_buffer */
>> struct gl_buffer_object *CopyWriteBuffer; /**< GL_ARB_copy_buffer */
>>
>> diff --git a/src/mesa/main/performance_monitor.c 
>> b/src/mesa/main/performance_monitor.c
>> new file mode 100644
>> index 000..0782219
>> --- /dev/null
>> +++ b/src/mesa/main/performance_monitor.c
>> @@ -0,0 +1,606 @@
>> +/*
>> + * Copyright © 2012 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the

[Mesa-dev] XDC schedule published

2013-09-20 Thread Ian Romanick
"Final" schedule is now available:

http://www.x.org/wiki/Events/XDC2013/Program/
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] mesa: Add core support for the GL_AMD_performance_monitor extension.

2013-09-20 Thread Ian Romanick
On 09/20/2013 04:42 PM, Kenneth Graunke wrote:
> On 09/20/2013 07:55 AM, Brian Paul wrote:
>> On Thu, Sep 19, 2013 at 5:27 PM, Kenneth Graunke  
>> wrote:
> [snip]
>>> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
>>> index 6d700ec..70dba6e 100644
>>> --- a/src/mesa/main/mtypes.h
>>> +++ b/src/mesa/main/mtypes.h
>>> @@ -1778,6 +1778,89 @@ struct gl_transform_feedback_state
>>>
>>>
>>>  /**
>>> + * A "performance monitor" as described in AMD_performance_monitor.
>>> + */
>>> +struct gl_perf_monitor_object
>>> +{
>>> +   GLboolean Active;
>>> +
>>> +   /**
>>> +* A list of groups with currently active counters.
>>> +*
>>> +* ActiveGroups[g] == n if there are n counters active from group 'g'.
>>> +*/
>>> +   unsigned *ActiveGroups;
>>
>> GLuint?
>>
> 
> I chose 'unsigned' over 'GLuint' here since it's only used internally,
> and never exposed by the API.  But I can switch if you prefer that.
> 
>>> +
>>> +   /**
>>> +* An array of bitsets, subscripted by group ID, then indexed by 
>>> counter ID.
>>> +*
>>> +* Checking whether counter 'c' in group 'g' is active can be done via:
>>> +*
>>> +*BITSET_TEST(ActiveCounters[g], c)
>>> +*/
>>> +   GLuint **ActiveCounters;
>>
>> GLbitfield?
> 
> The type here is actually BITSET_WORD (from bitset.h), which is a
> #define for GLuint.  Including bitset.h from mtypes.h led to all kinds
> of problems, so I just used GLuint.
> 
> It seems like we could do something better, but I'm not sure what.

Could we make BITSET_WORD be uint32_t?  I suspect that's a type from
before we had our own inttypes.h...

>>> +};
>>> +
>>> +
>>> +union gl_perf_monitor_counter_value
>>> +{
>>> +   float f;
>>> +   uint64_t u64;
>>> +   uint32_t u32;
>>> +};
>>> +
>>> +
>>> +struct gl_perf_monitor_counter
>>> +{
>>> +   /** Human readable name for the counter. */
>>> +   const char *Name;
>>> +
>>> +   /**
>>> +* Data type of the counter.  Valid values are FLOAT, UNSIGNED_INT,
>>> +* UNSIGNED_INT64_AMD, and PERCENTAGE_AMD.
>>> +*/
>>> +   GLenum Type;
>>> +
>>> +   /** Minimum counter value. */
>>> +   union gl_perf_monitor_counter_value Minimum;
>>> +
>>> +   /** Maximum counter value. */
>>> +   union gl_perf_monitor_counter_value Maximum;
>>> +};
>>> +
>>> +
>>> +struct gl_perf_monitor_group
>>> +{
>>> +   /** Human readable name for the group. */
>>> +   const char *Name;
>>> +
>>> +   /**
>>> +* Maximum number of counters in this group which can be active at the
>>> +* same time.
>>> +*/
>>> +   GLint MaxActiveCounters;
>>
>> GLuint?
> 
> That would make sense, but for some reason the AMD_performance_monitor
> extension exposes this value as a GLint:
> 
> void GetPerfMonitorCountersAMD(uint group, int *numCounters,
>int *maxActiveCounters, sizei countersSize,
>uint *counters)
> 
> I figured it should probably match...
> 
>>> +
>>> +   /** Array of counters within this group. */
>>> +   const struct gl_perf_monitor_counter *Counters;
>>> +   GLint NumCounters;
>>> +};
>>> +
>>> +
>>> +/**
>>> + * Context state for AMD_performance_monitor.
>>> + */
>>> +struct gl_perf_monitor_state
>>> +{
>>> +   /** Array of performance monitor groups (indexed by group ID) */
>>> +   const struct gl_perf_monitor_group *Groups;
>>> +   GLint NumGroups;
>>
>> GLuint?
> 
> Likewise, the extension exposes this as a GLint:
> 
> void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize,
>  uint *groups)
> 
> I don't know why...GLuint would have made more sense.  Of course, nobody
> is going to have enough groups for it to make a difference :)
> 
>>
>>> +
>>> +   /** The table of all performance monitors. */
>>> +   struct _mesa_HashTable *Monitors;
>>> +};
>>> +
>>> +
>>> +/**
>>>   * Names of the various vertex/fragment program register files, etc.
>>>   *
>>>   * NOTE: first four tokens must fit into 2 bits (see t_vb_arbprogram.c)
>>> @@ -3153,6 +3236,7 @@ struct gl_extensions
>>> GLboolean EXT_vertex_array_bgra;
>>> GLboolean OES_standard_derivatives;
>>> /* vendor extensions */
>>> +   GLboolean AMD_performance_monitor;
>>> GLboolean AMD_seamless_cubemap_per_texture;
>>> GLboolean AMD_vertex_shader_layer;
>>> GLboolean APPLE_object_purgeable;
>>> @@ -3618,6 +3702,8 @@ struct gl_context
>>>
>>> struct gl_transform_feedback_state TransformFeedback;
>>>
>>> +   struct gl_perf_monitor_state PerfMonitor;
>>> +
>>> struct gl_buffer_object *CopyReadBuffer; /**< GL_ARB_copy_buffer */
>>> struct gl_buffer_object *CopyWriteBuffer; /**< GL_ARB_copy_buffer */
>>>
>>> diff --git a/src/mesa/main/performance_monitor.c 
>>> b/src/mesa/main/performance_monitor.c
>>> new file mode 100644
>>> index 000..0782219
>>> --- /dev/null
>>> +++ b/src/mesa/main/performance_monitor.c
>>> @@ -0,0 +1,606 @@
>>> +/*
>>> + * Copyright © 2012 Intel Corporation
>>> + *
>>> + * Permission is hereby granted, free of charge, to any pe

[Mesa-dev] [Bug 57018] syntax error near unexpected token `[2.4.2]'

2013-09-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=57018

--- Comment #3 from Antonio Alecrim Jr  ---
git sha1: 1569b3e536da9337a28a16d0cc6ed07043bf094b

To reproduce this "bug":
mesa> ./autogen.sh
...
src/egl/drivers/dri2/Makefile.am:37: error: Libtool library used but 'LIBTOOL'
is undefined
...
autoreconf: automake failed with exit status: 1
mesa> ./configure
...
./configure: line 5129: syntax error near unexpected token `2.2'
./configure: line 5129: `LT_PREREQ(2.2)'

To fix this "bug" I commented these problematic lines and installed all
required packages.
To fix "autogen.sh" error I installed "Libtool":
> git clean -xfd
> ./autogen.sh
> ./configure

And everything works fine.
Maybe is just an environment problem.

Can you check if this solves your problem?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: Define isnormal and copysign for MSVC to fix build.

2013-09-20 Thread Vinson Lee
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69541
Signed-off-by: Vinson Lee 
---
 src/glsl/ir_constant_expression.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/glsl/ir_constant_expression.cpp 
b/src/glsl/ir_constant_expression.cpp
index 4579ef2..805e1f3 100644
--- a/src/glsl/ir_constant_expression.cpp
+++ b/src/glsl/ir_constant_expression.cpp
@@ -40,6 +40,18 @@
 #include "glsl_types.h"
 #include "program/hash_table.h"
 
+#if defined(_MSC_VER) && (_MSC_VER < 1800)
+static int isnormal(double x) {
+   return _fpclass(x) == _FPCLASS_NN || _fpclass(x) == _FPCLASS_PN;
+}
+#endif
+
+#if defined(_MSC_VER)
+static double copysign(double x, double y) {
+   return _copysign(x, y);
+}
+#endif
+
 static float
 dot(ir_constant *op0, ir_constant *op1)
 {
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Define isnormal and copysign for MSVC to fix build.

2013-09-20 Thread Matt Turner
On Fri, Sep 20, 2013 at 6:03 PM, Vinson Lee  wrote:
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69541
> Signed-off-by: Vinson Lee 
> ---
>  src/glsl/ir_constant_expression.cpp | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/src/glsl/ir_constant_expression.cpp 
> b/src/glsl/ir_constant_expression.cpp
> index 4579ef2..805e1f3 100644
> --- a/src/glsl/ir_constant_expression.cpp
> +++ b/src/glsl/ir_constant_expression.cpp
> @@ -40,6 +40,18 @@
>  #include "glsl_types.h"
>  #include "program/hash_table.h"
>
> +#if defined(_MSC_VER) && (_MSC_VER < 1800)
> +static int isnormal(double x) {
> +   return _fpclass(x) == _FPCLASS_NN || _fpclass(x) == _FPCLASS_PN;
> +}
> +#endif
> +
> +#if defined(_MSC_VER)
> +static double copysign(double x, double y) {
> +   return _copysign(x, y);
> +}
> +#endif
> +
>  static float
>  dot(ir_constant *op0, ir_constant *op1)
>  {
> --
> 1.8.1.2

Thanks Vinson,

Acked-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/7] mesa: Shrink the size of the enum string lookup struct.

2013-09-20 Thread Eric Anholt
Since it's only used for debug information, we can misalign the struct and
save the disk space.  Another 19k on a 64-bit build.
---
 src/mapi/glapi/gen/gl_enums.py | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mapi/glapi/gen/gl_enums.py b/src/mapi/glapi/gen/gl_enums.py
index e1ab600..6f54b06 100644
--- a/src/mapi/glapi/gen/gl_enums.py
+++ b/src/mapi/glapi/gen/gl_enums.py
@@ -47,8 +47,8 @@ class PrintGlEnums(gl_XML.gl_print_base):
 print '#include "main/imports.h"'
 print '#include "main/mtypes.h"'
 print ''
-print 'typedef struct {'
-print '   size_t offset;'
+print 'typedef struct __attribute__((__packed__)) {'
+print '   uint16_t offset;'
 print '   int n;'
 print '} enum_elt;'
 print ''
@@ -78,6 +78,8 @@ static char token_tmp[20];
 
 const char *_mesa_lookup_enum_by_nr( int nr )
 {
+   STATIC_ASSERT(sizeof(enum_string_table) < (1 << 16));
+
enum_elt *elt = _mesa_bsearch(& nr, enum_string_table_offsets,
  Elements(enum_string_table_offsets),
  sizeof(enum_string_table_offsets[0]),
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/7] mesa: Remove _mesa_lookup_enum_by_name().

2013-09-20 Thread Eric Anholt
It's been unused for a long time.  I stopped digging through git history
as of 2009.
---
 src/mapi/glapi/gen/gl_enums.py   | 29 -
 src/mesa/main/enums.h|  2 --
 src/mesa/main/tests/enum_strings.cpp | 32 
 3 files changed, 63 deletions(-)

diff --git a/src/mapi/glapi/gen/gl_enums.py b/src/mapi/glapi/gen/gl_enums.py
index 806d384..d8f1045 100644
--- a/src/mapi/glapi/gen/gl_enums.py
+++ b/src/mapi/glapi/gen/gl_enums.py
@@ -59,21 +59,6 @@ class PrintGlEnums(gl_XML.gl_print_base):
 typedef int (*cfunc)(const void *, const void *);
 
 /**
- * Compare a key name to an element in the \c all_enums array.
- *
- * \c bsearch always passes the key as the first parameter and the pointer
- * to the array element as the second parameter.  We can elimiate some
- * extra work by taking advantage of that fact.
- *
- * \param a  Pointer to the desired enum name.
- * \param b  Pointer to an element of the \c all_enums array.
- */
-static int compar_name( const char *a, const enum_elt *b )
-{
-   return strcmp( a, & enum_string_table[ b->offset ] );
-}
-
-/**
  * Compare a key enum value to an element in the \c all_enums array.
  *
  * \c bsearch always passes the key as the first parameter and the pointer
@@ -147,20 +132,6 @@ _mesa_lookup_prim_by_nr(GLuint nr)
 }
 
 
-int _mesa_lookup_enum_by_name( const char *symbol )
-{
-   enum_elt * f = NULL;
-
-   if ( symbol != NULL ) {
-  f = (enum_elt *) _mesa_bsearch(symbol, all_enums,
- Elements(all_enums),
- sizeof( enum_elt ),
- (cfunc) compar_name);
-   }
-
-   return (f != NULL) ? f->n : -1;
-}
-
 """
 return
 
diff --git a/src/mesa/main/enums.h b/src/mesa/main/enums.h
index 556c1db..36c053d 100644
--- a/src/mesa/main/enums.h
+++ b/src/mesa/main/enums.h
@@ -44,6 +44,4 @@ extern const char *_mesa_lookup_enum_by_nr( int nr );
  */
 const char *_mesa_lookup_prim_by_nr( unsigned nr );
 
-extern int _mesa_lookup_enum_by_name( const char *symbol );
-
 #endif
diff --git a/src/mesa/main/tests/enum_strings.cpp 
b/src/mesa/main/tests/enum_strings.cpp
index 1dae60f..c8df819 100644
--- a/src/mesa/main/tests/enum_strings.cpp
+++ b/src/mesa/main/tests/enum_strings.cpp
@@ -34,7 +34,6 @@ struct enum_info {
 };
 
 extern const struct enum_info everything[];
-extern const struct enum_info alternate_names[];
 
 TEST(EnumStrings, LookUpByNumber)
 {
@@ -44,25 +43,6 @@ TEST(EnumStrings, LookUpByNumber)
}
 }
 
-TEST(EnumStrings, LookUpByName)
-{
-   for (unsigned i = 0; everything[i].name != NULL; i++) {
-  EXPECT_EQ(everything[i].value,
-   _mesa_lookup_enum_by_name(everything[i].name));
-   }
-}
-
-TEST(EnumStrings, LookUpByDuplicateName)
-{
-   /* Some enum values have multiple names.  Try to find some values
-* by alternate names.
-*/
-   for (unsigned i = 0; alternate_names[i].name != NULL; i++) {
-  EXPECT_EQ(alternate_names[i].value,
-   _mesa_lookup_enum_by_name(alternate_names[i].name));
-   }
-}
-
 TEST(EnumStrings, LookUpUnknownNumber)
 {
EXPECT_STRCASEEQ("0x", _mesa_lookup_enum_by_nr(0x));
@@ -1866,15 +1846,3 @@ const struct enum_info everything[] = {
{ 0x19262, "GL_RASTER_POSITION_UNCLIPPED_IBM" },
{ 0, NULL }
 };
-
-const struct enum_info alternate_names[] = {
-   { 0x8513, "GL_TEXTURE_CUBE_MAP_ARB" },
-   { 0x8514, "GL_TEXTURE_BINDING_CUBE_MAP_ARB" },
-   { 0x8515, "GL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB" },
-   { 0x8516, "GL_TEXTURE_CUBE_MAP_NEGATIVE_X_ARB" },
-   { 0x8517, "GL_TEXTURE_CUBE_MAP_POSITIVE_Y_ARB" },
-   { 0x8518, "GL_TEXTURE_CUBE_MAP_NEGATIVE_Y_ARB" },
-   { 0x8519, "GL_TEXTURE_CUBE_MAP_POSITIVE_Z_ARB" },
-   { 0x851A, "GL_TEXTURE_CUBE_MAP_NEGATIVE_Z_ARB" },
-   { 0, NULL }
-};
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Mesa built code size reduction

2013-09-20 Thread Eric Anholt
Since I'm going to be talking about the megadrivers idea next week at
XDC, I thought I'd look at what the impact would be of an alternate
option, and to do that I wanted to first fix up any stupid wastes of
disk space I found.  I found more than I expected.

The last commit is an important one.  I think it accounts for most of
the performance change of the series (and the performance loss that we
got from dricore).  I'm planning on comparing this series to
non-dricore and to megadrivers for the talk.

Size of mesa before:
   textdata bss dec hex filename
 872896   199123392  896200   dacc8 lib/i965_dri.so
3387494  112060   95592 3595146  36db8a lib/libdricore9.3.0-devel.so.1

After:
 872850   199123392  896154   dac9a lib/i965_dri.so
3184667   88524   95592 3368783  33674f lib/libdricore9.3.0-devel.so.1

(a little over 6% reduction)

Results of testing of CPU overhead in mesa (cairo-gl with
INTEL_NO_HW=1.  Also had a patch for reducing _mesa_DummyProgram size
that I found was broken after I piglited):

x before
+ after
+--+
|  +   x   |
|  +   x   |
|  +   x   |
|  +   xx  |
| ++ + xx  |
| ++ + xx  |
|  |
| x +  |
| x x+ x  + +  +   x   |
| x++* +x++x+++ + x  + +   x   |
| *+** +x++x+++   + *+x   +  + +  xx x |
| **xxx+**+*x++x+**   x  ++x+ * * x+ +x   ++x  + x+ xx+  + ++* xx xx x |
|+***xx+**+*x*+x***x  xxx**x*+*+*x**x*x+* ++xxx+ x** x* xx++***xx x|
|++*x***  **x+***x**x+**x*** **xx+++***|
| |__|M_AM___A|___||
+--+
N   Min   MaxMedian   AvgStddev
x 200 29.417143 47.735255  35.27498 36.471098 6.0928765
+ 200 28.254668 45.819881 33.445248 35.119495 5.5758312
Difference at 95.0% confidence
-1.3516 +/- 1.14466
-3.70596% +/- 3.13853%
(Student's t, pooled s = 5.84008)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/7] mesa: Convert some runtime asserts to static asserts.

2013-09-20 Thread Eric Anholt
Noticed while grepping through the code for something else.
---
 src/mesa/program/program.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c
index 2529c13..5dd68d9 100644
--- a/src/mesa/program/program.c
+++ b/src/mesa/program/program.c
@@ -55,27 +55,35 @@ _mesa_init_program(struct gl_context *ctx)
 * If this assertion fails, we need to increase the field
 * size for register indexes (see INST_INDEX_BITS).
 */
-   ASSERT(ctx->Const.VertexProgram.MaxUniformComponents / 4
+   STATIC_ASSERT(ctx->Const.VertexProgram.MaxUniformComponents / 4
   <= (1 << INST_INDEX_BITS));
-   ASSERT(ctx->Const.FragmentProgram.MaxUniformComponents / 4
+   STATIC_ASSERT(ctx->Const.FragmentProgram.MaxUniformComponents / 4
   <= (1 << INST_INDEX_BITS));
 
-   ASSERT(ctx->Const.VertexProgram.MaxTemps <= (1 << INST_INDEX_BITS));
-   ASSERT(ctx->Const.VertexProgram.MaxLocalParams <= (1 << INST_INDEX_BITS));
-   ASSERT(ctx->Const.FragmentProgram.MaxTemps <= (1 << INST_INDEX_BITS));
-   ASSERT(ctx->Const.FragmentProgram.MaxLocalParams <= (1 << INST_INDEX_BITS));
-
-   ASSERT(ctx->Const.VertexProgram.MaxUniformComponents <= 4 * MAX_UNIFORMS);
-   ASSERT(ctx->Const.FragmentProgram.MaxUniformComponents <= 4 * MAX_UNIFORMS);
-
-   ASSERT(ctx->Const.VertexProgram.MaxAddressOffset <= (1 << INST_INDEX_BITS));
-   ASSERT(ctx->Const.FragmentProgram.MaxAddressOffset <= (1 << 
INST_INDEX_BITS));
+   STATIC_ASSERT(ctx->Const.VertexProgram.MaxTemps <=
+ (1 << INST_INDEX_BITS));
+   STATIC_ASSERT(ctx->Const.VertexProgram.MaxLocalParams <=
+ (1 << INST_INDEX_BITS));
+   STATIC_ASSERT(ctx->Const.FragmentProgram.MaxTemps <=
+ (1 << INST_INDEX_BITS));
+   STATIC_ASSERT(ctx->Const.FragmentProgram.MaxLocalParams <=
+ (1 << INST_INDEX_BITS));
+
+   STATIC_ASSERT(ctx->Const.VertexProgram.MaxUniformComponents <=
+ 4 * MAX_UNIFORMS);
+   STATIC_ASSERT(ctx->Const.FragmentProgram.MaxUniformComponents <=
+ 4 * MAX_UNIFORMS);
+
+   STATIC_ASSERT(ctx->Const.VertexProgram.MaxAddressOffset <=
+ (1 << INST_INDEX_BITS));
+   STATIC_ASSERT(ctx->Const.FragmentProgram.MaxAddressOffset <=
+ (1 << INST_INDEX_BITS));
 
/* If this fails, increase prog_instruction::TexSrcUnit size */
-   ASSERT(MAX_TEXTURE_UNITS <= (1 << 5));
+   STATIC_ASSERT(MAX_TEXTURE_UNITS <= (1 << 5));
 
/* If this fails, increase prog_instruction::TexSrcTarget size */
-   ASSERT(NUM_TEXTURE_TARGETS <= (1 << 4));
+   STATIC_ASSERT(NUM_TEXTURE_TARGETS <= (1 << 4));
 
ctx->Program.ErrorPos = -1;
ctx->Program.ErrorString = _mesa_strdup("");
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/7] mesa: Remove the extra enum strings and extra lookup table.

2013-09-20 Thread Eric Anholt
Now that there's no name -> enum direction, we can drop the extra strings,
and merge the offsets table and the reduced_enums table.

Between the previous commit and this one, Mesa core drops by 30k.
---
 src/mapi/glapi/gen/gl_enums.py | 61 ++
 1 file changed, 20 insertions(+), 41 deletions(-)

diff --git a/src/mapi/glapi/gen/gl_enums.py b/src/mapi/glapi/gen/gl_enums.py
index d8f1045..e1ab600 100644
--- a/src/mapi/glapi/gen/gl_enums.py
+++ b/src/mapi/glapi/gen/gl_enums.py
@@ -59,18 +59,18 @@ class PrintGlEnums(gl_XML.gl_print_base):
 typedef int (*cfunc)(const void *, const void *);
 
 /**
- * Compare a key enum value to an element in the \c all_enums array.
+ * Compare a key enum value to an element in the \c enum_string_table_offsets 
array.
  *
  * \c bsearch always passes the key as the first parameter and the pointer
  * to the array element as the second parameter.  We can elimiate some
  * extra work by taking advantage of that fact.
  *
  * \param a  Pointer to the desired enum name.
- * \param b  Pointer to an index into the \c all_enums array.
+ * \param b  Pointer into the \c enum_string_table_offsets array.
  */
-static int compar_nr( const int *a, const unsigned *b )
+static int compar_nr( const int *a, enum_elt *b )
 {
-   return a[0] - all_enums[*b].n;
+   return a[0] - b->n;
 }
 
 
@@ -78,15 +78,13 @@ static char token_tmp[20];
 
 const char *_mesa_lookup_enum_by_nr( int nr )
 {
-   unsigned * i;
+   enum_elt *elt = _mesa_bsearch(& nr, enum_string_table_offsets,
+ Elements(enum_string_table_offsets),
+ sizeof(enum_string_table_offsets[0]),
+ (cfunc) compar_nr);
 
-   i = (unsigned *) _mesa_bsearch(& nr, reduced_enums,
-  Elements(reduced_enums),
-  sizeof(reduced_enums[0]),
-  (cfunc) compar_nr);
-
-   if ( i != NULL ) {
-  return & enum_string_table[ all_enums[ *i ].offset ];
+   if (elt != NULL) {
+  return &enum_string_table[elt->offset];
}
else {
   /* this is not re-entrant safe, no big deal here */
@@ -141,56 +139,37 @@ _mesa_lookup_prim_by_nr(GLuint nr)
 for api in api_list:
 self.process_enums( api )
 
-keys = self.enum_table.keys()
-keys.sort()
-
-name_table = []
-enum_table = {}
+enum_table = []
 
-for enum in keys:
+for enum in sorted(self.enum_table.keys()):
 low_pri = 9
+best_name = ''
 for [name, pri] in self.enum_table[ enum ]:
-name_table.append( [name, enum] )
-
 if pri < low_pri:
 low_pri = pri
-enum_table[enum] = name
-
+best_name = name
 
-name_table.sort()
+enum_table.append((enum, best_name))
 
 string_offsets = {}
 i = 0;
 print 'LONGSTRING static const char enum_string_table[] = '
-for [name, enum] in name_table:
+for enum, name in enum_table:
 print '   "%s\\0"' % (name)
-string_offsets[ name ] = i
+string_offsets[ enum ] = i
 i += len(name) + 1
 
 print '   ;'
 print ''
 
 
-print 'static const enum_elt all_enums[%u] =' % (len(name_table))
+print 'static const enum_elt enum_string_table_offsets[%u] =' % 
(len(enum_table))
 print '{'
-for [name, enum] in name_table:
-print '   { %5u, 0x%08X }, /* %s */' % (string_offsets[name], 
enum, name)
+for enum, name in enum_table:
+print '   { %5u, 0x%08X }, /* %s */' % (string_offsets[enum], 
enum, name)
 print '};'
 print ''
 
-print 'static const unsigned reduced_enums[%u] =' % (len(keys))
-print '{'
-for enum in keys:
-name = enum_table[ enum ]
-if [name, enum] not in name_table:
-print '  /* Error! %s, 0x%04x */ 0,' % (name, enum)
-else:
-i = name_table.index( [name, enum] )
-
-print '  %4u, /* %s */' % (i, name)
-print '};'
-
-
 self.print_code()
 return
 
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/7] mesa: Use -Bsymbolic in the linker to locally resolve Mesa-internal symbols.

2013-09-20 Thread Eric Anholt
Normally, LD_PRELOAD will take precedence over your own symbols, which you
want for things like malloc() in libc.  But we don't have any local
symbols we would want overridden (like hash_table_insert(), for example!),
so tell the linker to resolve them internally.  This also avoids calls
through the PLT.

Saves almost 100k on libdricore's size, and gets us a bunch of the
performance back that we had with non-dricore.
---
 configure.ac | 3 +++
 src/mesa/drivers/dri/i915/Makefile.am| 2 +-
 src/mesa/drivers/dri/i965/Makefile.am| 2 +-
 src/mesa/drivers/dri/nouveau/Makefile.am | 2 +-
 src/mesa/drivers/dri/r200/Makefile.am| 2 +-
 src/mesa/drivers/dri/radeon/Makefile.am  | 2 +-
 src/mesa/drivers/dri/swrast/Makefile.am  | 3 ++-
 src/mesa/libdricore/Makefile.am  | 6 +-
 8 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/configure.ac b/configure.ac
index ca9228c..fd1c655 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1032,10 +1032,13 @@ if test "x$enable_dri" = xyes; then
 # put all the necessary libs together
 DRI_LIB_DEPS="$DRI_LIB_DEPS $SELINUX_LIBS $LIBDRM_LIBS $EXPAT_LIB -lm 
$PTHREAD_LIBS $DLOPEN_LIBS"
 GALLIUM_DRI_LIB_DEPS="$GALLIUM_DRI_LIB_DEPS $SELINUX_LIBS $LIBDRM_LIBS 
$EXPAT_LIB -lm $CLOCK_LIB $PTHREAD_LIBS $DLOPEN_LIBS"
+
+DRI_DRIVER_LDFLAGS="-module -avoid-version -shared -Wl,-Bsymbolic"
 fi
 AM_CONDITIONAL(NEED_LIBDRICORE, test -n "$DRI_DIRS")
 AC_SUBST([EXPAT_INCLUDES])
 AC_SUBST([DRI_LIB_DEPS])
+AC_SUBST([DRI_DRIVER_LDFLAGS])
 AC_SUBST([GALLIUM_DRI_LIB_DEPS])
 
 case $DRI_DIRS in
diff --git a/src/mesa/drivers/dri/i915/Makefile.am 
b/src/mesa/drivers/dri/i915/Makefile.am
index 978917c..f66967d 100644
--- a/src/mesa/drivers/dri/i915/Makefile.am
+++ b/src/mesa/drivers/dri/i915/Makefile.am
@@ -52,7 +52,7 @@ dri_LTLIBRARIES = i915_dri.la
 endif
 
 i915_dri_la_SOURCES = $(i915_FILES)
-i915_dri_la_LDFLAGS = -module -avoid-version -shared
+i915_dri_la_LDFLAGS = $(DRI_DRIVER_LDFLAGS)
 i915_dri_la_LIBADD = \
../common/libdricommon.la \
$(DRI_LIB_DEPS) \
diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
b/src/mesa/drivers/dri/i965/Makefile.am
index 27c67d1..541e0d6 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -72,7 +72,7 @@ TEST_LIBS = \
 i965_dri_la_SOURCES =
 nodist_EXTRA_i965_dri_la_SOURCES = dummy2.cpp
 i965_dri_la_LIBADD = $(COMMON_LIBS)
-i965_dri_la_LDFLAGS = -module -avoid-version -shared
+i965_dri_la_LDFLAGS = $(DRI_DRIVER_LDFLAGS)
 
 TESTS = \
 test_eu_compact \
diff --git a/src/mesa/drivers/dri/nouveau/Makefile.am 
b/src/mesa/drivers/dri/nouveau/Makefile.am
index 2b47f75..7172e62 100644
--- a/src/mesa/drivers/dri/nouveau/Makefile.am
+++ b/src/mesa/drivers/dri/nouveau/Makefile.am
@@ -42,7 +42,7 @@ endif
 nouveau_vieux_dri_la_SOURCES = \
$(NOUVEAU_C_FILES)
 
-nouveau_vieux_dri_la_LDFLAGS = -module -avoid-version -shared
+nouveau_vieux_dri_la_LDFLAGS = $(DRI_DRIVER_LDFLAGS)
 nouveau_vieux_dri_la_LIBADD = \
../common/libdricommon.la \
$(DRI_LIB_DEPS) \
diff --git a/src/mesa/drivers/dri/r200/Makefile.am 
b/src/mesa/drivers/dri/r200/Makefile.am
index 4357922..fc0482a 100644
--- a/src/mesa/drivers/dri/r200/Makefile.am
+++ b/src/mesa/drivers/dri/r200/Makefile.am
@@ -45,7 +45,7 @@ endif
 r200_dri_la_SOURCES = \
 $(R200_C_FILES)
 
-r200_dri_la_LDFLAGS = -module -avoid-version -shared
+r200_dri_la_LDFLAGS = $(DRI_DRIVER_LDFLAGS)
 r200_dri_la_LIBADD = \
../common/libdricommon.la \
$(DRI_LIB_DEPS) \
diff --git a/src/mesa/drivers/dri/radeon/Makefile.am 
b/src/mesa/drivers/dri/radeon/Makefile.am
index 43de059..d13b803 100644
--- a/src/mesa/drivers/dri/radeon/Makefile.am
+++ b/src/mesa/drivers/dri/radeon/Makefile.am
@@ -45,7 +45,7 @@ endif
 radeon_dri_la_SOURCES = \
 $(RADEON_C_FILES)
 
-radeon_dri_la_LDFLAGS = -module -avoid-version -shared
+radeon_dri_la_LDFLAGS = $(DRI_DRIVER_LDFLAGS)
 radeon_dri_la_LIBADD = \
../common/libdricommon.la \
$(DRI_LIB_DEPS) \
diff --git a/src/mesa/drivers/dri/swrast/Makefile.am 
b/src/mesa/drivers/dri/swrast/Makefile.am
index d3da196..fb9b8a0 100644
--- a/src/mesa/drivers/dri/swrast/Makefile.am
+++ b/src/mesa/drivers/dri/swrast/Makefile.am
@@ -42,7 +42,8 @@ endif
 swrast_dri_la_SOURCES = \
$(SWRAST_C_FILES)
 
-swrast_dri_la_LDFLAGS = -module -avoid-version -shared
+swrast_dri_la_LDFLAGS = $(DRI_DRIVER_LDFLAGS)
+
 swrast_dri_la_LIBADD = \
$(DRI_LIB_DEPS)
 
diff --git a/src/mesa/libdricore/Makefile.am b/src/mesa/libdricore/Makefile.am
index 106c7db..686e478 100644
--- a/src/mesa/libdricore/Makefile.am
+++ b/src/mesa/libdricore/Makefile.am
@@ -37,7 +37,11 @@ libdricore@VERSION@_la_SOURCES = \
$(LIBGLCPP_FILES) \
$(LIBGLSL_GENERATED_CXX_FILES) \
$(LIBGLSL_FILES)
-libdricore@VERSION@_la_LDFLAGS = -version-number 1:0
+libdricore@VERSION@_la_LDFLAGS = \
+-version-number 1:0 \
+   -Wl,-Bsymbolic \
+$()

[Mesa-dev] [PATCH 6/7] glsl: Hide many classes local to individual .cpp files in anon namespaces.

2013-09-20 Thread Eric Anholt
This gives the compiler the chance to inline and not export class symbols
even in the absence of LTO.  Saves about 60kb on disk.
---
 src/glsl/builtin_functions.cpp   |  4 
 src/glsl/ir_function_detect_recursion.cpp|  4 
 src/glsl/ir_import_prototypes.cpp|  3 +++
 src/glsl/ir_reader.cpp   |  4 
 src/glsl/ir_set_program_inouts.cpp   |  4 
 src/glsl/ir_validate.cpp |  3 +++
 src/glsl/link_functions.cpp  |  3 +++
 src/glsl/link_uniform_blocks.cpp |  4 
 src/glsl/link_uniforms.cpp   |  4 
 src/glsl/link_varyings.cpp   |  2 ++
 src/glsl/linker.cpp  |  3 +++
 src/glsl/loop_analysis.cpp   |  2 ++
 src/glsl/loop_controls.cpp   |  2 ++
 src/glsl/loop_unroll.cpp |  3 +++
 src/glsl/lower_clip_distance.cpp |  3 +++
 src/glsl/lower_discard.cpp   |  3 +++
 src/glsl/lower_discard_flow.cpp  |  4 
 src/glsl/lower_if_to_cond_assign.cpp |  4 
 src/glsl/lower_instructions.cpp  |  4 
 src/glsl/lower_jumps.cpp |  4 
 src/glsl/lower_mat_op_to_vec.cpp |  4 
 src/glsl/lower_named_interface_blocks.cpp|  4 
 src/glsl/lower_output_reads.cpp  |  4 
 src/glsl/lower_packed_varyings.cpp   |  4 
 src/glsl/lower_texture_projection.cpp|  4 
 src/glsl/lower_variable_index_to_cond_assign.cpp |  3 +++
 src/glsl/lower_vec_index_to_cond_assign.cpp  |  4 
 src/glsl/lower_vec_index_to_swizzle.cpp  |  4 
 src/glsl/lower_vector.cpp|  4 
 src/glsl/lower_vector_insert.cpp |  3 +++
 src/glsl/opt_array_splitting.cpp |  5 +
 src/glsl/opt_dead_builtin_varyings.cpp   |  2 ++
 src/mesa/program/ir_to_mesa.cpp  | 16 ++--
 33 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 96358a7..72054e0 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -304,6 +304,8 @@ tex3d_lod(const _mesa_glsl_parse_state *state)
 
 
/**/
 
+namespace {
+
 /**
  * builtin_builder: A singleton object representing the core of the built-in
  * function module.
@@ -521,6 +523,8 @@ private:
/** @} */
 };
 
+} /* anonymous namespace */
+
 /**
  * Core builtin_builder functionality:
  *  @{
diff --git a/src/glsl/ir_function_detect_recursion.cpp 
b/src/glsl/ir_function_detect_recursion.cpp
index 280c473..9d8fbc1 100644
--- a/src/glsl/ir_function_detect_recursion.cpp
+++ b/src/glsl/ir_function_detect_recursion.cpp
@@ -127,6 +127,8 @@
 #include "program/hash_table.h"
 #include "program.h"
 
+namespace {
+
 struct call_node : public exec_node {
class function *func;
 };
@@ -240,6 +242,8 @@ public:
bool progress;
 };
 
+} /* anonymous namespace */
+
 static void
 destroy_links(exec_list *list, function *f)
 {
diff --git a/src/glsl/ir_import_prototypes.cpp 
b/src/glsl/ir_import_prototypes.cpp
index 3585bf6..b0429fb 100644
--- a/src/glsl/ir_import_prototypes.cpp
+++ b/src/glsl/ir_import_prototypes.cpp
@@ -30,6 +30,8 @@
 #include "ir.h"
 #include "glsl_symbol_table.h"
 
+namespace {
+
 /**
  * Visitor used to import function prototypes
  *
@@ -99,6 +101,7 @@ private:
void *mem_ctx;
 };
 
+} /* anonymous namespace */
 
 /**
  * Import function prototypes from one IR tree into another
diff --git a/src/glsl/ir_reader.cpp b/src/glsl/ir_reader.cpp
index ec35b68..f0318ea 100644
--- a/src/glsl/ir_reader.cpp
+++ b/src/glsl/ir_reader.cpp
@@ -28,6 +28,8 @@
 
 const static bool debug = false;
 
+namespace {
+
 class ir_reader {
 public:
ir_reader(_mesa_glsl_parse_state *);
@@ -66,6 +68,8 @@ private:
ir_dereference_variable *read_var_ref(s_expression *);
 };
 
+} /* anonymous namespace */
+
 ir_reader::ir_reader(_mesa_glsl_parse_state *state) : state(state)
 {
this->mem_ctx = state;
diff --git a/src/glsl/ir_set_program_inouts.cpp 
b/src/glsl/ir_set_program_inouts.cpp
index 6196d6a..1267d6d 100644
--- a/src/glsl/ir_set_program_inouts.cpp
+++ b/src/glsl/ir_set_program_inouts.cpp
@@ -42,6 +42,8 @@
 #include "ir_visitor.h"
 #include "glsl_types.h"
 
+namespace {
+
 class ir_set_program_inouts_visitor : public ir_hierarchical_visitor {
 public:
ir_set_program_inouts_visitor(struct gl_program *prog, GLenum shader_type)
@@ -67,6 +69,8 @@ private:
GLenum shader_type;
 };
 
+} /* anonymous namespace */
+
 static inline bool
 is_shader_inout(ir_variable *var)
 {
diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp
index 2068de0..2c64f4e 100644
--- a/src/glsl/ir_validate.cpp
+++ b/sr

[Mesa-dev] [PATCH 5/7] mesa: Drop an extra copy-and-pasted copy in the program clone function.

2013-09-20 Thread Eric Anholt
---
 src/mesa/program/program.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c
index 5dd68d9..aa3c5b4 100644
--- a/src/mesa/program/program.c
+++ b/src/mesa/program/program.c
@@ -485,7 +485,6 @@ _mesa_clone_program(struct gl_context *ctx, const struct 
gl_program *prog)
if (prog->Parameters)
   clone->Parameters = _mesa_clone_parameter_list(prog->Parameters);
memcpy(clone->LocalParams, prog->LocalParams, sizeof(clone->LocalParams));
-   memcpy(clone->LocalParams, prog->LocalParams, sizeof(clone->LocalParams));
clone->IndirectRegisterFiles = prog->IndirectRegisterFiles;
clone->NumInstructions = prog->NumInstructions;
clone->NumTemporaries = prog->NumTemporaries;
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: Fix usage of the wrong union member in program_resource_visitor::recursion.

2013-09-20 Thread Francisco Jerez
In the array-of-struct case, recursion() takes the row_major flag for
each iteration from 't->fields.structure[i]', but 't' is not a record
type.  Inherit the array declaration row_major flag instead.

This mistake was found by running piglit on valgrind.
---
 src/glsl/link_uniforms.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
index fa77157..6cec96e 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -140,8 +140,8 @@ program_resource_visitor::recursion(const glsl_type *t, 
char **name,
 /* Append the subscript to the current variable name */
 ralloc_asprintf_rewrite_tail(name, &new_length, "[%u]", i);
 
- recursion(t->fields.array, name, new_length,
-   t->fields.structure[i].row_major, record_type);
+ recursion(t->fields.array, name, new_length, row_major,
+   record_type);
 
  /* Only the first leaf-field of the record gets called with the
   * record type pointer.
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Zero allocation of C++ classes.

2013-09-20 Thread Francisco Jerez
This is a follow-up patch series on the issue we were discussing in
the ARB_shader_atomic_counters thread, it's based on master with Ken's
recent patch series that defines macros for implementing ralloc-based
new and delete operators.

I've done a closer analysis on the classes we used to allocate with
rzalloc, and I think I've found all cases where a class constructor
leaves member variables uninitialized.  This is a real problem because
classes that rely on the allocator to initialize members to zero for
them cannot be allocated in the stack, they cannot be aggregated into
other objects safely, and they cannot be allocated using array new or
the normal variant of new -- And there's no clear indication or
comment that it doesn't work until the program starts reading
uninitialized memory.

In the current situation, the fact that a few classes use rzalloc
instead of ralloc doesn't save anyone extending a class with new
member variables from the responsibility of making sure whether the
class that's being extended uses the ralloc or the rzalloc convention,
as it's been quite inconsistent until now which uses which.

I think it's worrying to get used to the latter, because you're twice
as likely to make a mistake when you come across the former
convention, because it makes your classes less flexible on how they
can be allocated, and because it also makes it more likely for the
users of your class to make a mistake.

If anyone feels strongly against any of these patches, let's change
that particular case to use memset(0) from the class constructor, or
let's make its constructor private and use a factory method that
handles allocation and construction together, so there's no
possibility for them to be allocated incorrectly.

I've been running piglit on valgrind for a few hours already and I
haven't found any regression from this series.  I'll let it run
overnight anyway, just in case I've missed something.

I hope we can reach some sort of consensus on this.  [The last patch
could probably be squashed into Ken's series in that case.]

[1] http://lists.freedesktop.org/archives/mesa-dev/2013-September/044900.html

[PATCH 01/11] glsl: Initialize all member variables of _mesa_glsl_parse_state 
on construction.
[PATCH 02/11] i965: Initialize all member variables of vec4_instruction on 
construction.
[PATCH 03/11] glsl: Switch ast_node to the non-zeroing allocator.
[PATCH 04/11] glsl: Switch ast_type_qualifier to the non-zeroing allocator.
[PATCH 05/11] i965: Initialize all member variables of bblock_t on construction.
[PATCH 06/11] i965: Initialize all member variables of cfg_t on construction.
[PATCH 07/11] i965: Switch ip_record to the non-zeroing allocator.
[PATCH 08/11] i965: Switch fs_inst to the non-zeroing allocator.
[PATCH 09/11] i965: Switch fs_live_variables to the non-zeroing allocator.
[PATCH 10/11] i965: Switch vec4_live_variables to the non-zeroing allocator.
[PATCH 11/11] ralloc: Remove the rzalloc-based new/delete operator definition 
macro.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/11] glsl: Initialize all member variables of _mesa_glsl_parse_state on construction.

2013-09-20 Thread Francisco Jerez
The _mesa_glsl_parse_state object relies on the memory allocator
zeroing out its contents before it's initialized, which is quite an
unusual practice in the C++ world because it ties objects to some
specific allocation scheme, and gives unpredictable results when an
object is created with a different allocator -- Stack allocation,
array allocation, or aggregation inside a different object are some of
the useful possibilities that come to my mind.  Initialize all fields
from the constructor and stop using the zeroing allocator.

Reviewed-by: Paul Berry 
Reviewed-by: Chad Versace 
---
 src/glsl/glsl_parser_extras.cpp | 16 ++--
 src/glsl/glsl_parser_extras.h   |  2 +-
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index cac5a18..772933f 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -55,7 +55,7 @@ static unsigned known_desktop_glsl_versions[] =
 
 _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
   GLenum target, void *mem_ctx)
- : ctx(_ctx)
+   : ctx(_ctx), switch_state()
 {
switch (target) {
case GL_VERTEX_SHADER:   this->target = vertex_shader; break;
@@ -66,10 +66,14 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct 
gl_context *_ctx,
this->scanner = NULL;
this->translation_unit.make_empty();
this->symbols = new(mem_ctx) glsl_symbol_table;
+
+   this->num_uniform_blocks = 0;
+   this->uniform_block_array_size = 0;
+   this->uniform_blocks = NULL;
+
this->info_log = ralloc_strdup(mem_ctx, "");
this->error = false;
this->loop_nesting_ast = NULL;
-   this->switch_state.switch_nesting_ast = NULL;
 
this->struct_specifier_depth = 0;
this->num_builtins_to_link = 0;
@@ -105,6 +109,13 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct 
gl_context *_ctx,
 
this->Const.MaxDrawBuffers = ctx->Const.MaxDrawBuffers;
 
+   this->current_function = NULL;
+   this->toplevel_ir = NULL;
+   this->found_return = false;
+   this->all_invariant = false;
+   this->user_structures = NULL;
+   this->num_user_structures = 0;
+
/* Populate the list of supported GLSL versions */
/* FINISHME: Once the OpenGL 3.0 'forward compatible' context or
 * the OpenGL 3.2 Core context is supported, this logic will need
@@ -163,6 +174,7 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct 
gl_context *_ctx,
 
this->gs_input_prim_type_specified = false;
this->gs_input_prim_type = GL_POINTS;
+   this->gs_input_size = 0;
this->out_qualifier = new(this) ast_type_qualifier();
 }
 
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index 364a983..853a9b0 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -73,7 +73,7 @@ struct _mesa_glsl_parse_state {
_mesa_glsl_parse_state(struct gl_context *_ctx, GLenum target,
  void *mem_ctx);
 
-   DECLARE_RZALLOC_CXX_OPERATORS(_mesa_glsl_parse_state);
+   DECLARE_RALLOC_CXX_OPERATORS(_mesa_glsl_parse_state);
 
/**
 * Generate a string representing the GLSL version currently being compiled
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/11] i965: Initialize all member variables of vec4_instruction on construction.

2013-09-20 Thread Francisco Jerez
The vec4_instruction object relies on the memory allocator zeroing out
its contents before it's initialized, which is quite an unusual
practice in the C++ world because it ties objects to some specific
allocation scheme, and gives unpredictable results when an object is
created with a different allocator -- Stack allocation, array
allocation, or aggregation inside a different object are some of the
useful possibilities that come to my mind.  Initialize all fields from
the constructor and stop using the zeroing allocator.

Reviewed-by: Paul Berry 
Reviewed-by: Chad Versace 
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 689040b..847c75e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -168,7 +168,7 @@ with_writemask(dst_reg const &r, int mask);
 
 class vec4_instruction : public backend_instruction {
 public:
-   DECLARE_RZALLOC_CXX_OPERATORS(vec4_instruction)
+   DECLARE_RALLOC_CXX_OPERATORS(vec4_instruction)
 
vec4_instruction(vec4_visitor *v, enum opcode opcode,
dst_reg dst = dst_reg(),
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 3ff6a61..6b9c4c6 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -38,7 +38,22 @@ vec4_instruction::vec4_instruction(vec4_visitor *v,
this->src[0] = src0;
this->src[1] = src1;
this->src[2] = src2;
+   this->saturate = false;
+   this->force_writemask_all = false;
+   this->no_dd_clear = false;
+   this->no_dd_check = false;
+   this->conditional_mod = BRW_CONDITIONAL_NONE;
+   this->sampler = 0;
+   this->texture_offset = 0;
+   this->target = 0;
+   this->shadow_compare = false;
this->ir = v->base_ir;
+   this->urb_write_flags = BRW_URB_WRITE_NO_FLAGS;
+   this->header_present = false;
+   this->mlen = 0;
+   this->base_mrf = 0;
+   this->offset = 0;
+   this->ir = NULL;
this->annotation = v->current_annotation;
 }
 
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/11] glsl: Switch ast_node to the non-zeroing allocator.

2013-09-20 Thread Francisco Jerez
All member variables of ast_node are already being initialized from
its constructor, but some of its derived classes were leaving members
uninitialized -- Fix them.

Using rzalloc makes it more likely that we will start relying on the
allocator to zero out all memory if the class is ever extended with
new member variables.  That's bad because it ties objects to some
specific allocation scheme, and gives unpredictable results when an
object is created with a different allocator -- Stack allocation,
array allocation, or aggregation inside a different object are some of
the useful possibilities that come to my mind.
---
 src/glsl/ast.h  | 10 +-
 src/glsl/glsl_parser_extras.cpp |  3 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index c3361a1..f278fe5 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -49,7 +49,7 @@ struct YYLTYPE;
  */
 class ast_node {
 public:
-   DECLARE_RZALLOC_CXX_OPERATORS(ast_node);
+   DECLARE_RALLOC_CXX_OPERATORS(ast_node);
 
/**
 * Print an AST node in something approximating the original GLSL code
@@ -576,6 +576,10 @@ public:
virtual void print(void) const;
bool has_qualifiers() const;
 
+   ast_fully_specified_type() : qualifier(), specifier()
+   {
+   }
+
const struct glsl_type *glsl_type(const char **name,
 struct _mesa_glsl_parse_state *state)
   const;
@@ -859,6 +863,10 @@ public:
 
 class ast_function_definition : public ast_node {
 public:
+   ast_function_definition() : prototype(), body()
+   {
+   }
+
virtual void print(void) const;
 
virtual ir_rvalue *hir(exec_list *instructions,
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 772933f..bf2909c 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -1058,7 +1058,8 @@ ast_expression::print(void) const
 ast_expression::ast_expression(int oper,
   ast_expression *ex0,
   ast_expression *ex1,
-  ast_expression *ex2)
+  ast_expression *ex2) :
+   primary_expression()
 {
this->oper = ast_operators(oper);
this->subexpressions[0] = ex0;
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/11] glsl: Switch ast_type_qualifier to the non-zeroing allocator.

2013-09-20 Thread Francisco Jerez
All member variables of ast_type_qualifier are already being
initialized from its implicitly defined constructor, it's not
necessary to use rzalloc to allocate its memory.
---
 src/glsl/ast.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index f278fe5..bc28375 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -346,7 +346,7 @@ enum {
 };
 
 struct ast_type_qualifier {
-   DECLARE_RZALLOC_CXX_OPERATORS(ast_type_qualifier);
+   DECLARE_RALLOC_CXX_OPERATORS(ast_type_qualifier);
 
union {
   struct {
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/11] i965: Initialize all member variables of bblock_t on construction.

2013-09-20 Thread Francisco Jerez
The bblock_t object relies on the memory allocator zeroing out its
contents before it's initialized, which is quite an unusual practice
in the C++ world because it ties objects to some specific allocation
scheme, and gives unpredictable results when an object is created with
a different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.  Initialize all fields from the
constructor and stop using the zeroing allocator.
---
 src/mesa/drivers/dri/i965/brw_cfg.cpp | 3 ++-
 src/mesa/drivers/dri/i965/brw_cfg.h   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp 
b/src/mesa/drivers/dri/i965/brw_cfg.cpp
index f4cfcd5..6a886ce 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
+++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
@@ -44,7 +44,8 @@ pop_stack(exec_list *list)
return block;
 }
 
-bblock_t::bblock_t()
+bblock_t::bblock_t() :
+   start_ip(), end_ip(), block_num()
 {
start = NULL;
end = NULL;
diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h 
b/src/mesa/drivers/dri/i965/brw_cfg.h
index 27bc8b6..505a5cf 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.h
+++ b/src/mesa/drivers/dri/i965/brw_cfg.h
@@ -39,7 +39,7 @@ public:
 
 class bblock_t {
 public:
-   DECLARE_RZALLOC_CXX_OPERATORS(bblock_t)
+   DECLARE_RALLOC_CXX_OPERATORS(bblock_t)
 
bblock_link *make_list(void *mem_ctx);
 
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/11] i965: Switch ip_record to the non-zeroing allocator.

2013-09-20 Thread Francisco Jerez
All member variables of ip_record are already being initialized from
its constructor, it's not necessary to use rzalloc to allocate its
memory, and doing so makes it more likely that we will start relying
on the allocator to zero out all memory if the class is ever extended
with new member variables.

That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
---
 src/mesa/drivers/dri/i965/brw_fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index b2aa041..8d35112 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -112,7 +112,7 @@ static const fs_reg reg_null_d(ARF, BRW_ARF_NULL, 
BRW_REGISTER_TYPE_D);
 
 class ip_record : public exec_node {
 public:
-   DECLARE_RZALLOC_CXX_OPERATORS(ip_record)
+   DECLARE_RALLOC_CXX_OPERATORS(ip_record)
 
ip_record(int ip)
{
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/11] i965: Initialize all member variables of cfg_t on construction.

2013-09-20 Thread Francisco Jerez
The cfg_t object relies on the memory allocator zeroing out its
contents before it's initialized, which is quite an unusual practice
in the C++ world because it ties objects to some specific allocation
scheme, and gives unpredictable results when an object is created with
a different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.  Initialize all fields from the
constructor and stop using the zeroing allocator.
---
 src/mesa/drivers/dri/i965/brw_cfg.cpp | 1 +
 src/mesa/drivers/dri/i965/brw_cfg.h   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp 
b/src/mesa/drivers/dri/i965/brw_cfg.cpp
index 6a886ce..4ba4dcb 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
+++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
@@ -82,6 +82,7 @@ cfg_t::create(void *parent_mem_ctx, exec_list *instructions)
 {
mem_ctx = ralloc_context(parent_mem_ctx);
block_list.make_empty();
+   blocks = NULL;
num_blocks = 0;
ip = 0;
cur = NULL;
diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h 
b/src/mesa/drivers/dri/i965/brw_cfg.h
index 505a5cf..ec5a3a0 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.h
+++ b/src/mesa/drivers/dri/i965/brw_cfg.h
@@ -60,7 +60,7 @@ public:
 
 class cfg_t {
 public:
-   DECLARE_RZALLOC_CXX_OPERATORS(cfg_t)
+   DECLARE_RALLOC_CXX_OPERATORS(cfg_t)
 
cfg_t(backend_visitor *v);
cfg_t(void *mem_ctx, exec_list *instructions);
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/11] i965: Switch fs_inst to the non-zeroing allocator.

2013-09-20 Thread Francisco Jerez
All member variables of fs_inst are already being initialized from its
constructor, it's not necessary to use rzalloc to allocate its memory,
and doing so makes it more likely that we will start relying on the
allocator to zero out all memory if the class is ever extended with
new member variables.

That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
---
 src/mesa/drivers/dri/i965/brw_fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 8d35112..6d93132 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -124,7 +124,7 @@ public:
 
 class fs_inst : public backend_instruction {
 public:
-   DECLARE_RZALLOC_CXX_OPERATORS(fs_inst)
+   DECLARE_RALLOC_CXX_OPERATORS(fs_inst)
 
void init();
 
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/11] i965: Switch fs_live_variables to the non-zeroing allocator.

2013-09-20 Thread Francisco Jerez
All member variables of fs_live_variables are already being
initialized from its constructor, it's not necessary to use rzalloc to
allocate its memory, and doing so makes it more likely that we will
start relying on the allocator to zero out all memory if the class is
ever extended with new member variables.

That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
---
 src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h 
b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
index fa8b61d..e227439 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
@@ -53,7 +53,7 @@ struct block_data {
 
 class fs_live_variables {
 public:
-   DECLARE_RZALLOC_CXX_OPERATORS(fs_live_variables)
+   DECLARE_RALLOC_CXX_OPERATORS(fs_live_variables)
 
fs_live_variables(fs_visitor *v, cfg_t *cfg);
~fs_live_variables();
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/11] i965: Switch vec4_live_variables to the non-zeroing allocator.

2013-09-20 Thread Francisco Jerez
All member variables of vec4_live_variables are already being
initialized from its constructor, it's not necessary to use rzalloc to
allocate its memory, and doing so makes it more likely that we will
start relying on the allocator to zero out all memory if the class is
ever extended with new member variables.

That's bad because it ties objects to some specific allocation scheme,
and gives unpredictable results when an object is created with a
different allocator -- Stack allocation, array allocation, or
aggregation inside a different object are some of the useful
possibilities that come to my mind.
---
 src/mesa/drivers/dri/i965/brw_vec4_live_variables.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h 
b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h
index b8ab95a..296468a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h
@@ -52,7 +52,7 @@ struct block_data {
 
 class vec4_live_variables {
 public:
-   DECLARE_RZALLOC_CXX_OPERATORS(vec4_live_variables)
+   DECLARE_RALLOC_CXX_OPERATORS(vec4_live_variables)
 
vec4_live_variables(vec4_visitor *v, cfg_t *cfg);
~vec4_live_variables();
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 11/11] ralloc: Remove the rzalloc-based new/delete operator definition macro.

2013-09-20 Thread Francisco Jerez
Using it encourages the (IMHO worrying) practice of leaving member
variables uninitialized in constructor definitions.  This macro
shouldn't be necessary anymore after the last patch series fixing all
its users to initialize all member variables from the class
constructor.  Remove it.
---
 src/glsl/ralloc.h | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/src/glsl/ralloc.h b/src/glsl/ralloc.h
index 82a3daa..22ed7f3 100644
--- a/src/glsl/ralloc.h
+++ b/src/glsl/ralloc.h
@@ -404,7 +404,17 @@ bool ralloc_vasprintf_append(char **str, const char *fmt, 
va_list args);
 } /* end of extern "C" */
 #endif
 
-#define _RALLOC_OPS(ALLOC, TYPE) \
+/**
+ * Declare C++ new and delete operators which use ralloc.
+ *
+ * Placing one of these macros in the body of a class makes it possible to do:
+ *
+ * TYPE *var = new(mem_ctx) TYPE(...);
+ * delete var;
+ *
+ * which is more idiomatic in C++ than calling ralloc or rzalloc.
+ */
+#define DECLARE_RALLOC_CXX_OPERATORS(TYPE)   \
 private: \
static void _ralloc_##TYPE##_destructor_callback(void *p) \
{ \
@@ -413,7 +423,7 @@ private:
 \
 public:  \
static void* operator new(size_t size, void *mem_ctx) \
{ \
-  void *p = ALLOC(mem_ctx, size);\
+  void *p = ralloc_size(mem_ctx, size);  \
   assert(p != NULL); \
   ralloc_set_destructor(p, _ralloc_##TYPE##_destructor_callback);\
   return p;  \
@@ -425,17 +435,5 @@ public:
  \
   ralloc_free(p);\
}
 
-/**
- * Declare C++ new and delete operators which use ralloc.
- *
- * Placing one of these macros in the body of a class makes it possible to do:
- *
- * TYPE *var = new(mem_ctx) TYPE(...);
- * delete var;
- *
- * which is more idiomatic in C++ than calling ralloc or rzalloc.
- */
-#define DECLARE_RALLOC_CXX_OPERATORS(TYPE)  _RALLOC_OPS(ralloc_size,  TYPE)
-#define DECLARE_RZALLOC_CXX_OPERATORS(TYPE) _RALLOC_OPS(rzalloc_size, TYPE)
 
 #endif
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-20 Thread Mark Mueller
Wow, it's not often we as developers get to attribute an unexplained
performance improvement to silicon. I'm happy to say I guessed wrong here.

Did this start specifically with Haswell and is it likely this will persist
in future gen hardware - more specifically, what's the proper test by the
driver for availability of this feature in hardware?

Finally, were any other silicon improvements revealed besides what Chai-I
was able to expose?

Mark


On Fri, Sep 20, 2013 at 12:51 PM, Kenneth Graunke wrote:

> On 09/20/2013 08:30 AM, Ian Romanick wrote:
> > On 09/20/2013 09:50 AM, Paul Berry wrote:
> [snip]
> >> Since the SAMPLER_MODE setting allows us to trade off quality vs
> >> performance, we're also interested to know whether a value less than
> >> 0x1f is sufficient to produce the performance improvement in Xonotic--it
> >> would be nice if we could find a "sweet spot" for this setting that
> >> produces the performance improvement we need without sacrificing too
> >> much quality.
> >
> > How about if we just give a driconf option to adjust it.  Then gamers
> > can make their own choice.  For applications where it know it makes a
> > big difference, we can provide a default non-0 value in the system
> driconf.
>
> Because you can't (yet) program registers from userspace unless you're
> root.
>
> I would like to use the same default value as Windows.  I'm fine with
> making it tunable beyond that.
>
> --Ken
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev