Re: [Mesa-dev] [PATCH] i965: correct alignment units for 2D compressed textures on Skylake

2015-06-11 Thread Neil Roberts
Hi Nanley,

Could you explain the reasoning behind this patch? I can't find any
mention of needing to align to the square of the block size in the docs.

I think how it works is that on Skylake you can pick any alignment value
you want out of 4, 8 or 16 but for compressed textures that is counted
in units of blocks rather than pixels. Currently we effectively always
pick 4 for compressed textures because I don't think there's any reason
to align to a higher value. The mt->align_w/h values are used for two
things; in intel_miptree_set_total_width_height they are used to choose
the positions for the mipmap images and after that they are only used to
upload the alignment values as part of the surface state. On Skylake
mt->align_w/h is temporarily set to be in units of pixels during
brw_miptree_layout but at the end of the function it divides the values
by the block size so that they will be in units of blocks ready for
uploading as part of the surface state.

Your patch modifies how it picks the alignment value but it doesn't
modify the bit at the end which divides the chosen alignment by the
block size. For FXT1 I think that would end up making it pick a
horizontal alignment value of ALIGN_8 (ie, the pixel alignment value is
now 8*8=64 which when divided by the block width ends up as
64/8=ALIGN_8).

Before my patches for Skylake bits of the code were lazily assuming that
mt->align_w/h is always the same as the block size because that
previously happened to be always correct. I fixed some of the cases
within the layouting code to instead directly query the block size.
However it looks like I missed one in intel_miptree_copy_slice. I think
your patch would make this particular case work because now the
alignment value is ALIGN_8 which happens to match the block width for
FXT1.

I'm about to post an alternative patch which fixes this case to use the
block size instead and it does seem to fix the broken FXT1 test cases as
well.

Regards,
- Neil

Nanley Chery  writes:

> From: Nanley Chery 
>
> On Gen9+, vertical and horizontal alignment values for compressed textures are
> equal to the pre-Gen9 value squared. Each miplevel must be aligned to this
> value.
>
> Signed-off-by: Nanley Chery 
> ---
>
> This fixes an FXT1 Piglit test regression and shows no failures on Jenkins. 
>
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 20 
>  1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index 312a887..dffc699 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -149,15 +149,8 @@ intel_horizontal_texture_alignment_unit(struct 
> brw_context *brw,
>unsigned int i, j;
>_mesa_get_format_block_size(mt->format, &i, &j);
>  
> -  /* On Gen9+ we can pick our own alignment for compressed textures but 
> it
> -   * has to be a multiple of the block size. The minimum alignment we can
> -   * pick is 4 so we effectively have to align to 4 times the block
> -   * size
> -   */
> -  if (brw->gen >= 9)
> - return i * 4;
> -  else
> - return i;
> +  /* On Gen9+ the alignment value is squared. */
> +  return brw->gen >= 9 ? i * i : i;
>  }
>  
> if (mt->format == MESA_FORMAT_S_UINT8)
> @@ -269,9 +262,12 @@ intel_vertical_texture_alignment_unit(struct brw_context 
> *brw,
>  * Where "*" means either VALIGN_2 or VALIGN_4 depending on the setting of
>  * the SURFACE_STATE "Surface Vertical Alignment" field.
>  */
> -   if (_mesa_is_format_compressed(mt->format))
> +   if (_mesa_is_format_compressed(mt->format)) {
> +  unsigned int i, j;
> +  _mesa_get_format_block_size(mt->format, &i, &j);
>/* See comment above for the horizontal alignment */
> -  return brw->gen >= 9 ? 16 : 4;
> +  return brw->gen >= 9 ? j * j : j;
> +   }
>  
> if (mt->format == MESA_FORMAT_S_UINT8)
>return brw->gen >= 7 ? 8 : 4;
> @@ -379,7 +375,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
>  
> if (mt->compressed) {
>mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
> - ALIGN(minify(mt->physical_width0, 2), bw);
> + ALIGN(minify(mt->physical_width0, 2), mt->align_w);
> } else {
>mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
>   minify(mt->physical_width0, 2);
> -- 
> 2.4.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Fix aligning to the block size in intel_miptree_copy_slice

2015-06-11 Thread Neil Roberts
This function was trying to align the width and height to a multiple
of the block size for compressed textures. It was using align_w/h as a
shortcut to get the block size as up until Gen9 this always happens to
match. However in Gen9+ the alignment values are expressed as
multiples of the block size so in effect the alignment values are
always 4 for compressed textures as that is the minimum value we can
pick. This happened to work for most compressed formats because the
block size is also 4, but for FXT1 this was breaking because it has a
block width of 8.

This fixes some Piglit tests testing FXT1 such as

spec@3dfx_texture_compression_fxt1@fbo-generatemipmap-formats
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 615cbfb..6c00581 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1168,8 +1168,10 @@ intel_miptree_copy_slice(struct brw_context *brw,
assert(src_mt->format == dst_mt->format);
 
if (dst_mt->compressed) {
-  height = ALIGN(height, dst_mt->align_h) / dst_mt->align_h;
-  width = ALIGN(width, dst_mt->align_w);
+  unsigned int i, j;
+  _mesa_get_format_block_size(dst_mt->format, &i, &j);
+  height = ALIGN(height, j) / j;
+  width = ALIGN(width, i);
}
 
/* If it's a packed depth/stencil buffer with separate stencil, the blit
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: Abort texture upload if pixels == null and no pixel unpack buffer set

2015-06-12 Thread Neil Roberts
Ah, good catch. Looks good to me.

Reviewed-by: Neil Roberts 

It seems a bit weird to use create_pbo=true at all for
glTexImage{1,2,3}D because in that case we are completely replacing the
texture. If the texture's buffer is busy instead of allocating a PBO we
might as well just directly allocate some new storage for the texture
and abandon the old storage. That would be a separate patch though and
either way I think this patch makes sense.

Anuj Phogat  writes:

> in case of glTex{Sub}Image{1,2,3}D().

I think in practice this patch only applies to glTexImage* (not
glTexSubImage*) doesn't it? It wouldn't make any sense to call
glTexSubImage without a PBO and with NULL pixels.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Don't create a temp PBO when uploading data from glTexImage*

2015-06-12 Thread Neil Roberts
Previously when glTexImage* is called it would attempt to create a
temporary PBO if the texture is busy in order to avoid blocking when
mapping the texture. This doesn't make much sense for glTexImage
because in that case we are completely replacing the texture anyway so
instead of allocating a PBO we can just allocate new storage for the
texture.

The code was buggy anyway because it was checking whether the buffer
was busy before calling Driver->AllocTextureImageBuffer. That function
actually always frees the buffer and recreates a new one so it was
checking whether the previous buffer was busy and this is irrelevant.

In practice I think this wouldn't matter too much because the upper
layers of Mesa always call Driver->FreeTextureImageBuffer before
calling Driver->TexImage anyway so there would never be a buffer that
could be busy.
---
 src/mesa/drivers/dri/i965/intel_tex_image.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 85d3d04..2874e5b 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -95,8 +95,6 @@ intelTexImage(struct gl_context * ctx,
struct intel_texture_image *intelImage = intel_texture_image(texImage);
bool ok;
 
-   bool tex_busy = intelImage->mt && drm_intel_bo_busy(intelImage->mt->bo);
-
DBG("%s mesa_format %s target %s format %s type %s level %d %dx%dx%d\n",
__func__, _mesa_get_format_name(texImage->TexFormat),
_mesa_lookup_enum_by_nr(texImage->TexObject->Target),
@@ -116,7 +114,8 @@ intelTexImage(struct gl_context * ctx,
texImage->Depth,
format, type, pixels,
false /*allocate_storage*/,
-   tex_busy, unpack);
+   false /*create_pbo*/,
+   unpack);
if (ok)
   return;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta: Abort texture upload if pixels == null and no pixel unpack buffer set

2015-06-12 Thread Neil Roberts
Neil Roberts  writes:

> It seems a bit weird to use create_pbo=true at all for
> glTexImage{1,2,3}D because in that case we are completely replacing
> the texture. If the texture's buffer is busy instead of allocating a
> PBO we might as well just directly allocate some new storage for the
> texture and abandon the old storage. That would be a separate patch
> though and either way I think this patch makes sense.

I've filed a patch for this:

http://patchwork.freedesktop.org/patch/51757/

I think it would be good to land your patch as well anyway.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix aligning to the block size in intel_miptree_copy_slice

2015-06-12 Thread Neil Roberts
Hi,

Could you perhaps send me a list of the regressions? I ran it through on
my Skylake and the only changes I see are the two FXT1 tests. However, I
have to admit that my device is being a bit flakey so I cheated a little
bit to make this work. Originally when I compared the differences with
and without the patch I had about 150 differences (some fixes, some
regressions) that were pretty random. I got the list of these with
"piglit summary console -d" and then reran just those tests (with the
--test-list option) after a reboot with and without the patch. I then
have a noddy Python script to merge a Piglit results file so that it
replaces test results from the first file with any that appear in the
second file. Once I do that then there are no regressions and only the
two FXT1 tests appear in the fixes.

It might be interesting to see if you can run some of the regressing
tests by hand after a reboot to see if they fail consistently.

Regards,
- Neil

Nanley Chery  writes:

> Hey Neil,
>
> While this patch does fix FXT1, it also regresses 21 other Piglit tests on 
> SKL.
>
> - Nanley
>
> On Thu, Jun 11, 2015 at 8:59 AM, Neil Roberts  wrote:
>> This function was trying to align the width and height to a multiple
>> of the block size for compressed textures. It was using align_w/h as a
>> shortcut to get the block size as up until Gen9 this always happens to
>> match. However in Gen9+ the alignment values are expressed as
>> multiples of the block size so in effect the alignment values are
>> always 4 for compressed textures as that is the minimum value we can
>> pick. This happened to work for most compressed formats because the
>> block size is also 4, but for FXT1 this was breaking because it has a
>> block width of 8.
>>
>> This fixes some Piglit tests testing FXT1 such as
>>
>> spec@3dfx_texture_compression_fxt1@fbo-generatemipmap-formats
>> ---
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
>> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> index 615cbfb..6c00581 100644
>> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> @@ -1168,8 +1168,10 @@ intel_miptree_copy_slice(struct brw_context *brw,
>> assert(src_mt->format == dst_mt->format);
>>
>> if (dst_mt->compressed) {
>> -  height = ALIGN(height, dst_mt->align_h) / dst_mt->align_h;
>> -  width = ALIGN(width, dst_mt->align_w);
>> +  unsigned int i, j;
>> +  _mesa_get_format_block_size(dst_mt->format, &i, &j);
>> +  height = ALIGN(height, j) / j;
>> +  width = ALIGN(width, i);
>> }
>>
>> /* If it's a packed depth/stencil buffer with separate stencil, the blit
>> --
>> 1.9.3
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Don't create a temp PBO when uploading data from glTexImage*

2015-06-12 Thread Neil Roberts
> On Fri, Jun 12, 2015 at 7:34 AM, Neil Roberts  wrote:
>> The code was buggy anyway because it was checking whether the buffer
>> was busy before calling Driver->AllocTextureImageBuffer. That function
>> actually always frees the buffer and recreates a new one so it was
>> checking whether the previous buffer was busy and this is irrelevant.

Jason Ekstrand  writes:

> I'm not sure this is correct.  You can still do partial updates with
> TexImage if you are updating a single miplevel or cube face.

Ah yes, you're right. I didn't look into the code far enough. When it
calls Driver->AllocTextureImageBuffer it will likely get the same
miptree back because there is a pointer to it from the texture object.

However, it still seems like creating a PBO would be counter-productive
most of the time because you are most probably going to replace all of
the mipmaps if you replace one of them. Maybe a good thing to do would
be to check whether intel_texobj->mt is busy in
intel_alloc_texture_image_buffer and then let it allocate a new mt in
that case. That way in the likely case that the rest of the images are
updated too then it would use a new buffer and avoid blocking without
introducing an extra copy. If it turns out that the application really
is just updating one image then it will be copied in anyway when the
texture is finalized which achieves more or less the same thing as using
a PBO.

>> In practice I think this wouldn't matter too much because the upper
>> layers of Mesa always call Driver->FreeTextureImageBuffer before
>> calling Driver->TexImage anyway so there would never be a buffer that
>> could be busy.

I think this point still holds and means it's effectively impossible to
hit the create_pbo=true case.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: correct alignment units for 2D compressed textures on Skylake

2015-06-15 Thread Neil Roberts
Nanley Chery  writes:

> Although most of the patch is incorrect, the following change is still
> necessary isn't it?
> if (mt->compressed) {
>mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
> - ALIGN(minify(mt->physical_width0, 2), bw);
> + ALIGN(minify(mt->physical_width0, 2), mt->align_w);
>
> Each LOD is aligned by the alignment term. I don't see why we're
> switching to block width here.

I'm pretty sure the original version is correct and we do want to align
to the block width here. This bit of code is not being used to work out
the position of an image so it doesn't need to take into account the
image alignment. It is only being used to calculate the mt->total_width
value. That value is not aligned to the image alignment. You can see
this because it will usually just be directly taken from
mt->physical_width0 and that is not aligned. In the non-compressed case
just below you can see that the second half of the addition is not
aligned to anything at all. The total_width does however need to be at
least a multiple of the block size because it isn't possible to allocate
space for half of a block.

If I understand correctly this code is just trying to cope with cases
where the third image in the mipmap is positioned so that it extends
past the width of the first image. If that happens then the width of the
image containing all of the mipmap images needs to be extended slightly
or it would crop the third mipmap image.

Please see the attached SVG.

Looking at it a bit more I think this bit above is wrong:

   if (mt->compressed) {
   mt->total_width = ALIGN(mt->physical_width0, mt->align_w);
   }

That should also be using bw instead of mt->align_w for the same reason.
I think it could only end up making the total_width a bit larger than
necessary so it probably isn't causing any actual problems.

I'll make a patch and test it.

Regards,
- Neil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/skl: Fix aligning mt->total_width to the block size

2015-06-16 Thread Neil Roberts
brw_miptree_layout_2d tries to ensure that mt->total_width is a
multiple of the compressed block size, presumably because it wouldn't
be possible to make an image that has a fraction of a block. However
it was doing this by aligning mt->total_width to align_w. Previously
align_w has been used as a shortcut for getting the block width
because before Gen9 the block width was always equal to the alignment.
Commit 4ab8d59a2 tried to fix these cases to use the block width
instead of the alignment but it missed this case.

I think in practice this probably won't make any difference because
the buffer for the texture will be allocated to be large enough to
contain the entire pitch and libdrm aligns the pitch to the tile width
anyway. However I think the patch is worth having to make the
intention clearer.
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 1e7d8a1..dbb6cef 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -366,9 +366,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 
mt->total_width = mt->physical_width0;
 
-   if (mt->compressed) {
-   mt->total_width = ALIGN(mt->physical_width0, mt->align_w);
-   }
+   if (mt->compressed)
+   mt->total_width = ALIGN(mt->total_width, bw);
 
/* May need to adjust width to accommodate the placement of
 * the 2nd mipmap.  This occurs when the alignment
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Fix strict-aliasing violations in GLSL shader list implementation

2015-06-24 Thread Neil Roberts
Hi,

If we are going to have to prod all of the code using this list
implementation to solve this problem anyway maybe it would make more
sense to switch to a kernel-style list instead. There is already an
implementation of this in src/util/list.h. I think this style of list
only accesses the pointers through a single struct so there shouldn't be
any aliasing issues. It should also end up using one fewer pointer to
store the head sentinel. It would be nice to reduce the number of list
implementations used in Mesa.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/skl: Fix aligning mt->total_width to the block size

2015-06-24 Thread Neil Roberts
Ben Widawsky  writes:

> I think this is beginning to infringe upon the definition of align_w.
> The total width is a function of it's miptree properties and not the
> compressed block properties, right?
>
> In other words, if there is a case where align_w != bw, I think
> total_width should be aligned to align_w, NOT bw.

I don't think it's so clear cut. In practice the mt->total_width doesn't
really need to be aligned to anything because as far as I can tell it is
only used to calculate the row stride. The row stride is separately
aligned to whatever constraints necessary by libdrm so it doesn't really
matter what we pick here.

The reason I think that the intention was to align it to the block width
rather than the horizontal alignment is that in the non-compressed case
the total width isn't aligned to anything at all.

It's probably not worth making too much of a fuss over this patch seeing
as it doesn't make any practical difference. I'm happy to forget about
it and pretend I never noticed the inconsistency.

Regards,
- Neil

>
> (I'm not opposed to the patch, just making sure I understand.)
>
>> ---
>>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
>> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
>> index 1e7d8a1..dbb6cef 100644
>> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
>> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
>> @@ -366,9 +366,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
>>  
>> mt->total_width = mt->physical_width0;
>>  
>> -   if (mt->compressed) {
>> -   mt->total_width = ALIGN(mt->physical_width0, mt->align_w);
>> -   }
>> +   if (mt->compressed)
>> +   mt->total_width = ALIGN(mt->total_width, bw);
>>  
>> /* May need to adjust width to accommodate the placement of
>>  * the 2nd mipmap.  This occurs when the alignment
>> -- 
>> 1.9.3
>> 
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Don't try to print the GLSL IR if it has been freed

2015-06-26 Thread Neil Roberts
Since commit 104c8fc2c2aa5621261f8 the GLSL IR will be freed if NIR is
being used. This was causing it to segfault if INTEL_DEBUG=wm is set.
This patch just makes it avoid dumping the GLSL IR in that case.
---
 src/mesa/drivers/dri/i965/brw_program.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index 2327af7..85e271d 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -574,10 +574,13 @@ brw_dump_ir(const char *stage, struct gl_shader_program 
*shader_prog,
 struct gl_shader *shader, struct gl_program *prog)
 {
if (shader_prog) {
-  fprintf(stderr,
-  "GLSL IR for native %s shader %d:\n", stage, shader_prog->Name);
-  _mesa_print_ir(stderr, shader->ir, NULL);
-  fprintf(stderr, "\n\n");
+  if (shader->ir) {
+ fprintf(stderr,
+ "GLSL IR for native %s shader %d:\n",
+ stage, shader_prog->Name);
+ _mesa_print_ir(stderr, shader->ir, NULL);
+ fprintf(stderr, "\n\n");
+  }
} else {
   fprintf(stderr, "ARB_%s_program %d ir for native %s shader\n",
   stage, prog->Id, stage);
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Don't use the pixel interpolater for centroid interpolation

2015-06-30 Thread Neil Roberts
For centroid interpolation we can just directly use the values set up
in the shader payload instead of querying the pixel interpolator. To
do this we need to modify brw_compute_barycentric_interp_modes to
detect when interpolateAtCentroid is called.

This fixes the interpolateAtCentroid tests on SKL in Piglit but it is
probably cheating a bit because there still seems to be some
underlying problem with using the pixel interpolater and this patch
just avoids it. The other interpolateAt* tests are still failing.
---

I'm not really sure if we want this patch because although it does
make using interpolateAtCentroid more efficient I don't think we know
of anything that is actually using that so it's not really a priority
to optimise. It adds a bit of complexity to the compiler so maybe we
are better off without it? I'm mostly just posting it to gauge
opinion.

 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 42 
 src/mesa/drivers/dri/i965/brw_wm.c   | 55 
 2 files changed, 83 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 59081ea..fcde545 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1217,6 +1217,21 @@ fs_visitor::emit_percomp(const fs_builder &bld, const 
fs_inst &inst,
}
 }
 
+/* For most messages, we need one reg of ignored data; the hardware requires
+ * mlen==1 even when there is no payload. in the per-slot offset case, we'll
+ * replace this with the proper source data.
+ */
+static void
+setup_pixel_interpolater_instruction(nir_intrinsic_instr *instr,
+ fs_inst *inst,
+ int mlen = 1)
+{
+  inst->mlen = mlen;
+  inst->regs_written = 2; /* 2 floats per slot returned */
+  inst->pi_noperspective = instr->variables[0]->var->data.interpolation ==
+   INTERP_QUALIFIER_NOPERSPECTIVE;
+}
+
 void
 fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr 
*instr)
 {
@@ -1469,19 +1484,21 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
 
   fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
 
-  /* For most messages, we need one reg of ignored data; the hardware
-   * requires mlen==1 even when there is no payload. in the per-slot
-   * offset case, we'll replace this with the proper source data.
-   */
   fs_reg src = vgrf(glsl_type::float_type);
-  int mlen = 1; /* one reg unless overriden */
   fs_inst *inst;
 
   switch (instr->intrinsic) {
-  case nir_intrinsic_interp_var_at_centroid:
- inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_CENTROID,
- dst_xy, src, fs_reg(0u));
+  case nir_intrinsic_interp_var_at_centroid: {
+ enum brw_wm_barycentric_interp_mode interp_mode;
+ if (instr->variables[0]->var->data.interpolation ==
+ INTERP_QUALIFIER_NOPERSPECTIVE)
+interp_mode = BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC;
+ else
+interp_mode = BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC;
+ uint8_t reg = payload.barycentric_coord_reg[interp_mode];
+ dst_xy = fs_reg(brw_vec16_grf(reg, 0));
  break;
+  }
 
   case nir_intrinsic_interp_var_at_sample: {
  /* XXX: We should probably handle non-constant sample id's */
@@ -1490,6 +1507,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
  unsigned msg_data = const_sample ? const_sample->i[0] << 4 : 0;
  inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_SAMPLE, dst_xy, src,
  fs_reg(msg_data));
+ setup_pixel_interpolater_instruction(instr, inst);
  break;
   }
 
@@ -1502,6 +1520,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
 
 inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET, dst_xy, 
src,
 fs_reg(off_x | (off_y << 4)));
+setup_pixel_interpolater_instruction(instr, inst);
  } else {
 src = vgrf(glsl_type::ivec2_type);
 fs_reg offset_src = retype(get_nir_src(instr->src[0]),
@@ -1531,9 +1550,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
bld.SEL(offset(src, i), itemp, fs_reg(7)));
 }
 
-mlen = 2;
 inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET, dst_xy, 
src,
 fs_reg(0u));
+setup_pixel_interpolater_instruction(instr, inst, 2);
  }
  break;
   }
@@ -1542,11 +1561,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
  unreachable("Invalid intrinsic");
   }
 
-  inst->mlen = mlen;
-  inst->regs_written = 2; /* 2 floa

Re: [Mesa-dev] [PATCH] i965/fs: Don't use the pixel interpolater for centroid interpolation

2015-07-01 Thread Neil Roberts
Ben Widawsky  writes:

> I am not the right person to judge the complexity tradeoff, but it
> seems like a worthwhile patch to me. I spent a few minutes thinking
> about how it could hurt performance and was unable to come up with
> anything.

Thanks. I was thinking more that the complexity means more code to
maintain rather than that it would be a performance hit. If we have this
patch and another patch to make it not fall back to SIMD8 then using
interpolateAtCentroid ends up being exactly the same as using the
centroid interpolation qualifier on the varying except that it is more
convenient because you don't need to put it in the vertex shader as
well. However I think centroid interpolation is a pretty obscure feature
that is only useful if you're doing multisampling with an algorithm that
breaks if it is used for samples that lie outside of the polygon, so
maybe it's not really worth worrying about.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] glsl: fix some strict aliasing issues in exec_list

2015-07-01 Thread Neil Roberts
Hi,

If we wanted to avoid growing the size of exec_list to four pointers
instead of three, maybe we could store it in a union like below:

struct exec_list {
   union {
  struct {
 struct exec_node head_sentinel;
 struct exec_node *dummy_pointer_a;
  };
  struct {
 struct exec_node *dummy_pointer_b;
 struct exec_node tail_sentinel;
  };
   };
   /* ... */
};

This is using anonymous structs and unions so that you wouldn't have to
otherwise modify the patch. However we might have to avoid anonymous
structs and unions on MSVC. If that's the case then it shouldn't be too
much effort to just name them and modify all of the code that is
accessing them.

You could even put a static assert somewhere to make sure it works like
this:

  STATIC_ASSERT(offsetof(struct exec_list, tail_sentinel.next) ==
offsetof(struct exec_list, head_sentinel.prev));

It looks like this might also require removing the constructor for
exec_node because you can't have objects with constructors in a union
apparently. If the constructor is just a safety net anyway then maybe
this won't cause any trouble. I don't know.

Also just to note, your patch doesn't apply with git-am. Have you
cut-and-paste the patch file into an email? It looks like something has
done some word wrapping which has corrupted it. Eg:

diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp
index cfe0df3..f5efcc5 100644
--- a/src/glsl/ir_validate.cpp
+++ b/src/glsl/ir_validate.cpp
@@ -842,8 +842,9 @@ ir_validate::visit_enter(ir_call *ir)
abort();
 }

-   const exec_node *formal_param_node = callee->parameters.head;
-   const exec_node *actual_param_node = ir->actual_parameters.head;
+   const exec_node *formal_param_node =
callee->parameters.head_sentinel.next;

This isn't a valid patch because all of the lines have to begin with
either +, - or a space. If you can't use git-send-email for whatever
reason, maybe you can find a way to disable the word wrapping?

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] glsl: fix some strict aliasing issues in exec_list

2015-07-02 Thread Neil Roberts
Davin McCall  writes:

> I actually had thought about this, but technically, you can only use
> unions for type aliasing if you perform all accesses (that are not to
> the 'active' member) through the union. All the list processing code
> that iterates through all the nodes including the tail sentinel would
> *technically* still have an aliasing problem, because it doesn't go
> through the exec_list structure at all (though in practice, I don't
> think it would ever manifest in as an issue in the compiled code).

I don't think it matters that the list iterating code doesn't use
exec_list. If something modifies the list pointers through an exec_node
then the compiler will know that that potentially aliases the pointers
in an exec_list because the pointers in exec_list are also wrapped in an
exec_node. With your patch there is no type aliasing at all and the
union modification doesn't alter that.

> Why must all accesses go through the union? Consider the case:
>
>  int someMethod(float *f, int *i) {
>  *f = 4.0;// LINE A
>  int a = *i;   // LINE B
>  return a;
>  }
>
> If I had some union 'u' with { float ff; int ii; } then I could call
> the above method, even if it was in a different module, with:
>
>  someMethod(&u.ff, &u.ii).
>
> Now, that would mean that the compiler would not be allowed to
> re-order LINE A and LINE B. But as I said, someMethod might be in a
> different module, where the compiler does not know that 'i' and 'f'
> point to members of the same union. In that case it assumes that 'i'
> and 'f' don't alias. Compare that to:
>
>  int someMethod(union u *a, union u *b)
>  {
>  u->ff = 4.0;
>  int a = u->ii;
>  return a;
>  }
>
> In this version, accesses are through the union, and the compiler
> knows that they potentially alias.

I don't think this example is relevant in this case because all of the
relevant members of the union I suggested are the same type (struct
exec_node). There is no type aliasing.

Maybe a hypothetical problem with this sort of use could be if you had a
function like this:

struct exec_node *
some_method(struct exec_node *a,
struct exec_node *b)
{
   a->prev = &something;
   b->next = &something_else;
   return a->prev;
}

If you called this with the head and tail sentinels then the compiler
won't know that a->prev and b->next alias each other so it might return
&something instead of &something_else. However in practice for this use
case the only part that is aliased is a NULL pointer that is never
written to so I don't think it would actually matter.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator

2015-07-02 Thread Neil Roberts
There was a comment saying that in SIMD16 mode the pixel interpolator
returns coords interleaved 8 channels at a time and that this requires
extra work to support. However, this interleaved format is exactly
what the PLN instruction requires so I don't think anything needs to
be done to support it apart from removing the line to disable it and
to ensure that the message lengths for the send message are correct.

I am more convinced that this is correct because as it says in the
comment this interleaved output is identical to what is given in the
thread payload. The code generated to apply the plane equation to
these coordinates is identical on SIMD16 and SIMD8 except that the
dispatch width is larger which implies no special unmangling is
needed.

Perhaps the confusion stems from the fact that the description of the
PLN instruction in the IVB PRM seems to imply that the src1 inputs are
not interleaved so it wouldn't work. However, in the HSW and BDW PRMs,
the pseudo-code is different and looks like it expects the interleaved
format. Mesa doesn't seem to generate different code on IVB to
uninterleave the payload registers and everything is working so I can
only assume that the PRM is wrong.

I tested the interpolateAt tests on HSW and did a full Piglit run on
IVB on there were no regressions.
---

I've CC'd Chris Forbes because according to git-annotate he wrote the
original comment so he might know something I don't.

 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 59081ea..717e597 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1461,12 +1461,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
case nir_intrinsic_interp_var_at_centroid:
case nir_intrinsic_interp_var_at_sample:
case nir_intrinsic_interp_var_at_offset: {
-  /* in SIMD16 mode, the pixel interpolator returns coords interleaved
-   * 8 channels at a time, same as the barycentric coords presented in
-   * the FS payload. this requires a bit of extra work to support.
-   */
-  no16("interpolate_at_* not yet supported in SIMD16 mode.");
-
   fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
 
   /* For most messages, we need one reg of ignored data; the hardware
@@ -1531,7 +1525,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
bld.SEL(offset(src, i), itemp, fs_reg(7)));
 }
 
-mlen = 2;
+mlen = 2 * dispatch_width / 8;
 inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET, dst_xy, 
src,
 fs_reg(0u));
  }
@@ -1543,7 +1537,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
   }
 
   inst->mlen = mlen;
-  inst->regs_written = 2; /* 2 floats per slot returned */
+  /* 2 floats per slot returned */
+  inst->regs_written = 2 * dispatch_width / 8;
   inst->pi_noperspective = instr->variables[0]->var->data.interpolation ==
INTERP_QUALIFIER_NOPERSPECTIVE;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRA

2015-07-03 Thread Neil Roberts
On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
the shader sends a message to the pixel interpolator. This fixes the
interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
but that is not implemented anywhere so it's not a regression.
---
 src/mesa/drivers/dri/i965/brw_context.h   | 1 +
 src/mesa/drivers/dri/i965/brw_defines.h   | 1 +
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp  | 4 
 src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +++
 4 files changed, 9 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3553f6e..7596139 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -415,6 +415,7 @@ struct brw_wm_prog_data {
bool uses_pos_offset;
bool uses_omask;
bool uses_kill;
+   bool pulls_bary;
uint32_t prog_offset_16;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 66b9abc..19489ab 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2145,6 +2145,7 @@ enum brw_pixel_shader_computed_depth_mode {
 # define GEN8_PSX_SHADER_DISABLES_ALPHA_TO_COVERAGE (1 << 7)
 # define GEN8_PSX_SHADER_IS_PER_SAMPLE  (1 << 6)
 # define GEN8_PSX_SHADER_COMPUTES_STENCIL   (1 << 5)
+# define GEN9_PSX_SHADER_PULLS_BARY (1 << 3)
 # define GEN8_PSX_SHADER_HAS_UAV(1 << 2)
 # define GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK   (1 << 1)
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index bd71404..3ebc3a2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1481,6 +1481,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
case nir_intrinsic_interp_var_at_centroid:
case nir_intrinsic_interp_var_at_sample:
case nir_intrinsic_interp_var_at_offset: {
+  assert(stage == MESA_SHADER_FRAGMENT);
+
+  ((struct brw_wm_prog_data *) prog_data)->pulls_bary = true;
+
   fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
 
   /* For most messages, we need one reg of ignored data; the hardware
diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c 
b/src/mesa/drivers/dri/i965/gen8_ps_state.c
index a88f109..d544509 100644
--- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
@@ -58,6 +58,9 @@ gen8_upload_ps_extra(struct brw_context *brw,
if (prog_data->uses_omask)
   dw1 |= GEN8_PSX_OMASK_TO_RENDER_TARGET;
 
+   if (brw->gen >= 9 && prog_data->pulls_bary)
+  dw1 |= GEN9_PSX_SHADER_PULLS_BARY;
+
if (_mesa_active_fragment_shader_has_atomic_ops(&brw->ctx))
   dw1 |= GEN8_PSX_SHADER_HAS_UAV;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] glsl: Add missing check for whether an expression is an add operation

2015-07-04 Thread Neil Roberts
There is a piece of code that is trying to match expressions of the
form (mul (floor (add (abs x) 0.5) (sign x))). However the check for
the add expression wasn't checking whether it had the expected
operation. It looks like this was just an oversight because it doesn't
match the pattern for the rest of the code snippet. The existing line
to check whether add_expr!=NULL was added as part of a coverity fix in
3384179f.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91226
Cc: Matt Turner 
Cc: "10.6" 
---
 src/glsl/opt_algebraic.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index fa5db70..9b8a426 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -580,7 +580,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 continue;
 
  ir_expression *add_expr = floor_expr->operands[0]->as_expression();
- if (!add_expr)
+ if (!add_expr || add_expr->operation != ir_binop_add)
 continue;
 
  for (int j = 0; j < 2; j++) {
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] glsl: Make sure not to dereference NULL

2015-07-04 Thread Neil Roberts
In this bit of code point_five can be NULL if the expression is not a
constant. This fixes it to match the pattern of the rest of the chunk
of code so that it checks for NULLs.

Cc: Matt Turner 
Cc: "10.6" 
---
 src/glsl/opt_algebraic.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 9b8a426..c4b8715 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -589,7 +589,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
continue;
 
 ir_constant *point_five = add_expr->operands[1 - j]->as_constant();
-if (!point_five->is_value(0.5, 0))
+if (!point_five || !point_five->is_value(0.5, 0))
continue;
 
 if (abs_expr->operands[0]->equals(sign_expr->operands[0])) {
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965/fs: Don't use the pixel interpolater for centroid interpolation

2015-07-09 Thread Neil Roberts
For centroid interpolation we can just directly use the values set up
in the shader payload instead of querying the pixel interpolator. To
do this we need to modify brw_compute_barycentric_interp_modes to
detect when interpolateAtCentroid is called.

v2: Rebase on top of changes to set the pulls bary bit on SKL
---

As an aside, I was deliberating over whether to call the function
set_up_blah instead of setup_blah because I think the former is more
correct. The rest of Mesa seems to use setup so maybe it's more
important to be consistent than correct.

 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 52 +++---
 src/mesa/drivers/dri/i965/brw_wm.c   | 55 
 2 files changed, 88 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 5d1ea21..fd7f1b8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1238,6 +1238,25 @@ fs_visitor::emit_percomp(const fs_builder &bld, const 
fs_inst &inst,
}
 }
 
+/* For most messages, we need one reg of ignored data; the hardware requires
+ * mlen==1 even when there is no payload. in the per-slot offset case, we'll
+ * replace this with the proper source data.
+ */
+static void
+setup_pixel_interpolater_instruction(fs_visitor *v,
+ nir_intrinsic_instr *instr,
+ fs_inst *inst,
+ int mlen = 1)
+{
+  inst->mlen = mlen;
+  inst->regs_written = 2 * v->dispatch_width / 8;
+  inst->pi_noperspective = instr->variables[0]->var->data.interpolation ==
+   INTERP_QUALIFIER_NOPERSPECTIVE;
+
+  assert(v->stage == MESA_SHADER_FRAGMENT);
+  ((struct brw_wm_prog_data *) v->prog_data)->pulls_bary = true;
+}
+
 void
 fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr 
*instr)
 {
@@ -1482,25 +1501,23 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
case nir_intrinsic_interp_var_at_centroid:
case nir_intrinsic_interp_var_at_sample:
case nir_intrinsic_interp_var_at_offset: {
-  assert(stage == MESA_SHADER_FRAGMENT);
-
-  ((struct brw_wm_prog_data *) prog_data)->pulls_bary = true;
-
   fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
 
-  /* For most messages, we need one reg of ignored data; the hardware
-   * requires mlen==1 even when there is no payload. in the per-slot
-   * offset case, we'll replace this with the proper source data.
-   */
   fs_reg src = vgrf(glsl_type::float_type);
-  int mlen = 1; /* one reg unless overriden */
   fs_inst *inst;
 
   switch (instr->intrinsic) {
-  case nir_intrinsic_interp_var_at_centroid:
- inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_CENTROID,
- dst_xy, src, fs_reg(0u));
+  case nir_intrinsic_interp_var_at_centroid: {
+ enum brw_wm_barycentric_interp_mode interp_mode;
+ if (instr->variables[0]->var->data.interpolation ==
+ INTERP_QUALIFIER_NOPERSPECTIVE)
+interp_mode = BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC;
+ else
+interp_mode = BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC;
+ uint8_t reg = payload.barycentric_coord_reg[interp_mode];
+ dst_xy = fs_reg(brw_vec16_grf(reg, 0));
  break;
+  }
 
   case nir_intrinsic_interp_var_at_sample: {
  /* XXX: We should probably handle non-constant sample id's */
@@ -1509,6 +1526,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
  unsigned msg_data = const_sample ? const_sample->i[0] << 4 : 0;
  inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_SAMPLE, dst_xy, src,
  fs_reg(msg_data));
+ setup_pixel_interpolater_instruction(this, instr, inst);
  break;
   }
 
@@ -1521,6 +1539,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
 
 inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET, dst_xy, 
src,
 fs_reg(off_x | (off_y << 4)));
+setup_pixel_interpolater_instruction(this, instr, inst);
  } else {
 src = vgrf(glsl_type::ivec2_type);
 fs_reg offset_src = retype(get_nir_src(instr->src[0]),
@@ -1550,9 +1569,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
bld.SEL(offset(src, bld, i), itemp, fs_reg(7)));
 }
 
-mlen = 2 * dispatch_width / 8;
 inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET, dst_xy, 
src,
 fs_reg(0u));
+setup_pixel_interpolater_instruction(this, instr, inst,
+ 2 * dispatch_width / 8);
  }
  break;
 

[Mesa-dev] [PATCH] i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used

2015-07-10 Thread Neil Roberts
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: "10.6 10.5" 
---
 src/mesa/drivers/dri/i965/gen8_draw_upload.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c 
b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
index 1af90ec..65b7625 100644
--- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
@@ -218,13 +218,24 @@ gen8_emit_vertices(struct brw_context *brw)
}
ADVANCE_BATCH();
 
-   for (unsigned i = 0; i < brw->vb.nr_enabled; i++) {
+   for (unsigned i = 0, j = 0; i < brw->vb.nr_enabled; i++) {
   const struct brw_vertex_element *input = brw->vb.enabled[i];
   const struct brw_vertex_buffer *buffer = &brw->vb.buffers[input->buffer];
+  unsigned element_index;
+
+  /* The edge flag element is reordered to be the last one in the code
+   * above so we need to compensate for that in the element indices used
+   * below.
+   */
+  if (input == gen6_edgeflag_input)
+ element_index = brw->vb.nr_enabled - 1;
+  else
+ element_index = j++;
 
   BEGIN_BATCH(3);
   OUT_BATCH(_3DSTATE_VF_INSTANCING << 16 | (3 - 2));
-  OUT_BATCH(i | (buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0));
+  OUT_BATCH(element_index |
+(buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0));
   OUT_BATCH(buffer->step_rate);
   ADVANCE_BATCH();
}
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965/fs: Don't use the pixel interpolater for centroid interpolation

2015-07-13 Thread Neil Roberts
Chris Forbes  writes:

> Nitpicks aside, I don't think this is a great idea now that you've got
> the SKL PI working.

Can you explain why you don't think this is a good idea? Is it because
it is an optimisation for something that is not known to be a big use
case so carrying around the extra code just adds unnecessary maintenance
burden? I could agree with that so I'm happy to abandon the patch for
now if that's the general consensus.

> I also think it's broken -- you need to arrange to have the centroid
> barycentric coords delivered to the FS thread, which won't be
> happening if this is the *only* use of them. Masked in the tests,
> because they compare with a centroid-qualified input. [I'm assuming
> you don't always get these delivered to the FS in SKL, but no docs
> access...]

The changes to brw_compute_barycentric_interp_modes in the patch ensure
that the centroid barycentric coords are delivered whenever
interpolateAtCentroid is used in a shader. I don't think this is a
problem. At least it seems to work in a simple test without using a
separate varying with the centroid qualifier.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] i965/bdw: Fix setting the instancing state for the SGVS element

2015-07-13 Thread Neil Roberts
When gl_VertexID or gl_InstanceID is used a 3DSTATE_VF_SGVS
instruction is sent to create a sort of element to store the generated
values. The last instruction in this chunk of code looks like it was
trying to set the instancing state for the element using the
3DSTATE_VF_INSTANCING instruction. However it was sending
brw->vb.nr_buffers instead of the element index. This instruction is
supposed to take an element index and that is how it is used further
down in the function so the previous code looks wrong. Perhaps
previously the number of buffers coincidentally matched the number of
enabled elements so the value was generally correct anyway. In a
subsequent patch I want to change a bit how it chooses the SGVS
element index so this needs to be fixed.

Cc: "10.6 10.5" 
---
 src/mesa/drivers/dri/i965/gen8_draw_upload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c 
b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
index 1af90ec..f7d9952 100644
--- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
@@ -74,7 +74,7 @@ gen8_emit_vertices(struct brw_context *brw)
 
   BEGIN_BATCH(3);
   OUT_BATCH(_3DSTATE_VF_INSTANCING << 16 | (3 - 2));
-  OUT_BATCH(brw->vb.nr_buffers | GEN8_VF_INSTANCING_ENABLE);
+  OUT_BATCH(vue | GEN8_VF_INSTANCING_ENABLE);
   OUT_BATCH(0);
   ADVANCE_BATCH();
} else {
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/3] i965: Fix problems with gl_Vertex/InstanceID and glPolygonMode

2015-07-13 Thread Neil Roberts
This series fixes these two related bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=84677
https://bugs.freedesktop.org/show_bug.cgi?id=91292

The first bug has had a fix for a while but until now I wasn't able to
figure out what to do on BDW. I had previously made a fix for the
second bug but as pointed out by the original bug reporter the two
bugs interact and the fixes would conflict. This series has a v2 of
the patch so that the two fixes work together.

I've posted a patch to make the Piglit testing more thorough here:

http://patchwork.freedesktop.org/patch/54320/

In the v1 for patch 3 Chris Forbes had a comment that it would be
better to move the edge flag element within brw->vb.* instead of while
emitting the state. I think this would be a good idea except it won't
help in this case now that it works with the SGVS element as well
because there is no input for that at all in brw->vb.*.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3 v2] i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used

2015-07-13 Thread Neil Roberts
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.

v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
when gl_VertexID is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: "10.6 10.5" 
---
 src/mesa/drivers/dri/i965/gen8_draw_upload.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c 
b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
index 2bac5ff..1b48643 100644
--- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
@@ -246,13 +246,24 @@ gen8_emit_vertices(struct brw_context *brw)
}
ADVANCE_BATCH();
 
-   for (unsigned i = 0; i < brw->vb.nr_enabled; i++) {
+   for (unsigned i = 0, j = 0; i < brw->vb.nr_enabled; i++) {
   const struct brw_vertex_element *input = brw->vb.enabled[i];
   const struct brw_vertex_buffer *buffer = &brw->vb.buffers[input->buffer];
+  unsigned element_index;
+
+  /* The edge flag element is reordered to be the last one in the code
+   * above so we need to compensate for that in the element indices used
+   * below.
+   */
+  if (input == gen6_edgeflag_input)
+ element_index = nr_elements - 1;
+  else
+ element_index = j++;
 
   BEGIN_BATCH(3);
   OUT_BATCH(_3DSTATE_VF_INSTANCING << 16 | (3 - 2));
-  OUT_BATCH(i | (buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0));
+  OUT_BATCH(element_index |
+(buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0));
   OUT_BATCH(buffer->step_rate);
   ADVANCE_BATCH();
}
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3 v2] i965: Swap the order of the vertex ID and edge flag attributes

2015-07-13 Thread Neil Roberts
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.

v2: Also implement for BDW+

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: "10.6 10.5" 
---
 src/mesa/drivers/dri/i965/brw_draw_upload.c  | 30 +++
 src/mesa/drivers/dri/i965/gen8_draw_upload.c | 56 +---
 2 files changed, 57 insertions(+), 29 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
b/src/mesa/drivers/dri/i965/brw_draw_upload.c
index 320e40e..67a8dfd 100644
--- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
@@ -787,21 +787,6 @@ static void brw_emit_vertices(struct brw_context *brw)
 ((i * 4) << BRW_VE1_DST_OFFSET_SHIFT));
}
 
-   if (brw->gen >= 6 && gen6_edgeflag_input) {
-  uint32_t format =
- brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray);
-
-  OUT_BATCH((gen6_edgeflag_input->buffer << GEN6_VE0_INDEX_SHIFT) |
-GEN6_VE0_VALID |
-GEN6_VE0_EDGE_FLAG_ENABLE |
-(format << BRW_VE0_FORMAT_SHIFT) |
-(gen6_edgeflag_input->offset << BRW_VE0_SRC_OFFSET_SHIFT));
-  OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << BRW_VE1_COMPONENT_0_SHIFT) |
-(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) |
-(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) |
-(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT));
-   }
-
if (brw->vs.prog_data->uses_vertexid || brw->vs.prog_data->uses_instanceid) 
{
   uint32_t dw0 = 0, dw1 = 0;
   uint32_t comp0 = BRW_VE1_COMPONENT_STORE_0;
@@ -842,6 +827,21 @@ static void brw_emit_vertices(struct brw_context *brw)
   OUT_BATCH(dw1);
}
 
+   if (brw->gen >= 6 && gen6_edgeflag_input) {
+  uint32_t format =
+ brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray);
+
+  OUT_BATCH((gen6_edgeflag_input->buffer << GEN6_VE0_INDEX_SHIFT) |
+GEN6_VE0_VALID |
+GEN6_VE0_EDGE_FLAG_ENABLE |
+(format << BRW_VE0_FORMAT_SHIFT) |
+(gen6_edgeflag_input->offset << BRW_VE0_SRC_OFFSET_SHIFT));
+  OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << BRW_VE1_COMPONENT_0_SHIFT) |
+(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) |
+(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) |
+(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT));
+   }
+
ADVANCE_BATCH();
 }
 
diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c 
b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
index f7d9952..2bac5ff 100644
--- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
@@ -40,16 +40,25 @@ gen8_emit_vertices(struct brw_context *brw)
 {
struct gl_context *ctx = &brw->ctx;
uint32_t mocs_wb = brw->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
+   bool uses_edge_flag;
 
brw_prepare_vertices(brw);
brw_prepare_shader_draw_parameters(brw);
 
+   uses_edge_flag = (ctx->Polygon.FrontMode != GL_FILL ||
+ ctx->Polygon.BackMode != GL_FILL);
+
if (brw->vs.prog_data->uses_vertexid || brw->vs.prog_data->uses_instanceid) 
{
   unsigned vue = brw->vb.nr_enabled;
 
-  WARN_ONCE(brw->vs.prog_data->inputs_read & VERT_BIT_EDGEFLAG,
-"Using VID/IID with edgeflags, need to reorder the "
-"vertex attributes");
+  /* The element for the edge flags must always be last, so we have to
+   * insert the SGVS before it in that case.
+   */
+  if (uses_edge_flag) {
+ assert(vue > 0);
+ vue--;
+  }
+
   WARN_ONCE(vue >= 33,
 "Trying to insert VID/IID past 33rd vertex element, "
 "need to reorder the vertex attrbutes.");
@@ -138,7 +147,18 @@ gen8_emit_vertices(struct brw_context *brw)
   ADVANCE_BATCH();
}
 
-   unsigned nr_elements = brw->vb.nr_enabled + 
brw->vs.prog_data->uses_vertexid;
+   /* Normally we don't need an element for the SGVS attribute because the
+* 3DSTATE_VF_SGVS instruction lets you store the generated attribute in an
+* element that is past the list in 3DSTATE_VERTEX_ELEMENTS. However if the
+* vertex ID is used then it needs an element for the base vertex buffer.
+* Additionally if there is an edge flag element then the SGVS can't be
+* inserted past that so we need a dummy element to ensure that the edge
+* flag is the last one.
+*/
+   bool needs_sgvs_element = (brw->vs.prog_data->uses_vertexid ||
+  (brw->vs.prog_data->use

[Mesa-dev] [PATCH] i965/fs: Handle non-const sample number in interpolateAtSample

2015-07-21 Thread Neil Roberts
If a non-const sample number is given to interpolateAtSample it will
now generate an indirect send message with the sample ID similar to
how non-const sampler array indexing works. Previously non-const
values were ignored and instead it ended up using a constant 0 value.

Note that unlike sampler array indexing I can't find any mention in
the GLSL spec about the sample ID having to be dynamically uniform.
That wouldn't be handled by this patch and I'm not sure how that could
be implemented unless we do something like sending a separate message
for each fragment (ie, 16 messages in SIMD16).

---
I've added a Piglit test for this here:

http://patchwork.freedesktop.org/patch/55028/

 src/mesa/drivers/dri/i965/brw_eu.h |  2 +-
 src/mesa/drivers/dri/i965/brw_eu_emit.c| 34 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  5 ++--
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 23 ++---
 4 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 761aa0e..0ac1ad9 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -461,7 +461,7 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
  struct brw_reg mrf,
  bool noperspective,
  unsigned mode,
- unsigned data,
+ struct brw_reg data,
  unsigned msg_length,
  unsigned response_length);
 
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 4d39762..25524d4 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3192,26 +3192,38 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
  struct brw_reg mrf,
  bool noperspective,
  unsigned mode,
- unsigned data,
+ struct brw_reg data,
  unsigned msg_length,
  unsigned response_length)
 {
const struct brw_device_info *devinfo = p->devinfo;
-   struct brw_inst *insn = next_insn(p, BRW_OPCODE_SEND);
+   struct brw_inst *insn;
+   uint16_t exec_size;
 
-   brw_set_dest(p, insn, dest);
-   brw_set_src0(p, insn, mrf);
-   brw_set_message_descriptor(p, insn, GEN7_SFID_PIXEL_INTERPOLATOR,
-  msg_length, response_length,
-  false /* header is never present for PI */,
-  false);
+   if (data.file == BRW_IMMEDIATE_VALUE) {
+  insn = next_insn(p, BRW_OPCODE_SEND);
+  brw_set_dest(p, insn, dest);
+  brw_set_src0(p, insn, mrf);
+  brw_set_message_descriptor(p, insn, GEN7_SFID_PIXEL_INTERPOLATOR,
+ msg_length, response_length,
+ false /* header is never present for PI */,
+ false);
+  brw_inst_set_pi_message_data(devinfo, insn, data.dw1.ud);
+   } else {
+  insn = brw_send_indirect_message(p,
+   GEN7_SFID_PIXEL_INTERPOLATOR,
+   dest,
+   mrf,
+   vec1(data));
+  brw_inst_set_mlen(devinfo, insn, msg_length);
+  brw_inst_set_rlen(devinfo, insn, response_length);
+   }
 
-   brw_inst_set_pi_simd_mode(
- devinfo, insn, brw_inst_exec_size(devinfo, insn) == BRW_EXECUTE_16);
+   exec_size = brw_inst_exec_size(devinfo, p->current);
+   brw_inst_set_pi_simd_mode(devinfo, insn, exec_size == BRW_EXECUTE_16);
brw_inst_set_pi_slot_group(devinfo, insn, 0); /* zero unless 32/64px 
dispatch */
brw_inst_set_pi_nopersp(devinfo, insn, noperspective);
brw_inst_set_pi_message_type(devinfo, insn, mode);
-   brw_inst_set_pi_message_data(devinfo, insn, data);
 }
 
 void
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index bae7216..3c92341 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1324,15 +1324,14 @@ fs_generator::generate_pixel_interpolator_query(fs_inst 
*inst,
 struct brw_reg msg_data,
 unsigned msg_type)
 {
-   assert(msg_data.file == BRW_IMMEDIATE_VALUE &&
-  msg_data.type == BRW_REGISTER_TYPE_UD);
+   assert(msg_data.type == BRW_REGISTER_TYPE_UD);
 
brw_pixel_interpolator_query(p,
  retype(dst, BRW_REGISTER_TYPE_UW),
  src,
  inst->pi_noperspective,
  msg_type,
- msg_data.dw1.ud,
+ msg_data,
  inst->mlen,
  inst->r

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Neil Roberts
Ben Widawsky  writes:

> The impetus for this patch comes from a seemingly benign statement within the
> spec (quoted within the patch). For me, this patch was at some point critical
> for getting stable piglit results (though this did not seem to be the case on 
> a
> branch Chad was working on).
>
> It is very important for clearing multiple color buffer attachments and can be
> observed in the following piglit tests:
> spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
>
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> +
>  1 file changed, 84 insertions(+), 13 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index 7bf52f0..9e6711e 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
>  }
>  
> +/**
> + * Individually fast clear each color buffer attachment. On previous gens 
> this
> + * isn't required. The motivation for this comes from one line (which seems 
> to
> + * be specific to SKL+). The list item is in section titled _MCS Buffer for
> + * Render Target(s)_
> + *
> + *   "Since only one RT is bound with a clear pass, only one RT can be 
> cleared
> + *   at a time. To clear multiple RTs, multiple clear passes are required."

This sentence also appears in the HSW PRM so it seems a bit odd if it's
only causing problems on SKL. I guess if we get Piglit regressions
without it then it makes sense to have the patch. It might be worth just
double checking whether this patch is completely necessary. The wording
in the commit message seems a little unsure.

> + *
> + * The code follows the same idea as the resolve code which creates a fake 
> FBO
> + * to avoid interfering with too much of the GL state.
> + */

What state are you trying to avoid interfering with? Maybe the resolve
code only creates an FBO because it needs to work on an abritrary
intel_mipmap_tree which might not already have an FBO associated with
it. It seems like it should be enough just to call
_mesa_meta_drawbuffers_from_bitfield(1 << index) instead of creating an
FBO. If you did that you could also make the loop a bit nicer by just
looping over the bits set in fast_clear_buffers with ffs. The
glDrawBuffers state is already saved as part of the meta save state and
the existing code for gen<9 also modifies it.

> +static void
> +fast_clear_attachments(struct brw_context *brw,
> +   struct gl_framebuffer *fb,
> +   uint32_t fast_clear_buffers,
> +   struct rect fast_clear_rect)
> +{
> +   assert(brw->gen >= 9);
> +   struct gl_context *ctx = &brw->ctx;
> +
> +   GLuint old_fb = ctx->DrawBuffer->Name;
> +
> +   for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
> +  struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
> +  struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> +  GLuint fbo, rbo;
> +  int index = fb->_ColorDrawBufferIndexes[buf];
> +
> +  if (!((1 << index) & fast_clear_buffers))
> + continue;
> +
> +  _mesa_GenFramebuffers(1, &fbo);
> +  rbo = brw_get_rb_for_slice(brw, irb->mt, 0, 0, false);
> +
> +  _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
> +  _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER,
> +GL_COLOR_ATTACHMENT0,
> +GL_RENDERBUFFER, rbo);
> +  _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0);
> +
> +  brw_fast_clear_init(brw);

Is is really necessary to call this for every buffer? It looks like it's
called once before fast_clear_attachments is called so maybe it doesn't
need to be in this function at all?

> +
> +  use_rectlist(brw, true);

Same for this.

> +
> +  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +
> +  /* SKL+ also has a resolve mode for compressed render targets and thus 
> more
> +   * bits to let us select the type of resolve.  For fast clear 
> resolves, it
> +   * turns out we can use the same value as pre-SKL though.
> +   */
> +  set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
> +  brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers));
> +  set_fast_clear_op(brw, 0);
> +  use_rectlist(brw, false);

And this.

> +
> +  _mesa_DeleteRenderbuffers(1, &rbo);
> +  _mesa_DeleteFramebuffers(1, &fbo);
> +
> +  /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
> +   * resolve them eventually.
> +   */
> +  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
> +   }
> +
> +   _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb);
> +}
> +
>  bool
>  b

Re: [Mesa-dev] [PATCH 09/10] i965/meta: Remove fast_clear_color variable

2015-10-14 Thread Neil Roberts
This patch doesn't look right. See this sentence in “Render Target Fast
Clear”:

“The pixel shader kernel requires no attributes, and delivers a value of
 0x in all channels of the render target write message”

Presumably the fast_clear_color is trying to implement this restriction.

Regards,
- Neil

Ben Widawsky  writes:

> It doesn't actually serve a purpose AFAICT (in fact, I'm not certain what it's
> meant to do).
>
> Cc: Kristian Høgsberg 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index 41afc9a..9c51ffb 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -390,8 +390,6 @@ set_fast_clear_color(struct brw_context *brw,
> }
>  }
>  
> -static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
> -
>  static void
>  set_fast_clear_op(struct brw_context *brw, uint32_t op)
>  {
> @@ -472,7 +470,7 @@ fast_clear_attachments(struct brw_context *brw,
>  
>use_rectlist(brw, true);
>  
> -  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
>  
>/* SKL+ also has a resolve mode for compressed render targets and thus 
> more
> * bits to let us select the type of resolve.  For fast clear 
> resolves, it
> @@ -670,7 +668,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
>fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect);
> } else if (fast_clear_buffers) {
>_mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers);
> -  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
>set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
>brw_draw_rectlist(ctx, &fast_clear_rect, layers);
>set_fast_clear_op(brw, 0);
> @@ -785,7 +783,7 @@ brw_meta_resolve_color(struct brw_context *brw,
>  
> use_rectlist(brw, true);
>  
> -   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +   brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
>  
> /* SKL+ also has a resolve mode for compressed render targets and thus 
> more
>  * bits to let us select the type of resolve.  For fast clear resolves, it
> -- 
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats

2015-10-14 Thread Neil Roberts
It would be nice if you could give some indication of where this list of
formats came from.

Unless we expect the list to change with future generations, maybe it
would be better to make it a static const table? It's a shame to grow
the context size unnecessarily.

Regards,
- Neil

Ben Widawsky  writes:

> Initially I had this planned as a patch to be squashed in to the enabling 
> patch
> because there is no point enabling fast clears without this. However, Chad
> merged a patch which disables fast clears on gen9 explicitly, and so I can 
> hide
> this behind the revert of that patch. This is a nice I really wanted this 
> patch
> as a distinct patch for review. This is a new, weird, and poorly documented
> restriction for SKL. (In fact, I am still not 100% certain the restriction is
> entirely necessary, but there are around 30 piglit regressions without this).
>
> SKL adds compressible render targets and as a result mutates some of the
> programming for fast clears and resolves. There is a new internal surface type
> called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary surface 
> is
> a CCS (Color Control Surface) with compression disabled or an MCS with
> compression enabled, depending on number of multisamples. MCS (Multisample
> Control Surface) is a special type of CCS."
>
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_context.h |  1 +
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 
> +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  8 ++--
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  3 +++
>  4 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index e59478a..32b8250 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1546,6 +1546,7 @@ struct brw_context
>  
> uint32_t render_target_format[MESA_FORMAT_COUNT];
> bool format_supported_as_render_target[MESA_FORMAT_COUNT];
> +   bool losslessly_compressable[MESA_FORMAT_COUNT];
>  
> /* Interpolation modes, one byte per vue slot.
>  * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+.
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 97fff60..d706ecc 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw)
>}
> }
>  
> +   if (brw->gen >= 9) {
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true;
> +   }
> +
> /* We will check this table for FBO completeness, but the surface format
>  * table above only covered color rendering.
>  */
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index 995b4dd..b19b492 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -243,8 +243,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
> * "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, 
> HALIGN
> *  16 must be used."
> */
> -  if (brw->gen >= 9 || mt->num_samples =

Re: [Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-14 Thread Neil Roberts
Looks good, it'll be great to get this landed. Patches 1-3 and 6-8 are:

Reviewed-by: Neil Roberts 

I've sent comments separately for 4, 5 and 9. Hopefully I can try to
help with patch 10 once my SKL machine arrives.

Regards,
- Neil

Ben Widawsky  writes:

> This patch series adds support for fast color clears on SKL as it exists on
> previous generations of hardware minus the new hardware restriction on surface
> formats. Additionally, it adds support for utilizing clear values with up to 
> 32b
> per color channel (see note at the bottom). It is based on work originally 
> done
> by Kristian, so thanks to him for that initial work as well as helping me 
> debug
> some of the issues.
>
> Additionally, thanks to Chad for helping track down the last bug in the 
> rectangle
> scaling code which was (for me) being masked by another bug (#3 below). I
> imagine it would have been several more weeks at least before I uncovered it.
>
> We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order 
> to
> support the 32b per channel. As it turned out though, Skylake made other 
> changes
> to support this which caused weird failures which seemed to interfere with
> each other.
>
> 1. Not all surface formats support lossless compression.
> 2. Clearing multiple color buffer attachments must happen in n passes
> 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
> was the bug which Chad helped uncover, I had it correct in my patch from March
> http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
> had other problems which prevented merge, including #1 and #2 above).
>
> I have no piglit, dEQP or CTS regressions (except for the last patch). I 
> haven't
> yet, but will collect perf data on this ASAP. Historically we've come to 
> expect
> this to provide large gains in tests which are memory bandwidth limited and
> doing many clears.
>
> Ben Widawsky (10):
>   i965/gen8+: Remove redundant zeroing of surface state
>   i965/gen8+: Extract color clear surface state
>   i965/skl: Enable fast color clears on SKL
>   i965/skl: skip fast clears for certain surface formats
>   i965/meta/gen9: Individually fast clear color attachments
>   Revert "i965/gen9: Disable MCS for 1x color surfaces"
>   Revert "i965/gen9: Enable rep clears on gen9"
>   i965/meta: Assert fast clears and rep clears never overlap
>   i965/meta: Remove fast_clear_color variable
>   i965/gen9: Support fast clears for 32b float
>
>  src/mesa/drivers/dri/i965/brw_context.h |   1 +
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 172 
> ++--
>  src/mesa/drivers/dri/i965/brw_surface_formats.c |  27 
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  48 ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  20 +--
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |   7 +-
>  6 files changed, 205 insertions(+), 70 deletions(-)
>
> -- 
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/glsl: Use shader_prog->Name for naming the NIR shader

2015-10-15 Thread Neil Roberts
Ping, could you please push this patch? It's a pain to use the optimise
debug output without it. Thanks.

Reviewed-by: Neil Roberts 

- Neil

Jason Ekstrand  writes:

> This has the better name to use. Aparently, sh->Name is usually 0.
> ---
>  src/glsl/nir/glsl_to_nir.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 6e1dd84..3284bdc 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -150,7 +150,7 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
>if (sh->Program->SamplersUsed & (1 << i))
>   num_textures = i;
>  
> -   shader->info.name = ralloc_asprintf(shader, "GLSL%d", sh->Name);
> +   shader->info.name = ralloc_asprintf(shader, "GLSL%d", shader_prog->Name);
> if (shader_prog->Label)
>shader->info.label = ralloc_strdup(shader, shader_prog->Label);
> shader->info.num_textures = num_textures;
> -- 
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Run all of the optimisations after lower_load_payload

2015-10-15 Thread Neil Roberts
Instead of just running a couple of the possible optimisations in one
single iteration, it now runs the whole loop again after lowering the
load payloads. According to shader-db this gives:

total instructions in shared programs: 6493365 -> 6493289 (-0.00%)
instructions in affected programs: 1696 -> 1620 (-4.48%)
total loops in shared programs:2237 -> 2237 (0.00%)
helped:20
HURT:  0

Most of the shaders just benefit from running the register coalesce
pass multiple times. However the following two additionally benefit
from an extra pass of opt_saturate_propagation which causes a bunch of
further optimisations:

steam-metro-last-light-1719.shader_test
steam-metro-last-light-836.shader_test

The optimisations that get run after lowering the load payloads are:

04-14-register_coalesce
04-17-compact_virtual_grfs
05-12-opt_saturate_propagation
06-02-opt_algebraic
06-04-opt_copy_propagate
06-07-dead_code_eliminate
06-12-opt_saturate_propagation
06-14-register_coalesce
06-17-compact_virtual_grfs
07-18-opt_combine_constants

In the first shader this drops four instructions.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 94 +++-
 src/mesa/drivers/dri/i965/brw_fs.h   |  3 ++
 2 files changed, 53 insertions(+), 44 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 01a7c99..8d1db23 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4754,33 +4754,6 @@ fs_visitor::calculate_register_pressure()
}
 }
 
-void
-fs_visitor::optimize()
-{
-   /* Start by validating the shader we currently have. */
-   validate();
-
-   /* bld is the common builder object pointing at the end of the program we
-* used to translate it into i965 IR.  For the optimization and lowering
-* passes coming next, any code added after the end of the program without
-* having explicitly called fs_builder::at() clearly points at a mistake.
-* Ideally optimization passes wouldn't be part of the visitor so they
-* wouldn't have access to bld at all, but they do, so just in case some
-* pass forgets to ask for a location explicitly set it to NULL here to
-* make it trip.  The dispatch width is initialized to a bogus value to
-* make sure that optimizations set the execution controls explicitly to
-* match the code they are manipulating instead of relying on the defaults.
-*/
-   bld = fs_builder(this, 64);
-
-   assign_constant_locations();
-   demote_pull_constants();
-
-   validate();
-
-   split_virtual_grfs();
-   validate();
-
 #define OPT(pass, args...) ({   \
   pass_num++;   \
   bool this_progress = pass(args);  \
@@ -4799,20 +4772,10 @@ fs_visitor::optimize()
   this_progress;\
})
 
-   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
-  char filename[64];
-  snprintf(filename, 64, "%s%d-%s-00-start",
-   stage_abbrev, dispatch_width, nir->info.name);
-
-  backend_shader::dump_instructions(filename);
-   }
-
-   bool progress = false;
-   int iteration = 0;
-   int pass_num = 0;
-
-   OPT(lower_simd_width);
-   OPT(lower_logical_sends);
+void
+fs_visitor::optimize_loop(int &iteration, int &pass_num)
+{
+   bool progress;
 
do {
   progress = false;
@@ -4839,6 +4802,51 @@ fs_visitor::optimize()
 
   OPT(compact_virtual_grfs);
} while (progress);
+}
+
+void
+fs_visitor::optimize()
+{
+   /* Start by validating the shader we currently have. */
+   validate();
+
+   /* bld is the common builder object pointing at the end of the program we
+* used to translate it into i965 IR.  For the optimization and lowering
+* passes coming next, any code added after the end of the program without
+* having explicitly called fs_builder::at() clearly points at a mistake.
+* Ideally optimization passes wouldn't be part of the visitor so they
+* wouldn't have access to bld at all, but they do, so just in case some
+* pass forgets to ask for a location explicitly set it to NULL here to
+* make it trip.  The dispatch width is initialized to a bogus value to
+* make sure that optimizations set the execution controls explicitly to
+* match the code they are manipulating instead of relying on the defaults.
+*/
+   bld = fs_builder(this, 64);
+
+   assign_constant_locations();
+   demote_pull_constants();
+
+   validate();
+
+   split_virtual_grfs();
+   validate();
+
+   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
+  char filename[64];
+  snprintf(filename, 64, "%s%d-%s-00-start",
+   stage_abbrev, dispatch_width, nir->info.name);
+
+  backend_shader::dump_instructions(filename);
+   }
+
+   bool progress = false;
+   int iteration = 0

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-19 Thread Neil Roberts
Just a thought, would it be better to move this check into the
eliminate_find_live_channel optimisation? That way it could catch
sources that become immediates through later optimisations. One problem
with this that I've seen before is that eliminating the
FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
eliminated because the copy propagation doesn't work. I made this patch
a while ago but I never posted it anywhere because it's a of a kludge
and it would probably be better to fix the copy propagation:

https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa

Either way though I don't think it would do any harm to have Kristian's
patch as well even if we did improve elimintate_find_live_channel so it
is:

Reviewed-by: Neil Roberts 

- Neil

Kristian Høgsberg Kristensen  writes:

> An immdiate is already uniform so just return it up front. Without this,
> brw_fs_surface_builder ends up passing immediate surface indices through
> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
> constant propagate out of, and further, we don't constant propagate into
> the typed/untype read/write opcodes at all.  The end result is that all
> typed/untyped read/write/atomics end up as indirect sends.
>
> Code generation should always produce either an immediate or an actual
> indirect surface index, so we can fix this by just special casing
> immediates in emit_uniformize.
>
> Signed-off-by: Kristian Høgsberg Kristensen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> index df10a9d..98ce71e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -394,6 +394,9 @@ namespace brw {
>   const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0);
>   const dst_reg dst = component(vgrf(src.type), 0);
>  
> + if (src.file == IMM)
> +return src;
> +
>   ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>  
> -- 
> 2.6.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Remove block arg from foreach_inst_in_block_*_starting_from

2015-10-20 Thread Neil Roberts
Since 49374fab5d793 these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.
---
 src/mesa/drivers/dri/i965/brw_cfg.h   | 4 ++--
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 6 +++---
 src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 3 +--
 src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp| 2 +-
 5 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h 
b/src/mesa/drivers/dri/i965/brw_cfg.h
index a094917..a06b0aa 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.h
+++ b/src/mesa/drivers/dri/i965/brw_cfg.h
@@ -327,12 +327,12 @@ struct cfg_t {
 #define foreach_inst_in_block_reverse_safe(__type, __inst, __block) \
foreach_in_list_reverse_safe(__type, __inst, &(__block)->instructions)
 
-#define foreach_inst_in_block_starting_from(__type, __scan_inst, __inst, 
__block) \
+#define foreach_inst_in_block_starting_from(__type, __scan_inst, __inst) \
for (__type *__scan_inst = (__type *)__inst->next;  \
 !__scan_inst->is_tail_sentinel();  \
 __scan_inst = (__type *)__scan_inst->next)
 
-#define foreach_inst_in_block_reverse_starting_from(__type, __scan_inst, 
__inst, __block) \
+#define foreach_inst_in_block_reverse_starting_from(__type, __scan_inst, 
__inst) \
for (__type *__scan_inst = (__type *)__inst->prev;  \
 !__scan_inst->is_head_sentinel();  \
 __scan_inst = (__type *)__scan_inst->prev)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 49323eb..97d7fd7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2456,7 +2456,7 @@ fs_visitor::compute_to_mrf()
   /* Found a move of a GRF to a MRF.  Let's see if we can go
* rewrite the thing that made this GRF to write into the MRF.
*/
-  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, 
block) {
+  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
 if (scan_inst->dst.file == GRF &&
 scan_inst->dst.reg == inst->src[0].reg) {
/* Found the last thing to write our reg we want to turn
@@ -2789,7 +2789,7 @@ 
fs_visitor::insert_gen4_pre_send_dependency_workarounds(bblock_t *block,
 * we assume that there are no outstanding dependencies on entry to the
 * program.
 */
-   foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, 
block) {
+   foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
   /* If we hit control flow, assume that there *are* outstanding
* dependencies, and force their cleanup before our instruction.
*/
@@ -2855,7 +2855,7 @@ 
fs_visitor::insert_gen4_post_send_dependency_workarounds(bblock_t *block, fs_ins
/* Walk forwards looking for writes to registers we're writing which aren't
 * read before being written.
 */
-   foreach_inst_in_block_starting_from(fs_inst, scan_inst, inst, block) {
+   foreach_inst_in_block_starting_from(fs_inst, scan_inst, inst) {
   /* If we hit control flow, force resolve all remaining dependencies. */
   if (block->end() == scan_inst) {
  for (int i = 0; i < write_len; i++) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
index 469f2ea..883e8d2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
@@ -87,8 +87,7 @@ opt_cmod_propagation_local(bblock_t *block)
  continue;
 
   bool read_flag = false;
-  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst,
-  block) {
+  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
  if (scan_inst->overwrites_reg(inst->src[0])) {
 if (scan_inst->is_partial_write() ||
 scan_inst->dst.reg_offset != inst->src[0].reg_offset)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
index 8792a8c..862e324 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
@@ -64,7 +64,7 @@ opt_saturate_propagation_local(fs_visitor *v, bblock_t *block)
   int src_end_ip = v->live_intervals->end[src_var];
 
   bool interfered = false;
-  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, 
block) {
+  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
  if (scan_inst->overwrites_reg(inst->src[0])) {
 if (scan_inst->is_partial_write

[Mesa-dev] [PATCH 2/2] i965/fs: Improve search for the argument source in opt_zero_samples

2015-10-20 Thread Neil Roberts
The opt_zero_samples instruction tries to find the corresponding
load_payload instruction for each sample instruction. However it was
previously only looking at the previous instruction. This patch makes
it search back within the block to whatever was the last instruction
to write to each individual argument to the send message. There are
two reasons two do this:

On Gen<=6 load_payload isn't used and there is a separate message
register file. This version of the optimisation also finds MOVs into
the MRF registers so it now also works on SNB. Unfortunately this
doesn't show up in a shader-db report because the dead code eliminator
doesn't do anything for instructions writing to MRF registers so it
can't remove the redundant MOVs. However if I hack Mesa to report the
message lengths instead of the instruction counts then it shows this:

total mlen in shared programs: 2600373 -> 2574663 (-0.99%)
mlen in affected programs: 237077 -> 211367 (-10.84%)
helped:3508
HURT:  0

I haven't tested whether reducing the message length without
decreasing the instruction count is actually a performance benefit but
it's hard to imagine that it could possibly be a disadvantage. It also
paves the way to reduce the instruction count later if someone
improves the dead code eliminator.

Secondly it could help on other gens because sometimes the
load_payload instruction can become separated from the corresponding
send instruction and the old version wouldn't work in those cases.
Currently this doesn't seem to make any difference in practice because
the register coalescer is run after this optimisation. However it
seems like this version is more robust.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 54 
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 97d7fd7..f87a5a7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2150,6 +2150,41 @@ fs_visitor::opt_algebraic()
return progress;
 }
 
+static bool
+last_texture_source_is_zero(const fs_inst *send_inst)
+{
+   int reg_offset = send_inst->mlen - send_inst->exec_size / 8;
+   fs_reg src;
+
+   /* Get the last argument of the texture instruction */
+   if (send_inst->is_send_from_grf())
+  src = byte_offset(send_inst->src[0], reg_offset * 32);
+   else
+  src = fs_reg(MRF, send_inst->base_mrf + reg_offset);
+
+   /* Look for the last instruction that writes to the source */
+   foreach_inst_in_block_reverse_starting_from(const fs_inst, inst, send_inst) 
{
+  if (inst->overwrites_reg(src)) {
+ if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
+const int src_num = ((send_inst->mlen - send_inst->header_size) /
+ (inst->exec_size / 8) +
+ inst->header_size - 1);
+return inst->src[src_num].is_zero();
+ } else if (inst->opcode == BRW_OPCODE_MOV) {
+if (inst->is_partial_write() || !inst->dst.equals(src))
+   return false;
+
+return inst->src[0].is_zero();
+ }
+
+ /* Something unknown is writing to the src */
+ break;
+  }
+   }
+
+   return false;
+}
+
 /**
  * Optimize sample messages that have constant zero values for the trailing
  * texture coordinates. We can just reduce the message length for these
@@ -2173,12 +2208,6 @@ fs_visitor::opt_zero_samples()
   if (!inst->is_tex())
  continue;
 
-  fs_inst *load_payload = (fs_inst *) inst->prev;
-
-  if (load_payload->is_head_sentinel() ||
-  load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD)
- continue;
-
   /* We don't want to remove the message header or the first parameter.
* Removing the first parameter is not allowed, see the Haswell PRM
* volume 7, page 149:
@@ -2186,12 +2215,13 @@ fs_visitor::opt_zero_samples()
* "Parameter 0 is required except for the sampleinfo message, which
*  has no parameter 0"
*/
-  while (inst->mlen > inst->header_size + inst->exec_size / 8 &&
- load_payload->src[(inst->mlen - inst->header_size) /
-   (inst->exec_size / 8) +
-   inst->header_size - 1].is_zero()) {
- inst->mlen -= inst->exec_size / 8;
- progress = true;
+  while (inst->mlen > inst->header_size + inst->exec_size / 8) {
+ if (last_texture_source_is_zero(inst)) {
+inst->mlen -= inst->exec_size / 8;
+progress = true;
+ } else {
+break;
+ }
   }
}
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Neil Roberts
In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
message types because I found by experimentation that it doesn't work.
I wrote in the comment that I couldn't find any documentation for this
problem. However I've now found the documentation and it has
additional restrictions on further message types so this patch updates
the comment and adds the others.
---

That paragraph in the spec also mentions further restrictions that we
should probably worry about like that the shader shouldn't combine
this optimisation with any other render target data port read/writes.

It also has a fairly pessimistic note saying the optimisation is only
really good for large polygons in a GUI-like workload. I wonder
whether we should be doing some more benchmarking to decide whether
it's really a good idea to enable this as a general optimisation even
for games.

 src/mesa/drivers/dri/i965/brw_fs.cpp | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 49323eb..bf9ff84 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2238,13 +2238,14 @@ fs_visitor::opt_sampler_eot()
if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
   return false;
 
-   /* This optimisation doesn't seem to work for textureGather for some
-* reason. I can't find any documentation or known workarounds to indicate
-* that this is expected, but considering that it is probably pretty
-* unlikely that a shader would directly write out the results from
-* textureGather we might as well just disable it.
+   /* 3D Sampler » Messages » Message Format
+*
+* “Response Length of zero is allowed on all SIMD8* and SIMD16* sampler
+*  messages except sample+killpix, resinfo, sampleinfo, LOD, and gather4*”
 */
-   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
+   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
+   tex_inst->opcode == SHADER_OPCODE_LOD ||
+   tex_inst->opcode == SHADER_OPCODE_TG4 ||
tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
   return false;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-21 Thread Neil Roberts
Ilia Mirkin  writes:

>> -   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
>> +   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
>> +   tex_inst->opcode == SHADER_OPCODE_LOD ||
>> +   tex_inst->opcode == SHADER_OPCODE_TG4 ||
>> tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
>
> Do you also need to include SHADER_OPCODE_SAMPLEINFO?

Oops, yes, thanks! I've gone ahead and pushed it with this additional
change.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit

2015-10-21 Thread Neil Roberts
Bump. Does anybody have some time to review this patch? I think it's the
only one holding up landing 16x MSAA support.

The following three only have an ack-by but I'm hoping it is reasonable
to just push the branch with that.

i965/meta: Support 16x MSAA in the meta stencil blit
http://patchwork.freedesktop.org/patch/59790/
i965/fs: Add a sampler program key for whether the texture is 16x MSAA
http://patchwork.freedesktop.org/patch/59791/
i965/vec4/skl+: Use ld2dms_w instead of ld2dms
http://patchwork.freedesktop.org/patch/59788/

I've rebased a branch with all of these patches here:

https://github.com/bpeel/mesa/tree/wip/16x-msaa

Thanks.

- Neil

Neil Roberts  writes:

> Previously there was a problem in i965 where if 16x MSAA is used then
> some of the sample positions are exactly on the 0 x or y axis. When
> the MSAA copy blit shader interpolates the texture coordinates at
> these sample positions it was possible that it would jump to a
> neighboring texel due to rounding errors. It is likely that these
> positions would be used on 16x MSAA because that is where they are
> defined to be in D3D.
>
> To fix that this patch makes it use interpolateAtOffset in the blit
> shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
> is available. This forces it to interpolate the texture coordinates at
> the pixel center to avoid these problematic positions.
>
> This fixes ext_framebuffer_multisample-unaligned-blit and
> ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
> SKL+.
> ---
>  src/mesa/drivers/common/meta_blit.c | 64 
> ++---
>  1 file changed, 52 insertions(+), 12 deletions(-)
>
> diff --git a/src/mesa/drivers/common/meta_blit.c 
> b/src/mesa/drivers/common/meta_blit.c
> index 1f9515a..e812ecb 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -352,17 +352,27 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
> shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_DEPTH_COPY ||
> shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY) {
>char *sample_index;
> -  const char *arb_sample_shading_extension_string;
> +  const char *extra_extensions;
> +  const char *tex_coords = "texCoords";
>  
>if (dst_is_msaa) {
> - arb_sample_shading_extension_string = "#extension 
> GL_ARB_sample_shading : enable";
>   sample_index = "gl_SampleID";
>   name = "depth MSAA copy";
> +
> + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) {
> +extra_extensions =
> +   "#extension GL_ARB_sample_shading : enable\n"
> +   "#extension GL_ARB_gpu_shader5 : enable\n";
> +/* See comment below for the color copy */
> +tex_coords = "interpolateAtOffset(texCoords, vec2(0.0))";
> + } else {
> +extra_extensions = "#extension GL_ARB_sample_shading : enable\n";
> + }
>} else {
> - /* Don't need that extension, since we're drawing to a 
> single-sampled
> -  * destination.
> + /* Don't need any extra extensions, since we're drawing to a
> +  * single-sampled destination.
>*/
> - arb_sample_shading_extension_string = "";
> + extra_extensions = "";
>   /* From the GL 4.3 spec:
>*
>* "If there is a multisample buffer (the value of 
> SAMPLE_BUFFERS
> @@ -399,27 +409,57 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>"\n"
>"void main()\n"
>"{\n"
> -  "   gl_FragDepth = texelFetch(texSampler, 
> i%s(texCoords), %s).r;\n"
> +  "   gl_FragDepth = texelFetch(texSampler, 
> i%s(%s), %s).r;\n"
>"}\n",
> -  arb_sample_shading_extension_string,
> +  extra_extensions,
>sampler_array_suffix,
>texcoord_type,
>texcoord_type,
> +  tex_coords,
>sample_index);
> } else {
>/* You can create 2D_MULTISAMPLE textures with 0 sample count (meaning 
> 1
> * sample).  Yes, this is ridiculous.
> */
>char *sample_resolve;
> -  c

Re: [Mesa-dev] [PATCH v2] mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit

2015-10-21 Thread Neil Roberts
Ian Romanick  writes:

>> To fix that this patch makes it use interpolateAtOffset in the blit
>> shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
>> is available. This forces it to interpolate the texture coordinates at
>> the pixel center to avoid these problematic positions.
>
> Would it also work to use "centroid in" interpolation qualifier?  Do we
> have any data about the relative cost of the three interpolation
> methods?

I don't think centroid interpolation does anything for per-sample
shading. Centroid interpolation is just meant to ensure that the
interpolated values are within the polygon (it's confusingly named and
has nothing to do with the center). For per-sample shading the sample
position will always be within the polygon so it will just use that for
the interpolation and we would be stuck with the same problem that some
of these positions are on the pixel boundary.

>> +
>> + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) {
>> +extra_extensions =
>> +   "#extension GL_ARB_sample_shading : enable\n"
>> +   "#extension GL_ARB_gpu_shader5 : enable\n";
>
> You can unconditionally add the enables. If the implementation doesn't
> support the extension, enable will still succeed while require will
> not.

Ok, yes that is probably worth doing. The GL_ARB_sample_shading one was
already conditionally added before my patch, so maybe I can make a
second patch that first stops it from doing that.

Thanks for looking at the patch.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] meta/blit: Always try to enable GL_ARB_sample_shading

2015-10-22 Thread Neil Roberts
Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.
---
 src/mesa/drivers/common/meta_blit.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/common/meta_blit.c 
b/src/mesa/drivers/common/meta_blit.c
index b92c2e2..496ce45 100644
--- a/src/mesa/drivers/common/meta_blit.c
+++ b/src/mesa/drivers/common/meta_blit.c
@@ -357,17 +357,11 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_DEPTH_COPY ||
shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY) {
   char *sample_index;
-  const char *arb_sample_shading_extension_string;
 
   if (dst_is_msaa) {
- arb_sample_shading_extension_string = "#extension 
GL_ARB_sample_shading : enable";
  sample_index = "gl_SampleID";
  name = "depth MSAA copy";
   } else {
- /* Don't need that extension, since we're drawing to a single-sampled
-  * destination.
-  */
- arb_sample_shading_extension_string = "";
  /* From the GL 4.3 spec:
   *
   * "If there is a multisample buffer (the value of SAMPLE_BUFFERS
@@ -397,7 +391,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
   fs_source = ralloc_asprintf(mem_ctx,
   "#version 130\n"
   "#extension GL_ARB_texture_multisample : 
enable\n"
-  "%s\n"
+  "#extension GL_ARB_sample_shading : enable\n"
   "uniform sampler2DMS%s texSampler;\n"
   "in %s texCoords;\n"
   "out vec4 out_color;\n"
@@ -406,7 +400,6 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
   "{\n"
   "   gl_FragDepth = texelFetch(texSampler, 
i%s(texCoords), %s).r;\n"
   "}\n",
-  arb_sample_shading_extension_string,
   sampler_array_suffix,
   texcoord_type,
   texcoord_type,
@@ -416,14 +409,12 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
* sample).  Yes, this is ridiculous.
*/
   char *sample_resolve;
-  const char *arb_sample_shading_extension_string;
   const char *merge_function;
   name = ralloc_asprintf(mem_ctx, "%svec4 MSAA %s",
  vec4_prefix,
  dst_is_msaa ? "copy" : "resolve");
 
   if (dst_is_msaa) {
- arb_sample_shading_extension_string = "#extension 
GL_ARB_sample_shading : enable";
  sample_resolve = ralloc_asprintf(mem_ctx, "   out_color = 
texelFetch(texSampler, i%s(texCoords), gl_SampleID);", texcoord_type);
  merge_function = "";
   } else {
@@ -439,8 +430,6 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
"vec4 merge(vec4 a, vec4 b) { return (a + b); }\n";
  }
 
- arb_sample_shading_extension_string = "";
-
  /* We're assuming power of two samples for this resolution procedure.
   *
   * To avoid losing any floating point precision if the samples all
@@ -496,7 +485,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
   fs_source = ralloc_asprintf(mem_ctx,
   "#version 130\n"
   "#extension GL_ARB_texture_multisample : 
enable\n"
-  "%s\n"
+  "#extension GL_ARB_sample_shading : enable\n"
   "#define gvec4 %svec4\n"
   "uniform %ssampler2DMS%s texSampler;\n"
   "in %s texCoords;\n"
@@ -507,7 +496,6 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
   "{\n"
   "%s\n" /* sample_resolve */
   "}\n",
-  arb_sample_shading_extension_string,
   vec4_prefix,
   vec4_prefix,
   sampler_array_suffix,
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3 2/2] mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit

2015-10-22 Thread Neil Roberts
Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.

To fix that this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.

This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.

v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
[Ian Romanick]
---
 src/mesa/drivers/common/meta_blit.c | 39 +++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/common/meta_blit.c 
b/src/mesa/drivers/common/meta_blit.c
index 496ce45..4a2444a 100644
--- a/src/mesa/drivers/common/meta_blit.c
+++ b/src/mesa/drivers/common/meta_blit.c
@@ -357,10 +357,16 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_DEPTH_COPY ||
shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY) {
   char *sample_index;
+  const char *tex_coords = "texCoords";
 
   if (dst_is_msaa) {
  sample_index = "gl_SampleID";
  name = "depth MSAA copy";
+
+ if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) {
+/* See comment below for the color copy */
+tex_coords = "interpolateAtOffset(texCoords, vec2(0.0))";
+ }
   } else {
  /* From the GL 4.3 spec:
   *
@@ -392,17 +398,19 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
   "#version 130\n"
   "#extension GL_ARB_texture_multisample : 
enable\n"
   "#extension GL_ARB_sample_shading : enable\n"
+  "#extension GL_ARB_gpu_shader5 : enable\n"
   "uniform sampler2DMS%s texSampler;\n"
   "in %s texCoords;\n"
   "out vec4 out_color;\n"
   "\n"
   "void main()\n"
   "{\n"
-  "   gl_FragDepth = texelFetch(texSampler, 
i%s(texCoords), %s).r;\n"
+  "   gl_FragDepth = texelFetch(texSampler, 
i%s(%s), %s).r;\n"
   "}\n",
   sampler_array_suffix,
   texcoord_type,
   texcoord_type,
+  tex_coords,
   sample_index);
} else {
   /* You can create 2D_MULTISAMPLE textures with 0 sample count (meaning 1
@@ -415,7 +423,33 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
  dst_is_msaa ? "copy" : "resolve");
 
   if (dst_is_msaa) {
- sample_resolve = ralloc_asprintf(mem_ctx, "   out_color = 
texelFetch(texSampler, i%s(texCoords), gl_SampleID);", texcoord_type);
+ const char *tex_coords;
+
+ if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) {
+/* If interpolateAtOffset is available then it will be used to
+ * force the interpolation to the center. This is required at
+ * least on Intel hardware because it is possible to have a sample
+ * position on the 0 x or y axis which means it will lie exactly
+ * on the pixel boundary. If we let the hardware interpolate the
+ * coordinates at one of these positions then it is possible for
+ * it to jump to a neighboring texel when converting to ints due
+ * to rounding errors. This is only done for >= 16x MSAA because
+ * it probably has some overhead. It is more likely that some
+ * hardware will use one of these problematic positions at 16x
+ * MSAA because in that case in D3D they are defined to be at
+ * these positions.
+ */
+tex_coords = "interpolateAtOffset(texCoords, vec2(0.0))";
+ } else {
+tex_coords = "texCoords";
+ }
+
+ sample_resolve =
+ralloc_asprintf(mem_ctx,
+"   out_color = texelFetch(texSampler, "
+"i%s(%s), gl_SampleID);",
+texcoord_type, tex_coords);
+
  merge_function = "";
   } else {
  int

Re: [Mesa-dev] [PATCH v6 2/2] mesa/teximage: accept ASTC formats for 3D texture specification

2015-10-26 Thread Neil Roberts
Nanley Chery  writes:

> +  /* Throw an INVALID_OPERATION error if the target is
> +   * TEXTURE_CUBE_MAP_ARRAY and the format is not ASTC.
> +   */
> +  if (target_can_be_compresed &&
> +  ctx->Extensions.KHR_texture_compression_astc_ldr &&
> +  layout != MESA_FORMAT_LAYOUT_ASTC)
>   return write_error(error, GL_INVALID_OPERATION);

This seems to cause regressions in the following Piglit tests for Gen9:

piglit.spec.ext_texture_compression_s3tc.getteximage-targets cube_array s3tc
piglit.spec.arb_texture_cube_map_array.fbo-generatemipmap-cubemap array 
s3tc_dxt1

I think the spec for the extension is based on the GLES spec. Perhaps
the intention was to add ASTC support for cube-map array textures
whereas previously in GLES no compressed formats were supported for this
target. However in regular GL presumably S3TC is supposed to be
supported. As it stands this patch disables cube-map array textures for
any formats other than ASTC whenever the ASTC extension is available, so
it disables S3TC on Gen9.

Maybe this section should be limited to GLES?

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Don't include missing components in the fast clear color

2015-11-04 Thread Neil Roberts
It seems that if you have a surface format with less than 4 components
then the sampler still reports the fast color clear value for the
components that are missing. So for example if you clear a GL_RED
surface to 1,1,1,1 then the sampling instructions will return 1,1,1,1
instead of 1,0,0,1. Normally this doesn't matter because fast color
clears are only available on Gen7+ and for that hardware we also
always set the texture swizzle to force the missing components to zero
or one. However if the blorp blitter is used then the swizzle is not
programmed in the surface state so it doesn't work. To avoid this
situation this patch makes it just force the surface's clear color to
0 or 1 for any missing components of the surface's format.
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 69fe7b4..ed26a44 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -361,13 +361,20 @@ is_color_fast_clear_compatible(struct brw_context *brw,
  * SURFACE_STATE.
  */
 static uint32_t
-compute_fast_clear_color_bits(const union gl_color_union *color)
+compute_fast_clear_color_bits(mesa_format format,
+  const union gl_color_union *color)
 {
uint32_t bits = 0;
for (int i = 0; i < 4; i++) {
-  /* Testing for non-0 works for integer and float colors */
-  if (color->f[i] != 0.0f)
+  /* Testing for non-0 works for integer and float colors. If the
+   * component doesn't exist in the format then force the color to 0 for
+   * the RGB components and 1 for the alpha.
+   */
+  if (_mesa_format_has_color_component(format, i) ?
+  (color->f[i] != 0.0f) :
+  (i == 3)) {
  bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
+  }
}
return bits;
 }
@@ -505,7 +512,8 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   switch (clear_type) {
   case FAST_CLEAR:
  irb->mt->fast_clear_color_value =
-compute_fast_clear_color_bits(&ctx->Color.ClearColor);
+compute_fast_clear_color_bits(irb->mt->format,
+  &ctx->Color.ClearColor);
  irb->need_downsample = true;
 
  /* If the buffer is already in INTEL_FAST_CLEAR_STATE_CLEAR, the
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] mesa/blit: Don't require the same format for mulitisample blits

2015-11-04 Thread Neil Roberts
Previously the GL spec required that whenever glBlitFramebuffer is
used with either buffer being multisampled, the internal formats must
match. However the GL 4.4 spec was later changed to remove this
restriction. In the section entitled “Changes in the released
Specification of July 22, 2013” it says:

“Relax BlitFramebuffer in section 18.3.1 so that format conversion can
 take place during multisample blits, since drivers already allow this
 and some apps depend on it.”

If most drivers already allowed this in earlier versions I think it's
safe to assume that this is a spec bug and it should also be allowed
in all versions.

This patch just removes the restriction on desktop GL. For GLES there
are conformance tests that assert the previous behaviour so it is
probably safer to leave it in.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92706
---

There is a corresponding update to Piglit to make it expect this new
behaviour here:

http://patchwork.freedesktop.org/patch/63670/

I've only tested this with SKL and HSW. It passed fine on SKL but it
needs the previous patch to work on HSW. I put it through a full
Jenkins run but it doesn't really test much without the Piglit patch
as well and I don't think I can post Piglit patches to Jenkins.

As this enables the new behaviour for all Mesa drivers of course it
would need some testing with other drivers before landing.

 src/mesa/main/blit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/blit.c b/src/mesa/main/blit.c
index a32f1a4..6803062 100644
--- a/src/mesa/main/blit.c
+++ b/src/mesa/main/blit.c
@@ -285,7 +285,8 @@ _mesa_blit_framebuffer(struct gl_context *ctx,
 /* extra checks for multisample copies... */
 if (readFb->Visual.samples > 0 || drawFb->Visual.samples > 0) {
/* color formats must match */
-   if (!compatible_resolve_formats(colorReadRb, colorDrawRb)) {
+   if (_mesa_is_gles(ctx) &&
+   !compatible_resolve_formats(colorReadRb, colorDrawRb)) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
  "%s(bad src/dst multisample pixel formats)", func);
   return;
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965: Don't include missing components in the fast clear color

2015-11-04 Thread Neil Roberts
Neil Roberts  writes:

> Normally this doesn't matter because fast color clears are only
> available on Gen7+ and for that hardware we also always set the
> texture swizzle to force the missing components to zero or one.

Looking at it a bit more I think this part is wrong and it doesn't
actually set the texture swizzle in for this case. It is actually a
general problem even without the blorp. I can demonstrate it with this
simpler test which I've posted to the Piglit mailing list:

http://patchwork.freedesktop.org/patch/63678/

Apart from the commit message I think the patch is still sound and it
does fix the test case.

I tried to make a version that is based on top of Ben's skl-fast-clear
branch but it doesn't seem to work on SKL. I am currently looking into
this.

- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965: Don't include missing components in the fast clear color

2015-11-05 Thread Neil Roberts
Ben Widawsky  writes:

> It seems reasonable to me, but since you touch the same code I touch
> in fast clears, and since I always screw up rebases, any chance I
> could persuade you to wait until I land fast clears?

Sure, I don't mind waiting. I did have a go at rebasing the patch on top
of your branch and I think the code should look something like this:

if (brw->gen >= 9) {
   mt->gen9_fast_clear_color = *color;
+
+  /* Override components that aren't in the surface format */
+  for (int i = 0; i < 4; i++) {
+ if (!_mesa_format_has_color_component(mt->format, i)) {
+if (_mesa_is_format_integer(mt->format))
+   mt->gen9_fast_clear_color.i[i] = (i == 3);
+else
+   mt->gen9_fast_clear_color.f[i] = (i == 3);
+ }
+  }
} else {
   mt->fast_clear_color_value = 0;
   for (int i = 0; i < 4; i++) {
- /* Testing for non-0 works for integer and float colors */
- if (color->f[i] != 0.0f) {
- mt->fast_clear_color_value |=
-1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
+ /* Testing for non-0 works for integer and float colors. If the
+  * component doesn't exist in the format then force the color to 0 for
+  * the RGB components and 1 for the alpha.
+  */
+ if (_mesa_format_has_color_component(mt->format, i) ?
+ (color->f[i] != 0.0f) :
+ (i == 3)) {
+mt->fast_clear_color_value |=
+   1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
  }
   }
}

However I haven't been able to get it to work. It looks like there is a
more general problem with fast clears on SKL where a bunch of formats
(such as GL_RGB) always return 0,0,0,0 for the pixels that are cleared
regardless of the clear color. I think we haven't noticed this in the
Piglit tests because most of them clear to black anyway. However if you
modify ColorGradientSunburst to clear to white instead then the
multisample formats tests starts failing for a lot more formats. I am
currently trying to look into this but maybe we should discuss this in a
different thread because I think it's not directly related to this
patch.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/7] [v2] i965/gen9: Support fast clears for 32b float

2015-11-12 Thread Neil Roberts
Ben Widawsky  writes:

> Two formats are explicitly disabled because they fail piglit tests,
> LUMINANCE16F and INTENSITY16F. There is some question about the
> validity of sampling from these surfaces for all gens, however, there
> seem to be no other failures, so I'd prefer to leave tackling that for
> a separate series.

Did you see my message in another thread pointing out that there are
other issues with the SKL fast clear? If you clear to white instead of
black then a bunch of other more normal formats fail such as GL_RGB.
This isn't related to this 32b float patch because obviously white
should fit the criteria of being just 0 and 1.

Perhaps the easiest way to demonstrate this bug is to tweak the formats
test so that it clears to white instead of black, like this:

diff --git a/tests/util/piglit-test-pattern.cpp 
b/tests/util/piglit-test-pattern.cpp
index 12d9918..bec25bc 100644
--- a/tests/util/piglit-test-pattern.cpp
+++ b/tests/util/piglit-test-pattern.cpp
@@ -664,7 +664,7 @@ ColorGradientSunburst::draw_with_scale_and_offset(const 
float (*proj)[4],
}
case GL_UNSIGNED_NORMALIZED:
case GL_FLOAT: {
-   glClearColor(offset, offset, offset, offset);
+   glClearColor(1.0, 1.0, 1.0, 1.0);
glClear(GL_COLOR_BUFFER_BIT);
break;
}

This passes on HSW but makes a bunch of formats fail on SKL.

It seems like we're missing something about how we're supposed to
program the fast colour state. For GL_RGB it seems to be expecting an
integer value in the range 0→255 instead of a float. Eg, if you program
the integer 128 you get back 0.5 when reading. However this isn't the
case for GL_RGBA which does seem to expect float values. I can't find
anything that would explain this in the specs. It makes no sense!

There is this bit in the spec which we are ignoring:

“If Number of Multisamples is not MULTISAMPLECOUNT_1, only 0/1 values
 allowed”

But that doesn't explain the problem because in this case we are
clearing to white.

http://lists.freedesktop.org/archives/mesa-dev/2015-November/099274.html

Regards,
- Neil

>
> v2: Just reject the two failing types.
>
> Cc: Neil Roberts  Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 12 ++--
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  8 
>  2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index 67dd22d..a024d02 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -360,10 +360,18 @@ is_color_fast_clear_compatible(struct brw_context *brw,
> }
>  
> for (int i = 0; i < 4; i++) {
> -  if (color->f[i] != 0.0f && color->f[i] != 1.0f &&
> -  _mesa_format_has_color_component(format, i)) {
> +  if (!_mesa_format_has_color_component(format, i)) {
> + continue;
> +  }
> +
> +  if (brw->gen < 9 &&
> +  color->f[i] != 0.0f && color->f[i] != 1.0f) {
>   return false;
>}
> +
> +  if (format == MESA_FORMAT_L_FLOAT16 ||
> +  format == MESA_FORMAT_I_FLOAT16)
> + return false;
> }
> return true;
>  }
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index 8fe480c..2381c83 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -188,14 +188,6 @@ gen8_emit_fast_clear_color(struct brw_context *brw,
> uint32_t *surf)
>  {
> if (brw->gen >= 9) {
> -#define check_fast_clear_val(x) \
> -  assert(mt->gen9_fast_clear_color.f[x] == 0.0 || \
> - mt->gen9_fast_clear_color.f[x] == 1.0)
> -  check_fast_clear_val(0);
> -  check_fast_clear_val(1);
> -  check_fast_clear_val(2);
> -  check_fast_clear_val(3);
> -#undef check_fast_clear_val
>surf[12] = mt->gen9_fast_clear_color.ui[0];
>surf[13] = mt->gen9_fast_clear_color.ui[1];
>surf[14] = mt->gen9_fast_clear_color.ui[2];
> -- 
> 2.6.2
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965: Handle lum, intensity and missing components in the fast clear

2015-11-12 Thread Neil Roberts
It looks like the sampler hardware doesn't take into account the
surface format when sampling a cleared color after a fast clear has
been done. So for example if you clear a GL_RED surface to 1,1,1,1
then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1.
This patch makes it override the color that is programmed in the
surface state in order to swizzle for luminance and intensity as well
as overriding the missing components.

v2: Handle luminance and intensity formats
---

I made a more extensive test case which tests all of the formats in
fbo_formats.h as well as using more than one test color here:

http://patchwork.freedesktop.org/patch/64578/

In the process I noticed that there is a similar problem with
luminance and intensity textures so here is a v2 to cope with that.

I've made another version of this patch which is rebased on top of
Ben's skl-fast-clear branch so we can take whichever version depending
on which lands first:

https://github.com/bpeel/mesa/commit/da7edcb6dfd93c7dd86b2e148c44dff7

 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 38 +++--
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 69fe7b4..3071590 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -361,12 +361,43 @@ is_color_fast_clear_compatible(struct brw_context *brw,
  * SURFACE_STATE.
  */
 static uint32_t
-compute_fast_clear_color_bits(const union gl_color_union *color)
+compute_fast_clear_color_bits(mesa_format format,
+  const union gl_color_union *color)
 {
+   union gl_color_union override_color = *color;
uint32_t bits = 0;
+
+   /* The sampler doesn't look at the format of the surface when the fast
+* clear color is used so we need to implement luminance, intensity and
+* missing components manually.
+*/
+   switch (_mesa_get_format_base_format(format)) {
+   case GL_INTENSITY:
+  override_color.ui[3] = override_color.ui[0];
+  /* flow through */
+   case GL_LUMINANCE:
+   case GL_LUMINANCE_ALPHA:
+  override_color.ui[1] = override_color.ui[0];
+  override_color.ui[2] = override_color.ui[0];
+  break;
+   default:
+  for (int i = 0; i < 3; i++) {
+ if (!_mesa_format_has_color_component(format, i))
+override_color.ui[i] = 0;
+  }
+  break;
+   }
+
+   if (!_mesa_format_has_color_component(format, 3)) {
+  if (_mesa_is_format_integer_color(format))
+ override_color.ui[3] = 1;
+  else
+ override_color.f[3] = 1.0f;
+   }
+
for (int i = 0; i < 4; i++) {
   /* Testing for non-0 works for integer and float colors */
-  if (color->f[i] != 0.0f)
+  if (override_color.f[i] != 0.0f)
  bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
}
return bits;
@@ -505,7 +536,8 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   switch (clear_type) {
   case FAST_CLEAR:
  irb->mt->fast_clear_color_value =
-compute_fast_clear_color_bits(&ctx->Color.ClearColor);
+compute_fast_clear_color_bits(irb->mt->format,
+  &ctx->Color.ClearColor);
  irb->need_downsample = true;
 
  /* If the buffer is already in INTEL_FAST_CLEAR_STATE_CLEAR, the
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/skl: Disable fast clear for formats without alpha

2015-11-13 Thread Neil Roberts
Ben Widawsky  writes:

> Here is one proposal to fix the issue. I noticed that only formats
> without alpha were failing. This sucks for RGBX formats (which
> technically aren't fast clearable based on the surface format). The
> hunk for moving the format should happen regardless of this patch.

If we do this we could limit it to just multisample buffers. I think the
problem only happens when sampling the cleared texels but we effectively
never do that for single-sampled surfaces because in that case we always
do a resolve before using them as textures. Being able to fast clear a
single-sampled RGBX window system buffer is probably one of the more
useful use cases so it's probably worth allowing that.

We could also remove the explicit check the those two FLOAT16 formats in
that case because neither of them have alpha bits.

> Neil has another patch which attempts to not disable formats, and
> workaround the hardware issue that we cannot explain. I just wanted to
> put this on the list.

If anyone's interested, the patch is here:

https://github.com/bpeel/mesa/commit/2c7b2ddc8dba388665c258a1182

Obviously I'm not suggesting we merge that patch because it's a massive
kludge.

The test case which we're using to find these issues is here:

http://patchwork.freedesktop.org/patch/64578/

- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] [v2] i965/meta/gen9: Individually fast clear color attachments

2015-11-13 Thread Neil Roberts
Hi,

You don't seem to have included any of the suggestions I made in my
review. Was this deliberate? If not, the main points were:

• You don't need to call brw_fast_clear_init or use_rectlist in the
  function because these are already called before entering it.

• I don't think it's worth creating a framebuffer. Instead you can just
  call _mesa_meta_drawbuffers_from_bitfield(1 << index) in the loop.
  Modifying the draw buffers state should be ok because it's saved in
  the meta state and it's already done for the Gen8 code path.

I went ahead and tried the changes in a patch here:

https://github.com/bpeel/mesa/commit/b2aa8f2d90572392030e5177952bf

It doesn't cause any Jenkins regressions. Feel free to squash it into
the patch if you want, or of course if you prefer to keep your patch as
it is it's up to you.

Regards,
- Neil

Ben Widawsky  writes:

> The impetus for this patch comes from a seemingly benign statement within the
> spec (quoted within the patch). For me, this patch was at some point critical
> for getting stable piglit results (though this did not seem to be the case on 
> a
> branch Chad was working on).
>
> It is very important for clearing multiple color buffer attachments and can be
> observed in the following piglit tests:
> spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
>
> v2: Doing the framebuffer binding only once (Chad)
> Directly use the renderbuffers from the mt (Chad)
>
> Cc: Chad Versace 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 94 
> +
>  1 file changed, 81 insertions(+), 13 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index eac92d4..97444d7 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -428,6 +428,71 @@ use_rectlist(struct brw_context *brw, bool enable)
> brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
>  }
>  
> +/**
> + * Individually fast clear each color buffer attachment. On previous gens 
> this
> + * isn't required. The motivation for this comes from one line (which seems 
> to
> + * be specific to SKL+). The list item is in section titled _MCS Buffer for
> + * Render Target(s)_
> + *
> + *   "Since only one RT is bound with a clear pass, only one RT can be 
> cleared
> + *   at a time. To clear multiple RTs, multiple clear passes are required."
> + *
> + * The code follows the same idea as the resolve code which creates a fake 
> FBO
> + * to avoid interfering with too much of the GL state.
> + */
> +static void
> +fast_clear_attachments(struct brw_context *brw,
> +   struct gl_framebuffer *fb,
> +   uint32_t fast_clear_buffers,
> +   struct rect fast_clear_rect)
> +{
> +   assert(brw->gen >= 9);
> +   struct gl_context *ctx = &brw->ctx;
> +   const GLuint old_fb = ctx->DrawBuffer->Name;
> +   GLuint fbo;
> +
> +   _mesa_GenFramebuffers(1, &fbo);
> +   _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
> +   _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0);
> +
> +   brw_fast_clear_init(brw);
> +   use_rectlist(brw, true);
> +   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +
> +   /* SKL+ also has a resolve mode for compressed render targets and thus 
> more
> +* bits to let us select the type of resolve.  For fast clear resolves, it
> +* turns out we can use the same value as pre-SKL though.
> +*/
> +   set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
> +
> +   for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
> +  struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
> +  struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> +  int index = fb->_ColorDrawBufferIndexes[buf];
> +
> +  if ((fast_clear_buffers & (1 << index)) == 0)
> + continue;
> +
> +
> +  _mesa_framebuffer_renderbuffer(ctx, ctx->DrawBuffer,
> + GL_COLOR_ATTACHMENT0, rb,
> + "meta fast clear (per-attachment)");
> +
> +  brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers));
> +
> +  /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
> +   * resolve them eventually.
> +   */
> +  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
> +   }
> +
> +   set_fast_clear_op(brw, 0);
> +   use_rectlist(brw, false);
> +
> +   _mesa_DeleteFramebuffers(1, &fbo);
> +   _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb);
> +}
> +
>  bool
>  brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
>  GLbitfield buffers, bool partial_clear)
> @@ -603,12 +668,27 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
> use_rectlist(brw, true);
>  
> layers = MAX2(

Re: [Mesa-dev] [PATCH 4/7] [v2] i965/meta/gen9: Individually fast clear color attachments

2015-11-13 Thread Neil Roberts
Ben Widawsky  writes:

> Thanks a lot, I will squash it in - and sorry again about ignoring your
> feedback.

Ok, no worries. Feel free to add

Reviewed-by: Neil Roberts 

if you squash the changes in.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Disable fast clears for MSRTs on SKL

2015-11-13 Thread Neil Roberts
There are currently a bunch of formats that behave strangely when
sampling the cleared color from the MCS buffer on SKL. They seem to
mostly be formats that don't have an alpha component, although it's
not all of them, and we haven't yet found anything in the specs which
would explain this. For now to be on the safe side this patch just
disables fast clears for MSRTs on SKL altogether. The assumption is
that clears are probably more likely to be used in single-sampled
applications anyway so we can at least get them working and we can
enable MSRTs later once we understand the problem better.
---

If we go with this patch we could remove the explicit check for the
half-float intensity and luminance formats as well.

 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index dc085ba..85576a8 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -524,6 +524,13 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   if (brw->gen < 7)
  clear_type = REP_CLEAR;
 
+  /* Certain formats have unresolved issues with sampling from the MCS
+   * buffer on Gen9. This disables fast clears altogether for MSRTs until
+   * we can figure out what's going on.
+   */
+  if (brw->gen >= 9 && irb->mt->num_samples > 1)
+ clear_type = REP_CLEAR;
+
   if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
  clear_type = REP_CLEAR;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965: Handle lum, intensity and missing components in the fast clear

2015-11-14 Thread Neil Roberts
Ben Widawsky  writes:

>> +   case GL_LUMINANCE:
>> +   case GL_LUMINANCE_ALPHA:
>> +  override_color.ui[1] = override_color.ui[0];
>> +  override_color.ui[2] = override_color.ui[0];
>> +  break;
>
> The definition for GL_LUMINANCE afaict: "Each element is a single
> luminance value. The GL converts it to floating point, then assembles
> it into an RGBA element by replicating the luminance value three times
> for red, green, and blue and attaching 1 for alpha."
>
> doesn't that mean you need
> override_color.f[3] = 1.0f;

That is handled separately by the bit at the bottom which checks for
_mesa_format_has_color_component(format, 3). It's the same bit of code
that overrides the alpha channel for GL_RGB.

>> +   default:
>> +  for (int i = 0; i < 3; i++) {
>> + if (!_mesa_format_has_color_component(format, i))
>> +override_color.ui[i] = 0;
>> +  }
>
> Is there an easy way to verify that all formats want 0 for GB
> channels? It looks right to me, but with my knowledge of GL, that
> doesn't mean much (I am looking here:
> https://www.opengl.org/sdk/docs/man/html/glTexImage2D.xhtml)

In the GL 4.5 spec, section 15.2.1 it says “When a texture lookup is
performed in a fragment shader, the GL computes the filtered texture
value τ in the manner described in sections 8.14 and 8.15, and converts
it to a texture base color C b as shown in table 15.1”. Table 15.1 looks
something like this:

Texture Base   Texture base color
Internal Format  Cb Ab
RED(Rt, 0, 0)   1
RG (Rt, Gt, 0)  1
RGB(Rt, Gt, Bt) 1
RGBA   (Rt, Gt, Bt) At

In the compatibility spec there is also the luminance, intensity and
alpha formats and they all have 0 for the missing RGB components. I also
tried running the Piglit test I wrote on the nvidia binary blob so if we
can assume that that tests all renderable formats then we can be
confident we at least match what nvidia do.

> I also think that component 0 must always have a color, right? (I'm
> not requesting a change as such, just making sure my understanding of
> what you're trying to do is correct).

As Ilia mentioned, GL_ALPHA doesn't have a red component.

>> +  break;
>> +   }
>> +
>> +   if (!_mesa_format_has_color_component(format, 3)) {
>> +  if (_mesa_is_format_integer_color(format))
>> + override_color.ui[3] = 1;
>
> We shouldn't ever be fast clearing integer formats. We can on GEN8+,
> but we're not doing it today. So I think it should be safe to remove
> this check.

You're right, but I thought I'd add it anyway because I didn't think
this is a particularly hot code path and it might make life easier for
whoever eventually adds support for integer fast clears. I'm happy to
take it out thought if you think that'd be better.

> Seems like a good patch to me. It would probably be nice to track down
> a good spec reference if you manage to find one. I know I've seen such
> reference in SKL docs (which aren't SKL specific) - but I am having
> trouble finding it in PRMs. My VPN is broken, so I can't look at SKL
> docs right now.

I haven't been able to find anything yet sadly.

> With the explanation of why the luminance alpha channel isn't 1 (I
> also claim incompotence on the GL_LUMINANCE_ALPHA format):
> Reviewed-by: Ben Widawsky 

Many thanks for the review.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/skl: Force a BINDING_TABLE_POINTER_* command after push constant command

2015-01-29 Thread Neil Roberts
According to the SkyLake bspec the 3DSTATE_CONSTANT_* commands only take
effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_* command. This
patch just makes it set the BRW_NEW_SURFACES state when uploading the push
constants to ensure the binding tables will be updated.

This fixes the fbo-blending-formats Piglit test and possibly others.

Reviewed-by: Kristian Høgsberg 
---
 src/mesa/drivers/dri/i965/gen7_vs_state.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c 
b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 404dd20..c55d6ca 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -60,6 +60,13 @@ gen7_upload_constant_state(struct brw_context *brw,
}
 
ADVANCE_BATCH();
+
+  /* On SkyLake+ the new constants don't take effect until the next
+   * corresponding 3DSTATE_BINDING_TABLE_POINTER_* command is parsed so we
+   * need to ensure that is sent
+   */
+   if (brw->gen >= 9)
+  brw->state.dirty.brw |= BRW_NEW_SURFACES;
 }
 
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] dir-locals.el: Don't set variables for non-programming modes

2015-01-29 Thread Neil Roberts
This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.
---
 .dir-locals.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.dir-locals.el b/.dir-locals.el
index 04a1a2c..d95eb48 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . nil)
   (tab-width . 8)
   (c-basic-offset . 3)
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] dir-locals.el: Don't set variables for non-programming modes

2015-01-31 Thread Neil Roberts
This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.

v2: Apply to all the .dir-locals.el files, not just the one in the
root directory.
---
 .dir-locals.el| 2 +-
 src/gallium/drivers/freedreno/.dir-locals.el  | 2 +-
 src/gallium/drivers/r600/.dir-locals.el   | 2 +-
 src/gallium/drivers/radeon/.dir-locals.el | 2 +-
 src/gallium/drivers/radeonsi/.dir-locals.el   | 2 +-
 src/gallium/drivers/vc4/.dir-locals.el| 2 +-
 src/gallium/drivers/vc4/kernel/.dir-locals.el | 2 +-
 src/gallium/winsys/radeon/.dir-locals.el  | 2 +-
 src/mesa/drivers/dri/nouveau/.dir-locals.el   | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/.dir-locals.el b/.dir-locals.el
index 04a1a2c..d95eb48 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . nil)
   (tab-width . 8)
   (c-basic-offset . 3)
diff --git a/src/gallium/drivers/freedreno/.dir-locals.el 
b/src/gallium/drivers/freedreno/.dir-locals.el
index c26578b..aa20d49 100644
--- a/src/gallium/drivers/freedreno/.dir-locals.el
+++ b/src/gallium/drivers/freedreno/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . true)
   (tab-width . 4)
   (c-basic-offset . 4)
diff --git a/src/gallium/drivers/r600/.dir-locals.el 
b/src/gallium/drivers/r600/.dir-locals.el
index 8be6a30..4e35c12 100644
--- a/src/gallium/drivers/r600/.dir-locals.el
+++ b/src/gallium/drivers/r600/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . true)
   (tab-width . 8)
   (c-basic-offset . 8)
diff --git a/src/gallium/drivers/radeon/.dir-locals.el 
b/src/gallium/drivers/radeon/.dir-locals.el
index 8be6a30..4e35c12 100644
--- a/src/gallium/drivers/radeon/.dir-locals.el
+++ b/src/gallium/drivers/radeon/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . true)
   (tab-width . 8)
   (c-basic-offset . 8)
diff --git a/src/gallium/drivers/radeonsi/.dir-locals.el 
b/src/gallium/drivers/radeonsi/.dir-locals.el
index 8be6a30..4e35c12 100644
--- a/src/gallium/drivers/radeonsi/.dir-locals.el
+++ b/src/gallium/drivers/radeonsi/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . true)
   (tab-width . 8)
   (c-basic-offset . 8)
diff --git a/src/gallium/drivers/vc4/.dir-locals.el 
b/src/gallium/drivers/vc4/.dir-locals.el
index ed10dc2..ac94242 100644
--- a/src/gallium/drivers/vc4/.dir-locals.el
+++ b/src/gallium/drivers/vc4/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . nil)
   (tab-width . 8)
   (c-basic-offset . 8)
diff --git a/src/gallium/drivers/vc4/kernel/.dir-locals.el 
b/src/gallium/drivers/vc4/kernel/.dir-locals.el
index 2e58e90..49403de 100644
--- a/src/gallium/drivers/vc4/kernel/.dir-locals.el
+++ b/src/gallium/drivers/vc4/kernel/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . t)
   (tab-width . 8)
   (c-basic-offset . 8)
diff --git a/src/gallium/winsys/radeon/.dir-locals.el 
b/src/gallium/winsys/radeon/.dir-locals.el
index 0e937bb..d5f0f04 100644
--- a/src/gallium/winsys/radeon/.dir-locals.el
+++ b/src/gallium/winsys/radeon/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . nil)
   (tab-width . 8)
   (c-basic-offset . 4)
diff --git a/src/mesa/drivers/dri/nouveau/.dir-locals.el 
b/src/mesa/drivers/dri/nouveau/.dir-locals.el
index 29735e8..774f023 100644
--- a/src/mesa/drivers/dri/nouveau/.dir-locals.el
+++ b/src/mesa/drivers/dri/nouveau/.dir-locals.el
@@ -1,4 +1,4 @@
-((nil
+((prog-mode
   (indent-tabs-mode . true)
   (tab-width . 8)
   (c-basic-offset . 8)
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Fixing an x86 FPU bug.

2015-02-04 Thread Neil Roberts
Hi Marius,

This patch does indeed make the Piglit test pass but it doesn't seem
like a complete fix. It looks like there are still a number of places
that are copying via floats that the test wouldn't catch. There are also
lots of places that are still using GLfloat to store the attribute
values and then these are cast to gl_constant_value* whenever they are
used in combination with the places that the patch does change. I think
if we wanted to do a complete patch then we should really change all
places that are storing attribute values to use gl_constant_value to
make it clear that they aren't really just floats.

If it is too much of a change just to fix this relatively minor bug then
perhaps it would be better to do a simpler change that just fixes this
one case using a memcpy or something instead of changing so much code. I
don't really like the idea of only changing some of the places to
gl_constant_value and leaving the rest as GLfloat because it seems even
more confusing that way.

I don't think it makes sense to add this patch to the stable branch. The
bug has presumably been there since the introduction of integer
attributes in 2012 (see acf438f537) and nobody has complained so it
doesn't seem particularly urgent. The patch is non-trivial so I think
there is a risk it will introduce more bugs.

I've made some comments inline below.

marius.pre...@intel.com writes:

> From: Marius Predut 
>
> On 32-bit, for floating point operations is used x86 FPU registers instead 
> SSE,
> reason for  when reinterprets an integer as a float result is unexpected
> (modify floats when they are written to memory).
> The defect was checked with and without -O3 compiler flag.
>
> CC: 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82668
> Signed-off-by: Marius Predut 
> ---
>  src/mesa/main/context.c   |3 ++-
>  src/mesa/main/macros.h|   47 
> -
>  src/mesa/vbo/vbo_attrib_tmp.h |   20 ++
>  src/mesa/vbo/vbo_exec.h   |3 ++-
>  src/mesa/vbo/vbo_exec_api.c   |   31 +--
>  src/mesa/vbo/vbo_exec_eval.c  |   22 ++-
>  src/mesa/vbo/vbo_save_api.c   |   16 +++---
>  src/mesa/vbo/vbo_save_draw.c  |4 ++--
>  8 files changed, 90 insertions(+), 56 deletions(-)
>
> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
> index 400c158..11ab8a9 100644
> --- a/src/mesa/main/context.c
> +++ b/src/mesa/main/context.c
> @@ -134,6 +134,7 @@
>  #include "math/m_matrix.h"
>  #include "main/dispatch.h" /* for _gloffset_COUNT */
>  #include "uniforms.h"
> +#include "macros.h"
>  
>  #ifdef USE_SPARC_ASM
>  #include "sparc/sparc.h"
> @@ -656,7 +657,7 @@ _mesa_init_constants(struct gl_constants *consts, gl_api 
> api)
> consts->MaxSamples = 0;
>  
> /* GLSL default if NativeIntegers == FALSE */
> -   consts->UniformBooleanTrue = FLT_AS_UINT(1.0f);
> +   consts->UniformBooleanTrue = FLOAT_AS_UNION(1.0f).u;
>  
> /* GL_ARB_sync */
> consts->MaxServerWaitTimeout = 0x1fff7fffULL;
> diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h
> index cd5f2d6..2651ffc 100644
> --- a/src/mesa/main/macros.h
> +++ b/src/mesa/main/macros.h
> @@ -32,6 +32,7 @@
>  #define MACROS_H
>  
>  #include "imports.h"
> +#include "program/prog_parameter.h"
>  
>  
>  /**
> @@ -170,27 +171,26 @@ extern GLfloat _mesa_ubyte_to_float_color_tab[256];
>   ub = ((GLubyte) F_TO_I((f) * 255.0F))
>  #endif
>  
> -static inline GLfloat INT_AS_FLT(GLint i)
> +static union gl_constant_value UINT_AS_UNION(GLuint u)
>  {
> -   fi_type tmp;
> -   tmp.i = i;
> -   return tmp.f;
> +   union gl_constant_value tmp;
> +   tmp.u = u;
> +   return tmp;
>  }
>  
> -static inline GLfloat UINT_AS_FLT(GLuint u)
> +static inline union gl_constant_value INT_AS_UNION(GLint i)
>  {
> -   fi_type tmp;
> -   tmp.u = u;
> -   return tmp.f;
> +   union gl_constant_value tmp;
> +   tmp.i = i;
> +   return tmp;
>  }
>  
> -static inline unsigned FLT_AS_UINT(float f)
> +static inline union gl_constant_value FLOAT_AS_UNION(GLfloat f)
>  {
> -   fi_type tmp;
> +   union gl_constant_value tmp;
> tmp.f = f;
> -   return tmp.u;
> +   return tmp;
>  }
> -
>  /**
>   * Convert a floating point value to an unsigned fixed point value.
>   *
> @@ -382,6 +382,15 @@ do {\
>  V[3] = V3;  \
>  } while(0)
>  
> +/** Assignment union*/
> +#define ASSIGN_4V_UNION( V, V0, V1, V2, V3 )  \
> +do {\
> +V[0].f = V0;  \
> +V[1].f = V1;  \
> +V[2].f = V2;  \
> +V[3].f = V3;  \
> +} while(0)
> +

I think it would be better not to have this macro and just use the
ASSIGN_4V macro to copy the gl_constant_value structs directly. This
macro is copying the union values via a floating-point operation which
is what we're trying to avoid. Technically

Re: [Mesa-dev] [PATCH 01/14] i965: Don't tile 1D miptrees.

2015-02-09 Thread Neil Roberts
I made a similar patch in my local tree because it will be necessary for
Skylake which doesn't support tiling for 1D textures. I made a little
test to time rendering a large (4096) 1D texture here:

https://github.com/bpeel/glthing/tree/time-1d-texture

It gives about an 11% increase in FPS with this patch.

It seems like a good idea to me to make it check the height rather than
the texture dimensions as Kenneth says.

Regards,
- Neil

Kenneth Graunke  writes:

> On Saturday, February 07, 2015 02:46:31 AM Francisco Jerez wrote:
>> Kenneth Graunke  writes:
>> 
>> > On Friday, February 06, 2015 07:23:15 PM Francisco Jerez wrote:
>> >> It doesn't really improve locality of texture fetches, quite the
>> >> opposite it's a waste of memory bandwidth and space due to tile
>> >> alignment.
>> >> ---
>> >>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 
>> >>  1 file changed, 4 insertions(+)
>> >> 
>> >> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
>> >> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> >> index 64752dd..311b204 100644
>> >> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> >> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> >> @@ -488,6 +488,10 @@ intel_miptree_choose_tiling(struct brw_context *brw,
>> >> base_format == GL_DEPTH_STENCIL_EXT)
>> >>return I915_TILING_Y;
>> >>  
>> >> +   if (mt->target == GL_TEXTURE_1D ||
>> >> +   mt->target == GL_TEXTURE_1D_ARRAY)
>> >> +  return I915_TILING_NONE;
>> >> +
>> >> int minimum_pitch = mt->total_width * mt->cpp;
>> >>  
>> >> /* If the width is much smaller than a tile, don't bother tiling. */
>> >> 
>> >
>> > What do you think about checking mt->logical_height0 == 1 instead?
>> > GLES doesn't have 1D textures, but people might use a 2D texture with
>> > height 1, and I think the same logic applies.
>> >
>> > I've also been thinking of trying to make core Mesa hide 1D textures and
>> > simply turn them into 2D textures with height = 1, so drivers don't have
>> > to deal with them.
>> >
>> > Either way, this seems like a good idea to me.
>> 
>> I guess that would work because the layer count of 1D array textures
>> ends up in logical_depth0 rather than logical_height0?  Sounds good to
>> me, but it would also have the side effect of not tiling "degenerate" 3D
>> textures with height=1 but depth>1, I guess we would still like to tile
>> those?  How about (mt->logical_height0 == 1 && (mt->logical_depth0 == 1 ||
>> mt->target != GL_TEXTURE_3D))?
>> 
>> Thanks Ken!
>
> I suppose that's fine, though frankly, trying to optimize a texture with
> height 1 at all seems pretty unnecessary to me.  I've never seen a 3D
> texture with height 1 in the wild...
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


pgpPZwNTC8EO1.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Don't tile 1D miptrees.

2015-02-09 Thread Neil Roberts
Looks good to me.

Reviewed-by: Neil Roberts 

- Neil

Francisco Jerez  writes:

> It doesn't really improve locality of texture fetches, quite the
> opposite it's a waste of memory bandwidth and space due to tile
> alignment.
>
> v2: Check mt->logical_height0 instead of mt->target (Ken).  Add short
> comment explaining why they shouldn't be tiled.
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 64752dd..0e3888f 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -488,6 +488,13 @@ intel_miptree_choose_tiling(struct brw_context *brw,
> base_format == GL_DEPTH_STENCIL_EXT)
>return I915_TILING_Y;
>  
> +   /* 1D textures (and 1D array textures) don't get any benefit from tiling,
> +* in fact it leads to a less efficient use of memory space and bandwidth
> +* due to tile alignment.
> +*/
> +   if (mt->logical_height0 == 1)
> +  return I915_TILING_NONE;
> +
> int minimum_pitch = mt->total_width * mt->cpp;
>  
> /* If the width is much smaller than a tile, don't bother tiling. */
> -- 
> 2.1.3


pgpW186qYlh05.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/skl: Implement WaDisable1DDepthStencil

2015-02-09 Thread Neil Roberts
Skylake+ doesn't support setting a depth buffer to a 1D surface but it
does allow pretending it's a 2D texture with a height of 1 instead.

This fixes the GL_DEPTH_COMPONENT_* tests of the copyteximage piglit
test (and also seems to avoid a subsequent GPU hang).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89037
---
This probably only makes sense on top of Francisco Jerez's patch here:

http://lists.freedesktop.org/archives/mesa-dev/2015-February/076392.html

The copyteximage test is still failing with some other formats such as
GL_RGB16 and some intensity and luminance formats but I'm just looking
into that now.

 src/mesa/drivers/dri/i965/gen8_depth_state.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c 
b/src/mesa/drivers/dri/i965/gen8_depth_state.c
index e428089..b4eb6e1 100644
--- a/src/mesa/drivers/dri/i965/gen8_depth_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c
@@ -190,6 +190,18 @@ gen8_emit_depth_stencil_hiz(struct brw_context *brw,
case GL_TEXTURE_3D:
   assert(mt);
   depth = MAX2(mt->logical_depth0, 1);
+  surftype = translate_tex_target(gl_target);
+  break;
+   case GL_TEXTURE_1D_ARRAY:
+   case GL_TEXTURE_1D:
+  if (brw->gen >= 9) {
+ /* WaDisable1DDepthStencil. Skylake+ doesn't support 1D depth
+  * textures but it does allow pretending it's a 2D texture
+  * instead.
+  */
+ surftype = BRW_SURFACE_2D;
+ break;
+  }
   /* fallthrough */
default:
   surftype = translate_tex_target(gl_target);
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/2] i965/skl: Fix handling 1D textures

2015-02-17 Thread Neil Roberts
Skylake has changed 1D textures so that the mipmaps are stored more
efficiently and the qpitch is calculated differently. These two
patches combined fix the following Piglit tests:

clear-color-all-types 1d_array mipmapped
clear-color-all-types 1d_array single_level
fbo-generatemipmap-1d
texsubimage array pbo
1D view rendering
array-texture
copyteximage 1D_ARRAY
copyteximage 1D_ARRAY samples=2
copyteximage 1D_ARRAY samples=4
copyteximage 1D_ARRAY samples=6
copyteximage 1D_ARRAY samples=8
fbo-generatemipmap-array
fbo-generatemipmap-array RGB9_E5
render-1darray
texsubimage array
gs-texelFetch-isampler1D
gs-texelFetch-isampler1DArray
gs-texelFetch-sampler1D
gs-texelFetch-sampler1DArray
gs-texelFetch-usampler1D
gs-texelFetch-usampler1DArray
gs-texelFetch-isampler1D
gs-texelFetch-isampler1DArray
gs-texelFetch-sampler1D
gs-texelFetch-sampler1DArray
gs-texelFetch-usampler1D
gs-texelFetch-usampler1DArray
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965/skl: Layout a 1D miptree horizontally

2015-02-17 Thread Neil Roberts
On Gen9+ the 1D miptree is laid out with all of the mipmap levels in a
horizontal line.
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c | 71 --
 1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 0e2841f..abf532a 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -158,6 +158,42 @@ intel_vertical_texture_alignment_unit(struct brw_context 
*brw,
 }
 
 static void
+gen9_miptree_layout_1d(struct intel_mipmap_tree *mt)
+{
+   unsigned x = 0;
+   unsigned width = mt->physical_width0;
+   unsigned depth = mt->physical_depth0; /* number of array layers. */
+   /* When this layout is used the horizontal alignment is fixed at 64 and the
+* hardware ignores the value given in the surface state
+*/
+   const unsigned int align_w = 64;
+
+   mt->total_height = mt->physical_height0;
+
+   if (mt->compressed) {
+  mt->total_height = ALIGN(mt->physical_height0, mt->align_h);
+   }
+
+   mt->total_width = 0;
+
+   for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
+  unsigned img_width;
+
+  intel_miptree_set_level_info(mt, level, x, 0, depth);
+
+  img_width = ALIGN(width, align_w);
+  if (mt->compressed)
+ img_width /= align_w;
+
+  mt->total_width = MAX2(mt->total_width, x + img_width);
+
+  x += img_width;
+
+  width = minify(width, 1);
+   }
+}
+
+static void
 brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 {
unsigned x = 0;
@@ -239,23 +275,49 @@ align_cube(struct intel_mipmap_tree *mt)
   mt->total_height += 2;
 }
 
+static bool
+use_linear_1d_layout(struct brw_context *brw,
+ struct intel_mipmap_tree *mt)
+{
+   /* On Gen9+ the mipmap levels of a 1D surface are all laid out in a
+* horizontal line. This isn't done for depth/stencil buffers however
+* because those will be using a tiled layout
+*/
+
+   if (brw->gen >= 9 &&
+   (mt->target == GL_TEXTURE_1D ||
+mt->target == GL_TEXTURE_1D_ARRAY)) {
+  GLenum base_format = _mesa_get_format_base_format(mt->format);
+
+  if (base_format != GL_DEPTH_COMPONENT &&
+  base_format != GL_DEPTH_STENCIL)
+ return true;
+   }
+
+   return false;
+}
+
 static void
 brw_miptree_layout_texture_array(struct brw_context *brw,
 struct intel_mipmap_tree *mt)
 {
int h0, h1;
unsigned height = mt->physical_height0;
+   bool layout_1d = use_linear_1d_layout(brw, mt);
 
h0 = ALIGN(mt->physical_height0, mt->align_h);
h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h);
-   if (mt->array_layout == ALL_SLICES_AT_EACH_LOD)
+   if (mt->array_layout == ALL_SLICES_AT_EACH_LOD || layout_1d)
   mt->qpitch = h0;
else
   mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h);
 
int physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch;
 
-   brw_miptree_layout_2d(mt);
+   if (layout_1d)
+  gen9_miptree_layout_1d(mt);
+   else
+  brw_miptree_layout_2d(mt);
 
for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
   unsigned img_height;
@@ -386,7 +448,10 @@ brw_miptree_layout(struct brw_context *brw, struct 
intel_mipmap_tree *mt)
  break;
   case INTEL_MSAA_LAYOUT_NONE:
   case INTEL_MSAA_LAYOUT_IMS:
- brw_miptree_layout_2d(mt);
+ if (use_linear_1d_layout(brw, mt))
+gen9_miptree_layout_1d(mt);
+ else
+brw_miptree_layout_2d(mt);
  break;
   }
   break;
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965/skl: Upload qpitch in pixels for 1D textures

2015-02-17 Thread Neil Roberts
According to the bspec since Skylake the qpitch value in the surface
formats should be measured in pixels rather than rows for 1D textures.
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index d6b870e..4132b9b 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -145,6 +145,24 @@ gen8_emit_buffer_surface_state(struct brw_context *brw,
}
 }
 
+static uint32_t
+qpitch_for_mt(struct brw_context *brw,
+  struct intel_mipmap_tree *mt,
+  uint32_t surf_type)
+{
+   uint32_t qpitch = mt->qpitch;
+
+   /* On Skylake+ for 1D linear surfaces the qpitch is measured in pixels
+* rather than rows
+*/
+   if (brw->gen >= 9 &&
+   surf_type == BRW_SURFACE_1D &&
+   mt->tiling == I915_TILING_NONE)
+  qpitch *= mt->total_width;
+
+   return qpitch;
+}
+
 static void
 gen8_update_texture_surface(struct gl_context *ctx,
 unsigned unit,
@@ -197,7 +215,11 @@ gen8_update_texture_surface(struct gl_context *ctx,
 
uint32_t *surf = allocate_surface_state(brw, surf_offset);
 
-   surf[0] = translate_tex_target(tObj->Target) << BRW_SURFACE_TYPE_SHIFT |
+   uint32_t surf_type = translate_tex_target(tObj->Target);
+
+   uint32_t qpitch = qpitch_for_mt(brw, mt, surf_type);
+
+   surf[0] = surf_type << BRW_SURFACE_TYPE_SHIFT |
  tex_format << BRW_SURFACE_FORMAT_SHIFT |
  vertical_alignment(mt) |
  horizontal_alignment(mt) |
@@ -211,7 +233,7 @@ gen8_update_texture_surface(struct gl_context *ctx,
if (mt->logical_depth0 > 1 && tObj->Target != GL_TEXTURE_3D)
   surf[0] |= GEN8_SURFACE_IS_ARRAY;
 
-   surf[1] = SET_FIELD(mocs_wb, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
+   surf[1] = SET_FIELD(mocs_wb, GEN8_SURFACE_MOCS) | qpitch >> 2;
 
surf[2] = SET_FIELD(mt->logical_width0 - 1, GEN7_SURFACE_WIDTH) |
  SET_FIELD(mt->logical_height0 - 1, GEN7_SURFACE_HEIGHT);
@@ -390,6 +412,8 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
uint32_t *surf =
   allocate_surface_state(brw, &brw->wm.base.surf_offset[surf_index]);
 
+   uint32_t qpitch = qpitch_for_mt(brw, mt, surf_type);
+
surf[0] = (surf_type << BRW_SURFACE_TYPE_SHIFT) |
  (is_array ? GEN7_SURFACE_IS_ARRAY : 0) |
  (format << BRW_SURFACE_FORMAT_SHIFT) |
@@ -397,7 +421,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
  horizontal_alignment(mt) |
  surface_tiling_mode(tiling);
 
-   surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
+   surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | qpitch >> 2;
 
surf[2] = SET_FIELD(width - 1, GEN7_SURFACE_WIDTH) |
  SET_FIELD(height - 1, GEN7_SURFACE_HEIGHT);
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965/skl: Upload qpitch in pixels for 1D textures

2015-02-18 Thread Neil Roberts
Ben Widawsky  writes:

> I promise to look at this again in more detail tomorrow when I am more
> awake, but meanwhile, I'd be very much in favor of just setting
> mt->qpith for all miptree layout types.

I'm not sure what you mean. Do you mean that we should just always set
mt->qpitch to the actual value that we're going to upload? Yes, that
seems like a good idea. I was originally worried about that because I
had thought that other bits of code were looking at the qpitch to work
out some offsets but that is not the case.

The qpitch seems to also be broken for 3D textures and compressed
textures so I think it might be a good idea to squash this patch into a
general patch which fixes everything. I made a start on that as part of
fixing the copyteximage test for 3D textures here:

https://github.com/bpeel/mesa/commit/fc95546c49f0e5c5805139712c51a0d13b2183c3

I don't want to post the patch to the mailing list yet because it
doesn't fix the test for the GL_RGBA32 format with 3D textures and I'd
like to work out why first.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/skl: Layout a 1D miptree horizontally

2015-02-19 Thread Neil Roberts
Ian Romanick  writes:

> There aren't any compressed formats that support 1D textures, so I
> don't think this can occur. Does the bspec say anything about
> compressed 1D textures?

Ah yes, you're right. I just copied it from brw_miptree_layout_2d
without really thinking it through. The SKL bspec explicitly states that
compressed formats can't be used with SURFTYPE_1D. I'll remove and send
a v2 along with the patches to fix 3D textures too.

Thanks for the review.

- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/6] i965/skl: Texture layout fixes

2015-02-20 Thread Neil Roberts
Here is a v2 of my patch series to fix 1D textures on Skylake. It's
now turned into just some general fixes and also helps with 3D
textures. It fixes 110 piglit tests but sadly seems to cause 3
regressions. It might not be worth landing until I can work out what
the regressions are so I guess I'm just posting it now in case
anyone's interested to look at it anyway.

I've uploaded the piglit html summary of the changes here:

http://busydoingnothing.co.uk/skl-qpitch-patches/changes.html

- Neil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/6] i965: Don't force x-tiling for 16-bpp formats on Gen>7

2015-02-20 Thread Neil Roberts
Sandybridge doesn't support y-tiling for surface formats with 16 or
more bpp. There was previously an override to explicitly allow this
for Gen7. However, this restriction is also removed in Gen8+ so we
should use y-tiling there too.

This is important to do for Skylake which doesn't support x-tiling for
3D surfaces.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 0e3888f..994670a 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -516,9 +516,9 @@ intel_miptree_choose_tiling(struct brw_context *brw,
 * "NOTE: 128BPE Format Color Buffer ( render target ) MUST be either TileX
 *  or Linear."
 * 128 bits per pixel translates to 16 bytes per pixel.  This is necessary
-* all the way back to 965, but is explicitly permitted on Gen7.
+* all the way back to 965, but is explicitly permitted on Gen7+.
 */
-   if (brw->gen != 7 && mt->cpp >= 16)
+   if (brw->gen < 7 && mt->cpp >= 16)
   return I915_TILING_X;
 
/* From the Ivy Bridge PRM, Vol4 Part1 2.12.2.1 (SURFACE_STATE for most
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/6] i965/skl: Don't use ALL_SLICES_AT_EACH_LOD

2015-02-20 Thread Neil Roberts
The render surface state command for Skylake doesn't have the surface
array spacing bit so I don't think it's possible to select this
layout. This avoids a kernel panic when running the piglit test below:

ext_framebuffer_multisample-no-color 8 stencil single

However the test still fails so there may be something else wrong as
well. The test was not causing a kernel panic before the patch to fix
the qpitch.

I think it's also not possible to select this layout on Gen8 so it may
need to be changed to only be used on Gen7.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 994670a..018e16b 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -371,19 +371,25 @@ intel_miptree_create_layout(struct brw_context *brw,
   }
}
 
-   /* Set array_layout to ALL_SLICES_AT_EACH_LOD when gen7+ array_spacing_lod0
-* can be used. array_spacing_lod0 is only used for non-IMS MSAA surfaces.
+   /* Set array_layout to ALL_SLICES_AT_EACH_LOD when array_spacing_lod0
+* can be used. array_spacing_lod0 is only used for non-IMS MSAA surfaces
+* on Gen 7 and 8.
 * TODO: can we use it elsewhere?
+* TODO: does this actually work on Gen 8?
 */
-   switch (mt->msaa_layout) {
-   case INTEL_MSAA_LAYOUT_NONE:
-   case INTEL_MSAA_LAYOUT_IMS:
+   if (brw->gen >= 9) {
   mt->array_layout = ALL_LOD_IN_EACH_SLICE;
-  break;
-   case INTEL_MSAA_LAYOUT_UMS:
-   case INTEL_MSAA_LAYOUT_CMS:
-  mt->array_layout = ALL_SLICES_AT_EACH_LOD;
-  break;
+   } else {
+  switch (mt->msaa_layout) {
+  case INTEL_MSAA_LAYOUT_NONE:
+  case INTEL_MSAA_LAYOUT_IMS:
+ mt->array_layout = ALL_LOD_IN_EACH_SLICE;
+ break;
+  case INTEL_MSAA_LAYOUT_UMS:
+  case INTEL_MSAA_LAYOUT_CMS:
+ mt->array_layout = ALL_SLICES_AT_EACH_LOD;
+ break;
+  }
}
 
if (target == GL_TEXTURE_CUBE_MAP) {
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/6] i965/skl: Align compressed textures to four times the block size

2015-02-20 Thread Neil Roberts
On Skylake it is possible to choose your own alignment values for
compressed textures but they are expressed as a multiple of the block
size. The minimum alignment value we can use is 4 so we effectively
have to align to 4 times the block size. This patch makes it initially
set mt->align_[wh] to the large alignment value and then later divides
it by the block size so that it can be uploaded as part of the surface
state.
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c | 31 ++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index d89a04e..1fe2c26 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -73,7 +73,16 @@ intel_horizontal_texture_alignment_unit(struct brw_context 
*brw,
 */
   unsigned int i, j;
   _mesa_get_format_block_size(format, &i, &j);
-  return i;
+
+  /* On Gen9+ we can pick our own alignment for compressed textures but it
+   * has to be a multiple of the block size. The minimum alignment we can
+   * pick is 4 so we effectively have to align to 4 times the block
+   * size
+   */
+  if (brw->gen >= 9)
+ return i * 4;
+  else
+ return i;
 }
 
if (format == MESA_FORMAT_S_UINT8)
@@ -113,7 +122,8 @@ intel_vertical_texture_alignment_unit(struct brw_context 
*brw,
 * the SURFACE_STATE "Surface Vertical Alignment" field.
 */
if (_mesa_is_format_compressed(format))
-  return 4;
+  /* See comment above for the horizontal alignment */
+  return brw->gen >= 9 ? 16 : 4;
 
if (format == MESA_FORMAT_S_UINT8)
   return brw->gen >= 7 ? 8 : 4;
@@ -195,6 +205,9 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
unsigned width = mt->physical_width0;
unsigned height = mt->physical_height0;
unsigned depth = mt->physical_depth0; /* number of array layers. */
+   unsigned int bw, bh;
+
+   _mesa_get_format_block_size(mt->format, &bw, &bh);
 
mt->total_width = mt->physical_width0;
 
@@ -212,7 +225,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 
if (mt->compressed) {
   mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
- ALIGN(minify(mt->physical_width0, 2), mt->align_w);
+ ALIGN(minify(mt->physical_width0, 2), bw);
} else {
   mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
  minify(mt->physical_width0, 2);
@@ -232,7 +245,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 
   img_height = ALIGN(height, mt->align_h);
   if (mt->compressed)
-img_height /= mt->align_h;
+img_height /= bh;
 
   if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
  /* Compact arrays with separated miplevels */
@@ -481,5 +494,15 @@ brw_miptree_layout(struct brw_context *brw, struct 
intel_mipmap_tree *mt)
}
DBG("%s: %dx%dx%d\n", __FUNCTION__,
mt->total_width, mt->total_height, mt->cpp);
+
+   /* On Gen9+ the alignment values are expressed in multiples of the block
+* size
+*/
+   if (brw->gen >= 9) {
+  unsigned int i, j;
+  _mesa_get_format_block_size(mt->format, &i, &j);
+  mt->align_w /= i;
+  mt->align_h /= j;
+   }
 }
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/6] i965/skl: Layout 3D textures the same as array textures

2015-02-20 Thread Neil Roberts
On Gen9+ the 3D textures use the same mipmap layout as 2D array
textures.
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 0e2841f..57922e9 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -224,6 +224,9 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 
   width  = minify(width, 1);
   height = minify(height, 1);
+
+  if (mt->target == GL_TEXTURE_3D)
+ depth = minify(depth, 1);
}
 }
 
@@ -263,7 +266,7 @@ brw_miptree_layout_texture_array(struct brw_context *brw,
   if (mt->compressed)
  img_height /= mt->align_h;
 
-  for (int q = 0; q < mt->physical_depth0; q++) {
+  for (int q = 0; q < mt->level[level].depth; q++) {
  if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
 intel_miptree_set_image_offset(mt, level, q, 0, q * img_height);
  } else {
@@ -368,7 +371,10 @@ brw_miptree_layout(struct brw_context *brw, struct 
intel_mipmap_tree *mt)
   break;
 
case GL_TEXTURE_3D:
-  brw_miptree_layout_texture_3d(brw, mt);
+  if (brw->gen >= 9)
+ brw_miptree_layout_texture_array(brw, mt);
+  else
+ brw_miptree_layout_texture_3d(brw, mt);
   break;
 
case GL_TEXTURE_1D_ARRAY:
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/6] i965/skl: Layout a 1D miptree horizontally

2015-02-20 Thread Neil Roberts
On Gen9+ the 1D miptree is laid out with all of the mipmap levels in a
horizontal line.
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c | 62 +-
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 57922e9..851742b 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -158,6 +158,36 @@ intel_vertical_texture_alignment_unit(struct brw_context 
*brw,
 }
 
 static void
+gen9_miptree_layout_1d(struct intel_mipmap_tree *mt)
+{
+   unsigned x = 0;
+   unsigned width = mt->physical_width0;
+   unsigned depth = mt->physical_depth0; /* number of array layers. */
+
+   /* When this layout is used the horizontal alignment is fixed at 64 and the
+* hardware ignores the value given in the surface state
+*/
+   const unsigned int align_w = 64;
+
+   mt->total_height = mt->physical_height0;
+   mt->total_width = 0;
+
+   for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
+  unsigned img_width;
+
+  intel_miptree_set_level_info(mt, level, x, 0, depth);
+
+  img_width = ALIGN(width, align_w);
+
+  mt->total_width = MAX2(mt->total_width, x + img_width);
+
+  x += img_width;
+
+  width = minify(width, 1);
+   }
+}
+
+static void
 brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 {
unsigned x = 0;
@@ -242,12 +272,34 @@ align_cube(struct intel_mipmap_tree *mt)
   mt->total_height += 2;
 }
 
+static bool
+use_linear_1d_layout(struct brw_context *brw,
+ struct intel_mipmap_tree *mt)
+{
+   /* On Gen9+ the mipmap levels of a 1D surface are all laid out in a
+* horizontal line. This isn't done for depth/stencil buffers however
+* because those will be using a tiled layout
+*/
+   if (brw->gen >= 9 &&
+   (mt->target == GL_TEXTURE_1D ||
+mt->target == GL_TEXTURE_1D_ARRAY)) {
+  GLenum base_format = _mesa_get_format_base_format(mt->format);
+
+  if (base_format != GL_DEPTH_COMPONENT &&
+  base_format != GL_DEPTH_STENCIL)
+ return true;
+   }
+
+   return false;
+}
+
 static void
 brw_miptree_layout_texture_array(struct brw_context *brw,
 struct intel_mipmap_tree *mt)
 {
int h0, h1;
unsigned height = mt->physical_height0;
+   bool layout_1d = use_linear_1d_layout(brw, mt);
 
h0 = ALIGN(mt->physical_height0, mt->align_h);
h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h);
@@ -258,7 +310,10 @@ brw_miptree_layout_texture_array(struct brw_context *brw,
 
int physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch;
 
-   brw_miptree_layout_2d(mt);
+   if (layout_1d)
+  gen9_miptree_layout_1d(mt);
+   else
+  brw_miptree_layout_2d(mt);
 
for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
   unsigned img_height;
@@ -392,7 +447,10 @@ brw_miptree_layout(struct brw_context *brw, struct 
intel_mipmap_tree *mt)
  break;
   case INTEL_MSAA_LAYOUT_NONE:
   case INTEL_MSAA_LAYOUT_IMS:
- brw_miptree_layout_2d(mt);
+ if (use_linear_1d_layout(brw, mt))
+gen9_miptree_layout_1d(mt);
+ else
+brw_miptree_layout_2d(mt);
  break;
   }
   break;
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/6] i965/skl: Fix the qpitch value

2015-02-20 Thread Neil Roberts
On Skylake the qpitch value is uploaded as part of the surface state
so we don't need to add the extra rows that are done for other
generations. However for 3D textures it needs to be aligned to the
tile height. Unlike previous generations the qpitch is measured as a
multiple of the block size for compressed surfaces. When the
horizontal mipmap layout is used for 1D textures then the qpitch is
measured in pixels instead of rows.
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c| 44 +--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 --
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 851742b..d89a04e 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -297,24 +297,48 @@ static void
 brw_miptree_layout_texture_array(struct brw_context *brw,
 struct intel_mipmap_tree *mt)
 {
-   int h0, h1;
unsigned height = mt->physical_height0;
bool layout_1d = use_linear_1d_layout(brw, mt);
-
-   h0 = ALIGN(mt->physical_height0, mt->align_h);
-   h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h);
-   if (mt->array_layout == ALL_SLICES_AT_EACH_LOD)
-  mt->qpitch = h0;
-   else
-  mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h);
-
-   int physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch;
+   int physical_qpitch;
 
if (layout_1d)
   gen9_miptree_layout_1d(mt);
else
   brw_miptree_layout_2d(mt);
 
+   if (layout_1d) {
+  physical_qpitch = mt->align_h;
+  /* When using the horizontal layout the qpitch is measured in pixels */
+  mt->qpitch = physical_qpitch * mt->total_width;
+   } else if (brw->gen >= 9) {
+  /* On Gen9 we can pick whatever qpitch we like as long as it's aligned
+   * to the vertical alignment so we don't need to add any extra rows.
+   */
+  mt->qpitch = mt->total_height;
+
+  /* However 3D textures need to be aligned to the tile height. At this
+   * point we don't know which tiling will be used so let's just align it
+   * to 32
+   */
+  if (mt->target == GL_TEXTURE_3D)
+ mt->qpitch = ALIGN(mt->qpitch, 32);
+
+  /* Unlike previous generations the qpitch is now a multiple of the
+   * compressed block size so physical_qpitch matches mt->qpitch.
+   */
+  physical_qpitch = mt->qpitch;
+   } else {
+  int h0 = ALIGN(mt->physical_height0, mt->align_h);
+  int h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h);
+
+  if (mt->array_layout == ALL_SLICES_AT_EACH_LOD)
+ mt->qpitch = h0;
+  else
+ mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h);
+
+  physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch;
+   }
+
for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
   unsigned img_height;
   img_height = ALIGN(height, mt->align_h);
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index ee9cf1e..247e5b8 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -380,10 +380,14 @@ struct intel_mipmap_tree
enum miptree_array_layout array_layout;
 
/**
-* The distance in rows between array slices in an uncompressed surface.
+* The distance in between array slices.
 *
-* For compressed surfaces, slices are stored closer together physically;
-* the real distance is (qpitch / block height).
+* The value is the one that is sent in the surface state. The actual
+* meaning depends on certain criteria. Usually it is simply the number of
+* uncompressed rows between each slice. However on Gen9+ for compressed
+* surfaces it is the number of blocks. For 1D surfaces that have the
+* mipmap tree stored horizontally it is the number of pixels between each
+* slice.
 */
uint32_t qpitch;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/6 v2] i965/skl: Fix the qpitch value

2015-02-23 Thread Neil Roberts
On Skylake the qpitch value is uploaded as part of the surface state
so we don't need to add the extra rows that are done for other
generations. However for 3D textures it needs to be aligned to the
tile height and for depth/stencil textures it needs to be a multiple
of 8. Unlike previous generations the qpitch is measured as a multiple
of the block size for compressed surfaces. When the horizontal mipmap
layout is used for 1D textures then the qpitch is measured in pixels
instead of rows.

v2: Align the depth/stencil textures to a multiple of 8
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c| 52 +--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 --
 2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 851742b..834d4ec 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -297,24 +297,56 @@ static void
 brw_miptree_layout_texture_array(struct brw_context *brw,
 struct intel_mipmap_tree *mt)
 {
-   int h0, h1;
unsigned height = mt->physical_height0;
bool layout_1d = use_linear_1d_layout(brw, mt);
-
-   h0 = ALIGN(mt->physical_height0, mt->align_h);
-   h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h);
-   if (mt->array_layout == ALL_SLICES_AT_EACH_LOD)
-  mt->qpitch = h0;
-   else
-  mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h);
-
-   int physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch;
+   int physical_qpitch;
 
if (layout_1d)
   gen9_miptree_layout_1d(mt);
else
   brw_miptree_layout_2d(mt);
 
+   if (layout_1d) {
+  physical_qpitch = mt->align_h;
+  /* When using the horizontal layout the qpitch is measured in pixels */
+  mt->qpitch = physical_qpitch * mt->total_width;
+   } else if (brw->gen >= 9) {
+  GLenum base_format;
+
+  /* On Gen9 we can pick whatever qpitch we like as long as it's aligned
+   * to the vertical alignment so we don't need to add any extra rows.
+   */
+  mt->qpitch = mt->total_height;
+
+  /* If the surface might be used as a stencil buffer or HiZ buffer then
+   * it needs to be a multiple of 8.
+   */
+  base_format = _mesa_get_format_base_format(mt->format);
+  if (_mesa_is_depth_or_stencil_format(base_format))
+ mt->qpitch = ALIGN(mt->qpitch, 8);
+
+  /* 3D textures need to be aligned to the tile height. At this point we
+   * don't know which tiling will be used so let's just align it to 32
+   */
+  if (mt->target == GL_TEXTURE_3D)
+ mt->qpitch = ALIGN(mt->qpitch, 32);
+
+  /* Unlike previous generations the qpitch is now a multiple of the
+   * compressed block size so physical_qpitch matches mt->qpitch.
+   */
+  physical_qpitch = mt->qpitch;
+   } else {
+  int h0 = ALIGN(mt->physical_height0, mt->align_h);
+  int h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h);
+
+  if (mt->array_layout == ALL_SLICES_AT_EACH_LOD)
+ mt->qpitch = h0;
+  else
+ mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h);
+
+  physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch;
+   }
+
for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
   unsigned img_height;
   img_height = ALIGN(height, mt->align_h);
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index ee9cf1e..247e5b8 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -380,10 +380,14 @@ struct intel_mipmap_tree
enum miptree_array_layout array_layout;
 
/**
-* The distance in rows between array slices in an uncompressed surface.
+* The distance in between array slices.
 *
-* For compressed surfaces, slices are stored closer together physically;
-* the real distance is (qpitch / block height).
+* The value is the one that is sent in the surface state. The actual
+* meaning depends on certain criteria. Usually it is simply the number of
+* uncompressed rows between each slice. However on Gen9+ for compressed
+* surfaces it is the number of blocks. For 1D surfaces that have the
+* mipmap tree stored horizontally it is the number of pixels between each
+* slice.
 */
uint32_t qpitch;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/14] meta: Create temporary pbo in _mesa_meta_pbo_GetTexSubImage()

2015-02-24 Thread Neil Roberts
Anuj Phogat  writes:

> using a flag passed in as function parameter. This will enable
> _mesa_meta_pbo_GetTexSubImage to be used for reading in to
> non-pbo buffers.
>
> This will be useful to support reading from YF/YS tiled surfaces
> in Skylake.
>
> Signed-off-by: Anuj Phogat 
> ---
>  src/mesa/drivers/common/meta.h   |  1 +
>  src/mesa/drivers/common/meta_tex_subimage.c  | 18 --
>  src/mesa/drivers/dri/i965/intel_pixel_read.c |  9 -
>  src/mesa/drivers/dri/i965/intel_tex_image.c  |  3 ++-
>  4 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h
> index e7d894d..3de0d87 100644
> --- a/src/mesa/drivers/common/meta.h
> +++ b/src/mesa/drivers/common/meta.h
> @@ -542,6 +542,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
> GLuint dims,
>int xoffset, int yoffset, int zoffset,
>int width, int height, int depth,
>GLenum format, GLenum type, const void *pixels,
> +  bool create_pbo,
>const struct gl_pixelstore_attrib *packing);
>  
>  extern void
> diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
> b/src/mesa/drivers/common/meta_tex_subimage.c
> index 68c8273..cd87a72 100644
> --- a/src/mesa/drivers/common/meta_tex_subimage.c
> +++ b/src/mesa/drivers/common/meta_tex_subimage.c
> @@ -34,6 +34,7 @@
>  #include "macros.h"
>  #include "meta.h"
>  #include "pbo.h"
> +#include "readpix.h"
>  #include "shaderapi.h"
>  #include "state.h"
>  #include "teximage.h"
> @@ -246,6 +247,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
> GLuint dims,
>int xoffset, int yoffset, int zoffset,
>int width, int height, int depth,
>GLenum format, GLenum type, const void *pixels,
> +  bool create_pbo,
>const struct gl_pixelstore_attrib *packing)
>  {
> GLuint pbo = 0, pbo_tex = 0, fbos[2] = { 0, 0 };
> @@ -257,7 +259,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
> GLuint dims,
> /* XXX: This should probably be passed in from somewhere */
> const char *where = "_mesa_meta_pbo_GetTexSubImage";
>  
> -   if (!_mesa_is_bufferobj(packing->BufferObj))
> +   if (!_mesa_is_bufferobj(packing->BufferObj) && !create_pbo)
>return false;
>  
> if (format == GL_DEPTH_COMPONENT ||
> @@ -282,7 +284,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
> GLuint dims,
>return true;
> }
>  
> -   pbo_tex_image = create_texture_for_pbo(ctx, false, GL_PIXEL_PACK_BUFFER,
> +   pbo_tex_image = create_texture_for_pbo(ctx, create_pbo, 
> GL_PIXEL_PACK_BUFFER,
>width, height, depth,
>format, type, pixels, packing,
>&pbo, &pbo_tex);
> @@ -348,6 +350,18 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
> GLuint dims,
>   GL_COLOR_BUFFER_BIT, GL_NEAREST);
> }
>  
> +   /* If we created a temporary pbo in meta to read the pixel data, that
> +* data will now get copied to memory allocated by client.
> +*/
> +   if (create_pbo) {

I think this should be changed to something like the below:

 if (create_pbo && !_mesa_is_bufferobj(packing->BufferObj))

It looks like the meaning of create_pbo in create_texture_for_pbo is
‘create pbo unless there is already a pbo’. With this patch
_mesa_meta_pbo_GetTexSubImage seems to interpret it to mean that it will
always create the pbo and if there already was a pbo it would end up
clobbering the state.

> +  /* Unbind the pbo from pack binding. */
> +  _mesa_BindBuffer(GL_PIXEL_PACK_BUFFER, 0);

I don't think this unbind is necessary. create_texture_for_pbo is
careful not to modify the PBO binding state and once that function is
finished the PBO is only accessed via the texture so we shouldn't need
to touch the PBO binding.

> +  _mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[1]);
> +  _mesa_update_state(ctx);
> +  _mesa_readpixels(ctx, 0, 0, width, height, format, type,
> +   packing, (void *) pixels);

Doesn't this only read the last slice of the texture? Maybe this patch
should wait until this patch from Laura Ekstrand is landed:

http://lists.freedesktop.org/archives/mesa-dev/2015-February/077487.html

Regards,
- Neil


pgpaogQNb9zCf.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 10/14] i965: Make a function to check the conditions to use the blitter

2015-02-24 Thread Neil Roberts
Anuj Phogat  writes:

> No functional changes in the patch. Just makes the code look cleaner.
>
> Signed-off-by: Anuj Phogat 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 40 
> +++
>  1 file changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index abf20c9..147097b 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -2305,6 +2305,34 @@ can_blit_slice(struct intel_mipmap_tree *mt,
> return true;
>  }
>  
> +static bool
> +use_intel_mipree_map_blit(struct brw_context *brw,
> +  struct intel_mipmap_tree *mt,
> +  GLbitfield mode,
> +  unsigned int level,
> +  unsigned int slice)
> +{
> +   if (brw->has_llc &&
> +  /* It's probably not worth swapping to the blit ring because of
> +   * all the overhead involved.
> +   */
> +   !(mode & GL_MAP_WRITE_BIT) &&
> +   !mt->compressed &&
> +   (mt->tiling == I915_TILING_X ||
> +   /* Prior to Sandybridge, the blitter can't handle Y tiling */
> +   (brw->gen >= 6 && mt->tiling == I915_TILING_Y)) &&

This line used to have an extra space of indentation in the old version
which made it clearer that it is inside the brackets above. I think the
space should be added back here.

The patch is an improvement so I think it's worth landing but I wonder
if now that's in a separate function it might be worth tidying it up a
bit to make the logic easier to follow.

Either way, if the space is added back then it is

Reviewed-by: Neil Roberts 

Regards,
- Neil


pgpJNvsWcsFNX.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 12/14] meta: Return if pixels == null and no pixel unpack buffer set

2015-02-24 Thread Neil Roberts
It seems like it would be better to fold this into patch 4 because it
is fixing a problem that is effectively introduced by that patch,
isn't it?

Regards,
- Neil


pgphPhjb7Gl0I.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 01/14] meta: Remove duplicate power of two samples check

2015-02-24 Thread Neil Roberts
All of these patches except 4, 10 and 12 look good to me and are:

Reviewed-by: Neil Roberts 

However, I'm still relatively new to this so maybe my review should be
taken with a pinch of salt.

I've replied with comments for the three patches mentioned.

Regards,
- Neil


pgpW3_ZfUlLbu.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] i965/skl: Layout 3D textures the same as array textures

2015-02-25 Thread Neil Roberts
Ben Widawsky  writes:

>> +  if (mt->target == GL_TEXTURE_3D)
>> + depth = minify(depth, 1);
>> }
>>  }
>
> assert(brw->gen >= 9)? (up to you, I'm very assert happy)

Yes, that seems like a good idea.

>>  
>> @@ -263,7 +266,7 @@ brw_miptree_layout_texture_array(struct brw_context *brw,
>>if (mt->compressed)
>>   img_height /= mt->align_h;
>>  
>> -  for (int q = 0; q < mt->physical_depth0; q++) {
>> +  for (int q = 0; q < mt->level[level].depth; q++) {
>>   if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
>>  intel_miptree_set_image_offset(mt, level, q, 0, q * img_height);
>>   } else {
>
> Can you remind me how you can have different depths per level? Assuming 
> they're
> the same, I prefer this hunk removed because it's much clearer the old way in 
> my
> opinion.

With 3D textures, unlike array textures, the depth is halved at each
mipmap level as well as the width and height so each mipmap level has a
different depth. I'm attaching an SVG which might make it clearer. I
guess that means that using the same layout for array textures and 3D
textures means that 3D textures will now have more unused space, but
that is what the hardware now wants so that's the way it has to be.

I originally didn't make this change and I thought maybe you could get
away with allocating the depth to be the same for each level even though
some of them won't be used but some later code gets very confused and
asserts.

Thanks for the review.

Regards,
- Neil



pgpxxtknrqENs.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] i965/skl: Layout a 1D miptree horizontally

2015-02-25 Thread Neil Roberts
Ben Widawsky  writes:

> Okay, I'm guilty of a bikeshed here, but doesn't this look cleaner if you just
> do two for loops? One for depth, and one for levels.
>
> something like...
> const unsigned depth = mt->physical_depth0;
> for (i = 0; i < depth; i++) {
>   width = mt->physical_width0;
>   for (unsigned level = mt->first_level; level <= mt->last_level; 
> level++) {
>   intel_miptree_set_level_info(mt, level, x, 0, depth);
>   mt->total_width += ALIGN(width, 64);
>   width = minify(width, 1);
>   }
> }

I'm not sure what you mean. This function is just laying out a single
array slice so the depth isn't iterated. The other array slices are laid
out in brw_miptree_layout_texture_array by copying the offsets that this
function generates. It is done the same way for 2D textures.

It might look a bit confusing because the last three parameters of
intel_miptree_set_level_info look like x-offset, y-offset and z-offset
but it's actually x-offset, y-offset and depth of the entire level. The
depth is the same for each level.

I guess I could comment the arguments but the same problem exists in
brw_miptree_layout_2d.

Regards,
- Neil


pgpSYPbhBmvkc.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/6 v2] i965/skl: Fix the qpitch value

2015-02-25 Thread Neil Roberts
Ben Widawsky  writes:

>> +   if (layout_1d) {
>> +  physical_qpitch = mt->align_h;
>> +  /* When using the horizontal layout the qpitch is measured in pixels 
>> */
>
> I think the docs words are a bit clearer:
> "Surface QPitch specifies the distance in pixels between array
> slices."

Ok, yeah, I'll change it to that.

>> +  mt->qpitch = physical_qpitch * mt->total_width;
>
> talking to myself - this seems more complicated than it ought to be.

Maybe it would be less confusing if I changed it to
mt->align_h * mt->total_width?

> This patch is a bit trickier than the previous, and I got lazy with fact
> checking in bspec, but it lgtm:
> Reviewed-by: Ben Widawsky 

Thanks again for the reviews.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] meta: In pbo_{Get, }TexSubImage don't repeatedly rebind the source tex

2015-02-25 Thread Neil Roberts
A layered PBO image is now interpreted as a single tall 2D image so
the z argument in _mesa_meta_bind_fbo_image is ignored. Therefore this
was just redundantly rebinding the same image repeatedly.
---
 src/mesa/drivers/common/meta_tex_subimage.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
b/src/mesa/drivers/common/meta_tex_subimage.c
index ffbbacd..407940a 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -217,8 +217,6 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
height : depth;
 
for (z = 1; z < iters; z++) {
-  _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
-pbo_tex_image, z);
   _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
 tex_image, zoffset + z);
 
@@ -350,8 +348,6 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
GLuint dims,
for (z = 1; z < iters; z++) {
   _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
 tex_image, zoffset + z);
-  _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
-pbo_tex_image, z);
 
   _mesa_update_state(ctx);
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/4] common: Fix PBOs for 1D_ARRAY.

2015-02-25 Thread Neil Roberts
Sorry for the late review.

Can you explain what this patch does? The previous code was doing a blit
like this:

  _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
 0, z * height, width, (z + 1) * height,
 xoffset, yoffset,
 xoffset + width, yoffset + height,
 GL_COLOR_BUFFER_BIT, GL_NEAREST);

However, it was first setting the height to 1 so that would end up like
this:

  _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
 0, z * 1, width, (z + 1) * 1,
 xoffset, yoffset,
 xoffset + width, yoffset + 1,
 GL_COLOR_BUFFER_BIT, GL_NEAREST);

The patch makes it do this:

 _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
0, z, width, z + 1,
xoffset, yoffset,
xoffset + width, yoffset + 1,
GL_COLOR_BUFFER_BIT, GL_NEAREST);

This looks like it would have exactly the same result.

The patch doesn't modify the blit call for slice 0 which looks wrong to
me.

Neither the version with or without the patch appears to handle the
yoffset correctly because for the 1D array case this needs to be
interpreted as a slice offset.

I'm attaching a patch which I think fixes it, although I haven't managed
to test it with the arb_direct_state_access/gettextureimage-targets test
so I don't know if I'm misunderstanding something. It is also not
complete because it doesn't touch GetTexSubImage. I have tested it with
the texsubimage test which does use the yoffset, but in order to use
that test the code path first needs to be able to accept the
IMAGE_HEIGHT packing option which I will also post a patch for.

Regards,
- Neil

--- >8 --- (use git am --scissors to automatically chop here)
Subject: meta: Fix the y offset for 1D_ARRAY in _mesa_meta_pbo_TexSubImage

This partially reverts 546aba143d13ba3f99 and brings back the if
statement to move the height over to the depth. However it
additionally moves the yoffset to the zoffset. This fixes texsubimage
array pbo for 1D_ARRAY textures.

The previous fix in 546aba143 wasn't taking into account the yoffset
correctly because it needs to be used to alter the selected layer.
---
 src/mesa/drivers/common/meta_tex_subimage.c | 36 ++---
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
b/src/mesa/drivers/common/meta_tex_subimage.c
index 3965d31..be89102 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -138,7 +138,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
struct gl_texture_image *pbo_tex_image;
GLenum status;
bool success = false;
-   int z, iters;
+   int z;
 
/* XXX: This should probably be passed in from somewhere */
const char *where = "_mesa_meta_pbo_TexSubImage";
@@ -193,6 +193,16 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
_mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[0]);
_mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbos[1]);
 
+   if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
+  assert(depth == 1);
+  assert(zoffset == 0);
+  depth = height;
+  height = 1;
+  image_height = 1;
+  zoffset = yoffset;
+  yoffset = 0;
+   }
+
_mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
  pbo_tex_image, 0);
/* If this passes on the first layer it should pass on the others */
@@ -216,28 +226,18 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
   GL_COLOR_BUFFER_BIT, GL_NEAREST))
   goto fail;
 
-   iters = tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY ?
-   height : depth;
-
-   for (z = 1; z < iters; z++) {
+   for (z = 1; z < depth; z++) {
   _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
 tex_image, zoffset + z);
 
   _mesa_update_state(ctx);
 
-  if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY)
- _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
-0, z, width, z + 1,
-xoffset, yoffset,
-xoffset + width, yoffset + 1,
-GL_COLOR_BUFFER_BIT, GL_NEAREST);
-  else
- _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
-0, z * image_height,
-width, z * image_height + height,
-xoffset, yo

[Mesa-dev] [PATCH] meta: Allow GL_UNPACK_IMAGE_HEIGHT in _mesa_meta_pbo_TexSubImage

2015-02-25 Thread Neil Roberts
Now that a layered source PBO is interpreted as a single tall 2D image
it's quite easy to accept the image height packing option by just
creating an image that is tall enough to include the image padding.

This is tested by the texsubimage Piglit test with the array and pbo
arguments. Previously this test was skipping this code path because it
always sets the image height.
---
 src/mesa/drivers/common/meta_tex_subimage.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
b/src/mesa/drivers/common/meta_tex_subimage.c
index 407940a..3965d31 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -134,6 +134,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
const struct gl_pixelstore_attrib *packing)
 {
GLuint pbo = 0, pbo_tex = 0, fbos[2] = { 0, 0 };
+   int full_height, image_height;
struct gl_texture_image *pbo_tex_image;
GLenum status;
bool success = false;
@@ -167,14 +168,16 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
   return true;
}
 
-   /* Only accept tightly packed pixels from the user. */
-   if (packing->ImageHeight != 0 && packing->ImageHeight != height)
-  return false;
+   /* For arrays, use a tall (height * depth) 2D texture but taking into
+* account the inter-image padding specified with the image height packing
+* property.
+*/
+   image_height = packing->ImageHeight == 0 ? height : packing->ImageHeight;
+   full_height = image_height * (depth - 1) + height;
 
-   /* For arrays, use a tall (height * depth) 2D texture. */
pbo_tex_image = create_texture_for_pbo(ctx, create_pbo,
   GL_PIXEL_UNPACK_BUFFER,
-  width, height * depth,
+  width, full_height,
   format, type, pixels, packing,
   &pbo, &pbo_tex);
if (!pbo_tex_image)
@@ -230,7 +233,8 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
 GL_COLOR_BUFFER_BIT, GL_NEAREST);
   else
  _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
-0, z * height, width, (z + 1) * height,
+0, z * image_height,
+width, z * image_height + height,
 xoffset, yoffset,
 xoffset + width, yoffset + height,
 GL_COLOR_BUFFER_BIT, GL_NEAREST);
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Should GL_UN/PACK_IMAGE_HEIGHT affect GL_TEXTURE_1D_ARRAY?

2015-02-26 Thread Neil Roberts
A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ *
+ * Test that GL_PACK_IMAGE_HEIGHT and GL_UNPACK_IMAGE_HEIGHT affect
+ * retrieving a GL_TEXTURE_1D_ARRAY. Setting the image height should
+ * cause gaps to be introduced between the rows.
+ *
+ * Author: Neil Roberts
+ */
+
+#include "piglit-util-gl.h"
+
+PIGLIT_GL_TEST_CONFIG_BEGIN
+
+   config.supports_gl_compat_version = 11;
+
+PIGLIT_GL_TEST_CONFIG_END
+
+enum piglit_result
+piglit_display(void)
+{
+   /* Not used */
+   return PIGLIT_FAIL;
+}
+
+static bool
+test_tex_image(bool use_pbo)
+{
+   /* 1x2 RGBA image, but with an extra row that should be
+* skipped if image height is working correctly */
+   GLubyte src_tex_data[] = {
+   0xff, 0x00, 0x00, 0xff,
+   0x00, 0xff, 0x00, 0xff, /* padding row */
+   0x00, 0x00, 0xff, 0xff,
+   0x24, 0x24, 0x24, 0xff, /* padding row */
+   };
+   /* Space for a 1x2 RGBA image */
+   GLubyte dst_tex_data[4 * 2];
+   GLuint tex;
+   bool pass = true;
+   GLuint pbo = 0;
+
+   glGenTextures(1, &tex);
+   glBindTexture(GL_TEXTURE_1D_ARRAY, tex);
+
+   if (use_pbo) {
+   glGenBuffers(1, &pbo);
+   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
+   glBufferData(GL_PIXEL_UNPACK_BUFFER,
+sizeof(src_tex_data),
+src_tex_data,
+GL_STATIC_DRAW);
+   }
+
+   /* Set the image height. This should cause the second row of
+* the source data to be skipped */
+   glPixelStorei(GL_UNPACK_IMAGE_HEIGHT, 2);
+   glTexImage2D(GL_TEXTURE_1D_ARRAY,
+0, /* level */
+GL_RGBA, /* internal format */
+1, 2, /* width/height */
+0, /* border */
+GL_RGBA, /* format */
+GL_UNSIGNED_BYTE,
+use_pbo ? NULL : src_tex_data);
+   glPixelStorei(GL_UNPACK_IMAGE_HEIGHT, 0);
+
+   if (use_pbo) {
+   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
+   glDeleteBuffers(1, &pbo);
+   }
+
+   memset(dst_tex_data, 0x42, sizeof(dst_tex_data));
+
+   glGetTexImage(GL_TEXTURE_1D_ARRAY,
+ 0, /* level */
+ GL_RGBA, GL_UNSIGNED_BYTE,
+ dst_tex_data);
+
+   if (memcmp(dst_tex_data + 4, src_tex_data + 4 * 2, 4) == 0) {
+   printf("GL_UNPACK_IMAGE_HEIGHT *does* affect "
+  "glTexImage2D(GL_TEXTURE_1D_ARRAY)\n");
+   } else if (memcmp(dst_tex_data + 4, src_tex_data + 4, 4) == 0) {
+   printf("GL_UNPACK_IMAGE_HEIGHT *does not* affect "
+  "glTexImage2D(GL_TEXTURE_1D_ARRAY)\n");
+   pass = false;
+   } else {
+   printf("Unexpected values read from glGetTexImage\n");
+   pass = false;
+   }
+
+   return pass;
+}
+
+static bool
+test_get_tex_image(bool use_pbo)
+{
+   /* 1x2 RGBA image */
+   GLubyte src_tex_data[] = {
+   0xff, 0x00, 0x00, 0xff,
+   0x00, 0xff, 0x00, 0xff
+   };
+   /* Enough space for image padding in the destination texture */
+   GLubyte dst_tex_data[4 * 4];
+   GLuint tex;
+   bool pass = true;
+   GLuint pbo = 0;
+
+   glGenTextures(1, &tex);
+   glBindTexture(GL_TEXTURE_1D_ARRAY, tex);
+   glTexImage2D(GL_TEXTURE_1D_ARRAY,
+0, /* level */
+GL_RGBA, /* internal format */
+1, 2, /* width/height */
+0, /* border */
+GL_RGBA, /* format */
+GL_UNSIGNED_BYTE,
+src_tex_data);
+
+   memset(dst_tex_data, 0x42, sizeof(dst_tex_data));
+
+   if (use_pbo) {
+   glGenBuffers(1, &pbo);
+   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
+   glBufferData(GL_PIXEL_PACK_BUFFER,
+sizeof(dst_tex_data),
+NULL,
+GL_STREAM_READ);
+   }
+
+   /* Set the image height. This should make the rows be padded out */
+   glPixelStorei(GL_PACK_IMAGE_HEIGHT, 2);
+   glGetTexImage(GL_TEXTURE_1D_ARRAY,
+ 0, /* level */
+ GL_RGBA, GL_UNSIGNED_BYTE,
+ use_pbo ? NULL : dst_tex_data);
+   glPixelStorei(GL_PACK_IMAGE_HEIGHT, 0);
+
+   if (use_pbo) {
+   glGetBufferSubData(GL_PIXEL_PACK_BUFFER,
+  0, /* 

Re: [Mesa-dev] Should GL_UN/PACK_IMAGE_HEIGHT affect GL_TEXTURE_1D_ARRAY?

2015-02-27 Thread Neil Roberts
Ian Romanick  writes:

> IMAGE_HEIGHT is definitely used for glGetTexImage because that and
> IMAGE_WIDTH allow you to get a subimage (before ARB_get_texture_sub_image).
>
> I guess on glTexImage2D those could let you initialize a subregion of a
> larger texture... kind of like doing glTexImage2D(..., NULL) followed by
> glTexSubImage2D.  It does seem pretty weird.

No, that's not right. The packing properties are used to describe the
layout of the client-side image, not the texture, so they aren't a
replacement for glGetTexSubImage or glTexSubImage?D. You can use them to
download to or upload from a sub-region of a larger image in your
client-side malloc'd buffer, but not a subregion of the texture.

I don't think there's any question whether the IMAGE_HEIGHT property
should affect glGetTexImage or glTexImage3D in general but the question
I'm asking is specifically whether it should affect 1D array textures.
The IMAGE_HEIGHT property is normally only used to help specify a
subregion of a 3D image. Arguably you could say that a 1D array texture
is like a 3D image with a width and depth but the height fixed at 1,
however GL doesn't seem to treat it like that because you have to use
glTexImage2D to upload it. It seems more like it is treating it as a 2D
image and in that case IMAGE_HEIGHT doesn't make any sense because it is
only used to create padding between the multiple images of a 3D texture.
You can already use the ROW_LENGTH property to describe the padding
between the 1D images of your client-side representation of the 1D
texture.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/skl: Fix the maximum thread count format for the PS

2015-02-27 Thread Neil Roberts
According to the bspec for some reason the format of the maximum
number of threads field has changed from U8-2 to U8-1 for the PS.

I've run this through Piglit and it doesn't cause any regressions.
---
 src/mesa/drivers/dri/i965/gen8_ps_state.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c 
b/src/mesa/drivers/dri/i965/gen8_ps_state.c
index d4a58e4..5f39e12 100644
--- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
@@ -146,8 +146,13 @@ upload_ps_state(struct brw_context *brw)
 
/* 3DSTATE_PS expects the number of threads per PSD, which is always 64;
 * it implicitly scales for different GT levels (which have some # of PSDs).
+*
+* In Gen8 the format is U8-2 whereas in Gen9 it is U8-1.
 */
-   dw6 |= (64 - 2) << HSW_PS_MAX_THREADS_SHIFT;
+   if (brw->gen >= 9)
+  dw6 |= (64 - 1) << HSW_PS_MAX_THREADS_SHIFT;
+   else
+  dw6 |= (64 - 2) << HSW_PS_MAX_THREADS_SHIFT;
 
if (prog_data->base.nr_params > 0)
   dw6 |= GEN7_PS_PUSH_CONSTANT_ENABLE;
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/skl: Ignore the vertical alignment for the qpitch of 1D textures

2015-02-27 Thread Neil Roberts
The vertical alignment is ignored in the surface state for 1D array
textures so we can tightly pack them.

I've run this through Piglit and it doesn't cause any regressions.

(This should probably be squashed into the patch entitled “i965/skl:
Fix the qpitch value”)
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 15a0eca..2969db0 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -323,11 +323,11 @@ brw_miptree_layout_texture_array(struct brw_context *brw,
   brw_miptree_layout_2d(mt);
 
if (layout_1d) {
-  physical_qpitch = mt->align_h;
+  physical_qpitch = 1;
   /* When using the horizontal layout the qpitch specifies the distance in
* pixels between array slices.
*/
-  mt->qpitch = physical_qpitch * mt->total_width;
+  mt->qpitch = mt->total_width;
} else if (brw->gen >= 9) {
   GLenum base_format;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/skl: Ignore the vertical alignment for the qpitch of 1D textures

2015-03-02 Thread Neil Roberts
Ben Widawsky  writes:

> I guess I'd appreciate a comment about how the total_width is
> guaranteed to be a multiple of 64, and therefore is a multiple of all
> possible H_ALIGNS. This is required to meet the qpitch restraint in
> the surface format, "This field must be set to an integer multiple of
> the Surface Horizontal Alignment."

My interpretation of that comment is that for 1D surfaces 64 *is* the
horizontal alignment so other possible alignments don't matter. I think
it's expected that total_width is aligned to the horizontal alignment,
and in this case that is enforced in gen9_miptree_layout_1d. I will add
a comment to make that clearer.

> Note, I don't know anything about compressed textures and what the
> block widths can be, but just doing the math, if block size > 16 and
> not a multiple of 16, this constraint will not hold.

Currently I think all of the block sizes are either 4x4 or 8x4. None of
these would make any sense for a 1D texture and wouldn't make the
texture any smaller. The bspec explicitly disallows compressed formats
for 1D surfaces. If someone were to eventually invent a compressed
format with a block height of 1 then I guess the block alignment would
be handled in gen9_miptree_layout_1d in the unlikely event that the size
isn't a factor of 64.

Note that I'm putting off landing this patch and the other qpitch one
until someone can review patch 6 in the series because without that
patch the qpitch patch seems to cause a GPU hang which makes the kernel
panic.

Thanks for the review.

Regards,
- Neil


pgp6HowmsfWV7.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meta/TexSubImage: Stash everything other than PIXEL_TRANSFER/store in meta_begin

2015-03-02 Thread Neil Roberts
This looks good to me. It does seem a shame to save everything though
when so little state affects glBlitFramebuffer. However I think even if
we changed it to just save the scissor state we'd still have a problem
with not saving the the GL_FRAMEBUFFER_SRGB state which I think we also
need to do. I guess it's more robust to just save everything in case
more state gets added one day. The meta stuff is really difficult
because every time you add new state you really need to check every use
of _mesa_meta_begin to decide whether that state should affect that
operation or not and it's really easy to get it wrong.

I think the patch could do with a small commit message mentioning that
the scissor and SRGB states were broken.

Either way,
Reviewed-by: Neil Roberts 

Regards,
- Neil

Jason Ekstrand  writes:

> Cc: 10.5 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89292
> ---
>  src/mesa/drivers/common/meta_tex_subimage.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
> b/src/mesa/drivers/common/meta_tex_subimage.c
> index 5ae12c8..802430a 100644
> --- a/src/mesa/drivers/common/meta_tex_subimage.c
> +++ b/src/mesa/drivers/common/meta_tex_subimage.c
> @@ -192,8 +192,8 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
> dims,
> if (allocate_storage)
>ctx->Driver.AllocTextureImageBuffer(ctx, tex_image);
>  
> -   /* Only stash the current FBO */
> -   _mesa_meta_begin(ctx, 0);
> +   _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER |
> +   MESA_META_PIXEL_STORE));
>  
> _mesa_GenFramebuffers(2, fbos);
> _mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[0]);
> @@ -311,8 +311,8 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
> GLuint dims,
> if (!pbo_tex_image)
>return false;
>  
> -   /* Only stash the current FBO */
> -   _mesa_meta_begin(ctx, 0);
> +   _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER |
> +   MESA_META_PIXEL_STORE));
>  
> _mesa_GenFramebuffers(2, fbos);
>  
> -- 
> 2.3.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


pgp0l5y_C1LVa.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] main/base_tex_format: Properly handle STENCIL_INDEX1/4/16

2015-03-03 Thread Neil Roberts
Jason Ekstrand  writes:

> On Mon, Mar 2, 2015 at 11:33 AM, Ilia Mirkin  wrote:
>
>> On Mon, Mar 2, 2015 at 2:32 PM, Jason Ekstrand 
>> wrote:
>> >
>> >
>> > On Mon, Mar 2, 2015 at 11:18 AM, Ilia Mirkin 
>> wrote:
>> >>
>> >> Hmmm... I was just looking at this code in connection to attepmting to
>> >> enable ARB_texture_stencil8, and it _seems_ like that should be if
>> >> (ARB_texture_stencil8) -- I didn't see what in ARB_stencil_texturing
>> >> had to do with being able to have a GL_STENCIL_INDEX* internal
>> >> format...
>> >
>> >
>> > I just pushed it because it does fix a bug...  However, you do raise a
>> good
>> > point.  Maybe we should change the i965 driver to use GL_STENCIL_INDEX
>> for
>> > the internal format for our internal stencil buffers.
>>
>> How do you end up with a GL_STENCIL_INDEX internal format in the first
>> place? Without ARB_texture_stencil8 that's not a thing, is it?
>>
>
> We're using it internally for doing stencil blits.  I'm going to Cc Neil
> and he can explain what's going on there better than I can as he was the
> one who wrote most of that code.

Do you mean the code to do the meta stencil blit? I haven't touched
that. Maybe I can pass the buck on to Topi :)

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965/skl: Disable SIMD16 when 3-source instructions are used

2015-03-04 Thread Neil Roberts
Stepping C0 of Skylake fails when using SIMD16 with 3-source
instructions (such as MAD). This patch just makes it disable SIMD16 in
that case.

This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit
tests.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 21 +
 src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
 2 files changed, 22 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 533feb4..62f29e5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3729,6 +3729,26 @@ fs_visitor::fixup_3src_null_dest()
}
 }
 
+/**
+ * 3-source instructions don't work on Skylake stepping C0. If the revision
+ * isn't known then it will take the safe option and disable it in case it's
+ * C0. Implements WaDisableSIMD16On3SrcInstr.
+ */
+void
+fs_visitor::disable_simd16_with_3src()
+{
+   if (brw->gen != 9 || (brw->revision != 2 && brw->revision != -1))
+  return;
+
+   foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
+  if (inst->is_3src()) {
+ no16("SIMD16 doesn't work with 3-source instructions on "
+  "Gen9 stepping C0\n");
+ return;
+  }
+   }
+}
+
 void
 fs_visitor::allocate_registers()
 {
@@ -3910,6 +3930,7 @@ fs_visitor::run_fs()
   assign_urb_setup();
 
   fixup_3src_null_dest();
+  disable_simd16_with_3src();
   allocate_registers();
 
   if (failed)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 70098d8..7091b9f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -196,6 +196,7 @@ public:
void setup_payload_gen6();
void setup_vs_payload();
void fixup_3src_null_dest();
+   void disable_simd16_with_3src();
void assign_curb_setup();
void calculate_urb_setup();
void assign_urb_setup();
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Store the GPU revision number in brw_context

2015-03-04 Thread Neil Roberts
brwContextInit now queries the GPU revision number via a new parameter
for DRM_I915_GETPARAM. This new parameter requires a kernel patch and
a patch to libdrm. If the kernel doesn't support it then it will
continue but set the revision number to -1. The intention is to use
this to implement workarounds that are only needed on certain
steppings of the GPU.
---
 configure.ac|  2 +-
 src/mesa/drivers/dri/i965/brw_context.c | 19 +++
 src/mesa/drivers/dri/i965/brw_context.h |  4 
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index c5abbfd..23ac927 100644
--- a/configure.ac
+++ b/configure.ac
@@ -61,7 +61,7 @@ AC_SUBST([OSMESA_VERSION])
 dnl Versions for external dependencies
 LIBDRM_REQUIRED=2.4.38
 LIBDRM_RADEON_REQUIRED=2.4.56
-LIBDRM_INTEL_REQUIRED=2.4.52
+LIBDRM_INTEL_REQUIRED=2.4.59
 LIBDRM_NVVIEUX_REQUIRED=2.4.33
 LIBDRM_NOUVEAU_REQUIRED="2.4.33 libdrm >= 2.4.41"
 LIBDRM_FREEDRENO_REQUIRED=2.4.57
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 786e6f5..18f72d8 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -651,6 +651,24 @@ brw_process_driconf_options(struct brw_context *brw)
   driQueryOptionb(options, "allow_glsl_extension_directive_midshader");
 }
 
+static int
+brw_get_revision(int fd)
+{
+   struct drm_i915_getparam gp;
+   int revision;
+   int ret;
+
+   memset(&gp, 0, sizeof(gp));
+   gp.param = I915_PARAM_REVISION;
+   gp.value = &revision;
+
+   ret = drmCommandWriteRead(fd, DRM_I915_GETPARAM, &gp, sizeof(gp));
+   if (ret)
+  revision = -1;
+
+   return revision;
+}
+
 GLboolean
 brwCreateContext(gl_api api,
 const struct gl_config *mesaVis,
@@ -709,6 +727,7 @@ brwCreateContext(gl_api api,
brw->has_negative_rhw_bug = devinfo->has_negative_rhw_bug;
brw->needs_unlit_centroid_workaround =
   devinfo->needs_unlit_centroid_workaround;
+   brw->revision = brw_get_revision(sPriv->fd);
 
brw->must_use_separate_stencil = screen->hw_must_use_separate_stencil;
brw->has_swizzling = screen->hw_has_swizzling;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 682fbe9..10e8274 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1064,6 +1064,10 @@ struct brw_context
 
int gen;
int gt;
+   /* GT revision. This will be -1 if the revision couldn't be determined (eg,
+* if the kernel doesn't support the query).
+*/
+   int revision;
 
bool is_g4x;
bool is_baytrail;
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/skl: Disable SIMD16 when 3-source instructions are used

2015-03-04 Thread Neil Roberts
Ilia Mirkin  writes:

> On Wed, Mar 4, 2015 at 9:33 AM, Neil Roberts  wrote:
>> Stepping C0 of Skylake fails when using SIMD16 with 3-source
>> instructions (such as MAD). This patch just makes it disable SIMD16 in
>> that case.
>>
>> This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit
>> tests.
>
> Just curious -- is this faster than lowering MAD into MUL + ADD but
> staying in SIMD16? (Should be fairly easy to implement as a post-RA
> fixup I'd think, although not totally familiar with how the i965
> compiler works.)

I haven't tested it but I think you're right that it would probably be
faster to lower the instruction. However that sounds like more work and
it's only relevant for a limited set of hardware so it doesn't seem
worth doing. I think there are also other 3-source instructions and it
might be more difficult to lower those (although I'm not sure).

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965/skl: Disable SIMD16 when 3-source instructions are used

2015-03-04 Thread Neil Roberts
Steppings C0 and D0 of Skylake fail when using SIMD16 with 3-source
instructions (such as MAD). This patch just makes it disable SIMD16 in
those cases.

This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit
tests.

v2: Also apply on stepping D0
---

Damien Lespiau pointed out that the workaround also needs to be
applied on stepping D0.

 src/mesa/drivers/dri/i965/brw_fs.cpp | 22 ++
 src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
 2 files changed, 23 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 533feb4..b61208b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3729,6 +3729,27 @@ fs_visitor::fixup_3src_null_dest()
}
 }
 
+/**
+ * 3-source instructions don't work on Skylake steppings C0 and D0. If the
+ * revision isn't known then it will take the safe option and disable it in
+ * case it's one of those. Implements WaDisableSIMD16On3SrcInstr.
+ */
+void
+fs_visitor::disable_simd16_with_3src()
+{
+   if (brw->gen != 9 ||
+   (brw->revision != 2 && brw->revision != 3 && brw->revision != -1))
+  return;
+
+   foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
+  if (inst->is_3src()) {
+ no16("SIMD16 doesn't work with 3-source instructions on "
+  "Gen9 steppings C0 and D0\n");
+ return;
+  }
+   }
+}
+
 void
 fs_visitor::allocate_registers()
 {
@@ -3910,6 +3931,7 @@ fs_visitor::run_fs()
   assign_urb_setup();
 
   fixup_3src_null_dest();
+  disable_simd16_with_3src();
   allocate_registers();
 
   if (failed)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 70098d8..7091b9f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -196,6 +196,7 @@ public:
void setup_payload_gen6();
void setup_vs_payload();
void fixup_3src_null_dest();
+   void disable_simd16_with_3src();
void assign_curb_setup();
void calculate_urb_setup();
void assign_urb_setup();
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] meta: Fix the y offset for 1D_ARRAY in _mesa_meta_pbo_TexSubImage

2015-03-04 Thread Neil Roberts
The yoffset needs to be interpreted as a slice offset for 1D array
textures. This patch implements that by moving the yoffset into
zoffset similar to how it moves the height into depth.
---
 src/mesa/drivers/common/meta_tex_subimage.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
b/src/mesa/drivers/common/meta_tex_subimage.c
index f360d64..e29addb 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -205,8 +205,12 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
 
if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
   assert(depth == 1);
+  assert(zoffset == 0);
   depth = height;
   height = 1;
+  image_height = 1;
+  zoffset = yoffset;
+  yoffset = 0;
}
 
_mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
@@ -322,8 +326,12 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
GLuint dims,
 
if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
   assert(depth == 1);
+  assert(zoffset == 0);
   depth = height;
   height = 1;
+  image_height = 1;
+  zoffset = yoffset;
+  yoffset = 0;
}
 
/* If we were given a texture, bind it to the read framebuffer.  If not,
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] meta: Allow GL_UN/PACK_IMAGE_HEIGHT in _mesa_meta_pbo_Get/TexSubImage

2015-03-04 Thread Neil Roberts
Now that a layered source PBO is interpreted as a single tall 2D image
it's quite easy to accept the image height packing option by just
creating an image that is tall enough to include the image padding.

I'm not sure whether the image height property should affect 1D_ARRAY
textures. My intuition and interpretation of the GL spec (which is a
bit vague) would be that it shouldn't. However the software fallback
path in Mesa uses the property for packing but not for unpacking. The
binary NVidia driver uses it for both. This patch doesn't use it for
either case so it is different from the software fallback. There is
some discussion about this here:

http://lists.freedesktop.org/archives/mesa-dev/2015-February/077925.html

This is tested by the texsubimage Piglit test with the array and pbo
arguments. Previously this test was skipping this code path because it
always sets the image height.

I've also tested it by modifying the getteximage-targets test. It
wasn't using this code path before because it was using the default
texture object so this code couldn't successfully create a frame
buffer. I also modified it to add some image padding with the image
height in the PBO.
---
 src/mesa/drivers/common/meta_tex_subimage.c | 32 ++---
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
b/src/mesa/drivers/common/meta_tex_subimage.c
index 1fef79d..f360d64 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -144,6 +144,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
const struct gl_pixelstore_attrib *packing)
 {
GLuint pbo = 0, pbo_tex = 0, fbos[2] = { 0, 0 };
+   int full_height, image_height;
struct gl_texture_image *pbo_tex_image;
GLenum status;
bool success = false;
@@ -177,14 +178,16 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
   return true;
}
 
-   /* Only accept tightly packed pixels from the user. */
-   if (packing->ImageHeight != 0 && packing->ImageHeight != height)
-  return false;
+   /* For arrays, use a tall (height * depth) 2D texture but taking into
+* account the inter-image padding specified with the image height packing
+* property.
+*/
+   image_height = packing->ImageHeight == 0 ? height : packing->ImageHeight;
+   full_height = image_height * (depth - 1) + height;
 
-   /* For arrays, use a tall (height * depth) 2D texture. */
pbo_tex_image = create_texture_for_pbo(ctx, create_pbo,
   GL_PIXEL_UNPACK_BUFFER,
-  width, height * depth,
+  width, full_height,
   format, type, pixels, packing,
   &pbo, &pbo_tex);
if (!pbo_tex_image)
@@ -236,7 +239,8 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
   _mesa_update_state(ctx);
 
   _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
- 0, z * height, width, (z + 1) * height,
+ 0, z * image_height,
+ width, z * image_height + height,
  xoffset, yoffset,
  xoffset + width, yoffset + height,
  GL_COLOR_BUFFER_BIT, GL_NEAREST);
@@ -263,6 +267,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
GLuint dims,
   const struct gl_pixelstore_attrib *packing)
 {
GLuint pbo = 0, pbo_tex = 0, fbos[2] = { 0, 0 };
+   int full_height, image_height;
struct gl_texture_image *pbo_tex_image;
GLenum status;
bool success = false;
@@ -296,13 +301,15 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
GLuint dims,
   return true;
}
 
-   /* Only accept tightly packed pixels from the user. */
-   if (packing->ImageHeight != 0 && packing->ImageHeight != height)
-  return false;
+   /* For arrays, use a tall (height * depth) 2D texture but taking into
+* account the inter-image padding specified with the image height packing
+* property.
+*/
+   image_height = packing->ImageHeight == 0 ? height : packing->ImageHeight;
+   full_height = image_height * (depth - 1) + height;
 
-   /* For arrays, use a tall (height * depth) 2D texture. */
pbo_tex_image = create_texture_for_pbo(ctx, false, GL_PIXEL_PACK_BUFFER,
-  width, height * depth,
+  width, full_height * depth,
   format, type, pixels, packing,
   &pbo, &pbo_tex);
if (!pbo_tex_image)
@@ -361,7 +368,8 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
GLuint dims,
   _mesa_meta_BlitFramebuf

[Mesa-dev] [PATCH 1/3] Revert "common: Fix PBOs for 1D_ARRAY."

2015-03-04 Thread Neil Roberts
This reverts commit 546aba143d13ba3f993ead4cc30b2404abfc0202.

I think the changes to the calls to glBlitFramebuffer from this patch
are no different to what it was doing previously because it used to
set height to 1 before doing the blits. However it was introducing
some problems with the blit for layer 0 because this was no longer
special cased. It didn't fix problems with the yoffset which needs to
be interpreted as a slice offset. I think a better solution would be
to modify the original if statement to cope with the yoffset.

Conflicts:
src/mesa/drivers/common/meta_tex_subimage.c
---
 src/mesa/drivers/common/meta_tex_subimage.c | 62 -
 1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
b/src/mesa/drivers/common/meta_tex_subimage.c
index 971ed59..1fef79d 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -147,7 +147,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
struct gl_texture_image *pbo_tex_image;
GLenum status;
bool success = false;
-   int z, iters;
+   int z;
 
/* XXX: This should probably be passed in from somewhere */
const char *where = "_mesa_meta_pbo_TexSubImage";
@@ -200,6 +200,12 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
_mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[0]);
_mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbos[1]);
 
+   if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
+  assert(depth == 1);
+  depth = height;
+  height = 1;
+   }
+
_mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
  pbo_tex_image, 0);
/* If this passes on the first layer it should pass on the others */
@@ -223,27 +229,17 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint 
dims,
   GL_COLOR_BUFFER_BIT, GL_NEAREST))
   goto fail;
 
-   iters = tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY ?
-   height : depth;
-
-   for (z = 1; z < iters; z++) {
+   for (z = 1; z < depth; z++) {
   _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
 tex_image, zoffset + z);
 
   _mesa_update_state(ctx);
 
-  if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY)
- _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
-0, z, width, z + 1,
-xoffset, yoffset,
-xoffset + width, yoffset + 1,
-GL_COLOR_BUFFER_BIT, GL_NEAREST);
-  else
- _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
-0, z * height, width, (z + 1) * height,
-xoffset, yoffset,
-xoffset + width, yoffset + height,
-GL_COLOR_BUFFER_BIT, GL_NEAREST);
+  _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
+ 0, z * height, width, (z + 1) * height,
+ xoffset, yoffset,
+ xoffset + width, yoffset + height,
+ GL_COLOR_BUFFER_BIT, GL_NEAREST);
}
 
success = true;
@@ -270,7 +266,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
GLuint dims,
struct gl_texture_image *pbo_tex_image;
GLenum status;
bool success = false;
-   int z, iters;
+   int z;
 
/* XXX: This should probably be passed in from somewhere */
const char *where = "_mesa_meta_pbo_GetTexSubImage";
@@ -317,6 +313,12 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
GLuint dims,
 
_mesa_GenFramebuffers(2, fbos);
 
+   if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
+  assert(depth == 1);
+  depth = height;
+  height = 1;
+   }
+
/* If we were given a texture, bind it to the read framebuffer.  If not,
 * we're doing a ReadPixels and we should just use whatever framebuffer
 * the client has bound.
@@ -350,29 +352,17 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
GLuint dims,
   GL_COLOR_BUFFER_BIT, GL_NEAREST))
   goto fail;
 
-   if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY)
-  iters = height;
-   else
-  iters = depth;
-
-   for (z = 1; z < iters; z++) {
+   for (z = 1; z < depth; z++) {
   _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
 tex_image, zoffset + z);
 
   _mesa_update_state(ctx);
 
-  if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY)
- _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer,
-xoffset, yoffset,
-  

Re: [Mesa-dev] [PATCH 2/2] i965/skl: Break down SIMD16 3-source instructions when required.

2015-03-05 Thread Neil Roberts
Yes, I like this approach much better. I ran it through Piglit and I can
confirm it fixes the same tests as my patch.

Reviewed-by: Neil Roberts 

There's no need to reset the author to me. Thanks for looking at this.

Regards,
- Neil

Kenneth Graunke  writes:

> Several steppings of Skylake fail when using SIMD16 with 3-source
> instructions (such as MAD).
>
> This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit
> tests.
>
> Based on a patch by Neil Roberts.
>
> Signed-off-by: Kenneth Graunke 
> Cc: Neil Roberts 
> Cc: Matt Turner 
> ---
>  src/mesa/drivers/dri/i965/brw_shader.cpp | 6 ++
>  1 file changed, 6 insertions(+)
>
> Neil, what do you think of this approach?  It's a bit smaller of a hammer
> than turning off SIMD16 altogether, and pretty simple.  I haven't tested
> it at all, though.
>
> Feel free to --reset-author and claim authorship on this patch - it's
> really your code, I just moved it over a bit.
>
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index fed4ba3..74c0e50 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -400,6 +400,12 @@ bool
>  brw_instruction_supports_simd16(const struct brw_context *brw, enum opcode 
> op)
>  {
> bool supports_3src = brw->is_haswell || brw->gen >= 8;
> +   /* WaDisableSIMD16On3SrcInstr: 3-source instructions don't work in SIMD16
> +* on a few steppings of Skylake.
> +*/
> +   if (brw->gen == 9 && (brw->revision == 2 || brw->revision == 3 ||
> + brw->revision == -1))
> +  supports_3src = false;
>  
> switch (op) {
> case BRW_OPCODE_MAD:
> -- 
> 2.2.2


pgpZro1cTftj9.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   3   4   5   6   >