Re: [Mesa-dev] [PATCH] i965: correct alignment units for 2D compressed textures on Skylake
Hi Nanley, Could you explain the reasoning behind this patch? I can't find any mention of needing to align to the square of the block size in the docs. I think how it works is that on Skylake you can pick any alignment value you want out of 4, 8 or 16 but for compressed textures that is counted in units of blocks rather than pixels. Currently we effectively always pick 4 for compressed textures because I don't think there's any reason to align to a higher value. The mt->align_w/h values are used for two things; in intel_miptree_set_total_width_height they are used to choose the positions for the mipmap images and after that they are only used to upload the alignment values as part of the surface state. On Skylake mt->align_w/h is temporarily set to be in units of pixels during brw_miptree_layout but at the end of the function it divides the values by the block size so that they will be in units of blocks ready for uploading as part of the surface state. Your patch modifies how it picks the alignment value but it doesn't modify the bit at the end which divides the chosen alignment by the block size. For FXT1 I think that would end up making it pick a horizontal alignment value of ALIGN_8 (ie, the pixel alignment value is now 8*8=64 which when divided by the block width ends up as 64/8=ALIGN_8). Before my patches for Skylake bits of the code were lazily assuming that mt->align_w/h is always the same as the block size because that previously happened to be always correct. I fixed some of the cases within the layouting code to instead directly query the block size. However it looks like I missed one in intel_miptree_copy_slice. I think your patch would make this particular case work because now the alignment value is ALIGN_8 which happens to match the block width for FXT1. I'm about to post an alternative patch which fixes this case to use the block size instead and it does seem to fix the broken FXT1 test cases as well. Regards, - Neil Nanley Chery writes: > From: Nanley Chery > > On Gen9+, vertical and horizontal alignment values for compressed textures are > equal to the pre-Gen9 value squared. Each miplevel must be aligned to this > value. > > Signed-off-by: Nanley Chery > --- > > This fixes an FXT1 Piglit test regression and shows no failures on Jenkins. > > src/mesa/drivers/dri/i965/brw_tex_layout.c | 20 > 1 file changed, 8 insertions(+), 12 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c > b/src/mesa/drivers/dri/i965/brw_tex_layout.c > index 312a887..dffc699 100644 > --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c > +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c > @@ -149,15 +149,8 @@ intel_horizontal_texture_alignment_unit(struct > brw_context *brw, >unsigned int i, j; >_mesa_get_format_block_size(mt->format, &i, &j); > > - /* On Gen9+ we can pick our own alignment for compressed textures but > it > - * has to be a multiple of the block size. The minimum alignment we can > - * pick is 4 so we effectively have to align to 4 times the block > - * size > - */ > - if (brw->gen >= 9) > - return i * 4; > - else > - return i; > + /* On Gen9+ the alignment value is squared. */ > + return brw->gen >= 9 ? i * i : i; > } > > if (mt->format == MESA_FORMAT_S_UINT8) > @@ -269,9 +262,12 @@ intel_vertical_texture_alignment_unit(struct brw_context > *brw, > * Where "*" means either VALIGN_2 or VALIGN_4 depending on the setting of > * the SURFACE_STATE "Surface Vertical Alignment" field. > */ > - if (_mesa_is_format_compressed(mt->format)) > + if (_mesa_is_format_compressed(mt->format)) { > + unsigned int i, j; > + _mesa_get_format_block_size(mt->format, &i, &j); >/* See comment above for the horizontal alignment */ > - return brw->gen >= 9 ? 16 : 4; > + return brw->gen >= 9 ? j * j : j; > + } > > if (mt->format == MESA_FORMAT_S_UINT8) >return brw->gen >= 7 ? 8 : 4; > @@ -379,7 +375,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) > > if (mt->compressed) { >mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) + > - ALIGN(minify(mt->physical_width0, 2), bw); > + ALIGN(minify(mt->physical_width0, 2), mt->align_w); > } else { >mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) + > minify(mt->physical_width0, 2); > -- > 2.4.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Fix aligning to the block size in intel_miptree_copy_slice
This function was trying to align the width and height to a multiple of the block size for compressed textures. It was using align_w/h as a shortcut to get the block size as up until Gen9 this always happens to match. However in Gen9+ the alignment values are expressed as multiples of the block size so in effect the alignment values are always 4 for compressed textures as that is the minimum value we can pick. This happened to work for most compressed formats because the block size is also 4, but for FXT1 this was breaking because it has a block width of 8. This fixes some Piglit tests testing FXT1 such as spec@3dfx_texture_compression_fxt1@fbo-generatemipmap-formats --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 615cbfb..6c00581 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -1168,8 +1168,10 @@ intel_miptree_copy_slice(struct brw_context *brw, assert(src_mt->format == dst_mt->format); if (dst_mt->compressed) { - height = ALIGN(height, dst_mt->align_h) / dst_mt->align_h; - width = ALIGN(width, dst_mt->align_w); + unsigned int i, j; + _mesa_get_format_block_size(dst_mt->format, &i, &j); + height = ALIGN(height, j) / j; + width = ALIGN(width, i); } /* If it's a packed depth/stencil buffer with separate stencil, the blit -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] meta: Abort texture upload if pixels == null and no pixel unpack buffer set
Ah, good catch. Looks good to me. Reviewed-by: Neil Roberts It seems a bit weird to use create_pbo=true at all for glTexImage{1,2,3}D because in that case we are completely replacing the texture. If the texture's buffer is busy instead of allocating a PBO we might as well just directly allocate some new storage for the texture and abandon the old storage. That would be a separate patch though and either way I think this patch makes sense. Anuj Phogat writes: > in case of glTex{Sub}Image{1,2,3}D(). I think in practice this patch only applies to glTexImage* (not glTexSubImage*) doesn't it? It wouldn't make any sense to call glTexSubImage without a PBO and with NULL pixels. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Don't create a temp PBO when uploading data from glTexImage*
Previously when glTexImage* is called it would attempt to create a temporary PBO if the texture is busy in order to avoid blocking when mapping the texture. This doesn't make much sense for glTexImage because in that case we are completely replacing the texture anyway so instead of allocating a PBO we can just allocate new storage for the texture. The code was buggy anyway because it was checking whether the buffer was busy before calling Driver->AllocTextureImageBuffer. That function actually always frees the buffer and recreates a new one so it was checking whether the previous buffer was busy and this is irrelevant. In practice I think this wouldn't matter too much because the upper layers of Mesa always call Driver->FreeTextureImageBuffer before calling Driver->TexImage anyway so there would never be a buffer that could be busy. --- src/mesa/drivers/dri/i965/intel_tex_image.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c index 85d3d04..2874e5b 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_image.c +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c @@ -95,8 +95,6 @@ intelTexImage(struct gl_context * ctx, struct intel_texture_image *intelImage = intel_texture_image(texImage); bool ok; - bool tex_busy = intelImage->mt && drm_intel_bo_busy(intelImage->mt->bo); - DBG("%s mesa_format %s target %s format %s type %s level %d %dx%dx%d\n", __func__, _mesa_get_format_name(texImage->TexFormat), _mesa_lookup_enum_by_nr(texImage->TexObject->Target), @@ -116,7 +114,8 @@ intelTexImage(struct gl_context * ctx, texImage->Depth, format, type, pixels, false /*allocate_storage*/, - tex_busy, unpack); + false /*create_pbo*/, + unpack); if (ok) return; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] meta: Abort texture upload if pixels == null and no pixel unpack buffer set
Neil Roberts writes: > It seems a bit weird to use create_pbo=true at all for > glTexImage{1,2,3}D because in that case we are completely replacing > the texture. If the texture's buffer is busy instead of allocating a > PBO we might as well just directly allocate some new storage for the > texture and abandon the old storage. That would be a separate patch > though and either way I think this patch makes sense. I've filed a patch for this: http://patchwork.freedesktop.org/patch/51757/ I think it would be good to land your patch as well anyway. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Fix aligning to the block size in intel_miptree_copy_slice
Hi, Could you perhaps send me a list of the regressions? I ran it through on my Skylake and the only changes I see are the two FXT1 tests. However, I have to admit that my device is being a bit flakey so I cheated a little bit to make this work. Originally when I compared the differences with and without the patch I had about 150 differences (some fixes, some regressions) that were pretty random. I got the list of these with "piglit summary console -d" and then reran just those tests (with the --test-list option) after a reboot with and without the patch. I then have a noddy Python script to merge a Piglit results file so that it replaces test results from the first file with any that appear in the second file. Once I do that then there are no regressions and only the two FXT1 tests appear in the fixes. It might be interesting to see if you can run some of the regressing tests by hand after a reboot to see if they fail consistently. Regards, - Neil Nanley Chery writes: > Hey Neil, > > While this patch does fix FXT1, it also regresses 21 other Piglit tests on > SKL. > > - Nanley > > On Thu, Jun 11, 2015 at 8:59 AM, Neil Roberts wrote: >> This function was trying to align the width and height to a multiple >> of the block size for compressed textures. It was using align_w/h as a >> shortcut to get the block size as up until Gen9 this always happens to >> match. However in Gen9+ the alignment values are expressed as >> multiples of the block size so in effect the alignment values are >> always 4 for compressed textures as that is the minimum value we can >> pick. This happened to work for most compressed formats because the >> block size is also 4, but for FXT1 this was breaking because it has a >> block width of 8. >> >> This fixes some Piglit tests testing FXT1 such as >> >> spec@3dfx_texture_compression_fxt1@fbo-generatemipmap-formats >> --- >> src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c >> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c >> index 615cbfb..6c00581 100644 >> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c >> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c >> @@ -1168,8 +1168,10 @@ intel_miptree_copy_slice(struct brw_context *brw, >> assert(src_mt->format == dst_mt->format); >> >> if (dst_mt->compressed) { >> - height = ALIGN(height, dst_mt->align_h) / dst_mt->align_h; >> - width = ALIGN(width, dst_mt->align_w); >> + unsigned int i, j; >> + _mesa_get_format_block_size(dst_mt->format, &i, &j); >> + height = ALIGN(height, j) / j; >> + width = ALIGN(width, i); >> } >> >> /* If it's a packed depth/stencil buffer with separate stencil, the blit >> -- >> 1.9.3 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Don't create a temp PBO when uploading data from glTexImage*
> On Fri, Jun 12, 2015 at 7:34 AM, Neil Roberts wrote: >> The code was buggy anyway because it was checking whether the buffer >> was busy before calling Driver->AllocTextureImageBuffer. That function >> actually always frees the buffer and recreates a new one so it was >> checking whether the previous buffer was busy and this is irrelevant. Jason Ekstrand writes: > I'm not sure this is correct. You can still do partial updates with > TexImage if you are updating a single miplevel or cube face. Ah yes, you're right. I didn't look into the code far enough. When it calls Driver->AllocTextureImageBuffer it will likely get the same miptree back because there is a pointer to it from the texture object. However, it still seems like creating a PBO would be counter-productive most of the time because you are most probably going to replace all of the mipmaps if you replace one of them. Maybe a good thing to do would be to check whether intel_texobj->mt is busy in intel_alloc_texture_image_buffer and then let it allocate a new mt in that case. That way in the likely case that the rest of the images are updated too then it would use a new buffer and avoid blocking without introducing an extra copy. If it turns out that the application really is just updating one image then it will be copied in anyway when the texture is finalized which achieves more or less the same thing as using a PBO. >> In practice I think this wouldn't matter too much because the upper >> layers of Mesa always call Driver->FreeTextureImageBuffer before >> calling Driver->TexImage anyway so there would never be a buffer that >> could be busy. I think this point still holds and means it's effectively impossible to hit the create_pbo=true case. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: correct alignment units for 2D compressed textures on Skylake
Nanley Chery writes: > Although most of the patch is incorrect, the following change is still > necessary isn't it? > if (mt->compressed) { >mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) + > - ALIGN(minify(mt->physical_width0, 2), bw); > + ALIGN(minify(mt->physical_width0, 2), mt->align_w); > > Each LOD is aligned by the alignment term. I don't see why we're > switching to block width here. I'm pretty sure the original version is correct and we do want to align to the block width here. This bit of code is not being used to work out the position of an image so it doesn't need to take into account the image alignment. It is only being used to calculate the mt->total_width value. That value is not aligned to the image alignment. You can see this because it will usually just be directly taken from mt->physical_width0 and that is not aligned. In the non-compressed case just below you can see that the second half of the addition is not aligned to anything at all. The total_width does however need to be at least a multiple of the block size because it isn't possible to allocate space for half of a block. If I understand correctly this code is just trying to cope with cases where the third image in the mipmap is positioned so that it extends past the width of the first image. If that happens then the width of the image containing all of the mipmap images needs to be extended slightly or it would crop the third mipmap image. Please see the attached SVG. Looking at it a bit more I think this bit above is wrong: if (mt->compressed) { mt->total_width = ALIGN(mt->physical_width0, mt->align_w); } That should also be using bw instead of mt->align_w for the same reason. I think it could only end up making the total_width a bit larger than necessary so it probably isn't causing any actual problems. I'll make a patch and test it. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/skl: Fix aligning mt->total_width to the block size
brw_miptree_layout_2d tries to ensure that mt->total_width is a multiple of the compressed block size, presumably because it wouldn't be possible to make an image that has a fraction of a block. However it was doing this by aligning mt->total_width to align_w. Previously align_w has been used as a shortcut for getting the block width because before Gen9 the block width was always equal to the alignment. Commit 4ab8d59a2 tried to fix these cases to use the block width instead of the alignment but it missed this case. I think in practice this probably won't make any difference because the buffer for the texture will be allocated to be large enough to contain the entire pitch and libdrm aligns the pitch to the tile width anyway. However I think the patch is worth having to make the intention clearer. --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 1e7d8a1..dbb6cef 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -366,9 +366,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) mt->total_width = mt->physical_width0; - if (mt->compressed) { - mt->total_width = ALIGN(mt->physical_width0, mt->align_w); - } + if (mt->compressed) + mt->total_width = ALIGN(mt->total_width, bw); /* May need to adjust width to accommodate the placement of * the 2nd mipmap. This occurs when the alignment -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] Fix strict-aliasing violations in GLSL shader list implementation
Hi, If we are going to have to prod all of the code using this list implementation to solve this problem anyway maybe it would make more sense to switch to a kernel-style list instead. There is already an implementation of this in src/util/list.h. I think this style of list only accesses the pointers through a single struct so there shouldn't be any aliasing issues. It should also end up using one fewer pointer to store the head sentinel. It would be nice to reduce the number of list implementations used in Mesa. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Fix aligning mt->total_width to the block size
Ben Widawsky writes: > I think this is beginning to infringe upon the definition of align_w. > The total width is a function of it's miptree properties and not the > compressed block properties, right? > > In other words, if there is a case where align_w != bw, I think > total_width should be aligned to align_w, NOT bw. I don't think it's so clear cut. In practice the mt->total_width doesn't really need to be aligned to anything because as far as I can tell it is only used to calculate the row stride. The row stride is separately aligned to whatever constraints necessary by libdrm so it doesn't really matter what we pick here. The reason I think that the intention was to align it to the block width rather than the horizontal alignment is that in the non-compressed case the total width isn't aligned to anything at all. It's probably not worth making too much of a fuss over this patch seeing as it doesn't make any practical difference. I'm happy to forget about it and pretend I never noticed the inconsistency. Regards, - Neil > > (I'm not opposed to the patch, just making sure I understand.) > >> --- >> src/mesa/drivers/dri/i965/brw_tex_layout.c | 5 ++--- >> 1 file changed, 2 insertions(+), 3 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c >> b/src/mesa/drivers/dri/i965/brw_tex_layout.c >> index 1e7d8a1..dbb6cef 100644 >> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c >> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c >> @@ -366,9 +366,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) >> >> mt->total_width = mt->physical_width0; >> >> - if (mt->compressed) { >> - mt->total_width = ALIGN(mt->physical_width0, mt->align_w); >> - } >> + if (mt->compressed) >> + mt->total_width = ALIGN(mt->total_width, bw); >> >> /* May need to adjust width to accommodate the placement of >> * the 2nd mipmap. This occurs when the alignment >> -- >> 1.9.3 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Don't try to print the GLSL IR if it has been freed
Since commit 104c8fc2c2aa5621261f8 the GLSL IR will be freed if NIR is being used. This was causing it to segfault if INTEL_DEBUG=wm is set. This patch just makes it avoid dumping the GLSL IR in that case. --- src/mesa/drivers/dri/i965/brw_program.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c index 2327af7..85e271d 100644 --- a/src/mesa/drivers/dri/i965/brw_program.c +++ b/src/mesa/drivers/dri/i965/brw_program.c @@ -574,10 +574,13 @@ brw_dump_ir(const char *stage, struct gl_shader_program *shader_prog, struct gl_shader *shader, struct gl_program *prog) { if (shader_prog) { - fprintf(stderr, - "GLSL IR for native %s shader %d:\n", stage, shader_prog->Name); - _mesa_print_ir(stderr, shader->ir, NULL); - fprintf(stderr, "\n\n"); + if (shader->ir) { + fprintf(stderr, + "GLSL IR for native %s shader %d:\n", + stage, shader_prog->Name); + _mesa_print_ir(stderr, shader->ir, NULL); + fprintf(stderr, "\n\n"); + } } else { fprintf(stderr, "ARB_%s_program %d ir for native %s shader\n", stage, prog->Id, stage); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Don't use the pixel interpolater for centroid interpolation
For centroid interpolation we can just directly use the values set up in the shader payload instead of querying the pixel interpolator. To do this we need to modify brw_compute_barycentric_interp_modes to detect when interpolateAtCentroid is called. This fixes the interpolateAtCentroid tests on SKL in Piglit but it is probably cheating a bit because there still seems to be some underlying problem with using the pixel interpolater and this patch just avoids it. The other interpolateAt* tests are still failing. --- I'm not really sure if we want this patch because although it does make using interpolateAtCentroid more efficient I don't think we know of anything that is actually using that so it's not really a priority to optimise. It adds a bit of complexity to the compiler so maybe we are better off without it? I'm mostly just posting it to gauge opinion. src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 42 src/mesa/drivers/dri/i965/brw_wm.c | 55 2 files changed, 83 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 59081ea..fcde545 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1217,6 +1217,21 @@ fs_visitor::emit_percomp(const fs_builder &bld, const fs_inst &inst, } } +/* For most messages, we need one reg of ignored data; the hardware requires + * mlen==1 even when there is no payload. in the per-slot offset case, we'll + * replace this with the proper source data. + */ +static void +setup_pixel_interpolater_instruction(nir_intrinsic_instr *instr, + fs_inst *inst, + int mlen = 1) +{ + inst->mlen = mlen; + inst->regs_written = 2; /* 2 floats per slot returned */ + inst->pi_noperspective = instr->variables[0]->var->data.interpolation == + INTERP_QUALIFIER_NOPERSPECTIVE; +} + void fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr) { @@ -1469,19 +1484,21 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2); - /* For most messages, we need one reg of ignored data; the hardware - * requires mlen==1 even when there is no payload. in the per-slot - * offset case, we'll replace this with the proper source data. - */ fs_reg src = vgrf(glsl_type::float_type); - int mlen = 1; /* one reg unless overriden */ fs_inst *inst; switch (instr->intrinsic) { - case nir_intrinsic_interp_var_at_centroid: - inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_CENTROID, - dst_xy, src, fs_reg(0u)); + case nir_intrinsic_interp_var_at_centroid: { + enum brw_wm_barycentric_interp_mode interp_mode; + if (instr->variables[0]->var->data.interpolation == + INTERP_QUALIFIER_NOPERSPECTIVE) +interp_mode = BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC; + else +interp_mode = BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC; + uint8_t reg = payload.barycentric_coord_reg[interp_mode]; + dst_xy = fs_reg(brw_vec16_grf(reg, 0)); break; + } case nir_intrinsic_interp_var_at_sample: { /* XXX: We should probably handle non-constant sample id's */ @@ -1490,6 +1507,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr unsigned msg_data = const_sample ? const_sample->i[0] << 4 : 0; inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_SAMPLE, dst_xy, src, fs_reg(msg_data)); + setup_pixel_interpolater_instruction(instr, inst); break; } @@ -1502,6 +1520,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET, dst_xy, src, fs_reg(off_x | (off_y << 4))); +setup_pixel_interpolater_instruction(instr, inst); } else { src = vgrf(glsl_type::ivec2_type); fs_reg offset_src = retype(get_nir_src(instr->src[0]), @@ -1531,9 +1550,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr bld.SEL(offset(src, i), itemp, fs_reg(7))); } -mlen = 2; inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET, dst_xy, src, fs_reg(0u)); +setup_pixel_interpolater_instruction(instr, inst, 2); } break; } @@ -1542,11 +1561,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr unreachable("Invalid intrinsic"); } - inst->mlen = mlen; - inst->regs_written = 2; /* 2 floa
Re: [Mesa-dev] [PATCH] i965/fs: Don't use the pixel interpolater for centroid interpolation
Ben Widawsky writes: > I am not the right person to judge the complexity tradeoff, but it > seems like a worthwhile patch to me. I spent a few minutes thinking > about how it could hurt performance and was unable to come up with > anything. Thanks. I was thinking more that the complexity means more code to maintain rather than that it would be a performance hit. If we have this patch and another patch to make it not fall back to SIMD8 then using interpolateAtCentroid ends up being exactly the same as using the centroid interpolation qualifier on the varying except that it is more convenient because you don't need to put it in the vertex shader as well. However I think centroid interpolation is a pretty obscure feature that is only useful if you're doing multisampling with an algorithm that breaks if it is used for samples that lie outside of the polygon, so maybe it's not really worth worrying about. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3] glsl: fix some strict aliasing issues in exec_list
Hi, If we wanted to avoid growing the size of exec_list to four pointers instead of three, maybe we could store it in a union like below: struct exec_list { union { struct { struct exec_node head_sentinel; struct exec_node *dummy_pointer_a; }; struct { struct exec_node *dummy_pointer_b; struct exec_node tail_sentinel; }; }; /* ... */ }; This is using anonymous structs and unions so that you wouldn't have to otherwise modify the patch. However we might have to avoid anonymous structs and unions on MSVC. If that's the case then it shouldn't be too much effort to just name them and modify all of the code that is accessing them. You could even put a static assert somewhere to make sure it works like this: STATIC_ASSERT(offsetof(struct exec_list, tail_sentinel.next) == offsetof(struct exec_list, head_sentinel.prev)); It looks like this might also require removing the constructor for exec_node because you can't have objects with constructors in a union apparently. If the constructor is just a safety net anyway then maybe this won't cause any trouble. I don't know. Also just to note, your patch doesn't apply with git-am. Have you cut-and-paste the patch file into an email? It looks like something has done some word wrapping which has corrupted it. Eg: diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index cfe0df3..f5efcc5 100644 --- a/src/glsl/ir_validate.cpp +++ b/src/glsl/ir_validate.cpp @@ -842,8 +842,9 @@ ir_validate::visit_enter(ir_call *ir) abort(); } - const exec_node *formal_param_node = callee->parameters.head; - const exec_node *actual_param_node = ir->actual_parameters.head; + const exec_node *formal_param_node = callee->parameters.head_sentinel.next; This isn't a valid patch because all of the lines have to begin with either +, - or a space. If you can't use git-send-email for whatever reason, maybe you can find a way to disable the word wrapping? Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3] glsl: fix some strict aliasing issues in exec_list
Davin McCall writes: > I actually had thought about this, but technically, you can only use > unions for type aliasing if you perform all accesses (that are not to > the 'active' member) through the union. All the list processing code > that iterates through all the nodes including the tail sentinel would > *technically* still have an aliasing problem, because it doesn't go > through the exec_list structure at all (though in practice, I don't > think it would ever manifest in as an issue in the compiled code). I don't think it matters that the list iterating code doesn't use exec_list. If something modifies the list pointers through an exec_node then the compiler will know that that potentially aliases the pointers in an exec_list because the pointers in exec_list are also wrapped in an exec_node. With your patch there is no type aliasing at all and the union modification doesn't alter that. > Why must all accesses go through the union? Consider the case: > > int someMethod(float *f, int *i) { > *f = 4.0;// LINE A > int a = *i; // LINE B > return a; > } > > If I had some union 'u' with { float ff; int ii; } then I could call > the above method, even if it was in a different module, with: > > someMethod(&u.ff, &u.ii). > > Now, that would mean that the compiler would not be allowed to > re-order LINE A and LINE B. But as I said, someMethod might be in a > different module, where the compiler does not know that 'i' and 'f' > point to members of the same union. In that case it assumes that 'i' > and 'f' don't alias. Compare that to: > > int someMethod(union u *a, union u *b) > { > u->ff = 4.0; > int a = u->ii; > return a; > } > > In this version, accesses are through the union, and the compiler > knows that they potentially alias. I don't think this example is relevant in this case because all of the relevant members of the union I suggested are the same type (struct exec_node). There is no type aliasing. Maybe a hypothetical problem with this sort of use could be if you had a function like this: struct exec_node * some_method(struct exec_node *a, struct exec_node *b) { a->prev = &something; b->next = &something_else; return a->prev; } If you called this with the head and tail sentinels then the compiler won't know that a->prev and b->next alias each other so it might return &something instead of &something_else. However in practice for this use case the only part that is aliased is a NULL pointer that is never written to so I don't think it would actually matter. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator
There was a comment saying that in SIMD16 mode the pixel interpolator returns coords interleaved 8 channels at a time and that this requires extra work to support. However, this interleaved format is exactly what the PLN instruction requires so I don't think anything needs to be done to support it apart from removing the line to disable it and to ensure that the message lengths for the send message are correct. I am more convinced that this is correct because as it says in the comment this interleaved output is identical to what is given in the thread payload. The code generated to apply the plane equation to these coordinates is identical on SIMD16 and SIMD8 except that the dispatch width is larger which implies no special unmangling is needed. Perhaps the confusion stems from the fact that the description of the PLN instruction in the IVB PRM seems to imply that the src1 inputs are not interleaved so it wouldn't work. However, in the HSW and BDW PRMs, the pseudo-code is different and looks like it expects the interleaved format. Mesa doesn't seem to generate different code on IVB to uninterleave the payload registers and everything is working so I can only assume that the PRM is wrong. I tested the interpolateAt tests on HSW and did a full Piglit run on IVB on there were no regressions. --- I've CC'd Chris Forbes because according to git-annotate he wrote the original comment so he might know something I don't. src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 59081ea..717e597 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1461,12 +1461,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr case nir_intrinsic_interp_var_at_centroid: case nir_intrinsic_interp_var_at_sample: case nir_intrinsic_interp_var_at_offset: { - /* in SIMD16 mode, the pixel interpolator returns coords interleaved - * 8 channels at a time, same as the barycentric coords presented in - * the FS payload. this requires a bit of extra work to support. - */ - no16("interpolate_at_* not yet supported in SIMD16 mode."); - fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2); /* For most messages, we need one reg of ignored data; the hardware @@ -1531,7 +1525,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr bld.SEL(offset(src, i), itemp, fs_reg(7))); } -mlen = 2; +mlen = 2 * dispatch_width / 8; inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET, dst_xy, src, fs_reg(0u)); } @@ -1543,7 +1537,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr } inst->mlen = mlen; - inst->regs_written = 2; /* 2 floats per slot returned */ + /* 2 floats per slot returned */ + inst->regs_written = 2 * dispatch_width / 8; inst->pi_noperspective = instr->variables[0]->var->data.interpolation == INTERP_QUALIFIER_NOPERSPECTIVE; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRA
On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if the shader sends a message to the pixel interpolator. This fixes the interpolateAt* tests on SKL, apart from interpolateatsample-nonconst but that is not implemented anywhere so it's not a regression. --- src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/brw_defines.h | 1 + src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +++ 4 files changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3553f6e..7596139 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -415,6 +415,7 @@ struct brw_wm_prog_data { bool uses_pos_offset; bool uses_omask; bool uses_kill; + bool pulls_bary; uint32_t prog_offset_16; /** diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 66b9abc..19489ab 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -2145,6 +2145,7 @@ enum brw_pixel_shader_computed_depth_mode { # define GEN8_PSX_SHADER_DISABLES_ALPHA_TO_COVERAGE (1 << 7) # define GEN8_PSX_SHADER_IS_PER_SAMPLE (1 << 6) # define GEN8_PSX_SHADER_COMPUTES_STENCIL (1 << 5) +# define GEN9_PSX_SHADER_PULLS_BARY (1 << 3) # define GEN8_PSX_SHADER_HAS_UAV(1 << 2) # define GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK (1 << 1) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index bd71404..3ebc3a2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1481,6 +1481,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr case nir_intrinsic_interp_var_at_centroid: case nir_intrinsic_interp_var_at_sample: case nir_intrinsic_interp_var_at_offset: { + assert(stage == MESA_SHADER_FRAGMENT); + + ((struct brw_wm_prog_data *) prog_data)->pulls_bary = true; + fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2); /* For most messages, we need one reg of ignored data; the hardware diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c index a88f109..d544509 100644 --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c @@ -58,6 +58,9 @@ gen8_upload_ps_extra(struct brw_context *brw, if (prog_data->uses_omask) dw1 |= GEN8_PSX_OMASK_TO_RENDER_TARGET; + if (brw->gen >= 9 && prog_data->pulls_bary) + dw1 |= GEN9_PSX_SHADER_PULLS_BARY; + if (_mesa_active_fragment_shader_has_atomic_ops(&brw->ctx)) dw1 |= GEN8_PSX_SHADER_HAS_UAV; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] glsl: Add missing check for whether an expression is an add operation
There is a piece of code that is trying to match expressions of the form (mul (floor (add (abs x) 0.5) (sign x))). However the check for the add expression wasn't checking whether it had the expected operation. It looks like this was just an oversight because it doesn't match the pattern for the rest of the code snippet. The existing line to check whether add_expr!=NULL was added as part of a coverity fix in 3384179f. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91226 Cc: Matt Turner Cc: "10.6" --- src/glsl/opt_algebraic.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index fa5db70..9b8a426 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -580,7 +580,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) continue; ir_expression *add_expr = floor_expr->operands[0]->as_expression(); - if (!add_expr) + if (!add_expr || add_expr->operation != ir_binop_add) continue; for (int j = 0; j < 2; j++) { -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] glsl: Make sure not to dereference NULL
In this bit of code point_five can be NULL if the expression is not a constant. This fixes it to match the pattern of the rest of the chunk of code so that it checks for NULLs. Cc: Matt Turner Cc: "10.6" --- src/glsl/opt_algebraic.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 9b8a426..c4b8715 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -589,7 +589,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) continue; ir_constant *point_five = add_expr->operands[1 - j]->as_constant(); -if (!point_five->is_value(0.5, 0)) +if (!point_five || !point_five->is_value(0.5, 0)) continue; if (abs_expr->operands[0]->equals(sign_expr->operands[0])) { -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] i965/fs: Don't use the pixel interpolater for centroid interpolation
For centroid interpolation we can just directly use the values set up in the shader payload instead of querying the pixel interpolator. To do this we need to modify brw_compute_barycentric_interp_modes to detect when interpolateAtCentroid is called. v2: Rebase on top of changes to set the pulls bary bit on SKL --- As an aside, I was deliberating over whether to call the function set_up_blah instead of setup_blah because I think the former is more correct. The rest of Mesa seems to use setup so maybe it's more important to be consistent than correct. src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 52 +++--- src/mesa/drivers/dri/i965/brw_wm.c | 55 2 files changed, 88 insertions(+), 19 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 5d1ea21..fd7f1b8 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1238,6 +1238,25 @@ fs_visitor::emit_percomp(const fs_builder &bld, const fs_inst &inst, } } +/* For most messages, we need one reg of ignored data; the hardware requires + * mlen==1 even when there is no payload. in the per-slot offset case, we'll + * replace this with the proper source data. + */ +static void +setup_pixel_interpolater_instruction(fs_visitor *v, + nir_intrinsic_instr *instr, + fs_inst *inst, + int mlen = 1) +{ + inst->mlen = mlen; + inst->regs_written = 2 * v->dispatch_width / 8; + inst->pi_noperspective = instr->variables[0]->var->data.interpolation == + INTERP_QUALIFIER_NOPERSPECTIVE; + + assert(v->stage == MESA_SHADER_FRAGMENT); + ((struct brw_wm_prog_data *) v->prog_data)->pulls_bary = true; +} + void fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr) { @@ -1482,25 +1501,23 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr case nir_intrinsic_interp_var_at_centroid: case nir_intrinsic_interp_var_at_sample: case nir_intrinsic_interp_var_at_offset: { - assert(stage == MESA_SHADER_FRAGMENT); - - ((struct brw_wm_prog_data *) prog_data)->pulls_bary = true; - fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2); - /* For most messages, we need one reg of ignored data; the hardware - * requires mlen==1 even when there is no payload. in the per-slot - * offset case, we'll replace this with the proper source data. - */ fs_reg src = vgrf(glsl_type::float_type); - int mlen = 1; /* one reg unless overriden */ fs_inst *inst; switch (instr->intrinsic) { - case nir_intrinsic_interp_var_at_centroid: - inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_CENTROID, - dst_xy, src, fs_reg(0u)); + case nir_intrinsic_interp_var_at_centroid: { + enum brw_wm_barycentric_interp_mode interp_mode; + if (instr->variables[0]->var->data.interpolation == + INTERP_QUALIFIER_NOPERSPECTIVE) +interp_mode = BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC; + else +interp_mode = BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC; + uint8_t reg = payload.barycentric_coord_reg[interp_mode]; + dst_xy = fs_reg(brw_vec16_grf(reg, 0)); break; + } case nir_intrinsic_interp_var_at_sample: { /* XXX: We should probably handle non-constant sample id's */ @@ -1509,6 +1526,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr unsigned msg_data = const_sample ? const_sample->i[0] << 4 : 0; inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_SAMPLE, dst_xy, src, fs_reg(msg_data)); + setup_pixel_interpolater_instruction(this, instr, inst); break; } @@ -1521,6 +1539,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET, dst_xy, src, fs_reg(off_x | (off_y << 4))); +setup_pixel_interpolater_instruction(this, instr, inst); } else { src = vgrf(glsl_type::ivec2_type); fs_reg offset_src = retype(get_nir_src(instr->src[0]), @@ -1550,9 +1569,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr bld.SEL(offset(src, bld, i), itemp, fs_reg(7))); } -mlen = 2 * dispatch_width / 8; inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET, dst_xy, src, fs_reg(0u)); +setup_pixel_interpolater_instruction(this, instr, inst, + 2 * dispatch_width / 8); } break;
[Mesa-dev] [PATCH] i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used
When the edge flag element is enabled then the elements are slightly reordered so that the edge flag is always the last one. This was confusing the code to upload the 3DSTATE_VF_INSTANCING state because that is uploaded with a separate loop which has an instruction for each element. The indices used in these instructions weren't taking into account the reordering so the state would be incorrect. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292 Cc: "10.6 10.5" --- src/mesa/drivers/dri/i965/gen8_draw_upload.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/drivers/dri/i965/gen8_draw_upload.c index 1af90ec..65b7625 100644 --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c @@ -218,13 +218,24 @@ gen8_emit_vertices(struct brw_context *brw) } ADVANCE_BATCH(); - for (unsigned i = 0; i < brw->vb.nr_enabled; i++) { + for (unsigned i = 0, j = 0; i < brw->vb.nr_enabled; i++) { const struct brw_vertex_element *input = brw->vb.enabled[i]; const struct brw_vertex_buffer *buffer = &brw->vb.buffers[input->buffer]; + unsigned element_index; + + /* The edge flag element is reordered to be the last one in the code + * above so we need to compensate for that in the element indices used + * below. + */ + if (input == gen6_edgeflag_input) + element_index = brw->vb.nr_enabled - 1; + else + element_index = j++; BEGIN_BATCH(3); OUT_BATCH(_3DSTATE_VF_INSTANCING << 16 | (3 - 2)); - OUT_BATCH(i | (buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0)); + OUT_BATCH(element_index | +(buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0)); OUT_BATCH(buffer->step_rate); ADVANCE_BATCH(); } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965/fs: Don't use the pixel interpolater for centroid interpolation
Chris Forbes writes: > Nitpicks aside, I don't think this is a great idea now that you've got > the SKL PI working. Can you explain why you don't think this is a good idea? Is it because it is an optimisation for something that is not known to be a big use case so carrying around the extra code just adds unnecessary maintenance burden? I could agree with that so I'm happy to abandon the patch for now if that's the general consensus. > I also think it's broken -- you need to arrange to have the centroid > barycentric coords delivered to the FS thread, which won't be > happening if this is the *only* use of them. Masked in the tests, > because they compare with a centroid-qualified input. [I'm assuming > you don't always get these delivered to the FS in SKL, but no docs > access...] The changes to brw_compute_barycentric_interp_modes in the patch ensure that the centroid barycentric coords are delivered whenever interpolateAtCentroid is used in a shader. I don't think this is a problem. At least it seems to work in a simple test without using a separate varying with the centroid qualifier. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965/bdw: Fix setting the instancing state for the SGVS element
When gl_VertexID or gl_InstanceID is used a 3DSTATE_VF_SGVS instruction is sent to create a sort of element to store the generated values. The last instruction in this chunk of code looks like it was trying to set the instancing state for the element using the 3DSTATE_VF_INSTANCING instruction. However it was sending brw->vb.nr_buffers instead of the element index. This instruction is supposed to take an element index and that is how it is used further down in the function so the previous code looks wrong. Perhaps previously the number of buffers coincidentally matched the number of enabled elements so the value was generally correct anyway. In a subsequent patch I want to change a bit how it chooses the SGVS element index so this needs to be fixed. Cc: "10.6 10.5" --- src/mesa/drivers/dri/i965/gen8_draw_upload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/drivers/dri/i965/gen8_draw_upload.c index 1af90ec..f7d9952 100644 --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c @@ -74,7 +74,7 @@ gen8_emit_vertices(struct brw_context *brw) BEGIN_BATCH(3); OUT_BATCH(_3DSTATE_VF_INSTANCING << 16 | (3 - 2)); - OUT_BATCH(brw->vb.nr_buffers | GEN8_VF_INSTANCING_ENABLE); + OUT_BATCH(vue | GEN8_VF_INSTANCING_ENABLE); OUT_BATCH(0); ADVANCE_BATCH(); } else { -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/3] i965: Fix problems with gl_Vertex/InstanceID and glPolygonMode
This series fixes these two related bugs: https://bugs.freedesktop.org/show_bug.cgi?id=84677 https://bugs.freedesktop.org/show_bug.cgi?id=91292 The first bug has had a fix for a while but until now I wasn't able to figure out what to do on BDW. I had previously made a fix for the second bug but as pointed out by the original bug reporter the two bugs interact and the fixes would conflict. This series has a v2 of the patch so that the two fixes work together. I've posted a patch to make the Piglit testing more thorough here: http://patchwork.freedesktop.org/patch/54320/ In the v1 for patch 3 Chris Forbes had a comment that it would be better to move the edge flag element within brw->vb.* instead of while emitting the state. I think this would be a good idea except it won't help in this case now that it works with the SGVS element as well because there is no input for that at all in brw->vb.*. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3 v2] i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is used
When the edge flag element is enabled then the elements are slightly reordered so that the edge flag is always the last one. This was confusing the code to upload the 3DSTATE_VF_INSTANCING state because that is uploaded with a separate loop which has an instruction for each element. The indices used in these instructions weren't taking into account the reordering so the state would be incorrect. v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope when gl_VertexID is used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292 Cc: "10.6 10.5" --- src/mesa/drivers/dri/i965/gen8_draw_upload.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/drivers/dri/i965/gen8_draw_upload.c index 2bac5ff..1b48643 100644 --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c @@ -246,13 +246,24 @@ gen8_emit_vertices(struct brw_context *brw) } ADVANCE_BATCH(); - for (unsigned i = 0; i < brw->vb.nr_enabled; i++) { + for (unsigned i = 0, j = 0; i < brw->vb.nr_enabled; i++) { const struct brw_vertex_element *input = brw->vb.enabled[i]; const struct brw_vertex_buffer *buffer = &brw->vb.buffers[input->buffer]; + unsigned element_index; + + /* The edge flag element is reordered to be the last one in the code + * above so we need to compensate for that in the element indices used + * below. + */ + if (input == gen6_edgeflag_input) + element_index = nr_elements - 1; + else + element_index = j++; BEGIN_BATCH(3); OUT_BATCH(_3DSTATE_VF_INSTANCING << 16 | (3 - 2)); - OUT_BATCH(i | (buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0)); + OUT_BATCH(element_index | +(buffer->step_rate ? GEN8_VF_INSTANCING_ENABLE : 0)); OUT_BATCH(buffer->step_rate); ADVANCE_BATCH(); } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3 v2] i965: Swap the order of the vertex ID and edge flag attributes
The edge flag data on Gen6+ is passed through the fixed function hardware as an extra attribute. According to the PRM it must be the last valid VERTEX_ELEMENT structure. However if the vertex ID is also used then another extra element is added to source the VID. This made it so the vertex ID is in the wrong register in the vertex shader and the edge attribute is no longer in the last element. v2: Also implement for BDW+ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677 Cc: "10.6 10.5" --- src/mesa/drivers/dri/i965/brw_draw_upload.c | 30 +++ src/mesa/drivers/dri/i965/gen8_draw_upload.c | 56 +--- 2 files changed, 57 insertions(+), 29 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c index 320e40e..67a8dfd 100644 --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c @@ -787,21 +787,6 @@ static void brw_emit_vertices(struct brw_context *brw) ((i * 4) << BRW_VE1_DST_OFFSET_SHIFT)); } - if (brw->gen >= 6 && gen6_edgeflag_input) { - uint32_t format = - brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray); - - OUT_BATCH((gen6_edgeflag_input->buffer << GEN6_VE0_INDEX_SHIFT) | -GEN6_VE0_VALID | -GEN6_VE0_EDGE_FLAG_ENABLE | -(format << BRW_VE0_FORMAT_SHIFT) | -(gen6_edgeflag_input->offset << BRW_VE0_SRC_OFFSET_SHIFT)); - OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << BRW_VE1_COMPONENT_0_SHIFT) | -(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) | -(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) | -(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT)); - } - if (brw->vs.prog_data->uses_vertexid || brw->vs.prog_data->uses_instanceid) { uint32_t dw0 = 0, dw1 = 0; uint32_t comp0 = BRW_VE1_COMPONENT_STORE_0; @@ -842,6 +827,21 @@ static void brw_emit_vertices(struct brw_context *brw) OUT_BATCH(dw1); } + if (brw->gen >= 6 && gen6_edgeflag_input) { + uint32_t format = + brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray); + + OUT_BATCH((gen6_edgeflag_input->buffer << GEN6_VE0_INDEX_SHIFT) | +GEN6_VE0_VALID | +GEN6_VE0_EDGE_FLAG_ENABLE | +(format << BRW_VE0_FORMAT_SHIFT) | +(gen6_edgeflag_input->offset << BRW_VE0_SRC_OFFSET_SHIFT)); + OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << BRW_VE1_COMPONENT_0_SHIFT) | +(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) | +(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) | +(BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT)); + } + ADVANCE_BATCH(); } diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/drivers/dri/i965/gen8_draw_upload.c index f7d9952..2bac5ff 100644 --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c @@ -40,16 +40,25 @@ gen8_emit_vertices(struct brw_context *brw) { struct gl_context *ctx = &brw->ctx; uint32_t mocs_wb = brw->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB; + bool uses_edge_flag; brw_prepare_vertices(brw); brw_prepare_shader_draw_parameters(brw); + uses_edge_flag = (ctx->Polygon.FrontMode != GL_FILL || + ctx->Polygon.BackMode != GL_FILL); + if (brw->vs.prog_data->uses_vertexid || brw->vs.prog_data->uses_instanceid) { unsigned vue = brw->vb.nr_enabled; - WARN_ONCE(brw->vs.prog_data->inputs_read & VERT_BIT_EDGEFLAG, -"Using VID/IID with edgeflags, need to reorder the " -"vertex attributes"); + /* The element for the edge flags must always be last, so we have to + * insert the SGVS before it in that case. + */ + if (uses_edge_flag) { + assert(vue > 0); + vue--; + } + WARN_ONCE(vue >= 33, "Trying to insert VID/IID past 33rd vertex element, " "need to reorder the vertex attrbutes."); @@ -138,7 +147,18 @@ gen8_emit_vertices(struct brw_context *brw) ADVANCE_BATCH(); } - unsigned nr_elements = brw->vb.nr_enabled + brw->vs.prog_data->uses_vertexid; + /* Normally we don't need an element for the SGVS attribute because the +* 3DSTATE_VF_SGVS instruction lets you store the generated attribute in an +* element that is past the list in 3DSTATE_VERTEX_ELEMENTS. However if the +* vertex ID is used then it needs an element for the base vertex buffer. +* Additionally if there is an edge flag element then the SGVS can't be +* inserted past that so we need a dummy element to ensure that the edge +* flag is the last one. +*/ + bool needs_sgvs_element = (brw->vs.prog_data->uses_vertexid || + (brw->vs.prog_data->use
[Mesa-dev] [PATCH] i965/fs: Handle non-const sample number in interpolateAtSample
If a non-const sample number is given to interpolateAtSample it will now generate an indirect send message with the sample ID similar to how non-const sampler array indexing works. Previously non-const values were ignored and instead it ended up using a constant 0 value. Note that unlike sampler array indexing I can't find any mention in the GLSL spec about the sample ID having to be dynamically uniform. That wouldn't be handled by this patch and I'm not sure how that could be implemented unless we do something like sending a separate message for each fragment (ie, 16 messages in SIMD16). --- I've added a Piglit test for this here: http://patchwork.freedesktop.org/patch/55028/ src/mesa/drivers/dri/i965/brw_eu.h | 2 +- src/mesa/drivers/dri/i965/brw_eu_emit.c| 34 +- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 5 ++-- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 23 ++--- 4 files changed, 45 insertions(+), 19 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index 761aa0e..0ac1ad9 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -461,7 +461,7 @@ brw_pixel_interpolator_query(struct brw_codegen *p, struct brw_reg mrf, bool noperspective, unsigned mode, - unsigned data, + struct brw_reg data, unsigned msg_length, unsigned response_length); diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 4d39762..25524d4 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -3192,26 +3192,38 @@ brw_pixel_interpolator_query(struct brw_codegen *p, struct brw_reg mrf, bool noperspective, unsigned mode, - unsigned data, + struct brw_reg data, unsigned msg_length, unsigned response_length) { const struct brw_device_info *devinfo = p->devinfo; - struct brw_inst *insn = next_insn(p, BRW_OPCODE_SEND); + struct brw_inst *insn; + uint16_t exec_size; - brw_set_dest(p, insn, dest); - brw_set_src0(p, insn, mrf); - brw_set_message_descriptor(p, insn, GEN7_SFID_PIXEL_INTERPOLATOR, - msg_length, response_length, - false /* header is never present for PI */, - false); + if (data.file == BRW_IMMEDIATE_VALUE) { + insn = next_insn(p, BRW_OPCODE_SEND); + brw_set_dest(p, insn, dest); + brw_set_src0(p, insn, mrf); + brw_set_message_descriptor(p, insn, GEN7_SFID_PIXEL_INTERPOLATOR, + msg_length, response_length, + false /* header is never present for PI */, + false); + brw_inst_set_pi_message_data(devinfo, insn, data.dw1.ud); + } else { + insn = brw_send_indirect_message(p, + GEN7_SFID_PIXEL_INTERPOLATOR, + dest, + mrf, + vec1(data)); + brw_inst_set_mlen(devinfo, insn, msg_length); + brw_inst_set_rlen(devinfo, insn, response_length); + } - brw_inst_set_pi_simd_mode( - devinfo, insn, brw_inst_exec_size(devinfo, insn) == BRW_EXECUTE_16); + exec_size = brw_inst_exec_size(devinfo, p->current); + brw_inst_set_pi_simd_mode(devinfo, insn, exec_size == BRW_EXECUTE_16); brw_inst_set_pi_slot_group(devinfo, insn, 0); /* zero unless 32/64px dispatch */ brw_inst_set_pi_nopersp(devinfo, insn, noperspective); brw_inst_set_pi_message_type(devinfo, insn, mode); - brw_inst_set_pi_message_data(devinfo, insn, data); } void diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index bae7216..3c92341 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -1324,15 +1324,14 @@ fs_generator::generate_pixel_interpolator_query(fs_inst *inst, struct brw_reg msg_data, unsigned msg_type) { - assert(msg_data.file == BRW_IMMEDIATE_VALUE && - msg_data.type == BRW_REGISTER_TYPE_UD); + assert(msg_data.type == BRW_REGISTER_TYPE_UD); brw_pixel_interpolator_query(p, retype(dst, BRW_REGISTER_TYPE_UW), src, inst->pi_noperspective, msg_type, - msg_data.dw1.ud, + msg_data, inst->mlen, inst->r
Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments
Ben Widawsky writes: > The impetus for this patch comes from a seemingly benign statement within the > spec (quoted within the patch). For me, this patch was at some point critical > for getting stable piglit results (though this did not seem to be the case on > a > branch Chad was working on). > > It is very important for clearing multiple color buffer attachments and can be > observed in the following piglit tests: > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0 > > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 > + > 1 file changed, 84 insertions(+), 13 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > index 7bf52f0..9e6711e 100644 > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable) > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM; > } > > +/** > + * Individually fast clear each color buffer attachment. On previous gens > this > + * isn't required. The motivation for this comes from one line (which seems > to > + * be specific to SKL+). The list item is in section titled _MCS Buffer for > + * Render Target(s)_ > + * > + * "Since only one RT is bound with a clear pass, only one RT can be > cleared > + * at a time. To clear multiple RTs, multiple clear passes are required." This sentence also appears in the HSW PRM so it seems a bit odd if it's only causing problems on SKL. I guess if we get Piglit regressions without it then it makes sense to have the patch. It might be worth just double checking whether this patch is completely necessary. The wording in the commit message seems a little unsure. > + * > + * The code follows the same idea as the resolve code which creates a fake > FBO > + * to avoid interfering with too much of the GL state. > + */ What state are you trying to avoid interfering with? Maybe the resolve code only creates an FBO because it needs to work on an abritrary intel_mipmap_tree which might not already have an FBO associated with it. It seems like it should be enough just to call _mesa_meta_drawbuffers_from_bitfield(1 << index) instead of creating an FBO. If you did that you could also make the loop a bit nicer by just looping over the bits set in fast_clear_buffers with ffs. The glDrawBuffers state is already saved as part of the meta save state and the existing code for gen<9 also modifies it. > +static void > +fast_clear_attachments(struct brw_context *brw, > + struct gl_framebuffer *fb, > + uint32_t fast_clear_buffers, > + struct rect fast_clear_rect) > +{ > + assert(brw->gen >= 9); > + struct gl_context *ctx = &brw->ctx; > + > + GLuint old_fb = ctx->DrawBuffer->Name; > + > + for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) { > + struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf]; > + struct intel_renderbuffer *irb = intel_renderbuffer(rb); > + GLuint fbo, rbo; > + int index = fb->_ColorDrawBufferIndexes[buf]; > + > + if (!((1 << index) & fast_clear_buffers)) > + continue; > + > + _mesa_GenFramebuffers(1, &fbo); > + rbo = brw_get_rb_for_slice(brw, irb->mt, 0, 0, false); > + > + _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo); > + _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, > +GL_COLOR_ATTACHMENT0, > +GL_RENDERBUFFER, rbo); > + _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0); > + > + brw_fast_clear_init(brw); Is is really necessary to call this for every buffer? It looks like it's called once before fast_clear_attachments is called so maybe it doesn't need to be in this function at all? > + > + use_rectlist(brw, true); Same for this. > + > + brw_bind_rep_write_shader(brw, (float *) fast_clear_color); > + > + /* SKL+ also has a resolve mode for compressed render targets and thus > more > + * bits to let us select the type of resolve. For fast clear > resolves, it > + * turns out we can use the same value as pre-SKL though. > + */ > + set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE); > + brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers)); > + set_fast_clear_op(brw, 0); > + use_rectlist(brw, false); And this. > + > + _mesa_DeleteRenderbuffers(1, &rbo); > + _mesa_DeleteFramebuffers(1, &fbo); > + > + /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll > + * resolve them eventually. > + */ > + irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR; > + } > + > + _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb); > +} > + > bool > b
Re: [Mesa-dev] [PATCH 09/10] i965/meta: Remove fast_clear_color variable
This patch doesn't look right. See this sentence in “Render Target Fast Clear”: “The pixel shader kernel requires no attributes, and delivers a value of 0x in all channels of the render target write message” Presumably the fast_clear_color is trying to implement this restriction. Regards, - Neil Ben Widawsky writes: > It doesn't actually serve a purpose AFAICT (in fact, I'm not certain what it's > meant to do). > > Cc: Kristian Høgsberg > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 +++- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > index 41afc9a..9c51ffb 100644 > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > @@ -390,8 +390,6 @@ set_fast_clear_color(struct brw_context *brw, > } > } > > -static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 }; > - > static void > set_fast_clear_op(struct brw_context *brw, uint32_t op) > { > @@ -472,7 +470,7 @@ fast_clear_attachments(struct brw_context *brw, > >use_rectlist(brw, true); > > - brw_bind_rep_write_shader(brw, (float *) fast_clear_color); > + brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f); > >/* SKL+ also has a resolve mode for compressed render targets and thus > more > * bits to let us select the type of resolve. For fast clear > resolves, it > @@ -670,7 +668,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct > gl_framebuffer *fb, >fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect); > } else if (fast_clear_buffers) { >_mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers); > - brw_bind_rep_write_shader(brw, (float *) fast_clear_color); > + brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f); >set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE); >brw_draw_rectlist(ctx, &fast_clear_rect, layers); >set_fast_clear_op(brw, 0); > @@ -785,7 +783,7 @@ brw_meta_resolve_color(struct brw_context *brw, > > use_rectlist(brw, true); > > - brw_bind_rep_write_shader(brw, (float *) fast_clear_color); > + brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f); > > /* SKL+ also has a resolve mode for compressed render targets and thus > more > * bits to let us select the type of resolve. For fast clear resolves, it > -- > 2.6.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats
It would be nice if you could give some indication of where this list of formats came from. Unless we expect the list to change with future generations, maybe it would be better to make it a static const table? It's a shame to grow the context size unnecessarily. Regards, - Neil Ben Widawsky writes: > Initially I had this planned as a patch to be squashed in to the enabling > patch > because there is no point enabling fast clears without this. However, Chad > merged a patch which disables fast clears on gen9 explicitly, and so I can > hide > this behind the revert of that patch. This is a nice I really wanted this > patch > as a distinct patch for review. This is a new, weird, and poorly documented > restriction for SKL. (In fact, I am still not 100% certain the restriction is > entirely necessary, but there are around 30 piglit regressions without this). > > SKL adds compressible render targets and as a result mutates some of the > programming for fast clears and resolves. There is a new internal surface type > called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary surface > is > a CCS (Color Control Surface) with compression disabled or an MCS with > compression enabled, depending on number of multisamples. MCS (Multisample > Control Surface) is a special type of CCS." > > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_context.h | 1 + > src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 > + > src/mesa/drivers/dri/i965/gen8_surface_state.c | 8 ++-- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 3 +++ > 4 files changed, 37 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > b/src/mesa/drivers/dri/i965/brw_context.h > index e59478a..32b8250 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.h > +++ b/src/mesa/drivers/dri/i965/brw_context.h > @@ -1546,6 +1546,7 @@ struct brw_context > > uint32_t render_target_format[MESA_FORMAT_COUNT]; > bool format_supported_as_render_target[MESA_FORMAT_COUNT]; > + bool losslessly_compressable[MESA_FORMAT_COUNT]; > > /* Interpolation modes, one byte per vue slot. > * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+. > diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c > b/src/mesa/drivers/dri/i965/brw_surface_formats.c > index 97fff60..d706ecc 100644 > --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c > +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c > @@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw) >} > } > > + if (brw->gen >= 9) { > + brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true; > + brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true; > + brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true; > + } > + > /* We will check this table for FBO completeness, but the surface format > * table above only covered color rendering. > */ > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c > b/src/mesa/drivers/dri/i965/gen8_surface_state.c > index 995b4dd..b19b492 100644 > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c > @@ -243,8 +243,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw, > * "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, > HALIGN > * 16 must be used." > */ > - if (brw->gen >= 9 || mt->num_samples =
Re: [Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)
Looks good, it'll be great to get this landed. Patches 1-3 and 6-8 are: Reviewed-by: Neil Roberts I've sent comments separately for 4, 5 and 9. Hopefully I can try to help with patch 10 once my SKL machine arrives. Regards, - Neil Ben Widawsky writes: > This patch series adds support for fast color clears on SKL as it exists on > previous generations of hardware minus the new hardware restriction on surface > formats. Additionally, it adds support for utilizing clear values with up to > 32b > per color channel (see note at the bottom). It is based on work originally > done > by Kristian, so thanks to him for that initial work as well as helping me > debug > some of the issues. > > Additionally, thanks to Chad for helping track down the last bug in the > rectangle > scaling code which was (for me) being masked by another bug (#3 below). I > imagine it would have been several more weeks at least before I uncovered it. > > We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order > to > support the 32b per channel. As it turned out though, Skylake made other > changes > to support this which caused weird failures which seemed to interfere with > each other. > > 1. Not all surface formats support lossless compression. > 2. Clearing multiple color buffer attachments must happen in n passes > 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this > was the bug which Chad helped uncover, I had it correct in my patch from March > http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we > had other problems which prevented merge, including #1 and #2 above). > > I have no piglit, dEQP or CTS regressions (except for the last patch). I > haven't > yet, but will collect perf data on this ASAP. Historically we've come to > expect > this to provide large gains in tests which are memory bandwidth limited and > doing many clears. > > Ben Widawsky (10): > i965/gen8+: Remove redundant zeroing of surface state > i965/gen8+: Extract color clear surface state > i965/skl: Enable fast color clears on SKL > i965/skl: skip fast clears for certain surface formats > i965/meta/gen9: Individually fast clear color attachments > Revert "i965/gen9: Disable MCS for 1x color surfaces" > Revert "i965/gen9: Enable rep clears on gen9" > i965/meta: Assert fast clears and rep clears never overlap > i965/meta: Remove fast_clear_color variable > i965/gen9: Support fast clears for 32b float > > src/mesa/drivers/dri/i965/brw_context.h | 1 + > src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 172 > ++-- > src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 > src/mesa/drivers/dri/i965/gen8_surface_state.c | 48 --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 20 +-- > src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 7 +- > 6 files changed, 205 insertions(+), 70 deletions(-) > > -- > 2.6.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/glsl: Use shader_prog->Name for naming the NIR shader
Ping, could you please push this patch? It's a pain to use the optimise debug output without it. Thanks. Reviewed-by: Neil Roberts - Neil Jason Ekstrand writes: > This has the better name to use. Aparently, sh->Name is usually 0. > --- > src/glsl/nir/glsl_to_nir.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp > index 6e1dd84..3284bdc 100644 > --- a/src/glsl/nir/glsl_to_nir.cpp > +++ b/src/glsl/nir/glsl_to_nir.cpp > @@ -150,7 +150,7 @@ glsl_to_nir(const struct gl_shader_program *shader_prog, >if (sh->Program->SamplersUsed & (1 << i)) > num_textures = i; > > - shader->info.name = ralloc_asprintf(shader, "GLSL%d", sh->Name); > + shader->info.name = ralloc_asprintf(shader, "GLSL%d", shader_prog->Name); > if (shader_prog->Label) >shader->info.label = ralloc_strdup(shader, shader_prog->Label); > shader->info.num_textures = num_textures; > -- > 2.5.0.400.gff86faf > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Run all of the optimisations after lower_load_payload
Instead of just running a couple of the possible optimisations in one single iteration, it now runs the whole loop again after lowering the load payloads. According to shader-db this gives: total instructions in shared programs: 6493365 -> 6493289 (-0.00%) instructions in affected programs: 1696 -> 1620 (-4.48%) total loops in shared programs:2237 -> 2237 (0.00%) helped:20 HURT: 0 Most of the shaders just benefit from running the register coalesce pass multiple times. However the following two additionally benefit from an extra pass of opt_saturate_propagation which causes a bunch of further optimisations: steam-metro-last-light-1719.shader_test steam-metro-last-light-836.shader_test The optimisations that get run after lowering the load payloads are: 04-14-register_coalesce 04-17-compact_virtual_grfs 05-12-opt_saturate_propagation 06-02-opt_algebraic 06-04-opt_copy_propagate 06-07-dead_code_eliminate 06-12-opt_saturate_propagation 06-14-register_coalesce 06-17-compact_virtual_grfs 07-18-opt_combine_constants In the first shader this drops four instructions. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 94 +++- src/mesa/drivers/dri/i965/brw_fs.h | 3 ++ 2 files changed, 53 insertions(+), 44 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 01a7c99..8d1db23 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -4754,33 +4754,6 @@ fs_visitor::calculate_register_pressure() } } -void -fs_visitor::optimize() -{ - /* Start by validating the shader we currently have. */ - validate(); - - /* bld is the common builder object pointing at the end of the program we -* used to translate it into i965 IR. For the optimization and lowering -* passes coming next, any code added after the end of the program without -* having explicitly called fs_builder::at() clearly points at a mistake. -* Ideally optimization passes wouldn't be part of the visitor so they -* wouldn't have access to bld at all, but they do, so just in case some -* pass forgets to ask for a location explicitly set it to NULL here to -* make it trip. The dispatch width is initialized to a bogus value to -* make sure that optimizations set the execution controls explicitly to -* match the code they are manipulating instead of relying on the defaults. -*/ - bld = fs_builder(this, 64); - - assign_constant_locations(); - demote_pull_constants(); - - validate(); - - split_virtual_grfs(); - validate(); - #define OPT(pass, args...) ({ \ pass_num++; \ bool this_progress = pass(args); \ @@ -4799,20 +4772,10 @@ fs_visitor::optimize() this_progress;\ }) - if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) { - char filename[64]; - snprintf(filename, 64, "%s%d-%s-00-start", - stage_abbrev, dispatch_width, nir->info.name); - - backend_shader::dump_instructions(filename); - } - - bool progress = false; - int iteration = 0; - int pass_num = 0; - - OPT(lower_simd_width); - OPT(lower_logical_sends); +void +fs_visitor::optimize_loop(int &iteration, int &pass_num) +{ + bool progress; do { progress = false; @@ -4839,6 +4802,51 @@ fs_visitor::optimize() OPT(compact_virtual_grfs); } while (progress); +} + +void +fs_visitor::optimize() +{ + /* Start by validating the shader we currently have. */ + validate(); + + /* bld is the common builder object pointing at the end of the program we +* used to translate it into i965 IR. For the optimization and lowering +* passes coming next, any code added after the end of the program without +* having explicitly called fs_builder::at() clearly points at a mistake. +* Ideally optimization passes wouldn't be part of the visitor so they +* wouldn't have access to bld at all, but they do, so just in case some +* pass forgets to ask for a location explicitly set it to NULL here to +* make it trip. The dispatch width is initialized to a bogus value to +* make sure that optimizations set the execution controls explicitly to +* match the code they are manipulating instead of relying on the defaults. +*/ + bld = fs_builder(this, 64); + + assign_constant_locations(); + demote_pull_constants(); + + validate(); + + split_virtual_grfs(); + validate(); + + if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) { + char filename[64]; + snprintf(filename, 64, "%s%d-%s-00-start", + stage_abbrev, dispatch_width, nir->info.name); + + backend_shader::dump_instructions(filename); + } + + bool progress = false; + int iteration = 0
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
Just a thought, would it be better to move this check into the eliminate_find_live_channel optimisation? That way it could catch sources that become immediates through later optimisations. One problem with this that I've seen before is that eliminating the FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be eliminated because the copy propagation doesn't work. I made this patch a while ago but I never posted it anywhere because it's a of a kludge and it would probably be better to fix the copy propagation: https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa Either way though I don't think it would do any harm to have Kristian's patch as well even if we did improve elimintate_find_live_channel so it is: Reviewed-by: Neil Roberts - Neil Kristian Høgsberg Kristensen writes: > An immdiate is already uniform so just return it up front. Without this, > brw_fs_surface_builder ends up passing immediate surface indices through > SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't > constant propagate out of, and further, we don't constant propagate into > the typed/untype read/write opcodes at all. The end result is that all > typed/untyped read/write/atomics end up as indirect sends. > > Code generation should always produce either an immediate or an actual > indirect surface index, so we can fix this by just special casing > immediates in emit_uniformize. > > Signed-off-by: Kristian Høgsberg Kristensen > --- > src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h > b/src/mesa/drivers/dri/i965/brw_fs_builder.h > index df10a9d..98ce71e 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h > +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h > @@ -394,6 +394,9 @@ namespace brw { > const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0); > const dst_reg dst = component(vgrf(src.type), 0); > > + if (src.file == IMM) > +return src; > + > ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); > ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); > > -- > 2.6.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965: Remove block arg from foreach_inst_in_block_*_starting_from
Since 49374fab5d793 these macros no longer actually use the block argument. I think this is worth doing to make the macros easier to use because they already have really long names and a confusing set of arguments. --- src/mesa/drivers/dri/i965/brw_cfg.h | 4 ++-- src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +++--- src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 3 +-- src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4.cpp| 2 +- 5 files changed, 8 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h index a094917..a06b0aa 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.h +++ b/src/mesa/drivers/dri/i965/brw_cfg.h @@ -327,12 +327,12 @@ struct cfg_t { #define foreach_inst_in_block_reverse_safe(__type, __inst, __block) \ foreach_in_list_reverse_safe(__type, __inst, &(__block)->instructions) -#define foreach_inst_in_block_starting_from(__type, __scan_inst, __inst, __block) \ +#define foreach_inst_in_block_starting_from(__type, __scan_inst, __inst) \ for (__type *__scan_inst = (__type *)__inst->next; \ !__scan_inst->is_tail_sentinel(); \ __scan_inst = (__type *)__scan_inst->next) -#define foreach_inst_in_block_reverse_starting_from(__type, __scan_inst, __inst, __block) \ +#define foreach_inst_in_block_reverse_starting_from(__type, __scan_inst, __inst) \ for (__type *__scan_inst = (__type *)__inst->prev; \ !__scan_inst->is_head_sentinel(); \ __scan_inst = (__type *)__scan_inst->prev) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 49323eb..97d7fd7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2456,7 +2456,7 @@ fs_visitor::compute_to_mrf() /* Found a move of a GRF to a MRF. Let's see if we can go * rewrite the thing that made this GRF to write into the MRF. */ - foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, block) { + foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) { if (scan_inst->dst.file == GRF && scan_inst->dst.reg == inst->src[0].reg) { /* Found the last thing to write our reg we want to turn @@ -2789,7 +2789,7 @@ fs_visitor::insert_gen4_pre_send_dependency_workarounds(bblock_t *block, * we assume that there are no outstanding dependencies on entry to the * program. */ - foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, block) { + foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) { /* If we hit control flow, assume that there *are* outstanding * dependencies, and force their cleanup before our instruction. */ @@ -2855,7 +2855,7 @@ fs_visitor::insert_gen4_post_send_dependency_workarounds(bblock_t *block, fs_ins /* Walk forwards looking for writes to registers we're writing which aren't * read before being written. */ - foreach_inst_in_block_starting_from(fs_inst, scan_inst, inst, block) { + foreach_inst_in_block_starting_from(fs_inst, scan_inst, inst) { /* If we hit control flow, force resolve all remaining dependencies. */ if (block->end() == scan_inst) { for (int i = 0; i < write_len; i++) { diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp index 469f2ea..883e8d2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp @@ -87,8 +87,7 @@ opt_cmod_propagation_local(bblock_t *block) continue; bool read_flag = false; - foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, - block) { + foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) { if (scan_inst->overwrites_reg(inst->src[0])) { if (scan_inst->is_partial_write() || scan_inst->dst.reg_offset != inst->src[0].reg_offset) diff --git a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp index 8792a8c..862e324 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp @@ -64,7 +64,7 @@ opt_saturate_propagation_local(fs_visitor *v, bblock_t *block) int src_end_ip = v->live_intervals->end[src_var]; bool interfered = false; - foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, block) { + foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) { if (scan_inst->overwrites_reg(inst->src[0])) { if (scan_inst->is_partial_write
[Mesa-dev] [PATCH 2/2] i965/fs: Improve search for the argument source in opt_zero_samples
The opt_zero_samples instruction tries to find the corresponding load_payload instruction for each sample instruction. However it was previously only looking at the previous instruction. This patch makes it search back within the block to whatever was the last instruction to write to each individual argument to the send message. There are two reasons two do this: On Gen<=6 load_payload isn't used and there is a separate message register file. This version of the optimisation also finds MOVs into the MRF registers so it now also works on SNB. Unfortunately this doesn't show up in a shader-db report because the dead code eliminator doesn't do anything for instructions writing to MRF registers so it can't remove the redundant MOVs. However if I hack Mesa to report the message lengths instead of the instruction counts then it shows this: total mlen in shared programs: 2600373 -> 2574663 (-0.99%) mlen in affected programs: 237077 -> 211367 (-10.84%) helped:3508 HURT: 0 I haven't tested whether reducing the message length without decreasing the instruction count is actually a performance benefit but it's hard to imagine that it could possibly be a disadvantage. It also paves the way to reduce the instruction count later if someone improves the dead code eliminator. Secondly it could help on other gens because sometimes the load_payload instruction can become separated from the corresponding send instruction and the old version wouldn't work in those cases. Currently this doesn't seem to make any difference in practice because the register coalescer is run after this optimisation. However it seems like this version is more robust. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 54 1 file changed, 42 insertions(+), 12 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 97d7fd7..f87a5a7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2150,6 +2150,41 @@ fs_visitor::opt_algebraic() return progress; } +static bool +last_texture_source_is_zero(const fs_inst *send_inst) +{ + int reg_offset = send_inst->mlen - send_inst->exec_size / 8; + fs_reg src; + + /* Get the last argument of the texture instruction */ + if (send_inst->is_send_from_grf()) + src = byte_offset(send_inst->src[0], reg_offset * 32); + else + src = fs_reg(MRF, send_inst->base_mrf + reg_offset); + + /* Look for the last instruction that writes to the source */ + foreach_inst_in_block_reverse_starting_from(const fs_inst, inst, send_inst) { + if (inst->overwrites_reg(src)) { + if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) { +const int src_num = ((send_inst->mlen - send_inst->header_size) / + (inst->exec_size / 8) + + inst->header_size - 1); +return inst->src[src_num].is_zero(); + } else if (inst->opcode == BRW_OPCODE_MOV) { +if (inst->is_partial_write() || !inst->dst.equals(src)) + return false; + +return inst->src[0].is_zero(); + } + + /* Something unknown is writing to the src */ + break; + } + } + + return false; +} + /** * Optimize sample messages that have constant zero values for the trailing * texture coordinates. We can just reduce the message length for these @@ -2173,12 +2208,6 @@ fs_visitor::opt_zero_samples() if (!inst->is_tex()) continue; - fs_inst *load_payload = (fs_inst *) inst->prev; - - if (load_payload->is_head_sentinel() || - load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD) - continue; - /* We don't want to remove the message header or the first parameter. * Removing the first parameter is not allowed, see the Haswell PRM * volume 7, page 149: @@ -2186,12 +2215,13 @@ fs_visitor::opt_zero_samples() * "Parameter 0 is required except for the sampleinfo message, which * has no parameter 0" */ - while (inst->mlen > inst->header_size + inst->exec_size / 8 && - load_payload->src[(inst->mlen - inst->header_size) / - (inst->exec_size / 8) + - inst->header_size - 1].is_zero()) { - inst->mlen -= inst->exec_size / 8; - progress = true; + while (inst->mlen > inst->header_size + inst->exec_size / 8) { + if (last_texture_source_is_zero(inst)) { +inst->mlen -= inst->exec_size / 8; +progress = true; + } else { +break; + } } } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types
In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4 message types because I found by experimentation that it doesn't work. I wrote in the comment that I couldn't find any documentation for this problem. However I've now found the documentation and it has additional restrictions on further message types so this patch updates the comment and adds the others. --- That paragraph in the spec also mentions further restrictions that we should probably worry about like that the shader shouldn't combine this optimisation with any other render target data port read/writes. It also has a fairly pessimistic note saying the optimisation is only really good for large polygons in a GUI-like workload. I wonder whether we should be doing some more benchmarking to decide whether it's really a good idea to enable this as a general optimisation even for games. src/mesa/drivers/dri/i965/brw_fs.cpp | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 49323eb..bf9ff84 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2238,13 +2238,14 @@ fs_visitor::opt_sampler_eot() if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex()) return false; - /* This optimisation doesn't seem to work for textureGather for some -* reason. I can't find any documentation or known workarounds to indicate -* that this is expected, but considering that it is probably pretty -* unlikely that a shader would directly write out the results from -* textureGather we might as well just disable it. + /* 3D Sampler » Messages » Message Format +* +* “Response Length of zero is allowed on all SIMD8* and SIMD16* sampler +* messages except sample+killpix, resinfo, sampleinfo, LOD, and gather4*” */ - if (tex_inst->opcode == SHADER_OPCODE_TG4 || + if (tex_inst->opcode == SHADER_OPCODE_TXS || + tex_inst->opcode == SHADER_OPCODE_LOD || + tex_inst->opcode == SHADER_OPCODE_TG4 || tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET) return false; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types
Ilia Mirkin writes: >> - if (tex_inst->opcode == SHADER_OPCODE_TG4 || >> + if (tex_inst->opcode == SHADER_OPCODE_TXS || >> + tex_inst->opcode == SHADER_OPCODE_LOD || >> + tex_inst->opcode == SHADER_OPCODE_TG4 || >> tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET) > > Do you also need to include SHADER_OPCODE_SAMPLEINFO? Oops, yes, thanks! I've gone ahead and pushed it with this additional change. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Bump. Does anybody have some time to review this patch? I think it's the only one holding up landing 16x MSAA support. The following three only have an ack-by but I'm hoping it is reasonable to just push the branch with that. i965/meta: Support 16x MSAA in the meta stencil blit http://patchwork.freedesktop.org/patch/59790/ i965/fs: Add a sampler program key for whether the texture is 16x MSAA http://patchwork.freedesktop.org/patch/59791/ i965/vec4/skl+: Use ld2dms_w instead of ld2dms http://patchwork.freedesktop.org/patch/59788/ I've rebased a branch with all of these patches here: https://github.com/bpeel/mesa/tree/wip/16x-msaa Thanks. - Neil Neil Roberts writes: > Previously there was a problem in i965 where if 16x MSAA is used then > some of the sample positions are exactly on the 0 x or y axis. When > the MSAA copy blit shader interpolates the texture coordinates at > these sample positions it was possible that it would jump to a > neighboring texel due to rounding errors. It is likely that these > positions would be used on 16x MSAA because that is where they are > defined to be in D3D. > > To fix that this patch makes it use interpolateAtOffset in the blit > shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension > is available. This forces it to interpolate the texture coordinates at > the pixel center to avoid these problematic positions. > > This fixes ext_framebuffer_multisample-unaligned-blit and > ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on > SKL+. > --- > src/mesa/drivers/common/meta_blit.c | 64 > ++--- > 1 file changed, 52 insertions(+), 12 deletions(-) > > diff --git a/src/mesa/drivers/common/meta_blit.c > b/src/mesa/drivers/common/meta_blit.c > index 1f9515a..e812ecb 100644 > --- a/src/mesa/drivers/common/meta_blit.c > +++ b/src/mesa/drivers/common/meta_blit.c > @@ -352,17 +352,27 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, > shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_DEPTH_COPY || > shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY) { >char *sample_index; > - const char *arb_sample_shading_extension_string; > + const char *extra_extensions; > + const char *tex_coords = "texCoords"; > >if (dst_is_msaa) { > - arb_sample_shading_extension_string = "#extension > GL_ARB_sample_shading : enable"; > sample_index = "gl_SampleID"; > name = "depth MSAA copy"; > + > + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) { > +extra_extensions = > + "#extension GL_ARB_sample_shading : enable\n" > + "#extension GL_ARB_gpu_shader5 : enable\n"; > +/* See comment below for the color copy */ > +tex_coords = "interpolateAtOffset(texCoords, vec2(0.0))"; > + } else { > +extra_extensions = "#extension GL_ARB_sample_shading : enable\n"; > + } >} else { > - /* Don't need that extension, since we're drawing to a > single-sampled > - * destination. > + /* Don't need any extra extensions, since we're drawing to a > + * single-sampled destination. >*/ > - arb_sample_shading_extension_string = ""; > + extra_extensions = ""; > /* From the GL 4.3 spec: >* >* "If there is a multisample buffer (the value of > SAMPLE_BUFFERS > @@ -399,27 +409,57 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, >"\n" >"void main()\n" >"{\n" > - " gl_FragDepth = texelFetch(texSampler, > i%s(texCoords), %s).r;\n" > + " gl_FragDepth = texelFetch(texSampler, > i%s(%s), %s).r;\n" >"}\n", > - arb_sample_shading_extension_string, > + extra_extensions, >sampler_array_suffix, >texcoord_type, >texcoord_type, > + tex_coords, >sample_index); > } else { >/* You can create 2D_MULTISAMPLE textures with 0 sample count (meaning > 1 > * sample). Yes, this is ridiculous. > */ >char *sample_resolve; > - c
Re: [Mesa-dev] [PATCH v2] mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Ian Romanick writes: >> To fix that this patch makes it use interpolateAtOffset in the blit >> shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension >> is available. This forces it to interpolate the texture coordinates at >> the pixel center to avoid these problematic positions. > > Would it also work to use "centroid in" interpolation qualifier? Do we > have any data about the relative cost of the three interpolation > methods? I don't think centroid interpolation does anything for per-sample shading. Centroid interpolation is just meant to ensure that the interpolated values are within the polygon (it's confusingly named and has nothing to do with the center). For per-sample shading the sample position will always be within the polygon so it will just use that for the interpolation and we would be stuck with the same problem that some of these positions are on the pixel boundary. >> + >> + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) { >> +extra_extensions = >> + "#extension GL_ARB_sample_shading : enable\n" >> + "#extension GL_ARB_gpu_shader5 : enable\n"; > > You can unconditionally add the enables. If the implementation doesn't > support the extension, enable will still succeed while require will > not. Ok, yes that is probably worth doing. The GL_ARB_sample_shading one was already conditionally added before my patch, so maybe I can make a second patch that first stops it from doing that. Thanks for looking at the patch. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] meta/blit: Always try to enable GL_ARB_sample_shading
Previously this extension was only enabled when blitting between two multisampled buffers. However I don't think it does any harm to just enable it all the time. The ‘enable’ option is used instead of ‘require’ so that the shader will still compile if the extension isn't available in the cases where it isn't used. This will make the next patch simpler because it wants to add another optional extension. --- src/mesa/drivers/common/meta_blit.c | 16 ++-- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index b92c2e2..496ce45 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -357,17 +357,11 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_DEPTH_COPY || shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY) { char *sample_index; - const char *arb_sample_shading_extension_string; if (dst_is_msaa) { - arb_sample_shading_extension_string = "#extension GL_ARB_sample_shading : enable"; sample_index = "gl_SampleID"; name = "depth MSAA copy"; } else { - /* Don't need that extension, since we're drawing to a single-sampled - * destination. - */ - arb_sample_shading_extension_string = ""; /* From the GL 4.3 spec: * * "If there is a multisample buffer (the value of SAMPLE_BUFFERS @@ -397,7 +391,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, fs_source = ralloc_asprintf(mem_ctx, "#version 130\n" "#extension GL_ARB_texture_multisample : enable\n" - "%s\n" + "#extension GL_ARB_sample_shading : enable\n" "uniform sampler2DMS%s texSampler;\n" "in %s texCoords;\n" "out vec4 out_color;\n" @@ -406,7 +400,6 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, "{\n" " gl_FragDepth = texelFetch(texSampler, i%s(texCoords), %s).r;\n" "}\n", - arb_sample_shading_extension_string, sampler_array_suffix, texcoord_type, texcoord_type, @@ -416,14 +409,12 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, * sample). Yes, this is ridiculous. */ char *sample_resolve; - const char *arb_sample_shading_extension_string; const char *merge_function; name = ralloc_asprintf(mem_ctx, "%svec4 MSAA %s", vec4_prefix, dst_is_msaa ? "copy" : "resolve"); if (dst_is_msaa) { - arb_sample_shading_extension_string = "#extension GL_ARB_sample_shading : enable"; sample_resolve = ralloc_asprintf(mem_ctx, " out_color = texelFetch(texSampler, i%s(texCoords), gl_SampleID);", texcoord_type); merge_function = ""; } else { @@ -439,8 +430,6 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, "vec4 merge(vec4 a, vec4 b) { return (a + b); }\n"; } - arb_sample_shading_extension_string = ""; - /* We're assuming power of two samples for this resolution procedure. * * To avoid losing any floating point precision if the samples all @@ -496,7 +485,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, fs_source = ralloc_asprintf(mem_ctx, "#version 130\n" "#extension GL_ARB_texture_multisample : enable\n" - "%s\n" + "#extension GL_ARB_sample_shading : enable\n" "#define gvec4 %svec4\n" "uniform %ssampler2DMS%s texSampler;\n" "in %s texCoords;\n" @@ -507,7 +496,6 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, "{\n" "%s\n" /* sample_resolve */ "}\n", - arb_sample_shading_extension_string, vec4_prefix, vec4_prefix, sampler_array_suffix, -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 2/2] mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Previously there was a problem in i965 where if 16x MSAA is used then some of the sample positions are exactly on the 0 x or y axis. When the MSAA copy blit shader interpolates the texture coordinates at these sample positions it was possible that it would jump to a neighboring texel due to rounding errors. It is likely that these positions would be used on 16x MSAA because that is where they are defined to be in D3D. To fix that this patch makes it use interpolateAtOffset in the blit shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension is available. This forces it to interpolate the texture coordinates at the pixel center to avoid these problematic positions. This fixes ext_framebuffer_multisample-unaligned-blit and ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on SKL+. v2: Use interpolateAtOffset instead of interpolateAtSample v3: Always try to enable GL_ARB_gpu_shader5 in the shader [Ian Romanick] --- src/mesa/drivers/common/meta_blit.c | 39 +++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index 496ce45..4a2444a 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -357,10 +357,16 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_DEPTH_COPY || shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY) { char *sample_index; + const char *tex_coords = "texCoords"; if (dst_is_msaa) { sample_index = "gl_SampleID"; name = "depth MSAA copy"; + + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) { +/* See comment below for the color copy */ +tex_coords = "interpolateAtOffset(texCoords, vec2(0.0))"; + } } else { /* From the GL 4.3 spec: * @@ -392,17 +398,19 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, "#version 130\n" "#extension GL_ARB_texture_multisample : enable\n" "#extension GL_ARB_sample_shading : enable\n" + "#extension GL_ARB_gpu_shader5 : enable\n" "uniform sampler2DMS%s texSampler;\n" "in %s texCoords;\n" "out vec4 out_color;\n" "\n" "void main()\n" "{\n" - " gl_FragDepth = texelFetch(texSampler, i%s(texCoords), %s).r;\n" + " gl_FragDepth = texelFetch(texSampler, i%s(%s), %s).r;\n" "}\n", sampler_array_suffix, texcoord_type, texcoord_type, + tex_coords, sample_index); } else { /* You can create 2D_MULTISAMPLE textures with 0 sample count (meaning 1 @@ -415,7 +423,33 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, dst_is_msaa ? "copy" : "resolve"); if (dst_is_msaa) { - sample_resolve = ralloc_asprintf(mem_ctx, " out_color = texelFetch(texSampler, i%s(texCoords), gl_SampleID);", texcoord_type); + const char *tex_coords; + + if (ctx->Extensions.ARB_gpu_shader5 && samples >= 16) { +/* If interpolateAtOffset is available then it will be used to + * force the interpolation to the center. This is required at + * least on Intel hardware because it is possible to have a sample + * position on the 0 x or y axis which means it will lie exactly + * on the pixel boundary. If we let the hardware interpolate the + * coordinates at one of these positions then it is possible for + * it to jump to a neighboring texel when converting to ints due + * to rounding errors. This is only done for >= 16x MSAA because + * it probably has some overhead. It is more likely that some + * hardware will use one of these problematic positions at 16x + * MSAA because in that case in D3D they are defined to be at + * these positions. + */ +tex_coords = "interpolateAtOffset(texCoords, vec2(0.0))"; + } else { +tex_coords = "texCoords"; + } + + sample_resolve = +ralloc_asprintf(mem_ctx, +" out_color = texelFetch(texSampler, " +"i%s(%s), gl_SampleID);", +texcoord_type, tex_coords); + merge_function = ""; } else { int
Re: [Mesa-dev] [PATCH v6 2/2] mesa/teximage: accept ASTC formats for 3D texture specification
Nanley Chery writes: > + /* Throw an INVALID_OPERATION error if the target is > + * TEXTURE_CUBE_MAP_ARRAY and the format is not ASTC. > + */ > + if (target_can_be_compresed && > + ctx->Extensions.KHR_texture_compression_astc_ldr && > + layout != MESA_FORMAT_LAYOUT_ASTC) > return write_error(error, GL_INVALID_OPERATION); This seems to cause regressions in the following Piglit tests for Gen9: piglit.spec.ext_texture_compression_s3tc.getteximage-targets cube_array s3tc piglit.spec.arb_texture_cube_map_array.fbo-generatemipmap-cubemap array s3tc_dxt1 I think the spec for the extension is based on the GLES spec. Perhaps the intention was to add ASTC support for cube-map array textures whereas previously in GLES no compressed formats were supported for this target. However in regular GL presumably S3TC is supposed to be supported. As it stands this patch disables cube-map array textures for any formats other than ASTC whenever the ASTC extension is available, so it disables S3TC on Gen9. Maybe this section should be limited to GLES? Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965: Don't include missing components in the fast clear color
It seems that if you have a surface format with less than 4 components then the sampler still reports the fast color clear value for the components that are missing. So for example if you clear a GL_RED surface to 1,1,1,1 then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1. Normally this doesn't matter because fast color clears are only available on Gen7+ and for that hardware we also always set the texture swizzle to force the missing components to zero or one. However if the blorp blitter is used then the swizzle is not programmed in the surface state so it doesn't work. To avoid this situation this patch makes it just force the surface's clear color to 0 or 1 for any missing components of the surface's format. --- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index 69fe7b4..ed26a44 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -361,13 +361,20 @@ is_color_fast_clear_compatible(struct brw_context *brw, * SURFACE_STATE. */ static uint32_t -compute_fast_clear_color_bits(const union gl_color_union *color) +compute_fast_clear_color_bits(mesa_format format, + const union gl_color_union *color) { uint32_t bits = 0; for (int i = 0; i < 4; i++) { - /* Testing for non-0 works for integer and float colors */ - if (color->f[i] != 0.0f) + /* Testing for non-0 works for integer and float colors. If the + * component doesn't exist in the format then force the color to 0 for + * the RGB components and 1 for the alpha. + */ + if (_mesa_format_has_color_component(format, i) ? + (color->f[i] != 0.0f) : + (i == 3)) { bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); + } } return bits; } @@ -505,7 +512,8 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, switch (clear_type) { case FAST_CLEAR: irb->mt->fast_clear_color_value = -compute_fast_clear_color_bits(&ctx->Color.ClearColor); +compute_fast_clear_color_bits(irb->mt->format, + &ctx->Color.ClearColor); irb->need_downsample = true; /* If the buffer is already in INTEL_FAST_CLEAR_STATE_CLEAR, the -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] mesa/blit: Don't require the same format for mulitisample blits
Previously the GL spec required that whenever glBlitFramebuffer is used with either buffer being multisampled, the internal formats must match. However the GL 4.4 spec was later changed to remove this restriction. In the section entitled “Changes in the released Specification of July 22, 2013” it says: “Relax BlitFramebuffer in section 18.3.1 so that format conversion can take place during multisample blits, since drivers already allow this and some apps depend on it.” If most drivers already allowed this in earlier versions I think it's safe to assume that this is a spec bug and it should also be allowed in all versions. This patch just removes the restriction on desktop GL. For GLES there are conformance tests that assert the previous behaviour so it is probably safer to leave it in. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92706 --- There is a corresponding update to Piglit to make it expect this new behaviour here: http://patchwork.freedesktop.org/patch/63670/ I've only tested this with SKL and HSW. It passed fine on SKL but it needs the previous patch to work on HSW. I put it through a full Jenkins run but it doesn't really test much without the Piglit patch as well and I don't think I can post Piglit patches to Jenkins. As this enables the new behaviour for all Mesa drivers of course it would need some testing with other drivers before landing. src/mesa/main/blit.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/blit.c b/src/mesa/main/blit.c index a32f1a4..6803062 100644 --- a/src/mesa/main/blit.c +++ b/src/mesa/main/blit.c @@ -285,7 +285,8 @@ _mesa_blit_framebuffer(struct gl_context *ctx, /* extra checks for multisample copies... */ if (readFb->Visual.samples > 0 || drawFb->Visual.samples > 0) { /* color formats must match */ - if (!compatible_resolve_formats(colorReadRb, colorDrawRb)) { + if (_mesa_is_gles(ctx) && + !compatible_resolve_formats(colorReadRb, colorDrawRb)) { _mesa_error(ctx, GL_INVALID_OPERATION, "%s(bad src/dst multisample pixel formats)", func); return; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965: Don't include missing components in the fast clear color
Neil Roberts writes: > Normally this doesn't matter because fast color clears are only > available on Gen7+ and for that hardware we also always set the > texture swizzle to force the missing components to zero or one. Looking at it a bit more I think this part is wrong and it doesn't actually set the texture swizzle in for this case. It is actually a general problem even without the blorp. I can demonstrate it with this simpler test which I've posted to the Piglit mailing list: http://patchwork.freedesktop.org/patch/63678/ Apart from the commit message I think the patch is still sound and it does fix the test case. I tried to make a version that is based on top of Ben's skl-fast-clear branch but it doesn't seem to work on SKL. I am currently looking into this. - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965: Don't include missing components in the fast clear color
Ben Widawsky writes: > It seems reasonable to me, but since you touch the same code I touch > in fast clears, and since I always screw up rebases, any chance I > could persuade you to wait until I land fast clears? Sure, I don't mind waiting. I did have a go at rebasing the patch on top of your branch and I think the code should look something like this: if (brw->gen >= 9) { mt->gen9_fast_clear_color = *color; + + /* Override components that aren't in the surface format */ + for (int i = 0; i < 4; i++) { + if (!_mesa_format_has_color_component(mt->format, i)) { +if (_mesa_is_format_integer(mt->format)) + mt->gen9_fast_clear_color.i[i] = (i == 3); +else + mt->gen9_fast_clear_color.f[i] = (i == 3); + } + } } else { mt->fast_clear_color_value = 0; for (int i = 0; i < 4; i++) { - /* Testing for non-0 works for integer and float colors */ - if (color->f[i] != 0.0f) { - mt->fast_clear_color_value |= -1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); + /* Testing for non-0 works for integer and float colors. If the + * component doesn't exist in the format then force the color to 0 for + * the RGB components and 1 for the alpha. + */ + if (_mesa_format_has_color_component(mt->format, i) ? + (color->f[i] != 0.0f) : + (i == 3)) { +mt->fast_clear_color_value |= + 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); } } } However I haven't been able to get it to work. It looks like there is a more general problem with fast clears on SKL where a bunch of formats (such as GL_RGB) always return 0,0,0,0 for the pixels that are cleared regardless of the clear color. I think we haven't noticed this in the Piglit tests because most of them clear to black anyway. However if you modify ColorGradientSunburst to clear to white instead then the multisample formats tests starts failing for a lot more formats. I am currently trying to look into this but maybe we should discuss this in a different thread because I think it's not directly related to this patch. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/7] [v2] i965/gen9: Support fast clears for 32b float
Ben Widawsky writes: > Two formats are explicitly disabled because they fail piglit tests, > LUMINANCE16F and INTENSITY16F. There is some question about the > validity of sampling from these surfaces for all gens, however, there > seem to be no other failures, so I'd prefer to leave tackling that for > a separate series. Did you see my message in another thread pointing out that there are other issues with the SKL fast clear? If you clear to white instead of black then a bunch of other more normal formats fail such as GL_RGB. This isn't related to this 32b float patch because obviously white should fit the criteria of being just 0 and 1. Perhaps the easiest way to demonstrate this bug is to tweak the formats test so that it clears to white instead of black, like this: diff --git a/tests/util/piglit-test-pattern.cpp b/tests/util/piglit-test-pattern.cpp index 12d9918..bec25bc 100644 --- a/tests/util/piglit-test-pattern.cpp +++ b/tests/util/piglit-test-pattern.cpp @@ -664,7 +664,7 @@ ColorGradientSunburst::draw_with_scale_and_offset(const float (*proj)[4], } case GL_UNSIGNED_NORMALIZED: case GL_FLOAT: { - glClearColor(offset, offset, offset, offset); + glClearColor(1.0, 1.0, 1.0, 1.0); glClear(GL_COLOR_BUFFER_BIT); break; } This passes on HSW but makes a bunch of formats fail on SKL. It seems like we're missing something about how we're supposed to program the fast colour state. For GL_RGB it seems to be expecting an integer value in the range 0→255 instead of a float. Eg, if you program the integer 128 you get back 0.5 when reading. However this isn't the case for GL_RGBA which does seem to expect float values. I can't find anything that would explain this in the specs. It makes no sense! There is this bit in the spec which we are ignoring: “If Number of Multisamples is not MULTISAMPLECOUNT_1, only 0/1 values allowed” But that doesn't explain the problem because in this case we are clearing to white. http://lists.freedesktop.org/archives/mesa-dev/2015-November/099274.html Regards, - Neil > > v2: Just reject the two failing types. > > Cc: Neil Roberts Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 12 ++-- > src/mesa/drivers/dri/i965/gen8_surface_state.c | 8 > 2 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > index 67dd22d..a024d02 100644 > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > @@ -360,10 +360,18 @@ is_color_fast_clear_compatible(struct brw_context *brw, > } > > for (int i = 0; i < 4; i++) { > - if (color->f[i] != 0.0f && color->f[i] != 1.0f && > - _mesa_format_has_color_component(format, i)) { > + if (!_mesa_format_has_color_component(format, i)) { > + continue; > + } > + > + if (brw->gen < 9 && > + color->f[i] != 0.0f && color->f[i] != 1.0f) { > return false; >} > + > + if (format == MESA_FORMAT_L_FLOAT16 || > + format == MESA_FORMAT_I_FLOAT16) > + return false; > } > return true; > } > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c > b/src/mesa/drivers/dri/i965/gen8_surface_state.c > index 8fe480c..2381c83 100644 > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c > @@ -188,14 +188,6 @@ gen8_emit_fast_clear_color(struct brw_context *brw, > uint32_t *surf) > { > if (brw->gen >= 9) { > -#define check_fast_clear_val(x) \ > - assert(mt->gen9_fast_clear_color.f[x] == 0.0 || \ > - mt->gen9_fast_clear_color.f[x] == 1.0) > - check_fast_clear_val(0); > - check_fast_clear_val(1); > - check_fast_clear_val(2); > - check_fast_clear_val(3); > -#undef check_fast_clear_val >surf[12] = mt->gen9_fast_clear_color.ui[0]; >surf[13] = mt->gen9_fast_clear_color.ui[1]; >surf[14] = mt->gen9_fast_clear_color.ui[2]; > -- > 2.6.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] i965: Handle lum, intensity and missing components in the fast clear
It looks like the sampler hardware doesn't take into account the surface format when sampling a cleared color after a fast clear has been done. So for example if you clear a GL_RED surface to 1,1,1,1 then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1. This patch makes it override the color that is programmed in the surface state in order to swizzle for luminance and intensity as well as overriding the missing components. v2: Handle luminance and intensity formats --- I made a more extensive test case which tests all of the formats in fbo_formats.h as well as using more than one test color here: http://patchwork.freedesktop.org/patch/64578/ In the process I noticed that there is a similar problem with luminance and intensity textures so here is a v2 to cope with that. I've made another version of this patch which is rebased on top of Ben's skl-fast-clear branch so we can take whichever version depending on which lands first: https://github.com/bpeel/mesa/commit/da7edcb6dfd93c7dd86b2e148c44dff7 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 38 +++-- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index 69fe7b4..3071590 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -361,12 +361,43 @@ is_color_fast_clear_compatible(struct brw_context *brw, * SURFACE_STATE. */ static uint32_t -compute_fast_clear_color_bits(const union gl_color_union *color) +compute_fast_clear_color_bits(mesa_format format, + const union gl_color_union *color) { + union gl_color_union override_color = *color; uint32_t bits = 0; + + /* The sampler doesn't look at the format of the surface when the fast +* clear color is used so we need to implement luminance, intensity and +* missing components manually. +*/ + switch (_mesa_get_format_base_format(format)) { + case GL_INTENSITY: + override_color.ui[3] = override_color.ui[0]; + /* flow through */ + case GL_LUMINANCE: + case GL_LUMINANCE_ALPHA: + override_color.ui[1] = override_color.ui[0]; + override_color.ui[2] = override_color.ui[0]; + break; + default: + for (int i = 0; i < 3; i++) { + if (!_mesa_format_has_color_component(format, i)) +override_color.ui[i] = 0; + } + break; + } + + if (!_mesa_format_has_color_component(format, 3)) { + if (_mesa_is_format_integer_color(format)) + override_color.ui[3] = 1; + else + override_color.f[3] = 1.0f; + } + for (int i = 0; i < 4; i++) { /* Testing for non-0 works for integer and float colors */ - if (color->f[i] != 0.0f) + if (override_color.f[i] != 0.0f) bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); } return bits; @@ -505,7 +536,8 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, switch (clear_type) { case FAST_CLEAR: irb->mt->fast_clear_color_value = -compute_fast_clear_color_bits(&ctx->Color.ClearColor); +compute_fast_clear_color_bits(irb->mt->format, + &ctx->Color.ClearColor); irb->need_downsample = true; /* If the buffer is already in INTEL_FAST_CLEAR_STATE_CLEAR, the -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Disable fast clear for formats without alpha
Ben Widawsky writes: > Here is one proposal to fix the issue. I noticed that only formats > without alpha were failing. This sucks for RGBX formats (which > technically aren't fast clearable based on the surface format). The > hunk for moving the format should happen regardless of this patch. If we do this we could limit it to just multisample buffers. I think the problem only happens when sampling the cleared texels but we effectively never do that for single-sampled surfaces because in that case we always do a resolve before using them as textures. Being able to fast clear a single-sampled RGBX window system buffer is probably one of the more useful use cases so it's probably worth allowing that. We could also remove the explicit check the those two FLOAT16 formats in that case because neither of them have alpha bits. > Neil has another patch which attempts to not disable formats, and > workaround the hardware issue that we cannot explain. I just wanted to > put this on the list. If anyone's interested, the patch is here: https://github.com/bpeel/mesa/commit/2c7b2ddc8dba388665c258a1182 Obviously I'm not suggesting we merge that patch because it's a massive kludge. The test case which we're using to find these issues is here: http://patchwork.freedesktop.org/patch/64578/ - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/7] [v2] i965/meta/gen9: Individually fast clear color attachments
Hi, You don't seem to have included any of the suggestions I made in my review. Was this deliberate? If not, the main points were: • You don't need to call brw_fast_clear_init or use_rectlist in the function because these are already called before entering it. • I don't think it's worth creating a framebuffer. Instead you can just call _mesa_meta_drawbuffers_from_bitfield(1 << index) in the loop. Modifying the draw buffers state should be ok because it's saved in the meta state and it's already done for the Gen8 code path. I went ahead and tried the changes in a patch here: https://github.com/bpeel/mesa/commit/b2aa8f2d90572392030e5177952bf It doesn't cause any Jenkins regressions. Feel free to squash it into the patch if you want, or of course if you prefer to keep your patch as it is it's up to you. Regards, - Neil Ben Widawsky writes: > The impetus for this patch comes from a seemingly benign statement within the > spec (quoted within the patch). For me, this patch was at some point critical > for getting stable piglit results (though this did not seem to be the case on > a > branch Chad was working on). > > It is very important for clearing multiple color buffer attachments and can be > observed in the following piglit tests: > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0 > > v2: Doing the framebuffer binding only once (Chad) > Directly use the renderbuffers from the mt (Chad) > > Cc: Chad Versace > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 94 > + > 1 file changed, 81 insertions(+), 13 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > index eac92d4..97444d7 100644 > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > @@ -428,6 +428,71 @@ use_rectlist(struct brw_context *brw, bool enable) > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM; > } > > +/** > + * Individually fast clear each color buffer attachment. On previous gens > this > + * isn't required. The motivation for this comes from one line (which seems > to > + * be specific to SKL+). The list item is in section titled _MCS Buffer for > + * Render Target(s)_ > + * > + * "Since only one RT is bound with a clear pass, only one RT can be > cleared > + * at a time. To clear multiple RTs, multiple clear passes are required." > + * > + * The code follows the same idea as the resolve code which creates a fake > FBO > + * to avoid interfering with too much of the GL state. > + */ > +static void > +fast_clear_attachments(struct brw_context *brw, > + struct gl_framebuffer *fb, > + uint32_t fast_clear_buffers, > + struct rect fast_clear_rect) > +{ > + assert(brw->gen >= 9); > + struct gl_context *ctx = &brw->ctx; > + const GLuint old_fb = ctx->DrawBuffer->Name; > + GLuint fbo; > + > + _mesa_GenFramebuffers(1, &fbo); > + _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo); > + _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0); > + > + brw_fast_clear_init(brw); > + use_rectlist(brw, true); > + brw_bind_rep_write_shader(brw, (float *) fast_clear_color); > + > + /* SKL+ also has a resolve mode for compressed render targets and thus > more > +* bits to let us select the type of resolve. For fast clear resolves, it > +* turns out we can use the same value as pre-SKL though. > +*/ > + set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE); > + > + for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) { > + struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf]; > + struct intel_renderbuffer *irb = intel_renderbuffer(rb); > + int index = fb->_ColorDrawBufferIndexes[buf]; > + > + if ((fast_clear_buffers & (1 << index)) == 0) > + continue; > + > + > + _mesa_framebuffer_renderbuffer(ctx, ctx->DrawBuffer, > + GL_COLOR_ATTACHMENT0, rb, > + "meta fast clear (per-attachment)"); > + > + brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers)); > + > + /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll > + * resolve them eventually. > + */ > + irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR; > + } > + > + set_fast_clear_op(brw, 0); > + use_rectlist(brw, false); > + > + _mesa_DeleteFramebuffers(1, &fbo); > + _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb); > +} > + > bool > brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, > GLbitfield buffers, bool partial_clear) > @@ -603,12 +668,27 @@ brw_meta_fast_clear(struct brw_context *brw, struct > gl_framebuffer *fb, > use_rectlist(brw, true); > > layers = MAX2(
Re: [Mesa-dev] [PATCH 4/7] [v2] i965/meta/gen9: Individually fast clear color attachments
Ben Widawsky writes: > Thanks a lot, I will squash it in - and sorry again about ignoring your > feedback. Ok, no worries. Feel free to add Reviewed-by: Neil Roberts if you squash the changes in. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Disable fast clears for MSRTs on SKL
There are currently a bunch of formats that behave strangely when sampling the cleared color from the MCS buffer on SKL. They seem to mostly be formats that don't have an alpha component, although it's not all of them, and we haven't yet found anything in the specs which would explain this. For now to be on the safe side this patch just disables fast clears for MSRTs on SKL altogether. The assumption is that clears are probably more likely to be used in single-sampled applications anyway so we can at least get them working and we can enable MSRTs later once we understand the problem better. --- If we go with this patch we could remove the explicit check for the half-float intensity and luminance formats as well. src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index dc085ba..85576a8 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -524,6 +524,13 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, if (brw->gen < 7) clear_type = REP_CLEAR; + /* Certain formats have unresolved issues with sampling from the MCS + * buffer on Gen9. This disables fast clears altogether for MSRTs until + * we can figure out what's going on. + */ + if (brw->gen >= 9 && irb->mt->num_samples > 1) + clear_type = REP_CLEAR; + if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS) clear_type = REP_CLEAR; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965: Handle lum, intensity and missing components in the fast clear
Ben Widawsky writes: >> + case GL_LUMINANCE: >> + case GL_LUMINANCE_ALPHA: >> + override_color.ui[1] = override_color.ui[0]; >> + override_color.ui[2] = override_color.ui[0]; >> + break; > > The definition for GL_LUMINANCE afaict: "Each element is a single > luminance value. The GL converts it to floating point, then assembles > it into an RGBA element by replicating the luminance value three times > for red, green, and blue and attaching 1 for alpha." > > doesn't that mean you need > override_color.f[3] = 1.0f; That is handled separately by the bit at the bottom which checks for _mesa_format_has_color_component(format, 3). It's the same bit of code that overrides the alpha channel for GL_RGB. >> + default: >> + for (int i = 0; i < 3; i++) { >> + if (!_mesa_format_has_color_component(format, i)) >> +override_color.ui[i] = 0; >> + } > > Is there an easy way to verify that all formats want 0 for GB > channels? It looks right to me, but with my knowledge of GL, that > doesn't mean much (I am looking here: > https://www.opengl.org/sdk/docs/man/html/glTexImage2D.xhtml) In the GL 4.5 spec, section 15.2.1 it says “When a texture lookup is performed in a fragment shader, the GL computes the filtered texture value τ in the manner described in sections 8.14 and 8.15, and converts it to a texture base color C b as shown in table 15.1”. Table 15.1 looks something like this: Texture Base Texture base color Internal Format Cb Ab RED(Rt, 0, 0) 1 RG (Rt, Gt, 0) 1 RGB(Rt, Gt, Bt) 1 RGBA (Rt, Gt, Bt) At In the compatibility spec there is also the luminance, intensity and alpha formats and they all have 0 for the missing RGB components. I also tried running the Piglit test I wrote on the nvidia binary blob so if we can assume that that tests all renderable formats then we can be confident we at least match what nvidia do. > I also think that component 0 must always have a color, right? (I'm > not requesting a change as such, just making sure my understanding of > what you're trying to do is correct). As Ilia mentioned, GL_ALPHA doesn't have a red component. >> + break; >> + } >> + >> + if (!_mesa_format_has_color_component(format, 3)) { >> + if (_mesa_is_format_integer_color(format)) >> + override_color.ui[3] = 1; > > We shouldn't ever be fast clearing integer formats. We can on GEN8+, > but we're not doing it today. So I think it should be safe to remove > this check. You're right, but I thought I'd add it anyway because I didn't think this is a particularly hot code path and it might make life easier for whoever eventually adds support for integer fast clears. I'm happy to take it out thought if you think that'd be better. > Seems like a good patch to me. It would probably be nice to track down > a good spec reference if you manage to find one. I know I've seen such > reference in SKL docs (which aren't SKL specific) - but I am having > trouble finding it in PRMs. My VPN is broken, so I can't look at SKL > docs right now. I haven't been able to find anything yet sadly. > With the explanation of why the luminance alpha channel isn't 1 (I > also claim incompotence on the GL_LUMINANCE_ALPHA format): > Reviewed-by: Ben Widawsky Many thanks for the review. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/skl: Force a BINDING_TABLE_POINTER_* command after push constant command
According to the SkyLake bspec the 3DSTATE_CONSTANT_* commands only take effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_* command. This patch just makes it set the BRW_NEW_SURFACES state when uploading the push constants to ensure the binding tables will be updated. This fixes the fbo-blending-formats Piglit test and possibly others. Reviewed-by: Kristian Høgsberg --- src/mesa/drivers/dri/i965/gen7_vs_state.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c b/src/mesa/drivers/dri/i965/gen7_vs_state.c index 404dd20..c55d6ca 100644 --- a/src/mesa/drivers/dri/i965/gen7_vs_state.c +++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c @@ -60,6 +60,13 @@ gen7_upload_constant_state(struct brw_context *brw, } ADVANCE_BATCH(); + + /* On SkyLake+ the new constants don't take effect until the next + * corresponding 3DSTATE_BINDING_TABLE_POINTER_* command is parsed so we + * need to ensure that is sent + */ + if (brw->gen >= 9) + brw->state.dirty.brw |= BRW_NEW_SURFACES; } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] dir-locals.el: Don't set variables for non-programming modes
This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages because 78 characters is too many to make it wrap properly in git log. Note that makefile-mode also inherits from prog-mode so the fill column should continue to apply there. --- .dir-locals.el | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.dir-locals.el b/.dir-locals.el index 04a1a2c..d95eb48 100644 --- a/.dir-locals.el +++ b/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . nil) (tab-width . 8) (c-basic-offset . 3) -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] dir-locals.el: Don't set variables for non-programming modes
This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages because 78 characters is too many to make it wrap properly in git log. Note that makefile-mode also inherits from prog-mode so the fill column should continue to apply there. v2: Apply to all the .dir-locals.el files, not just the one in the root directory. --- .dir-locals.el| 2 +- src/gallium/drivers/freedreno/.dir-locals.el | 2 +- src/gallium/drivers/r600/.dir-locals.el | 2 +- src/gallium/drivers/radeon/.dir-locals.el | 2 +- src/gallium/drivers/radeonsi/.dir-locals.el | 2 +- src/gallium/drivers/vc4/.dir-locals.el| 2 +- src/gallium/drivers/vc4/kernel/.dir-locals.el | 2 +- src/gallium/winsys/radeon/.dir-locals.el | 2 +- src/mesa/drivers/dri/nouveau/.dir-locals.el | 2 +- 9 files changed, 9 insertions(+), 9 deletions(-) diff --git a/.dir-locals.el b/.dir-locals.el index 04a1a2c..d95eb48 100644 --- a/.dir-locals.el +++ b/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . nil) (tab-width . 8) (c-basic-offset . 3) diff --git a/src/gallium/drivers/freedreno/.dir-locals.el b/src/gallium/drivers/freedreno/.dir-locals.el index c26578b..aa20d49 100644 --- a/src/gallium/drivers/freedreno/.dir-locals.el +++ b/src/gallium/drivers/freedreno/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . true) (tab-width . 4) (c-basic-offset . 4) diff --git a/src/gallium/drivers/r600/.dir-locals.el b/src/gallium/drivers/r600/.dir-locals.el index 8be6a30..4e35c12 100644 --- a/src/gallium/drivers/r600/.dir-locals.el +++ b/src/gallium/drivers/r600/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . true) (tab-width . 8) (c-basic-offset . 8) diff --git a/src/gallium/drivers/radeon/.dir-locals.el b/src/gallium/drivers/radeon/.dir-locals.el index 8be6a30..4e35c12 100644 --- a/src/gallium/drivers/radeon/.dir-locals.el +++ b/src/gallium/drivers/radeon/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . true) (tab-width . 8) (c-basic-offset . 8) diff --git a/src/gallium/drivers/radeonsi/.dir-locals.el b/src/gallium/drivers/radeonsi/.dir-locals.el index 8be6a30..4e35c12 100644 --- a/src/gallium/drivers/radeonsi/.dir-locals.el +++ b/src/gallium/drivers/radeonsi/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . true) (tab-width . 8) (c-basic-offset . 8) diff --git a/src/gallium/drivers/vc4/.dir-locals.el b/src/gallium/drivers/vc4/.dir-locals.el index ed10dc2..ac94242 100644 --- a/src/gallium/drivers/vc4/.dir-locals.el +++ b/src/gallium/drivers/vc4/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . nil) (tab-width . 8) (c-basic-offset . 8) diff --git a/src/gallium/drivers/vc4/kernel/.dir-locals.el b/src/gallium/drivers/vc4/kernel/.dir-locals.el index 2e58e90..49403de 100644 --- a/src/gallium/drivers/vc4/kernel/.dir-locals.el +++ b/src/gallium/drivers/vc4/kernel/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . t) (tab-width . 8) (c-basic-offset . 8) diff --git a/src/gallium/winsys/radeon/.dir-locals.el b/src/gallium/winsys/radeon/.dir-locals.el index 0e937bb..d5f0f04 100644 --- a/src/gallium/winsys/radeon/.dir-locals.el +++ b/src/gallium/winsys/radeon/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . nil) (tab-width . 8) (c-basic-offset . 4) diff --git a/src/mesa/drivers/dri/nouveau/.dir-locals.el b/src/mesa/drivers/dri/nouveau/.dir-locals.el index 29735e8..774f023 100644 --- a/src/mesa/drivers/dri/nouveau/.dir-locals.el +++ b/src/mesa/drivers/dri/nouveau/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . true) (tab-width . 8) (c-basic-offset . 8) -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Fixing an x86 FPU bug.
Hi Marius, This patch does indeed make the Piglit test pass but it doesn't seem like a complete fix. It looks like there are still a number of places that are copying via floats that the test wouldn't catch. There are also lots of places that are still using GLfloat to store the attribute values and then these are cast to gl_constant_value* whenever they are used in combination with the places that the patch does change. I think if we wanted to do a complete patch then we should really change all places that are storing attribute values to use gl_constant_value to make it clear that they aren't really just floats. If it is too much of a change just to fix this relatively minor bug then perhaps it would be better to do a simpler change that just fixes this one case using a memcpy or something instead of changing so much code. I don't really like the idea of only changing some of the places to gl_constant_value and leaving the rest as GLfloat because it seems even more confusing that way. I don't think it makes sense to add this patch to the stable branch. The bug has presumably been there since the introduction of integer attributes in 2012 (see acf438f537) and nobody has complained so it doesn't seem particularly urgent. The patch is non-trivial so I think there is a risk it will introduce more bugs. I've made some comments inline below. marius.pre...@intel.com writes: > From: Marius Predut > > On 32-bit, for floating point operations is used x86 FPU registers instead > SSE, > reason for when reinterprets an integer as a float result is unexpected > (modify floats when they are written to memory). > The defect was checked with and without -O3 compiler flag. > > CC: > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82668 > Signed-off-by: Marius Predut > --- > src/mesa/main/context.c |3 ++- > src/mesa/main/macros.h| 47 > - > src/mesa/vbo/vbo_attrib_tmp.h | 20 ++ > src/mesa/vbo/vbo_exec.h |3 ++- > src/mesa/vbo/vbo_exec_api.c | 31 +-- > src/mesa/vbo/vbo_exec_eval.c | 22 ++- > src/mesa/vbo/vbo_save_api.c | 16 +++--- > src/mesa/vbo/vbo_save_draw.c |4 ++-- > 8 files changed, 90 insertions(+), 56 deletions(-) > > diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c > index 400c158..11ab8a9 100644 > --- a/src/mesa/main/context.c > +++ b/src/mesa/main/context.c > @@ -134,6 +134,7 @@ > #include "math/m_matrix.h" > #include "main/dispatch.h" /* for _gloffset_COUNT */ > #include "uniforms.h" > +#include "macros.h" > > #ifdef USE_SPARC_ASM > #include "sparc/sparc.h" > @@ -656,7 +657,7 @@ _mesa_init_constants(struct gl_constants *consts, gl_api > api) > consts->MaxSamples = 0; > > /* GLSL default if NativeIntegers == FALSE */ > - consts->UniformBooleanTrue = FLT_AS_UINT(1.0f); > + consts->UniformBooleanTrue = FLOAT_AS_UNION(1.0f).u; > > /* GL_ARB_sync */ > consts->MaxServerWaitTimeout = 0x1fff7fffULL; > diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h > index cd5f2d6..2651ffc 100644 > --- a/src/mesa/main/macros.h > +++ b/src/mesa/main/macros.h > @@ -32,6 +32,7 @@ > #define MACROS_H > > #include "imports.h" > +#include "program/prog_parameter.h" > > > /** > @@ -170,27 +171,26 @@ extern GLfloat _mesa_ubyte_to_float_color_tab[256]; > ub = ((GLubyte) F_TO_I((f) * 255.0F)) > #endif > > -static inline GLfloat INT_AS_FLT(GLint i) > +static union gl_constant_value UINT_AS_UNION(GLuint u) > { > - fi_type tmp; > - tmp.i = i; > - return tmp.f; > + union gl_constant_value tmp; > + tmp.u = u; > + return tmp; > } > > -static inline GLfloat UINT_AS_FLT(GLuint u) > +static inline union gl_constant_value INT_AS_UNION(GLint i) > { > - fi_type tmp; > - tmp.u = u; > - return tmp.f; > + union gl_constant_value tmp; > + tmp.i = i; > + return tmp; > } > > -static inline unsigned FLT_AS_UINT(float f) > +static inline union gl_constant_value FLOAT_AS_UNION(GLfloat f) > { > - fi_type tmp; > + union gl_constant_value tmp; > tmp.f = f; > - return tmp.u; > + return tmp; > } > - > /** > * Convert a floating point value to an unsigned fixed point value. > * > @@ -382,6 +382,15 @@ do {\ > V[3] = V3; \ > } while(0) > > +/** Assignment union*/ > +#define ASSIGN_4V_UNION( V, V0, V1, V2, V3 ) \ > +do {\ > +V[0].f = V0; \ > +V[1].f = V1; \ > +V[2].f = V2; \ > +V[3].f = V3; \ > +} while(0) > + I think it would be better not to have this macro and just use the ASSIGN_4V macro to copy the gl_constant_value structs directly. This macro is copying the union values via a floating-point operation which is what we're trying to avoid. Technically
Re: [Mesa-dev] [PATCH 01/14] i965: Don't tile 1D miptrees.
I made a similar patch in my local tree because it will be necessary for Skylake which doesn't support tiling for 1D textures. I made a little test to time rendering a large (4096) 1D texture here: https://github.com/bpeel/glthing/tree/time-1d-texture It gives about an 11% increase in FPS with this patch. It seems like a good idea to me to make it check the height rather than the texture dimensions as Kenneth says. Regards, - Neil Kenneth Graunke writes: > On Saturday, February 07, 2015 02:46:31 AM Francisco Jerez wrote: >> Kenneth Graunke writes: >> >> > On Friday, February 06, 2015 07:23:15 PM Francisco Jerez wrote: >> >> It doesn't really improve locality of texture fetches, quite the >> >> opposite it's a waste of memory bandwidth and space due to tile >> >> alignment. >> >> --- >> >> src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 >> >> 1 file changed, 4 insertions(+) >> >> >> >> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c >> >> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c >> >> index 64752dd..311b204 100644 >> >> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c >> >> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c >> >> @@ -488,6 +488,10 @@ intel_miptree_choose_tiling(struct brw_context *brw, >> >> base_format == GL_DEPTH_STENCIL_EXT) >> >>return I915_TILING_Y; >> >> >> >> + if (mt->target == GL_TEXTURE_1D || >> >> + mt->target == GL_TEXTURE_1D_ARRAY) >> >> + return I915_TILING_NONE; >> >> + >> >> int minimum_pitch = mt->total_width * mt->cpp; >> >> >> >> /* If the width is much smaller than a tile, don't bother tiling. */ >> >> >> > >> > What do you think about checking mt->logical_height0 == 1 instead? >> > GLES doesn't have 1D textures, but people might use a 2D texture with >> > height 1, and I think the same logic applies. >> > >> > I've also been thinking of trying to make core Mesa hide 1D textures and >> > simply turn them into 2D textures with height = 1, so drivers don't have >> > to deal with them. >> > >> > Either way, this seems like a good idea to me. >> >> I guess that would work because the layer count of 1D array textures >> ends up in logical_depth0 rather than logical_height0? Sounds good to >> me, but it would also have the side effect of not tiling "degenerate" 3D >> textures with height=1 but depth>1, I guess we would still like to tile >> those? How about (mt->logical_height0 == 1 && (mt->logical_depth0 == 1 || >> mt->target != GL_TEXTURE_3D))? >> >> Thanks Ken! > > I suppose that's fine, though frankly, trying to optimize a texture with > height 1 at all seems pretty unnecessary to me. I've never seen a 3D > texture with height 1 in the wild... > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev pgpPZwNTC8EO1.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Don't tile 1D miptrees.
Looks good to me. Reviewed-by: Neil Roberts - Neil Francisco Jerez writes: > It doesn't really improve locality of texture fetches, quite the > opposite it's a waste of memory bandwidth and space due to tile > alignment. > > v2: Check mt->logical_height0 instead of mt->target (Ken). Add short > comment explaining why they shouldn't be tiled. > --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index 64752dd..0e3888f 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -488,6 +488,13 @@ intel_miptree_choose_tiling(struct brw_context *brw, > base_format == GL_DEPTH_STENCIL_EXT) >return I915_TILING_Y; > > + /* 1D textures (and 1D array textures) don't get any benefit from tiling, > +* in fact it leads to a less efficient use of memory space and bandwidth > +* due to tile alignment. > +*/ > + if (mt->logical_height0 == 1) > + return I915_TILING_NONE; > + > int minimum_pitch = mt->total_width * mt->cpp; > > /* If the width is much smaller than a tile, don't bother tiling. */ > -- > 2.1.3 pgpW186qYlh05.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/skl: Implement WaDisable1DDepthStencil
Skylake+ doesn't support setting a depth buffer to a 1D surface but it does allow pretending it's a 2D texture with a height of 1 instead. This fixes the GL_DEPTH_COMPONENT_* tests of the copyteximage piglit test (and also seems to avoid a subsequent GPU hang). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89037 --- This probably only makes sense on top of Francisco Jerez's patch here: http://lists.freedesktop.org/archives/mesa-dev/2015-February/076392.html The copyteximage test is still failing with some other formats such as GL_RGB16 and some intensity and luminance formats but I'm just looking into that now. src/mesa/drivers/dri/i965/gen8_depth_state.c | 12 1 file changed, 12 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c index e428089..b4eb6e1 100644 --- a/src/mesa/drivers/dri/i965/gen8_depth_state.c +++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c @@ -190,6 +190,18 @@ gen8_emit_depth_stencil_hiz(struct brw_context *brw, case GL_TEXTURE_3D: assert(mt); depth = MAX2(mt->logical_depth0, 1); + surftype = translate_tex_target(gl_target); + break; + case GL_TEXTURE_1D_ARRAY: + case GL_TEXTURE_1D: + if (brw->gen >= 9) { + /* WaDisable1DDepthStencil. Skylake+ doesn't support 1D depth + * textures but it does allow pretending it's a 2D texture + * instead. + */ + surftype = BRW_SURFACE_2D; + break; + } /* fallthrough */ default: surftype = translate_tex_target(gl_target); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/2] i965/skl: Fix handling 1D textures
Skylake has changed 1D textures so that the mipmaps are stored more efficiently and the qpitch is calculated differently. These two patches combined fix the following Piglit tests: clear-color-all-types 1d_array mipmapped clear-color-all-types 1d_array single_level fbo-generatemipmap-1d texsubimage array pbo 1D view rendering array-texture copyteximage 1D_ARRAY copyteximage 1D_ARRAY samples=2 copyteximage 1D_ARRAY samples=4 copyteximage 1D_ARRAY samples=6 copyteximage 1D_ARRAY samples=8 fbo-generatemipmap-array fbo-generatemipmap-array RGB9_E5 render-1darray texsubimage array gs-texelFetch-isampler1D gs-texelFetch-isampler1DArray gs-texelFetch-sampler1D gs-texelFetch-sampler1DArray gs-texelFetch-usampler1D gs-texelFetch-usampler1DArray gs-texelFetch-isampler1D gs-texelFetch-isampler1DArray gs-texelFetch-sampler1D gs-texelFetch-sampler1DArray gs-texelFetch-usampler1D gs-texelFetch-usampler1DArray ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965/skl: Layout a 1D miptree horizontally
On Gen9+ the 1D miptree is laid out with all of the mipmap levels in a horizontal line. --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 71 -- 1 file changed, 68 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 0e2841f..abf532a 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -158,6 +158,42 @@ intel_vertical_texture_alignment_unit(struct brw_context *brw, } static void +gen9_miptree_layout_1d(struct intel_mipmap_tree *mt) +{ + unsigned x = 0; + unsigned width = mt->physical_width0; + unsigned depth = mt->physical_depth0; /* number of array layers. */ + /* When this layout is used the horizontal alignment is fixed at 64 and the +* hardware ignores the value given in the surface state +*/ + const unsigned int align_w = 64; + + mt->total_height = mt->physical_height0; + + if (mt->compressed) { + mt->total_height = ALIGN(mt->physical_height0, mt->align_h); + } + + mt->total_width = 0; + + for (unsigned level = mt->first_level; level <= mt->last_level; level++) { + unsigned img_width; + + intel_miptree_set_level_info(mt, level, x, 0, depth); + + img_width = ALIGN(width, align_w); + if (mt->compressed) + img_width /= align_w; + + mt->total_width = MAX2(mt->total_width, x + img_width); + + x += img_width; + + width = minify(width, 1); + } +} + +static void brw_miptree_layout_2d(struct intel_mipmap_tree *mt) { unsigned x = 0; @@ -239,23 +275,49 @@ align_cube(struct intel_mipmap_tree *mt) mt->total_height += 2; } +static bool +use_linear_1d_layout(struct brw_context *brw, + struct intel_mipmap_tree *mt) +{ + /* On Gen9+ the mipmap levels of a 1D surface are all laid out in a +* horizontal line. This isn't done for depth/stencil buffers however +* because those will be using a tiled layout +*/ + + if (brw->gen >= 9 && + (mt->target == GL_TEXTURE_1D || +mt->target == GL_TEXTURE_1D_ARRAY)) { + GLenum base_format = _mesa_get_format_base_format(mt->format); + + if (base_format != GL_DEPTH_COMPONENT && + base_format != GL_DEPTH_STENCIL) + return true; + } + + return false; +} + static void brw_miptree_layout_texture_array(struct brw_context *brw, struct intel_mipmap_tree *mt) { int h0, h1; unsigned height = mt->physical_height0; + bool layout_1d = use_linear_1d_layout(brw, mt); h0 = ALIGN(mt->physical_height0, mt->align_h); h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h); - if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) + if (mt->array_layout == ALL_SLICES_AT_EACH_LOD || layout_1d) mt->qpitch = h0; else mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h); int physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch; - brw_miptree_layout_2d(mt); + if (layout_1d) + gen9_miptree_layout_1d(mt); + else + brw_miptree_layout_2d(mt); for (unsigned level = mt->first_level; level <= mt->last_level; level++) { unsigned img_height; @@ -386,7 +448,10 @@ brw_miptree_layout(struct brw_context *brw, struct intel_mipmap_tree *mt) break; case INTEL_MSAA_LAYOUT_NONE: case INTEL_MSAA_LAYOUT_IMS: - brw_miptree_layout_2d(mt); + if (use_linear_1d_layout(brw, mt)) +gen9_miptree_layout_1d(mt); + else +brw_miptree_layout_2d(mt); break; } break; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965/skl: Upload qpitch in pixels for 1D textures
According to the bspec since Skylake the qpitch value in the surface formats should be measured in pixels rather than rows for 1D textures. --- src/mesa/drivers/dri/i965/gen8_surface_state.c | 30 +++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index d6b870e..4132b9b 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -145,6 +145,24 @@ gen8_emit_buffer_surface_state(struct brw_context *brw, } } +static uint32_t +qpitch_for_mt(struct brw_context *brw, + struct intel_mipmap_tree *mt, + uint32_t surf_type) +{ + uint32_t qpitch = mt->qpitch; + + /* On Skylake+ for 1D linear surfaces the qpitch is measured in pixels +* rather than rows +*/ + if (brw->gen >= 9 && + surf_type == BRW_SURFACE_1D && + mt->tiling == I915_TILING_NONE) + qpitch *= mt->total_width; + + return qpitch; +} + static void gen8_update_texture_surface(struct gl_context *ctx, unsigned unit, @@ -197,7 +215,11 @@ gen8_update_texture_surface(struct gl_context *ctx, uint32_t *surf = allocate_surface_state(brw, surf_offset); - surf[0] = translate_tex_target(tObj->Target) << BRW_SURFACE_TYPE_SHIFT | + uint32_t surf_type = translate_tex_target(tObj->Target); + + uint32_t qpitch = qpitch_for_mt(brw, mt, surf_type); + + surf[0] = surf_type << BRW_SURFACE_TYPE_SHIFT | tex_format << BRW_SURFACE_FORMAT_SHIFT | vertical_alignment(mt) | horizontal_alignment(mt) | @@ -211,7 +233,7 @@ gen8_update_texture_surface(struct gl_context *ctx, if (mt->logical_depth0 > 1 && tObj->Target != GL_TEXTURE_3D) surf[0] |= GEN8_SURFACE_IS_ARRAY; - surf[1] = SET_FIELD(mocs_wb, GEN8_SURFACE_MOCS) | mt->qpitch >> 2; + surf[1] = SET_FIELD(mocs_wb, GEN8_SURFACE_MOCS) | qpitch >> 2; surf[2] = SET_FIELD(mt->logical_width0 - 1, GEN7_SURFACE_WIDTH) | SET_FIELD(mt->logical_height0 - 1, GEN7_SURFACE_HEIGHT); @@ -390,6 +412,8 @@ gen8_update_renderbuffer_surface(struct brw_context *brw, uint32_t *surf = allocate_surface_state(brw, &brw->wm.base.surf_offset[surf_index]); + uint32_t qpitch = qpitch_for_mt(brw, mt, surf_type); + surf[0] = (surf_type << BRW_SURFACE_TYPE_SHIFT) | (is_array ? GEN7_SURFACE_IS_ARRAY : 0) | (format << BRW_SURFACE_FORMAT_SHIFT) | @@ -397,7 +421,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw, horizontal_alignment(mt) | surface_tiling_mode(tiling); - surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | mt->qpitch >> 2; + surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | qpitch >> 2; surf[2] = SET_FIELD(width - 1, GEN7_SURFACE_WIDTH) | SET_FIELD(height - 1, GEN7_SURFACE_HEIGHT); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965/skl: Upload qpitch in pixels for 1D textures
Ben Widawsky writes: > I promise to look at this again in more detail tomorrow when I am more > awake, but meanwhile, I'd be very much in favor of just setting > mt->qpith for all miptree layout types. I'm not sure what you mean. Do you mean that we should just always set mt->qpitch to the actual value that we're going to upload? Yes, that seems like a good idea. I was originally worried about that because I had thought that other bits of code were looking at the qpitch to work out some offsets but that is not the case. The qpitch seems to also be broken for 3D textures and compressed textures so I think it might be a good idea to squash this patch into a general patch which fixes everything. I made a start on that as part of fixing the copyteximage test for 3D textures here: https://github.com/bpeel/mesa/commit/fc95546c49f0e5c5805139712c51a0d13b2183c3 I don't want to post the patch to the mailing list yet because it doesn't fix the test for the GL_RGBA32 format with 3D textures and I'd like to work out why first. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965/skl: Layout a 1D miptree horizontally
Ian Romanick writes: > There aren't any compressed formats that support 1D textures, so I > don't think this can occur. Does the bspec say anything about > compressed 1D textures? Ah yes, you're right. I just copied it from brw_miptree_layout_2d without really thinking it through. The SKL bspec explicitly states that compressed formats can't be used with SURFTYPE_1D. I'll remove and send a v2 along with the patches to fix 3D textures too. Thanks for the review. - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/6] i965/skl: Texture layout fixes
Here is a v2 of my patch series to fix 1D textures on Skylake. It's now turned into just some general fixes and also helps with 3D textures. It fixes 110 piglit tests but sadly seems to cause 3 regressions. It might not be worth landing until I can work out what the regressions are so I guess I'm just posting it now in case anyone's interested to look at it anyway. I've uploaded the piglit html summary of the changes here: http://busydoingnothing.co.uk/skl-qpitch-patches/changes.html - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] i965: Don't force x-tiling for 16-bpp formats on Gen>7
Sandybridge doesn't support y-tiling for surface formats with 16 or more bpp. There was previously an override to explicitly allow this for Gen7. However, this restriction is also removed in Gen8+ so we should use y-tiling there too. This is important to do for Skylake which doesn't support x-tiling for 3D surfaces. --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 0e3888f..994670a 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -516,9 +516,9 @@ intel_miptree_choose_tiling(struct brw_context *brw, * "NOTE: 128BPE Format Color Buffer ( render target ) MUST be either TileX * or Linear." * 128 bits per pixel translates to 16 bytes per pixel. This is necessary -* all the way back to 965, but is explicitly permitted on Gen7. +* all the way back to 965, but is explicitly permitted on Gen7+. */ - if (brw->gen != 7 && mt->cpp >= 16) + if (brw->gen < 7 && mt->cpp >= 16) return I915_TILING_X; /* From the Ivy Bridge PRM, Vol4 Part1 2.12.2.1 (SURFACE_STATE for most -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] i965/skl: Don't use ALL_SLICES_AT_EACH_LOD
The render surface state command for Skylake doesn't have the surface array spacing bit so I don't think it's possible to select this layout. This avoids a kernel panic when running the piglit test below: ext_framebuffer_multisample-no-color 8 stencil single However the test still fails so there may be something else wrong as well. The test was not causing a kernel panic before the patch to fix the qpitch. I think it's also not possible to select this layout on Gen8 so it may need to be changed to only be used on Gen7. --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 994670a..018e16b 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -371,19 +371,25 @@ intel_miptree_create_layout(struct brw_context *brw, } } - /* Set array_layout to ALL_SLICES_AT_EACH_LOD when gen7+ array_spacing_lod0 -* can be used. array_spacing_lod0 is only used for non-IMS MSAA surfaces. + /* Set array_layout to ALL_SLICES_AT_EACH_LOD when array_spacing_lod0 +* can be used. array_spacing_lod0 is only used for non-IMS MSAA surfaces +* on Gen 7 and 8. * TODO: can we use it elsewhere? +* TODO: does this actually work on Gen 8? */ - switch (mt->msaa_layout) { - case INTEL_MSAA_LAYOUT_NONE: - case INTEL_MSAA_LAYOUT_IMS: + if (brw->gen >= 9) { mt->array_layout = ALL_LOD_IN_EACH_SLICE; - break; - case INTEL_MSAA_LAYOUT_UMS: - case INTEL_MSAA_LAYOUT_CMS: - mt->array_layout = ALL_SLICES_AT_EACH_LOD; - break; + } else { + switch (mt->msaa_layout) { + case INTEL_MSAA_LAYOUT_NONE: + case INTEL_MSAA_LAYOUT_IMS: + mt->array_layout = ALL_LOD_IN_EACH_SLICE; + break; + case INTEL_MSAA_LAYOUT_UMS: + case INTEL_MSAA_LAYOUT_CMS: + mt->array_layout = ALL_SLICES_AT_EACH_LOD; + break; + } } if (target == GL_TEXTURE_CUBE_MAP) { -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] i965/skl: Align compressed textures to four times the block size
On Skylake it is possible to choose your own alignment values for compressed textures but they are expressed as a multiple of the block size. The minimum alignment value we can use is 4 so we effectively have to align to 4 times the block size. This patch makes it initially set mt->align_[wh] to the large alignment value and then later divides it by the block size so that it can be uploaded as part of the surface state. --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 31 ++ 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index d89a04e..1fe2c26 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -73,7 +73,16 @@ intel_horizontal_texture_alignment_unit(struct brw_context *brw, */ unsigned int i, j; _mesa_get_format_block_size(format, &i, &j); - return i; + + /* On Gen9+ we can pick our own alignment for compressed textures but it + * has to be a multiple of the block size. The minimum alignment we can + * pick is 4 so we effectively have to align to 4 times the block + * size + */ + if (brw->gen >= 9) + return i * 4; + else + return i; } if (format == MESA_FORMAT_S_UINT8) @@ -113,7 +122,8 @@ intel_vertical_texture_alignment_unit(struct brw_context *brw, * the SURFACE_STATE "Surface Vertical Alignment" field. */ if (_mesa_is_format_compressed(format)) - return 4; + /* See comment above for the horizontal alignment */ + return brw->gen >= 9 ? 16 : 4; if (format == MESA_FORMAT_S_UINT8) return brw->gen >= 7 ? 8 : 4; @@ -195,6 +205,9 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) unsigned width = mt->physical_width0; unsigned height = mt->physical_height0; unsigned depth = mt->physical_depth0; /* number of array layers. */ + unsigned int bw, bh; + + _mesa_get_format_block_size(mt->format, &bw, &bh); mt->total_width = mt->physical_width0; @@ -212,7 +225,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) if (mt->compressed) { mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) + - ALIGN(minify(mt->physical_width0, 2), mt->align_w); + ALIGN(minify(mt->physical_width0, 2), bw); } else { mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) + minify(mt->physical_width0, 2); @@ -232,7 +245,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) img_height = ALIGN(height, mt->align_h); if (mt->compressed) -img_height /= mt->align_h; +img_height /= bh; if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) { /* Compact arrays with separated miplevels */ @@ -481,5 +494,15 @@ brw_miptree_layout(struct brw_context *brw, struct intel_mipmap_tree *mt) } DBG("%s: %dx%dx%d\n", __FUNCTION__, mt->total_width, mt->total_height, mt->cpp); + + /* On Gen9+ the alignment values are expressed in multiples of the block +* size +*/ + if (brw->gen >= 9) { + unsigned int i, j; + _mesa_get_format_block_size(mt->format, &i, &j); + mt->align_w /= i; + mt->align_h /= j; + } } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] i965/skl: Layout 3D textures the same as array textures
On Gen9+ the 3D textures use the same mipmap layout as 2D array textures. --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 0e2841f..57922e9 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -224,6 +224,9 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) width = minify(width, 1); height = minify(height, 1); + + if (mt->target == GL_TEXTURE_3D) + depth = minify(depth, 1); } } @@ -263,7 +266,7 @@ brw_miptree_layout_texture_array(struct brw_context *brw, if (mt->compressed) img_height /= mt->align_h; - for (int q = 0; q < mt->physical_depth0; q++) { + for (int q = 0; q < mt->level[level].depth; q++) { if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) { intel_miptree_set_image_offset(mt, level, q, 0, q * img_height); } else { @@ -368,7 +371,10 @@ brw_miptree_layout(struct brw_context *brw, struct intel_mipmap_tree *mt) break; case GL_TEXTURE_3D: - brw_miptree_layout_texture_3d(brw, mt); + if (brw->gen >= 9) + brw_miptree_layout_texture_array(brw, mt); + else + brw_miptree_layout_texture_3d(brw, mt); break; case GL_TEXTURE_1D_ARRAY: -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] i965/skl: Layout a 1D miptree horizontally
On Gen9+ the 1D miptree is laid out with all of the mipmap levels in a horizontal line. --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 62 +- 1 file changed, 60 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 57922e9..851742b 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -158,6 +158,36 @@ intel_vertical_texture_alignment_unit(struct brw_context *brw, } static void +gen9_miptree_layout_1d(struct intel_mipmap_tree *mt) +{ + unsigned x = 0; + unsigned width = mt->physical_width0; + unsigned depth = mt->physical_depth0; /* number of array layers. */ + + /* When this layout is used the horizontal alignment is fixed at 64 and the +* hardware ignores the value given in the surface state +*/ + const unsigned int align_w = 64; + + mt->total_height = mt->physical_height0; + mt->total_width = 0; + + for (unsigned level = mt->first_level; level <= mt->last_level; level++) { + unsigned img_width; + + intel_miptree_set_level_info(mt, level, x, 0, depth); + + img_width = ALIGN(width, align_w); + + mt->total_width = MAX2(mt->total_width, x + img_width); + + x += img_width; + + width = minify(width, 1); + } +} + +static void brw_miptree_layout_2d(struct intel_mipmap_tree *mt) { unsigned x = 0; @@ -242,12 +272,34 @@ align_cube(struct intel_mipmap_tree *mt) mt->total_height += 2; } +static bool +use_linear_1d_layout(struct brw_context *brw, + struct intel_mipmap_tree *mt) +{ + /* On Gen9+ the mipmap levels of a 1D surface are all laid out in a +* horizontal line. This isn't done for depth/stencil buffers however +* because those will be using a tiled layout +*/ + if (brw->gen >= 9 && + (mt->target == GL_TEXTURE_1D || +mt->target == GL_TEXTURE_1D_ARRAY)) { + GLenum base_format = _mesa_get_format_base_format(mt->format); + + if (base_format != GL_DEPTH_COMPONENT && + base_format != GL_DEPTH_STENCIL) + return true; + } + + return false; +} + static void brw_miptree_layout_texture_array(struct brw_context *brw, struct intel_mipmap_tree *mt) { int h0, h1; unsigned height = mt->physical_height0; + bool layout_1d = use_linear_1d_layout(brw, mt); h0 = ALIGN(mt->physical_height0, mt->align_h); h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h); @@ -258,7 +310,10 @@ brw_miptree_layout_texture_array(struct brw_context *brw, int physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch; - brw_miptree_layout_2d(mt); + if (layout_1d) + gen9_miptree_layout_1d(mt); + else + brw_miptree_layout_2d(mt); for (unsigned level = mt->first_level; level <= mt->last_level; level++) { unsigned img_height; @@ -392,7 +447,10 @@ brw_miptree_layout(struct brw_context *brw, struct intel_mipmap_tree *mt) break; case INTEL_MSAA_LAYOUT_NONE: case INTEL_MSAA_LAYOUT_IMS: - brw_miptree_layout_2d(mt); + if (use_linear_1d_layout(brw, mt)) +gen9_miptree_layout_1d(mt); + else +brw_miptree_layout_2d(mt); break; } break; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] i965/skl: Fix the qpitch value
On Skylake the qpitch value is uploaded as part of the surface state so we don't need to add the extra rows that are done for other generations. However for 3D textures it needs to be aligned to the tile height. Unlike previous generations the qpitch is measured as a multiple of the block size for compressed surfaces. When the horizontal mipmap layout is used for 1D textures then the qpitch is measured in pixels instead of rows. --- src/mesa/drivers/dri/i965/brw_tex_layout.c| 44 +-- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 -- 2 files changed, 41 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 851742b..d89a04e 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -297,24 +297,48 @@ static void brw_miptree_layout_texture_array(struct brw_context *brw, struct intel_mipmap_tree *mt) { - int h0, h1; unsigned height = mt->physical_height0; bool layout_1d = use_linear_1d_layout(brw, mt); - - h0 = ALIGN(mt->physical_height0, mt->align_h); - h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h); - if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) - mt->qpitch = h0; - else - mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h); - - int physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch; + int physical_qpitch; if (layout_1d) gen9_miptree_layout_1d(mt); else brw_miptree_layout_2d(mt); + if (layout_1d) { + physical_qpitch = mt->align_h; + /* When using the horizontal layout the qpitch is measured in pixels */ + mt->qpitch = physical_qpitch * mt->total_width; + } else if (brw->gen >= 9) { + /* On Gen9 we can pick whatever qpitch we like as long as it's aligned + * to the vertical alignment so we don't need to add any extra rows. + */ + mt->qpitch = mt->total_height; + + /* However 3D textures need to be aligned to the tile height. At this + * point we don't know which tiling will be used so let's just align it + * to 32 + */ + if (mt->target == GL_TEXTURE_3D) + mt->qpitch = ALIGN(mt->qpitch, 32); + + /* Unlike previous generations the qpitch is now a multiple of the + * compressed block size so physical_qpitch matches mt->qpitch. + */ + physical_qpitch = mt->qpitch; + } else { + int h0 = ALIGN(mt->physical_height0, mt->align_h); + int h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h); + + if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) + mt->qpitch = h0; + else + mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h); + + physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch; + } + for (unsigned level = mt->first_level; level <= mt->last_level; level++) { unsigned img_height; img_height = ALIGN(height, mt->align_h); diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index ee9cf1e..247e5b8 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h @@ -380,10 +380,14 @@ struct intel_mipmap_tree enum miptree_array_layout array_layout; /** -* The distance in rows between array slices in an uncompressed surface. +* The distance in between array slices. * -* For compressed surfaces, slices are stored closer together physically; -* the real distance is (qpitch / block height). +* The value is the one that is sent in the surface state. The actual +* meaning depends on certain criteria. Usually it is simply the number of +* uncompressed rows between each slice. However on Gen9+ for compressed +* surfaces it is the number of blocks. For 1D surfaces that have the +* mipmap tree stored horizontally it is the number of pixels between each +* slice. */ uint32_t qpitch; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6 v2] i965/skl: Fix the qpitch value
On Skylake the qpitch value is uploaded as part of the surface state so we don't need to add the extra rows that are done for other generations. However for 3D textures it needs to be aligned to the tile height and for depth/stencil textures it needs to be a multiple of 8. Unlike previous generations the qpitch is measured as a multiple of the block size for compressed surfaces. When the horizontal mipmap layout is used for 1D textures then the qpitch is measured in pixels instead of rows. v2: Align the depth/stencil textures to a multiple of 8 --- src/mesa/drivers/dri/i965/brw_tex_layout.c| 52 +-- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 -- 2 files changed, 49 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 851742b..834d4ec 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -297,24 +297,56 @@ static void brw_miptree_layout_texture_array(struct brw_context *brw, struct intel_mipmap_tree *mt) { - int h0, h1; unsigned height = mt->physical_height0; bool layout_1d = use_linear_1d_layout(brw, mt); - - h0 = ALIGN(mt->physical_height0, mt->align_h); - h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h); - if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) - mt->qpitch = h0; - else - mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h); - - int physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch; + int physical_qpitch; if (layout_1d) gen9_miptree_layout_1d(mt); else brw_miptree_layout_2d(mt); + if (layout_1d) { + physical_qpitch = mt->align_h; + /* When using the horizontal layout the qpitch is measured in pixels */ + mt->qpitch = physical_qpitch * mt->total_width; + } else if (brw->gen >= 9) { + GLenum base_format; + + /* On Gen9 we can pick whatever qpitch we like as long as it's aligned + * to the vertical alignment so we don't need to add any extra rows. + */ + mt->qpitch = mt->total_height; + + /* If the surface might be used as a stencil buffer or HiZ buffer then + * it needs to be a multiple of 8. + */ + base_format = _mesa_get_format_base_format(mt->format); + if (_mesa_is_depth_or_stencil_format(base_format)) + mt->qpitch = ALIGN(mt->qpitch, 8); + + /* 3D textures need to be aligned to the tile height. At this point we + * don't know which tiling will be used so let's just align it to 32 + */ + if (mt->target == GL_TEXTURE_3D) + mt->qpitch = ALIGN(mt->qpitch, 32); + + /* Unlike previous generations the qpitch is now a multiple of the + * compressed block size so physical_qpitch matches mt->qpitch. + */ + physical_qpitch = mt->qpitch; + } else { + int h0 = ALIGN(mt->physical_height0, mt->align_h); + int h1 = ALIGN(minify(mt->physical_height0, 1), mt->align_h); + + if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) + mt->qpitch = h0; + else + mt->qpitch = (h0 + h1 + (brw->gen >= 7 ? 12 : 11) * mt->align_h); + + physical_qpitch = mt->compressed ? mt->qpitch / 4 : mt->qpitch; + } + for (unsigned level = mt->first_level; level <= mt->last_level; level++) { unsigned img_height; img_height = ALIGN(height, mt->align_h); diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index ee9cf1e..247e5b8 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h @@ -380,10 +380,14 @@ struct intel_mipmap_tree enum miptree_array_layout array_layout; /** -* The distance in rows between array slices in an uncompressed surface. +* The distance in between array slices. * -* For compressed surfaces, slices are stored closer together physically; -* the real distance is (qpitch / block height). +* The value is the one that is sent in the surface state. The actual +* meaning depends on certain criteria. Usually it is simply the number of +* uncompressed rows between each slice. However on Gen9+ for compressed +* surfaces it is the number of blocks. For 1D surfaces that have the +* mipmap tree stored horizontally it is the number of pixels between each +* slice. */ uint32_t qpitch; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/14] meta: Create temporary pbo in _mesa_meta_pbo_GetTexSubImage()
Anuj Phogat writes: > using a flag passed in as function parameter. This will enable > _mesa_meta_pbo_GetTexSubImage to be used for reading in to > non-pbo buffers. > > This will be useful to support reading from YF/YS tiled surfaces > in Skylake. > > Signed-off-by: Anuj Phogat > --- > src/mesa/drivers/common/meta.h | 1 + > src/mesa/drivers/common/meta_tex_subimage.c | 18 -- > src/mesa/drivers/dri/i965/intel_pixel_read.c | 9 - > src/mesa/drivers/dri/i965/intel_tex_image.c | 3 ++- > 4 files changed, 23 insertions(+), 8 deletions(-) > > diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h > index e7d894d..3de0d87 100644 > --- a/src/mesa/drivers/common/meta.h > +++ b/src/mesa/drivers/common/meta.h > @@ -542,6 +542,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, > GLuint dims, >int xoffset, int yoffset, int zoffset, >int width, int height, int depth, >GLenum format, GLenum type, const void *pixels, > + bool create_pbo, >const struct gl_pixelstore_attrib *packing); > > extern void > diff --git a/src/mesa/drivers/common/meta_tex_subimage.c > b/src/mesa/drivers/common/meta_tex_subimage.c > index 68c8273..cd87a72 100644 > --- a/src/mesa/drivers/common/meta_tex_subimage.c > +++ b/src/mesa/drivers/common/meta_tex_subimage.c > @@ -34,6 +34,7 @@ > #include "macros.h" > #include "meta.h" > #include "pbo.h" > +#include "readpix.h" > #include "shaderapi.h" > #include "state.h" > #include "teximage.h" > @@ -246,6 +247,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, > GLuint dims, >int xoffset, int yoffset, int zoffset, >int width, int height, int depth, >GLenum format, GLenum type, const void *pixels, > + bool create_pbo, >const struct gl_pixelstore_attrib *packing) > { > GLuint pbo = 0, pbo_tex = 0, fbos[2] = { 0, 0 }; > @@ -257,7 +259,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, > GLuint dims, > /* XXX: This should probably be passed in from somewhere */ > const char *where = "_mesa_meta_pbo_GetTexSubImage"; > > - if (!_mesa_is_bufferobj(packing->BufferObj)) > + if (!_mesa_is_bufferobj(packing->BufferObj) && !create_pbo) >return false; > > if (format == GL_DEPTH_COMPONENT || > @@ -282,7 +284,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, > GLuint dims, >return true; > } > > - pbo_tex_image = create_texture_for_pbo(ctx, false, GL_PIXEL_PACK_BUFFER, > + pbo_tex_image = create_texture_for_pbo(ctx, create_pbo, > GL_PIXEL_PACK_BUFFER, >width, height, depth, >format, type, pixels, packing, >&pbo, &pbo_tex); > @@ -348,6 +350,18 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, > GLuint dims, > GL_COLOR_BUFFER_BIT, GL_NEAREST); > } > > + /* If we created a temporary pbo in meta to read the pixel data, that > +* data will now get copied to memory allocated by client. > +*/ > + if (create_pbo) { I think this should be changed to something like the below: if (create_pbo && !_mesa_is_bufferobj(packing->BufferObj)) It looks like the meaning of create_pbo in create_texture_for_pbo is ‘create pbo unless there is already a pbo’. With this patch _mesa_meta_pbo_GetTexSubImage seems to interpret it to mean that it will always create the pbo and if there already was a pbo it would end up clobbering the state. > + /* Unbind the pbo from pack binding. */ > + _mesa_BindBuffer(GL_PIXEL_PACK_BUFFER, 0); I don't think this unbind is necessary. create_texture_for_pbo is careful not to modify the PBO binding state and once that function is finished the PBO is only accessed via the texture so we shouldn't need to touch the PBO binding. > + _mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[1]); > + _mesa_update_state(ctx); > + _mesa_readpixels(ctx, 0, 0, width, height, format, type, > + packing, (void *) pixels); Doesn't this only read the last slice of the texture? Maybe this patch should wait until this patch from Laura Ekstrand is landed: http://lists.freedesktop.org/archives/mesa-dev/2015-February/077487.html Regards, - Neil pgpaogQNb9zCf.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/14] i965: Make a function to check the conditions to use the blitter
Anuj Phogat writes: > No functional changes in the patch. Just makes the code look cleaner. > > Signed-off-by: Anuj Phogat > --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 40 > +++ > 1 file changed, 29 insertions(+), 11 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index abf20c9..147097b 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -2305,6 +2305,34 @@ can_blit_slice(struct intel_mipmap_tree *mt, > return true; > } > > +static bool > +use_intel_mipree_map_blit(struct brw_context *brw, > + struct intel_mipmap_tree *mt, > + GLbitfield mode, > + unsigned int level, > + unsigned int slice) > +{ > + if (brw->has_llc && > + /* It's probably not worth swapping to the blit ring because of > + * all the overhead involved. > + */ > + !(mode & GL_MAP_WRITE_BIT) && > + !mt->compressed && > + (mt->tiling == I915_TILING_X || > + /* Prior to Sandybridge, the blitter can't handle Y tiling */ > + (brw->gen >= 6 && mt->tiling == I915_TILING_Y)) && This line used to have an extra space of indentation in the old version which made it clearer that it is inside the brackets above. I think the space should be added back here. The patch is an improvement so I think it's worth landing but I wonder if now that's in a separate function it might be worth tidying it up a bit to make the logic easier to follow. Either way, if the space is added back then it is Reviewed-by: Neil Roberts Regards, - Neil pgpJNvsWcsFNX.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/14] meta: Return if pixels == null and no pixel unpack buffer set
It seems like it would be better to fold this into patch 4 because it is fixing a problem that is effectively introduced by that patch, isn't it? Regards, - Neil pgphPhjb7Gl0I.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/14] meta: Remove duplicate power of two samples check
All of these patches except 4, 10 and 12 look good to me and are: Reviewed-by: Neil Roberts However, I'm still relatively new to this so maybe my review should be taken with a pinch of salt. I've replied with comments for the three patches mentioned. Regards, - Neil pgpW3_ZfUlLbu.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] i965/skl: Layout 3D textures the same as array textures
Ben Widawsky writes: >> + if (mt->target == GL_TEXTURE_3D) >> + depth = minify(depth, 1); >> } >> } > > assert(brw->gen >= 9)? (up to you, I'm very assert happy) Yes, that seems like a good idea. >> >> @@ -263,7 +266,7 @@ brw_miptree_layout_texture_array(struct brw_context *brw, >>if (mt->compressed) >> img_height /= mt->align_h; >> >> - for (int q = 0; q < mt->physical_depth0; q++) { >> + for (int q = 0; q < mt->level[level].depth; q++) { >> if (mt->array_layout == ALL_SLICES_AT_EACH_LOD) { >> intel_miptree_set_image_offset(mt, level, q, 0, q * img_height); >> } else { > > Can you remind me how you can have different depths per level? Assuming > they're > the same, I prefer this hunk removed because it's much clearer the old way in > my > opinion. With 3D textures, unlike array textures, the depth is halved at each mipmap level as well as the width and height so each mipmap level has a different depth. I'm attaching an SVG which might make it clearer. I guess that means that using the same layout for array textures and 3D textures means that 3D textures will now have more unused space, but that is what the hardware now wants so that's the way it has to be. I originally didn't make this change and I thought maybe you could get away with allocating the depth to be the same for each level even though some of them won't be used but some later code gets very confused and asserts. Thanks for the review. Regards, - Neil pgpxxtknrqENs.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] i965/skl: Layout a 1D miptree horizontally
Ben Widawsky writes: > Okay, I'm guilty of a bikeshed here, but doesn't this look cleaner if you just > do two for loops? One for depth, and one for levels. > > something like... > const unsigned depth = mt->physical_depth0; > for (i = 0; i < depth; i++) { > width = mt->physical_width0; > for (unsigned level = mt->first_level; level <= mt->last_level; > level++) { > intel_miptree_set_level_info(mt, level, x, 0, depth); > mt->total_width += ALIGN(width, 64); > width = minify(width, 1); > } > } I'm not sure what you mean. This function is just laying out a single array slice so the depth isn't iterated. The other array slices are laid out in brw_miptree_layout_texture_array by copying the offsets that this function generates. It is done the same way for 2D textures. It might look a bit confusing because the last three parameters of intel_miptree_set_level_info look like x-offset, y-offset and z-offset but it's actually x-offset, y-offset and depth of the entire level. The depth is the same for each level. I guess I could comment the arguments but the same problem exists in brw_miptree_layout_2d. Regards, - Neil pgpSYPbhBmvkc.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/6 v2] i965/skl: Fix the qpitch value
Ben Widawsky writes: >> + if (layout_1d) { >> + physical_qpitch = mt->align_h; >> + /* When using the horizontal layout the qpitch is measured in pixels >> */ > > I think the docs words are a bit clearer: > "Surface QPitch specifies the distance in pixels between array > slices." Ok, yeah, I'll change it to that. >> + mt->qpitch = physical_qpitch * mt->total_width; > > talking to myself - this seems more complicated than it ought to be. Maybe it would be less confusing if I changed it to mt->align_h * mt->total_width? > This patch is a bit trickier than the previous, and I got lazy with fact > checking in bspec, but it lgtm: > Reviewed-by: Ben Widawsky Thanks again for the reviews. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] meta: In pbo_{Get, }TexSubImage don't repeatedly rebind the source tex
A layered PBO image is now interpreted as a single tall 2D image so the z argument in _mesa_meta_bind_fbo_image is ignored. Therefore this was just redundantly rebinding the same image repeatedly. --- src/mesa/drivers/common/meta_tex_subimage.c | 4 1 file changed, 4 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index ffbbacd..407940a 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -217,8 +217,6 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, height : depth; for (z = 1; z < iters; z++) { - _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, -pbo_tex_image, z); _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, tex_image, zoffset + z); @@ -350,8 +348,6 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, for (z = 1; z < iters; z++) { _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, tex_image, zoffset + z); - _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, -pbo_tex_image, z); _mesa_update_state(ctx); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/4] common: Fix PBOs for 1D_ARRAY.
Sorry for the late review. Can you explain what this patch does? The previous code was doing a blit like this: _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, 0, z * height, width, (z + 1) * height, xoffset, yoffset, xoffset + width, yoffset + height, GL_COLOR_BUFFER_BIT, GL_NEAREST); However, it was first setting the height to 1 so that would end up like this: _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, 0, z * 1, width, (z + 1) * 1, xoffset, yoffset, xoffset + width, yoffset + 1, GL_COLOR_BUFFER_BIT, GL_NEAREST); The patch makes it do this: _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, 0, z, width, z + 1, xoffset, yoffset, xoffset + width, yoffset + 1, GL_COLOR_BUFFER_BIT, GL_NEAREST); This looks like it would have exactly the same result. The patch doesn't modify the blit call for slice 0 which looks wrong to me. Neither the version with or without the patch appears to handle the yoffset correctly because for the 1D array case this needs to be interpreted as a slice offset. I'm attaching a patch which I think fixes it, although I haven't managed to test it with the arb_direct_state_access/gettextureimage-targets test so I don't know if I'm misunderstanding something. It is also not complete because it doesn't touch GetTexSubImage. I have tested it with the texsubimage test which does use the yoffset, but in order to use that test the code path first needs to be able to accept the IMAGE_HEIGHT packing option which I will also post a patch for. Regards, - Neil --- >8 --- (use git am --scissors to automatically chop here) Subject: meta: Fix the y offset for 1D_ARRAY in _mesa_meta_pbo_TexSubImage This partially reverts 546aba143d13ba3f99 and brings back the if statement to move the height over to the depth. However it additionally moves the yoffset to the zoffset. This fixes texsubimage array pbo for 1D_ARRAY textures. The previous fix in 546aba143 wasn't taking into account the yoffset correctly because it needs to be used to alter the selected layer. --- src/mesa/drivers/common/meta_tex_subimage.c | 36 ++--- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 3965d31..be89102 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -138,7 +138,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, struct gl_texture_image *pbo_tex_image; GLenum status; bool success = false; - int z, iters; + int z; /* XXX: This should probably be passed in from somewhere */ const char *where = "_mesa_meta_pbo_TexSubImage"; @@ -193,6 +193,16 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, _mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[0]); _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbos[1]); + if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) { + assert(depth == 1); + assert(zoffset == 0); + depth = height; + height = 1; + image_height = 1; + zoffset = yoffset; + yoffset = 0; + } + _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, pbo_tex_image, 0); /* If this passes on the first layer it should pass on the others */ @@ -216,28 +226,18 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, GL_COLOR_BUFFER_BIT, GL_NEAREST)) goto fail; - iters = tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY ? - height : depth; - - for (z = 1; z < iters; z++) { + for (z = 1; z < depth; z++) { _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, tex_image, zoffset + z); _mesa_update_state(ctx); - if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) - _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, -0, z, width, z + 1, -xoffset, yoffset, -xoffset + width, yoffset + 1, -GL_COLOR_BUFFER_BIT, GL_NEAREST); - else - _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, -0, z * image_height, -width, z * image_height + height, -xoffset, yo
[Mesa-dev] [PATCH] meta: Allow GL_UNPACK_IMAGE_HEIGHT in _mesa_meta_pbo_TexSubImage
Now that a layered source PBO is interpreted as a single tall 2D image it's quite easy to accept the image height packing option by just creating an image that is tall enough to include the image padding. This is tested by the texsubimage Piglit test with the array and pbo arguments. Previously this test was skipping this code path because it always sets the image height. --- src/mesa/drivers/common/meta_tex_subimage.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 407940a..3965d31 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -134,6 +134,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, const struct gl_pixelstore_attrib *packing) { GLuint pbo = 0, pbo_tex = 0, fbos[2] = { 0, 0 }; + int full_height, image_height; struct gl_texture_image *pbo_tex_image; GLenum status; bool success = false; @@ -167,14 +168,16 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, return true; } - /* Only accept tightly packed pixels from the user. */ - if (packing->ImageHeight != 0 && packing->ImageHeight != height) - return false; + /* For arrays, use a tall (height * depth) 2D texture but taking into +* account the inter-image padding specified with the image height packing +* property. +*/ + image_height = packing->ImageHeight == 0 ? height : packing->ImageHeight; + full_height = image_height * (depth - 1) + height; - /* For arrays, use a tall (height * depth) 2D texture. */ pbo_tex_image = create_texture_for_pbo(ctx, create_pbo, GL_PIXEL_UNPACK_BUFFER, - width, height * depth, + width, full_height, format, type, pixels, packing, &pbo, &pbo_tex); if (!pbo_tex_image) @@ -230,7 +233,8 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, GL_COLOR_BUFFER_BIT, GL_NEAREST); else _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, -0, z * height, width, (z + 1) * height, +0, z * image_height, +width, z * image_height + height, xoffset, yoffset, xoffset + width, yoffset + height, GL_COLOR_BUFFER_BIT, GL_NEAREST); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Should GL_UN/PACK_IMAGE_HEIGHT affect GL_TEXTURE_1D_ARRAY?
A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * + * Test that GL_PACK_IMAGE_HEIGHT and GL_UNPACK_IMAGE_HEIGHT affect + * retrieving a GL_TEXTURE_1D_ARRAY. Setting the image height should + * cause gaps to be introduced between the rows. + * + * Author: Neil Roberts + */ + +#include "piglit-util-gl.h" + +PIGLIT_GL_TEST_CONFIG_BEGIN + + config.supports_gl_compat_version = 11; + +PIGLIT_GL_TEST_CONFIG_END + +enum piglit_result +piglit_display(void) +{ + /* Not used */ + return PIGLIT_FAIL; +} + +static bool +test_tex_image(bool use_pbo) +{ + /* 1x2 RGBA image, but with an extra row that should be +* skipped if image height is working correctly */ + GLubyte src_tex_data[] = { + 0xff, 0x00, 0x00, 0xff, + 0x00, 0xff, 0x00, 0xff, /* padding row */ + 0x00, 0x00, 0xff, 0xff, + 0x24, 0x24, 0x24, 0xff, /* padding row */ + }; + /* Space for a 1x2 RGBA image */ + GLubyte dst_tex_data[4 * 2]; + GLuint tex; + bool pass = true; + GLuint pbo = 0; + + glGenTextures(1, &tex); + glBindTexture(GL_TEXTURE_1D_ARRAY, tex); + + if (use_pbo) { + glGenBuffers(1, &pbo); + glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo); + glBufferData(GL_PIXEL_UNPACK_BUFFER, +sizeof(src_tex_data), +src_tex_data, +GL_STATIC_DRAW); + } + + /* Set the image height. This should cause the second row of +* the source data to be skipped */ + glPixelStorei(GL_UNPACK_IMAGE_HEIGHT, 2); + glTexImage2D(GL_TEXTURE_1D_ARRAY, +0, /* level */ +GL_RGBA, /* internal format */ +1, 2, /* width/height */ +0, /* border */ +GL_RGBA, /* format */ +GL_UNSIGNED_BYTE, +use_pbo ? NULL : src_tex_data); + glPixelStorei(GL_UNPACK_IMAGE_HEIGHT, 0); + + if (use_pbo) { + glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); + glDeleteBuffers(1, &pbo); + } + + memset(dst_tex_data, 0x42, sizeof(dst_tex_data)); + + glGetTexImage(GL_TEXTURE_1D_ARRAY, + 0, /* level */ + GL_RGBA, GL_UNSIGNED_BYTE, + dst_tex_data); + + if (memcmp(dst_tex_data + 4, src_tex_data + 4 * 2, 4) == 0) { + printf("GL_UNPACK_IMAGE_HEIGHT *does* affect " + "glTexImage2D(GL_TEXTURE_1D_ARRAY)\n"); + } else if (memcmp(dst_tex_data + 4, src_tex_data + 4, 4) == 0) { + printf("GL_UNPACK_IMAGE_HEIGHT *does not* affect " + "glTexImage2D(GL_TEXTURE_1D_ARRAY)\n"); + pass = false; + } else { + printf("Unexpected values read from glGetTexImage\n"); + pass = false; + } + + return pass; +} + +static bool +test_get_tex_image(bool use_pbo) +{ + /* 1x2 RGBA image */ + GLubyte src_tex_data[] = { + 0xff, 0x00, 0x00, 0xff, + 0x00, 0xff, 0x00, 0xff + }; + /* Enough space for image padding in the destination texture */ + GLubyte dst_tex_data[4 * 4]; + GLuint tex; + bool pass = true; + GLuint pbo = 0; + + glGenTextures(1, &tex); + glBindTexture(GL_TEXTURE_1D_ARRAY, tex); + glTexImage2D(GL_TEXTURE_1D_ARRAY, +0, /* level */ +GL_RGBA, /* internal format */ +1, 2, /* width/height */ +0, /* border */ +GL_RGBA, /* format */ +GL_UNSIGNED_BYTE, +src_tex_data); + + memset(dst_tex_data, 0x42, sizeof(dst_tex_data)); + + if (use_pbo) { + glGenBuffers(1, &pbo); + glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo); + glBufferData(GL_PIXEL_PACK_BUFFER, +sizeof(dst_tex_data), +NULL, +GL_STREAM_READ); + } + + /* Set the image height. This should make the rows be padded out */ + glPixelStorei(GL_PACK_IMAGE_HEIGHT, 2); + glGetTexImage(GL_TEXTURE_1D_ARRAY, + 0, /* level */ + GL_RGBA, GL_UNSIGNED_BYTE, + use_pbo ? NULL : dst_tex_data); + glPixelStorei(GL_PACK_IMAGE_HEIGHT, 0); + + if (use_pbo) { + glGetBufferSubData(GL_PIXEL_PACK_BUFFER, + 0, /*
Re: [Mesa-dev] Should GL_UN/PACK_IMAGE_HEIGHT affect GL_TEXTURE_1D_ARRAY?
Ian Romanick writes: > IMAGE_HEIGHT is definitely used for glGetTexImage because that and > IMAGE_WIDTH allow you to get a subimage (before ARB_get_texture_sub_image). > > I guess on glTexImage2D those could let you initialize a subregion of a > larger texture... kind of like doing glTexImage2D(..., NULL) followed by > glTexSubImage2D. It does seem pretty weird. No, that's not right. The packing properties are used to describe the layout of the client-side image, not the texture, so they aren't a replacement for glGetTexSubImage or glTexSubImage?D. You can use them to download to or upload from a sub-region of a larger image in your client-side malloc'd buffer, but not a subregion of the texture. I don't think there's any question whether the IMAGE_HEIGHT property should affect glGetTexImage or glTexImage3D in general but the question I'm asking is specifically whether it should affect 1D array textures. The IMAGE_HEIGHT property is normally only used to help specify a subregion of a 3D image. Arguably you could say that a 1D array texture is like a 3D image with a width and depth but the height fixed at 1, however GL doesn't seem to treat it like that because you have to use glTexImage2D to upload it. It seems more like it is treating it as a 2D image and in that case IMAGE_HEIGHT doesn't make any sense because it is only used to create padding between the multiple images of a 3D texture. You can already use the ROW_LENGTH property to describe the padding between the 1D images of your client-side representation of the 1D texture. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/skl: Fix the maximum thread count format for the PS
According to the bspec for some reason the format of the maximum number of threads field has changed from U8-2 to U8-1 for the PS. I've run this through Piglit and it doesn't cause any regressions. --- src/mesa/drivers/dri/i965/gen8_ps_state.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c index d4a58e4..5f39e12 100644 --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c @@ -146,8 +146,13 @@ upload_ps_state(struct brw_context *brw) /* 3DSTATE_PS expects the number of threads per PSD, which is always 64; * it implicitly scales for different GT levels (which have some # of PSDs). +* +* In Gen8 the format is U8-2 whereas in Gen9 it is U8-1. */ - dw6 |= (64 - 2) << HSW_PS_MAX_THREADS_SHIFT; + if (brw->gen >= 9) + dw6 |= (64 - 1) << HSW_PS_MAX_THREADS_SHIFT; + else + dw6 |= (64 - 2) << HSW_PS_MAX_THREADS_SHIFT; if (prog_data->base.nr_params > 0) dw6 |= GEN7_PS_PUSH_CONSTANT_ENABLE; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/skl: Ignore the vertical alignment for the qpitch of 1D textures
The vertical alignment is ignored in the surface state for 1D array textures so we can tightly pack them. I've run this through Piglit and it doesn't cause any regressions. (This should probably be squashed into the patch entitled “i965/skl: Fix the qpitch value”) --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 15a0eca..2969db0 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -323,11 +323,11 @@ brw_miptree_layout_texture_array(struct brw_context *brw, brw_miptree_layout_2d(mt); if (layout_1d) { - physical_qpitch = mt->align_h; + physical_qpitch = 1; /* When using the horizontal layout the qpitch specifies the distance in * pixels between array slices. */ - mt->qpitch = physical_qpitch * mt->total_width; + mt->qpitch = mt->total_width; } else if (brw->gen >= 9) { GLenum base_format; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Ignore the vertical alignment for the qpitch of 1D textures
Ben Widawsky writes: > I guess I'd appreciate a comment about how the total_width is > guaranteed to be a multiple of 64, and therefore is a multiple of all > possible H_ALIGNS. This is required to meet the qpitch restraint in > the surface format, "This field must be set to an integer multiple of > the Surface Horizontal Alignment." My interpretation of that comment is that for 1D surfaces 64 *is* the horizontal alignment so other possible alignments don't matter. I think it's expected that total_width is aligned to the horizontal alignment, and in this case that is enforced in gen9_miptree_layout_1d. I will add a comment to make that clearer. > Note, I don't know anything about compressed textures and what the > block widths can be, but just doing the math, if block size > 16 and > not a multiple of 16, this constraint will not hold. Currently I think all of the block sizes are either 4x4 or 8x4. None of these would make any sense for a 1D texture and wouldn't make the texture any smaller. The bspec explicitly disallows compressed formats for 1D surfaces. If someone were to eventually invent a compressed format with a block height of 1 then I guess the block alignment would be handled in gen9_miptree_layout_1d in the unlikely event that the size isn't a factor of 64. Note that I'm putting off landing this patch and the other qpitch one until someone can review patch 6 in the series because without that patch the qpitch patch seems to cause a GPU hang which makes the kernel panic. Thanks for the review. Regards, - Neil pgp6HowmsfWV7.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] meta/TexSubImage: Stash everything other than PIXEL_TRANSFER/store in meta_begin
This looks good to me. It does seem a shame to save everything though when so little state affects glBlitFramebuffer. However I think even if we changed it to just save the scissor state we'd still have a problem with not saving the the GL_FRAMEBUFFER_SRGB state which I think we also need to do. I guess it's more robust to just save everything in case more state gets added one day. The meta stuff is really difficult because every time you add new state you really need to check every use of _mesa_meta_begin to decide whether that state should affect that operation or not and it's really easy to get it wrong. I think the patch could do with a small commit message mentioning that the scissor and SRGB states were broken. Either way, Reviewed-by: Neil Roberts Regards, - Neil Jason Ekstrand writes: > Cc: 10.5 > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89292 > --- > src/mesa/drivers/common/meta_tex_subimage.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/drivers/common/meta_tex_subimage.c > b/src/mesa/drivers/common/meta_tex_subimage.c > index 5ae12c8..802430a 100644 > --- a/src/mesa/drivers/common/meta_tex_subimage.c > +++ b/src/mesa/drivers/common/meta_tex_subimage.c > @@ -192,8 +192,8 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint > dims, > if (allocate_storage) >ctx->Driver.AllocTextureImageBuffer(ctx, tex_image); > > - /* Only stash the current FBO */ > - _mesa_meta_begin(ctx, 0); > + _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER | > + MESA_META_PIXEL_STORE)); > > _mesa_GenFramebuffers(2, fbos); > _mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[0]); > @@ -311,8 +311,8 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, > GLuint dims, > if (!pbo_tex_image) >return false; > > - /* Only stash the current FBO */ > - _mesa_meta_begin(ctx, 0); > + _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER | > + MESA_META_PIXEL_STORE)); > > _mesa_GenFramebuffers(2, fbos); > > -- > 2.3.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev pgp0l5y_C1LVa.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] main/base_tex_format: Properly handle STENCIL_INDEX1/4/16
Jason Ekstrand writes: > On Mon, Mar 2, 2015 at 11:33 AM, Ilia Mirkin wrote: > >> On Mon, Mar 2, 2015 at 2:32 PM, Jason Ekstrand >> wrote: >> > >> > >> > On Mon, Mar 2, 2015 at 11:18 AM, Ilia Mirkin >> wrote: >> >> >> >> Hmmm... I was just looking at this code in connection to attepmting to >> >> enable ARB_texture_stencil8, and it _seems_ like that should be if >> >> (ARB_texture_stencil8) -- I didn't see what in ARB_stencil_texturing >> >> had to do with being able to have a GL_STENCIL_INDEX* internal >> >> format... >> > >> > >> > I just pushed it because it does fix a bug... However, you do raise a >> good >> > point. Maybe we should change the i965 driver to use GL_STENCIL_INDEX >> for >> > the internal format for our internal stencil buffers. >> >> How do you end up with a GL_STENCIL_INDEX internal format in the first >> place? Without ARB_texture_stencil8 that's not a thing, is it? >> > > We're using it internally for doing stencil blits. I'm going to Cc Neil > and he can explain what's going on there better than I can as he was the > one who wrote most of that code. Do you mean the code to do the meta stencil blit? I haven't touched that. Maybe I can pass the buck on to Topi :) Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965/skl: Disable SIMD16 when 3-source instructions are used
Stepping C0 of Skylake fails when using SIMD16 with 3-source instructions (such as MAD). This patch just makes it disable SIMD16 in that case. This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit tests. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 21 + src/mesa/drivers/dri/i965/brw_fs.h | 1 + 2 files changed, 22 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 533feb4..62f29e5 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3729,6 +3729,26 @@ fs_visitor::fixup_3src_null_dest() } } +/** + * 3-source instructions don't work on Skylake stepping C0. If the revision + * isn't known then it will take the safe option and disable it in case it's + * C0. Implements WaDisableSIMD16On3SrcInstr. + */ +void +fs_visitor::disable_simd16_with_3src() +{ + if (brw->gen != 9 || (brw->revision != 2 && brw->revision != -1)) + return; + + foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { + if (inst->is_3src()) { + no16("SIMD16 doesn't work with 3-source instructions on " + "Gen9 stepping C0\n"); + return; + } + } +} + void fs_visitor::allocate_registers() { @@ -3910,6 +3930,7 @@ fs_visitor::run_fs() assign_urb_setup(); fixup_3src_null_dest(); + disable_simd16_with_3src(); allocate_registers(); if (failed) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 70098d8..7091b9f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -196,6 +196,7 @@ public: void setup_payload_gen6(); void setup_vs_payload(); void fixup_3src_null_dest(); + void disable_simd16_with_3src(); void assign_curb_setup(); void calculate_urb_setup(); void assign_urb_setup(); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965: Store the GPU revision number in brw_context
brwContextInit now queries the GPU revision number via a new parameter for DRM_I915_GETPARAM. This new parameter requires a kernel patch and a patch to libdrm. If the kernel doesn't support it then it will continue but set the revision number to -1. The intention is to use this to implement workarounds that are only needed on certain steppings of the GPU. --- configure.ac| 2 +- src/mesa/drivers/dri/i965/brw_context.c | 19 +++ src/mesa/drivers/dri/i965/brw_context.h | 4 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index c5abbfd..23ac927 100644 --- a/configure.ac +++ b/configure.ac @@ -61,7 +61,7 @@ AC_SUBST([OSMESA_VERSION]) dnl Versions for external dependencies LIBDRM_REQUIRED=2.4.38 LIBDRM_RADEON_REQUIRED=2.4.56 -LIBDRM_INTEL_REQUIRED=2.4.52 +LIBDRM_INTEL_REQUIRED=2.4.59 LIBDRM_NVVIEUX_REQUIRED=2.4.33 LIBDRM_NOUVEAU_REQUIRED="2.4.33 libdrm >= 2.4.41" LIBDRM_FREEDRENO_REQUIRED=2.4.57 diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 786e6f5..18f72d8 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -651,6 +651,24 @@ brw_process_driconf_options(struct brw_context *brw) driQueryOptionb(options, "allow_glsl_extension_directive_midshader"); } +static int +brw_get_revision(int fd) +{ + struct drm_i915_getparam gp; + int revision; + int ret; + + memset(&gp, 0, sizeof(gp)); + gp.param = I915_PARAM_REVISION; + gp.value = &revision; + + ret = drmCommandWriteRead(fd, DRM_I915_GETPARAM, &gp, sizeof(gp)); + if (ret) + revision = -1; + + return revision; +} + GLboolean brwCreateContext(gl_api api, const struct gl_config *mesaVis, @@ -709,6 +727,7 @@ brwCreateContext(gl_api api, brw->has_negative_rhw_bug = devinfo->has_negative_rhw_bug; brw->needs_unlit_centroid_workaround = devinfo->needs_unlit_centroid_workaround; + brw->revision = brw_get_revision(sPriv->fd); brw->must_use_separate_stencil = screen->hw_must_use_separate_stencil; brw->has_swizzling = screen->hw_has_swizzling; diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 682fbe9..10e8274 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1064,6 +1064,10 @@ struct brw_context int gen; int gt; + /* GT revision. This will be -1 if the revision couldn't be determined (eg, +* if the kernel doesn't support the query). +*/ + int revision; bool is_g4x; bool is_baytrail; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965/skl: Disable SIMD16 when 3-source instructions are used
Ilia Mirkin writes: > On Wed, Mar 4, 2015 at 9:33 AM, Neil Roberts wrote: >> Stepping C0 of Skylake fails when using SIMD16 with 3-source >> instructions (such as MAD). This patch just makes it disable SIMD16 in >> that case. >> >> This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit >> tests. > > Just curious -- is this faster than lowering MAD into MUL + ADD but > staying in SIMD16? (Should be fairly easy to implement as a post-RA > fixup I'd think, although not totally familiar with how the i965 > compiler works.) I haven't tested it but I think you're right that it would probably be faster to lower the instruction. However that sounds like more work and it's only relevant for a limited set of hardware so it doesn't seem worth doing. I think there are also other 3-source instructions and it might be more difficult to lower those (although I'm not sure). Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] i965/skl: Disable SIMD16 when 3-source instructions are used
Steppings C0 and D0 of Skylake fail when using SIMD16 with 3-source instructions (such as MAD). This patch just makes it disable SIMD16 in those cases. This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit tests. v2: Also apply on stepping D0 --- Damien Lespiau pointed out that the workaround also needs to be applied on stepping D0. src/mesa/drivers/dri/i965/brw_fs.cpp | 22 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + 2 files changed, 23 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 533feb4..b61208b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3729,6 +3729,27 @@ fs_visitor::fixup_3src_null_dest() } } +/** + * 3-source instructions don't work on Skylake steppings C0 and D0. If the + * revision isn't known then it will take the safe option and disable it in + * case it's one of those. Implements WaDisableSIMD16On3SrcInstr. + */ +void +fs_visitor::disable_simd16_with_3src() +{ + if (brw->gen != 9 || + (brw->revision != 2 && brw->revision != 3 && brw->revision != -1)) + return; + + foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { + if (inst->is_3src()) { + no16("SIMD16 doesn't work with 3-source instructions on " + "Gen9 steppings C0 and D0\n"); + return; + } + } +} + void fs_visitor::allocate_registers() { @@ -3910,6 +3931,7 @@ fs_visitor::run_fs() assign_urb_setup(); fixup_3src_null_dest(); + disable_simd16_with_3src(); allocate_registers(); if (failed) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 70098d8..7091b9f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -196,6 +196,7 @@ public: void setup_payload_gen6(); void setup_vs_payload(); void fixup_3src_null_dest(); + void disable_simd16_with_3src(); void assign_curb_setup(); void calculate_urb_setup(); void assign_urb_setup(); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] meta: Fix the y offset for 1D_ARRAY in _mesa_meta_pbo_TexSubImage
The yoffset needs to be interpreted as a slice offset for 1D array textures. This patch implements that by moving the yoffset into zoffset similar to how it moves the height into depth. --- src/mesa/drivers/common/meta_tex_subimage.c | 8 1 file changed, 8 insertions(+) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index f360d64..e29addb 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -205,8 +205,12 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) { assert(depth == 1); + assert(zoffset == 0); depth = height; height = 1; + image_height = 1; + zoffset = yoffset; + yoffset = 0; } _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, @@ -322,8 +326,12 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) { assert(depth == 1); + assert(zoffset == 0); depth = height; height = 1; + image_height = 1; + zoffset = yoffset; + yoffset = 0; } /* If we were given a texture, bind it to the read framebuffer. If not, -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] meta: Allow GL_UN/PACK_IMAGE_HEIGHT in _mesa_meta_pbo_Get/TexSubImage
Now that a layered source PBO is interpreted as a single tall 2D image it's quite easy to accept the image height packing option by just creating an image that is tall enough to include the image padding. I'm not sure whether the image height property should affect 1D_ARRAY textures. My intuition and interpretation of the GL spec (which is a bit vague) would be that it shouldn't. However the software fallback path in Mesa uses the property for packing but not for unpacking. The binary NVidia driver uses it for both. This patch doesn't use it for either case so it is different from the software fallback. There is some discussion about this here: http://lists.freedesktop.org/archives/mesa-dev/2015-February/077925.html This is tested by the texsubimage Piglit test with the array and pbo arguments. Previously this test was skipping this code path because it always sets the image height. I've also tested it by modifying the getteximage-targets test. It wasn't using this code path before because it was using the default texture object so this code couldn't successfully create a frame buffer. I also modified it to add some image padding with the image height in the PBO. --- src/mesa/drivers/common/meta_tex_subimage.c | 32 ++--- 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 1fef79d..f360d64 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -144,6 +144,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, const struct gl_pixelstore_attrib *packing) { GLuint pbo = 0, pbo_tex = 0, fbos[2] = { 0, 0 }; + int full_height, image_height; struct gl_texture_image *pbo_tex_image; GLenum status; bool success = false; @@ -177,14 +178,16 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, return true; } - /* Only accept tightly packed pixels from the user. */ - if (packing->ImageHeight != 0 && packing->ImageHeight != height) - return false; + /* For arrays, use a tall (height * depth) 2D texture but taking into +* account the inter-image padding specified with the image height packing +* property. +*/ + image_height = packing->ImageHeight == 0 ? height : packing->ImageHeight; + full_height = image_height * (depth - 1) + height; - /* For arrays, use a tall (height * depth) 2D texture. */ pbo_tex_image = create_texture_for_pbo(ctx, create_pbo, GL_PIXEL_UNPACK_BUFFER, - width, height * depth, + width, full_height, format, type, pixels, packing, &pbo, &pbo_tex); if (!pbo_tex_image) @@ -236,7 +239,8 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, _mesa_update_state(ctx); _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, - 0, z * height, width, (z + 1) * height, + 0, z * image_height, + width, z * image_height + height, xoffset, yoffset, xoffset + width, yoffset + height, GL_COLOR_BUFFER_BIT, GL_NEAREST); @@ -263,6 +267,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, const struct gl_pixelstore_attrib *packing) { GLuint pbo = 0, pbo_tex = 0, fbos[2] = { 0, 0 }; + int full_height, image_height; struct gl_texture_image *pbo_tex_image; GLenum status; bool success = false; @@ -296,13 +301,15 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, return true; } - /* Only accept tightly packed pixels from the user. */ - if (packing->ImageHeight != 0 && packing->ImageHeight != height) - return false; + /* For arrays, use a tall (height * depth) 2D texture but taking into +* account the inter-image padding specified with the image height packing +* property. +*/ + image_height = packing->ImageHeight == 0 ? height : packing->ImageHeight; + full_height = image_height * (depth - 1) + height; - /* For arrays, use a tall (height * depth) 2D texture. */ pbo_tex_image = create_texture_for_pbo(ctx, false, GL_PIXEL_PACK_BUFFER, - width, height * depth, + width, full_height * depth, format, type, pixels, packing, &pbo, &pbo_tex); if (!pbo_tex_image) @@ -361,7 +368,8 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_meta_BlitFramebuf
[Mesa-dev] [PATCH 1/3] Revert "common: Fix PBOs for 1D_ARRAY."
This reverts commit 546aba143d13ba3f993ead4cc30b2404abfc0202. I think the changes to the calls to glBlitFramebuffer from this patch are no different to what it was doing previously because it used to set height to 1 before doing the blits. However it was introducing some problems with the blit for layer 0 because this was no longer special cased. It didn't fix problems with the yoffset which needs to be interpreted as a slice offset. I think a better solution would be to modify the original if statement to cope with the yoffset. Conflicts: src/mesa/drivers/common/meta_tex_subimage.c --- src/mesa/drivers/common/meta_tex_subimage.c | 62 - 1 file changed, 26 insertions(+), 36 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 971ed59..1fef79d 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -147,7 +147,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, struct gl_texture_image *pbo_tex_image; GLenum status; bool success = false; - int z, iters; + int z; /* XXX: This should probably be passed in from somewhere */ const char *where = "_mesa_meta_pbo_TexSubImage"; @@ -200,6 +200,12 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, _mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[0]); _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbos[1]); + if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) { + assert(depth == 1); + depth = height; + height = 1; + } + _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, pbo_tex_image, 0); /* If this passes on the first layer it should pass on the others */ @@ -223,27 +229,17 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, GL_COLOR_BUFFER_BIT, GL_NEAREST)) goto fail; - iters = tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY ? - height : depth; - - for (z = 1; z < iters; z++) { + for (z = 1; z < depth; z++) { _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, tex_image, zoffset + z); _mesa_update_state(ctx); - if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) - _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, -0, z, width, z + 1, -xoffset, yoffset, -xoffset + width, yoffset + 1, -GL_COLOR_BUFFER_BIT, GL_NEAREST); - else - _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, -0, z * height, width, (z + 1) * height, -xoffset, yoffset, -xoffset + width, yoffset + height, -GL_COLOR_BUFFER_BIT, GL_NEAREST); + _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, + 0, z * height, width, (z + 1) * height, + xoffset, yoffset, + xoffset + width, yoffset + height, + GL_COLOR_BUFFER_BIT, GL_NEAREST); } success = true; @@ -270,7 +266,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, struct gl_texture_image *pbo_tex_image; GLenum status; bool success = false; - int z, iters; + int z; /* XXX: This should probably be passed in from somewhere */ const char *where = "_mesa_meta_pbo_GetTexSubImage"; @@ -317,6 +313,12 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_GenFramebuffers(2, fbos); + if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) { + assert(depth == 1); + depth = height; + height = 1; + } + /* If we were given a texture, bind it to the read framebuffer. If not, * we're doing a ReadPixels and we should just use whatever framebuffer * the client has bound. @@ -350,29 +352,17 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, GL_COLOR_BUFFER_BIT, GL_NEAREST)) goto fail; - if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) - iters = height; - else - iters = depth; - - for (z = 1; z < iters; z++) { + for (z = 1; z < depth; z++) { _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, tex_image, zoffset + z); _mesa_update_state(ctx); - if (tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) - _mesa_meta_BlitFramebuffer(ctx, ctx->ReadBuffer, ctx->DrawBuffer, -xoffset, yoffset, -
Re: [Mesa-dev] [PATCH 2/2] i965/skl: Break down SIMD16 3-source instructions when required.
Yes, I like this approach much better. I ran it through Piglit and I can confirm it fixes the same tests as my patch. Reviewed-by: Neil Roberts There's no need to reset the author to me. Thanks for looking at this. Regards, - Neil Kenneth Graunke writes: > Several steppings of Skylake fail when using SIMD16 with 3-source > instructions (such as MAD). > > This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit > tests. > > Based on a patch by Neil Roberts. > > Signed-off-by: Kenneth Graunke > Cc: Neil Roberts > Cc: Matt Turner > --- > src/mesa/drivers/dri/i965/brw_shader.cpp | 6 ++ > 1 file changed, 6 insertions(+) > > Neil, what do you think of this approach? It's a bit smaller of a hammer > than turning off SIMD16 altogether, and pretty simple. I haven't tested > it at all, though. > > Feel free to --reset-author and claim authorship on this patch - it's > really your code, I just moved it over a bit. > > diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp > b/src/mesa/drivers/dri/i965/brw_shader.cpp > index fed4ba3..74c0e50 100644 > --- a/src/mesa/drivers/dri/i965/brw_shader.cpp > +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp > @@ -400,6 +400,12 @@ bool > brw_instruction_supports_simd16(const struct brw_context *brw, enum opcode > op) > { > bool supports_3src = brw->is_haswell || brw->gen >= 8; > + /* WaDisableSIMD16On3SrcInstr: 3-source instructions don't work in SIMD16 > +* on a few steppings of Skylake. > +*/ > + if (brw->gen == 9 && (brw->revision == 2 || brw->revision == 3 || > + brw->revision == -1)) > + supports_3src = false; > > switch (op) { > case BRW_OPCODE_MAD: > -- > 2.2.2 pgpZro1cTftj9.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev