date:20200402

Re: [Mesa-dev] [ANNOUNCE] mesa 20.0.3

2020-04-02 Thread Danylo Piliaiev


"spirv: Implement OpCopyObject and OpCopyLogical as blind copies" was reverted 
yesterday due to the failures in several dEQP-VK tests, see:
https://gitlab.freedesktop.org/mesa/mesa/-/commit/68f325b256d96dca923f6c7d84bc6faf43911245
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4375

I'm not sure whether it's already known or how important it is, but better
to say it than not.


On 02.04.20 00:52, Eric Engestrom wrote:

Hi all,

I'd like to announce the release of Mesa 20.0.3.

Quite a busy cycle again, with fixes all over the tree, but nothing
extraordinary; mostly AMD (radv, aco), NIR and Intel (isl, anv), as
expected.

Cheers,
   Eric

---

Git shortlog


Caio Marcelo de Oliveira Filho (1):
   mesa/main: Fix overflow in validation of DispatchComputeGroupSizeARB

Dylan Baker (6):
   docs/relnotes: Add sha256 sums for 20.0.2
   .pick_status.json: Update to cf62c2b2ac69637785f55b790fdd601c17e7e9d5
   .pick_status.json: Mark 672d10619980687acec329742f055f7f3796c1b8 as backported
   .pick_status.json: Mark c923de68dd0ab10a5a5fb3196f539707d046d897 as backported
   .pick_status.json: Mark 56de6f698e3f164d97f132203e8159ef0b8e9bb8 as denominated
   .pick_status.json: Update to aee004a7c8900938d1c17f0ac299d40001b383b0

Eric Engestrom (8):
   .pick_status.json: Update to 3252041a7872c49e53bb02ffe8b079b5fc43f15e
   .pick_status.json: Update to 12711939320e4fcd3a0d86af22da1042ad92035f
   .pick_status.json: Update to 05069e1f0794aadd40ce9269f858e50c64254388
   .pick_status.json: Update to 8970b7839aebefa7207c9535ac34ab4e8cc0ae25
   .pick_status.json: Update to 5f4d9b419a1c931ad468b8b22b8a95b1216891e4
   .pick_status.json: Update to 70ac7f5b0c46370075a35067c9f7dfe78e84b16d
   docs: add release notes for 20.0.3
   VERSION: bump to 20.0.3

Erik Faye-Lund (3):
   rbug: do not return void-value
   pipebuffer: clean up cast-warnings
   vtn/opencl: fully enable OpenCLstd_Clz

Francisco Jerez (1):
   intel/fs/gen12: Fix interaction of SWSB dependency combination with EU fusion workaround.

Greg V (1):
   amd/addrlib: fix build on non-x86 platforms

Ian Romanick (2):
   soft-fp64/fsat: Correctly handle NaN
   soft-fp64: Split a block that was missing a cast on a comparison

Jason Ekstrand (5):
   intel/blorp: Add support for swizzling fast-clear colors
   anv: Swizzle fast-clear values
   nir/lower_int64: Lower 8 and 16-bit downcasts with nir_lower_mov64
   anv: Account for the header in anv_state_stream_alloc
   spirv: Implement OpCopyObject and OpCopyLogical as blind copies

John Stultz (2):
   gallium: hud_context: Fix scalar initializer warning.
   vc4_bufmgr: Remove duplicative VC definition

Jordan Justen (2):
   intel: Update TGL PCI strings
   intel: Add TGL PCI ID

Lionel Landwerlin (5):
   isl: implement linear tiling row pitch requirement for display
   isl: properly filter supported display modifiers on Gen9+
   isl: only apply main surface ccs pitch constraint with CCS
   isl: drop min row pitch alignment when set by the driver
   intel: add new TGL pci ids

Marek Olšák (3):
   nir: fix clip/cull_distance_array_size in nir_lower_clip_cull_distance_arrays
   ac: fix fast division
   st/mesa: fix use of uninitialized memory due to st_nir_lower_builtin

Marek Vasut (1):
   etnaviv: Emit PE.ALPHA_COLOR_EXT* on GPUs with half-float support

Neil Armstrong (1):
   Revert "ci: Remove T820 from CI temporarily"

Pierre-Eric Pelloux-Prayer (1):
   st/mesa: disallow deferred flush if there are multiple contexts

Rhys Perry (11):
   nir/gather_info: handle emit_vertex_with_counter
   aco: set has_divergent_branch for discards in loops
   aco: handle missing second predecessors at merge block phis
   aco: skip NIR in unreachable merge blocks
   aco: improve check for unreachable loop continue blocks
   aco: emit IR in IF's merge block instead if the other side ends in a jump
   aco: fix boolean undef regclass
   nir/gather_info: fix per-vertex handling in try_mask_partial_io
   aco: implement 64-bit VGPR constant copies in handle_operands()
   glsl: fix race in instance getters
   util/u_queue: fix race in total_jobs_size access

Rob Clark (2):
   freedreno/ir3/ra: fix array liveranges
   util: fix u_fifo_pop()

Samuel Pitoiset (7):
   radv/gfx10: fix required subgroup size with VK_EXT_subgroup_size_control
   radv/gfx10: fix required ballot size with VK_EXT_subgroup_size_control
   radv: fix optional pSizes parameter when binding streamout buffers
   radv: enable VK_KHR_8bit_storage on GFX6-GFX7
   ac/nir: use llvm.amdgcn.rcp for nir_op_frcp
   ac/nir: use llvm.amdgcn.rsq for nir_op_frsq
   ac/nir: use llvm.amdgcn.rcp in ac_build_fdiv()

Tapani Pälli (1):
   glsl: set error_emitted true if type not ok for assignment

Thomas Hellstrom (1):
   svg

Re: [Mesa-dev] [ANNOUNCE] mesa 20.0.3

2020-04-02 Thread Samuel Pitoiset


Good catch!

Yes, please revert it asap, it breaks a bunch of things ... :(

On 4/2/20 11:11 AM, Danylo Piliaiev wrote:

"spirv: Implement OpCopyObject and OpCopyLogical as blind copies" was reverted 
yesterday due to the failures in several dEQP-VK tests, see:
https://gitlab.freedesktop.org/mesa/mesa/-/commit/68f325b256d96dca923f6c7d84bc6faf43911245
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4375
I'm not sure if it's already known or how important it is, but I'd better say 
it than not.

Re: [Mesa-dev] nir: find_msb vs clz

2020-04-02 Thread Daniel Schürmann


W.r.t. AMD hardware, it seems to map quite exactly to the DX behavior,
so given that, I would of course favor a NIR instruction which maps to that :)

Small overview (scalar and vector):
S_FLBIT_I32_{B32,B64} - D = the number of zeros before the first one starting from the MSB. Returns -1 if none.
S_FLBIT_I32{_I64} - Count how many bits in a row (from MSB to LSB) are the same as the sign bit. Returns -1 if the input is zero or all 1's (-1).
V_FFBH_U32 - D.u = position of first 1 in S0 from MSB; D = 0xffffffff if S0 == 0.
V_FFBH_I32 - D.u = position of first bit different from sign bit in S0 from MSB; D = 0xffffffff if S0 == 0 or 0xffffffff.


So, currently we emit a subtraction and a bcsel every time we encounter
*find_msb. For DX->VK, I haven't checked yet how they map, but we could
probably also add the reverse optimization, like we do for bfm/bfe and such.
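A rough C model of the cost Daniel describes, for illustration only: it treats V_FFBH_U32 as summarised above (leading-zero count from the MSB, 0xffffffff for a zero input) and shows why emitting ufind_msb on top of it takes a subtraction plus a select. The function names are hypothetical and this is not ACO's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of V_FFBH_U32 as summarised above:
 * number of zero bits before the first 1, counted from the MSB,
 * or 0xffffffff when the source is 0. */
static uint32_t model_v_ffbh_u32(uint32_t src)
{
   if (src == 0)
      return 0xffffffffu;
   uint32_t count = 0;
   for (uint32_t mask = 0x80000000u; !(src & mask); mask >>= 1)
      count++;
   return count;
}

/* nir_op_ufind_msb semantics (bit index counted from the LSB, -1 for 0)
 * built on the hardware op. In 32-bit arithmetic 31 - 0xffffffff is 32,
 * not -1, so the zero case needs its own select (the bcsel). */
static int32_t ufind_msb_via_ffbh(uint32_t x)
{
   uint32_t ffbh = model_v_ffbh_u32(x);
   return ffbh == 0xffffffffu ? -1                  /* bcsel on zero */
                              : 31 - (int32_t)ffbh; /* isub */
}
```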


@Jason, take the above as naming suggestions :P

Daniel

On 1/4/20 22:52, Jason Ekstrand  wrote:


On Wed, Apr 1, 2020 at 1:52 PM Eric Anholt  wrote:

On Wed, Apr 1, 2020 at 11:39 AM Erik Faye-Lund
 wrote:

While working on the NIR to DXIL conversion code for D3D12, I've
noticed that we're not exactly doing the best we could here.

First some background:

NIR currently has a few instructions that do kinda the same thing:

1. nir_op_ufind_msb: Finds the index of the most significant bit,
counting from the least significant bit. It returns -1 on zero-input.

2. nir_op_ifind_msb: A signed version of ufind_msb; looks for the first
non sign-bit. It's not terribly interesting in this context, as it can
be trivially lowered if missing, and it doesn't seem like any hardware
supports this natively. I'm just mentioning it for completeness.

3. nir_op_uclz: Counts the number of leading zeroes, counting from the
most significant bit. It returns 32 on zero-input, and only exists in an
unsigned 32-bit variant.
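For reference, the three opcodes can be modelled in C for 32-bit values. This is a sketch of the semantics as described in this mail (plus the usual GLSL findMSB behaviour for negative inputs in the signed case), not NIR's actual constant-folding code; the ref_* names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* nir_op_ufind_msb: index of the highest set bit, counted from bit 0;
 * -1 for a zero input. */
static int32_t ref_ufind_msb(uint32_t x)
{
   for (int32_t bit = 31; bit >= 0; bit--) {
      if (x & (1u << bit))
         return bit;
   }
   return -1;
}

/* nir_op_ifind_msb: highest bit that differs from the sign bit;
 * -1 for 0 and for -1 (all ones). */
static int32_t ref_ifind_msb(int32_t x)
{
   uint32_t u = (uint32_t)x;
   /* Inverting a negative value turns "first bit that differs from the
    * sign bit" into "first set bit". */
   return ref_ufind_msb(x < 0 ? ~u : u);
}

/* nir_op_uclz: number of leading zero bits counted from bit 31;
 * 32 for a zero input. */
static uint32_t ref_uclz(uint32_t x)
{
   uint32_t n = 0;
   for (uint32_t mask = 0x80000000u; mask != 0 && !(x & mask); mask >>= 1)
      n++;
   return n;
}
```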

ufind_msb is kinda the O.G. here; uclz was recently added, and as far
as I can see is only used in an Intel-specific SPIR-V instruction.

Additionally, there's the OpenCLstd_Clz SPIR-V instruction, which we
lower to ufind_msb using nir_clz_u(), regardless of whether the backend
supports nir_op_uclz or not.

It seems only nouveau's NV50 backend actually wants ufind_msb;
everything else seems to convert ufind_msb to some clz-variant while
emitting code. Some have to special-case zero-input, and some
not...

All of this is not really awesome in my eyes.

So, while adding support for DXIL, I need to figure out how to map
these (well, ufind_msb at least) onto the DXIL intrinsics. DXIL doesn't
have a ufind_msb, but it has a firstbit_hi that is identical to
nir_op_uclz... except that it returns -1 on zero-input :(

For now, I'm lowering ufind_msb to something ufind_msb while emitting
code, like everyone else. But this feels a bit dirty, *especially*
since we have a clz-instruction that *almost* fits. And since we're
targeting OpenCL, which uses clz as its primitive, we end up doing
32 - (32 - x), and since that inner isub happens while emitting, we can't
easily optimize it away without introducing an optimizing backend...
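The redundancy can be sketched in C. All of the following are assumptions made for illustration: firstbit_hi is modelled from the description above (leading zeros from the MSB, -1 for zero), the frontend computes OpenCL clz as 31 - ufind_msb(x), and the backend emits ufind_msb as a subtraction plus a zero-case select. Composed, the two subtractions cancel and only a select remains, but a pass can only see that if both halves live in the same IR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of DXIL firstbit_hi as described in the mail:
 * leading-zero count from the MSB, but -1 (not 32) for a zero input. */
static int32_t model_firstbit_hi(uint32_t x)
{
   if (x == 0)
      return -1;
   int32_t n = 0;
   while (!(x & 0x80000000u)) {
      x <<= 1;
      n++;
   }
   return n;
}

/* Backend step: ufind_msb emitted on top of firstbit_hi (isub + select). */
static int32_t emit_ufind_msb(uint32_t x)
{
   int32_t fbh = model_firstbit_hi(x);
   return fbh == -1 ? -1 : 31 - fbh;
}

/* Frontend step (assumed): OpenCL clz written as 31 - ufind_msb(x),
 * which yields 32 for x == 0. */
static int32_t opencl_clz_chain(uint32_t x)
{
   return 31 - emit_ufind_msb(x);
}

/* What the chain folds to once both subtractions are visible together:
 * a single select that only patches the zero case. */
static int32_t opencl_clz_folded(uint32_t x)
{
   int32_t fbh = model_firstbit_hi(x);
   return fbh == -1 ? 32 : fbh;
}
```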

The solution seems obvious; use nir_op_uclz instead.

But that's also a bit annoying, for a few reasons:

1. Only *one* backend actually implements support for it. So this
either means a lot of work, or making it an opt-in feature somehow.

That's likely fairly easily fixed.  That said, making it an optional
feature is also easy.  Add lowering in nir_opt_algebraic.py hidden
behind a lower_ufind_msb_to_clz flag.  If setting that flag on Intel
doesn't hurt shader-db (I think our back-end lowering may be slightly
more efficient), we'll set it and delete a pile of code.
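A minimal sketch of what such a lowering would compute, assuming a native clz that returns the bit width (32) for zero, like nir_op_uclz and Intel's LZD. The lower_ufind_msb_to_clz flag is only the name proposed above, and clz32 here is a hypothetical stand-in for the hardware op. Because clz32(0) == 32, ufind_msb(x) becomes a single subtraction with no select.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a native clz that, like nir_op_uclz,
 * returns 32 for a zero input. */
static uint32_t clz32(uint32_t x)
{
   uint32_t n = 0;
   while (n < 32 && !(x & (0x80000000u >> n)))
      n++;
   return n;
}

/* ufind_msb lowered to clz: one subtraction and no select. For x == 0,
 * clz32 gives 32 and 31 - 32 == -1, which is exactly ufind_msb's defined
 * result for a zero input. */
static int32_t ufind_msb_lowered(uint32_t x)
{
   return 31 - (int32_t)clz32(x);
}
```

With an FBH-style op that returns -1 for zero instead, the same lowering needs an extra select, which is one reason the bit-width-for-zero convention is the cheaper one to build on.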


2. We would probably have to support lowering in either direction to
support what all hardware prefers.

I suspect that virtually everyone who has an instruction for this in
hardware has one that supports returning the bit-width for 0.  There's
an interesting Wikipedia page on this:

https://en.wikipedia.org/wiki/Find_first_set

According to the table there, virtually all CPUs that implement this
return the bit-width for 0 except for the old way to do it on Intel.
Since this is also what's defined for OpenCL, that's what we're likely
to see on mobile.  Intel has instructions for both and I would guess
AMD and Nvidia do as well since they care a lot about D3D.
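On the CPU side there is a practical wrinkle: GCC and Clang document __builtin_clz as undefined for 0, so C code that wants the OpenCL-style "bit width for zero" behaviour has to spell it out. A small sketch, assuming a compiler with these builtins and a 32-bit unsigned int:

```c
#include <assert.h>
#include <stdint.h>

_Static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* OpenCL-style clz: defined as 32 for a zero input. The explicit check is
 * needed because __builtin_clz(0) is undefined in GCC and Clang. */
static uint32_t clz32_defined(uint32_t x)
{
   return x == 0 ? 32u : (uint32_t)__builtin_clz(x);
}
```

On targets whose instruction already returns 32 for zero (x86 LZCNT, for example), an optimizing compiler may be able to fold the check away.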


3. That zero-case still needs special treatment in several backends, it
seems. We could alternatively declare that nir_op_uclz is undefined for
zero-input, and handle this when lowering...?

4. It seems some (Intel?) hardware only supports 32-bit clz, so we
would have to lower to something else for other bit-sizes. That's not
too hard, though.
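Point 4 is mostly mechanical. A sketch of how a backend with only a 32-bit clz (returning 32 for zero) might handle other bit sizes; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the only native op: 32-bit clz returning 32 for zero. */
static uint32_t clz32(uint32_t x)
{
   uint32_t n = 0;
   while (n < 32 && !(x & (0x80000000u >> n)))
      n++;
   return n;
}

/* 64-bit clz from two 32-bit clz ops: count the high half, and only if it
 * is entirely zero add the count of the low half. Gives 64 for zero. */
static uint32_t clz64_from_32(uint64_t x)
{
   uint32_t hi = (uint32_t)(x >> 32);
   uint32_t lo = (uint32_t)x;
   return hi != 0 ? clz32(hi) : 32u + clz32(lo);
}

/* 16-bit clz: zero-extend to 32 bits, then drop the 16 extra leading
 * zeros that the extension introduced. Gives 16 for zero. */
static uint32_t clz16_from_32(uint16_t x)
{
   return clz32(x) - 16u;
}
```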

On Intel, we have two instructions for this: FBH which returns -1 for
0 and LZD which returns 32 for 0.  Both count leading zeros from the
MSB side.  We don't have native hardware support for computing from
the LSB side.  And, yeah, we can only do it on 32-bit types so that
sucks a bit.


So yeah...

I

Re: [Mesa-dev] nir: find_msb vs clz

2020-04-02 Thread Jason Ekstrand

On Thu, Apr 2, 2020 at 7:23 AM Daniel Schürmann  wrote:
>
> W.r.t. AMD hardware, it seems to map quite exactly to the DX behavior,
> so given that, I would of course favor a NIR instruction which maps to that :)

I'm a bit surprised.  I thought AMD HW usually had 25 opcodes whenever
3 were needed just so you have options. :-P

> Small overview (scalar and vector):
> S_FLBIT_I32_{B32,B64} - D = the number of zeros before the first one starting from the MSB. Returns -1 if none.
> S_FLBIT_I32{_I64} - Count how many bits in a row (from MSB to LSB) are the same as the sign bit. Returns -1 if the input is zero or all 1's (-1).
> V_FFBH_U32 - D.u = position of first 1 in S0 from MSB; D = 0xffffffff if S0 == 0.
> V_FFBH_I32 - D.u = position of first bit different from sign bit in S0 from MSB; D = 0xffffffff if S0 == 0 or 0xffffffff.
>
> So, currently we emit a subtraction and bcsel every time we encounter 
> *find_msb. For DX->VK, I didn't check yet how they map,
> but we could probably also add the reverse optimization, like we do for 
> bfm/bfe and such.

As I said in my mail, I'm very happy to have both.  Intel can do both
so we'd rather do the optimal thing for DX when we're running an app
through DXVK and do the optimal thing for CL-style compute when
running an app ported from CL.  That said, only one app in our shader
database uses the GLSL findMSB, and it uses it exactly twice, so
there is that.

> @Jason, take the above as naming suggestions :P

I believe I did suggest FBH as a name for something that... isn't
AMD's FFBH. :-P

--Jason


Re: [Mesa-dev] nir: find_msb vs clz

2020-04-02 Thread Ian Romanick

On 4/1/20 11:39 AM, Erik Faye-Lund wrote:
> While working on the NIR to DXIL conversion code for D3D12, I've
> noticed that we're not exactly doing the best we could here.
> 
> First some background:
> 
> NIR currently has a few instructions that do kinda the same thing:
> 
> 1. nir_op_ufind_msb: Finds the index of the most significant bit,
> counting from the least significant bit. It returns -1 on zero-input.
> 
> 2. nir_op_ifind_msb: A signed version of ufind_msb; looks for the first
> non sign-bit. It's not terribly interesting in this context, as it can
> be trivially lowered if missing, and it doesn't seem like any hardware
> supports this natively. I'm just mentioning it for completeness.

These instructions map almost directly to GLSL findMSB().

> 3. nir_op_uclz: Counts the number of leading zeroes, counting from the
> most significant bit. It returns 32 on zero-input, and only exists in an
> unsigned 32-bit variant.
> 
> ufind_msb is kinda the O.G. here; uclz was recently added, and as far
> as I can see is only used in an Intel-specific SPIR-V instruction.
> 
> Additionally, there's the OpenCLstd_Clz SPIR-V instruction, which we
> lower to ufind_msb using nir_clz_u(), regardless of whether the backend
> supports nir_op_uclz or not.

That extension (mostly) brings a handful of OpenCL instructions to
graphics.  The only outlier is the instruction for 32-bit × 16-bit
multiplication.

> It seems only nouveau's NV50 backend actually wants ufind_msb;
> everything else seems to convert ufind_msb to some clz-variant while
> emitting code. Some have to special-case zero-input, and some
> not...
> 
> All of this is not really awesome in my eyes.
> 
> So, while adding support for DXIL, I need to figure out how to map
> these (well, ufind_msb at least) onto the DXIL intrinsics. DXIL doesn't
> have a ufind_msb, but it has a firstbit_hi that is identical to
> nir_op_uclz... except that it returns -1 on zero-input :(

Here's the first question you should be asking: how often does this
occur in real shaders?  Is it worth caring about generating the optimal
thing?  As Jason pointed out in a different message, the GLSL findMSB
functions seem to occur within epsilon of never.  If the same is true
for the DXIL instructions, is it worth spending any effort?

Honestly, until there is a known user, I would just do the thing that
has the least impact and move on.  Cue the old quote about premature
optimization...

> For now, I'm lowering ufind_msb to something ufind_msb while emitting
> code, like everyone else. But this feels a bit dirty, *especially*
> since we have a clz-instruction that *almost* fits. And since we're
> targetting OpenCL, which use clz as it's primitive, we end up doing 32
> - (32 - x), and since that inner isub happens while emitting, we can't
> easily optimize it away without introducing an optimizing backend...

If 12+ years of graphics compiler experience has taught us anything, it
has taught us that you need an optimizing backend.  Maybe not as job #1,
but probably very, very soon.  Being able to recognize common code
patterns like 32 - fbh(x) and generating the right thing is not that
hard.  If you have a code-generator generator (!2680), it is trivial.
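A sketch of the kind of pattern match Ian describes, on a deliberately tiny hypothetical expression IR (not NIR, and not the code-generator generator from !2680). It relies on the identity 31 - ufind_msb(x) == uclz(x), which holds for every 32-bit input including zero, so the rewrite needs no special case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy IR (not NIR). Values are 32-bit and wrap like NIR
 * integers; -1 is stored as 0xffffffff. */
enum op { OP_CONST, OP_VAR, OP_ISUB, OP_UFIND_MSB, OP_UCLZ };

struct expr {
   enum op op;
   uint32_t value;            /* only used by OP_CONST */
   const struct expr *src[2]; /* operands, NULL when unused */
};

static uint32_t eval(const struct expr *e, uint32_t x)
{
   switch (e->op) {
   case OP_CONST:
      return e->value;
   case OP_VAR:
      return x;
   case OP_ISUB:
      return eval(e->src[0], x) - eval(e->src[1], x);
   case OP_UFIND_MSB: {
      uint32_t v = eval(e->src[0], x);
      for (int bit = 31; bit >= 0; bit--) {
         if (v & (1u << bit))
            return (uint32_t)bit;
      }
      return 0xffffffffu; /* -1 for a zero input */
   }
   case OP_UCLZ: {
      uint32_t v = eval(e->src[0], x);
      uint32_t n = 0;
      while (n < 32 && !(v & (0x80000000u >> n)))
         n++;
      return n; /* 32 for a zero input */
   }
   }
   return 0;
}

/* Peephole: isub(31, ufind_msb(a)) -> uclz(a). Exact for every input,
 * including 0: 31 - (-1) == 32 == uclz(0). */
static struct expr opt_isub_of_find_msb(const struct expr *e)
{
   if (e->op == OP_ISUB &&
       e->src[0]->op == OP_CONST && e->src[0]->value == 31 &&
       e->src[1]->op == OP_UFIND_MSB) {
      struct expr r = { OP_UCLZ, 0, { e->src[1]->src[0], NULL } };
      return r;
   }
   return *e;
}

static uint32_t eval_optimized(const struct expr *e, uint32_t x)
{
   struct expr r = opt_isub_of_find_msb(e);
   return eval(&r, x);
}

/* Example input tree: 31 - ufind_msb(x). */
static const struct expr var_x = { OP_VAR, 0, { NULL, NULL } };
static const struct expr c31 = { OP_CONST, 31, { NULL, NULL } };
static const struct expr msb_x = { OP_UFIND_MSB, 0, { &var_x, NULL } };
static const struct expr sub_expr = { OP_ISUB, 0, { &c31, &msb_x } };
```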

> The solution seems obvious; use nir_op_uclz instead.
> 
> But that's also a bit annoying, for a few reasons:
> 
> 1. Only *one* backend actually implements support for it. So this
> either means a lot of work, or making it an opt-in feature somehow.
> 
> 2. We would probably have to support lowering in either direction to
> support what all hardware prefers.
> 
> 3. That zero-case still needs special treatment in several backends, it
> seems. We could alternatively declare that nir_op_uclz is undefined for
> zero-input, and handle this when lowering...?
> 
> 4. It seems some (Intel?) hardware only supports 32-bit clz, so we
> would have to lower to something else for other bit-sizes. That's not
> too hard, though.
> 
> So yeah...
> 
> I guess the first step would be to add a switch to use nir_uclz()
> instead of nir_clz_u() when handling OpenCLstd_Clz in vtn.
> 
> Next, I guess I would add a lower_ufind_msb flag to
> nir_shader_compiler_options, and make nir_opt_algebraic.py lower
> ufind_msb to uclz.
> 
> Finally, we can start implementing support for this in more drivers,
> and flip on some switches.
> 
> I'm still not really sold on what to do about the special-case for
> zero... By making it undefined, I think we're just punishing all
> backends, just in the name of making the compiler backends a bit
> simpler, so that doesn't seem too good of an idea either.
> 
> Does anyone have a better idea? I would kinda love to optimize away the
> zero-case if it's obvious that it's impossible, e.g. cases like
> "clz(x | 1)"...
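One way to catch the "clz(x | 1)" case is a small known-bits check: if any bit of the operand is provably set, the zero case can never happen and a backend can drop its zero-case select. A hypothetical sketch (the known_bits type and helpers are made up for illustration and are not an existing Mesa API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal known-bits fact for a 32-bit value:
 * every bit set in `ones` is guaranteed to be 1 at run time. */
struct known_bits {
   uint32_t ones;
};

/* Nothing is known about an arbitrary input. */
static struct known_bits known_unknown(void)
{
   struct known_bits k = { 0 };
   return k;
}

/* a | c with a constant c: every bit of c is known to be set. */
static struct known_bits known_ior_const(struct known_bits a, uint32_t c)
{
   struct known_bits k = { a.ones | c };
   return k;
}

static bool known_nonzero(struct known_bits k)
{
   return k.ones != 0;
}

/* Instructions a backend with only an FBH-style op (-1 for 0) would emit
 * for uclz: the op itself, plus a zero-case select unless the operand is
 * provably nonzero. */
static unsigned uclz_instr_count(struct known_bits operand)
{
   return known_nonzero(operand) ? 1u : 2u;
}
```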

Re: [Mesa-dev] [ANNOUNCE] mesa 20.0.3

2020-04-02 Thread Eric Engestrom

Sorry about that; looks like it was missed by my testing.

I need to go to sleep now, but I'll do an emergency release tomorrow
morning with just this revert.



On Thursday, 2020-04-02 11:25:17 +0200, Samuel Pitoiset wrote:
> Good catch!
> 
> Yes, please revert it asap, it breaks a bunch of things ... :(
> 
> On 4/2/20 11:11 AM, Danylo Piliaiev wrote:
> > "spirv: Implement OpCopyObject and OpCopyLogical as blind copies" was 
> > reverted yesterday
> > due to the failures in several dEQP-VK tests, see:
> >   
> > https://gitlab.freedesktop.org/mesa/mesa/-/commit/68f325b256d96dca923f6c7d84bc6faf43911245
> >   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4375
> > I'm not sure if it's already known or how important it is, but I'd better 
> > say it than not.
