Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Shira Bezalel
Hi All,

I'm testing an upgrade from Postgres 9.6.16 to 12.1 and seeing a
significant performance gain in one specific query. This is really great,
but I'm just looking to understand why. Reading through the release notes
across all the new versions (10, 11, 12) hasn't yielded an obvious cause,
but maybe I missed something. Also, I realize it could be related to other
factors (config parameters, physical hosts, etc), but the systems are
pretty similar so just wondering about Postgres changes.

The query is the following:

SELECT pvc.value, SUM(pvc.count) AS sum
FROM
(SELECT (ST_ValueCount(cv.rast, 1)).*
FROM calveg_whrtype_20m AS cv) AS pvc
GROUP BY pvc.value

Here is the EXPLAIN (ANALYZE ON, BUFFERS ON) output from both systems:

9.6 plan 
12.1 plan 

In the 9.6 plan, the Seq Scan node produced 15,812 rows.
In the 12 plan, the Seq Scan produced 2,502 rows, and then the ProjectSet
node produced 15,812 rows.

Note that the table (calveg_whrtype_20m) in the two databases have the same
number of rows (2,502).

So it seems something about the introduction of the ProjectSet node between
the Seq Scan and HashAggregate is optimizing things...? Is this the right
conclusion to draw and if so, why might this be happening? Is there
something that was changed/improved in either 10, 11 or 12 that this
behavior can be attributed to?

Two more notes --

1. If I run the inner subquery without the outer sum/group by, the plans
between the two systems are identical.

2. As the calgeg_whrtype_20m table is a raster, I started my question on
the PostGIS list, but there was no obvious answer that the gain is related
to a change in the PostGIS code so I'm now turning to this list.

Thank you,
Shira


Re: Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Justin Pryzby
On Mon, Jan 13, 2020 at 08:29:05AM -0800, Shira Bezalel wrote:
> Here is the EXPLAIN (ANALYZE ON, BUFFERS ON) output from both systems:
> 
> 9.6 plan 
> 12.1 plan 

> Is there something that was changed/improved in either 10, 11 or 12 that this
> behavior can be attributed to?

v12 has JIT enabled by default.
You can test if that's significant with SET jit=off.




Re: Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Shira Bezalel
Hi Justin,

I'm seeing no difference in the query plan with JIT disabled in 12.1.

Thanks,
Shira

On Mon, Jan 13, 2020 at 8:42 AM Justin Pryzby  wrote:

> On Mon, Jan 13, 2020 at 08:29:05AM -0800, Shira Bezalel wrote:
> > Here is the EXPLAIN (ANALYZE ON, BUFFERS ON) output from both systems:
> >
> > 9.6 plan 
> > 12.1 plan 
>
> > Is there something that was changed/improved in either 10, 11 or 12 that
> this
> > behavior can be attributed to?
>
> v12 has JIT enabled by default.
> You can test if that's significant with SET jit=off.
>


-- 
Shira Bezalel
Database Administrator & Desktop Support Manager
San Francisco Estuary Institute
www.sfei.org
Ph: 510-746-7304


Re: Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Michael Lewis
I am not at all familiar with PostGIS so perhaps this is a silly question,
is bloat an issue on the older instance? Correlation isn't causation, but
half the buffers scanned and half the runtime in the v12 plan has me
curious why that might be.

>


Re: Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Shira Bezalel
Hi Michael,

I appreciate your question. I ran a vacuum analyze on the 9.6 table and it
yielded no difference. Same number of buffers were read, same query plan.

Thanks,
Shira

On Mon, Jan 13, 2020 at 10:07 AM Michael Lewis  wrote:

> I am not at all familiar with PostGIS so perhaps this is a silly question,
> is bloat an issue on the older instance? Correlation isn't causation, but
> half the buffers scanned and half the runtime in the v12 plan has me
> curious why that might be.
>
>>

-- 
Shira Bezalel
Database Administrator & Desktop Support Manager
San Francisco Estuary Institute
www.sfei.org
Ph: 510-746-7304


Re: Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Tomas Vondra

On Mon, Jan 13, 2020 at 12:44:14PM -0800, Shira Bezalel wrote:

Hi Michael,

I appreciate your question. I ran a vacuum analyze on the 9.6 table and it
yielded no difference. Same number of buffers were read, same query plan.



VACUUM ANALYZE won't shrink the table - the number of buffers will be
exactly the same. You need to do VACUUM FULL, but be careful as that
acquires exclusive lock on the table.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Shira Bezalel
Thanks Tomas. I ran a vacuum full on the 9.6 table -- still no difference
in the query plan. The shared buffers hit went up slightly to 36069.

Shira

On Mon, Jan 13, 2020 at 1:12 PM Tomas Vondra 
wrote:

> On Mon, Jan 13, 2020 at 12:44:14PM -0800, Shira Bezalel wrote:
> >Hi Michael,
> >
> >I appreciate your question. I ran a vacuum analyze on the 9.6 table and it
> >yielded no difference. Same number of buffers were read, same query plan.
> >
>
> VACUUM ANALYZE won't shrink the table - the number of buffers will be
> exactly the same. You need to do VACUUM FULL, but be careful as that
> acquires exclusive lock on the table.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Shira Bezalel
Database Administrator & Desktop Support Manager
San Francisco Estuary Institute
www.sfei.org
Ph: 510-746-7304


Re: Bad query plan when you add many OR conditions

2020-01-13 Thread Tomas Vondra

On Fri, Jan 10, 2020 at 02:30:27PM +0100, Marco Colli wrote:

@Justin Pryzby I have tried this as you suggested:

CREATE STATISTICS statistics_on_subscriptions_project_id_and_tags ON
project_id, tags FROM subscriptions;
VACUUM ANALYZE subscriptions;

Unfortunately nothing changes and Postgresql continues to use the wrong
plan (maybe stats don't work well on array fields like tags??).



We support this type of clause for extended statistics (yet).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Alvaro Herrera
On 2020-Jan-13, Shira Bezalel wrote:

> Hi All,
> 
> I'm testing an upgrade from Postgres 9.6.16 to 12.1 and seeing a
> significant performance gain in one specific query. This is really great,
> but I'm just looking to understand why.

pg12 reads half the number of buffers.  I bet it's because of this change:

commit 4d0e994eed83c845a05da6e9a417b4efec67efaf
Author: Stephen Frost 
AuthorDate: Tue Apr 2 12:35:32 2019 -0400
CommitDate: Tue Apr 2 12:35:32 2019 -0400

Add support for partial TOAST decompression

When asked for a slice of a TOAST entry, decompress enough to return the
slice instead of decompressing the entire object.

For use cases where the slice is at, or near, the beginning of the entry,
this avoids a lot of unnecessary decompression work.

This changes the signature of pglz_decompress() by adding a boolean to
indicate if it's ok for the call to finish before consuming all of the
source or destination buffers.

Author: Paul Ramsey
Reviewed-By: Rafia Sabih, Darafei Praliaskouski, Regina Obe
Discussion: 
https://postgr.es/m/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Seeking reason behind performance gain in 12 with HashAggregate

2020-01-13 Thread Shira Bezalel
On Mon, Jan 13, 2020 at 2:15 PM Alvaro Herrera 
wrote:

> On 2020-Jan-13, Shira Bezalel wrote:
>
> > Hi All,
> >
> > I'm testing an upgrade from Postgres 9.6.16 to 12.1 and seeing a
> > significant performance gain in one specific query. This is really great,
> > but I'm just looking to understand why.
>
> pg12 reads half the number of buffers.  I bet it's because of this change:
>
> commit 4d0e994eed83c845a05da6e9a417b4efec67efaf
> Author: Stephen Frost 
> AuthorDate: Tue Apr 2 12:35:32 2019 -0400
> CommitDate: Tue Apr 2 12:35:32 2019 -0400
>
> Add support for partial TOAST decompression
>
> When asked for a slice of a TOAST entry, decompress enough to return
> the
> slice instead of decompressing the entire object.
>
> For use cases where the slice is at, or near, the beginning of the
> entry,
> this avoids a lot of unnecessary decompression work.
>
> This changes the signature of pglz_decompress() by adding a boolean to
> indicate if it's ok for the call to finish before consuming all of the
> source or destination buffers.
>
> Author: Paul Ramsey
> Reviewed-By: Rafia Sabih, Darafei Praliaskouski, Regina Obe
> Discussion:
> https://postgr.es/m/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

That sounds like a possibility. Thanks Alvaro.

Shira