Re: Re: Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread gzh
Thank you for providing the requested information.




The website has been used for many years, and this upgrade is only a version 
upgrade of the PostgreSQL database.

My customer does not want to modify the SQL because it would increase the cost 
of the project (all SQL that contains a LIMIT clause would need to be analyzed 
and checked).



Is there no other way to solve the problem?

At 2022-10-11 13:24:12, "Pavel Stehule" wrote:





On Tue, Oct 11, 2022 at 7:08 AM gzh wrote:


Hi, Pavel




> The LIMIT clause changes total cost.  This is a very aggressive clause. And

> although it is absolutely useless in this case, Postgres does not have any

> logic for removing it. Postgres doesn't try to fix developer's mistakes.

Sorry, I didn't understand what you mean.

Couldn't the LIMIT clause be used like the SQL statement below?




>> new=# explain analyze select 2 from analyze_word_reports where (cseid =

>> 94) limit 1;



The query was



SELECT aggregate() FROM xx LIMIT 1






This SQL statement is no problem under PostgreSQL 8.4; the index works well.






The optimizer is under nonstop change, and from any new release you can expect 
that roughly 75% of queries are unchanged, 20% of queries are faster, and 5% of 
queries are slower.

Optimization is based on statistics and estimations, searching for the optimal 
solution in the space of all solutions. Every version brings smaller or bigger 
changes to the estimation methods, and between the old 8.4 and 12 there are big 
changes in the possibilities of how a query can be executed. So there is a higher 
chance of finding some really fast plans, but there is a higher chance of hitting 
some local optimum or slow query too.  Usually the optimizer is smarter (that is 
the benefit), but more sensitive too (that is the cost). You cannot expect the 
same result when the data and the algorithm change in every version. 
Postgres doesn't guarantee the immutability of execution plans.



A LIMIT clause with a low limit value can be problematic in more cases. The 
model in Postgres expects data to be uniformly stored in the table (heap), but 
the reality can be different. The common trick in these cases is to use an 
OFFSET 0 clause as an optimization fence, like

SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
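
As a sketch of how the same fence could be applied to the query discussed in
this thread (an editor's illustration using the table and predicate quoted
above, not something posted by the participants; the resulting plan should be
verified with EXPLAIN ANALYZE):

```sql
-- Hypothetical rewrite: the OFFSET 0 subquery acts as an optimization fence,
-- so the planner settles the "cseid = 94" scan (ideally via the index) before
-- the outer LIMIT can make a bare seq scan look artificially cheap.
SELECT 2
FROM (SELECT 2
      FROM analyze_word_reports
      WHERE cseid = 94
      OFFSET 0) s
LIMIT 1;
```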

At 2022-10-11 12:13:47, "Pavel Stehule" wrote:





On Tue, Oct 11, 2022 at 6:05 AM gzh wrote:





Hi, Pavel

Thank you for your reply.




> the LIMIT clause is in this case totally useless and messy, and maybe can

> negative impacts optimizer

Yes. After removing the LIMIT clause, the performance is improved. 

The execution plan shows that the index worked.

We've noticed it, but I don't want to fix the problem by modifying the SQL 
until I find the cause.



The LIMIT clause changes total cost.  This is a very aggressive clause. And 
although it is absolutely useless in this case, Postgres does not have any 
logic for removing it. Postgres doesn't try to fix developer's mistakes.

At 2022-10-11 11:32:48, "Pavel Stehule" wrote:





On Tue, Oct 11, 2022 at 5:13 AM gzh wrote:

Hi, Tom
Thank you for your reply.


> When you're asking for help, please don't give us vague statements

> like "doesn't seem to work". 

I understand.




> Did the plan (including rowcount

> estimates) change at all?  To what?  How far off is that rowcount

> estimate, anyway --- that is, how many rows actually have cseid = 94?

Please refer to the new execution plan (PostgreSQL 12.11) below.




new=# show enable_seqscan;
 enable_seqscan
----------------
 on
(1 row)

new=# select count(*) from analyze_word_reports;
  count
----------
 21331980
(1 row)

new=# select count(*) from analyze_word_reports where (cseid = 94);
  count
---------
 1287156
(1 row)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;


                                          QUERY PLAN



the LIMIT clause is totally useless and messy in this case, and may negatively 
impact the optimizer


Regards


Pavel


 

--------------------------------------------------------------------------------------------------------------
 Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
   ->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 loops=1)
         ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual time=123.548..133.024 rows=3 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Partial Aggregate  (cost=64183.85..64183.86 rows=1 width=8) (actual time=119.495..119.496 rows=1 loops=3)
                     ->  

Re: Re: Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread Pavel Stehule
On Tue, Oct 11, 2022 at 10:01 AM gzh wrote:

> Thank you for providing the requested information.
>
>
> The WebSite has been used for many years, and this upgrade is only a
> version upgrade of the PostgreSQL database.
> My customer does not want to modify the SQL because it will increase the
> cost of the project(All SQL that contains a LIMIT clause needs to be
> analyzed and checked).
>
> Is there no other way to solve the problem?
>

I don't know about any alternative

Regards

Pavel


>
>
>
>
> At 2022-10-11 13:24:12, "Pavel Stehule"  wrote:
>
>
>
> On Tue, Oct 11, 2022 at 7:08 AM gzh wrote:
>
>> Hi, Pavel
>>
>>
>> > The LIMIT clause changes total cost.  This is a very aggressive clause.
>> And
>>
>> > although it is absolutely useless in this case, Postgres does not have
>> any
>>
>> > logic for removing it. Postgres doesn't try to fix developer's mistakes.
>>
>> Sorry,I didn't understand what you mean.
>>
>> Couldn't the LIMIT clause be used like the SQL statement below?
>>
>>
>> >> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>>
>> >> 94) limit 1;
>>
>
> there was query
>
> SELECT aggregate() FROM xx LIMIT 1
>
>
>> This SQL statement is no problem under PostgreSQL 8.4, the index works
>> well.
>>
>>
>>
> The optimizer is under nonstop change. And you can expect from any new
> release
>
> 75% queries are without change, 20% queries are faster, and 5% queries are
> slower
>
> The optimization is based on statistics and estimations, and searching for
> the optimal solution in space of all solutions. In any version there are
> smaller or bigger changes of estimation methods, and between old 8.4 and 12
> there are big changes in possibilities of how the query can be executed. So
> there is a higher possibility to find some really fast queries, but there
> is a higher possibility to find some local optimum or slow query too.
> Usually the optimizer is smarter (what is the benefit), but more sensitive
> too (what is the cost). You cannot expect the same result, when the data
> and algorithm is changed in any version. Postgres doesn't garant the
> immutability of execution plans.
>
> The clause LIMIT with low LIMIT value can be problematic in more cases.
> The model in Postgres expects data are uniformly stored in the table
> (heap), but the reality can be different. The common trick in these cases
> is using OFFSET 0 clause like
>
> SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
>
>
>
>
>
>
>
>>
>>
>>
>>
>> At 2022-10-11 12:13:47, "Pavel Stehule"  wrote:
>>
>>
>>
>> On Tue, Oct 11, 2022 at 6:05 AM gzh wrote:
>>
>>>
>>> Hi, Pavel
>>>
>>> Thank you for your reply.
>>>
>>>
>>> > the LIMIT clause is in this case totally useless and messy, and maybe
>>> can
>>>
>>> > negative impacts optimizer
>>>
>>> Yes. After removing the LIMIT clause, the performance is improved.
>>>
>>> The execution plan shows that the index worked.
>>>
>>> We've noticed it, but I don't want to fix the problem by modifying the
>>> SQL until I find the cause.
>>>
>>
>> The LIMIT clause changes total cost.  This is a very aggressive clause.
>> And although it is absolutely useless in this case, Postgres does not have
>> any logic for removing it. Postgres doesn't try to fix developer's mistakes.
>>
>>
>>
>>>
>>>
>>>
>>>
>>> At 2022-10-11 11:32:48, "Pavel Stehule"  wrote:
>>>
>>>
>>>
>>> On Tue, Oct 11, 2022 at 5:13 AM gzh wrote:
>>>
 Hi, Tom
 Thank you for your reply.

 > When you're asking for help, please don't give us vague statements

 > like "doesn't seem to work".

 I understand.


 > Did the plan (including rowcount

 > estimates) change at all?  To what?  How far off is that rowcount

 > estimate, anyway --- that is, how many rows actually have cseid = 94?

 Please refer to the new execution plan (PostgreSQL 12.11) below.


 new=# show enable_seqscan;

  enable_seqscan

 

  on

 (1 row)


 new=# select count(*) from analyze_word_reports;

   count

 --

  21331980

 (1 row)


 new=# select count(*) from analyze_word_reports where (cseid = 94);

   count

 -

  1287156

 (1 row)


 new=# explain analyze select count(2) from analyze_word_reports where
 (cseid = 94) limit 1;


  QUERY PLAN




>>>
>>> the LIMIT clause is in this case totally useless and messy, and maybe
>>> can negative impacts optimizer
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>>

 --


 --

  Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)


Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread gzh
Thank you for all your assistance.

After communicating with my customer, we have adopted the following solution to 
fix the problem:

set enable_seqscan = off
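
If the setting has to stay, it can at least be scoped more narrowly than a
cluster-wide change. This is an editor's sketch, not a suggestion made in the
thread, using the query quoted earlier:

```sql
-- Limit the planner override to a single transaction instead of the whole
-- cluster: SET LOCAL reverts automatically at COMMIT or ROLLBACK, so other
-- queries keep their normal plans.
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT count(2) FROM analyze_word_reports WHERE cseid = 94 LIMIT 1;
COMMIT;
```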

At 2022-10-11 16:21:42, "Pavel Stehule" wrote:





On Tue, Oct 11, 2022 at 10:01 AM gzh wrote:


Thank you for providing the requested information.




The WebSite has been used for many years, and this upgrade is only a version 
upgrade of the PostgreSQL database.

My customer does not want to modify the SQL because it will increase the cost 
of the project(All SQL that contains a LIMIT clause needs to be analyzed and 
checked). 



Is there no other way to solve the problem?



I don't know about any alternative


Regards


Pavel

At 2022-10-11 13:24:12, "Pavel Stehule" wrote:





On Tue, Oct 11, 2022 at 7:08 AM gzh wrote:


Hi, Pavel




> The LIMIT clause changes total cost.  This is a very aggressive clause. And

> although it is absolutely useless in this case, Postgres does not have any

> logic for removing it. Postgres doesn't try to fix developer's mistakes.

Sorry, I didn't understand what you mean.

Couldn't the LIMIT clause be used like the SQL statement below?




>> new=# explain analyze select 2 from analyze_word_reports where (cseid =

>> 94) limit 1;



there was query



SELECT aggregate() FROM xx LIMIT 1






This SQL statement is no problem under PostgreSQL 8.4, the index works well.






The optimizer is under nonstop change. And you can expect from any new release


75% queries are without change, 20% queries are faster, and 5% queries are 
slower


The optimization is based on statistics and estimations, and searching for the 
optimal solution in space of all solutions. In any version there are smaller or 
bigger changes of estimation methods, and between old 8.4 and 12 there are big 
changes in possibilities of how the query can be executed. So there is a higher 
possibility to find some really fast queries, but there is a higher possibility 
to find some local optimum or slow query too.  Usually the optimizer is smarter 
(what is the benefit), but more sensitive too (what is the cost). You cannot 
expect the same result, when the data and algorithm is changed in any version. 
Postgres doesn't garant the immutability of execution plans.



The clause LIMIT with low LIMIT value can be problematic in more cases. The 
model in Postgres expects data are uniformly stored in the table (heap), but 
the reality can be different. The common trick in these cases is using OFFSET 0 
clause like


SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.

At 2022-10-11 12:13:47, "Pavel Stehule" wrote:





On Tue, Oct 11, 2022 at 6:05 AM gzh wrote:





Hi, Pavel

Thank you for your reply.




> the LIMIT clause is in this case totally useless and messy, and maybe can

> negative impacts optimizer

Yes. After removing the LIMIT clause, the performance is improved. 

The execution plan shows that the index worked.

We've noticed it, but I don't want to fix the problem by modifying the SQL 
until I find the cause.



The LIMIT clause changes total cost.  This is a very aggressive clause. And 
although it is absolutely useless in this case, Postgres does not have any 
logic for removing it. Postgres doesn't try to fix developer's mistakes.

At 2022-10-11 11:32:48, "Pavel Stehule" wrote:





On Tue, Oct 11, 2022 at 5:13 AM gzh wrote:

Hi, Tom
Thank you for your reply.


> When you're asking for help, please don't give us vague statements

> like "doesn't seem to work". 

I understand.




> Did the plan (including rowcount

> estimates) change at all?  To what?  How far off is that rowcount

> estimate, anyway --- that is, how many rows actually have cseid = 94?

Please refer to the new execution plan (PostgreSQL 12.11) below.




new=# show enable_seqscan;
 enable_seqscan
----------------
 on
(1 row)

new=# select count(*) from analyze_word_reports;
  count
----------
 21331980
(1 row)

new=# select count(*) from analyze_word_reports where (cseid = 94);
  count
---------
 1287156
(1 row)

new=# explain analyze select count(2) from analyze_word_reports where (cseid = 94) limit 1;


                                          QUERY PLAN



the LIMIT clause is totally useless and messy in this case, and may negatively 
impact the optimizer


Regards


Pavel


 

--------------------------------------------------------------------------------------------------------------

 Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 rows=1 loops=1)
   ->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.712..133.033 rows=1 lo

Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread Josef Šimánek
On Tue, Oct 11, 2022 at 11:17 AM gzh wrote:
>
> Thank you for all your assistance.
>
>
> By communicating with my customer, we have adopted the following solution to 
> fix the problem.
>
>
> set enable_seqscan = off
>
This can make some queries fail since there will be no way to gather
data without seqscan.
>
>
>
>
> At 2022-10-11 16:21:42, "Pavel Stehule"  wrote:
>
>
>
> On Tue, Oct 11, 2022 at 10:01 AM gzh wrote:
>>
>> Thank you for providing the requested information.
>>
>>
>> The WebSite has been used for many years, and this upgrade is only a version 
>> upgrade of the PostgreSQL database.
>>
>> My customer does not want to modify the SQL because it will increase the 
>> cost of the project(All SQL that contains a LIMIT clause needs to be 
>> analyzed and checked).
>>
>> Is there no other way to solve the problem?
>
>
> I don't know about any alternative
>
> Regards
>
> Pavel
>
>>
>>
>>
>>
>>
>> At 2022-10-11 13:24:12, "Pavel Stehule"  wrote:
>>
>>
>>
>> On Tue, Oct 11, 2022 at 7:08 AM gzh wrote:
>>>
>>> Hi, Pavel
>>>
>>>
>>> > The LIMIT clause changes total cost.  This is a very aggressive clause. 
>>> > And
>>>
>>> > although it is absolutely useless in this case, Postgres does not have any
>>>
>>> > logic for removing it. Postgres doesn't try to fix developer's mistakes.
>>>
>>> Sorry,I didn't understand what you mean.
>>>
>>> Couldn't the LIMIT clause be used like the SQL statement below?
>>>
>>>
>>> >> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>>>
>>> >> 94) limit 1;
>>
>>
>> there was query
>>
>> SELECT aggregate() FROM xx LIMIT 1
>>
>>>
>>> This SQL statement is no problem under PostgreSQL 8.4, the index works well.
>>>
>>>
>>
>> The optimizer is under nonstop change. And you can expect from any new 
>> release
>>
>> 75% queries are without change, 20% queries are faster, and 5% queries are 
>> slower
>>
>> The optimization is based on statistics and estimations, and searching for 
>> the optimal solution in space of all solutions. In any version there are 
>> smaller or bigger changes of estimation methods, and between old 8.4 and 12 
>> there are big changes in possibilities of how the query can be executed. So 
>> there is a higher possibility to find some really fast queries, but there is 
>> a higher possibility to find some local optimum or slow query too.  Usually 
>> the optimizer is smarter (what is the benefit), but more sensitive too (what 
>> is the cost). You cannot expect the same result, when the data and algorithm 
>> is changed in any version. Postgres doesn't garant the immutability of 
>> execution plans.
>>
>> The clause LIMIT with low LIMIT value can be problematic in more cases. The 
>> model in Postgres expects data are uniformly stored in the table (heap), but 
>> the reality can be different. The common trick in these cases is using 
>> OFFSET 0 clause like
>>
>> SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
>>
>>
>>
>>
>>
>>
>>>
>>>
>>>
>>>
>>>
>>> At 2022-10-11 12:13:47, "Pavel Stehule"  wrote:
>>>
>>>
>>>
>>> On Tue, Oct 11, 2022 at 6:05 AM gzh wrote:


 Hi, Pavel

 Thank you for your reply.


 > the LIMIT clause is in this case totally useless and messy, and maybe can

 > negative impacts optimizer

 Yes. After removing the LIMIT clause, the performance is improved.

 The execution plan shows that the index worked.

 We've noticed it, but I don't want to fix the problem by modifying the SQL 
 until I find the cause.
>>>
>>>
>>> The LIMIT clause changes total cost.  This is a very aggressive clause. And 
>>> although it is absolutely useless in this case, Postgres does not have any 
>>> logic for removing it. Postgres doesn't try to fix developer's mistakes.
>>>
>>>





 At 2022-10-11 11:32:48, "Pavel Stehule"  wrote:



 On Tue, Oct 11, 2022 at 5:13 AM gzh wrote:
>
> Hi, Tom
> Thank you for your reply.
>
> > When you're asking for help, please don't give us vague statements
>
> > like "doesn't seem to work".
>
> I understand.
>
>
> > Did the plan (including rowcount
>
> > estimates) change at all?  To what?  How far off is that rowcount
>
> > estimate, anyway --- that is, how many rows actually have cseid = 94?
>
> Please refer to the new execution plan (PostgreSQL 12.11) below.
>
>
> new=# show enable_seqscan;
>
>  enable_seqscan
>
> 
>
>  on
>
> (1 row)
>
>
> new=# select count(*) from analyze_word_reports;
>
>   count
>
> --
>
>  21331980
>
> (1 row)
>
>
> new=# select count(*) from analyze_word_reports where (cseid = 94);
>
>   count
>
> -
>
>  1287156
>
> (1 row)
>
>
> new=# explain analyze select count(2) from analyze_word_reports where 
> (c

Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread David Rowley
On Tue, 11 Oct 2022 at 22:52, Josef Šimánek  wrote:
> This can make some queries fail since there will be no way to gather
> data without seqscan.

Disabling enable_seqscan only adds a large constant cost penalty to Seq Scans.
It does not outright disallow them altogether.

Having said that, having Paths with the disabled high cost penalty can
cause other issues like the planner thinking two Paths are "fuzzily"
similar enough in costs and rejecting better Paths when in fact the
better Path is really quite a bit better when you subtract the
disabling cost penalty.

David




Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread David Rowley
On Tue, 11 Oct 2022 at 16:13, gzh  wrote:
> new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) limit 1;
>  Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 loops=1)
>    ->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75 rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=1)
>          Filter: (cseid = 94)
>          Rows Removed by Filter: 18320180
>  Planning Time: 0.086 ms
>  Execution Time: 2156.985 ms

It's a bit unfortunate that the planner picked this plan.  I can
recreate the problem on the master branch with:

create table t1 (a int, b int);
insert into t1 select x,x from generate_Series(1,1000)x;
insert into t1 select 0,0 from generate_Series(1,1000)x;
analyze t1;
create index on t1(a);
set synchronize_seqscans=off;
explain analyze select * from t1 where a=0 limit 1;
 QUERY PLAN

 Limit  (cost=0.00..0.03 rows=1 width=8) (actual time=1865.838..1865.840 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..338496.00 rows=10076667 width=8) (actual time=1865.831..1865.831 rows=1 loops=1)
 Filter: (a = 0)
 Rows Removed by Filter: 1000
 Planning Time: 1.507 ms
 Execution Time: 1866.326 ms
(6 rows)

What seems to be going on is that the index path is considered on the
base relation, but it's rejected by add_path() due to the costs being
higher than the seq scan costs.

I see even after dropping random_page_cost right down to 0.0 that we
do start to keep the Index path as a base relation path, but then the
LimitPath with the Seqscan subpath wins out over the LimitPath with
the index scan due to the Index scan having a higher startup cost.

It feels like something is a bit lacking in our cost model here. I'm
just not sure what that is.

David




Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread Tom Lane
David Rowley  writes:
> It feels like something is a bit lacking in our cost model here. I'm
> just not sure what that is.

The example you show is the same old problem that we've understood for
decades: for cost-estimation purposes, we assume that matching rows
are more or less evenly distributed in the table.  Their actual
location doesn't matter that much if you're scanning the whole table;
but if you're hoping that a LIMIT will be able to stop after scanning
just a few rows, it does matter.

While it'd be pretty easy to insert some ad-hoc penalty into the
LIMIT estimation to reduce the chance of being fooled this way,
that would also discourage us from using fast-start plans when
they *do* help.  So I don't see any easy fix.
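
The arithmetic behind being "fooled" can be seen directly in the plan quoted
earlier in the thread. This is an editor's illustration, not planner source
code; the numbers come from the quoted seq scan node:

```sql
-- The planner prices "LIMIT 1" as a proportional slice of the scan:
-- roughly total_cost * (limit_rows / estimated_matching_rows).
SELECT 528550.75 * (1.0 / 1223533) AS estimated_limit_cost;  -- about 0.43
-- ...matching the quoted "Limit (cost=0.00..0.43 ...)" node, even though the
-- actual execution filtered out 18 million rows before the first match,
-- because the matching rows were not evenly distributed in the heap.
```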

regards, tom lane




Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread Bruce Momjian
On Tue, Oct 11, 2022 at 09:59:43AM -0400, Tom Lane wrote:
> David Rowley  writes:
> > It feels like something is a bit lacking in our cost model here. I'm
> > just not sure what that is.
> 
> The example you show is the same old problem that we've understood for
> decades: for cost-estimation purposes, we assume that matching rows
> are more or less evenly distributed in the table.  Their actual
> location doesn't matter that much if you're scanning the whole table;
> but if you're hoping that a LIMIT will be able to stop after scanning
> just a few rows, it does matter.

We do have a correlation statistics value for each column but I am
unclear if that would help here.
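
For anyone following along, the per-column correlation Bruce mentions is
exposed in the standard pg_stats view; a sketch using the table and column
names from this thread:

```sql
-- correlation ranges from -1 to 1: values near +/-1 mean the column's values
-- are laid out in nearly sorted heap order, values near 0 mean matching rows
-- are scattered -- the case where a LIMIT-stops-early estimate goes wrong.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'analyze_word_reports'
  AND attname = 'cseid';
```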

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Weird planner issue on a standby

2022-10-11 Thread Guillaume Lelarge
Hello,

My customer has a really weird planner issue on a standby.

First, the context. There's a primary and two standbys, all on an 11.8
release. (I know this isn't the latest release for the v11 branch.) The
cluster is 5.5TB. The PostgreSQL configuration is the same on all servers.
pg_db_role_setting is almost empty, and there's nothing specific to
planning and replication.

Here is the issue. Queries on both standbys take a lot more time than on
the primary. We eventually found that the queries take a lot of time to be
planned, not to be executed. For example:

On the primary:
 Planning Time: 1806.574 ms
 Execution Time: 771.888 ms
On any of the standbys:
 Planning Time: 41566.439 ms
 Execution Time: 1125.352 ms

A simple EXPLAIN already shows this difference in duration.

The query in itself isn't really that big. 8 tables (without partitions), a
few predicates. Nothing fancy. Nothing huge to plan. Here it is, a bit
anonymized:

select ...
from t1
left outer join t2 on ...
left outer join t3 on ...
left outer join t4 on ...
left outer join t5 on ...
left outer join t6 on ...
left outer join t7 on ...
left outer join t8 on ...
where c1='...' and c2='...'
  and c3>='...' and c4<='...' ;

Join conditions are really simple. There's no function called anywhere.

Plans on the three servers are exactly the same. Same nodes, same
statistics, same calculations' results.

Moreover, we've looked into what the planner was doing with strace, and
we've been surprised that it involved massive reading on tables (more than
130,000 calls to read() on 15 files). We found that most of these files are
the files for one of the tables on this query. Also, this massive reading
happens only on standbys, not on the primary.

Our customer took the time to drop and create the standbys yesterday night,
but the new standbys still have the issue.

And finally, we discovered that promoting a standby brings back the old
performance. We have the same performance between the primary and the
promoted standby.

To be honest, I'm lost, I don't know what to do next. But I have some
questions :)

* The planner seems to read tables to generate the plans and/or select the
right plan. Why does it do that? I thought it only reads indexes if needed,
but it is quite clear it reads tables also.
* How can the replication have an impact on the planner ? (I'm not sure I'm
asking the right question... I mean, why do I see a difference in behaviour
between a standby and an old standby, new autonomous server?)
* Do you have any idea on how to fix this?

Thank you.

Regards.


-- 
Guillaume.


Re: Weird planner issue on a standby

2022-10-11 Thread Tom Lane
Guillaume Lelarge  writes:
> * The planner seems to read tables to generate the plans and/or select the
> right plan. Why does it do that? I thought it only reads indexes if needed,
> but it is quite clear it reads tables also.

I'm guessing to some extent, but it seems plausible that this is an
artifact of the logic that tries to get the current min and/or max column
values to be sure we are estimating mergejoin costs accurately.  Normally,
yeah, that is only supposed to consult the extremal values in indexes
and therefore not take very long.  However, we've had to go back to the
drawing board several times to fix cases where it made a lot of expensive
table accesses because the extremal index entries were of uncertain
liveness.  That behavior could very easily depend on whether you're in
a primary or standby server.

Are there any tables in this query where extremal values of the join
key are likely to be in recently-added or recently-dead rows?  Does
VACUUM'ing on the primary help?

And, not to put too fine a point on it, but does updating to a recent
minor release help?  In a quick scan of the git history I don't see any
related bugfixes later than 11.5, but I might've missed something.

regards, tom lane




Re: Weird planner issue on a standby

2022-10-11 Thread Alvaro Herrera
On 2022-Oct-11, Tom Lane wrote:

> Are there any tables in this query where extremal values of the join
> key are likely to be in recently-added or recently-dead rows?  Does
> VACUUM'ing on the primary help?

I remember having a hypothesis, upon getting a report of this exact
problem on a customer system once, that it could be due to killtuples not
propagating to standbys except via FPIs.  I do not remember whether we proved
that true or not.  I do not remember observing that tables were being
read, however.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Weird planner issue on a standby

2022-10-11 Thread Guillaume Lelarge
On Tue, Oct 11, 2022 at 6:27 PM, Alvaro Herrera wrote:

> On 2022-Oct-11, Tom Lane wrote:
>
> > Are there any tables in this query where extremal values of the join
> > key are likely to be in recently-added or recently-dead rows?  Does
> > VACUUM'ing on the primary help?
>
> I remember having an hypothesis, upon getting a report of this exact
> problem on a customer system once, that it could be due to killtuple not
> propagating to standbys except by FPIs.  I do not remember if we proved
> that true or not.  I do not remember observing that tables were being
> read, however.
>
>
Thanks for your answers.

The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'". I have no
idea of the actual use of c3, but rows with c3 at '2022-10-10' (which is
yesterday) are most probably recently added. I can ask my customer if you
want, but this looks like a pretty safe bet.

On the VACUUM question: I didn't say, but we're kind of wondering whether the
database was lacking a recent-enough VACUUM. So they're doing a VACUUM tonight
(especially on the 1.6TB table which is part of the query). I'm kind of
skeptical, because if the VACUUM wasn't enough on the standby, it should be
the same on the primary.

Actually, there are two things that really bug me:
* why the difference between primary and both standbys?
* why now? (it worked great before this weekend, and the only thing I know
happened before is a batch delete on sunday... which may be a good-enough
reason for things to get screwed, but once again, why only both standbys?)

Julien Rouhaud also told me about killtuples, but I have no idea what they
are. I suppose this is different from dead tuples. Anyway, if you can
enlighten me, I'll be happy :)


-- 
Guillaume.


Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread Klint Gore

From: Bruce Momjian  Sent: Wednesday, 12 October 2022 1:30 AM

>On Tue, Oct 11, 2022 at 09:59:43AM -0400, Tom Lane wrote:
>> David Rowley  writes:
>> > It feels like something is a bit lacking in our cost model here. I'm
>> > just not sure what that is.
>>
>> The example you show is the same old problem that we've understood for
>> decades: for cost-estimation purposes, we assume that matching rows
>> are more or less evenly distributed in the table.  Their actual
>> location doesn't matter that much if you're scanning the whole table;
>> but if you're hoping that a LIMIT will be able to stop after scanning
>> just a few rows, it does matter.
>
> We do have a correlation statistics value for each column but I am
> unclear if that would help here.

This might give someone an idea -  the best query I came up with was:

   explain analyze select distinct 2 from tbl where (fld = 230) limit 1;


Limit  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)
  ->  Unique  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.039 rows=1 loops=1)
        ->  Index Only Scan using idx on tbl  (cost=0.56..28349.28 rows=995241 width=4) (actual time=0.038..0.038 rows=1 loops=1)
              Index Cond: (fld = 230)
              Heap Fetches: 0
Planning Time: 0.066 ms
Execution Time: 0.047 ms


With the distinct and the limit, the planner somehow knows to push either the 
distinct or the limit into the index only scan, so the unique for the distinct 
only had 1 row and the outer limit only had 1 row.  Without the limit, the 
distinct still does the index only scan but has to do the unique on the million 
rows, and execution time goes to about 100ms.


fld is mostly ordered - it's a serial primary key in another table.  The 
cardinality of the 131 distinct values follows an exponential distribution: of 
the 20m rows, the fld values ordered by count are 8m, 5m, 2m, 1m, 1m, down to 
about 10k.  The index is a btree with a stats target of 1000.  The table is 
analyzed and vacuum frozen.  There is a "create statistics" on this table for 
an n:1 relationship between another field and this one.


Without the distinct, choosing a different value with a lower number of rows 
changed the plan to an index only scan with limit somewhere between 3.7% and 
4.7% of the table.  With a brin index on a similar size/distribution table that 
is in fld order, that changed to somewhere between 0.6% and 0.7%.





Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-11 Thread David Rowley
On Wed, 12 Oct 2022 at 13:06, Klint Gore  wrote:
> Limit  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.040 rows=1 
> loops=1)
>   ->  Unique  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.039 
> rows=1 loops=1)
> ->  Index Only Scan using idx on tbl  (cost=0.56..28349.28 
> rows=995241 width=4) (actual time=0.038..0.038 rows=1 loops=1)
>   Index Cond: (fld = 230)
>   Heap Fetches: 0
> Planning Time: 0.066 ms
> Execution Time: 0.047 ms
>
> With the distinct and the limit, the planner somehow knows to push the either 
> the distinct or the limit into the index only scan so the unique for distinct 
> only had 1 row and the outer limit only had 1 row.  Without the limit, the 
> distinct still does the index only scan but has to do the unique on the 
> million rows and execution time goes to about 100ms.

I think that would be very simple to fix. I believe I've done that
locally by just detecting whether needed_pathkeys == NIL in
create_final_distinct_paths().

i.e.

-   if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
+   if (needed_pathkeys == NIL)
+   {
+       Node *limitCount = makeConst(INT8OID, -1, InvalidOid,
+                                    sizeof(int64),
+                                    Int64GetDatum(1), false,
+                                    FLOAT8PASSBYVAL);
+
+       add_path(distinct_rel, (Path *)
+                create_limit_path(root, distinct_rel, path, NULL,
+                                  limitCount, LIMIT_OPTION_COUNT, 0,
+                                  1));
+   }
+   else if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))

That just adds a Limit Path instead of the Unique Path. i.e:

postgres=# explain (analyze, costs off) select distinct a from t1 where a = 0;
                              QUERY PLAN
----------------------------------------------------------------------
 Limit (actual time=0.074..0.075 rows=1 loops=1)
   ->  Index Only Scan using t1_a_idx on t1 (actual time=0.072..0.073
rows=1 loops=1)
 Index Cond: (a = 0)
 Heap Fetches: 1
 Planning Time: 0.146 ms
 Execution Time: 0.100 ms
(6 rows)

However, I might be wrong about that. I've not given it too much thought.

David




Re: Weird planner issue on a standby

2022-10-11 Thread Julien Rouhaud
On Tue, Oct 11, 2022 at 07:42:55PM +0200, Guillaume Lelarge wrote:
> Le mar. 11 oct. 2022 à 18:27, Alvaro Herrera  a
> écrit :
> 
> > On 2022-Oct-11, Tom Lane wrote:
> >
> > > Are there any tables in this query where extremal values of the join
> > > key are likely to be in recently-added or recently-dead rows?  Does
> > > VACUUM'ing on the primary help?
> >
> > I remember having an hypothesis, upon getting a report of this exact
> > problem on a customer system once, that it could be due to killtuple not
> > propagating to standbys except by FPIs.  I do not remember if we proved
> > that true or not.  I do not remember observing that tables were being
> > read, however.
> >
> >
> Thanks for your answers.
> 
> The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'. I have no
> idea on the actual use of c3 but rows with c3 at '2022-10-10' (which is
> yesterday) is much probably recently-added. I can ask my customer if you
> want but this looks like a pretty safe bet.
> 
> On the VACUUM question, I didn't say, but we're kind of wondering if it was
> lacking a recent-enough VACUUM. So they're doing a VACUUM tonight on the
> database (and especially on the 1.6TB table which is part of the query).
> I'm kind of skeptical because if the VACUUM wasn't enough on the standby,
> it should be the same on the primary.
> 
> Actually, there are two things that really bug me:
> * why the difference between primary and both standbys?
> * why now? (it worked great before this weekend, and the only thing I know
> happened before is a batch delete on sunday... which may be a good-enough
> reason for things to get screwed, but once again, why only both standbys?)
> 
> Julien Rouhaud also told me about killtuples, but I have no idea what they
> are. I suppose this is different from dead tuples. Anyway, if you can
> enlighten me, I'll be happy :)

That's an optimisation where an index scan can mark an index entry as dead
(LP_DEAD) if it tries to fetch some data from the heap that turns out to be all
dead, so further scans won't have to check it again (you can grep for
kill_prior_tuple in the source for more details).  As that's a hint bit, it may
not be replicated unless you enable wal_log_hints or data_checksums (or it gets
written as an FPI), which could explain the discrepancy between the primary
(after a first slow index scan) and the standby nodes.
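To see which situation a given cluster is in, the two settings mentioned above can be checked directly (a quick sketch):

```sql
-- Hint-bit changes (including LP_DEAD markings) are WAL-logged, and
-- hence reach the standbys, when either of these reports "on".
SHOW data_checksums;   -- read-only, set at initdb time
SHOW wal_log_hints;    -- set in postgresql.conf
```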

But since your customer recreated their standbys from scratch *after* that
delete, all the nodes should have those hint bits set (Guillaume confirmed
off-list that they used a fresh BASE_BACKUP).  Note that Guillaume also
confirmed off-list that the customer has checksums enabled, which means that
MarkBufferDirtyHint() should be guaranteed to mark the buffers as dirty, so I'm
out of ideas to explain the different behavior on standbys.




Re: Weird planner issue on a standby

2022-10-11 Thread Ron

On 10/11/22 22:35, Julien Rouhaud wrote:

On Tue, Oct 11, 2022 at 07:42:55PM +0200, Guillaume Lelarge wrote:

Le mar. 11 oct. 2022 à 18:27, Alvaro Herrera  a
écrit :


On 2022-Oct-11, Tom Lane wrote:


Are there any tables in this query where extremal values of the join
key are likely to be in recently-added or recently-dead rows?  Does
VACUUM'ing on the primary help?

I remember having an hypothesis, upon getting a report of this exact
problem on a customer system once, that it could be due to killtuple not
propagating to standbys except by FPIs.  I do not remember if we proved
that true or not.  I do not remember observing that tables were being
read, however.



Thanks for your answers.

The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'. I have no
idea on the actual use of c3 but rows with c3 at '2022-10-10' (which is
yesterday) is much probably recently-added. I can ask my customer if you
want but this looks like a pretty safe bet.

On the VACUUM question, I didn't say, but we're kind of wondering if it was
lacking a recent-enough VACUUM. So they're doing a VACUUM tonight on the
database (and especially on the 1.6TB table which is part of the query).
I'm kind of skeptical because if the VACUUM wasn't enough on the standby,
it should be the same on the primary.

Actually, there are two things that really bug me:
* why the difference between primary and both standbys?
* why now? (it worked great before this weekend, and the only thing I know
happened before is a batch delete on sunday... which may be a good-enough
reason for things to get screwed, but once again, why only both standbys?)

Julien Rouhaud also told me about killtuples, but I have no idea what they
are. I suppose this is different from dead tuples. Anyway, if you can
enlighten me, I'll be happy :)

That's an optimisation where an index scan can mark an index entry as dead
(LP_DEAD) if if tries to fetch some data from the heap that turns out to be all
dead, so further scans won't have to check again (you can grep kill_prior_tuple
in the source for more details).  As that's a hint bit, it may not be
replicated unless you enable wal_log_hints or data_checksums (or write it as a
FPI indeed), which could explain discrepancy between primary (after a first
slow index scan) and standby nodes.

But since your customer recreated their standbys from scratch *after* that
delete, all the nodes should have those hint bits set (Guillaume confirmed
off-list that they used a fresh BASE_BACKUP).  Note that Guillaume also
confirmed off-list that the customer has checksums enabled, which means that
MarkBufferDirtyHint() should be guaranteed to mark the buffers as dirty, so I'm
out of ideas to explain the different behavior on standbys.


Would EXPLAIN (VERBOSE, COSTS, FORMAT JSON) run on both nodes help show any 
differences?


--
Angular momentum makes the world go 'round.




Re: Weird planner issue on a standby

2022-10-11 Thread Tom Lane
Julien Rouhaud  writes:
> But since your customer recreated their standbys from scratch *after* that
> delete, all the nodes should have those hint bits set (Guillaume confirmed
> off-list that they used a fresh BASE_BACKUP).  Note that Guillaume also
> confirmed off-list that the customer has checksums enabled, which means that
> MarkBufferDirtyHint() should be guaranteed to mark the buffers as dirty, so 
> I'm
> out of ideas to explain the different behavior on standbys.

Do we propagate visibility-map bits to standbys?

regards, tom lane




Re: Weird planner issue on a standby

2022-10-11 Thread Peter Geoghegan
On Tue, Oct 11, 2022 at 9:27 AM Alvaro Herrera  wrote:
> I remember having an hypothesis, upon getting a report of this exact
> problem on a customer system once, that it could be due to killtuple not
> propagating to standbys except by FPIs.  I do not remember if we proved
> that true or not.  I do not remember observing that tables were being
> read, however.

That's true, but it doesn't matter whether or not there are LP_DEAD
bits set on the standby, since in any case they cannot be trusted when
in Hot Standby mode. IndexScanDescData.ignore_killed_tuples will be
set to false on the standby.

-- 
Peter Geoghegan




Re: Weird planner issue on a standby

2022-10-11 Thread Peter Geoghegan
On Tue, Oct 11, 2022 at 10:04 PM Tom Lane  wrote:
> Do we propagate visibility-map bits to standbys?

Yes.

-- 
Peter Geoghegan




Re: Weird planner issue on a standby

2022-10-11 Thread Guillaume Lelarge
Le mer. 12 oct. 2022 à 06:08, Ron  a écrit :

> On 10/11/22 22:35, Julien Rouhaud wrote:
> > On Tue, Oct 11, 2022 at 07:42:55PM +0200, Guillaume Lelarge wrote:
> >> Le mar. 11 oct. 2022 à 18:27, Alvaro Herrera 
> a
> >> écrit :
> >>
> >>> On 2022-Oct-11, Tom Lane wrote:
> >>>
>  Are there any tables in this query where extremal values of the join
>  key are likely to be in recently-added or recently-dead rows?  Does
>  VACUUM'ing on the primary help?
> >>> I remember having an hypothesis, upon getting a report of this exact
> >>> problem on a customer system once, that it could be due to killtuple
> not
> >>> propagating to standbys except by FPIs.  I do not remember if we proved
> >>> that true or not.  I do not remember observing that tables were being
> >>> read, however.
> >>>
> >>>
> >> Thanks for your answers.
> >>
> >> The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'. I
> have no
> >> idea on the actual use of c3 but rows with c3 at '2022-10-10' (which is
> >> yesterday) is much probably recently-added. I can ask my customer if you
> >> want but this looks like a pretty safe bet.
> >>
> >> On the VACUUM question, I didn't say, but we're kind of wondering if it
> was
> >> lacking a recent-enough VACUUM. So they're doing a VACUUM tonight on the
> >> database (and especially on the 1.6TB table which is part of the query).
> >> I'm kind of skeptical because if the VACUUM wasn't enough on the
> standby,
> >> it should be the same on the primary.
> >>
> >> Actually, there are two things that really bug me:
> >> * why the difference between primary and both standbys?
> >> * why now? (it worked great before this weekend, and the only thing I
> know
> >> happened before is a batch delete on sunday... which may be a
> good-enough
> >> reason for things to get screwed, but once again, why only both
> standbys?)
> >>
> >> Julien Rouhaud also told me about killtuples, but I have no idea what
> they
> >> are. I suppose this is different from dead tuples. Anyway, if you can
> >> enlighten me, I'll be happy :)
> > That's an optimisation where an index scan can mark an index entry as
> dead
> > (LP_DEAD) if if tries to fetch some data from the heap that turns out to
> be all
> > dead, so further scans won't have to check again (you can grep
> kill_prior_tuple
> > in the source for more details).  As that's a hint bit, it may not be
> > replicated unless you enable wal_log_hints or data_checksums (or write
> it as a
> > FPI indeed), which could explain discrepancy between primary (after a
> first
> > slow index scan) and standby nodes.
> >
> > But since your customer recreated their standbys from scratch *after*
> that
> > delete, all the nodes should have those hint bits set (Guillaume
> confirmed
> > off-list that they used a fresh BASE_BACKUP).  Note that Guillaume also
> > confirmed off-list that the customer has checksums enabled, which means
> that
> > MarkBufferDirtyHint() should be guaranteed to mark the buffers as dirty,
> so I'm
> > out of ideas to explain the different behavior on standbys.
>
> Would EXPLAIN (VERBOSE, COSTS, FORMAT JSON) run on both nodes help show
> any
> differences?
>
>
No differences.


-- 
Guillaume.


Re: Weird planner issue on a standby

2022-10-11 Thread Guillaume Lelarge
Le mar. 11 oct. 2022 à 19:42, Guillaume Lelarge  a
écrit :

> Le mar. 11 oct. 2022 à 18:27, Alvaro Herrera  a
> écrit :
>
>> On 2022-Oct-11, Tom Lane wrote:
>>
>> > Are there any tables in this query where extremal values of the join
>> > key are likely to be in recently-added or recently-dead rows?  Does
>> > VACUUM'ing on the primary help?
>>
>> I remember having an hypothesis, upon getting a report of this exact
>> problem on a customer system once, that it could be due to killtuple not
>> propagating to standbys except by FPIs.  I do not remember if we proved
>> that true or not.  I do not remember observing that tables were being
>> read, however.
>>
>>
> Thanks for your answers.
>
> The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'. I have
> no idea on the actual use of c3 but rows with c3 at '2022-10-10' (which is
> yesterday) is much probably recently-added. I can ask my customer if you
> want but this looks like a pretty safe bet.
>
> On the VACUUM question, I didn't say, but we're kind of wondering if it
> was lacking a recent-enough VACUUM. So they're doing a VACUUM tonight on
> the database (and especially on the 1.6TB table which is part of the
> query). I'm kind of skeptical because if the VACUUM wasn't enough on the
> standby, it should be the same on the primary.
>
>
It appears that I was wrong. I just got an email from my customer saying
they got their performance back after a VACUUM on the two main tables of
the query. I'll have them on the phone in about an hour and will probably
know more then. I'm still wondering why it was an issue on the standby and
not on the primary: VACUUM cleans up tables and indexes, and that activity
goes through WAL, doesn't it?
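A sketch of the kind of VACUUM that was run (the actual table names are the customer's; the ones below are placeholders):

```sql
-- VACUUM prunes dead heap tuples, cleans up index entries, and sets
-- visibility-map bits; all of that cleanup is WAL-logged, so it is
-- replayed on the standbys as well.
VACUUM (VERBOSE, ANALYZE) big_table_1;
VACUUM (VERBOSE, ANALYZE) big_table_2;
```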


-- 
Guillaume.