Fwd: different execution time for the same query (and same DB status)

2021-03-07 Thread Francesco De Angelis
Hi all,
I would appreciate if somebody could help me understanding why the query
described below takes very different execution times to complete, almost
completely randomly.

I have two very "simple" tables, A and B:

CREATE TABLE A (
a1 varchar(10) NULL,
a2 varchar(10) NULL,
v int4 NULL
);
CREATE TABLE B (
a1 varchar(10) NULL,
a2 varchar(10) NULL,
v int4 NULL
);

I load A and B with some random numbers (10^7 records each one) and, when
the loading is complete, I run a query that "essentially" counts the number
of the records obtained through a sequence of JOINs, which has the
following form:
SELECT count(*) from (... A join B )

The query does not write/update any value, it is simply a SELECT applied
over the aforementioned tables (plus some unions and intersections).

The problem is the following: the query can take between 20 seconds and 4
minutes to complete. Most of times, when I run the query for the first time
after the server initialisation, it takes 20 seconds; but if I re-run it
again (without changing anything) right after the first execution, the
probability to take more than 4 minutes is very high.
My impression is that the "status" of the internal structures of the DBMS
is somehow affected by the first execution, but I cannot figure out neither
what nor how I can fix it.

To give some additional information, I can state the following facts:
- There are no other processes reading or writing the tables on the schema
and the status of A and B is constant.
- During the loading of A and B (process that takes more or less one minute
to complete),  I get the following messages:
LOG:  checkpoints are occurring too frequently (29 seconds apart).
HINT:  Consider increasing the configuration parameter "max_wal_size".
LOG:  checkpoints are occurring too frequently (18 seconds apart)
HINT:  Consider increasing the configuration parameter "max_wal_size".
- I am running my query through DBeaver and PostgreSQL runs in a Docker
container.

The version that I am using is the following: 13.2 (Debian
13.2-1.pgdg100+1).
The only configuration parameters I have changed are the following:
- force_parallel_mode = on
- max_parallel_workers_per_gather = 64
- parallel_setup_cost = 1
- parallel_tuple_cost = 0.001
- work_mem = '800MB'

I hope somebody could help me, as I really don't know why I am experiencing
such a strange behaviour.

Thanks a lot.

Best regards,
Francesco


RE: different execution time for the same query (and same DB status)

2021-03-07 Thread Michel SALAIS
Hi,

 

Have you tried to use EXPLAIN ANALYZE at least?

It could give valuable information about why this is occurring.

 

Michel SALAIS

De : Francesco De Angelis  
Envoyé : samedi 6 mars 2021 22:40
À : [email protected]
Objet : Fwd: different execution time for the same query (and same DB status)

 

 

Hi all,

I would appreciate if somebody could help me understanding why the query 
described below takes very different execution times to complete, almost 
completely randomly.

 

I have two very "simple" tables, A and B:

 

CREATE TABLE A (
a1 varchar(10) NULL,
a2 varchar(10) NULL,
v int4 NULL
);
CREATE TABLE B (
a1 varchar(10) NULL,
a2 varchar(10) NULL,
v int4 NULL
);

 

I load A and B with some random numbers (10^7 records each one) and, when the 
loading is complete, I run a query that "essentially" counts the number of the 
records obtained through a sequence of JOINs, which has the following form:

SELECT count(*) from (... A join B )

 

The query does not write/update any value, it is simply a SELECT applied over 
the aforementioned tables (plus some unions and intersections).

 

The problem is the following: the query can take between 20 seconds and 4 
minutes to complete. Most of times, when I run the query for the first time 
after the server initialisation, it takes 20 seconds; but if I re-run it again 
(without changing anything) right after the first execution, the probability to 
take more than 4 minutes is very high.

My impression is that the "status" of the internal structures of the DBMS is 
somehow affected by the first execution, but I cannot figure out neither what 
nor how I can fix it.

 

To give some additional information, I can state the following facts:

- There are no other processes reading or writing the tables on the schema and 
the status of A and B is constant.

- During the loading of A and B (process that takes more or less one minute to 
complete),  I get the following messages:
LOG:  checkpoints are occurring too frequently (29 seconds apart).

HINT:  Consider increasing the configuration parameter "max_wal_size".

LOG:  checkpoints are occurring too frequently (18 seconds apart)

HINT:  Consider increasing the configuration parameter "max_wal_size".

- I am running my query through DBeaver and PostgreSQL runs in a Docker 
container.

 

The version that I am using is the following: 13.2 (Debian 13.2-1.pgdg100+1).

The only configuration parameters I have changed are the following:

- force_parallel_mode = on

- max_parallel_workers_per_gather = 64

- parallel_setup_cost = 1

- parallel_tuple_cost = 0.001

- work_mem = '800MB'

 

I hope somebody could help me, as I really don't know why I am experiencing 
such a strange behaviour. 

 

Thanks a lot.

 

Best regards,

Francesco

 

 

 

 

 



Re: different execution time for the same query (and same DB status)

2021-03-07 Thread Julien Rouhaud
On Sun, Mar 07, 2021 at 03:51:05PM +0100, Michel SALAIS wrote:
> 
> Have you tried to use EXPLAIN ANALYZE at least?
> 
> It could give valuable information about why this is occurring.

+1, and more generally please follow
https://wiki.postgresql.org/wiki/Slow_Query_Questions.




Re: different execution time for the same query (and same DB status)

2021-03-07 Thread Tom Lane
Julien Rouhaud  writes:
> +1, and more generally please follow
> https://wiki.postgresql.org/wiki/Slow_Query_Questions.

Yeah.  FWIW, the most likely explanation for the change in behavior is
that by the time of the second execution, auto-analyze has managed to
update the table's statistics, and that for some reason that's changing
the plan for the worse.  But then the next question is why and what
can be done about that.  See the wiki entry for info that would be
helpful in diagnosing this.

regards, tom lane