Re: different execution time for the same query (and same DB status)

2021-03-10 Thread Francesco De Angelis
Hello,
many thanks to everyone who replied.
I am back with the information requested in
https://wiki.postgresql.org/wiki/Slow_Query_Questions.
Here are the results of the EXPLAIN commands:
1) First execution: https://explain.depesz.com/s/X2as
2) Second execution (right after the first one):
https://explain.depesz.com/s/gHrb

Tables A and B are each loaded with roughly 7.5 million records (see the
table metadata below), whereas table C contains 1,600 records.
Both queries were executed with work_mem=800MB (the other parameters are
reported in Section 7).
With that value I also noticed the following phenomenon: in addition to the
variable execution times (as previously stated, between 20 seconds and 4
minutes), the query sometimes crashes, returning the following error: SQL
Error [57P03]: FATAL: the database system is in recovery mode.
When this happens, the log file contains many messages like these:
...
UTC [1] LOG:  terminating any other active server processes
UTC [115] WARNING:  terminating connection because of crash of another
server process
UTC [115] DETAIL:  The postmaster has commanded this server process to roll
back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
...
FATAL:  the database system is in recovery mode
LOG:  redo starts at 0/E8BFC8D0
LOG:  invalid record length at 0/E8BFC9B8: wanted 24, got 0
LOG:  redo done at 0/E8BFC980

If I re-run the query with work_mem=400MB, the execution never crashes (at
least in all the tests I carried out) and response times are fairly stable
(always about 4 minutes, plus or minus a small delta).
So I started wondering whether the variable response times and the crashes
are correlated, both caused by a work_mem value that is too high.
Anyway, in the following sections I tried to report all the information
described in the wiki.
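As a minimal sketch of that experiment (the value is the one discussed above; everything else is a placeholder), work_mem can be lowered for a single session without touching the server-wide setting:

```sql
-- Lower work_mem for the current session only; other sessions keep
-- the server-wide default.
SET work_mem = '400MB';

-- ... re-run the query under test here ...

-- Restore this session to the server-wide default.
RESET work_mem;
```

A per-session SET makes it easy to compare the 400MB and 800MB behaviour side by side without editing postgresql.conf.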


1. Version


PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 8.3.0-6) 8.3.0, 64-bit


2. Tables


CREATE TABLE A (
a1 varchar(100),
a2 varchar(100),
v int4 primary key
);


CREATE TABLE B (
a1 varchar(100),
a2 varchar(100),
v int4 primary key
);

CREATE TABLE C (
la1 varchar(100),
la2 varchar(100),
va1 varchar(100),
va2 varchar(100),
res varchar (100),
c text
);

 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | a    | table | admin
 public | b    | table | admin
 public | c    | table | admin
(3 rows)

                         List of relations
 Schema | Name | Type  | Owner | Persistence |  Size  | Description
--------+------+-------+-------+-------------+--------+-------------
 public | a    | table | admin | permanent   | 317 MB |
 public | b    | table | admin | permanent   | 317 MB |
 public | c    | table | admin | permanent   | 152 kB |
(3 rows)


3. Indices


 tablename |  indexname  | indexdef
-----------+-------------+---------------------------------------------------------
 a         | a_pkey      | CREATE UNIQUE INDEX a_pkey ON public.a USING btree (v)
 a         | hash_pka    | CREATE INDEX hash_pka ON public.a USING hash (v)
 b         | b_pkey      | CREATE UNIQUE INDEX b_pkey ON public.b USING btree (v)
 b         | hash_pkb    | CREATE INDEX hash_pkb ON public.b USING hash (v)
 c         | hash_class0 | CREATE INDEX hash_class0 ON public.c USING hash (la1)
 c         | hash_class1 | CREATE INDEX hash_class1 ON public.c USING hash (la2)
 c         | hash_class2 | CREATE INDEX hash_class2 ON public.c USING hash (va1)
 c         | hash_class3 | CREATE INDEX hash_class3 ON public.c USING hash (va2)
 c         | hash_pkc    | CREATE INDEX hash_pkc ON public.c USING hash (c)
(9 rows)
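For reference, a listing like the one above can be produced from the pg_indexes system view (the schema name is an assumption based on the relation listing earlier):

```sql
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
```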


4. Table metadata

 relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
---------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
 b       |    40541 | 7499970.0 |         40541 | r       |        3 | false          | NULL       |     332226560
 a       |    40541 | 7499970.0 |         40541 | r       |        3 | false          | NULL       |     332226560
 c       |       14 |    1600.0 |            14 | r       |        6 | false          | NULL       |        155648
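A query like the following (a sketch along the lines of the wiki's suggestion; pg_table_size takes the table's oid) reproduces the metadata above from pg_class:

```sql
SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
       relhassubclass, reloptions, pg_table_size(oid)
FROM pg_class
WHERE relname IN ('a', 'b', 'c');
```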


5. Hardware and its performance

MacBook Pro (15-inch, 2018),
2.2 GHz Intel Core i7,
16 GB 2400 MHz DDR4

Results from bonnie++:
root@c1a8bd8320a0:/# bonnie++ -f -n0 -x4 -d /var/lib/postgresql/ -u root
Using uid:0, gid:0.
format_ve

Re: Fwd: different execution time for the same query (and same DB status)

2021-03-10 Thread Justin Pryzby
On Sat, Mar 06, 2021 at 10:40:00PM +0100, Francesco De Angelis wrote:
> The problem is the following: the query can take between 20 seconds and 4
> minutes to complete. Most of times, when I run the query for the first time
> after the server initialisation, it takes 20 seconds; but if I re-run it
> again (without changing anything) right after the first execution, the
> probability to take more than 4 minutes is very high.

On Tue, Mar 09, 2021 at 11:58:05PM +0100, Francesco De Angelis wrote:
> With such a value, I noticed also the following phenomenon: in addition to
> variable execution times (as previusly stated, the range is between 20
> seconds and 4 minutes),

You said it takes between 20 s and 4 min (240 s), but both EXPLAIN ANALYZE
results show ~1300 s.

EXPLAIN ANALYZE can be slower than the query itself, due to timing overhead.
Is that what's happening here?  You could try EXPLAIN (ANALYZE, TIMING OFF,
BUFFERS).
You should send one result for the "20 sec" case and one for the "4 min" case,
to compare.
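Concretely, the suggested invocation would look like this (the SELECT is a placeholder; the actual query is not shown in the thread):

```sql
EXPLAIN (ANALYZE, TIMING OFF, BUFFERS)
SELECT ...;  -- the original slow query goes here
```

With TIMING OFF, per-node timestamps are skipped, so the measured runtime stays close to that of the plain query.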

I assume the crash is a result of OOM - you could find the result in dmesg
output ("Out of memory: Killed process") or the postgres logfile will say
"terminated by signal 9: Killed".  It's important to avoid setting work_mem so
high that the process is killed and has to go into recovery mode.

-- 
Justin




Re: Fwd: different execution time for the same query (and same DB status)

2021-03-10 Thread Michael Lewis
I would increase shared_buffers to 1GB or more. Also, it would be very
interesting to see these queries executed with JIT off.
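A minimal sketch of those two suggestions (shared_buffers can only be set in postgresql.conf and needs a server restart, while JIT can be toggled per session):

```sql
-- In postgresql.conf, then restart the server:
--   shared_buffers = 1GB

-- Disable JIT for the current session before re-running the query:
SET jit = off;
```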


Re: Fwd: different execution time for the same query (and same DB status)

2021-03-10 Thread Francesco De Angelis
Hello,
yes, exactly: in the previous analyses, as mentioned in the wiki, I ran
EXPLAIN (ANALYZE, BUFFERS) query, which took much longer to complete
(around 30 minutes), as shown in https://explain.depesz.com/s/gHrb and
https://explain.depesz.com/s/X2as.
As you suggested, I redid the tests with EXPLAIN (ANALYZE, TIMING OFF,
BUFFERS) query, and these are the results:
- First execution: https://explain.depesz.com/s/ynAv
- Second execution: https://explain.depesz.com/s/z1eb
Now they are closely aligned with the execution time of the query itself (a
few seconds more to complete), and the difference between the first and
second executions is visible.
Also, from what I can see, the plans are different...

On Wed, Mar 10, 2021 at 09:27, Justin Pryzby wrote:

> [snip]