Re: different execution time for the same query (and same DB status)
Hello,

many thanks to everyone who replied. I am back with the information requested in https://wiki.postgresql.org/wiki/Slow_Query_Questions.

Here are the results of the EXPLAIN commands:

1) First execution: https://explain.depesz.com/s/X2as
2) Second execution (right after the first one): https://explain.depesz.com/s/gHrb

Tables A and B are both loaded with 750 records, whereas table C contains 1600 records. Both queries were executed with work_mem=800MB (the other parameters are reported in Section 7). With such a value I also noticed the following phenomenon: in addition to the variable execution times (as previously stated, the range is between 20 seconds and 4 minutes), sometimes the query crashes, returning the following error:

SQL Error [57P03]: FATAL: the database system is in recovery mode

When this happens, I find many messages like these in the log file:

...
UTC [1] LOG: terminating any other active server processes
UTC [115] WARNING: terminating connection because of crash of another server process
UTC [115] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
...
FATAL: the database system is in recovery mode
LOG: redo starts at 0/E8BFC8D0
LOG: invalid record length at 0/E8BFC9B8: wanted 24, got 0
LOG: redo done at 0/E8BFC980

If I re-run the query with work_mem=400MB, the execution never crashes (at least in all the tests I carried out) and the response times are pretty stable (always 4 minutes, plus or minus a small delta). So I started wondering whether the variable response times and the crashes are correlated, both due to a work_mem value that is too high.

Anyway, in the following sections I have tried to report all the information described in the wiki.

1.
Version

PostgreSQL 13.2 (Debian 13.2-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

2. Tables

CREATE TABLE A (
    a1 varchar(100),
    a2 varchar(100),
    v  int4 primary key
);

CREATE TABLE B (
    a1 varchar(100),
    a2 varchar(100),
    v  int4 primary key
);

CREATE TABLE C (
    la1 varchar(100),
    la2 varchar(100),
    va1 varchar(100),
    va2 varchar(100),
    res varchar(100),
    c   text
);

         List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | a    | table | admin
 public | b    | table | admin
 public | c    | table | admin
(3 rows)

                          List of relations
 Schema | Name | Type  | Owner | Persistence | Size   | Description
--------+------+-------+-------+-------------+--------+-------------
 public | a    | table | admin | permanent   | 317 MB |
(1 row)

                          List of relations
 Schema | Name | Type  | Owner | Persistence | Size   | Description
--------+------+-------+-------+-------------+--------+-------------
 public | b    | table | admin | permanent   | 317 MB |
(1 row)

                          List of relations
 Schema | Name | Type  | Owner | Persistence | Size   | Description
--------+------+-------+-------+-------------+--------+-------------
 public | c    | table | admin | permanent   | 152 kB |
(1 row)

3. Indices

 tablename |  indexname  |                        indexdef
-----------+-------------+--------------------------------------------------------
 a         | a_pkey      | CREATE UNIQUE INDEX a_pkey ON public.a USING btree (v)
 a         | hash_pka    | CREATE INDEX hash_pka ON public.a USING hash (v)
 b         | b_pkey      | CREATE UNIQUE INDEX b_pkey ON public.b USING btree (v)
 b         | hash_pkb    | CREATE INDEX hash_pkb ON public.b USING hash (v)
 c         | hash_class0 | CREATE INDEX hash_class0 ON public.c USING hash (la1)
 c         | hash_class1 | CREATE INDEX hash_class1 ON public.c USING hash (la2)
 c         | hash_class2 | CREATE INDEX hash_class2 ON public.c USING hash (va1)
 c         | hash_class3 | CREATE INDEX hash_class3 ON public.c USING hash (va2)
 c         | hash_pkc    | CREATE INDEX hash_pkc ON public.c USING hash (c)
(9 rows)

4. Table metadata

 relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
---------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
 b       |    40541 | 7499970.0 |         40541 | r       |        3 | false          | NULL       |     332226560
 a       |    40541 | 7499970.0 |         40541 | r       |        3 | false          | NULL       |     332226560
 c       |       14 |    1600.0 |            14 | r       |        6 | false          | NULL       |        155648
(3 rows)

5.
Hardware and its performance

MacBook Pro (15-inch, 2018), 2.2 GHz Intel Core i7, 16 GB 2400 MHz DDR4

Results from bonnie++:

root@c1a8bd8320a0:/# bonnie++ -f -n0 -x4 -d /var/lib/postgresql/ -u root
Using uid:0, gid:0.
format_ve
Re: Fwd: different execution time for the same query (and same DB status)
On Sat, Mar 06, 2021 at 10:40:00PM +0100, Francesco De Angelis wrote:
> The problem is the following: the query can take between 20 seconds and 4
> minutes to complete. Most of times, when I run the query for the first time
> after the server initialisation, it takes 20 seconds; but if I re-run it
> again (without changing anything) right after the first execution, the
> probability to take more than 4 minutes is very high.
On Tue, Mar 09, 2021 at 11:58:05PM +0100, Francesco De Angelis wrote:
> With such a value, I noticed also the following phenomenon: in addition to
> variable execution times (as previously stated, the range is between 20
> seconds and 4 minutes),
You said it takes between 20s and 4min (240s), but both EXPLAIN ANALYZE
results show ~1300s.
explain analyze can be slower than the query, due to timing overhead.
Is that what's happening here? You could try explain(analyze,timing
off,buffers).
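Spelled out as a complete command, that suggestion would look like this (a sketch; the SELECT is a stand-in for the actual query under test):

```sql
EXPLAIN (ANALYZE, TIMING OFF, BUFFERS)
SELECT ...;  -- the query under test
```

TIMING OFF keeps the per-node row counts and buffer statistics but skips the per-node clock calls, so the measured runtime stays close to that of the bare query.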
You should send one result for the "20sec" case and one for the "4min" case,
to compare.
I assume the crash is a result of OOM - you could find the result in dmesg
output ("Out of memory: Killed process") or the postgres logfile will say
"terminated by signal 9: Killed". It's important to avoid setting work_mem so
high that the process is killed and has to go into recovery mode.
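To check the OOM hypothesis, one can grep for the signatures mentioned above; a minimal sketch, in which a sample log line stands in for the real postgres logfile (whose path varies per installation):

```shell
# Simulate one line of a postgres logfile (stand-in for the real file).
printf 'LOG: server process (PID 116) was terminated by signal 9: Killed\n' \
    > /tmp/pg_sample.log

# Signature of an OOM kill in the postgres log:
grep 'terminated by signal 9' /tmp/pg_sample.log

# Corresponding kernel-side evidence (needs a real host, so left commented):
# dmesg | grep -i 'Out of memory: Killed process'
```

If neither signature is present, the crash had some other cause and the work_mem theory would need re-examination.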
--
Justin
Re: Fwd: different execution time for the same query (and same DB status)
I would increase shared_buffers to 1GB or more. Also, it would be very interesting to see these queries executed with JIT off.
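Both suggestions can be tried with little setup (a sketch: jit is a session-level setting, while shared_buffers only takes effect after a server restart):

```sql
-- JIT can be toggled for the current session only; compare timings
-- with it on and off:
SET jit = off;
-- ... re-run the query / EXPLAIN here ...

-- shared_buffers cannot be changed per session; either set
-- shared_buffers = 1GB in postgresql.conf, or:
ALTER SYSTEM SET shared_buffers = '1GB';
-- ...then restart the server for the new value to take effect.
```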
Re: Fwd: different execution time for the same query (and same DB status)
Hello,
yes, exactly: in the previous analyses, as mentioned in the wiki, I ran
EXPLAIN (ANALYZE, BUFFERS) query, which took much longer to complete
(around 30 minutes), as shown in https://explain.depesz.com/s/gHrb and
https://explain.depesz.com/s/X2as .
As you suggested, I ran the new tests with EXPLAIN (ANALYZE, TIMING OFF,
BUFFERS) query, and these are the results:
- First execution: https://explain.depesz.com/s/ynAv
- Second execution: https://explain.depesz.com/s/z1eb
Now they are pretty well aligned with the execution time of the query (a few
seconds more to complete), and the difference between the first and the
second execution is visible.
Also, from what I can see, the plans are different...
On Wed, Mar 10, 2021 at 09:27, Justin Pryzby wrote:
> On Sat, Mar 06, 2021 at 10:40:00PM +0100, Francesco De Angelis wrote:
> > The problem is the following: the query can take between 20 seconds and 4
> > minutes to complete. Most of times, when I run the query for the first
> time
> > after the server initialisation, it takes 20 seconds; but if I re-run it
> > again (without changing anything) right after the first execution, the
> > probability to take more than 4 minutes is very high.
>
> On Tue, Mar 09, 2021 at 11:58:05PM +0100, Francesco De Angelis wrote:
> > With such a value, I noticed also the following phenomenon: in addition
> to
> > variable execution times (as previously stated, the range is between 20
> > seconds and 4 minutes),
>
> You said it takes between 20s and 4min (240s), but both the explain analyze
> show ~1300s.
>
> explain analyze can be slower than the query, due to timing overhead.
> Is that what's happening here? You could try explain(analyze,timing
> off,buffers).
> You should send a result for the "20sec" result, and one for the "4min"
> result,
> to compare.
>
> I assume the crash is a result of OOM - you could find the result in dmesg
> output ("Out of memory: Killed process") or the postgres logfile will say
> "terminated by signal 9: Killed". It's important to avoid setting
> work_mem so
> high that the process is killed and has to go into recovery mode.
>
> --
> Justin
>
