Runtime partition pruning

2020-03-23 Thread Radu Radutiu
Hello list,

Is runtime partition pruning available in PostgreSQL 12?

I have a table partitioned by range on column id (primary key).
For the following query (client_id is passed as a parameter from the application):

select * from test where id > 0 and id < 1000 and client_id = ?;

partition pruning works fine.

However

select * from test where id between client_id-10 and client_id+10 and
client_id=?;

does not (it scans all partitions in parallel). Is this expected?

Regards,
Radu


Re: Runtime partition pruning

2020-03-23 Thread Radu Radutiu
Thanks. Yes, the query with the same parameter seems to work as expected.
It doesn't help us, though, as we are trying to transparently support
partitioning using an ORM and we cannot change the query parameters. Using
the column name would have been much easier.

Regards,
Radu

On Mon, Mar 23, 2020 at 5:56 PM Michael Lewis wrote:

>> select * from test where id between client_id-10 and client_id+10 and
>> client_id=?;
>>
>> does not (it scans all partitions in parallel). Is this expected?
>>
>
> Yes. But the below would work fine, I expect, since the planner would know a
> constant range for id. I would be very surprised if the optimizer had some
> condition-rewrite rules to handle just the scenario you show.
>
>  select * from test where id between ?-10 and ?+10  and client_id=?;
>
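
For reference, a minimal sketch of the two cases discussed in this thread (the
table layout and partition bounds below are illustrative assumptions, not the
original schema):

CREATE TABLE test (
    id        bigint NOT NULL,
    client_id bigint,
    payload   text,
    PRIMARY KEY (id)
) PARTITION BY RANGE (id);
CREATE TABLE test_p0 PARTITION OF test FOR VALUES FROM (0) TO (1000000);
CREATE TABLE test_p1 PARTITION OF test FOR VALUES FROM (1000000) TO (2000000);

-- Prunes: the range on id comes directly from the bound parameter, so the
-- planner/executor can discard partitions that cannot match.
PREPARE q_pruned (bigint) AS
    SELECT * FROM test WHERE id BETWEEN $1 - 10 AND $1 + 10 AND client_id = $1;
EXPLAIN (ANALYZE) EXECUTE q_pruned(1500000);

-- Does not prune: the range on id is derived from the client_id column, which
-- is not known to be a constant, so all partitions are scanned.
PREPARE q_unpruned (bigint) AS
    SELECT * FROM test WHERE id BETWEEN client_id - 10 AND client_id + 10 AND client_id = $1;
EXPLAIN (ANALYZE) EXECUTE q_unpruned(1500000);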


How to debug "background worker "parallel worker" (PID 32152) was terminated by signal 11: Segmentation fault"

2020-04-27 Thread Radu Radutiu
Hi,

Can you guide me on how to debug a PostgreSQL crash?
I have a development VM (KVM) running CentOS 8.1 (up to date with all
patches, kernel 4.18.0-147.8.1.el8_1.x86_64,
postgresql12-server-12.2-2PGDG.rhel8.x86_64) with a very large partitioned
table (a few hundred GBs). Creating an index on the partitioned table crashed
the PostgreSQL instance. I have the following in the log:

2020-04-26 17:35:50.065 EEST [8385] LOG:  background worker "parallel
worker" (PID 32152) was terminated by signal 11: Segmentation fault
2020-04-26 17:35:50.065 EEST [8385] DETAIL:  Failed process was running:
CREATE INDEX IDX_instrid
ON ach.achiteminfo USING btree
(instrid);
2020-04-26 17:35:50.065 EEST [8385] LOG:  terminating any other active
server processes
2020-04-26 17:35:50.065 EEST [32134] WARNING:  terminating connection
because of crash of another server process
2020-04-26 17:35:50.065 EEST [32134] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2020-04-26 17:35:50.065 EEST [32134] HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
[the same WARNING / DETAIL / HINT sequence is repeated for backend PIDs 32146,
32126, 32130, 32129, 32132, 32142, 32143 and 32140]
2020-04-26
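
One mitigation to try while collecting more information (assuming the crash is
specific to the parallel index build path) is to force a non-parallel build for
the session and rerun the statement; this is only a sketch, not a fix:

-- Assumption: the segfault occurs only in the parallel worker, so forcing a
-- non-parallel build may let the CREATE INDEX complete.
SET max_parallel_maintenance_workers = 0;
CREATE INDEX idx_instrid ON ach.achiteminfo USING btree (instrid);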

Postgresql 16.3 Out Of Memory

2024-06-03 Thread Radu Radutiu
Hello,

I have an out of memory problem after upgrading from PostgreSQL 12 to 16.3.
I have identified one query that can reproduce the error on demand. Once
the query starts, it eventually exhausts all RAM and swap until the OOM
killer stops PostgreSQL.
The setup is as follows:
- One VMware VM, RHEL 8.9, 64 GB RAM, 16 vCPU, 32 GB swap. The database is
used by a local Java application that takes up to 22 GB (hard limit). The
application uses 2 DB connections (long running, via a connection pool). The
database has about 10 tables, with 3 large tables involved in the problem
query:
    relation   | total_size
----------------+------------
 outputrequest  | 217 GB
 inputrequest   | 149 GB
 tran           | 5041 MB
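
(For reference, a listing like the one above can be produced with a query
along these lines; the schema filter is an assumption:)

-- Sketch: total size (heap + indexes + TOAST) of the largest tables.
SELECT c.relname AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r' AND n.nspname = 'public'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 3;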

I have the following settings:
shared_buffers = '8192MB'
effective_cache_size = '24GB'
maintenance_work_mem = '2GB'
checkpoint_completion_target = '0.9'
wal_buffers = '16MB'
default_statistics_target = '100'
random_page_cost = '1.1'
effective_io_concurrency = '200'
work_mem = '104857kB'
min_wal_size = '1GB'
max_wal_size = '4GB'
max_worker_processes = '16'
max_parallel_workers_per_gather = '4'
max_parallel_workers = '16'
max_parallel_maintenance_workers = '4'
jit = 'off'
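
(As a cross-check, the memory-related settings in effect can be read from
pg_settings; the list of names below is illustrative and includes
hash_mem_multiplier, which since v13 also bounds hash table memory at
work_mem * hash_mem_multiplier:)

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'hash_mem_multiplier',
               'maintenance_work_mem', 'max_parallel_workers_per_gather');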

Besides the OOM there is another (possibly related) problem: PostgreSQL's
memory usage is significantly higher than shared_buffers plus the memory used
per connection. On my system, with shared_buffers of 8 GB and work_mem of
~100 MB, memory usage is around 20 GB while the system is running just
simple inserts (the memory might have increased due to previous queries).
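
One way to see where backend memory actually goes (available since PostgreSQL
14, so usable on 16.3) is to dump a backend's memory context tree to the
server log, or to query the current session's contexts; the PID below is only
an example taken from the top output further down:

-- Dump the memory context tree of another backend into the server log.
SELECT pg_log_backend_memory_contexts(295946);

-- Or inspect the current session's own memory contexts.
SELECT name, parent, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;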

I have found that if I set enable_parallel_hash = 'off', I still get high
memory usage for the problem query (above 30 GB) but not OOM. With
enable_parallel_hash = 'on', I always get OOM. For simplicity I've restricted
the PostgreSQL systemd service to cap memory at 36 GB (so that I don't have
to wait for the 32 GB of swap to fill up before the OOM). This is the memory
usage I get when running the query, just before systemd restarts postgres:
35.9 GB, just below the 36 GB limit (the query is running with EXPLAIN
(ANALYZE, BUFFERS); the other 2 idle connections are from the application):

[postgres@ips3 ~]$ systemctl status postgresql-16; top -u postgres -n 1 -c
● postgresql-16.service - PostgreSQL 16 database server
   Loaded: loaded (/etc/systemd/system/postgresql-16.service; enabled;
vendor preset: disabled)
  Drop-In: /etc/systemd/system/postgresql-16.service.d
   └─override.conf
   Active: active (running) since Mon 2024-06-03 04:23:16 +08; 11h ago
 Docs: https://www.postgresql.org/docs/16/static/
  Process: 283703 ExecStartPre=/usr/pgsql-16/bin/postgresql-16-check-db-dir
${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 283709 (postgres)
Tasks: 14 (limit: 408404)
   Memory: 35.9G (limit: 36.0G)
   CGroup: /system.slice/postgresql-16.service
   ├─283709 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/
   ├─283710 postgres: logger
   ├─295499 postgres: checkpointer
   ├─295500 postgres: background writer
   ├─295506 postgres: walwriter
   ├─295507 postgres: autovacuum launcher
   ├─295508 postgres: logical replication launcher
   ├─295511 postgres: db1 db1 127.0.0.1(37720) idle
   ├─295946 postgres: postgres db1 [local] EXPLAIN
   ├─295947 postgres: parallel worker for PID 295946
   ├─295948 postgres: parallel worker for PID 295946
   ├─295949 postgres: parallel worker for PID 295946
   ├─295950 postgres: parallel worker for PID 295946
   └─296037 postgres: db1 db1 127.0.0.1(41708) idle

top - 16:14:32 up 14 days,  4:42,  1 user,  load average: 5.41, 3.58, 2.25
Tasks: 405 total,   6 running, 399 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.1 us,  5.2 sy,  0.0 ni, 71.2 id, 21.3 wa,  0.7 hi,  0.4 si,  0.0 st
MiB Mem :  63873.4 total,   2162.0 free,  57177.8 used,   4533.6 buff/cache
MiB Swap:  32260.0 total,   3813.8 free,  28446.2 used.   4349.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 295947 postgres  20   0   21.2g   7.4g 654168 R  20.0  11.8   1:31.81 postgres: parallel worker for PID 295946
 295949 postgres  20   0   21.3g   7.5g 656316 R  20.0  12.0   1:32.57 postgres: parallel worker for PID 295946
 295950 postgres  20   0   21.2g   7.5g 654756 R  20.0  12.1   1:31.99 postgres: parallel worker for PID 295946
 295946 postgres  20   0   21.2g   7.4g 655684 R  13.3  11.8   1:31.86 postgres: postgres db1 [local] EXPLAIN
 295948 postgres  20   0   21.6g   6.0g 656024 R  13.3   9.6   1:37.91 postgres: parallel worker for PID 295946
 283709 postgres  20   0 8960628  44144  43684 S   0.0   0.1   0:42.28 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/
 283710 postgres  20   0  355288   4876   4576 S   0.0   0.0   0:04.42 postgres: logger
 295499 postgres  20   0 8960936 107376 106868 S   0.0   0.2   0:01.75 postgres: checkpointer
 295500 postg

Re: Postgresql 16.3 Out Of Memory

2024-06-05 Thread Radu Radutiu
It looks like I did not copy the list.

I did run VACUUM ANALYZE after the upgrade. I've even run it now
> before getting a new explain plan that is very similar (just the costs
> differ):
>  Gather Merge  (cost=12336145.92..16111570.23 rows=31531596 width=66)
>    Workers Planned: 4
>    ->  Sort  (cost=12335145.86..12354853.11 rows=7882899 width=66)
>          Sort Key: t.msg_status DESC, t.input_sequence
>          ->  Parallel Hash Right Join  (cost=9376528.66..11242773.26 rows=7882899 width=66)
>                Hash Cond: (snd_tro.reply_input_sequence = t.input_sequence)
>                ->  Parallel Seq Scan on tbl_outputrequest snd_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
>                ->  Parallel Hash  (cost=9254599.76..9254599.76 rows=5245992 width=58)
>                      ->  Parallel Hash Right Join  (cost=7458636.44..9254599.76 rows=5245992 width=58)
>                            Hash Cond: (rec_tro.input_sequence = t.input_sequence)
>                            ->  Parallel Seq Scan on tbl_outputrequest rec_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
>                            ->  Parallel Hash  (cost=7380902.99..7380902.99 rows=3491156 width=50)
>                                  ->  Parallel Hash Right Join  (cost=5592677.17..7380902.99 rows=3491156 width=50)
>                                        Hash Cond: (rpl_snd_tro.reply_input_sequence = r.input_sequence)
>                                        ->  Parallel Seq Scan on tbl_outputrequest rpl_snd_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
>                                        ->  Parallel Hash  (cost=5518353.72..5518353.72 rows=3491156 width=42)
>                                              ->  Parallel Hash Right Join  (cost=3729209.40..5518353.72 rows=3491156 width=42)
>                                                    Hash Cond: (rpl_rec_tro.input_sequence = r.input_sequence)
>                                                    ->  Parallel Seq Scan on tbl_outputrequest rpl_rec_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
>                                                    ->  Parallel Hash  (cost=3658294.95..3658294.95 rows=3491156 width=34)
>                                                          ->  Parallel Hash Right Join  (cost=1883503.35..3658294.95 rows=3491156 width=34)
>                                                                Hash Cond: (r.originalrequest_id = t.input_sequence)
>                                                                ->  Parallel Seq Scan on tbl_inputrequest r  (cost=0.00..1739752.66 rows=13348166 width=16)
>                                                                ->  Parallel Hash  (cost=1839863.91..1839863.91 rows=3491156 width=26)
>                                                                      ->  Parallel Seq Scan on tbl_inputrequest t  (cost=0.00..1839863.91 rows=3491156 width=26)
>                                                                            Filter: ((receive_time < '2024-05-17 00:00:00'::timestamp without time zone) AND (input_sequence < '202406020168279904'::bigint) AND ((msg_status)::text = ANY ('{COMPLETED,REJECTED}'::text[])))
>
> The query cost is high and it returns a significant number of rows.
> However it should not consume 64+ GB RAM with the
> default enable_parallel_hash = 'on' when my shared_buffers is 8GB. The
> temporary fix I've implemented to get the system working is a change in the
> application configuration so that the timestamp filter selects fewer rows
> (3x less), together with setting enable_parallel_hash = 'off'. PostgreSQL
> service memory usage grows and stays over 20GB even with this setting. I'd
> like to find out exactly what causes the high memory usage as we have other
> projects using PostgreSQL and they are scheduled for upgrade from v12.
>
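
For completeness, the enable_parallel_hash mitigation mentioned above can be
applied per session or instance-wide; a sketch (the reload step assumes
sufficient privileges):

-- Per-session only:
SET enable_parallel_hash = off;

-- Or instance-wide (written to postgresql.auto.conf, picked up on reload):
ALTER SYSTEM SET enable_parallel_hash = off;
SELECT pg_reload_conf();
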
> My test.sql looks like this (the application uses prepared statements; the
> two SET operations revert, for the current session, the fixes already
> implemented):
>
> prepare my_query (timestamp, bigint) as
> SELECT t.input_sequence, rec_tro.output_sequence, r.input_sequence,
>        rpl_rec_tro.output_sequence, rpl_snd_tro.output_sequence,
>        snd_tro.output_sequence, t.msg_type
> FROM inputrequest t
> LEFT JOIN outputrequest rec_tro     ON rec_tro.input_sequence = t.input_sequence
> LEFT JOIN inputrequest r            ON r.originalRequest_id = t.input_sequence
> LEFT JOIN outputrequest rpl_rec_tro ON rpl_rec_tro.input_sequence = r.input_sequence
> LEFT JOIN outputrequest rpl_snd_tro ON rpl_snd_tro.reply_input_sequence = r.input_sequence
> LEFT JOIN outputrequest snd_tro     ON snd_tro.reply_input_sequence = t.input_sequence
> WHERE t.receive_time < $1
>   AND t.input_sequence < $2
>   AND t.msg_status IN ('COMPLETED', 'REJECTED')
> ORDER BY t.msg_status DESC, t.input_sequence;
>
> EXPLAIN EXECUTE my_query('2024-05-17 00:00:00', 202406020168279904);
>
> I have an explanation for what I suspected was a memory leak. It seems
that systemd reports cached memory, the relevant par

Plan regression from 12 to 13

2022-02-23 Thread Radu Radutiu
Hi,
I have a query (generated by RequestTracker) that has regressed after
upgrading from PostgreSQL 12 to 13. The strange thing is that decreasing
effective_cache_size causes PostgreSQL 13 to generate an acceptable plan.
My setup is: 1 VM, 4 cores, 16 GB RAM (the server has around 10 GB for the
filesystem cache),
shared_buffers 2.4 GB

With effective_cache_size of 8GB, I get the following plan:
https://explain.depesz.com/s/lIcV (the run time is after several tries,
with most of the data in memory; the initial run reading from disk is
around 10 times slower).
If I decrease effective_cache_size to '2GB' or less, I get a much
better plan: https://explain.depesz.com/s/Z6C7W
The old VM running PostgreSQL 12 (with a somewhat smaller database) always
generates a good plan, regardless of effective_cache_size:
https://explain.depesz.com/s/iawn
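
(A session-level experiment to confirm the sensitivity, as a sketch; the
placeholder SELECT stands in for the actual RequestTracker query:)

BEGIN;
SET LOCAL effective_cache_size = '2GB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1;   -- placeholder: substitute the RequestTracker query here
ROLLBACK;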

Are there any defaults that have changed from 12 to 13 that would explain
the different plans?

Thanks,
Radu