pg12 partitions show bad performance vs pg96
Hey,
I upgraded from 96 to 12 in our test env and I'm seeing that for queries
that involve join operation between a partition table and other tables
there is degradation is performance compared to pg96 performance.
My machine : 8cpu,16gb,regular hd,linux redhat 6
pg settings :
max_wal_size = 2GB
min_wal_size = 1GB
wal_buffers = 16MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 30min
log_checkpoints = on
log_lock_waits = on
log_temp_files = 1024
log_min_duration_statement = 1
log_autovacuum_min_duration = 1
standard_conforming_strings = off
max_locks_per_transaction = 5000
max_connections = 500
log_line_prefix = '%t %d %p '
random_page_cost = 4
deadlock_timeout = 5s
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 32764
log_directory = 'pg_log'
enable_partitionwise_join = on # for pg12
enable_partitionwise_aggregate = on # for pg12
listen_addresses = '*'
ssl = on
maintenance_work_mem = 333MB
work_mem = 16MB
shared_buffers = 4020MB
effective_cache_size = 8040MB
postgresql12.2
I used this table as the joined table for both cases :
create table iot_device(id serial primary key,name text);
insert into iot_device(name) select generate_series(1,100)||'a';
In pg96 I created the following regular table :
create table iot_data(id serial primary key,data text,metadata
bigint,device bigint references iot_device(id));
inserted the data :
insert into iot_data select
generate_series(1,1000),random()*10,random()*254,random()*99+1;
In pg12 I created a table with 3 hash partitiones :
create table iot_data(id serial ,data text,metadata bigint,device bigint
references iot_device(id),primary key(id,device)) partition by hash(device);
create table iot_data_0 partition of iot_data for values with (MODULUS 3,
remainder 0);
create table iot_data_1 partition of iot_data for values with (MODULUS 3,
remainder 1);
create table iot_data_2 partition of iot_data for values with (MODULUS 3,
remainder 2);
I generated a dump of the data in the pg96 machine and inserted it into the
pg12 db :
pg_dump -d postgres -U postgres -a -t iot_data > iot_data.dump
psql -d postgres -U postgres -f -h pg12_machine /tmp/iot_data.dump
postgres=# select count(*) from iot_data_0;
count
-
3028682
(1 row)
postgres =# select count(*) from iot_data_1;
count
-
3234335
(1 row)
postgres =# select count(*) from iot_data_2;
count
-
3736983
(1 row)
create index on iot_data(metadata,lower(data));
vacuum analyze iot_data;
and now for the performance:
query : explain analyze select * from iot_data da,iot_device de where
de.name in ('50a') and de.id = da.device and da.metadata=50 and
lower(da.data) like '50';
PG12 :
QUERY
PLAN
---
Nested Loop (cost=5.16..773.61 rows=2 width=43) (actual time=2.858..2.858
rows=0 loops=1)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (actual
time=0.014..0.020 rows=1 loops=1)
Filter: (name = '50a'::text)
Rows Removed by Filter: 99
-> Append (cost=5.16..771.30 rows=6 width=36) (actual
time=2.835..2.835 rows=0 loops=1)
-> Bitmap Heap Scan on iot_data_0 da (cost=5.16..233.78 rows=2
width=36) (actual time=2.829..2.829 rows=0 loops=1)
Recheck Cond: (metadata = 50)
Filter: ((de.id = device) AND (lower(data) ~~ '50'::text))
-> Bitmap Index Scan on iot_data_0_metadata_lower_idx
(cost=0.00..5.14 rows=59 width=0) (actual time=2.827..2.827 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) =
'50'::text))
-> Bitmap Heap Scan on iot_data_1 da_1 (cost=5.20..249.32 rows=2
width=37) (never executed)
Recheck Cond: (metadata = 50)
Filter: ((de.id = device) AND (lower(data) ~~ '50'::text))
-> Bitmap Index Scan on iot_data_1_metadata_lower_idx
(cost=0.00..5.18 rows=63 width=0) (never executed)
Index Cond: ((metadata = 50) AND (lower(data) =
'50'::text))
-> Bitmap Heap Scan on iot_data_2 da_2 (cost=5.30..288.16 rows=2
width=36) (never executed)
Recheck Cond: (metadata = 50)
Filter: ((de.id = device) AND (lower(data) ~~ '50'::text))
-> Bitmap Index Scan on iot_data_2_metadata_lower_idx
(cost=0.00..5.29 rows=73 width=0) (never executed)
Index Cond: ((metadata = 50) AND (lower(data) =
'50'::text))
Planning Time: 8.157 ms
Execution Time: 2.920 ms
(22 rows)
PG96 :
QUERY PLAN
Nested Loop (cost=6.57..397.19 rows=2 width=44) (actual time=0.121..0.121
rows=0 loops=
Re: pg12 partitions show bad performance vs pg96
I realized that the planner goes to the right partition because "(never executed)" is mentioned near the scan of the other partitions. However, still i'm not sure why performance is better in pg96. בתאריך יום א׳, 8 במרץ 2020 ב-18:05 מאת Mariel Cherkassky < [email protected]>: > Hey, > I upgraded from 96 to 12 in our test env and I'm seeing that for queries > that involve join operation between a partition table and other tables > there is degradation is performance compared to pg96 performance. > > My machine : 8cpu,16gb,regular hd,linux redhat 6 > pg settings : > max_wal_size = 2GB > min_wal_size = 1GB > wal_buffers = 16MB > checkpoint_completion_target = 0.9 > checkpoint_timeout = 30min > log_checkpoints = on > log_lock_waits = on > log_temp_files = 1024 > log_min_duration_statement = 1 > log_autovacuum_min_duration = 1 > standard_conforming_strings = off > max_locks_per_transaction = 5000 > max_connections = 500 > log_line_prefix = '%t %d %p ' > random_page_cost = 4 > deadlock_timeout = 5s > shared_preload_libraries = 'pg_stat_statements' > track_activity_query_size = 32764 > log_directory = 'pg_log' > enable_partitionwise_join = on # for pg12 > enable_partitionwise_aggregate = on # for pg12 > listen_addresses = '*' > ssl = on > maintenance_work_mem = 333MB > work_mem = 16MB > shared_buffers = 4020MB > effective_cache_size = 8040MB > > postgresql12.2 > > I used this table as the joined table for both cases : > create table iot_device(id serial primary key,name text); > insert into iot_device(name) select generate_series(1,100)||'a'; > > In pg96 I created the following regular table : > create table iot_data(id serial primary key,data text,metadata > bigint,device bigint references iot_device(id)); > > inserted the data : > insert into iot_data select > generate_series(1,1000),random()*10,random()*254,random()*99+1; > > In pg12 I created a table with 3 hash partitiones : > create table iot_data(id serial ,data text,metadata bigint,device bigint > references iot_device(id),primary key(id,device)) partition by hash(device); > create table iot_data_0 partition of iot_data for values with (MODULUS 3, > remainder 0); > create table iot_data_1 partition of iot_data for values with (MODULUS 3, > remainder 1); > create table iot_data_2 partition of iot_data for values with (MODULUS 3, > remainder 2); > > > I generated a dump of the data in the pg96 machine and inserted it into > the pg12 db : > pg_dump -d postgres -U postgres -a -t iot_data > iot_data.dump > psql -d postgres -U postgres -f -h pg12_machine /tmp/iot_data.dump > > postgres=# select count(*) from iot_data_0; > count > - > 3028682 > (1 row) > > postgres =# select count(*) from iot_data_1; > count > - > 3234335 > (1 row) > > postgres =# select count(*) from iot_data_2; > count > - > 3736983 > (1 row) > > create index on iot_data(metadata,lower(data)); > vacuum analyze iot_data; > > and now for the performance: > query : explain analyze select * from iot_data da,iot_device de where > de.name in ('50a') and de.id = da.device and da.metadata=50 and > lower(da.data) like '50'; > > PG12 : > QUERY > PLAN > > --- > Nested Loop (cost=5.16..773.61 rows=2 width=43) (actual > time=2.858..2.858 rows=0 loops=1) >-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=7) (actual > time=0.014..0.020 rows=1 loops=1) > Filter: (name = '50a'::text) > Rows Removed by Filter: 99 >-> Append (cost=5.16..771.30 rows=6 width=36) (actual > time=2.835..2.835 rows=0 loops=1) > -> Bitmap Heap Scan on iot_data_0 da (cost=5.16..233.78 rows=2 > width=36) (actual time=2.829..2.829 rows=0 loops=1) >Recheck Cond: (metadata = 50) >Filter: ((de.id = device) AND (lower(data) ~~ '50'::text)) >-> Bitmap Index Scan on iot_data_0_metadata_lower_idx > (cost=0.00..5.14 rows=59 width=0) (actual time=2.827..2.827 rows=0 loops=1) > Index Cond: ((metadata = 50) AND (lower(data) = > '50'::text)) > -> Bitmap Heap Scan on iot_data_1 da_1 (cost=5.20..249.32 > rows=2 width=37) (never executed) >Recheck Cond: (metadata = 50) >Filter: ((de.id = device) AND (lower(data) ~~ '50'::text)) >-> Bitmap Index Scan on iot_data_1_metadata_lower_idx > (cost=0.00..5.18 rows=63 width=0) (never executed) > Index Cond: ((metadata = 50) AND (lower(data) = > '50'::text)) > -> Bitmap Heap Scan on iot_data_2 da_2 (cost=5.30..288.16 > rows=2 width=36) (never executed) >Recheck Cond: (metadata = 50) >Filter: ((de.id = device) AND (lower(data) ~~ '50'::text)) >-> Bitmap Index Scan on iot_da
Re: pg12 partitions show bad performance vs pg96
On Mon, 9 Mar 2020 at 05:05, Mariel Cherkassky wrote: > PG12 : > Planning Time: 8.157 ms > Execution Time: 2.920 ms > (22 rows) > > > PG96 : > Planning time: 0.815 ms > Execution time: 0.158 ms > (12 rows) 8 ms seems pretty slow to planning that query. Does the planning time drop if you execute this multiple times in the same session? Does the time change if you try again without any foreign keys? The planning time for the partitioned case is certainly going to take longer. Partitioned tables are never going to improve the times of query planning. It's only possible that they'll improve the performance during query execution. For such small fast queries as the ones you've shown, it's important to remember that more complex query plans (ones with more nodes) do lead to longer times for executor startup and shutdown. EXPLAIN (without ANALYZE), will perform query planning and executor startup/shutdown. If you enable \timing on in psql and test the EXPLAIN performance of these queries in each version, then you might get an idea of where the overheads are. Additionally, you're unlikely to see performance improvements with table partitioning unless you're accessing many rows and partitioning allows the data locality of the rows that you are accessing to improve. i.e accesses fewer buffers and/or improves cache hit ratios. In PG12, if the partition pruning can be done during query planning then the planning and executor startup overhead is much lower since there are fewer relations to generate access paths for and fewer nodes in the final plan. This also improves the situation during execution as it means fewer locks to take and fewer nodes to startup/shutdown. > As you can see, the results are better in pg96. This example only shows the > results for a small data set. In bigger data sets I get a bigger diff... Can you share the results of that? > I tried changing many postgresql.conf parameters that were added > (max_workers_per_gather,enable_partitionwise_join and so on..). The former only does anything for parallel queries. None of the plans you've shown are parallel ones. The latter also does not count in this case. It only counts when joining two identically partitioned tables. > I dont understand why in pg12 it scans all the partitions instead of the > relevant one.. If you'd specified a specific "device" in the query SQL, then the query planner would know which partition to scan for that particular device. However, since you're looking up the device in another table and performing a join, the device is only known during query execution. The plan nodes for the non-matching partitions do go through executor startup, but they're not scanned during execution, as you've seen with the "(never executed)" appearing in the EXPLAIN ANALYZE output. Since executor startup occurs before execution, the device you mean is still unknown during executor startup, so the executor must startup the nodes for all partitions that are in the plan. Starting up a plan node is not free, but not really very expensive either. However, the overhead of it might be quite large proportionally in your case since the executor is doing so little work. The most basic guidelines for table partitioning are, don't partition your tables unless it's a net win. If partitioning was always faster, we'd just have designed Postgres to implicitly partition all of your tables for you. There are some other guidelines in [1]. [1] https://www.postgresql.org/docs/12/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES David
Re: pg12 partitions show bad performance vs pg96
On Sun, Mar 08, 2020 at 06:05:26PM +0200, Mariel Cherkassky wrote:
> In pg12 I created a table with 3 hash partitiones :
> create table iot_data(id serial ,data text,metadata bigint,device bigint
> references iot_device(id),primary key(id,device)) partition by hash(device);
> and now for the performance:
> query : explain analyze select * from iot_data da,iot_device de where
> de.name in ('50a') and de.id = da.device and da.metadata=50 and
> lower(da.data) like '50';
> I dont understand why in pg12 it scans all the partitions instead of the
> relevant one..
As you noticed, it doesn't actually scan them. I believe v11 "partition
elimination during query execution" is coming into play here. There's no
option to disable that, but as a quick test, you could possibly try under PG10
(note, that doesn't support inherited indexes). Or you could try to profile
under PG12 (and consider comparing with pg13dev).
You partitioned on hash(iot_data.device), but your query doesn't specify
device, except that da.device=de.id AND de.name IN ('50'). If that's a typical
query, maybe it'd work better to partition on metadata or lower(name) (or
possibly both).
On Sun, Mar 08, 2020 at 06:05:26PM +0200, Mariel Cherkassky wrote:
> PG12 :
> Nested Loop (cost=5.16..773.61 rows=2 width=43) (actual time=2.858..2.858
> rows=0 loops=1)
...
> -> Bitmap Heap Scan on iot_data_1 da_1 (cost=5.20..249.32 rows=2
> width=37) (NEVER EXECUTED)
...
> Planning Time: 8.157 ms
> Execution Time: 2.920 ms
> PG96 :
> Nested Loop (cost=6.57..397.19 rows=2 width=44) (actual time=0.121..0.121
> rows=0 loops=1)
...
> Planning time: 0.815 ms
> Execution time: 0.158 ms
--
Justin
