Partitioning Optimizer Questions and Issues

keith anderson Fri, 08 Feb 2019 03:17:20 -0800

I am using Postgres on a large system (recording approximately 20million 
transactions per day). We use partitioning by date to assist with both vacuum 
processing time and to archive old data. At the core of the system are records 
in 2 different tables detailing different types of activity for monetary 
transactions (e.g. money in and money out) -> a single transaction has entries 
in both tables, so to retrieve all details for a single transaction we need to 
join the 2 tables.


The use of partitioning however has a significant impact on the performance of 
retrieving this data. Being relatively new to Postgres I wanted to share my 
findings and understand how others address them. We run postgres version 9.6 on 
CentOS, but the same behaviour is apparent in postgres 10.6. The test case 
outputs are from version 10.6 running on my Ubuntu machine with default 
postgres configuration.
Below is an example script to populate the test data:
=======================drop table if exists tablea cascade;drop table if exists 
tableb cascade;
CREATE TABLE tablea (    id            serial,    reference     int not null,   
 created       date not null)  PARTITION BY RANGE (created);

CREATE TABLE tablea_part1 PARTITION OF tablea    FOR VALUES FROM ('2018-01-01') 
TO ('2018-01-02');CREATE TABLE tablea_part2 PARTITION OF tablea    FOR VALUES 
FROM ('2018-01-02') TO ('2018-01-03');CREATE TABLE tablea_part3 PARTITION OF 
tablea    FOR VALUES FROM ('2018-01-03') TO ('2018-01-04');CREATE TABLE 
tablea_part4 PARTITION OF tablea    FOR VALUES FROM ('2018-01-04') TO 
('2018-01-05');CREATE TABLE tablea_part5 PARTITION OF tablea    FOR VALUES FROM 
('2018-01-05') TO ('2018-01-06');
CREATE INDEX tablea_id_1 ON tablea_part1 (id);CREATE INDEX tablea_id_2 ON 
tablea_part2 (id);CREATE INDEX tablea_id_3 ON tablea_part3 (id);CREATE INDEX 
tablea_id_4 ON tablea_part4 (id);CREATE INDEX tablea_id_5 ON tablea_part5 
(id);CREATE INDEX tablea_reference_1 ON tablea_part1 (reference);CREATE INDEX 
tablea_reference_2 ON tablea_part2 (reference);CREATE INDEX tablea_reference_3 
ON tablea_part3 (reference);CREATE INDEX tablea_reference_4 ON tablea_part4 
(reference);CREATE INDEX tablea_reference_5 ON tablea_part5 (reference);CREATE 
INDEX tablea_created_1 ON tablea_part1 (created);CREATE INDEX tablea_created_2 
ON tablea_part2 (created);CREATE INDEX tablea_created_3 ON tablea_part3 
(created);CREATE INDEX tablea_created_4 ON tablea_part4 (created);CREATE INDEX 
tablea_created_5 ON tablea_part5 (created);alter table tablea_part1 add CHECK ( 
created >= DATE '2018-01-01' AND created < DATE '2018-01-02');alter table 
tablea_part2 add CHECK ( created >= DATE '2018-01-02' AND created < DATE 
'2018-01-03');alter table tablea_part3 add CHECK ( created >= DATE '2018-01-03' 
AND created < DATE '2018-01-04');alter table tablea_part4 add CHECK ( created 
>= DATE '2018-01-04' AND created < DATE '2018-01-05');alter table tablea_part5 
add CHECK ( created >= DATE '2018-01-05' AND created < DATE '2018-01-06');
create or replace function populate_tablea() RETURNS integer AS$BODY$           
                                                  DECLARE                       
                 i integer;                                     v_created 
date;BEGIN                                               i := 0;                
                           WHILE (i < 50000)    loop        i := i + 1;         
                                       IF (mod(i,5) = 1) THEN            
v_created = '2018-01-01';        ELSIF (mod(i,5) = 2) THEN            v_created 
= '2018-01-02';         ELSIF (mod(i,5) = 3) THEN            v_created = 
'2018-01-03';        ELSIF (mod(i,5) = 4) THEN            v_created = 
'2018-01-04';        ELSIF (mod(i,5) = 0) THEN            v_created = 
'2018-01-05';        END IF;                                    insert into 
tablea values (i, i, v_created);
    end loop;                                                                   
                               RETURN i;END;$BODY$LANGUAGE plpgsql VOLATILE    
COST 100;
CREATE TABLE tableb(    id         serial,    reference  int not null,    
created    date not null) PARTITION BY RANGE (created);

CREATE TABLE tableb_part1 PARTITION OF tableb    FOR VALUES FROM ('2018-01-01') 
TO ('2018-01-02');CREATE TABLE tableb_part2 PARTITION OF tableb    FOR VALUES 
FROM ('2018-01-02') TO ('2018-01-03');CREATE TABLE tableb_part3 PARTITION OF 
tableb    FOR VALUES FROM ('2018-01-03') TO ('2018-01-04');CREATE TABLE 
tableb_part4 PARTITION OF tableb    FOR VALUES FROM ('2018-01-04') TO 
('2018-01-05');CREATE TABLE tableb_part5 PARTITION OF tableb    FOR VALUES FROM 
('2018-01-05') TO ('2018-01-06');

CREATE INDEX tableb_id_1 ON tableb_part1 (id);CREATE INDEX tableb_id_2 ON 
tableb_part2 (id);CREATE INDEX tableb_id_3 ON tableb_part3 (id);CREATE INDEX 
tableb_id_4 ON tableb_part4 (id);CREATE INDEX tableb_id_5 ON tableb_part5 
(id);CREATE INDEX tableb_reference_1 ON tableb_part1 (reference);CREATE INDEX 
tableb_reference_2 ON tableb_part2 (reference);CREATE INDEX tableb_reference_3 
ON tableb_part3 (reference);CREATE INDEX tableb_reference_4 ON tableb_part4 
(reference);CREATE INDEX tableb_reference_5 ON tableb_part5 (reference);CREATE 
INDEX tableb_created_1 ON tableb_part1 (created);CREATE INDEX tableb_created_2 
ON tableb_part2 (created);CREATE INDEX tableb_created_3 ON tableb_part3 
(created);CREATE INDEX tableb_created_4 ON tableb_part4 (created);CREATE INDEX 
tableb_created_5 ON tableb_part5 (created);alter table tableb_part1 add CHECK ( 
created >= DATE '2018-01-01' AND created < DATE '2018-01-02');alter table 
tableb_part2 add CHECK ( created >= DATE '2018-01-02' AND created < DATE 
'2018-01-03');alter table tableb_part3 add CHECK ( created >= DATE '2018-01-03' 
AND created < DATE '2018-01-04');alter table tableb_part4 add CHECK ( created 
>= DATE '2018-01-04' AND created < DATE '2018-01-05');alter table tableb_part5 
add CHECK ( created >= DATE '2018-01-05' AND created < DATE '2018-01-06');

create or replace function populate_tableb() RETURNS integer AS$BODY$DECLARE  i 
integer;  v_created date;BEGIN    i := 0;    WHILE (i < 50000)    loop        i 
:= i + 1;        IF (mod(i,5) = 0) THEN            v_created = '2018-01-01';    
    ELSIF (mod(i,5) = 1) THEN            v_created = '2018-01-02';         
ELSIF (mod(i,5) = 2) THEN            v_created = '2018-01-03';        ELSIF 
(mod(i,5) = 3) THEN            v_created = '2018-01-04';        ELSIF (mod(i,5) 
= 4) THEN            v_created = '2018-01-05';            END IF;        insert 
into tableb values (i, i, v_created);    end loop;    RETURN 
i;END;$BODY$LANGUAGE plpgsql VOLATILE    COST 100;
select populate_tablea();select populate_tableb();vacuum 
analyze;==================================

So it creates 2 tables, both with 5 partitions (using range partitioning on the 
created column). Each partition has 10000 rows in it.
Below are some example queries I have run, the outputs of explain analyze for 
each and notes on each of my findings/questions:

============
-- NOTICE IN THE BELOW THAT WE USE A SINGLE ID (ESSENTIALLY THE PRIMARY KEY) 
BUT WE HAVE ESTIMATED 5 ROWS RETURNED. WE SEEM TO BE BASING-- ON PARTITION 
STATS ONLY AND SUMMING. SO EACH PARTITION ASSUMES ID IS UNIQUE, BUT WITH 5 
PARTITIONS, THE TOTAL ROWS IS 5.
explain analyze select * from tablea where id = 101;                            
                               QUERY PLAN                                       
                     
---------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.29..41.51 rows=5 width=12) (actual time=0.027..0.066 rows=1 
loops=1)   ->  Index Scan using tablea_id_1 on tablea_part1  (cost=0.29..8.30 
rows=1 width=12) (actual time=0.026..0.029 rows=1 loops=1)         Index Cond: 
(id = 101)   ->  Index Scan using tablea_id_2 on tablea_part2  (cost=0.29..8.30 
rows=1 width=12) (actual time=0.010..0.010 rows=0 loops=1)         Index Cond: 
(id = 101)   ->  Index Scan using tablea_id_3 on tablea_part3  (cost=0.29..8.30 
rows=1 width=12) (actual time=0.008..0.009 rows=0 loops=1)         Index Cond: 
(id = 101)   ->  Index Scan using tablea_id_4 on tablea_part4  (cost=0.29..8.30 
rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=1)         Index Cond: 
(id = 101)   ->  Index Scan using tablea_id_5 on tablea_part5  (cost=0.29..8.30 
rows=1 width=12) (actual time=0.007..0.007 rows=0 loops=1)         Index Cond: 
(id = 101) Planning time: 0.875 ms Execution time: 0.176 ms

============
-- IF WE USE AN IN WITH 10 ID'S WE ESTIMATE 50 ROWS RETURNED INSTEAD OF THE 
ACTUAL 10. AGAIN SEEMS TO BE AGGREGATING PARTITION STATISTICS.
explain analyze select * from tablea where id in 
(101,102,103,104,105,106,107,108,109,110);                                      
                          QUERY PLAN                                            
                 
-----------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.29..215.13 rows=50 width=12) (actual time=0.040..0.283 rows=10 
loops=1)   ->  Index Scan using tablea_id_1 on tablea_part1  (cost=0.29..43.03 
rows=10 width=12) (actual time=0.039..0.079 rows=2 loops=1)         Index Cond: 
(id = ANY ('{101,102,103,104,105,106,107,108,109,110}'::integer[]))   ->  Index 
Scan using tablea_id_2 on tablea_part2  (cost=0.29..43.03 rows=10 width=12) 
(actual time=0.021..0.052 rows=2 loops=1)         Index Cond: (id = ANY 
('{101,102,103,104,105,106,107,108,109,110}'::integer[]))   ->  Index Scan 
using tablea_id_3 on tablea_part3  (cost=0.29..43.03 rows=10 width=12) (actual 
time=0.022..0.048 rows=2 loops=1)         Index Cond: (id = ANY 
('{101,102,103,104,105,106,107,108,109,110}'::integer[]))   ->  Index Scan 
using tablea_id_4 on tablea_part4  (cost=0.29..43.03 rows=10 width=12) (actual 
time=0.026..0.049 rows=2 loops=1)         Index Cond: (id = ANY 
('{101,102,103,104,105,106,107,108,109,110}'::integer[]))   ->  Index Scan 
using tablea_id_5 on tablea_part5  (cost=0.29..43.03 rows=10 width=12) (actual 
time=0.028..0.048 rows=2 loops=1)         Index Cond: (id = ANY 
('{101,102,103,104,105,106,107,108,109,110}'::integer[])) Planning time: 1.526 
ms Execution time: 0.397 ms                                                
===========
-- IF WE USE A RANGE INSTEAD OF INDIVIDUAL ID'S, WE GET ESTIMATED 10 ROWS 
RETURNED (GOOD). -- IS THIS USING THE GLOBAL TABLE STATISTICS INSTEAD? WHY DOES 
IT DIFFER FROM DISTINCT ID'S?
explain analyze select * from tablea where id >= 101 and id <= 110;             
                                              QUERY PLAN                        
                                    
---------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.29..41.62 rows=10 width=12) (actual time=0.022..0.074 rows=10 
loops=1)   ->  Index Scan using tablea_id_1 on tablea_part1  (cost=0.29..8.32 
rows=2 width=12) (actual time=0.021..0.026 rows=2 loops=1)         Index Cond: 
((id >= 101) AND (id <= 110))   ->  Index Scan using tablea_id_2 on 
tablea_part2  (cost=0.29..8.32 rows=2 width=12) (actual time=0.010..0.012 
rows=2 loops=1)         Index Cond: ((id >= 101) AND (id <= 110))   ->  Index 
Scan using tablea_id_3 on tablea_part3  (cost=0.29..8.32 rows=2 width=12) 
(actual time=0.009..0.010 rows=2 loops=1)         Index Cond: ((id >= 101) AND 
(id <= 110))   ->  Index Scan using tablea_id_4 on tablea_part4  
(cost=0.29..8.32 rows=2 width=12) (actual time=0.009..0.010 rows=2 loops=1)     
    Index Cond: ((id >= 101) AND (id <= 110))   ->  Index Scan using 
tablea_id_5 on tablea_part5  (cost=0.29..8.32 rows=2 width=12) (actual 
time=0.008..0.010 rows=2 loops=1)         Index Cond: ((id >= 101) AND (id <= 
110)) Planning time: 1.845 ms Execution time: 0.196 ms
==========
-- HERE ARE THE TABLE STATS, SHOWING THAT POSTGRES IS AWARE THAT ID'S ARE 
GLOBALLY UNIQUE IN THE TABLEA TABLE. CAN IT USE THEM?
select tablename,n_distinct from pg_stats where tablename like '%tablea%' and 
attname = 'id';  tablename   | n_distinct --------------+------------ 
tablea_part3 |         -1 tablea       |         -1 tablea_part2 |         -1 
tablea_part4 |         -1 tablea_part5 |         -1 tablea_part1 |         -1
==========
-- WHEN I JOIN, THE NUMBER OF ROWS MULTIPLIES. NOTICE THE SEQUENTIAL SCAN OF 
THE TABLEB PARTITION. THIS IS CAUSED BY THE OVERESTIMATION-- OF ROWS RETURNED 
BY TABLEA
explain analyze select * from tablea a, tableb b where a.reference = 
b.reference and a.id in (101,102,103,104,105,106,107,108,109,110);              
                                                      QUERY PLAN                
                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=215.75..1178.75 rows=50 width=24) (actual time=0.386..46.845 
rows=10 loops=1)   Hash Cond: (b.reference = a.reference)   ->  Append  
(cost=0.00..775.00 rows=50000 width=12) (actual time=0.024..26.527 rows=50000 
loops=1)         ->  Seq Scan on tableb_part1 b  (cost=0.00..155.00 rows=10000 
width=12) (actual time=0.022..4.006 rows=10000 loops=1)         ->  Seq Scan on 
tableb_part2 b_1  (cost=0.00..155.00 rows=10000 width=12) (actual 
time=0.023..4.039 rows=10000 loops=1)         ->  Seq Scan on tableb_part3 b_2  
(cost=0.00..155.00 rows=10000 width=12) (actual time=0.023..3.247 rows=10000 
loops=1)         ->  Seq Scan on tableb_part4 b_3  (cost=0.00..155.00 
rows=10000 width=12) (actual time=0.016..1.421 rows=10000 loops=1)         ->  
Seq Scan on tableb_part5 b_4  (cost=0.00..155.00 rows=10000 width=12) (actual 
time=0.007..1.113 rows=10000 loops=1)   ->  Hash  (cost=215.13..215.13 rows=50 
width=12) (actual time=0.316..0.316 rows=10 loops=1)         Buckets: 1024  
Batches: 1  Memory Usage: 9kB         ->  Append  (cost=0.29..215.13 rows=50 
width=12) (actual time=0.034..0.301 rows=10 loops=1)               ->  Index 
Scan using tablea_id_1 on tablea_part1 a  (cost=0.29..43.03 rows=10 width=12) 
(actual time=0.033..0.074 rows=2 loops=1)                     Index Cond: (id = 
ANY ('{101,102,103,104,105,106,107,108,109,110}'::integer[]))               ->  
Index Scan using tablea_id_2 on tablea_part2 a_1  (cost=0.29..43.03 rows=10 
width=12) (actual time=0.020..0.051 rows=2 loops=1)                     Index 
Cond: (id = ANY ('{101,102,103,104,105,106,107,108,109,110}'::integer[]))       
        ->  Index Scan using tablea_id_3 on tablea_part3 a_2  (cost=0.29..43.03 
rows=10 width=12) (actual time=0.021..0.048 rows=2 loops=1)                     
Index Cond: (id = ANY ('{101,102,103,104,105,106,107,108,109,110}'::integer[])) 
              ->  Index Scan using tablea_id_4 on tablea_part4 a_3  
(cost=0.29..43.03 rows=10 width=12) (actual time=0.025..0.072 rows=2 loops=1)   
                  Index Cond: (id = ANY 
('{101,102,103,104,105,106,107,108,109,110}'::integer[]))               ->  
Index Scan using tablea_id_5 on tablea_part5 a_4  (cost=0.29..43.03 rows=10 
width=12) (actual time=0.028..0.049 rows=2 loops=1)                     Index 
Cond: (id = ANY ('{101,102,103,104,105,106,107,108,109,110}'::integer[])) 
Planning time: 2.642 ms Execution time: 47.005 ms
===========
-- REPEAT, BUT USING A RANGE QUERY. NO LONGER SEQUENTIAL SCAN AS THE TABLEA ROW 
ESTIMATE DROPS TO 10 FROM 50. -- QUERY EXECUTION TIME DROPS FROM 47MS TO 0.7MS
explain analyze select * from tablea a, tableb b where a.reference = 
b.reference and a.id >= 101 and a. id <= 110;                                   
                                 QUERY PLAN                                     
                                
---------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.57..437.25 rows=10 width=24) (actual time=0.063..0.543 
rows=10 loops=1)   ->  Append  (cost=0.29..41.62 rows=10 width=12) (actual 
time=0.027..0.091 rows=10 loops=1)         ->  Index Scan using tablea_id_1 on 
tablea_part1 a  (cost=0.29..8.32 rows=2 width=12) (actual time=0.026..0.031 
rows=2 loops=1)               Index Cond: ((id >= 101) AND (id <= 110))         
->  Index Scan using tablea_id_2 on tablea_part2 a_1  (cost=0.29..8.32 rows=2 
width=12) (actual time=0.010..0.013 rows=2 loops=1)               Index Cond: 
((id >= 101) AND (id <= 110))         ->  Index Scan using tablea_id_3 on 
tablea_part3 a_2  (cost=0.29..8.32 rows=2 width=12) (actual time=0.009..0.011 
rows=2 loops=1)               Index Cond: ((id >= 101) AND (id <= 110))         
->  Index Scan using tablea_id_4 on tablea_part4 a_3  (cost=0.29..8.32 rows=2 
width=12) (actual time=0.009..0.012 rows=2 loops=1)               Index Cond: 
((id >= 101) AND (id <= 110))         ->  Index Scan using tablea_id_5 on 
tablea_part5 a_4  (cost=0.29..8.32 rows=2 width=12) (actual time=0.015..0.018 
rows=2 loops=1)               Index Cond: ((id >= 101) AND (id <= 110))   ->  
Append  (cost=0.29..39.51 rows=5 width=12) (actual time=0.021..0.041 rows=1 
loops=10)         ->  Index Scan using tableb_reference_1 on tableb_part1 b  
(cost=0.29..7.90 rows=1 width=12) (actual time=0.007..0.007 rows=0 loops=10)    
           Index Cond: (reference = a.reference)         ->  Index Scan using 
tableb_reference_2 on tableb_part2 b_1  (cost=0.29..7.90 rows=1 width=12) 
(actual time=0.006..0.006 rows=0 loops=10)               Index Cond: (reference 
= a.reference)         ->  Index Scan using tableb_reference_3 on tableb_part3 
b_2  (cost=0.29..7.90 rows=1 width=12) (actual time=0.009..0.010 rows=0 
loops=10)               Index Cond: (reference = a.reference)         ->  Index 
Scan using tableb_reference_4 on tableb_part4 b_3  (cost=0.29..7.90 rows=1 
width=12) (actual time=0.006..0.007 rows=0 loops=10)               Index Cond: 
(reference = a.reference)         ->  Index Scan using tableb_reference_5 on 
tableb_part5 b_4  (cost=0.29..7.90 rows=1 width=12) (actual time=0.006..0.006 
rows=0 loops=10)               Index Cond: (reference = a.reference) Planning 
time: 3.629 ms Execution time: 0.762 ms
===========

So to summarise the findings/questions from above:
- It seems like the Postgres optimizer sometimes uses the partition level 
statistics, and sometimes the global table level statistics? Or is it using 
something else?- With partitioning tables with unique identifier and retrieving 
explicitly on those identifiers, at present the optimizer will always 
understimate the selectivity and overestimate the rows returned. This 
inaccuracy increases in proportion to the number of partitions.- As a result, 
when joining to other tables, you are liable to hitting sequential scans. This 
becomes more likely as you have more partitions or if join to more partitioned 
tables (note I am aware I could try and tune random_page_cost to try and 
prevent this).- To me in the examples queries described above, it makes sense 
to use the partition statistics for the partition level access strategy, but 
the global statistics when estimating the actual rows returned by all the 
individual partition queries. Is there a reason not to do this? Or do others 
believe the optimizer is doing the right thing here?
And then some general questions:
- How do other people use partitioning but without a significant performance 
disadvantage on reading the data? Is there something else I should be doing 
here to achieve the same thing without the overhead? At present my reads have 
increased optimization cost (as it needs to optimize access to each partition) 
and also execution cost (access the index on every partition). Even without the 
optimizer issues described above, the cost of reading simple data is extremely 
high relative to non-partitioned data (unless you use the partition key as a 
filter for each table to eliminate those partitions).- Is there any chance/plan 
to add global indexes to postgres? If so would that impact significantly the 
cost of the partition drop e.g. to clean up the index.
Thanks in advance for any feedback/support,
Keith

Partitioning Optimizer Questions and Issues

Reply via email to