PostgreSQL 11 higher Planning time on Partitioned table

2020-02-23 Thread Ravi Garg
Hi,
I am looking to Range Partition one of my table (i.e. TransactionLog) in 
PostgreSQL 11.While evaluating query performance difference between the 
un-partitioned and partitioned table I am getting huge difference in planning 
time. Planning time is very high on partitioned table.Similarly when I query by 
specifying partition name directly in query the planning time is much less 
**0.081 ms** as compared to when I query based on partition table (parent 
table) name in query, where planning time **6.231 ms** (Samples below).
Below are the details, Let me know how can I improve query performance on 
partitioned table.
Following is the schema CREATE TABLE TransactionLog (
    txid character varying(36) NOT NULL,    txnDetails character varying(64),   
 loggingtime timestamp(6) without time zone DEFAULT LOCALTIMESTAMP,) PARTITION 
BY RANGE(loggingtime);
CREATE TABLE IF NOT EXISTS TransactionLog_20200223 PARTITION OF TransactionLog 
FOR VALUES FROM ('2020-02-23') TO ('2020-02-24');CREATE UNIQUE INDEX 
TransactionLog_20200223_UnqTxId ON TransactionLog_20200223 (txnid);

Following is explain analyze result when I query Directly on partition. 
Planning time ~**0.080 ms** (average of 10 execution)postgres=> EXPLAIN 
(ANALYZE,VERBOSE,COSTS,BUFFERS,TIMING,SUMMARY) select txnDetails FROM 
mra_part.TransactionLog_20200223 WHERE txnid = 
'febd139d-1b7f-4564-a004-1b3474e51756';                                         
                                    QUERY 
PLAN-
 Index Scan using TransactionLog_20200223_UnqTxId on TransactionLog_20200223 
(cost=0.57..4.61 rows=1 width=10) (actual time=0.039..0.040 rows=1 loops=1)   
Output: txnDetails   Index Cond: ((TransactionLog_20200223.txnid)::text = 
'febd139d-1b7f-4564-a004-1b3474e51756'::text)   Buffers: shared hit=5 
**Planning Time: 0.081 ms** Execution Time: 0.056 ms(6 rows)

Following is explain analyze result when I query by parent-table. Planning time 
**6.198 ms** (average of 10 execution)postgres=> EXPLAIN 
(ANALYZE,VERBOSE,COSTS,BUFFERS,TIMING,SUMMARY)  select txnDetails FROM 
mtdauthlog WHERE txnid = 'febd139d-1b7f-4564-a004-1b3474e51756' AND loggingtime 
>= '2020-02-23'::timestamp without time zone AND loggingtime < 
'2020-02-24'::timestamp without time zone;                                      
                                                        QUERY 
PLAN
 Append  (cost=0.57..4.62 rows=1 width=10) (actual time=0.036..0.037 rows=1 
loops=1)   Buffers: shared hit=5   ->  Index Scan using 
TransactionLog_20200223_UnqTxId on TransactionLog_20200223  (cost=0.57..4.61 
rows=1 width=10) (actual time=0.035..0.036 rows=1 loops=1)         Output: 
TransactionLog_20200223.txnDetails         Index Cond: 
((TransactionLog_20200223.txnid)::text = 
'febd139d-1b7f-4564-a004-1b3474e51756'::text)         Filter: 
((TransactionLog_20200223.loggingtime >= '2020-02-23 00:00:00'::timestamp 
without time zone) AND (TransactionLog_20200223.loggingtime < '2020-02-24 
00:00:00'::timestamp without time zone))         Buffers: shared hit=5 
**Planning Time: 6.231 ms** Execution Time: 0.076 ms(9 rows)
There are around ~200 child partitions. Partition pruning enabled.PostgreSQL 
Version: PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
20150623 (Red Hat 4.8.5-39), 64-bit
Thanks and Regards,
Ravi Garg,



Re: PostgreSQL 11 higher Planning time on Partitioned table

2020-02-23 Thread Justin Pryzby
On Sun, Feb 23, 2020 at 09:56:30AM +, Ravi Garg wrote:
> Hi,
> I am looking to Range Partition one of my table (i.e. TransactionLog) in 
> PostgreSQL 11.While evaluating query performance difference between the 
> un-partitioned and partitioned table I am getting huge difference in planning 
> time. Planning time is very high on partitioned table.Similarly when I query 
> by specifying partition name directly in query the planning time is much less 
> **0.081 ms** as compared to when I query based on partition table (parent 
> table) name in query, where planning time **6.231 ms** (Samples below).

That's probably to be expected under pg11:

https://www.postgresql.org/docs/11/ddl-partitioning.html
|Too many partitions can mean longer query planning times...
|It is also important to consider the overhead of partitioning during query 
planning and execution. The query planner is generally able to handle partition 
hierarchies with up to a few hundred partitions fairly well, provided that 
typical queries allow the query planner to prune all but a small number of 
partitions. Planning times become longer and memory consumption becomes higher 
as more partitions are added

> There are around ~200 child partitions. Partition pruning enabled.PostgreSQL 
> Version: PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
> 20150623 (Red Hat 4.8.5-39), 64-bit

How large are the partitions and how many indexes each, and how large are they?
Each partition will be stat()ed and each index will be open()ed and read() for
every query.  This was resolved in pg12:
https://commitfest.postgresql.org/21/1778/

-- 
Justin




Re: PostgreSQL 11 higher Planning time on Partitioned table

2020-02-23 Thread Justin Pryzby
On Sun, Feb 23, 2020 at 04:12:09AM -0600, Justin Pryzby wrote:
> How large are the partitions and how many indexes each, and how large are 
> they?
> Each partition will be stat()ed ... for every query.

I should have said that's every 1GB "segment" is stat()ed for every query.

> This was resolved in pg12:
> https://commitfest.postgresql.org/21/1778/

+ https://www.postgresql.org/about/featurematrix/detail/320/

-- 
Justin




Re: PostgreSQL 11 higher Planning time on Partitioned table

2020-02-23 Thread Ravi Garg
Hi Justin,
Thanks for response.
Unfortunately we will not be able to migrate to PG12 any time soon.   
   - There is only one index (unique index btree) on 'txnID' (i.e. transaction 
ID) character varying(36). Which we are creating on each partition.
   - Our use case is limited to simple selects (we don't join with the other 
tables) however, we are expecting ~70 million records inserted per day and 
there would be couple of updates on each records where average record size 
would be ~ 1.5 KB. 
   - Currently we are thinking to have Daily partitions and as we need to keep 
6 months of data thus 180 Partitions.However we have liberty to reduce the 
number of partitions to weekly/fortnightly/monthly, If we get comparable 
performance.   

   - We need to look current partition and previous partition for all of our 
use-cases/queries.
Can you please suggest what sort of combinations/partition strategy we can test 
considering data-volume/vacuum etc. Also let me know if some of the pg_settings 
can help us tuning this (I have attached my pg_settings).

Thanks and Regards,
Ravi Garg,
Mob : +91-98930-66610 

On Sunday, 23 February, 2020, 03:42:13 pm IST, Justin Pryzby 
 wrote:  
 
 On Sun, Feb 23, 2020 at 09:56:30AM +, Ravi Garg wrote:
> Hi,
> I am looking to Range Partition one of my table (i.e. TransactionLog) in 
> PostgreSQL 11.While evaluating query performance difference between the 
> un-partitioned and partitioned table I am getting huge difference in planning 
> time. Planning time is very high on partitioned table.Similarly when I query 
> by specifying partition name directly in query the planning time is much less 
> **0.081 ms** as compared to when I query based on partition table (parent 
> table) name in query, where planning time **6.231 ms** (Samples below).

That's probably to be expected under pg11:

https://www.postgresql.org/docs/11/ddl-partitioning.html
|Too many partitions can mean longer query planning times...
|It is also important to consider the overhead of partitioning during query 
planning and execution. The query planner is generally able to handle partition 
hierarchies with up to a few hundred partitions fairly well, provided that 
typical queries allow the query planner to prune all but a small number of 
partitions. Planning times become longer and memory consumption becomes higher 
as more partitions are added

> There are around ~200 child partitions. Partition pruning enabled.PostgreSQL 
> Version: PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
> 20150623 (Red Hat 4.8.5-39), 64-bit

How large are the partitions and how many indexes each, and how large are they?
Each partition will be stat()ed and each index will be open()ed and read() for
every query.  This was resolved in pg12:
https://commitfest.postgresql.org/21/1778/

-- 
Justin
  postgres=# select name, setting from pg_settings order by name desc;
  name  | setting
+-
 zero_damaged_pages | off
 xmloption  | content
 xmlbinary  | base64
 work_mem   | 4096
 wal_writer_flush_after | 128
 wal_writer_delay   | 200
 wal_sync_method| fdatasync
 wal_sender_timeout | 6
 wal_segment_size   | 1073741824
 wal_retrieve_retry_interval| 5000
 wal_receiver_timeout   | 6
 wal_receiver_status_interval   | 10
 wal_log_hints  | on
 wal_level  | replica
 wal_keep_segments  | 50
 wal_consistency_checking   |
 wal_compression| off
 wal_buffers| 8192
 wal_block_size | 8192
 vacuum_multixact_freeze_table_age  | 15000
 vacuum_multixact_freeze_min_age| 500
 vacuum_freeze_table_age| 15000
 vacuum_freeze_min_age  | 5000
 vacuum_defer_cleanup_age   | 0
 vacuum_cost_page_miss  | 10
 vacuum_cost_page_hit   | 1
 vacuum_cost_page_dirty | 20
 vacuum_cost_limit  | 800
 vacuum_cost_delay  | 0
 vacuum_cleanup_index_scale_factor  | 0.1
 update_process_title   | on
 unix_socket_permissions| 0777
 unix_socket_group  |
 unix_socket_directories| /tmp
 transform_null_equals  | off
 transaction_read_only  | off
 transaction_isolation  | read committed
 transaction_deferrable | off
 track_io_timing| on
 track_functions| all
 track_counts   | on
 tr

Re: PostgreSQL 11 higher Planning time on Partitioned table

2020-02-23 Thread Justin Pryzby
On Sun, Feb 23, 2020 at 10:57:29AM +, Ravi Garg wrote:
>- Currently we are thinking to have Daily partitions and as we need to 
> keep 6 months of data thus 180 Partitions.However we have liberty to reduce 
> the number of partitions to weekly/fortnightly/monthly, If we get comparable 
> performance.   

I didn't hear how large the tables and indexes are.

>- We need to look current partition and previous partition for all of our 
> use-cases/queries.

Do you mean that a given query is only going to hit 2 partitions ?  Or do you
mean that all but the most recent 2 partitions are "archival" and won't be
needed by future queries ?

> Can you please suggest what sort of combinations/partition strategy we can 
> test considering data-volume/vacuum etc. Also let me know if some of the 
> pg_settings can help us tuning this (I have attached my pg_settings).

You should determine what an acceptable planning speed is, or the best balance
of planning/execution time.  Try to detach half your current partitions and, if
that gives acceptable performance, then partition by day/2 or more.  You could
make a graph of (planning and total) time vs npartitions, since I think it's
likely to be nonlinear.

I believe others have reported improved performance under v11 with larger
numbers of partitions, by using "partitions of partitions".  So you could try
making partitions by month themselves partitioned by day.

>- Our use case is limited to simple selects (we don't join with the other
>tables) however, we are expecting ~70 million records inserted per day
>and there would be couple of updates on each records where average record
>size would be ~ 1.5 KB.

>  shared_buffers | 1048576

If you care about INSERT performance, you probably need to make at least a
single partition's index fit within shared_buffers (or set shared_buffers such
that it fits).  Use transactions around your inserts.  If your speed is not
limited by I/O, you could further use multiple VALUES(),() inserts, or maybe
prepared statements.  Maybe synchronous_commit=off.

If you care about (consistent) SELECT performance, you should consider
VACUUMing the tables after bulk inserts, to set hint bits (and since
non-updated tuples won't be hit by autovacuum).  Or maybe VACUUM FREEZE to
freeze tuples (since it sounds like a typical page is unlikely to ever be
updated).

-- 
Justin




Re: PostgreSQL 11 higher Planning time on Partitioned table

2020-02-23 Thread Imre Samu
> ...  txid character varying(36) NOT NULL,
> ... WHERE txnid = 'febd139d-1b7f-4564-a004-1b3474e51756'
> There is only one index (unique index btree) on 'txnID' (i.e. transaction
ID) character varying(36). Which we are creating on each partition.

IF txnid is real UUID , then you can test the
https://www.postgresql.org/docs/11/datatype-uuid.html performance
see
https://stackoverflow.com/questions/29880083/postgresql-uuid-type-performance
imho: it should be better.

best,
 Imre


Ravi Garg  ezt írta (időpont: 2020. febr. 23., V,
11:57):

> Hi Justin,
>
> Thanks for response.
>
> Unfortunately we will not be able to migrate to PG12 any time soon.
>
>- There is only one index (unique index btree) on 'txnID' (i.e.
>transaction ID) character varying(36). Which we are creating on each
>partition.
>- Our use case is limited to simple selects (we don't join with the
>other tables) however, we are expecting ~70 million records inserted
>per day and there would be couple of updates on each records where average
>record size would be ~ 1.5 KB.
>- Currently we are thinking to have Daily partitions and as we need to
>keep 6 months of data thus 180 Partitions.However we have liberty to reduce
>the number of partitions to weekly/fortnightly/monthly, If we get
>comparable performance.
>- We need to look current partition and previous partition for all of
>our use-cases/queries.
>
> Can you please suggest what sort of combinations/partition strategy we can
> test considering data-volume/vacuum etc. Also let me know if some of the
> pg_settings can help us tuning this (I have attached my pg_settings).
>
>
> Thanks and Regards,
> Ravi Garg,
> Mob : +91-98930-66610
>
>
> On Sunday, 23 February, 2020, 03:42:13 pm IST, Justin Pryzby <
> [email protected]> wrote:
>
>
> On Sun, Feb 23, 2020 at 09:56:30AM +, Ravi Garg wrote:
> > Hi,
> > I am looking to Range Partition one of my table (i.e. TransactionLog) in
> PostgreSQL 11.While evaluating query performance difference between the
> un-partitioned and partitioned table I am getting huge difference in
> planning time. Planning time is very high on partitioned table.Similarly
> when I query by specifying partition name directly in query the planning
> time is much less **0.081 ms** as compared to when I query based on
> partition table (parent table) name in query, where planning time **6.231
> ms** (Samples below).
>
> That's probably to be expected under pg11:
>
> https://www.postgresql.org/docs/11/ddl-partitioning.html
> |Too many partitions can mean longer query planning times...
> |It is also important to consider the overhead of partitioning during
> query planning and execution. The query planner is generally able to handle
> partition hierarchies with up to a few hundred partitions fairly well,
> provided that typical queries allow the query planner to prune all but a
> small number of partitions. Planning times become longer and memory
> consumption becomes higher as more partitions are added
>
>
> > There are around ~200 child partitions. Partition pruning
> enabled.PostgreSQL Version: PostgreSQL 11.7 on x86_64-pc-linux-gnu,
> compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
>
>
> How large are the partitions and how many indexes each, and how large are
> they?
> Each partition will be stat()ed and each index will be open()ed and read()
> for
> every query.  This was resolved in pg12:
> https://commitfest.postgresql.org/21/1778/
>
> --
> Justin
>
>