Fwd: Planning time is time-consuming

2023-09-11 Thread Mikhail Balayan
Hi Laurenz,

My bad, I forgot to write that I tried vacuum too, but it didn't help. To
demonstrate the result, I did it again:

# vacuum (analyze, verbose) test_db_bench_1;
INFO:  vacuuming "perfkit.public.test_db_bench_1"
INFO:  launched 2 parallel vacuum workers for index cleanup (planned: 2)
INFO:  finished vacuuming "perfkit.public.test_db_bench_1": index scans: 0
pages: 0 removed, 684731 remain, 17510 scanned (2.56% of total)
tuples: 0 removed, 3999770 remain, 0 are dead but not yet removable
removable cutoff: 27200203, which was 0 XIDs old when operation ended
index scan bypassed: 7477 pages from table (1.09% of total) have 20072 dead
item identifiers
avg read rate: 0.099 MB/s, avg write rate: 0.009 MB/s
buffer usage: 27770 hits, 11 misses, 1 dirtied
WAL usage: 1 records, 1 full page images, 1762 bytes
system usage: CPU: user: 0.15 s, system: 0.71 s, elapsed: 0.87 s
INFO:  vacuuming "perfkit.pg_toast.pg_toast_16554"
INFO:  finished vacuuming "perfkit.pg_toast.pg_toast_16554": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 27200203, which was 0 XIDs old when operation ended
new relfrozenxid: 27200203, which is 460 XIDs ahead of previous value
index scan not needed: 0 pages from table (100.00% of total) had 0 dead
item identifiers removed
avg read rate: 113.225 MB/s, avg write rate: 0.000 MB/s
buffer usage: 3 hits, 1 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 188 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  analyzing "public.test_db_bench_1"
INFO:  "test_db_bench_1": scanned 3 of 684731 pages, containing 175085
live rows and 897 dead rows; 3 rows in sample, 3996204 estimated total
rows
VACUUM



# vacuum (analyze, verbose) test_db_bench_tenants;
INFO:  vacuuming "perfkit.public.test_db_bench_tenants"
INFO:  launched 2 parallel vacuum workers for index cleanup (planned: 2)
INFO:  finished vacuuming "perfkit.public.test_db_bench_tenants": index
scans: 0
pages: 0 removed, 78154 remain, 1 scanned (0.00% of total)
tuples: 0 removed, 4064008 remain, 0 are dead but not yet removable
removable cutoff: 27200204, which was 0 XIDs old when operation ended
new relfrozenxid: 27200204, which is 2 XIDs ahead of previous value
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item
identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 34 hits, 0 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 188 bytes
system usage: CPU: user: 0.01 s, system: 0.08 s, elapsed: 0.10 s
INFO:  analyzing "public.test_db_bench_tenants"
INFO:  "test_db_bench_tenants": scanned 3 of 78154 pages, containing
156 live rows and 0 dead rows; 3 rows in sample, 4064008 estimated
total rows
VACUUM



# vacuum (analyze, verbose) test_db_bench_tenant_closure;
INFO:  vacuuming "perfkit.public.test_db_bench_tenant_closure"
INFO:  launched 1 parallel vacuum worker for index cleanup (planned: 1)
INFO:  finished vacuuming "perfkit.public.test_db_bench_tenant_closure":
index scans: 0
pages: 0 removed, 181573 remain, 3808 scanned (2.10% of total)
tuples: 0 removed, 28505125 remain, 0 are dead but not yet removable
removable cutoff: 27200205, which was 0 XIDs old when operation ended
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item
identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 97.907 MB/s
buffer usage: 7680 hits, 0 misses, 3803 dirtied
WAL usage: 3800 records, 2 full page images, 224601 bytes
system usage: CPU: user: 0.08 s, system: 0.21 s, elapsed: 0.30 s
INFO:  analyzing "public.test_db_bench_tenant_closure"
INFO:  "test_db_bench_tenant_closure": scanned 3 of 181573 pages,
containing 4709835 live rows and 0 dead rows; 3 rows in sample,
28505962 estimated total rows
VACUUM



 Limit  (cost=1.98..152.05 rows=1 width=44) (actual time=0.012..0.013 rows=0 loops=1)
   ->  Nested Loop  (cost=1.98..1052.49 rows=7 width=44) (actual time=0.011..0.012 rows=0 loops=1)
         ->  Nested Loop  (cost=1.55..1022.18 rows=7 width=37) (actual time=0.011..0.011 rows=0 loops=1)
               ->  Nested Loop  (cost=1.12..1019.03 rows=7 width=8) (actual time=0.011..0.011 rows=0 loops=1)
                     ->  Index Scan using test_db_bench_tenants_uuid on test_db_bench_tenants tenants_parent  (cost=0.56..2.77 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
                           Index Cond: ((uuid)::text = '4c79c1c5-21ae-45a0-8734-75d67abd0330'::text)
                           Filter: (NOT is_deleted)
                     ->  Index Scan using test_db_bench_tenant_closure_pkey on test_db_bench_tenant_closure tenants_closure  (cost=0.56..1006.97 rows=929 width=16) (never executed)
                           Index Cond: (parent_id = tenants_parent.id)
                           Filter: (barrier <= 0)
               ->  Index Scan using test_db_bench_tenants_pkey on test_db_be

Fwd: Planning time is time-consuming

2023-09-11 Thread Mikhail Balayan
Thanks for the idea. I was surprised to find that this is not the way it
works and the planning time remains the same. To keep the experiment clean,
I ran it several times, first a couple of times explain analyze, then a
couple of times the query itself:

# PREPARE the_query (varchar) AS
SELECT "test_db_bench_1"."id" id, "test_db_bench_1"."tenant_id"
  FROM "test_db_bench_1"
  JOIN "test_db_bench_tenants" AS "tenants_child" ON
(("tenants_child"."uuid" = "test_db_bench_1"."tenant_id")
   AND
("tenants_child"."is_deleted" != true))
  JOIN "test_db_bench_tenant_closure" AS "tenants_closure" ON
(("tenants_closure"."child_id" = "tenants_child"."id")
   AND
("tenants_closure"."barrier" <= 0))
  JOIN "test_db_bench_tenants" AS "tenants_parent" ON
(("tenants_parent"."id" = "tenants_closure"."parent_id")
AND
("tenants_parent"."uuid" IN ($1))
AND
("tenants_parent"."is_deleted" != true))
 LIMIT 1;

# explain analyze EXECUTE the_query('4c79c1c5-21ae-45a0-8734-75d67abd0330');

  QUERY PLAN
--
 Limit  (cost=1.98..152.05 rows=1 width=152) (actual time=0.014..0.015 rows=0 loops=1)
   ->  Nested Loop  (cost=1.98..1052.49 rows=7 width=152) (actual time=0.013..0.013 rows=0 loops=1)
         ->  Nested Loop  (cost=1.55..1022.18 rows=7 width=108) (actual time=0.013..0.013 rows=0 loops=1)
               ->  Nested Loop  (cost=1.12..1019.03 rows=7 width=63) (actual time=0.012..0.013 rows=0 loops=1)
                     ->  Index Scan using test_db_bench_tenants_uuid on test_db_bench_tenants tenants_parent  (cost=0.56..2.77 rows=1 width=45) (actual time=0.012..0.012 rows=0 loops=1)
                           Index Cond: ((uuid)::text = '4c79c1c5-21ae-45a0-8734-75d67abd0330'::text)
                           Filter: (NOT is_deleted)
                     ->  Index Scan using test_db_bench_tenant_closure_pkey on test_db_bench_tenant_closure tenants_closure  (cost=0.56..1006.97 rows=929 width=18) (never executed)
                           Index Cond: (parent_id = tenants_parent.id)
                           Filter: (barrier <= 0)
               ->  Index Scan using test_db_bench_tenants_pkey on test_db_bench_tenants tenants_child  (cost=0.43..0.45 rows=1 width=45) (never executed)
                     Index Cond: (id = tenants_closure.child_id)
                     Filter: (NOT is_deleted)
         ->  Index Scan using test_db_bench_1_idx_tenant_id_3 on test_db_bench_1  (cost=0.43..2.98 rows=135 width=44) (never executed)
               Index Cond: ((tenant_id)::text = (tenants_child.uuid)::text)
 Planning Time: 0.982 ms
 Execution Time: 0.059 ms
(17 rows)

# explain analyze EXECUTE the_query('4c79c1c5-21ae-45a0-8734-75d67abd0330');

  QUERY PLAN
--
 Limit  (cost=1.98..152.05 rows=1 width=152) (actual time=0.011..0.012 rows=0 loops=1)
   ->  Nested Loop  (cost=1.98..1052.49 rows=7 width=152) (actual time=0.010..0.011 rows=0 loops=1)
         ->  Nested Loop  (cost=1.55..1022.18 rows=7 width=108) (actual time=0.010..0.011 rows=0 loops=1)
               ->  Nested Loop  (cost=1.12..1019.03 rows=7 width=63) (actual time=0.010..0.010 rows=0 loops=1)
                     ->  Index Scan using test_db_bench_tenants_uuid on test_db_bench_tenants tenants_parent  (cost=0.56..2.77 rows=1 width=45) (actual time=0.010..0.010 rows=0 loops=1)
                           Index Cond: ((uuid)::text = '4c79c1c5-21ae-45a0-8734-75d67abd0330'::text)
                           Filter: (NOT is_deleted)
                     ->  Index Scan using test_db_bench_tenant_closure_pkey on test_db_bench_tenant_closure tenants_closure  (cost=0.56..1006.97 rows=929 width=18) (never executed)
                           Index Cond: (parent_id = tenants_parent.id)
                           Filter: (barrier <= 0)
               ->  Index Scan using test_db_bench_tenants_pkey on test_db_bench_tenants tenants_child  (cost=0.43..0.45 rows=1 width=45) (never executed)
                     Index Cond: (id = tenants_closure.child_id)
                     Filter: (NOT is_deleted)
         ->  Index Scan using test_db_bench_1_idx_tenant_id_3 on test_db_bench_1  (cost=0.43..2.98 rows=135 width=44) (never executed)
               Index Cond: ((tenant_id)::text = (tenants_child.uuid)::text)
 Planning Time: 0.843 ms
 Execution Time: 0.046 ms
(17 rows)

# EXECUTE the_query('4c7

Re: Fwd: Planning time is time-consuming

2023-09-11 Thread Laurenz Albe
On Mon, 2023-09-11 at 12:57 +0800, Mikhail Balayan wrote:
> Thanks for the idea. I was surprised to find that this is not the way it 
> works and the planning time remains the same.

To benefit from the speed gains of a prepared statement, you'd have to
execute it at least seven times.  If a generic plan is used (which should
happen), you will see $1 instead of the literal argument in the execution
plan.

Prepared statements are probably your best bet.

Yours,
Laurenz Albe
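
[Editor's note: since PostgreSQL 12 the generic plan can also be forced
without waiting for repeated executions. A sketch, with the query simplified
from the one in the thread; `plan_cache_mode` is a standard GUC:]

```sql
-- Force generic plans for prepared statements in this session
-- (PostgreSQL 12+; the default is plan_cache_mode = auto).
SET plan_cache_mode = force_generic_plan;

PREPARE the_query (varchar) AS
SELECT id, tenant_id FROM test_db_bench_1 WHERE tenant_id = $1 LIMIT 1;

-- The plan is built once with $1 left as a parameter; EXPLAIN shows $1
-- instead of the literal, and subsequent EXECUTEs skip planning.
EXPLAIN EXECUTE the_query('4c79c1c5-21ae-45a0-8734-75d67abd0330');
```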




Re: Planning time is time-consuming

2023-09-11 Thread David Rowley
On Mon, 11 Sept 2023 at 18:16, Laurenz Albe  wrote:
> Also, there are quite a lot of indexes on "test_db_bench_1".  On a test 
> database, drop some
> indexes and see if that makes a difference.

Yeah, I count 3 that either have the key columns as some prefix of
another index or are just a duplicate of some other index.

Getting rid of those 3 will save some time in create_index_paths().

David
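
[Editor's note: a sketch of how such redundant indexes can be spotted from
the catalogs. This only compares key columns, so matches are candidates
only — check uniqueness, opclasses, and partial-index predicates before
dropping anything:]

```sql
-- List indexes whose key columns are an exact duplicate of, or a leading
-- prefix of, another index on the same table.
SELECT a.indexrelid::regclass AS redundant_index,
       b.indexrelid::regclass AS covering_index
FROM pg_index a
JOIN pg_index b
  ON a.indrelid = b.indrelid
 AND a.indexrelid <> b.indexrelid
WHERE (a.indkey::text = b.indkey::text                 -- exact duplicate
       OR b.indkey::text LIKE a.indkey::text || ' %')  -- leading prefix
  AND a.indrelid = 'test_db_bench_1'::regclass;
```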




Re: Planning time is time-consuming

2023-09-11 Thread David Rowley
On Mon, 11 Sept 2023 at 21:54, Mikhail Balayan  wrote:
> Could it be a regression? I'll check it on PG14 when I get a chance.

I'm not sure if you're asking for help here because you need planning
to be faster than it currently is, or if it's because you believe that
planning should always be faster than execution. If you think the
latter, then you're mistaken. It seems to me that planning this query
is much more complex than executing it.  The outer
side of the inner-most nested loop finds 0 rows, so it need not scan
the inner side, which results in that nested loop producing 0 rows,
therefore the outer side of none of the subsequent nested loops find
any rows. This is why you see "(never executed)" in the EXPLAIN
ANALYZE.

You could use perf record or perf top to dig into what's slow.

On the other hand, please report back if you find PG14 to be much faster here.

You could also experiment with a set of tables which are empty.  It's
possible that obtaining the relation sizes is a factor to consider here.
mdnblocks() needs to do a bit more work when the relation has multiple
segments.

David
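
[Editor's note: the empty-table experiment can be done without touching the
real data. A sketch, using a hypothetical clone table name; LIKE ...
INCLUDING ALL copies the column definitions and indexes but no rows:]

```sql
BEGIN;
-- Clone the schema (columns, indexes, constraints) with zero rows.
CREATE TABLE test_db_bench_1_empty (LIKE test_db_bench_1 INCLUDING ALL);

-- Compare the reported Planning Time against the populated table.
EXPLAIN ANALYZE
SELECT id, tenant_id FROM test_db_bench_1_empty LIMIT 1;
ROLLBACK;  -- discard the clone
```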




Re: Range partitioning query performance with date_trunc (vs timescaledb)

2023-09-11 Thread Philippe Pepiot
On 29/08/2023, David Rowley wrote:
> On Tue, 29 Aug 2023 at 19:40, Philippe Pepiot  wrote:
> > I'm trying to implement some range partitioning on timeseries data. But it
> > looks some queries involving date_trunc() doesn't make use of partitioning.
> >
> > BEGIN;
> > CREATE TABLE test (
> > time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
> > value FLOAT NOT NULL
> > ) PARTITION BY RANGE (time);
> > CREATE INDEX test_time_idx ON test(time DESC);
> > CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO 
> > ('2021-01-01');
> > CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO 
> > ('2022-01-01');
> > CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS 
> > value FROM test GROUP BY 1;
> > EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP 
> > '2021-01-01';
> > ROLLBACK;
> >
> > The plan query all partitions:
> 
> > I wonder if there is a way with a reasonable amount of SQL code to achieve 
> > this
> > with vanilla postgres ?
> 
> The only options I see for you are
> 
> 1) partition by LIST(date_Trunc('year', time)), or;
> 2) use a set-returning function instead of a view and pass the date
> range you want to select from the underlying table via parameters.
> 
> I imagine you won't want to do #1. However, it would at least also
> allow the aggregation to be performed before the Append if you SET
> enable_partitionwise_aggregate=1.
> 
> #2 isn't as flexible as a view as you'd have to create another
> function or expand the parameters of the existing one if you want to
> add items to the WHERE clause.
> 
> Unfortunately, date_trunc is just a black box to partition pruning, so
> it's not able to determine that DATE_TRUNC('year', time) >=
> '2021-01-01'  is the same as time >= '2021-01-01'.  It would be
> possible to make PostgreSQL do that, but that's a core code change,
> not something that you can do from SQL.

Ok I think I'll go for Set-returning function since
LIST or RANGE on (date_trunc('year', time)) will break advantage of
partitioning when querying with "time betwen x and y".

Thanks!
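
[Editor's note: a sketch of the set-returning-function approach, using the
table and view definitions from the quoted message; the function name and
parameter names are hypothetical. Because the range lands directly on the
partition key "time", partition pruning can apply:]

```sql
-- Aggregate by year, but take the raw time range as parameters so the
-- planner can prune partitions on "time" directly.
CREATE FUNCTION vtest_range(p_from timestamp, p_to timestamp)
RETURNS TABLE ("time" timestamp, value double precision)
LANGUAGE sql STABLE AS
$$
  SELECT date_trunc('year', t.time), sum(t.value)
  FROM test t
  WHERE t.time >= p_from AND t.time < p_to
  GROUP BY 1
$$;

-- Only the partitions overlapping 2021 are scanned:
SELECT * FROM vtest_range('2021-01-01', '2022-01-01');
```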




Re: Planning time is time-consuming

2023-09-11 Thread Frits Hoogland
Any statement that is executed has to go through the 4 stages of query 
execution:
- parse
- rewrite
- plan
- execute

The execute phase is the one that usually gets the most attention, and is
normally the phase in which the most time is spent.

In the postgres backend main loop, there are multiple ways of getting a
statement through these stages.
Simple query execution is a single call that goes through all of these
stages; the other common method is the extended protocol's parse message
(which performs the server-side parse and rewrite), followed by bind (which
performs the server-side plan) and execute messages, all handled from this
backend main loop.

A prepared statement, or named statement, is a way of performing statement
execution where some of the intermediate results are stored in a memory area
in the backend, which allows the backend to persist some of the execution
details. Non-prepared statements reuse that memory area, and thus flush any
such metadata.

The reason for explaining this is that when preparing a statement, the result
of the parse and rewrite phases, which is the parse tree, is stored.
That means that after the prepare, the work of generating the parse tree can
be omitted by only calling bind and execute for the prepared/named
statement.

The planner records the calculated cost of the statement with the specified
variables/binds, and also the cost of a plan where the binds are treated as
"non-specific", alias generic.
After five executions of a prepared statement, if the generic plan is costed
equal to or lower than the plans for the statement with the specific bind
variables, the backend will switch to the generic plan.

The advantage of switching to the generic plan is that the backend will not
perform the plan costing and all the accompanying transformations, but
instead use the generic plan directly.
For this question, that would 'solve' the issue of the plan phase taking more
time than the execution, but potentially only after five executions of the
prepared statement.
The downside is that because the costing is skipped, the backend cannot
choose another plan for that named statement for the lifetime of the prepared
statement in that backend, unless it is explicitly instructed not to use the
generic plan.

Frits Hoogland




> On 11 Sep 2023, at 10:13, Laurenz Albe  wrote:
> 
> On Mon, 2023-09-11 at 12:57 +0800, Mikhail Balayan wrote:
>> Thanks for the idea. I was surprised to find that this is not the way it 
>> works and the planning time remains the same.
> 
> To benefit from the speed gains of a prepared statement, you'd have to
> execute it at least seven times.  If a generic plan is used (which should
> happen), you will see $1 instead of the literal argument in the execution
> plan.
> 
> Prepared statements are probably your best bet.
> 
> Yours,
> Laurenz Albe
> 
> 



Re: Planning time is time-consuming

2023-09-11 Thread Tom Lane
David Rowley  writes:
> I'm not sure if you're asking for help here because you need planning
> to be faster than it currently is, or if it's because you believe that
> planning should always be faster than execution. If you think the
> latter, then you're mistaken.

Yeah.  I don't see anything particularly troubling here.  Taking
circa three-quarters of a millisecond (on typical current hardware)
to plan a four-way join on large tables is not unreasonable.
In most cases one could expect the execution of such a query to
take a good bit longer than that.  I think the OP is putting too
much emphasis on an edge case where execution finishes quickly
because there are in fact zero rows matching the uuid restriction.

BTW, in addition to the duplicative indexes, I wonder why the
uuid columns being joined on aren't all of "uuid" type.  While
I doubt fixing that would move the needle greatly, it still
seems sloppy.
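
[Editor's note: a sketch of the conversion Tom suggests, with column names
taken from the plans earlier in the thread. ALTER COLUMN ... TYPE rewrites
the table and its indexes and takes a heavy lock, so it is not a casual
change on large tables:]

```sql
-- Convert the text columns being joined on to the native uuid type.
ALTER TABLE test_db_bench_tenants
  ALTER COLUMN uuid TYPE uuid USING uuid::uuid;
ALTER TABLE test_db_bench_1
  ALTER COLUMN tenant_id TYPE uuid USING tenant_id::uuid;
```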

regards, tom lane




Re: Planning time is time-consuming

2023-09-11 Thread Imre Samu
Hi Mikhail.

> Postgresql version: 15.3 (Debian 15.3-1.pgdg110+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
> And just in case it matters, this is an experimental setup, so Postgresql
> running in Docker.
>

Are you using the official Docker Postgres image, specifically
`postgres:15.3-bullseye`? ( https://hub.docker.com/_/postgres )
- If so, consider upgrading to version 15.4. It has some planner fixes not
directly related to your issue. Check details here:
 PostgreSQL 15.4 Release Notes
https://www.postgresql.org/docs/release/15.4/
- For all technical text-type id columns, apply `COLLATE "C"` (like
`assign_time_str` and `cti_entity_uuid`).
  Alternatively, use the "uuid" column type everywhere, as Tom Lane
suggests.

- Could you provide details on your current tuning settings? I'm interested
in `work_mem`, `shared_buffers`, `effective_cache_size`, and others.
- Please test with different `work_mem` values.
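
[Editor's note: the settings Imre asks about can be reported in one query,
and work_mem changed per session for testing:]

```sql
-- Report the current tuning settings and where each value came from.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('work_mem', 'shared_buffers', 'effective_cache_size');

-- Try a different work_mem for the current session only.
SET work_mem = '64MB';
```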

If it's not too much trouble, can you also test with the following? (These
versions use a different locale and LLVM (JIT).)
- postgres:15.4-bookworm
- postgres:15.4-alpine3.18

Regards,
 Imre


Re: Planning time is time-consuming

2023-09-11 Thread David Rowley
On Tue, 12 Sept 2023 at 02:27, Tom Lane  wrote:
>
> David Rowley  writes:
> > I'm not sure if you're asking for help here because you need planning
> > to be faster than it currently is, or if it's because you believe that
> > planning should always be faster than execution. If you think the
> > latter, then you're mistaken.
>
> Yeah.  I don't see anything particularly troubling here.  Taking
> circa three-quarters of a millisecond (on typical current hardware)
> to plan a four-way join on large tables is not unreasonable.

I took a few minutes to reverse engineer the tables in question (with
assistance from an AI bot) and ran the query in question.
Unsurprisingly, I also see planning as slower than execution, but with
planning about 12x slower than execution vs the reported ~18x.

Planning Time: 0.581 ms
Execution Time: 0.048 ms

Nothing alarming in perf top of executing the query in pgbench with -M
simple.  I think this confirms the problem is just with expectations.

   5.09%  postgres  [.] AllocSetAlloc
   2.99%  postgres  [.] SearchCatCacheInternal
   2.52%  postgres  [.] palloc
   2.38%  postgres  [.] expression_tree_walker_impl
   1.82%  postgres  [.] add_path_precheck
   1.78%  postgres  [.] add_path
   1.73%  postgres  [.] MemoryContextAllocZeroAligned
   1.63%  postgres  [.] base_yyparse
   1.61%  postgres  [.] CatalogCacheComputeHashValue
   1.38%  postgres  [.] try_nestloop_path
   1.36%  postgres  [.] stack_is_too_deep
   1.33%  postgres  [.] add_paths_to_joinrel
   1.19%  postgres  [.] core_yylex
   1.18%  postgres  [.] lappend
   1.15%  postgres  [.] initial_cost_nestloop
   1.13%  postgres  [.] hash_search_with_hash_value
   1.01%  postgres  [.] palloc0
   0.95%  postgres  [.] get_memoize_path
   0.90%  postgres  [.] equal
   0.88%  postgres  [.] get_eclass_for_sort_expr
   0.81%  postgres  [.] compare_pathkeys
   0.80%  postgres  [.] bms_is_subset
   0.77%  postgres  [.] ResourceArrayRemove
   0.77%  postgres  [.] check_stack_depth
   0.77%  libc.so.6 [.] __memmove_avx_unaligned_erms
   0.74%  libc.so.6 [.] __memset_avx2_unaligned
   0.73%  postgres  [.] AllocSetFree
   0.71%  postgres  [.] final_cost_nestloop
   0.69%  postgres  [.] compare_path_costs_fuzzily
   0.68%  postgres  [.] initial_cost_mergejoin
   0.64%  libc.so.6 [.] __memset_avx2_unaligned_erms
   0.61%  postgres  [.] create_nestloop_path
   0.61%  postgres  [.] examine_variable
   0.59%  postgres  [.] hash_bytes
   0.56%  postgres  [.] truncate_useless_pathkeys
   0.56%  postgres  [.] bms_overlap

David