Extremely slow when query uses GIST exclusion index
Hi. My databases make heavy use of timestamp ranges, and they rely on GIST
exclusion constraints to ensure that the ranges are disjoint. I've noticed
that queries that hit the GIST indexes are EXTREMELY slow, and the queries
run much faster if I make trivial changes to avoid the GIST indexes.
Here's the setup for a test case. (Timings were collected on PostgreSQL
9.5.4 on x86_64-pc-linux-gnu, Intel Xeon E5 in a VM with 3.5 GB RAM):
CREATE TABLE app (
pk SERIAL PRIMARY KEY,
group_id TEXT NOT NULL,
app_time TIMESTAMPTZ NOT NULL
);
CREATE TABLE group_span (
pk SERIAL PRIMARY KEY,
group_id TEXT NOT NULL,
valid_period TSTZRANGE NOT NULL,
EXCLUDE USING GIST (group_id WITH =, valid_period WITH &&)
);
CREATE TABLE member_span (
pk SERIAL PRIMARY KEY,
member_id TEXT NOT NULL,
group_id TEXT NOT NULL,
valid_period TSTZRANGE NOT NULL,
EXCLUDE USING GIST
(member_id WITH =, group_id WITH =, valid_period WITH &&)
);
-- Fill tables with some random data
INSERT INTO app (group_id, app_time)
SELECT
MD5(CONCAT(GENERATE_SERIES(1, 10000), RANDOM())),
DATE_TRUNC('month', TIMESTAMPTZ '2000-01-01' +
INTERVAL '3 years' * RANDOM());
-- Give groups a 1-year span, and give some groups a 2nd-year span:
INSERT INTO group_span (group_id, valid_period)
(SELECT
group_id,
TSTZRANGE(app_time, app_time + INTERVAL '1 year')
FROM app)
UNION ALL
(SELECT
group_id,
TSTZRANGE(app_time + INTERVAL '1 year',
app_time + INTERVAL '2 year')
FROM app LIMIT 2000);
-- Create members with a random span within their group_span:
INSERT INTO member_span (member_id, group_id, valid_period)
SELECT
MD5(RANDOM()::TEXT),
group_id,
TSTZRANGE(
LOWER(valid_period),
UPPER(valid_period) - DATE_TRUNC(
'days',
(UPPER(valid_period) - LOWER(valid_period)) * RANDOM()
)
)
FROM group_span;
Given this setup, here's a query that hits the GIST exclusion index on the
"member_span" table. It takes 38 sec on my machine:
SELECT *
FROM app
JOIN group_span ON
app.group_id = group_span.group_id AND
app.app_time <@ group_span.valid_period
JOIN member_span ON
group_span.group_id = member_span.group_id AND
group_span.valid_period && member_span.valid_period;
Here's the query plan for that query:
Nested Loop (cost=319.27..776.39 rows=1 width=196) (actual
time=15.370..38406.466 rows=1 loops=1)
Join Filter: (app.group_id = member_span.group_id)
-> Hash Join (cost=319.00..771.00 rows=12 width=104) (actual
time=5.790..130.613 rows=1 loops=1)
Hash Cond: (group_span.group_id = app.group_id)
Join Filter: (app.app_time <@ group_span.valid_period)
Rows Removed by Join Filter: 2000
-> Seq Scan on group_span (cost=0.00..257.00 rows=12000 width=59)
(actual time=0.005..16.282 rows=12000 loops=1)
-> Hash (cost=194.00..194.00 rows=1 width=45) (actual
time=5.758..5.758 rows=1 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 910kB
-> Seq Scan on app (cost=0.00..194.00 rows=1 width=45)
(actual time=0.002..2.426 rows=1 loops=1)
-> Index Scan using member_span_member_id_group_id_valid_period_excl on
member_span (cost=0.28..0.44 rows=1 width=92) (actual time=1.988..3.817
rows=1 loops=1)
Index Cond: ((group_id = group_span.group_id) AND
(group_span.valid_period && valid_period))
Planning time: 0.784 ms
Execution time: 38410.227 ms
We can make a small tweak to the query to make it complicated enough that
the execution planner avoids the GIST index. In this particular case, we
can replace "app.app_time <@ group_span.valid_period" with the equivalent
"app.app_time >= LOWER(group_span.valid_period) AND
app.app_time < UPPER(group_span.valid_period)". This equivalent query is
MUCH faster:
SELECT *
FROM app
JOIN group_span ON
app.group_id = group_span.group_id AND
app.app_time >= LOWER(group_span.valid_period) AND
app.app_time < UPPER(group_span.valid_period)
JOIN member_span ON
group_span.group_id = member_span.group_id AND
group_span.valid_period && member_span.valid_period;
It only takes 86 ms, even though it's doing 3 seq scans instead of using
the index:
Hash Join (cost=953.71..1186.65 rows=8 width=196) (actual
time=58.364..84.706 rows=1 loops=1)
Hash Cond: (app.group_id = group_span.group_id)
Join Filter: ((app.app_time >= lower(group_span.valid_period)) AND
(app.app_time < upper(group_span.valid_period)))
Rows Removed by Join Filter: 2000
-> Seq Scan on app (cost=0.00..194.00 rows=1 width=45) (actual
time=0.007..2.391 rows=1 loops=1)
-> Hash (cost=952.81..952.81 rows=72 width=151) (actual
time=58.343..58.343 rows=12000 loops=1)
Buckets: 16384 (originally 1024) Batches: 1 (originally 1) Memory
Usage: 2285kB
-> Hash Join (cost=407.00..952.81 rows=72 width=151) (actual
time=15.048..44.103 rows=12000 loops=1)
Hash Cond: (member_span.group_id = group_span.group_id)
Join Filter: (group_s
Re: Extremely slow when query uses GIST exclusion index
Thanks for your help investigating this! Follow-up below:

On Wed, Aug 29, 2018 at 7:25 AM, Andreas Kretschmer wrote:
> Okay, other solution. The problem is the nested loop, we can disable that:
>
> test=*# set enable_nestloop to false;

Is it OK to keep this off permanently in production? I thought these
settings were just for debugging, and once we've identified the culprit,
we're supposed to take other steps to avoid the suboptimal execution plan.

> your GiST-Index contains (member_id,group_id,valid_period), but your
> query is only on the latter 2 fields.

Yeah, I didn't really want a GIST index in the first place -- PostgreSQL
created it automatically as a side effect of the exclusion constraint that
I need. Your suggestion to create *another* GIST index is an interesting
workaround. But we've seen that the query runs even faster if we didn't
have the GIST index(es) at all. So is there any way to tell the planner to
avoid the GIST index altogether? (Alternatively, could there be a bug
that's causing PostgreSQL to underestimate the cost of using the GIST
index?)

> Nested Loop (cost=319.27..776.18 rows=1 width=196) (actual
> time=3.156..334.963 rows=1 loops=1)
>   Join Filter: (app.group_id = member_span.group_id)
>   -> Hash Join (cost=319.00..771.00 rows=12 width=104) (actual
> time=3.100..14.040 rows=1 loops=1)

Hmm, also, it looks like one of the oddities of this query is that
PostgreSQL is severely underestimating the cardinality of the join. It
seems to think that the join will result in only 1 row, when the join
actually produces 10,000 rows. Maybe that's why the planner thinks that
using the GIST index is cheap? (I.e., the planner thought that it would
only need to do 1 GIST index lookup, which is cheaper than a sequential
scan; but in reality it has to do 10,000 GIST index lookups, which is much
more expensive than a scan.) Is there any way to help the planner better
estimate how big the join output is going to be?
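One narrowly scoped way to test the "disable nested loops" suggestion,
without changing the setting server-wide, is SET LOCAL, which only affects
the current transaction (a sketch, reusing the query from the test case
above):

  BEGIN;
  SET LOCAL enable_nestloop = off;
  SELECT *
  FROM app
  JOIN group_span ON
    app.group_id = group_span.group_id AND
    app.app_time <@ group_span.valid_period
  JOIN member_span ON
    group_span.group_id = member_span.group_id AND
    group_span.valid_period && member_span.valid_period;
  COMMIT;

Whether the setting is acceptable permanently in production is exactly the
question raised above; scoping it to the affected query at least limits
the blast radius.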
Re: Batch insert heavily affecting query performance.
Jean,

It is very likely you are running out of IOPS with that size of server. We
have several Postgres databases running at AWS. We consistently run out of
IOPS on our development servers due to the types of queries and the sizing
of our development databases. I would check the AWS monitoring graphs to
determine the cause. We typically see low CPU and high IOPS just prior to
our degraded performance. Our production environment runs provisioned IOPS
to avoid this very issue.

Regards,
David

From: Jean Baro
To: Jeremy Finzel
Cc: Danylo Hlynskyi; [email protected]
Sent: Wednesday, December 27, 2017 11:03 AM
Subject: Re: Batch insert heavily affecting query performance.

Sorry guys,

The performance problem is not caused by PG.

Index Scan using idx_user_country on public.old_card (cost=0.57..1854.66 rows=460 width=922) (actual time=3.442..76.606 rows=200 loops=1)
  Output: id, user_id, user_country, user_channel, user_role, created_by_system_key, created_by_username, created_at, last_modified_at, date_start, date_end, payload, tags, menu, deleted, campaign, correlation_id
  Index Cond: (((old_card.user_id)::text = '1234'::text) AND (old_card.user_country = 'BR'::bpchar))
  Buffers: shared hit=11 read=138 written=35
Planning time: 7.748 ms
Execution time: 76.755 ms

77ms on an 8GB database with 167MM rows and almost 500GB in size is
amazing!! Now we are investigating other bottlenecks: is it the creation
of a new connection to PG (no connection pooler at the moment, like
PgBouncer), is it the Lambda start up time? Is it the network performance
between PG and Lambda?

I am sorry for wasting your time guys; it helped us to find the problem
though, even if it wasn't a PG problem. BTW, what a performance! I am
impressed. Thanks PG community!

On 27 Dec 2017 at 14:34, "Jean Baro" wrote:

Thanks Jeremy,

We will provide a more complete EXPLAIN as other people have suggested. I
am glad we might end up with a much better performance (currently each
query takes around 2 seconds!).

Cheers

On 27 Dec 2017 at 14:02, "Jeremy Finzel" wrote:

The EXPLAIN

Index Scan using idx_user_country on card (cost=0.57..1854.66 rows=460 width=922)
  Index Cond: (((user_id)::text = '4684'::text) AND (user_country = 'BR'::bpchar))

Show 3 runs of the full explain analyze plan on the given condition so
that we can also see cold vs warm cache performance. There is definitely
something wrong as there is no way a query like that should take 500ms.
Your instinct is correct there.
Re: pgaudit and create postgis extension logs a lot inserts
On 1/19/18 6:05 AM, Magnus Hagander wrote:
> On Thu, Jan 18, 2018 at 6:54 PM, Joe Conway wrote:
>> On 01/18/2018 04:12 AM, Svensson Peter wrote:
>>> When I then install the postgis extension in a database it writes a
>>> huge amount of logs which slow down the server a lot.
>>> Not only table creation and functions are logged, even all inserts in
>>> spatial_ref_sys are written to the audit-log.
>>>
>>> LOG: AUDIT: SESSION,1,1,DDL,CREATE FUNCTION,,,"
>>> ..
>>> INSERT INTO ""spatial_ref_sys"" (""srid"",""auth_name"
>>>
>>> This behaviour makes pgaudit useless in our environment due to the
>>> overhead of the log-file writes.
>>
>> How often do you intend to install PostGIS? Disable pgaudit, install
>> PostGIS, enable pgaudit?
>
> Would it make sense for pgaudit to, at least by option, not include DDL
> statements that are generated as "sub-parts" of a CREATE EXTENSION? It
> should still log the CREATE EXTENSION of course, but not necessarily all
> the contents of it, since that's actually defined in the extension
> itself already?

That's doable, but I think it could be abused if it was always on, and
installing extensions is generally not a daily activity. It seems in this
case the best action is to disable pgaudit before installing postgis, or
install postgis first.

Regards,
--
-David
[email protected]
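A sketch of the "disable pgaudit, install PostGIS, enable pgaudit"
approach, assuming the install is run by a superuser and pgaudit.log is
otherwise set in postgresql.conf (the exact settings are examples only):

  SET pgaudit.log = 'none';   -- stop session audit logging for this session
  CREATE EXTENSION postgis;   -- the extension's internal DDL/INSERTs are no longer logged
  RESET pgaudit.log;          -- return to the server-wide setting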
Re: pgaudit and create postgis extension logs a lot inserts
Hi Peter,

On 1/18/18 7:12 AM, Svensson Peter wrote:
> Also noticed that setting a session log to none (set pgaudit.log='none';)
> overrides the parameter from postgresql.conf, but does not get logged,
> and then you can do whatever you want without any audit.
> I supposed this changing of the audit session log parameter should be
> logged to file?

pgaudit is not intended to audit the superuser, and only a superuser can
set pgaudit.log. However, you can limit superuser access with the
set_user extension: https://github.com/pgaudit/set_user

Regards,
--
-David
[email protected]
Slow performance after restoring a dump
=685 width=4) (actual time=0.014..0.014 rows=0 loops=1)
            Index Cond: (product_id = 2466420)
            Filter: (credit_stock_account_id = 3)
            Buffers: shared hit=3
      -> Index Scan using account_trans_pkey on account_trans a (cost=0.00..9.49 rows=1 width=8) (never executed)
            Index Cond: (account_trans_id = s.account_trans_id)
Total runtime: 0.050 ms
(15 rows)

Regards,
--
David
Re: Slow performance after restoring a dump
Hi, yes I've run "analyse" against the newly restored database. Should
that be enough?

On 19 March 2018 at 15:35, Tom Lane wrote:
> David Osborne writes:
> The first question people will ask is did you re-ANALYZE the new
> database? pg_dump doesn't take care of that for you, and auto-analyze
> might not think it needs to process the smaller tables.
Re: Slow performance after restoring a dump
That did the trick... thanks! Yes, perhaps a minor planner difference just
tipped us over the edge previously.

=> alter table stock_trans alter column product_id set STATISTICS 1000;

QUERY PLAN
--------------------------------------------------------------------------
Limit (cost=3461.10..3461.10 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
  Buffers: shared hit=3
  -> Sort (cost=3461.10..3461.75 rows=260 width=4) (actual time=0.013..0.013 rows=0 loops=1)
        Sort Key: a.trans_date DESC
        Sort Method: quicksort Memory: 25kB
        Buffers: shared hit=3
        -> Nested Loop (cost=0.87..3459.80 rows=260 width=4) (actual time=0.008..0.008 rows=0 loops=1)
              Buffers: shared hit=3
              -> Index Scan using stock_trans_product_idx on stock_trans s (cost=0.43..1263.55 rows=260 width=4) (actual time=0.007..0.007 rows=0 loops=1)
                    Index Cond: (product_id = 2466420)
                    Filter: (credit_stock_account_id = 3)
                    Buffers: shared hit=3
              -> Index Scan using account_trans_pkey on account_trans a (cost=0.43..8.44 rows=1 width=8) (never executed)
                    Index Cond: (account_trans_id = s.account_trans_id)
Planning time: 0.255 ms
Execution time: 0.039 ms
(16 rows)

On 19 March 2018 at 16:22, Tom Lane wrote:
> David Osborne writes:
>> Hi, yes I've run "analyse" against the newly restored database. Should
>> that be enough?
>
> My apologies, you did say that further down in the original message.
> It looks like the core of the problem is the poor rowcount estimation
> here:
>
> -> Bitmap Index Scan on stock_trans_product_idx
> (cost=0.00..31.42 rows=1465 width=0) (actual time=0.009..0.009 rows=0
> loops=1)
>       Index Cond: (product_id = 2466420)
>       Buffers: shared hit=3
>
> You might be able to improve that by raising the statistics target
> for stock_trans.product_id. I'm not sure why you weren't getting
> bitten by the same issue in 9.1; but the cost estimates aren't
> that far apart for the two plans, so maybe you were just lucky ...
>
> regards, tom lane

--
David Osborne
Qcode Software Limited
http://www.qcode.co.uk
T: +44 (0)1463 896484
Re: Slow planning time for custom function
On 24 March 2018 at 14:35, Andres Freund wrote:
> How long does planning take if you repeat this? I wonder if a good chunk
> of those 1.8s is initial loading of plv8.
Maybe, but it also could be the execution of the function, after all,
the planner does invoke immutable functions:
# explain verbose select lower('TEST');
QUERY PLAN
---
Result (cost=0.00..0.01 rows=1 width=32)
Output: 'test'::text
(2 rows)
Would be interesting to see what changes without the IMMUTABLE flag.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
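To illustrate the point about the planner pre-evaluating immutable
functions, here is a minimal, self-contained sketch (the function, its
cost and the use of plpgsql are invented for the example; the thread's
actual plv8 function is not shown here):

  CREATE FUNCTION expensive_fn(x text) RETURNS text
  LANGUAGE plpgsql IMMUTABLE AS $$
  BEGIN
      PERFORM pg_sleep(0.5);   -- stand-in for an expensive function body
      RETURN lower(x);
  END $$;

  -- The IMMUTABLE function is folded to a constant at plan time, so the
  -- ~0.5s shows up as planning time:
  EXPLAIN (VERBOSE, SUMMARY) SELECT expensive_fn('TEST');

  -- Without the IMMUTABLE flag the planner no longer pre-evaluates it:
  ALTER FUNCTION expensive_fn(text) VOLATILE;
  EXPLAIN (VERBOSE, SUMMARY) SELECT expensive_fn('TEST');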
Re: functions: VOLATILE performs better than STABLE
On 25 March 2018 at 18:00, Laurenz Albe wrote:
> Peter wrote:
>> Over all, VOLATILE performs better than STABLE.
>
> The reason is that the subquery with the VOLATILE function can be
> flattened; see the EXPLAIN (VERBOSE) output.
>
> There is no guarantee that less volatility means better performance.

Although it would be nice. TPC-H Q1 does appear to be crafted to allow
databases with smarter expression evaluation to get a better score.

It would probably require some sort of recursive expression evaluation
where at each level we check if that expression has already been seen; if
it has, then replace it with some sort of placeholder, then evaluate each
placeholder in the required order. Probably the first part could be done
during planning. It would mean targetlists would need to carry a bit more
weight.

It would be an interesting project to work on, but I'm not planning to
personally.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: help in analysis of execution plans
On 6 May 2018 at 03:16, Neto pr wrote:
> Execution Plan 1 (query execution time 4.7 minutes):
> - https://explain.depesz.com/s/Ughh
> - Postgresql version 10.1 (default) with index on l_shipdate (table lineitem)
>
> Execution Plan 2 (query execution time 2.95 minutes):
> - https://explain.depesz.com/s/7Zb7
> - Postgresql version 9.5 (version with source code changed by me) with
> index on l_orderkey (table lineitem).
>
> Some doubts
> - Difference between GroupAggregate and Finalize GroupAggregate
A Finalize Aggregate node is required to combine the Partially
Aggregated records. A Partially Aggregated result differs from normal
aggregation as the final function of the aggregate is not called
during partial aggregation. This allows the finalize aggregate to
combine the partially aggregated results and then call the final function.
Imagine an aggregate like AVG() where it goes and internally
calculates the sum and the count of non-null records. A partial
aggregate node would return {sum, count}, but a normal aggregate would
return {sum / count}. Having {sum, count} allows each partial
aggregated result to be combined allowing the average to be calculated
with the total_sum / total_count.
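For example (all numbers invented purely for illustration):

  -- worker 1 partial state for avg(x): {sum = 30, count = 4}
  -- worker 2 partial state for avg(x): {sum = 20, count = 6}
  -- combine step:                      {sum = 50, count = 10}
  -- final function:                    50 / 10 = 5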
> - because some algorithms show measurements on "Disk" and others on
> "Memory" example:
> - External sort Disk: 52784kB
> - quicksort Memory: 47770kB
Please read about work_mem in
https://www.postgresql.org/docs/current/static/runtime-config-resource.html
The reason 10.1 is slower with the parallel query is down to the
bitmap heap scan going lossy and scanning many more heap pages than it
expected. You could solve this in various ways:
1. Increase work_mem enough to prevent the scan from going lossy (see
lossy=1605531 in your plan)
2. Turn off bitmap scans (SET enable_bitmapscan = off);
3. Cluster the table on l_shipdate
Unfortunately, parallel query often will choose to use a parallel plan
utilising multiple workers on a less efficient plan when it estimates
the cost / n_workers is lower than the cheapest serial plan's cost.
This appears to also be a contributor to your problem. You may get the
9.5 performance if you disabled parallel query, or did one of the 3
options above.
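For example, the session-level knobs involved would look something like
this (the values are only examples, not recommendations):

  SET work_mem = '512MB';                    -- enough to stop the bitmap scan going lossy
  SET enable_bitmapscan = off;               -- or simply forbid bitmap scans
  SET max_parallel_workers_per_gather = 0;   -- disable parallel plans for the session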
You may also want to consider using a BRIN index on the l_shipdate
instead of a BTREE index. The cost estimation for BRIN may realise
that the bitmap heap scan is not a good option, although I'm not sure
it'll be better than what the current v10 plan is using.
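A sketch of that suggestion, using the TPC-H column named above:

  CREATE INDEX lineitem_l_shipdate_brin ON lineitem USING brin (l_shipdate);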
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
On Wed, 27 Apr 2022 at 19:54, Emil Iggland wrote:
>> You've got the wrong column order (for this query anyway) in that
>> index. It'd work a lot better if dataview were the first column;
> I might be misunderstanding you, but I assume that you are suggesting an
> index on (dataview, valuetimestamp).
> We have that index, it is the primary key. For some reason it isn't
> being selected.

I don't think that index can be used for your original query. It could
only be used if "channel" is unique in "valueseries" and you'd written the
query as:

select * from datavalue
where dataview = (select id from valueseries where channel = 752433)
ORDER BY VALUETIMESTAMP DESC
FETCH FIRST ROW only;

that would allow a backwards index scan using the (dataview,
valuetimestamp) index. Because you're using the IN clause to possibly look
for multiple "dataview" values matching the given "channel", the index
range scan does not have a single point to start at.

What you've done with the LATERAL query allows the index to be scanned
once for each "valueseries" row with a "channel" value matching your WHERE
clause.

I guess "channel" must not be the primary key to "valueseries" and that's
why you use an IN(). The above query would return an error if multiple
rows were returned by the subquery.

David
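The LATERAL form being referred to is not quoted in this excerpt; a rough
reconstruction using the thread's table and column names might look like
this (a sketch only):

  SELECT v.channel, d.*
  FROM valueseries v
  CROSS JOIN LATERAL (
      SELECT *
      FROM datavalue
      WHERE datavalue.dataview = v.id
      ORDER BY valuetimestamp DESC
      FETCH FIRST ROW ONLY
  ) d
  WHERE v.channel = 752433;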
Re: Useless memoize path generated for unique join on primary keys
On Tue, 3 May 2022 at 23:05, Benjamin Coutu wrote:
> -> Memoize (cost=0.58..0.67 rows=1 width=16) (actual time=0.002..0.002
> rows=0 loops=4067215)
>   Cache Key: e2.field, e2.index
>   Cache Mode: logical
>   Hits: 0 Misses: 4067215 Evictions: 3228355 Overflows: 0 Memory
> Usage: 65537kB
>   Buffers: shared hit=16268863
>   -> Index Only Scan using pk_extdataregular on extdataregular e1
> (cost=0.57..0.66 rows=1 width=16) (actual time=0.001..0.001 rows=0
> loops=4067215)
>     Index Cond: ((field = e2.field) AND (index = e2.index))
>
> Why would it ever make sense to generate a memoize plan for a unique join?

It wouldn't ever make sense. The problem is that estimate_num_groups() is
used to estimate the number of distinct values and that function does not
know about primary keys. There's no way the costing of Memoize would allow
a Memoize plan to be used if it thought all values were unique, so the
only possibility here is that ndistinct is being underestimated by some
amount that makes Memoize look like the most favourable plan.

You could see what the planner thinks about the ndistinct estimate on
field, index by doing:

EXPLAIN SELECT field,index FROM extdataregular GROUP BY 1,2;

Whatever you see in the final row estimate for that plan is what's being
fed into the Memoize costing code.

> I think this issue might tie in with the current discussion over on the
> hackers mailing list [1]

I'd say it's a pretty different problem. The cache hit ratio discussion on
that thread talks about underestimating the hit ratio. That particular
problem could only lead to Memoize plans *not* being chosen when they
maybe should be. Not the other way around, which is your case.

create statistics extdataregular_field_index_stats (ndistinct)
  on field, index from extdataregular;
analyze extdataregular;

would likely put that right.

David
Re: Useless memoize path generated for unique join on primary keys
On Wed, 4 May 2022 at 00:21, Benjamin Coutu wrote:
> Thanks David, using extended statistics for both (and only for both)
> tables solved this problem.

Oh, whoops. I did get that backwards. The estimate used by the Memoize
costing code is from the outer side of the join, which is extdataempty in
this case. I don't think the extdataregular_field_index_stats will do
anything. It'll be the ones you added on extdataempty that are making it
work.

> BTW, thank you for all your work on performance in recent releases.

Thanks for the feedback :)

David
Re: Window partial fetch optimization
On Wed, 4 May 2022 at 06:11, Levi Aul wrote:
> It is our expectation that this query “should” be able to be
> cheap-to-compute and effectively instantaneous. (It’s clear to us how we
> would make it so, given a simple LMDB-like sorted key-value store:
> prefix-match on holder_address; take the first row you find for the
> contract-address you’re on; build a comparator key of (holder_address,
> contract_address, highest-possible-version) and traverse to find the
> lowest row that sorts greater than it; repeat.)
>
> Which might, in SQL, be expressed as something like this:
>
> WITH ranked_holder_balances AS (
>     SELECT
>         *,
>         row_number() OVER w AS balance_rank
>     FROM contract_balance_updates
>     WHERE holder_address = '\x'::bytea
>     WINDOW w AS (
>         PARTITION BY holder_address, contract_address
>         ORDER BY start_block_height DESC
>     )
>     ORDER BY holder_address ASC, contract_address ASC, start_block_height DESC
> )
> SELECT *
> FROM ranked_holder_balances
> WHERE balance_rank = 1

Yes, PostgreSQL 14 is not very smart about realising that WHERE
balance_rank = 1 is only going to match the first row of each window
partition. PostgreSQL 15 (coming later this year) should be better in this
regard as some work was done to teach the query planner about monotonic
window functions [1].

However, that change likely does not do all you'd like here as the
WindowAgg node still must consume and throw away all tuples until it finds
the first tuple belonging to the next window partition.

It sounds like you really want "Skip Scans" or "Loose Index Scans", which
are implemented by some other RDBMSs. I imagine that even with the change
in PostgreSQL 15 it still wouldn't be as fast as your DISTINCT ON example.

> WITH bup1 AS (
>     SELECT DISTINCT bup.holder_address, bup.contract_address
>     FROM contract_balance_updates bup
>     WHERE bup.holder_address =
>         '\xe03c23519e18d64f144d2800e30e81b0065c48b5'::bytea
>     ORDER BY bup.contract_address ASC
> )
> SELECT
>     bup1.holder_address,
>     bup1.contract_address,
>     (
>         SELECT balance
>         FROM contract_balance_updates bup2
>         WHERE bup2.holder_address = bup1.holder_address
>         AND bup2.contract_address = bup1.contract_address
>         ORDER BY bup2.holder_address ASC, bup2.contract_address ASC,
>             bup2.start_block_height DESC
>         LIMIT 1
>     ) AS balance
> FROM bup1
>
> I really don’t like this last approach; it scans twice, it’s surprising /
> confusing for people maintaining the query, etc. I believe that, due to
> the correlated subquery, the planning overhead is also O(N) as the number
> of matched entities increases (though I don’t have a good test-case for
> this.)

No, the subquery is not replanned each time it is rescanned. It's planned
once and that same plan will be executed each time. So no O(n) planning
overhead.

> Is there any way to get PG to do what this last query is doing, purely
> using window-functions / distinct on / etc.? Because, if there is, I
> can’t find it.
>
> It seems that PG can in fact do index-range-seeking (since that’s what
> it’s doing when gathering the distinct contract_addresses in the last
> query.) It seems intuitive to me that it should be using such an approach
> to filter for rows in window/group-partitions, when a criteria+index that
> can be combined to limit the size of the window/group are available to
> the planner. And that, even when not able to be automatically inferred,
> it would make sense for there to be control over such behaviour in SQL,
> using hypothetical syntax like:

Unfortunately, DISTINCT can only be implemented with Hash Aggregate or
Sort / Index Scan + Unique. We don't have anything currently which will
jump to the next highest key in an index. There has been some work on what
we're starting to call "Skip Scans", but it's all still work in progress.
You might find something useful in [2] which might help speed up your
DISTINCT query.

> -- for windows
> row_number() OVER (PARTITION BY x ORDER BY x LIMIT 10 OFFSET 3)
>
> -- for groups
> GROUP BY x, y, z (APPLYING LIMIT 20 OFFSET 5 PER GROUP)
>
> Does this make sense? Or is this something PG is already doing, and I
> just haven’t found the right magic words / built my index correctly to
> unlock it? (I notice that the last example is an index-only scan; would I
> get this behaviour from the previous two queries if I made the index a
> covering index such that those could be index-only scans as well?)

Unfortunately, there are no magic words here. PostgreSQL 14 simply has no
ability to know that row_number() is monotonically increasing, therefore
it has no ability to skip any processing for rows that are never needed.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9d9c02ccd1aea8e9131d8f4edb21bf1687e40782
[2] https://wiki.postgresql.org/wiki/Loose_indexscan
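For reference, the "Loose indexscan" wiki page linked as [2] emulates the
skip with a recursive CTE. Adapted to the table and column names above, it
would look roughly like this (a sketch only, not tested against the real
schema):

  WITH RECURSIVE distinct_contracts AS (
      (SELECT contract_address
       FROM contract_balance_updates
       WHERE holder_address = '\xe03c23519e18d64f144d2800e30e81b0065c48b5'::bytea
       ORDER BY contract_address
       LIMIT 1)
      UNION ALL
      SELECT (SELECT b.contract_address
              FROM contract_balance_updates b
              WHERE b.holder_address = '\xe03c23519e18d64f144d2800e30e81b0065c48b5'::bytea
                AND b.contract_address > d.contract_address
              ORDER BY b.contract_address
              LIMIT 1)
      FROM distinct_contracts d
      WHERE d.contract_address IS NOT NULL
  )
  SELECT contract_address
  FROM distinct_contracts
  WHERE contract_address IS NOT NULL;

Each recursion step re-enters the (holder_address, contract_address, ...)
index just past the previous contract_address, which is the "seek" the
window-function form cannot currently do.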
Re: Why is there a Sort after an Index Only Scan?
On Thu, 5 May 2022 at 11:15, André Hänsel wrote:
> Quick(?) question... why is there a Sort node after an Index Only Scan?
> Shouldn't the index already spit out sorted tuples?
>
> CREATE INDEX ON orders_test(shipping_date, order_id);
>
> EXPLAIN ANALYZE SELECT
> FROM orders_test
> WHERE TRUE
> AND shipping_date >= '2022-05-01'
> AND shipping_date <= '2022-05-01'
> ORDER BY order_id
> LIMIT 50;

Unfortunately, the query planner is not quite smart enough to realise that
your shipping_date clauses can only match a single value. There's quite a
bit more we could do with the planner's EquivalenceClasses. There is a
patch around [1] to help improve things in this area, but it requires some
more infrastructure to make it practical to do from a performance
standpoint in the planner.

You'll get the plan you want if you rewrite the query and replace your
date range with shipping_date = '2022-05-01'. Your use of WHERE TRUE
indicates to me that you might be building this query in an application
already, so maybe you can just tweak that application to test if the start
and end dates are the same and use equality when they are.

David

[1] https://commitfest.postgresql.org/38/3524/
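A sketch of the suggested rewrite:

  EXPLAIN ANALYZE SELECT
  FROM orders_test
  WHERE shipping_date = '2022-05-01'
  ORDER BY order_id
  LIMIT 50;

With the equality clause, the (shipping_date, order_id) index already
returns rows in order_id order for that date, so no Sort node is needed.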
Re: Postgresql 13 partitioning advice
On Tue, 2 Aug 2022 at 19:48, Ameya Bidwalkar wrote:
> We have a Postgresql 13 database where we have a single table with
> several millions of rows. We plan to partition it based on timestamp.
> We have been seeking advice for best practices for building this.
> This table will get lots of updates for the same rows during a short
> period of time. During this time rows would be in a single partition.
> After this short time these rows would move to another partition, where
> no more updates take place on these rows, but they might have some
> SELECT queries running.
> We plan to have partitions based on months and then roll them up in a
> year and then archive these older partitions.
> One consultant we talked with told us this row movement between the
> partitions will have huge complications. But this was an issue during
> the Postgres 10 version.

Define "huge complications"? The capabilities of partitioned tables have
changed quite a bit since the feature was added. It's very easy for
knowledge to get out-dated in this area. I did quite a bit of work on them
and I struggle to remember off the top of my head which versions saw which
improvements. PG12 saw lots. See [1], search for "partition".

One possible complication is what is mentioned in [2] about "serialization
failure error". UPDATEs that cause a tuple to move to another partition
can cause a serialization failure error at any transaction isolation
level, not just in serializable transactions. If it's not already, you
might want to have your application retry transactions on SQL:40001
errors.

Apart from that, assuming there's comparatively a small number of rows in
the partition being updated compared to the partition with the static
rows, then it sounds fairly efficient. As you describe it, the larger
static partition is effectively INSERT-only and auto-vacuum will need to
touch it only for tuple freezing work. The smaller of the two tables will
receive more churn but will be faster to vacuum. PG13 got a new feature
that makes sure auto-vacuum also does the rounds on INSERT-only tables
too, so the static partition is not going to be neglected until
anti-wraparound autovacuums trigger, like they would have in PG12 and
earlier.

Another thing to consider is that an UPDATE of a non-partitioned table has
a chance at being a HOT update. That's possible if the tuple can fit on
the same page and does not update any of the indexed columns. A HOT update
means no indexes need to be updated, so these perform faster and require
less space in WAL than a non-HOT update. An UPDATE that moves a tuple to
another partition can never be a HOT update. That's something you might
want to consider. If you're updating indexed columns already then it's not
a factor to consider.

There's also overhead to postgres having to find the partition for the
newly updated version of the tuple. That's not hugely expensive, but it's
generally measurable. RANGE partitioned tables with a large number of
partitions will have the most overhead for this. HASH partitioned tables,
the least.

The best thing you can likely do is set up a scenario with pgbench and
compare the performance. pgbench is a pretty flexible tool that will allow
you to run certain queries X% of the time and even throttle the workload
at what you expect your production server to experience. You could then
run it overnight on a test server, or even for weeks, and see how
auto-vacuum keeps up when compared to the non-partitioned case. You can
also check how much extra WAL is generated vs the non-partitioned case.

> So we are seeking advice on the performance perspective and things we
> should take care of along with manual vacuums on a regular schedule and
> indexing. Are there any tunables I should experiment with in particular?

Perhaps if you want to keep a small high-churn table in check you might
want to consider if autovacuum_naptime is set low enough. You may not care
if the space being consumed in the standard 1min autovacuum_naptime is
small enough not to be of concern.

David

[1] https://www.postgresql.org/docs/release/12.0/
[2] https://www.postgresql.org/docs/13/sql-update.html
Re: Catching up with performance & PostgreSQL 15
On Wed, 30 Nov 2022 at 03:31, Tom Lane wrote:
> Alvaro Herrera writes:
>> IMO it was a mistake to turn JIT on in the default config, so that's one
>> thing you'll likely want to change.
>
> I wouldn't necessarily go quite that far, but I do think that the
> default cost thresholds for invoking it are enormously too low,
> or else there are serious bugs in the cost-estimation algorithms
> for deciding when to use it. A nearby example[1] of a sub-1-sec
> partitioned query that took 30sec after JIT was enabled makes me
> wonder if we're accounting correctly for per-partition JIT costs.

I'm very grateful for JIT. However, I do agree that the costs need work.
The problem is that the threshold to turn JIT on does not consider how
many expressions need to be compiled. It's quite different to JIT compile
a simple one-node plan with a total cost of 10 than to JIT compile a plan
that costs the same but queries 1000 partitions.

I think we should be compiling expressions based on the cost of the
individual node rather than the total cost of the plan. We need to make
some changes so we can more easily determine the number of times a given
node will be executed before we can determine how worthwhile JITting an
expression in a node will be.

David

> [1]
> https://www.postgresql.org/message-id/B6025887-D73F-4B5B-9925-4DA4B675F7E5%40elevated-dev.com
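For anyone wanting to check whether this is what's happening on their own
system, the relevant settings can be inspected and adjusted per session
(the values below are only examples):

  SHOW jit;                      -- on by default since v12
  SET jit_above_cost = 500000;   -- raise the threshold at which JIT kicks in
  SET jit = off;                 -- or disable JIT entirely for the session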
Re: wrong rows and cost estimation when generic plan
On Tue, 6 Dec 2022 at 18:28, James Pang (chaolpan) wrote:
> -> Index Scan using idx_x_time on x (cost=0.44..8.48 rows=1
> width=2923) (actual time=8136.242..8136.242 rows=0 loops=1)
>   Index Cond: ((starttime = $7) AND (endtime = $8))
>   Filter: ((password IS NULL) AND ...(aid = $4) AND (bid = $5) AND
> (btype = $6) AND...
>   Rows Removed by Filter: 5534630

I wonder if you did:

create statistics x_starttime_endtime_stats (ndistinct)
  on starttime, endtime from x;
analyze x;

if the planner would come up with a higher estimate than what it's getting
for the above and cause it to use the other index instead.

> the optimizer is very complicated; could you direct me to how the
> optimizer does selectivity estimation when building a generic plan for
> this case? For a custom plan the optimizer knows the boundparams values,
> but for a generic plan, planner() uses boundparams=NULL; does it try to
> calculate an average value based on the MCV list of the index attributes
> (starttime, endtime)?

IIRC, generic plan estimates become based on distinct estimations rather
than histograms or MCVs.

David
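Another possible workaround, not mentioned above but available since
PostgreSQL 12, is to stop the plan cache from switching this statement to
a generic plan at all:

  SET plan_cache_mode = force_custom_plan;  -- plan with the real bind values every time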
Re: wrong rows and cost estimation when generic plan
On Tue, 6 Dec 2022 at 20:17, James Pang (chaolpan) wrote:
> Could you provide the function name for generic plan selectivity
> estimation?

If you look at eqsel_internal(), you'll see there are two functions that
it'll call: var_eq_const() for Consts and otherwise var_eq_non_const().
It'll take the non-Const path for planning generic plans.

David
Re: DML sql execution time slow down PGv14 compared with PGv13
On Thu, 15 Dec 2022 at 21:12, James Pang (chaolpan) wrote:
> We had some load tests (DML inserts/deletes/updates on tens of hash
> partition tables) and found that PGV14 slowed down 10-15% compared with
> PGV13. Same test server, same schema tables and data. From
> pg_stat_statements sql exec_time, we did find similar mean_exec_time
> increases of 5%-25% with the same SQL statements. Both v14 and v13 give
> very fast sql response times; we just compare the %diff of the sql
> statements' mean_exec_time.

I tried this out on the tip of the PG13 and PG14 branch with the same
scale of pgbench as you mentioned and I don't see the same slowdown as you
do.

PG13: tps = 1711.980109 (excluding connections establishing)
PG14: tps = 1736.466835 (without initial connection time)

As for why yours might be slower: you might want to have a look at the
EXPLAIN ANALYZE output for the UPDATE statements. You can recreate the
-M prepared behaviour by using PREPARE and EXECUTE. You might want to
execute the statements 6 times and see if the plan changes on the 6th
execution.

It's not impossible that PG14 is using custom plans, whereas PG13 might be
using generic plans for these updates. There were some quite significant
changes made to the query planner in PG14 that changed how planning works
for UPDATEs and DELETEs from partitioned tables. Perhaps there's some
reason there that the custom/generic plan choice might differ.

I see no reason why INSERT would have become slower. Both the query
planning and execution is very different for INSERT.

You might also want to have a look at what perf says. If you have the
debug symbols installed, then you could just watch "perf top --pid=".
Maybe that will show you something interesting.
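A sketch of recreating "-M prepared" by hand (the statement and parameter
values here are hypothetical stand-ins, not the actual workload):

  PREPARE upd (int, int) AS
      UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;

  EXPLAIN (ANALYZE) EXECUTE upd(1, 42);
  -- Repeat the EXECUTE at least six times; from the sixth execution
  -- onwards the plan cache may switch to a generic plan, which is where
  -- v13 and v14 could end up choosing differently.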
Re: Postgres12 looking for possible HashAggregate issue workarounds?
On Sun, 18 Dec 2022 at 23:44, João Paulo Luís wrote:
> Meanwhile, as a one-time workaround I've disabled the hashagg algorithm,

The way the query planner determines if Hash Aggregate's hash table will
fit in work_mem or not is based on the n_distinct estimate of the columns
being grouped on. You may want to review what analyze set n_distinct to on
this table. That can be done by looking at:

select attname, n_distinct from pg_stats
where tablename = 'sentencesource' and attname = 'sentence';

If what that's set to does not seem realistic, then you can overwrite this
with:

ALTER TABLE sentencesource ALTER COLUMN sentence SET (n_distinct = N);

Please see the paragraph in [1] about n_distinct. Using an absolute value
is likely not a great idea if the table is going to grow. You could maybe
give it a better estimate about how many times values are repeated by
setting some negative value, as described in the documents. You'll need to
analyze the table again after changing this setting.

David

[1] https://www.postgresql.org/docs/12/sql-altertable.html
Re: Getting an index scan to be a parallel index scan
On Wed, 1 Feb 2023 at 18:39, Alex Kaiser wrote:
> postgres=# set force_parallel_mode = on;

There's been a bit of debate about that GUC and I'm wondering how you came
to the conclusion that it might help you. Can you share details of how you
found out about it and what made you choose to set it to "on"?

David
Re: Getting an index scan to be a parallel index scan
On Thu, 2 Feb 2023 at 14:49, Thomas Munro wrote:
> If I had more timerons myself, I'd like to try to make parallel
> function scans, or parallel CTE scans, work...

I've not really looked in detail but I thought parallel VALUES scan might
be easier than those two.

David
Re: Window Functions & Table Partitions
On Thu, 9 Feb 2023 at 10:45, Benjamin Tingle wrote:
> Basically- window partition functions don't take advantage of existing table
> partitions. I use window functions as a more powerful GROUP BY clause that
> preserves row-by-row information- super handy for a lot of things.
>
> In particular, I want to use window functions on already partitioned tables,
> like the below example:
>
> create table abb (a int, b int, g int) partition by hash(b)
> /* populate table etc... */
> select a, b, min(a) over (partition by b) as g from abb
>
> Ideally with a query plan like this:
>
> Window:
> Append:
> Sort on table_p0
> Sort on table_p1
> Sort on table_p2
There was some effort [1] in version 12 to take advantage of the order
defined by the partitioning scheme. The release notes [2] mention:
"Avoid sorting when partitions are already being scanned in the necessary order"
However, it's not 100% of what you need as there'd have to be a btree
index on abb(b) for the planner to notice.
Likely this could be made better so that add_paths_to_append_rel()
added the pathkeys defined by the partitioned table into
all_child_pathkeys if they didn't exist already. In fact, I've
attached a very quickly hacked together patch against master to do
this. I've given it very little thought and it comes complete with
failing regression tests.
If you're interested in pursuing this then feel free to take the patch
to the pgsql-hackers mailing list and propose it. It's unlikely I'll
get time to do that for a while, but I will keep a branch locally with
it to remind me in case I do at some point in the future.
David
[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=959d00e9dbe4cfcf4a63bb655ac2c29a5e579246
[2] https://www.postgresql.org/docs/release/12.0/
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ae0f9bdc8a..c4271c9179 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1365,6 +1365,27 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
         }
     }
 
+    if (rel->part_scheme != NULL && IS_SIMPLE_REL(rel) &&
+        partitions_are_ordered(rel->boundinfo, rel->live_parts))
+    {
+        List       *partition_pathkeys;
+        bool        partial;
+
+        partition_pathkeys = build_partition_pathkeys(root, rel,
+                                                      ForwardScanDirection,
+                                                      &partial);
+
+        if (!partial)
+            all_child_pathkeys = lappend(all_child_pathkeys, partition_pathkeys);
+
+        partition_pathkeys = build_partition_pathkeys(root, rel,
+                                                      BackwardScanDirection,
+                                                      &partial);
+
+        if (!partial)
+            all_child_pathkeys = lappend(all_child_pathkeys, partition_pathkeys);
+    }
+
     /*
      * Collect lists of all the available path orderings and
      * parameterizations for all the children. We use these as a
Re: Window Functions & Table Partitions
On Fri, 10 Feb 2023 at 06:40, Benjamin Tingle wrote:
> Thanks for the helpful response david! I'll have a shot at getting the
> patch to work myself & submitting to pgsql-hackers.

I took some time today for this and fixed up a few mistakes in the patch
and added it to the March commitfest [1]. Time is ticking away for v16, so
given this is a fairly trivial patch, I thought it might be nice to have
it.

Any discussion on the patch can be directed at [2]

David

[1] https://commitfest.postgresql.org/42/4198/
[2] https://www.postgresql.org/message-id/flat/caaphdvojkdbr3mr59jxmacybyhb6q_5qpru+dy93en8wm+x...@mail.gmail.com
Re: multicolumn partitioning help
On Wed, 15 Mar 2023 at 10:41, Laurenz Albe wrote:
> I think you want subpartitioning, like
>
> CREATE TABLE humans (
> hash bytea,
> fname text,
> dob date
> ) PARTITION BY LIST (EXTRACT (YEAR FROM dob));
This may be perfectly fine, but it is also important to highlight that
partitioning in this way may hinder partition pruning.
If the first level partitioned table was to be BY RANGE (dob); then
the partitions could be defined like FOR VALUES FROM ('2023-01-01') TO
('2024-01-01'). For a query that had something like WHERE dob =
'2023-03-16', then PostgreSQL could prune away all the partitions for
the other years. The same wouldn't occur if the table was partitioned
by LIST (EXTRACT (YEAR FROM dob)) unless you added an AND EXTRACT (YEAR
FROM dob) = 2023 to the query's WHERE clause.
Robert, there are a few tips about partitioning in [1] that you may
wish to review.
David
[1] https://www.postgresql.org/docs/devel/ddl-partitioning.html
Re: multicolumn partitioning help
On Thu, 16 Mar 2023 at 00:47, James Robertson wrote:
> or do we get?
>
> TopLevelTable
> |
> |> worker-thread 1 (default catch)
> |> worker thread 2 -> sub-table 1.1
> |> worker thread 3 -> sub-table 1.2
> |> worker thread 4 -> sub-table 1.n
> |
> |> worker-thread 5 (default catch)
> |> worker thread 6 -> sub-table 2.1
> |> worker thread 7 -> sub-table 2.2
> |> worker thread 8 -> sub-table 2.n

The planner generally flattens out the scans to each partition into a
single Append or MergeAppend node. Append nodes can be parallelised.
Assuming there's no reason that a particular partition can't support it,
the parallel workers can be distributed to assist without restriction on
which partition they help with. Multiple workers can even help with a
single partition. Workers can then move over to helping with other
partitions when they're done with the partition they've been working on.

I believe some other databases do, or did, at least restrict parallelism
to 1 worker per partition (which perhaps is why you raised this). There's
no such restriction with PostgreSQL.

> Summary:
> 1) if we create nested partitions, do we create performance issues?

If you create too many partitions, it can create performance issues. You
should look at the partitioning best practices section of the documents
for details about that. I recommend a careful read of those.

> 2) if nested partitions are the solution, what is the point of
> multi-column partitioning?

There are many reasons. If you have multiple levels of partitioning, then
the partition pruning done during planning is going to have more work to
do, as it'll be executed once, then once again for each partitioned table
remaining after running it for the first level. Also, it seems much easier
to PARTITION BY HASH(a,b) than to first do HASH(a) then another level to
HASH(b). However, there may be advantages to having multiple levels here
as the planner would still be able to prune partitions if the WHERE clause
didn't contain any quals like "b = ". The key takeaway here is that
they're different, so we support both.

> wish list) wouldn't it be neat if we could do multi-mode multi-column?
> like PARTITION BY RANGE (EXTRACT(YEAR FROM dob)) LIST (SUBSTRING(hash, 1, 1));

Effectively, multi-level partitioning gives you that. It's just that the
DDL is different from how you wrote it.

David
Re: speeding up grafana sensor-data query on raspberry pi 3
On Mon, 17 Apr 2023 at 05:00, Clemens Eisserer wrote:
> Example:
> table smartmeter with non-null column ts (timestamp with time zone)
> and brin index on ts, no pk to avoid a btree index.
> Sensor values are stored every 5s, so for 1 month there are about 370k
> rows - and in total the table currently holds about 3M rows.
> The query to display the values for 1 month takes ~3s, with the bitmap
> heap scan as well as aggregation taking up most of the time, with
> sorting in between.

I know you likely don't have much RAM to spare here, but more work_mem
might help, even just 16MB might be enough. This would help the Sort and,
to a lesser extent, the Bitmap Heap Scan too.

Also, if you'd opted to use PostgreSQL 14 or 15, then you could have
performed CREATE STATISTICS on your GROUP BY clause expression and then
run ANALYZE. That might cause the planner to flip to a Hash Aggregate,
which would eliminate the Sort before aggregation. You'd only need to sort
236 rows after the Hash Aggregate for the ORDER BY.

Plus, what Justin said.

David
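The CREATE STATISTICS suggestion would look roughly like this on 14/15.
The actual GROUP BY expression from the Grafana query isn't shown in this
excerpt, so the hourly bucket below is only a placeholder:

  CREATE STATISTICS smartmeter_bucket_stats ON (date_trunc('hour', ts))
      FROM smartmeter;
  ANALYZE smartmeter;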
Re: High QPS, random index writes and vacuum
On Tue, 18 Apr 2023 at 12:35, peter plachta wrote:
> I increased work_mem to 2Gb

maintenance_work_mem is the configuration option that vacuum uses to
control how much memory it'll make available for storage of dead tuples. I
believe 1GB would allow 178,956,970 tuples to be stored before multiple
passes would be required. The chunk of memory for dead tuple storage is
capped at 1GB.

David
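The setting being referred to (the value is only an example; as noted
above, the per-vacuum memory for dead-tuple storage is capped at 1GB
anyway):

  SET maintenance_work_mem = '1GB';               -- for a manual VACUUM in this session
  ALTER SYSTEM SET maintenance_work_mem = '1GB';  -- server-wide; autovacuum_work_mem can
  SELECT pg_reload_conf();                        -- be set separately for autovacuum workers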
Re: Partitioning update-heavy queue with hash partitions vs partial indexes
On Thu, 10 Aug 2023 at 20:36, Dorian Hoxha wrote:
>> Do Not Use Partial Indexes as a Substitute for Partitioning
>> While a search in this larger index might have to descend through a
>> couple more tree levels than a search in a smaller index, that's almost
>> certainly going to be cheaper than the planner effort needed to select
>> the appropriate one of the partial indexes. The core of the problem is
>> that the system does not understand the relationship among the partial
>> indexes, and will laboriously test each one to see if it's applicable
>> to the current query.
>
> Would this be true in my case too?

Yes. The process of determining which partial indexes are valid for the
given query must consider each index one at a time and validate the
index's WHERE clause against the query's WHERE clause to see if it can be
used. There is no shortcut that sees you have a series of partial indexes
with WHERE id % 10 = N; which just picks 1 index without searching all of
them.

> Is it faster for the planner to select a correct partition (hash
> partitioning on `id` column) instead of a correct partial index like in
> my case? I don't think I'll need more than ~32 partitions/partial-indexes
> in an extreme scenario.

I mean, test it and find out, but probably, yes, the partition pruning
code for hash partitioning is an O(1) operation and is very fast. Once the
given Constants have been hashed, finding the partition is just a single
divide operation away.

David
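A sketch of the hash-partitioned layout being discussed (the table and
column names are hypothetical; the thread mentions ~32 partitions as the
extreme case):

  CREATE TABLE queue (
      id      bigint NOT NULL,
      payload text
  ) PARTITION BY HASH (id);

  CREATE TABLE queue_p0 PARTITION OF queue FOR VALUES WITH (MODULUS 32, REMAINDER 0);
  CREATE TABLE queue_p1 PARTITION OF queue FOR VALUES WITH (MODULUS 32, REMAINDER 1);
  -- ... and so on, up to REMAINDER 31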
Re: Slow query, possibly not using index
On Mon, 28 Aug 2023 at 19:21, Les wrote:
> More important question is, how can I find out why the index was not
> auto vacuumed.

You should have a look at pg_stat_user_tables. It'll let you know if the
table is being autovacuumed and how often. If you're concerned about
autovacuum not running properly, then you might want to lower
log_autovacuum_min_duration.

Generally, anything that takes a conflicting lock will cause autovacuum to
cancel so that the conflicting locker can get through. Things like ALTER
TABLE or even an ANALYZE running will cancel most autovacuum runs on
tables.

Also, this is a fairly large table and you do have the standard autovacuum
settings. Going by pgstattuple, the table has 39652836 tuples. Autovacuum
will trigger when the statistics indicate that 20% of tuples are dead,
which is about 8 million tuples. Perhaps that's enough for the index scan
to have to skip over a large enough number of dead tuples to make it slow.
You might want to consider lowering the autovacuum scale factor for this
table.

Also, ensure you're not doing anything like calling pg_stat_reset();

It might be worth showing us the output of:

select * from pg_stat_user_tables where relid = 'media.block'::regclass;

David
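The per-table form of that change would be along these lines (0.02 is only
an example value; it makes autovacuum trigger at roughly 2% dead tuples
instead of the default 20%):

  ALTER TABLE media.block SET (autovacuum_vacuum_scale_factor = 0.02);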
Re: Range partitioning query performance with date_trunc (vs timescaledb)
On Tue, 29 Aug 2023 at 19:40, Philippe Pepiot wrote:
> I'm trying to implement some range partitioning on timeseries data. But it
> looks some queries involving date_trunc() doesn't make use of partitioning.
>
> BEGIN;
> CREATE TABLE test (
> time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
> value FLOAT NOT NULL
> ) PARTITION BY RANGE (time);
> CREATE INDEX test_time_idx ON test(time DESC);
> CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO
> ('2021-01-01');
> CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO
> ('2022-01-01');
> CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS
> value FROM test GROUP BY 1;
> EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01';
> ROLLBACK;
>
> The plan queries all partitions.
> I wonder if there is a way with a reasonable amount of SQL code to
> achieve this with vanilla postgres?
The only options I see for you are
1) partition by LIST (date_trunc('year', time)), or;
2) use a set-returning function instead of a view and pass the date
range you want to select from the underlying table via parameters.
I imagine you won't want to do #1. However, it would at least also
allow the aggregation to be performed before the Append if you SET
enable_partitionwise_aggregate=1.
#2 isn't as flexible as a view as you'd have to create another
function or expand the parameters of the existing one if you want to
add items to the WHERE clause.
Unfortunately, date_trunc is just a black box to partition pruning, so
it's not able to determine that DATE_TRUNC('year', time) >=
'2021-01-01' is the same as time >= '2021-01-01'. It would be
possible to make PostgreSQL do that, but that's a core code change,
not something that you can do from SQL.
David
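A sketch of option 2 (the function name, signature and return column names
are invented; it simply pushes a plain range qual on "time" down to the
partitioned table so partition pruning can work):

  CREATE FUNCTION test_yearly_sums(from_ts timestamp, to_ts timestamp)
  RETURNS TABLE (bucket timestamp, total float)
  LANGUAGE sql STABLE AS $$
      SELECT date_trunc('year', time), sum(value)
      FROM test
      WHERE time >= from_ts AND time < to_ts
      GROUP BY 1
  $$;

  SELECT * FROM test_yearly_sums('2021-01-01', '2022-01-01');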
Re: Queries containing ORDER BY and LIMIT started to work slowly
On Thu, 31 Aug 2023 at 06:32, Rondat Flyag wrote:
> I tried VACUUM ANALYZE for three tables, but without success. I also
> tried to set enable_seqscan=off and the query took even more time. If I
> set enable_sort=off then the query takes a lot of time and I cancel it.
>
> Please see the attached query plans.

It's a little hard to comment here as I don't see what the plan was before
when you were happy with the performance. I also see the queries you
mentioned in the initial email don't match the plans. There's no table
called "isbns" in the query. I guess this is "asins"?

Likely you could get a faster plan if there was an index on
asins_statistics (asin_id). That would allow a query plan that scans the
isbns_value_key index and performs a parameterised nested loop on
asins_statistics using the asins_statistics (asin_id) index. Looking at
your schema, I don't see that index, so it's pretty hard to guess why the
plan used to be faster. Even if the books/asins merge join used to take
place first, there'd have been no efficient way to join to the
asins_statistics table and preserve the Merge Join's order (I'm assuming
non-parameterized nested loops would be inefficient in this case). Doing
that would have also required the asins_statistics (asin_id) index. Are
you sure that index wasn't dropped?

However, likely it's a waste of time to try to figure out what the plan
used to be. Better to focus on trying to make it faster. I suggest you
create the asins_statistics (asin_id) index. However, I can't say with any
level of confidence that the planner would opt to use that index if it did
exist. Lowering random_page_cost or increasing effective_cache_size would
increase the chances of that.

David
Re: Planning time is time-consuming
On Mon, 11 Sept 2023 at 18:16, Laurenz Albe wrote:
> Also, there are quite a lot of indexes on "test_db_bench_1". On a test
> database, drop some indexes and see if that makes a difference.
Yeah, I count 3 that either have the key columns as some prefix of another
index or are just a duplicate of some other index. Getting rid of those 3
will save some time in create_index_paths().
David
Re: Planning time is time-consuming
On Mon, 11 Sept 2023 at 21:54, Mikhail Balayan wrote:
> Could it be a regression? I'll check it on PG14 when I get a chance.
I'm not sure if you're asking for help here because you need planning to be
faster than it currently is, or if it's because you believe that planning
should always be faster than execution. If you think the latter, then you're
mistaken. It seems to me that planning this query is much more complex than
executing it.
The outer side of the inner-most nested loop finds 0 rows, so it need not
scan the inner side, which results in that nested loop producing 0 rows, and
therefore the outer side of none of the subsequent nested loops finds any
rows. This is why you see "(never executed)" in the EXPLAIN ANALYZE.
You could use perf record or perf top to dig into what's slow. On the other
hand, please report back if you find PG14 to be much faster here.
You could also experiment with a set of tables which are empty. It's possible
that getting the relation sizes is a factor to consider here. mdnblocks()
needs to do a bit more work when the relation has multiple segments.
David
Re: Planning time is time-consuming
On Tue, 12 Sept 2023 at 02:27, Tom Lane wrote:
> David Rowley writes:
> > I'm not sure if you're asking for help here because you need planning
> > to be faster than it currently is, or if it's because you believe that
> > planning should always be faster than execution. If you think the
> > latter, then you're mistaken.
>
> Yeah. I don't see anything particularly troubling here. Taking
> circa three-quarters of a millisecond (on typical current hardware)
> to plan a four-way join on large tables is not unreasonable.
I took a few minutes to reverse engineer the tables in question (with
assistance from an AI bot) and ran the query in question. Unsurprisingly, I
also see planning as slower than execution, but with planning being about 12x
slower than execution vs the reported ~18x.
Planning Time: 0.581 ms
Execution Time: 0.048 ms
Nothing alarming in perf top while executing the query in pgbench with -M
simple. I think this confirms the problem is just with expectations.
   5.09%  postgres   [.] AllocSetAlloc
   2.99%  postgres   [.] SearchCatCacheInternal
   2.52%  postgres   [.] palloc
   2.38%  postgres   [.] expression_tree_walker_impl
   1.82%  postgres   [.] add_path_precheck
   1.78%  postgres   [.] add_path
   1.73%  postgres   [.] MemoryContextAllocZeroAligned
   1.63%  postgres   [.] base_yyparse
   1.61%  postgres   [.] CatalogCacheComputeHashValue
   1.38%  postgres   [.] try_nestloop_path
   1.36%  postgres   [.] stack_is_too_deep
   1.33%  postgres   [.] add_paths_to_joinrel
   1.19%  postgres   [.] core_yylex
   1.18%  postgres   [.] lappend
   1.15%  postgres   [.] initial_cost_nestloop
   1.13%  postgres   [.] hash_search_with_hash_value
   1.01%  postgres   [.] palloc0
   0.95%  postgres   [.] get_memoize_path
   0.90%  postgres   [.] equal
   0.88%  postgres   [.] get_eclass_for_sort_expr
   0.81%  postgres   [.] compare_pathkeys
   0.80%  postgres   [.] bms_is_subset
   0.77%  postgres   [.] ResourceArrayRemove
   0.77%  postgres   [.] check_stack_depth
   0.77%  libc.so.6  [.] __memmove_avx_unaligned_erms
   0.74%  libc.so.6  [.] __memset_avx2_unaligned
   0.73%  postgres   [.] AllocSetFree
   0.71%  postgres   [.] final_cost_nestloop
   0.69%  postgres   [.] compare_path_costs_fuzzily
   0.68%  postgres   [.] initial_cost_mergejoin
   0.64%  libc.so.6  [.] __memset_avx2_unaligned_erms
   0.61%  postgres   [.] create_nestloop_path
   0.61%  postgres   [.] examine_variable
   0.59%  postgres   [.] hash_bytes
   0.56%  postgres   [.] truncate_useless_pathkeys
   0.56%  postgres   [.] bms_overlap
David
Re: [EXTERNAL] Re: Performance down with JDBC 42
On Mon, 6 Nov 2023 at 08:37, Abraham, Danny wrote:
> Both plans refer to the same DB.
JDBC is making use of PREPARE statements, whereas psql, unless you're using
PREPARE, is not.
> #1 – Fast – using psql or old JDBC driver
The absence of any $1 type parameters here shows that's a custom plan that's
planned specifically using the parameter values given.
> Slow – when using JDBC 42
Because this query has $1, $2, etc, that's a generic plan. When looking up
statistics, histogram bounds and MCV slots cannot be checked. Only ndistinct
is used. If you have a skewed dataset, then this might not be very good.
You might find things run better if you adjust postgresql.conf and set
plan_cache_mode = force_custom_plan, then select pg_reload_conf(). Please
also check the documentation so that you understand the full implications of
that.
David
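For reference, a sketch of that change (ALTER SYSTEM is just one way of
making it; editing postgresql.conf directly as described above works too):
ALTER SYSTEM SET plan_cache_mode = force_custom_plan;
SELECT pg_reload_conf();
-- or, to test the effect for a single session first:
SET plan_cache_mode = force_custom_plan;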
Re: I don't understand that EXPLAIN PLAN timings
On Tue, 23 Jan 2024 at 20:45, Jean-Christophe Boggio wrote: > explain says actual time between 1.093→1.388 but final execution time says > 132.880ms?!? The 1.388 indicates the total time spent in that node starting from just before the node was executed for the first time up until the node returned no more tuples. The 132.88 ms is the total time spent to start up the plan, which includes doing things like obtaining locks (if not already obtained during planning), opening files and allocating memory for various things needed for the plan. After the top-level node returns NULL it then will print out the plan before shutting the plan down. This shutdown time is also included in the 132.88 ms. I don't know where exactly the extra time is spent with your query, but it must be in one of the operations that I mentioned which takes place either before the top node is first executed or after the top node returns NULL. If you're using psql, if you do \timing on, how long does EXPLAIN take without ANALYZE? That also goes through executor startup and shutdown. It just skips the running the executor part. David
Re: I don't understand that EXPLAIN PLAN timings
On Fri, 26 Jan 2024 at 02:31, Jean-Christophe Boggio wrote:
> You are absolutely correct: the EXPLAIN without ANALYZE gives about the same
> results. Also, minimizing the amount of work_mem in postgresql.conf changes
> the timings drastically. So that means memory allocation is eating up a lot
> of time _PER_QUERY_?
We do reuse pallocs to create memory contexts, but I believe only for 1k and
8k blocks. That likely allows most small allocations in the executor to be
done without malloc. Speaking in vague terms as I don't have the exact
numbers to hand, but larger allocations will go directly to malloc.
There was a bug fixed in [1] that did cause behaviour like this, but you seem
to be on 14.10, which will have that fix. Also, the 2nd plan you sent has no
Memoize nodes. I do wonder now if it was a bad idea to make Memoize build the
hash table on plan startup rather than delaying that until we fetch the first
tuple. I see Hash Join only builds its table during executor run.
> Since we have quite some RAM on our machines, I dedicated as much as possible
> to work_mem (initially I was allocating 1GB) but this looks quite
> counterproductive (I didn't think that memory was allocated every time, I
> thought it was "available" for the current query but not necessarily used).
> Is this an issue specific to that version of PostgreSQL? (I guess no) Or can
> this be hardware-related? Or OS-related (both systems on which I have done
> tests are running Ubuntu, I will try on Debian)?
It would be good to narrow down which plan node is causing this. Can you try
disabling various planner enable_* GUCs before running EXPLAIN (SUMMARY ON)
with \timing on and see if you can find which enable_* GUC causes the EXPLAIN
to run more quickly? Just watch out for variations in the timing of "Planning
Time:". You're still looking for a large portion of time not accounted for by
planning time. I'd start with:
SET enable_memoize=0; EXPLAIN (SUMMARY ON) <query>; RESET enable_memoize;
SET enable_hashjoin=0; EXPLAIN (SUMMARY ON) <query>; RESET enable_hashjoin;
The following will show others that you could try.
select name,setting from pg_settings where name like 'enable%';
David
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1e731ed12aa
Re: I don't understand that EXPLAIN PLAN timings
On Fri, 26 Jan 2024 at 17:23, Jean-Christophe Boggio wrote: > Let me know if I can do anything to provide you with more useful > benchmark. The DB is still very small so it is easy to do tests. What I was looking to find out was if there was some enable_* GUC that you could turn off that would make the unaccounted time that you were complaining about go away. Because it seems when you reduce work_mem this unaccounted for time goes away, it makes me think that some executor node is allocating a load of memory during executor startup. I was hoping to find out which node is the offending one by the process of elimination. Are you still seeing this unaccounted for time with your notebook? i.e. the relatively high "Execution Time" reported by EXPLAIN ANALYZE and low total actual execution time on the plan's top-level node. I probably didn't need to mention the planning time as it seems unlikely that disabling an enable* GUC would result in increased planning time. However, it does not seem impossible that that *could* happen. David
Re: Slow query in table where many rows were deleted. VACUUM FULL fixes it
On Wed, 31 Jan 2024 at 09:09, Philip Semanchuk wrote: > So in your case those 5m rows that you deleted were probably still clogging > up your table until you ran VACUUM FULL. It seems more likely to me that the VACUUM removed the rows and just left empty pages in the table. Since there's no index on expires_at, the only way to answer that query is to Seq Scan and Seq Scan will need to process those empty pages. While that processing is very fast if the page's item pointers array is empty, it could still be slow if the page needs to be read from disk. Laurenz's request for the explain (analyze, buffers) output with track_io_timing on will help confirm this. If it is just reading empty pages that's causing this issue then adding that missing index would improve the situation after running just plain VACUUM each time there's a bulk delete. David
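A sketch of what that could look like (the table and index names here are
placeholders, since they aren't shown in the thread):
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM your_table WHERE expires_at < now();
-- and the missing index, so future bulk deletes only need a plain VACUUM:
CREATE INDEX CONCURRENTLY your_table_expires_at_idx ON your_table (expires_at);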
Table Partitioning and Indexes Performance Questions
I was told that partitioned table indexes must always start with the
partition key columns. Is this always the case or does it depend on the use
case? When would you want to create indexes in this way? The documentation
just mentions that it is not strictly necessary but can be helpful.
My understanding is that partitions behave like normal tables. Each gets its
own index. So I'd expect the reasoning behind creating an index on the
partition to be the same as if it were just a normal table (assuming it has
the same subset of data as the individual partition). Is this a correct
understanding?
Any other performance considerations when it comes to partitioned table
indexing? Specifically, partitioning by range where the range is a single
value.
Re: Plan selection based on worst case scenario
On Thu, 30 May 2024 at 13:03, Darwin O'Connor wrote:
> Is there a PostgreSQL setting that can control how it judges plans?
There's nothing like that, unfortunately.
> Here is a recent example of a query that finds the last time at a stop
> filtered for a certain route it has to look up another table to find.
> PostgreSQL initially chose the plan that cost "37357.45..37357.45" rather
> than the one that cost "1.15..61088.32".
>
> transsee=# explain analyze select t.time+coalesce(t.timespent,0)*interval '1
> second' from trackstopscurr t join tracktrip r on r.a=0 and r.route='501' and
> r.id=t.trackid and t.stopid='4514' order by t.time desc limit 1;
> Limit (cost=37357.45..37357.45 rows=1 width=16)
>   (actual time=2667.674..2694.047 rows=1 loops=1)
>   -> Nested Loop (cost=182.60..36357.34 rows=1 width=16)
>      (actual time=381.913..2659.412 rows=1342 loops=3)
>      -> Parallel Bitmap Heap Scan on trackstopscurr t
>         (cost=182.03..19048.63 rows=2014 width=14)
>         (actual time=380.467..1231.788 rows=8097 loops=3)
>         Recheck Cond: ((stopid)::text = '4514'::text)
>         Heap Blocks: exact=8103
>         -> Bitmap Index Scan on trackstopscurr_2
>            (cost=0.00..180.82 rows=4833 width=0)
>            (actual time=382.653..382.653 rows=24379 loops=1)
>            Index Cond: ((stopid)::text = '4514'::text)
>      -> Index Scan using tracktrip_0 on tracktrip r
>         (cost=0.57..8.59 rows=1 width=4)
>         (actual time=0.175..0.175 rows=0 loops=24290)
>         Index Cond: (id = t.trackid)
>         Filter: ((a = 0) AND ((route)::text = '501'::text))
>         Rows Removed by Filter: 1
> Planning Time: 0.228 ms
> Execution Time: 2694.077 ms
The problem here is primarily down to the poor estimates for the scan on
tracktrip. You can see that the Nested Loop estimates 1 row, so therefore the
LIMIT costing code thinks LIMIT 1 will require reading all rows, all 1 of
them. In which case that's expected to cost 36357.34, which is cheaper than
the other plan, which costs 61088.32 to get one row.
If you can fix the row estimate to even estimate 2 rows rather than 1, then
it'll choose the other plan. An estimate of 2 rows would mean the total cost
of the best path after the limit would be 61088.32 / 2 = 30544.16, which is
cheaper than the 36357.34 of the bad plan.
You could try ANALYZE on tracktrip, or perhaps increasing the statistics
targets on the columns being queried here. If there's a correlation between
the "a" and "route" columns then you might want to try CREATE STATISTICS:
CREATE STATISTICS ON a, route FROM tracktrip;
ANALYZE tracktrip;
David
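A sketch of the statistics-related suggestions above (the target value of
1000 is only an example):
ALTER TABLE tracktrip ALTER COLUMN a SET STATISTICS 1000;
ALTER TABLE tracktrip ALTER COLUMN route SET STATISTICS 1000;
CREATE STATISTICS ON a, route FROM tracktrip;
ANALYZE tracktrip;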
Re: a lot of shared buffers hit when planning for a simple query with primary access path
On Mon, 1 Jul 2024 at 21:45, James Pang wrote: >Buffers: shared hit=110246 <<< here planning need access a lot of > buffers > Planning Time: 81.850 ms > Execution Time: 0.034 ms > >could you help why planning need a lot of shared buffers access ? Perhaps you have lots of bloat in your system catalogue tables. That could happen if you make heavy use of temporary tables. There are many other reasons too. It's maybe worth doing some vacuum work on the catalogue tables. David
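For example, something along these lines would show which catalogue tables
have accumulated dead rows and then vacuum the usual suspects (which
catalogues matter most depends on the workload):
SELECT relname, n_dead_tup, n_live_tup
FROM pg_stat_sys_tables
WHERE schemaname = 'pg_catalog'
ORDER BY n_dead_tup DESC
LIMIT 10;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_class;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_attribute;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_statistic;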
Re: a lot of shared buffers hit when planning for a simple query with primary access path
On Mon, 1 Jul 2024 at 22:20, Pavel Stehule wrote: > The planners get min/max range from indexes. So some user's indexes can be > bloated too with similar effect I considered that, but it doesn't apply to this query as there are no range quals. David
Re: Hash Right join and seq scan
On Fri, 5 Jul 2024 at 12:50, James Pang wrote:
> we have a daily vacuumdb and analyze job, generally speaking it's done in
> seconds, sometimes it suddenly runs for more than tens of minutes with the
> same bind variable values and huge temp space gets used, and at that time,
> explain shows "Hash Anti Join, Hash Right Join" with seq scans on two tables.
There was talk about adding costing for run-time partition pruning factors,
but nothing was ever agreed, so nothing was done. It's just not that obvious
to me how we'd do that. If the Append had 10 partitions as subnodes, with an
equality join condition, you could assume we'll only match to 1 of those 10,
but we've no idea at plan time which one that'll be, and the partitions might
drastically vary in size. The best I think we could do is take the total cost
of those 10 and divide by 10 to get the average cost. It's much harder for
range conditions as those could match anything from 0 to all partitions. The
best suggestion I saw for that was to multiply the costs by DEFAULT_INEQ_SEL.
I think for now, you might want to lower random_page_cost or increase
effective_cache_size to encourage the nested loop -> index scan plan. A good
range for effective_cache_size is anywhere between 50% and 75% of your
server's RAM. However, that might not be ideal if your server is under memory
pressure from other running processes. It also depends on how large
shared_buffers is as a percentage of total RAM.
David
Re: Hash Right join and seq scan
On Sat, 6 Jul 2024 at 02:43, James Pang wrote: >for nest loop path, since the first one estimated only "8" rows , > and they use partitionkeyid as joinkey and all are hash partitions , is it > better to estimate cost to 8 (loop times) * 1600 = 12800 (each one loop map > to only 1 hash partition bitmap scan ,avg one partition cost), that's much > less than 398917.29 of all partitions ? I'm not really sure where you're getting the numbers from here. The outer side of the deepest nested loop has an 8 row estimate, not the nested loop itself. I'm unsure where the 1600 is from. I only see 1669. As of now, we don't do a great job of costing for partition pruning that will happen during execution. We won't be inventing anything to fix that in existing releases of PostgreSQL, so you'll need to either adjust the code yourself, or find a workaround. You've not shown us your schema, but perhaps enable_partitionwise_join = on might help you. Other things that might help are further lowering random_page_cost or raising effective_cache_size artificially high. It's hard to tell from here how much random I/O is being costed into the index scans. You could determine this by checking if the nested loop plan costs change as a result of doing further increases to effective_cache_size. You could maybe nudge it up enough for it to win over the hash join plan. It is possible that this won't work, however. > for secondary Nest Loop Anti join could be rows 299118 rows * 15.78(avg > index scan cost of one partition) = 4,720,082 that still much less than > 132168227.57 ? > for Hash Right join, is it possible to estimate by 8 seq partition > scan instead of all 32 hash partitions since the first query estimated 8 rows > only ? > extend statistics may help estimate count(partitionkeyid) based on > other columns bind variables, but looks like that did not help table join > case. I can't quite follow this. You'll need to better explain where you're getting these numbers for me to be able to understand. David
Re: Has gen_random_uuid() gotten much slower in v17?
Good idea, thanks. I did check. It's not enabled by default, but just in case
I did another build, this time explicitly defining --disable-debug and
--disable-cassert, and I tested. Still slower than the old versions. This
feels like a build configuration problem. I just can't put my finger on it
yet.
On Wed, 11 Sept 2024, 10:40, Peter Eisentraut wrote:
> On 10.09.24 15:58, David Mullineux wrote:
> > I'm getting a bit concerned by the slow performance of generating uuids
> > on the latest dev code versus older versions. Here I compare the time to
> > generate 50k random uuids. Both on the same machine.
> > I must be missing something.
>
> Are you sure that the 18devel installation isn't compiled with
> assertions enabled?
>
> The underlying code for gen_random_uuid() is virtually unchanged between
> PG14 and current.
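One quick way to rule assertions in or out from SQL, rather than from the
build flags, is:
SHOW debug_assertions;   -- "off" means the build was made without --enable-cassert
SELECT version();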
Re: "set primary keys..." is missing when using hight values for transactions / scaling factor with pgbench
On 27 June 2018 at 00:21, Elvir Kurić wrote: > I have strange issue with pgbench where it fails to execute step to create > primary keys when I specify scaling factor / transactions to some reasonable > high value - eg. 8k. The primary keys are only created in -i mode, which can't be used in conjunction with the options you've mentioned. pgbench will perform a vacuum before an actual test run, so perhaps that's what you're seeing. You may also have noticed it also didn't perform the create tables and data population too without -i. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Queue table that quickly grows causes query planner to choose poor plan
Hi All,
I'm having performance trouble with a particular set of queries. It goes a
bit like this:
1) queue table is initially empty, and very narrow (1 bigint column)
2) we insert ~30 million rows into queue table
3) we do a join with queue table to delete from another table (delete from a
using queue where a.id = queue.id), but postgres stats say that queue table
is empty, so it uses a nested loop over all 30 million rows, taking forever
If I kill the query in 3 and let it run again after autoanalyze has done its
thing then it is very quick.
This queue table is empty 99% of the time, and the query in 3 runs
immediately after step 2. Is there any likelihood that tweaking the
autoanalyze params would help in this case? I don't want to explicitly
analyze the table between steps 2 and 3 either, as there are other patterns
of use where for example 0 rows are inserted in step 2 and this is expected
to run very very quickly. Do I have any other options?
Postgres 9.5 ATM, but an upgrade is in planning.
Thanks in advance
David Wheeler
Software developer
E [email protected]
D +61 3 9663 3554 W http://dgitsystems.com
Level 8, 620 Bourke St, Melbourne VIC 3000.
Re: Queue table that quickly grows causes query planner to choose poor plan
Hi Tom,
Thanks for your reply, that's very helpful and informative.
> Although there's no way to have any useful pg_statistic stats if
> you won't do an ANALYZE, the planner nonetheless can see the table's
> current physical size, and what it normally does is to multiply the
> last-reported tuple density (reltuples/relpages) by the current size.
> So if you're getting an "empty table" estimate anyway, I have to
> suppose that the table's state involves reltuples = 0 and relpages > 0.
> That's not a good place to be in; it constrains the planner to believe
> that the table is in fact devoid of tuples, because that's what the
> last ANALYZE saw.
That appears to be correct. I assumed that because the table was analyzed and
found to be empty then the autovacuum would probably have cleared all the
tuples too, but that's not the case.
 relpages | reltuples
----------+-------------
        0 | 2.33795e+06
> I am not following your aversion to sticking an ANALYZE in there,
> either. It's not like inserting 30 million rows would be free.
There are many usage profiles for these tables. Sometimes there will be a
single insert of 30 million rows, sometimes there will be several inserts of
up to 100 million rows each in different threads, sometimes there will be
many (~80,000) inserts of 0 rows (for which an ANALYSE is simply a waste) - I
don't want to cause undue performance penalty on the other usage profiles.
But as Justin rightly points out I can selectively ANALYSE only when > x rows
are inserted, which I think is the best way forward.
David Wheeler
Software developer
E [email protected]
D +61 3 9663 3554 W http://dgitsystems.com
Level 8, 620 Bourke St, Melbourne VIC 3000.
On 28 Jun 2018, at 4:27 am, Tom Lane wrote:
> David Wheeler writes:
> > I'm having performance trouble with a particular set of queries. It goes a
> > bit like this:
> > 1) queue table is initially empty, and very narrow (1 bigint column)
> > 2) we insert ~30 million rows into queue table
> > 3) we do a join with queue table to delete from another table (delete from
> > a using queue where a.id = queue.id), but postgres stats say that queue
> > table is empty, so it uses a nested loop over all 30 million rows, taking
> > forever
> Although there's no way to have any useful pg_statistic stats if
> you won't do an ANALYZE, the planner nonetheless can see the table's
> current physical size, and what it normally does is to multiply the
> last-reported tuple density (reltuples/relpages) by the current size.
> So if you're getting an "empty table" estimate anyway, I have to
> suppose that the table's state involves reltuples = 0 and relpages > 0.
> That's not a good place to be in; it constrains the planner to believe
> that the table is in fact devoid of tuples, because that's what the
> last ANALYZE saw.
> Now, the initial state for a freshly-created or freshly-truncated table
> is *not* that. It is reltuples = 0 and relpages = 0, representing an
> undefined tuple density. Given that, the planner will make some guess
> about average tuple size --- which is likely to be a very good guess,
> for a table with only fixed-width columns --- and then compute a
> rowcount estimate using that plus the observed physical size.
> So I think your problem comes from oscillating between really-empty and
> not-at-all-empty, and not using an idiomatic way of going back to the
> empty state. Have you tried using TRUNCATE instead of DELETE?
> > This queue table is empty 99% of the time, and the query in 3 runs
> > immediately after step 2. Is there any likelihood that tweaking the
> > autoanalyze params would help in this case? I don't want to explicitly
> > analyze the table between steps 2 and 3 either, as there are other
> > patterns of use where for example 0 rows are inserted in step 2 and this
> > is expected to run very very quickly. Do I have any other options?
> I am not following your aversion to sticking an ANALYZE in there,
> either. It's not like inserting 30 million rows would be free.
> regards, tom lane
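Putting the TRUNCATE and the selective ANALYZE ideas together, the workflow
could look roughly like this (table names are placeholders, and the decision
of when to analyze is left to the application):
TRUNCATE queue;                       -- resets relpages/reltuples to the "unknown" state
INSERT INTO queue (id) SELECT id FROM staging;   -- bulk load
ANALYZE queue;                        -- only run this when many rows were inserted
DELETE FROM a USING queue WHERE a.id = queue.id;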
Re: Performance difference in accessing differrent columns in a Postgres Table
On 29 July 2018 at 17:38, Dinesh Kumar wrote: > I found performance variance between accessing int1 and int200 column which > is quite large. Have a look at slot_deform_tuple and heap_deform_tuple. You'll see that tuples are deformed starting at the first attribute. If you ask for attribute 200 then it must deform 1-199 first. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
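A simple way to see this effect (the table name is a placeholder; int1 and
int200 are the columns from the report above):
\timing on
SELECT sum(int1)   FROM your_wide_table;  -- deforms only the first attribute
SELECT sum(int200) FROM your_wide_table;  -- has to step over attributes 1-199 for every row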
Re: To keep indexes in memory, is large enough effective_cache_size enough?
On 19 September 2018 at 21:18, Sam R. wrote:
> Ok. So, we should set also shared_buffers big.
It might not be quite as beneficial as you might think. If your database is
larger than RAM, often having a smaller shared_buffers setting yields better
performance. The reason is that if you have a very large shared_buffers, the
same buffers can end up cached in both the kernel page cache and shared
buffers. If you have a smaller shared_buffers setting then the chances of
that double buffering are reduced and the chances of finding a page cached
somewhere increase.
However, if your database is quite small and you can afford to fit all your
data in shared buffers, with enough free RAM for everything else, then you
might benefit from a large shared_buffers, but it's important to also
consider that some operations, such as DROP TABLE, can become slow if
shared_buffers is very large.
You might get more specific recommendations if you mention how much RAM the
server has and how big the data is now and will be in the future.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: To keep indexes in memory, is large enough effective_cache_size enough?
On 19 September 2018 at 22:12, Kaixi Luo wrote: > Does a large shared_buffers impact checkpoint performance negatively? I was > under the impression that everything inside shared_buffers must be written > during a checkpoint. Only the dirty buffers get written. Also having too small a shared buffers can mean that buffers must be written more than they'd otherwise need to be. If a buffer must be evicted from shared buffers to make way for a new buffer then the chances of having to evict a dirty buffer increases with smaller shared buffers. Obviously, this dirty buffer needs to be written out before the new buffer can be loaded in. In a worst-case scenario, a backend performing a query would have to do this. pg_stat_bgwriter is your friend. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
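For example, the backend-written buffer counters are the ones to watch here
(column names as they existed in pg_stat_bgwriter at the time of this
thread):
SELECT buffers_checkpoint, buffers_clean, buffers_backend,
       buffers_backend_fsync, buffers_alloc
FROM pg_stat_bgwriter;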
Re: To keep indexes in memory, is large enough effective_cache_size enough?
On 20 September 2018 at 15:19, Wei Shan wrote: > I believe you can use pg_prewarm to pin index or table to cache. > > https://www.postgresql.org/docs/current/static/pgprewarm.html I think the key sentence in the document you linked to is: "Prewarmed data also enjoys no special protection from cache evictions, so it is possible that other system activity may evict the newly prewarmed blocks shortly after they are read" So this is not pinning. It's merely loading buffers into shared buffers in the hope that they might be around long enough for you to make the most of that effort. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
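For completeness, a minimal pg_prewarm example (the relation name is a
placeholder):
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('my_index');              -- load into shared buffers once
SELECT pg_prewarm('my_index', 'prefetch');  -- or only hint it into the OS cache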
Re: To keep indexes in memory, is large enough effective_cache_size enough?
On Tue, 25 Sep 2018 at 18:36, Sam R. wrote: > Regarding double buffering: I do not know how much double buffering would > slow down operations. > It could also be possible to turn off kernel page cache on our DB server, to > avoid double buffering. Although, we may still keep it in use. I think you've misunderstood double buffering. The double buffering itself does not slow anything down. If the buffer is in shared buffers already then it does not need to look any further for it. Double buffering only becomes an issue when buffers existing 2 times in memory causes other useful buffers to appear 0 times. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: To keep indexes in memory, is large enough effective_cache_size enough?
On 28 September 2018 at 16:45, Sam R. wrote: > That was what I was suspecting a little. Double buffering may not matter in > our case, because the whole server is meant for PostgreSQL only. > > In our case, we can e.g. reserve almost "all memory" for PostgreSQL (shared > buffers etc.). > > Please correct me if I am wrong. You mentioned above: > RAM: 64 GB > Data: 500 GB - 1.5 TB, for example. If most of that data just sits on disk and is never read then you might be right, but if the working set of the data is larger than RAM then you might find you get better performance from smaller shared buffers. I think the best thing you can go and do is to go and test this. Write some code that mocks up a realistic production workload and see where you get the best performance. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: NOT IN vs. NOT EXISTS performance
On 9 November 2018 at 08:35, Lincoln Swaine-Moore wrote: > My primary question is: why is this approach only possible (for data too > large for memory) when using NOT EXISTS, and not when using NOT IN? > > I understand that there is a slight difference in the meaning of the two > expressions, in that NOT IN will produce NULL if there are any NULL values > in the right hand side (in this case there are none, and the queries should > return the same COUNT). But if anything, I would expect that to improve > performance of the NOT IN operation, since a single pass through that data > should reveal if there are any NULL values, at which point that information > could be used to short-circuit. So I am a bit baffled. The problem is that the planner makes the plan and would have to know beforehand that no NULLs could exist on either side of the join. For more simple cases it could make use of NOT NULL constaints, but more complex cases exist, such as: SELECT * FROM t1 LEFT JOIN t2 ON t1.x = t2.y WHERE t2.y NOT IN(SELECT z FROM t3); There's a bit more reading about the complexity of this in [1] [1] https://www.postgresql.org/message-id/flat/CAMkU%3D1zPVbez_HWao781L8PzFk%2Bd1J8VaJuhyjUHaRifk6OcUA%40mail.gmail.com#7c6d3178c18103d8508f3ec5982b1b8e -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
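Since the original poster says there are no NULLs on either side, the two
forms return the same count, and the NOT EXISTS spelling leaves the planner
free to use a hashed anti-join; a rough sketch with hypothetical table names:
-- instead of: SELECT count(*) FROM t1 WHERE x NOT IN (SELECT y FROM t2);
SELECT count(*)
FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.y = t1.x);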
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
On Wed, 28 Nov 2018 at 03:16, Sanyo Moura wrote: > 11.0 > Planning Time: 7.238 ms > Planning Time: 2.638 ms > > 11.5 > Planning Time: 15138.533 ms > Execution Time: 2.310 ms Does it still take that long after running ANALYZE on the partitioned table? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: SQL Perfomance during autovacuum
On Wed, 19 Dec 2018 at 19:04, anand086 wrote: > We have noticed this increase in execution times for the sqls only when > autovacuum runs and it runs with prevent wraparound mode. I think during the > autovacuum process the Buffers: shared hit are increasing causing increase > in execution time. > > I need help with the approach to debug this issue. Is this expected > behaviour wherein sql execution timing incease during the autovacuum? If so > , what is the reason for the same? This is unsurprising. There are various GUC settings designed to throttle vacuum to help minimise this problem. The auto-vacuum process is competing for the same resources as your query is, and is likely loading many new buffers, therefore flushing buffers out of cache that might be useful for your query. Showing the output of: select name,setting from pg_Settings where name like '%vacuum%'; may be of use here. You'll particularly want to pay attention to the settings of autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit and vacuum_cost_limit. The settings of vacuum_cost_page_dirty, vacuum_cost_page_hit, vacuum_cost_page_miss matter too, but these are less often changed by users. You may be able to learn exactly what's going on with the query by doing: set track_io_timing = on; explain (analyze, buffers, timing) both during the auto-vacuum run, and at a time when it's not running. If the query plans of each match, then pay attention to the number of buffers read and how long they took to read. If you find that these don't explain the variation then something else is at fault, perhaps CPU contention, or perhaps swapping due to high memory usage. It also seems pretty strange that you should need to use DISTINCT on a column that's named "id". -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Query Performance Issue
On Sat, 29 Dec 2018 at 04:32, Justin Pryzby wrote: > I think the solution is to upgrade (at least) to PG10 and CREATE STATISTICS > (dependencies). Unfortunately, I don't think that'll help this situation. Extended statistics are currently only handled for base quals, not join quals. See dependency_is_compatible_clause(). It would be interesting to see how far out the estimate is without the version = 1 clause. If just the recommended_content_id clause is underestimated enough it could be enough to have the planner choose the nested loop. Perhaps upping the stats on that column may help, but it may only help so far as to reduce the chances of a nested loop. If the number of distinct recommended_content_id values is higher than the statistic targets and is skewed enough then there still may be some magic values in there that end up causing a bad plan. It would also be good to know what random_page_cost is set to, and also if effective_cache_size isn't set too high. Increasing random_page_cost would help reduce the chances of this nested loop plan, but it's a pretty global change and could also have a negative effect on other queries. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Query Performance Issue
On Sat, 29 Dec 2018 at 20:15, Justin Pryzby wrote: > On Sat, Dec 29, 2018 at 07:58:28PM +1300, David Rowley wrote: > > Unfortunately, I don't think that'll help this situation. Extended > > statistics are currently only handled for base quals, not join quals. > > See dependency_is_compatible_clause(). > > Right, understand. > > Corrrect me if I'm wrong though, but I think the first major misestimate is in > the scan, not the join: I should have checked more carefully. Of course, they are base quals. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: select query does not pick up the right index
On Thu, 3 Jan 2019 at 05:28, Abadie Lana wrote:
> I would appreciate any hints as this problem looks to me rather strange…I
> tried to google it but in vain.
>
> select t.name,
> c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val
> from sample c, channel t where t.channel_id=c.channel_id and
> t.name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW' order by c.smpl_time desc limit 5;
>
> takes 20mn to execute because it picks up the wrong index…see explain analyse
> below. I would expect this query to use the (channel_id,smpl_time) but it
> uses the smpl_time index.
[...]
> Any ideas, why the planner is not taking the right index?
The planner assumes that the required channel values are evenly distributed
through the scan of the index on smpl_time. If your required 5 rows were
found quickly (i.e. channels with recent sample values), then the plan would
have worked out well. It looks like 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW' is
probably a channel which has some very old sample values. I can see that from
"Rows Removed by Join Filter: 322099471", meaning that on backwards scanning
the smpl_time index, that many rows were found not to match the channel you
requested.
The planner, by default, only has statistics to say how common each channel
is in the sample table. I think in this case, since the planner has no
knowledge of which channel_id it will be searching for (that's only
determined during execution), I suppose it must be using the n_distinct of
the sample.channel_id column. It would be interesting to know how far off the
n_distinct estimation is. You can find out with:
select stadistinct from pg_statistic where starelid='sample'::regclass and
staattnum = 1;
select count(*) from (select distinct channel_id from sample) s; -- this may
take a while to run...
If the stadistinct estimate is far out from the reality, then you could
consider setting this manually with:
alter table sample alter column channel_id set (n_distinct = );
but keep in mind that, as the table evolves, whatever you set there could
become outdated.
Another method you could try to fix this would be to coax the planner into
doing something different by giving it a better index to work with:
create index on channel(name, channel_id);
You didn't show us the details from the channel table, but if there's not an
index like this then this might reduce the cost of a Merge Join, but since
the rows output from that join would be in channel_id order, a Sort would be
required, which would require joining all matching rows, not just the first 5
matches. How many rows actually match will determine whether that's faster or
not.
If you don't have luck with either of the above, then one other thing you
could try would be to disallow the planner from using the smpl_time index by
changing the order by to "ORDER BY c.smpl_time + INTERVAL '0 sec'"; that's a
bit of a hack, but we don't have anything we officially call "query hints" in
PostgreSQL, so often we're left to solve issues like this with ugly tricks
like that.
Also, going by:
> -> Seq Scan on channel t (cost=0.00..915.83 rows=1 width=41) (actual
> time=4.683..7.885 rows=1 loops=1)
perhaps "name" is unique on the channel table?
(I doubt there's an index/constraint to back that up, however, since such an index would have likely been used here instead of the Seq Scan) If so, and you can add a constraint to back that up, you might be able to reform the query to be: select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW', c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val from sample c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5; If you can do that then it's highly likely to be *very* fast to execute since I see there's an index on (channel_id, smpl_time) on each of the inherited tables. (If our planner was smarter then in the presence of the correct unique index, we could have rewritten the query as such automatically but it's not / we don't. I believe I've mentioned about improving this somewhere in the distant past of the -hackers mailing list, although I can't find it right now. I recall naming the idea "scalar value lookup joins", but development didn't get much beyond thinking of that name) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: select query does not pick up the right index
On Fri, 4 Jan 2019 at 01:57, Abadie Lana wrote: > 4) name is unique, constraint and index created. Right index is picked up and > query time is rather constant there 40sec. That's surprisingly slow. Can you share the EXPLAIN (ANALYZE, BUFFERS) of that? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: select query does not pick up the right index
> From: David Rowley
> Sent: 03 January 2019 14:01
> That's surprisingly slow. Can you share the EXPLAIN (ANALYZE, BUFFERS) of
> that?
>
> explain (analyze,buffers) select
> 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val
> from sample c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE
> name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5;
>
> -> Index Scan Backward using smpl_time_bx2_idx on sample_buil_year c_5
>    (cost=0.56..2023925.30 rows=3162364 width=320)
>    (actual time=15167.330..15167.330 rows=0 loops=1)
>    Filter: (channel_id = $0)
>    Rows Removed by Filter: 50597834
>    Buffers: shared hit=25913147 read=713221
> -> Index Scan Backward using sample_time_cm_idx on sample_ctrl_month c_6
>    (cost=0.56..1862587.12 rows=537562 width=77)
>    (actual time=0.048..0.048 rows=0 loops=1)
>    Index Cond: (channel_id = $0)
>    Buffers: shared read=4
> -> Index Scan Backward using smpl_time_cmx2_idx on sample_ctrl_year c_7
>    (cost=0.57..3186305.67 rows=2094186 width=68)
>    (actual time=25847.549..25847.549 rows=0 loops=1)
>    Filter: (channel_id = $0)
>    Rows Removed by Filter: 79579075
>    Buffers: shared hit=49868991 read=1121715
Right, so you need to check your indexes on sample_ctrl_year and
sample_buil_year. You need an index on (channel_id, smpl_time) on those.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: select query does not pick up the right index
On Fri, 4 Jan 2019 at 02:20, Abadie Lana wrote:
> > From: David Rowley
> > Sent: 03 January 2019 14:01
> Right, so you need to check your indexes on sample_ctrl_year and
> sample_buil_year. You need an index on (channel_id, smpl_time) on those.
> These indexes exist already
That's interesting. The \d output indicates that the indexes are not
INVALID, so it's not all that obvious why the planner would choose a
lesser index to provide the required rows. One thought is that the
more suitable index is very bloated. This would increase the
estimated cost of scanning the index and reduce the chances of the
index being selected by the query planner.
If you execute:
select indrelid::regclass as table_name, indexrelid::Regclass as
index_name,pg_size_pretty(pg_relation_size(indrelid))
table_size,pg_size_pretty(pg_relation_size(indexrelid)) index_size
from pg_index
where indrelid in('sample_ctrl_year'::regclass, 'sample_buil_year'::regclass)
order by indrelid::regclass::name, indexrelid::regclass::name;
This should show you the size of the tables and indexes in question.
If the sample_time_cy_idx and sample_time_by_idx indexes are very
large when compared with the size of their table, then it is likely
worth building a new index for these then dropping the old index then
retrying the re-written version of the query. If this is a live
system then you can build the new indexes by using the CREATE INDEX
CONCURRENTLY command. This will allow other DML operations to work
without being blocked. The old indexes can then be dropped with DROP
INDEX CONCURRENTLY.
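In other words, something like the following, with names adjusted to whatever
the size query reports (the ones here are assumptions):
CREATE INDEX CONCURRENTLY sample_time_cy_idx2
    ON sample_ctrl_year (channel_id, smpl_time);
DROP INDEX CONCURRENTLY sample_time_cy_idx;
-- and the same again for the corresponding index on sample_buil_year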
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: select query does not pick up the right index
On Thu, 10 Jan 2019 at 01:55, Abadie Lana wrote: > The other query suggested by D.Rowley has the same issue : still swap > activity is higher. > explain analyze select 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW', > c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val > from sample c where c.channel_id in (select channel_id from channel where > name ='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5; This is not the query I suggested. I mentioned if channel.name had a unique index, you'd be able to do WHERE c.channel_id = (select channel_id from channel where name = '...'). That's pretty different to what you have above. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Parallel stats in execution plans
Hi folks -
I'm having trouble understanding what some of the stats mean in the
execution plan output when parallel workers are used. I've tried to read
up about it, but I haven't been able to find anything that explains what
I'm seeing. Apologies in advance if there's documentation I've been too
stupid to find.
I've run the following query. The "towns" table is a massive table that
I created in order to get some big numbers on a parallel query - don't
worry, this isn't a real query I want to make faster, just a silly
example I'd like to understand.
EXPLAIN (ANALYZE, FORMAT JSON, BUFFERS, VERBOSE)
SELECT name, code, article
FROM towns
ORDER BY name ASC,
         code DESC;
The output looks like this:
[
{
"Plan": {
"Node Type": "Gather Merge", "Parallel Aware": false, "Startup Cost":
1013948.54, "Total Cost": 1986244.55, "Plan Rows": 884, "Plan
Width": 77, "Actual Startup Time": 42978.838, "Actual Total Time":
60628.982, "Actual Rows": 1010, "Actual Loops": 1, "Output": ["name", "code", "article"], "Workers Planned": 2, "Workers Launched": 2, "Shared Hit Blocks": 29,
"Shared Read Blocks": 47641, "Shared Dirtied Blocks": 0, "Shared Written
Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local
Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks":
91342, "Temp Written Blocks": 91479, "Plans": [
{
"Node Type": "Sort", "Parent Relationship": "Outer", "Parallel Aware":
false, "Startup Cost": 1012948.52, "Total Cost": 1023365.25, "Plan
Rows": 4166692, "Plan Width": 77, "Actual Startup Time": 42765.496,
"Actual Total Time": 48526.168, "Actual Rows": 337, "Actual Loops":
3, "Output": ["name", "code", "article"], "Sort Key": ["towns.name", "towns.code DESC"], "Sort Method": "external merge", "Sort Space Used": 283856, "Sort
Space Type": "Disk", "Shared Hit Blocks": 170, "Shared Read Blocks":
142762, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local
Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0,
"Local Written Blocks": 0, "Temp Read Blocks": 273289, "Temp Written
Blocks": 273700, "Workers": [
{
"Worker Number": 0, "Actual Startup Time": 42588.662, "Actual Total
Time": 48456.662, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit
Blocks": 72, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks":
0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
Blocks": 89067, "Temp Written Blocks": 89202 }, {
"Worker Number": 1, "Actual Startup Time": 42946.705, "Actual Total
Time": 48799.414, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit
Blocks": 69, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks":
0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
Blocks": 92880, "Temp Written Blocks": 93019 }
], "Plans": [
{
"Node Type": "Seq Scan", "Parent Relationship": "Outer", "Parallel
Aware": true, "Relation Name": "towns", "Schema": "public", "Alias":
"towns", "Startup Cost": 0.00, "Total Cost": 184524.92, "Plan Rows":
4166692, "Plan Width": 77, "Actual Startup Time": 0.322, "Actual Total
Time": 8305.886, "Actual Rows": 337, "Actual Loops": 3, "Output": ["name", "code", "article"], "Shared Hit Blocks": 96, "Shared Read Blocks": 142762, "Shared Dirtied
Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local
Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0,
"Temp Read Blocks": 0, "Temp Written Blocks": 0, "Workers": [
{
"Worker Number": 0, "Actual Startup Time": 0.105, "Actual Total Time":
8394.629, "Actual Rows": 3277980, "Actual Loops": 1, "Shared Hit
Blocks": 35, "Shared Read Blocks": 46794, "Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks":
0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
Blocks": 0, "Temp Written Blocks": 0 }, {
"Worker Number": 1, "Actual Startup Time": 0.113, "Actual Total Time":
8139.382, "Actual Rows": 3385130, "Actual Loops": 1, "Shared Hit
Blocks": 32, "Shared Read Blocks": 48327, "Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks":
0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read
Blocks": 0, "Temp Written Blocks": 0 }
]
}
]
}
]
}, "Planning Time": 22.898, "Triggers": [
], "Execution Time": 61133.161 }
]
Or a more slimmed-down version, with just the confusing fields:
[
{
"Plan": {
"Node Type": "Gather Merge", "Parallel Aware": false,"Actual Total
Time": 60628.982, "Actual Rows": 1010, "Actual Loops": 1,"Workers
Planned": 2, "Workers Launched": 2,"Plans": [
{
"Node Type": "Sort"
Re: Parallel stats in execution plans
It seems like no-one has any ideas on this - does anyone know anywhere
else I can try to look/ask to find out more?
Is it possible that this is a bug?
Thanks
Dave
Re: Parallel stats in execution plans
On Thu, 17 Jan 2019 at 00:31, David Conlin wrote:
> How the time values combine with parallelism. For example, each execution of
> the sort node takes an average of 48.5s, over three loops. This makes a total
> running time of 145.5s. Even if this was perfectly distributed between the
> two workers, I would expect this to take 72.75s, which is more than the total
> execution time, so it can't take this long.
>
> How the row numbers combine with those in the "Workers" subkey. For example,
> in the Sort node, worker #0 produces 3,277,980 rows, while worker #1 produces
> 3,385,130 rows. The Sort node as a whole produces 3,333,337 rows per loop,
> for a total of 10,000,010 (the value in the gather merge node). I would have
> expected the number of rows produced by the two workers to sum to the number
> produced by the Sort node as a whole, either per loop or in total.
>
> How the "Actual Loops" values combine with those in the "Workers" subkey. For
> example, the "Sort" node has 3 loops, but each of the workers inside it has
> 1 loop. I would have expected either:
> - each of the workers to have done 3 loops (since the sort is executed 3
>   times), or
> - the number of loops in the two workers to sum to three (if the three
>   executions of the sort are distributed across the two workers)
It's important to know that all of the actual row counts and actual times are
divided by the number of loops, which in this case is 3: one per process
working on that part of the plan. There are two workers, but the main process
helps out too.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
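To put numbers on that, using the figures quoted above: the Sort's reported
3,333,337 rows is 10,000,010 / 3, rounded, i.e. the total divided across the
three processes. The per-worker figures are not divided by loops, so worker 0
(3,277,980) plus worker 1 (3,385,130) plus the leader's share (roughly
3,336,900) add back up to 10,000,010. Likewise, the 48.5s is the per-process
average, not an extra 145.5s of wall-clock time.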
Re: Zero throughput on a query on a very large table.
On Fri, 25 Jan 2019 at 19:24, Tom Lane wrote: > PS: On the third hand, you mention having created new indexes on this > table with apparently not a lot of pain, which is a tad surprising > if you don't have the patience to wait for a sort to finish. How > long did those index builds take? It would certainly be good to look at psql's \d tmp_outpatient_rev output to ensure that the index is not marked as INVALID. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: autovacuum big table taking hours and sometimes seconds
On Thu, 7 Feb 2019 at 00:17, Laurenz Albe wrote: > > On Wed, 2019-02-06 at 12:29 +0200, Mariel Cherkassky wrote: > > Now the question is how to handle or tune it ? Is there any change that I > > need to increase the cost_limit / cost_delay ? > > Maybe configuring autovacuum to run faster will help: > > alter table orig_table set (toast.autovacuum_vacuum_cost_limit = 2000); > > Or, more extreme: > > alter table orig_table set (toast.autovacuum_vacuum_cost_delay = 0); Going by the block hits/misses/dirtied and the mentioned vacuum times, it looks like auto-vacuum is set to the standard settings and if so it spent about 100% of its time sleeping on the job. It might be a better idea to consider changing the vacuum settings globally rather than just for one table. Running a vacuum_cost_limit of 200 is likely something you'd not want to ever do with modern hardware... well maybe unless you just bought the latest Raspberry PI, or something. You should be tuning that value to something that runs your vacuums to a speed you're happy with but leaves enough IO and CPU for queries running on the database. If you see that all auto-vacuum workers are busy more often than not, then they're likely running too slowly and should be set to run more quickly. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
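As a sketch, the global equivalent would be something like this (the values
are only examples to tune against your own I/O headroom, not
recommendations):
ALTER SYSTEM SET vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
SELECT pg_reload_conf();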
Re: autovacuum big table taking hours and sometimes seconds
On Thu, 7 Feb 2019 at 02:34, Mariel Cherkassky wrote:
> As I said, I set the next settings for the toasted table :
> alter table orig_table set (toast.autovacuum_vacuum_scale_factor = 0);
> alter table orig_table set (toast.autovacuum_vacuum_threshold = 1);

These settings don't control how fast auto-vacuum runs, just when it should run.

> Can you explain a little bit more why you decided that the autovacuum spent
> its time on sleeping ?

Yeah, if you look at the following settings:

vacuum_cost_limit            | 200
vacuum_cost_page_dirty       | 20
vacuum_cost_page_hit         | 1
vacuum_cost_page_miss        | 10
autovacuum_vacuum_cost_delay | 20ms

I've tagged on the default setting for each of these. Both vacuum and auto-vacuum keep score of how many points they've accumulated while running: 20 points for dirtying a page, 10 for a read that's not found to be in shared_buffers, 1 for reading a buffer from shared buffers. When vacuum_cost_limit points are reached (or autovacuum_vacuum_cost_limit, if that's not -1), auto-vacuum sleeps for autovacuum_vacuum_cost_delay; normal manual vacuums sleep for vacuum_cost_delay.

In one of the log entries you saw:

> buffer usage: 4808221 hits, 6404148 misses, 6152603 dirtied
> avg read rate: 2.617 MiB/s, avg write rate: 2.514 MiB/s
> system usage: CPU 148.65s/70.06u sec elapsed 19119.55 sec

Doing a bit of maths to see how much time that vacuum should have slept for:

postgres=# select (4808221 * 1 + 6404148 * 10 + 6152603 * 20) / 200.0 * 20 / 1000;
 ?column?
 19190.1761

That's remarkably close to the actual time of 19119.55 sec. If you do the same for the other 3 vacuums then you'll see the same close match.

> I see the autovacuum statistics from the logs, how can I check that the
> workers are busy very often ?

It would be nice if there was something better, but periodically doing:

SELECT count(*) FROM pg_stat_activity where query like 'autovacuum%';

will work.

> My vacuum limit is 200 right now, basically if vacuum runs on my toasted
> table and reached 200 but it didn't finish to clean all the dead tuples, after
> the nap, should it continue cleaning it or wait until the vacuum_threshold
> hit again ?

Nap time is something else; maybe you're confusing that with the speed of vacuum? Napping is just the time auto-vacuum will wait between checking for new tables to work on. Having the auto-vacuum run so slowly is a probable cause of still having dead tuples after the vacuum... likely because they became dead after vacuum started.

I'd recommend reading the manual or Tomas Vondra's blog about vacuum costs. It's not overly complex, once you understand what each of the vacuum settings does.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Why isn't an index scan being used?
On Wed, 20 Feb 2019 at 13:11, Abi Noda wrote:
> However, when I index the closed column, a bitmap scan is used instead of an
> index scan, with slightly slower performance. Why isn't an index scan being
> used, given that the exact same number of rows are at play as in my query on
> the state column?

That's down to the planner's cost estimates. Likely it thinks that either doing a bitmap scan is cheaper, or close enough that it does not matter.

> How do I index closed in a way where an index scan is used?

The costing does account for the size of the index. If the "closed_index" index is larger than the "state_index", then doing an index scan on "closed_index" is going to be costed higher. Most of this likely boils down to random_page_cost being a guess. You may want to check that your effective_cache_size is set to something like 75% of the machine's memory, and/or tweak random_page_cost down if it's set to the standard setting of 4. Modern SSDs are pretty fast at random reads. HDDs, not so much.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
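As a sketch of those two knobs (assuming an SSD-backed server with around 16GB of RAM; scale the values to the real machine):

SET random_page_cost = 1.1;                     -- try per-session first, then re-run EXPLAIN
ALTER SYSTEM SET random_page_cost = 1.1;        -- persist it if the plans improve
ALTER SYSTEM SET effective_cache_size = '12GB'; -- roughly 75% of RAM
SELECT pg_reload_conf();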
Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.
On Sun, 24 Feb 2019 at 10:06, Gunther wrote:
> I am using an SQL queue for distributing work to massively parallel workers.
> Workers come in servers with 12 parallel threads. One of those worker sets
> handles 7 transactions per second. If I add a second one, for 24 parallel
> workers, it scales to 14 /s. Even a third, for 36 parallel workers, I can add
> to reach 21 /s. If I try a fourth set, 48 workers, I end up in trouble. But
> that isn't even so much my problem rather than the fact that in short time,
> the performance will deteriorate, and it looks like that is because the queue
> index deteriorates and needs a REINDEX.

It sounds very much like auto-vacuum is simply unable to keep up with the rate at which the table is being updated. Please be aware that, by default, auto-vacuum is configured to run fairly slowly so as not to saturate low-end machines. vacuum_cost_limit / autovacuum_vacuum_cost_limit control how many "points" the vacuum process can accumulate before it sleeps for autovacuum_vacuum_cost_delay / vacuum_cost_delay. Additionally, after an auto-vacuum run completes it will wait for autovacuum_naptime before checking again if any tables require some attention.

I think you should be monitoring how many auto-vacuum workers are busy during your runs. If you find that the "queue" table is being vacuumed almost constantly, then you'll likely want to increase vacuum_cost_limit / autovacuum_vacuum_cost_limit. You could get an idea of how often this table is being auto-vacuumed by setting log_autovacuum_min_duration to 0 and checking the logs. Another way to check would be to sample what:

SELECT query FROM pg_stat_activity WHERE query LIKE 'autovacuum%';

returns. You may find that all of the workers are busy most of the time. If so, that indicates that the cost limits need to be raised.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
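A sketch of the monitoring and tuning steps described above; the 1000 is only an example value:

ALTER SYSTEM SET log_autovacuum_min_duration = 0;      -- log every auto-vacuum run
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;  -- let each worker do more per cycle
SELECT pg_reload_conf();

-- sample this periodically; if it is nearly always equal to
-- autovacuum_max_workers, the workers are probably running too slowly
SELECT count(*) FROM pg_stat_activity WHERE query LIKE 'autovacuum%';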
Re: IS NOT DISTINCT FROM statement
On Sat, 9 Mar 2019 at 00:30, Artur Zając wrote: > Is there any reason that „NOT DISTINCT FROM” can’t be autotransformed to „=” > when value on right side of expression is not NULL or is this any way to use > index with „IS NOT DISTINCT FROM” statement? Probably nothing other than nobody has done it yet. It might be reasonable to have some transformation stage called from distribute_restrictinfo_to_rels() when adding single rel RestrictInfos to RTE_RELATION base rels. It's only these you can check for NOT NULL constraints, i.e. not so possible with rtekinds such as RTE_FUNCTION and the like. It becomes more complex if you consider that someone might have added a partial index on the relation that matches the IS NOT DISTINCT FROM clause. In this case, they might not be happy that their index can no longer be used. Fixing that would require some careful surgery on predicate_implied_by() to teach it about IS NOT DISTINCT FROM clauses. However, that seems to go a step beyond what predicate_implied_by() does for now. Currently, it only gets to know about quals. Not the relations they belong to, so there'd be no way to know that the NOT NULL constraint exists from there. I'm not sure if there's a good reason for this or not, it might be because it's not been required before. It gets more complex still if you want to consider other quals in the list to prove not nullness. In short, probably possible, but why not just write an equality clause, if you know NULLs are not possible? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: IS NOT DISTINCT FROM statement
On Sat, 9 Mar 2019 at 01:25, Artur Zając wrote:
> I made some workaround. I made function:
>
> CREATE OR REPLACE FUNCTION smarteq(v1 int,v2 INT) RETURNS BOOL AS
> $BODY$
> SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END);
> $BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE;
>
> explain analyze select id from sometable where smarteq(id1,NULL);
> QUERY PLAN
> Bitmap Heap Scan on sometable (cost=19338.59..57045.02 rows=882343 width=4) (actual time=116.236..306.304 rows=881657 loops=1)
>   Recheck Cond: (id1 IS NULL)
>   Heap Blocks: exact=9581
>   -> Bitmap Index Scan on sometable_index1 (cost=0.00..19118.00 rows=882343 width=0) (actual time=114.209..114.209 rows=892552 loops=1)
>     Index Cond: (id1 IS NULL)
> Planning time: 0.135 ms
> Execution time: 339.229 ms
>
> It looks like it works, but I must check if it will still work in plpgsql (I
> expect some problems if query is prepared).

I think either way you'll just be at the mercy of whether a generic or custom plan is chosen. If you get a custom plan then likely your case statement will be inlined and constant-folded away, but for a generic plan, that can't happen since those constants are not consts, they're parameters. Most likely, if you've got an index on the column you'll perhaps always get a custom plan, as the generic plan would result in a seqscan and it would have to evaluate your case statement for each row. By default, generic plans are only considered on the 6th query execution and are only chosen if the generic cost is cheaper than the average custom plan cost + fuzz cost for planning. PG12 gives you a bit more control over that with the plan_cache_mode GUC, but... that's not out yet. However, possibly the cost of planning each execution is cheaper than doing the seq scan, so you might be better off with this. There is a risk that the planner does for some reason choose a generic plan and ends up doing the seq scan, but for that to happen likely the table would have to be small, in which case it wouldn't matter, or the costs would have to be off, which might cause you some pain.

The transformation mentioned earlier could only work if the arguments of the IS NOT DISTINCT FROM were Vars or Consts. It couldn't work with Params since the values are unknown to the planner.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
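For what it's worth, once PG12 is out the behaviour could be pinned down along these lines (plan_cache_mode does not exist in earlier releases; sometable, id1 and smarteq are the names from the message above):

SET plan_cache_mode = force_custom_plan;  -- always re-plan with the real parameter values
PREPARE q(int) AS SELECT id FROM sometable WHERE smarteq(id1, $1);
EXPLAIN ANALYZE EXECUTE q(NULL);

That keeps the inlining and constant-folding of the CASE expression, at the cost of planning on every execution.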
Re: Poor man's partitioned index .... not being used?
On Thu, 21 Mar 2019 at 15:51, Gunther wrote: > foo=# CREATE UNIQUE INDEX Test_pk0 ON Test(id) WHERE mod(id,2) = 0; > CREATE INDEX > foo=# CREATE UNIQUE INDEX Test_pk1 ON Test(id) WHERE mod(id,2) = 1; > CREATE INDEX > foo=# ANALYZE Test; > ANALYZE > foo=# EXPLAIN SELECT * FROM Test WHERE id = '8934'; > QUERY PLAN > --- > Seq Scan on test (cost=0.00..6569.61 rows=1 width=4) >Filter: (id = 8934) > (2 rows) > > Why is that index never used? When the planner looks at partial indexes to see if they'll suit the scan, the code that does the matching (predicate_implied_by()) simply does not go to that much trouble to determine if it matches. If you look at operator_predicate_proof() you'll see it requires the expression on at least one side of the OpExpr to match your predicate. Yours matches on neither side since "id" is wrapped up in a mod() function call. Certainly, predicate_implied_by() is by no means finished, new smarts have been added to it over the years to allow it to prove more cases, but each time something is added we still need to carefully weigh up the additional overhead of the new code vs. possible benefits. It may be possible to do something with immutable functions found in the expr but someone doing so might have a hard time proving that it's always safe to do so. For example, arg 2 of your mod() call is a Const. If it had been another Var then it wouldn't be safe to use. What other unsafe cases are there? Is there a way we can always identify unsafe cases during planning? ... are the sorts of questions someone implementing this would be faced with. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Poor man's partitioned index .... not being used?
On Fri, 22 Mar 2019 at 07:57, Gunther wrote:
> foo=# PREPARE testplan(int) AS
> foo-#    SELECT * FROM Test WHERE mod(id,2) = mod($1,2) AND id = $1;
> PREPARE
> foo=# EXPLAIN EXECUTE testplan(8934);
> QUERY PLAN
> Index Only Scan using test_pk0 on test (cost=0.42..8.44 rows=1 width=4)
>   Index Cond: (id = 8934)
> (2 rows)
>
> That's quite alright actually. Now the question is, could we use this in a
> nested loop query plan? That's where I think it can't work:

Not really. In that case, the parameters were replaced with the specified values (a.k.a custom plan). That happens for the first 5 executions of a prepared statement, and in this case likely the planner will continue to use the custom plan since the generic plan won't know that the partial index is okay to use and the plan costs would likely go up enough that the custom plan would continue to be favoured.

> foo=# SET enable_mergejoin TO off;
> SET
> foo=# EXPLAIN SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON( mod(b.id,2) = mod(a.id,2) AND b.id = a.id) LIMIT 10;
> QUERY PLAN
> Limit (cost=0.00..102516.78 rows=10 width=8)
>   -> Nested Loop Left Join (cost=0.00..38238760.24 rows=3730 width=8)
>     Join Filter: ((b.id = a.id) AND (mod(b.id, 2) = mod(a.id, 2)))
>     -> Seq Scan on test2 a (cost=0.00..54.30 rows=3730 width=4)
>     -> Materialize (cost=0.00..9056.93 rows=388129 width=4)
>       -> Seq Scan on test b (cost=0.00..5599.29 rows=388129 width=4)
> (6 rows)
>
> It looks like it doesn't want to evaluate the mod(a.id, 2) before it moves to
> the index query for the nested loop.

Whether partial indexes can be used or not is determined using only quals that can be applied at the scan level. In this case your qual is a join qual, and since no other qual exists that can be evaluated at the scan level where the index can be used, it's not considered.

In any case, nothing there guarantees that one of your indexes will match all records. For it to work, both of your indexes would have to be scanned. It's not clear why you think that would be any better than scanning just one index. I imagine it would only ever be a win if you could eliminate one of the index scans with some qual that guarantees that the index can't contain any records matching your query.

> I wonder if there was a way of marking such expressions as safe in the query,
> like suggesting a certain evaluation order, i.e.,
>
> SELECT * FROM Test2 a LEFT OUTER JOIN Test b ON(mod(b.id,2) = EVAL(mod(a.id,2)) AND b.id = a.id) LIMIT 10;
>
> It's OK though. It just goes to show that in a case like this, it is best to
> just go with the partitioned table anyway.

It sounds like you might want something like partition-wise join that exists in PG11.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
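For reference, a minimal sketch of the declarative-partitioning route being alluded to (PG11 or later; the names here are hypothetical, following the example above):

CREATE TABLE test_p (id int NOT NULL) PARTITION BY HASH (id);
CREATE TABLE test_p0 PARTITION OF test_p FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE test_p1 PARTITION OF test_p FOR VALUES WITH (MODULUS 2, REMAINDER 1);
CREATE UNIQUE INDEX ON test_p (id);   -- cascades to an index on each partition

SET enable_partitionwise_join = on;   -- off by default in PG11

With both sides of a join partitioned the same way, the planner can join matching partitions pair-wise instead of having to prove anything about mod() expressions in partial index predicates.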
Re: Postgresql Sort cost Poor performance?
On Tue, 2 Apr 2019 at 20:00, tank.zhang <[email protected]> wrote:
> 2、 Adding a DISTINCT response time was very slow
>
> qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM QIS_CARPASSEDSTATION A
> WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' )
> AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD')
> AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD');
> checkcarnum
> 1071367
> (1 row)

That's because of how DISTINCT is implemented within an aggregate function in PostgreSQL. Internally within the aggregate code in the executor, a sort is performed on the entire input to the aggregate node. The planner is currently unable to make use of any indexes that provide pre-sorted input.

One way to work around this would be to perform the DISTINCT and COUNT(*) in separate stages using a subquery. From your original query, something like:

SELECT COUNT(SVIN) AS CHECKCARNUM, SMTOC
FROM (
  SELECT SMTOC, SVIN
  FROM QIS_CARPASSEDSTATION A
  WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG'
    AND A.SLINENO IN ( '1F' , '2F' , '3F' )
    AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD')
    AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')
  GROUP BY SMTOC, SVIN
) A GROUP BY SMTOC;

An index something like:

CREATE INDEX ON QIS_CARPASSEDSTATION (SMTOC, SVIN, SSTATIONCD, DWORKDATE);

should help speed up the subquery and provide pre-sorted input to the outer aggregate. If you like, you could add SLINENO to the end of the index to allow an index-only scan which may result in further performance improvements.

Without the index, you're forced to sort, but at least it's just one sort instead of two.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Temporarily very slow planning time after a big delete
On Tue, 21 May 2019 at 12:44, Walter Smith wrote: > > We had a mysterious (to us) slowdown today that I'm hoping someone can > explain just based on PG's principles of operation. It got better by itself > so it seems like it was "normal" behavior -- I just don't know what behavior > it was exhibiting. > > We have a table of user notifications containing about 80 million rows. It > gets a lot of inserts continually, and is cleaned up once a day. There are no > updates. In all history there have been about 330 million rows created. > > Today we deleted about 15 million rows in one transaction from this table. > Immediately afterwards, a particular SELECT started running very slowly -- > 500 to 3000 ms rather than the usual <1ms. > > We did an EXPLAIN ANALYZE on this select and it was still doing an index scan > as usual. The *planning time* for the query is what had gotten slow. The > query itself was still executing in <1ms. It would be good to know which version you're running here. It basically sounds very much like get_actual_variable_range() will be the culprit. Basically, if a constant value that's being used by the planner to determine row estimates with falls outside the statistic's histogram and a btree index exists that we can use to look up the actual bound of the data, then we do so in that function. If you've just deleted a bunch of rows then that index scan may have to traverse a bunch of dead tuples before finding that first live tuple. This code has changed a few times in recent times, see fccebe421 and 3ca930fc3, which is why your version is of interest. Apart from that, if you want to confirm that's the issue and you just want it fixed, just VACUUM the table. You should likely be doing that anyway directly after your bulk delete. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
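A sketch of that suggested follow-up to the bulk delete (the table name here is a placeholder, since the real one isn't given in the thread):

VACUUM (VERBOSE, ANALYZE) user_notifications;

If planning time drops back to normal afterwards, that's consistent with get_actual_variable_range() having had to step over a large number of recently-dead index entries while estimating the column's actual minimum/maximum.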
Re: Analyze results in more expensive query plan
On Tue, 21 May 2019 at 08:23, Jeremy Altavilla wrote: > > Thanks for the suggestion. I created extended statistics objects for the two > tables in question. Unfortunately the resulting plan was the same (and had > the same estimates). It looks like the extended stats discovered a > potentially useful correlation on bag: "2, 3 => 1" (owner_id, bag_type_id => > id). I'm guessing this wasn't usable because the docs state "They are not > used to improve estimates for equality conditions comparing two columns". I'd say that since the time spent planning is near 3x what is spent during execution that you're wasting your time trying to speed up the execution. What you should be thinking about is using PREPAREd statements to avoid the planning overhead completely. If that's not possible then you've more chance of reducing the time spent planning by reducing the statistics on the table rather than adding more planning overhead by adding extended stats. You might want to experiment with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ..; and setting those down a bit then analyzing the tables again. Although, that's likely only going to make a very small difference, if any, than getting rid of the planning completely. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
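As a hedged sketch of both suggestions; the statement text and statistics targets are placeholders, since the actual query isn't quoted here (the column names are borrowed from the extended-statistics definition mentioned above):

-- avoid repeated planning entirely:
PREPARE get_bags(bigint, bigint) AS
  SELECT * FROM bag WHERE owner_id = $1 AND bag_type_id = $2;
EXECUTE get_bags(123, 456);

-- or shrink the statistics the planner has to chew through:
ALTER TABLE bag ALTER COLUMN owner_id SET STATISTICS 50;
ALTER TABLE bag ALTER COLUMN bag_type_id SET STATISTICS 50;
ANALYZE bag;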
Re: Temporarily very slow planning time after a big delete
On Tue, 21 May 2019 at 14:04, Walter Smith wrote: > I'm so sorry -- I meant to give the version, of course. It's 9.6.13. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3ca930fc3 has been applied since then. It would be good if you could confirm the problem is resolved after a vacuum. Maybe run VACUUM VERBOSE on the table and double check there's not some large amount of tuples that are "nonremovable". -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: scans on table fail to be excluded by partition bounds
On Tue, 25 Jun 2019 at 05:31, Justin Pryzby wrote: > ts=# explain SELECT * FROM eric_enodeb_cell_metrics WHERE start_time BETWEEN > '2019-01-01 04:00' AND '2019-01-01 05:00' OR start_time BETWEEN '2019-01-02 > 04:00' AND '2019-01-02 05:00'; > Append (cost=36.04..39668.56 rows=12817 width=2730) >-> Bitmap Heap Scan on eric_enodeb_cell_20190101 (cost=36.04..19504.14 > rows=6398 width=2730) > Recheck Cond: (((start_time >= '2019-01-01 04:00:00-05'::timestamp > with time zone) AND (start_time <= '2019-01-01 05:00:00-05'::timestamp with > time zone)) OR ((start_time >= '2019-01-02 04:00:00-05'::timestamp with time > zone) AND (start_time <= '2019-01-02 05:00:00-05'::timestamp with time zone))) > -> BitmapOr (cost=36.04..36.04 rows=6723 width=0) >-> Bitmap Index Scan on eric_enodeb_cell_20190101_idx > (cost=0.00..16.81 rows=6465 width=0) > Index Cond: ((start_time >= '2019-01-01 > 04:00:00-05'::timestamp with time zone) AND (start_time <= '2019-01-01 > 05:00:00-05'::timestamp with time zone)) >-> Bitmap Index Scan on eric_enodeb_cell_20190101_idx > (cost=0.00..16.03 rows=259 width=0) > Index Cond: ((start_time >= '2019-01-02 > 04:00:00-05'::timestamp with time zone) AND (start_time <= '2019-01-02 > 05:00:00-05'::timestamp with time zone)) >-> Bitmap Heap Scan on eric_enodeb_cell_20190102 (cost=36.08..20100.34 > rows=6419 width=2730) > Recheck Cond: (((start_time >= '2019-01-01 04:00:00-05'::timestamp > with time zone) AND (start_time <= '2019-01-01 05:00:00-05'::timestamp with > time zone)) OR ((start_time >= '2019-01-02 04:00:00-05'::timestamp with time > zone) AND (start_time <= '2019-01-02 05:00:00-05'::timestamp with time zone))) > -> BitmapOr (cost=36.08..36.08 rows=6982 width=0) >-> Bitmap Index Scan on eric_enodeb_cell_20190102_idx > (cost=0.00..16.03 rows=259 width=0) > Index Cond: ((start_time >= '2019-01-01 > 04:00:00-05'::timestamp with time zone) AND (start_time <= '2019-01-01 > 05:00:00-05'::timestamp with time zone)) >-> Bitmap Index Scan on eric_enodeb_cell_20190102_idx > (cost=0.00..16.84 rows=6723 width=0) > Index Cond: ((start_time >= '2019-01-02 > 04:00:00-05'::timestamp with time zone) AND (start_time <= '2019-01-02 > 05:00:00-05'::timestamp with time zone)) > > Is there some reason why the partition constraints aren't excluding any of the > index scans ? Yeah, we don't do anything to remove base quals that are redundant due to the partition constraint. There was a patch [1] to try and fix this but it's not seen any recent activity. [1] https://commitfest.postgresql.org/19/1264/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Planner performance in partitions
On Tue, 13 Aug 2019 at 08:03, Piotr Włodarczyk wrote: > PostgreSQL11.5: > > select on main partition (public.book): planner: 60ms, execution: 5ms > select on partitioned table (public.book_1-1000): planner: 2.7 ms, execution: > 2,4 ms > > PostgreSQL 12B3: > > select on main partition (public.book): planner: 2,5ms , execution: 1,2ms > select on partitioned table (public.book_1-1000): planner: 2.5 ms, execution: > 1,2 ms > > So looking at above results we have two options: > > Wait for 12.0 stable version > Wait for patches to 11 – PostgreSQL Team: can You do this? You'll need to either reduce the number of partitions down to something realistic or wait for 12.0. The work done to speed up the planner with partitioned tables for v12 won't be going into v11. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Get the planner used by a query?
On Wed, 9 Oct 2019 at 19:21, Behrang Saeedzadeh wrote: > > Is there a way to display the planner algorithm used by a query, either in > EXPLAIN or in a different way? There's not really any simple way to know. If the number of relations in the join search meets or exceeds geqo_threshold then it'll use the genetic query optimizer. However, knowing exactly how many relations are in the join search is not often simple since certain types of subqueries can be pulled up into the main query and that can increase the number of relations in the search. If you don't mind writing C code, then you could write an extension that hooks into join_search_hook and somehow outputs this information to you before going on to call the geqo if the "enable_geqo && levels_needed >= geqo_threshold" condition is met. Besides that, I don't really know if there's any way. You could try editing the geqo_seed and seeing if the plan changes, but if it does not, then that does not mean the geqo was not used, so doing it that way could be quite error-prone. You'd only be able to tell the geqo was being used if you could confirm that changing geqo_seed did change the plan. (And you could be certain the plan did not change for some other reason like an auto-analyze). -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
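A rough sketch of the non-C options mentioned above:

SHOW geqo;            -- GEQO is only used at all when this is on
SHOW geqo_threshold;  -- default 12; join searches of this many or more relations use GEQO
SET geqo_seed = 0.5;  -- then re-run EXPLAIN: a changed plan proves GEQO was involved
SET geqo = off;       -- or disable it outright and compare plans and planning time

As noted above, an unchanged plan after altering geqo_seed proves nothing either way, so treat this only as a positive test.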
Re: Query slow again after adding an `OR` operation (was: Slow PostgreSQL 10.6 query)
On Thu, 10 Oct 2019 at 01:07, Behrang Saeedzadeh wrote:
>
> This is a follow up to https://www.postgresql.org/message-id/flat/CAERAJ%2B-1buiJ%2B_JWEo0a9Ao-CVMWpgp%3DEnFx1dJtnB3WmMi2zQ%40mail.gmail.com
>
> The query (generated by Hibernate) got a bit more complex and performance
> degraded again. I have uploaded all the details here (with changed table
> names, etc.): https://github.com/behrangsa/slow-query
>
> In short, the new query is:

The query mostly appears slow due to the "Rows Removed By Filter" in the OR condition. The only way to get around not scanning the entire branch_invoices table would be to somehow write the query in such a way that allows it to go on the inner side of the join.

You could do that if you ensure there's an index on branch_invoices (branch_id) and format the query as:

SELECT inv.id AS i_id,
       inv.invoice_date AS inv_d,
       inv.invoice_xid AS inv_xid,
       inv.invoice_type AS inv_type,
       brs.branch_id AS br_id,
       cinvs.company_id AS c_id
FROM invoices inv
LEFT OUTER JOIN branch_invoices brs ON inv.id = brs.invoice_id
LEFT OUTER JOIN company_invoices cinvs ON inv.id = cinvs.invoice_id
INNER JOIN branches br ON brs.branch_id = br.id
WHERE brs.branch_id IN (
    SELECT br1.id
    FROM branches br1
    INNER JOIN access_rights ar1 ON br1.id = ar1.branch_id
    INNER JOIN users usr1 ON ar1.user_id = usr1.id
    INNER JOIN groups grp1 ON ar1.group_id = grp1.id
    INNER JOIN group_permissions gpr1 ON grp1.id = gpr1.group_id
    INNER JOIN permissions prm1 ON gpr1.permission_id = prm1.id
    WHERE usr1.id = 1636 AND prm1.code = 'C2' AND ar1.access_type = 'T1'
    UNION ALL
    SELECT br3.id
    FROM companies cmp
    INNER JOIN branches br3 ON cmp.id = br3.company_id
    INNER JOIN access_rights ar2 ON cmp.id = ar2.company_id
    INNER JOIN users usr2 ON ar2.user_id = usr2.id
    INNER JOIN groups g2 ON ar2.group_id = g2.id
    INNER JOIN group_permissions gpr2 ON g2.id = gpr2.group_id
    INNER JOIN permissions prm2 ON gpr2.permission_id = prm2.id
    WHERE usr2.id = 1636 AND prm2.code = 'C2' AND ar2.access_type = 'T1')
ORDER BY inv.invoice_date DESC, br.name ASC
LIMIT 12;

The planner may then choose to pull up the subquery and uniquify it, then put it on the outside of a nested loop join and look up the branch_invoices record using the index on branch_id. I think this is quite a likely plan since the planner estimates there's only going to be 1 row from each of the subqueries.

Also note that the LEFT JOIN you have to branch_invoices is not really a left join, since you're insisting that the branch_id must be in the first or 2nd sub-plan. There's no room for it to be NULL. The planner will just convert that to an INNER JOIN with the above query, since that'll give it the flexibility to put the subquery in the IN clause on the outside of the join (after having uniquified it). You'll need to decide what you actually want the behaviour to be here. If you do need those NULL rows then you'd better move your WHERE quals down into the join condition for the branch_invoices table. I'd suggest testing with some mock-up data if you're uncertain of what I mean.

If you find that is faster and you can't rewrite the query due to it having been generated by Hibernate, then that sounds like a problem with Hibernate. PostgreSQL does not currently attempt to do any rewrites which convert OR clauses to use UNION or UNION ALL. No amount of tweaking the planner settings is going to change that fact.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
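The supporting index mentioned above would be something along these lines (PostgreSQL picks a name if none is given):

CREATE INDEX ON branch_invoices (branch_id);
ANALYZE branch_invoices;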
Change in CTE treatment in query plans?
Hi folks - Does anyone know if there's been a change in the way values for CTEs are displayed in query plans?

I think that it used to be the case that, for keys that include the values of child nodes (eg "Shared Hit Blocks", or "Actual Total Time"), CTE scans included the CTE itself, even if it wasn't listed as one of their children in the plan. If you didn't subtract the CTE scan, you would see surprising things, like sort operations reading table data, or the total time of the nodes in a single-threaded query plan adding up to significantly more than 100% of the total query time.

Now (I think since v11, but I'm not sure), it looks like these values only include the children listed in the plan. For example, I've seen CTE scans that have smaller times and buffers values than the CTE itself, which couldn't be true if the CTE was included in the scan.

I'm much less sure, but I *think* the same is also true of other InitPlan nodes - for example, if a node includes the filter "value > $1", its time and buffers used to (but no longer do) include the total for the InitPlan node which returned the value "$1".

Am I way off base with this, or did this change happen, and if so, am I right in thinking that it was changed in v11? Thanks in advance Dave
Re: Change in CTE treatment in query plans?
Hi Tom - Thanks so much for getting back to me. I didn't realise that the costs of init/sub plans would be spread across the call sites - I had (stupidly) assumed that each call site would include the full cost. Having taken a couple of days to go back over the problems I was seeing, you were absolutely right - it was all to do with multiple call sites - the postgres version was just a red herring. Thanks for your help & all the best, Dave On 17/10/2019 10:04, Tom Lane wrote: David Conlin writes: Does anyone know if there's been a change in the way values for CTEs are displayed in query plans? Offhand I don't recall any such changes, nor does a cursory look through explain.c find anything promising. If you're concerned with a multiply-referenced CTE, one possibility for funny results is that the blame for its execution cost could be spread across the multiple call sites. The same can happen with initplans/subplans. But I'm just guessing; you didn't show any concrete examples so it's hard to be definite. regards, tom lane
Slow planning, fast execution for particular 3-table query
We’re having trouble working out why the planning time for this particular query is slow (~2.5s vs 0.9ms execution time). As you can see below, there are only 3 tables involved so it’s hard to imagine what decisions the planner has to make that take so long. After 5 runs the prepared-statement code kicks in and it becomes quick, but it’s quite infuriating for the first 5 runs given the execution is so much faster. Are you able to give any tips what might be taking so long (and how we might improve it)? We read elsewhere that someone had a “catalog stats file leak”, which I’m taking to mean a big pg_statistic table. Ours is 10mb, which doesn’t seem particularly large to me, but I don’t have much context for it. https://www.postgresql.org/message-id/CABWW-d21z_WgawkjXFQQviqm16oAx0KQvR6bLkRxvYQmhdByfg%40mail.gmail.com Other queries (with 3 or more tables) in the same db seem to be planning much quicker. The query: explain (analyse) SELECT subscription_binding.subscription_binding, subscription_binding.tid, subscription.subscription_uuid, subscription_binding.subscription_binding_uuid, binding.binding_uuid, subscription_binding.start_time, subscription_binding.end_time, subscription_binding.timezone, now() >= subscription_binding.start_time AND (subscription_binding.end_time IS NULL OR now() <= subscription_binding.end_time) AS active FROM jackpot.binding JOIN jackpot.subscription_binding USING (tid, binding) JOIN jackpot.subscription USING (tid, subscription) where (tid=2082003407) AND (binding_uuid='4f61dcd5-97a0-4098-b9ae-c1546c31b2e6'::uuid) offset 0 limit 1000; QUERY PLAN -- Limit (cost=1.29..25.38 rows=1 width=80) (actual time=0.770..0.771 rows=1 loops=1) -> Nested Loop (cost=1.29..25.38 rows=1 width=80) (actual time=0.770..0.771 rows=1 loops=1) -> Nested Loop (cost=0.86..16.91 rows=1 width=76) (actual time=0.697..0.698 rows=1 loops=1) -> Index Scan using binding_tid_binding_uuid_key on binding (cost=0.43..8.45 rows=1 width=28) (actual time=0.647..0.647 rows=1 loops=1) Index Cond: ((tid = 2082003407) AND (binding_uuid = '4f61dcd5-97a0-4098-b9ae-c1546c31b2e6'::uuid)) -> Index Scan using subscription_binding_idx on subscription_binding (cost=0.43..8.45 rows=1 width=64) (actual time=0.045..0.046 rows=1 loops=1) Index Cond: ((tid = 2082003407) AND (binding = binding.binding)) -> Index Scan using subscription_pkey on subscription (cost=0.43..8.45 rows=1 width=28) (actual time=0.068..0.068 rows=1 loops=1) Index Cond: ((tid = 2082003407) AND (subscription = subscription_binding.subscription)) Planning time: 2429.682 ms Execution time: 0.914 ms (11 rows) Postgres version 9.5.19 Each of the tables has between 3-4 indexes, and all the indexes include tid as first parameter. No partitions, no sign of a stray replication slot / uncommitted transaction / prepared transaction that may be holding up autovac, no sign of bloated indexes. TIA! Best regards, David Wheeler General Manager Bali Office Bali T +62 361 475 2333 M +62 819 3660 9180 E [email protected]<mailto:[email protected]> Jl. Pura Mertasari No. 7, Sunset Road Abian Base Kuta, Badung – Bali 80361, Indonesia http://www.dgitsystems.com<http://www.dgitsystems.com/> [signature_1605257152][signature_1263573595]
Re: Slow planning, fast execution for particular 3-table query
I'm not sure what "unusually large" is, but they're all < 1mb which is a little
larger than some of our other comparable databases (mostly <300kb) but seems
reasonable to me.
Regards,
David
On 4/11/19, 3:37 pm, "Laurenz Albe" wrote:
On Mon, 2019-11-04 at 03:04 +, David Wheeler wrote:
> We’re having trouble working out why the planning time for this
particular query is slow
> (~2.5s vs 0.9ms execution time). As you can see below, there are only 3
tables involved
> so it’s hard to imagine what decisions the planner has to make that take
so long. After
> 5 runs the prepared-statement code kicks in and it becomes quick, but
it’s quite
> infuriating for the first 5 runs given the execution is so much faster.
>
> Are you able to give any tips what might be taking so long (and how we
might improve it)?
>
[...]
> Planning time: 2429.682 ms
>
> Execution time: 0.914 ms
Strange.
Are any of your catalog tables unusually large?
SELECT pg_relation_size(t.oid),
t.relname
FROM pg_class AS t
JOIN pg_namespace AS n ON t.relnamespace = n.oid
WHERE t.relkind = 'r'
ORDER BY pg_relation_size(t.oid) DESC
LIMIT 10;
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Re: Slow planning, fast execution for particular 3-table query
>To see this issue, you have to have recently >inserted or deleted a bunch of extremal values of the indexed join-key >column. And the problem only persists until those values become known >committed-good, or known dead-to-everybody. (Maybe you've got a >long-running transaction somewhere, postponing the dead-to-everybody >condition?) There are no long-running transactions that have backend_xmin set in pg_stat_activity, if that's what you mean here. There are also no open prepared transactions or replication slots which I understand have a similar keeping-things-alive issue. These tables are biggish (hundreds of mb), but not changing so frequently that I'd expect large quantities of data to be inserted or deleted before autovac can get in there and clean it up. And certainly not in a single uncommitted transaction. I'll try reindexing each of the tables just to make sure it's not strange index imbalance or something causing the issue. Regards, David On 4/11/19, 4:01 pm, "Tom Lane" wrote: David Wheeler writes: > We’re having trouble working out why the planning time for this > particular query is slow (~2.5s vs 0.9ms execution time). As you can see > below, there are only 3 tables involved so it’s hard to imagine what > decisions the planner has to make that take so long. I wonder whether this traces to the cost of trying to estimate the largest/smallest value of an indexed column by looking into the index. Normally that's pretty cheap, but if you have a lot of recently-inserted or recently-deleted values at the end of the index, it can get painful. AFAIR this only happens for columns that are equijoin keys, so the fact that your query is a join is significant. I'm not convinced that this is the problem, because it's a corner case that few people hit. To see this issue, you have to have recently inserted or deleted a bunch of extremal values of the indexed join-key column. And the problem only persists until those values become known committed-good, or known dead-to-everybody. (Maybe you've got a long-running transaction somewhere, postponing the dead-to-everybody condition?) > Postgres version 9.5.19 If this *is* the cause, v11 and up have a performance improvement that you need: https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3 regards, tom lane
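For completeness, minimal versions of those three checks (all standard system views, nothing specific to this schema):

SELECT pid, state, xact_start, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start;

SELECT * FROM pg_prepared_xacts;     -- forgotten two-phase transactions
SELECT * FROM pg_replication_slots;  -- slots that can hold back cleanup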
Re: Slow planning, fast execution for particular 3-table query
>> I'll try reindexing each of the tables just to make sure it's not strange >> index imbalance or something causing the issue. > I seen this issue few time - and reindex helps. Alas our reindex doesn’t seem to have helped. I’m going to see if we can reproduce this on a non-prod environment so we can muck about a bit more. If we can reproduce it in a safe place, is there a tool we can use to get more info out of the query planner to find what it’s doing to take so long? Regards, David From: Pavel Stehule Date: Monday, 4 November 2019 at 4:53 pm To: David Wheeler Cc: Tom Lane , "[email protected]" , Cameron Redpath Subject: Re: Slow planning, fast execution for particular 3-table query po 4. 11. 2019 v 6:17 odesílatel David Wheeler mailto:[email protected]>> napsal: >To see this issue, you have to have recently >inserted or deleted a bunch of extremal values of the indexed join-key >column. And the problem only persists until those values become known >committed-good, or known dead-to-everybody. (Maybe you've got a >long-running transaction somewhere, postponing the dead-to-everybody >condition?) There are no long-running transactions that have backend_xmin set in pg_stat_activity, if that's what you mean here. There are also no open prepared transactions or replication slots which I understand have a similar keeping-things-alive issue. These tables are biggish (hundreds of mb), but not changing so frequently that I'd expect large quantities of data to be inserted or deleted before autovac can get in there and clean it up. And certainly not in a single uncommitted transaction. I'll try reindexing each of the tables just to make sure it's not strange index imbalance or something causing the issue. I seen this issue few time - and reindex helps. Pavel Regards, David On 4/11/19, 4:01 pm, "Tom Lane" mailto:[email protected]>> wrote: David Wheeler mailto:[email protected]>> writes: > We’re having trouble working out why the planning time for this > particular query is slow (~2.5s vs 0.9ms execution time). As you can see > below, there are only 3 tables involved so it’s hard to imagine what > decisions the planner has to make that take so long. I wonder whether this traces to the cost of trying to estimate the largest/smallest value of an indexed column by looking into the index. Normally that's pretty cheap, but if you have a lot of recently-inserted or recently-deleted values at the end of the index, it can get painful. AFAIR this only happens for columns that are equijoin keys, so the fact that your query is a join is significant. I'm not convinced that this is the problem, because it's a corner case that few people hit. To see this issue, you have to have recently inserted or deleted a bunch of extremal values of the indexed join-key column. And the problem only persists until those values become known committed-good, or known dead-to-everybody. (Maybe you've got a long-running transaction somewhere, postponing the dead-to-everybody condition?) > Postgres version 9.5.19 If this *is* the cause, v11 and up have a performance improvement that you need: https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3 regards, tom lane
Re: Slow planning, fast execution for particular 3-table query
> Is default_statistics_target set above default 100? I would assume that would
> reflect in the size of pg_statistic, but wanted to ask since increasing that
> from 100 to 1000 was the only time I have seen planning time explode. Are
> other queries slow to plan?

Looks like you've found it! Someone has set the target to 10k so that's going to wildly increase planning time.

Thanks for your help! And thanks to the others who chipped in along the way 😊

Regards,
David
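For anyone else who hits this, a sketch of checking and undoing it, assuming the target was set cluster-wide (100 is simply the stock default):

SHOW default_statistics_target;                    -- was 10000 here
ALTER SYSTEM SET default_statistics_target = 100;
SELECT pg_reload_conf();
ANALYZE;                                           -- rebuild the now much smaller statistics

Per-column settings made with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS are unaffected by the global value and would need to be reset individually.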
Re: Slow planning, fast execution for particular 3-table query
On Thu, 7 Nov 2019 at 11:59, Justin Pryzby wrote: > > On Mon, Nov 04, 2019 at 03:04:45AM +, David Wheeler wrote: > > Postgres version 9.5.19 > > Each of the tables has between 3-4 indexes, and all the indexes include tid > > as first parameter. > But note that index definition will be prohibited since: > > https://www.postgresql.org/docs/9.6/release-9-6.html > |Disallow creation of indexes on system columns, except for OID columns > (David Rowley) > |Such indexes were never considered supported, and would very possibly > misbehave since the system might change the system-column fields of a tuple > without updating indexes. However, previously there were no error checks to > prevent them from being created. David will have meant the user column named "tid" rather than the system column named "ctid". -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Slow performance with trivial self-joins
On Thu, 6 Feb 2020 at 11:12, Adam Brusselback wrote:
> > You can create a library of
> > reusable views that are small, easy-to-understand and readable. Then
> > you build them up into bigger views, and finally query from them. But
> > then you end up with lots of (hidden) self-joins.
>
> I will concur with this use case being pretty common, but also something I
> have actively avoided anywhere performance is important because of the
> lack of this optimization.
>
> Even still, I have 20+ views like that in my database.

I think the best way to push that forward would be to benchmark the proposed patch and see if the overhead of detecting the self-joined relations is measurable with various queries with varying numbers of joins.

It does not sound like it would be a great deal of effort to look through the rangetable for duplicate Oids and only do further processing to attempt self-join removal if there are any. However, if that effort happened to slow down all queries by, say, 5%, then perhaps it would be a bad idea. People's opinions don't really carry much weight in arguments like this. Unbiased and reproducible benchmarks should be used as evidence to support discussion. Doing worst-case and average-case benchmarks initially will save you time, as someone will almost certainly ask if you don't do it.

(I've not been following the thread for the patch)

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pg12 partitions show bad performance vs pg96
On Mon, 9 Mar 2020 at 05:05, Mariel Cherkassky wrote: > PG12 : > Planning Time: 8.157 ms > Execution Time: 2.920 ms > (22 rows) > > > PG96 : > Planning time: 0.815 ms > Execution time: 0.158 ms > (12 rows) 8 ms seems pretty slow to planning that query. Does the planning time drop if you execute this multiple times in the same session? Does the time change if you try again without any foreign keys? The planning time for the partitioned case is certainly going to take longer. Partitioned tables are never going to improve the times of query planning. It's only possible that they'll improve the performance during query execution. For such small fast queries as the ones you've shown, it's important to remember that more complex query plans (ones with more nodes) do lead to longer times for executor startup and shutdown. EXPLAIN (without ANALYZE), will perform query planning and executor startup/shutdown. If you enable \timing on in psql and test the EXPLAIN performance of these queries in each version, then you might get an idea of where the overheads are. Additionally, you're unlikely to see performance improvements with table partitioning unless you're accessing many rows and partitioning allows the data locality of the rows that you are accessing to improve. i.e accesses fewer buffers and/or improves cache hit ratios. In PG12, if the partition pruning can be done during query planning then the planning and executor startup overhead is much lower since there are fewer relations to generate access paths for and fewer nodes in the final plan. This also improves the situation during execution as it means fewer locks to take and fewer nodes to startup/shutdown. > As you can see, the results are better in pg96. This example only shows the > results for a small data set. In bigger data sets I get a bigger diff... Can you share the results of that? > I tried changing many postgresql.conf parameters that were added > (max_workers_per_gather,enable_partitionwise_join and so on..). The former only does anything for parallel queries. None of the plans you've shown are parallel ones. The latter also does not count in this case. It only counts when joining two identically partitioned tables. > I dont understand why in pg12 it scans all the partitions instead of the > relevant one.. If you'd specified a specific "device" in the query SQL, then the query planner would know which partition to scan for that particular device. However, since you're looking up the device in another table and performing a join, the device is only known during query execution. The plan nodes for the non-matching partitions do go through executor startup, but they're not scanned during execution, as you've seen with the "(never executed)" appearing in the EXPLAIN ANALYZE output. Since executor startup occurs before execution, the device you mean is still unknown during executor startup, so the executor must startup the nodes for all partitions that are in the plan. Starting up a plan node is not free, but not really very expensive either. However, the overhead of it might be quite large proportionally in your case since the executor is doing so little work. The most basic guidelines for table partitioning are, don't partition your tables unless it's a net win. If partitioning was always faster, we'd just have designed Postgres to implicitly partition all of your tables for you. There are some other guidelines in [1]. [1] https://www.postgresql.org/docs/12/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES David
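To illustrate the plan-time versus run-time pruning point with the schema from this thread (the literal 50 is just an example device id):

-- device known at plan time: only one hash partition is planned and locked
EXPLAIN ANALYZE
SELECT * FROM iot_data
WHERE device = 50 AND metadata = 50 AND lower(data) LIKE '50';

In the joined form of the query the device is only known at run time, so every partition has to appear in the plan and go through executor startup, with the non-matching ones later shown as "(never executed)".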
Re: pg12 partitions show bad performance vs pg96
On Tue, 10 Mar 2020 at 02:08, Mariel Cherkassky
wrote:
> PG12 - 3 PARTITIONS
>
> QUERY PLAN
> Gather (cost=1002.26..1563512.35 rows=10 width=44) (actual
> time=22306.091..22309.209 rows=0 loops=1)
>Workers Planned: 2
>Workers Launched: 2
>-> Hash Join (cost=2.26..1562511.35 rows=4 width=44) (actual
> time=22299.412..22299.413 rows=0 loops=3)
> Hash Cond: (da_2.device = de.id)
> -> Parallel Append (cost=0.00..1562506.90 rows=814 width=37)
> (actual time=22299.411..22299.411 rows=0 loops=3)
>-> Parallel Seq Scan on iot_data_2 da_2
> (cost=0.00..584064.14 rows=305 width=37) (actual time=9076.535..9076.535
> rows=0 loops=3)
> Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
> Rows Removed by Filter: 12460009
>-> Parallel Seq Scan on iot_data_1 da_1
> (cost=0.00..504948.69 rows=262 width=36) (actual time=10296.751..10296.751
> rows=0 loops=2)
> Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
> Rows Removed by Filter: 16158316
>-> Parallel Seq Scan on iot_data_0 da (cost=0.00..473490.00
> rows=247 width=37) (actual time=19075.081..19075.081 rows=0 loops=1)
> Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
> Rows Removed by Filter: 30303339
> -> Hash (cost=2.25..2.25 rows=1 width=7) (never executed)
>-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1
> width=7) (never executed)
> Filter: (name = '50a'::text)
> Planning Time: 30.429 ms
> Execution Time: 22309.364 ms
> (20 rows)
From what I can work out, the DDL you used here is:
-- you didn't seem to specify the DDL for iot_device, so I used:
create table iot_device (
id bigint primary key,
name text not null
);
insert into iot_device select x,x::Text || 'a' from generate_Series(1,100) x;
create table iot_data(id serial ,data text,metadata bigint,device
bigint references iot_device(id),primary key(id,device)) partition by
hash(device);
create table iot_data_0 partition of iot_data for values with (MODULUS
3, remainder 0);
create table iot_data_1 partition of iot_data for values with (MODULUS
3, remainder 1);
create table iot_data_2 partition of iot_data for values with (MODULUS
3, remainder 2);
insert into iot_data select
generate_series(1,1000),random()*10,random()*254,random()*99+1;
create index on iot_data(metadata,lower(data));
vacuum analyze iot_data;
In which case, you're getting a pretty different plan than I am. (I
admit that I only tried on current master and not PG12.2, however, I
see no reason that PG12.2 shouldn't produce the same plan)
I get:
# explain analyze select * from iot_data da,iot_device de where
de.name in ('50a') and de.id = da.device and da.metadata=50 and
lower(da.data) like '50';
QUERY PLAN
Nested Loop (cost=0.56..28.04 rows=1 width=49) (actual
time=0.058..0.058 rows=0 loops=1)
Join Filter: (da.device = de.id)
-> Seq Scan on iot_device de (cost=0.00..2.25 rows=1 width=11)
(actual time=0.013..0.016 rows=1 loops=1)
Filter: (name = '50a'::text)
Rows Removed by Filter: 99
-> Append (cost=0.56..25.76 rows=3 width=38) (actual
time=0.040..0.040 rows=0 loops=1)
-> Index Scan using iot_data_0_metadata_lower_idx on
iot_data_0 da_1 (cost=0.56..8.58 rows=1 width=38) (actual
time=0.020..0.020 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
Filter: (lower(data) ~~ '50'::text)
-> Index Scan using iot_data_1_metadata_lower_idx on
iot_data_1 da_2 (cost=0.56..8.58 rows=1 width=38) (actual
time=0.010..0.010 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
Filter: (lower(data) ~~ '50'::text)
-> Index Scan using iot_data_2_metadata_lower_idx on
iot_data_2 da_3 (cost=0.56..8.58 rows=1 width=38) (actual
time=0.009..0.009 rows=0 loops=1)
Index Cond: ((metadata = 50) AND (lower(data) = '50'::text))
Filter: (lower(data) ~~ '50'::text)
Planning Time: 0.280 ms
Execution Time: 0.094 ms
(17 rows)
Are you certain that you added an index on iot_data (metadata, lower(data)); ?
Re: Duplicate WHERE condition changes performance and plan
On Thu, 16 Apr 2020 at 07:56, [email protected] wrote:
> We have an odd issue where specifying the same where clause twice causes PG
> to pick a much more efficient plan. We would like to know why.
> The EXPLAIN ANALYZE for both queries can be found here:-
> Query A: https://explain.depesz.com/s/lFuy
> Query B: https://explain.depesz.com/s/Jqmv

This is basically down to just a poor join selectivity estimation. The duplicate not-equal clause is not removed by the planner, and its selectivity is taken into account twice, which lowers the row estimate for the table named "problem". With that lower estimate, the query planner thinks a nested loop will be the cheaper plan, which in this case it appears to be.

Join selectivity estimations can use the most-common-values lists, as you may see if you look at the pg_stats view for the tables and columns involved in the join condition. Perhaps ID columns are not good candidates to get an MCV list in the stats. In that case, the ndistinct estimate will be used. If there's no MCV list in the stats then check that ndistinct is reasonably accurate. If there is an MCV list, then you can make that bigger by increasing the statistics targets on the join columns and running ANALYZE. Note: planning can become slower when you increase the statistics targets.

Starting with PostgreSQL 9.6, foreign keys are also used to help with join selectivity estimations. I see you have a suitable foreign key from the schema you posted. You might want to add that to the list of reasons to upgrade.

David
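A sketch of the statistics tweak described above; the column name is a placeholder and should be whichever column of "problem" actually appears in the join condition:

ALTER TABLE problem ALTER COLUMN related_id SET STATISTICS 1000;
ANALYZE problem;

-- then look at what the planner has to work with:
SELECT attname, n_distinct, most_common_vals IS NOT NULL AS has_mcv
FROM pg_stats
WHERE tablename = 'problem' AND attname = 'related_id';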
