Re: When to use PARTITION BY HASH?
On Tue, Jun 2, 2020 at 7:33 PM Justin Pryzby wrote:
> > To: [email protected], [email protected]
>
> Please don't cross post to multiple lists.
>
> On Tue, Jun 02, 2020 at 07:17:11PM +0200, Oleksandr Shulgin wrote:
> > I was reading up on declarative partitioning[1] and I'm not sure what could
> > be a possible application of Hash partitioning.
>
> It's a good question.  See Tom's complaint here:
> https://www.postgresql.org/message-id/31605.1586112900%40sss.pgh.pa.us
>
> It *does* provide the benefit of smaller indexes and smaller tables, which
> might allow seq scans to outperform index scans.
>
> It's maybe only useful for equality conditions on the partition key, and not
> for ranges.  Here, it scans a single partition:
>
> postgres=# CREATE TABLE t(i int) PARTITION BY HASH(i);
> postgres=# CREATE TABLE t1 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 0);
> postgres=# CREATE TABLE t2 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 1);
> postgres=# CREATE TABLE t3 PARTITION OF t FOR VALUES WITH (MODULUS 3, REMAINDER 2);
> postgres=# INSERT INTO t SELECT i%9 FROM generate_series(1,)i; ANALYZE t;
> postgres=# EXPLAIN ANALYZE SELECT * FROM t WHERE i=3;
> Seq Scan on t2  (cost=0.00..75.55 rows= width=4) (actual time=0.021..0.518 rows= loops=1)
>   Filter: (i = 3)
>   Rows Removed by Filter:

I see.  So it works with low cardinality in the partitioned column.  With
high cardinality, an index scan on an unpartitioned table would be
preferable, I guess.

The documentation page I've linked only contains examples around partitioning
BY RANGE.  I believe it'd be helpful to extend it with some meaningful
examples for LIST and HASH partitioning.

Regards,
--
Alex
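Justin's equality-vs-range point can be illustrated with a hedged sketch
(reusing the table t from the quoted session; exact plan shapes will vary by
version and data):

```sql
-- Equality on the hash-partition key lets the planner prune to a single
-- partition, but a range predicate cannot be mapped onto hash values
-- (hashing preserves no ordering), so every partition must be scanned.
EXPLAIN SELECT * FROM t WHERE i = 3;              -- one partition scanned
EXPLAIN SELECT * FROM t WHERE i BETWEEN 2 AND 4;  -- all partitions scanned
```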
Re: When to use PARTITION BY HASH?
On Wed, Jun 03, 2020 at 09:45:48AM +0200, Oleksandr Shulgin wrote:
> I see.  So it works with low cardinality in the partitioned column.  With
> high cardinality an index scan on an unpartitioned table would be
> preferable I guess.
>
> The documentation page I've linked only contains examples around
> partitioning BY RANGE.  I believe it'd be helpful to extend it with some
> meaningful examples for LIST and HASH partitioning.

I agree.  I think it would also be useful to mention the "benefits" which
aren't likely to apply to hash partitioning.

Would you want to propose an example to include?  Eventually it needs to be
submitted as a patch to -hackers.

--
Justin
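As a rough sketch of what such documentation examples might look like (the
table and key names here are invented for illustration, not proposed wording):

```sql
-- LIST partitioning: partitions hold explicitly enumerated key values.
CREATE TABLE orders (region text, total numeric) PARTITION BY LIST (region);
CREATE TABLE orders_emea PARTITION OF orders FOR VALUES IN ('EU', 'ME', 'AF');
CREATE TABLE orders_amer PARTITION OF orders FOR VALUES IN ('NA', 'SA');

-- HASH partitioning: useful when no natural range or list key exists and
-- the goal is simply to spread rows evenly across partitions.
CREATE TABLE accounts (id bigint) PARTITION BY HASH (id);
CREATE TABLE accounts_0 PARTITION OF accounts
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE accounts_1 PARTITION OF accounts
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);
```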
increased max_parallel_workers_per_gather results in fewer workers?
Hi all,
I’ve been experimenting with some performance tuning on a particular query,
and I observed a result that I don’t understand.  I’ve been setting
max_parallel_workers_per_gather to values in the range 1-6 and then running
EXPLAIN ANALYZE to see how much benefit we get from more parallelization.

My data is organized by year, so the year is a parameter in the query’s WHERE
clause.  For my 2018 data, Postgres launches as many workers as
max_parallel_workers_per_gather permits, and the execution time decreases
nicely, from 280 seconds with 1 worker all the way down to 141s with 6
workers.  So far, so good.

When I run the same query for our 2022 data, I get the same behavior
(improvement) for max_parallel_workers_per_gather values of 1-4.  But with
max_parallel_workers_per_gather set to 5 or 6, Postgres only uses 1 worker,
and the execution time increases dramatically, even worse than when I
deliberately limit the number of workers to 1:

- max_parallel_workers_per_gather=1, runtime = 1061s
- max_parallel_workers_per_gather=2, runtime = 770s
- max_parallel_workers_per_gather=3, runtime = 637s
- max_parallel_workers_per_gather=4, runtime = 573s
- max_parallel_workers_per_gather=5, runtime = 1468s
- max_parallel_workers_per_gather=6, runtime = 1469s

Our 2022 data set is several times larger than our 2018 data, so I suspect
some resource is getting exhausted, but I’m not sure what.  So far, this
result has been 100% re-creatable.  I’m on a dedicated test server with 16
virtual CPUs and 128Gb RAM; no one else is competing with me for Postgres
processes.  max_worker_processes and max_parallel_workers are both set to 12.

Can anyone help me understand why this happens, or where I might look for
clues?

Thanks,
Philip
Re: increased max_parallel_workers_per_gather results in fewer workers?
On Wed, Jun 03, 2020 at 04:04:13PM -0400, Philip Semanchuk wrote:
> Can anyone help me understand why this happens, or where I might look for
> clues?

What version postgres ?

Can you reproduce if you do:
ALTER SYSTEM SET max_parallel_workers_per_gather=0; SELECT pg_reload_conf();
.. and then within the session do:
SET max_parallel_workers_per_gather=12;

I guess you should show an explain analyze, specifically "Workers
Planned/Launched", maybe by linking to explain.depesz.com

--
Justin
Re: increased max_parallel_workers_per_gather results in fewer workers?
> On Jun 3, 2020, at 5:15 PM, Justin Pryzby wrote:
>
> On Wed, Jun 03, 2020 at 04:04:13PM -0400, Philip Semanchuk wrote:
>> Can anyone help me understand why this happens, or where I might look for
>> clues?
>
> What version postgres ?

Sorry, I should have posted that in my initial email.

select version();
+-----------------------------------------------------------------------------+
| version                                                                     |
|-----------------------------------------------------------------------------|
| PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit |
+-----------------------------------------------------------------------------+

This is AWS’ version of Postgres 11.6 (“Aurora”) which of course might make a
difference.

> Can you reproduce if you do:
> ALTER SYSTEM SET max_parallel_workers_per_gather=0; SELECT pg_reload_conf();
> .. and then within the session do: SET max_parallel_workers_per_gather=12;

Unfortunately under Aurora I’m not superuser so I can’t run ALTER SYSTEM, but
I can change the config via AWS’ config interface, so I set
max_parallel_workers_per_gather=0 there.

show max_parallel_workers_per_gather
+---------------------------------+
| max_parallel_workers_per_gather |
|---------------------------------|
| 0                               |
+---------------------------------+
SHOW
Time: 0.034s
postgres@philip-2020-05-19-cluster:wylan> SET max_parallel_workers_per_gather=12
SET
Time: 0.028s
postgres@philip-2020-05-19-cluster:wylan> show max_parallel_workers_per_gather
+---------------------------------+
| max_parallel_workers_per_gather |
|---------------------------------|
| 12                              |
+---------------------------------+
SHOW

I then ran the EXPLAIN ANALYZE and got the same slow runtime (1473s) and 1
worker in the EXPLAIN ANALYZE output.

> I guess you should show an explain analyze, specifically "Workers
> Planned/Launched", maybe by linking to explain.depesz.com

Out of an abundance of caution, our company has a policy of not pasting our
plans to public servers.  However, I can confirm that when I set
max_parallel_workers_per_gather > 4 and the runtime increases, this is what’s
in the EXPLAIN ANALYZE output:

Workers Planned: 1
Workers Launched: 1

FWIW, the Planning Time reported in EXPLAIN ANALYZE output doesn’t vary
significantly, only from 411-443ms, and the variation within that range
correlates only very weakly with max_parallel_workers_per_gather.

thank you
Philip
Re: increased max_parallel_workers_per_gather results in fewer workers?
On Wed, Jun 03, 2020 at 06:23:57PM -0400, Philip Semanchuk wrote:
> > On Jun 3, 2020, at 5:15 PM, Justin Pryzby wrote:
> > What version postgres ?
>
> This is AWS’ version of Postgres 11.6 (“Aurora”) which of course might make
> a difference.
>
> > I guess you should show an explain analyze, specifically "Workers
> > Planned/Launched", maybe by linking to explain.depesz.com
>
> Out of an abundance of caution, our company has a policy of not pasting our
> plans to public servers.  However, I can confirm that when I set
> max_parallel_workers_per_gather > 4 and the runtime increases, this is
> what’s in the EXPLAIN ANALYZE output:
>
> Workers Planned: 1
> Workers Launched: 1

Are you referring to a parallel scan/aggregate/hash/??

Are you able to show a plan for a toy query like SELECT count(col) FROM tbl,
preferably including a CREATE TABLE tbl AS ... ; VACUUM ANALYZE tbl;

Are you able to reproduce with an unpatched postgres ?

--
Justin
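A self-contained reproducer along the lines Justin suggests might look like
this (the table name and row count are placeholders, not from the original
thread):

```sql
-- Build a table large enough that the planner should choose a
-- parallel sequential scan, then check the worker counts.
CREATE TABLE tbl AS SELECT g AS col FROM generate_series(1, 10000000) AS g;
VACUUM ANALYZE tbl;
EXPLAIN ANALYZE SELECT count(col) FROM tbl;
-- Compare "Workers Planned" vs. "Workers Launched" in the output.
```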
Re: increased max_parallel_workers_per_gather results in fewer workers?
Hi Philip,

> On 4. Jun 2020, at 00:23, Philip Semanchuk wrote:
>
>> I guess you should show an explain analyze, specifically "Workers
>> Planned/Launched", maybe by linking to explain.depesz.com
>
> Out of an abundance of caution, our company has a policy of not pasting our
> plans to public servers.  However, I can confirm that when I set
> max_parallel_workers_per_gather > 4 and the runtime increases, this is
> what’s in the EXPLAIN ANALYZE output:
>
> Workers Planned: 1
> Workers Launched: 1

Can you please verify the amount of max_parallel_workers and
max_worker_processes?  It should be roughly max_worker_processes >
max_parallel_workers > max_parallel_workers_per_gather, for instance:

max_worker_processes = 24
max_parallel_workers = 18
max_parallel_workers_per_gather = 6

Also, there are more configuration settings related to parallel queries you
might want to look into.  Most notably:

parallel_setup_cost
parallel_tuple_cost
min_parallel_table_scan_size

Especially the last one is a typical dealbreaker; you can try setting it to 0
to begin with.  Good starting values for the others are 500 and 0.1,
respectively.

> FWIW, the Planning Time reported in EXPLAIN ANALYZE output doesn’t vary
> significantly, only from 411-443ms, and the variation within that range
> correlates only very weakly with max_parallel_workers_per_gather.

It can happen that more parallelism does not help the query but slows it down
beyond a specific number of parallel workers.  You can see this in EXPLAIN
when there is, for instance, a BITMAP HEAP INDEX SCAN or similar involved.

Cheers,
Sebastian
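Taken together, a session-level starting point based on Sebastian's suggested
values might look like this (a sketch to experiment with, not tuned advice;
adjust for your own workload):

```sql
SET max_parallel_workers_per_gather = 6;
SET parallel_setup_cost = 500;          -- suggested starter value
SET parallel_tuple_cost = 0.1;          -- suggested starter value
SET min_parallel_table_scan_size = 0;   -- remove the table-size threshold
```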
Re: increased max_parallel_workers_per_gather results in fewer workers?
Hi,
on top of the settings that Sebastian suggested, you can also try disabling
the participation of the leader (i.e. the main backend process for your
connection) in the distribution of the parallel workload:

SET parallel_leader_participation TO false

Depending on your workload, the leader could be saturated if it has to do a
share of the workload and aggregate the results of all the workers.

Cheers
Luis
