Having just received a shiny new dual CPU machine to use as a postgresql server, I'm trying to do some reasonable efforts to configure it correctly. The hard ware has 128 cores, and I am running a VM with Redhat 9 and Postgresql 16.9.
In postgresql.conf I have: max_worker_processes = 90 # (change requires restart) max_parallel_workers_per_gather = 72 # gsh 26 oct 2022 max_parallel_maintenance_workers = 72 # gsh 12 jun 2025 max_parallel_workers = 72 # gsh 12 jun 2025 max_logical_replication_workers = 72 # gsh 12 jun 2025 max_sync_workers_per_subscription = 72 # gsh 12 jun 2025 autovacuum_max_workers = 12 # max number of autovacuum subprocesses When I do a simple count of a large (large being 1.8 Billion entries), I get about 10 workers used. prod_v1_0_0_rc1=# explain (analyze, buffers) select count(*) from gaiadr3.gaia_source; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=14379796.81..14379796.82 rows=1 width=8) (actual time=16702.806..16705.479 rows=1 loops=1) Buffers: shared hit=2507481 -> Gather (cost=14379795.78..14379796.79 rows=10 width=8) (actual time=16702.513..16705.470 rows=11 loops=1) Workers Planned: 10 Workers Launched: 10 Buffers: shared hit=2507481 -> Partial Aggregate (cost=14379785.78..14379785.79 rows=1 width=8) (actual time=16691.820..16691.821 rows=1 loops=11) Buffers: shared hit=2507481 -> Parallel Index Only Scan using gaia_source_nest128 on gaia_source (cost=0.58..13926632.85 rows=181261171 width=0) (actual time=0.025..9559.644 rows=164700888 loops=11) Heap Fetches: 0 Buffers: shared hit=2507481 Planning: Buffers: shared hit=163 Planning Time: 14.898 ms Execution Time: 16705.592 ms Postgres has chosen to use only a small fraction of the CPU's I have on my machine. Given the query returns an answer in about 8 seconds, it may be that Postgresql has allocated the proper number of works. But if I wanted to try to tweak some config parameters to see if using more workers would give me an answer faster, I don't seem to see any obvious knobs to turn. Are there parameters that I can adjust to see if I can increase throughput? Would adjusting parallel_setup_cost or parallel_tuple_cost likely to be of help?