PostgreSQL performance problem moving from 9.6.17 to 12.3

2020-05-28 Thread Kenneth Marshall
Hi PostgreSQL community,

I have a system that was running version 9.6.17 running on a system with
48gb of memory and spinning disks front-ed by a HW RAID controller with
NVRAM cache. We moved to a new box running version 12.3 on a system with
64gb of memory and NVME SSD drives. Here are the system config options:

OLD:
shared_buffers = 2048MB # min 128kB
work_mem = 128MB# min 64kB
maintenance_work_mem = 1024MB   # min 1MB
effective_io_concurrency = 8# 1-1000; 0 disables prefetching
max_parallel_workers_per_gather = 0 # taken from max_worker_processes
effective_cache_size = 24GB
default_statistics_target = 500 # range 1-1
from_collapse_limit = 30
join_collapse_limit = 30# 1 disables collapsing of explicit
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 4.0  # same scale as above

NEW:
shared_buffers = 12GB   # min 128kB
work_mem = 128MB# min 64kB
maintenance_work_mem = 2GB  # min 1MB
effective_io_concurrency = 200  # 1-1000; 0 disables prefetching
max_worker_processes = 24   # (change requires restart)
max_parallel_workers_per_gather = 4 # taken from max_parallel_workers
max_parallel_workers = 24   # maximum number of 
max_worker_processes that
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 1.1  # same scale as above for SSDs
effective_cache_size = 36GB
default_statistics_target = 500 # range 1-1
from_collapse_limit = 30
join_collapse_limit = 30# 1 disables collapsing of explicit

As far as the schema goes, it uses an id field populated by a sequence as
the primary key for everything. Here are the definitions for the tables
involved in the query:

  Table "public.users"
   Column|Type | 
Modifiers  
-+-+
 id  | integer | not null default 
nextval(('users_id_seq'::text)::regclass)
 name| character varying(200)  | not null
 password| character varying(256)  | 
 comments| text| 
 signature   | text| 
 emailaddress| character varying(120)  | 
 freeformcontactinfo | text| 
 organization| character varying(200)  | 
 realname| character varying(120)  | 
 nickname| character varying(16)   | 
 lang| character varying(16)   | 
 gecos   | character varying(16)   | 
 homephone   | character varying(30)   | 
 workphone   | character varying(30)   | 
 mobilephone | character varying(30)   | 
 pagerphone  | character varying(30)   | 
 address1| character varying(200)  | 
 address2| character varying(200)  | 
 city| character varying(100)  | 
 state   | character varying(100)  | 
 zip | character varying(16)   | 
 country | character varying(50)   | 
 timezone| character varying(50)   | 
 creator | integer | not null default 0
 created | timestamp without time zone | 
 lastupdatedby   | integer | not null default 0
 lastupdated | timestamp without time zone | 
 authtoken   | character varying(16)   | 
 smimecertificate| text| 
Indexes:
"users_pkey" PRIMARY KEY, btree (id) CLUSTER
"users1" UNIQUE, btree (lower(name::text))
"users2" btree (lower(emailaddress::text))
"users_email_trgm" gin (emailaddress gin_trgm_ops)

Table "public.principals"
Column | Type  |Modifiers   
 
---+---+-
 id| integer   | not null default 
nextval(('principals_id_seq'::text)::regclass)
 principaltype | character varying(16) | not null
 disabled  | smallint  | not null default 0
Indexes:
"principals_pkey" PRIMARY KEY, btree (id) CLUSTER

   Table "public.cachedgroupmembers"
  Column   |   Type   |Modifiers

---+--+-
 id| integer  | not null default 
nextval(('cachedgroupmembers_id_seq'

Re: PostgreSQL performance problem moving from 9.6.17 to 12.3

2020-05-28 Thread Tom Lane
Kenneth Marshall  writes:
> I have a system that was running version 9.6.17 running on a system with
> 48gb of memory and spinning disks front-ed by a HW RAID controller with
> NVRAM cache. We moved to a new box running version 12.3 on a system with
> 64gb of memory and NVME SSD drives. Here are the system config options:

> OLD:
> shared_buffers = 2048MB # min 128kB
> work_mem = 128MB# min 64kB
> maintenance_work_mem = 1024MB   # min 1MB
> effective_io_concurrency = 8# 1-1000; 0 disables prefetching
> max_parallel_workers_per_gather = 0 # taken from max_worker_processes
> effective_cache_size = 24GB
> default_statistics_target = 500 # range 1-1
> from_collapse_limit = 30
> join_collapse_limit = 30# 1 disables collapsing of explicit
> seq_page_cost = 1.0   # measured on an arbitrary scale
> random_page_cost = 4.0# same scale as above

> NEW:
> shared_buffers = 12GB   # min 128kB
> work_mem = 128MB# min 64kB
> maintenance_work_mem = 2GB  # min 1MB
> effective_io_concurrency = 200  # 1-1000; 0 disables prefetching
> max_worker_processes = 24   # (change requires restart)
> max_parallel_workers_per_gather = 4 # taken from max_parallel_workers
> max_parallel_workers = 24   # maximum number of 
> max_worker_processes that
> seq_page_cost = 1.0 # measured on an arbitrary scale
> random_page_cost = 1.1  # same scale as above for SSDs
> effective_cache_size = 36GB
> default_statistics_target = 500 # range 1-1
> from_collapse_limit = 30
> join_collapse_limit = 30# 1 disables collapsing of explicit

Maybe you should be changing fewer variables at one time ...

In particular, decreasing random_page_cost as you've done here is
going to encourage the planner to rely on nestloop-with-inner-indexscan
joins.  Does undoing that change improve matters?

I personally think that v12 is way too enthusiastic about invoking
JIT compilation, too.  You might want to play with the parameters
for that as well.

regards, tom lane