Re: Very expensive update to update a single row

2025-09-23 Thread Sean M
Hi,

My first thought is pending list costs associated with the GIN indexes.
https://pganalyze.com/blog/gin-index

You may be able to use pageinspect's function gin_metapage_info to see what the
https://www.postgresql.org/docs/16/pageinspect.html#PAGEINSPECT-GIN-FUNCS
https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/issues/14386

Are you seeing HOT updates on the good example but not able to get HOT
updates on the bad examples so all indexes need to get updated? You
are updating checkOutByUID and CheckOutDateTime which don't appear to
be indexes so HOT updates should be possible. Lowering the fill factor
on the table may allow more HOT updates. If you change the fill factor
you need to rebuild the table using pg_repack, etc. You could check
for hot updates on the table with something like this.
Select n_tup_upd, n_tup_hot_upd, round(100.0*n_tup_hot_upd /
NULLIF(n_tup_upd,0), 1) as hot_pct
from pg_stat_user_tables
where relname = 'request';

https://www.crunchydata.com/blog/postgres-performance-boost-hot-updates-and-fill-factor
https://pganalyze.com/blog/5mins-postgres-performance-HOT-updates-CLUSTER

HTH.

On Wed, 24 Sept 2025 at 05:41, Dirschel, Steve
 wrote:
>
> Postgres RDS running in AWS.  PostgreSQL 16.8 on aarch64-unknown-linux-gnu, 
> compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-17), 64-bit
>
>
>
> Table definition
>
>
>
> Table: request
>
>   Column  |Type | Collation | Nullable |
> Default
>
> --+-+---+--+
>
> objectid | character(36)   |   | not null |
>
> data | jsonb   |   | not null | 
> '{}'::jsonb
>
> clientid | character(3)|   | not null |
>
> active   | integer |   | not null |
>
> productid| integer |   | not null |
>
> checkoutbyuid| character(100)  |   |  |
>
> checkoutdatetime | timestamp without time zone |   |  |
>
> metadata | jsonb   |   | not null | 
> '{}'::jsonb
>
> search_vector| tsvector|   |  |
>
> requeststate | text|   | not null | 
> 'Active'::text
>
> Indexes:
>
> "requestkey" PRIMARY KEY, btree (objectid, productid)
>
> "idx_request_gin_data" gin (data)
>
> "idx_request_gin_data_i" gin ((data -> 'i'::text))
>
> "idx_request_gin_data_r" gin ((data -> 'r'::text))
>
> "idx_request_gin_data_t" gin ((data -> 't'::text))
>
> "idx_request_gin_meta_identifiers" gin ((metadata -> 'identifiers'::text))
>
> "idx_request_gin_metadata" gin (metadata)
>
> "idx_request_meta_identifiers" btree ((metadata ->> 'identifiers'::text))
>
> Check constraints:
>
> "require_jsonb_data" CHECK (jsonb_typeof(data) = 'object'::text)
>
> "require_jsonb_metadata" CHECK (jsonb_typeof(metadata) = 'object'::text)
>
> Disabled user triggers:
>
> audit_trigger_row AFTER INSERT OR DELETE OR UPDATE ON request FOR EACH 
> ROW EXECUTE FUNCTION audit.audit_event_processor('true')
>
> audit_trigger_stm AFTER TRUNCATE ON request FOR EACH STATEMENT EXECUTE 
> FUNCTION audit.audit_event_processor('true'
>
>
>
> Lots of gin indexes...
>
> Note the 2 triggers are disabled.
>
>
>
> Size of the table, indexes, toast:
>
>
>
> table_name  request
>
> row_estimate227,771
>
> total_bytes 12,952,428,544
>
> index_bytes 6,088,523,776
>
> toast_bytes 6,614,458,368
>
> table_bytes 249,446,400
>
> total12 GB
>
> index5806 MB
>
> toast6308 MB
>
> table   238 MB
>
>
>
>
>
> We are updating some sample rows.  Here are the sizes of the 2 JSONB columns 
> for these sample rows:
>
>
>
> select objectid, pg_column_size(data) data, pg_column_size(metadata) 
> metadata,  clientid
>
>   from request r
>
>  order by  pg_column_size(data)  desc
>
>  limit 4;
>
>
>
> objectid data  metadata   clientid
>
> f9077561-4b40-41f8-8a52-3a2b83014982 22172347   842   VHW 
>< update #1 below
>
> 0103fe5d-7407-45b6-8966-32a8e4f6b67c 17814078   866   OV9 
>< update #2 below
>
> 6009dbfc-8375-48b3-bbf6-921d48aad308 15971425   879   VHW
>
> 3a0201bc-23f0-b648-8408-c21f81f6b974 14690119   785   VHU 
>< update #3 below
>
>
>
>
>
> *** Update the top row above which is the largest for the DATA JSONB column:
>
>
>
> explain(analyze, buffers)
>
> UPDATE request
>
> SET
>
>  checkOutByUID = 'shivatest',
>
>  CheckOutDateTime = '2022-01-10T16:24:10.2900556Z'
>
> WHERE
>
>  objectid = 'f9077561-4b40-41f8-8a52-3a2b83014982';
>
>
>
> Update on request  (cost=0.42..8.44 rows=0 width=0) (actual 
> time=74.678..74.679 rows=0 lo

Re: Indexes on expressions with multiple columns and operators

2025-09-23 Thread Andrei Lepikhov

On 23/9/2025 12:20, Frédéric Yhuel wrote:

On 9/22/25 23:15, Andrei Lepikhov wrote:
It may solve at least one issue with the 'dependencies' statistics: a 
single number describing the dependency between any two values in the 
columns often leads to incorrect estimations, as I see.


For what it's worth, I've never encountered a case in my life as a 
PostgreSQL support engineer where the 'dependency' kind could be useful. 
I only successfully used the 'mcv' kind once (and that was only 
partially successful, as it fixed the estimates but not the plan).Thanks for your feedback!
I also don't think the 'dependencies' statistics are very useful now, 
especially considering how many computational resources it is needed in 
case of multiple columns involved.
But is it the same for the 'distinct' statistics? It seems you should 
love it - the number of groups in GROUP-BY, DISTINCT, and even HashJoin 
should be estimated more precisely, no?


--
regards, Andrei Lepikhov




Very expensive update to update a single row

2025-09-23 Thread Dirschel, Steve
Postgres RDS running in AWS.  PostgreSQL 16.8 on aarch64-unknown-linux-gnu, 
compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-17), 64-bit

Table definition

Table: request
  Column  |Type | Collation | Nullable |
Default
--+-+---+--+
objectid | character(36)   |   | not null |
data | jsonb   |   | not null | 
'{}'::jsonb
clientid | character(3)|   | not null |
active   | integer |   | not null |
productid| integer |   | not null |
checkoutbyuid| character(100)  |   |  |
checkoutdatetime | timestamp without time zone |   |  |
metadata | jsonb   |   | not null | 
'{}'::jsonb
search_vector| tsvector|   |  |
requeststate | text|   | not null | 
'Active'::text
Indexes:
"requestkey" PRIMARY KEY, btree (objectid, productid)
"idx_request_gin_data" gin (data)
"idx_request_gin_data_i" gin ((data -> 'i'::text))
"idx_request_gin_data_r" gin ((data -> 'r'::text))
"idx_request_gin_data_t" gin ((data -> 't'::text))
"idx_request_gin_meta_identifiers" gin ((metadata -> 'identifiers'::text))
"idx_request_gin_metadata" gin (metadata)
"idx_request_meta_identifiers" btree ((metadata ->> 'identifiers'::text))
Check constraints:
"require_jsonb_data" CHECK (jsonb_typeof(data) = 'object'::text)
"require_jsonb_metadata" CHECK (jsonb_typeof(metadata) = 'object'::text)
Disabled user triggers:
audit_trigger_row AFTER INSERT OR DELETE OR UPDATE ON request FOR EACH ROW 
EXECUTE FUNCTION audit.audit_event_processor('true')
audit_trigger_stm AFTER TRUNCATE ON request FOR EACH STATEMENT EXECUTE 
FUNCTION audit.audit_event_processor('true'

Lots of gin indexes...
Note the 2 triggers are disabled.

Size of the table, indexes, toast:

table_name  request
row_estimate227,771
total_bytes 12,952,428,544
index_bytes 6,088,523,776
toast_bytes 6,614,458,368
table_bytes 249,446,400
total12 GB
index5806 MB
toast6308 MB
table   238 MB


We are updating some sample rows.  Here are the sizes of the 2 JSONB columns 
for these sample rows:

select objectid, pg_column_size(data) data, pg_column_size(metadata) metadata,  
clientid
  from request r
 order by  pg_column_size(data)  desc
 limit 4;

objectid data  metadata   clientid
f9077561-4b40-41f8-8a52-3a2b83014982 22172347   842   VHW   
 < update #1 below
0103fe5d-7407-45b6-8966-32a8e4f6b67c 17814078   866   OV9   
 < update #2 below
6009dbfc-8375-48b3-bbf6-921d48aad308 15971425   879   VHW
3a0201bc-23f0-b648-8408-c21f81f6b974 14690119   785   VHU   
 < update #3 below


*** Update the top row above which is the largest for the DATA JSONB column:

explain(analyze, buffers)
UPDATE request
SET
 checkOutByUID = 'shivatest',
 CheckOutDateTime = '2022-01-10T16:24:10.2900556Z'
WHERE
 objectid = 'f9077561-4b40-41f8-8a52-3a2b83014982';

Update on request  (cost=0.42..8.44 rows=0 width=0) (actual time=74.678..74.679 
rows=0 loops=1)
  Buffers: shared hit=2818 read=1 dirtied=1
  I/O Timings: shared read=0.835
  ->  Index Scan using requestkey on request  (cost=0.42..8.44 rows=1 
width=418) (actual time=0.875..0.878 rows=1 loops=1)
Index Cond: (objectid = 'f9077561-4b40-41f8-8a52-3a2b83014982'::bpchar)
Buffers: shared hit=3 read=1
I/O Timings: shared read=0.835
Planning Time: 0.093 ms
Execution Time: 74.707 ms

*** 2818 shared block hits, 1 read, 1 dirtied.  Excellent.


*** Update the next largest row

explain(analyze, buffers) UPDATE request
SET
 checkOutByUID = 'shivatest',
 CheckOutDateTime = '2022-01-10T16:24:10.2900556Z'
WHERE
 objectid = '0103fe5d-7407-45b6-8966-32a8e4f6b67c';

Update on request  (cost=0.42..8.44 rows=0 width=0) (actual 
time=34663.039..34663.040 rows=0 loops=1)
  Buffers: shared hit=10787339 read=79854 dirtied=90531
  I/O Timings: shared read=5083.198
  ->  Index Scan using requestkey on request  (cost=0.42..8.44 rows=1 
width=418) (actual time=0.908..0.912 rows=1 loops=1)
Index Cond: (objectid = '0103fe5d-7407-45b6-8966-32a8e4f6b67c'::bpchar)
Buffers: shared hit=3 read=1
I/O Timings: shared read=0.868
Planning Time: 0.095 ms
Execution Time: 34663.071 ms

Wow-  10 million shared block reads...  90k dirtied...  update it again:  Same 
deal with shared blocks at 10M.  Dirtied came down but shared blocks is still 
extremely high.

Update on request  (cost=0.42..8.44 rows=0 width=0) (actual 
time=27683.165..27683.166 rows=0 lo