Postgres process count GCC vs Clang is different with autovacuum=on

2021-11-24 Thread hpc researcher_mspk
Software/Hardware used:
===
Postgres: v14
OS: RHEL 8.4
Benchmark: HammerDB v4.3
Hardware used: Apple/AMD Ryzen
RAM size: 256 GB
SSD/HDD: 1 TB
CPU(s): 256 (0-255)
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 8

Command used to count processes: ps -eaf | grep postgres

Case 1: autovacuum=on
vu   GCC  Clang
32   43   42
64   76   74
192  203  202
250  262  262

Case 2: autovacuum=off
vu   GCC  Clang
32   40   40
64   72   72
192  200  200
250  261  263
In Case 1, why is the process count different between Clang and GCC?
Is the PostgreSQL process count dependent on the compiler (GCC/Clang)?
Are there any recommendations or suggestions for investigating this in Postgres v14?


Re: Out of memory error

2021-11-24 Thread Thomas Kellerer
aditya desai schrieb am 24.11.2021 um 08:35:
> Hi Thomas,
> v_message is of composite data type r_log_message and its definition is as
> shown below.
>
> postgres=# \d r_log_message;
>                  Composite type "public.r_log_message"
>    Column    |          Type           | Collation | Nullable | Default
> -------------+-------------------------+-----------+----------+---------
>  column_name | character varying(30)   |           |          |
>  oldvalue    | character varying(4000) |           |          |
>  newvalue    | character varying(4000) |           |          |
>
> Regards,
> Aditya.

Sorry, didn't see that.

Then you need to create records of that type in the array:

   v_message := array[('empName', old.empname, new.empname)::r_log_message,
                      ('location', old.location, new.location)::r_log_message];

or cast the whole array to that type:

   v_message := array[('empName', old.empname, new.empname),
                      ('location', old.location, new.location)]::r_log_message[];
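
Putting it together in the trigger function, a minimal sketch (the trigger
function name log_emp_changes and the RETURN at the end are my assumptions;
insert_info is the function from your other message):

CREATE OR REPLACE FUNCTION log_emp_changes() RETURNS trigger AS $$
DECLARE
    v_message r_log_message[];
BEGIN
    -- build the array of composite records from the OLD and NEW rows
    v_message := array[('empName', old.empname, new.empname),
                       ('location', old.location, new.location)]::r_log_message[];
    -- insert_info() returns 'OK'; PERFORM discards the result
    PERFORM insert_info(v_message);
    RETURN new;
END;
$$ LANGUAGE plpgsql;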


Btw: why don't you use `text` instead of varchar(4000)?




Re: Out of memory error

2021-11-24 Thread Thomas Kellerer
aditya desai schrieb am 24.11.2021 um 08:31:
> H Michael,
> Please see insert_info function below. Also r_log_message is composite data 
> type and it's definition is also given below.
>
> CREATE OR REPLACE FUNCTION insert_info(
>     info_array  r_log_message[]
> ) RETURNS varchar AS $$
>     DECLARE
>         info_element  r_log_message;
>     BEGIN
>         FOREACH info_element IN ARRAY info_array
>         LOOP
>             INSERT INTO testaditya(
>                 columname,
>                 oldvalue,
>                 newvalue
>             ) VALUES(
>                 info_element.column_name,
>                 info_element.oldvalue,
>                 info_element.newvalue
>             );
>         END LOOP;
>         RETURN 'OK';
>     END;
> $$ LANGUAGE plpgsql;

You don't need a loop for that. This can be done more efficiently using unnest()


INSERT INTO testaditya(columname,oldvalue,newvalue)
select u.*
from unnest(info_array) as u;
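
Put together, a sketch of the same function without the loop (based on the
definitions quoted above, untested):

CREATE OR REPLACE FUNCTION insert_info(
    info_array  r_log_message[]
) RETURNS varchar AS $$
BEGIN
    -- expand the array into one row per element and insert them in a single statement
    INSERT INTO testaditya(columname, oldvalue, newvalue)
    SELECT u.column_name, u.oldvalue, u.newvalue
    FROM unnest(info_array) AS u;
    RETURN 'OK';
END;
$$ LANGUAGE plpgsql;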







Re: Postgres process count GCC vs Clang is different with autovacuum=on

2021-11-24 Thread Tomas Vondra




On 11/24/21 13:05, hpc researcher_mspk wrote:

Software/Hardware used:
===
Postgres: v14
OS: RHEL 8.4
Benchmark: HammerDB v4.3
Hardware used: Apple/AMD Ryzen
RAM size: 256 GB
SSD/HDD: 1 TB
CPU(s): 256 (0-255)
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 8

Command used to count processes: ps -eaf | grep postgres

Case 1: autovacuum=on
vu  GCC Clang
32  43  42
64  76  74
192 203 202
250 262 262


Case 2: autovacuum=off
vu  GCC Clang
32  40  40
64  72  72
192 200 200
250 261 263


In Case 1, why is the process count different between Clang and GCC?
Is the PostgreSQL process count dependent on the compiler (GCC/Clang)?


No, it's not. The most likely explanation is that you're seeing a
different number of autovacuum workers. Those are dynamic, i.e. they may
appear and disappear. Or maybe there are more connections to the DB.
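
For example, counting backends by type from inside the database (rather
than grepping ps output) makes the difference visible:

   SELECT backend_type, count(*)
   FROM pg_stat_activity
   GROUP BY backend_type
   ORDER BY backend_type;

Run that under both builds while the benchmark is active and compare the
"autovacuum worker" and "client backend" counts.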



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Out of memory error

2021-11-24 Thread aditya desai
Thanks Thomas!  Sorry to say this but, this was migrated from Oracle to PG
:) and the app team just wants to keep the data type as it is :(

On Wed, Nov 24, 2021 at 5:40 PM Thomas Kellerer  wrote:

> aditya desai schrieb am 24.11.2021 um 08:35:
> > Hi Thomas,
> > v_message is of composite data type r_log_message and it's definition is
> as shown below.
> >
> > postgres=# \d r_log_message;
> >              Composite type "public.r_log_message"
> >    Column    |          Type           | Collation | Nullable | Default
> > -------------+-------------------------+-----------+----------+---------
> >  column_name | character varying(30)   |           |          |
> >  oldvalue    | character varying(4000) |           |          |
> >  newvalue    | character varying(4000) |           |          |
> >
> > Regards,
> > Aditya.
>
> Sorry, didn't see that.
>
> Then you need to create records of that type in the array:
>
>v_message := array[('empName', old.empname,
> new.empname)::r_log_message, ('location', old.location,
> new.location)::r_log_message];
>
> or an array of that type:
>
>v_message := array[('empName', old.empname, new.empname), ('location',
> old.location, new.location)]::r_log_message[];
>
>
> Btw: why don't you use `text` instead of varchar(4000).
>
>
>


Re: Need help identifying a periodic performance issue.

2021-11-24 Thread Robert Creager


On Nov 19, 2021, at 11:47 AM, Robert Creager <[email protected]> wrote:



On Nov 18, 2021, at 2:42 PM, Tom Lane <[email protected]> wrote:


Thomas Munro <[email protected]> writes:

This is all guesswork though, since we don't know quite what's
happening on Robert's system.  It might be worth setting
"log_autovacuum_min_duration = 0" (either globally, or as a
reloption on the relevant tables), and seeing if there seems
to be any correlation between autovacuum/autoanalyze activity
and the occurrences of poor plan choices.

Ok, doing a SET plan_cache_mode=force_custom_plan before the COPY and resetting 
it after appears to fix the problem.  We’re going to run it over the weekend to 
make sure.

We are at it again.  I have a DELETE operation that’s taking 48 minutes so far.
I had set plan_cache_mode = force_custom_plan for the entire server before
this happened, as we started seeing the COPY slowdown again.  I have dtrace
information again, but it primarily shows the nested scan operation.
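
Setting it for the entire server amounts to roughly the following (a sketch;
whether it was ALTER SYSTEM or a postgresql.conf edit doesn't matter much,
both need a reload):

   ALTER SYSTEM SET plan_cache_mode = force_custom_plan;
   SELECT pg_reload_conf();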

pid,client_port,runtime,query_start,datname,state,wait_event_type,query,usename
40665,15978,0 years 0 mons 0 days 0 hours 48 mins 49.62347 secs,2021-11-24 
20:13:30.017188 +00:00,tapesystem,active,,DELETE FROM ds3.blob WHERE EXISTS 
(SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = 
$1)),Administrator

So how do we avoid this query plan? Do we need to start doing explicit analyzes 
after every delete?


EXPLAIN DELETE
FROM ds3.blob
WHERE EXISTS(SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND 
(bucket_id = '85b9e793-2141-455c-a752-90c2346cdfe1'));

250k objects in blob
256k objects in s3_object

QUERY PLAN
Delete on blob  (cost=10117.05..16883.09 rows=256002 width=12)
  ->  Hash Join  (cost=10117.05..16883.09 rows=256002 width=12)
        Hash Cond: (blob.object_id = s3_object.id)
        ->  Seq Scan on blob  (cost=0.00..6094.02 rows=256002 width=22)
        ->  Hash  (cost=6917.02..6917.02 rows=256002 width=22)
              ->  Seq Scan on s3_object  (cost=0.00..6917.02 rows=256002 width=22)
                    Filter: (bucket_id = '8a988c6c-ef98-465e-a148-50054c739212'::uuid)

’Normal’ explain, with very few objects in that bucket.

QUERY PLAN
Delete on blob  (cost=0.71..6.76 rows=1 width=12)
  ->  Nested Loop  (cost=0.71..6.76 rows=1 width=12)
        ->  Index Scan using ds3_s3_object__bucket_id on s3_object  (cost=0.29..2.31 rows=1 width=22)
              Index Cond: (bucket_id = '85b9e793-2141-455c-a752-90c2346cdfe1'::uuid)
        ->  Index Scan using ds3_blob__object_id on blob  (cost=0.42..4.44 rows=1 width=22)
              Index Cond: (object_id = s3_object.id)




Re: Need help identifying a periodic performance issue.

2021-11-24 Thread Robert Creager
I forgot, I had reloaded postgres, but had not re-started our app, so the 
connections wouldn’t have that plan setting on them. Re-doing now.

On Nov 24, 2021, at 2:13 PM, Robert Creager <[email protected]> wrote:



On Nov 19, 2021, at 11:47 AM, Robert Creager <[email protected]> wrote:



On Nov 18, 2021, at 2:42 PM, Tom Lane <[email protected]> wrote:


Thomas Munro <[email protected]> writes:

This is all guesswork though, since we don't know quite what's
happening on Robert's system.  It might be worth setting
"log_autovacuum_min_duration = 0" (either globally, or as a
reloption on the relevant tables), and seeing if there seems
to be any correlation between autovacuum/autoanalyze activity
and the occurrences of poor plan choices.

Ok, doing a SET plan_cache_mode=force_custom_plan before the COPY and resetting 
it after appears to fix the problem.  We’re going to run it over the weekend to 
make sure.

We are at it again.  I have a DELETE operation that’s taking 48 minutes so far. 
 I had set plan_cache_mode = force_custom_plan for the entire server before 
this happened, as we started seeing the COPY slowdown again.  I have dtrace 
information again, but primarily shows the nested scan operation.

pid,client_port,runtime,query_start,datname,state,wait_event_type,query,usename
40665,15978,0 years 0 mons 0 days 0 hours 48 mins 49.62347 secs,2021-11-24 
20:13:30.017188 +00:00,tapesystem,active,,DELETE FROM ds3.blob WHERE EXISTS 
(SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = 
$1)),Administrator

So how do we avoid this query plan? Do we need to start doing explicit analyzes 
after every delete?


EXPLAIN DELETE
FROM ds3.blob
WHERE EXISTS(SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND 
(bucket_id = '85b9e793-2141-455c-a752-90c2346cdfe1'));

250k objects in blob
256k objects in s3_object

QUERY PLAN
Delete on blob  (cost=10117.05..16883.09 rows=256002 width=12)
  ->  Hash Join  (cost=10117.05..16883.09 rows=256002 width=12)
        Hash Cond: (blob.object_id = s3_object.id)
        ->  Seq Scan on blob  (cost=0.00..6094.02 rows=256002 width=22)
        ->  Hash  (cost=6917.02..6917.02 rows=256002 width=22)
              ->  Seq Scan on s3_object  (cost=0.00..6917.02 rows=256002 width=22)
                    Filter: (bucket_id = '8a988c6c-ef98-465e-a148-50054c739212'::uuid)

’Normal’ explain, very few objects with that bucket.

QUERY PLAN
Delete on blob  (cost=0.71..6.76 rows=1 width=12)
  ->  Nested Loop  (cost=0.71..6.76 rows=1 width=12)
        ->  Index Scan using ds3_s3_object__bucket_id on s3_object  (cost=0.29..2.31 rows=1 width=22)
              Index Cond: (bucket_id = '85b9e793-2141-455c-a752-90c2346cdfe1'::uuid)
        ->  Index Scan using ds3_blob__object_id on blob  (cost=0.42..4.44 rows=1 width=22)
              Index Cond: (object_id = s3_object.id)



Re: Need help identifying a periodic performance issue.

2021-11-24 Thread Justin Pryzby
On Wed, Nov 24, 2021 at 10:44:12PM +, Robert Creager wrote:
> I forgot, I had reloaded postgres, but had not re-started our app, so the 
> connections wouldn’t have that plan setting on them. Re-doing now.

Are you sure?  GUC changes should be applied for existing sessions, right?

Would you send the logs surrounding the slow COPY ?
Specifically including the autovacuum logs.

> We are at it again.  I have a DELETE operation that’s taking 48 minutes so 
> far.

Before, you had slow COPY due to FKs.  Now you have a slow DELETE, which you
only alluded to before.

> So how do we avoid this query plan? Do we need to start doing explicit 
> analyzes after every delete?

If your DELETE is deleting the entire table, then I think you should VACUUM
anyway (or else the next inserts will bloat the table).

Or (preferably) use TRUNCATE instead, which will set relpages=0 and (one
supposes) avoid the bad plans.  But read the NOTE about non-MVCC behavior of
TRUNCATE, in case that matters to you.
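
Roughly the two options (a sketch, assuming the whole table is being emptied
and its foreign keys allow a TRUNCATE):

   -- option 1: delete everything, then vacuum/analyze so space is reused and stats are fresh
   DELETE FROM ds3.blob;
   VACUUM ANALYZE ds3.blob;

   -- option 2: truncate, which resets relpages (but is not MVCC-safe)
   TRUNCATE ds3.blob;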

But first, I believe Thomas was suggesting to put plan_cache_mode back to its
default, and (for testing purposes) try issuing DISCARD PLANS.
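
That is, something along these lines in the affected sessions (sketch):

   SET plan_cache_mode = auto;  -- back to the default
   DISCARD PLANS;               -- drop this session's cached plans before the next COPY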

On Fri, Nov 19, 2021 at 10:08:02AM +1300, Thomas Munro wrote:
> Just to understand what's going on, it'd be interesting to know if the
> problem goes away if you *just* inject the DISCARD PLANS statement
> before running your COPYs, but if that doesn't help it'd also be
> interesting to know what happens if you ANALYZE each table after each
> COPY.  Are you running any explicit ANALYZE commands?  How long do
> your sessions/connections live for?

-- 
Justin