Reading execution plan - first row time vs last row time

2024-10-01 Thread Pecsök Ján
Dear community,

We see a significant difference between the actual time reported in the first
line of an EXPLAIN ANALYZE execution plan and the Execution Time reported in
its last line. What can be the reason?

For example, first line of execution plan:
Gather  (cost=1038.49..257627100.42 rows=794326 width=104) (actual 
time=531.925..3673159.806 rows=7943221053 loops=1)

This is 3673159 / 1000 / 60 ≈ 61 minutes.

And the last 2 lines of the execution plan:
Planning Time: 3000.399 ms
Execution Time: 21208130.602 ms

This is 21208130 / 1000 / 60 ≈ 353 minutes.


What happened in the remaining 353 - 61 = 292 minutes? We suspect disk writes:
the query probably wrote about 0.75 TB of data, since it is a CREATE TABLE AS
SELECT.


Best regards,
Ján Pecsők

Tychonova 2, 160 00 Praha
Mobil: +420 775 942 871
jan.pec...@profinit.eu
www.profinit.eu



Re: How to handle "could not find function xml_is_well_formed" when restoring database in Version 17

2024-10-01 Thread Christoph Moench-Tegeder
## George Weaver (gwea...@shaw.ca):

> I am testing upgrading from Version 13 to Version 17.  I am getting
> the following error when trying to restore a database in Version 17
> (the database was backed up from Version 13 using the Version 17
> pg_dump):
> 
>pg_Restore: error: could not execute query: ERROR:  could not find
>function "xml_is_well_formed" in file "C:/Program
>Files/PostgreSQL/17/lib/pgxml.dll"
>Command was: CREATE FUNCTION public.xml_is_well_formed(text) RETURNS
>boolean
>     LANGUAGE c IMMUTABLE STRICT
>     AS '$libdir/pgxml', 'xml_is_well_formed';
> 
> The only reference I can find to xml_is_well_formed in the Release
> Notes (Version 15) is:

That's it.
Your best option is to get rid of that module in the "old"
(PostgreSQL 13) database - maybe you can just drop the extension
outright (because it's not really used anyway), or maybe you need
to fix your code to use the new API. The minimalist workaround
would be to exclude the function public.xml_is_well_formed() from
the backup (either at backup or at restore time) and point
your code to pg_catalog.xml_is_well_formed().
The fact that you are seeing that CREATE FUNCTION and not only
a CREATE EXTENSION would indicate that the function was created
manually and not as part of the extension (but pgxml.so would
match the extension) - or this is a left-over from pre-extension
days (CREATE EXTENSION was added in 9.1, so...).
I'd strongly advise cleaning up the old database (if required,
migrate away from the xml2 extension/functions and drop the
extension and any manually created function referencing it);
it will save you a lot of headache later on.
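
A minimal sketch of that workaround (names as above; only run the DROP once
you have verified nothing of yours depends on the loose wrapper):

    -- The core replacement has been available since 9.1:
    SELECT pg_catalog.xml_is_well_formed('<a>ok</a>');      -- true
    SELECT pg_catalog.xml_is_well_formed('<a>not closed');  -- false

    -- In the old (v13) database the loose wrapper can then be dropped,
    -- so it never shows up in the dump at all:
    DROP FUNCTION IF EXISTS public.xml_is_well_formed(text);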

Regards,
Christoph

-- 
Spare Space




How to handle "could not find function xml_is_well_formed" when restoring database in Version 17

2024-10-01 Thread George Weaver

Good afternoon,

I am testing upgrading from Version 13 to Version 17.  I am getting the 
following error when trying to restore a database in Version 17 (the 
database was backed up from Version 13 using the Version 17 pg_dump):


   pg_Restore: error: could not execute query: ERROR:  could not find
   function "xml_is_well_formed" in file "C:/Program
   Files/PostgreSQL/17/lib/pgxml.dll"
   Command was: CREATE FUNCTION public.xml_is_well_formed(text) RETURNS
   boolean
    LANGUAGE c IMMUTABLE STRICT
    AS '$libdir/pgxml', 'xml_is_well_formed';

The only reference I can find to xml_is_well_formed in the Release Notes 
(Version 15) is:


   Remove xml2's xml_is_well_formed() function (Tom Lane)

   This function has been implemented in the core backend since
   Postgres 9.1.

How do I address this in restoring the backup to 17?

Thanks,

George




Re: How to handle "could not find function xml_is_well_formed" when restoring database in Version 17

2024-10-01 Thread Tom Lane
George Weaver  writes:
> I am testing upgrading from Version 13 to Version 17.  I am getting the 
> following error when trying to restore a database in Version 17 (the 
> database was backed up from Version 13 using the Version 17 pg_dump):

> pg_Restore: error: could not execute query: ERROR:  could not find
> function "xml_is_well_formed" in file "C:/Program
> Files/PostgreSQL/17/lib/pgxml.dll"
> Command was: CREATE FUNCTION public.xml_is_well_formed(text) RETURNS
> boolean
>      LANGUAGE c IMMUTABLE STRICT
>      AS '$libdir/pgxml', 'xml_is_well_formed';

What you appear to have here is a pre-9.1 version of the xml2
extension.  That is so old that you're going to have difficulty
modernizing it.  We used to provide scripts for converting those
loose objects into extensions, but we got rid of them in v13,
figuring that after ten years their usefulness had passed.

I think what you will have to do is manually drop all the xml2
functions (look for pg_proc entries with '$libdir/pgxml' in probin)
from the v13 database, then upgrade, then install the xml2 extension
if you still want it.  Fortunately that module only provided functions
not datatypes, so this shouldn't be too painful.  (Another way could
be to manually remove those CREATE FUNCTION commands from the dump
script.)
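
A minimal sketch of that lookup, run against the v13 database (the DROP is
only an illustration; adjust it to whatever the query actually returns):

    -- List loose functions still bound to the old pgxml library.
    SELECT p.oid::regprocedure AS function_signature
    FROM   pg_proc p
    WHERE  p.probin LIKE '%pgxml%';

    -- Then drop each one before upgrading, e.g.:
    -- DROP FUNCTION public.xml_is_well_formed(text);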

I'm betting that this database has a lot of other deferred
maintenance that you ought to think about while you're at it.
If there are any other old-style extensions in there, better
fix them up.

regards, tom lane




Re: Question on session timeout

2024-10-01 Thread Adrian Klaver

On 9/30/24 22:57, sud wrote:

> On Tue, Oct 1, 2024 at 4:10 AM Adrian Klaver wrote:
>
> > Where are you getting the ~2000 count from?
>
> Seeing this in the "performance insights" dashboard and also its
> matching when I query the count of sessions from pg_stat_activity.


From your OP:

"... whereas the total number of active sessions in pg_stat_activity 
staying <100 at any point in time."


So:

1) They don't match.

2) The "performance insights" dashboard does not actually tell us
anything by itself. What program is it, and what is it actually measuring?




> > What do you mean by ~120 sets, in particular what is a set?
>
> These are the values set as mentioned in the properties file which the
> application team uses for connection pooling.


Again this does not tell us anything.

1) A set of what?

2) What properties file?



> > Using what pooler?
>
> I need to check on this. It is a Java application (JDBC driver for
> connecting to the DB), so I thought it must be using standard connection
> pooling. Will double check.


Since pooling is what you are concerned with, this should be the 
starting point of your investigation.




> > How often do the keep alive queries run?
>
> Need to check. But I am not sure, in general, if these "keep alive"
> queries are used for keeping a transaction alive or a session alive?


With an idle_in_transaction_session_timeout of 24 hrs I don't see that 
it makes a difference.




> As you described, a long time open transaction with a session state as
> "idle" will be threatening as that will cause locking and "transaction

That would be idle_in_transaction.

> ID wrap around" issues to surface, whereas having "idle sessions" of a
> closed transaction may not cause any issue as they will do no harm. Does
> it mean we can have any number of idle sessions, or should we also have
> some non-zero timeout set for the "idle_session_timeout" parameter
> too (maybe ~1hr or so)?


Other than that they take up connections.
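
If you do decide on timeouts, a minimal sketch (the values are only examples,
and idle_session_timeout needs PostgreSQL 14 or later):

    -- Close sessions that sit inside an open transaction for too long:
    ALTER SYSTEM SET idle_in_transaction_session_timeout = '15min';
    -- Close completely idle sessions after an hour (PostgreSQL 14+):
    ALTER SYSTEM SET idle_session_timeout = '1h';
    SELECT pg_reload_conf();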



> Is it correct to assume the sessions in pg_stat_activity with a very old
> XACT_START are the ones which are part of a long running open
> transaction (i.e. governed by idle_in_transaction_session_timeout), whereas
> the ones with an older BACKEND_START or QUERY_START are just
> idle sessions (governed by idle_session_timeout) not tied to any
> open transaction?


I would read the descriptions here:

https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW
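
A rough sketch of the distinction you are asking about, using the columns
documented there (illustrative only):

    SELECT pid,
           state,
           now() - xact_start    AS xact_age,   -- non-NULL only while a transaction is open
           now() - query_start   AS query_age,
           now() - backend_start AS session_age
    FROM   pg_stat_activity
    WHERE  state IN ('idle', 'idle in transaction')
    ORDER  BY xact_start;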




> A few observations:
>
> I do see "MaximumUsedTransactionIDs" staying consistently at ~200M for a
> long time and then coming down. That matches
> "autovacuum_freeze_max_age", which is set to 200M. I hope that is expected.
> We have max_connections set to 5000.
>
> "Database connections" touching ~2000 and then coming down to 200. We
> see uneven spikes in those; it seems to match the pattern of
> errors occurring during the insert queries which are
> submitted by the Java application to insert the data into the tables.




This would have been a good thing to lead with.

--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Question on session timeout

2024-10-01 Thread Greg Sabino Mullane
On Tue, Oct 1, 2024 at 1:57 AM sud  wrote:

> *Where are you getting the ~2000  count from?*
> Seeing this in the "performance insights" dashboard and also its matching
> when I query the count of sessions from pg_stat_activity.
>

So I'm guessing this is perhaps RDS or Aurora? Stating that up front can be
helpful.


> As you described, a long time open transaction with a session state as
> "idle" will be threatening as that will cause locking
>

No, idle is fine, "idle in transaction" is bad. :)

> Is it correct to assume the session in pg_stat_activity with very old
> XACT_START are the one which are part of long running
>


You need to look at the "state" column as your primary bit of information.
Second most important is how long something has been in that state, which
you can find with now() - state_change. The best way to learn all of this
is to open a few concurrent sessions in psql and experiment.
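
For example, a quick per-state summary (a sketch, not tied to any particular
dashboard):

    SELECT state,
           count(*)                  AS sessions,
           max(now() - state_change) AS longest_in_state
    FROM   pg_stat_activity
    GROUP  BY state
    ORDER  BY sessions DESC;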


> We have max_connections set as 5000.
>

That's quite high. But if you never reach that high, it doesn't matter a
whole lot.

> "Database connection" touching ~2000 then coming down till 200. And we see
> uneven spikes in those, it seems to be matching with the pattern , when we
> have some errors occurring during the insert queries which are submitted by
> the Java application to insert the data into the tables.
>

(What sort of errors?) 2000 is high. Clearly, you are not pooling
connections, or not pooling them well. If you are using plain Postgres,
look into setting up pgbouncer. If using something managed (e.g. RDS) look
into their particular pooling solution. Or fix your application-level
pooling.

Cheers,
Greg


Re: Reading execution plan - first row time vs last row time

2024-10-01 Thread Tom Lane
=?iso-8859-2?Q?Pecs=F6k_J=E1n?=  writes:
> We see a significant difference between the actual time reported in the first
> line of an EXPLAIN ANALYZE execution plan and the Execution Time reported in
> its last line. What can be the reason?

The time reported for the top query node is just the time taken to
compute all the rows returned by the query.  It doesn't account
for what might be done with the data afterward.  A normal EXPLAIN
just drops that data on the floor, so there's not much time left
unaccounted-for.  But if you're writing the data into a table via
EXPLAIN CREATE TABLE AS, that I/O would be extra.
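
A minimal sketch of the difference (big_table / big_copy are placeholder
names):

    -- Plain EXPLAIN ANALYZE discards the result rows, so the top node's
    -- actual time and the final Execution Time stay close together.
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table;

    -- With CREATE TABLE AS the rows must also be written out; that I/O
    -- shows up only in Execution Time, not in the Gather node's time.
    EXPLAIN (ANALYZE, BUFFERS) CREATE TABLE big_copy AS SELECT * FROM big_table;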

regards, tom lane




Re: Help with query optimizer

2024-10-01 Thread Laurenz Albe
On Tue, 2024-10-01 at 09:36 +0100, David Mullineux wrote:
> > On Tue, 1 Oct 2024, 08:53 Siraj G,  wrote:
> > Can you please help with the attached query as it is taking time to
> > execute. The execution plan for the same is below:
> >
> > [EXPLAIN (ANALYZE, BUFFERS) output]
>
> It's usually helpful to give some details on what 'slow' means and then
> define what 'fast enough' means, just so we know when to stop optimizing.
> You should also help by giving the DDL of those tables and any indexes
> they have, etc.

Table definitions and the actual query text would be most helpful, yes.

The PostgreSQL version is essential too.

> The plan shows me that there are expensive seq scans on 2 tables:
> conversations and tasks.

Those sequential scans are quite cheap; there is nothing to optimize.
The time is spent in two index scans that are repeated a lot, because they
are on the inner side of a nested loop:

- the index scan on "messages" is repeated 53708 times
- the index scan on "mailMessages" is repeated 173750 times


Yours,
Laurenz Albe




Re: Reading execution plan - first row time vs last row time

2024-10-01 Thread Adrian Klaver

On 10/1/24 06:53, Pecsök Ján wrote:

> Dear community,
>
> We see a significant difference between the actual time reported in the
> first line of an EXPLAIN ANALYZE execution plan and the Execution Time
> reported in its last line. What can be the reason?
>
> For example, first line of execution plan:
>
> Gather  (cost=1038.49..257627100.42 rows=794326 width=104) (actual
> time=531.925..3673159.806 rows=7943221053 loops=1)
>
> This is 3673159 / 1000 / 60 ≈ 61 minutes.
>
> And the last 2 lines of the execution plan:
>
> Planning Time: 3000.399 ms
> Execution Time: 21208130.602 ms
>
> This is 21208130 / 1000 / 60 ≈ 353 minutes.
>
> What happened in the remaining 353 - 61 = 292 minutes? We suspect disk
> writes: the query probably wrote about 0.75 TB of data, since it is a
> CREATE TABLE AS SELECT.


You will need to show the complete EXPLAIN ANALYZE output as well as the query.




> Best regards,
>
> Ján Pecsők
>
> Tychonova 2, 160 00 Praha
> Mobil: +420 775 942 871
> jan.pec...@profinit.eu
> www.profinit.eu





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Reading execution plan - first row time vs last row time

2024-10-01 Thread Greg Sabino Mullane
On Tue, Oct 1, 2024 at 9:53 AM Pecsök Ján  wrote:

> We see a significant difference between the actual time reported in the first
> line of an EXPLAIN ANALYZE execution plan and the Execution Time reported in
> its last line. What can be the reason?
>
>
>
> For example, first line of execution plan:
>
> Gather  (cost=1038.49..257627100.42 rows=794326 width=104) (actual
> time=531.925..3673159.806 rows=7943221053 loops=1)
>

It takes an hour for Postgres to gather up 7.9 BILLION (!) rows, but then
it is going to take additional time to do something with all of those rows.
That's where the rest of your time is going. Seeing your actual query would
be a help. If this is a CTAS (Create Table As Select), then you have an
hour for the 'Select' part, and the rest of your time is in the 'Create
Table' part.

Cheers,
Greg


Re: Suggestion for memory parameters

2024-10-01 Thread Greg Sabino Mullane
On Tue, Oct 1, 2024 at 2:52 AM yudhi s  wrote:

> When I execute the query with explain (analyze, buffers), I see the section
> below in the plan having "sort method" information in three places,
> each showing ~75MB in size, which combined comes to <250MB. So, does that
> mean it's enough to set work_mem to ~250MB for these queries before
> they start?
>

work_mem is set per action, so you don't usually need to combine them.
However, these are parallel workers, so you probably need to account for
the case in which no workers are available, in which case you DO want to
combine the values - but only for parallel workers all doing the same
action.
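
A minimal sketch of re-checking the spill after raising the setting (table
and column names are placeholders, and 256MB is just a round-up of your
~250MB estimate):

    SET work_mem = '256MB';   -- applies per sort/hash node, per worker process
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM big_table ORDER BY some_column;
    -- "Sort Method: external merge  Disk: ..."  -> still spilling
    -- "Sort Method: quicksort  Memory: ..."     -> now fits in work_mem
    RESET work_mem;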


>  But yes, somehow this query finishes in a few seconds when I execute it
> using explain (analyze, buffers), while if I run it without explain it
> runs for ~10 minutes+. My expectation was that doing (explain analyze)
> should actually execute the query fully. Is my understanding correct here,
> and are the disk spilling stats which I am seeing accurate enough to go
> with?
>

Running explain analyze does indeed run the actual query, but it also
throws away the output. It looks like your limit is set to 300,000 rows
(why!??), which could account for some or all of the time taken - to pass
back those rows and for your client to process them. But it's hard to say
if that's the total reason for the difference without more data. It might
help to see the query, but as a rule of thumb, don't use SELECT * and keep
your LIMIT sane - only pull back the columns and rows your application
absolutely needs.

Cheers,
Greg


Re: Help with query optimizer

2024-10-01 Thread David Mullineux
Hi,
It's usually helpful to give some details on what 'slow' means and then
define what 'fast enough' means, just so we know when to stop optimizing.
You should also help by giving the DDL of those tables and any indexes they
have, etc.
The plan shows me that there are expensive seq scans on 2 tables:
conversations and tasks.

So it would be good to get the pg_stats entries for the columns used in
the filters on those tables.

I would then look at what indexes you have on those tables to find out if
some are not being used, and why (there are lots of possible reasons).
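
A rough sketch of both checks (the schema and table names are guesses taken
from the plan; adjust them to your schema):

    -- Planner statistics for the columns filtered on the seq-scanned tables.
    SELECT tablename, attname, null_frac, n_distinct, most_common_vals
    FROM   pg_stats
    WHERE  schemaname = 'public'
      AND  tablename IN ('conversations', 'tasks');

    -- Which indexes on those tables are actually being used?
    SELECT relname, indexrelname, idx_scan
    FROM   pg_stat_user_indexes
    WHERE  relname IN ('conversations', 'tasks')
    ORDER  BY idx_scan;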

thanks

On Tue, 1 Oct 2024, 08:53 Siraj G,  wrote:

> Hello Experts!
>
> Can you please help with the attached query as it is taking time to
> execute. The execution plan for the same is below:
> Thank you!
>
> Nested Loop Left Join  (cost=8176.99..168603.11 rows=1 width=294) (actual
> time=3301.886..3303.715 rows=0 loops=1)
>Join Filter: (conversation.id = "jobReminders"."conversationId")
>Buffers: shared hit=743696 read=78069 dirtied=16 written=10
>I/O Timings: shared read=7224.068 write=0.287
>->  Gather  (cost=8176.99..168472.90 rows=1 width=246) (actual
> time=3301.885..3303.711 rows=0 loops=1)
>  Workers Planned: 2
>  Workers Launched: 2
>  Buffers: shared hit=743696 read=78069 dirtied=16 written=10
>  I/O Timings: shared read=7224.068 write=0.287
>  ->  Nested Loop  (cost=7176.99..167472.80 rows=1 width=246)
> (actual time=3286.866..3286.874 rows=0 loops=3)
>Buffers: shared hit=743696 read=78069 dirtied=16 written=10
>I/O Timings: shared read=7224.068 write=0.287
>->  Nested Loop  (cost=7176.56..61685.83 rows=59320
> width=174) (actual time=68.447..1978.593 rows=57917 loops=3)
>  Buffers: shared hit=284214 read=56135 dirtied=11
> written=9
>  I/O Timings: shared read=4546.889 write=0.242
>  ->  Hash Join  (cost=7176.13..20450.20 rows=15571
> width=142) (actual time=68.196..211.016 rows=17903 loops=3)
>Hash Cond: (conversation."mailBoxId" =
> "mailBox".id)
>Buffers: shared hit=7428 read=11254 dirtied=9
> written=3
>I/O Timings: shared read=85.214 write=0.085
>->  Parallel Hash Join  (cost=6944.87..20143.53
> rows=26942 width=118) (actual time=67.736..197.476 rows=21937 loops=3)
>  Hash Cond: (conversation."taskId" =
> task.id)
>  Buffers: shared hit=6678 read=11254
> dirtied=9 written=3
>  I/O Timings: shared read=85.214
> write=0.085
>  ->  Parallel Seq Scan on conversations
> conversation  (cost=0.00..13029.34 rows=64502 width=102) (actual
> time=0.017..79.191 rows=51677 loops=3)
>Filter: ((NOT "isCompose") AND
> ("parentConversationId" IS NULL) AND (id <>
> '559702f9-55a8-47c7-9b5c-93b29baabf3d'::uuid))
>Rows Removed by Filter: 622
>Buffers: shared hit=3605 read=8608
> dirtied=5 written=3
>I/O Timings: shared read=57.862
> write=0.085
>  ->  Parallel Hash  (cost=6583.67..6583.67
> rows=28896 width=16) (actual time=67.046..67.047 rows=22266 loops=3)
>Buckets: 131072  Batches: 1  Memory
> Usage: 4224kB
>Buffers: shared hit=3073 read=2646
> dirtied=4
>I/O Timings: shared read=27.352
>->  Parallel Seq Scan on tasks task
>  (cost=0.00..6583.67 rows=28896 width=16) (actual time=0.015..48.469
> rows=22266 loops=3)
>  Filter: (("deletedAt" IS
> NULL) AND (type = 'MESSAGE'::enum_tasks_type))
>  Rows Removed by Filter: 32166
>  Buffers: shared hit=3073
> read=2646 dirtied=4
>  I/O Timings: shared
> read=27.352
>->  Hash  (cost=230.80..230.80 rows=37
> width=24) (actual time=0.382..0.383 rows=37 loops=3)
>  Buckets: 1024  Batches: 1  Memory Usage:
> 11kB
>  Buffers: shared hit=690
>  ->  Seq Scan on "mailBoxes" "mailBox"
>  (cost=0.00..230.80 rows=37 width=24) (actual time=0.021..0.367 rows=37
> loops=3)
>Filter: (status = ANY
> ('{ACTIVE,SYNCING}'::"enum_mailBoxes_status"[]))
>Rows Removed by Filter: 27
>Buffers: shared hit=690
>  ->  Index Scan using idx_message_fk_conversation_i