Postgres NOT IN vs NOT EXISTS optimization

2022-06-14 Thread Dirschel, Steve
We are in the process of migrating from Oracle to Postgres and the following 
query does much less work with Oracle vs Postgres.

explain (analyze, buffers)
select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_, favoritegr0_.CHANGED as 
changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as type_dis3_2_,
   favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as 
is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION as 
position7_2_,
   favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID as 
product_9_2_,
   favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE as 
usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
  from cf0.FAVORITE_GROUP favoritegr0_
 where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
   and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView')
   and (favoritegr0_.FAVORITE_GROUP_SID not in
   (select favoriteen1_.FAVORITE_GROUP_SID
  from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
 cross join cf0.CATEGORY_PAGE categorypa2_
 where 
favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
   and categorypa2_.UNIQUE_NAME='Florida'
   and categorypa2_.IS_DELETED=0
   and favoriteen1_.IS_DELETED=0))
   and favoritegr0_.IS_DELETED=0
   and (favoritegr0_.USAGE_TYPE=0 or favoritegr0_.USAGE_TYPE is null)
   and favoritegr0_.PRISM_GUID='ia7448342012ca23eacf87bb0ed56'
order by favoritegr0_.POSITION desc;

Here is the plan in Postgres.  It did 1426 shared block hits.  If you look at 
this plan, it is not pushing the outer row's filtering into the NOT IN subquery; 
it fully resolves that part of the query, driving off UNIQUE_NAME = 'Florida'.


 QUERY PLAN
-
Sort  (cost=5198.22..5198.22 rows=1 width=144) (actual time=6.559..6.560 rows=1 
loops=1)
   Sort Key: favoritegr0_."position" DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=1426
   ->  Index Scan using favorite_group_idx01 on favorite_group favoritegr0_  
(cost=5190.18..5198.21 rows=1 width=144) (actual time=6.514..6.515 rows=1 
loops=1)
 Index Cond: (((prism_guid)::text = 
'ia7448342012ca23eacf87bb0ed56'::text) AND (is_deleted = 0))
 Filter: ((NOT (hashed SubPlan 1)) AND ((usage_type = 0) OR (usage_type 
IS NULL)) AND ('FORMS.WESTLAW'::text = (product_sid)::text) AND 
((product_view)::text = 'DefaultProductView'::text))
 Buffers: shared hit=1423
 SubPlan 1
   ->  Nested Loop  (cost=0.70..5189.90 rows=1 width=33) (actual 
time=6.459..6.459 rows=0 loops=1)
 Buffers: shared hit=1417
 ->  Index Scan using category_page_idx04 on category_page 
categorypa2_  (cost=0.42..5131.71 rows=7 width=33) (actual time=0.035..6.138 
rows=92 loops=1)
   Index Cond: ((unique_name)::text = 'Florida'::text)
   Filter: (is_deleted = 0)
   Buffers: shared hit=1233
 ->  Index Scan using favorite_group_member_idx03 on 
favorite_group_member favoriteen1_  (cost=0.28..8.30 rows=1 width=66) (actual 
time=0.003..0.003 rows=0 loops=92)
   Index Cond: ((category_page_sid)::text = 
(categorypa2_.category_page_sid)::text)
   Filter: (is_deleted = 0)
   Buffers: shared hit=184
Planning Time: 1.624 ms
Execution Time: 6.697 ms

If I compare that to the plan Oracle uses, Oracle pushes the 
favoritegr0_.FAVORITE_GROUP_SID predicate into the NOT IN subquery.  I'm able to 
get a similar plan in Postgres if I change the NOT IN to a NOT EXISTS:

explain (analyze, buffers)
select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_, favoritegr0_.CHANGED as 
changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as type_dis3_2_,
   favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as 
is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION as 
position7_2_,
   favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID as 
product_9_2_,
   favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE as 
usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
  from cf0.FAVORITE_GROUP favoritegr0_
 where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
   and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView')
   and not exists (
   select 'x'
  from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
 cross join cf0.CATEGORY_PAGE categorypa2_
 where favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
   and categorypa2_.UNIQUE_NAME='Florida'
   and categorypa2_.IS_DELETED=0
   and favoriteen1_.IS_DELETED=0
   and favoritegr0_
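Written out in full, the NOT EXISTS predicate I'm describing is essentially the 
following (a sketch-  the correlation column is assumed to be FAVORITE_GROUP_SID, 
matching the NOT IN version above):

   and not exists (
   select 'x'
  from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
 cross join cf0.CATEGORY_PAGE categorypa2_
 where favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
   and categorypa2_.UNIQUE_NAME='Florida'
   and categorypa2_.IS_DELETED=0
   and favoriteen1_.IS_DELETED=0
   and favoriteen1_.FAVORITE_GROUP_SID = favoritegr0_.FAVORITE_GROUP_SID)
   and favoritegr0_.IS_DELETED=0
   and (favoritegr0_.USAGE_TYPE=0 or favoritegr0_.USAGE_TYPE is null)
   and favoritegr0_.PRISM_GUID='ia7448342012ca23eacf87bb0ed56'
 order by favoritegr0_.POSITION desc;

My understanding of why the two forms plan so differently (general Postgres 
behavior, not specific to this schema):  NOT IN has to account for NULLs coming 
back from the subquery, so the planner cannot turn it into an anti-join and 
instead builds the whole subquery result first (the hashed SubPlan above), while 
NOT EXISTS has no NULL issue and can be planned as a correlated anti-join that 
pushes the outer row's FAVORITE_GROUP_SID into the inner index scan.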

Tuning a query with ORDER BY and LIMIT

2022-06-22 Thread Dirschel, Steve
I am fairly new to tuning Postgres queries.  I have a long background tuning 
Oracle queries.

Postgres version 10.11

Here is the DDL for the index the query is using:

create index workflow_execution_initial_ui_tabs
on workflow_execution (workflow_id asc, status asc, result asc, 
completed_datetime desc);


explain (analyze, verbose, costs, buffers, timing, summary, hashes)
select * from workflow_execution
where workflow_id = 14560 and
  status = 'COMPLETED' and
  result in 
('SUCCEEDED','REEXECUTED','ABORTED','DISCONTINUED','FAILED','PARTIAL_SUCCESS')
order by completed_datetime desc limit 50;

--
Limit  (cost=56394.91..56395.04 rows=50 width=1676) (actual 
time=3400.608..3400.622 rows=50 loops=1)
"  Output: execution_id, state_machine_id, workflow_id, started_datetime, 
completed_datetime, status, execution_context_s3_arn, ol_version, 
created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, 
correlation_id, result, state_machine_execution_arn, created_by_id, 
updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, 
acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, 
notification_trigger_workflow_id, acquired_object_name, subscription_guid"
  Buffers: shared hit=142368
  ->  Sort  (cost=56394.91..56432.71 rows=15118 width=1676) (actual 
time=3400.607..3400.615 rows=50 loops=1)
"Output: execution_id, state_machine_id, workflow_id, started_datetime, 
completed_datetime, status, execution_context_s3_arn, ol_version, 
created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, 
correlation_id, result, state_machine_execution_arn, created_by_id, 
updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, 
acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, 
notification_trigger_workflow_id, acquired_object_name, subscription_guid"
Sort Key: workflow_execution.completed_datetime DESC
Sort Method: top-N heapsort  Memory: 125kB
Buffers: shared hit=142368
->  Index Scan using workflow_execution_initial_ui_tabs on 
workflow.workflow_execution  (cost=0.69..55892.70 rows=15118 width=1676) 
(actual time=0.038..2258.579 rows=2634718 loops=1)
"  Output: execution_id, state_machine_id, workflow_id, 
started_datetime, completed_datetime, status, execution_context_s3_arn, 
ol_version, created_datetime, updated_datetime, deleted_millis, 
acquisition_channel_id, correlation_id, result, state_machine_execution_arn, 
created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, 
trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, 
scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, 
subscription_guid"
"  Index Cond: ((workflow_execution.workflow_id = 14560) AND 
((workflow_execution.status)::text = 'COMPLETED'::text) AND 
((workflow_execution.result)::text = ANY 
('{SUCCEEDED,REEXECUTED,ABORTED,DISCONTINUED,FAILED,PARTIAL_SUCCESS}'::text[])))"
  Buffers: shared hit=142368
Planning time: 0.217 ms
Execution time: 3400.656 ms

With Oracle, since the index is on the 3 columns in the WHERE clause and the 
ORDER BY column is in the 4th position, Oracle is able to scan that index and 
stop as soon as it finds the first 50 matching rows.  But as you can see above, 
Postgres finds 2,634,718 rows matching the WHERE clause, sorts them, and then 
returns the first 50 rows.
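One workaround I'm aware of (a sketch only-  I have not run this against the 
real table) is to split the IN list into one ordered, LIMITed scan per result 
value, so each branch can walk the index in completed_datetime order, then merge 
and re-limit:

select *
from (
  (select * from workflow_execution
where workflow_id = 14560 and status = 'COMPLETED' and result = 'SUCCEEDED'
order by completed_datetime desc limit 50)
  union all
  (select * from workflow_execution
where workflow_id = 14560 and status = 'COMPLETED' and result = 'REEXECUTED'
order by completed_datetime desc limit 50)
  -- ... one branch per remaining result value ...
) w
order by completed_datetime desc
limit 50;

Each branch should be able to use the (workflow_id, status, result, 
completed_datetime desc) index with its own LIMIT, so only a handful of rows per 
result value are read.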

I was questioning whether the result IN clause was causing the issue, so I ran 
the query with a single result = value and saw the same behavior:

explain (analyze, verbose, costs, buffers, timing, summary, hashes)
select * from workflow_execution
where workflow_id = 14560 and
  status = 'COMPLETED' and
  result = 'SUCCEEDED'
order by completed_datetime desc limit 50;

Limit  (cost=54268.09..54268.22 rows=50 width=1676) (actual 
time=3372.453..3372.467 rows=50 loops=1)
"  Output: execution_id, state_machine_id, workflow_id, started_datetime, 
completed_datetime, status, execution_context_s3_arn, ol_version, 
created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, 
correlation_id, result, state_machine_execution_arn, created_by_id, 
updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, 
acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, 
notification_trigger_workflow_id, acquired_object_name, subscription_guid"
  Buffers: shared hit=140313
  ->  Sort  (cost=54268.09..54304.46 rows=14547 width=1676) (actual 
time=3372.452..3372.460 rows=50 loops=1)
"Output: execution_id, state_machine_id, workflow_id, started_datetime, 
completed_datetime, status, execution_context_s3_arn, ol_version, 
created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, 
correlation_id, result, state_machine_execution_arn, created_by_id, 
updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, 
acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_

log_min_messages = warning

2022-09-06 Thread Dirschel, Steve
We recently upgraded from postgres 12.8 to 14.3.  We are running Aurora 
Postgres on AWS.

We have procedures that will make calls to RAISE NOTICE to write out messages 
if you interactively call the procedure through psql.  These procedures are 
getting called by pg_cron.

Since upgrading these RAISE NOTICE messages are getting written to the postgres 
log file:

For example:

13:45:00.720882-05
CONTEXT: PL/pgSQL function perf.snap_stats() line 242 at RAISE
2022-09-06 18:45:01 UTC::@:[21535]:LOG: cron job 2: NOTICE: Snapping 
dba_hist_system_event at 2022-09-06 13:45:00.725818-05

log_min_messages is set to warning, so my understanding is that only messages at 
warning, error, log, fatal, or panic should be logged.  Any idea how to 
troubleshoot why these NOTICE messages are getting written to the log file?

Thanks


RE: [EXT] Re: log_min_messages = warning

2022-09-06 Thread Dirschel, Steve
>On 9/6/22 12:07, Dirschel, Steve wrote:
>> We recently upgraded from postgres 12.8 to 14.3.  We are running 
>> Aurora Postgres on AWS.
>> 
>> We have procedures that will make calls to RAISE NOTICE to write out 
>> messages if you interactively call the procedure through psql.  These 
>> procedures are getting called by pg_cron.
>> 
>> Since upgrading these RAISE NOTICE messages are getting written to the 
>> postgres log file:
>> 
>> For example:
>> 
>> 13:45:00.720882-05
>> 
>> CONTEXT: PL/pgSQL function perf.snap_stats() line 242 at RAISE
>> 
>> 2022-09-06 18:45:01 UTC::@:[21535]:LOG: cron job 2: NOTICE: Snapping 
>> dba_hist_system_event at 2022-09-06 13:45:00.725818-05
>> 
>> Log_min_messages is set to warning so my understanding is that should 
>> only log messages at warning, error, log, fatal, or panic.  Any idea 
>> how to troubleshoot why these are getting written to the log file?
>
>
>1) Has log_min_messages been commented out?
>
>2) Was the setting changed from something else?
>
>3) If 2) was the server reloaded/restarted to catch the change?
>
>4) Are you sure you are looking at the correct setting?
>
>Try:
>
>select setting, source, sourcefile, pending_restart from pg_settings where 
>name = 'log_min_messages';
>
>to see if the value is coming from something like an include file.
>
>> 
>> Thanks
>> 
>-- 
>Adrian Klaver
>adrian.kla...@aklaver.com


shgroup02s=> select setting, source, sourcefile, pending_restart from 
pg_settings where name = 'log_min_messages';
 setting | source  | sourcefile | pending_restart
---------+---------+------------+-----------------
 warning | default |            | f
(1 row)

Regards
Steve


RE: [EXT] Re: log_min_messages = warning

2022-09-06 Thread Dirschel, Steve
 We recently upgraded from postgres 12.8 to 14.3.  We are running 
 Aurora Postgres on AWS.

>Aurora Postgres != Postgres.  This doesn't seem like something they'd change, 
>but we can't rule that out entirely.

>>> Try:
>>> select setting, source, sourcefile, pending_restart from pg_settings 
>>> where name = 'log_min_messages';

>> shgroup02s=> select setting, source, sourcefile, pending_restart from
>> pg_settings where name = 'log_min_messages';
>>  setting | source  | sourcefile | pending_restart
>> ---------+---------+------------+-----------------
>>  warning | default |            | f
>> (1 row)

>OK, so log_min_messages has the expected value in your interactive session, 
>and yet evidently not in the sessions running these RAISE NOTICE commands.  
>Maybe you have done ALTER USER SET or ALTER DATABASE SET commands that'd 
>change it in >those sessions?  You could look in the pg_db_role_setting 
>catalog to answer that definitively.

>   regards, tom lane

We have opened a case with AWS on this issue as well since it is Aurora but I 
decided to reach out to the community as well.  

We have pg_cron (which is kicking off the procedures that are executing the 
code that is causing these NOTICE messages to get written to the log file) 
setup with cron.database_name = shgroup02s.  That database is oid 16436.  

shgroup02s=> select * from pg_db_role_setting ;
 setdatabase | setrole |                            setconfig
-------------+---------+--------------------------------------------------------------------
       16401 |       0 | {auto_explain.log_min_duration=-1}
       16401 |      10 | {log_min_messages=panic}
       16436 |       0 | {TimeZone=America/Chicago}
           0 |      10 | {TimeZone=utc,log_statement=all,log_min_error_statement=debug5,log_min_messages=panic,exit_on_error=0,statement_timeout=0,role=rdsadmin,auto_explain.log_min_duration=-1,temp_file_limit=-1,"search_path=pg_catalog, public",pg_hint_plan.enable_hint=off,default_transaction_read_only=off}
(4 rows)

If I login to the shgroup02s database as the postgres user (using psql) and 
interactively call one of the procedures that cron calls, the RAISE NOTICE 
messages do NOT get written to the log.  The messages are displayed on my screen 
but I don't see them getting written to the log.  I see pg_cron has a setting, 
cron.log_min_messages, which is set to warning, so it doesn't seem to come from 
that.
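To see what the cron worker session is actually running with, I'm considering 
scheduling a throwaway job like this (hypothetical job name; RAISE WARNING so the 
message is logged regardless of how NOTICEs are handled):

SELECT cron.schedule('check-lmm', '* * * * *', $cmd$
  DO $body$
  BEGIN
    -- report the effective setting inside the cron worker session
    RAISE WARNING 'cron worker log_min_messages = %',
                  current_setting('log_min_messages');
  END
  $body$;
$cmd$);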

Regards
Steve


RE: [EXT] Re: log_min_messages = warning

2022-09-07 Thread Dirschel, Steve
> "Dirschel, Steve"  writes:
>>  setdatabase | setrole |                            setconfig
>> -------------+---------+--------------------------------------------------------
>>        16401 |       0 | {auto_explain.log_min_duration=-1}
>>        16401 |      10 | {log_min_messages=panic}
>>        16436 |       0 | {TimeZone=America/Chicago}
>>            0 |      10 | {TimeZone=utc,log_statement=all,log_min_error_statement=debug5,log_min_messages=panic,exit_on_error=0,statement_timeout=0,role=rdsadmin,auto_explain.log_min_duration=-1,temp_file_limit=-1,"search_path=pg_catalog, public",pg_hint_plan.enable_hint=off,default_transaction_read_only=off}
>> (4 rows)

>> If I login to the shgroup02s database as postgres user (using psql) 
>> and interactively call a procedure that cron calls which causes the 
>> RAISE NOTICE commands to be written to the log they do NOT get written 
>> to the log when I call the procedure.  The messages will be displayed 
>> on my screen but I don't see them getting written to the log.

>You've evidently done "ALTER USER postgres SET log_min_messages=panic", so the 
>lack of any messages under that userid is unsurprising.  But we're not a lot 
>closer to being sure why it's different for the procedures' normal execution 
>environment.

>At this point it's hard to avoid the conclusion that those procedures'
>session is changing the value itself.  (This is scary, because it implies that 
>you're running those as superuser, which seems like a bad idea.) You might 
>have to enable log_statement = all to verify that.

>   regards, tom lane

I appreciate your feedback Tom.  To simplify this I created this procedure:

create or replace procedure part.test1()
 LANGUAGE plpgsql
 AS $$
 DECLARE

BEGIN

raise notice '* raise notice test *';

END $$;

If I call that through PSQL this is returned to my screen:

shgroup02s=> call part.test1();
NOTICE:  * raise notice test *
CALL

And if I monitor the log file nothing is written to it.  But if I schedule that 
procedure through pg_cron:

SELECT cron.schedule('Minute test', '* * * * *', 'call part.test1()');

If I monitor the log file I see it writing this to the log-  it actually writes 
it out 2 times:

2022-09-07 12:54:33 UTC::@:[21535]:LOG: cron job 6: NOTICE: * raise notice 
test *
CONTEXT: PL/pgSQL function part.test1() line 6 at RAISE
2022-09-07 12:54:33 UTC::@:[21535]:LOG: cron job 6: NOTICE: * raise notice 
test *
CONTEXT: PL/pgSQL function part.test1() line 6 at RAISE

If I create another test procedure:

create or replace procedure part.test2()
 LANGUAGE plpgsql
 AS $$
 DECLARE

BEGIN

raise exception '* raise ERROR test *';

END $$;

When I execute that through PSQL this is returned:

shgroup02s=> call part.test2();
ERROR:  * raise ERROR test *

And in the log file I see this written:

2022-09-07 12:58:00 
UTC:10.210.119.217(42434):postgres@shgroup02s:[15409]:ERROR: * raise ERROR 
test *
2022-09-07 12:58:00 
UTC:10.210.119.217(42434):postgres@shgroup02s:[15409]:CONTEXT: PL/pgSQL 
function part.test2() line 6 at RAISE
2022-09-07 12:58:00 
UTC:10.210.119.217(42434):postgres@shgroup02s:[15409]:STATEMENT: call 
part.test2();

So the raise exception gets written to the log file when I execute it through 
PSQL which I believe is expected.

I believe this clearly points to an issue with pg_cron.  Would you agree?

Regards
Steve




pg_stat_activity.backend_xmin

2022-09-21 Thread Dirschel, Steve
We are troubleshooting an issue where autovacuum is not cleaning up a table.  
The application using this database runs with autocommit turned off.  We can 
see in pg_stat_activity lots of sessions "idle in transaction" even though 
those sessions have not executed any DML-  they have executed selects but no 
DML.  The database's isolation level is set to read committed.

In a test database, if I login through psql, set autocommit off, and issue a 
select, I can see my session in pg_stat_activity has xact_start populated but 
backend_xmin is null.  If I run vacuum against the table I selected from (which 
has garbage rows that need to be cleaned up) it will clean them up.  But if I do 
a "set transaction isolation level repeatable read" and then do the select, 
pg_stat_activity xact_start is populated and backend_xmin is also populated.  In 
a different session, if I delete/insert into the table I selected from and then 
run vacuum against the table, those rows will not get cleaned up, because the 
row versions that need to be cleaned up are newer than the backend_xmin of my 
select session.

That scenario is the scenario we are seeing through the application.

According to the app team they are not aware of their code changing the 
isolation level to repeatable read.  Are there other scenarios where the 
transaction isolation is set to read committed, the session has autocommit off, 
and a select will populate backend_xmin in pg_stat_activity for the session 
that issued the select?

In session #1:

\set AUTOCOMMIT off
Set transaction isolation level repeatable read;
Select * from test1;


In session #2:
Delete from test1;
Insert into test1 values (1);
... do 10 more inserts, delete the rows, do 10 more inserts, delete the rows.
...

vacuum(verbose) test1;
INFO:  vacuuming "public.test1"
INFO:  "test1": found 0 removable, 55 nonremovable row versions in 1 out of 1 
pages
DETAIL:  44 dead row versions cannot be removed yet, oldest xmin: 32708199
There were 172 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "pg_toast.pg_toast_329295"
INFO:  "pg_toast_329295": found 0 removable, 0 nonremovable row versions in 0 
out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 32708199
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

The xmin value of 32708199 is the value in pg_stat_activity.backend_xmin for 
the session that issued the select.  Once I issue a commit in that session and 
then run the vacuum again:

vacuum(verbose) test1;
INFO:  vacuuming "public.test1"
INFO:  "test1": removed 44 row versions in 1 pages
INFO:  "test1": found 44 removable, 11 nonremovable row versions in 1 out of 1 
pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 32708957
There were 172 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "pg_toast.pg_toast_329295"
INFO:  "pg_toast_329295": found 0 removable, 0 nonremovable row versions in 0 
out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 32708957
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
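For reference, a query along these lines (a sketch) shows which sessions 
currently hold a backend_xmin and how old that horizon is:

select pid, usename, state, xact_start, backend_xmin,
       age(backend_xmin) as xmin_age, query
  from pg_stat_activity
 where backend_xmin is not null
 order by age(backend_xmin) desc;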


Thanks





RE: [EXT] Re: pg_stat_activity.backend_xmin

2022-09-21 Thread Dirschel, Steve
On Wed, 2022-09-21 at 14:11 +, Dirschel, Steve wrote:
>> We are troubleshooting an issue where autovacuum is not cleaning up a table.
>> The application using this database runs with autocommit turned off.
>> We can see in pg_stat_activity lots of sessions “idle in transaction” 
>> even though those sessions have not executed any DML-  they have 
>> executed selects but no DML.  The database’s isolation level is set to read 
>> committed.

> "backend_xmin" is set when the session has an active snapshot.  Such a 
> snapshot is held for the whole duration of a transaction in the REPEATABLE 
> READ isolation level, but there are cases where you can see that in READ 
> COMMITTED isolation level > as well:
>
> - if there is a long running query
>
> - if there is a cursor open
>
> Perhaps you could ask your developers if they have long running read-only 
> transactions with cursors.
>
> Yours,
> Laurenz Albe

Thanks for the reply Laurenz.  For an application session in this "state", 
pg_stat_activity shows a state of "idle in transaction" and backend_xmin is 
populated.  The query column shows the last select it ran; it is not currently 
executing a query.  And dev has said they are not using a cursor for that query, 
so it does not appear they have long-running read-only transactions with 
cursors.

Outside that scenario, can you think of any others where a session:
1. Logs in to the database
2. Sets autocommit off
3. Runs a select query, the query completes, and the session does nothing after that
4. Has transaction isolation level read committed

and that session, sitting there idle in transaction, has backend_xmin populated?  
When I run that test, backend_xmin does not get populated unless I set my 
transaction isolation level to repeatable read.  We have enabled statement 
logging so we can see if their sessions are changing the transaction isolation 
level behind the scenes without the app team being aware of it, but so far we 
have not seen that type of command logged.

Regards
Steve


RE: [EXT] pg_stat_activity.backend_xmin

2022-09-21 Thread Dirschel, Steve
>> On Sep 21, 2022, at 9:32 AM, Dirschel, Steve 
>>  wrote:
>> 
>> On Wed, 2022-09-21 at 14:11 +, Dirschel, Steve wrote:
>>>> We are troubleshooting an issue where autovacuum is not cleaning up a 
>>>> table.
>>>> The application using this database runs with autocommit turned off.
>>>> We can see in pg_stat_activity lots of sessions “idle in transaction” 
>>>> even though those sessions have not executed any DML-  they have 
>>>> executed selects but no DML.  The database’s isolation level is set to 
>>>> read committed.
>> 
>>> "backend_xmin" is set when the session has an active snapshot.  Such a 
>>> snapshot is held for the whole duration of a transaction in the REPEATABLE 
>>> READ isolation level, but there are cases where you can see that in READ 
>>> COMMITTED isolation level > as well:
>>> 
>>> - if there is a long running query
>>> 
>>> - if there is a cursor open
>>> 
>>> Perhaps you could ask your developers if they have long running read-only 
>>> transactions with cursors.
>>> 
>>> Yours,
>>> Laurenz Albe
>> 
>> Thanks for the reply Laurenz.  For an application session in this "state" 
>> pg_stat_activity shows the state of "idle in transaction" and backend_xmin 
>> is populated.  The query shows the last select query it ran.  It is not 
>> currently executing a query.  And dev has said they are not using a cursor 
>> for the query.  So it does not appear they have long running read-only 
>> transactions with cursors.
>> 
>> Outside that scenario can you think of any others where a session:
>> 1. Login to the database
>> 2. Set autocommit off
>> 3. Run select query, query completes, session does nothing after that query 
>> completes.
>> 4.  transaction isolation level is read committed
>> 
>> That session sitting there idle in transaction has backend_xmin populated.  
>> When I run that test backend_xmin does not get populated unless I set my 
>> transaction isolation level to repeatable read.  We have enabled statement 
>> logging so we can see if their sessions are changing that transaction 
>> isolation level behind the scenes that they are not aware of but so far we 
>> have not seen that type of command logged.
>> 
>> Regards
>> Steve
>
>What stack is the application using?  Anything like Spring or Hibernate 
>involved?

Java is the stack.  Thanks


Interpreting postgres execution plan along with AND/OR precedence

2023-01-18 Thread Dirschel, Steve
Table definition:

  Table "load.lm_queue"
         Column         |              Type              | Collation | Nullable | Default
------------------------+--------------------------------+-----------+----------+-------------------------
guid   | character(33)  |   | not null |
host_name  | character varying(40)  |   |  |
priority   | numeric(11,0)  |   | not null |
request_time   | timestamp(6) without time zone |   | not null |
collection_name| character varying(40)  |   | not null |
stage_id   | numeric(11,0)  |   |  |
source_file| character varying(250) |   |  |
lm_id  | numeric(11,0)  |   |  |
start_time | timestamp(6) without time zone |   |  |
status_text| character varying(225) |   | not null 
| NULL::character varying
load_data_id   | character varying(500) |   |  |
docs_in_load   | numeric(11,0)  |   |  |
client_name| character varying(50)  |   |  |
status_code| numeric(11,0)  |   |  |
email_address  | character varying(2000)|   |  |
hold_flag  | character(1)   |   |  |
process_type   | character varying(40)  |   |  |
cancel_flag| character(1)   |   |  |
file_type  | character varying(6)   |   |  |
lm_data| character varying(4000)|   |  |
ds_request_time| timestamp(6) without time zone |   |  |
ds_id  | numeric(11,0)  |   |  |
ds_start_time  | timestamp(6) without time zone |   |  |
auto_promote_flag  | character(1)   |   |  |
extract_out_file   | character varying(250) |   |  |
last_upd_time  | timestamp(6) without time zone |   |  |
ds_fail_count  | numeric(11,0)  |   |  |
cc_collection  | character varying(40)  |   |  |
cc_environment | character varying(40)  |   |  |
cc_fail_on_db_mismatch | character(1)   |   |  |
cc_tracking_guid   | character varying(33)  |   |  |
cc_numrows | character varying(50)  |   |  |
cc_owner   | character varying(30)  |   |  |
cc_password| character varying(30)  |   |  |
parent_guid| character varying(33)  |   |  |
Indexes:
"xpklm_queue" PRIMARY KEY, btree (guid)
"idx_hn_cn_dsid_sc_dst_dfc" btree (host_name, collection_name, ds_id, 
status_code, ds_start_time, ds_fail_count)
"ix_lm_cc" btree (collection_name, client_name)
"ix_lm_chl" btree (client_name, host_name, lm_id)


Query and plan:

explain
SELECT GUID, COLLECTION_NAME, PRIORITY, START_TIME, REQUEST_TIME, CLIENT_NAME, 
PROCESS_TYPE, PARENT_GUID
FROM LOAD.LM_QUEUE lmq1
WHERE CLIENT_NAME='WLCASES'
AND HOLD_FLAG='Y'
AND HOST_NAME='WLLOADB'
AND STATUS_CODE in (1)
AND NOT EXISTS
(SELECT COLLECTION_NAME
   FROM LOAD.LM_QUEUE lmq2
  WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME
AND LM_ID <> 0
AND PROCESS_TYPE NOT IN('EXTRACT'))
OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')
   AND LM_ID = 0
   AND CLIENT_NAME='WLCASES'
   AND HOST_NAME= 'WLLOADB' )
ORDER BY PRIORITY DESC, REQUEST_TIME ASC;


Sort  (cost=1578.99..1579.00 rows=1 width=120)
   Sort Key: lmq1.priority DESC, lmq1.request_time
   ->  Index Scan using ix_lm_chl on lm_queue lmq1  (cost=0.40..1578.98 rows=1 
width=120)
 Index Cond: (((client_name)::text = 'WLCASES'::text) AND 
((host_name)::text = 'WLLOADB'::text))
 Filter: (((hold_flag = 'Y'::bpchar) AND (status_code = '1'::numeric) 
AND (NOT (alternatives: SubPlan 1 or hashed SubPlan 2))) OR 
(((process_type)::text = ANY ('{UNLOCK,"UNLOCK RERUN"}'::text[])) AND (lm_id = 
'0'::numeric)))
 SubPlan 1
   ->  Index Scan using ix_lm_cc on lm_queue lmq2  (cost=0.40..177.93 
rows=1 width=0)
 Index Cond: ((collection_name)::text = 
(lmq1.collection_name)::text)
 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 
'EXTRACT'::text))
 SubPlan 2
   ->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.06 rows=12 
width=32)
 Filter: ((lm_id <> '0'::numer
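On the AND/OR precedence part of the subject:  AND binds more tightly than OR, 
so the WHERE clause above is evaluated as the explicitly parenthesized form 
below, which matches the Filter line in the plan (client_name and host_name 
appear in both OR branches, which is why the planner can still use them as 
index conditions):

SELECT GUID, COLLECTION_NAME, PRIORITY, START_TIME, REQUEST_TIME, CLIENT_NAME,
       PROCESS_TYPE, PARENT_GUID
FROM LOAD.LM_QUEUE lmq1
WHERE (CLIENT_NAME='WLCASES'
   AND HOLD_FLAG='Y'
   AND HOST_NAME='WLLOADB'
   AND STATUS_CODE in (1)
   AND NOT EXISTS
       (SELECT COLLECTION_NAME
          FROM LOAD.LM_QUEUE lmq2
         WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME
           AND LM_ID <> 0
           AND PROCESS_TYPE NOT IN('EXTRACT')))
   OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')
   AND LM_ID = 0
   AND CLIENT_NAME='WLCASES'
   AND HOST_NAME= 'WLLOADB')
ORDER BY PRIORITY DESC, REQUEST_TIME ASC;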

DBeaver session populating pg_stat_activity.backend_xmin

2023-01-25 Thread Dirschel, Steve
Not sure if anyone on this list may have ideas on this or not.  I will also try 
to find a DBeaver e-mail list and send this question there as well.

We are running into a problem where users connect to a database through DBeaver.  
I have found that if I leave all DBeaver options at their defaults this issue 
does not happen.  But if I edit my connection under "Connection settings" -> 
"Initialization" and uncheck Auto-commit, and then under "Connection settings" 
-> "Metadata" unselect "Open separate connection for metadata read", I can 
duplicate the issue.

When I connect to the database through DBeaver with those 2 default settings 
changed and find that session in pg_stat_activity, the xact_start column is 
populated along with backend_xmin.  They get populated just by logging in.  I 
don't know if changing those 2 default settings is the only way to get a session 
into this state, but it does mimic the problem.  The problem is that users will 
connect using DBeaver and their sessions will sit idle.  We have a table with 
high update activity, and I will start seeing queries hitting this table do more 
and more logical reads, because autovacuum is not able to clean up row versions 
newer than the backend_xmin of those sessions.  We have told users not to leave 
their sessions sitting idle like that and are also considering setting 
idle_in_transaction_session_timeout to kill sessions that sit like this.

But I am trying to understand what DBeaver is doing to get a session into that 
state.

I've set log_statement to ALL and did a login through DBeaver with the 2 
settings changed to see what all it is executing.  I then changed the DBeaver 
settings back to default so I could see what it executes upon connect without 
having xact_start and backend_xmin populated.

It executes the exact same commands, except that when the 2 default DBeaver 
settings are changed to show the issue, it issues a BEGIN and COMMIT around one 
block of code, and then at another part of the code it issues a BEGIN, runs some 
queries, and never issues a COMMIT.  The log info is below, with <=== next to 
the BEGIN/COMMITs that happen with the non-default settings.

I try and duplicate this in psql and cannot.  In psql I do a:

\set AUTOCOMMIT off

I then run the exact same commands that log_statement = all shows DBeaver 
running, without a COMMIT at the end of the last statement (those are shown 
below).  I can find that psql session in pg_stat_activity and xact_start is 
populated but backend_xmin is NOT populated.  Under that scenario it would not 
cause the problem of autovacuum not being able to clean up this heavy-update 
table due to a session sitting out there with backend_xmin populated.

Any ideas why I cannot duplicate this in psql if I'm running what appears to be 
the exact same commands DBeaver is?  Is it possible DBeaver is running something 
else that does not get written to the log file when log_statement is set to all?  
One slight difference in my psql test was that a couple of the queries below 
were using bind variables when executed through DBeaver; I put the literal 
values in the query when I executed through psql.  Is it possible that somehow 
using bind variables with autocommit off and BEGIN causes backend_xmin to get 
populated?
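One way I know of to make an otherwise idle psql session keep backend_xmin 
populated with autocommit off (a sketch, and not necessarily what DBeaver is 
doing) is to leave a cursor open, since an open cursor holds the transaction's 
snapshot:

\set AUTOCOMMIT off
begin;
declare c cursor for select * from pg_class;
fetch 10 from c;
-- leave the session sitting here:  pg_stat_activity.xact_start and
-- backend_xmin both stay populated until the cursor is closed or the
-- transaction ends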

Thanks


From the log file:

SET extra_float_digits = 3

SET application_name = 'PostgreSQL JDBC Driver'

SET application_name = 'DBeaver 22.0.0 - Main '

BEGIN                                                   <===============

SELECT current_schema(),session_user
SELECT n.oid,n.*,d.description FROM pg_catalog.pg_namespace n
LEFT OUTER JOIN pg_catalog.pg_description d ON d.objoid=n.oid 
AND d.objsubid=0 AND d.classoid='pg_namespace'::regclass
ORDER BY nspname

SELECT n.nspname = ANY(current_schemas(true)), n.nspname, t.typname FROM 
pg_catalog.pg_type t JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid 
WHERE t.oid = '1034'

SELECT typinput='array_in'::regproc as is_array, typtype, typname   FROM 
pg_catalog.pg_type   LEFT JOIN (select ns.oid as nspoid, ns.nspname, r.r
   from pg_namespace as ns   join ( select s.r, 
(current_schemas(false))[s.r] as nspnamefrom 
generate_series(1, array_upper(current_schemas(false), 1)) as s(r) ) as r   
   using ( nspname )) as sp ON sp.nspoid = typnamespace  WHERE 
typname = '_aclitem'  ORDER BY sp.r, pg_type.oid DESC

SHOW search_path

COMMIT                                                  <===============

BEGIN                                                   <===============

SELECT db.oid,db.* FROM pg_catalog.pg_database db WHERE datname='dbtest'

select * from pg_catalog.pg_settings where name='standard_conforming_strings'

select string_agg(word, ',') from pg_catalog.pg_get_keywords() where word <> 
ALL 
('{a,abs,absolute,action,ada,add,admin,after,all,allocate,alter,always,and,any,are,array,as,asc,as

RE: [EXT] Re: DBeaver session populating pg_stat_activity.backend_xmin

2023-01-25 Thread Dirschel, Steve
>> When I connect to the database through DBeaver with those 2 default 
>> settings changed and find that session in pg_stat_activity column 
>> xact_start is populated along with backend_xmin.  Those get populated 
>> just by logging in.

>As you found out in the log, the driver/DBeaver runs multiple SQL statements 
>during the "log in".
>And the first query will start a transaction, but as autocommit is disabled, 
>nothing will end that transaction.

Yes, I agree with that.  And if I set autocommit off in psql I see the same 
where a select will start a transaction.

>> The problem is users will connect using DBeaver and their sessions 
>> will sit idle.

>Idle is not a problem, "idle in transaction" is.

>From my perspective "idle in transaction" isn't necessarily a problem 
>(although I don't like seeing sessions sitting like that for a long time).  
>The problem is when pg_stat_activity.backend_xmin is populated-  that can 
>prevent autovacuum from cleaning up old records.
>
>Again, if I login to psql, set autocommit off and run a select I see 
>pg_stat_activity.xact_start populated but pg_stat_activity.backend_xmin is NOT 
>populated.  So that transaction from psql would not prevent autovacuum from 
>cleaning up.  But when this happens through DBeaver not only is 
>pg_stat_activity.xact_start populated but pg_stat_activity.backend_xmin is 
>also populated.
>
>My main question is what could DBeaver be doing to get 
>pg_stat_activity.backend_xmin populated?  It doesn't happen when running a 
>test in psql.  So at this point I'm unable to duplicate this issue in psql 
>running the same thing I think DBeaver is running.  Maybe if I asked the 
>question a little differently-  in psql if I set autocommit off and run a 
>select, what else do I need to run to get pg_stat_activity.backend_xmin 
>populated through that session?  Is there a certain "type" of select I could 
>run to get it populated?  I know if I insert or update or delete a row it will 
>get populated but I also know DBeaver is not executing an 
>insert/update/delete.

>> It executes the exact same commands except when the 2 default DBeaver 
>> settings are changed to show the issue it is issuing a BEGIN and 
>> COMMIT around one block of code and then at another part of code it 
>> issues a BEGIN, runs some queries, and never issues a COMMIT.

>Yes, that's how turning off autocommit works. As soon as a statement is sent 
>through the JDBC driver, the driver will send a BEGIN to start the 
>transaction, but the the application (or the user) is responsible to end it 
>through a COMMIT (or ROLLBACK).

Thanks





RE: [EXT] DBeaver session populating pg_stat_activity.backend_xmin

2023-01-31 Thread Dirschel, Steve
Hi Christophe,

Thanks for the reply.  I am fairly new to Postgres and, based on your reply 
below, I am not understanding something.  Please see my test case below.  I 
first show a session that is "idle in transaction" where backend_xmin is NOT 
populated, and vacuuming a table with 10 dead rows works fine.  I then show a 
session that is "idle in transaction" with backend_xmin populated, and vacuuming 
a table with 10 dead rows does not work because of that session having 
backend_xmin populated.  Note in all cases here these sessions have not executed 
any DML.  Based on your reply below, in both of these cases the vacuum should 
not have been able to clean up these 10 dead rows.  What am I missing here?

Regards
Steve

> -Original Message-
> From: Christophe Pettus  
> Sent: Wednesday, January 25, 2023 4:49 PM
> To: Dirschel, Steve 
> Cc: Thomas Kellerer ; pgsql-general@lists.postgresql.org
> Subject: Re: [EXT] DBeaver session populating pg_stat_activity.backend_xmin



> > On Jan 25, 2023, at 14:21, Dirschel, Steve 
> >  wrote:
> > From my perspective "idle in transaction" isn't necessarily a problem 
> > (although I don't like seeing sessions sitting like that for a long time).  
> > The problem is when pg_stat_activity.backend_xmin is populated-  that can 
> > prevent autovacuum from cleaning up old records.

> I don't think that analysis is quite correct.  There's nothing about 
> backend_xmin that blocks tuple cleanup *if the transaction has been 
> committed*.  The problem is that the session is sitting in idle in 
> transaction state, and *that* blocks tuple cleanup.  The fix is to not leave 
> sessions open in idle in transaction state.


Test case:

create table test1(a numeric, b numeric);

***
* TEST #1 *
***

Dbeaver setup:  
Connection setting, Initialization, Auto-commit not checked 
Connection setting, Metadata, Open separate connection for metadata 
read is checked.

Connect to the database through DBeaver.  Here are the 2 sessions:

select pid, application_name, xact_start, backend_xmin, state, query from 
pg_stat_activity where usename = 'postgres' and application_name like 
'DBeaver%';

dbtest=> select pid, application_name, xact_start, backend_xmin, state, query 
from pg_stat_activity where usename = 'postgres' and application_name like 
'DBeaver%';
  pid  |  application_name  | xact_start | backend_xmin | state 
| query
---+++--+---+
 30229 | DBeaver 22.0.0 - Main  ||  | idle  
| SHOW TRANSACTION ISOLATION LEVEL
 30231 | DBeaver 22.0.0 - Metadata  ||  | idle  
| SELECT t.oid,t.*,c.relkind,format_type(nullif(t.typbasetype, 0), t.typtypmod) 
as base_type_name, d.description+
   |||  |   
| FROM pg_catalog.pg_type t 
+
   |||  |   
| LEFT OUTER JOIN pg_catalog.pg_type et ON et.oid=t.typelem 
+
   |||  |   
| LEFT OUTER JOIN pg_catalog.pg_class c ON c.oid=t.typrelid 
+
   |||  |   
| LEFT OUTER JOIN pg_catalog.pg_description d ON t.oid=d.objoid 
+
   |||  |   
| WHERE t.typname IS NOT NULL   
+
   |||  |   
| AND (c.relkind IS NULL OR c.relkind = 'c') AND (et.typcategory IS NULL OR 
et.typcategory <> 'C')

***  Note neither session is in a transaction.  State for both is idle.

Leave the 2 DBeaver sessions as is.  In psql I run this:

dbtest=> do $$
dbtest$> begin
dbtest$>for cnt in 1..10 loop
dbtest$> insert into test1 values (cnt, cnt);
dbtest$> delete from test1;
dbtest$>end loop;
dbtest$> end; $$
dbtest-> ;
DO
dbtest=>
dbtest=> VACUUM (VERBOSE) test1;
INFO:  vacuuming "public.test1"
INFO:  table "test1": removed 10 dead item identifiers in 1 pages
INFO:  table "test1": found 10 removable, 0 nonremovable row versions in 1 out 
of 1 pages   <--- the 10 dea

Why using a partial index is doing slightly more logical I/O than a normal index

2023-05-03 Thread Dirschel, Steve
Table definition:

workflow_db=> \d workflow_execution_test
Table "public.workflow_execution_test"
  Column  |   Type   | Collation | 
Nullable | Default
--+--+---+--+-
execution_id | bigint   |   |   
   |
state_machine_id | bigint   |   |   
   |
workflow_id  | bigint   |   |   
   |
started_datetime | timestamp with time zone |   |   
   |
completed_datetime   | timestamp with time zone |   |   
   |
status   | character varying(50)|   |   
   |
execution_context_s3_arn | character varying(200)   |   |   
   |
ol_version   | integer  |   |   
   |
created_datetime | timestamp with time zone |   |   
   |
updated_datetime | timestamp with time zone |   |   
   |
deleted_millis   | bigint   |   |   
   |
acquisition_channel_id   | bigint   |   |   
   |
correlation_id   | character varying(36)|   |   
   |
result   | character varying(50)|   |   
   |
state_machine_execution_arn  | character varying(200)   |   |   
   |
created_by_id| bigint   |   |   
   |
updated_by_id| bigint   |   |   
   |
acquired_gcs_s3_object   | text |   |   
   |
sqs_trigger_id   | bigint   |   |   
   |
trigger_message  | text |   |   
   |
acquired_gcs_s3_object_uuid  | character varying(36)|   |   
   |
api_trigger_id   | bigint   |   |   
   |
scheduled_trigger_id | bigint   |   |   
   |
notification_trigger_workflow_id | bigint   |   |   
   |
acquired_object_name | text |   |   
   |
subscription_guid| uuid |   |   
   |
processing_class_code| character varying(50)|   |   
   |
root_execution_guid  | uuid |   |   
   |
Indexes:
"test_workflow_execution_active_pending_ordered_by_created_date_" btree 
(workflow_id, status, deleted_millis, created_datetime) WHERE (status::text = 
ANY (ARRAY['ACTION_NEEDED'::character varying, 'NOT_STARTED'::character 
varying, 'PAUSED'::character varying, 'PENDING'::character varying, 
'RUNNING'::character varying]::text[])) AND deleted_millis <= 0
"test_workflow_execution_initial_ui_tabs" btree (workflow_id, status, 
result, completed_datetime DESC NULLS LAST)

I created and populated this table with 22 million rows and afterwards created 
the 2 indexes, so those indexes are packed tightly.

As is, the optimizer decides to use the partial index.  Below shows it did 33 
logical reads.  This index should be very good for this query-  the leading 3 
columns of the index match the 3 criteria in the WHERE clause, and the partial 
part only stores rows that match the status and deleted_millis filters.

explain (analyze, buffers)
select * from workflow_execution_test
where workflow_id = 1070
AND status in ('NOT_STARTED','PAUSED','PENDING','RUNNING')
and deleted_millis <= 0;

 QUERY PLAN

Index Scan using 
test_workflow_execution_active_pending_ordered_by_created_date_ on 
workflow_execution_test  (cost=0.43..15824.82 rows=4335 width=1309) (actual 
time=0.040..0.095 rows=56 loops=1)
   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY 
('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
  Buffers: shared hit=33
 Planning:
   Buffers: shared hit=2
Planning Time: 0.321 ms
Execution Time: 0.117 ms

If I hint the query to use the other index it does less work-  it does 24 
logical reads vs 33 using the partial index.

/*+ IndexScan(workflow_execution_test test_workflow_execution_initial_ui_tabs) 
*/
explain (analyze, buffers)
select * from workflow_execution_test
where workflow_id = 1070
AND status in ('NOT_STARTED','PAUSED','PENDING','RUNNING')
and deleted_millis <= 0;

RE: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index

2023-05-03 Thread Dirschel, Steve
Thanks for the reply Jeff.  Yes-  more of an academic question.  Regarding this 
part:

   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY 
('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
   Filter: (deleted_millis <= 0)
   Buffers: shared hit=24

For this usage, the =ANY is applied as an "in-index filter".  It only descends 
the index once, to where workflow_id=1070, and then scans forward applying the 
=ANY to each index-tuple until it exhausts the =1070 condition.  As long as all 
the =1070 entries fit into just a few buffers, the count of buffers accessed by 
doing this is fewer than doing the re-descents.  (Stepping from tuple to tuple 
in the same index page doesn't count as a new access, while a re-descent 
releases and reacquires the buffer.)

There are 2,981,425 rows where workflow_id = 1070.  Does that change your theory 
of an "in-index filter" being used for that plan?  When you say there was a bit 
of speculation on the "boundary condition" vs "in-index filter", is the 
speculation about whether Postgres has 2 different ways of processing an =ANY 
filter, or about which of the two is being used by each plan?

Thanks again for your reply.  It is helpful.
Steve


Query performance going from Oracle to Postgres

2023-09-06 Thread Dirschel, Steve
We are in the process of converting from Oracle to Postgres and I have a query 
that is using the same index in Postgres as in Oracle, but in Postgres the query 
does 16x more buffer/logical reads.  I'd like to understand why.  The query hits 
a partitioned table, but to simplify things I changed it to hit a single 
partition.  This partition has the same number of rows in Oracle as in Postgres.

Here is the Postgres query, partition definition, execution plan.  I will also 
include the Oracle execution plan below in case anyone is interested.

explain (analyze, buffers)
select count(historyeve0_.HISTORY_EVENT_SID) as col_0_0_ from 
hist28.history_event_display_timestamp_20230301 historyeve0_ where 
historyeve0_.IS_DELETED=0
and historyeve0_.PRISM_GUID='i0accd6a2018405f095ee669dc5b4'
and historyeve0_.PRODUCT_SID='CARSWELL.WESTLAW'
and (historyeve0_.EVENT_TYPE not in  
('versionsSearchWithinQueryEvent','notesOfDecisionsSearchWithinQueryEvent','citingReferencesSearchWithinQueryEvent','tocSearchWithinQueryEvent','searchWithinDocumentEvent','coCitesSearchWithinQueryEvent'))
and (historyeve0_.PRODUCT_VIEW in 
('DefaultProductView','TNPPlus','PLCUS','Indigo','IndigoPremium','INDIGOCA')
or historyeve0_.PRODUCT_VIEW is null)
and historyeve0_.CLIENT_ID='WLNCE_VNJXL7'
and (historyeve0_.DISPLAY_TIMESTAMP between '2022-03-01 00:00:00' and 
'2023-03-01 23:59:59.999');

Aggregate  (cost=56.64..56.65 rows=1 width=8) (actual time=0.930..0.931 rows=1 
loops=1)
   Buffers: shared hit=341
   ->  Index Scan using 
history_event_display_timesta_prism_guid_display_timestamp_idx1 on 
history_event_display_timestamp_20230301 historyeve0_  (cost=0.42..56.64 rows=1 
width=33) (actual time=0.034..0.897 rows=332 loops=1)
 Index Cond: (((prism_guid)::text = 
'i0accd6a2018405f095ee669dc5b4'::text) AND (display_timestamp >= 
'2022-03-01 00:00:00'::timestamp without time zone) AND (display_timestamp <= 
'2023-03-01 23:59:59.999'::timestamp without time zone) AND 
((product_sid)::text = 'CARSWELL.WESTLAW'::text) AND (is_deleted = 
'0'::numeric) AND ((client_id)::text = 'WLNCE_VNJXL7'::text))
 Filter: (((event_type)::text <> ALL 
('{versionsSearchWithinQueryEvent,notesOfDecisionsSearchWithinQueryEvent,citingReferencesSearchWithinQueryEvent,tocSearchWithinQueryEvent,searchWithinDocumentEvent,coCitesSearchWithinQueryEvent}'::text[]))
 AND (((product_view)::text = ANY 
('{DefaultProductView,TNPPlus,PLCUS,Indigo,IndigoPremium,INDIGOCA}'::text[])) 
OR (product_view IS NULL)))
 Buffers: shared hit=341
Planning:
   Buffers: shared hit=6
Planning Time: 0.266 ms
Execution Time: 0.965 ms
(10 rows)

*** 341 logical reads to find 332 rows.

Oracle will find the same 332 rows using the same index but in Oracle it only 
does 20 logical reads.  I thought maybe the index was fragmented so I reindexed 
that index:

reindex index 
hist28.history_event_display_timesta_prism_guid_display_timestamp_idx1;

Plan after that:

Aggregate  (cost=40.64..40.65 rows=1 width=8) (actual time=0.707..0.708 rows=1 
loops=1)
   Buffers: shared hit=328
   ->  Index Scan using 
history_event_display_timesta_prism_guid_display_timestamp_idx1 on 
history_event_display_timestamp_20230301 historyeve0_  (cost=0.42..40.64 rows=1 
width=33) (actual time=0.032..0.683 rows=332 loops=1)
 Index Cond: (((prism_guid)::text = 
'i0accd6a2018405f095ee669dc5b4'::text) AND (display_timestamp >= 
'2022-03-01 00:00:00'::timestamp without time zone) AND (display_timestamp <= 
'2023-03-01 23:59:59.999'::timestamp without time zone) AND 
((product_sid)::text = 'CARSWELL.WESTLAW'::text) AND (is_deleted = 
'0'::numeric) AND ((client_id)::text = 'WLNCE_VNJXL7'::text))
 Filter: (((event_type)::text <> ALL 
('{versionsSearchWithinQueryEvent,notesOfDecisionsSearchWithinQueryEvent,citingReferencesSearchWithinQueryEvent,tocSearchWithinQueryEvent,searchWithinDocumentEvent,coCitesSearchWithinQueryEvent}'::text[]))
 AND (((product_view)::text = ANY 
('{DefaultProductView,TNPPlus,PLCUS,Indigo,IndigoPremium,INDIGOCA}'::text[])) 
OR (product_view IS NULL)))
 Buffers: shared hit=328
Planning:
   Buffers: shared hit=27
Planning Time: 0.321 ms
Execution Time: 0.741 ms
(10 rows)

Shared hits came down a little but are still 16x more than Oracle.  The actual 
query will hit 12 partitions, so the extra work Postgres does adds up (note some 
data points will find significantly more rows than 332, so performance is much 
worse than shown here, but this data point is used just to show the difference 
between Oracle and Postgres).  The interesting part is that Postgres seems to be 
doing 1 shared hit per row.  I don't know if anyone on this list knows 
Postgres's internal index implementation vs Oracle's, but is Postgres's Index 
Scan operation significantly different than Oracle's index range scan?  
Something seems to be implemented much less efficiently here vs Oracle, and I 
don't know what I can do to get this query to perform closer to Oracle.

RE: [EXT] Re: Query performance going from Oracle to Postgres

2023-09-13 Thread Dirschel, Steve
Thanks to all who replied on this issue.  I overlooked the difference in the 
execution plans between Oracle and Postgres.  Oracle in fact does an index range 
scan only, with no table lookup.  When I changed the query to count(*), Postgres 
went from 332 logical reads to 19; Oracle did 20.  Oracle did 20 regardless of 
whether it was count(*) or count(non-indexed-column).
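For reference, the faster variant was simply the same query with count(*) (a 
sketch of what I ran):

explain (analyze, buffers)
select count(*) from 
hist28.history_event_display_timestamp_20230301 historyeve0_ where 
historyeve0_.IS_DELETED=0
and historyeve0_.PRISM_GUID='i0accd6a2018405f095ee669dc5b4'
and historyeve0_.PRODUCT_SID='CARSWELL.WESTLAW'
and (historyeve0_.EVENT_TYPE not in 
('versionsSearchWithinQueryEvent','notesOfDecisionsSearchWithinQueryEvent','citingReferencesSearchWithinQueryEvent','tocSearchWithinQueryEvent','searchWithinDocumentEvent','coCitesSearchWithinQueryEvent'))
and (historyeve0_.PRODUCT_VIEW in 
('DefaultProductView','TNPPlus','PLCUS','Indigo','IndigoPremium','INDIGOCA')
or historyeve0_.PRODUCT_VIEW is null)
and historyeve0_.CLIENT_ID='WLNCE_VNJXL7'
and (historyeve0_.DISPLAY_TIMESTAMP between '2022-03-01 00:00:00' and 
'2023-03-01 23:59:59.999');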

Regards
Steve

-Original Message-
From: Peter Geoghegan  
Sent: Thursday, September 7, 2023 8:22 PM
To: David Rowley 
Cc: Dirschel, Steve ; 
pgsql-general@lists.postgresql.org; Wong, Kam Fook (TR Technology) 

Subject: [EXT] Re: Query performance going from Oracle to Postgres


On Thu, Sep 7, 2023 at 3:48 AM David Rowley  wrote:
> On Thu, 7 Sept 2023 at 19:17, Peter Geoghegan  wrote:
> > It seems likely that the problem here is that some of the predicates 
> > appear as so-called "Filter:" conditions, as opposed to true index 
> > quals.
>
> hmm, if that were true we'd see "Rows Removed by Filter" in the 
> explain analyze.

That's why I hedged, with "seems likely". The problem with using filter 
conditions rather than true index quals isn't limited to the problem of extra 
heap accesses. It happened to be convenient to make my point that way, but that 
isn't particularly fundamental here.

I deliberately chose to make my example involve an index-only scan (that 
doesn't require any heap accesses) for this reason.

> I think all that's going on is that each tuple is on a different page 
> and the heap accesses are just causing many buffers to be accessed.

This index is an absolute monstrosity. I find it far easier to believe that the 
real explanation is the one that Steve intuited: that there is an issue with 
the way that the physical data structures (which are more or less comparable in 
both systems) are accessed in Postgres.

The index in question
("history_event_display_timesta_prism_guid_display_timestamp_idx1")
has certain columns that are omitted from the query. These columns nevertheless 
appear between other columns that the query filters on.
The leading two columns ("prism_guid" and "display_timestamp") are made into 
index quals by Postgres, but the next index column after those two 
("unique_lookup_key") is omitted by the query, and so isn't an index qual. In 
fact *four* columns are omitted after that one. But, the very-low-order index 
column "product_sid" *does* appear in the query, and so also appears as 
Postgres index quals.

There is every chance that the true underlying explanation is that Oracle is 
able to skip over significant parts of the index structure dynamically. In fact 
I'm practically certain that that's the case, since the "product_sid" column 
appears as an "access predicate", rather than as a "filter predicate". These 
terms are explained here:

https://use-the-index-luke.com/sql/explain-plan/oracle/filter-predicates

https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates

How could "product_sid" be used as an "access predicate" given the omitted 
index columns? It seems very likely that parts of the index can be skipped in 
Oracle, but not in Postgres -- at least not yet.

Like Markus Winand, I think that it's a real problem that EXPLAIN doesn't yet 
expose the difference between access predicates and filter predicates. Some of 
the index quals shown by EXPLAIN are marked SK_BT_REQFWD/SK_BT_REQBKWD by 
nbtree (meaning that they're what Oracle calls "access predicates"), while 
other are not (meaning that they're what Oracle calls "filter predicates"). 
That issue makes it far harder to spot these sorts of problems.

> It seems to me that an IOS would likely fix that problem which is why 
> I suggested count(*) instead of
> count(
> )

Probably, but why should we have to use an index-only scan? And what about my 
example, that already used one, and still showed a big disparity where there is 
no inherently reason why there had to be? My example didn't involve jumping to 
another part of the index because the problem seems to be more general than 
that.

--
Peter Geoghegan


Query runtime differences- trying to understand why.

2023-11-29 Thread Dirschel, Steve
I have a question on the execution time of a query and the fluctuations I'm 
seeing.  I enabled auto_explain to capture some actual executions of a query by 
an app.  Below are 2 executions of it.  The top one took 1.697 milliseconds; the 
2nd one took 31.241 milliseconds.  Note the query has hints in it, and the top 
one did not obey the hint to use the pk index on table product.  That table is 
very small so full scanning it isn't really a problem.  When I look at the steps 
in the plans below I do not believe the difference in runtime is due to plan 
differences.

Here are my observations for the top query/plan:
-  The query found 318 rows when scanning the entry_guid_mod_idx index on table 
entry.  It did 336 or 337 shared block hits for that step and it took 0.619 
milliseconds.
-  I believe it nested looped to the full scan of table product 318 times.  
That step took 0.812 milliseconds (time including the prior step).  At that 
point the query did a total of 841 shared block hits.  I believe it found 318 
rows that matched.
-  The LockRows step-  this is coming from the SELECT FOR UPDATE SKIP LOCKED.  
I believe this step shows it returned 0 rows.  Runtime was 1.696 milliseconds.  
This implies all 318 rows that matched the query were already locked.

Observations for the bottom query/plan:
-  The query found 291 rows when scanning the entry_guid_mod_idx index on table 
entry.  It did 313 or 315 shared block hits for that step and it took 0.977 
milliseconds.
-  I believe it nested looped to the index scan pkproduct 291 times.  That step 
took 1.491 milliseconds (time including the prior step).  At that point the 
query did a total of 1365 shared block hits.  I believe it found 291 rows that 
matched.
-  The LockRows step-  Again, I believe this is coming from the SELECT FOR 
UPDATE SKIP LOCKED.  Like the top query this also shows it returned 0 rows, 
which implies to me all 291 rows were locked.  But here we can see the runtime 
was 31.239 milliseconds.

So my question-  assuming my understanding above is correct, both plans show it 
taking minimal time to get to the LockRows step.  The top query skipped more 
rows than the bottom query, but the bottom query took significantly more time 
at this LockRows step.  Why would that be the case?
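
For reference, a minimal two-session sketch of the SKIP LOCKED behavior being 
relied on here, using the entry table and columns from the query above (other 
predicates omitted):

-- Session 1: lock one candidate row and leave the transaction open.
BEGIN;
SELECT entry_guid FROM bchalertdemo.entry
 WHERE status_code = 1
 LIMIT 1
 FOR UPDATE;

-- Session 2: SKIP LOCKED silently passes over rows locked by other sessions.
-- If every row matching the predicates is already locked, the LockRows node
-- returns 0 rows, which matches the rows=0 seen in both plans.
SELECT entry_guid FROM bchalertdemo.entry
 WHERE status_code = 1
 ORDER BY priority, entry_frequency_code, next_run_date
 LIMIT 1
 FOR UPDATE SKIP LOCKED;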

Thanks in advance.
Steve

2023-11-29 18:12:45 
UTC:10.210.202.108(50098):alertu@pgbatchshared01aq:[15251]:LOG:  duration: 
1.697 ms  plan:
 Query Text: /*+ NestLoop(t1 t2) IndexScan(t2 pkproduct) */ SELECT  
T1.ENTRY_GUID FROM bchalertdemo.ENTRY T1,bchalertdemo.PRODUCT T2  WHERE 
T1.NEXT_RUN_DATE<$1 AND T1.STATUS_CODE=1 AND T2.ACTIVE_STATUS != 'F'  AND 
(END_DATE IS NULL OR ($2 < END_DATE)) AND (($3 NOT BETWEEN SUSPEND_DATE AND 
RESUME_DATE) OR SUSPEND_DATE IS NULL OR RESUME_DATE IS NULL) AND 
T1.PRODUCT_CODE = T2.PRODUCT_CODE AND T1.LAST_RUN_DATE<$4 AND T2.PRODUCT_VALUE 
IN ($5) AND MOD(ABS(HASHTEXT(T1.ENTRY_GUID)),150) = $6 ORDER BY T1.PRIORITY, 
T1.ENTRY_FREQUENCY_CODE, T1.NEXT_RUN_DATE FOR UPDATE SKIP LOCKED LIMIT 1
 Limit  (cost=0.42..4.63 rows=1 width=64) (actual time=1.695..1.696 rows=0 
loops=1)
   Buffers: shared hit=841
   ->  LockRows  (cost=0.42..1393.48 rows=331 width=64) (actual 
time=1.695..1.695 rows=0 loops=1)
 Buffers: shared hit=841
 ->  Nested Loop  (cost=0.42..1390.17 rows=331 width=64) (actual 
time=0.032..0.812 rows=318 loops=1)
   Join Filter: (t1.product_code = t2.product_code)
   Buffers: shared hit=337
   ->  Index Scan using entry_guid_mod_idx on entry t1  
(cost=0.42..1384.18 rows=331 width=63) (actual time=0.016..0.619 rows=318 
loops=1)
 Index Cond: ((mod(abs(hashtext((entry_guid)::text)), 
150) = 141) AND (next_run_date < '2023-11-29 12:12:38.535'::timestamp without 
time zone) AND (last_run_date < '2023-11-29 12:12:38.535'::timestamp without 
time zone)
AND (status_code = '1'::numeric))
 Filter: (((end_date IS NULL) OR ('2023-11-29 
12:12:38.535'::timestamp without time zone < end_date)) AND (('2023-11-29 
12:12:38.535'::timestamp without time zone < suspend_date) OR ('2023-11-29 
12:12:38.535'::timestamp
without time zone > resume_date) OR 
(suspend_date IS NULL) OR (resume_date IS NULL)))
 Buffers: shared hit=336
   ->  Materialize  (cost=0.00..1.02 rows=1 width=20) (actual 
time=0.000..0.000 rows=1 loops=318)
 Buffers: shared hit=1
 ->  Seq Scan on product t2  (cost=0.00..1.01 rows=1 
width=20) (actual time=0.013..0.013 rows=1 loops=1)
   Filter: ((active_status <> 'F'::bpchar) AND 
((product_value)::text = 'bchalert-poc'::text))
   Buffers: shared hit=1

2023-11-29 18:12:45 
UTC:10.210.202.108(50006):alertu@pgbatchshared01aq:[15226]:LOG:  duration: 
31.241 ms  plan:
 Query Text: /*+ NestLoop(t1 t2) IndexS

RE: [EXT] Re: Query runtime differences- trying to understand why.

2023-11-29 Thread Dirschel, Steve
Sorry.  DB is Aurora Postgres 14.6.  Hints are enabled via the pg_hint_plan 
extension.  But my question isn't so much about the hints as about the 
execution plan: where the time is going, and why skipping locked rows appears 
to take much more time in one execution of the query than in another.

Regards

-Original Message-
From: Andreas Kretschmer  
Sent: Wednesday, November 29, 2023 3:10 PM
To: pgsql-general@lists.postgresql.org
Subject: [EXT] Re: Query runtime differences- trying to understand why.

External Email: Use caution with links and attachments.



Am 29.11.23 um 21:25 schrieb Dirschel, Steve:
>
> I have a question on the execution time of a query and the 
> fluctuations I'm seeing.  I enabled auto_trace to capture some actual 
> executions of a query by an app.  Below are 2 executions of it.  The 
> top one took 1.697 milliseconds. The 2nd one took 31.241 milliseconds.  
> Note the query has hints in it
>

what database are you using? PostgreSQL doesn't have hints...

Regards, Andreas

--
Andreas Kretschmer - currently still (garden leave) Technical Account Manager 
(TAM) 
http://www.enterprisedb.com





Using ALTER TABLE DETACH PARTITION CONCURRENTLY inside a procedure

2024-05-14 Thread Dirschel, Steve
We have a custom procedure to add/drop partitions in Postgres.  It has a main 
FOR LOOP to find the tables needing to be processed.  Inside that FOR LOOP there 
is a BEGIN block so that, if a single table gets an error, we catch it in an 
EXCEPTION handler.  After the END of that inner block, still inside the FOR 
LOOP, it issues a COMMIT.  So if there are 20 tables to process and the 15th 
table gets an error, it will capture that error, write a message to a log file, 
and continue processing the remaining tables.
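
A rough sketch of that structure, with hypothetical names for the driving table 
and for the work done per table:

CREATE OR REPLACE PROCEDURE part.partition_maintenance()
LANGUAGE plpgsql
AS $proc$
DECLARE
  r record;
BEGIN
  FOR r IN SELECT table_schema, table_name FROM part.partition_config LOOP
    BEGIN
      -- add/drop/detach partitions for this table (placeholder work)
      EXECUTE format('alter table %I.%I detach partition %I.%I',
                     r.table_schema, r.table_name,
                     r.table_schema, r.table_name || '_old');
    EXCEPTION WHEN OTHERS THEN
      -- log the error and keep going with the remaining tables
      RAISE NOTICE 'skipping %.%: %', r.table_schema, r.table_name, SQLERRM;
    END;
    COMMIT;  -- per-table commit, outside the exception block
  END LOOP;
END;
$proc$;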

We have run into some locking issues when trying to DETACH a partition: if 
there is a long-running query against the partitioned table, it will block the 
DETACH PARTITION command.  Then app sessions trying to insert into the table 
get blocked by the session trying to detach the partition, and the app gets 
into a bad state.  In testing I found that if I use the CONCURRENTLY clause 
with DETACH PARTITION, the detach can still get blocked by a long-running 
query, but it does not block app sessions from inserting into the table.  So 
this is great.

But when I try and run the command inside the procedure it throws this error:

STATE: 25001 MESSAGE: ALTER TABLE ... DETACH CONCURRENTLY cannot run inside a 
transaction block DETAIL:  HINT:  CONTEXT: SQL statement "alter table t2.test1 
detach partition t2.test1_gentime_20240511 concurrently"
PL/pgSQL function part.partition_maintenance() line 323 at EXECUTE

The documentation states:

CONCURRENTLY cannot be run in a transaction block and is not allowed if the 
partitioned table contains a default partition.

Is there a way to run DETACH PARTITION CONCURRENTLY inside a procedure as I describe?

Thanks in advance.



Hash join and picking which result set to build the hash table with.

2024-05-22 Thread Dirschel, Steve
The query and execution plan are shown below.  My question is related to the 
result set the optimizer is choosing to build the hash table from.  My 
understanding is for a hash join you want to build the hash table out of the 
smaller result set.  If you look at the execution plan below you can see the 
optimizer estimates 1000 rows from the seq scan of table collection and 120,000 
rows from the seq scan of table docloc_test, but it is building the hash buckets 
from those 120,000 rows rather than from the 1000 rows.  In our case, under 
certain volumes, that causes the hash to spill to temp, and under high load it 
hangs up the Aurora Postgres database.  But if it were to build the hash table out of 
those 1000 rows it would fit in work_mem so no problems.  Why is it picking the 
larger result set?
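
For reference, a minimal sketch (hypothetical tables) of the plan shape in 
question: the input under the Hash node is the build side, and it is normally 
expected to be the smaller of the two; if it does not fit in work_mem the join 
goes multi-batch and spills to temp files.

EXPLAIN
SELECT *
  FROM docloc_big d
  JOIN collection_small c ON c.collection_name = d.collection_name;

-- Approximate expected shape when the smaller input is hashed:
--   Hash Join
--     Hash Cond: ((d.collection_name)::text = (c.collection_name)::text)
--     ->  Seq Scan on docloc_big d          -- probe side (large input)
--     ->  Hash                              -- build side (hashed into work_mem)
--           ->  Seq Scan on collection_small c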

Another funky thing here-  here are the table definitions:

pgcci01ap=> \d CCI.COLLECTION

Column|  Type  | Collation | Nullable | 
   Default
--++---+--+
 collection_name  | character varying(40)  |   | not null |
 l_stage  | numeric(11,0)  |   |  |
 p_stage  | numeric(11,0)  |   |  |
 t_stage  | numeric(11,0)  |   |  |
 last_upd_datetime| timestamp(6) without time zone |   | not null |
 last_upd_inits   | character varying(30)  |   | not null |
 owner_name   | character varying(30)  |   |  |
 password | character varying(30)  |   |  |
 email_address| character varying(2000)|   |  |
 available_flag   | character(1)   |   |  |
 collection_id| numeric(11,0)  |   | not null |
 collection_type  | character varying(20)  |   |  |
 retrieval_password   | character varying(40)  |   |  |
 partner_coll_name| character varying(40)  |   |  |
 relation2partner | character varying(20)  |   |  |
 reload_flag  | character(1)   |   |  | 
'N'::bpchar
 partner_id   | character varying(40)  |   |  |
 content_timezone | character varying(40)  |   | not null | 
'America/Chicago'::character varying
 token_type   | character varying(30)  |   |  |
 cc_collection_dest   | character varying(40)  |   |  |
 auto_reclaim_enabled | character(1)   |   | not null | 
'N'::bpchar
 collection_family| character varying(40)  |   | not null | 
''::character varying
 access_password  | character(40)  |   |  |
 mic_group| character varying(40)  |   |  |
 mic_type | character varying(10)  |   |  |
 retrieval_source | character varying(40)  |   | not null | 
'DOC1'::character varying
Indexes:
"xpkcollection" PRIMARY KEY, btree (collection_name)
"xak1collection" UNIQUE CONSTRAINT, btree (collection_id)
"xie1collection" btree (relation2partner)
"xie2collection" btree (collection_family, collection_name)
"xie3collection" btree (mic_group)
Referenced by:
TABLE "cci.index_update_proc" CONSTRAINT "rfk12_index_update_proc" FOREIGN 
KEY (collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.authority_update_proc" CONSTRAINT "rfk1_authority_update_proc" 
FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.collection_event" CONSTRAINT "rfk1_collection_event" FOREIGN KEY 
(collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.collection_pit" CONSTRAINT "rfk1_collection_pit" FOREIGN KEY 
(collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.collection_stage" CONSTRAINT "rfk1_collection_stage" FOREIGN KEY 
(collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.csloc_update_proc" CONSTRAINT "rfk1_csloc_update_proc" FOREIGN 
KEY (collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.docloc_update_proc" CONSTRAINT "rfk1_docloc_update_proc" FOREIGN 
KEY (collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.index_set_mrg" CONSTRAINT "rfk1_index_set_mrg" FOREIGN KEY 
(collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.index_set_stats" CONSTRAINT "rfk1_index_set_stats" FOREIGN KEY 
(collection_name) REFERENCES cci.collection(collection_name)
TABLE "cci.index_system_attr" CONSTRAINT "rfk1_index_system_attr" FOREIGN 
KEY (collection_name) REFERENCES 

Query tuning question

2024-08-21 Thread Dirschel, Steve
Aurora Postgres version 13.7.

Table definition:

acquisition_channel_db=> \d acquisition_channel.acquired_object
Table 
"acquisition_channel.acquired_object"
 Column |   Type   | Collation 
| Nullable | Default
+--+---+--+-
acquired_object_uuid   | character varying(36)|   | 
not null |
acquired_object_name   | text |   | 
not null |
acquired_object_size   | bigint   |   | 
 |
acquisition_run_record_id  | bigint   |   | 
 |
correlation_id | character varying(36)|   | 
 |
acquired_datetime  | timestamp with time zone |   | 
 |
checksum   | character varying(128)   |   | 
 |
acquisition_method_id  | bigint   |   | 
 |
ol_version | integer  |   | 
not null |
created_datetime   | timestamp with time zone |   | 
not null |
updated_datetime   | timestamp with time zone |   | 
 |
status | character varying(50)|   | 
 |
parent_acquired_object_uuid| character varying(36)|   | 
 |
extracted_from_object_path | text |   | 
 |
resource_group_id  | bigint   |   | 
 |
s3_gcs_bucket  | character varying(100)   |   | 
 |
s3_key | character varying(500)   |   | 
 |
metadata_s3_gcs_bucket | character varying(100)   |   | 
 |
metadata_s3_key| character varying(500)   |   | 
 |
routing_result_acquisition_channel_id  | bigint   |   | 
 |
acquired_object_type   | character varying(100)   |   | 
 |
created_by_id  | integer  |   | 
 |
updated_by_id  | integer  |   | 
 |
metadata   | jsonb|   | 
 |
acquired_object_id | bigint   |   | 
not null | nextval('acquisition_channel.acquired_object_id_seq'::regclass)
routed_to_acquisition_channel_datetime | timestamp with time zone |   | 
 |
routed_to_acquisition_channel_id   | bigint   |   | 
 |
rendition_guid | character varying(36)|   | 
 |
revision_guid  | character varying(36)|   | 
 |
channel_availability_status| character varying(100)   |   | 
 |
mime_type  | character varying(100)   |   | 
 |
encryption_public_key_hash | text |   | 
 |
acquired_metadata  | jsonb|   | 
 |
original_acquired_object_name  | text |   | 
 |
acquired_family_uuid   | character varying(36)|   | 
 |
last_broadcast_datetime| timestamp with time zone |   | 
 |
original_checksum  | character varying(128)   |   | 
 |
Indexes:
"acquired_object_pkey" PRIMARY KEY, btree (acquired_object_uuid)
"acquired_family_uuid_idx" btree (acquired_family_uuid)
"acquired_object_acq_method_id_acq_dt_acquired_object_uuid" btree 
(acquisition_method_id, acquired_datetime DESC NULLS LAST, acquired_object_uuid)
"acquired_object_acquired_items_initial_ui_page" btree 
(acquisition_method_id, acquired_datetime DESC NULLS LAST, 
acquired_object_uuid) WHERE status::text <> 'TEST'::text AND 
parent_acquired_object_uuid IS NULL
"acquired_object_acquired_object_id_unq" UNIQUE, btree (acquired_object_id)
"acquired_object_acquired_object_name" btree (lower(acquired_object_name))
"acquired_object_acquisition_method_id" btree (acquisition_method_id)
"acquired_object_acquisition_run_record_id" btree 
(acquisition_run_record_id)
"acquired_object_checksum" btree (checksum)
"acquired_object_correlation_id" btree (correlation_id)
"acquired_object_idx2" btree (parent_acquired_object_uuid, 
extracted_from_object_path)
"a

pg_stat_statements

2022-01-11 Thread Dirschel, Steve
Hello,

I'm not sure if this is the correct distribution list for this type of question, 
but I'll try anyway.  We have an app that uses multiple schemas.  It will do a 
set schema 'schema_name' and then execute queries.  The queries executed are the 
same regardless of which schema the connection has set.

In pg_stat_statements the exact same query will get a different queryid for 
each schema that executes the query.
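
For illustration, a hedged reproduction sketch (schema and table names are made 
up): as I understand it, the queryid is computed from the post-parse-analysis 
tree, in which the table reference has already been resolved to a specific 
schema's table, so identical query text hashes to a different queryid per schema.

SET SCHEMA 'schema_a';
SELECT count(*) FROM orders;   -- resolves to schema_a.orders

SET SCHEMA 'schema_b';
SELECT count(*) FROM orders;   -- resolves to schema_b.orders

-- Two rows, two different queryids, same normalized text:
SELECT queryid, calls, query
  FROM pg_stat_statements
 WHERE query ILIKE 'select count(*) from orders%';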

I'm unable to determine which queryid comes from which schema the query was 
executed under.  Is anyone aware of a way to determine this?

Thanks in advance.

Steve


Trying to dynamically create a procedure

2025-03-26 Thread Dirschel, Steve
Hi,

I have the need to dynamically create a procedure.  Here is a simple procedure:

create or replace procedure junk.test_proc()
LANGUAGE plpgsql
AS $$
declare
  v_cnt integer := 0;
begin
  raise notice 'v_cnt is %', v_cnt;
end $$;

That creates and runs fine.

Here I’m trying to create it inside a PL/pgSQL block (yes, there is nothing 
dynamic below, but the real code will have parts of the procedure that need to 
be dynamically generated):


DO $$
BEGIN
EXECUTE 'create or replace procedure junk.test_proc() ' ||
  'LANGUAGE plpgsql  '  ||
  'AS $$ '||
  'declare  '  ||
  '  v_cnt integer := 0; '   ||
  'begin '||
  '  raise notice 'v_cnt is %', v_cnt; ' ||
  'end $$';
END;
$$

It throws this error:

ERROR:  syntax error at or near "$$
DO $$"
LINE 1: $$
^
dbtest=>
dbtest=> END;
WARNING:  there is no transaction in progress
COMMIT
dbtest=>
dbtest=> $$


I think the problem has to do with having AS $$ and END $$ with the 2 $’s.  I’m 
not sure if there is different syntax I can use outside the $$ or if there is 
something I should use in the PL/pgSQL to escape those $$ to get this to work.
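
One avenue that may be worth trying (a hedged sketch, not tested against the 
real code): dollar quoting allows distinct tags, so the outer DO block, the 
string passed to EXECUTE, and the procedure body can each use their own 
delimiter and the nested quotes no longer collide:

DO $do$
BEGIN
  EXECUTE $def$
    create or replace procedure junk.test_proc()
    LANGUAGE plpgsql
    AS $body$
    declare
      v_cnt integer := 0;
    begin
      raise notice 'v_cnt is %', v_cnt;
    end
    $body$;
  $def$;
END;
$do$;

The genuinely dynamic pieces could then be spliced into the $def$ string with 
format() or concatenation without worrying about the $$ delimiters.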

Any help would be appreciated.

Thanks


Postgres_fdw- User Mapping with md5-hashed password

2025-04-08 Thread Dirschel, Steve
I know I can create user steve_test with password testpassword122 as md5 by 
doing:

select 'md5'||md5('testpassword122steve_test');  Returns --> 
md5eb7e220574bf85096ee99370ad67cbd3

CREATE USER steve_test WITH  PASSWORD 'md5eb7e220574bf85096ee99370ad67cbd3';

And then I can login as steve_test with password testpassword122.

I'm trying to use similar logic when creating a user mapping:

CREATE USER MAPPING FOR postgres SERVER steve_snap0 OPTIONS (user 'steve_test', 
password 'md5eb7e220574bf85096ee99370ad67cbd3');

When I try and import a foreign schema I get an error:

ERROR:  could not connect to server "steve_snap0"

If I create the user mapping with the password:

CREATE USER MAPPING FOR postgres SERVER steve_snap0 OPTIONS (user 'steve_test', 
password 'testpassword122');

It works fine.

Is it not possible to use the same logic for the user mapping password that can 
be used when creating a user?
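
For what it's worth, a hedged note: as I understand it, postgres_fdw passes the 
user mapping's password option to the remote server verbatim as the libpq 
password at connection time, so an md5 hash stored there is treated as if it 
were the literal password.  The stored options can be inspected (as superuser) 
via pg_user_mappings:

SELECT srvname, usename, umoptions
  FROM pg_user_mappings
 WHERE srvname = 'steve_snap0';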

Thanks in advance.