Postgres NOT IN vs NOT EXISTS optimization
We are in the process of migrating from Oracle to Postgres, and the following query does far more work on Postgres than it does on Oracle.

explain (analyze, buffers)
select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_, favoritegr0_.CHANGED as changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as type_dis3_2_, favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION as position7_2_, favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID as product_9_2_, favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE as usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
from cf0.FAVORITE_GROUP favoritegr0_
where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
  and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView')
  and (favoritegr0_.FAVORITE_GROUP_SID not in
        (select favoriteen1_.FAVORITE_GROUP_SID
         from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
         cross join cf0.CATEGORY_PAGE categorypa2_
         where favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
           and categorypa2_.UNIQUE_NAME='Florida'
           and categorypa2_.IS_DELETED=0
           and favoriteen1_.IS_DELETED=0))
  and favoritegr0_.IS_DELETED=0
  and (favoritegr0_.USAGE_TYPE=0 or favoritegr0_.USAGE_TYPE is null)
  and favoritegr0_.PRISM_GUID='ia7448342012ca23eacf87bb0ed56'
order by favoritegr0_.POSITION desc;

Here is the plan in Postgres. It did 1426 shared block hits. If you look at this plan, it is not pushing the outer row's filtering into the NOT IN subquery; it fully resolves that part of the query, driving off UNIQUE_NAME = 'Florida'.

QUERY PLAN
Sort (cost=5198.22..5198.22 rows=1 width=144) (actual time=6.559..6.560 rows=1 loops=1)
  Sort Key: favoritegr0_."position" DESC
  Sort Method: quicksort Memory: 25kB
  Buffers: shared hit=1426
  -> Index Scan using favorite_group_idx01 on favorite_group favoritegr0_ (cost=5190.18..5198.21 rows=1 width=144) (actual time=6.514..6.515 rows=1 loops=1)
       Index Cond: (((prism_guid)::text = 'ia7448342012ca23eacf87bb0ed56'::text) AND (is_deleted = 0))
       Filter: ((NOT (hashed SubPlan 1)) AND ((usage_type = 0) OR (usage_type IS NULL)) AND ('FORMS.WESTLAW'::text = (product_sid)::text) AND ((product_view)::text = 'DefaultProductView'::text))
       Buffers: shared hit=1423
       SubPlan 1
         -> Nested Loop (cost=0.70..5189.90 rows=1 width=33) (actual time=6.459..6.459 rows=0 loops=1)
              Buffers: shared hit=1417
              -> Index Scan using category_page_idx04 on category_page categorypa2_ (cost=0.42..5131.71 rows=7 width=33) (actual time=0.035..6.138 rows=92 loops=1)
                   Index Cond: ((unique_name)::text = 'Florida'::text)
                   Filter: (is_deleted = 0)
                   Buffers: shared hit=1233
              -> Index Scan using favorite_group_member_idx03 on favorite_group_member favoriteen1_ (cost=0.28..8.30 rows=1 width=66) (actual time=0.003..0.003 rows=0 loops=92)
                   Index Cond: ((category_page_sid)::text = (categorypa2_.category_page_sid)::text)
                   Filter: (is_deleted = 0)
                   Buffers: shared hit=184
Planning Time: 1.624 ms
Execution Time: 6.697 ms

If I compare that to the plan Oracle uses, Oracle pushes the favoritegr0_.FAVORITE_GROUP_SID predicate into the NOT IN subquery.
I'm able to get a similar plan with Postgres if I change the NOT IN to a NOT EXISTS: explain (analyze, buffers) select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_, favoritegr0_.CHANGED as changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as type_dis3_2_, favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION as position7_2_, favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID as product_9_2_, favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE as usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_ from cf0.FAVORITE_GROUP favoritegr0_ where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView') and not exists ( select 'x' from cf0.FAVORITE_GROUP_MEMBER favoriteen1_ cross join cf0.CATEGORY_PAGE categorypa2_ where favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID and categorypa2_.UNIQUE_NAME='Florida' and categorypa2_.IS_DELETED=0 and favoriteen1_.IS_DELETED=0 and favoritegr0_
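For reference, a minimal sketch of the rewrite pattern (the NOT EXISTS text above is truncated, so the correlation predicate is inferred from the NOT IN form; note also that NOT IN and NOT EXISTS are only interchangeable when the subquery column cannot produce NULLs, since a single NULL makes NOT IN return no rows at all):

-- NOT IN: the planner runs the subquery once as a hashed SubPlan, so the
-- outer row cannot narrow the subquery's scan.
-- NOT EXISTS: the subquery is correlated, so the planner can treat it as an
-- anti join and probe FAVORITE_GROUP_MEMBER by the outer FAVORITE_GROUP_SID.
select favoritegr0_.FAVORITE_GROUP_SID
from cf0.FAVORITE_GROUP favoritegr0_
where favoritegr0_.PRISM_GUID = 'ia7448342012ca23eacf87bb0ed56'
  and favoritegr0_.IS_DELETED = 0
  and not exists (
        select 1
        from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
        join cf0.CATEGORY_PAGE categorypa2_
          on favoriteen1_.CATEGORY_PAGE_SID = categorypa2_.CATEGORY_PAGE_SID
        where categorypa2_.UNIQUE_NAME = 'Florida'
          and categorypa2_.IS_DELETED = 0
          and favoriteen1_.IS_DELETED = 0
          and favoriteen1_.FAVORITE_GROUP_SID = favoritegr0_.FAVORITE_GROUP_SID);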
Tuning a query with ORDER BY and LIMIT
I am fairly new to tuning Postgres queries. I have a long background tuning Oracle queries. Postgres version 10.11.

Here is the DDL for the index the query is using:

create index workflow_execution_initial_ui_tabs
    on workflow_execution (workflow_id asc, status asc, result asc, completed_datetime desc);

explain (analyze, verbose, costs, buffers, timing, summary, hashes)
select *
from workflow_execution
where workflow_id = 14560
  and status = 'COMPLETED'
  and result in ('SUCCEEDED','REEXECUTED','ABORTED','DISCONTINUED','FAILED','PARTIAL_SUCCESS')
order by completed_datetime desc
limit 50;

Limit (cost=56394.91..56395.04 rows=50 width=1676) (actual time=3400.608..3400.622 rows=50 loops=1)
  Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
  Buffers: shared hit=142368
  -> Sort (cost=56394.91..56432.71 rows=15118 width=1676) (actual time=3400.607..3400.615 rows=50 loops=1)
       Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
       Sort Key: workflow_execution.completed_datetime DESC
       Sort Method: top-N heapsort Memory: 125kB
       Buffers: shared hit=142368
       -> Index Scan using workflow_execution_initial_ui_tabs on workflow.workflow_execution (cost=0.69..55892.70 rows=15118 width=1676) (actual time=0.038..2258.579 rows=2634718 loops=1)
            Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
            Index Cond: ((workflow_execution.workflow_id = 14560) AND ((workflow_execution.status)::text = 'COMPLETED'::text) AND ((workflow_execution.result)::text = ANY ('{SUCCEEDED,REEXECUTED,ABORTED,DISCONTINUED,FAILED,PARTIAL_SUCCESS}'::text[])))
            Buffers: shared hit=142368
Planning time: 0.217 ms
Execution time: 3400.656 ms

With Oracle, for a query like this, since the index is on the 3 columns matching the WHERE clause and the ORDER BY column is in the 4th position, Oracle is able to scan that index and stop as soon as it finds the first 50 matching rows. But as you can see above, Postgres finds 2,634,718 rows matching the WHERE clause, sorts them, and then returns the first 50 rows.
I was questioning if the result IN clause was causing the issue so I ran the query with result = and see the same results: explain (analyze, verbose, costs, buffers, timing, summary, hashes) select * from workflow_execution where workflow_id = 14560 and status = 'COMPLETED' and result = 'SUCCEEDED' order by completed_datetime desc limit 50; Limit (cost=54268.09..54268.22 rows=50 width=1676) (actual time=3372.453..3372.467 rows=50 loops=1) " Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid" Buffers: shared hit=140313 -> Sort (cost=54268.09..54304.46 rows=14547 width=1676) (actual time=3372.452..3372.460 rows=50 loops=1) "Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_
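One workaround for this access pattern (a hedged sketch, not run against this schema): because completed_datetime only follows result in the index, the index delivers rows already ordered by completed_datetime for a single result value only, so the query can probe once per IN-list value and merge the small ordered results:

select *
from (
    (select * from workflow_execution
      where workflow_id = 14560 and status = 'COMPLETED' and result = 'SUCCEEDED'
      order by completed_datetime desc limit 50)
    union all
    (select * from workflow_execution
      where workflow_id = 14560 and status = 'COMPLETED' and result = 'REEXECUTED'
      order by completed_datetime desc limit 50)
    union all
    -- ... one branch per remaining result value ...
    (select * from workflow_execution
      where workflow_id = 14560 and status = 'COMPLETED' and result = 'PARTIAL_SUCCESS'
      order by completed_datetime desc limit 50)
) t
order by completed_datetime desc
limit 50;

Each branch can read workflow_execution_initial_ui_tabs in index order and stop after 50 rows, so at most 50 rows per IN-list value are fetched instead of 2.6 million.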
log_min_messages = warning
We recently upgraded from postgres 12.8 to 14.3. We are running Aurora Postgres on AWS. We have procedures that make calls to RAISE NOTICE to write out messages when you interactively call the procedure through psql. These procedures are getting called by pg_cron. Since upgrading, these RAISE NOTICE messages are getting written to the postgres log file. For example:

13:45:00.720882-05
CONTEXT: PL/pgSQL function perf.snap_stats() line 242 at RAISE
2022-09-06 18:45:01 UTC::@:[21535]:LOG: cron job 2: NOTICE: Snapping dba_hist_system_event at 2022-09-06 13:45:00.725818-05

log_min_messages is set to warning, so my understanding is that it should only log messages at warning, error, log, fatal, or panic. Any idea how to troubleshoot why these are getting written to the log file?

Thanks
RE: [EXT] Re: log_min_messages = warning
>On 9/6/22 12:07, Dirschel, Steve wrote: >> We recently upgraded from postgres 12.8 to 14.3. We are running >> Aurora Postgres on AWS. >> >> We have procedures that will make calls to RAISE NOTICE to write out >> messages if you interactively call the procedure through psql. These >> procedures are getting called by pg_cron. >> >> Since upgrading these RAISE NOTICE messages are getting written to the >> postgres log file: >> >> For example: >> >> 13:45:00.720882-05 >> >> CONTEXT: PL/pgSQL function perf.snap_stats() line 242 at RAISE >> >> 2022-09-06 18:45:01 UTC::@:[21535]:LOG: cron job 2: NOTICE: Snapping >> dba_hist_system_event at 2022-09-06 13:45:00.725818-05 >> >> Log_min_messages is set to warning so my understanding is that should >> only log messages at warning, error, log, fatal, or panic. Any idea >> how to troubleshoot why these are getting written to the log file? > > >1) Has log_min_messages been commented out? > >2) Was the setting changed from something else? > >3) If 2) was the server reloaded/restarted to catch the change? > >4) Are you sure you are looking at the correct setting? > >Try: > >select setting, source, sourcefile, pending_restart from pg_settings where >name = 'log_min_messages'; > >to see if the value is coming from something like an include file. > >> >> Thanks >> >>This e-mail is for the sole use of the intended recipient and contains >> information that may be privileged and/or confidential. If you are not >> an intended recipient, please notify the sender by return e-mail and >> delete this e-mail and any attachments. Certain required legal entity >> disclosures can be accessed on our website: >> https://www.thomsonreuters.com/en/resources/disclosures.html >> > >-- >Adrian Klaver >adrian.kla...@aklaver.com shgroup02s=> select setting, source, sourcefile, pending_restart from pg_settings where name = 'log_min_messages'; setting | source | sourcefile | pending_restart -+-++- warning | default || f (1 row) Regards Steve
RE: [EXT] Re: log_min_messages = warning
We recently upgraded from postgres 12.8 to 14.3. We are running Aurora Postgres on AWS. >Aurora Postgres != Postgres. This doesn't seem like something they'd change, >but we can't rule that out entirely. >>> Try: >>> select setting, source, sourcefile, pending_restart from pg_settings >>> where name = 'log_min_messages'; >> shgroup02s=> select setting, source, sourcefile, pending_restart from >> pg_settings where name = 'log_min_messages'; setting | source | >> sourcefile | pending_restart >> -+-++- >> warning | default || f >> (1 row) >OK, so log_min_messages has the expected value in your interactive session, >and yet evidently not in the sessions running these RAISE NOTICE commands. >Maybe you have done ALTER USER SET or ALTER DATABASE SET commands that'd >change it in >those sessions? You could look in the pg_db_role_setting >catalog to answer that definitively. > regards, tom lane We have opened a case with AWS on this issue as well since it is Aurora but I decided to reach out to the community as well. We have pg_cron (which is kicking off the procedures that are executing the code that is causing these NOTICE messages to get written to the log file) setup with cron.database_name = shgroup02s. That database is oid 16436. shgroup02s=> select * from pg_db_role_setting ; setdatabase | setrole | setconfig -+-+-- 16401 | 0 | {auto_explain.log_min_duration=-1} 16401 | 10 | {log_min_messages=panic} 16436 | 0 | {TimeZone=America/Chicago} 0 | 10 | {TimeZone=utc,log_statement=all,log_min_error_statement=debug5,log_min_messages=panic,exit_on_error=0,statement_timeout=0,role=rdsadmin,auto_explain.log_min_duration=-1,temp_file_limit= -1,"search_path=pg_catalog, public",pg_hint_plan.enable_hint=off,default_transaction_read_only=off} (4 rows) If I login to the shgroup02s database as postgres user (using psql) and interactively call a procedure that cron calls which causes the RAISE NOTICE commands to be written to the log they do NOT get written to the log when I call the procedure. The messages will be displayed on my screen but I don't see them getting written to the log. I see cron has a setting- cron.log_min_messages which is set to warning so it doesn't seem to come from that. Regards Steve
RE: [EXT] Re: log_min_messages = warning
> "Dirschel, Steve" writes: >> setdatabase | setrole | >> >> setconfig >> -+-+-- >> -+-+-- >> -+-+-- >> -+-+-- >> -+-+-- >> >>16401 | 0 | {auto_explain.log_min_duration=-1} >>16401 | 10 | {log_min_messages=panic} >>16436 | 0 | {TimeZone=America/Chicago} >>0 | 10 | >> {TimeZone=utc,log_statement=all,log_min_error_statement=debug5,log_min_messages=panic,exit_on_error=0,statement_timeout=0,role=rdsadmin,auto_explain.log_min_duration=-1,temp_file_limit= >> -1,"search_path=pg_catalog, >> public",pg_hint_plan.enable_hint=off,default_transaction_read_only=off >> } >> (4 rows) >> If I login to the shgroup02s database as postgres user (using psql) >> and interactively call a procedure that cron calls which causes the >> RAISE NOTICE commands to be written to the log they do NOT get written >> to the log when I call the procedure. The messages will be displayed >> on my screen but I don't see them getting written to the log. >You've evidently done "ALTER USER postgres SET log_min_messages=panic", so the >lack of any messages under that userid is unsurprising. But we're not a lot >closer to being sure why it's different for the procedures' normal execution >environment. >At this point it's hard to avoid the conclusion that those procedures' >session is changing the value itself. (This is scary, because it implies that >you're running those as superuser, which seems like a bad idea.) You might >have to enable log_statement = all to verify that. > regards, tom lane I appreciate your feedback Tom. To simplify this I created this procedure: create or replace procedure part.test1() LANGUAGE plpgsql AS $$ DECLARE BEGIN raise notice '* raise notice test *'; END $$; If I call that through PSQL this is returned to my screen: shgroup02s=> call part.test1(); NOTICE: * raise notice test * CALL And if I monitor the log file nothing is written to it. But if I schedule that procedure through pg_cron: SELECT cron.schedule('Minute test', '* * * * *', 'call part.test1()'); If I monitor the log file I see it writing this to the log- it actually writes it out 2 times: 2022-09-07 12:54:33 UTC::@:[21535]:LOG: cron job 6: NOTICE: * raise notice test * CONTEXT: PL/pgSQL function part.test1() line 6 at RAISE 2022-09-07 12:54:33 UTC::@:[21535]:LOG: cron job 6: NOTICE: * raise notice test * CONTEXT: PL/pgSQL function part.test1() line 6 at RAISE If I create another test procedure: create or replace procedure part.test2() LANGUAGE plpgsql AS $$ DECLARE BEGIN raise exception '* raise ERROR test *'; END $$; When I execute that through PSQL this is returned: shgroup02s=> call part.test2(); ERROR: * raise ERROR test * And in the log file I see this written: 2022-09-07 12:58:00 UTC:10.210.119.217(42434):postgres@shgroup02s:[15409]:ERROR: * raise ERROR test * 2022-09-07 12:58:00 UTC:10.210.119.217(42434):postgres@shgroup02s:[15409]:CONTEXT: PL/pgSQL function part.test2() line 6 at RAISE 2022-09-07 12:58:00 UTC:10.210.119.217(42434):postgres@shgroup02s:[15409]:STATEMENT: call part.test2(); So the raise exception gets written to the log file when I execute it through PSQL which I believe is expected. I believe this clearly points to an issue with pg_cron. Would you agree? Regards Steve
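If it helps to pin down what the pg_cron worker sessions are actually running with, one possible diagnostic (a sketch; the snapshot table name is made up for illustration) is to have cron record its own effective settings:

create table if not exists perf.cron_settings_check (
    captured_at timestamptz,
    name        text,
    setting     text,
    source      text
);

select cron.schedule('settings-check', '* * * * *', $$
    insert into perf.cron_settings_check
    select now(), name, setting, source
    from pg_settings
    where name in ('log_min_messages', 'client_min_messages')
$$);

If the worker reports the same warning value seen interactively, that would support the conclusion that pg_cron itself is re-emitting the captured NOTICE output at LOG level rather than the setting being different in those sessions.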
pg_stat_activity.backend_xmin
We are troubleshooting an issue where autovacuum is not cleaning up a table. The application using this database runs with autocommit turned off. We can see in pg_stat_activity lots of sessions "idle in transaction" even though those sessions have not executed any DML- they have executed selects but no DML. The database's isolation level is set to read committed. In a test database if I login through psql and set autocommit off and issue a select I can see my session in pg_stat_activity has xact_start populated but backend_xmin is null. If I run vacuum against the table I selected from (that has garbage rows that need to be cleaned up) it will clean them up. But if I do a "set transaction isolation level repeatable read" and then do the select pg_stat_activity xact_start is populated and backend_xmin is also populated. In a different session if I delete/insert into the table I selected from and then run vacuum against the table those rows will not get cleaned up because the xmin is of the rows that need to get cleaned up are higher (or is it lower) than the backend_xmin of my select session. That scenario is the scenario we are seeing through the application. According to the app team they are not aware of their code changing the isolation level to repeatable read. Are there other scenarios where the transaction isolation is set to read committed, the session has autocommit off, and a select will populate backend_xmin in pg_stat_activity for the session that issued the select? In session #1: \set AUTOCOMMIT off Set transaction isolation level repeatable read; Select * from test1; In session #2: Delete from test1; Insert into test1 values (1); ... do 10 more inserts, delete the rows, do 10 more inserts, delete the rows. ... vacuum(verbose) test1; INFO: vacuuming "public.test1" INFO: "test1": found 0 removable, 55 nonremovable row versions in 1 out of 1 pages DETAIL: 44 dead row versions cannot be removed yet, oldest xmin: 32708199 There were 172 unused item identifiers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: vacuuming "pg_toast.pg_toast_329295" INFO: "pg_toast_329295": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 32708199 There were 0 unused item identifiers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM The xmin value of 32708199 is the value in pg_stat_activity.backend_xmin for the session who issued the select. Once I issue a commit in that session and then try autovacuum again: vacuum(verbose) test1; INFO: vacuuming "public.test1" INFO: "test1": removed 44 row versions in 1 pages INFO: "test1": found 44 removable, 11 nonremovable row versions in 1 out of 1 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 32708957 There were 172 unused item identifiers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: vacuuming "pg_toast.pg_toast_329295" INFO: "pg_toast_329295": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 32708957 There were 0 unused item identifiers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. 
VACUUM

Thanks
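For monitoring, a query along these lines (a sketch) lists the sessions whose snapshots are currently holding back cleanup, oldest xmin first:

select pid, usename, state, xact_start,
       backend_xmin, age(backend_xmin) as xmin_age
from pg_stat_activity
where backend_xmin is not null
order by age(backend_xmin) desc;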
RE: [EXT] Re: pg_stat_activity.backend_xmin
On Wed, 2022-09-21 at 14:11 +, Dirschel, Steve wrote: >> We are troubleshooting an issue where autovacuum is not cleaning up a table. >> The application using this database runs with autocommit turned off. >> We can see in pg_stat_activity lots of sessions “idle in transaction” >> even though those sessions have not executed any DML- they have >> executed selects but no DML. The database’s isolation level is set to read >> committed. > "backend_xmin" is set when the session has an active snapshot. Such a > snapshot is held for the whole duration of a transaction in the REPEATABLE > READ isolation level, but there are cases where you can see that in READ > COMMITTED isolation level > as well: > > - if there is a long running query > > - if there is a cursor open > > Perhaps you could ask your developers if they have long running read-only > transactions with cursors. > > Yours, > Laurenz Albe Thanks for the reply Laurenz. For an application session in this "state" pg_stat_activity shows the state of "idle in transaction" and backend_xmin is populated. The query shows the last select query it ran. It is not currently executing a query. And dev has said they are not using a cursor for the query. So it does not appear they have long running read-only transactions with cursors. Outside that scenario can you think of any others where a session: 1. Login to the database 2. Set autocommit off 3. Run select query, query completes, session does nothing after that query completes. 4. transaction isolation level is read committed That session sitting there idle in transaction has backend_xmin populated. When I run that test backend_xmin does not get populated unless I set my transaction isolation level to repeatable read. We have enabled statement logging so we can see if their sessions are changing that transaction isolation level behind the scenes that they are not aware of but so far we have not seen that type of command logged. Regards Steve
RE: [EXT] pg_stat_activity.backend_xmin
>> On Sep 21, 2022, at 9:32 AM, Dirschel, Steve >> wrote: >> >> On Wed, 2022-09-21 at 14:11 +, Dirschel, Steve wrote: >>>> We are troubleshooting an issue where autovacuum is not cleaning up a >>>> table. >>>> The application using this database runs with autocommit turned off. >>>> We can see in pg_stat_activity lots of sessions “idle in transaction” >>>> even though those sessions have not executed any DML- they have >>>> executed selects but no DML. The database’s isolation level is set to >>>> read committed. >> >>> "backend_xmin" is set when the session has an active snapshot. Such a >>> snapshot is held for the whole duration of a transaction in the REPEATABLE >>> READ isolation level, but there are cases where you can see that in READ >>> COMMITTED isolation level > as well: >>> >>> - if there is a long running query >>> >>> - if there is a cursor open >>> >>> Perhaps you could ask your developers if they have long running read-only >>> transactions with cursors. >>> >>> Yours, >>> Laurenz Albe >> >> Thanks for the reply Laurenz. For an application session in this "state" >> pg_stat_activity shows the state of "idle in transaction" and backend_xmin >> is populated. The query shows the last select query it ran. It is not >> currently executing a query. And dev has said they are not using a cursor >> for the query. So it does not appear they have long running read-only >> transactions with cursors. >> >> Outside that scenario can you think of any others where a session: >> 1. Login to the database >> 2. Set autocommit off >> 3. Run select query, query completes, session does nothing after that query >> completes. >> 4. transaction isolation level is read committed >> >> That session sitting there idle in transaction has backend_xmin populated. >> When I run that test backend_xmin does not get populated unless I set my >> transaction isolation level to repeatable read. We have enabled statement >> logging so we can see if their sessions are changing that transaction >> isolation level behind the scenes that they are not aware of but so far we >> have not seen that type of command logged. >> >> Regards >> Steve > >What stack is the application using? Anything like Spring or Hibernate >involved? Java is the stack. Thanks
Interpreting postgres execution plan along with AND/OR precedence
Table definition: Table "load.lm_queue" Column | Type | Collation | Nullable | Default ++---+--+- guid | character(33) | | not null | host_name | character varying(40) | | | priority | numeric(11,0) | | not null | request_time | timestamp(6) without time zone | | not null | collection_name| character varying(40) | | not null | stage_id | numeric(11,0) | | | source_file| character varying(250) | | | lm_id | numeric(11,0) | | | start_time | timestamp(6) without time zone | | | status_text| character varying(225) | | not null | NULL::character varying load_data_id | character varying(500) | | | docs_in_load | numeric(11,0) | | | client_name| character varying(50) | | | status_code| numeric(11,0) | | | email_address | character varying(2000)| | | hold_flag | character(1) | | | process_type | character varying(40) | | | cancel_flag| character(1) | | | file_type | character varying(6) | | | lm_data| character varying(4000)| | | ds_request_time| timestamp(6) without time zone | | | ds_id | numeric(11,0) | | | ds_start_time | timestamp(6) without time zone | | | auto_promote_flag | character(1) | | | extract_out_file | character varying(250) | | | last_upd_time | timestamp(6) without time zone | | | ds_fail_count | numeric(11,0) | | | cc_collection | character varying(40) | | | cc_environment | character varying(40) | | | cc_fail_on_db_mismatch | character(1) | | | cc_tracking_guid | character varying(33) | | | cc_numrows | character varying(50) | | | cc_owner | character varying(30) | | | cc_password| character varying(30) | | | parent_guid| character varying(33) | | | Indexes: "xpklm_queue" PRIMARY KEY, btree (guid) "idx_hn_cn_dsid_sc_dst_dfc" btree (host_name, collection_name, ds_id, status_code, ds_start_time, ds_fail_count) "ix_lm_cc" btree (collection_name, client_name) "ix_lm_chl" btree (client_name, host_name, lm_id) Query and plan: explain SELECT GUID, COLLECTION_NAME, PRIORITY, START_TIME, REQUEST_TIME, CLIENT_NAME, PROCESS_TYPE, PARENT_GUID FROM LOAD.LM_QUEUE lmq1 WHERE CLIENT_NAME='WLCASES' AND HOLD_FLAG='Y' AND HOST_NAME='WLLOADB' AND STATUS_CODE in (1) AND NOT EXISTS (SELECT COLLECTION_NAME FROM LOAD.LM_QUEUE lmq2 WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME AND LM_ID <> 0 AND PROCESS_TYPE NOT IN('EXTRACT')) OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN') AND LM_ID = 0 AND CLIENT_NAME='WLCASES' AND HOST_NAME= 'WLLOADB' ) ORDER BY PRIORITY DESC, REQUEST_TIME ASC; Sort (cost=1578.99..1579.00 rows=1 width=120) Sort Key: lmq1.priority DESC, lmq1.request_time -> Index Scan using ix_lm_chl on lm_queue lmq1 (cost=0.40..1578.98 rows=1 width=120) Index Cond: (((client_name)::text = 'WLCASES'::text) AND ((host_name)::text = 'WLLOADB'::text)) Filter: (((hold_flag = 'Y'::bpchar) AND (status_code = '1'::numeric) AND (NOT (alternatives: SubPlan 1 or hashed SubPlan 2))) OR (((process_type)::text = ANY ('{UNLOCK,"UNLOCK RERUN"}'::text[])) AND (lm_id = '0'::numeric))) SubPlan 1 -> Index Scan using ix_lm_cc on lm_queue lmq2 (cost=0.40..177.93 rows=1 width=0) Index Cond: ((collection_name)::text = (lmq1.collection_name)::text) Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text)) SubPlan 2 -> Seq Scan on lm_queue lmq2_1 (cost=0.00..124999.06 rows=12 width=32) Filter: ((lm_id <> '0'::numer
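On the precedence part of the question: AND binds more tightly than OR, so the WHERE clause above is evaluated as two alternatives, which is exactly how the Filter line in the plan groups it. Written with explicit parentheses (a sketch of the grouping only; the subquery is abbreviated):

WHERE ( CLIENT_NAME = 'WLCASES'
        AND HOLD_FLAG = 'Y'
        AND HOST_NAME = 'WLLOADB'
        AND STATUS_CODE IN (1)
        AND NOT EXISTS (SELECT ...) )
   OR ( PROCESS_TYPE IN ('UNLOCK','UNLOCK RERUN')
        AND LM_ID = 0
        AND CLIENT_NAME = 'WLCASES'
        AND HOST_NAME = 'WLLOADB' )

Because CLIENT_NAME and HOST_NAME appear in both branches of the OR, the planner can still use them as the Index Cond on ix_lm_chl and apply everything else, including the alternatives SubPlan, as a Filter.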
DBeaver session populating pg_stat_activity.backend_xmin
Not sure if anyone on this list may have ideas on this or not. I will also try and find a DBeaver e-mail list and send to that also. We are running into a problem where users will connect to a database through DBeaver. I have found if I leave all DBeaver options as default this issue does not happen. But I have found if I edit my connection under "Connection settings", "Initialization" and if I uncheck Auto-commit and then under "Connection settings", "Metadata" if I unselect "Open separate connection for metadata read" I can duplicate the issue. When I connect to the database through DBeaver with those 2 default settings changed and find that session in pg_stat_activity column xact_start is populated along with backend_xmin. Those get populated just by logging in. I don't know if changing those 2 default settings is the only way to get the session to get like this but it does mimic the problem. The problem is users will connect using DBeaver and their sessions will sit idle. We have a table with high update activity and I will start seeing queries hitting this table do more and more logical reads because autovacuum is not able to cleanup records newer than the backend_xmin for those session. We have told users not to leave their sessions sitting idle like that and are also considering setting idle_in_transaction_session_timeout to kill sessions that sit like this. But I am trying to understand what DBeaver is doing to get a session into that state. I've set log_statement to ALL and did a login through DBeaver with the 2 settings changed to see what all it is executing. I then changed the DBeaver settings back to default so I could see what it executes upon connect without having xact_start and backend_xmin populated. It executes the exact same commands except when the 2 default DBeaver settings are changed to show the issue it is issuing a BEGIN and COMMIT around one block of code and then at another part of code it issues a BEGIN, runs some queries, and never issues a COMMIT. The log info is below with <=== next to the BEGIN/COMMIT's that happen with the non-default settings. I try and duplicate this in psql and cannot. In psql I do a: \set AUTOCOMMIT off I then run the exact same commands as log_statements = ALL shows DBeaver runs without a COMMIT at the end of the last statement (those are shown below). I can find that psql session in pg_stat_activity and xact_start is populated but backend_xmin is NOT populated. Under that scenario it would not cause the problem of autovacuum not being able to cleanup this heavy update table due to a session sitting out there with backend_xmin populated. Any ideas why I cannot duplicate this in psql if I'm running what appears to be the exact same commands DBeaver is? Is it possible DBeaver is running something else that does not get written to the log file when log_statements is set to all? One slight difference in my psql test was a couple of the queries below were using bind variables when executed through DBeaver. I put the literal values in the query when I executed through psql. Is it possible that somehow using bind variables with autocommit off, using BEGIN causes backend_xmin to get populated? 
Thanks >From log file: SET extra_float_digits = 3 SET application_name = 'PostgreSQL JDBC Driver' SET application_name = 'DBeaver 22.0.0 - Main ' BEGIN <= SELECT current_schema(),session_user SELECT n.oid,n.*,d.description FROM pg_catalog.pg_namespace n LEFT OUTER JOIN pg_catalog.pg_description d ON d.objoid=n.oid AND d.objsubid=0 AND d.classoid='pg_namespace'::regclass ORDER BY nspname SELECT n.nspname = ANY(current_schemas(true)), n.nspname, t.typname FROM pg_catalog.pg_type t JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid WHERE t.oid = '1034' SELECT typinput='array_in'::regproc as is_array, typtype, typname FROM pg_catalog.pg_type LEFT JOIN (select ns.oid as nspoid, ns.nspname, r.r from pg_namespace as ns join ( select s.r, (current_schemas(false))[s.r] as nspnamefrom generate_series(1, array_upper(current_schemas(false), 1)) as s(r) ) as r using ( nspname )) as sp ON sp.nspoid = typnamespace WHERE typname = '_aclitem' ORDER BY sp.r, pg_type.oid DESC SHOW search_path COMMIT < BEGIN < SELECT db.oid,db.* FROM pg_catalog.pg_database db WHERE datname='dbtest' select * from pg_catalog.pg_settings where name='standard_conforming_strings' select string_agg(word, ',') from pg_catalog.pg_get_keywords() where word <> ALL ('{a,abs,absolute,action,ada,add,admin,after,all,allocate,alter,always,and,any,are,array,as,asc,as
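On the idle_in_transaction_session_timeout idea mentioned above, a sketch of how it could be applied (the value and role name are illustrative; on Aurora the cluster-wide default would normally be set through the DB parameter group):

-- for the role(s) DBeaver users connect with:
alter role report_user set idle_in_transaction_session_timeout = '15min';

-- or for an individual session:
set idle_in_transaction_session_timeout = '15min';

Sessions that exceed the timeout while sitting idle in an open transaction are disconnected, which releases whatever snapshot was holding back vacuum.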
RE: [EXT] Re: DBeaver session populating pg_stat_activity.backend_xmin
>> When I connect to the database through DBeaver with those 2 default >> settings changed and find that session in pg_stat_activity column >> xact_start is populated along with backend_xmin. Those get populated >> just by logging in. >As you found out in the log, the driver runs DbEaver run multiple SQL >statements during the "log in". >And the first query will start a transaction, but as autocommit is disabled, >nothing will end that transaction. Yes, I agree with that. And if I set autocommit off in psql I see the same where a select will start a transaction. >> The problem is users will connect using DBeaver and their sessions >> will sit idle. >Idle is not a problem, "idle in transaction" is. >From my perspective "idle in transaction" isn't necessarily a problem >(although I don't like seeing sessions sitting like that for a long time). >The problem is when pg_stat_activity.backend_xmin is populated- that can >prevent autovacuum from cleaning up old records. Again, if I login to psql, >set auto commit off and run a select I see pg_stat_activity.xact_start >populated but pg_stat_activity.backend_xmin is NOT populated. So that >transaction from psql would not prevent autovacuum from cleaning up. But when >this happens through DBeaver not only is pg_stat_activity.xact_start populated >but pg_stat_activity.backend_xmin is also populated. My main question is what >could DBeaver be doing to get pg_stat_activity.backend_xmin populated? It >doesn't happen when running a test in psql. So at this point I'm unable to >duplicate this issue in psql running the same thing I think DBeaver is >running. Maybe if I asked the question a little differently- in psql if I >set autocommit off and run a select what else do I need to run to get >pg_stat_activity.backend_xmin populated through that session? Is there a >certain "type" of select I could run to get it populated? I know if I insert >or update or delete a row it will get populated but I also know DBeaver is not >executing an insert/update/delete. >> It executes the exact same commands except when the 2 default DBeaver >> settings are changed to show the issue it is issuing a BEGIN and >> COMMIT around one block of code and then at another part of code it >> issues a BEGIN, runs some queries, and never issues a COMMIT. >Yes, that's how turning off autocommit works. As soon as a statement is sent >through the JDBC driver, the driver will send a BEGIN to start the >transaction, but the the application (or the user) is responsible to end it >through a COMMIT (or ROLLBACK). Thanks
RE: [EXT] DBeaver session populating pg_stat_activity.backend_xmin
Hi Christophe, Thanks for the reply. I am fairly new to Postgres and based on your reply below I am not understanding something. Please see my test case below. I can show where a session is "idle in transaction" but backend_xmin is NOT populated and I show vacuuming a table with 10 dead rows will work fine. I then show a session is "idle in transaction" with backend_xmin populated and vacuuming a table with 10 dead rows will not work because of that session having backend_xmin populated. Note in all cases here these sessions have not executed any DML. Based on your reply below in both of these cases the vacuum should not have been able to cleanup these 10 dead rows. What am I missing here? Regards Steve > -Original Message- > From: Christophe Pettus > Sent: Wednesday, January 25, 2023 4:49 PM > To: Dirschel, Steve > Cc: Thomas Kellerer ; pgsql-general@lists.postgresql.org > Subject: Re: [EXT] DBeaver session populating pg_stat_activity.backend_xmin > > On Jan 25, 2023, at 14:21, Dirschel, Steve > > wrote: > > From my perspective "idle in transaction" isn't necessarily a problem > > (although I don't like seeing sessions sitting like that for a long time). > > The problem is when pg_stat_activity.backend_xmin is populated- that can > > prevent autovacuum from cleaning up old records. > I don't think that analysis is quite correct. There's nothing about > backend_xmin that blocks tuple cleanup *if the transaction has been > committed*. The problem is that the session is sitting in idle in > transaction state, and *that* blocks tuple cleanup. The fix is to not leave > sessions open in idle in transaction state. Test case: create table test1(a numeric, b numeric); *** * TEST #1 * *** Dbeaver setup: Connection setting, Initialization, Auto-commit not checked Connection setting, Metadata, Open separate connection for metadata read is checked. Connect to the database through DBeaver. Here are the 2 sessions: select pid, application_name, xact_start, backend_xmin, state, query from pg_stat_activity where usename = 'postgres' and application_name like 'DBeaver%'; dbtest=> select pid, application_name, xact_start, backend_xmin, state, query from pg_stat_activity where usename = 'postgres' and application_name like 'DBeaver%'; pid | application_name | xact_start | backend_xmin | state | query ---+++--+---+ 30229 | DBeaver 22.0.0 - Main || | idle | SHOW TRANSACTION ISOLATION LEVEL 30231 | DBeaver 22.0.0 - Metadata || | idle | SELECT t.oid,t.*,c.relkind,format_type(nullif(t.typbasetype, 0), t.typtypmod) as base_type_name, d.description+ ||| | | FROM pg_catalog.pg_type t + ||| | | LEFT OUTER JOIN pg_catalog.pg_type et ON et.oid=t.typelem + ||| | | LEFT OUTER JOIN pg_catalog.pg_class c ON c.oid=t.typrelid + ||| | | LEFT OUTER JOIN pg_catalog.pg_description d ON t.oid=d.objoid + ||| | | WHERE t.typname IS NOT NULL + ||| | | AND (c.relkind IS NULL OR c.relkind = 'c') AND (et.typcategory IS NULL OR et.typcategory <> 'C') *** Note neither session is in a transaction. State for both is idle. Leave the 2 DBeaver sessions as is. In psql I run this: dbtest=> do $$ dbtest$> begin dbtest$>for cnt in 1..10 loop dbtest$> insert into test1 values (cnt, cnt); dbtest$> delete from test1; dbtest$>end loop; dbtest$> end; $$ dbtest-> ; DO dbtest=> dbtest=> VACUUM (VERBOSE) test1; INFO: vacuuming "public.test1" INFO: table "test1": removed 10 dead item identifiers in 1 pages INFO: table "test1": found 10 removable, 0 nonremovable row versions in 1 out of 1 pages <--- the 10 dea
Why using a partial index is doing slightly more logical I/O than a normal index
Table definition: workflow_db=> \d workflow_execution_test Table "public.workflow_execution_test" Column | Type | Collation | Nullable | Default --+--+---+--+- execution_id | bigint | | | state_machine_id | bigint | | | workflow_id | bigint | | | started_datetime | timestamp with time zone | | | completed_datetime | timestamp with time zone | | | status | character varying(50)| | | execution_context_s3_arn | character varying(200) | | | ol_version | integer | | | created_datetime | timestamp with time zone | | | updated_datetime | timestamp with time zone | | | deleted_millis | bigint | | | acquisition_channel_id | bigint | | | correlation_id | character varying(36)| | | result | character varying(50)| | | state_machine_execution_arn | character varying(200) | | | created_by_id| bigint | | | updated_by_id| bigint | | | acquired_gcs_s3_object | text | | | sqs_trigger_id | bigint | | | trigger_message | text | | | acquired_gcs_s3_object_uuid | character varying(36)| | | api_trigger_id | bigint | | | scheduled_trigger_id | bigint | | | notification_trigger_workflow_id | bigint | | | acquired_object_name | text | | | subscription_guid| uuid | | | processing_class_code| character varying(50)| | | root_execution_guid | uuid | | | Indexes: "test_workflow_execution_active_pending_ordered_by_created_date_" btree (workflow_id, status, deleted_millis, created_datetime) WHERE (status::text = ANY (ARRAY['ACTION_NEEDED'::character varying, 'NOT_STARTED'::character varying, 'PAUSED'::character varying, 'PENDING'::character varying, 'RUNNING'::character varying]::text[])) AND deleted_millis <= 0 "test_workflow_execution_initial_ui_tabs" btree (workflow_id, status, result, completed_datetime DESC NULLS LAST) I created/populated this table with 22 million rows. Afterwards I then created the 2 indexes. So those indexes are packed tightly. As is the optimizer decides to use the partial index. Below shows it did 33 logical reads. This index should be very good for this query- the leading 3 columns of the index are on the 3 criteria in the WHERE clause and the partial part is only storing rows that match the status and deleted_millis filters. explain (analyze, buffers) select * from workflow_execution_test where workflow_id = 1070 AND status in ('NOT_STARTED','PAUSED','PENDING','RUNNING') and deleted_millis <= 0; QUERY PLAN Index Scan using test_workflow_execution_active_pending_ordered_by_created_date_ on workflow_execution_test (cost=0.43..15824.82 rows=4335 width=1309) (actual time=0.040..0.095 rows=56 loops=1) Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[]))) Buffers: shared hit=33 Planning: Buffers: shared hit=2 Planning Time: 0.321 ms Execution Time: 0.117 ms If I hint the query to use the other index it does less work- it does 24 logical reads vs 33 using the partial index. /*+ IndexScan(workflow_execution_test test_workflow_execution_initial_ui_tabs) */ explain (analyze, buffers) select * from workflow_execution_test where workflow_id = 1070 AND status in ('NOT_STARTED','PAUSED','PENDING','RUNNING') and deleted_millis <= 0;
RE: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index
Thanks for the reply Jeff. Yes- more of an academic question.

Regarding this part:

Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
Filter: (deleted_millis <= 0)
Buffers: shared hit=24

For this usage, the =ANY is applied as an "in-index filter". It only descends the index once, to where workflow_id=1070, and then scans forward applying the =ANY to each index-tuple until it exhausts the =1070 condition. As long as all the =1070 entries fit into just a few buffers, the count of buffers accessed by doing this is fewer than doing the re-descents. (Stepping from tuple to tuple in the same index page doesn't count as a new access, while a re-descent releases and reacquires the buffer.)

There are 2,981,425 rows where workflow_id = 1070. Does that change your theory of using an “in-index filter” for that plan?

When you say there was a bit of speculation on the “boundary condition” vs “in-index filter”, is the speculation about whether Postgres has 2 different ways of processing an =ANY filter, or that one way is being used by one plan and the other way by the other plan?

Thanks again for your reply. It is helpful.

Steve
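Given that row count, one way to ground the in-index-filter discussion (a sketch) is to look at how the workflow_id = 1070 entries are distributed, since the partial index only stores the pending-type statuses with deleted_millis <= 0 while the regular index stores every status and result:

select status, (deleted_millis <= 0) as not_deleted, count(*)
from workflow_execution_test
where workflow_id = 1070
group by status, (deleted_millis <= 0)
order by count(*) desc;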
Query performance going from Oracle to Postgres
We are in the process of converting from Oracle to Postgres and I have a query that is using the same index in Postgres as is used in Oracle, but in Postgres the query does 16x more buffer/logical reads. I'd like to understand why. The query is hitting a partitioned table, but to simplify things I changed it to hit a single partition. This partition has the same number of rows in Oracle as in Postgres. Here is the Postgres query, partition definition, and execution plan. I will also include the Oracle execution plan below in case anyone is interested.

explain (analyze, buffers)
select count(historyeve0_.HISTORY_EVENT_SID) as col_0_0_
from hist28.history_event_display_timestamp_20230301 historyeve0_
where historyeve0_.IS_DELETED=0
  and historyeve0_.PRISM_GUID='i0accd6a2018405f095ee669dc5b4'
  and historyeve0_.PRODUCT_SID='CARSWELL.WESTLAW'
  and (historyeve0_.EVENT_TYPE not in ('versionsSearchWithinQueryEvent','notesOfDecisionsSearchWithinQueryEvent','citingReferencesSearchWithinQueryEvent','tocSearchWithinQueryEvent','searchWithinDocumentEvent','coCitesSearchWithinQueryEvent'))
  and (historyeve0_.PRODUCT_VIEW in ('DefaultProductView','TNPPlus','PLCUS','Indigo','IndigoPremium','INDIGOCA') or historyeve0_.PRODUCT_VIEW is null)
  and historyeve0_.CLIENT_ID='WLNCE_VNJXL7'
  and (historyeve0_.DISPLAY_TIMESTAMP between '2022-03-01 00:00:00' and '2023-03-01 23:59:59.999');

Aggregate (cost=56.64..56.65 rows=1 width=8) (actual time=0.930..0.931 rows=1 loops=1)
  Buffers: shared hit=341
  -> Index Scan using history_event_display_timesta_prism_guid_display_timestamp_idx1 on history_event_display_timestamp_20230301 historyeve0_ (cost=0.42..56.64 rows=1 width=33) (actual time=0.034..0.897 rows=332 loops=1)
       Index Cond: (((prism_guid)::text = 'i0accd6a2018405f095ee669dc5b4'::text) AND (display_timestamp >= '2022-03-01 00:00:00'::timestamp without time zone) AND (display_timestamp <= '2023-03-01 23:59:59.999'::timestamp without time zone) AND ((product_sid)::text = 'CARSWELL.WESTLAW'::text) AND (is_deleted = '0'::numeric) AND ((client_id)::text = 'WLNCE_VNJXL7'::text))
       Filter: (((event_type)::text <> ALL ('{versionsSearchWithinQueryEvent,notesOfDecisionsSearchWithinQueryEvent,citingReferencesSearchWithinQueryEvent,tocSearchWithinQueryEvent,searchWithinDocumentEvent,coCitesSearchWithinQueryEvent}'::text[])) AND (((product_view)::text = ANY ('{DefaultProductView,TNPPlus,PLCUS,Indigo,IndigoPremium,INDIGOCA}'::text[])) OR (product_view IS NULL)))
       Buffers: shared hit=341
Planning:
  Buffers: shared hit=6
Planning Time: 0.266 ms
Execution Time: 0.965 ms
(10 rows)

*** 341 logical reads to find 332 rows. Oracle will find the same 332 rows using the same index, but in Oracle it only does 20 logical reads.
I thought maybe the index was fragmented, so I reindexed it:

reindex index hist28.history_event_display_timesta_prism_guid_display_timestamp_idx1;

Plan after that:

Aggregate (cost=40.64..40.65 rows=1 width=8) (actual time=0.707..0.708 rows=1 loops=1)
  Buffers: shared hit=328
  -> Index Scan using history_event_display_timesta_prism_guid_display_timestamp_idx1 on history_event_display_timestamp_20230301 historyeve0_ (cost=0.42..40.64 rows=1 width=33) (actual time=0.032..0.683 rows=332 loops=1)
       Index Cond: (((prism_guid)::text = 'i0accd6a2018405f095ee669dc5b4'::text) AND (display_timestamp >= '2022-03-01 00:00:00'::timestamp without time zone) AND (display_timestamp <= '2023-03-01 23:59:59.999'::timestamp without time zone) AND ((product_sid)::text = 'CARSWELL.WESTLAW'::text) AND (is_deleted = '0'::numeric) AND ((client_id)::text = 'WLNCE_VNJXL7'::text))
       Filter: (((event_type)::text <> ALL ('{versionsSearchWithinQueryEvent,notesOfDecisionsSearchWithinQueryEvent,citingReferencesSearchWithinQueryEvent,tocSearchWithinQueryEvent,searchWithinDocumentEvent,coCitesSearchWithinQueryEvent}'::text[])) AND (((product_view)::text = ANY ('{DefaultProductView,TNPPlus,PLCUS,Indigo,IndigoPremium,INDIGOCA}'::text[])) OR (product_view IS NULL)))
       Buffers: shared hit=328
Planning:
  Buffers: shared hit=27
Planning Time: 0.321 ms
Execution Time: 0.741 ms
(10 rows)

Shared hit came down a little but is still 16x more than Oracle. The actual query will hit 12 partitions, so the extra amount of work it does in Postgres adds up (note some data points will find significantly more rows than 332, so performance is much worse than shown here, but this data point is used just to show the difference between Oracle and Postgres). The interesting part is it seems Postgres is doing 1 shared hit per row. I don't know if anyone on this list knows Postgres's internal index implementation vs Oracle's, but is Postgres's Index Scan operation significantly different from Oracle's index range scan? There is something implemented much less efficiently here vs Oracle, and I don't know what I can do to get this query to perform more closely to Oracle.
RE: [EXT] Re: Query performance going from Oracle to Postgres
Thanks to all who replied on this issue. I overlooked the difference in the execution plan between Oracle and Postgres. Oracle in fact does an index range scan only with no table lookup. When I changed the query to be a count(*) Postgres went from 332 logical reads to 19. Oracle did 20. Oracle did 20 regardless if it was count(*) or count(non-indexed-column). Regards Steve -Original Message- From: Peter Geoghegan Sent: Thursday, September 7, 2023 8:22 PM To: David Rowley Cc: Dirschel, Steve ; pgsql-general@lists.postgresql.org; Wong, Kam Fook (TR Technology) Subject: [EXT] Re: Query performance going from Oracle to Postgres External Email: Use caution with links and attachments. On Thu, Sep 7, 2023 at 3:48 AM David Rowley wrote: > On Thu, 7 Sept 2023 at 19:17, Peter Geoghegan wrote: > > It seems likely that the problem here is that some of the predicates > > appear as so-called "Filter:" conditions, as opposed to true index > > quals. > > hmm, if that were true we'd see "Rows Removed by Filter" in the > explain analyze. That's why I hedged, with "seems likely". The problem with using filter conditions rather than true index quals isn't limited to the problem of extra heap accesses. It happened to be convenient to make my point that way, but that isn't particularly fundamental here. I deliberately chose to make my example involve an index-only scan (that doesn't require any heap accesses) for this reason. > I think all that's going on is that each tuple is on a different page > and the heap accesses are just causing many buffers to be accessed. This index is an absolute monstrosity. I find it far easier to believe that the real explanation is the one that Steve intuited: that there is an issue with the way that the physical data structures (which are more or less comparable in both systems) are accessed in Postgres. The index in question ("history_event_display_timesta_prism_guid_display_timestamp_idx1") has certain columns that are omitted from the query. These columns nevertheless appear between other columns that the query filters on. The leading two columns ("prism_guid" and "display_timestamp") are made into index quals by Postgres, but the next index column after those two ("unique_lookup_key") is omitted by the query, and so isn't an index qual. In fact *four* columns are omitted after that one. But, the very-low-order index column "product_sid" *does* appear in the query, and so also appears as Postgres index quals. There is every chance that the true underlying explanation is that Oracle is able to skip over significant parts of the index structure dynamically. In fact I'm practically certain that that's the case, since the "product_sid" column appears as an "access predicate", rather than as a "filter predicate". These terms are explained here: https://urldefense.com/v3/__https://use-the-index-luke.com/sql/explain-plan/oracle/filter-predicates__;!!GFN0sa3rsbfR8OLyAw!fZJXNap0mP7xLTcE_5unwlR5eDfBHgw2F5LTBDGtjV_btV6Zze0MSWnTHamU16Fmu-kII-FwrC4WK7WLP60$ https://urldefense.com/v3/__https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates__;!!GFN0sa3rsbfR8OLyAw!fZJXNap0mP7xLTcE_5unwlR5eDfBHgw2F5LTBDGtjV_btV6Zze0MSWnTHamU16Fmu-kII-FwrC4WCXw7ubs$ How could "product_sid" be used as an "access predicate" given the omitted index columns? It seems very likely that parts of the index can be skipped in Oracle, but not in Postgres -- at least not yet. 
Like Markus Winand, I think that it's a real problem that EXPLAIN doesn't yet expose the difference between access predicates and filter predicates. Some of the index quals shown by EXPLAIN are marked SK_BT_REQFWD/SK_BT_REQBKWD by nbtree (meaning that they're what Oracle calls "access predicates"), while other are not (meaning that they're what Oracle calls "filter predicates"). That issue makes it far harder to spot these sorts of problems. > It seems to me that an IOS would likely fix that problem which is why > I suggested count(*) instead of > count( > ) Probably, but why should we have to use an index-only scan? And what about my example, that already used one, and still showed a big disparity where there is no inherently reason why there had to be? My example didn't involve jumping to another part of the index because the problem seems to be more general than that. -- Peter Geoghegan
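Following up on the count(*) observation in the reply above: if the application really needs to count a specific column, an index that also carries that column and the filtered columns (a sketch; requires Postgres 11+ for INCLUDE, and the index name is illustrative) would let the plan stay index-only, provided the partition is vacuumed well enough for the visibility map to avoid heap checks:

create index concurrently history_event_ddt_covering_idx
    on hist28.history_event_display_timestamp_20230301
       (prism_guid, display_timestamp, product_sid, is_deleted, client_id)
    include (history_event_sid, event_type, product_view);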
Query runtime differences- trying to understand why.
I have a question on the execution time of a query and the fluctuations I'm seeing. I enabled auto_explain to capture some actual executions of a query by an app. Below are 2 executions of it. The top one took 1.697 milliseconds. The 2nd one took 31.241 milliseconds. Note the query has hints in it and the top one did not obey the hint to use the pk index on table product. That table is very small so full scanning it isn't really a problem. When I look at the steps in the plans below I do not believe the difference in runtime is due to plan differences.

Here are my observations for the top query/plan:

- The query found 318 rows when scanning the entry_guid_mod_idx index on table entry. It did 336 or 337 shared block hits for that step and the step took 0.619 milliseconds.
- I believe it nested looped to the full scan of table product 318 times. That step took 0.812 milliseconds (time including the prior step). At that point the query had done a total of 841 shared block hits. I believe it found 318 rows that matched.
- The LockRows step- this is coming from the SELECT FOR UPDATE SKIP LOCKED. I believe this step shows it returned 0 rows. Runtime was 1.696 milliseconds. This implies all 318 rows that matched the query were locked.

Observations for the bottom query/plan:

- The query found 291 rows when scanning the entry_guid_mod_idx index on table entry. It did 313 or 315 shared block hits for that step and it took 0.977 milliseconds.
- I believe it nested looped to the index scan pkproduct 291 times. That step took 1.491 milliseconds (time including the prior step). At that point the query had done a total of 1365 shared block hits. I believe it found 291 rows that matched.
- The LockRows step- again, I believe this is coming from the SELECT FOR UPDATE SKIP LOCKED. Like the top query this also shows it returned 0 rows, which implies to me all 291 rows were locked. But here we can see the runtime was 31.239 milliseconds.

So my question: assuming my understanding above is correct, both plans show it taking minimal time to get to the LockRows step. The top query skipped more rows than the bottom query, but the bottom query took significantly more time at the LockRows step. Why would that be the case? Thanks in advance.
Steve 2023-11-29 18:12:45 UTC:10.210.202.108(50098):alertu@pgbatchshared01aq:[15251]:LOG: duration: 1.697 ms plan: Query Text: /*+ NestLoop(t1 t2) IndexScan(t2 pkproduct) */ SELECT T1.ENTRY_GUID FROM bchalertdemo.ENTRY T1,bchalertdemo.PRODUCT T2 WHERE T1.NEXT_RUN_DATE<$1 AND T1.STATUS_CODE=1 AND T2.ACTIVE_STATUS != 'F' AND (END_DATE IS NULL OR ($2 < END_DATE)) AND (($3 NOT BETWEEN SUSPEND_DATE AND RESUME_DATE) OR SUSPEND_DATE IS NULL OR RESUME_DATE IS NULL) AND T1.PRODUCT_CODE = T2.PRODUCT_CODE AND T1.LAST_RUN_DATE<$4 AND T2.PRODUCT_VALUE IN ($5) AND MOD(ABS(HASHTEXT(T1.ENTRY_GUID)),150) = $6 ORDER BY T1.PRIORITY, T1.ENTRY_FREQUENCY_CODE, T1.NEXT_RUN_DATE FOR UPDATE SKIP LOCKED LIMIT 1 Limit (cost=0.42..4.63 rows=1 width=64) (actual time=1.695..1.696 rows=0 loops=1) Buffers: shared hit=841 -> LockRows (cost=0.42..1393.48 rows=331 width=64) (actual time=1.695..1.695 rows=0 loops=1) Buffers: shared hit=841 -> Nested Loop (cost=0.42..1390.17 rows=331 width=64) (actual time=0.032..0.812 rows=318 loops=1) Join Filter: (t1.product_code = t2.product_code) Buffers: shared hit=337 -> Index Scan using entry_guid_mod_idx on entry t1 (cost=0.42..1384.18 rows=331 width=63) (actual time=0.016..0.619 rows=318 loops=1) Index Cond: ((mod(abs(hashtext((entry_guid)::text)), 150) = 141) AND (next_run_date < '2023-11-29 12:12:38.535'::timestamp without time zone) AND (last_run_date < '2023-11-29 12:12:38.535'::timestamp without time zone) AND (status_code = '1'::numeric)) Filter: (((end_date IS NULL) OR ('2023-11-29 12:12:38.535'::timestamp without time zone < end_date)) AND (('2023-11-29 12:12:38.535'::timestamp without time zone < suspend_date) OR ('2023-11-29 12:12:38.535'::timestamp without time zone > resume_date) OR (suspend_date IS NULL) OR (resume_date IS NULL))) Buffers: shared hit=336 -> Materialize (cost=0.00..1.02 rows=1 width=20) (actual time=0.000..0.000 rows=1 loops=318) Buffers: shared hit=1 -> Seq Scan on product t2 (cost=0.00..1.01 rows=1 width=20) (actual time=0.013..0.013 rows=1 loops=1) Filter: ((active_status <> 'F'::bpchar) AND ((product_value)::text = 'bchalert-poc'::text)) Buffers: shared hit=1 2023-11-29 18:12:45 UTC:10.210.202.108(50006):alertu@pgbatchshared01aq:[15226]:LOG: duration: 31.241 ms plan: Query Text: /*+ NestLoop(t1 t2) IndexS
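One way to see where the extra time in the LockRows step goes (a sketch) is to sample, from another session, the wait events of the backends running this statement while a slow execution is in flight:

select pid, state, wait_event_type, wait_event,
       now() - query_start as running_for, query
from pg_stat_activity
where query like '%SKIP LOCKED%'
  and pid <> pg_backend_pid();

If repeated samples show no lock waits, the extra time is being spent inside LockRows itself (for example, checking the locker of each candidate row) rather than blocking on another session.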
RE: [EXT] Re: Query runtime differences- trying to understand why.
Sorry. The DB is Aurora Postgres 14.6. Hints are enabled via the pg_hint_plan extension. But my question isn't so much about the hints as about the execution plan, where the time is going, and why skipping locked rows appears to take much more time in one execution of the query than in another.

Regards

-----Original Message-----
From: Andreas Kretschmer
Sent: Wednesday, November 29, 2023 3:10 PM
To: pgsql-general@lists.postgresql.org
Subject: [EXT] Re: Query runtime differences- trying to understand why.

External Email: Use caution with links and attachments.

On 29.11.23 at 21:25, Dirschel, Steve wrote:
> I have a question on the execution time of a query and the
> fluctuations I'm seeing. I enabled auto_explain to capture some actual
> executions of a query by an app. Below are 2 executions of it. The
> top one took 1.697 milliseconds. The 2nd one took 31.241 milliseconds.
> Note the query has hints in it

what database are you using? PostgreSQL doesn't have hints...

Regards, Andreas
--
Andreas Kretschmer - currently still (garden leave)
Technical Account Manager (TAM)
http://www.enterprisedb.com
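For reference, since the first plan above ignored the IndexScan hint, pg_hint_plan can be asked to report how it interpreted the hints. A sketch (parameter names are from the pg_hint_plan documentation; how the library gets loaded is deployment-specific, and the query here is a simplified stand-in for the app's statement):

-- one-time setup in the target database
CREATE EXTENSION IF NOT EXISTS pg_hint_plan;

-- per-session settings to see whether each hint was used, duplicated, or ignored
SET pg_hint_plan.enable_hint = on;
SET pg_hint_plan.debug_print = on;      -- emits "used hint / not used hint / error hint" details
SET client_min_messages = log;          -- make that output visible in the session

/*+ NestLoop(t1 t2) IndexScan(t2 pkproduct) */
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.entry_guid
FROM bchalertdemo.entry t1, bchalertdemo.product t2
WHERE t1.product_code = t2.product_code;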
Using ALTER TABLE DETACH PARTITION CONCURRENTLY inside a procedure
We have a custom procedure to add/drop partitions in Postgres. It has a main FOR LOOP to find tables needing to be processed. Inside that FOR LOOP there is a BEGIN block so if a single table gets an error we catch it in an exception handler. At the END of the FOR LOOP it issues a COMMIT. So if there are 20 tables to process and the 15th table gets an error it will capture that error, write a message to a log file, and continue processing the remaining tables.

We have run into some locking issues when trying to DETACH a partition: if there is a long running query against the partitioned table it will block the DETACH PARTITION command. Then app sessions trying to insert into the table get blocked by the session trying to detach the partition and the app gets into a bad state. In testing I found that if I use the CONCURRENTLY clause with DETACH PARTITION, the detach can still get blocked by a long running query, but it does not block app sessions from inserting into the table. So this is great. But when I try to run the command inside the procedure it throws this error:

STATE: 25001
MESSAGE: ALTER TABLE ... DETACH CONCURRENTLY cannot run inside a transaction block
DETAIL:
HINT:
CONTEXT: SQL statement "alter table t2.test1 detach partition t2.test1_gentime_20240511 concurrently"
PL/pgSQL function part.partition_maintenance() line 323 at EXECUTE

The documentation states: CONCURRENTLY cannot be run in a transaction block and is not allowed if the partitioned table contains a default partition. Is there an option to call CONCURRENTLY inside a procedure as I describe? Thanks in advance.
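One pattern sometimes suggested for this situation is to issue the DETACH ... CONCURRENTLY over a separate connection, so it runs as its own top-level transaction rather than inside the procedure's transaction and exception block. A sketch only, assuming the dblink extension is available and that a loopback connection is permitted; connection details and the table names are illustrative:

CREATE EXTENSION IF NOT EXISTS dblink;

DO $maint$
DECLARE
    v_sql text := 'alter table t2.test1 detach partition t2.test1_gentime_20240511 concurrently';
BEGIN
    -- dblink_exec runs the statement on its own session, so it is not part of this
    -- block's transaction; an error comes back as an exception here and can still be
    -- caught and logged by the existing handler.
    PERFORM dblink_exec('dbname=' || current_database(), v_sql);
END;
$maint$;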
Hash join and picking which result set to build the hash table with.
The query and execution plan are shown below. My question is about which result set the optimizer chooses to build the hash table from. My understanding is that for a hash join you want to build the hash table from the smaller result set. In the execution plan below the optimizer estimates 1000 rows from the seq scan of table collection and 120,000 rows from the seq scan of table docloc_test, yet it builds the hash buckets from the 120,000-row result set rather than from the 1000 rows. In our case, under certain volumes that causes the hash to spill to temp, and under high load it hangs up the Aurora Postgres database. If it built the hash table from the 1000 rows it would fit in work_mem and there would be no problem. Why is it picking the larger result set?

Another funky thing here- here are the table definitions:

pgcci01ap=> \d CCI.COLLECTION
        Column        |              Type              | Collation | Nullable |               Default
----------------------+--------------------------------+-----------+----------+--------------------------------------
 collection_name      | character varying(40)          |           | not null |
 l_stage              | numeric(11,0)                  |           |          |
 p_stage              | numeric(11,0)                  |           |          |
 t_stage              | numeric(11,0)                  |           |          |
 last_upd_datetime    | timestamp(6) without time zone |           | not null |
 last_upd_inits       | character varying(30)          |           | not null |
 owner_name           | character varying(30)          |           |          |
 password             | character varying(30)          |           |          |
 email_address        | character varying(2000)        |           |          |
 available_flag       | character(1)                   |           |          |
 collection_id        | numeric(11,0)                  |           | not null |
 collection_type      | character varying(20)          |           |          |
 retrieval_password   | character varying(40)          |           |          |
 partner_coll_name    | character varying(40)          |           |          |
 relation2partner     | character varying(20)          |           |          |
 reload_flag          | character(1)                   |           |          | 'N'::bpchar
 partner_id           | character varying(40)          |           |          |
 content_timezone     | character varying(40)          |           | not null | 'America/Chicago'::character varying
 token_type           | character varying(30)          |           |          |
 cc_collection_dest   | character varying(40)          |           |          |
 auto_reclaim_enabled | character(1)                   |           | not null | 'N'::bpchar
 collection_family    | character varying(40)          |           | not null | ''::character varying
 access_password      | character(40)                  |           |          |
 mic_group            | character varying(40)          |           |          |
 mic_type             | character varying(10)          |           |          |
 retrieval_source     | character varying(40)          |           | not null | 'DOC1'::character varying
Indexes:
    "xpkcollection" PRIMARY KEY, btree (collection_name)
    "xak1collection" UNIQUE CONSTRAINT, btree (collection_id)
    "xie1collection" btree (relation2partner)
    "xie2collection" btree (collection_family, collection_name)
    "xie3collection" btree (mic_group)
Referenced by:
    TABLE "cci.index_update_proc" CONSTRAINT "rfk12_index_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.authority_update_proc" CONSTRAINT "rfk1_authority_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.collection_event" CONSTRAINT "rfk1_collection_event" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.collection_pit" CONSTRAINT "rfk1_collection_pit" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.collection_stage" CONSTRAINT "rfk1_collection_stage" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.csloc_update_proc" CONSTRAINT "rfk1_csloc_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.docloc_update_proc" CONSTRAINT "rfk1_docloc_update_proc" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.index_set_mrg" CONSTRAINT "rfk1_index_set_mrg" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.index_set_stats" CONSTRAINT "rfk1_index_set_stats" FOREIGN KEY (collection_name) REFERENCES cci.collection(collection_name)
    TABLE "cci.index_system_attr" CONSTRAINT "rfk1_index_system_attr" FOREIGN KEY (collection_name) REFERENCES
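For reference, here is a self-contained way to see which input a hash join builds from and to experiment with forcing the other choice. This is a sketch with toy tables, not the real CCI schema, and the hints mentioned in the comment assume pg_hint_plan (already in use elsewhere in these threads):

-- toy data: a 1,000-row table and a 120,000-row table
CREATE TEMP TABLE small_t AS
    SELECT g AS id, md5(g::text) AS name FROM generate_series(1, 1000) g;
CREATE TEMP TABLE big_t AS
    SELECT g AS id, (g % 1000) + 1 AS small_id FROM generate_series(1, 120000) g;
ANALYZE small_t;
ANALYZE big_t;

-- In the plan, the input sitting under the Hash node is the build side; the Hash
-- node's "Batches" count shows whether it spilled past work_mem.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM small_t s
JOIN big_t b ON b.small_id = s.id;

-- With pg_hint_plan, a leading comment such as /*+ Leading((b s)) HashJoin(s b) */
-- would force b to be the outer input and s (the 1,000-row table) to be the hashed
-- inner input, which is one way to test whether a smaller build side avoids the spill.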
Query tuning question
Aurora Postgres version 13.7. Table definition:

acquisition_channel_db=> \d acquisition_channel.acquired_object
                            Table "acquisition_channel.acquired_object"
                 Column                  |           Type           | Collation | Nullable | Default
-----------------------------------------+--------------------------+-----------+----------+---------
 acquired_object_uuid                    | character varying(36)    |           | not null |
 acquired_object_name                    | text                     |           | not null |
 acquired_object_size                    | bigint                   |           |          |
 acquisition_run_record_id               | bigint                   |           |          |
 correlation_id                          | character varying(36)    |           |          |
 acquired_datetime                       | timestamp with time zone |           |          |
 checksum                                | character varying(128)   |           |          |
 acquisition_method_id                   | bigint                   |           |          |
 ol_version                              | integer                  |           | not null |
 created_datetime                        | timestamp with time zone |           | not null |
 updated_datetime                        | timestamp with time zone |           |          |
 status                                  | character varying(50)    |           |          |
 parent_acquired_object_uuid             | character varying(36)    |           |          |
 extracted_from_object_path              | text                     |           |          |
 resource_group_id                       | bigint                   |           |          |
 s3_gcs_bucket                           | character varying(100)   |           |          |
 s3_key                                  | character varying(500)   |           |          |
 metadata_s3_gcs_bucket                  | character varying(100)   |           |          |
 metadata_s3_key                         | character varying(500)   |           |          |
 routing_result_acquisition_channel_id   | bigint                   |           |          |
 acquired_object_type                    | character varying(100)   |           |          |
 created_by_id                           | integer                  |           |          |
 updated_by_id                           | integer                  |           |          |
 metadata                                | jsonb                    |           |          |
 acquired_object_id                      | bigint                   |           | not null | nextval('acquisition_channel.acquired_object_id_seq'::regclass)
 routed_to_acquisition_channel_datetime  | timestamp with time zone |           |          |
 routed_to_acquisition_channel_id        | bigint                   |           |          |
 rendition_guid                          | character varying(36)    |           |          |
 revision_guid                           | character varying(36)    |           |          |
 channel_availability_status             | character varying(100)   |           |          |
 mime_type                               | character varying(100)   |           |          |
 encryption_public_key_hash              | text                     |           |          |
 acquired_metadata                       | jsonb                    |           |          |
 original_acquired_object_name           | text                     |           |          |
 acquired_family_uuid                    | character varying(36)    |           |          |
 last_broadcast_datetime                 | timestamp with time zone |           |          |
 original_checksum                       | character varying(128)   |           |          |
Indexes:
    "acquired_object_pkey" PRIMARY KEY, btree (acquired_object_uuid)
    "acquired_family_uuid_idx" btree (acquired_family_uuid)
    "acquired_object_acq_method_id_acq_dt_acquired_object_uuid" btree (acquisition_method_id, acquired_datetime DESC NULLS LAST, acquired_object_uuid)
    "acquired_object_acquired_items_initial_ui_page" btree (acquisition_method_id, acquired_datetime DESC NULLS LAST, acquired_object_uuid) WHERE status::text <> 'TEST'::text AND parent_acquired_object_uuid IS NULL
    "acquired_object_acquired_object_id_unq" UNIQUE, btree (acquired_object_id)
    "acquired_object_acquired_object_name" btree (lower(acquired_object_name))
    "acquired_object_acquisition_method_id" btree (acquisition_method_id)
    "acquired_object_acquisition_run_record_id" btree (acquisition_run_record_id)
    "acquired_object_checksum" btree (checksum)
    "acquired_object_correlation_id" btree (correlation_id)
    "acquired_object_idx2" btree (parent_acquired_object_uuid, extracted_from_object_path)
    "a
pg_stat_statements
Hello, I'm not sure if this is the correct distribution list for this type of question but I'll try anyway. We have an app that uses multiple schemas. It will do a set schema 'schema_name' and then execute queries. The queries executed are the same regardless of the schema the connection set. In pg_stat_statements the exact same query gets a different queryid for each schema that executes it, and I'm unable to determine which queryid comes from which schema. Is anyone aware of a way to determine this? Thanks in advance.

Steve
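For what it's worth, the behavior described above is easy to reproduce: as far as I know the queryid is computed after parse analysis, when table names have already been resolved to specific objects, so the same text run under different schemas produces different ids. A minimal sketch, using hypothetical schemas s1 and s2 that each contain an identical table t:

-- run the same text against two schemas
SET SCHEMA 's1';
SELECT count(*) FROM t;
SET SCHEMA 's2';
SELECT count(*) FROM t;

-- the same normalized text appears twice with different queryids, because the
-- fingerprint is taken over the resolved table, not the raw text
SELECT queryid, query, calls
FROM pg_stat_statements
WHERE query ILIKE 'select count(*) from t%';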
Trying to dynamically create a procedure
Hi, I need to dynamically create a procedure. Here is a simple procedure:

create or replace procedure junk.test_proc()
LANGUAGE plpgsql
AS $$
declare
  v_cnt integer := 0;
begin
  raise notice 'v_cnt is %', v_cnt;
end $$;

That creates and runs fine. Here I'm trying to create it inside a PL/pgSQL block (yes, there is nothing dynamic below, but the real code will have parts of the procedure that need to be dynamically generated):

DO $$
BEGIN
  EXECUTE 'create or replace procedure junk.test_proc() ' ||
          'LANGUAGE plpgsql ' ||
          'AS $$ ' ||
          'declare ' ||
          '  v_cnt integer := 0; ' ||
          'begin ' ||
          '  raise notice 'v_cnt is %', v_cnt; ' ||
          'end $$';
END;
$$

It throws this error:

ERROR: syntax error at or near "$$ DO $$"
LINE 1: $$
        ^
dbtest=>
dbtest=> END;
WARNING: there is no transaction in progress
COMMIT
dbtest=>
dbtest=> $$

I think the problem has to do with having AS $$ and END $$ with the 2 $'s. I'm not sure if there is different syntax I can use outside the $$, or something I can use in the PL/pgSQL to escape those $$, to get this to work. Any help would be appreciated. Thanks
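One variant that avoids the clash is to give the DO block, the EXECUTE string, and the procedure body differently tagged dollar quotes, since a dollar-quoted string only ends at a matching tag. A sketch of the same junk.test_proc example (the tags $do$/$def$/$proc$ are arbitrary, and the junk schema is assumed to exist):

DO $do$
BEGIN
  EXECUTE $def$
    create or replace procedure junk.test_proc()
    LANGUAGE plpgsql
    AS $proc$
    declare
      v_cnt integer := 0;
    begin
      raise notice 'v_cnt is %', v_cnt;
    end
    $proc$
  $def$;
END;
$do$;

For the genuinely dynamic pieces, format() with %I and %L placeholders can then be spliced into the $def$ string so identifiers and literals are quoted safely.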
Postgres_fdw- User Mapping with md5-hashed password
I know I can create user steve_test with password testpassword122 as md5 by doing:

select 'md5'||md5('testpassword122steve_test');
Returns --> md5eb7e220574bf85096ee99370ad67cbd3

CREATE USER steve_test WITH PASSWORD 'md5eb7e220574bf85096ee99370ad67cbd3';

And then I can log in as steve_test with password testpassword122. I'm trying to use similar logic when creating a user mapping:

CREATE USER MAPPING FOR postgres SERVER steve_snap0 OPTIONS (user 'steve_test', password 'md5eb7e220574bf85096ee99370ad67cbd3');

When I try to import a foreign schema I get an error:

ERROR: could not connect to server "steve_snap0"

If I create the user mapping with the plain password:

CREATE USER MAPPING FOR postgres SERVER steve_snap0 OPTIONS (user 'steve_test', password 'testpassword122');

it works fine. Is it not possible to use the same logic for the user mapping password that can be used when creating a user? Thanks in advance.
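For context on why the two cases behave differently, here is a sketch of the arithmetic involved (assuming md5 password authentication; the salt value below is purely illustrative). CREATE USER accepts the stored verifier directly, but the user mapping's password option is handed to the client library as if it were the cleartext password, so hashing it again no longer matches what the server expects:

-- what CREATE USER stores for the role (the verifier), exactly as in the post
select 'md5' || md5('testpassword122' || 'steve_test') as stored_verifier;

-- during md5 authentication the client answers the server's salted challenge with
--   'md5' || md5( md5(password || username) || salt )
-- computed from the cleartext password it was given:
select 'md5' || md5( md5('testpassword122' || 'steve_test') || 'somesalt' ) as answer_from_cleartext;

-- if the user mapping supplies the verifier string as the "password", the client instead
-- computes 'md5' || md5( md5('md5eb7e...' || 'steve_test') || salt ), which cannot match
-- the value the server derives from the stored verifier, so the connection is rejected.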