Hi,

I got an offline report from my colleague Zhibai Song that
close_cursor() is called for a freed PGconn, leading to a server
crash.  Here is a reproducer (the original reproducer he provided is a
bit complex, so I simplified it):

create server loopback
    foreign data wrapper postgres_fdw
    options (dbname 'postgres');
create user mapping for current_user server loopback;
create table t1 (id int, data text);
create foreign table ft1 (id int, data text)
    server loopback options (table_name 't1');
insert into ft1 values (1, 'foo');
start transaction;
-- This caches the remote connection's PGconn in PgFdwScanState
declare c1 cursor for select * from ft1;
fetch c1;
 id | data
----+------
  1 | foo
(1 row)

savepoint s1;
select * from ft1;
 id | data
----+------
  1 | foo
(1 row)

select pid from pg_stat_activity
    where datname = 'postgres'
      and application_name = 'postgres_fdw';
  pid
-------
 91853
(1 row)

-- This terminates the remote session
select pg_terminate_backend(91853);
 pg_terminate_backend
----------------------
 t
(1 row)

-- This leaves the remote connection's changing_xact_state as true
rollback to s1;
savepoint s1;
-- This calls pgfdw_reject_incomplete_xact_state_change(), freeing
-- the remote connection's PGconn as changing_xact_state is true
select * from ft1;
ERROR:  connection to server "loopback" was lost
rollback to s1;
-- This calls close_cursor() on the PGconn cached in PgFdwScanState,
-- which was freed above, leading to a server crash
close c1;

I think the root cause is that it is too early to free the PGconn in
pgfdw_reject_incomplete_xact_state_change() even if the connection is
in a state where we cannot use it any further; I think we should delay
that until abort cleanup (ie, pgfdw_xact_callback()).  Attached is a
patch for that.

Best regards,
Etsuro Fujita

Attachment: fix-connection-handling-in-postgres-fdw.patch
Description: Binary data

Reply via email to