On 20/03/26 05:27, Chao Li wrote:
I can reproduce the server crash following your procedure, and I traced the 
problem.

The issue is that, during select * from ft1, pgfdw_reject_incomplete_xact_state_change() calls 
disconnect_pg_server(), which destroys conn and sets ConnCacheEntry->conn = NULL, but does not 
update PgFdwScanState->conn. As a result, when "close c1" is executed later, 
PgFdwScanState->conn points to stale memory with random contents.

I am not sure we should still allow further commands to run after select * from ft1, given 
that it has already raised: "ERROR:  connection to server "loopback" was lost”. 
Maybe we should not keep going as if the connection were still there.

I am not very familiar with the FDW code, so I am not ready to suggest a concrete 
fix. But it seems wrong to let later paths keep using PgFdwScanState->conn 
after select * from ft1 has already failed with connection loss. My guess is that 
we either need to invalidate all dependent state when disconnect_pg_server() runs, 
or otherwise prevent later cleanup paths from touching the cached PGconn *.


Although I agree with this I think that it will be a quite invasive change to fix this issue, considering that it should be back-ported.

I see two ways to implement this:

1. ForeignScanState->conn points to a ConnCacheEntry and it access the PGConn via ConnCacheEntry->conn, so it can check if != NULL before using.

2. Make pgfdw_reject_incomplete_xact_state_change() or disconnect_pg_server() receive a PgFdwConnState and add a new field on this state to represent that the connection is closed and them check this field on the proper code paths.

With the patch proposed on the previous email the server don't crash and any use of PgFdwScanState->conn will make the command to fail with something like this:

   ERROR:  08006: no connection to the server
   CONTEXT:  remote SQL command: CLOSE c1
   LOCATION:  pgfdw_report_internal, connection.c:1059


So I don't think that this change would be worth, or I'm missing something? What do you think?

--
Matheus Alcantara
EDB: https://www.enterprisedb.com


Reply via email to