Hi Chao, On Fri, Mar 20, 2026 at 5:27 PM Chao Li <[email protected]> wrote: > I can reproduce the server crash following your procedure, and I traced the > problem. > > The issue is that, during select * from ft1, > pgfdw_reject_incomplete_xact_state_change() calls disconnect_pg_server(), > which destroys conn and sets ConnCacheEntry->conn = NULL, but does not update > PgFdwScanState->conn. As a result, when "close c1" is executed later, > PgFdwScanState->conn points to stale memory with random contents.
That is right. > I am not sure we should still allow further commands to run after select * > from ft1, given that it has already raised: "ERROR: connection to server > "loopback" was lost”. Maybe we should not keep going as if the connection > were still there. The ERROR is thrown within a subtransaction, not within the main transaction, so we *should* allow commands (that don't use the connection) to run *after* rolling back to the subtransaction. (A transaction, like the example one, whose subtransactions failed abort cleanup due to a connection issue needs to abort in the end, as it can't commit the remote transaction anymore. Considering this limitation, the above behavior might be surprising to users. We might need to do something about this, but I think that that is another issue and thus should be discussed separately.) > I am not very familiar with the FDW code, so I am not ready to suggest a > concrete fix. But it seems wrong to let later paths keep using > PgFdwScanState->conn after select * from ft1 has already failed with > connection loss. Why? While we can't use the connection any further, with the patch we can still use the cashed PGconn, as pointed out by Matheus downthread. > My guess is that we either need to invalidate all dependent state when > disconnect_pg_server() runs, or otherwise prevent later cleanup paths from > touching the cached PGconn *. I don't think that that is a good idea, because that 1) would require invasive changes to the existing code, which isn't good especially for back-patching, as also mentioned by Matheus downthread, and 2) would result in something that is essentially the same as what I proposed; in other words, that would be a lot of effort with little result. Thanks for looking at this! Best regards, Etsuro Fujita
