Orphaned relations after crash/sigkill during CREATE TABLE
Postgres 12.4 I was directed in slack to mention here that we're being impacted by Postgres leaving orphaned pages in /base/ after a crash while a CREATE TABLE is being run in transaction. The issue is the same as the reproduction steps listed here [1], that is: - Start a CREATE TABLE transaction for a large table - Terminate the process via kill -9 - Pages are left in /base that have no filenode references anymore, such that `pg_database_size()` grows to be very large while total table+index size remains constant However in our particular case, we're using a managed/cloud Postgres server and our `CREATE TABLE` transaction was being terminated by the OOM killer. Using a managed service, we don't have filesystem access to go and clear out these orphaned pages. This caused our total db size to grow from 40GB of table+index data to 4TB on-disk (but still only 40GB of table+index data, the other ~3.95TB being orphaned CREATE TABLE pages) I realize (per a blog post from Robert Haas [2] and from slack conversation) that this is a known issue, but was directed here from slack to just mention that we were impacted by it, and have no resolution due to not having filesystem access, and not having a method internally to Postgres to deal with these orphaned relations. (Our recourse currently is to do something like a pg_dump/pg_restore onto a clean instance in order to escape the orphaned files) -Jason [1] https://github.com/bdrouvot/pg_orphaned#example-1 [2] https://rhaas.blogspot.com/2020/05/dont-manually-modify-postgresql-data.html
Re: Orphaned relations after crash/sigkill during CREATE TABLE
> On Tue, Aug 18, 2020 at 3:49 PM Adrian Klaver wrote: > > Have you tried with: > BEGIN; > CREATE TABLE some_table SELECT some_data FROM other_table LIMIT 1 WITH > NO DATA; > COMMIT; > > The above gets you the table structure, but no data. > > BEGIN; > INSERT into some_table SELECT * FROM other_table; >COMMIT; > > The above populates the table Thanks -- we were indeed creating and populating the new table all in a single transaction. I'll see if we can split this into two transactions so that the table structure is committed quickly. I think you're right that this would mostly sidestep the issue. -Jason p.s. Apologies if this is formatted wrong, this is my first mailing list post.
Re: Orphaned relations after crash/sigkill during CREATE TABLE
On Tue, Aug 18, 2020 at 3:49 PM Adrian Klaver wrote: > So from [1] you are using CREATE TABLE AS. Have you tried with: > > BEGIN; > CREATE TABLE some_table SELECT some_data FROM other_table LIMIT 1 WITH > NO DATA; > COMMIT; > > The above gets you the table structure, but no data. > > BEGIN; > INSERT into some_table SELECT * FROM other_table; > COMMIT; > > The above populates the table. Have not tested but I'm going to assume > if you kill the above the problem would not happen or would be fixable > by DELETE FROM some_table/TRUNCATE some_table; I was able to implement this, which creates the table quickly in a first transaction and populates it in a second transaction. However we were still seeing orphaned files on crash, and I believe I tracked it down to subsequent CREATE INDEX statements also creating these orphaned files (if they are running during a crash). Is that issue known as well? I don't believe I can use the same trick to sidestep that one... -Jason
Re: Orphaned relations after crash/sigkill during CREATE TABLE
On Thu, Aug 20, 2020 at 5:46 PM Tom Lane wrote: > Jason Myers writes: > > However we were still seeing orphaned files on crash, and I believe I > > tracked it down to subsequent CREATE INDEX statements also creating these > > orphaned files (if they are running during a crash). > > Is that issue known as well? I don't believe I can use the same trick to > > sidestep that one... > > Yeah, it's entirely intentional that we don't try to clean up orphaned > disk files after a database crash. There's a long discussion of this and > related topics in src/backend/access/transam/README. What that says about > why not is that such files' contents might be useful for forensic analysis > of the crash, and anyway "Orphan files are harmless --- at worst they > waste a bit of disk space". A point not made in that text, but true > anyway, is that it'd also be quite expensive to search a large database > for orphaned files, so people would likely not want to pay that price > on the way to getting their database back up. > > There might be value in a user-invokable tool that runs in an existing > non-crashed database and looks for orphan files, but I'm not aware that > anyone has written one. (Race conditions against concurrent table > creation would be a problem; but probably that can be finessed somehow, > maybe by noting the file's creation time.) > > In the meantime I've got to say that routinely kill 9'ing database > processes just doesn't seem like a very good idea. Yeah, we do our best > to ensure that there won't be data loss, but you're really doubling down > on a hard assumption that Postgres contains zero bugs when you operate > that way. I'd suggest reconfiguring things to avoid the OOM kill hazard; > or if your cloud provider makes that effectively impossible, maybe you > need another provider. But on most systems I'd think you could use ulimit > or the like even if you don't have root privileges. > > regards, tom lane > Understood, thanks for the reply. -Jason