Re: pg_dump include/exclude data, was: verify checksums / CREATE DATABASE
Dear Adrian,

On Fri, Jun 07, 2019 at 08:02:32AM -0700, Adrian Klaver wrote:

> On 6/6/19 6:50 AM, Karsten Hilbert wrote:
> > The current canonical solution (?) for verifying checksums in
> > an existing database is, to my understanding, to pg_dump it
> > (to /dev/null, perhaps):
> >
> > pg_dump --username=... --dbname=... --compress=0 --no-sync
> >         --format=custom --file=/dev/null
> >
> > as that will read and verify all blocks related to the dump
> > of that database.
>
> The above works with the existing cluster, but would you not also want to
> verify that the blocks written to on the new cluster also are good?

Sure, but that much depends on what (or rather, where) the
"new cluster" actually is.

In my case I want to make sure that - before a run of
"CREATE DATABASE new TEMPLATE old" - I can be reasonably sure
that the disk blocks underlying "old" verify against their
checksums.

Since CREATE DATABASE does not verify CRCs, I use a pg_dump
into /dev/null to verify checksums while the cluster is
*online*.

The problem I hope to protect against with this approach: the
CREATE DATABASE might untaint corrupted data from a bad disk
block into a good disk block by virtue of doing a file-level
copy.

I hope my reasoning isn't going astray.

> > One will be tempted to include options to speed up the
> > process, say: ...
> > Is my assumption wrong ?
>
> Not sure, though it would seem to me including the above is a relatively
> small incremental cost to the overall dump, assuming a data set of any size
> greater than small.

Agreed. Thanks for putting things into perspective. Will do.

Thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
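For readers who want to try the same check, a minimal sketch of the
verify-then-clone sequence discussed above; the role name, the database
names "old" and "new", and the connection details are placeholders, not
taken from the thread:

    # 1) Force a full sequential read of every table page of "old" while the
    #    cluster is online; a page that fails checksum verification should
    #    make pg_dump abort with an error.
    pg_dump --username=postgres --dbname=old \
            --compress=0 --no-sync --format=custom --file=/dev/null

    # 2) Only if step 1 succeeded, clone the database (a file-level copy):
    psql --username=postgres --dbname=postgres \
         --command='CREATE DATABASE new TEMPLATE old;'

Note that CREATE DATABASE ... TEMPLATE requires that no other sessions are
connected to "old" while the copy runs.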
Re: pg_dump include/exclude data, was: verify checksums / CREATE DATABASE
Dear Tom,

On Fri, Jun 07, 2019 at 11:41:36AM -0400, Tom Lane wrote:

> On 6/6/19 6:50 AM, Karsten Hilbert wrote:
> >> The current canonical solution (?) for verifying checksums in
> >> an existing database is, to my understanding, to pg_dump it
> >> (to /dev/null, perhaps):
> >> as that will read and verify all blocks related to the dump
> >> of that database.
>
> FWIW, that doesn't seem especially canonical from here. In particular,
> pg_dump will not normally result in any examination at all of indexes
> on user-defined tables --- it'll just be doing seqscans of the tables
> proper. You could hope for reasonably complete coverage of the system
> catalogs along with user tables, but missing out user indexes seems
> like a pretty big gap.

I agree it would be, but not in the case I hope to protect against:
within the database upgrade process, after cloning the database via
"CREATE DATABASE ... TEMPLATE ...", all indexes are REINDEXed inside
the new database. That would take care of corrupted data having been
carried over within index-related disk blocks, I would think.

> The actual solution for this as of v11 is pg_verify_checksums
> (renamed to just pg_checksums for v12).

Sure, but that is not (yet) possible against a cluster that's online.

Thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
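To make the two points above concrete, a hedged sketch: an offline
checksum scan with the tool Tom mentions (pg_verify_checksums in v11,
pg_checksums in v12 and later, both of which require the cluster to be
shut down for checking), and the REINDEX step of the upgrade process; the
data directory path and database names are placeholders:

    # Offline verification (the server must be stopped):
    pg_ctl stop -D /path/to/data
    pg_checksums --check -D /path/to/data   # on v11: pg_verify_checksums -D /path/to/data
    pg_ctl start -D /path/to/data

    # After CREATE DATABASE new TEMPLATE old, rebuild every index inside the
    # clone so that corruption carried over in index pages is discarded:
    psql --dbname=new --command='REINDEX DATABASE new;'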
Re: pg_dump include/exclude data, was: verify checksums / CREATE DATABASE
On 6/11/19 12:15 AM, Karsten Hilbert wrote:
> Dear Adrian,
>
> Sure, but that much depends on what (or rather, where) the
> "new cluster" actually is.
>
> In my case I want to make sure that - before a run of
> "CREATE DATABASE new TEMPLATE old" - I can be reasonably sure
> that the disk blocks underlying "old" verify against their
> checksums.
>
> Since CREATE DATABASE does not verify CRCs, I use a pg_dump
> into /dev/null to verify checksums while the cluster is
> *online*.
>
> The problem I hope to protect against with this approach: the
> CREATE DATABASE might untaint corrupted data from a bad disk
> block into a good disk block by virtue of doing a file-level
> copy.
>
> I hope my reasoning isn't going astray.

As I understand it, checksums are done on the page level using a hash
(for details: https://doxygen.postgresql.org/checksum__impl_8h_source.html).
I am not sure how a page could get un-corrupted by virtue of a file copy.

> > > One will be tempted to include options to speed up the
> > > process, say: ...
> > > Is my assumption wrong ?
> >
> > Not sure, though it would seem to me including the above is a relatively
> > small incremental cost to the overall dump, assuming a data set of any size
> > greater than small.
>
> Agreed. Thanks for putting things into perspective. Will do.
>
> Thanks,
> Karsten
> --
> GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

--
Adrian Klaver
adrian.kla...@aklaver.com
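As a small aside to the page-level checksum point: whether data checksums
are enabled at all can be confirmed both online and from the control file;
a sketch, with the data directory path as a placeholder:

    # On a running cluster (data_checksums is a read-only setting, "on"/"off"):
    psql --dbname=postgres --command='SHOW data_checksums;'

    # From the control file (works whether or not the server is running):
    pg_controldata /path/to/data | grep -i checksum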
Featured Big Name Users of Postgres
I'm doing a presentation about Postgres to SQL Server users this weekend, and I want to showcase some of the big names that use Postgres, e.g. MasterCard, Government agencies, Banks, etc. There used to be a Wiki page of Featured Users but that link is broken now. I also "found" a page about MasterCard at the EnterpriseDB website, but that one throws an error with too many redirects. Any suggestions on where to find a recent user list? Thanks!
Aw: Re: pg_dump include/exclude data, was: verify checksums / CREATE DATABASE
> > The problem I hope to protect against with this approach: the
> > CREATE DATABASE might untaint corrupted data from a bad disk
> > block into a good disk block by virtue of doing a file-level
> > copy.
> >
> > I hope my reasoning isn't going astray.
>
> As I understand it, checksums are done on the page level using a hash
> (for details: https://doxygen.postgresql.org/checksum__impl_8h_source.html).
> I am not sure how a page could get un-corrupted by virtue of a file copy.

Ah, no, I did not explain myself well.

Let's assume a corrupted, bad (but readable at the hardware level)
disk block B. A filesystem-level copy (as in CREATE DATABASE) would
successfully read that disk block B and copy the corrupted content
into a good disk block G elsewhere on the disk.

Verifying the checksum of the page sitting on block B before doing
the database cloning would reveal the corruption before it got cloned.

Does that make sense ?

Karsten
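If one wants to see this effect first hand, a rough sketch of an
experiment on a throwaway cluster (initialized with --data-checksums);
all paths, the database "scratch" and the table "t" (which should contain
at least one row) are made up for illustration, and the dd step
deliberately destroys data, so never run this anywhere near real data:

    # 1) Locate the data file of table t, then stop the server so the page
    #    is read back from disk afterwards.
    REL=$(psql -At --dbname=scratch --command="SELECT pg_relation_filepath('t');")
    pg_ctl stop -D /path/to/scratchdata

    # 2) Overwrite a few bytes inside the first page of that file ("block B").
    printf '\xde\xad\xbe\xef' | \
        dd of=/path/to/scratchdata/$REL bs=1 seek=1000 count=4 conv=notrunc

    # 3) A plain file-level copy (which is what CREATE DATABASE effectively
    #    does) still succeeds; the corruption simply travels along:
    cp /path/to/scratchdata/$REL /tmp/copy-of-t

    # 4) A checksum-verifying read does not succeed:
    pg_ctl start -D /path/to/scratchdata
    pg_dump --dbname=scratch --format=custom --file=/dev/null   # expect a checksum/invalid-page error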
Re: Featured Big Name Users of Postgres
On 11 June 2019 19:45:27 CEST, Igal Sapir wrote:
> I'm doing a presentation about Postgres to SQL Server users this
> weekend, and I want to showcase some of the big names that use
> Postgres, e.g. MasterCard, Government agencies, Banks, etc.
>
> There used to be a Wiki page of Featured Users but that link is
> broken now.
>
> I also "found" a page about MasterCard at the EnterpriseDB website,
> but that one throws an error with too many redirects.
>
> Any suggestions on where to find a recent user list? Thanks!

Not so easy: we have a lot of big customers, but we are not allowed
to name them ... You can visit our page, we have some case studies ...

Regards,
Andreas
--
2ndQuadrant - The PostgreSQL Support Company
Re: Featured Big Name Users of Postgres
On 2019-06-11 at 19:45, Igal Sapir wrote:
> I'm doing a presentation about Postgres to SQL Server users this
> weekend, and I want to showcase some of the big names that use
> Postgres, e.g. MasterCard, Government agencies, Banks, etc.

You might be interested in this:
https://www.theguardian.com/info/2018/nov/30/bye-bye-mongo-hello-postgres
Re: Featured Big Name Users of Postgres
At Adjust GmbH we have 5-10 PB of data in Postgres.

On Tue, Jun 11, 2019 at 9:28 PM Ireneusz Pluta/wp.pl wrote:

> On 2019-06-11 at 19:45, Igal Sapir wrote:
> > I'm doing a presentation about Postgres to SQL Server users this
> > weekend, and I want to showcase some of the big names that use
> > Postgres, e.g. MasterCard, Government agencies, Banks, etc.
>
> You might be interested in this:
> https://www.theguardian.com/info/2018/nov/30/bye-bye-mongo-hello-postgres

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.
http://www.efficito.com/learn_more
Re: Aw: Re: pg_dump include/exclude data, was: verify checksums / CREATE DATABASE
On 6/11/19 11:15 AM, Karsten Hilbert wrote:
> > > The problem I hope to protect against with this approach: the
> > > CREATE DATABASE might untaint corrupted data from a bad disk
> > > block into a good disk block by virtue of doing a file-level
> > > copy.
> > >
> > > I hope my reasoning isn't going astray.
> >
> > As I understand it, checksums are done on the page level using a hash
> > (for details: https://doxygen.postgresql.org/checksum__impl_8h_source.html).
> > I am not sure how a page could get un-corrupted by virtue of a file copy.
>
> Ah, no, I did not explain myself well.
>
> Let's assume a corrupted, bad (but readable at the hardware level)
> disk block B. A filesystem-level copy (as in CREATE DATABASE) would
> successfully read that disk block B and copy the corrupted content
> into a good disk block G elsewhere on the disk.
>
> Verifying the checksum of the page sitting on block B before doing
> the database cloning would reveal the corruption before it got cloned.
>
> Does that make sense ?

Yes.

> Karsten

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Featured Big Name Users of Postgres
On Tue, Jun 11, 2019 at 2:11 PM Chris Travers wrote:

> At Adjust GmbH we have 5-10 PB of data in Postgres.
>
> On Tue, Jun 11, 2019 at 9:28 PM Ireneusz Pluta/wp.pl wrote:
>
> > On 2019-06-11 at 19:45, Igal Sapir wrote:
> > > I'm doing a presentation about Postgres to SQL Server users this
> > > weekend, and I want to showcase some of the big names that use
> > > Postgres, e.g. MasterCard, Government agencies, Banks, etc.
> >
> > You might be interested in this:
> > https://www.theguardian.com/info/2018/nov/30/bye-bye-mongo-hello-postgres

Andreas - unfortunately I do not recognize any of the names, so they are
probably European entities that are not very well known here.

Ireneusz - I love the article! Will definitely use this.

Chris - very impressive! I will mention it in my talk.

Thank you all,

Igal
Re: Featured Big Name Users of Postgres
Igal Sapir wrote on 12.06.2019 at 07:58:
> Andreas - unfortunately I do not recognize any of the names, so they are
> probably European entities that are not very well known here.

But Lucee is a Swiss company, so why aren't European companies interesting?

I know that Zalando (a European online clothing shop) uses Postgres (they
have published several tools around it), and the German online tax system
uses Postgres (there was a presentation by Alva Freude on the migration
from Oracle a few years ago).

I also found these pages:

https://enlyft.com/tech/products/postgresql
https://siftery.com/postgresql
https://stackshare.io/postgresql
https://www.builtincolorado.com/companies/tech/postgresql-companies-colorado
https://www.builtinnyc.com/companies/tech/postgresql-companies-nyc

but I don't know how credible those sources are.

Wikipedia lists some names:
https://en.wikipedia.org/wiki/PostgreSQL#Notable_users
including the International Space Station ;)

Other Postgres support companies have also published some customer names:

https://www.enterprisedb.com/customers
https://www.cybertec-postgresql.com/en/postgresql-overview/solutions-who-uses-postgresql/