Re: questions about wraparound

2021-04-06 Thread Laurenz Albe
On Sat, 2021-04-03 at 15:22 +0200, Luca Ferrari wrote:
> Why does PostgreSQL use a TransactionId that is 32 bits
> wide while it is exposed (through txid_current()) as a 64-bit value?
> I mean, having 64 bits would reduce the need for anti-wraparound
> vacuum. I suspect the usage of 32 bits is both for compatibility and
> tuple header size, but I'm just guessing.

Because there are two of these transaction IDs stored on each tuple
(xmin and xmax) to determine its visibility.  The overhead of 8 bytes
per tuple for visibility is already pretty high.
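
You can see these two hidden system columns on any table, for example:

  -- xmin and xmax are stored in every row header; any table works here
  SELECT ctid, xmin, xmax, relname FROM pg_class LIMIT 3;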

Another downside is that changing this would prevent the use of
pg_upgrade for upgrading, as the on-disk format changes.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





LDAP, single sign on from Windows client

2021-04-06 Thread chlor
Hi

I have a Linux server which is set up with authentication via LDAP against a
Windows AD. In pg_hba.conf I have
host ... ldap ldapserver=example.org ldapprefix="" ldapsuffix="@example.org"

The user is also created in PostgreSQL but without a password.
I can then log in with psql from a Windows client with a user defined in the
AD.

But the problem is that psql asks for a password.
Is it possible to get single sign-on without the password prompt?

The Linux server (Ubuntu) has been joined to the domain with "realm", but that
doesn't seem to change anything.

best regards
Hans Schou


Re: MultiXactId wraparound and last aggressive vacuum time

2021-04-06 Thread Michael Schanne
After a closer reading of the documentation, it appears the database should
stop accepting new transactions before a wraparound would ever occur.  If
so, then the only possible explanations for this multixactid wraparound
error would be data corruption, or a bug in postgresql.  The exact version
I'm using is 9.6.10, which is quite a few versions behind the latest
(9.6.21), but I skimmed through the release notes of the later versions and
did not see any bugfixes in this area.  That would leave data corruption as
the only explanation.  Is my reasoning correct here?  I'm willing to
upgrade but I would need to justify it somehow, so if I am missing
something please let me know.

On Tue, Mar 23, 2021 at 11:54 AM Michael Lewis  wrote:

> I don't believe you can determine a date/time that it happened, but
> querying the age of the table is simple to do and then you can compare that
> with the freeze age parameters. A periodic manual vacuum freeze may
> preclude the system ever needing to perform the emergency autovacuum
> freeze, and as an added benefit, index only scans would be more likely to
> be chosen since the visibility map would be updated and the regular
> autovacuum runs would be faster since they can skip frozen pages. I think
> that applies back to 9.6 at least. It's possible it got implemented in 10.
>
> My caffeine hasn't fully kicked in yet and I am still a bit junior on this
> list, but I would hope and expect senior people to correct me if I have
> misunderstood or misrepresented things.
>


Upgrade from 11.3 to 13.1 failed with out of memory

2021-04-06 Thread Mihalidesová Jana

Hi,

I have an approximately 560 GB database and am trying to upgrade it from 11.3 to
13.1. I've successfully upgraded the dev, test and ref environments, but on
production pg_dump failed with out of memory. Yes, of course, dev, test and ref
are much smaller than the production database.
We are using the OID data type, so there are a lot of large objects;
pg_largeobject alone is 59 GB.
The upgrade process fails during the schema-only pg_dump, so I'm confused why
the 35 GB of RAM that is free on the server is not enough when no table data is
being dumped. When I tried to run the same pg_dump command by hand as during the
upgrade, it failed at the line "pg_dump: reading large objects".

Creating dump of global objects 
"/pgsql/bin/13.1_/bin/pg_dumpall" --host /pgsql/data/ --port 50432 
--username XX --globals-only --quote-all-identifiers --binary-upgrade 
--verbose -f pg_upgrade_dump_globals.sql >> "pg_upgrade_utility.log" 2>&1
ok
Creating dump of database schemas
"/pgsql/bin/13.1_/bin/pg_dump" --host /pgsql/data/ --port 50432 
--username XX --schema-only --quote-all-identifiers --binary-upgrade 
--format=custom --verbose --file="pg_upgrade_dump_16384.custom" 'dbname=' 
>> "pg_upgrade_dump_16384.log" 2>&1

*failure*
There were problems executing ""/pgsql/bin/13.1_/bin/pg_dump" --host 
/pgsql/data/ --port 50432 --username pgpnip --schema-only 
--quote-all-identifiers --binary-upgrade --format=custom --verbose 
--file="pg_upgrade_dump_16384.custom" 'dbname=' >> 
"pg_upgrade_dump_16384.log" 2>&1"


Do you have any idea how to upgrade the database? This is my upgrade command:

/pgsql/bin/13.1_/bin/pg_upgrade -k -b /pgsql/bin/11.3_/bin -B 
/pgsql/bin/13.1_/bin -d /pgsql/data/ -D /pgsql/data//.new

Thank you for any ideas.

Best regards,

Jana Mihalidesova
Database Administrator

►CETIN a.s.
Českomoravská 2510/19, 190 00 Praha 9
m: +420 603 419 862  t: +420 238 465 074
jana.mihalides...@cetin.cz



Re: Upgrade from 11.3 to 13.1 failed with out of memory

2021-04-06 Thread Magnus Hagander
On Tue, Apr 6, 2021 at 3:08 PM Mihalidesová Jana
 wrote:
>
>
>
> Hi,
>
>
>
> I have an approximately 560 GB database and am trying to upgrade it from 11.3
> to 13.1. I've successfully upgraded the dev, test and ref environments, but on
> production pg_dump failed with out of memory. Yes, of course, dev, test and
> ref are much smaller than the production database.
>
> We are using the OID data type, so there are a lot of large objects;
> pg_largeobject alone is 59 GB.
>
> The upgrade process fails during the schema-only pg_dump, so I'm confused why
> the 35 GB of RAM that is free on the server is not enough when no table data
> is being dumped. When I tried to run the same pg_dump command by hand as
> during the upgrade, it failed at the line "pg_dump: reading large objects".
>
>
>
> Creating dump of global objects 
> "/pgsql/bin/13.1_/bin/pg_dumpall" --host /pgsql/data/ --port 50432 
> --username XX --globals-only --quote-all-identifiers --binary-upgrade 
> --verbose -f pg_upgrade_dump_globals.sql >> "pg_upgrade_utility.log" 2>&1
>
> ok
>
> Creating dump of database schemas
>
> "/pgsql/bin/13.1_/bin/pg_dump" --host /pgsql/data/ --port 50432 
> --username XX --schema-only --quote-all-identifiers --binary-upgrade 
> --format=custom --verbose --file="pg_upgrade_dump_16384.custom" 'dbname=' 
> >> "pg_upgrade_dump_16384.log" 2>&1
>
>
>
> *failure*
>
> There were problems executing ""/pgsql/bin/13.1_/bin/pg_dump" --host 
> /pgsql/data/ --port 50432 --username pgpnip --schema-only 
> --quote-all-identifiers --binary-upgrade --format=custom --verbose 
> --file="pg_upgrade_dump_16384.custom" 'dbname=' >> 
> "pg_upgrade_dump_16384.log" 2>&1"
>
>
>
>
>
> Do you have any idea how to upgrade the database? This is my upgrade command:
>
>
>
> /pgsql/bin/13.1_/bin/pg_upgrade -k -b /pgsql/bin/11.3_/bin -B 
> /pgsql/bin/13.1_/bin -d /pgsql/data/ -D /pgsql/data//.new
>


This is unfortunately a known limitation in pg_dump (and therefore by
proxy it becomes a problem with pg_upgrade) when you have many large
objects. It doesn't really matter how big they are, it matters how
*many* they are. It takes a long time and uses crazy amounts of
memory, but that's unfortunately where it's at. You'd have the same
problem with a plain dump/reload as well, not just the "binary upgrade
mode".

There's been some recent work on trying to find a remedy for this, but
nothing is available at this point. You'll need to either trim the
number of objects if you can (by maybe manually dumping them out to
files before the restore and then reloading them back in later), or
just add more memory/swap to the machine.

Long term you should probably consider switching to using bytea
columns when you have that many objects.
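
If you want to see just how many you are dealing with, counting them is cheap
(rough sketch; run it in the database being upgraded):

-- one row per large object that pg_dump has to track
SELECT count(*) FROM pg_largeobject_metadata;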

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




RE: Open source licenses

2021-04-06 Thread DAVID Nicolas
My concern is, I guess, the same as for every software vendor using open-source
components: to make an inventory of all the open-source licenses of the
components used, to check and respect their terms of use, and to preserve
copyrights and intellectual property.
Companies providing open-source components or libraries now often publish a
list of their modules and licences, because most of the time it is a
prerequisite for adoption.
For example, Qt Company publishes this page: 
https://doc.qt.io/qt-5/licenses-used-in-qt.html.  

However, when I get the PostgreSQL binaries for Windows (the zip archive linked
from https://www.enterprisedb.com/download-postgresql-binaries), I can see in
installation-notes.html:
  -> "The software bundled together in this package is released under a number 
of different Open Source licences. By using any component of this installation 
package, you agree to abide by the terms and conditions of it's licence."
This is unclear, and even though I found some license files and header files
with copyrights, I cannot determine with certainty the list of installed
components and their licenses. And in the end, whether or not I actually use a
component, as soon as I install it I distribute it, and thus I have to know the
conditions.

Could the PostgreSQL Global Development Group consider providing this
information? Is there a team or group in charge of this? Is there a direct
email address for this kind of request?

Best regards,

Nicolas DAVID




Re: Open source licenses

2021-04-06 Thread Laurenz Albe
On Tue, 2021-04-06 at 13:47 +, DAVID Nicolas wrote:
> My concern is, I guess, the same as for every software vendor using
> open-source components: to make an inventory of all the open-source licenses
> of the components used, to check and respect their terms of use, and to
> preserve copyrights and intellectual property.
> 
> However, when I get the PostgreSQL binaries for Windows (the zip archive
> linked from https://www.enterprisedb.com/download-postgresql-binaries),
> I can see in installation-notes.html:
>   -> "The software bundled together in this package is released under a 
> number of different
>  Open Source licences. By using any component of this installation package, 
> you agree to abide
>  by the terms and conditions of it's licence."
> 
> Could the PostgreSQL Global Development Group consider providing this
> information? Is there a team or group in charge of this? Is there a direct
> email address for this kind of request?

These installation packages are provided by EnterpriseDB, not by the PGDG.

I think your request is reasonable, but you'll have to ask the packager.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: questions about wraparound

2021-04-06 Thread Jehan-Guillaume de Rorthais
On Tue, 06 Apr 2021 10:46:08 +0200
Laurenz Albe  wrote:

> On Sat, 2021-04-03 at 15:22 +0200, Luca Ferrari wrote:
> > Why does PostgreSQL use a TransactionId that is 32 bits
> > wide while it is exposed (through txid_current()) as a 64-bit value?
> > I mean, having 64 bits would reduce the need for anti-wraparound
> > vacuum. I suspect the usage of 32 bits is both for compatibility and
> > tuple header size, but I'm just guessing.
> 
> Because there are two of these transaction IDs stored on each tuple
> (xmin and xmax) to determine its visibility.  The overhead of 8 bytes
> per tuple for visibility is already pretty high.
> 
> Another downside is that changing this would prevent the use of
> pg_upgrade for upgrading, as the on-disk format changes.

Indeed. Compatibility and size.

About the txid_current() format: it shows the 64-bit value used internally,
which is split into two parts:

* the lower part is the classical 32-bit XID, stored in the tuple header
* the higher part is the "epoch" of the XID space, i.e. the number of times the
  XID has wrapped around. This part is not stored in tuples.

Look for macros EpochFromFullTransactionId and XidFromFullTransactionId in
"include/access/transam.h".

Given txid_current() returning e.g. 10000000000:

  =# select txid_current();
   txid_current 
  --------------
     10000000000
  (1 row)

The epoch would be 2:

  $ echo $((10000000000 >> 32))
  2

The 32-bit XID stored in the tuple header would be 1410065408:

  $ echo $((10000000000 % 2**32))
  1410065408

When looking at the NextXID in the controldata file, you would find:

  $ pg_controldata $PGDATA|grep NextXID
  Latest checkpoint's NextXID:  2:1410065408
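
If you prefer staying in SQL, the same split can be computed directly from
txid_current() (quick sketch):

  =# SELECT txid_current() >> 32 AS epoch,
            txid_current() & ((1::bigint << 32) - 1) AS xid;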

Regards,




Re: LDAP, single sign on from Windows client

2021-04-06 Thread Stephen Frost
Greetings,

* chlor (hans.sc...@gmail.com) wrote:
> I have a Linux server which is set up with authentication via LDAP against a
> Windows AD. In pg_hba.conf I have
> host ... ldap ldapserver=example.org ldapprefix="" ldapsuffix="@example.org"
> 
> The user is also created in PostgreSQL but without a password.
> I can then log in with psql from a Windows client with a user defined in the
> AD.
> 
> But the problem is that psql asks for a password.
> Is it possible to get single sign-on without the password prompt?

Yes, use GSSAPI-based authentication instead of LDAP.  Using GSSAPI is
also more secure and avoids sending the user's password to the PG
server.

PG Docs: https://www.postgresql.org/docs/current/gssapi-auth.html

Blog I wrote about setting it up:

https://blog.crunchydata.com/blog/windows-active-directory-postgresql-gssapi-kerberos-authentication
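
For reference, the pg_hba.conf line then ends up looking something like this
(only a sketch; the realm name is made up, and the server also needs a keytab
configured via krb_server_keyfile):

host ... gss include_realm=0 krb_realm=EXAMPLE.ORG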

Thanks,

Stephen




Re: MultiXactId wraparound and last aggressive vacuum time

2021-04-06 Thread Jehan-Guillaume de Rorthais
On Mon, 5 Apr 2021 17:28:06 -0400
Michael Schanne  wrote:

> After a closer reading of the documentation, it appears the database should
> stop accepting new transactions before a wraparound would ever occur.  If
> so, then the only possible explanations for this multixactid wraparound
> error would be data corruption, or a bug in postgresql.  The exact version
> I'm using is 9.6.10, which is quite a few versions behind the latest
> (9.6.21), but I skimmed through the release notes of the later versions and
> did not see any bugfixes in this area.  That would leave data corruption as
> the only explanation.  Is my reasoning correct here?

I didn't check the changelog, and you should definitely run 9.6.21, but I
believe your reasoning is correct anyway. A bug is possible, but I would bet a
coin on corruption.

You might want to compare the MultiXactId from the error message with the value reported by:

  pg_controldata $PGDATA|grep NextMultiXactId

Back up your cluster, then try to isolate the table(s) and block(s) where the
corruption occurs and examine them using e.g. pageinspect.
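
To narrow down which relations carry the oldest multixacts, something like this
can be a starting point (rough sketch):

  SELECT relname, relminmxid, mxid_age(relminmxid) AS age
  FROM pg_class
  WHERE relkind IN ('r', 't', 'm')
  ORDER BY mxid_age(relminmxid) DESC
  LIMIT 10;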

> I'm willing to upgrade but I would need to justify it somehow, so if I am
> missing something please let me know.

You can justify the upgrade using this list of reasons:
https://why-upgrade.depesz.com/show?from=9.6.10&to=9.6.21

Regards,




Check constraint failure messages

2021-04-06 Thread Miles Elam
I've got a domain that validates email addresses. When inserting a bunch of
entries I simply get the error message

ERROR: value for domain po.email violates check constraint "email_check"
SQL state: 23514


When inserting 1000+ entries in a batch, finding the exact entry with the
problem is noticeably harder than with other error types. For example when
a column should be a uuid but you pass in 'Mary had a little lamb', the
error message tells you what the invalid value is as well as the column
name you're trying to put it into.

Are there any quick hacks floating around out there to solve or at least
mitigate this?

- Miles


Re: Check constraint failure messages

2021-04-06 Thread Ron

On 4/6/21 2:40 PM, Miles Elam wrote:
I've got a domain that validates email addresses. When inserting a bunch 
of entries I simply get the error message


ERROR: value for domain po.email violates check constraint
"email_check" SQL state: 23514


When inserting 1000+ entries in a batch, finding the exact entry with the 
problem is noticeably harder than with other error types. For example when 
a column should be a uuid but you pass in 'Mary had a little lamb', the 
error message tells you what the invalid value is as well as the column 
name you're trying to put it into.


Are there any quick hacks floating around out there to solve or at least 
mitigate this?


Is it a deferred constraint?

--
Angular momentum makes the world go 'round.


Re: Check constraint failure messages

2021-04-06 Thread Miles Elam
On Tue, Apr 6, 2021 at 1:03 PM Ron  wrote:

> On 4/6/21 2:40 PM, Miles Elam wrote:
>
> I've got a domain that validates email addresses. When inserting a bunch
> of entries I simply get the error message
>
> ERROR: value for domain po.email violates check constraint "email_check"
> SQL state: 23514
>
>
> When inserting 1000+ entries in a batch, finding the exact entry with the
> problem is noticeably harder than with other error types. For example when
> a column should be a uuid but you pass in 'Mary had a little lamb', the
> error message tells you what the invalid value is as well as the column
> name you're trying to put it into.
>
> Are there any quick hacks floating around out there to solve or at least
> mitigate this?
>
>
> Is it a deferred constraint?
>

Plain ole domain CHECK constraint.

CREATE DOMAIN po.email AS varchar
  CHECK (VALUE IS NULL
         OR (po.length_in(VALUE, 1, 254)
             AND NOT po.email_expanded(VALUE) IS NULL));


where "po" is another schema, po.length_in(...) is an IMMUTABLE range
check, and po.email_expanded(...) is a function returning a record. Same
behavior happens if I remove the functions and define the check constraint
in place. The only info returned in a bulk insert is the name of the
violated check constraint, aka email_check.

An example table using it is defined as follows:

CREATE TABLE IF NOT EXISTS profile (
    id uuid PRIMARY KEY,
    email po.email NOT NULL,
    manager_email po.email NOT NULL
);


Nothing fancy.

INSERT INTO profile (id, email, manager_email) VALUES
  (gen_random_uuid(), 'us...@example.com', 'manag...@example.com'),
  (gen_random_uuid(), 'us...@example.com', 'manag...@example.com'),
  (gen_random_uuid(), 'user3&example.com', 'manag...@example.com'),
  (gen_random_uuid(), 'us...@example.com', 'manager4.example.com'),
  (gen_random_uuid(), 'us...@example.com', 'manag...@example.com');

Inserting this batch will tell me that there was an error and that it was
"email_check" that failed, but no indication that the 3rd user's email
address or the 4th user's manager email was the problem, forcing a bisect
operation among 1,000+ entries to find the first error, then bisect from
there to find the second error if any, and repeat until no more constraint
errors.
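
The best workaround I can think of so far is a pre-flight check along these
lines (only a sketch; it assumes the rows are first loaded into a plain-text
staging table, and "staging" and its column names are invented for the example):

DO $$
DECLARE
    r record;
BEGIN
    -- staging has plain varchar columns, so loading it cannot hit the domain
    FOR r IN SELECT id, email, manager_email FROM staging LOOP
        BEGIN
            -- casting to the domain re-runs the CHECK for this row only
            PERFORM r.email::po.email, r.manager_email::po.email;
        EXCEPTION WHEN check_violation THEN
            RAISE NOTICE 'bad row %: email=%, manager_email=%',
                r.id, r.email, r.manager_email;
        END;
    END LOOP;
END;
$$;

At least that points at the offending values without the manual bisect.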

- Miles


Re: Check constraint failure messages

2021-04-06 Thread Ron

On 4/6/21 3:50 PM, Miles Elam wrote:
On Tue, Apr 6, 2021 at 1:03 PM Ron wrote:


On 4/6/21 2:40 PM, Miles Elam wrote:

I've got a domain that validates email addresses. When inserting a
bunch of entries I simply get the error message

ERROR: value for domain po.email violates check constraint
"email_check" SQL state: 23514


When inserting 1000+ entries in a batch, finding the exact entry with
the problem is noticeably harder than with other error types. For
example when a column should be a uuid but you pass in 'Mary had a
little lamb', the error message tells you what the invalid value is
as well as the column name you're trying to put it into.

Are there any quick hacks floating around out there to solve or at
least mitigate this?


Is it a deferred constraint?

Plain ole domain CHECK constraint.

CREATE DOMAIN po.email AS varchar
  CHECK (VALUE IS NULL
         OR (po.length_in(VALUE, 1, 254)
             AND NOT po.email_expanded(VALUE) IS NULL));


where "po" is another schema, po.length_in(...) is an IMMUTABLE range 
check, and po.email_expanded(...) is a function returning a record. Same 
behavior happens if I remove the functions and define the check constraint 
in place. The only info returned in a bulk insert is the name of the 
violated check constraint, aka email_check.


The blunt force answer is to not use bulk inserts.  Try COPY; it's good at 
saying which record throws an error.
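
Something along these lines, assuming the rows are sitting in a CSV file (the
file name and column list here are made up for the example):

\copy profile (id, email, manager_email) FROM 'profiles.csv' WITH (FORMAT csv)

When a row fails the domain check, the error's CONTEXT line reports the line
number and column, so you know exactly which record to fix.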


--
Angular momentum makes the world go 'round.


Re: Check constraint failure messages

2021-04-06 Thread Miles Elam
On Tue, Apr 6, 2021 at 1:59 PM Ron  wrote:

>
> The blunt force answer is to not use bulk inserts.  Try COPY; it's good at
> saying which record throws an error.
>

Sadly, this is a cloud-managed database without direct access to port 5432 from
outside the VPC, and bastion instances are frowned upon by our security folks.
Guess I'm stuck with bisecting. Thanks for the confirmation.