Re: Postgresql 9.3 Server will not start after Ubuntu Upgrade
On 27/03/2018 20:00, Ken Beck wrote: And, looking for log files, I find none. Have you looked in /var/log/postgresql/ ?
Re: Postgresql 9.3 Server will not start after Ubuntu Upgrade
On 27/03/2018 22:57, Adrian Klaver wrote: On 03/27/2018 01:46 PM, Ken Beck wrote: I am working on two systems, one running in Oracle VirtualBox on my laptop, the other in a DigitalOcean droplet. I know on one of them I tried to remove postgres-9.6, and it must have been my laptop; here is the output from pg_lsclusters:
Ver Cluster Port Status Owner    Data directory               Log file
9.3 main    5432 down   postgres /var/lib/postgresql/9.3/main /var/log/postgresql/postgresql-9.3-main.log
What happens if you do: sudo pg_ctlcluster 9.3 main start
If possible, I'll take a look at /var/log/postgresql/postgresql-9.3-main.log. That may be a big help in uncovering what's going on. Cheers Moreno.-
Re: Postgresql 9.3 Server will not start after Ubuntu Upgrade
On 28/03/2018 18:28, Adrian Klaver wrote: On 03/28/2018 09:24 AM, Moreno Andreo wrote: On 27/03/2018 22:57, Adrian Klaver wrote: On 03/27/2018 01:46 PM, Ken Beck wrote: I am working on two systems, one running in Oracle VirtualBox on my laptop, the other in a DigitalOcean droplet. I know on one of them I tried to remove postgres-9.6, and it must have been my laptop; here is the output from pg_lsclusters:
Ver Cluster Port Status Owner    Data directory               Log file
9.3 main    5432 down   postgres /var/lib/postgresql/9.3/main /var/log/postgresql/postgresql-9.3-main.log
What happens if you do: sudo pg_ctlcluster 9.3 main start
If possible, I'll take a look at /var/log/postgresql/postgresql-9.3-main.log. That may be a big help in uncovering what's going on. See this post: https://www.postgresql.org/message-id/5bad1f01-d06a-1526-96d2-c9cfd6062a11%40gmail.com Yes, I saw it, but I can't say whether that log was from the 9.3 or the 9.6 cluster... as you can see above, he wrote that he had a 9.6, which he tried to remove, and a 9.3, which was not starting. So it's not clear to me if the log he sent was from one version or the other... Cheers Moreno.-
Schema-based replication
Hi folks :-) I'm about to design and develop a cross-platform, schema-based replicator as a module for an app scheduled for alpha next fall/winter. Before digging into it I need to know whether I'm reinventing the wheel, but AFAIK pglogical can't give this kind of granularity, especially in a multi-master environment. Users should have their data (one database with one or more schemas, though not necessarily all of them have to be kept in sync) synced across more than one device and with our server, where there's one database for everyone and one or more schemas per user. Is there anything similar among PG core or extensions, in progress or in beta? Thanks Moreno.
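For what it's worth, pglogical's replication sets can be scoped to whole schemas; multi-master is the part it doesn't cover. A minimal, untested sketch, assuming the extension is installed and provider/subscriber nodes are already created; the set, schema and DSN names below are made up:
-- on the provider: one replication set per user's schema
SELECT pglogical.create_replication_set('user1_set');
SELECT pglogical.replication_set_add_all_tables('user1_set', ARRAY['user1_schema'], true);
-- on the subscriber: subscribe only to that set
SELECT pglogical.create_subscription(
    subscription_name := 'user1_sub',
    provider_dsn := 'host=central-server dbname=maindb user=replicator',
    replication_sets := ARRAY['user1_set']
);
This only solves the server-to-device direction per schema, so it may still leave room for a custom replicator when devices must also push changes back.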
Re: Strange error in Windows 10 Pro
On 21/04/2018 22:35, Adrian Klaver wrote: On 04/21/2018 01:08 PM, Dale Seaburg wrote: Thanks Adrian for the suggestion of running the installer with Admin rights. Unfortunately, I get the same results. It appears that all of the folders within the C:\Program Files\PostgreSQL\9.6 path are created, and populated, BUT, when the items in the *data* folder are to be created, or copied into, it leaves an error message as noted previously. The *data* folder is empty. It's almost as if the PC is missing a critical .dll needed in the *data* folder filling function (my guess). Again, I am at a loss as to what to do. Have you looked at the system logs e.g. Event Viewer? If you look in %temp%, there should be one or more files named "bitrock_installer" or something similar (search for "bitrock"); that's the setup log showing what went well and what didn't. Another hint, dating back to 9.1 times (it was 2012, I guess), is to avoid installing Postgres under system folders (c:\program files, c:\users, and so on), because in some cases there can be some nasty behaviors... try installing to something like c:\PG96... HTH Cheers, Moreno.-
Re: Strange error in Windows 10 Pro
On 24/04/2018 04:09, Dale Seaburg wrote: Thanks to Moreno and Igor for the Event Viewer suggestions. Here are a few lines of log file where they differ between a good install and a bad, incomplete install. The good install was from my Shop PC with Windows 10 Pro. BTW, I had success with another Dell PC with Win 10 Pro about 4 months ago. This problem appears to be a one-off.
Good Install:
Called AclCheck(C:\Program Files\PostgreSQL\9.6\data)
Called IsVistaOrNewer()...
'winmgmts' object initialized...
Version:10. MajorVersion:10
Executing icacls to ensure the SHOP-PC\Dale account can read the path C:\Program Files\PostgreSQL\9.6\data
Executing batch file 'rad0510A.bat'...
processed file: C:\Program Files\PostgreSQL\9.6\data
Successfully processed 1 files; Failed processing 0 files
Bad Install:
Called AclCheck(D:\PostgreSQL\9.6\data)
Called IsVistaOrNewer()...
'winmgmts' object initialized...
Version:10. MajorVersion:10
Executing icacls to ensure the WINDOWS-6BEGVO1\don king account can read the path D:\PostgreSQL\9.6\data
Executing batch file 'rad6DBC7.bat'...
It would be interesting to have the next 2 lines in "Bad install", like in "Good install", to know whether the rad6DBC7.bat execution terminated with or without errors; if there's an error, it would be nice to see it. I assume D: is a secondary local drive, or at any rate a local partition, not a network or external one... Thanks Moreno.-
pgdg-keyring (or apt-key) failure on fresh 9.6 install
Hi folks, I'm trying to install PostgreSQL 9.6 on a test machine in Google Cloud Platform. After a fresh install of Debian 9 (just after the instance has been created) I follow the steps from https://wiki.postgresql.org/wiki/Apt (instead of pg 10 I install pg 9.6). During the installation process I encounter the following strange warnings that, even if this is a test machine, make me think twice before going ahead.
[...]
Processing triggers for man-db (2.7.6.1-2) ...
Setting up pgdg-keyring (2017.3) ...
Removing apt.postgresql.org key from trusted.gpg:
Warning: The postinst maintainerscript of the package pgdg-keyring
Warning: seems to use apt-key (provided by apt) without depending on gnupg or gnupg2.
Warning: This will BREAK in the future and should be fixed by the package maintainer(s).
Note: Check first if apt-key functionality is needed at all - it probably isn't!
OK
Setting up xml-core (0.17) ...
[...]
I have to say that the installation is successful and the database server comes up, apparently with no problems at all. Here's the version:
postgres=# select version();
PostgreSQL 9.6.9 on x86_64-pc-linux-gnu (Debian 9.6.9-2.pgdg90+1), compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
(1 row)
Has anyone stumbled on this? Googling around I found a post saying it's about a missing dirmngr package; I tried that, but to no avail. Do I need to worry? Thanks Moreno.-
Re: pgdg-keyring (or apt-key) failure on fresh 9.6 install
Hi Tim, Il 29/05/2018 00:06, Tim Cross ha scritto: Moreno Andreo writes: Hi folks, I'm trying to install Postgresql 9.6 on a test machine in Google Cloud Platform After a fresh install with Debian 9 (just after the instance has been created) I follow steps from here https://wiki.postgresql.org/wiki/Apt (instead of pg 10 I install pg 9.6) During the installation process i encounter the following strange warnings that, even if that's a test machine, make me think twice before going ahead. [...] Processing triggers for man-db (2.7.6.1-2) ... Setting up pgdg-keyring (2017.3) ... Removing apt.postgresql.org key from trusted.gpg: Warning: The postinst maintainerscript of the package pgdg-keyring Warning: seems to use apt-key (provided by apt) without depending on gnupg or gnupg2. Warning: This will BREAK in the future and should be fixed by the package maintainer(s). Note: Check first if apt-key functionality is needed at all - it probably isn't! OK Setting up xml-core (0.17) ... [...] I have to say that installation is successfully and database server goes up and apparently with no problems at all. This looks like a warning for the package maintainers regarding ensuring the package depends on either gnupg or gnupg2 and nothing you need to worry about unless you are building/maintaining deb packages for postgres. Brilliant. That's what I needed to know. Just to avoid bitter surprises in the future... :-) The Debian package manager, apt, uses gpg keys to verify the authenticity of packages it downloads. My guess is that previously, you only needed to ensure the package had a dependency on apt-key and now apt has/is changing such that you need to have an explicit dependency on either gnupg or gnupg2. ... so if I update/upgrade this instance in the future it will be automatically fixed (and there shouldn't be issues), right? Thanks a lot! Moreno.-
Re: SQL problem (forgot to change header with earlier post!).
On 29/05/2018 13:14, Paul Linehan wrote: Hi all, I have a problem that I just can't seem to solve: I want to divide the count of one table by the count of another - seems simple enough! I created simple VIEWs with counts of the tables, but I just can't grasp the logic! If it's not an exercise, I don't think you need the views. DDL and DML (simplified) at the bottom of post. I tried various combinations of things like basic SELECTs. SELECT avg FROM ((SELECT cnt1 FROM v1)/(SELECT cnt2 FROM v2)); Maybe I didn't catch the problem, but select (select count(*) from t1) / (select count(*) from t2)::float should be a starting point (if you need an integer as a return value, simply remove the ::float at the end). HTH Moreno.-
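For completeness, a worked variant of the same idea using the t1/t2 names from the simplified DDL; NULLIF is there so an empty t2 returns NULL instead of a division-by-zero error:
select (select count(*) from t1)::float
     / nullif((select count(*) from t2), 0) as ratio;
Casting either operand works; the point is simply to get out of integer division before dividing.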
Re: pgdg-keyring (or apt-key) failure on fresh 9.6 install
On 30/05/2018 00:25, Tim Cross wrote: Personally, I tend to prefer using the packages which come with the particular flavour of Linux you're installing, as they are often more in line with the current version of the package management system being used. I only grab packages from the specific Postgres repo if the package is not in the current version of the distribution I'm installing. You mean "don't add pgdg repos if not necessary"?
catalog is missing n attribute(s) for relid xxxx at character yy
Hi, PostgreSQL 9.1 x86 on Windows 10 (EOL, but in this case it doesn't apply :-) ). When querying a table, I receive the error reported in the subject: catalog is missing 5 attribute(s) for relid 33238 at character 15. So I decided to drop and recreate the table. DROP TABLE tbl; gives the same error. Is there anything else I can do to avoid dropping and recreating the whole database? Thanks Moreno.-
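A possible diagnostic before touching anything, using the relid from the error message: compare pg_class.relnatts with what pg_attribute actually contains for that relation, to confirm how many attribute rows really are missing from the catalog.
select c.relname, c.relnatts,
       (select count(*) from pg_attribute a
         where a.attrelid = c.oid and a.attnum > 0) as attrs_present
from pg_class c
where c.oid = 33238;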
pgp_sym_decrypt() - error 39000: wrong key or corrupt data
Hi, while playing with pgcrypto I ran into a strange issue (PostgreSQL 9.5.3 x86 on Windows 7). Given a table with a text field dateofbirth, I ran the following sequence of SQL commands:
update tbl_p set dateofbirth = pgp_sym_encrypt('2018-06-21', 'AES_KEY') where codguid = '0001-0001-0001-0001-0001';
OK
select pgp_sym_decrypt(dateofbirth::bytea, 'AES_KEY') as datanasc from tbl_p where codguid = '0001-0001-0001-0001-0001'
'2018-06-21'
select * from tab_paz where pgp_sym_decrypt(natoil::bytea, 'AES_KEY') = '2018-06-21'
ERROR: Wrong key or corrupt data
** Error ** ERROR: Wrong key or corrupt data SQL state: 39000
I can't find a reference to this error anywhere... Any help would be appreciated. Thanks, Moreno.-
Re: pgp_sym_decrypt() - error 39000: wrong key or corrupt data
On 21/06/2018 23:31, Adrian Klaver wrote: On 06/21/2018 08:36 AM, Moreno Andreo wrote: Hi, while playing with pgcrypto I ran into a strange issue (PostgreSQL 9.5.3 x86 on Windows 7). Given a table with a text field dateofbirth, I ran the following sequence of SQL commands:
update tbl_p set dateofbirth = pgp_sym_encrypt('2018-06-21', 'AES_KEY') where codguid = '0001-0001-0001-0001-0001';
OK
select pgp_sym_decrypt(dateofbirth::bytea, 'AES_KEY') as datanasc from tbl_p where codguid = '0001-0001-0001-0001-0001'
'2018-06-21'
select * from tab_paz where pgp_sym_decrypt(natoil::bytea, 'AES_KEY') = '2018-06-21'
You switched gears above. What is the data type of the natoil field in table tab_paz? Sorry, just a typo... natoil is actually dateofbirth, so it's text. You can read it as select * from tbl_p where pgp_sym_decrypt(dateofbirth::bytea, 'AES_KEY') = '2018-06-21' Was the data encrypted in it using the 'AES_KEY'? Yes, the command sequence is exactly as reported above. If I use pgp_sym_decrypt in a SELECT statement it's OK, but if it's in a WHERE clause it doesn't seem to work.
Re: pgp_sym_decrypt() - error 39000: wrong key or corrupt data
On 22/06/2018 15:18, Adrian Klaver wrote: On 06/22/2018 01:46 AM, Moreno Andreo wrote: On 21/06/2018 23:31, Adrian Klaver wrote: On 06/21/2018 08:36 AM, Moreno Andreo wrote: Hi, while playing with pgcrypto I ran into a strange issue (PostgreSQL 9.5.3 x86 on Windows 7). Given a table with a text field dateofbirth, I ran the following sequence of SQL commands:
update tbl_p set dateofbirth = pgp_sym_encrypt('2018-06-21', 'AES_KEY') where codguid = '0001-0001-0001-0001-0001';
OK
select pgp_sym_decrypt(dateofbirth::bytea, 'AES_KEY') as datanasc from tbl_p where codguid = '0001-0001-0001-0001-0001'
'2018-06-21'
select * from tab_paz where pgp_sym_decrypt(natoil::bytea, 'AES_KEY') = '2018-06-21'
You switched gears above. What is the data type of the natoil field in table tab_paz? Sorry, just a typo... natoil is actually dateofbirth, so it's text. You can read it as select * from tbl_p where pgp_sym_decrypt(dateofbirth::bytea, 'AES_KEY') = '2018-06-21' Was the data encrypted in it using the 'AES_KEY'? Yes, the command sequence is exactly as reported above. If I use pgp_sym_decrypt in a SELECT statement it's OK, but if it's in a WHERE clause it doesn't seem to work. Are you sure that the entries were not encrypted with a different key? Because I can't replicate. (More comments below): (other replies below, inline) I'm almost sure (you're never absolutely sure :-) ), since I kept all the commands I entered in the PgAdmin III SQL window, and they're reported above. On the other side, I tried the same procedure on another field and it succeeded. The only difference between the 2 fields, and I don't know if it makes any sense, is that the field I tried now, which succeeded, was created as text, while the other field (dateofbirth) was a timestamp I ALTERed with the statement alter table tbl_p alter column dateofbirth type text using to_char(dateofbirth, 'YYYY-MM-DD'); I'm just afraid it can happen in production.
create table pgp_test(id integer, fld_1 varchar);
insert into pgp_test values (1, pgp_sym_encrypt('2018-06-21', 'AES_KEY'))
select * from pgp_test;
 id | fld_1
----+-------
  1 | \xc30d04070302444e9b2792436e3d7ed23b01cf097f0a6a36298bab63ae4f22f39de54a9b0d8f905d48198ce76089de5f21669c46d96439718b6b0408c541427b6e7c11008bd3d0ebdae0dceb
select * from pgp_test where pgp_sym_decrypt(fld_1::bytea, 'AES_KEY') = '2018-06-21';
 id | fld_1
----+-------
  1 | \xc30d04070302444e9b2792436e3d7ed23b01cf097f0a6a36298bab63ae4f22f39de54a9b0d8f905d48198ce76089de5f21669c46d96439718b6b0408c541427b6e7c11008bd3d0ebdae0dceb
Have you looked at the entry in its encrypted state to see if it looks the same as pgp_sym_encrypt('2018-06-21', 'AES_KEY')? Yes, it seems to have the same value
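In case it helps anyone searching later, a small sketch (using the tbl_p/codguid/dateofbirth names from this thread) to list which rows fail to decrypt with the expected key, instead of letting one bad row abort the whole query:
DO $$
DECLARE
    r record;
BEGIN
    -- walk the table and try to decrypt each value individually
    FOR r IN SELECT codguid, dateofbirth FROM tbl_p LOOP
        BEGIN
            PERFORM pgp_sym_decrypt(r.dateofbirth::bytea, 'AES_KEY');
        EXCEPTION WHEN OTHERS THEN
            -- report the rows whose ciphertext the key cannot open
            RAISE NOTICE 'row % does not decrypt: %', r.codguid, SQLERRM;
        END;
    END LOOP;
END $$;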
Re: pgp_sym_decrypt() - error 39000: wrong key or corrupt data
On 22/06/2018 19:56, Adrian Klaver wrote: On 06/22/2018 09:50 AM, Moreno Andreo wrote: On 22/06/2018 15:18, Adrian Klaver wrote: Are you sure that the entries were not encrypted with a different key? Because I can't replicate. (More comments below): (other replies below, inline) I'm almost sure (you're never absolutely sure :-) ), since I kept all the commands I entered in the PgAdmin III SQL window, and they're reported above. On the other side, I tried the same procedure on another field and it succeeded. The only difference between the 2 fields, and I don't know if it makes any sense, is that the field I tried now, which succeeded, was created as text, while the other field (dateofbirth) was a timestamp I ALTERed with the statement alter table tbl_p alter column dateofbirth type text using to_char(dateofbirth, 'YYYY-MM-DD'); Assuming the ALTER TABLE was done and then the values were encrypted, that does not seem to affect anything here (more below):
test=# create table pgp_alter_test(id integer, birthdate date);
CREATE TABLE
test=# \d pgp_alter_test
Table "public.pgp_alter_test"
 Column    | Type    | Collation | Nullable | Default
-----------+---------+-----------+----------+---------
 id        | integer |           |          |
 birthdate | date    |           |          |
test=# insert into pgp_alter_test values (1, '2018-06-21');
INSERT 0 1
test=# select * from pgp_alter_test ;
 id | birthdate
----+-----------
  1 | 2018-06-21
(1 row)
test=# alter table pgp_alter_test alter column birthdate type text using to_char(birthdate, 'YYYY-MM-DD');
ALTER TABLE
test=# \d pgp_alter_test
Table "public.pgp_alter_test"
 Column    | Type    | Collation | Nullable | Default
-----------+---------+-----------+----------+---------
 id        | integer |           |          |
 birthdate | text    |           |          |
test=# select * from pgp_alter_test ;
 id | birthdate
----+-----------
  1 | 2018-06-21
(1 row)
test=# update pgp_alter_test set birthdate = pgp_sym_encrypt(birthdate, 'AES_KEY') where id = 1;
UPDATE 1
test=# select * from pgp_alter_test ;
 id | birthdate
----+-----------
  1 | \xc30d04070302b3f55c80f9ab657c68d23b010cd9d4d82631f89c786394a8bceb35529db07c708e5a0c4f04cf91aed24b5ff397dd99c678ec4f5bb769e148cfae3cdfc5453daaeb34ddd2737c
(1 row)
test=# select * from pgp_alter_test where pgp_sym_decrypt(birthdate::bytea, 'AES_KEY') = '2018-06-21';
 id | birthdate
----+-----------
  1 | \xc30d04070302b3f55c80f9ab657c68d23b010cd9d4d82631f89c786394a8bceb35529db07c708e5a0c4f04cf91aed24b5ff397dd99c678ec4f5bb769e148cfae3cdfc5453daaeb34ddd2737c
(1 row)
I am at a loss now. The only thing I can think of is that the data itself is actually corrupted. Maybe some sort of language encoding/collation issue. Just not sure how to test that at the moment. Actually, I tried it on a bunch of other fields with varying data types and everything went fine. I don't know if it's as you say and I mismatched the keys (and I need another pair of glasses) or something else. Just hoping (but being confident) it won't happen again. Now I'm trying to speed up a little some queries involving SELECTing among these encrypted fields; if I'm stuck I'll open a new thread. Thanks, Moreno.- I'm just afraid it can happen in production create table pgp_test(id integer, fld_1 varchar); insert into pgp_test values (1, pgp_sym_encrypt('2018-06-21', 'AES_KEY')) Have you looked at the entry in its encrypted state to see if it looks the same as pgp_sym_encrypt('2018-06-21', 'AES_KEY')? Yes, it seems to have the same value So
Re: Not able to update some rows in a table
On 02/07/2018 16:51, Marco Fochesato wrote: Dear all, I have a table with 21 columns. The primary key is made up of 20 of these columns. I have 3,313 records. I don't know why, but I'm not able to update some of these records. I don't understand, it seems that I'm not able to fetch. Could you post some of the statements you execute? If you are unable to SELECT some records, it's quite straightforward that you can't UPDATE them. It happens with PgAdmin, with Python psycopg2 and also through LibreOffice Base. OS? PG version? Looking at the properties of the table in PgAdmin, I can see only the estimated number of rows. Counted rows are 'not counted'. Seems that autovacuum is not active, or has never run (but I'm quite sure it's not related to your problem, so it can be investigated later). Cheers Moreno.-
Role problem in Windows
Hi, Running 9.1 on Windows 10, upgrading to 10 with pg_upgrade. "Once upon a time" there was a bug in our automatic role creation procedure that did not escape accented vowels (used in the Italian language), like "ò, è", and the result was a role with an empty name. We are now upgrading to 10, and pg_dumpall exits complaining about this role, showing its name (with a mis-encoded UTF-8 accented vowel) as an invalid UTF-8 character. Trying to get rid of the role, which can't be deleted with a DROP ROLE because of the empty name, I ran delete from pg_authid where oid = <role oid>. The role disappeared from the role list. At the next execution of pg_upgrade it complains that role "" does not exist while dumping a trigger function. I tried to remove the privilege from the function ACL, but "role n does not exist". Is there a way to recreate the deleted role, even as a dummy, so I can finish the upgrade? Is there another way to bypass the problem? Any help would be appreciated. Cheers, Moreno.-
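With hindsight, a less invasive route than deleting the pg_authid row would have been renaming the broken role so pg_dumpall could emit it. A rough sketch, to be run as superuser on the old cluster and only after a backup; 'dummy_role' is just a made-up name:
-- find the role whose name ended up empty
select oid, rolname from pg_authid where rolname = '';
-- give it a usable dummy name instead of deleting it
update pg_authid set rolname = 'dummy_role' where rolname = '';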
Re: Role problem in Windows
On 06/07/2018 17:34, Melvin Davidson wrote: On Fri, Jul 6, 2018 at 10:01 AM, Moreno Andreo <moreno.and...@evolu-s.it> wrote: Hi, Running 9.1 on Windows 10, upgrading to 10 with pg_upgrade. "Once upon a time" there was a bug in our automatic role creation procedure that did not escape accented vowels (used in the Italian language), like "ò, è", and the result was a role with an empty name. We are now upgrading to 10, and pg_dumpall exits complaining about this role, showing its name (with a mis-encoded UTF-8 accented vowel) as an invalid UTF-8 character. Trying to get rid of the role, which can't be deleted with a DROP ROLE because of the empty name, I ran delete from pg_authid where oid = <role oid>. The role disappeared from the role list. At the next execution of pg_upgrade it complains that role "" does not exist while dumping a trigger function. I tried to remove the privilege from the function ACL, but "role n does not exist". Is there a way to recreate the deleted role, even as a dummy, so I can finish the upgrade? Is there another way to bypass the problem? Any help would be appreciated. Cheers, Moreno.- >Is there a way to recreate the deleted role, either as a dummy, so I can finish upgrade? I can't really suggest how to recreate the dummy role, but I do have an alternate solution. Most probably pg_dump is complaining that role 'xxx' owns some tables. IIRC the complaint was about "role does not exist". In the meantime I was able to pg_dump the single databases (5 in total; one of them complained about the role not existing, but was dumped with all its data in place) and, to my surprise (since I was convinced that pg_dump worked inside a single transaction), I found all the roles (all but the "failing" one) in their place on the new server. So, lesson learned: don't mess with system catalogs before RTFM :-)) So you can use the attached script and add 'AND a.rolname = 'xxx'' to the WHERE clause. Then as a superuser you can use ALTER TABLE xyz OWNER TO new_owner for each table found. I'll keep it, so if something similar happens maybe it will come in handy. Thanks for your time Moreno.-
New tablespace: just an advice
Hi everyone! Disk space on my Debian 8 DB server is running a bit low (10% left of a 2 TB disk), so, since it's not possible to have a primary MBR disk larger than 2 TB, I decided to create another disk and map it on the server, creating another tablespace on it and moving databases across disks to balance disk usage. After creating a test server from last night's production server image, I created a folder and assigned ownership to the postgres user
mkdir /newdisk/tbsp_new
chown -R postgres /newdisk/tbsp_new
then created the new tablespace
create tablespace tb2 location '/newdisk/tbsp_new';
and moved a database
alter database db set tablespace tb2;
As you can see, a flat, basic tablespace with no customizations. I just ran a VACUUM FULL on the cluster before creating the tablespace. After the ALTER DATABASE command ran successfully, I checked disk space (df -h) and saw some more free space on the primary disk, and the same amount of space occupied on the new disk. Just what I needed. I psql'd into the cluster with the user's username, connecting to the database just migrated, and was able to SELECT and UPDATE (I did not try INSERTing). Now, 2 questions. 1. Is that all, or do I need to adjust something else about permissions, indexes, vacuuming, etc...? ALTERing the database tablespace means copying its physical files to the new directory, but is that applied to all objects (indexes, triggers, etc.)? 2. What will happen to whoever tries to access the database while it's being moved from one tablespace to another? Thanks in advance, Moreno.
Re: New tablespace: just an advice
Il 16/10/2018 10:18, Laurenz Albe ha scritto: Moreno Andreo wrote: Now, 2 questions. 1. Is it all or do I need to adjust something else about permissions, indexes, vacuuming, etc...? ALTERing the database namespace means copying its physical files to new directory, but is it applied to all objects (indexes, triggers, etc)? All objects that are located in the default tablespace of the database will be moved. 2. What will happen to who tries to access the database while it's being moved from one tablespace to another? They will be "hanging" with a lock until the transaction is done. Yours, Laurenz Albe Perfect! Thanks for your time, Moreno
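As a quick sanity check after the move, these plain catalog queries show each database's default tablespace and size, plus any objects of the current database that were explicitly placed in a non-default tablespace:
select d.datname, t.spcname, pg_size_pretty(pg_database_size(d.datname)) as size
from pg_database d
join pg_tablespace t on t.oid = d.dattablespace;
-- reltablespace = 0 means "the database's default tablespace"
select c.relname, c.relkind, t.spcname
from pg_class c
join pg_tablespace t on t.oid = c.reltablespace
where c.reltablespace <> 0;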
Updating 3-table dataset
Hi all, I don't know if it's the heat burning my brain, but I can't find a solution to what seemed a simple operation to me. I have 3 tables:
create table t_all { id uuid, ref_id uuid (FK to t_ana.id) };
create table t_ana { id uuid, code text };
create table t_app { id uuid, code text (subset of t_ana.code) }
I need to update t_all set t_all.id = t_app.id having t_ana.code in t_app.code (I wrote it in some kind of meta-SQL, but I hope it's clear). I tried to create a view, but I'd need an INSTEAD OF trigger since the update spreads across 3 tables, so I hope there's some faster path to the solution. Thanks Moreno
Re: Updating 3-table dataset
On 09/08/19 16:50, Luca Ferrari wrote: On Fri, Aug 9, 2019 at 2:29 PM Moreno Andreo wrote: Hi all, I don't know if it's the heat burning my brain, but I can't find a solution to what seemed a simple operation to me. I have 3 tables:
create table t_all { id uuid, ref_id uuid (FK to t_ana.id) };
create table t_ana { id uuid, code text };
create table t_app { id uuid, code text (subset of t_ana.code) }
I need to update t_all set t_all.id = t_app.id having t_ana.code in t_app.code (I wrote it in some kind of meta-SQL, but I hope it's clear). I tried to create a view, but I'd need an INSTEAD OF trigger since the update spreads across 3 tables, so I hope there's some faster path to the solution. Not sure I got what you need, and I've not tested, but something like the following:
WITH must_update AS (
  SELECT app.id AS app_id, ana.id AS ana_id
  FROM t_app app, t_ana ana
  WHERE app.code = ana.code
)
UPDATE t_all
SET id = ( SELECT app_id FROM must_update WHERE ref_id = must_update.ana_id );
I've written the CTE because it is a little clearer in my mind, but you can push it down as a subquery of course. Thanks Luca, that's the idea I needed... now some small trimming and I think I'll be there (just for the record... the UPDATE statement needs a WHERE clause, otherwise it will NULL the id field in all rows where ref_id is not present in must_update :-) ). I owe you a beer :-) Cheers Moreno.
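For the record, a sketch of the final form with the missing WHERE clause, rewritten as UPDATE ... FROM so only rows that actually have a match get touched (same hypothetical t_all/t_ana/t_app tables as above):
UPDATE t_all
SET    id = app.id
FROM   t_ana ana
JOIN   t_app app ON app.code = ana.code
WHERE  t_all.ref_id = ana.id;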
How to conditionally change the 2nd part of a full join
I need to obtain a single record from 2 records in the same table, grouping by a key, say
id value value2
1  2     5
1  2     7
the result would be
1  2  5  7
and that works fine with a full join:
SELECT * FROM (SELECT idp, data, i1, i2, m1, m2 from tblwk WHERE recordkey = 4) s1 FULL JOIN (SELECT idp, data, i1, m1, m2 from tblwk WHERE recordkey = 10) s2 USING (data, idp)
Now, to get another dataset, I need the second subselect to change based on a value acquired from the first one. I tried with a case:
SELECT * FROM ( SELECT idp, data, idd, rif1, rif2, t1, t2, t3, t5, t7, t9, t10, i1, i2, i3, dg from tblwk WHERE recordkey = 1) s1 FULL JOIN
case when i1=1 then (SELECT idp, data, desc, rif1, rif3, t1, t2, t5 from tblwk WHERE recordkey = 2) s2
when i1=2 then (SELECT idp, data, desc, rif1, t4, t5, i2 from tblwk WHERE recordkey = 3 order by i2) s2
when i1=3 then (SELECT idp, data, desc, rif1, t2, t5, t6, i2 from tblwk WHERE recordkey = 4 order by i2) s2
when i1=4 then (SELECT idp, data, desc, i2 from tblwk WHERE recordkey = 9) s2
end USING (data, idp)
but it doesn't like "case" after a FULL JOIN. I read a bit of docs and discovered LATERAL, but AFAIK it's useless here. Is it necessary to write a function (which would be my last resort, not just because I'm not so good at writing functions), or is there some SQL syntax that can help? Thanks Moreno.-
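One direction that avoids a function, sketched on the assumption that the i1-to-recordkey mapping really is 1→2, 2→3, 3→4, 4→9 as in the CASE above: a join's right-hand relation can't be switched by a CASE, but the CASE can be moved into the join condition, selecting a common superset of columns on that side (the column list below is a guess, and "desc" is quoted because it is a reserved word):
SELECT *
FROM (SELECT idp, data, idd, rif1, rif2, t1, t2, t3, t5, t7, t9, t10, i1, i2, i3, dg
      FROM tblwk WHERE recordkey = 1) s1
FULL JOIN (SELECT idp, data, recordkey, "desc", rif1, rif3, t1, t2, t4, t5, t6, i2
           FROM tblwk WHERE recordkey IN (2, 3, 4, 9)) s2
  ON s2.data = s1.data
 AND s2.idp = s1.idp
 AND s2.recordkey = CASE s1.i1 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 4 WHEN 4 THEN 9 END;
With a FULL JOIN, rows from the other recordkeys that don't match still come through with a NULL s1 side, so depending on what those should do, a LEFT JOIN from s1 might be closer to the intent.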
Rebuild pg_toast from scratch?
Hi, I have an issue with a Windows 10 PC with Postgres 9.1 x86. Running some SELECTs we got "ERROR: could not open file "base/48121/784576": No such file or directory" I then ran select n.nspname AS tableschema, c.relname AS tablename from pg_class c inner join pg_namespace n on (c.relnamespace = n.oid) where c.relfilenode = 784576; and discovered that the involved table is pg_toast_49713. Now I need to recreate an empty copy of this table (data can be recovered later), how can I achieve this? Thanks in advance Moreno.-
Re: Rebuild pg_toast from scratch?
Hi Tom and thanks for your time, Il 05/09/19 15:53, Tom Lane ha scritto: Moreno Andreo writes: I have an issue with a Windows 10 PC with Postgres 9.1 x86. You realize 9.1 is long out of support ... Absolutely :-) I'm about to migrate it to 11 Now I need to recreate an empty copy of this table (data can be recovered later), how can I achieve this? TRUNCATE ought to be enough. truncate pg_toast_49713; ERROR: relation "pg_toast_49713" does not exist :-\ ... or do I need to truncate the "master" table (the table blobs are in)? Thanks
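For the archives: the "does not exist" error above is just a search_path matter, since TOAST tables live in the pg_toast schema and the name has to be schema-qualified; the owning table can be found via reltoastrelid. A sketch of both, with the TRUNCATE of the owning table left as a commented placeholder because it also throws away the remaining rows:
-- which ordinary table does this TOAST table belong to?
select c.oid::regclass as owning_table
from pg_class c
where c.reltoastrelid = 'pg_toast.pg_toast_49713'::regclass;
-- truncating the owning table truncates its TOAST table along with it:
-- TRUNCATE <owning_table>;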
Pg11 -- MultiXactId xxxx has not been created yet -- apparent wraparound
Hi all, I'm encountering this issue on a Windows 10 / PG 11.5 box. I followed the thread @ https://postgrespro.com/list/thread-id/2380690 but examining heap pages is far beyond my knowledge, so if any of the gurus would spend some time on it, I would be very grateful. So, here are the facts:
database0=# select datminmxid from pg_database where datname = current_database();
 datminmxid
 365
(1 row)
The CTID of the tuple BEFORE the failing one is (3159,51):
database0=# select ctid from tablename offset 368 limit 1;
ERROR: MultiXactId 12800 has not been created yet -- apparent wraparound
database0=# select ctid from tablename offset 367 limit 1;
 ctid
---
 (3159,51)
(1 row)
Now, I started issuing the queries
select * from heap_page_items(get_raw_page('tablename',3159));
select * from heap_page_items(get_raw_page('tablename',3160));
and so on for about 5 or 6 pages. What do I need to search for? Or is it better to \copy them to a file and send 'em as an attachment? How many pages do I need to search through? Until I see the error again? Thanks in advance Moreno.-
Re: Pg11 -- MultiXactId xxxx has not been created yet -- apparent wraparound
On 04/10/19 17:30, Alvaro Herrera wrote: On 2019-Oct-04, Moreno Andreo wrote: select * from heap_page_items(get_raw_page('tablename',3159)); select * from heap_page_items(get_raw_page('tablename',3160)); and so on for about 5 or 6 pages. Please paste the output of that for pages 3159 and 3160, as well as the output of pg_controldata. Thanks Alvaro, you can find attached the data you requested.
3159.csv Description: MS-Excel spreadsheet
3160.csv Description: MS-Excel spreadsheet
pg_control version number: 1100
Catalog version number: 201809051
Database system identifier: 6742478776920498196
Database cluster state: in production
pg_control last modified: 04/10/2019 15.57.47
Latest checkpoint location: 3/5768A440
Latest checkpoint's REDO location: 3/5768A408
Latest checkpoint's REDO WAL file: 000100030057
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:2405812
Latest checkpoint's NextOID: 859841
Latest checkpoint's NextMultiXactId: 366
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 663
Latest checkpoint's oldestXID's DB: 18836
Latest checkpoint's oldestActiveXID: 2405812
Latest checkpoint's oldestMultiXid: 365
Latest checkpoint's oldestMulti's DB: 13011
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: 04/10/2019 15.57.43
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 1000
max_worker_processes setting: 8
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: f4d747c6c35ca14eef04db5eb0b575e0ec8a8deb90c23c96bc986f27cdd5463a
Re: Pg11 -- MultiXactId xxxx has not been created yet -- apparent wraparound
On 04/10/19 18:28, Alvaro Herrera wrote: I wonder if it would work to just clear that multixact with SELECT ... WHERE ctid='(3160,31)' FOR UPDATE select ...what? :-) Sorry, but it's totally beyond my knowledge and my control; after resolving the issue I'll surely go and search the docs to understand what we've done. If this was in my hands, I would scan the WAL looking for the place that last touched this page (and the latest FPI for this page, also). It might have an explanation of what went on. Maybe use the page's LSN (from pageinspect's page_header()) as starting point for the WAL location that modified the page. I hope you have a WAL archive that goes back to well before the previous checkpoint. One thing I forgot to report is that this cluster was just upgraded from a 9.1 that was crashing at least once a day (in many cases the upgrade itself resolved the issue); here's the log line:
2019-10-03 15:11:52 CEST LOG: server process (PID 18668) was terminated by exception 0xC005
In this case the access violation was probably due to data corruption. These are customer machines that are really badly kept, and NTFS issues are not that rare, so I won't bother investigating what happened but will just get the customer up & running again. Thanks for your time Moreno
Re: Pg11 -- MultiXactId xxxx has not been created yet -- apparent wraparound
On 04/10/19 21:14, Alvaro Herrera wrote: On 2019-Oct-04, Moreno Andreo wrote: On 04/10/19 18:28, Alvaro Herrera wrote: I wonder if it would work to just clear that multixact with SELECT ... WHERE ctid='(3160,31)' FOR UPDATE select ...what? :-) Sorry, but it's totally beyond my knowledge and my control; after resolving the issue I'll surely go and search the docs to understand what we've done. This should do it: SELECT * FROM the_broken_table WHERE But of course I make no promise of it working or even having any effect at all ... Unfortunately, it didn't work :(
db0=# select * from failing_table where ctid='(3160,31)' for update;
ERROR: MultiXactId 12800 has not been created yet -- apparent wraparound
Since the probability that we're dealing with corruption is very high, what if I \copy all the table except the failing row(s) to an external file, drop and recreate the table, and then \copy the clean data back in? Thanks Moreno.-
Re: Pg11 -- MultiXactId xxxx has not been created yet -- apparent wraparound
Hi Alvaro, sorry for the late reply, I've been out of the office. On 09/10/19 19:51, Alvaro Herrera wrote: On 2019-Oct-07, Moreno Andreo wrote: Unfortunately, it didn't work :( db0=# select * from failing_table where ctid='(3160,31)' for update; ERROR: MultiXactId 12800 has not been created yet -- apparent wraparound Oh well. It was a long shot anyway ... It was a long shot, but it was worth trying. Since the probability that we're dealing with corruption is very high, what if I \copy all the table except the failing row(s) to an external file, drop and recreate the table, and then \copy the clean data back in? Yes, that should work. It did not work... I think there was some bigger problem with the cluster itself. To extract these small parts of data I had to SELECT using OFFSET and LIMIT. Well, the same query (i.e. select * from table offset 35 limit 145) run as-is worked well, but the moment I put it into a COPY statement it started complaining about multixacts again, even when I went back to the plain query. In the end I recovered the data from backups (2 days old, and that's good news). Thanks for your time Moreno.-
CASE(?) to write in a different column based on a string pattern
Hi, I need to create a CASE (I think) statement to check for a string pattern and, based on its value, write a substring into a different column (alias). I'm trying to create a COPY statement to port a table into another database, which has a table with another format (that's why the aliases). Let's write it in pseudo-SQL: given this
select pattern from tbl;
pattern
--------
foo1234
bar5678
baz9012
this is what I'm trying to achieve:
select pattern,
CASE
when pattern like 'foo%' then ltrim(pattern, 'foo') as foo
when pattern like 'bar%' then ltrim(pattern, 'bar') as bar
when pattern like 'baz%' then ltrim(pattern, 'baz') as baz
END
from tbl;
|foo |bar |baz |
 1234 5678 9012
(hoping the text formatting is ok... 1234 should go in column foo, 5678 in bar and 9012 in baz). Is it possible? Thanks in advance Moreno.-
Re: CASE(?) to write in a different column based on a string pattern
On 13/11/19 17:48, Andrew Kerber wrote: So what you are doing is transforming the table format from vertical to horizontal. I think you will want to use a union to join the table to itself along with the case statement to produce the output you are looking for. Not precisely: the string pattern is only part of a bigger table (30 columns in total). What I'm trying to achieve is what Geoff explained, just splitting values into 3 different columns based on the string pattern. Thanks for your time Moreno.
Re: CASE(?) to write in a different column based on a string pattern
On 13/11/19 17:36, Geoff Winkless wrote: Simplest way in plain SQL would be individual CASE statements for each column, I think.
SELECT pattern,
  CASE WHEN pattern LIKE 'foo%' THEN SUBSTR(pattern, 4) ELSE '' END AS foo,
  CASE WHEN pattern LIKE 'bar%' THEN SUBSTR(pattern, 4) ELSE '' END AS bar,
  CASE WHEN pattern LIKE 'baz%' THEN SUBSTR(pattern, 4) ELSE '' END AS baz
FROM tbl;
Geoff
Geoff, it worked perfectly! Thanks a lot! Moreno.-
Re: [SPAM] Remote Connection Help
On 21/11/19 15:21, Jason L. Amerson wrote: could not connect to server: Connection refused (0x274D/10061) Is the server running on host "xx.xx.xx.xx" and accepting TCP/IP connections on port 5432? Connection refused means something has blocked it. If it was all OK and simply Postgres was not listening, you should've received a "connection timed out" (10060) message. Have you tried adjusting/tearing down iptables and checking what happens, as Ekaterina also pointed out? And, just as a side note, I normally don't activate IPv6 if it's not necessary (it has not been necessary in the last 10 years :-) ), 'cause I've run into some troubles that were cleared by getting rid of IPv6, so I'd try editing postgresql.conf with listen_addresses = '127.0.0.1' HTH, Moreno.-
Re: [SPAM] Remote Connection Help
On 21/11/19 22:40, Peter J. Holzer wrote: On 2019-11-21 17:27:04 +0100, Moreno Andreo wrote: Connection refused means something has blocked it. If it was all OK and simply Postgres was not listening, you should've received a "connection timed out" (10060) message. Almost exactly the other way around. If you try to connect to a port where no server is listening, you get a connection refused message. If something is blocking the connection you may get a timeout. hp O.O You're right, obviously. I don't know why, but ever since school days I sometimes swap the meanings of these 2 errors (and have to deal with the consequences)... But I'm confident that one day there will be no more swapping... hopefully before I retire... :-) Thanks for pointing it out, and sorry for the mistake (blushing) Moreno.-
Re: Should I reinstall over current installation?
On 04/02/2020 00:16, Chris Charley wrote: I tried the items you suggested (1-5), but could find no helpful info. Thanks for your help and going the extra mile! Hope I'm in time to try to save you from a reinstall :-)
How to check if PostgreSQL is running
---
- Run Services.msc
- In the console you should have an entry with your postgres version (I have 9.1 and 9.5 on this host)
- Check that it's automatically started and running (mine is in Italian, "In esecuzione" means "running" and "Automatico", well :-)).
- If it's not, try to start it (right click on the service and select Start)
- If you get an error, check the error message
How to check the error message
---
You have 2 choices:
1 - Postgres logs) First of all, you should check the PostgreSQL logs under \data\pg_log. Here you should find some files showing you most of the errors you would find in Event Viewer. Check the error messages and, if still needed, post them.
2 - Windows Event Log)
- Enter Event Viewer
- Sort events by date descending
- The first entries should reveal why your postgres server is not running (I captured the first error I got on this host just as an example)
- In the lower pane, check what the error message is (in my case, "FATAL: The database system is starting up").
If you need further help, post the error message and we'll try to help you. Hope this helps Moreno.-
Re: Should I reinstall over current installation?
On 04/02/2020 21:18, Chris Charley wrote: Hello Moreno Thanks for the reply! I ran Services and it reported postgresql as Disabled. A disabled service will never run nor leave error messages anywhere. Right click on the PostgreSQL service and select Properties. In the next window, choose the startup type (the combo under the executable path), select Automatic, then click on the Start button below. Click OK and close the window. Now something should happen. If everything goes well, you'll see "Started" and "Automatic" in the two columns next to the service description, and you can start your application. If there is an error message, then it should log something; see my last mail for how to extract the log information and post it here. I am considering a reinstall because I couldn't find any helpful info from these services. You got no helpful info because the service does not start, and so it does not log :-) Thanks again for your help. I suspect when Windows was reinstalled, the firewall was reconfigured and PostgreSQL wasn't allowed to access the port? ATM there's a service start issue to be resolved; if there's also something about the firewall, we'll think about it later. Moreno.-
Unable to run psql on 9.5 after broken 12 remove
I have a production server running pg 9.5 seamlessly. Yesterday I decided to install libpq to have some crypto functions. Unexpectedly, it installed PostgreSQL 12 and its libpq. I don't need pg 12, so I decided to remove it. It did not go well.
root@datastore-1:/home/user# apt-get --purge remove postgresql-client-12
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required: libuuid-perl
Use 'apt-get autoremove' to remove it.
The following packages will be REMOVED: postgresql-12* postgresql-client-12* postgresql-contrib*
0 upgraded, 0 newly installed, 3 to remove and 89 not upgraded. 1 not fully installed or removed.
After this operation, 30.9 MB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 32065 files and directories currently installed.)
Removing postgresql-contrib (12+215.pgdg80+1) ...
Removing postgresql-12 (12.3-1.pgdg80+1) ...
update-alternatives: using /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode
Purging configuration files for postgresql-12 (12.3-1.pgdg80+1) ...
Removing postgresql-client-12 (12.3-1.pgdg80+1) ...
update-alternatives: using /usr/share/postgresql/9.5/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Processing triggers for postgresql-common (215.pgdg80+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Setting up python3.4 (3.4.2-1+deb8u3) ...
File "/usr/lib/python3.4/http/client.py", line 1014
raise InvalidURL(f"URL can't contain control characters. {url!r} "
^
SyntaxError: invalid syntax
dpkg: error processing package python3.4 (--configure): subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing: python3.4
E: Sub-process /usr/bin/dpkg returned an error code (1)
After this, the package is no longer in the installed list, and I'm not able to issue the psql command:
root@datastore-1:/home/user# sudo -u postgres psql
Error: PostgreSQL version 12 is not installed
How can I repair this? Thanks in advance Moreno.
Re: Unable to run psql on 9.5 after broken 12 remove
After upgrading python the InvalidUrl is gone, but I still can't run psql Output of pg_lsclusters, if needed root@datastore-1:/usr/share/postgresql-common# pg_lsclusters Ver Cluster Port Status Owner Data directory Log file 9.5 main 6543 online postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log 12 main 5432 down,binaries_missing /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log I followed https://askubuntu.com/questions/1223270/psql-command-error-postgresql-version-12-is-not-installed with no luck, apt-purge simply states that pg12 is not installed. Il 03/07/2020 10:37, Moreno Andreo ha scritto: I have a production server running pg9.5 seamlessly. Yesterday I decided to install libpq to have some crypto functions. Unexpectedly, it installed postgresql 12 and its libpq. I don't need pg 12, so I decided to remove it. It did not went well root@datastore-1:/home/user# apt-get --purge remove postgresql-client-12 Reading package lists... Done Building dependency tree Reading state information... Done The following package was automatically installed and is no longer required: libuuid-perl Use 'apt-get autoremove' to remove it. The following packages will be REMOVED: postgresql-12* postgresql-client-12* postgresql-contrib* 0 upgraded, 0 newly installed, 3 to remove and 89 not upgraded. 1 not fully installed or removed. After this operation, 30.9 MB disk space will be freed. Do you want to continue? [Y/n] y (Reading database ... 32065 files and directories currently installed.) Removing postgresql-contrib (12+215.pgdg80+1) ... Removing postgresql-12 (12.3-1.pgdg80+1) ... update-alternatives: using /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode Purging configuration files for postgresql-12 (12.3-1.pgdg80+1) ... Removing postgresql-client-12 (12.3-1.pgdg80+1) ... update-alternatives: using /usr/share/postgresql/9.5/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode Processing triggers for postgresql-common (215.pgdg80+1) ... Building PostgreSQL dictionaries from installed myspell/hunspell packages... Removing obsolete dictionary files: Setting up python3.4 (3.4.2-1+deb8u3) ... File "/usr/lib/python3.4/http/client.py", line 1014 raise InvalidURL(f"URL can't contain control characters. {url!r} " ^ SyntaxError: invalid syntax dpkg: error processing package python3.4 (--configure): subprocess installed post-installation script returned error exit status 1 Errors were encountered while processing: python3.4 E: Sub-process /usr/bin/dpkg returned an error code (1) After this, the package is not anymore on the installed list and I'm not able to issue the psql command: root@datastore-1:/home/user# sudo -u postgres psql Error: PostgreSQL version 12 is not installed How can I repair this? Thanks in advance Moreno.
Re: Unable to run psql on 9.5 after broken 12 remove
Il 03/07/2020 16:51, Adrian Klaver ha scritto: On 7/3/20 1:54 AM, Moreno Andreo wrote: Looks like the cluster was not removed from the pgcommon setup. This would explain why you can't run psql. By default pgcommon looks for the version of psql connected with the most recent version of Postgres it knows about, in this case 12. Unfortunately that binary no longer exists. You still have psql. Do: /usr/lib/postgresql/9.5/bin/psql --help OK, it confirms that 9.5 client is still ok To avoid having to do that try: sudo pg_dropcluster 12 main This reported a warning root@datastore-1:/home/user# pg_dropcluster 12 main Warning: corrupted cluster: data directory does not exist ... but it did its job root@datastore-1:/home/user# sudo -u postgres psql psql (9.5.6) Type "help" for help. postgres=# \q Thanks, you saved my day again! Moreno. Il 03/07/2020 10:37, Moreno Andreo ha scritto: I have a production server running pg9.5 seamlessly. Yesterday I decided to install libpq to have some crypto functions. Unexpectedly, it installed postgresql 12 and its libpq. I don't need pg 12, so I decided to remove it. It did not went well root@datastore-1:/home/user# apt-get --purge remove postgresql-client-12 Reading package lists... Done Building dependency tree Reading state information... Done The following package was automatically installed and is no longer required: libuuid-perl Use 'apt-get autoremove' to remove it. The following packages will be REMOVED: postgresql-12* postgresql-client-12* postgresql-contrib* 0 upgraded, 0 newly installed, 3 to remove and 89 not upgraded. 1 not fully installed or removed. After this operation, 30.9 MB disk space will be freed. Do you want to continue? [Y/n] y (Reading database ... 32065 files and directories currently installed.) Removing postgresql-contrib (12+215.pgdg80+1) ... Removing postgresql-12 (12.3-1.pgdg80+1) ... update-alternatives: using /usr/share/postgresql/9.5/man/man1/postmaster.1.gz to provide /usr/share/man/man1/postmaster.1.gz (postmaster.1.gz) in auto mode Purging configuration files for postgresql-12 (12.3-1.pgdg80+1) ... Removing postgresql-client-12 (12.3-1.pgdg80+1) ... update-alternatives: using /usr/share/postgresql/9.5/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode Processing triggers for postgresql-common (215.pgdg80+1) ... Building PostgreSQL dictionaries from installed myspell/hunspell packages... Removing obsolete dictionary files: Setting up python3.4 (3.4.2-1+deb8u3) ... File "/usr/lib/python3.4/http/client.py", line 1014 raise InvalidURL(f"URL can't contain control characters. {url!r} " ^ SyntaxError: invalid syntax dpkg: error processing package python3.4 (--configure): subprocess installed post-installation script returned error exit status 1 Errors were encountered while processing: python3.4 E: Sub-process /usr/bin/dpkg returned an error code (1) After this, the package is not anymore on the installed list and I'm not able to issue the psql command: root@datastore-1:/home/user# sudo -u postgres psql Error: PostgreSQL version 12 is not installed How can I repair this? Thanks in advance Moreno.
Debian : No echo after pg_dump | psql
Hi guys, I'm facing a strange thing on my test server (Google Cloud). On my Debian 9 box I'm running Postgres 9.6.10, and I'm transferring some databases from another server (Debian 8, PG 9.5.15). The command I'm using is
root@x:~# pg_dump -v -C -h -p 6543 -U postgres | psql -h localhost -p 6543 -U postgres
It presents a double password prompt after I run it:
Password: Password for user postgres:
I enter my postgres password once, and it starts processing data. At a certain point it suddenly stops processing without prompting anything, and goes on only if I enter my postgres password again, as if it were for one of the two prompts it showed at the beginning, where I entered only one password. After another bit of processing, it asks for my postgres password another time (the third!). After I enter it, it goes to the end (and there are no issues in the data or anything similar). After this, I have no more echo on what I'm typing, but if I type something and press Enter, I can see the result of the command I just typed (and have not seen) on the console. Restarting the SSH session resolves the problem. The strange behavior is the same if I switch servers (running pg_dump on Debian 8/PG 9.5 connecting to Debian 9/PG 9.6). Another thing I was forgetting to report: if I create the database (just the database, not the schema) on the target machine, I receive the error "database already exists" but the dump goes on; if I don't create it, I receive the error "database does not exist" and processing aborts. Hope I've been clear enough... Has anyone bumped into this? Thanks in advance Moreno.-
Re: Debian : No echo after pg_dump | psql
Adrian, Andrew, thanks and apologies for the late reply Il 30/11/2018 05:08, Andrew Gierth ha scritto: "Moreno" == Moreno Andreo writes: Moreno> The command I'm using is Moreno> root@x:~# pg_dump -v -C -h -p 6543 -U postgres Moreno> | psql -h localhost -p 6543 -U postgres Moreno> It presents a double password prompt after I run it: Moreno> Password: Password for user postgres: This is going to prompt once for the remote host's password and twice for the local one (because -C), and the concurrently-running commands are going to be fighting over access to the terminal to do it. Best avoided by using pgpass or non-password-based auth methods. More seriously, you're misunderstanding how -C works. When you use -C, the database you specify to psql (or pg_restore) is NOT the database you're restoring into - the restored db will ALWAYS have the same name as it had when dumped (if that's not what you want then don't use -C). Instead, the database you specify to psql or pg_restore is the database to connect to to issue the CREATE DATABASE command, which should usually be 'postgres'. That explains this bit: Moreno> If I create database (just database, not schema) on target Moreno> machine, I receive the error "database already exists" but Moreno> the dump goes on If I don't create it, I receive the error Moreno> "database does not exist" and processing aborts. I removed -C usage since the target database (the database itself, not the schema) is created with a CREATE DATABASE before issuing pg_dump (so I don't need it), but strange prompt behavior remained the same I'm having a really hard time these days, so I can't investigate further. Maybe on holidays, when I hope the pressure will be released a bit. Will report as soon as I can. THanks again Cheers, Moreno.-
REVOKE to an user that doesn't exist
Hi all, I'm trying to pg_upgrade a cluster from 9.1 to 10 (Windows 10, but I don't think it matters). At a certain point an error is thrown while parsing a trigger:
could not execute query: ERROR: role "1067431" does not exist
command was: REVOKE ALL ON FUNCTION "x"() FROM PUBLIC; GRANT ALL ON FUNCTION "public"."x"() TO "1067431" WITH GRANT OPTION;
Here's the evidence :-)
postgres=# \du
List of roles
 Role name | Attributes                                     | Member of
-----------+------------------------------------------------+-----------
 user5     | Create role, Create DB                         | {}
 user2     | Superuser, Create role, Create DB, Replication | {}
 user4     | Create role, Create DB                         | {}
 user3     |                                                | {}
 user1     | Superuser, Create role, Create DB, Replication | {}
 postgres  | Superuser, Create role, Create DB, Replication | {}
postgres=# select * from pg_roles;
 rolname  | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcatupdate | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil | rolconfig | oid
 postgres | t | t | t | t | t | t | t | -1 | | | | 10
 user1    | t | t | t | t | t | t | t | -1 | | | | 16393
 user2    | t | t | t | t | t | t | t | -1 | | | | 16394
 user3    | f | t | f | f | f | t | f | -1 | | | | 16395
 user4    | f | t | t | t | f | t | f | -1 | | | | 1067432
 user5    | f | t | t | t | f | t | f | -1 | | | | 30602
(6 rows)
So, no user with that OID. I checked the trigger the error pointed to, and I found
GRANT EXECUTE ON FUNCTION x() TO "1067431" WITH GRANT OPTION;
How can I REVOKE from that non-existent user so pg_upgrade can proceed? Thanks Moreno.-
Re: REVOKE to an user that doesn't exist
Il 12/12/2018 15:39, Adrian Klaver ha scritto: On 12/12/18 5:11 AM, Moreno Andreo wrote: Hi all, I'm trying to pg_upgrade a cluster from 9.1 to 10 (Windows 10, but I don't think it matters). At a certain point an error is thrown while parsing a trigger: could not execute query: ERROR: role "1067431" does not exist command was: REVOKE ALL ON FUNCTION "x"() FROM PUBLIC; GRANT ALL ON FUNCTION "public"."x"() TO "1067431" WITH GRANT OPTION; Here's the evidence :-) The below is from the 9.1 cluster, correct? Correct, 9.1.6, IIRC postgres=# select * from pg_roles; rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcatupdate | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil | rolconfig | oid --+--++---+-+--+-++--+-+---+---+- postgres | t | t | t | t | t | t | t | -1 | | | | 10 user1 | t | t | t | t | t | t | t | -1 | | | | 16393 user2 | t | t | t | t | t | t | t | -1 | | | | 16394 user3 | f | t | f | f | f | t | f | -1 | | | | 16395 user4 | f | t | t | t | f | t | f | -1 | | | | 1067432 user5 | f | t | t | t | f | t | f | -1 | | | | 30602 (6 rows) So, no user with that OID. I checked in the trigger pointed to the error and I found GRANT EXECUTE ON FUNCTION x() TO "1067431" WITH GRANT OPTION; I am not following as a trigger would not have that in its code. Are you referring to the function x()? If so is the GRANT in the function? Sorry, the grant above is extracted from the CREATE statement that PgAdmin3 shows when you click on the trigger
Re: REVOKE to a user that doesn't exist
Il 12/12/2018 16:01, Tom Lane ha scritto: Moreno Andreo writes: I'm trying to pg_upgrade a cluster from 9.1 to 10 (Windows 10, but I don't think it matters). At a certain point an error is thrown while parsing a trigger: could not execute query: ERROR: role "1067431" does not exist command was: REVOKE ALL ON FUNCTION "x"() FROM PUBLIC; GRANT ALL ON FUNCTION "public"."x"() TO "1067431" WITH GRANT OPTION; Hm. We've seen similar bugs before; the mechanism is that at some point the function owner granted privileges to somebody else, and at some later point the somebody-else role got dropped, but the privilege grant stayed behind because the system had lost, or never made, the pg_shdepend entry indicating that this function had an ACL entry mentioning that role. The extra ACL entry is harmless, until you wonder why pg_dump is printing a nonsensical command due to it. That's the case. The customer never complained about any problem, but today, while upgrading his cluster, we bumped into it. We fixed a couple of bugs of that ilk just last month [1], but they were for cases involving types and schemas, not functions. The last case involving function privileges that I see in a quick trawl of the commit log predates the 9.0 release [2]. I wonder how old this cluster is ... The cluster is version 9.1.6, IIRC. How can I REVOKE from that non-existent role so pg_upgrade can proceed? The safest way to clean it up manually would be to set the pg_proc.proacl field for that function to NULL. If there are other grants about the function, you could try removing the bad entry, but it would likely be safer to just re-grant after the upgrade. Wouldn't it be enough to edit the proacl column and just remove the unwanted role? Or is it faster and safer to null the whole column and just re-GRANT? Thanks Moreno.-
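For reference, a minimal sketch of the cleanup Tom describes, run as a superuser on the old 9.1 cluster (the function name is the thread's placeholder "x"; back up first, since this edits a system catalog directly):

    -- wipe the whole ACL of the affected function, as suggested
    UPDATE pg_proc
    SET proacl = NULL
    WHERE oid = 'public.x()'::regprocedure;

Any legitimate grants on the function are lost too and have to be re-issued after the upgrade, which is exactly the trade-off Tom points out.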
Re: REVOKE to a user that doesn't exist
Il 12/12/2018 16:01, Tom Lane ha scritto: The safest way to clean it up manually would be to set the pg_proc.proacl field for that function to NULL. If there are other grants about the function, you could try removing the bad entry, but it would likely be safer to just re-grant after the upgrade. Setting the column (and several others in other catalogs such as pg_class) to NULL made it possible to upgrade the cluster. After the upgrade, I issued the necessary GRANTs and everything is up & running on Postgres 10 now! Thanks again Moreno.-
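A hedged example of the re-granting step mentioned above, to be adapted to the real function and role names (app_role is hypothetical):

    -- after the upgrade, restore the grants that were wiped together with the bad ACL entry
    GRANT EXECUTE ON FUNCTION public.x() TO app_role WITH GRANT OPTION;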
Re: problem
Check in your %temp% directory; there should be some bitrock* or bitrock_installer* file. These are the setup logs and can point you to the problem. If initdb failed, you can try running something like this:
initdb -D "C:\My\Postgres\Setup\Path" --encoding=UTF8 --locale="Italian, Italy" -W -n
and see if you get some error (if you are installing an old version (<9.3 IIRC) you may have to add -U postgres to have the user postgres -- a requirement that's been removed in recent versions -- own the data directory). If you point us to the OS/Postgres versions we could try to be more accurate :-) Cheers Moreno.- Il 31/01/2019 14:10, Mirco Gallazzi ha scritto: Good morning, I can't install Postgres on my PC because I always get this error: "Problem running post-install step. Installation may not complete correctly. The database cluster initialisation failed." I looked for a guide that could help me solve this problem, but with no luck. I tried to create a new user on my PC to create the Postgres server, but nothing. I tried to follow some guides on the web/YouTube, but nothing. Can you please help me solve this problem? I need to solve it because at my university they use Postgres and I need it to complete the exam and the database project. Thanks for your patience. Have a good day, Mirco. Sent from Mail for Windows 10
Re: Connection pooling for differing databases?
Il 07/03/2019 20:27, Arjun Ranade ha scritto: Hi all, I'm wondering if there's a tool like pgpool that can provide a single origin point (host/port) that will proxy/direct connections to the specific servers that contain the db needing to be accessed. Yes, I think there are many, but I'd encourage you to take a look at pgbouncer https://pgbouncer.github.io/ In pgbouncer.ini you enter database configuration entries in the form database = host=hostname port=xyzk, like
mydb1 = host=cluster1 port=6543
mydb2 = host=cluster1 port=9876
mydb3 = host=cluster2 port=6543
but there are many other parameters to refine your config (like "proxying" database names, so if you share names across clusters you can easily avoid conflicts). Pgbouncer can be installed on the same server as the databases or on a separate one, and it listens on a different port than Postgres (say 5431 while Postgres is on 5432). I'm actively using it in my environment with 2 clusters and about 500 databases; it works flawlessly. One thing you have to consider under heavy workload (say hundreds of connections) is raising the kernel limit on the maximum number of open files. Cheers Moreno.-
Re: Connection pooling for differing databases?
Il 07/03/2019 21:19, Arjun Ranade ha scritto: I'm looking at pgbouncer and it does most of what I need. I'm wondering about clients connecting via pgadmin, is there a way for users using pgadmin or another tool to see all the databases that are part of the configs? It's an issue I ran into too when I set up my pgbouncer architecture, but since all servers are reachable on the same private network the pgAdmin host is on, there's no security issue in connecting to them directly instead of going through pgbouncer, so I did not spend time (which I didn't have :-) ) investigating. If you resolve this (or someone already has), sharing the solution would be much appreciated. Cheers, Moreno.- On Thu, Mar 7, 2019 at 2:39 PM Moreno Andreo <moreno.and...@evolu-s.it> wrote: Il 07/03/2019 20:27, Arjun Ranade ha scritto: > Hi all, > > I'm wondering if there's a tool like pgpool that can provide a single > origin point (host/port) that will proxy/direct connections to the > specific servers that contain the db needing to be accessed. Yes, I think there are many, but I'd encourage you to take a look at pgbouncer https://pgbouncer.github.io/ In pgbouncer.ini you enter database configuration entries in the form database = host=hostname port=xyzk, like
mydb1 = host=cluster1 port=6543
mydb2 = host=cluster1 port=9876
mydb3 = host=cluster2 port=6543
but there are many other parameters to refine your config (like "proxying" database names, so if you share names across clusters you can easily avoid conflicts). Pgbouncer can be installed on the same server as the databases or on a separate one, and it listens on a different port than Postgres (say 5431 while Postgres is on 5432). I'm actively using it in my environment with 2 clusters and about 500 databases; it works flawlessly. One thing you have to consider under heavy workload (say hundreds of connections) is raising the kernel limit on the maximum number of open files. Cheers Moreno.-
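A partial answer to the question above, for reference: pgbouncer has an admin console on its own virtual "pgbouncer" database, so a user listed in admin_users or stats_users can at least enumerate the configured aliases from psql (this is the console's own SHOW command set, not regular SQL, and it is not a full pgAdmin-style browser):

    -- from psql, connected to pgbouncer's virtual "pgbouncer" database
    SHOW DATABASES;   -- one row per alias defined in the [databases] section
    SHOW POOLS;       -- current pool state per database/user pair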
Key encryption and relational integrity
Hello folks :-) Is there any workaround to implement key encryption without breaking relational integrity? Thanks Moreno.-
Re: Key encryption and relational integrity
Il 26/03/2019 15:24, Adrian Klaver ha scritto: On 3/26/19 7:19 AM, Moreno Andreo wrote: Hello folks :-) Is there any workaround to implement key encryption without breaking relational integrity? This is going to need more information. OK, I'll try to be as clear as I can. For starters 'key' has separate meanings for encryption and RI. I could make some guesses about what you want, but to avoid false assumptions a simple example would be helpful. In a master-detail relation, I need to encrypt either the master table PK or the detail table FK, in order to achieve pseudonymisation, required by GDPR in Europe when managing special categories of data. Imagine I have
Table users
 id | surname | last name
  1 | John    | Doe
  2 | Jane    | Doe
  3 | Foo     | Bar
Table medications
 id | user_id | med
  1 | 1       | Medication
  2 | 1       | Ear check
 ...| ...     |
medications.user_id is a FK on users.id. We should achieve
Table medications
 id | user_id   | med
  1 | sgkighs98 | Medication
  2 | sgkighs98 | Ear check
or the opposite (users.id encrypted and medications.user_id kept plain). At first glance, this DOES break relational integrity, so is there a way to manage this encryption internally so RI is kept safe? Thanks Moreno.-
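For readers following along, a minimal sketch of the relationship under discussion, using the thread's table and column names (the types are assumptions; the real schema is not shown in the thread):

    CREATE TABLE users (
        id        bigint PRIMARY KEY,
        surname   text,
        last_name text
    );

    CREATE TABLE medications (
        id      bigint PRIMARY KEY,
        user_id bigint NOT NULL REFERENCES users (id),  -- the FK that encrypting either side would break
        med     text
    );

The FOREIGN KEY can only be validated if medications.user_id stores the same literal value as users.id, which is exactly why encrypting just one side conflicts with RI.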
Re: Key encryption and relational integrity
Il 27/03/2019 07:42, Tony Shelver ha scritto: Not in Europe, but have worked a bit with medical records systems in the USA, including sharing across providers. The primary key of the user should be system generated, and this is meaningless from a user identity standpoint. It is, I was planning to use UUIDs. If you encrypt the user name and other significant personal data on the user id record, you can query the id column all day long, and there would be no way to identify who the user is without the encryption key. That's the first idea that came to my mind. Unfortunately, the customer needs to do "real time" search on personal data, so for each key that's pressed I need to query the whole table, decrypting the encrypted value (1st layer of slowness) and filtering it with LIKE (2nd layer of slowness) against the value the user is typing. I ran a test on my i7-2660 with SSD and 16G RAM, and on a 2500-row table these queries take about 2 seconds per keypress. So it's not the way to go, we have to find something different. Same thing with clinical data: it would be a mess, because when I open the patient's clinical record I need to see all treatments, all drugs that have been prescribed, all diseases and so on, and it would not be so fast if every clinical data name (i.e. all drug names) is encrypted. The only other way to do it would be to store the encrypted key value in both user.id and medications.user_id. That would encrypt the data and maintain relational integrity. Hmmm... if user.id and medications.user_id are the same, I can link users with medications... and the GDPR requirement is not satisfied. Or am I missing something? For US medical record systems, there is a strict act (HIPAA) that specifies both privacy and security rules, with lists of what is regarded as sensitive or private information, what can be shared with various providers and outside organizations, and so on. As far as user name goes, that is almost never a decisive form of identification for any person in a system. GDPR is almost the same concept, even if some rules may differ. While GDPR is relatively young and untested, surely someone in your organization (or outside) has a list of the relevant rules broken down to specific requirements. You sure? :-) Also, securing the data internally on a database system MAY have very different requirements to making that data visible to applications or transmitting it to external parties. Profiling, IMHO, has to be designed in the application; here I'm trying to find a way so that nobody can, without the use of the application, match a patient with their clinical records (i.e. someone breaking into the server -- a data breach). Storing the user id in plain on the medications record and encrypted on the user primary key would seem meaningless, assuming some meaning could be derived from a system generated ID. It is a system generated ID; obviously the query is more logical and quicker if I look from master into detail, so I SELECT something FROM medications WHERE medications.user_id = encrypt(user.id). Encrypting the (ex-)FK appears the best solution so far, but I'm afraid of the consequences of losing RI. I would suggest you sit down with the application / legal guys generating the requirements to see what the real rules are. If they want everything encrypted, then automatic primary key generation in the DB has to be invalidated, and they should provide that from the application side. Having everything encrypted would be a big performance hit. We are trying to achieve the best performance with the right masking. Thanks Moreno.-
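To make the performance point above concrete, a hedged illustration of the decrypt-then-filter pattern that was measured at about 2 seconds per keypress (assumes the pgcrypto extension, a hypothetical encrypted bytea column surname_enc, and a symmetric key; not the thread's actual schema):

    -- every row must be decrypted before LIKE can be evaluated, so no index can help
    SELECT id
    FROM users
    WHERE pgp_sym_decrypt(surname_enc, 'secret-key') LIKE 'Do%';

Because the filter is applied to the decrypted value, the cost is a full scan plus one pgp_sym_decrypt() call per row on every keystroke.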
Re: Key encryption and relational integrity
Il 26/03/2019 18:08, Adrian Klaver ha scritto: On 3/26/19 9:08 AM, Moreno Andreo wrote: Il 26/03/2019 15:24, Adrian Klaver ha scritto: On 3/26/19 7:19 AM, Moreno Andreo wrote: Hello folks :-) Is there any workaround to implement key encryption without breaking relational integrity? This is going to need more information. OK, I'll try to be as clear as I can. For starters 'key' has separate meanings for encryption and RI. I could make some guesses about what you want, but to avoid false assumptions a simple example would be helpful. In a master-detail relation, I need to encrypt either the master table PK or the detail table FK, in order to achieve pseudonymisation, required by GDPR in Europe when managing special categories of data. Imagine I have
Table users
 id | surname | last name
  1 | John    | Doe
  2 | Jane    | Doe
  3 | Foo     | Bar
Table medications
 id | user_id | med
  1 | 1       | Medication
  2 | 1       | Ear check
 ...| ...     |
medications.user_id is a FK on users.id. We should achieve
Table medications
 id | user_id   | med
  1 | sgkighs98 | Medication
  2 | sgkighs98 | Ear check
or the opposite (users.id encrypted and medications.user_id kept plain). At first glance, this DOES break relational integrity, so is there a way to manage this encryption internally so RI is kept safe? Not that I know of. RI is based on maintaining a link between parent and child. So by definition you would be able to get to the parent record via the child. That's what I was afraid of :-( A quick search on pseudonymisation found a boatload of interpretations of how to implement this: "'Pseudonymisation' means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person." To me it would seem something like:
Table medications
 id | user_id   | med
  1 | sgkighs98 | Medication
  2 | sgkighs98 | Ear check
Table users
 id        | surname | last name
 sgkighs98 | John    | Doe
 jkopkl1   | Jane    | Doe
 uepoti21  | Foo     | Bar
Where there is no direct link between the two. Are you sure there isn't?... the key "sgkighs98" is present in both tables and I can join the tables on that field, so pseudonymisation does not apply; it's just "separation" (that was OK with the previous privacy act, but not with GDPR). The problem is not on the application side... there you can do almost anything you want to do. The problem is that if someone breaks into the server (data breach) it is easy to join patients and their medications. Instead permissions would prevent linking from medications to users even via a SELECT. One could also use pgcrypto: https://www.postgresql.org/docs/10/pgcrypto.html on the users table to further hide the personal info. That's what I used when I tried to encrypt first name, last name, street address and some other fields (that would be the best solution because RI was not impacted at all), but the customer stated that they have to perform real-time search (like when you type in the Google search box), and the query that has to decrypt all names and return only the ones that begin with a certain set of characters is way too slow (tried on a good i7 configuration, that's about 2 seconds for each key pressed on a 2500-row table). So I dropped this approach. *NOTE* I am not a lawyer, so any advice on my part as to meeting legal requirements is just me thinking out loud.
I would suggest, if not already done, getting proper legal advice on what the section quoted above actually means. Relax, I'm not here to ask and then sue anyone :-)
Re: Key encryption and relational integrity
Il 28/03/2019 15:45, Adrian Klaver ha scritto: On 3/28/19 7:29 AM, Moreno Andreo wrote: Il 27/03/2019 07:42, Tony Shelver ha scritto: Not in Europe, but have worked a bit with medical records systems in the USA, including sharing across providers. The only other way to do it would be to store the encrypted key value in both user.id and medications.user_id. That would encrypt the data and maintain relational integrity. Hmmm... if user.id and medications.user_id are the same, I can link users with medications... and the GDPR requirement is not satisfied. Or am I missing something? Yes, the link means that someone could use medications.user_id to fetch the rest of the user information from the user table. Unless you encrypted that information also, which I gather you do not want to do for performance reasons. Yes, but to be GDPR compliant I _have_ to remove that link. As you reported in an earlier email, they say that you can't link patient and medication except with an external resource; in this case we are linking them without an external resource. Browsing the web I also bumped into an article that suggested keeping the tables in 2 databases on 2 different servers with different credentials. Interesting, but still without RI. I think I have to answer this question: "is it acceptable to have this kind of master-detail structure in a database without RI?" (The example is about 2 tables; obviously in the real world the master will have 50+ detail tables.) Thanks Moreno.
Re: Key encryption and relational integrity
Il 28/03/2019 23:29, Peter J. Holzer ha scritto: On 2019-03-28 18:36:40 +0100, Moreno Andreo wrote: Il 26/03/2019 18:08, Adrian Klaver ha scritto: To me it would seem something like:
Table medications
 id | user_id   | med
  1 | sgkighs98 | Medication
  2 | sgkighs98 | Ear check
Table users
 id        | surname | last name
 sgkighs98 | John    | Doe
 jkopkl1   | Jane    | Doe
 uepoti21  | Foo     | Bar
Where there is no direct link between the two. Are you sure there isn't?... the key "sgkighs98" is present in both tables and I can join the tables on that field, so pseudonymisation does not apply, Yes. It doesn't matter whether the key is 'sgkighs98' or 1438 or 692da0c1-cf2d-476d-8910-7f82c050f8fe. it's just "separation" (that was OK with the previous privacy act, but not with GDPR) I doubt that this is correct. The GDPR doesn't prescribe specific technical means (there may be laws or standards in your country which prescribe such means for medical data, but that's not the GDPR). I was told that by a privacy consultant; there was an Italian law (196/2003) that introduced "minimal security measures", which was repealed when the GDPR came into force. The problem is not on the application side... there you can do almost anything you want to do. The problem is that if someone breaks into the server (data breach) it is easy to join patients and their medications. I sure hope that the doctors are able to join patients and their medications. So at some level that has to be possible. It would be possible at the application level, which resides on another server (so the required separation between the pseudonymised data and the means to reverse it would be respected). If you assume a break-in into the server, there will always be a level of penetration at which the attacker will be able to access any data an authorized user can access. That's not what I got from reading the GDPR article... but I may have misunderstood (legal text is not my cup of tea). My understanding was that even in a data breach event there should be a mechanism that prevents (or "mitigates the risk that") the attacker from gaining access to the data in "joined" form, so he cannot learn that patient John Doe has Alzheimer's, for instance, but only that in that database there is a patient whose name is John Doe and someone who has Alzheimer's. And I tried to find a solution, and since I did not much like what I found (and it seems that neither do you :-) ), I came here hoping that someone would share his experience to shed some light on the topic. hp
Re: Key encryption and relational integrity
Il 28/03/2019 23:50, Peter J. Holzer ha scritto: On 2019-03-28 15:29:50 +0100, Moreno Andreo wrote: here I'm trying to find a way so nobody can, without the use of the application, match a patient with their clinical records (i.e. someone breaking into the server -- data breach) I think it is very optimistic to assume that an intruder would get access to the database but not the application. hp Well, the application resides on another instance (server); if the attacker has been able to take control of one server, he surely could try to break into another one, but it takes time.
Re: Key encryption and relational integrity
Il 29/03/2019 20:23, Adrian Klaver ha scritto: On 3/29/19 9:01 AM, Moreno Andreo wrote: And I tried to find a solution, and since I did not much like what I found (and it seems that neither do you :-) ), I came here hoping that someone would share his experience to shed some light on the topic. From what you have posted, the biggest issue you are having is less than real time search on patient names due to the need to meet pseudonymisation. To me that is always going to be a problem as there are two opposing forces at work, overhead to implement pseudonymisation vs quick lookup. Might be time to lower expectations on what can be done. ... or just NOT implement pseudonymisation at all, and try to enforce the other safeguards suggested by the GDPR. Peter highlighted a point: "The GDPR doesn't say how to do that at all (the legislators were wise enough that any attempt to do that would result in a mess). So you can't say "the GDPR says we have to do it this way" (and if your consultant says that it is probably time to get a different one). You have to consider all the risks (and yes, an attacker getting access to some or all of the data is a risk, but a doctor not being able to access a patient's records is also a risk) and implement the best you can do considering "the state of the art, the costs of implementation", etc." That would be absolutely right. I'm not forced to use pseudonymisation if there's a risk of making things worse in the system. I've got to discuss these "two opposing forces at work" with a privacy expert (maybe choosing another one, as Peter suggested :-) ) and ask him whether they can be used as grounds for declining pseudonymisation, on the basis that "pseudonymisation puts overall performance or database integrity at risk". What do you think?
Re: [SPAM] Re: Key encryption and relational integrity
Il 01/04/2019 20:48, Rory Campbell-Lange ha scritto: On 01/04/19, Moreno Andreo (moreno.and...@evolu-s.it) wrote: ... I'm not forced to use pseudonimysation if there's the risk to get things worse in a system. I've got to speak about these"two opposing forces at work" to a privacy expert (maybe choosing another one, as Peter suggested :-) ) and ask him if it could be used as a matter of declining pseudonymisation because of "pseudonimysation puts at risk overall performance or database integrity" How to interpret the pseudonymisation conditions is ... complicated. Yes, it is indeed... :-) The UK's Information Commissioner's Office (ICO) writes that pseudoanonymisation relates to: “…the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.” and that this "...can reduce the risks to the data subjects". The concept of application realms may be relevant to consider here. An application may be considered GDPR compliant without pseudonymisation if other measures are taken and the use case is appropriate. That could be my case, so I'll have to discuss the strategy and measures to be adopted with a privacy consultant. On the other hand, a copy of a production database in testing which has been pseudonymised may, if compromised, still leak personal data. As the ICO states: “…Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person…” https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/what-is-personal-data/what-is-personal-data/ If leakage occurs pseudonymisation has achieved nothing. That's another aspect of the question. Thanks for the clarification, Moreno.-
Re: Connection refused (0x0000274D/10061)
Il 18/06/19 15:57, Adrian Klaver ha scritto: On 6/18/19 1:06 AM, Sourav Majumdar wrote: Hi, I have checked the log file. I am attaching it with this mail. PFA In future please do not attach a 1.4MB file. Most of it was unpacking info. The relevant part was at the end: " fixing permissions on existing directory C:/PostgreSQL/data ... initdb: could not change permissions of directory "C:/PostgreSQL/data": Permission denied [...] Did you follow the instructions here?: https://www.enterprisedb.com/docs/en/11.0/PG_Inst_Guide_v11/PostgreSQL_Installation_Guide.1.08.html# In particular the part about: "To perform an installation using the graphical installation wizard, you must have superuser or administrator privileges. To start the installation wizard, assume sufficient privileges and double-click the installer icon; if prompted, provide a password. Note that in some versions of Windows, you can invoke the installer with Administrator privileges by right clicking on the installer icon and selecting Run as Administrator from the context menu." We had this issue in the past, and noted that re-issuing initdb from an Administrator Command Prompt (not doing anything else) from the bin directory worked perfectly. This worked for 9.5, 9.6 and 10.x. Still not using 11 in production, so I have no cases to test at the moment. My 2 cents Moreno.-
Partial table duplication via triggger
Hi *, suppose I have 2 tables:
CREATE TABLE t1(
    id uuid,
    name text,
    surname text,
    ...
    PRIMARY KEY(id)
)
CREATE TABLE t2(
    id uuid,
    master_id uuid,
    op_ts timestamp with time zone,
    name text,
    surname text,
    ...
    PRIMARY KEY(id)
)
I need to write an AFTER UPDATE trigger so that all columns of t1 go into the same columns in t2 (except that t1.id goes into t2.master_id, and t2.op_ts gets now()). I cannot write a 1-to-1 column assignment (like NEW.name := OLD.name and so on) because the trigger has to be used on many tables that have different (and evolving) schemas, and I don't want to write dozens of functions that would have to be maintained frequently. I'm quite a noob at PL/pgSQL; at the moment I wrote this, but I can't "tell the function that the fields are from the OLD row" (the error is "missing FROM-clause entry for table 'old'"). I tried also with field names alone (without OLD.), with no success. The trigger is fired AFTER UPDATE on t1 (CREATE TRIGGER update_id AFTER UPDATE ON t1 FOR EACH ROW EXECUTE PROCEDURE update_op());
CREATE OR REPLACE FUNCTION update_op() RETURNS TRIGGER AS $$
DECLARE
    fieldlist text := (select string_agg(column_name, ', ')
                       from information_schema.columns c
                       where table_name = TG_TABLE_NAME and (column_name <> 'id'));
    oldfieldlist text := (select string_agg(column_name, ', OLD.')
                          from information_schema.columns c
                          where table_name = TG_TABLE_NAME and (column_name <> 'id'));
BEGIN
    EXECUTE 'INSERT INTO t2 (master_id, op_ts, '|| fieldlist ||') VALUES (OLD.id, now(), OLD.'||oldfieldlist||')' USING OLD;
    RETURN NULL;
END;
$$ LANGUAGE 'plpgsql';
What am I missing? Thanks, Moreno
Re: [SPAM] Re: Partial table duplication via triggger
On 22/02/24 17:49, Erik Wienhold wrote: On 2024-02-22 15:14 +0100, Moreno Andreo wrote: suppose I have 2 tables [snip] What am I missing? The parameters you pass in with USING have to be referenced as $1, $2, and so on. For example:
DECLARE
    fieldlist text := (
        SELECT string_agg(quote_ident(column_name), ', ')
        FROM information_schema.columns
        WHERE table_name = TG_TABLE_NAME
          AND column_name <> 'id'
    );
    oldfieldlist text := (
        SELECT string_agg('$1.' || quote_ident(column_name), ', ')
        FROM information_schema.columns
        WHERE table_name = TG_TABLE_NAME
          AND column_name <> 'id'
    );
BEGIN
    EXECUTE '
        INSERT INTO t2 (id, master_id, op_ts, ' || fieldlist || ')
        VALUES (gen_random_uuid(), $1.id, now(), ' || oldfieldlist || ')
    ' USING OLD;
    RETURN NULL;
END;
Also make sure to use quote_ident() when constructing statements that way, to avoid SQL injection via column names in this case. Or use format() with the %I placeholder, although it's not simpler when you need to construct that variable list of identifiers. Erik, it worked perfectly! I didn't have a clear idea of how to use $1, $2, etc. with USING; after your reply I had a closer look at the docs and now it's clearer to me. Many thanks, Moreno.
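As a side note, a hedged sketch of the format() variant Erik mentions; the behavior should be the same, with the identifier quoting still done by quote_ident() inside the two list variables (t2 and the variable names are the ones from the thread):

    -- inside the same trigger function body, replacing the string-concatenated EXECUTE
    EXECUTE format(
        'INSERT INTO t2 (id, master_id, op_ts, %s) VALUES (gen_random_uuid(), $1.id, now(), %s)',
        fieldlist, oldfieldlist)
    USING OLD;

%s is used (not %I) because the two variables already contain comma-separated, individually quoted identifier lists.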
Re: Windows XP to Win 10 migration issue
Il 11/12/2017 14:37, Vincent Veyron ha scritto: On Sat, 9 Dec 2017 10:11:42 -0600 Dale Seaburg wrote: No Go! Would not start. Any error message in your logs? I would certainly second Scott's suggestion to check the processors. I've had to do what you describe once, and it took me four machines before I got one that would start postgresql with my rescued data folder. You might have better luck finding an old XP machine similar to your deceased one, installing postgresql on it, and using that to start your cluster. I'd take a peek into the logs (maybe the Event Log); it happened to me that the postgres user in Windows 10 did not have permission to access the data directory. Setting the right permissions could make it start. Obviously the processor architecture (32/64 bit) must be honored. My 2 cents
Re: Functions and Indexes
On 18/11/24 15:43, Gianni Ceccarelli wrote: On Mon, 18 Nov 2024 15:37:57 +0100 Moreno Andreo wrote: If a column appears in the WHERE clause (and so it should be placed in index), in case it is "processed" in a function (see below), is it possible to insert this function to further narrow down things? You probably want to look at https://www.postgresql.org/docs/current/indexes-expressional.html Hi Gianni, correct. That's a good starting point (now I have to add complexity since the index should have 4 columns, one of them is referenced with LIKE) Thanks! Moreno
Re: Functions and Indexes
On 18/11/24 20:05, Laurenz Albe ha scritto: On Mon, 2024-11-18 at 15:37 +0100, Moreno Andreo wrote: I'm creating indexes for some tables, and I came across a doubt. If a column appears in the WHERE clause (and so should be placed in the index), and it is "processed" by a function (see below), is it possible to use this function in the index to further narrow things down? Common index: SELECT foo1, foo2 FROM bar WHERE foo1 = 2 CREATE INDEX bar1_idx ON bar USING btree(foo1); What about if the query becomes SELECT foo1, foo2 FROM bar WHERE (POSITION(foo1 IN 'blah blah') >0) You could create an index like CREATE INDEX ON bar (position(foo1 IN 'blah blah')); Alternatively, you could have a partial index: CREATE INDEX ON bar (foo1) INCLUDE (foo2) WHERE position(foo1 IN 'blah blah') > 0; Interesting. I've never seen this form; I'll look into it further. I stumbled into https://www.cybertec-postgresql.com/en/indexing-like-postgresql-oracle/ and discovered text_pattern_ops. I'm wondering if it can be of any use in my index, which should cover a WHERE condition with a combination of LIKE and the POSITION expression above. More docs to read ... :-) Second question: I've seen contrasting opinions about putting JOIN parameters (ON a.field1 = b.field2) in an index and I'd like to know your thoughts. That depends on the join strategy PostgreSQL chooses. You can use EXPLAIN to figure out the join strategy. This article should explain details: https://www.cybertec-postgresql.com/en/join-strategies-and-performance-in-postgresql/ Very nice article, clear and easy to understand! Yours, Laurenz Albe Thanks, Moreno.
Functions and Indexes
Hi folks, I'm creating indexes for some tables, and I came across a doubt. If a column appears in the WHERE clause (and so should be placed in the index), and it is "processed" by a function (see below), is it possible to use this function in the index to further narrow things down? Common index: SELECT foo1, foo2 FROM bar WHERE foo1 = 2 CREATE INDEX bar1_idx ON bar USING btree(foo1); What about if the query becomes SELECT foo1, foo2 FROM bar WHERE (POSITION(foo1 IN 'blah blah') >0)? Second question: I've seen contrasting opinions about putting JOIN parameters (ON a.field1 = b.field2) in an index and I'd like to know your thoughts. Thanks, Moreno.
Re: Functions and Indexes
On 19/11/24 12:34, Laurenz Albe ha scritto: On Tue, 2024-11-19 at 11:53 +0100, Moreno Andreo wrote: What about if the query becomes SELECT foo1, foo2 FROM bar WHERE (POSITION(foo1 IN 'blah blah') >0) You could create an index like CREATE INDEX ON bar (position(foo1 IN 'blah blah')); Alternatively, you could have a partial index: CREATE INDEX ON bar (foo1) INCLUDE (foo2) WHERE position(foo1 IN 'blah blah') > 0; Interesting. I've never seen this form; I'll look into it further. I stumbled into https://www.cybertec-postgresql.com/en/indexing-like-postgresql-oracle/ and discovered text_pattern_ops. I'm wondering if it can be of any use in my index, which should cover a WHERE condition with a combination of LIKE and the POSITION expression above. More docs to read ... :-) I don't think "text_pattern_ops" will help here - queries that use LIKE to search for a substring (LIKE '%string%') cannot make use of a b-tree index. Oh, OK, I was happy to use BTREEs 'cause I had some issues with GIN/GiST (like indexes way bigger than the table, and so inefficient). OK, I'll stick with these and try harder to obtain better results. One thing I can't quite understand. In https://www.cybertec-postgresql.com/en/join-strategies-and-performance-in-postgresql/ you say "Note that for inner joins there is no distinction between the join condition and the WHERE condition, but that doesn't hold for outer joins." What do you mean? Thanks Moreno
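For the LIKE-on-a-substring part of the problem, the workaround usually suggested is a trigram index rather than a b-tree; a hedged sketch with the thread's placeholder names (assumes the pg_trgm extension can be installed):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX bar_foo1_trgm_idx ON bar USING gin (foo1 gin_trgm_ops);
    -- can be used by the planner for predicates such as: foo1 LIKE '%blah%'

Note that this helps LIKE/ILIKE with leading wildcards, but not the POSITION(foo1 IN 'constant') expression discussed above, which is the reverse lookup (is the column value a substring of a constant) and is covered by Laurenz's expression-index suggestion instead.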
BTREE index: field ordering
Hi, Postgres 16.4 (planning to go to 17.4). I'm creating some indexes based on slow queries reported in the logs. These queries involve a WHERE clause with more than 5 fields, matched with =, <>, LIKE and IN(). I read that equality fields must come first, then the others. Is that correct? Based on this query:
SELECT COUNT(id) AS total FROM nx.tbl1
WHERE (date_order >= '2025-03-21') AND (date_order <= '2025-03-29')
  AND (flag = TRUE)
  AND ((
        -- (flag = TRUE)
        -- AND (((tipo <> 'C') AND (tipo <> 'V') AND (tipo <> 'F')) OR (tipo IS NULL) OR (tipo = ''))
        (((op <> 'C') OR (op IS NULL)) OR (tipo = 'F'))
        AND (s_state IN ('ENQ','WFR','BLQ','BLR','WFA','FRW','FRO','','0001'))
        AND (tiporic IS NOT NULL)
        AND (tiporic NOT LIKE '%cart%')
      ) OR (
        (tiporic LIKE '%cart%')
        AND (S_state <> 'CON')
        AND (s_state <> '0002')
        AND ((op <> 'C') OR (op IS NULL))
      ))
  AND (priv IS NULL OR priv = false OR (priv = true AND idpriv = 'TEST'));
Would the following index be correct?
CREATE INDEX IF NOT EXISTS tbl1_count_idx ON nx.tbl1 USING BTREE(flag, tipo, op, priv, idpriv, date_order, s_state, tiporic);
Would it be better to create a separate GIN/GIST index for the field matched with LIKE? Thanks in advance, Moreno
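Since no reply is shown here, one hedged, general way to check the column-ordering question is to let the planner answer it: build the candidate index on a test copy and compare plans (the statement below just wraps the query already shown above):

    -- run before and after creating tbl1_count_idx and compare the chosen plan and timings
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT COUNT(id) AS total FROM nx.tbl1 WHERE ...;  -- same WHERE clause as above

If the plan still shows a sequential scan, the leading columns of the index probably don't match the query's equality predicates (here essentially only flag), which is consistent with the rule of thumb quoted in the question.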
Logical replication, need to reclaim big disk space
Hi, we are changing our old binary data approach, moving the data from bytea fields in a table to external storage (making the database smaller and related operations faster and smarter). In short, we have a job that runs in the background, copies the data from the table to an external file and then sets the bytea field to NULL (UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = ). This results, at the end of the operation, in a table that's less than one tenth of its original size. We have a multi-tenant architecture (100s of schemas with identical structure, all inheriting from public) and we are performing the task on one table per schema. The problem is: this is generating BIG table bloat, as you may imagine. Running a VACUUM FULL on an ex-22GB table on a standalone test server is almost immediate. If I had only one server, I'd process one table at a time with a nightly script, and issue a VACUUM FULL on the tables that have already been processed. But I'm in a logical replication architecture (we are using a multimaster system called pgEdge, but I don't think it makes a big difference, since it's based on logical replication), and I'm building a test cluster. I've been instructed to issue VACUUM FULL on both nodes, nightly, but before proceeding I read in the docs that VACUUM FULL can disrupt logical replication, so I'm a bit concerned about how to proceed. Rows are cleared one at a time (one transaction per row, to confine errors to the record that caused them). I read about extensions like pg_squeeze, but I wonder whether they are dangerous for replication too. Thanks for your help. Moreno.-
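Not from the thread, but a hedged way to quantify the bloat per tenant before deciding between VACUUM FULL and tools like pg_squeeze (the schema name is hypothetical):

    -- rough bloat indicator: dead vs live tuples and total on-disk size per table
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size
    FROM pg_stat_user_tables
    WHERE schemaname = 'tenant_schema'
    ORDER BY n_dead_tup DESC
    LIMIT 20;

n_dead_tup is only an estimate maintained by the statistics subsystem, but it is usually enough to rank which tenants' tables are worth rewriting first.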
Re: Logical replication, need to reclaim big disk space
On 19/05/25 20:49, Achilleas Mantzios wrote: On 19/5/25 17:38, Moreno Andreo wrote: On 19/05/25 14:41, Achilleas Mantzios wrote: On 5/19/25 09:14, Moreno Andreo wrote: On 16/05/25 21:33, Achilleas Mantzios wrote: On 16/5/25 18:45, Moreno Andreo wrote: Hi, we are moving our old binary data approach, moving them from bytea fields in a table to external storage (making database smaller and related operations faster and smarter). In short, we have a job that runs in background and copies data from the table to an external file and then sets the bytea field to NULL. (UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = ) This results, at the end of the operations, to a table that's less than one tenth in size. We have a multi-tenant architecture (100s of schemas with identical architecture, all inheriting from public) and we are performing the task on one table per schema. So? toasted data are kept on separate TOAST tables, unless those bytea cols are selected, you won't even touch them. I cannot understand what you are trying to achieve here. Years ago, when I made the mistake to go for a coffee and let my developers "improvise" , the result was a design similar to what you are trying to achieve. Years after, I am seriously considering moving those data back to PostgreSQL. The "related operations" I was talking about are backups and database maintenance when needed, cluster/replica management, etc. With a smaller database size they would be easier in timing and effort, right? Ok, but you'll lose replica functionality for those blobs, which means you don't care about them, correct me if I am wrong. I'm not saying I don't care about them, the opposite, they are protected with Object Versioning and soft deletion, this should assure a good protection against e.g. ransomware, if someone manages to get in there (and if this happens, we'll have bigger troubles than this). PostgreSQL has become very popular because of ppl who care about their data. Yeah, it's always been famous for its robustness, and that's why I chose PostgreSQL more than 10 years ago, and, in spite of how a "normal" user treats his PC, we never had corruption (only where FS/disk were failing, but that's not PG fault) We are mostly talking about costs, here. To give things their names, I'm moving bytea contents (85% of total data) to files into Google Cloud Storage buckets, that has a fraction of the cost of the disks holding my database (on GCE, to be clear ). May I ask the size of the bytea data (uncompressed) ?. single records vary from 150k to 80 MB, the grand total is more than 8,5 TB in a circa 10 TB data footprint This data is not accessed frequently (just by the owner when he needs to do it), so no need to keep it on expensive hardware. I've already read in these years that keeping many big bytea fields in databases is not recommended, but might have misunderstood this. Ok, I assume those are unimportant data, but let me ask, what is the longevity or expected legitimacy of those ? I haven't worked with those just reading : https://cloud.google.com/storage/pricing?_gl=1*1b25r8o*_up*MQ..&gclid=CjwKCAjwravBBhBjEiwAIr30VKfaOJytxmk7J29vjG4rBBkk2EUimPU5zPibST73nm3XRL2h0O9SxRoCaogQAvD_BwE&gclsrc=aw.ds#storage-pricing would you choose e.g. "*Anywhere Cache storage" ? * Absolutely not, this is *not* unimportant data, and we are using Standard Storage, for 0,02$/GB/month + operations, that compared to a 0.17$/GB/month of an SSD or even more for the Hyperdisks we are using, is a good price drop. 
How about hosting your data in your own storage and spend 0$/GB/month ? If we could host on our own hardware I'd not be here talking. Maybe we would have a 10-node full-mesh multimaster architecture with barman backup on 2 separate SANs. But we are a small company that has to balance performance, consistency, security and, last but not latter, costs. And margins are tightening. ** Another way would have been to move these tables to a different tablespace, in cheaper storage, but it still would have been 3 times the buckets cost. can you actually mount those Cloud Storage Buckets under a supported FS in linux and just move them to tablespaces backed by this storage ? Never tried, I mounted this via FUSE and had some simple operations in the past, but not sure it can handle database operations in terms of I/O bandwidth Why are you considering to get data back to database tables? Because now if we need to migrate from cloud to on-premise, or just upgrade or move the specific server which holds those data I will have an extra headache. Also this is a single point of failure, or best case a cause for fragmented technology introduced just for the sake of keeping things out of the DB. This is managed as an hierarchical disk structure, so
Re: Logical replication, need to reclaim big disk space
On 20/05/25 12:58, Achilleas Mantzios wrote: Στις 20/5/25 12:17, ο/η Moreno Andreo έγραψε: On 19/05/25 20:49, Achilleas Mantzios wrote: On 19/5/25 17:38, Moreno Andreo wrote: On 19/05/25 14:41, Achilleas Mantzios wrote: On 5/19/25 09:14, Moreno Andreo wrote: On 16/05/25 21:33, Achilleas Mantzios wrote: On 16/5/25 18:45, Moreno Andreo wrote: Hi, we are moving our old binary data approach, moving them from bytea fields in a table to external storage (making database smaller and related operations faster and smarter). In short, we have a job that runs in background and copies data from the table to an external file and then sets the bytea field to NULL. (UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = ) This results, at the end of the operations, to a table that's less than one tenth in size. We have a multi-tenant architecture (100s of schemas with identical architecture, all inheriting from public) and we are performing the task on one table per schema. So? toasted data are kept on separate TOAST tables, unless those bytea cols are selected, you won't even touch them. I cannot understand what you are trying to achieve here. Years ago, when I made the mistake to go for a coffee and let my developers "improvise" , the result was a design similar to what you are trying to achieve. Years after, I am seriously considering moving those data back to PostgreSQL. The "related operations" I was talking about are backups and database maintenance when needed, cluster/replica management, etc. With a smaller database size they would be easier in timing and effort, right? Ok, but you'll lose replica functionality for those blobs, which means you don't care about them, correct me if I am wrong. I'm not saying I don't care about them, the opposite, they are protected with Object Versioning and soft deletion, this should assure a good protection against e.g. ransomware, if someone manages to get in there (and if this happens, we'll have bigger troubles than this). PostgreSQL has become very popular because of ppl who care about their data. Yeah, it's always been famous for its robustness, and that's why I chose PostgreSQL more than 10 years ago, and, in spite of how a "normal" user treats his PC, we never had corruption (only where FS/disk were failing, but that's not PG fault) We are mostly talking about costs, here. To give things their names, I'm moving bytea contents (85% of total data) to files into Google Cloud Storage buckets, that has a fraction of the cost of the disks holding my database (on GCE, to be clear ). May I ask the size of the bytea data (uncompressed) ?. single records vary from 150k to 80 MB, the grand total is more than 8,5 TB in a circa 10 TB data footprint This data is not accessed frequently (just by the owner when he needs to do it), so no need to keep it on expensive hardware. I've already read in these years that keeping many big bytea fields in databases is not recommended, but might have misunderstood this. Ok, I assume those are unimportant data, but let me ask, what is the longevity or expected legitimacy of those ? I haven't worked with those just reading : https://cloud.google.com/storage/pricing?_gl=1*1b25r8o*_up*MQ..&gclid=CjwKCAjwravBBhBjEiwAIr30VKfaOJytxmk7J29vjG4rBBkk2EUimPU5zPibST73nm3XRL2h0O9SxRoCaogQAvD_BwE&gclsrc=aw.ds#storage-pricing would you choose e.g. "*Anywhere Cache storage" ? 
* Absolutely not, this is *not* unimportant data, and we are using Standard Storage, for 0,02$/GB/month + operations, that compared to a 0.17$/GB/month of an SSD or even more for the Hyperdisks we are using, is a good price drop. How about hosting your data in your own storage and spend 0$/GB/month ? If we could host on our own hardware I'd not be here talking. Maybe we would have a 10-node full-mesh multimaster architecture with barman backup on 2 separate SANs. But we are a small company that has to balance performance, consistency, security and, last but not latter, costs. And margins are tightening. ** Another way would have been to move these tables to a different tablespace, in cheaper storage, but it still would have been 3 times the buckets cost. can you actually mount those Cloud Storage Buckets under a supported FS in linux and just move them to tablespaces backed by this storage ? Never tried, I mounted this via FUSE and had some simple operations in the past, but not sure it can handle database operations in terms of I/O bandwidth Why are you considering to get data back to database tables? Because now if we need to migrate from cloud to on-premise, or just upgrade or move the specific server which holds those data I will have an extra headache. Also this is a single point of failure, or best case a cause for fragmented technology introduced jus
Re: Logical replication, need to reclaim big disk space
On 16/05/25 21:33, Achilleas Mantzios wrote: On 16/5/25 18:45, Moreno Andreo wrote: Hi, we are moving our old binary data approach, moving them from bytea fields in a table to external storage (making database smaller and related operations faster and smarter). In short, we have a job that runs in background and copies data from the table to an external file and then sets the bytea field to NULL. (UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = ) This results, at the end of the operations, to a table that's less than one tenth in size. We have a multi-tenant architecture (100s of schemas with identical architecture, all inheriting from public) and we are performing the task on one table per schema. So? toasted data are kept on separate TOAST tables, unless those bytea cols are selected, you won't even touch them. I cannot understand what you are trying to achieve here. Years ago, when I made the mistake to go for a coffee and let my developers "improvise" , the result was a design similar to what you are trying to achieve. Years after, I am seriously considering moving those data back to PostgreSQL. The "related operations" I was talking about are backups and database maintenance when needed, cluster/replica management, etc. With a smaller database size they would be easier in timing and effort, right? We are mostly talking about costs, here. To give things their names, I'm moving bytea contents (85% of total data) to files into Google Cloud Storage buckets, that has a fraction of the cost of the disks holding my database (on GCE, to be clear ). This data is not accessed frequently (just by the owner when he needs to do it), so no need to keep it on expensive hardware. I've already read in these years that keeping many big bytea fields in databases is not recommended, but might have misunderstood this. Another way would have been to move these tables to a different tablespace, in cheaper storage, but it still would have been 3 times the buckets cost. Why are you considering to get data back to database tables? The problem is: this is generating BIG table bloat, as you may imagine. Running a VACUUM FULL on an ex-22GB table on a standalone test server is almost immediate. If I had only one server, I'll process a table a time, with a nightly script, and issue a VACUUM FULL to tables that have already been processed. But I'm in a logical replication architecture (we are using a multimaster system called pgEdge, but I don't think it will make big difference, since it's based on logical replication), and I'm building a test cluster. So you use PgEdge , but you wanna lose all the benefits of multi-master , since your binary data won't be replicated ... I don't think I need it to be replicated, since this data cannot be "edited", so either it's there or it's been deleted. Buckets have protections for data deletions or events like ransomware attacks and such. Also multi-master was an absolute requirement one year ago because of a project we were building, but it has been abandoned and now a simple logical replication would be enough, but let's do one thing a time. I've been instructed to issue VACUUM FULL on both nodes, nightly, but before proceeding I read on docs that VACUUM FULL can disrupt logical replication, so I'm a bit concerned on how to proceed. Rows are cleared one a time (one transaction, one row, to keep errors to the record that issued them) PgEdge is based on the old pg_logical, the old 2ndQuadrant extension, not the native logical replication we have since pgsql 10. 
But I might be mistaken. I don't know about this; it keeps running on the latest PG versions (we are about to upgrade to 17.4, if I'm not wrong), but I'll ask. I read about extensions like pg_squeeze, but I wonder whether they are dangerous for replication too. What's pgEdge's take on that, I mean on the bytea thing you are trying to achieve here. They are positive; they're the ones who suggested doing VACUUM FULL on both nodes... I'm quite new to replication, so I'm looking for some advice here. Thanks for your help. Moreno.-
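For completeness, the kind of per-table statement the nightly job discussed above would issue once a tenant's table has been fully migrated (schema and table names are hypothetical; VACUUM FULL takes an ACCESS EXCLUSIVE lock and rewrites the table, so it has to be scheduled in a quiet window):

    VACUUM (FULL, VERBOSE) tenant_schema.documents;

Whether this is safe to run independently on each node of a logical-replication or pgEdge cluster is exactly the open question in the thread, so treat this as the statement under discussion, not as a recommendation.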
Re: Logical replication, need to reclaim big disk space
On 19/05/25 14:41, Achilleas Mantzios wrote: On 5/19/25 09:14, Moreno Andreo wrote: On 16/05/25 21:33, Achilleas Mantzios wrote: On 16/5/25 18:45, Moreno Andreo wrote: Hi, we are moving our old binary data approach, moving them from bytea fields in a table to external storage (making database smaller and related operations faster and smarter). In short, we have a job that runs in background and copies data from the table to an external file and then sets the bytea field to NULL. (UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = ) This results, at the end of the operations, to a table that's less than one tenth in size. We have a multi-tenant architecture (100s of schemas with identical architecture, all inheriting from public) and we are performing the task on one table per schema. So? toasted data are kept on separate TOAST tables, unless those bytea cols are selected, you won't even touch them. I cannot understand what you are trying to achieve here. Years ago, when I made the mistake to go for a coffee and let my developers "improvise" , the result was a design similar to what you are trying to achieve. Years after, I am seriously considering moving those data back to PostgreSQL. The "related operations" I was talking about are backups and database maintenance when needed, cluster/replica management, etc. With a smaller database size they would be easier in timing and effort, right? Ok, but you'll lose replica functionality for those blobs, which means you don't care about them, correct me if I am wrong. I'm not saying I don't care about them, the opposite, they are protected with Object Versioning and soft deletion, this should assure a good protection against e.g. ransomware, if someone manages to get in there (and if this happens, we'll have bigger troubles than this) We are mostly talking about costs, here. To give things their names, I'm moving bytea contents (85% of total data) to files into Google Cloud Storage buckets, that has a fraction of the cost of the disks holding my database (on GCE, to be clear ). May I ask the size of the bytea data (uncompressed) ?. single records vary from 150k to 80 MB, the grand total is more than 8,5 TB in a circa 10 TB data footprint This data is not accessed frequently (just by the owner when he needs to do it), so no need to keep it on expensive hardware. I've already read in these years that keeping many big bytea fields in databases is not recommended, but might have misunderstood this. Ok, I assume those are unimportant data, but let me ask, what is the longevity or expected legitimacy of those ? I haven't worked with those just reading : https://cloud.google.com/storage/pricing?_gl=1*1b25r8o*_up*MQ..&gclid=CjwKCAjwravBBhBjEiwAIr30VKfaOJytxmk7J29vjG4rBBkk2EUimPU5zPibST73nm3XRL2h0O9SxRoCaogQAvD_BwE&gclsrc=aw.ds#storage-pricing would you choose e.g. "*Anywhere Cache storage" ? * Absolutely not, this is *not* unimportant data, and we are using Standard Storage, for 0,02$/GB/month + operations, that compared to a 0.17$/GB/month of an SSD or even more for the Hyperdisks we are using, is a good price drop. ** Another way would have been to move these tables to a different tablespace, in cheaper storage, but it still would have been 3 times the buckets cost. can you actually mount those Cloud Storage Buckets under a supported FS in linux and just move them to tablespaces backed by this storage ? 
Never tried, I mounted this via FUSE and had some simple operations in the past, but not sure it can handle database operations in terms of I/O bandwidth Why are you considering to get data back to database tables? Because now if we need to migrate from cloud to on-premise, or just upgrade or move the specific server which holds those data I will have an extra headache. Also this is a single point of failure, or best case a cause for fragmented technology introduced just for the sake of keeping things out of the DB. This is managed as an hierarchical disk structure, so the calling server may be literally everywhere, it just needs an account (or a service account) to get in there The problem is: this is generating BIG table bloat, as you may imagine. Running a VACUUM FULL on an ex-22GB table on a standalone test server is almost immediate. If I had only one server, I'll process a table a time, with a nightly script, and issue a VACUUM FULL to tables that have already been processed. But I'm in a logical replication architecture (we are using a multimaster system called pgEdge, but I don't think it will make big difference, since it's based on logical replication), and I'm building a test cluster. So you use PgEdge , but you wanna lose all the benefits of multi-master , since your binary data won't be replicated ... I don't think I need it