Re: how to know if the sql will run a seq scan

2024-10-16 Thread Vijaykumar Jain
On Wed, 16 Oct 2024 at 02:59, Adrian Klaver 
wrote:

> On 10/15/24 13:50, Vijaykumar Jain wrote:
> > Sorry top posting, coz Gmail app on phone.
> >
> > Yeah, my point was for example we have a large table and we are
> > attaching a table as a partition. Now it will scan the whole table to
> > validate the constraint and that will create all sorts of problems.
>
> Now you have changed the problem description.
>
> To get a proper answer you will need to provide a more detailed
> description of what you are doing with the following information:
>
> 1) Postgres version.
>
> 2) Definition of 'large'.
>
> 3) The command/process being used to create the partition.
>
> 4) The actual constraint definition.
>
> 5) The table definition.
>
>
/*
postgres=# create table t(col1 int) partition by list(col1);
CREATE TABLE
postgres=# create table t1(col1 int);
CREATE TABLE
postgres=# insert into t1 select 0 from generate_series(1, 10) x;
INSERT 0 10
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan | last_seq_scan | age | seq_tup_read
---------+----------+---------------+-----+--------------
 t1      |        0 |               |     |            0
(1 row)

postgres=# alter table t1 add constraint col10 check (col1 = 0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        1 | 2024-10-16 06:46:28.641281+00 | -00:00:03.258432 |           10
(1 row)

postgres=# -- this results in a seq scan, which is ok, but then when i attach the partition it does a seq scan again
postgres=# alter table t attach partition t1 for values in (0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        2 | 2024-10-16 06:46:59.512201+00 | -00:00:02.498771 |           20
(1 row)

postgres=# -- why, when there is a constraint that helps with the partition boundary/value

postgres=# alter table t detach partition t1;
ALTER TABLE

postgres=# alter table t attach partition t1 for values in (0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        3 | 2024-10-16 06:54:28.780145+00 | -00:00:03.358524 |           30
(1 row)

-- despite there being a constraint, it does a full table scan to attach the partition. why? note the tup read is the full table of t1.

*/

above is one of the cases I found.
my core question still was: how do I know which statement will cause a
full table rewrite or a full table scan?

how do I get to know that? I know that, implicitly, I can use the above stat
tables, the pg_relation_filepath() function, etc. to figure out the change in
oid, the bump in seq scan count, and so on,
but I want to pinpoint which statement made what change among 100 other
statements in production.

I mean, is there a way to tell that a certain ALTER TABLE will do a table
rewrite on disk while another ALTER TABLE will not?
The access exclusive lock on the table does not help answer that question.
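
For example, one minimal per-statement check I can do (the ALTER below is just
an illustrative rewriting statement, not taken from the case above): if
pg_relation_filenode() returns a different value afterwards, that statement
rewrote the table on disk.

postgres=# select pg_relation_filenode('t1');     -- note the value
postgres=# alter table t1 alter column col1 type bigint;   -- illustrative rewriting ALTER
postgres=# select pg_relation_filenode('t1');     -- value changed => table was rewritten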

if I am not clear, maybe ignore my question. I have some issues explaining
things clearly, so I try to use demos.







Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain 


Re: What are best practices wrt passwords?

2024-10-16 Thread mbork
On 2024-10-16, at 14:41, Dominique Devienne  wrote:

> On Wed, Oct 16, 2024 at 2:25 PM  wrote:
>> I'd like to be able to use psql without typing passwords again and
>> again.  I know about `.pgpass` and PGPASSFILE, but I specifically do not
>> want to use it - I have the password in the `.env` file, and having it
>> in _two_ places comes with its own set of problems, like how to make
>> sure they don't get out of sync.
>
> What's wrong with PGPASSWORD?
> https://www.postgresql.org/docs/current/libpq-envars.html

`ps auxe` shows all processes with their environments, no?

>> I understand why giving the password on the command line or in an
>> environment variable is a security risk (because of `ps`), but I do not
>> understand why `psql` doesn't have an option like `--password-command`
>> accepting a command which then prints the password on stdout.  For
>> example, I could then use `pass` (https://www.passwordstore.org/) with
>> gpg-agent.
>
> It's not psql, it's libpq, that does that, FTR.

Good point, thanks.

> My own apps are libpq based, and inherit all its env-vars and defaults.
>
> But I'd welcome a way to store password encrypted,
> unlike the current mechanisms. And what you propose
> would allow that I guess, if I understand correctly. So +1.
> (and since it's transient, that's better than encrypted/obfuscated passwords)
>
>> Is there any risk associated with this usage pattern?  What is the
>> recommended practice in my case other than using `.pgpass`?
>
> Storing password in plain text? --DD

You have to store it somewhere on the server where your application
(which connects to the database) lives anyway, right?  I see no
significant difference wrt security between .env and .pgpass.  (Though
I'm far from a security expert.)

Best,

-- 
Marcin Borkowski
https://mbork.pl
https://crimsonelevendelightpetrichor.net/




Re: What are best practices wrt passwords?

2024-10-16 Thread Bruce Momjian
On Wed, Oct 16, 2024 at 06:16:57PM +0200, mb...@mbork.pl wrote:
> On 2024-10-16, at 14:41, Dominique Devienne  wrote:
> 
> > On Wed, Oct 16, 2024 at 2:25 PM  wrote:
> >> I'd like to be able to use psql without typing passwords again and
> >> again.  I know about `.pgpass` and PGPASSFILE, but I specifically do not
> >> want to use it - I have the password in the `.env` file, and having it
> >> in _two_ places comes with its own set of problems, like how to make
> >> sure they don't get out of sync.
> >
> > What's wrong with PGPASSWORD?
> > https://www.postgresql.org/docs/current/libpq-envars.html
> 
> `ps auxe` shows all processes with their environments, no?

I think that only shows for super-user.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: how to know if the sql will run a seq scan

2024-10-16 Thread Adrian Klaver

On 10/16/24 00:02, Vijaykumar Jain wrote:






postgres=# create table t(col1 int) partition by list(col1);
CREATE TABLE
postgres=# create table t1(col1 int)
postgres-# ;
CREATE TABLE
postgres=# insert into t1 select 0 from generate_series(1, 10) x;
INSERT 0 10
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan | last_seq_scan | age | seq_tup_read
---------+----------+---------------+-----+--------------
 t1      |        0 |               |     |            0
(1 row)

postgres=# alter table t1 add constraint col10 check (col1 = 0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        1 | 2024-10-16 06:46:28.641281+00 | -00:00:03.258432 |           10
(1 row)

postgres=# -- this results in a seq scan, which is ok, but then when i attach the partition it does a seq scan again
postgres=# alter table t attach partition t1 for values in (0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        2 | 2024-10-16 06:46:59.512201+00 | -00:00:02.498771 |           20
(1 row)

postgres=# -- why, when there is a constraint that helps with the partition boundary/value

postgres=# alter table t detach partition t1;
ALTER TABLE

postgres=# alter table t attach partition t1 for values in (0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        3 | 2024-10-16 06:54:28.780145+00 | -00:00:03.358524 |           30
(1 row)

-- despite there being a constraint, it does a full table scan to attach the partition. why? note the tup read is the full table of t1.

*/

above is one of the cases i found.
my core question still was, how do i know which statement will cause a
full table rewrite
full table scan


I don't have time now to create an example, but I can point you at:

https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

5.12.2.2. Partition Maintenance

"As an alternative to creating a new partition, it is sometimes more 
convenient to create a new table separate from the partition structure 
and attach it as a partition later. ... "


Read the section starting above.
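
To sketch what that section describes, reusing the names from your demo (one
assumption on my part: the existing constraints also have to rule out NULLs,
either via NOT NULL on the column or an IS NOT NULL clause, otherwise the
ATTACH will still scan):

create table t1 (col1 int not null);
insert into t1 select 0 from generate_series(1, 10);
alter table t1 add constraint col10 check (col1 = 0);  -- matches the future partition bound
alter table t attach partition t1 for values in (0);   -- validation scan can be skipped
alter table t1 drop constraint col10;                  -- now redundant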



how do i get to know that. i know implictly i can use the above stat 
tables and pg_rel_filepath function etc to figure out the change in oid 
, update in seq count etc.
but i want to pin point which statement made what change among 100 other 
statements in production.


I mean is there a way that a certain alter table will do a table rewrite 
on disk and other alter table will not.

access exclusive lock on tables does not help answer that question.

if i am not clear, maybe ignore my question. i have some issues 
explaining things clearly, so i try to use demos.








Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain 


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: What are best practices wrt passwords?

2024-10-16 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Oct 16, 2024 at 06:16:57PM +0200, mb...@mbork.pl wrote:
>> `ps auxe` shows all processes with their environments, no?

> I think that only shows for super-user.

I believe it depends on your platform --- some BSDen are pretty
permissive about this, if memory serves.  On a Linux box it seems
to work for processes owned by yourself even if you're not superuser.

regards, tom lane




Re: What are best practices wrt passwords?

2024-10-16 Thread Christophe Pettus



> On Oct 16, 2024, at 09:47, Tom Lane  wrote:
> I believe it depends on your platform --- some BSDen are pretty
> permissive about this, if memory serves.  On a Linux box it seems
> to work for processes owned by yourself even if you're not superuser.

I just tried it on an (admittedly kind of old) Ubuntu system and MacOS 14, and 
it looks like it shows everything owned by everyone, even from a non-sudoer user.



Re: What are best practices wrt passwords?

2024-10-16 Thread felix . quintgz
Use the PGPASSWORD environment variable.
Example:

SET PGPASSWORD=P0stgres
psql -h localhost -p 5432 -U postgres -d postgres -c "SELECT 1;"

https://www.postgresql.org/docs/current/libpq-envars.html


 On Wednesday, October 16, 2024 at 08:26:05 AM GMT-4,  wrote:

 Hello all,

I'd like to be able to use psql without typing passwords again and
again.  I know about `.pgpass` and PGPASSFILE, but I specifically do not
want to use it - I have the password in the `.env` file, and having it
in _two_ places comes with its own set of problems, like how to make
sure they don't get out of sync.

I understand why giving the password on the command line or in an
environment variable is a security risk (because of `ps`), but I do not
understand why `psql` doesn't have an option like `--password-command`
accepting a command which then prints the password on stdout.  For
example, I could then use `pass` (https://www.passwordstore.org/) with
gpg-agent.

Is there any risk associated with this usage pattern?  What is the
recommended practice in my case other than using `.pgpass`?

Thanks in advance,

P.S. Please CC me in replies, since I'm not subscribed to the list.
Thanks.

--
Marcin Borkowski
https://mbork.pl
https://crimsonelevendelightpetrichor.net/




Re: Support for dates before 4713 BC

2024-10-16 Thread Richards, Nina
Dear all,


Like Alex, I'm involved with the OpenAtlas system (https://openatlas.eu), and in 
a recent meeting the issue about tracking dates before 4713 BC came up again. 
We appreciate all the work you put into developing PostgreSQL and love working 
with it. But as already mentioned in the discussion two years ago by Alex and 
Stefan Eichert (Message ID 
https://www.postgresql.org/message-id/CAJuUXVeNQWm9x7d2_cvzj2rEvtQgNeGF7vog9dmaN1W5e7EDCQ%40mail.gmail.com),
 working with dates before 4713 would facilitate our work tremendously.

Even though we know there was no calendar at that time, it would make our work 
much easier if we could use data before 4713 BC in the same way. Especially for 
statistical analyses and scientific dating methods (14C, dendrochronology), 
this would be a great benefit for us as well as the users of OpenAtlas.


Is there, by any chance, any news about this issue? Or are you aware of any 
suitable workarounds on the database level in the meantime?


Thank you so much in advance for all your work. It is highly appreciated! All 
the best,

Nina





Nina Richards MA BSc

Technische Mitarbeiterin OpenAtlas|Technical Staff OpenAtlas


Austrian Centre for Digital Humanities and Cultural Heritage Österreichische 
Akademie der Wissenschaften | Austrian Academy of Sciences


Bäckerstraße 13, 1010 Wien, Österreich | Vienna, Austria


https://www.oeaw.ac.at/acdh/
https://openatlas.eu






What are best practices wrt passwords?

2024-10-16 Thread mbork
Hello all,

I'd like to be able to use psql without typing passwords again and
again.  I know about `.pgpass` and PGPASSFILE, but I specifically do not
want to use it - I have the password in the `.env` file, and having it
in _two_ places comes with its own set of problems, like how to make
sure they don't get out of sync.

I understand why giving the password on the command line or in an
environment variable is a security risk (because of `ps`), but I do not
understand why `psql` doesn't have an option like `--password-command`
accepting a command which then prints the password on stdout.  For
example, I could then use `pass` (https://www.passwordstore.org/) with
gpg-agent.

Is there any risk associated with this usage pattern?  What is the
recommended practice in my case other than using `.pgpass`?

Thanks in advance,

P.S. Please CC me in replies, since I'm not subscribed to the list.
Thanks.

-- 
Marcin Borkowski
https://mbork.pl
https://crimsonelevendelightpetrichor.net/




Re: What are best practices wrt passwords?

2024-10-16 Thread Alvaro Herrera
On 2024-Oct-16, mb...@mbork.pl wrote:

> I understand why giving the password on the command line or in an
> environment variable is a security risk (because of `ps`), but I do not
> understand why `psql` doesn't have an option like `--password-command`
> accepting a command which then prints the password on stdout.  For
> example, I could then use `pass` (https://www.passwordstore.org/) with
> gpg-agent.

We had a patch to add PGPASSCOMMAND once:
https://www.postgresql.org/message-id/flat/CAE35ztOGZqgwae3mBA%3DL97pSg3kvin2xycQh%3Dir%3D5NiwCApiYQ%40mail.gmail.com

I don't remember the overall conclusions (other than the patch being
rejected), but maybe you can give that a read.
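
In the meantime, a rough workaround along the lines you describe (assuming
`pass` and gpg-agent are already set up; the entry and host names below are
made up, and the variable is still visible in /proc/<pid>/environ to your own
user and to root) is to export the password only for the lifetime of the
command:

PGPASSWORD="$(pass show db/myuser)" psql -h myhost -U myuser -d mydb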

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: What are best practices wrt passwords?

2024-10-16 Thread Dominique Devienne
On Wed, Oct 16, 2024 at 2:25 PM  wrote:
> I'd like to be able to use psql without typing passwords again and
> again.  I know about `.pgpass` and PGPASSFILE, but I specifically do not
> want to use it - I have the password in the `.env` file, and having it
> in _two_ places comes with its own set of problems, like how to make
> sure they don't get out of sync.

What's wrong with PGPASSWORD?
https://www.postgresql.org/docs/current/libpq-envars.html

> I understand why giving the password on the command line or in an
> environment variable is a security risk (because of `ps`), but I do not
> understand why `psql` doesn't have an option like `--password-command`
> accepting a command which then prints the password on stdout.  For
> example, I could then use `pass` (https://www.passwordstore.org/) with
> gpg-agent.

It's not psql, it's libpq, that does that, FTR.
My own apps are libpq based, and inherit all its env-vars and defaults.

But I'd welcome a way to store password encrypted,
unlike the current mechanisms. And what you propose
would allow that I guess, if I understand correctly. So +1.
(and since it's transient, that's better than encrypted/obfuscated passwords)

> Is there any risk associated with this usage pattern?  What is the
> recommended practice in my case other than using `.pgpass`?

Storing password in plain text? --DD




Re: Query performance issue

2024-10-16 Thread yudhi s
On Thu, Oct 17, 2024 at 3:06 AM Peter J. Holzer  wrote:

> On 2024-10-16 23:20:36 +0530, yudhi s wrote:
> > Below is a query which is running for ~40 seconds.
> [...]
> > In the execution path below , the line number marked in bold are the top
> lines
> > for the IN and NOT IN subquery evaluation and they are showing "Actual
> time" as
> >  Approx ~9 seconds and ~8 seconds and they seems to be summed up and the
> top
> > lines showing it to be ~19 seconds. Then onwards it keeps on increasing
> with
> > other "nested loop" joins.
> >
> > Note:- This query is running on a MYSQL 8.0 database. So I'm wondering
> if there
> > is any mysql list similar to Oracle list , in which i can share this
> issue?
>
> The execution plan looks like a postgresql execution plan, not a mysql
> execution plan. Did you run this query on postgresql? That may be
> interesting for comparison purposes, but ultimately it is useless: You
> won't get mysql to work like postgresql, and any tips to speed up this
> query on postgresql (which is all you can expect on a postgresql mailing
> list) probably won't work on mysql.
>
>
>

Agreed. Postgres and MySQL may have differences in how the optimizer
interprets the stats and comes up with the execution path. However, I was
looking at whether the query can be written more efficiently by tweaking the
current logic. It's actually spending the majority of the time doing the "IN"
and "NOT IN" evaluation, and both of them use the same exact "UNION ALL"
subquery. And the overall execution time is the summation of the IN and NOT
IN clause evaluation. So I was thinking of a better way of writing the same
logically.
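
For instance, a rough sketch of that rewrite (table and column names
abbreviated as in the gist; it assumes CTEs, which both MySQL 8.0 and Postgres
support, and it swaps the NOT IN for NOT EXISTS, which is only equivalent when
the compared columns are not nullable):

WITH ids (MTNE_ID) AS (
    SELECT MI1.MTNE_ID    FROM M_INF mi1     WHERE MI1.AID   = :AID
    UNION
    SELECT rg.RG_MF_SK_ID FROM RG_M_F_INF rg WHERE rg.AS_ID  = :AID
    UNION
    SELECT fti.FT_SRK_ID  FROM M_FT fti      WHERE fti.AS_ID = :AID
)
SELECT RCE.DRV_DT, RCE.AID, RCE.R_CON_ID, RCE.R_CON_VER_NB
FROM   R_CON_E RCE
WHERE  RCE.MTNE_ID IN (SELECT MTNE_ID FROM ids)
AND    NOT EXISTS (
           SELECT 1
           FROM   R_CON_E RCE_NS
           WHERE  RCE_NS.MTNE_ID NOT IN (SELECT MTNE_ID FROM ids)
           AND    RCE_NS.DRV_DT       = RCE.DRV_DT
           AND    RCE_NS.AID          = RCE.AID
           AND    RCE_NS.R_CON_ID     = RCE.R_CON_ID
           AND    RCE_NS.R_CON_VER_NB = RCE.R_CON_VER_NB
       );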


Re: how to know if the sql will run a seq scan

2024-10-16 Thread Vijaykumar Jain
>
>
>
> I don't have time now to create an example, but I can point you at:
>
>
> https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE
>
> 5.12.2.2. Partition Maintenance
>
> "As an alternative to creating a new partition, it is sometimes more
> convenient to create a new table separate from the partition structure
> and attach it as a partition later. ... "
>
> Read the section starting above.
>

Thanks Adrian,
I tried similar things as in the doc, and I found my mistake.
I had carelessly not put a NOT NULL constraint on the partition column. As a
result, I think it still has to scan to rule out NULL values, despite the
boundary constraint.

Once I used NOT NULL for the partition column, the scans stopped.
My question was really asking something else, but maybe I will not be able
to put it clearly.
At least this part got addressed well. Thanks for your help and time.



/*
postgres=# drop table t;
DROP TABLE
postgres=# drop table t1;
ERROR:  table "t1" does not exist
postgres=# create table t(col1 int) partition by list(col1);
CREATE TABLE
postgres=# create table t1 (LIKE t INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
CREATE TABLE
postgres=# insert into t1 select 0 from generate_series(1, 10) x;
INSERT 0 10
postgres=# alter table t1 add constraint col10 check (col1 = 0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        1 | 2024-10-17 05:33:40.080319+00 | -00:00:02.488752 |           10
(1 row)

postgres=# alter table t attach partition t1 for values in (0);  -- note the scan count increased as col1 was not set to not null
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        2 | 2024-10-17 05:33:48.248001+00 | -00:00:01.510694 |           20
(1 row)

postgres=# drop table t;
DROP TABLE
postgres=# drop table t1;
ERROR:  table "t1" does not exist
postgres=# create table t(col1 int not null) partition by list(col1);
CREATE TABLE
postgres=# create table t1 (LIKE t INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
CREATE TABLE
postgres=# insert into t1 select 0 from generate_series(1, 10) x;
INSERT 0 10
postgres=# alter table t1 add constraint col10 check (col1 = 0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age       | seq_tup_read
---------+----------+-------------------------------+-----------------+--------------
 t1      |        1 | 2024-10-17 05:34:41.363401+00 | -00:00:03.19836 |           10
(1 row)

postgres=# alter table t attach partition t1 for values in (0);  -- note no scan count bump, as not null on the column avoids scanning to filter those values
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1';
 relname | seq_scan |         last_seq_scan         |       age        | seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        1 | 2024-10-17 05:34:41.363401+00 | -00:00:08.241812 |           10
(1 row)

*/


Re: Support for dates before 4713 BC

2024-10-16 Thread Asad Ali
Hi Nina Richards,

Instead of using PostgreSQL's DATE or TIMESTAMP types, you can store years
as NUMERIC or BIGINT values.
You can manually represent dates before 4713 BC and build custom functions
for date operations like addition, subtraction, or comparison.

To facilitate comparisons or operations on your custom dates (BC/AD), you
can create user-defined functions in PostgreSQL for adding, subtracting, or
comparing dates.
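
A minimal sketch of that idea (all names here are invented; it uses
astronomical year numbering, so 1 BC = 0 and 4713 BC = -4712, which keeps
ordinary integer comparison and arithmetic working):

CREATE TABLE finds (
    id        bigserial PRIMARY KEY,
    year_astr bigint NOT NULL,      -- e.g. -13999 for 14000 BC (a 14C date)
    label     text
);

-- "everything older than 10000 BC" becomes a plain integer comparison
SELECT label FROM finds WHERE year_astr < -9999 ORDER BY year_astr;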

Best Regards,

Asad Ali


On Thu, Oct 17, 2024 at 5:36 AM Tom Lane  wrote:

> David Rowley  writes:
> > It's by no means a trivial thing to do, but it is possible to
> > implement new types in PostgreSQL [1]. If you invented your own type,
> > you could significantly widen the upper and lower bounds when compared
> > with the standard date type.
>
> However, you'd then have to reimplement some large fraction of the
> existing datetime support to have something useful.
>
> We're already inventing freely to use the Gregorian calendar for
> millenia before Pope Gregory lived, so I see no conceptual argument
> not to extend that back even further.
>
> IIRC the stumbling block for not going back past Julian day 0 was
> uncertainty about whether the date2j and j2date algorithms behave
> correctly for negative Julian dates --- which at the time was
> compounded by the fact that C90 was vague about the rounding direction
> for integer division with negative inputs.  Now that we assume C99
> with its well-defined rule for that, at least some of the uncertainty
> is gone.  Somebody would still have to study that code and either
> prove that it's OK or correct it.  And then there would be a
> nontrivial amount of effort to work outwards and fix anything else
> that is assuming that limitation.  So it would take somebody with
> considerable motivation to make it happen, but if such a somebody were
> to appear with a patch, we'd likely take it.  (To be clear, I doubt
> any of the principal current hackers are interested in doing this.)
>
> Now, this would still only get you to a lower-bound date somewhere
> around 30 BC.  If you need to deal with geological or astronomical
> time spans, then yeah you need a new type --- but presumably you would
> not feel a need to tie it to Gregorian calendar dates, so the need to
> reimplement a ton of related logic would not be there.
>
> regards, tom lane
>
>
>


Merging logical subscriptions

2024-10-16 Thread Michał Kłeczek
Hi,

I have a question about an algorithm to “merge” two logical replication 
subscriptions. Here is the scenario:

We have a server with many tables replicated using logical replication to 
downstream replicas.
Each table is published with a separate publication.

At the moment each replica uses a single subscription with many publications.

The problem is that a subscription is a unit of concurrency in logical 
replication and sometimes we need to process replication streams in parallel.
“Splitting” subscriptions so that groups of publications are processed in 
parallel is (quite) easy:

1. Disable existing subscription
2. Clone replication slot
3. Create a new subscription with a subset of first subscription publications 
(don’t create a replication slot, use the replication slot created in step 2, 
don’t copy data)
4. Drop the subset of publications from the first subscription.
5. Enable and refresh the first subscription
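
For reference, steps 1-5 in rough SQL (all object names here are invented;
pg_copy_logical_replication_slot requires PG 12 or later, and the source slot
is normally named after the subscription):

-- step 1, on the subscriber
ALTER SUBSCRIPTION sub1 DISABLE;

-- step 2, on the publisher
SELECT pg_copy_logical_replication_slot('sub1', 'sub1_b');

-- step 3, on the subscriber: reuse the cloned slot, no new slot, no initial copy
CREATE SUBSCRIPTION sub1_b
    CONNECTION 'host=publisher dbname=appdb'
    PUBLICATION pub_x, pub_y
    WITH (create_slot = false, slot_name = 'sub1_b', copy_data = false);

-- steps 4 and 5, on the subscriber
ALTER SUBSCRIPTION sub1 SET PUBLICATION pub_a, pub_b WITH (refresh = false);
ALTER SUBSCRIPTION sub1 ENABLE;
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION WITH (copy_data = false);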

But is there a procedure to “merge” subscriptions?

The problem I see is that there is no way to synchronise two subscriptions so 
that they stop at the same LSN.
We can skip transactions by advancing the LSN. But we cannot process 
transactions until a specific LSN.

Any ideas?

Thanks,
Michal



Re: Backup

2024-10-16 Thread Muhammad Usman Khan
Hi,
you can refer to the following link, where many backup methods are mentioned
according to your need.

Backups in PostgreSQL. In PostgreSQL, different types of… | by Usman Khan |
Aug, 2024 | Medium



On Thu, 17 Oct 2024 at 00:37, Andy Hartman  wrote:

> I am very new to Postgres and have always worked in the mssql world. I'm
> looking for suggestions on DB backups. I currently have a DB used to store
> Historical information that has images it's currently around 100gig.
>
> I'm looking to take a monthly backup as I archive a month of data at a
> time. I am looking for it to be compressed and have a machine that has
> multiple cpu's and ample memory.
>
> Suggestions on things I can try ?
> I did a pg_dump using these parms
> --format=t --blobs lobarch
>
> it ran my device out of storage:
>
> pg_dump: error: could not write to output file: No space left on device
>
> I have 150gig free on my backup drive... can obviously add more
>
> looking for the quickest and smallest backup file output...
>
> Thanks again for help\suggestions
>
>


Re: Logical replication disabled, recovery not possible because of 1 large transaction with schema changes?

2024-10-16 Thread Muhammad Usman Khan
Hi,
When you execute schema-altering operations on the publisher, these changes
are not automatically replicated to the subscriber, which causes the
following error:
logical replication target relation "public.dummy_table" is missing
replicated columns: "contact_email", "status", "phone_number", "username"

Before making schema changes, temporarily disable the subscription to
prevent replication errors.
ALTER SUBSCRIPTION your_subscription_name DISABLE;

Manually apply the same schema modifications to the subscriber database to
ensure alignment.

Once the schema changes are applied to both databases, re-enable the
subscription:
ALTER SUBSCRIPTION your_subscription_name ENABLE;


On Thu, 17 Oct 2024 at 02:59, Koen De Groote  wrote:

> If this question is more suitable for another mailing list, please let me
> know.
>
> I've set up the following table on both publisher and subscriber, both are
> pg16:
>
> CREATE TABLE dummy_table (
> id SERIAL PRIMARY KEY,
> name VARCHAR(100) NOT NULL,
> email VARCHAR(100) UNIQUE NOT NULL,
> age INT,
> created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
> );
>
> Added to publication, refreshed subscription.
>
> Add some data on the publisher side:
> INSERT INTO dummy_table (name, email, age)
> VALUES
> ('John Doe', 'john@example.com', 25),
> ('Jane Smith', 'jane.sm...@example.com', 30),
> ('Michael Johnson', 'michae...@example.com', 45),
> ('Emily Davis', 'emil...@example.com', 27),
> ('Robert Brown', 'robert.br...@example.com', 40);
>
> The data can be seen on the subscriber. So far, so good.
>
> I then execute the following patch on the publisher:
> https://gist.github.com/KoenDG/d1c06d8c740c64e4e5884d0c64b81f11
>
> It is a single transaction that does the following:
>
> 1/ Insert data, 1000 items
> 2/ Drop a column
> 3/ Alter a column name
> 4/ Add 2 columns, nullable
> 5/ Add a column and give it a unique constraint
> 6/ Update values for a column with NULL values, added in step 4.
> 7/ Set the column updated in step 6 to be NOT NULL
> 8/ Create a unique index with the columns from step 3 and 6
> 9/ Insert a column with a default value
> 10/ Insert data for this schema, another 1000 items.
>
> The subscription disabled, this is to be expected, there are new columns
> names, the schema needs to be updated on the subscriber side.
>
> However, it seems I'm stuck.
>
> I can't enable the subscription. This is to be expected, it will try to
> resume and run into the same issues.
>
> Ok, I update the schema and enable again. It runs into an error for the
> inserts of step 1. These set values for columns dropped in step 2.
>
> I revert to the old schema and enable again. It runs into an error again,
> this time for values that don't exist yet at step 1.
>
> I tried dropping the table at the subscriber side, recreating the correct
> schema, but this runs into the same error.
>
> I remove the table from the publication and retry. Same error. Even with
> the table no longer in the publication, and the table on the subscriber
> side dropped and re-created, I'm still getting the exact same errors of
> "logical replication target relation "public.dummy_table" is missing
> replicated columns: "contact_email", "status", "phone_number", "username""
>
>
> The only solution I've found is to drop the table from the publication,
> and then drop the entire subscription and set it back up again, with the
> correct schema.
>
> Am I making a mistake? Or does putting all these commands in a single
> transaction ruin my chances?
>
> Clarification much appreciated.
>
> Regards,
> Koen De Groote
>
>


Re: Backup

2024-10-16 Thread Asad Ali
Hi Andy,

I hope you're doing well. Based on your inquiry about PostgreSQL backups
for your 100GB historical database with images, here are some suggestions
that should help you achieve compressed, efficient backups without running
into storage issues.

*1. Use Custom/Directory Format with Compression*
A more efficient option would be to use the custom format (-Fc) or the
directory format (-Fd) with compression. You can also adjust the compression
level, and with the directory format make use of your machine's multiple CPUs
by using parallel jobs:

pg_dump -Fd -Z 9 -j 4 --blobs -f /path/to/backup/dir your_database_name

   - -Fd: Directory format (supports compression and parallel dumps; -Fc is
     similar but writes a single file and cannot be parallelized).
   - -Z 9: Maximum compression level (0-9 scale).
   - -j 4: Number of parallel jobs (adjust based on CPU cores; requires -Fd).
   - --blobs: Includes large objects (important for your images).

This approach should give you a smaller backup and faster performance.

*2. Splitting Backups into Parts*
If you're concerned about running out of storage space, consider splitting
the backup by table or schema, allowing more control over the backup size:

pg_dump -Fc --table=my_large_table -f /path/to/backup/my_large_table.dump
your_database_name
pg_dump -Fc --table=other_table -f /path/to/backup/other_table.dump
your_database_name

This can be helpful when you archive different tables or sections of data.

*3. External Compression Tools*
If you need additional compression, you can pipe the pg_dump output through
an external compression tool like gzip:

pg_dump -Fc --blobs your_database_name | gzip > /path/to/backup/file.dump.gz

This should further reduce the size of your backups.

*4. Consider Alternative Backup Methods*
- Explore other backup methods like `*pgBackRest` or `WAL-E`*. These tools
are specifically designed for PostgreSQL backups and offer features like
incremental backups and point-in-time recovery
pgbackrest --stanza=your-database --type=full --compress-type=zst
--compress-level=6 --process-max=4 backup

- You can use *pg_basebackup* for PostgreSQL backups, but it has
limitations compared to tools like pgBackRest. While pg_basebackup is easy
to use and built-in with PostgreSQL, it is primarily designed for physical
backups (base backups) and doesn't offer as many advanced features such as
incremental backups, sophisticated compression, or parallelism.

However, it does support basic compression and can be used for full backups.

pg_basebackup -D /path/to/backup/directory -F t -z -Z 9 -P -X stream

   - -D: The destination directory for the backup.
   - -F t: Specifies the tar format for the backup, which is required for
   compression.
   - -z: Compresses the output.
   - -Z 9: Compression level (0–9, where 9 is the highest).
   - -P: Shows the progress of the backup.
   - -X stream: Includes the WAL files needed to make the backup consistent
   (important for recovery).

pg_basebackup -D /backups/db_backup -F t -z -Z 9 -P -X stream
This command will take a full physical backup of the database, compress the
output using gzip, and store the backup in a tarball.

*5. Automating Backups*
Since you need monthly backups, I recommend automating this process with a
cron job. For example, you can set this up to run on the 1st of every month
at 2 AM:

0 2 1 * * /usr/bin/pg_dump -Fd -Z 9 -j 4 --blobs -f /path/to/backup/dir your_database_name

*6. Monitoring Disk Usage & * *Backup Performance*
Finally, it's important to monitor your available storage. You can either
ensure you have enough free space or consider moving older backups to
external or cloud storage to free up space.
Use monitoring tools to track the performance of your backups. This will
help you identify any potential bottlenecks and optimize the backup process.

I hope this helps you create smaller and quicker backups for your
PostgreSQL database. Let me know if you have any questions or need further
assistance!

Best regards,

Asad Ali


On Thu, Oct 17, 2024 at 12:38 AM Andy Hartman 
wrote:

> I am very new to Postgres and have always worked in the mssql world. I'm
> looking for suggestions on DB backups. I currently have a DB used to store
> Historical information that has images it's currently around 100gig.
>
> I'm looking to take a monthly backup as I archive a month of data at a
> time. I am looking for it to be compressed and have a machine that has
> multiple cpu's and ample memory.
>
> Suggestions on things I can try ?
> I did a pg_dump using these parms
> --format=t --blobs lobarch
>
> it ran my device out of storage:
>
> pg_dump: error: could not write to output file: No space left on device
>
> I have 150gig free on my backup drive... can obviously add more
>
> looking for the quickest and smallest backup file output...
>
> Thanks again for help\suggestions
>
>


Query performance issue

2024-10-16 Thread yudhi s
Hi,

The below question came up in a discussion. I'd appreciate any guidance on this.

Below is a query which is running for ~40 seconds. As it's a query which is
executed from the UI, we were expecting it to finish in <~5 seconds. It has an
"IN" and a "NOT IN" subquery, and from the execution path it seems the total
response time is mainly the sum of the "IN" and the "NOT IN" subquery
sections. My thought was that both "IN" and "NOT IN" should be
executed/evaluated in parallel, not in serial fashion.

In the execution path below , the line number marked in *bold* are the top
lines for the IN and NOT IN subquery evaluation and they are showing
"Actual time" as  Approx ~9 seconds and ~8 seconds and they seems to be
summed up and the top lines showing it to be ~19 seconds. Then onwards it
keeps on increasing with other "nested loop" joins.

*Note*:- This query is running on a MYSQL 8.0 database. So I'm wondering if
there is any mysql list similar to Oracle list , in which i can share this
issue?

Added the query in below path:-

https://gist.github.com/databasetech0073/95bce00c3a6bd4ae8d195401e0383185

SELECT ..
FROM R_CON_ESTS RC_STS,
     R_CON rc,
     D_LKP D_LKP_STS,
     D_LKP D_LKP_FRQ,
     (select RCE.DRV_DT, RCE.AID, RCE.R_CON_ID, RCE.R_CON_VER_NB
        from R_CON_E RCE
       where RCE.MTNE_ID in (SELECT MI1.MTNE_ID
                               FROM M_INF mi1 WHERE MI1.AID = :AID
                              UNION
                             SELECT rg.RG_MF_SK_ID
                               from RG_M_F_INF rg where rg.AS_ID = :AID
                              UNION
                             SELECT fti.FT_SRK_ID
                               from M_FT fti where fti.AS_ID = :AID)
         and (RCE.DRV_DT, RCE.AID, RCE.R_CON_ID, RCE.R_CON_ver_nb) NOT IN
             (SELECT RCE_NS.DRV_DT, RCE_NS.AID, RCE_NS.R_CON_ID, RCE_NS.R_CON_VER_NB
                FROM R_CON_E RCE_NS
               WHERE RCE_NS.MTNE_ID NOT IN (select MI2.MTNE_ID
                                              from M_INF MI2 where MI2.AID = :AID
                                             UNION
                                            SELECT RG2.RG_MF_SK_ID
                                              from RG_M_F_INF RG2 where RG2.AS_ID = :AID
                                             UNION
                                            SELECT FTI1.FT_SRK_ID
                                              from M_FT FTI1 where FTI1.AS_ID = :AID))
     ) b
where RC_STS.RR_FRQ_NB = D_LKP_FRQ.D_LKP_NB
  and RC_STS.R_CON_ESTS_NB = D_LKP_STS.D_LKP_NB
  and RC_STS.R_CON_ID = rc.R_CON_ID
  and RC_STS.R_CON_VER_NB = rc.R_CON_VER_NB
  and RC_STS.AID = rc.AID
  and RC_STS.AID = b.AID
  and RC_STS.R_CON_ID = b.R_CON_ID
  and RC_STS.R_CON_VER_NB = b.R_CON_VER_NB
order by 3, 4, 2;




-> Sort: RC_STS.R_CON_ID, RC_STS.R_CON_VER_NB, RC_STS.R_EX_RID  (actual time=44392.655..44644.844 rows=745483 loops=1)
    -> Stream results  (cost=311479029610.37 rows=860847650219) (actual time=8957.556..42133.969 rows=745483 loops=1)
        -> Nested loop inner join  (cost=311479029610.37 rows=860847650219) (actual time=8957.548..40891.903 rows=745483 loops=1)
            -> Nested loop inner join  (cost=225393084569.25 rows=860847650219) (actual time=8957.541..40597.741 rows=745483 loops=1)
                -> Nested loop inner join  (cost=139307139528.12 rows=860847650219) (actual time=8957.530..40092.267 rows=745483 loops=1)
                    -> Nested loop antijoin  (cost=53221194487.00 rows=532199430400) (actual time=8957.477..29529.382 rows=671352 loops=1)
                        *-> Nested loop inner join  (cost=886687.00 rows=729520) (actual time=0.123..19714.306 rows=692583 loops=1)*
                            *-> Filter: (RCE.MTNE_ID,(select #3))  (cost=84215.00 rows=729520) (actual time=0.085..9045.124 rows=692583 loops=1)*
                                -> Covering index scan on RCE using R_58  (cost=84215.00 rows=729520) (actual time=0.055..534.110 rows=742706 loops=1)
                                -> Select #3 (subquery in condition; dependent)
                                    -> Limit: 1 row(s)  (cost=4.41..4.41 rows=1) (actual time=0.010..0.010 rows=1 loops=742706)
                                        -> Table scan on   (cost=4.41..5.70 rows=2) (actual time=0.010..0.010 rows=1 loops=742706)
                                            -> Union materialize with deduplication  (cost=3.18..3.18 rows=2) (actual time=0.010..0.010 rows=1 loops=742706)
                                                -> Limit table size: 1 unique row(s)
                                                    -> Limit: 1 row(s)  (cost=1.13 rows=1) (actual time=0.006..0.006 rows=1 loops=742706)
                                                        -> Covering index lookup on mi1 using M_INF_AID_index (AID='XXX', MTNE_ID=(RCE.MTNE_ID))  (cost=1.13 rows=1) (actual time=0.006..0.006 rows=1 loops=742706)
                                                -> Limit table size: 1 unique row(s)
                                                    -> Limit: 1 row(s)  (cost=1.10 rows=1) (actual time=0.003..0.003 rows=1 loops=132294)
                                                        -> Single-row covering index lookup on rg using PRIMARY (RG_MF_SK_ID=(RCE.MTNE_ID), AS_ID='XXX')  (cost=1.10 rows=1) (actual time=0.003..0.003 rows=1 loops=132294)
                                                -> Limit table size: 1 unique row(s)
                                                    -> Limit: 1 row(s)  (cost=0.74 rows

Re: What are best practices wrt passwords?

2024-10-16 Thread Matthew Tice



> On Oct 16, 2024, at 10:50 AM, Christophe Pettus  wrote:
> 
> 
> 
>> On Oct 16, 2024, at 09:47, Tom Lane  wrote:
>> I believe it depends on your platform --- some BSDen are pretty
>> permissive about this, if memory serves.  On a Linux box it seems
>> to work for processes owned by yourself even if you're not superuser.
> 
> I just tried it on an (admittedly kind of old) Ubuntu system and MacOS 14, 
> and it looks like shows everything owned by everyone, even from a non-sudoer 
> user.
> 
Interesting, that’s not my experience.  Only root can see the env variables of 
another user.

Terminal 1

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/";
SUPPORT_URL="https://help.ubuntu.com/";
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/";
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy";
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ whoami
testusr

$ export FOOBAR=true

$ bash

$ env | grep FOOBAR
FOOBAR=true

Terminal 2
$  whoami
mtice

$  ps e -U testusr | grep -c FOOBAR
0

$  sudo ps e -U testusr | grep -c FOOBAR
1





Re: Backup

2024-10-16 Thread Achilleas Mantzios


On 16/10/24 22:55, Ron Johnson wrote:
On Wed, Oct 16, 2024 at 3:37 PM Andy Hartman  
wrote:


I am very new to Postgres and have always worked in the mssql
world. I'm looking for suggestions on DB backups. I currently have
a DB used to store Historical information that has images it's
currently around 100gig.

I'm looking to take a monthly backup as I archive a month of data
at a time. I am looking for it to be compressed and have a machine
that has multiple cpu's and ample memory.

Suggestions on things I can try ?
I did a pg_dump using these parms
--format=t --blobs lobarch

it ran my device out of storage:

pg_dump: error: could not write to output file: No space left on
device

I have 150gig free on my backup drive... can obviously add more

looking for the quickest and smallest backup file output...

Thanks again for help\suggestions


Step 1: redesign your DB to *NOT* use large objects.  It's an old, 
slow and unmaintained data type.  The bytea data type is what you should use.
You mean bytea I guess. As a side note, (not a fan of LOs), I had the 
impression that certain drivers such as the JDBC support streaming for 
LOs but not for bytea? It's been a while I haven't hit the docs tho.


Step 2: show us the "before" df output, the whole pg_dump command, and 
the "after" df output when it fails. "du -c --max-depth=0 
$PGDATA/base" also very useful.


And tell us what version you're using.

--
Death to , and butter sauce.
Don't boil me, I'm still alive.
 crustacean!

Re: Backup

2024-10-16 Thread Ron Johnson
On Wed, Oct 16, 2024 at 3:52 PM Tomas Vondra  wrote:
[snip]

> You didn't specify the Postgres version - that matters, because older
> pg_dump versions (before PG 16) do not support compression. Since PG 16
> you can use either -Fc or -Fd (instead of the tar format), and it'll
> compress the output using gzip.
>

Not true.  pg_dump has had built-in gzipping of directory format backups
since at least 9.6.  Probably earlier.
-- 
Death to , and butter sauce.
Don't boil me, I'm still alive.
 crustacean!


Re: Backup

2024-10-16 Thread Ron Johnson
On Wed, Oct 16, 2024 at 4:00 PM Achilleas Mantzios <
a.mantz...@cloud.gatewaynet.com> wrote:

> On 16/10/24 22:55, Ron Johnson wrote:
>
> On Wed, Oct 16, 2024 at 3:37 PM Andy Hartman 
> wrote:
>
>> I am very new to Postgres and have always worked in the mssql world. I'm
>> looking for suggestions on DB backups. I currently have a DB used to store
>> Historical information that has images it's currently around 100gig.
>>
>> I'm looking to take a monthly backup as I archive a month of data at a
>> time. I am looking for it to be compressed and have a machine that has
>> multiple cpu's and ample memory.
>>
>> Suggestions on things I can try ?
>> I did a pg_dump using these parms
>> --format=t --blobs lobarch
>>
>> it ran my device out of storage:
>>
>> pg_dump: error: could not write to output file: No space left on device
>>
>> I have 150gig free on my backup drive... can obviously add more
>>
>> looking for the quickest and smallest backup file output...
>>
>> Thanks again for help\suggestions
>>
>
> Step 1: redesign your DB to *NOT* use large objects.  It's an old, slow
> and unmaintained data type.  The bytea data type is what you should use.
>
> You mean bytea I guess. As a side note, (not a fan of LOs), I had the
> impression that certain drivers such as the JDBC support streaming for LOs
> but not for bytea? It's been a while I haven't hit the docs tho.
>

Our database is stuffed with images in bytea fields.  The Java application
uses JDBC and handles them just fine.

-- 
Death to , and butter sauce.
Don't boil me, I'm still alive.
 crustacean!


Re: Backup

2024-10-16 Thread Adrian Klaver

On 10/16/24 12:52, Tomas Vondra wrote:

On 10/16/24 21:37, Andy Hartman wrote:

I am very new to Postgres and have always worked in the mssql world. I'm



You didn't specify the Postgres version - that matters, because older
pg_dump versions (before PG 16) do not support compression. Since PG 16
you can use either -Fc or -Fd (instead of the tar format), and it'll
compress the output using gzip.


Oh, it goes back further then that:

https://www.postgresql.org/docs/7.1/app-pgdump.html

"-F format, --format=format

Format can be one of the following:

p

output a plain text SQL script file (default)
t

output a tar archive suitable for input into pg_restore. Using 
this archive format allows reordering and/or exclusion of schema 
elements at the time the database is restored. It is also possible to 
limit which data is reloaded at restore time.

c

output a custom archive suitable for input into pg_restore. 
This is the most flexible format in that it allows reordering of data 
load as well as schema elements. This format is also compressed by default.


"



Alternatively, you can use --compress=method:level (the supported
methods depend on how the packages were built, no idea what platform
you're on etc.). See

   https://www.postgresql.org/docs/current/app-pgdump.html

If you're on older version, you should be able to write the dump to
standard output, and compress that way. Something like

   pg_dump -Fc | gzip -c > compressed.dump.gz

However, be aware that pg_dump is more an export tool than a backup
suitable for large databases / quick recovery. It won't allow doing PITR
and similar stuff.


regards



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: serializable master and non-serializable hot standby: feasible set up?

2024-10-16 Thread Jacob Biesinger
>
> Just out of curiosity, what’s the use case for this?
>
>
We use the read-only replica as a data source for analytics workflows. The
setup allows us to have fresh data without affecting performance of the
production environment.


Re: serializable master and non-serializable hot standby: feasible set up?

2024-10-16 Thread Jacob Biesinger
On Tue, Oct 15, 2024 at 9:23 PM Laurenz Albe 
wrote:

> On Tue, 2024-10-15 at 16:27 -0700, Jacob Biesinger wrote:
> > *would you* expect to be able to stand up a `repeatable read` replica
> against a
> > `serializable` master? My expectation is that you'd simply change the
> setting in
> > a .conf file on the replica and be good to go; is there something that
> would make
> > this process really difficult / impossible?
>
> I expect that to work fine, at least I cannot think of a problem with such
> a setup.
> But I have been wrong before, so test it.
>

The setup (serializable master, repeatable read replica) definitely works
-- we've been running that way for over a year now. I guess I'm really
asking "how would you go about getting the replica into the appropriate
state?" Would you expect to have to downgrade the master's isolation level
as I describe? Or would you expect to be able to stand up the replica using
a modified conf file initially?
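
(Concretely, the conf-file route I have in mind is nothing more than setting
the following on the standby only and reloading -- the exact mechanics here
are my assumption:)

# postgresql.conf on the standby
default_transaction_isolation = 'repeatable read'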

Thanks as always for your help!


Re: Support for dates before 4713 BC

2024-10-16 Thread David Rowley
On Thu, 17 Oct 2024 at 01:26, Richards, Nina  wrote:
> Even though we know there was no calendar at that time, it would make our 
> work much easier if we could use data before 4713 BC in the same way. 
> Especially for statistical analyses and scientific dating methods (14C, 
> dendrochronology), this would be a great benefit for us as well as the users 
> of OpenAtlas.
>
>
> Are there, by any chance, news about this issue? Or are you aware of any 
> suitable workarounds on the database level in the meantime?

It's by no means a trivial thing to do, but it is possible to
implement new types in PostgreSQL [1]. If you invented your own type,
you could significantly widen the upper and lower bounds when compared
with the standard date type.

David

[1] https://www.postgresql.org/docs/current/sql-createtype.html




Re: Support for dates before 4713 BC

2024-10-16 Thread Tom Lane
David Rowley  writes:
> It's by no means a trivial thing to do, but it is possible to
> implement new types in PostgreSQL [1]. If you invented your own type,
> you could significantly widen the upper and lower bounds when compared
> with the standard date type.

However, you'd then have to reimplement some large fraction of the
existing datetime support to have something useful.

We're already inventing freely to use the Gregorian calendar for
millenia before Pope Gregory lived, so I see no conceptual argument
not to extend that back even further.

IIRC the stumbling block for not going back past Julian day 0 was
uncertainty about whether the date2j and j2date algorithms behave
correctly for negative Julian dates --- which at the time was
compounded by the fact that C90 was vague about the rounding direction
for integer division with negative inputs.  Now that we assume C99
with its well-defined rule for that, at least some of the uncertainty
is gone.  Somebody would still have to study that code and either
prove that it's OK or correct it.  And then there would be a
nontrivial amount of effort to work outwards and fix anything else
that is assuming that limitation.  So it would take somebody with
considerable motivation to make it happen, but if such a somebody were
to appear with a patch, we'd likely take it.  (To be clear, I doubt
any of the principal current hackers are interested in doing this.)

Now, this would still only get you to a lower-bound date somewhere
around 30 BC.  If you need to deal with geological or astronomical
time spans, then yeah you need a new type --- but presumably you would
not feel a need to tie it to Gregorian calendar dates, so the need to
reimplement a ton of related logic would not be there.

regards, tom lane




Re: Help in dealing with OOM

2024-10-16 Thread Siraj G
Thanks Joe, I will set these kernel parameters.
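
(For reference, what I understand those settings to be -- the overcommit_ratio
value here is just an illustrative starting point:)

sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=90
# persist them in /etc/sysctl.d/ so they survive a reboot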

I also would like to highlight that the issue happened on SECONDARY. While
the PRIMARY has less memory and computation in comparison to SECONDARY, not
sure if there is anything wrong in the PgSQL.

PRIMARY: 48vCPUs & 48GB memory
SECONDARY: 64vCPUs & 64GB memory

I noticed a few things which do not sound tidy:
1. Total number of DBs are: 1860  (DB environment serves a product that has
tenants - around 1100 tenants which means these many DBs are active)
 : Is there any metric for optimal performance on the number of DBs we
should have per instance? I would assume NO (and it should be purely based
on the overall operations), but just a question out of curiosity.
2. max_connections is set to 1.
I tried to reduce it to 4000 but was unable to do so (I tried this after
reducing the max_connections in PRIMARY to 4000). This is the error:
FATAL:  hot standby is not possible because max_connections = 4000 is a
lower setting than on the master server (its value was 1)

If I am clubbing multiple things, sorry for the clutter.

Regards
Siraj

On Tue, Oct 15, 2024 at 12:39 AM Joe Conway  wrote:

> On 10/14/24 14:37, Siraj G wrote:
> > This is from the OS log (/var/log/kern.log):
> >
> > oom-
> >
> kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.sli
>  ce/system-postgresql.slice/postgresql@12-main.service
> ,task=postgres,pid=2334587,uid=114
> >494 Oct 14 09:58:10 gce-k12-prod-as1-erp-pg-secondary kernel:
> > [6905020.514569] Out of memory: Killed process 2334587 (postgres) total-
> > vm:26349584kB, anon-rss:3464kB, file-rss:0kB, shmem-rs
> >   s:21813032kB, UID:114 pgtables:49024kB oom_score_adj:0
>
>
> 1. Do you happen to have swap disabled? If so, don't do that.
>
> 2. Does the postgres cgroup have memory.limit (cgroup v1) or memory.max
> (cgroup v2) set?
>
> 3. If #2 answer is no, have you followed the documented guidance here
> (in particular vm.overcommit_memory=2):
>
>
>
> https://www.postgresql.org/docs/12/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
>
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


Re: Backup

2024-10-16 Thread Achilleas Mantzios


On 16/10/24 23:02, Ron Johnson wrote:
On Wed, Oct 16, 2024 at 4:00 PM Achilleas Mantzios 
 wrote:


On 16/10/24 22:55, Ron Johnson wrote:


On Wed, Oct 16, 2024 at 3:37 PM Andy Hartman
 wrote:

I am very new to Postgres and have always worked in the mssql
world. I'm looking for suggestions on DB backups. I currently
have a DB used to store Historical information that has
images it's currently around 100gig.

I'm looking to take a monthly backup as I archive a month of
data at a time. I am looking for it to be compressed and have
a machine that has multiple cpu's and ample memory.

Suggestions on things I can try ?
I did a pg_dump using these parms
--format=t --blobs lobarch

it ran my device out of storage:

pg_dump: error: could not write to output file: No space left
on device

I have 150gig free on my backup drive... can obviously add more

looking for the quickest and smallest backup file output...

Thanks again for help\suggestions


Step 1: redesign your DB to *NOT* use large objects.  It's an
old, slow and unmaintained data type.  The bytea data type is what you
should use.

You mean bytea I guess. As a side note, (not a fan of LOs), I had
the impression that certain drivers such as the JDBC support
streaming for LOs but not for bytea? It's been a while I haven't
hit the docs tho.


Our database is stuffed with images in bytea fields.  The Java 
application uses JDBC and handles them just fine.
Likewise, but the "streaming" part is still not clear to me, unless one 
reads the newest JDBC source. Lots of problems due to image explosion, 
java app heap space exhaustion and the like.

--
Death to , and butter sauce.
Don't boil me, I'm still alive.
 crustacean!

Re: Backup

2024-10-16 Thread Peter J. Holzer
On 2024-10-16 16:02:24 -0400, Ron Johnson wrote:
> On Wed, Oct 16, 2024 at 4:00 PM Achilleas Mantzios <
> a.mantz...@cloud.gatewaynet.com> wrote:
> On 16/10/24 22:55, Ron Johnson wrote:
> On Wed, Oct 16, 2024 at 3:37 PM Andy Hartman 
> wrote:
[...]
> 
> Step 1: redesign your DB to NOT use large objects.  It's an old, slow
> and unmaintained data type.  The bytea data type is what you should use.
> 
> You mean bytea I guess. As a side note, (not a fan of LOs), I had the
> impression that certain drivers such as the JDBC support streaming for LOs
> but not for bytea? It's been a while I haven't hit the docs tho.
> 
> 
> Our database is stuffed with images in bytea fields.  The Java application 
> uses
> JDBC and handles them just fine.

Images are usually small enough (a few MB) that they don't need to be
streamed.

I don't think bytea can be streamed in general. It's just like text, you
write and read the whole thing at once.

If you have data which is too large for that and want to store it in
bytea fields, you'll probably have to chunk it yourself (which you
probably have to anyway because for me "so large it has to be streamed"
implies "at least possibly larger than 1 GB").
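
A minimal sketch of that do-it-yourself chunking (all names invented):

CREATE TABLE blob_chunks (
    blob_id  bigint,
    chunk_no integer,
    data     bytea,                       -- e.g. ~1 MB per row
    PRIMARY KEY (blob_id, chunk_no)
);

-- the client then "streams" by fetching one chunk at a time, in order
SELECT data FROM blob_chunks WHERE blob_id = 42 ORDER BY chunk_no;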

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Logical replication disabled, recovery not possible because of 1 large transaction with schema changes?

2024-10-16 Thread Koen De Groote
If this question is more suitable for another mailing list, please let me
know.

I've set up the following table on both publisher and subscriber, both are
pg16:

CREATE TABLE dummy_table (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
age INT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Added to publication, refreshed subscription.

Add some data on the publisher side:
INSERT INTO dummy_table (name, email, age)
VALUES
('John Doe', 'john@example.com', 25),
('Jane Smith', 'jane.sm...@example.com', 30),
('Michael Johnson', 'michae...@example.com', 45),
('Emily Davis', 'emil...@example.com', 27),
('Robert Brown', 'robert.br...@example.com', 40);

The data can be seen on the subscriber. So far, so good.

I then execute the following patch on the publisher:
https://gist.github.com/KoenDG/d1c06d8c740c64e4e5884d0c64b81f11

It is a single transaction that does the following:

1/ Insert data, 1000 items
2/ Drop a column
3/ Alter a column name
4/ Add 2 columns, nullable
5/ Add a column and give it a unique constraint
6/ Update values for a column with NULL values, added in step 4.
7/ Set the column updated in step 6 to be NOT NULL
8/ Create a unique index with the columns from step 3 and 6
9/ Insert a column with a default value
10/ Insert data for this schema, another 1000 items.
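
For readers who don't want to open the gist, a rough sketch of what such a
transaction could look like (this is not the gist itself; the column names are
borrowed from the error message quoted further down, everything else is
invented for illustration):

BEGIN;

-- 1/ insert data against the old schema
INSERT INTO dummy_table (name, email, age)
SELECT 'name ' || i, 'user' || i || '@example.com', 20 + (i % 50)
FROM generate_series(1, 1000) AS i;

-- 2/ drop a column, 3/ rename a column
ALTER TABLE dummy_table DROP COLUMN age;
ALTER TABLE dummy_table RENAME COLUMN email TO contact_email;

-- 4/ add two nullable columns, 5/ add a column with a unique constraint
ALTER TABLE dummy_table ADD COLUMN phone_number varchar(20), ADD COLUMN status varchar(20);
ALTER TABLE dummy_table ADD COLUMN username varchar(50) UNIQUE;

-- 6/ backfill the new column, 7/ make it NOT NULL, 8/ unique index on the columns from steps 3 and 6
UPDATE dummy_table SET status = 'active' WHERE status IS NULL;
ALTER TABLE dummy_table ALTER COLUMN status SET NOT NULL;
CREATE UNIQUE INDEX ON dummy_table (contact_email, status);

-- 9/ add a column with a default value, 10/ insert data against the new schema
ALTER TABLE dummy_table ADD COLUMN created_by text DEFAULT 'import';
INSERT INTO dummy_table (name, contact_email, status, username)
SELECT 'name ' || i, 'round2_' || i || '@example.com', 'active', 'user_' || i
FROM generate_series(1, 1000) AS i;

COMMIT;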

The subscription got disabled. This is to be expected: there are new column
names, so the schema needs to be updated on the subscriber side.

However, it seems I'm stuck.

I can't enable the subscription. This is to be expected, it will try to
resume and run into the same issues.

Ok, I update the schema and enable again. It runs into an error for the
inserts of step 1. These set values for columns dropped in step 2.

I revert to the old schema and enable again. It runs into an error again,
this time for values that don't exist yet at step 1.

I tried dropping the table at the subscriber side, recreating the correct
schema, but this runs into the same error.

I remove the table from the publication and retry. Same error. Even with
the table no longer in the publication, and the table on the subscriber
side dropped and re-created, I'm still getting the exact same errors of
"logical replication target relation "public.dummy_table" is missing
replicated columns: "contact_email", "status", "phone_number", "username""


The only solution I've found is to drop the table from the publication, and
then drop the entire subscription and set it back up again, with the
correct schema.

Am I making a mistake? Or does putting all these commands in a single
transaction ruin my chances?

Clarification much appreciated.

Regards,
Koen De Groote
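
One way out of this situation that avoids dropping the whole subscription is
usually to fix the subscriber's schema and then skip the failed transaction.
A hedged sketch, assuming PG 15 or newer (where ALTER SUBSCRIPTION ... SKIP
exists), a subscription named my_sub, and an LSN that is only a placeholder
here (use the one from the error CONTEXT in your own subscriber log):

-- on the subscriber
ALTER SUBSCRIPTION my_sub DISABLE;

-- bring the table definition in line with the publisher's new schema, then
-- skip the transaction whose finish LSN the apply worker reported in its
-- error CONTEXT ("... finished at 0/1623D30"):
ALTER SUBSCRIPTION my_sub SKIP (lsn = '0/1623D30');
ALTER SUBSCRIPTION my_sub ENABLE;

-- note: the rows from the skipped transaction never arrive on the subscriber
-- and have to be copied over separately (or the table re-synced).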


Re: What are best practices wrt passwords?

2024-10-16 Thread Tom Lane
"Peter J. Holzer"  writes:
> On Linux, unprivileged users can only see the environment of their own
> processes since a *very* long time ago. Possibly even before Ubuntu even
> existed. So I'm somewhat sceptical about that. Some other Unixes were
> more permissive. I don't know what camp MacOS falls into.

I poked at that on current macOS (Sequoia).  The rule for unprivileged
users seems to be the same as on Linux: you can see the environment
variables of processes belonging to you.  What is really interesting
is that "sudo ps auxwwe" and "ps auxwwe" have the same results: you
can still see only your own processes' environment variables.  The
sole exception when I tried it was that under sudo the root-owned "ps"
command showed its own environment variables, which is pretty bizarre.
Looks like the test is not actually on the reported process ownership
but some inherited property.

To confuse matters even more, another machine that's running Sequoia
but with SIP turned off acts more like Linux: "sudo ps auxwwe" shows
environment variables for everything.  So Apple's marching to their
own drummer as usual, but it doesn't look like there's any case where
they are more permissive than the Linux rule.  I'm still not convinced
about whether all *BSD flavors match that, though.

regards, tom lane




Re: Query performance issue

2024-10-16 Thread Adrian Klaver




On 10/16/24 10:50 AM, yudhi s wrote:

Hi,

The below question came up in a discussion. Appreciate any guidance on this.

Below is a query which is running for ~40 seconds. As it's a query which 
is executed from a UI, we were expecting it to finish in <~5 seconds. It 
has an "IN" and a "NOT IN" subquery, and from the execution path it seems 
the total response time is mainly the sum of the "IN" and the "NOT IN" 
subquery sections. My thought was that both the "IN" and "NOT IN" parts 
would be executed/evaluated in parallel, not in serial fashion.


In the execution path below, the lines marked in *bold* are the top lines 
for the IN and NOT IN subquery evaluation. They show "Actual time" of 
approximately ~9 seconds and ~8 seconds, which seem to be summed up, with 
the top lines showing ~19 seconds. From there onwards it keeps increasing 
with the other "nested loop" joins.


*Note*: This query is running on a MySQL 8.0 database. So I'm wondering 
if there is any MySQL list, similar to the Oracle list, in which I can 
share this issue?




This is the Postgres list.

As to your question maybe one of the forums?:

https://forums.mysql.com/

In particular Performance:

https://forums.mysql.com/list.php?24



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: What are best practices wrt passwords?

2024-10-16 Thread Tom Lane
Achilleas Mantzios  writes:
> Στις 16/10/24 19:47, ο/η Tom Lane έγραψε:
>> I believe it depends on your platform --- some BSDen are pretty
>> permissive about this, if memory serves.  On a Linux box it seems

> As of FreeBSD smadevnu 14.1-RELEASE-p5, only the user and root can view 
> the env, such as PGPASSWORD, of a user's processes. Other users can't.

NetBSD 10 seems to behave the same.  I don't have a recent OpenBSD
installation to try.

regards, tom lane




Re: What are best practices wrt passwords?

2024-10-16 Thread Peter J. Holzer
On 2024-10-16 09:50:41 -0700, Christophe Pettus wrote:
> > On Oct 16, 2024, at 09:47, Tom Lane  wrote:
> > I believe it depends on your platform --- some BSDen are pretty
> > permissive about this, if memory serves.  On a Linux box it seems
> > to work for processes owned by yourself even if you're not superuser.
> 
> I just tried it on an (admittedly kind of old) Ubuntu system and MacOS
> 14, and it looks like it shows everything owned by everyone, even from a
> non-sudoer user.

On Linux, unprivileged users can only see the environment of their own
processes since a *very* long time ago. Possibly even before Ubuntu even
existed. So I'm somewhat sceptical about that. Some other Unixes were
more permissive. I don't know what camp MacOS falls into.
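
For anyone who wants to verify this on their own machine, a quick sketch
(Linux only; the standard /proc layout is assumed):

# your own shell's environment is readable
tr '\0' '\n' < /proc/$$/environ | head

# a root-owned process (PID 1) is not: expect "Permission denied"
cat /proc/1/environ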

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Query performance issue

2024-10-16 Thread Peter J. Holzer
On 2024-10-16 23:20:36 +0530, yudhi s wrote:
> Below is a query which is running for ~40 seconds.
[...]
> In the execution path below , the line number marked in bold are the top lines
> for the IN and NOT IN subquery evaluation and they are showing "Actual time" 
> as
>  Approx ~9 seconds and ~8 seconds and they seems to be summed up and the top
> lines showing it to be ~19 seconds. Then onwards it keeps on increasing with
> other "nested loop" joins.
> 
> Note:- This query is running on a MYSQL 8.0 database. So I'm wondering if 
> there
> is any mysql list similar to Oracle list , in which i can share this issue?

The execution plan looks like a postgresql execution plan, not a mysql
execution plan. Did you run this query on postgresql? That may be
interesting for comparison purposes, but ultimately it is useless: you
won't get mysql to work like postgresql, and any tips to speed up this
query on postgresql (which is all you can expect on a postgresql mailing
list) probably won't work on mysql.
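
That said, one rewrite that often helps on both engines, if the planner really
does evaluate the two subqueries serially, is turning IN / NOT IN into
EXISTS / NOT EXISTS, which the planner can usually convert into semi-/anti-joins.
A sketch with invented table and column names, since the actual query wasn't
posted (note that NOT IN and NOT EXISTS differ when the subquery column can be
NULL):

-- original shape
SELECT o.*
FROM   orders o
WHERE  o.customer_id IN (SELECT c.id FROM customers c WHERE c.region = 'EU')
AND    o.product_id NOT IN (SELECT b.product_id FROM blacklist b);

-- rewritten with (NOT) EXISTS
SELECT o.*
FROM   orders o
WHERE  EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.region = 'EU')
AND    NOT EXISTS (SELECT 1 FROM blacklist b WHERE b.product_id = o.product_id);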

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: What are best practices wrt passwords?

2024-10-16 Thread Bruce Momjian
On Wed, Oct 16, 2024 at 11:27:15PM +0200, Peter J. Holzer wrote:
> On 2024-10-16 09:50:41 -0700, Christophe Pettus wrote:
> > > On Oct 16, 2024, at 09:47, Tom Lane  wrote:
> > > I believe it depends on your platform --- some BSDen are pretty
> > > permissive about this, if memory serves.  On a Linux box it seems
> > > to work for processes owned by yourself even if you're not superuser.
> > 
> > I just tried it on an (admittedly kind of old) Ubuntu system and MacOS
> > 14, and it looks like it shows everything owned by everyone, even from a
> > non-sudoer user.
> 
> On Linux, unprivileged users can only see the environment of their own
> processes since a *very* long time ago. Possibly even before Ubuntu even
> existed. So I'm somewhat sceptical about that. Some other Unixes were
> more permissive. I don't know what camp MacOS falls into.

Yes, I thought this was fixed long ago.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"




Re: Backup

2024-10-16 Thread Tomas Vondra
On 10/16/24 21:37, Andy Hartman wrote:
> I am very new to Postgres and have always worked in the mssql world. I'm
> looking for suggestions on DB backups. I currently have a DB used to
> store Historical information that has images it's currently around 100gig. 
> 
> I'm looking to take a monthly backup as I archive a month of data at a
> time. I am looking for it to be compressed and have a machine that has
> multiple cpu's and ample memory.
> 
> Suggestions on things I can try ?  
> I did a pg_dump using these parms
> --format=t --blobs lobarch
> 
> it ran my device out of storage:
> 
> pg_dump: error: could not write to output file: No space left on device
> 
> I have 150gig free on my backup drive... can obviously add more
> 
> looking for the quickest and smallest backup file output...
> 
> Thanks again for help\suggestions 
> 

You didn't specify the Postgres version - that matters, because the
available compression options differ. One thing that doesn't depend on the
version: the tar format (-Ft) you used does not support compression. Use
-Fc (custom) or -Fd (directory) instead, and pg_dump will compress the
output with gzip by default.

On PG 16 and newer you can also pick the method and level explicitly with
--compress=method:level, e.g. lz4 or zstd (the supported methods depend on
how the packages were built, no idea what platform you're on etc.). See

  https://www.postgresql.org/docs/current/app-pgdump.html

If you prefer a different compression tool, you can also disable pg_dump's
built-in compression and pipe the output through it. Something like

  pg_dump -Fc -Z0 | gzip -c > compressed.dump.gz
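
To make that concrete, a sketch of a compressed, parallel dump and restore on
PG 16 (the database name, job count and target directory are placeholders, and
zstd only works if your build supports it; otherwise use gzip or lz4):

  pg_dump -Fd -j 4 --compress=zstd:5 -f /backup/mydb_2024-10 mydb

  # restore, also in parallel
  pg_restore -j 4 -d mydb /backup/mydb_2024-10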

However, be aware that pg_dump is more an export tool than a backup
suitable for large databases / quick recovery. It won't allow doing PITR
and similar stuff.


regards

-- 
Tomas Vondra





Re: Backup

2024-10-16 Thread Ron Johnson
On Wed, Oct 16, 2024 at 3:37 PM Andy Hartman 
wrote:

> I am very new to Postgres and have always worked in the mssql world. I'm
> looking for suggestions on DB backups. I currently have a DB used to store
> Historical information that has images it's currently around 100gig.
>
> I'm looking to take a monthly backup as I archive a month of data at a
> time. I am looking for it to be compressed and have a machine that has
> multiple cpu's and ample memory.
>
> Suggestions on things I can try ?
> I did a pg_dump using these parms
> --format=t --blobs lobarch
>
> it ran my device out of storage:
>
> pg_dump: error: could not write to output file: No space left on device
>
> I have 150gig free on my backup drive... can obviously add more
>
> looking for the quickest and smallest backup file output...
>
> Thanks again for help\suggestions
>

Step 1: redesign your DB to *NOT* use large objects.  It's an old, slow and
unmaintained data type.  The bytea data type is what you should use.

Step 2: show us the "before" df output, the whole pg_dump command, and the
"after" df output when it fails. "du -c --max-depth=0 $PGDATA/base" also
very useful.

And tell us what version you're using.



Backup

2024-10-16 Thread Andy Hartman
I am very new to Postgres and have always worked in the mssql world. I'm
looking for suggestions on DB backups. I currently have a DB used to store
historical information that has images; it's currently around 100 gig.

I'm looking to take a monthly backup as I archive a month of data at a
time. I am looking for it to be compressed and have a machine that has
multiple cpu's and ample memory.

Suggestions on things I can try?
I did a pg_dump using these parms
--format=t --blobs lobarch

it ran my device out of storage:

pg_dump: error: could not write to output file: No space left on device

I have 150gig free on my backup drive... can obviously add more

looking for the quickest and smallest backup file output...

Thanks again for help\suggestions


Re: What are best practices wrt passwords?

2024-10-16 Thread Achilleas Mantzios



Στις 16/10/24 19:47, ο/η Tom Lane έγραψε:

Bruce Momjian  writes:

On Wed, Oct 16, 2024 at 06:16:57PM +0200, mb...@mbork.pl wrote:

`ps auxe` shows all processes with their environments, no?

I think that only shows for super-user.

I believe it depends on your platform --- some BSDen are pretty
permissive about this, if memory serves.  On a Linux box it seems



As of FreeBSD smadevnu 14.1-RELEASE-p5, only the user and root can view 
the env, such as PGPASSWORD, of a user's processes. Other users can't.




to work for processes owned by yourself even if you're not superuser.

regards, tom lane







Re: Backup

2024-10-16 Thread Andy Hartman
I'm on Ver16 and yes, our database has images in a bytea field.

Running on Win22 box...

On Wed, Oct 16, 2024 at 5:49 PM Peter J. Holzer  wrote:

> On 2024-10-16 16:02:24 -0400, Ron Johnson wrote:
> > On Wed, Oct 16, 2024 at 4:00 PM Achilleas Mantzios <
> > a.mantz...@cloud.gatewaynet.com> wrote:
> > Στις 16/10/24 22:55, ο/η Ron Johnson έγραψε:
> > On Wed, Oct 16, 2024 at 3:37 PM Andy Hartman <
> hartman60h...@gmail.com>
> > wrote:
> [...]
> >
> > Step 1: redesign your DB to NOT use large objects.  It's an old,
> slow
> > and unmaintained data type.  The data type is what you should
> use.
> >
> > You mean bytea I guess. As a side note, (not a fan of LOs), I had the
> > impression that certain drivers such as the JDBC support streaming
> for LOs
> > but not for bytea? It's been a while I haven't hit the docs tho.
> >
> >
> > Our database is stuffed with images in bytea fields.  The Java
> application uses
> > JDBC and handles them just fine.
>
> Images are usually small enough (a few MB) that they don't need to be
> streamed.
>
> I don't think bytea can be streamed in general. It's just like text, you
> write and read the whole thing at once.
>
> If you have data which is too large for that and want to store it in
> bytea fields, you'll probably have to chunk it yourself (which you
> probably have to anyway because for me "so large it has to be streamed"
> implies "at least possibly larger than 1 GB").
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>