date:20250113

Need help in logical replication

2025-01-13 Thread Divyansh Gupta JNsThMAudy

Hii PostgreSQL Community,

I am setting up logical replication between two clusters in the same subnet
group. I’ve created a publication on the primary and a subscription on the
secondary, and the replication slot has been created. However, the slot
remains inactive, and changes aren’t being applied on the subscriber.

I am using AWS RDS for that

Could you please help me identify the possible cause and resolution for
this issue?

Thank you in advance!

[image: image.png]
[image: image.png]

CREATE PUBLICATION logical_replication_test

FOR TABLE dbo.logical_rep_test;

CREATE SUBSCRIPTION logical_replication_test_su

CONNECTION 'conn_strig'

PUBLICATION logical_replication_test;

pg_repack and locks

2025-01-13 Thread nicolas

Hello everyone, 

We are using postgresql v12 and added the pg_repack package

Since I cannot stop other process, I use the “--no-kill-backend” and Pg_repack 
will wait indefinitly until pg_repack get the lock

I get sometimes a problem of lock:

sometimes, I get indefinitly this message : “NOTICE: Waiting for 1 transactions 
to finish. First PID: ”

this is a real problem because the database is usd all the time.
If I kill the process, a trigger on source table will still exist and  
temporary tables and type still exists in the repack schema. The tables are not 
empty if data has been modified in the source table during the repack.

If I drop table repack tables, I will loose all data modifications done on 
source table
how can I properly cleanup the database ?

Hope this is clear and that somebody can help

best raegards,
Nicolas

Re: Automatic upgrade of passwords from md5 to scram-sha256

2025-01-13 Thread Ron Johnson

On Mon, Jan 13, 2025 at 3:41 PM Peter J. Holzer  wrote:

> On 2025-01-13 12:19:06 -0500, Ron Johnson wrote:
> > On Sun, Jan 12, 2025 at 5:59 PM Tom Lane  wrote:
> >  [snip]
> >
> > I think this idea is a nonstarter, TLS or not.  We're generally
> moving
> > in the direction of never letting the server see cleartext passwords.
> > It's already possible to configure libpq to refuse such requests
> > (see require_auth parameter), although that hasn't been made the
> > default.
> >
> >
> > ALTER ROLE xxx WITH PASSWORD accepts hashed values, so a client with the
> > SCRAM-SHA algorithm could:
> > 1. remember the password that was just used to log in,
> > 2. generate the new hash,
> > 3. send that as an ALTER ROLE statement.
>
> Modifying the client to re-set the password is actually something I
> thought about. There are some technical unknowns (e.g. is
> PQencryptPasswordConn accessible through ODBC?) and some organisational
> difficulties (e.g. can we get the customers to upgrade to the newest
> version?), but I guess in our case it would be doable. But in general
> changing every to client to upgrade the password doesn't seem feasible.
> Unless maybe you are proposing that libpq should do that? That might
> work, but it probably also shouldn't do it by default.
>

That seems to me to be the fastest way to get the feature out to users.
(JDBC would also need it.)

Then clients like psql, pgAdmin, etc would need to add those calls.

-- 
Death to , and butter sauce.
Don't boil me, I'm still alive.
 lobster!

Re: Automatic upgrade of passwords from md5 to scram-sha256

2025-01-13 Thread Peter J. Holzer

On 2025-01-13 12:19:06 -0500, Ron Johnson wrote:
> On Sun, Jan 12, 2025 at 5:59 PM Tom Lane  wrote:
>  [snip]
> 
> I think this idea is a nonstarter, TLS or not.  We're generally moving
> in the direction of never letting the server see cleartext passwords.
> It's already possible to configure libpq to refuse such requests
> (see require_auth parameter), although that hasn't been made the
> default.
> 
> 
> ALTER ROLE xxx WITH PASSWORD accepts hashed values, so a client with the
> SCRAM-SHA algorithm could:
> 1. remember the password that was just used to log in,
> 2. generate the new hash, 
> 3. send that as an ALTER ROLE statement.

Modifying the client to re-set the password is actually something I
thought about. There are some technical unknowns (e.g. is
PQencryptPasswordConn accessible through ODBC?) and some organisational
difficulties (e.g. can we get the customers to upgrade to the newest
version?), but I guess in our case it would be doable. But in general
changing every to client to upgrade the password doesn't seem feasible.
Unless maybe you are proposing that libpq should do that? That might
work, but it probably also shouldn't do it by default.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"

signature.asc
Description: PGP signature

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

2025-01-13 Thread Enrico Schenone


Hello, Adrian.
As I said days ago, I have arranged a kind of stress test in production 
environment.
I wrote a program that loads a temporary table, loads 2049 rows into 
them from a baseline_table and finally declare two nested cursors.
The first cursor is on the temp table as parent while the second is on a 
lookup table as child.


The program logic is the transposition of one fragment of several 
production programs that was failing on cursors, and has to be intended 
as a POC only.


The program has been wrote in both pure C with libpq (see attached 
source program) and in 4Js Genero language.


Each program was executed by a shell script loop who ran 10 times the 
program each minute with 1 second sleep between each run (see attachment).


An automatic scheduler has continuously submitted 4 parallel tasks (two 
for C version and two for 4Js version programs).


The test was started the Dec, 29 2024 and it was kept in execution for 
many days directly in production environment.
In total, nearly a billion of child test cursors were executed while all 
other production tasks was running (normally 20 to 30 concurrent batch 
services on a pool of 100).


And Well, I'm quite confused: no error at all has been detected, not 
only on the test programs but in the whole production system. The error 
was completely disappeared.


Then I have stopped the four tasks of the stress test leaving all other 
services running for a week, and again no error at all.


No setup was changed nor servers was rebooted, nor infrastructure has 
been upgraded during the test period.


As a result, at the moment I'm not understood not only Why & Where the 
error was occurring, but also Why it is disappeared.


Anyone may feel free to give me his opinion.
For the moment I'll make no other test unless the error is knocking back 
to my door.


*Enrico Schenone*
Software Architect

*Cleis Tech s.r.l.* - www.gruppocleis.it
Sede di Genova, Via Paolo Emilio Bensa, 2 - 16124 Genova, ITALY
Tel: +39-0104071400   Fax: +39-0104073276
Mobile: +39-320 
7709352file:///home/enrico/Documenti/Work/Clienti/hh24/Incident/err-6372/C-test/C-testCursors.c

E-mail: eschen...@cleistech.it





Il 26/12/24 00:20, Adrian Klaver ha scritto:

On 12/24/24 14:23, Enrico Schenone wrote:

Hi, Adrian.
I'm arranging a test program with two nested cursors in two versions:

 1. 4Js Genero BDL language
 2. pure C with libpq language

I'll put both programs in stress execution into the production 
environment looking for some hours how they behaves.

Possible combinations are:

 1. no-one throws an error
 2. only the 4Js Genero version throws an error
 3. only the pure C version throws an error
 4. both versions throws the error

This stress test should address further investigations.
I'll keep you informed.


Yes, would like to see how this turns out.



Regards.
Enrico Schenone




#include 
#include 
#include 
#include 
#include 

// This function print the timestamp
void print_timestamp(FILE *stream) {
struct timeval tv;
gettimeofday(&tv, NULL);

time_t now = tv.tv_sec;
struct tm *timeinfo = localtime(&now);

char buffer[64];
strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", timeinfo);

fprintf(stream, "[%s.%03ld] ", buffer, tv.tv_usec / 1000);
}

// Error handler
void handle_error(PGconn *conn, PGresult *res, const char *context) {
print_timestamp(stderr);
const char *sqlstate = res ? PQresultErrorField(res, PG_DIAG_SQLSTATE) : NULL;
fprintf(stderr, "ERROR in %s: %s", context, PQerrorMessage(conn));
if (sqlstate) {
fprintf(stderr, " (SQLSTATE: %s)", sqlstate);
}
fprintf(stderr, "\n");

if (res) PQclear(res);
PQfinish(conn);
exit(EXIT_FAILURE);
}

int main() {
print_timestamp(stdout);
printf("Start program C-testCursors\n");
const char *conninfo = "host=192.168.20.94 dbname=hh24odds_prod user=my_user password=my_password";
PGconn *conn = PQconnectdb(conninfo);

if (PQstatus(conn) != CONNECTION_OK) {
print_timestamp(stderr);
fprintf(stderr, "Connection to database failed: %s\n", PQerrorMessage(conn));
PQfinish(conn);
return EXIT_FAILURE;
}

print_timestamp(stdout);
printf("Connected to database.\n");

// Start the transaction
PGresult *res = PQexec(conn, "BEGIN");
if (PQresultStatus(res) != PGRES_COMMAND_OK) {
handle_error(conn, res, "BEGIN");
}
PQclear(res);

// Creating temporary table
res = PQexec(conn, "CREATE TEMP TABLE IF NOT EXISTS tt1 (like template_table)");
if (PQresultStatus(res) != PGRES_COMMAND_OK) {
handle_error(conn, res, "CREATE TEMP TABLE");
}
PQclear(res);
print_timestamp(stdout);
printf("Temp table created.\n");

// Caricamento della temp table
res = PQexec(conn, "INSERT INTO tt1 SELECT * FROM test_baseline_table order by nelab, bkmkc_id, spor

Re: could not open file "base/XX/XX": Interrupted system call

2025-01-13 Thread Torsten Krah

Am Donnerstag, dem 26.12.2024 um 13:48 +0200 schrieb Slava Shpitalny:
> Do you happen to have any updates on this issue?

Hi,

unfortunately not, the only thing I found is this in "man 2 open":

...

EINTR  While blocked waiting to complete an open of a slow device
(e.g., a FIFO; see fifo(7)), the call was interrupted by a signal
handler; see signal(7).
...


To me this reads that postgres should handle that error code (e.g. try
again n times before failing) - but maybe it does that already, I did
not verify all open calls, if or how postgres does handle EINTR.

The only thing I did change is, that I use a volume (an anonymous one
in my usecase) again, because with that, I never encountered that error
- if I omit that volume and write directly to the containers fs (which
would be sufficient for me), I still had that error.

kind regards

Torsten

About PostgreSQL Query Plan

2025-01-13 Thread Eşref Halıcıoğlu

Hello, I have a query in PostgreSQL and I want this query to retrieve only data from the last 3 months. However, when I examine the query plan, I see that all partitions are listed. This raises a few questions in my mind:Are all partitions really being accessed, or only the partitions of the last 3 months are being accessed while the other partitions have to be shown in the query plan?If only the relevant partitions are accessed, what is the reason for including other partitions in the query plan?I do not fully understand the logic of this issue. I would be very grateful if you can share information on the subject. The query plan is as follows. Update on "test_table1" tt1  (cost=0.13..159112.84 rows=0 width=0)  Update on "test_table1_partition_2020_10" tt1  Update on "test_table1_partition_2020_11" tt1  Update on "test_table1_partition_2020_12" tt1  Update on "test_table1_partition_2021_01" tt1  Update on "test_table1_partition_2021_02" tt1  Update on "test_table1_partition_2021_03" tt1  Update on "test_table1_partition_2021_04" tt1  Update on "test_table1_partition_2021_05" tt1  Update on "test_table1_partition_2021_06" tt1  Update on "test_table1_partition_2021_07" tt1  Update on "test_table1_partition_2021_08" tt1  Update on "test_table1_partition_2021_09" tt1  Update on "test_table1_partition_2021_10" tt1  Update on "test_table1_partition_2021_11" tt1  Update on "test_table1_partition_2021_12" tt1  Update on "test_table1_partition_2022_01" tt1  Update on "test_table1_partition_2022_02" tt1  Update on "test_table1_partition_2022_03" tt1  Update on "test_table1_partition_2022_04" tt1  Update on "test_table1_partition_2022_05" tt1  Update on "test_table1_partition_2022_06" tt1  Update on "test_table1_partition_2022_07" tt1  Update on "test_table1_partition_2022_08" tt1  Update on "test_table1_partition_2022_09" tt1  Update on "test_table1_partition_2022_10" tt1  Update on "test_table1_partition_2022_11" tt1  Update on "test_table1_partition_2022_12" tt1  Update on "test_table1_partition_2023_01" tt1  Update on "test_table1_partition_2023_02" tt1  Update on "test_table1_partition_2023_03" tt1  Update on "test_table1_partition_2023_04" tt1  Update on "test_table1_partition_2023_05" tt1  Update on "test_table1_partition_2023_06" tt1  Update on "test_table1_partition_2023_07" tt1  Update on "test_table1_partition_2023_08" tt1  Update on "test_table1_partition_2023_09" tt1  Update on "test_table1_partition_2023_10" tt1  Update on "test_table1_partition_2023_11" tt1  Update on "test_table1_partition_2023_12" tt1  Update on "test_table1_partition_2024_01" tt1  Update on "test_table1_partition_2024_02" tt1  Update on "test_table1_partition_2024_03" tt1  Update on "test_table1_partition_2024_04" tt1  Update on "test_table1_partition_2024_05" tt1  Update on "test_table1_partition_2024_06" tt1  Update on "test_table1_partition_2024_07" tt1  Update on "test_table1_partition_2024_08" tt1  Update on "test_table1_partition_2024_09" tt1  Update on "test_table1_partition_2024_10" tt1_sub1  Update on "test_table1_partition_2024_11" tt1_sub2  Update on "test_table1_partition_2024_12" tt1_sub3  Update on "test_table1_partition_2025_01" tt1_sub4  Update on "test_table1_partition_2025_02" tt1  Update on "test_table1_partition_2025_03" tt1  Update on "test_table1_partition_2025_04" tt1  Update on "test_table1_partition_2025_05" tt1  Update on "test_table1_partition_2025_06" tt1  Update on "test_table1_partition_2025_07" tt1  Update on "test_table1_partition_2025_08" tt1  Update on "test_table1_partition_2025_09" tt1  Update on "test_table1_partition_2025_10" tt1  Update on "test_table1_partition_2025_11" tt1  Update on "test_table1_partition_2025_12" tt1  Update on "test_table1_partition_default" tt1  ->  Nested Loop  (cost=0.13..159112.84 rows=1 width=53)    ->  Seq Scan on "temp_test_table1" temp  (cost=0.00..19.20 rows=920 width=31)    ->  Append  (cost=0.13..172.29 rows=64 width=38)  Subplans Removed: 60  ->  Index Scan using test_table1_partition_2024_10_pkey on test_table1_partition_2024_10 tt1_sub1  (cost=0.43..4.21 rows=1 width=38)    Index Cond: (("col1" = temp."col_temp1") AND ("col2" >= (CURRENT_DATE - '3 mons'::interval)) AND ("col2" <= CURRENT_DATE))    Filter: ((NOT "col_deleted") AND ("col3" = temp."col_temp3"))  ->  Index Scan using test_table1_partition_2024_11_pkey on test_table1_partition_2024_11 tt1_sub2  (cost=0.43..4.23 rows=1 width=38)    Index Cond: (("col1" = temp."col_temp1") AND ("col2" >= (CURRENT_DATE - '3 mons'::interval)) AND ("col2" <= CURRENT_DATE))    Filter: ((NOT "col_deleted") AND ("col3" = temp."col_temp3"))  ->  Index Scan using test_table1_partition_2024_12_pkey on test_table1_partition_2024_12 tt1_sub3  (cost=0.43..4.34 rows=1 width=38)    Index Cond: (("col1" = temp."col_temp1") AND ("col2" >= (CURRENT_DATE - '3 mons'::interval)) AND ("col2" <= CURRENT_

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

2025-01-13 Thread Adrian Klaver


On 1/13/25 08:59, Enrico Schenone wrote:


Il 13/01/25 17:19, Adrian Klaver ha scritto:

On 1/13/25 00:45, Enrico Schenone wrote:

Hello, Adrian.
As I said days ago, I have arranged a kind of stress test in 
production environment.
I wrote a program that loads a temporary table, loads 2049 rows into 
them from a baseline_table and finally declare two nested cursors.
The first cursor is on the temp table as parent while the second is 
on a lookup table as child.


The program logic is the transposition of one fragment of several 
production programs that was failing on cursors, and has to be 
intended as a POC only.





And Well, I'm quite confused: no error at all has been detected, not 
only on the test programs but in the whole production system. The 
error was completely disappeared.


Then I have stopped the four tasks of the stress test leaving all 
other services running for a week, and again no error at all.


No setup was changed nor servers was rebooted, nor infrastructure has 
been upgraded during the test period.


You are absolutely sure about the above?
I can say Yes. All test operations has been logged and verified against 
the Postgresql log.
The only component not under my control is the Provider's 
Infrastructure, but  the infrastructure admin ensured me that no 
operation at all has been made. I beleave him because it is a reliable 
tecnician end a well known person.


In your OP you stated:

"Production environments can be:

 * Distinct application server and DB server on distinct subnets (no
dropped packet detected on firewall, no memory/disk/network failure
detected by "nmon" tool)
  * Distinct application server and DB server on same subnet (no firewall)
  * Same server for PostgreSQL and applications
"

In all those cases are the various servers all running completely within 
the providers infrastructure?



Errors that 'fix' themselves are the most frustrating kind, as you 
know in the back of your mind they will likely pop up again.

True, knocking again to my door ... I still can't beleave.


Going forward one of three things are likely to happen:

1) The error never shows again.

2) It does show up again but in a manner that allows it to be traced.

3) The worst case, it plays hide and seek as previously.




Thanks a lot for your interest in sharing my strange experience.
Best regards.
Enrico

*Enrico Schenone*
Software Architect


--
Adrian Klaver
adrian.kla...@aklaver.com

Re: Automatic upgrade of passwords from md5 to scram-sha256

2025-01-13 Thread Joe Conway


On 1/12/25 17:59, Tom Lane wrote:

"Peter J. Holzer"  writes:

The web framework Django will automatically and transparently rehash any
password with the currently preferred algorithm if it isn't stored that
way already.


Really?  That implies that the framework has access to the original
cleartext password, which is a security fail already.


Can PostgreSQL do that, too? (I haven't found anything)


No.  The server has only the hashed password, it can't reconstruct
the original.


If the password for the user is stored as an MD5 hash, the server
replies to the startup message with an AuthenticationCleartextPassword
respnse to force the client to send the password in the clear
(obviously you only want to do that if the connection is TLS-encrypted
or otherwise safe from eavesdropping).


I think this idea is a nonstarter, TLS or not.  We're generally moving
in the direction of never letting the server see cleartext passwords.
It's already possible to configure libpq to refuse such requests
(see require_auth parameter), although that hasn't been made the
default.



Given PQchangePassword[1] in pg17, I wonder if the next step could be to 
have libpq somehow know/detect that an algorithm change is needed and 
execute that (or some equivalent) from the client side? And presumably 
we could ask pgjdbc to implement something similar.



Joe

[1] 
https://www.postgresql.org/docs/17/libpq-misc.html#LIBPQ-PQCHANGEPASSWORD

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Automatic upgrade of passwords from md5 to scram-sha256

2025-01-13 Thread Ron Johnson

On Sun, Jan 12, 2025 at 5:59 PM Tom Lane  wrote:
 [snip]

> I think this idea is a nonstarter, TLS or not.  We're generally moving
> in the direction of never letting the server see cleartext passwords.
> It's already possible to configure libpq to refuse such requests
> (see require_auth parameter), although that hasn't been made the
> default.
>

ALTER ROLE xxx WITH PASSWORD accepts hashed values, so a client with the
SCRAM-SHA algorithm could:
1. remember the password that was just used to log in,
2. generate the new hash,
3. send that as an ALTER ROLE statement.

Anything which shows up in the logs would be no different than when someone
types ALTER ROLE ... WITH PASSWORD from the psql prompt.

-- 
Death to , and butter sauce.
Don't boil me, I'm still alive.
 lobster!

Re: Need help in logical replication

2025-01-13 Thread Divyansh Gupta JNsThMAudy

Thank you for the response Justin found and fixed the error

On Mon, 13 Jan 2025, 10:15 pm Justin,  wrote:

> Hi Divyansh,
>
> Go to the subscriber and look for errors in the PostgreSQL logs.
>
> When creating a subscription the default action is to sync the tables. Is
> the subscriber table empty??
>
> Thank you,
>
> On Mon, Jan 13, 2025 at 7:30 AM Divyansh Gupta JNsThMAudy <
> ag1567...@gmail.com> wrote:
>
>> Hii PostgreSQL Community,
>>
>> I am setting up logical replication between two clusters in the same
>> subnet group. I’ve created a publication on the primary and a subscription
>> on the secondary, and the replication slot has been created. However, the
>> slot remains inactive, and changes aren’t being applied on the subscriber.
>>
>> I am using AWS RDS for that
>>
>> Could you please help me identify the possible cause and resolution for
>> this issue?
>>
>> Thank you in advance!
>>
>> [image: image.png]
>> [image: image.png]
>>
>> CREATE PUBLICATION logical_replication_test
>>
>> FOR TABLE dbo.logical_rep_test;
>>
>> CREATE SUBSCRIPTION logical_replication_test_su
>>
>> CONNECTION 'conn_strig'
>>
>> PUBLICATION logical_replication_test;
>>
>>

Re: pg_repack and locks

2025-01-13 Thread hubert depesz lubaczewski

On Mon, Jan 13, 2025 at 12:40:06PM +, nicolas wrote:
> Hello everyone, 
> 
> We are using postgresql v12 and added the pg_repack package
> 
> Since I cannot stop other process, I use the “--no-kill-backend” and 
> Pg_repack will wait indefinitly until pg_repack get the lock
> 
> I get sometimes a problem of lock:
> 
> sometimes, I get indefinitly this message : “NOTICE: Waiting for 1 
> transactions to finish. First PID: ”
> 
> this is a real problem because the database is usd all the time.
> If I kill the process, a trigger on source table will still exist and  
> temporary tables and type still exists in the repack schema. The tables are 
> not empty if data has been modified in the source table during the repack.
> 
> If I drop table repack tables, I will loose all data modifications done on 
> source table
> how can I properly cleanup the database ?

Allow it to kill offending backends after some time? For example -T 7200?

Best regards,

depesz

Re: Need help in logical replication

2025-01-13 Thread Rob Sargent




> On Jan 13, 2025, at 5:30 AM, Divyansh Gupta JNsThMAudy  
> wrote:
> 
> 
> Hii PostgreSQL Community,
> 
> I am setting up logical replication between two clusters in the same subnet 
> group. I’ve created a publication on the primary and a subscription on the 
> secondary, and the replication slot has been created. However, the slot 
> remains inactive, and changes aren’t being applied on the subscriber.
> 
> I am using AWS RDS for that
> 
> Could you please help me identify the possible cause and resolution for this 
> issue?
> 
> Thank you in advance!
> 
> 

Maybe you could send your ddl in plain text?

>

Re: About PostgreSQL Query Plan

2025-01-13 Thread hubert depesz lubaczewski

On Mon, Jan 13, 2025 at 05:26:09PM +0300, Eşref Halıcıoğlu wrote:
> Hello,
>  
> I have a query in PostgreSQL and I want this query to retrieve only data from 
> the last 3 months. However, when I examine the query
> plan, I see that all partitions are listed.

Please note that your explain is for update, not select (which
"retrieve" in your mail would suggest).

> This raises a few questions in my mind:
> 
>   • Are all partitions really being accessed, or only the partitions of the 
> last 3 months are being accessed while the other
> partitions have to be shown in the query plan?

Not really possible to tell without reading explain *analyze*.
Potentially all. But perhaps just fewer.

Best regards,

depesz

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

2025-01-13 Thread Adrian Klaver


On 1/13/25 00:45, Enrico Schenone wrote:

Hello, Adrian.
As I said days ago, I have arranged a kind of stress test in production 
environment.
I wrote a program that loads a temporary table, loads 2049 rows into 
them from a baseline_table and finally declare two nested cursors.
The first cursor is on the temp table as parent while the second is on a 
lookup table as child.


The program logic is the transposition of one fragment of several 
production programs that was failing on cursors, and has to be intended 
as a POC only.





And Well, I'm quite confused: no error at all has been detected, not 
only on the test programs but in the whole production system. The error 
was completely disappeared.


Then I have stopped the four tasks of the stress test leaving all other 
services running for a week, and again no error at all.


No setup was changed nor servers was rebooted, nor infrastructure has 
been upgraded during the test period.


You are absolutely sure about the above?



As a result, at the moment I'm not understood not only Why & Where the 
error was occurring, but also Why it is disappeared.


Errors that 'fix' themselves are the most frustrating kind, as you know 
in the back of your mind they will likely pop up again.




Anyone may feel free to give me his opinion.
For the moment I'll make no other test unless the error is knocking back 
to my door.


That is all you can do.



*Enrico Schenone*
Software Architect



--
Adrian Klaver
adrian.kla...@aklaver.com

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

2025-01-13 Thread Enrico Schenone


Il 13/01/25 18:26, Adrian Klaver ha scritto:

On 1/13/25 08:59, Enrico Schenone wrote:


Il 13/01/25 17:19, Adrian Klaver ha scritto:

On 1/13/25 00:45, Enrico Schenone wrote:

Hello, Adrian.
As I said days ago, I have arranged a kind of stress test in 
production environment.
I wrote a program that loads a temporary table, loads 2049 rows 
into them from a baseline_table and finally declare two nested 
cursors.
The first cursor is on the temp table as parent while the second is 
on a lookup table as child.


The program logic is the transposition of one fragment of several 
production programs that was failing on cursors, and has to be 
intended as a POC only.





And Well, I'm quite confused: no error at all has been detected, 
not only on the test programs but in the whole production system. 
The error was completely disappeared.


Then I have stopped the four tasks of the stress test leaving all 
other services running for a week, and again no error at all.


No setup was changed nor servers was rebooted, nor infrastructure 
has been upgraded during the test period.


You are absolutely sure about the above?
I can say Yes. All test operations has been logged and verified 
against the Postgresql log.
The only component not under my control is the Provider's 
Infrastructure, but  the infrastructure admin ensured me that no 
operation at all has been made. I beleave him because it is a 
reliable tecnician end a well known person.


In your OP you stated:

"Production environments can be:

 * Distinct application server and DB server on distinct subnets (no
    dropped packet detected on firewall, no memory/disk/network failure
    detected by "nmon" tool)
  * Distinct application server and DB server on same subnet (no 
firewall)

  * Same server for PostgreSQL and applications
"

In all those cases are the various servers all running completely 
within the providers infrastructure?


No, the second production environment is On Premises at the customer 
Datacenter under the same vmware hypervisor.
I'll make more investigations on second and third environments by 
increasing the verbosity of both DB and Application logs.


Errors that 'fix' themselves are the most frustrating kind, as you 
know in the back of your mind they will likely pop up again.

True, knocking again to my door ... I still can't beleave.


Going forward one of three things are likely to happen:

1) The error never shows again.

2) It does show up again but in a manner that allows it to be traced.

3) The worst case, it plays hide and seek as previously.


In our standard applications it normally plays hide because normally our 
programs doesn't apply TRY/CATCH paradigm on FETCH.
In 4Js Genero BDL a fetch can return only "success" or "not found" 
status, so normally we don't catch for any exception unless we want to 
detect locks.
Furthermore for standard cursors we don't use the OPEN/FETCH/CLOSE 
statements but simply a FOREACH ... END FOREACH statement that 
masquerades in a more easy way the OPEN/FETCH/CLOSE process.
We detected the problem looking at Postgresql log and investigating at 
client side for any program exception or abort in FOREACH/END FOREACH 
block by increasing log verbosity and applying the OPEN/FETCH/CLOSE and 
TRY/CATCH paradigm on suspect cursors.



Thanks a lot for your interest in sharing my strange experience.
Best regards.
Enrico

*Enrico Schenone*
Software Architect



*Enrico Schenone*
Software Architect

Re: About PostgreSQL Query Plan

2025-01-13 Thread hubert depesz lubaczewski

On Mon, Jan 13, 2025 at 07:48:09PM +0300, Eşref Halıcıoğlu wrote:
> Hello,
>  
> Yes, you are right; this query is not a SELECT, it is an UPDATE query, there 
> was a mistake in expressing it here. The columns I want
> to update here only operate on data from the last 3 months time interval.
>  
> I am also sharing the EXPLAIN ANALYZE output of the relevant query below:
>  
> Update on "TestTable1" t1  (cost=0.13..159114.84 rows=0 width=0) (actual 
> time=0.007..0.008 rows=0 loops=1)
>   Update on "PartitionTable_2020_10" t1
… 61 lines removed …
>   Update on "PartitionTable_2025_12" t1
>   Update on "DefaultPartitionTable" t1
>   ->  Nested Loop  (cost=0.13..159114.84 rows=1 width=53) (actual 
> time=0.006..0.006 rows=0 loops=1)
>     ->  Seq Scan on "TempTable1" tmp  (cost=0.00..19.20 rows=920 
> width=31) (actual time=0.006..0.006 rows=0 loops=1)
>     ->  Append  (cost=0.13..172.29 rows=64 width=38) (never executed)
>   Subplans Removed: 60
>   ->  Index Scan using partitiontable_2024_10_pkey on 
> "PartitionTable_2024_10" t2  (cost=0.43..4.21 rows=1 width=38) (never 
> executed)
>     Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= 
> (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE))
>     Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))
>   ->  Index Scan using partitiontable_2024_11_pkey on 
> "PartitionTable_2024_11" t3  (cost=0.43..4.23 rows=1 width=38) (never 
> executed)
>     Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= 
> (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE))
>     Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))
>   ->  Index Scan using partitiontable_2024_12_pkey on 
> "PartitionTable_2024_12" t4  (cost=0.43..4.34 rows=1 width=38) (never 
> executed)
>     Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= 
> (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE))
>     Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))
>   ->  Index Scan using partitiontable_2025_01_pkey on 
> "PartitionTable_2025_01" t5  (cost=0.43..3.72 rows=1 width=38) (never 
> executed)
>     Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= 
> (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE))
>     Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))
> Planning Time: 3.860 ms
> Execution Time: 0.066 ms

Well, it helped, and I missed some bits of information earlier.
Generally it looks that it was trying to scan only 4 partitions, but
they all got skipped.

Best regards,

depesz

Re: About PostgreSQL Query Plan

2025-01-13 Thread Eşref Halıcıoğlu

Hello, Yes, you are right; this query is not a SELECT, it is an UPDATE query, there was a mistake in expressing it here. The columns I want to update here only operate on data from the last 3 months time interval. I am also sharing the EXPLAIN ANALYZE output of the relevant query below: Update on "TestTable1" t1  (cost=0.13..159114.84 rows=0 width=0) (actual time=0.007..0.008 rows=0 loops=1)  Update on "PartitionTable_2020_10" t1  Update on "PartitionTable_2020_11" t1  Update on "PartitionTable_2020_12" t1  Update on "PartitionTable_2021_01" t1  Update on "PartitionTable_2021_02" t1  Update on "PartitionTable_2021_03" t1  Update on "PartitionTable_2021_04" t1  Update on "PartitionTable_2021_05" t1  Update on "PartitionTable_2021_06" t1  Update on "PartitionTable_2021_07" t1  Update on "PartitionTable_2021_08" t1  Update on "PartitionTable_2021_09" t1  Update on "PartitionTable_2021_10" t1  Update on "PartitionTable_2021_11" t1  Update on "PartitionTable_2021_12" t1  Update on "PartitionTable_2022_01" t1  Update on "PartitionTable_2022_02" t1  Update on "PartitionTable_2022_03" t1  Update on "PartitionTable_2022_04" t1  Update on "PartitionTable_2022_05" t1  Update on "PartitionTable_2022_06" t1  Update on "PartitionTable_2022_07" t1  Update on "PartitionTable_2022_08" t1  Update on "PartitionTable_2022_09" t1  Update on "PartitionTable_2022_10" t1  Update on "PartitionTable_2022_11" t1  Update on "PartitionTable_2022_12" t1  Update on "PartitionTable_2023_01" t1  Update on "PartitionTable_2023_02" t1  Update on "PartitionTable_2023_03" t1  Update on "PartitionTable_2023_04" t1  Update on "PartitionTable_2023_05" t1  Update on "PartitionTable_2023_06" t1  Update on "PartitionTable_2023_07" t1  Update on "PartitionTable_2023_08" t1  Update on "PartitionTable_2023_09" t1  Update on "PartitionTable_2023_10" t1  Update on "PartitionTable_2023_11" t1  Update on "PartitionTable_2023_12" t1  Update on "PartitionTable_2024_01" t1  Update on "PartitionTable_2024_02" t1  Update on "PartitionTable_2024_03" t1  Update on "PartitionTable_2024_04" t1  Update on "PartitionTable_2024_05" t1  Update on "PartitionTable_2024_06" t1  Update on "PartitionTable_2024_07" t1  Update on "PartitionTable_2024_08" t1  Update on "PartitionTable_2024_09" t1  Update on "PartitionTable_2024_10" t2  Update on "PartitionTable_2024_11" t3  Update on "PartitionTable_2024_12" t4  Update on "PartitionTable_2025_01" t5  Update on "PartitionTable_2025_02" t1  Update on "PartitionTable_2025_03" t1  Update on "PartitionTable_2025_04" t1  Update on "PartitionTable_2025_05" t1  Update on "PartitionTable_2025_06" t1  Update on "PartitionTable_2025_07" t1  Update on "PartitionTable_2025_08" t1  Update on "PartitionTable_2025_09" t1  Update on "PartitionTable_2025_10" t1  Update on "PartitionTable_2025_11" t1  Update on "PartitionTable_2025_12" t1  Update on "DefaultPartitionTable" t1  ->  Nested Loop  (cost=0.13..159114.84 rows=1 width=53) (actual time=0.006..0.006 rows=0 loops=1)    ->  Seq Scan on "TempTable1" tmp  (cost=0.00..19.20 rows=920 width=31) (actual time=0.006..0.006 rows=0 loops=1)    ->  Append  (cost=0.13..172.29 rows=64 width=38) (never executed)  Subplans Removed: 60  ->  Index Scan using partitiontable_2024_10_pkey on "PartitionTable_2024_10" t2  (cost=0.43..4.21 rows=1 width=38) (never executed)    Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE))    Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))  ->  Index Scan using partitiontable_2024_11_pkey on "PartitionTable_2024_11" t3  (cost=0.43..4.23 rows=1 width=38) (never executed)    Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE))    Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))  ->  Index Scan using partitiontable_2024_12_pkey on "PartitionTable_2024_12" t4  (cost=0.43..4.34 rows=1 width=38) (never executed)    Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE))    Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))  ->  Index Scan using partitiontable_2025_01_pkey on "PartitionTable_2025_01" t5  (cost=0.43..3.72 rows=1 width=38) (never executed)    Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE))    Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))Planning Time: 3.860 msExecution Time: 0.066 ms Thank you for your interest and support. Good work, Eşref HalıcıoğluTo: Eşref Halıcıoğlu (esref.halicio...@primeit.com.tr);Cc: pgsql-general@lists.postgresql.org;Subject: About PostgreSQL Query Plan;13.01.2025, 18:42, "hubert depesz lubaczewski" :On Mon, Jan 13, 2025 at 05:26:09PM +0300, Eşref Halıc

Re: About PostgreSQL Query Plan

2025-01-13 Thread Eşref Halıcıoğlu

Hello, Yes, you are right; it seems that only 4 batches had data changes. However, the query also accessed other batches and then removed them again. What could be the reason for this and how can it be solved? Obviously, I would like to understand this situation better and learn the solution. I would be very grateful if you could help me. Thank you, Eşref Halıcıoğlu To: Eşref Halıcıoğlu (esref.halicio...@primeit.com.tr);Cc: pgsql-general@lists.postgresql.org;Subject: About PostgreSQL Query Plan;13.01.2025, 19:56, "hubert depesz lubaczewski" :On Mon, Jan 13, 2025 at 07:48:09PM +0300, Eşref Halıcıoğlu wrote: Hello,   Yes, you are right; this query is not a SELECT, it is an UPDATE query, there was a mistake in expressing it here. The columns I want to update here only operate on data from the last 3 months time interval.   I am also sharing the EXPLAIN ANALYZE output of the relevant query below:   Update on "TestTable1" t1  (cost=0.13..159114.84 rows=0 width=0) (actual time=0.007..0.008 rows=0 loops=1)   Update on "PartitionTable_2020_10" t1… 61 lines removed …   Update on "PartitionTable_2025_12" t1   Update on "DefaultPartitionTable" t1   ->  Nested Loop  (cost=0.13..159114.84 rows=1 width=53) (actual time=0.006..0.006 rows=0 loops=1) ->  Seq Scan on "TempTable1" tmp  (cost=0.00..19.20 rows=920 width=31) (actual time=0.006..0.006 rows=0 loops=1) ->  Append  (cost=0.13..172.29 rows=64 width=38) (never executed)   Subplans Removed: 60   ->  Index Scan using partitiontable_2024_10_pkey on "PartitionTable_2024_10" t2  (cost=0.43..4.21 rows=1 width=38) (never executed) Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE)) Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))   ->  Index Scan using partitiontable_2024_11_pkey on "PartitionTable_2024_11" t3  (cost=0.43..4.23 rows=1 width=38) (never executed) Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE)) Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))   ->  Index Scan using partitiontable_2024_12_pkey on "PartitionTable_2024_12" t4  (cost=0.43..4.34 rows=1 width=38) (never executed) Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE)) Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5"))   ->  Index Scan using partitiontable_2025_01_pkey on "PartitionTable_2025_01" t5  (cost=0.43..3.72 rows=1 width=38) (never executed) Index Cond: (("Col1" = tmp."Col2") AND ("Col3" >= (CURRENT_DATE - '3 mons'::interval)) AND ("Col3" <= CURRENT_DATE)) Filter: ((NOT "IsDeleted") AND ("Col4" = tmp."Col5")) Planning Time: 3.860 ms Execution Time: 0.066 msWell, it helped, and I missed some bits of information earlier.Generally it looks that it was trying to scan only 4 partitions, butthey all got skipped.Best regards,depesz   --

Re: Need help in logical replication

2025-01-13 Thread Justin

Hi Divyansh,

Go to the subscriber and look for errors in the PostgreSQL logs.

When creating a subscription the default action is to sync the tables. Is
the subscriber table empty??

Thank you,

On Mon, Jan 13, 2025 at 7:30 AM Divyansh Gupta JNsThMAudy <
ag1567...@gmail.com> wrote:

> Hii PostgreSQL Community,
>
> I am setting up logical replication between two clusters in the same
> subnet group. I’ve created a publication on the primary and a subscription
> on the secondary, and the replication slot has been created. However, the
> slot remains inactive, and changes aren’t being applied on the subscriber.
>
> I am using AWS RDS for that
>
> Could you please help me identify the possible cause and resolution for
> this issue?
>
> Thank you in advance!
>
> [image: image.png]
> [image: image.png]
>
> CREATE PUBLICATION logical_replication_test
>
> FOR TABLE dbo.logical_rep_test;
>
> CREATE SUBSCRIPTION logical_replication_test_su
>
> CONNECTION 'conn_strig'
>
> PUBLICATION logical_replication_test;
>
>

Re: About PostgreSQL Query Plan

2025-01-13 Thread hubert depesz lubaczewski

On Mon, Jan 13, 2025 at 08:01:56PM +0300, Eşref Halıcıoğlu wrote:
> Yes, you are right; it seems that only 4 batches had data changes. However, 
> the query also accessed other batches and then removed
> them again. What could be the reason for this and how can it be solved?
>  
> Obviously, I would like to understand this situation better and learn the 
> solution.
>  
> I would be very grateful if you could help me.

You would need to have explain analyze form some case where the update
actually updates something. And query that you used would be helpful
too.

Best regards,

depesz

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

2025-01-13 Thread Enrico Schenone




Il 13/01/25 17:19, Adrian Klaver ha scritto:

On 1/13/25 00:45, Enrico Schenone wrote:

Hello, Adrian.
As I said days ago, I have arranged a kind of stress test in 
production environment.
I wrote a program that loads a temporary table, loads 2049 rows into 
them from a baseline_table and finally declare two nested cursors.
The first cursor is on the temp table as parent while the second is 
on a lookup table as child.


The program logic is the transposition of one fragment of several 
production programs that was failing on cursors, and has to be 
intended as a POC only.





And Well, I'm quite confused: no error at all has been detected, not 
only on the test programs but in the whole production system. The 
error was completely disappeared.


Then I have stopped the four tasks of the stress test leaving all 
other services running for a week, and again no error at all.


No setup was changed nor servers was rebooted, nor infrastructure has 
been upgraded during the test period.


You are absolutely sure about the above?
I can say Yes. All test operations has been logged and verified against 
the Postgresql log.
The only component not under my control is the Provider's 
Infrastructure, but  the infrastructure admin ensured me that no 
operation at all has been made. I beleave him because it is a reliable 
tecnician end a well known person.




As a result, at the moment I'm not understood not only Why & Where 
the error was occurring, but also Why it is disappeared.


Errors that 'fix' themselves are the most frustrating kind, as you 
know in the back of your mind they will likely pop up again.

True, knocking again to my door ... I still can't beleave.




Anyone may feel free to give me his opinion.
For the moment I'll make no other test unless the error is knocking 
back to my door.


That is all you can do.



*Enrico Schenone*
Software Architect




Thanks a lot for your interest in sharing my strange experience.
Best regards.
Enrico

*Enrico Schenone*
Software Architect

Need help in logical replication

pg_repack and locks

Re: Automatic upgrade of passwords from md5 to scram-sha256

Re: Automatic upgrade of passwords from md5 to scram-sha256

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

Re: could not open file "base/XX/XX": Interrupted system call

About PostgreSQL Query Plan

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

Re: Automatic upgrade of passwords from md5 to scram-sha256

Re: Automatic upgrade of passwords from md5 to scram-sha256

Re: Need help in logical replication

Re: pg_repack and locks

Re: Need help in logical replication

Re: About PostgreSQL Query Plan

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

Re: About PostgreSQL Query Plan

Re: About PostgreSQL Query Plan

Re: About PostgreSQL Query Plan

Re: Need help in logical replication

Re: About PostgreSQL Query Plan

Re: Intermittent errors when fetching cursor rows on PostgreSQL 16

22 matches

Site Navigation

Mail list logo

Footer information