RE: Query on Postgres SQL transaction

2024-03-27 Thread Bandi, Venkataramana - Dell Team
Hi Adrian,

I will check with the customer on below info for that node and will share you.

For instance different OS or OS version, different encoding, different location 
on the network, different data it is working, etc.

As I already mentioned we have enabled Postgres SQL debug logs to trace out 
this scenario, but it is not helping us.


Regards,
Venkat


Internal Use - Confidential
-Original Message-
From: Adrian Klaver 
Sent: Monday, March 25, 2024 8:40 PM
To: Bandi, Venkataramana - Dell Team ; Greg 
Sabino Mullane 
Cc: pgsql-general@lists.postgresql.org; Kishore, Nanda - Dell Team 
; Alampalli, Kishore 

Subject: Re: Query on Postgres SQL transaction


[EXTERNAL EMAIL]

On 3/25/24 00:18, Bandi, Venkataramana - Dell Team wrote:
> Hi,
>
> Please find my inline comments for your questions.
>
>
> Regards,
> Venkat
>
>
> Internal Use - Confidential
> -Original Message-
> From: Adrian Klaver 
> Sent: Tuesday, March 19, 2024 9:33 PM
> To: Bandi, Venkataramana - Dell Team
> ; Greg Sabino Mullane
> 
> Cc: pgsql-general@lists.postgresql.org; Kishore, Nanda - Dell Team
> ; Alampalli, Kishore
> 
> Subject: Re: Query on Postgres SQL transaction
>
>
> [EXTERNAL EMAIL]
>
> On 3/19/24 02:18, Bandi, Venkataramana - Dell Team wrote:
>> Hi Greg,
>>
>> We are using hibernate framework to persist the data into Postgres
>> SQL DB and data is persisting and committing for all the clients but
>> one of the client data is not inserted into DB.
>
> What is different about that client?
> Ans: In our application data is getting from different nodes(systems) and 
> persisting into Postgres SQL DB but for one of the nodes(system) data is not 
> persisting and sometimes data is persisting for this node also. We have to 
> trace out the transaction why data is not persisting sometimes.

That we knew already. What I was after was whether that particular
node(system) is different in some important way from the others. For instance 
different OS or OS version, different encoding, different location on the 
network, different data it is working, etc.

Define what you have done to trace the path of the transaction.

> Are all the clients passing data through the same instance of the framework?
> Ans: Since it is a monolithic architecture application, it is running on same 
> instance.
> Are you sure that the client is pointed at the correct database?
> Ans: Yes, its pointed to correct database and with same database connection, 
> data is persisting for other nodes.
> Is the log entry below from that client?
> Ans: Yes
>>

--
Adrian Klaver
adrian.kla...@aklaver.com





RE: Query on Postgres SQL transaction

2024-03-27 Thread Bandi, Venkataramana - Dell Team
Hi,

As l already mentioned, for this specific node also data is persisting but 
sometimes(randomly) data is not persisting.

As you mentioned our application doesn't have any restrictions on OS level and 
on network etc.

different OS or OS version, different encoding, different location on the 
network, different data it is working, etc.

Regards,
Venkat

Internal Use - Confidential
-Original Message-
From: Bandi, Venkataramana - Dell Team
Sent: Wednesday, March 27, 2024 10:18 AM
To: Adrian Klaver ; Greg Sabino Mullane 

Cc: pgsql-general@lists.postgresql.org; Kishore, Nanda - Dell Team 
; Alampalli, Kishore 

Subject: RE: Query on Postgres SQL transaction

Hi Adrian,

I will check with the customer on below info for that node and will share you.

For instance different OS or OS version, different encoding, different location 
on the network, different data it is working, etc.

As I already mentioned we have enabled Postgres SQL debug logs to trace out 
this scenario, but it is not helping us.


Regards,
Venkat

-Original Message-
From: Adrian Klaver 
Sent: Monday, March 25, 2024 8:40 PM
To: Bandi, Venkataramana - Dell Team ; Greg 
Sabino Mullane 
Cc: pgsql-general@lists.postgresql.org; Kishore, Nanda - Dell Team 
; Alampalli, Kishore 

Subject: Re: Query on Postgres SQL transaction


[EXTERNAL EMAIL]

On 3/25/24 00:18, Bandi, Venkataramana - Dell Team wrote:
> Hi,
>
> Please find my inline comments for your questions.
>
>
> Regards,
> Venkat
>
>
> Internal Use - Confidential
> -Original Message-
> From: Adrian Klaver 
> Sent: Tuesday, March 19, 2024 9:33 PM
> To: Bandi, Venkataramana - Dell Team
> ; Greg Sabino Mullane
> 
> Cc: pgsql-general@lists.postgresql.org; Kishore, Nanda - Dell Team
> ; Alampalli, Kishore
> 
> Subject: Re: Query on Postgres SQL transaction
>
>
> [EXTERNAL EMAIL]
>
> On 3/19/24 02:18, Bandi, Venkataramana - Dell Team wrote:
>> Hi Greg,
>>
>> We are using hibernate framework to persist the data into Postgres
>> SQL DB and data is persisting and committing for all the clients but
>> one of the client data is not inserted into DB.
>
> What is different about that client?
> Ans: In our application data is getting from different nodes(systems) and 
> persisting into Postgres SQL DB but for one of the nodes(system) data is not 
> persisting and sometimes data is persisting for this node also. We have to 
> trace out the transaction why data is not persisting sometimes.

That we knew already. What I was after was whether that particular
node(system) is different in some important way from the others. For instance 
different OS or OS version, different encoding, different location on the 
network, different data it is working, etc.

Define what you have done to trace the path of the transaction.

> Are all the clients passing data through the same instance of the framework?
> Ans: Since it is a monolithic architecture application, it is running on same 
> instance.
> Are you sure that the client is pointed at the correct database?
> Ans: Yes, its pointed to correct database and with same database connection, 
> data is persisting for other nodes.
> Is the log entry below from that client?
> Ans: Yes
>>

--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Active sessions does not terminated due to statement_timeout

2024-03-27 Thread Ц

I was able to connect to this process using
 
strace -T -p `pgrep -n -u ks-postgres -f "ocb.*FETCH"` -o 
strace_of_active_session -ff
(the -T flag gives us the time the system call has been running at the end of 
each line).
 
If the session had been hanging active for some time, there was only one 
unfinished line in strace:
epoll_wait(4,
 
 
But if we got it a bit earlier, we got this output:
 
***something before process was catched***
 
epoll_wait(4, [{EPOLLOUT, {u32=1938706152, u64=93859859025640}}], 1, -1) = 1 
<0.003515>
sendto(10, "1.211 \320\224\320\232Z200.21121002 \320\240\320\232211)"..., 1968, 
0, NULL, 0) = 1968 <0.68>
sendto(10, " \320\277\320\276\320\273 
\321\204\320\265\320\262\321\200\320\260\320\273\321\214 
\320\235\320\224\320\241 \320\275"..., 8192, 0, NULL, 0) = 8192 <0.51>
pread64(407, "y\221\275\0\0\0\0\0\0\342\275[\0\325\320\244\320\232\320\243 
\320\220\320\264\320\274\320\270\320\275\320"..., 8192, 10354688) = 8192 
<0.65>
sendto(10, "\377\377\0\0\0\270(907.0702.1340380180.111.2"..., 8192, 0, NULL, 0) 
= 8192 <0.66>
pread64(407, "00057476 \320\276\321\202 31.01.2024 \320\262 \321\202.\321"..., 
8192, 10362880) = 8192 <0.44>
sendto(10, "7233430 \320\225\320\264\320\270\320\275\321\213\320\271 
\320\275\320\260\320\273\320\276\320\263\320"..., 8192, 0, NULL, 0) = 8192 
<0.73>
 
***a lot of pread64, sendto***
 
sendto(10, 
"010006300\377\377\377\377\0\0\0X\320\241\320\274\320\276\320\273\320\265\320\275\321\201\320"...,
 8192, 0, NULL, 0) = 8192 <0.23>
pread64(407, "901001\10\2\0\0\30\0\3\0 
y\221\235\0\0\0\0\0\0\34\202[\0\20\2\0\0"..., 8192, 16949248) = 8192 <0.20>
sendto(10, "\321\200\320\265\320\265\321\201\321\202\321\200\321\203 
\342\204\226 24119 \320\276\321\202 19"..., 8192, 0, NULL, 0) = 8192 <0.24>
sendto(10, ") \320\273/\321\20120907234120  \320\221\320\23690741220"..., 8192, 
0, NULL, 0) = 8192 <0.22>
pread64(407, "\320\275\321\201\320\272\320\260 
(\320\243\320\277\321\200\320\260\320\262\320\273\320\265\320\275\320\270\320\265
 \320"..., 8192, 16957440) = 8192 <0.19>
sendto(10, "632\0\0\0\n7707083893\0\0\0\0\0\0\0\t6730020"..., 8192, 0, NULL, 0) 
= 8192 <0.22>
pread64(407, "\1\0\0\30\0\3\0 
y\221\275\0\0\0\0\0\0\356\367[\0\375\320\244\320\232\320\243 \320\220\320"..., 
8192, 16965632) = 8192 <0.20>
sendto(10, "\0\0\0\02403234643667010006300\377\377\377\377\0\0\0d"..., 8192, 0, 
NULL, 0) = 6632 <0.26>
sendto(10, "\320\260\321\206\320\270\320\270 
\320\263\320\276\321\200\320\276\320\264\320\260 
\320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = -1 EAGAIN 
(Resource temporarily unavailable) <0.19>
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) 
<2.855325>
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=2292607, si_uid=2001} ---
kill(2292604, SIGURG)   = 0 <0.23>
rt_sigreturn({mask=[URG]})  = -1 EINTR (Interrupted system call) 
<0.52>
sendto(10, "\320\260\321\206\320\270\320\270 
\320\263\320\276\321\200\320\276\320\264\320\260 
\320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = 1560 
<0.33>
pread64(407, 
"\320\265\321\202.\23\1\202\4\0J\6\210\23\025044525411\0276732208"..., 8192, 
16973824) = 8192 <0.26>
sendto(10, "2.24,4085 \320\276\321\202 21.02.24 \320\272-\321\202 01"..., 8192, 
0, NULL, 0) = 8192 <0.25>
pread64(407, "\276\320\263. 59003491 \320\276\321\202 16.11.2007 \320\235"..., 
8192, 16982016) = 8192 <0.20>
 
***a lot of pread64, sendto***
 
pread64(407, "\260 (\320\234\320\221\320\236\320\243 \"\320\241\320\250 
\342\204\226 11\")+40702"..., 8192, 17375232) = 8192 <0.21>
sendto(10, 
"\274\320\276\320\273\320\265\320\275\321\201\320\272\320\276\320\265 
\320\276\321\202\320\264\320\265\320\273\320\265\320\275"..., 8192, 0, NULL, 0) 
= 4000 <0.25>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, 
NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.19>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 
<0.18>
read(11, 
"\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
1024) = 128 <0.20>
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) 
<4.214523>
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
kill(2292604, SIGURG)   = 0 <0.28>
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, 
it_value={tv_sec=1790, tv_usec=905739}}, NULL) = 0 <0.20>
rt_sigreturn({mask=[URG]})  = -1 EINTR (Interrupted system call) 
<0.18>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, 
NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.23>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 
<0.17>
read(11, 
"\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
1024) = 128 <0.21>
 
***First hang in strace output for  <1790.90547

Re: Active sessions does not terminated due to statement_timeout

2024-03-27 Thread Ц

After terminating session file has been updated with new lines.
 
***content from the previous letter***
 
***That one unfinished line. Freeze for ~6 hours.***
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) 
<21328.371590>
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2340453, si_uid=2001} ---
kill(2292604, SIGURG)   = 0 <0.56>
rt_sigreturn({mask=[URG]})  = -1 EINTR (Interrupted system call) 
<0.27>
write(2, "\0\0;\1|\373\"\0t2024-03-27 18:43:56 MSK"..., 324) = 324 <0.27>
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0 <0.18>
close(64)   = 0 <0.41>
close(218)  = 0 <0.29>
close(220)  = 0 <0.27>
close(219)  = 0 <0.23>
unlink("base/121442067/t4_182045966")   = 0 <0.000520>
unlink("base/121442067/t4_182045966.1") = -1 ENOENT (No such file or directory) 
<0.87>
unlink("base/121442067/t4_182045966_fsm") = -1 ENOENT (No such file or 
directory) <0.60>
unlink("base/121442067/t4_182045966_vm") = -1 ENOENT (No such file or 
directory) <0.53>
unlink("base/121442067/t4_182045966_init") = -1 ENOENT (No such file or 
directory) <0.61>
unlink("base/121442067/t4_182045963")   = 0 <0.000454>
 
***a lot of unlink***
 
unlink("base/121442067/t4_182045892")   = 0 <0.46>
unlink("base/121442067/t4_182045892.1") = -1 ENOENT (No such file or directory) 
<0.27>
unlink("base/121442067/t4_182045892_fsm") = -1 ENOENT (No such file or 
directory) <0.23>
unlink("base/121442067/t4_182045892_vm") = -1 ENOENT (No such file or 
directory) <0.21>
unlink("base/121442067/t4_182045892_init") = -1 ENOENT (No such file or 
directory) <0.27>
close(407)  = 0 <0.32>
stat("base/pgsql_tmp/pgsql_tmp2292604.0", {st_mode=S_IFREG|0600, 
st_size=19141117, ...}) = 0 <0.24>
unlink("base/pgsql_tmp/pgsql_tmp2292604.0") = 0 <0.010861>
sendto(9, "\24\0\0\0\30\0\0\0\23\17=\7]U\0\0\375\21$\1\0\0\0\0", 24, 0, NULL, 
0) = 24 <0.67>
write(2, "\0\0\265\0|\373\"\0t2024-03-27 18:43:56 MSK"..., 190) = 190 <0.30>
brk(0x555d798fa000) = 0x555d798fa000 <0.002482>
brk(0x555d78902000) = 0x555d78902000 <0.003686>
brk(0x555d76f75000) = 0x555d76f75000 <0.008538>
openat(AT_FDCWD, "global/pg_filenode.map", O_RDONLY) = 64 <0.43>
read(64, 
"\27'Y\0+\0\0\0\356\4\0\0r\316Y\10\224\v\0\0y\316Y\10\275\4\0\0\177\316Y\10"...,
 512) = 512 <0.22>
close(64)   = 0 <0.19>
openat(AT_FDCWD, "base/121442067/pg_filenode.map", O_RDONLY) = 64 <0.27>
read(64, 
"\27'Y\0\21\0\0\0\353\4\0\0\3\26?\10\341\4\0\0\334\32?\10\347\4\0\0\347\4\0\0"...,
 512) = 512 <0.18>
close(64)   = 0 <0.18>
close(20)   = 0 <0.19>
 
***a lot of close***
 
close(138)  = 0 <0.19>
openat(AT_FDCWD, "base/121442067/138352131", O_RDWR) = 5 <0.52>
lseek(5, 0, SEEK_END)   = 149766144 <0.18>
openat(AT_FDCWD, "base/121442067/2601", O_RDWR) = 6 <0.29>
lseek(6, 0, SEEK_END)   = 8192 <0.17>
kill(21, SIGURG)    = 0 <0.35>
sendto(9, "\31\0\0\0\20\0\0\0\23\17=\7\4\0\0\0", 16, 0, NULL, 0) = 16 <0.78>
sendto(9, 
"\2\0\0\0\300\3\0\0\23\17=\7\10\0\0\0\3\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960, 
0, NULL, 0) = 960 <0.36>
sendto(9, 
"\2\0\0\0\300\3\0\0\23\17=\7\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 960, 
0, NULL, 0) = 960 <0.41>
 
***a lot of sendto***
 
sendto(9, 
"\20\0\0\0H\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0\0\0\0\0"..., 72, 0, 
NULL, 0) = 72 <0.30>
write(2, "\0\0\322\0|\373\"\0t2024-03-27 18:43:56 MSK"..., 219) = 219 <0.50>
futex(0x7fd5aacbf6c4, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.20>
exit_group(1)   = ?
+++ exited with 1 +++
 
PS: I guess "No such file or directory" messages due to execution of strace on 
host, and postgres process is executing inside docker.
>Среда, 27 марта 2024, 18:39 +03:00 от Ц :
> 
>I was able to connect to this process using
> 
>strace -T -p `pgrep -n -u ks-postgres -f "ocb.*FETCH"` -o 
>strace_of_active_session -ff
>(the -T flag gives us the time the system call has been running at the end of 
>each line).
> 
>If the session had been hanging active for some time, there was only one 
>unfinished line in strace:
>epoll_wait(4,
> 
> 
>But if we got it a bit earlier, we got this output:
> 
>***something before process was catched***
> 
>epoll_wait(4, [{EPOLLOUT, {u32=1938706152, u64=93859859025640}}], 1, -1) = 1 
><0.003515>
>sendto(10, "1.211 \320\224\320\232Z200.21121002 \320\240\320\232211)"..., 
>1968, 0, NULL, 0) = 1968 <0.68>
>sendto(10, " \320\277\320\276\320\273 
>\321\204\320\265\320\262\321\200\320\260\320\273\321\214 
>\320\235\320\224\320\241 \320\275"..., 8192, 0, NULL, 0) = 8192 <0

Re: Query on Postgres SQL transaction

2024-03-27 Thread Adrian Klaver

On 3/27/24 04:29, Bandi, Venkataramana - Dell Team wrote:

Hi,

As l already mentioned, for this specific node also data is persisting but 
sometimes(randomly) data is not persisting.


How do you know which data is not persisting?



As you mentioned our application doesn't have any restrictions on OS level and 
on network etc.

different OS or OS version, different encoding, different location on the 
network, different data it is working, etc.


I don't understand what the above is saying. Do you mean there are 
differences in these attributes between the nodes or no differences?


Also please do not top post, use either bottom or inline posting per:

https://en.wikipedia.org/wiki/Posting_style



Regards,
Venkat




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Jeff Ross

On 3/20/24 17:04, Tom Lane wrote:


Adrian Klaver  writes:

Haven't had a chance to go through this yet. I'm going to say though
that Tom Lane is looking for a shorter generic case that anyone could
run on their system.

Yeah, it's a long way from that trigger function definition to a
working (i.e. failing) example.  Shortening the trigger might help by
eliminating some parts of the infrastructure that would need to be
shown --- but nobody's going to try to reverse-engineer all that.

regards, tom lane


It took some digging but I've found a very simple fix to this.

Somewhere (sorry, can't find it again) I read that a postgresql cursor 
is sometimes referenced as "portal".  This was when I was still pretty 
sure that this was a psycopg2 issue.


Further testing ruled that out--I wasn't getting the error on the 
psycopg2 commit statements, I was getting the error when the plpython3u 
function itself exits and tries to commit.


I only use one plpython3u cursor in that function.  The plpython docs say:

"Cursors are automatically disposed of. But if you want to explicitly 
release all resources held by a cursor, use the |close| method. Once 
closed, a cursor cannot be fetched from anymore."


https://www.postgresql.org/docs/15/plpython-database.html#id-1.8.11.14.3

Perhaps "pinned" in the error message means "open"?

I added a cursor.close() as the last line called in that function and it 
works again.


I haven't been able to come up with a test case that throws the same 
error, though, so I consider this a solution to what is very likely an 
odd corner case.


Jeff


Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Tom Lane
Jeff Ross  writes:
> I only use one plpython3u cursor in that function.  The plpython docs say:

> "Cursors are automatically disposed of. But if you want to explicitly 
> release all resources held by a cursor, use the |close| method. Once 
> closed, a cursor cannot be fetched from anymore."

> https://www.postgresql.org/docs/15/plpython-database.html#id-1.8.11.14.3

> Perhaps "pinned" in the error message means "open"?

No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).

> I added a cursor.close() as the last line called in that function and it 
> works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

regards, tom lane




Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Jeff Ross

On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).


I added a cursor.close() as the last line called in that function and it
works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to share 
a solution to this apparently rare error with the community.


Jeff


Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Rob Sargent



On 3/27/24 17:05, Jeff Ross wrote:


On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).


I added a cursor.close() as the last line called in that function and it
works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to 
share a solution to this apparently rare error with the community.


Jeff

My read of Tom's reply suggests you still have work to do to find the 
other "reference" holding on to your cursor.

Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Adrian Klaver

On 3/27/24 16:35, Rob Sargent wrote:



On 3/27/24 17:05, Jeff Ross wrote:


On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).


I added a cursor.close() as the last line called in that function and it
works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to 
share a solution to this apparently rare error with the community.


Jeff

My read of Tom's reply suggests you still have work to do to find the 
other "reference" holding on to your cursor.


I would start with:

def logging(comment):
global database
<...>

--
Adrian Klaver
adrian.kla...@aklaver.com





Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Jeff Ross



On 3/27/24 17:41, Adrian Klaver wrote:

On 3/27/24 16:35, Rob Sargent wrote:



On 3/27/24 17:05, Jeff Ross wrote:


On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).

I added a cursor.close() as the last line called in that function 
and it

works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

    regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to 
share a solution to this apparently rare error with the community.


Jeff

My read of Tom's reply suggests you still have work to do to find the 
other "reference" holding on to your cursor.


I would start with:

def logging(comment):
    global database
    <...>


Already removed that--thanks, though.




Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Jeff Ross


On 3/27/24 17:35, Rob Sargent wrote:



On 3/27/24 17:05, Jeff Ross wrote:


On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).


I added a cursor.close() as the last line called in that function and it
works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to 
share a solution to this apparently rare error with the community.


Jeff

My read of Tom's reply suggests you still have work to do to find the 
other "reference" holding on to your cursor.


Yes, my read was the same.

There are exactly 3 references to that cursor now that I added the 
close() at the end.


Here are the first 2 (cursor renamed from the code I posted):

    plpy_cursor = plpy.cursor(schemas_query)
    while True:
    schema_rows = plpy_cursor.fetch(100)

The last is:

    plpy_cursor.close()

I don't know how to proceed further.


Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Adrian Klaver

On 3/27/24 18:00, Jeff Ross wrote:


On 3/27/24 17:35, Rob Sargent wrote:



On 3/27/24 17:05, Jeff Ross wrote:


On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).


I added a cursor.close() as the last line called in that function and it
works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to 
share a solution to this apparently rare error with the community.


Jeff

My read of Tom's reply suggests you still have work to do to find the 
other "reference" holding on to your cursor.


Yes, my read was the same.

There are exactly 3 references to that cursor now that I added the 
close() at the end.


Here are the first 2 (cursor renamed from the code I posted):

     plpy_cursor = plpy.cursor(schemas_query)
     while True:
     schema_rows = plpy_cursor.fetch(100)


If the above is the complete while loop how you expect it to break out 
of the loop?


Or did you do per Postgres docs?:

https://www.postgresql.org/docs/current/plpython-database.html

cursor = plpy.cursor("select num from largetable")
while True:
rows = cursor.fetch(batch_size)
if not rows:
break
for row in rows:
if row['num'] % 2:
odd += 1




The last is:

     plpy_cursor.close()

I don't know how to proceed further.



--
Adrian Klaver
adrian.kla...@aklaver.com