RE: Query on Postgres SQL transaction
Hi Adrian,

I will check with the customer on the below info for that node (different OS or OS version, different encoding, different location on the network, different data it is working with, etc.) and will share it with you.

As I already mentioned, we have enabled PostgreSQL debug logs to trace this scenario, but it is not helping us.

Regards,
Venkat

-----Original Message-----
From: Adrian Klaver
Sent: Monday, March 25, 2024 8:40 PM
To: Bandi, Venkataramana - Dell Team; Greg Sabino Mullane
Cc: pgsql-general@lists.postgresql.org; Kishore, Nanda - Dell Team; Alampalli, Kishore
Subject: Re: Query on Postgres SQL transaction

[EXTERNAL EMAIL]

On 3/25/24 00:18, Bandi, Venkataramana - Dell Team wrote:
> Hi,
>
> Please find my inline comments for your questions.
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Adrian Klaver
> Sent: Tuesday, March 19, 2024 9:33 PM
> To: Bandi, Venkataramana - Dell Team; Greg Sabino Mullane
> Cc: pgsql-general@lists.postgresql.org; Kishore, Nanda - Dell Team; Alampalli, Kishore
> Subject: Re: Query on Postgres SQL transaction
>
> [EXTERNAL EMAIL]
>
> On 3/19/24 02:18, Bandi, Venkataramana - Dell Team wrote:
>> Hi Greg,
>>
>> We are using the Hibernate framework to persist data into the PostgreSQL DB. Data is persisting and committing for all the clients, but for one of the clients the data is not inserted into the DB.
>
> What is different about that client?
> Ans: In our application, data comes in from different nodes (systems) and is persisted into the PostgreSQL DB, but for one of the nodes the data is not persisting, and sometimes the data does persist for this node as well. We have to trace the transaction to find out why the data sometimes does not persist.

That we knew already. What I was after was whether that particular node (system) is different in some important way from the others. For instance different OS or OS version, different encoding, different location on the network, different data it is working with, etc.

Define what you have done to trace the path of the transaction.

> Are all the clients passing data through the same instance of the framework?
> Ans: Since it is a monolithic architecture application, it is running on the same instance.
>
> Are you sure that the client is pointed at the correct database?
> Ans: Yes, it is pointed at the correct database, and with the same database connection data is persisting for the other nodes.
>
> Is the log entry below from that client?
> Ans: Yes

--
Adrian Klaver
adrian.kla...@aklaver.com
RE: Query on Postgres SQL transaction
Hi,

As I already mentioned, for this specific node data is also persisting, but sometimes (randomly) it is not.

As you mentioned, our application doesn't have any restrictions at the OS level, on the network, etc.: different OS or OS version, different encoding, different location on the network, different data it is working with, etc.

Regards,
Venkat
Re: Active sessions does not terminated due to statement_timeout
I was able to connect to this process using

strace -T -p `pgrep -n -u ks-postgres -f "ocb.*FETCH"` -o strace_of_active_session -ff

(the -T flag gives us the time the system call has been running at the end of each line).

If the session had been hanging active for some time, there was only one unfinished line in strace:

epoll_wait(4,

But if we got it a bit earlier, we got this output:

***something before the process was caught***
epoll_wait(4, [{EPOLLOUT, {u32=1938706152, u64=93859859025640}}], 1, -1) = 1 <0.003515>
sendto(10, "1.211 \320\224\320\232Z200.21121002 \320\240\320\232211)"..., 1968, 0, NULL, 0) = 1968 <0.68>
sendto(10, " \320\277\320\276\320\273 \321\204\320\265\320\262\321\200\320\260\320\273\321\214 \320\235\320\224\320\241 \320\275"..., 8192, 0, NULL, 0) = 8192 <0.51>
pread64(407, "y\221\275\0\0\0\0\0\0\342\275[\0\325\320\244\320\232\320\243 \320\220\320\264\320\274\320\270\320\275\320"..., 8192, 10354688) = 8192 <0.65>
sendto(10, "\377\377\0\0\0\270(907.0702.1340380180.111.2"..., 8192, 0, NULL, 0) = 8192 <0.66>
pread64(407, "00057476 \320\276\321\202 31.01.2024 \320\262 \321\202.\321"..., 8192, 10362880) = 8192 <0.44>
sendto(10, "7233430 \320\225\320\264\320\270\320\275\321\213\320\271 \320\275\320\260\320\273\320\276\320\263\320"..., 8192, 0, NULL, 0) = 8192 <0.73>
***a lot of pread64, sendto***
sendto(10, "010006300\377\377\377\377\0\0\0X\320\241\320\274\320\276\320\273\320\265\320\275\321\201\320"..., 8192, 0, NULL, 0) = 8192 <0.23>
pread64(407, "901001\10\2\0\0\30\0\3\0 y\221\235\0\0\0\0\0\0\34\202[\0\20\2\0\0"..., 8192, 16949248) = 8192 <0.20>
sendto(10, "\321\200\320\265\320\265\321\201\321\202\321\200\321\203 \342\204\226 24119 \320\276\321\202 19"..., 8192, 0, NULL, 0) = 8192 <0.24>
sendto(10, ") \320\273/\321\20120907234120 \320\221\320\23690741220"..., 8192, 0, NULL, 0) = 8192 <0.22>
pread64(407, "\320\275\321\201\320\272\320\260 (\320\243\320\277\321\200\320\260\320\262\320\273\320\265\320\275\320\270\320\265 \320"..., 8192, 16957440) = 8192 <0.19>
sendto(10, "632\0\0\0\n7707083893\0\0\0\0\0\0\0\t6730020"..., 8192, 0, NULL, 0) = 8192 <0.22>
pread64(407, "\1\0\0\30\0\3\0 y\221\275\0\0\0\0\0\0\356\367[\0\375\320\244\320\232\320\243 \320\220\320"..., 8192, 16965632) = 8192 <0.20>
sendto(10, "\0\0\0\02403234643667010006300\377\377\377\377\0\0\0d"..., 8192, 0, NULL, 0) = 6632 <0.26>
sendto(10, "\320\260\321\206\320\270\320\270 \320\263\320\276\321\200\320\276\320\264\320\260 \320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.19>
epoll_wait(4, 0x555d738e4f30, 1, -1) = -1 EINTR (Interrupted system call) <2.855325>
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=2292607, si_uid=2001} ---
kill(2292604, SIGURG) = 0 <0.23>
rt_sigreturn({mask=[URG]}) = -1 EINTR (Interrupted system call) <0.52>
sendto(10, "\320\260\321\206\320\270\320\270 \320\263\320\276\321\200\320\276\320\264\320\260 \320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = 1560 <0.33>
pread64(407, "\320\265\321\202.\23\1\202\4\0J\6\210\23\025044525411\0276732208"..., 8192, 16973824) = 8192 <0.26>
sendto(10, "2.24,4085 \320\276\321\202 21.02.24 \320\272-\321\202 01"..., 8192, 0, NULL, 0) = 8192 <0.25>
pread64(407, "\276\320\263. 59003491 \320\276\321\202 16.11.2007 \320\235"..., 8192, 16982016) = 8192 <0.20>
***a lot of pread64, sendto***
pread64(407, "\260 (\320\234\320\221\320\236\320\243 \"\320\241\320\250 \342\204\226 11\")+40702"..., 8192, 17375232) = 8192 <0.21>
sendto(10, "\274\320\276\320\273\320\265\320\275\321\201\320\272\320\276\320\265 \320\276\321\202\320\264\320\265\320\273\320\265\320\275"..., 8192, 0, NULL, 0) = 4000 <0.25>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.19>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.18>
read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.20>
epoll_wait(4, 0x555d738e4f30, 1, -1) = -1 EINTR (Interrupted system call) <4.214523>
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
kill(2292604, SIGURG) = 0 <0.28>
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1790, tv_usec=905739}}, NULL) = 0 <0.20>
rt_sigreturn({mask=[URG]}) = -1 EINTR (Interrupted system call) <0.18>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.23>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.17>
read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.21>
***First hang in strace output for <1790.90547
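A minimal sketch (not from the original thread) for sifting through output like the above: it scans the per-PID files that the strace invocation writes (assumed to be named strace_of_active_session.<pid>, which is what -o combined with -ff produces) and prints the system calls with the longest -T durations, so multi-hour waits like the long epoll_wait stand out immediately.

    # Sketch: report the slowest syscalls recorded by strace -T, assuming each
    # line ends with the duration in angle brackets, e.g. "... = 1 <0.003515>".
    import glob
    import re

    DURATION_RE = re.compile(r'<(\d+\.\d+)>\s*$')

    def slowest_calls(pattern="strace_of_active_session.*", top=10):
        timed = []
        for path in glob.glob(pattern):
            with open(path, errors="replace") as f:
                for line in f:
                    m = DURATION_RE.search(line)
                    if m:
                        timed.append((float(m.group(1)), path, line.strip()))
        # Longest durations first.
        return sorted(timed, reverse=True)[:top]

    for seconds, path, line in slowest_calls():
        print("%12.6fs  %s: %s" % (seconds, path, line[:120]))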
Re: Active sessions does not terminated due to statement_timeout
After terminating the session, the file was updated with new lines.

***content from the previous message***
***That one unfinished line. Freeze for ~6 hours.***
epoll_wait(4, 0x555d738e4f30, 1, -1) = -1 EINTR (Interrupted system call) <21328.371590>
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2340453, si_uid=2001} ---
kill(2292604, SIGURG) = 0 <0.56>
rt_sigreturn({mask=[URG]}) = -1 EINTR (Interrupted system call) <0.27>
write(2, "\0\0;\1|\373\"\0t2024-03-27 18:43:56 MSK"..., 324) = 324 <0.27>
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0 <0.18>
close(64) = 0 <0.41>
close(218) = 0 <0.29>
close(220) = 0 <0.27>
close(219) = 0 <0.23>
unlink("base/121442067/t4_182045966") = 0 <0.000520>
unlink("base/121442067/t4_182045966.1") = -1 ENOENT (No such file or directory) <0.87>
unlink("base/121442067/t4_182045966_fsm") = -1 ENOENT (No such file or directory) <0.60>
unlink("base/121442067/t4_182045966_vm") = -1 ENOENT (No such file or directory) <0.53>
unlink("base/121442067/t4_182045966_init") = -1 ENOENT (No such file or directory) <0.61>
unlink("base/121442067/t4_182045963") = 0 <0.000454>
***a lot of unlink***
unlink("base/121442067/t4_182045892") = 0 <0.46>
unlink("base/121442067/t4_182045892.1") = -1 ENOENT (No such file or directory) <0.27>
unlink("base/121442067/t4_182045892_fsm") = -1 ENOENT (No such file or directory) <0.23>
unlink("base/121442067/t4_182045892_vm") = -1 ENOENT (No such file or directory) <0.21>
unlink("base/121442067/t4_182045892_init") = -1 ENOENT (No such file or directory) <0.27>
close(407) = 0 <0.32>
stat("base/pgsql_tmp/pgsql_tmp2292604.0", {st_mode=S_IFREG|0600, st_size=19141117, ...}) = 0 <0.24>
unlink("base/pgsql_tmp/pgsql_tmp2292604.0") = 0 <0.010861>
sendto(9, "\24\0\0\0\30\0\0\0\23\17=\7]U\0\0\375\21$\1\0\0\0\0", 24, 0, NULL, 0) = 24 <0.67>
write(2, "\0\0\265\0|\373\"\0t2024-03-27 18:43:56 MSK"..., 190) = 190 <0.30>
brk(0x555d798fa000) = 0x555d798fa000 <0.002482>
brk(0x555d78902000) = 0x555d78902000 <0.003686>
brk(0x555d76f75000) = 0x555d76f75000 <0.008538>
openat(AT_FDCWD, "global/pg_filenode.map", O_RDONLY) = 64 <0.43>
read(64, "\27'Y\0+\0\0\0\356\4\0\0r\316Y\10\224\v\0\0y\316Y\10\275\4\0\0\177\316Y\10"..., 512) = 512 <0.22>
close(64) = 0 <0.19>
openat(AT_FDCWD, "base/121442067/pg_filenode.map", O_RDONLY) = 64 <0.27>
read(64, "\27'Y\0\21\0\0\0\353\4\0\0\3\26?\10\341\4\0\0\334\32?\10\347\4\0\0\347\4\0\0"..., 512) = 512 <0.18>
close(64) = 0 <0.18>
close(20) = 0 <0.19>
***a lot of close***
close(138) = 0 <0.19>
openat(AT_FDCWD, "base/121442067/138352131", O_RDWR) = 5 <0.52>
lseek(5, 0, SEEK_END) = 149766144 <0.18>
openat(AT_FDCWD, "base/121442067/2601", O_RDWR) = 6 <0.29>
lseek(6, 0, SEEK_END) = 8192 <0.17>
kill(21, SIGURG) = 0 <0.35>
sendto(9, "\31\0\0\0\20\0\0\0\23\17=\7\4\0\0\0", 16, 0, NULL, 0) = 16 <0.78>
sendto(9, "\2\0\0\0\300\3\0\0\23\17=\7\10\0\0\0\3\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960, 0, NULL, 0) = 960 <0.36>
sendto(9, "\2\0\0\0\300\3\0\0\23\17=\7\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 960, 0, NULL, 0) = 960 <0.41>
***a lot of sendto***
sendto(9, "\20\0\0\0H\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0\0\0\0\0"..., 72, 0, NULL, 0) = 72 <0.30>
write(2, "\0\0\322\0|\373\"\0t2024-03-27 18:43:56 MSK"..., 219) = 219 <0.50>
futex(0x7fd5aacbf6c4, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.20>
exit_group(1) = ?
+++ exited with 1 +++

PS: I guess the "No such file or directory" messages are due to strace being executed on the host, while the postgres process is running inside Docker.
Re: Query on Postgres SQL transaction
On 3/27/24 04:29, Bandi, Venkataramana - Dell Team wrote:
> Hi,
>
> As I already mentioned, for this specific node data is also persisting, but sometimes (randomly) it is not.

How do you know which data is not persisting?

> As you mentioned, our application doesn't have any restrictions at the OS level, on the network, etc.: different OS or OS version, different encoding, different location on the network, different data it is working with, etc.

I don't understand what the above is saying. Do you mean there are differences in these attributes between the nodes, or no differences?

Also please do not top post; use either bottom or inline posting per:
https://en.wikipedia.org/wiki/Posting_style

> Regards,
> Venkat

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/20/24 17:04, Tom Lane wrote:
> Adrian Klaver writes:
>> Haven't had a chance to go through this yet. I'm going to say though that Tom Lane is looking for a shorter generic case that anyone could run on their system.
>
> Yeah, it's a long way from that trigger function definition to a working (i.e. failing) example. Shortening the trigger might help by eliminating some parts of the infrastructure that would need to be shown --- but nobody's going to try to reverse-engineer all that.
>
> regards, tom lane

It took some digging, but I've found a very simple fix for this.

Somewhere (sorry, can't find it again) I read that a PostgreSQL cursor is sometimes referred to as a "portal". This was when I was still pretty sure that this was a psycopg2 issue. Further testing ruled that out: I wasn't getting the error on the psycopg2 commit statements, I was getting the error when the plpython3u function itself exits and tries to commit.

I only use one plpython3u cursor in that function. The plpython docs say:

"Cursors are automatically disposed of. But if you want to explicitly release all resources held by a cursor, use the close method. Once closed, a cursor cannot be fetched from anymore."

https://www.postgresql.org/docs/15/plpython-database.html#id-1.8.11.14.3

Perhaps "pinned" in the error message means "open"?

I added a cursor.close() as the last line called in that function and it works again. I haven't been able to come up with a test case that throws the same error, though, so I consider this a solution to what is very likely an odd corner case.

Jeff
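For readers wanting the shape of that fix, here is a minimal sketch of a plpython3u function body using the pattern Jeff describes; the query and names are made up, not taken from his trigger function.

    # Minimal sketch of the pattern described above (hypothetical query and
    # names, not the actual trigger function): fetch through a plpy cursor in
    # batches and close it explicitly before the function returns, so no open
    # portal is left behind when the commit happens.
    rows_seen = 0
    cur = plpy.cursor("SELECT schema_name FROM information_schema.schemata")
    try:
        while True:
            batch = cur.fetch(100)
            if not batch:
                break
            rows_seen += len(batch)
    finally:
        # Release the underlying portal explicitly rather than relying on
        # Python garbage collection.
        cur.close()
    return rows_seen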
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
Jeff Ross writes:
> I only use one plpython3u cursor in that function. The plpython docs say:
>
> "Cursors are automatically disposed of. But if you want to explicitly release all resources held by a cursor, use the close method. Once closed, a cursor cannot be fetched from anymore."
>
> https://www.postgresql.org/docs/15/plpython-database.html#id-1.8.11.14.3
>
> Perhaps "pinned" in the error message means "open"?

No, it means "pinned" ... but I see that plpython pins the portal underlying any PLyCursor object it creates. Most of our PLs do that too, to prevent a portal from disappearing under them (e.g. if you were to try to close the portal directly from SQL rather than via whatever mechanism the PL wants you to use).

> I added a cursor.close() as the last line called in that function and it works again.

It looks to me like PLy_cursor_close does pretty much exactly the same cleanup as PLy_cursor_dealloc, including unpinning and closing the underlying portal. I'm far from a Python expert, but I suspect that the docs you quote intend to say "cursors are disposed of when Python garbage-collects them", and that the reason your code is failing is that there's still a reference to the PLyCursor somewhere after the plpython function exits, perhaps in a Python global variable.

regards, tom lane
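A hypothetical illustration of the failure mode Tom describes (the variable name is invented and this is a sketch of the mechanism, not Jeff's code): a reference that outlives the function call keeps the PLyCursor, and therefore its pinned portal, alive.

    # Hypothetical plpython3u function body. Storing the cursor in a global
    # keeps a reference to the PLyCursor after the function returns, so Python
    # does not garbage-collect it and the underlying portal stays pinned,
    # which can then trip "cannot commit while a portal is pinned" at commit.
    global last_cursor            # persists in the function's globals for the session
    last_cursor = plpy.cursor("SELECT 1")
    rows = last_cursor.fetch(10)
    # Calling last_cursor.close() (or dropping the reference) would unpin the
    # portal before the commit happens.
    return len(rows)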
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 15:44, Tom Lane wrote:
>> Perhaps "pinned" in the error message means "open"?
>
> No, it means "pinned" ... but I see that plpython pins the portal underlying any PLyCursor object it creates. Most of our PLs do that too, to prevent a portal from disappearing under them (e.g. if you were to try to close the portal directly from SQL rather than via whatever mechanism the PL wants you to use).
>
>> I added a cursor.close() as the last line called in that function and it works again.
>
> It looks to me like PLy_cursor_close does pretty much exactly the same cleanup as PLy_cursor_dealloc, including unpinning and closing the underlying portal. I'm far from a Python expert, but I suspect that the docs you quote intend to say "cursors are disposed of when Python garbage-collects them", and that the reason your code is failing is that there's still a reference to the PLyCursor somewhere after the plpython function exits, perhaps in a Python global variable.
>
> regards, tom lane

Thank you for your reply, as always, Tom!

Debugging at this level might well be over my pay grade ;-) I'm just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community.

Jeff
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 17:05, Jeff Ross wrote:
> On 3/27/24 15:44, Tom Lane wrote:
>> I'm far from a Python expert, but I suspect that the docs you quote intend to say "cursors are disposed of when Python garbage-collects them", and that the reason your code is failing is that there's still a reference to the PLyCursor somewhere after the plpython function exits, perhaps in a Python global variable.
>
> Thank you for your reply, as always, Tom!
>
> Debugging at this level might well be over my pay grade ;-) I'm just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community.
>
> Jeff

My read of Tom's reply suggests you still have work to do to find the other "reference" holding on to your cursor.
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 16:35, Rob Sargent wrote:
> On 3/27/24 17:05, Jeff Ross wrote:
>> Debugging at this level might well be over my pay grade ;-) I'm just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community.
>
> My read of Tom's reply suggests you still have work to do to find the other "reference" holding on to your cursor.

I would start with:

def logging(comment):
    global database
    <...>

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 17:41, Adrian Klaver wrote:
> On 3/27/24 16:35, Rob Sargent wrote:
>> My read of Tom's reply suggests you still have work to do to find the other "reference" holding on to your cursor.
>
> I would start with:
>
> def logging(comment):
>     global database
>     <...>

Already removed that--thanks, though.
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 17:35, Rob Sargent wrote:
> My read of Tom's reply suggests you still have work to do to find the other "reference" holding on to your cursor.

Yes, my read was the same.

There are exactly 3 references to that cursor now that I added the close() at the end. Here are the first 2 (cursor renamed from the code I posted):

    plpy_cursor = plpy.cursor(schemas_query)

    while True:
        schema_rows = plpy_cursor.fetch(100)

The last is:

    plpy_cursor.close()

I don't know how to proceed further.
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 18:00, Jeff Ross wrote:
> Yes, my read was the same.
>
> There are exactly 3 references to that cursor now that I added the close() at the end. Here are the first 2 (cursor renamed from the code I posted):
>
>     plpy_cursor = plpy.cursor(schemas_query)
>
>     while True:
>         schema_rows = plpy_cursor.fetch(100)

If the above is the complete while loop, how do you expect it to break out of the loop? Or did you do it per the Postgres docs?:

https://www.postgresql.org/docs/current/plpython-database.html

    cursor = plpy.cursor("select num from largetable")
    while True:
        rows = cursor.fetch(batch_size)
        if not rows:
            break
        for row in rows:
            if row['num'] % 2:
                odd += 1

> The last is:
>
>     plpy_cursor.close()
>
> I don't know how to proceed further.

--
Adrian Klaver
adrian.kla...@aklaver.com
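Putting the two snippets together, a sketch of how the posted loop might look with the documented break condition plus the explicit close (schemas_query and the per-row work are placeholders, not Jeff's actual code):

    plpy_cursor = plpy.cursor(schemas_query)
    try:
        while True:
            schema_rows = plpy_cursor.fetch(100)
            if not schema_rows:
                break
            for schema_row in schema_rows:
                pass  # process each schema row here
    finally:
        # Close even if processing raises, so the portal is always released.
        plpy_cursor.close()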