Re: Seq Scan because of stats or because of cast?

2023-03-14 Thread Dominique Devienne
On Mon, Mar 13, 2023 at 2:53 PM Tom Lane  wrote:

> regards, tom lane
>

Thank you very much Tom. Very informative.


Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause

2023-03-14 Thread Alban Hertroys


> On 7 Mar 2023, at 4:11, David G. Johnston  wrote:
> 
> On Mon, Mar 6, 2023 at 7:51 PM David Rowley  wrote:
> On Tue, 7 Mar 2023 at 12:40, Tom Lane  wrote:
> >
> > Ben Clements  writes:
> > > As shown above, the following calculated column can bring in the city 
> > > name,
> > > even though the city name isn't in the GROUP BY:
> > >max(city) keep (dense_rank first order by population desc)
> >
> > You haven't really explained what this does, let alone why it can't
> > be implemented with existing features such as FILTER and ORDER BY.
> 
> (It wasn't clear to me until I watched the youtube video.) 
> 
> Likely KEEP is more flexible than just the given example but I think
> that something similar to the example given could be done by inventing
> a TOP() and BOTTOM() aggregate. Then you could write something like:
> 
> select
>country,
>count(*),
>max(population),
>bottom(city, population)
> from
>cities
> group by
>country
> having
>count(*) > 1
> 
> the transfn for bottom() would need to remember the city and the
> population for the highest yet seen value of the 2nd arg.
> 
> BOTTOM() remembers the highest value?
>  
> Where this wouldn't work would be if multiple columns were
> required to tiebreak the sort.
> 
> TOP(city, ROW(population, land_area)) ?

What should be the expected behaviour on a tie though?

Say that we count the number of districts or airfields or train stations per 
city and query for the one(s) with the most or least of them? There could well 
be multiple cities with the same max number, and there will be many cities with 
the same minimum number (namely 0).

Should the result be just the first of the maximums (or minimums) through some 
selection criterion (such as their alphabetical order), should it give each 
of the tied results, or should there be a means to define that behaviour?

I suppose a combination with FIRST and LAST could solve that issue?

Regards,
Alban Hertroys
--
There is always an exception to always.








Re: Uppercase version of ß desired

2023-03-14 Thread Thorsten Glaser
On Tue, 14 Mar 2023, Celia McInnis wrote:

>uc_alphabet = lc_alphabet.replace('ß', 'ẞ').upper()

That’s probably for the best. The uppercase Eszett was only added
to Unicode under the rule that the lowercase Eszett’s case rules
are kept unchanged, and the uppercase form is normally considered
something only ever typed manually.

Of course, the grammar rules about uppercasing ß have since changed,
but since there are two valid ways, choosing is the application’s duty.
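For the record, Python 3 shows the two options side by side (a small illustration; the pre-substitution trick is from Celia's snippet above):

```python
# Default Unicode case mapping: ß uppercases to "SS", so the string
# grows and upper()/lower() no longer round-trip.
assert "straße".upper() == "STRASSE"

# An application that prefers the capital Eszett must substitute it
# explicitly before uppercasing (U+1E9E uppercases to itself).
assert "straße".replace("ß", "ẞ").upper() == "STRAẞE"

# Lowercasing the capital Eszett does give ß back.
assert "ẞ".lower() == "ß"
```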

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)




Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause

2023-03-14 Thread David Rowley
On Tue, 14 Mar 2023 at 21:01, Alban Hertroys  wrote:
> > On 7 Mar 2023, at 4:11, David G. Johnston  
> > wrote:
> > TOP(city, ROW(population, land_area)) ?
>
> What should be the expected behaviour on a tie though?

Undefined.  Same as having an ORDER BY on a column that's not unique.
The sort implementation effectively defines the order.  David did
specify the ROW() idea as a means to add additional columns so that
the tiebreak could be done with some other deciding factor.

> Should the result be just the first of the maximums (or minimums) through 
> some selection criterion (such as their alphabetical order), should it give 
> each of the tied results, or should there be a means to define that behaviour?

It's an aggregate function. There's only 1 return value per group. If
you didn't want that you'd likely want to use a window function such
as rank() and add an outer query and filter out anything that's not
rank 1.
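Setting the TOP()/BOTTOM() naming question aside, the proposed transition function is just a one-pass "value at the min/max of the sort key". A minimal Python sketch of that state (hypothetical, not an actual PostgreSQL aggregate; note that ties keep whichever row was seen first, i.e. the order is undefined):

```python
def value_at_min_max(rows, value_key, sort_key):
    """One pass, O(N) comparisons: track the value paired with the
    lowest and highest sort key seen so far (the aggregate state)."""
    lo = hi = None  # (sort_key, value) pairs
    for row in rows:
        k, v = sort_key(row), value_key(row)
        if lo is None or k < lo[0]:   # strict '<': first-seen row wins ties
            lo = (k, v)
        if hi is None or k > hi[0]:
            hi = (k, v)
    return lo[1], hi[1]

cities = [("Tokyo", 37_400_000), ("Delhi", 31_000_000), ("Lagos", 15_000_000)]
least, most = value_at_min_max(cities, value_key=lambda r: r[0],
                               sort_key=lambda r: r[1])
assert (least, most) == ("Lagos", "Tokyo")
```

The ROW(population, land_area) tiebreak idea corresponds to simply using a tuple as the sort key.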

David




Re: Idea: PostgreSQL equivalent to Oracle's KEEP clause

2023-03-14 Thread David Rowley
On Tue, 14 Mar 2023 at 16:07, Ben Clements  wrote:
> Similar to your "TOP() and BOTTOM() aggregate" idea, you might find Erwin 
> Brandstetter's solution using the LAST() aggregate function interesting: 
> (https://dba.stackexchange.com/a/324646/100880)

Interesting.  Just note that ORDER BY aggregates cannot be
parallelised and there are no shortcuts to just look for the highest /
lowest ordered row.  All rows in the group must be sorted and the
aggregate will just take the first or last of those once the sort is
done.  The difference there (unless using PG16 and an index provides
presorted input) is that there would be O(N log2 N) comparisons to
perform the sort, whereas the TOP() / BOTTOM() idea allows
parallelism, requires less memory, and needs only O(N)
comparisons.

If performance is not too critical right now, then what you've found
looks great.  I just wanted to mention that as it may be a factor that
matters at some point, even if it does not right now.

David




Re: Uppercase version of ß desired

2023-03-14 Thread Philip Semanchuk



> On Mar 13, 2023, at 5:38 PM, Celia McInnis  wrote:
> 
> HI:
> 
> I would be really happy if postgresql  had an upper case version of the ß 
> german character. The wiki page 
> https://en.wikipedia.org/wiki/%C3%9F
> 
> indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was 
> encoded by ISO 10646 in 2008.
> 
> BTW the reason that I'd like upper('ß') to give something different than 'ß'  
> is because I have written a simple substitution puzzle for a large number of 
> languages where I show the encrypted lower case words in upper case and the 
> successful letter substitution submissions in lower case - so I need the 
> upper and lower case versions of each letter to be different!
> 
> Thanks for any assistance! Maybe I can hack what I want in python (which is 
> what I am using for the puzzle).

Hi Celia,
I ran into this too back when we were transitioning from Python 2 to 3 (2 
behaved differently from 3). While researching it I discovered this Python 
issue which maybe sheds some additional light on the subject: 
https://github.com/python/cpython/issues/74993

We ultimately found 90 characters that (under Python 3) grew longer when 
uppercased. 

python -c "print([c for c in range(0x80, 0x22ff) if len(chr(c)) != 
len(chr(c).upper())])"


I hope this is at least interesting. :-)

Cheers
Philip






Re: Uppercase version of ß desired

2023-03-14 Thread Kip Cole
The relevant Unicode reference is 
https://unicode.org/faq/casemap_charprop.html#11

Which basically says that since Unicode 5.0 (it's now at Unicode 15.0) stability 
is guaranteed and the uppercasing to U+1E9E ẞ LATIN CAPITAL LETTER SHARP S 
is optional.

> On 14 Mar 2023, at 9:12 pm, Philip Semanchuk  
> wrote:
> 
> 
> 
>> On Mar 13, 2023, at 5:38 PM, Celia McInnis  wrote:
>> 
>> HI:
>> 
>> I would be really happy if postgresql  had an upper case version of the ß 
>> german character. The wiki page 
>> https://en.wikipedia.org/wiki/%C3%9F
>> 
>> indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was 
>> encoded by ISO 10646 in 2008.
>> 
>> BTW the reason that I'd like upper('ß') to give something different than 'ß' 
>>  is because I have written a simple substitution puzzle for a large number 
>> of languages where I show the encrypted lower case words in upper case and 
>> the successful letter substitution submissions in lower case - so I need the 
>> upper and lower case versions of each letter to be different!
>> 
>> Thanks for any assistance! Maybe I can hack what I want in python (which is 
>> what I am using for the puzzle).
> 
> Hi Celia,
> I ran into this too back when we were transitioning from Python 2 to 3 (2 
> behaved differently from 3). While researching it I discovered this Python 
> issue which maybe sheds some additional light on the subject: 
> https://github.com/python/cpython/issues/74993 
> 
> 
> We ultimately found 90 characters that (under Python 3) grew longer when 
> uppercased. 
> 
> python -c "print([c for c in range(0x80, 0x22ff) if len(chr(c)) != 
> len(chr(c).upper())])"
> 
> 
> I hope this is at least interesting. :-)
> 
> Cheers
> Philip



Re: Uppercase version of ß desired

2023-03-14 Thread Celia McInnis
Thanks Philip. Certainly interesting. As long as postgresql and python
return something different for the upper and lower case versions of these
letters, and python indexing of the string picks out the characters
"properly", it might not require a python fix for me.

What a can of worms!  But at least postgresql and python do far better with
unicode than mysql and perl did! :-)

Celia McInnis

On Tue, Mar 14, 2023 at 9:12 AM Philip Semanchuk <
phi...@americanefficient.com> wrote:

>
>
> > On Mar 13, 2023, at 5:38 PM, Celia McInnis 
> wrote:
> >
> > HI:
> >
> > I would be really happy if postgresql  had an upper case version of the
> ß german character. The wiki page
> > https://en.wikipedia.org/wiki/%C3%9F
> >
> > indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was
> encoded by ISO 10646 in 2008.
> >
> > BTW the reason that I'd like upper('ß') to give something different than
> 'ß'  is because I have written a simple substitution puzzle for a large
> number of languages where I show the encrypted lower case words in upper
> case and the successful letter substitution submissions in lower case - so
> I need the upper and lower case versions of each letter to be different!
> >
> > Thanks for any assistance! Maybe I can hack what I want in python (which
> is what I am using for the puzzle).
>
> Hi Celia,
> I ran into this too back when we were transitioning from Python 2 to 3 (2
> behaved differently from 3). While researching it I discovered this Python
> issue which maybe sheds some additional light on the subject:
> https://github.com/python/cpython/issues/74993
>
> We ultimately found 90 characters that (under Python 3) grew longer when
> uppercased.
>
> python -c "print([c for c in range(0x80, 0x22ff) if len(chr(c)) !=
> len(chr(c).upper())])"
>
>
> I hope this is at least interesting. :-)
>
> Cheers
> Philip
>
>
>


Re: Seq Scan because of stats or because of cast?

2023-03-14 Thread Dominique Devienne
On Mon, Mar 13, 2023 at 2:53 PM Tom Lane  wrote:

> > PS: libpq uses unsigned int for Oid. So do OIDs go above 2GB in practice?
>
> Yes, eventually.
>

OK, I've added support for native OID values (i.e. unsigned int, 4 bytes)
in my libpq wrapper.
Tested with binary binding and getting of scalar and array values.

But to truly test this is working OK, I'd need OIDs in the range [2^31, 2^32),
while the OIDs in my DB only reach the 200M range.

So, any way to force the DB to create OIDs in that special range?
Without hosing my DB / Cluster that is... This is not a throw-away DB /
Cluster.
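Separately from forcing the server's OID counter, here is what the wrapper must get right, illustrated in Python (an illustration only, not the libpq wrapper itself; binary-mode OIDs are 4-byte unsigned big-endian integers on the wire):

```python
import struct

def pack_oid(oid: int) -> bytes:
    # "!I" = network byte order, unsigned 32-bit; rejects values >= 2**32
    return struct.pack("!I", oid)

def unpack_oid(buf: bytes) -> int:
    return struct.unpack("!I", buf)[0]

big_oid = 0xDEADBEEF  # 3735928559, i.e. inside [2**31, 2**32)
assert unpack_oid(pack_oid(big_oid)) == big_oid

# A signed read ("!i") of the same bytes sign-extends into a negative
# number -- exactly the bug that testing with OIDs above 2**31 would catch.
assert struct.unpack("!i", pack_oid(big_oid))[0] == big_oid - 2**32
```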

Thanks, --DD


Re: Problems connecting to the server

2023-03-14 Thread Adrian Klaver

On 3/14/23 12:23 AM, Arsen Deputat wrote:
Good evening. I have recently installed pgAdmin and postgreSQL on my 
computer (macOS), but now I can't connect to the server because I don't 
know the password. Can you help me? Thank you for your understanding! 


Use the password you were prompted to supply when setting up pgAdmin4.


Have a good day

[Attachment: Знімок екрана 2023-03-13 о 17.23.14.png]



--
Adrian Klaver
adrian.kla...@aklaver.com




Issues Scaling Postgres Concurrency

2023-03-14 Thread Harrison Borges
Hello everyone.

I’m running into severe performance problems with Postgres as I increase
the number of concurrent requests against my backend. I’ve identified that
the bottleneck is Postgres, and to simplify the test case, I created an
endpoint that only does a count query on a table with ~500k rows. At 5
concurrent users, the response time was 33ms, at 10 users it was 60ms, and
at 20 users it was 120ms.

As the number of concurrent users increases, the response time for the
count query also increases significantly, indicating that Postgres may not
be scaling well to handle the increasing load.

This manifests in essentially a server meltdown on production. As the
concurrent requests stack up, our server is stuck waiting for more and more
queries. Eventually requests begin timing out as they start taking over 30
seconds to respond.

Am I doing something obviously wrong? Does this sound like normal behavior?
I'm not very experienced at DB ops so I'm not 100% sure what to expect
here, but I have worked as a Software Engineer for over 10 years and I've
not encountered problems like this before.

I would appreciate any insights or advice on how to optimize Postgres for
high concurrency scenarios. Thank you in advance for your help!


Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Adrian Klaver

On 3/13/23 9:24 AM, Harrison Borges wrote:

Hello everyone.

I’m running into severe performance problems with Postgres as I increase 
the number of concurrent requests against my backend. I’ve identified 
that the bottleneck is Postgres, and to simplify the test case, I 
created an endpoint that only does a count query on a table with ~500k 
rows. At 5 concurrent users, the response time was 33ms, at 10 users it 
was 60ms, and at 20 users it was 120ms.


As the number of concurrent users increases, the response time for the 
count query also increases significantly, indicating that Postgres may 
not be scaling well to handle the increasing load.


This manifests in essentially a server meltdown on production. As the 
concurrent requests stack up, our server is stuck waiting for more and 
more queries. Eventually requests begin timing out as they start taking 
over 30 seconds to respond.


Am I doing something obviously wrong? Does this sound like normal 
behavior? I'm not very experienced at DB ops so I'm not 100% sure what 
to expect here, but I have worked as a Software Engineer for over 10 
years and I've not encountered problems like this before.


1) https://wiki.postgresql.org/wiki/Slow_Counting

2) Are you using connection pooling?



I would appreciate any insights or advice on how to optimize Postgres 
for high concurrency scenarios. Thank you in advance for your help!



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Alan Hodgson
On Mon, 2023-03-13 at 12:24 -0400, Harrison Borges wrote:
> Hello everyone.
> 
> I’m running into severe performance problems with Postgres as I
> increase the number of concurrent requests against my backend. I’ve
> identified that the bottleneck is Postgres, and to simplify the
> test case, I created an endpoint that only does a count query on a
> table with ~500k rows. At 5 concurrent users, the response time was
> 33ms, at 10 users it was 60ms, and at 20 users it was 120ms.

I'm no expert on high concurrency, but for something this simple I'd
expect that you're just CPU bottlenecked. Count in PostgreSQL
actually has to read all the rows in the table. And yeah you can't do
too many of them at the same time.


Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Laurenz Albe
On Mon, 2023-03-13 at 12:24 -0400, Harrison Borges wrote:
> I’m running into severe performance problems with Postgres as I increase the 
> number
> of concurrent requests against my backend. I’ve identified that the 
> bottleneck is
> Postgres, and to simplify the test case, I created an endpoint that only does 
> a
> count query on a table with ~500k rows. At 5 concurrent users, the response 
> time
> was 33ms, at 10 users it was 60ms, and at 20 users it was 120ms.
> 
> As the number of concurrent users increases, the response time for the count 
> query
> also increases significantly, indicating that Postgres may not be scaling 
> well to
> handle the increasing load. 
> 
> This manifests in essentially a server meltdown on production. As the 
> concurrent
> requests stack up, our server is stuck waiting for more and more queries.
> Eventually requests begin timing out as they start taking over 30 seconds to 
> respond.
> 
> Am I doing something obviously wrong? Does this sound like normal behavior?

That sounds like quite normal and expected behavior.

A query that counts the number of rows in a table of half a million rows is
quite expensive and keeps a CPU core busy for a while (provided everything is
cached). At some degree of parallelism, your CPU is overloaded, which leads
to non-linear slowdown.

The thing you are doing wrong is that you are putting too much load on this
system.

Yours,
Laurenz Albe




Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Peter J. Holzer
On 2023-03-14 22:47:43 +0100, Laurenz Albe wrote:
> On Mon, 2023-03-13 at 12:24 -0400, Harrison Borges wrote:
> > I’m running into severe performance problems with Postgres as I
> > increase the number of concurrent requests against my backend. I’ve
> > identified that the bottleneck is Postgres, and to simplify the test
> > case, I created an endpoint that only does a count query on a table
> > with ~500k rows. At 5 concurrent users, the response time was 33ms,
> > at 10 users it was 60ms, and at 20 users it was 120ms.
[...]
> > This manifests in essentially a server meltdown on production. As
> > the concurrent requests stack up, our server is stuck waiting for
> > more and more queries. Eventually requests begin timing out as they
> > start taking over 30 seconds to respond.
> > 
> > Am I doing something obviously wrong? Does this sound like normal
> > behavior?
> 
> That sounds like quite normal and expected behavior.
> 
> A query that counts the number of rows in a table of half a million
> rows is quite expensive and keeps a CPU core busy for a while
> (provided everything is cached). At some degree of parallelism, your
> CPU is overloaded, which leads to non-linear slowdown.

The slowdown looks pretty linear to me (6ms per user).

The interesting thing is that on my laptop even two concurrent accesses
cause a 100% slowdown. I think this is because the task is actually
memory-bound: The cores may do the counting in parallel, but they have
to read the data from the same RAM (since it's too large to fit in the
CPU cache) and they have to take turns accessing it.

> The thing you are doing wrong is that you are putting too much load on
> this system.

Or possibly counting stuff far more often than necessary. If an exact
count is necessary more frequently than it changes it is probably a good
idea to store that somewhere and update it in a trigger.

(If the count doesn't have to be totally up-to-date, caching it in the
application may be even better.)
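That application-side cache could look like this minimal Python sketch (a hypothetical helper; `fetch_count` stands in for whatever runs the real `SELECT count(*)` query):

```python
import time

class CachedCount:
    """Serve a possibly-stale row count, re-running the expensive
    query at most once per ttl seconds."""
    def __init__(self, fetch_count, ttl=30.0):
        self.fetch_count = fetch_count  # callable that runs SELECT count(*)
        self.ttl = ttl
        self._value = None
        self._fetched_at = float("-inf")

    def get(self):
        now = time.monotonic()
        if now - self._fetched_at >= self.ttl:
            self._value = self.fetch_count()
            self._fetched_at = now
        return self._value

calls = []
cache = CachedCount(lambda: calls.append(1) or 500_000, ttl=30.0)
assert cache.get() == 500_000
assert cache.get() == 500_000  # second call served from the cache
assert len(calls) == 1         # the count query ran only once
```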

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Issues Scaling Postgres Concurrency

2023-03-14 Thread Tom Lane
"Peter J. Holzer"  writes:
> On 2023-03-14 22:47:43 +0100, Laurenz Albe wrote:
>> A query that counts the number of rows in a table of half a million
>> rows is quite expensive and keeps a CPU core busy for a while
>> (provided everything is cached). At some degree of parallelism, your
>> CPU is overloaded, which leads to non-linear slowdown.

> The interesting thing is that on my laptop even two concurrent accesses
> cause a 100% slowdown. I think this is because the task is actually
> memory-bound: The cores may do the counting in parallel, but they have
> to read the data from the same RAM (since it's too large to fit in the
> CPU cache) and they have to take turns accessing it.

I wondered if synchronize_seqscans might exacerbate this behavior by
encouraging more cores to be touching the same buffers at the same time.
I couldn't measure much difference between having it on vs. off in a
pgbench test with all the clients counting the same table ... but maybe
on other hardware the effect would show up.

regards, tom lane




Is the PL/pgSQL refcursor useful in a modern three-tier app?

2023-03-14 Thread Bryn Llewellyn
Section "43.7. Cursors" in the PL/pgSQL chapter of the doc 
(www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS)
starts with this:

«
Rather than executing a whole query at once, it is possible to set up a cursor 
that encapsulates the query, and then read the query result a few rows at a 
time. One reason for doing this is to avoid memory overrun when the result 
contains a large number of rows. (However, PL/pgSQL users do not normally need 
to worry about that, since FOR loops automatically use a cursor internally to 
avoid memory problems.) A more interesting usage is to return a reference to a 
cursor that a function has created, allowing the caller to read the rows. This 
provides an efficient way to return large row sets from functions.
»

On its face, it seems to make sense. And I’ve written a few proof-of-concept 
tests. For example, I wrote a “security definer” function, owned by a role that 
can select from the relevant table(s), that returns refcursor. And I called it 
from a subprogram, owned by a role that cannot select from the relevant 
table(s), to loop through the rows. But I can't convince myself that this 
division of labor is useful. And especially I can't convince myself that the 
"pipelining" capability is relevant in a three-tier app with a stateless 
browser UI. Here, the paradigm has the client-side app checking out a 
connection from the pool, generating the entire response to the end-user's 
request, releasing the connection, and sending the response back to the 
browser. This paradigm isn't consistent with allowing the end user to navigate 
forwards and backwards in a scrollable cursor that is somehow held in its open 
state in the server by the middle tier client on behalf of a browser session 
that comes back time and again to its dedicated middle tier client and thence 
to its dedicated database server session. (Anyway, without anything like 
Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor 
variable between successive server calls.)

Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very 
special use-cases?



Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

2023-03-14 Thread Adrian Klaver

On 3/14/23 17:50, Bryn Llewellyn wrote:

Section "43.7. Cursors" in the PL/pgSQL chapter of the doc 
(www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) 
starts with this:

«
Rather than executing a whole query at once, it is possible to set up a cursor 
that encapsulates the query, and then read the query result a few rows at a 
time. One reason for doing this is to avoid memory overrun when the result 
contains a large number of rows. (However, PL/pgSQL users do not normally need 
to worry about that, since FOR loops automatically use a cursor internally to 
avoid memory problems.) A more interesting usage is to return a reference to a 
cursor that a function has created, allowing the caller to read the rows. This 
provides an efficient way to return large row sets from functions.
»

On its face, it seems to make sense. And I’ve written a few proof-of-concept tests. For 
example, I wrote a “security definer” function that's owned by a role that can select 
from the relevant table(s) that returns refcursor. And I called it from a subprogram 
that's owned by a role that cannot select from the relevant table(s) to loop through the 
rows. But I can't convince myself that this division of labor is useful. And especially I 
can't convince myself that the "pipelining" capability is relevant in a 
three-tier app with a stateless browser UI. Here, the paradigm has the client-side app 
checking out a connection from the pool, generating the entire response to the end-user's 
request, releasing the connection, and sending the response back to the browser. This 
paradigm isn't consistent with allowing the end user to navigate forwards and backwards 
in a scrollable cursor that is somehow held in its open state in the server by the 
middle tier client on behalf of a browser session that comes back time and again to its 
dedicated middle tier client and thence to its dedicated database server session. 
(Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold 
the opened cursor variable between successive server calls.)


I guess that would depend on how you define a server call:

https://www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING

"Once a cursor has been opened, it can be manipulated with the 
statements described here.


These manipulations need not occur in the same function that opened the 
cursor to begin with. You can return a refcursor value out of a function 
and let the caller operate on the cursor. (Internally, a refcursor value 
is simply the string name of a so-called portal containing the active 
query for the cursor. This name can be passed around, assigned to other 
refcursor variables, and so on, without disturbing the portal.)


All portals are implicitly closed at transaction end. Therefore a 
refcursor value is usable to reference an open cursor only until the end 
of the transaction."




Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very 
special use-cases?



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

2023-03-14 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Section "43.7. Cursors" in the PL/pgSQL chapter of the doc 
>> (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS)
>> starts with this:
>> «
>> Rather than executing a whole query at once, it is possible to set up a 
>> cursor that encapsulates the query, and then read the query result a few 
>> rows at a time. One reason for doing this is to avoid memory overrun when 
>> the result contains a large number of rows. (However, PL/pgSQL users do not 
>> normally need to worry about that, since FOR loops automatically use a 
>> cursor internally to avoid memory problems.) A more interesting usage is to 
>> return a reference to a cursor that a function has created, allowing the 
>> caller to read the rows. This provides an efficient way to return large row 
>> sets from functions.
>> »
>> On its face, it seems to make sense. And I’ve written a few proof-of-concept 
>> tests. For example, I wrote a “security definer” function that's owned by a 
>> role that can select from the relevant table(s) that returns refcursor. And 
>> I called it from a subprogram that's owned by a role that cannot select from 
>> the relevant table(s) to loop through the rows. But I can't convince myself 
>> that this division of labor is useful. And especially I can't convince 
>> myself that the "pipelining" capability is relevant in a three-tier app with a 
>> stateless browser UI. Here, the paradigm has the client-side app checking 
>> out a connection from the pool, generating the entire response to the 
>> end-user's request, releasing the connection, and sending the response back 
>> to the browser. This paradigm isn't consistent with allowing the end user to 
>> navigate forwards and backwards in a scrollable cursor that is somehow held 
>> in its open state in the server by the middle tier client on behalf of 
>> a browser session that comes back time and again to its dedicated middle 
>> tier client and thence to its dedicated database server session. (Anyway, 
>> without anything like Oracle PL/SQL's packages, you have no mechanism to 
>> hold the opened cursor variable between successive server calls.)
> 
> I guess that would depend on how you define a server call:
> 
> www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING
> 
> "Once a cursor has been opened, it can be manipulated with the statements 
> described here.
> 
> These manipulations need not occur in the same function that opened the 
> cursor to begin with. You can return a refcursor value out of a function and 
> let the caller operate on the cursor. (Internally, a refcursor value is 
> simply the string name of a so-called portal containing the active query for 
> the cursor. This name can be passed around, assigned to other refcursor 
> variables, and so on, without disturbing the portal.)
> 
> All portals are implicitly closed at transaction end. Therefore a refcursor 
> value is usable to reference an open cursor only until the end of the 
> transaction."
> 
>> Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in 
>> very special use-cases?

Ah… I see. I had read this wrongly:

> « Internally, a refcursor value is simply the string name of a so-called 
> portal containing the active query for the cursor. This name can be passed 
> around, assigned to other refcursor variables, and so on, without disturbing 
> the portal. »


I thought that it was an "under the hood" explanation and that the only thing 
that you could see after entry into the PL/pgSQL subprogram that will be the 
top of the stack would be an opaque value (sometimes called a handle in other 
contexts) that was accessible only from PL/pgSQL.

I hadn't yet tried this in psql:

create schema s;
create table s.t(k int primary key, v text not null);
insert into s.t(k, v) values (1, 'cat'), (2, 'dog');

create function s.f(k_in in int)
  returns refcursor
  language plpgsql
as $body$
declare
  s_f_cur cursor(k_in int) for select v from s.t where k = k_in;
begin
  open s_f_cur(k_in);
  return s_f_cur;
end;
$body$;

create function s.g(cur in refcursor)
  returns text
  language plpgsql
as $body$
declare
  v text;
begin
  fetch cur into v;
  return v;
end;
$body$;

begin;
select s.f(1) as cur
\gset
select s.g(:'cur') as result;
end;

I just did. And the result of "select s.g(:'cur')" was the expected "cat".

It requires a leap of imagination, or help from the pgsql-general list, to get 
to this. So thanks!

Might the doc add an example like this?

Of course, it all falls into place now. I can see how I could write a client 
app in, say, Python to write a humongous report to a file by fetching 
manageably-sized chunks, time and again until done with a function like my 
"g()" here, from a cursor that I'd opened using a function like my "f()".
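The chunked loop itself is simple. A Python sketch, with the open cursor simulated by an iterator (in the real app each chunk would come from a FETCH, or from repeated calls to a function like "g()", inside one transaction):

```python
from itertools import islice

def fetch_in_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows until the source is
    exhausted, mimicking repeated FETCH calls against an open cursor."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

rows = (f"row-{i}" for i in range(10))  # stand-in for the open refcursor
chunks = list(fetch_in_chunks(rows, chunk_size=4))
assert [len(c) for c in chunks] == [4, 4, 2]
```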

B.t.w., when I said "top-level call", I meant the SQL statement that a client 
issues—in this case most l

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

2023-03-14 Thread Adrian Klaver

On 3/14/23 20:29, Bryn Llewellyn wrote:

adrian.kla...@aklaver.com  wrote:


b...@yugabyte.com  wrote:

Section "43.7. Cursors" in the PL/pgSQL chapter of the doc 
(www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) 
starts with this:

«
Rather than executing a whole query at once, it is possible to set up 
a cursor that encapsulates the query, and then read the query result 
a few rows at a time. One reason for doing this is to avoid memory 
overrun when the result contains a large number of rows. (However, 
PL/pgSQL users do not normally need to worry about that, since FOR 
loops automatically use a cursor internally to avoid memory 
problems.) A more interesting usage is to return a reference to a 
cursor that a function has created, allowing the caller to read the 
rows. This provides an efficient way to return large row sets from 
functions.

»
On its face, it seems to make sense. And I’ve written a few 
proof-of-concept tests. For example, I wrote a “security definer” 
function that's owned by a role that can select from the relevant 
table(s) that returns refcursor. And I called it from a subprogram 
that's owned by a role that cannot select from the relevant table(s) 
to loop through the rows. But I can't convince myself that this 
division of labor is useful. And especially I can't convince myself 
that the "pipelining" capability is relevant in a three-tier app with a 
stateless browser UI. Here, the paradigm has the client-side app 
checking out a connection from the pool, generating the entire 
response to the end-user's request, releasing the connection, and 
sending the response back to the browser. This paradigm isn't 
consistent with allowing the end user to navigate forwards and 
backwards in a scrollable cursor that is somehow held in its open 
state in the server by the middle tier client on behalf of a 
browser session that comes back time and again to its dedicated 
middle tier client and thence to its dedicated database server 
session. (Anyway, without anything like Oracle PL/SQL's packages, you 
have no mechanism to hold the opened cursor variable between 
successive server calls.)


I guess that would depend on how you define a server call:

www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING 


"Once a cursor has been opened, it can be manipulated with the 
statements described here.


These manipulations need not occur in the same function that opened 
the cursor to begin with. You can return a refcursor value out of a 
function and let the caller operate on the cursor. (Internally, a 
refcursor value is simply the string name of a so-called portal 
containing the active query for the cursor. This name can be passed 
around, assigned to other refcursor variables, and so on, without 
disturbing the portal.)


All portals are implicitly closed at transaction end. Therefore a 
refcursor value is usable to reference an open cursor only until the 
end of the transaction."


Is it fair to say that the PL/pgSQL refcursor is useful, at best, 
only in very special use-cases?


Ah… I see. I had read this wrongly:

« Internally, a refcursor value is simply the string name of a 
so-called portal containing the active query for the cursor. This name 
can be passed around, assigned to other refcursor variables, and so 
on, without disturbing the portal. »


I thought that it was an "under the hood" explanation and that the only 
thing that you could see after entry into the PL/pgSQL subprogram that 
will be the top of the stack would be an opaque value (sometimes called 
a handle in other contexts) that was accessible only from PL/pgSQL.



I am not sure how this:

"These manipulations need not occur in the same function that opened the 
cursor to begin with. You can return a refcursor value out of a function 
and let the caller operate on the cursor. ..."


could be any clearer.


I just did. And the result of "select s.g(:cur)" was the expected "cat".

It requires a leap of imagination, or help from the pgsql-general list, 
to get to this. So thanks!


Might the doc add an example like this?



43.7.3.5. Returning Cursors

"PL/pgSQL functions can return cursors to the caller. This is useful to 
return multiple rows or columns, especially with very large result sets. 
To do this, the function opens the cursor and returns the cursor name to 
the caller (or simply opens the cursor using a portal name specified by 
or otherwise known to the caller). The caller can then fetch rows from 
the cursor. The cursor can be closed by the caller, or it will be closed 
automatically when the transaction closes."


And then a series of examples on how to do that.

I have a hard time fathoming why someone who writes documentation doe