Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

2023-03-26 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
>> b...@yugabyte.com replied to laurenz.a...@cybertec.at:
>> 
>> Thanks for the link to your SQL file at the line where you get the row count 
>> in the way that you describe... I noted that neither of these functions has 
>> a refcursor formal argument and that, rather, you open (i.e. create) each of 
>> the three cursors that you use within the two functions that uses them.
> 
> CREATE FUNCTION materialize_foreign_table(...)
> 
> CREATE FUNCTION db_migrate_refresh(...) 
> 
> What is not formal about the above?

I used the term of art "formal argument" ordinarily to denote what's listed in 
parentheses at the start of a subprogram definition. The term stands in 
contrast to "actual argument"—meaning the expression that's used in a 
subprogram invocation to provide a value for the corresponding formal argument. 
Go to this page:

dotnettutorials.net/lesson/types-of-function-arguments-in-python/

and search in it for "Example: formal and actual function arguments in python". 
The blurb that comes just before the example says what I just said. It seems to 
be more common to talk about formal and actual "parameters". But the PG doc 
prefers "argument".

I didn't at all mean that "formal" is good and that Laurenz's code is not 
formal and therefore is bad! So sorry if you took it to mean this. 

> Though it does not matter as, back to the docs again:
> 
> www.postgresql.org/docs/current/plpgsql-cursors.html
> 
> "All access to cursors in PL/pgSQL goes through cursor variables, which are 
> always of the special data type refcursor. One way to create a cursor 
> variable is just to declare it as a variable of type refcursor..." Though 
> maybe you are trying to differentiate between bound and unbound refcursor's...


Ah... you missed my point entirely. Looks like my prose was opaque. I was 
referring to this paradigm as described at the start of the page that you 
referenced:

«
A more interesting usage is to return a reference to a cursor that a function 
has created, allowing the caller to read the rows. This provides an efficient 
way to return large row sets from functions.
»

 This seems to me to imply that the producer function with a "refcursor" 
return is "security definer" and that the consumer subprogram cannot access the 
tables of interest with explicit "select”. I thinks that's nice. 

I needed to know that the definition and use of each "refcursor" variable, in 
Laurentz's code, are in the same subprogram because it's only when this is the 
case that using an ordinary "select count(*)" and an ordinary "for" loop, where 
the "select" is written in place within the "for" clause, are viable 
alternatives.

> Also order by is not relevant for getting a count.

Yes, of course. I know that. (But it does matter in many cases of results 
presentation.) I was simply noting that Laurenz’s use-case didn’t need ordering 
and that the "select" statements will be faster than if ordering were needed. 
Sorry if I implied something that I didn’t mean too. (The example in Laurenz's 
post, www.cybertec-postgresql.com/en/pagination-problem-total-result-count/ 
, 
does use "order by".)

>> I did a little test an saw that this:
>> 
>> move last in cur;
>> get diagnostics n = row_count;
>> 
>> reported just 1—in contrast to what you used:
>> 
>> move forward all in cur;
> 
> Because they are different actions. The first directly moves to the last row 
> in the cursor and counts just that row. The second moves through all the rows 
> left in the cursor and counts all the rows it traversed. It is spelled out 
> here:
> 
> https://www.postgresql.org/docs/current/sql-move.html
> 
> "The count is the number of rows that a FETCH command with the same 
> parameters would have returned (possibly zero)."

Yes, I'm afraid that I expressed myself poorly again. I meant to say that while 
I know nothing about the internals, I can't see how "move last in cur" can know 
where the last row is unless it traverses the entire result set from its 
current position until it finds no more rows. And while its doing this, it may 
just as well count the rows it traverses. It would seem, therefore, that a 
single operation that moves to where you asked for and that tells you how many 
rows it traversed would be enough—rather than two that both achieve the same 
effect in the same time where one gives you the count of rows traversed and the 
other doesn't. But never mind. The simple way to see it is that the operations 
simply do what they do—and I won't worry about how they might have been 
conceived differently.

I did some timing tests. I’ve copied the results and the code at the end—just 
for completeness. Here’s what I concluded.

(1) (and this is unremarkable), if you can find a way to meet your requirement 
with just a single SQL statement, then this is bound to be best. I chose a 
simple requirement: read from a table and in

Re: Binding Postgres to port 0 for testing

2023-03-26 Thread Michael Paquier
On Sat, Mar 25, 2023 at 11:01:33AM -0600, Markus Pilman wrote:
> Now the problem is that I need to find a TCP port for each running postgres
> instance. There's multiple ways to do this, but by far the easiest one I
> know is to bind to port 0. So my plan was to start postgres with "-p 0" and
> then parse stdout to figure out which port it actually uses. But that
> doesn't seem to work:

Note that you can find some inspiration about that in the code tree
within src/test/perl/PostgreSQL/Test/Cluster.pm, particularly
get_free_port(), where we have now accumulated a couple of years of
experience in designing something that's rather safe, even if it comes
with its own limits.  It is in perl so perhaps you could just reuse it
rather than reinvent the wheel?  Of course, still it should not be
complicated to translate that in a different language, but there may
be no need to reinvent the wheel.  And seeing your first message with
the requirements you list, this does what you are looking for:
- Create an empty cluster.
- Freely create databases, tablespaces, queries, etc.
- Wipe out the whole.

The test cases around src/test/recovery/t/ could be a good starting
point, as well.
--
Michael


signature.asc
Description: PGP signature


Re: Binding Postgres to port 0 for testing

2023-03-26 Thread Michael Paquier
On Sun, Mar 26, 2023 at 10:49:33PM -0600, Markus Pilman wrote:
> I somehow didn't consider looking at the postgres tests, though it makes
> sense that they need to solve this problem. If I read the perl code
> correctly though it seems that this could, in theory, cause a race? The
> script checks first whether the port has been assigned to a test, then
> binds a socket to check whether it is used by someone else, closes this
> test socker, and then starts a server process. I guess it's unlikely
> enough, but isn't there a risk that some other process (that isn't
> controlled by this perl script) binds to the found port right after this
> test bind but right before postgres calls bind? I guess it should be rare
> enough so that it wouldn't cause flaky tests.

In theory, yes, I recall that's possible in the scripts.  But only on
Windows where we cannot use socket directories and rely on SSPI
authentication while binding all the nodes to listen to 127.0.0.1.  On
all the other platforms, each test creates its own directory that will
be used as path for unix_socket_directories.  umask is 0700 and owned
by the OS user running the perl tests, so it is isolated as much as it
can be.  Using the same port should be actually fine as far as I
recall, as long as the unix socket paths are different.
--
Michael


signature.asc
Description: PGP signature