date:20250102

Re: Pipeline Mode vs Single Row Mode / Chunked Rows Mode

2025-01-02 Thread Dominique Devienne

On Thu, Dec 19, 2024 at 3:37 PM Daniel Frey  wrote:
> I'm adding support for Pipeline Mode to my C++ PostgreSQL client library [...]
> Anyway, is there some documentation about how these modes interact and how 
> they can be combined? Or should they never be combined?

Hi. Happy New Year. I was interested in that question as well, having
played with Pipeline mode a bit, but not Single Row mode yet. Any
insights? --DD

Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Pavel Stehule

Hi

čt 2. 1. 2025 v 11:37 odesílatel Jan Behrens 
napsal:

> On Wed, 1 Jan 2025 11:19:32 -0700
> "David G. Johnston"  wrote:
>
> > On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens 
> wrote:
> >
> > > On Sat, 28 Dec 2024 00:40:09 +0100
> > > Jan Behrens  wrote:
> > >
> > > > On Fri, 27 Dec 2024 13:26:28 -0700
> > > > "David G. Johnston"  wrote:
> > > >
> > > > > > Or is it documented somewhere?
> > > > >
> > > > >
> > >
> https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> > > >
> > > > I can't find any notes regarding functions and schemas in that
> section.
> > >
> > >
> > "Because PL/pgSQL saves prepared statements and sometimes execution plans
> > in this way, SQL commands that appear directly in a PL/pgSQL function
> must
> > refer to the same tables and columns on every execution; that is, you
> > cannot use a parameter as the name of a table or column in an SQL
> command."
> >
> > Changing search_path is just one possible way to change out which object
> a
> > name tries to refer to so it is not called out explicitly.
>
> The first part of the cited sentence seems helpful ("you must always
> refer to the same tables and columns on every execution"). I would thus
> conclude that using a dynamic search_path when running functions or
> procedures is *always* considered errorneous (even though not reported
> by the database as an error), except when using EXECUTE.
>
> I wonder if the database could/should generate an error (or at least a
> warning?) when a function or procedure without a "SET search_path"
> statement uses a non-qualified name? According to the documentation
> using a dynamic search_path to refer to different entities in the
> database is a case that "must" not happen.
>
> But following through, this might lead to more warnings one might
> expect, e.g. when using simple operators such as "=" or the "IN" or
> "CASE expression WHEN" statements, as these rely on the search_path as
> well. Should such code be considered non-idiomatic, dangerous, or even
> errorneous if a "SET search_path" option is missing in the
> function's/procedure's definition?
>
> Maybe I'm overthinking this. But in practice, I've been running into
> surprising issues whenever functions and schemas are involved, and I'm
> not sure if every programmer will be aware of how important it is to
> properly set a search_path in the function's defintion after reading
> the documentation. (Besides, it's not always possible in procedures.)
>

How can you identify unwanted usage of non qualified identifiers from
wanted usage of non qualified identifiers? It is a common pattern for
sharding. Using not qualified identifiers of operators, functions is common
when you are using orafce extensions, etc.

Using qualified identifiers everywhere strongly reduces readability. There
are no aliases to the schema, so aliases cannot help.

you can identify the functions where search_path is not explicitly assigned

select oid::regprocedure
  from pg_proc
where pronamespace::regnamespace not in ('pg_catalog',
'information_schema')
   and not exists(select 1 from unnest(proconfig) g(v) where  v ~
'^search_path');


Regards

Pavel


> >
> > > "SQL-language and PL-language functions provided by extensions are at
> > > risk of search-path-based attacks when they are executed, since parsing
> > > of these functions occurs at execution time not creation time."
> >
> > > Moreover, it isn't true for all
> > > SQL-language functions, as can be demonstrated with the following code:
> >
> > Yeah, when we added a second method to write an SQL-language function,
> one
> > that doesn't simply accept a string body, we didn't update that section
> to
> > point out that is the string input variant of create function that is
> > affected in this manner, the non-string (atomic) variant stores the
> result
> > of parsing the inline code as opposed to storing the raw text.
> >
> > David J.
>
> I missed that other part in the manual (which is in a totally different
> section). Should I report the missing update in section 36.17.6.1. of
> the documentation as a documentation issue, or is it not necessary?
>
> Kind regards,
> Jan Behrens
>
>
>

Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Jan Behrens

On Wed, 1 Jan 2025 11:19:32 -0700
"David G. Johnston"  wrote:

> On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens  wrote:
> 
> > On Sat, 28 Dec 2024 00:40:09 +0100
> > Jan Behrens  wrote:
> >
> > > On Fri, 27 Dec 2024 13:26:28 -0700
> > > "David G. Johnston"  wrote:
> > >
> > > > > Or is it documented somewhere?
> > > >
> > > >
> > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> > >
> > > I can't find any notes regarding functions and schemas in that section.
> >
> >
> "Because PL/pgSQL saves prepared statements and sometimes execution plans
> in this way, SQL commands that appear directly in a PL/pgSQL function must
> refer to the same tables and columns on every execution; that is, you
> cannot use a parameter as the name of a table or column in an SQL command."
> 
> Changing search_path is just one possible way to change out which object a
> name tries to refer to so it is not called out explicitly.

The first part of the cited sentence seems helpful ("you must always
refer to the same tables and columns on every execution"). I would thus
conclude that using a dynamic search_path when running functions or
procedures is *always* considered errorneous (even though not reported
by the database as an error), except when using EXECUTE.

I wonder if the database could/should generate an error (or at least a
warning?) when a function or procedure without a "SET search_path"
statement uses a non-qualified name? According to the documentation
using a dynamic search_path to refer to different entities in the
database is a case that "must" not happen.

But following through, this might lead to more warnings one might
expect, e.g. when using simple operators such as "=" or the "IN" or
"CASE expression WHEN" statements, as these rely on the search_path as
well. Should such code be considered non-idiomatic, dangerous, or even
errorneous if a "SET search_path" option is missing in the
function's/procedure's definition?

Maybe I'm overthinking this. But in practice, I've been running into
surprising issues whenever functions and schemas are involved, and I'm
not sure if every programmer will be aware of how important it is to
properly set a search_path in the function's defintion after reading
the documentation. (Besides, it's not always possible in procedures.)

> 
> > "SQL-language and PL-language functions provided by extensions are at
> > risk of search-path-based attacks when they are executed, since parsing
> > of these functions occurs at execution time not creation time."
> 
> > Moreover, it isn't true for all
> > SQL-language functions, as can be demonstrated with the following code:
> 
> Yeah, when we added a second method to write an SQL-language function, one
> that doesn't simply accept a string body, we didn't update that section to
> point out that is the string input variant of create function that is
> affected in this manner, the non-string (atomic) variant stores the result
> of parsing the inline code as opposed to storing the raw text.
> 
> David J.

I missed that other part in the manual (which is in a totally different
section). Should I report the missing update in section 36.17.6.1. of
the documentation as a documentation issue, or is it not necessary?

Kind regards,
Jan Behrens

Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Jan Behrens

On Thu, 2 Jan 2025 13:48:29 +0100
Pavel Stehule  wrote:

> čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens 
> napsal:
> 
> > On Thu, 2 Jan 2025 12:40:59 +0100
> > Pavel Stehule  wrote:
> >
> > > How can you identify unwanted usage of non qualified identifiers from
> > > wanted usage of non qualified identifiers? It is a common pattern for
> > > sharding. Using not qualified identifiers of operators, functions is
> > common
> > > when you are using orafce extensions, etc.
> >
> > I don't fully understand the use-case. Could you elaborate?
> >
> > As I understand, even if identifiers are not fully-qualified, it is
> > forbidden to use the search_path to refer to different database
> > entities at run-time (as David pointed out).
> >
> > So I don't understand how a dynamic "search_path" could be used in any
> > scenario within functions except when EXECUTE is involved.
> >
> 
> you don't need more databases
> 
> schema one - customer x
> schema two - customer y
> 
> create table one.t1(..); create table one.t2(..);
> create table two.t1(..); create table two.t2(..);
> 
> set search_path to one;
> -- work with data set of customer x
> 
> set search_path to two;
> -- work wit data set of customer y
> 
> some times can be pretty ineffective to have database per customer - more
> connect, disconnect in postgres is much more expensive than SET search_path
> TO .. and maybe RESET plans;

I guess that means there is a practical application where search_path
MAY change at runtime IF done in different sessions or if the cache is
reset using the DISCARD command:

https://www.postgresql.org/docs/17/sql-discard.html

I assume DISCARD PLANS would be the right command?

This seems to be a very special case though. I think there should be a
warning in the documentation of CREATE FUNCTION with regard to schemas
anyway, though.

Regards,
Jan

Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Pavel Stehule

čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens 
napsal:

> On Thu, 2 Jan 2025 12:40:59 +0100
> Pavel Stehule  wrote:
>
> > How can you identify unwanted usage of non qualified identifiers from
> > wanted usage of non qualified identifiers? It is a common pattern for
> > sharding. Using not qualified identifiers of operators, functions is
> common
> > when you are using orafce extensions, etc.
>
> I don't fully understand the use-case. Could you elaborate?
>
> As I understand, even if identifiers are not fully-qualified, it is
> forbidden to use the search_path to refer to different database
> entities at run-time (as David pointed out).
>
> So I don't understand how a dynamic "search_path" could be used in any
> scenario within functions except when EXECUTE is involved.
>

you don't need more databases

schema one - customer x
schema two - customer y

create table one.t1(..); create table one.t2(..);
create table two.t1(..); create table two.t2(..);

set search_path to one;
-- work with data set of customer x

set search_path to two;
-- work wit data set of customer y

some times can be pretty ineffective to have database per customer - more
connect, disconnect in postgres is much more expensive than SET search_path
TO .. and maybe RESET plans;




>
> >
> > Using qualified identifiers everywhere strongly reduces readability.
> There
> > are no aliases to the schema, so aliases cannot help.
>
> Yes, I agree on that. Using "SET search_path" in the function's
> definition fixes that problem, but it's easy to miss how important this
> is from reading the documentation:
>
> The manual regarding "CREATE FUNCTION" refers to "search_path" only
> within the "Writing SECURITY DEFINER Functions Safely" section. It's
> easy to skip that part unless you use that feature. Moreover, that
> section alone doesn't explain the weird behavior of four different
> outcomes of a function with only two schemas involved which I brought
> up in the beginning of this thread.
>
> The part on "SET configuration_parameter" part in the "CREATE FUNCTION"
> documentation doesn't mention the search_path or schemas. And I don't
> think you can expect every programmer will read the "Plan Caching"
> subsection in the "PL/pgSQL under the Hood" section. But even then, the
> information is just provided indirectly.
>
>
yes, probably nobody reads the plan caching doc. And if they read it, then
because they have performance problems.



> Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't
> give any hint either.
>

This is a question - this is a generic feature in Postgres.  Every query
can be impacted by setting of search_path.

>From my perspective, there can be a note in the documentation related to
copy types and row types.

https://www.postgresql.org/docs/current/plpgsql-declarations.html#PLPGSQL-DECLARATION-TYPE

The problem that you found is not just about the change of search_path.
Same problem can be found after altering the table.

Regards

Pavel


> I think (assuming that the behavior isn't fixed) that some slighly more
> prominent warning would be reasonable.
>
> >
> > you can identify the functions where search_path is not explicitly
> assigned
> >
> > select oid::regprocedure
> >   from pg_proc
> > where pronamespace::regnamespace not in ('pg_catalog',
> > 'information_schema')
> >and not exists(select 1 from unnest(proconfig) g(v) where  v ~
> > '^search_path');
> >
> >
> > Regards
> >
> > Pavel
>
> Kind regards,
> Jan
>

Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Jan Behrens

On Thu, 2 Jan 2025 12:40:59 +0100
Pavel Stehule  wrote:

> How can you identify unwanted usage of non qualified identifiers from
> wanted usage of non qualified identifiers? It is a common pattern for
> sharding. Using not qualified identifiers of operators, functions is common
> when you are using orafce extensions, etc.

I don't fully understand the use-case. Could you elaborate?

As I understand, even if identifiers are not fully-qualified, it is
forbidden to use the search_path to refer to different database
entities at run-time (as David pointed out).

So I don't understand how a dynamic "search_path" could be used in any
scenario within functions except when EXECUTE is involved.

> 
> Using qualified identifiers everywhere strongly reduces readability. There
> are no aliases to the schema, so aliases cannot help.

Yes, I agree on that. Using "SET search_path" in the function's
definition fixes that problem, but it's easy to miss how important this
is from reading the documentation:

The manual regarding "CREATE FUNCTION" refers to "search_path" only
within the "Writing SECURITY DEFINER Functions Safely" section. It's
easy to skip that part unless you use that feature. Moreover, that
section alone doesn't explain the weird behavior of four different
outcomes of a function with only two schemas involved which I brought
up in the beginning of this thread.

The part on "SET configuration_parameter" part in the "CREATE FUNCTION"
documentation doesn't mention the search_path or schemas. And I don't
think you can expect every programmer will read the "Plan Caching"
subsection in the "PL/pgSQL under the Hood" section. But even then, the
information is just provided indirectly.

Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't
give any hint either.

I think (assuming that the behavior isn't fixed) that some slighly more
prominent warning would be reasonable.

> 
> you can identify the functions where search_path is not explicitly assigned
> 
> select oid::regprocedure
>   from pg_proc
> where pronamespace::regnamespace not in ('pg_catalog',
> 'information_schema')
>and not exists(select 1 from unnest(proconfig) g(v) where  v ~
> '^search_path');
> 
> 
> Regards
> 
> Pavel

Kind regards,
Jan

Re: search_path for PL/pgSQL functions partially cached?

2025-01-02 Thread Pavel Stehule

Hi


> >
> > some times can be pretty ineffective to have database per customer - more
> > connect, disconnect in postgres is much more expensive than SET
> search_path
> > TO .. and maybe RESET plans;
>
> I guess that means there is a practical application where search_path
> MAY change at runtime IF done in different sessions or if the cache is
> reset using the DISCARD command:
>
> https://www.postgresql.org/docs/17/sql-discard.html
>
> I assume DISCARD PLANS would be the right command?
>

that depends. plan inside plan cache is invalidated when search_path is
different. You use RESET plans because you want to release all plans
quickly.

Unfortunately, the types assigned to plpgsql variables are not invalidated.
This is the source of problems. It is a classical problem - it is hard to
say when you should invalidate cache.
Current design is not ideal - but it is almost a good enough compromise
between correctness and performance. It is true, so nobody did some work to
fix it. So maybe the impact to performance should not be too bad, but it is
not an easy issue. plans are isolated - and the impact of one plan to the
second plan is zero. For variables it is exactly opposite.


>
> This seems to be a very special case though. I think there should be a
> warning in the documentation of CREATE FUNCTION with regard to schemas
> anyway, though.
>

I am not sure. If you want to use this warning, then it should be
everywhere where any non-qualified identifier can be used. Maybe in plpgsql
can be more accented so almost everything in plpgsql depends on the current
setting of search_path. Lot of people don't understand, so every expression
in plpgsql is SQL and every expression is executed like part of a query.
And unfortunately there are some different caches - plpgsql cache and plan
cache and both caches are invalidated at different times (I think so
plpgsql cache is not resetted by RESET PLANS). Maybe it is better to
explain how plpgsql works. It is a little bit different from well known
interpreted languages.


> Regards,
> Jan
>

Re: Pipeline Mode vs Single Row Mode / Chunked Rows Mode

Re: search_path for PL/pgSQL functions partially cached?

Re: search_path for PL/pgSQL functions partially cached?

Re: search_path for PL/pgSQL functions partially cached?

Re: search_path for PL/pgSQL functions partially cached?

Re: search_path for PL/pgSQL functions partially cached?

Re: search_path for PL/pgSQL functions partially cached?

7 matches

Site Navigation

Mail list logo

Footer information