Re: Pipeline Mode vs Single Row Mode / Chunked Rows Mode
On Thu, Dec 19, 2024 at 3:37 PM Daniel Frey wrote: > I'm adding support for Pipeline Mode to my C++ PostgreSQL client library [...] > Anyway, is there some documentation about how these modes interact and how > they can be combined? Or should they never be combined? Hi. Happy New Year. I was interested in that question as well, having played with Pipeline mode a bit, but not Single Row mode yet. Any insights? --DD
Re: search_path for PL/pgSQL functions partially cached?
Hi čt 2. 1. 2025 v 11:37 odesílatel Jan Behrens napsal: > On Wed, 1 Jan 2025 11:19:32 -0700 > "David G. Johnston" wrote: > > > On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens > wrote: > > > > > On Sat, 28 Dec 2024 00:40:09 +0100 > > > Jan Behrens wrote: > > > > > > > On Fri, 27 Dec 2024 13:26:28 -0700 > > > > "David G. Johnston" wrote: > > > > > > > > > > Or is it documented somewhere? > > > > > > > > > > > > > > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING > > > > > > > > I can't find any notes regarding functions and schemas in that > section. > > > > > > > > "Because PL/pgSQL saves prepared statements and sometimes execution plans > > in this way, SQL commands that appear directly in a PL/pgSQL function > must > > refer to the same tables and columns on every execution; that is, you > > cannot use a parameter as the name of a table or column in an SQL > command." > > > > Changing search_path is just one possible way to change out which object > a > > name tries to refer to so it is not called out explicitly. > > The first part of the cited sentence seems helpful ("you must always > refer to the same tables and columns on every execution"). I would thus > conclude that using a dynamic search_path when running functions or > procedures is *always* considered errorneous (even though not reported > by the database as an error), except when using EXECUTE. > > I wonder if the database could/should generate an error (or at least a > warning?) when a function or procedure without a "SET search_path" > statement uses a non-qualified name? According to the documentation > using a dynamic search_path to refer to different entities in the > database is a case that "must" not happen. > > But following through, this might lead to more warnings one might > expect, e.g. when using simple operators such as "=" or the "IN" or > "CASE expression WHEN" statements, as these rely on the search_path as > well. Should such code be considered non-idiomatic, dangerous, or even > errorneous if a "SET search_path" option is missing in the > function's/procedure's definition? > > Maybe I'm overthinking this. But in practice, I've been running into > surprising issues whenever functions and schemas are involved, and I'm > not sure if every programmer will be aware of how important it is to > properly set a search_path in the function's defintion after reading > the documentation. (Besides, it's not always possible in procedures.) > How can you identify unwanted usage of non qualified identifiers from wanted usage of non qualified identifiers? It is a common pattern for sharding. Using not qualified identifiers of operators, functions is common when you are using orafce extensions, etc. Using qualified identifiers everywhere strongly reduces readability. There are no aliases to the schema, so aliases cannot help. you can identify the functions where search_path is not explicitly assigned select oid::regprocedure from pg_proc where pronamespace::regnamespace not in ('pg_catalog', 'information_schema') and not exists(select 1 from unnest(proconfig) g(v) where v ~ '^search_path'); Regards Pavel > > > > > "SQL-language and PL-language functions provided by extensions are at > > > risk of search-path-based attacks when they are executed, since parsing > > > of these functions occurs at execution time not creation time." > > > > > Moreover, it isn't true for all > > > SQL-language functions, as can be demonstrated with the following code: > > > > Yeah, when we added a second method to write an SQL-language function, > one > > that doesn't simply accept a string body, we didn't update that section > to > > point out that is the string input variant of create function that is > > affected in this manner, the non-string (atomic) variant stores the > result > > of parsing the inline code as opposed to storing the raw text. > > > > David J. > > I missed that other part in the manual (which is in a totally different > section). Should I report the missing update in section 36.17.6.1. of > the documentation as a documentation issue, or is it not necessary? > > Kind regards, > Jan Behrens > > >
Re: search_path for PL/pgSQL functions partially cached?
On Wed, 1 Jan 2025 11:19:32 -0700 "David G. Johnston" wrote: > On Wed, Jan 1, 2025 at 10:55 AM Jan Behrens wrote: > > > On Sat, 28 Dec 2024 00:40:09 +0100 > > Jan Behrens wrote: > > > > > On Fri, 27 Dec 2024 13:26:28 -0700 > > > "David G. Johnston" wrote: > > > > > > > > Or is it documented somewhere? > > > > > > > > > > https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING > > > > > > I can't find any notes regarding functions and schemas in that section. > > > > > "Because PL/pgSQL saves prepared statements and sometimes execution plans > in this way, SQL commands that appear directly in a PL/pgSQL function must > refer to the same tables and columns on every execution; that is, you > cannot use a parameter as the name of a table or column in an SQL command." > > Changing search_path is just one possible way to change out which object a > name tries to refer to so it is not called out explicitly. The first part of the cited sentence seems helpful ("you must always refer to the same tables and columns on every execution"). I would thus conclude that using a dynamic search_path when running functions or procedures is *always* considered errorneous (even though not reported by the database as an error), except when using EXECUTE. I wonder if the database could/should generate an error (or at least a warning?) when a function or procedure without a "SET search_path" statement uses a non-qualified name? According to the documentation using a dynamic search_path to refer to different entities in the database is a case that "must" not happen. But following through, this might lead to more warnings one might expect, e.g. when using simple operators such as "=" or the "IN" or "CASE expression WHEN" statements, as these rely on the search_path as well. Should such code be considered non-idiomatic, dangerous, or even errorneous if a "SET search_path" option is missing in the function's/procedure's definition? Maybe I'm overthinking this. But in practice, I've been running into surprising issues whenever functions and schemas are involved, and I'm not sure if every programmer will be aware of how important it is to properly set a search_path in the function's defintion after reading the documentation. (Besides, it's not always possible in procedures.) > > > "SQL-language and PL-language functions provided by extensions are at > > risk of search-path-based attacks when they are executed, since parsing > > of these functions occurs at execution time not creation time." > > > Moreover, it isn't true for all > > SQL-language functions, as can be demonstrated with the following code: > > Yeah, when we added a second method to write an SQL-language function, one > that doesn't simply accept a string body, we didn't update that section to > point out that is the string input variant of create function that is > affected in this manner, the non-string (atomic) variant stores the result > of parsing the inline code as opposed to storing the raw text. > > David J. I missed that other part in the manual (which is in a totally different section). Should I report the missing update in section 36.17.6.1. of the documentation as a documentation issue, or is it not necessary? Kind regards, Jan Behrens
Re: search_path for PL/pgSQL functions partially cached?
On Thu, 2 Jan 2025 13:48:29 +0100 Pavel Stehule wrote: > čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens > napsal: > > > On Thu, 2 Jan 2025 12:40:59 +0100 > > Pavel Stehule wrote: > > > > > How can you identify unwanted usage of non qualified identifiers from > > > wanted usage of non qualified identifiers? It is a common pattern for > > > sharding. Using not qualified identifiers of operators, functions is > > common > > > when you are using orafce extensions, etc. > > > > I don't fully understand the use-case. Could you elaborate? > > > > As I understand, even if identifiers are not fully-qualified, it is > > forbidden to use the search_path to refer to different database > > entities at run-time (as David pointed out). > > > > So I don't understand how a dynamic "search_path" could be used in any > > scenario within functions except when EXECUTE is involved. > > > > you don't need more databases > > schema one - customer x > schema two - customer y > > create table one.t1(..); create table one.t2(..); > create table two.t1(..); create table two.t2(..); > > set search_path to one; > -- work with data set of customer x > > set search_path to two; > -- work wit data set of customer y > > some times can be pretty ineffective to have database per customer - more > connect, disconnect in postgres is much more expensive than SET search_path > TO .. and maybe RESET plans; I guess that means there is a practical application where search_path MAY change at runtime IF done in different sessions or if the cache is reset using the DISCARD command: https://www.postgresql.org/docs/17/sql-discard.html I assume DISCARD PLANS would be the right command? This seems to be a very special case though. I think there should be a warning in the documentation of CREATE FUNCTION with regard to schemas anyway, though. Regards, Jan
Re: search_path for PL/pgSQL functions partially cached?
čt 2. 1. 2025 v 13:15 odesílatel Jan Behrens napsal: > On Thu, 2 Jan 2025 12:40:59 +0100 > Pavel Stehule wrote: > > > How can you identify unwanted usage of non qualified identifiers from > > wanted usage of non qualified identifiers? It is a common pattern for > > sharding. Using not qualified identifiers of operators, functions is > common > > when you are using orafce extensions, etc. > > I don't fully understand the use-case. Could you elaborate? > > As I understand, even if identifiers are not fully-qualified, it is > forbidden to use the search_path to refer to different database > entities at run-time (as David pointed out). > > So I don't understand how a dynamic "search_path" could be used in any > scenario within functions except when EXECUTE is involved. > you don't need more databases schema one - customer x schema two - customer y create table one.t1(..); create table one.t2(..); create table two.t1(..); create table two.t2(..); set search_path to one; -- work with data set of customer x set search_path to two; -- work wit data set of customer y some times can be pretty ineffective to have database per customer - more connect, disconnect in postgres is much more expensive than SET search_path TO .. and maybe RESET plans; > > > > > Using qualified identifiers everywhere strongly reduces readability. > There > > are no aliases to the schema, so aliases cannot help. > > Yes, I agree on that. Using "SET search_path" in the function's > definition fixes that problem, but it's easy to miss how important this > is from reading the documentation: > > The manual regarding "CREATE FUNCTION" refers to "search_path" only > within the "Writing SECURITY DEFINER Functions Safely" section. It's > easy to skip that part unless you use that feature. Moreover, that > section alone doesn't explain the weird behavior of four different > outcomes of a function with only two schemas involved which I brought > up in the beginning of this thread. > > The part on "SET configuration_parameter" part in the "CREATE FUNCTION" > documentation doesn't mention the search_path or schemas. And I don't > think you can expect every programmer will read the "Plan Caching" > subsection in the "PL/pgSQL under the Hood" section. But even then, the > information is just provided indirectly. > > yes, probably nobody reads the plan caching doc. And if they read it, then because they have performance problems. > Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't > give any hint either. > This is a question - this is a generic feature in Postgres. Every query can be impacted by setting of search_path. >From my perspective, there can be a note in the documentation related to copy types and row types. https://www.postgresql.org/docs/current/plpgsql-declarations.html#PLPGSQL-DECLARATION-TYPE The problem that you found is not just about the change of search_path. Same problem can be found after altering the table. Regards Pavel > I think (assuming that the behavior isn't fixed) that some slighly more > prominent warning would be reasonable. > > > > > you can identify the functions where search_path is not explicitly > assigned > > > > select oid::regprocedure > > from pg_proc > > where pronamespace::regnamespace not in ('pg_catalog', > > 'information_schema') > >and not exists(select 1 from unnest(proconfig) g(v) where v ~ > > '^search_path'); > > > > > > Regards > > > > Pavel > > Kind regards, > Jan >
Re: search_path for PL/pgSQL functions partially cached?
On Thu, 2 Jan 2025 12:40:59 +0100 Pavel Stehule wrote: > How can you identify unwanted usage of non qualified identifiers from > wanted usage of non qualified identifiers? It is a common pattern for > sharding. Using not qualified identifiers of operators, functions is common > when you are using orafce extensions, etc. I don't fully understand the use-case. Could you elaborate? As I understand, even if identifiers are not fully-qualified, it is forbidden to use the search_path to refer to different database entities at run-time (as David pointed out). So I don't understand how a dynamic "search_path" could be used in any scenario within functions except when EXECUTE is involved. > > Using qualified identifiers everywhere strongly reduces readability. There > are no aliases to the schema, so aliases cannot help. Yes, I agree on that. Using "SET search_path" in the function's definition fixes that problem, but it's easy to miss how important this is from reading the documentation: The manual regarding "CREATE FUNCTION" refers to "search_path" only within the "Writing SECURITY DEFINER Functions Safely" section. It's easy to skip that part unless you use that feature. Moreover, that section alone doesn't explain the weird behavior of four different outcomes of a function with only two schemas involved which I brought up in the beginning of this thread. The part on "SET configuration_parameter" part in the "CREATE FUNCTION" documentation doesn't mention the search_path or schemas. And I don't think you can expect every programmer will read the "Plan Caching" subsection in the "PL/pgSQL under the Hood" section. But even then, the information is just provided indirectly. Searching for "schema" in "CREATE FUNCTION"'s documentation doesn't give any hint either. I think (assuming that the behavior isn't fixed) that some slighly more prominent warning would be reasonable. > > you can identify the functions where search_path is not explicitly assigned > > select oid::regprocedure > from pg_proc > where pronamespace::regnamespace not in ('pg_catalog', > 'information_schema') >and not exists(select 1 from unnest(proconfig) g(v) where v ~ > '^search_path'); > > > Regards > > Pavel Kind regards, Jan
Re: search_path for PL/pgSQL functions partially cached?
Hi > > > > some times can be pretty ineffective to have database per customer - more > > connect, disconnect in postgres is much more expensive than SET > search_path > > TO .. and maybe RESET plans; > > I guess that means there is a practical application where search_path > MAY change at runtime IF done in different sessions or if the cache is > reset using the DISCARD command: > > https://www.postgresql.org/docs/17/sql-discard.html > > I assume DISCARD PLANS would be the right command? > that depends. plan inside plan cache is invalidated when search_path is different. You use RESET plans because you want to release all plans quickly. Unfortunately, the types assigned to plpgsql variables are not invalidated. This is the source of problems. It is a classical problem - it is hard to say when you should invalidate cache. Current design is not ideal - but it is almost a good enough compromise between correctness and performance. It is true, so nobody did some work to fix it. So maybe the impact to performance should not be too bad, but it is not an easy issue. plans are isolated - and the impact of one plan to the second plan is zero. For variables it is exactly opposite. > > This seems to be a very special case though. I think there should be a > warning in the documentation of CREATE FUNCTION with regard to schemas > anyway, though. > I am not sure. If you want to use this warning, then it should be everywhere where any non-qualified identifier can be used. Maybe in plpgsql can be more accented so almost everything in plpgsql depends on the current setting of search_path. Lot of people don't understand, so every expression in plpgsql is SQL and every expression is executed like part of a query. And unfortunately there are some different caches - plpgsql cache and plan cache and both caches are invalidated at different times (I think so plpgsql cache is not resetted by RESET PLANS). Maybe it is better to explain how plpgsql works. It is a little bit different from well known interpreted languages. > Regards, > Jan >