Re: DEFINER / INVOKER conundrum

2023-04-03 Thread walther

Erik Wienhold:

A single DEFINER function works if you capture current_user with a parameter
and default value.  Let's call it claimed_role.  Use pg_has_role[0] to check
that session_user has the privilege for claimed_role (in case the function is
called with an explicit value), otherwise raise an exception.

Connect as postgres:

CREATE FUNCTION f(claimed_role text default current_user)
  RETURNS TABLE (claimed text, curr_user text, sess_user text)
  SECURITY DEFINER
  LANGUAGE sql
  AS $$ SELECT claimed_role, current_user, session_user $$;


For me, checking whether session_user has the privilege for claimed_role 
is not enough, so I add a DOMAIN to the mix:


CREATE DOMAIN current_user_only AS NAME CHECK (VALUE = CURRENT_USER);

CREATE FUNCTION f(calling_user current_user_only DEFAULT CURRENT_USER)
...
SECURITY DEFINER;

This works because the domain check is evaluated in the caller's context.
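
For completeness, here's a minimal self-contained sketch of how the 
combined version could look (the return value is just illustrative):

CREATE FUNCTION f(calling_user current_user_only DEFAULT CURRENT_USER)
  RETURNS text
  SECURITY DEFINER
  LANGUAGE sql
  AS $$
    -- calling_user can be trusted here: both the DEFAULT and the domain
    -- check were evaluated with the caller's CURRENT_USER, not the owner's
    SELECT 'called by ' || calling_user
  $$;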

Best,

Wolfgang




Re: current_role of caller of a DEFINER function

2024-06-26 Thread walther

Dominique Devienne:

1) Is there any way to know the current_role of the caller of a
DEFINER function. I fear the answer is no, but better be sure from
experts here.


You can do something like this:

CREATE DOMAIN current_user_only AS text
  CONSTRAINT current_user_only CHECK (VALUE = CURRENT_USER);

CREATE FUNCTION func(
  calling_user current_user_only DEFAULT CURRENT_USER
) ... SECURITY DEFINER;

The default value is evaluated in the context of the calling user, and 
the constraint prevents it from ever being set explicitly to a different role.


Thus you can use calling_user inside your function.
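
Assuming func has no other required parameters, usage looks roughly like 
this (role names are made up; the session is currently running as bob):

SELECT func();         -- calling_user defaults to 'bob'
SELECT func('bob');    -- also accepted, matches CURRENT_USER
SELECT func('alice');  -- rejected: the domain's CHECK (VALUE = CURRENT_USER)
                       -- is evaluated in the caller's context, where it fails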

Best,

Wolfgang




Re: Fwd: A million users

2024-11-13 Thread walther

Dominique Devienne:

Hi. Sure, that's a good point, and a best practice IMHO.
But I already do that, and the original question remains,
i.e. how does PostgreSQL scale with thousands or millions of roles?
In my use case, 1,000 LOGIN users and 10,000 schema-related ROLEs
are possible, which can translate to millions of GRANTs.

It matters to me especially, since I'm using pg_has_role() in RLS predicates,
in addition to using ROLEs for access to schemas as usual (see above).
I'm not in a position to test that myself at this time, but if anyone has looked into
how PostgreSQL scales with many roles (O(N)? O(logN)? O(NlogN)?),
I'd be interested in pointers to that research or those benchmarks.


I don't have any benchmarks, but the following data point:

We use PostgREST [1] which connects to the database with one 
"authenticator" role. For each request it handles, it does a SET ROLE to 
a role defined in the Authorization Header (JWT).


Some numbers we are running with currently:
- 1 authenticator role
- ~ 127,000 user roles granted to "authenticator"
- ~ 14,000 "scope" roles granted to the user roles (tenants, groups, ...)
- ~ 15 "context" roles granted to user roles ("admin", "user", ...)
- ~ 50 "access" roles granted to context roles ("view_x", "do_y", ...)

Only the access roles have any direct privileges granted.

We currently have ~ 700 RLS policies defined. Those are created TO the 
context roles. The policies check the current role's scope roles to 
select "allowed" rows.


In total, we have ~370,000 roles granted to each other (pg_auth_members).

Except for one thing, we have never had any real problems with this. We 
didn't observe anything getting massively worse with many roles, even 
though we use them extensively. RLS policies need to be carefully 
written to get any performance, though.


The one problem we found is:

The first time the authenticator role does a SET ROLE in a session, it's 
**terribly** slow. With fewer users back then, it took 6-7 minutes. Any 
SET ROLE afterwards in the same session would be fast. Even more 
annoying: killing the session while SET ROLE was running would not work 
properly and would leave zombie processes. Giving the authenticator role 
the SUPERUSER privilege avoids the problem and makes it instant. However, 
that's not very desirable.


There were some improvements in that area, IIRC in the 17 cycle (?), but 
I haven't had the time to test with them yet. We are still on v15 and the 
last time I tested this was ~ two years ago. I still wasn't able to put 
together a simple reproducer either.


You should *probably* be better off with your different LOGIN roles; I 
assume the role cache builds up much quicker in that case.


Hope that helps.

Best,

Wolfgang

[1]: https://postgrest.org




Re: Fwd: A million users

2024-11-22 Thread walther

Eric Hanson:
Did you find some way to prevent RESET ROLE?  I once advocated for a NO 
RESET option on SET ROLE [1] so that RESET ROLE would be impossible for 
the rest of the session.  Still think it would be helpful.


Yeah, this is still on my list of things to research more eventually - 
currently still unsolved.


For my use-case the NO RESET would need to apply until the end of the 
transaction, not end of the session.


I imagine something like an extension, that would:
- block any SET SESSION ROLE
- block any RESET ROLE
- only allow SET LOCAL ROLE when CURRENT_USER has the right to do so

Then the effect of SET LOCAL ROLE would still be reversed at the end of 
the transaction, but you could never "escape" a SET LOCAL ROLE that was 
set earlier.
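
For reference, this is the current behavior that such an extension would 
tighten (role names are made up; assume we are connected as the 
authenticator role and alice has been granted to it):

BEGIN;
SET LOCAL ROLE alice;
SELECT current_user;   -- alice
RESET ROLE;            -- the escape hatch: back to the login role right here
COMMIT;
SELECT current_user;   -- and SET LOCAL ROLE would be reverted at COMMIT anyway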


Best,

Wolfgang




Execution order of CTEs / set_config and current_setting in the same query

2020-12-09 Thread Wolfgang Walther

Hi,

with PostgREST [1] we are translating HTTP requests into SQL queries. 
For each request we are setting some metadata (headers, ...) as GUCs.


We used to do it like this:
SET LOCAL request.headers.x = 'y';
...

Since this is user-provided data, we want to use parametrized/prepared 
statements. This is not possible with SET, so we switched to:


SELECT set_config($1, $2, true), ...;

Both of these queries precede our main query. The SELECT set_config 
is a bit slower than the SET LOCAL, probably because of more overhead on 
the SELECT.
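
At the SQL level, the parameterized variant looks roughly like this (in 
practice the parameters are bound by the driver via the extended 
protocol; PREPARE/EXECUTE is only used here to keep the sketch 
self-contained):

BEGIN;
PREPARE set_guc(text, text) AS
  SELECT set_config($1, $2, true);  -- true = transaction-local, like SET LOCAL
EXECUTE set_guc('request.headers.x', 'y');
-- ... main query using current_setting('request.headers.x') ...
COMMIT;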


Now, we are wondering: To reduce overhead, can we move the set_config 
calls to a CTE as part of the main query? The values would need to be 
available with current_setting(...) in the remaining query.


Of course we would need to ensure execution order, so that this CTE will 
always be fully executed, before all the other parts of the query.


Is this possible to do?

We did some basic testing, that seemed to be successful:

WITH set AS (
  SELECT set_config('pgrst.hello', 'world', true)
)
SELECT current_setting('pgrst.hello')
FROM set;

or

WITH set AS (
  SELECT set_config('pgrst.hello', 'world', true)
),
main AS (
  SELECT
    current_setting('pgrst.hello') AS hello,
    ...
  FROM set, ...
)
SELECT
  current_setting('pgrst.hello'),
  main.hello,
  ...
FROM set, main, ...;


Queries like this seem to have set the GUC correctly. But is this 
guaranteed? What would need to be done to guarantee it?


I have a feeling that even though this works for those simple cases, 
there is some risk involved...


Additional question: If this can be guaranteed - what about using 
set_config('role', 'xxx', true) in the same way? Putting this into those 
examples above and checking with CURRENT_USER seems to work as well. How 
likely is this to lead to problems with privileges / permissions?


Any input/insight would be helpful.

Thanks

Wolfgang

[1]: https://postgrest.org




Re: Execution order of CTEs / set_config and current_setting in the same query

2020-12-09 Thread Wolfgang Walther

Tom Lane:

I think you're on fairly shaky ground here.  Generally speaking, a CTE
will be executed/read only when the parent query needs the next row from
it.  Your examples ensure that the CTE is read before the parent query's
results are computed; but in realistic usage you'd presumably be joining
the CTE with some other table(s), and then the execution order is going
to be a lot harder to predict.  This approach is also going to
fundamentally not work for settings that need to apply during planning
of the query (which, notably, probably includes "role").


Ok, thanks for confirming that.


You'd be far better off to investigate ways to send SET LOCAL first,
without incurring a separate network round trip for that.  If you're
using simple query mode that's easy, you can just do

res = PQexec("SET LOCAL ... ; ");

In extended query mode you can't get away with that, but you might be able
to issue the SET LOCAL without immediately waiting for the result.


Yes, that's what we have done so far. We switched to set_config to 
parameterize the query.


Is there any way to not wait for a SELECT? I don't care about the 
resultset, so I need something like PERFORM but for SQL, not plpgsql, I 
think?





ALTER ROLE ... SET in current database only

2021-02-15 Thread Wolfgang Walther

Hi,

I'm trying to set a GUC for a role in the current database only - but I 
don't know the name of the database at the time of writing the code. It 
could be development, staging, ...


I would basically like to do something like this:

ALTER ROLE a IN CURRENT DATABASE SET b TO c;

Although that syntax doesn't exist (yet).

I think I could wrap it in a DO block and create the statement 
dynamically. Alternatively, I could probably INSERT INTO / UPDATE 
pg_db_role_setting manually?
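
A minimal sketch of the DO-block workaround (role and setting are 
illustrative, and the role must already exist); the current database name 
is filled in dynamically, so the same script works in development, 
staging, ...:

DO $$
BEGIN
  EXECUTE format(
    'ALTER ROLE a IN DATABASE %I SET search_path TO %L',
    current_database(), 'myschema'
  );
END
$$;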


Any other ideas how to achieve this easily? Does the proposed "IN 
CURRENT DATABASE" syntax sound useful to anyone else?


Best,

Wolfgang




Re: ALTER ROLE ... SET in current database only

2021-02-16 Thread Wolfgang Walther

Abdul Qoyyuum:
Wouldn't you need to connect to the database first before you can ALTER 
ROLE anything?


Of course - otherwise the notion of a "current database" wouldn't make 
sense at all. But the connection only happens when the code is executed; 
I am not writing and executing this code at the same time.


In my case, I'm seeding a Postgres docker container with settings and 
data on first launch. The database name is passed to the container via an 
environment variable. But I'm sure there are other use cases where code 
should be written once, but executed in different databases.


Best,

Wolfgang




Re: Backward compat issue with v16 around ROLEs

2024-09-11 Thread Wolfgang Walther

Dominique Devienne:

Hi David. I did as you suggested, and it fails the same way. Did I
misunderstand you? --DD

[..]

ddevienne=> grant dd_owner to dd_admin with admin option; -- 


I think this needs to be the other way around:

  grant dd_admin to dd_owner with admin option;

Best,

Wolfgang




Re: DROP OWNED BY fails with #53200: ERROR: out of shared memory

2022-01-11 Thread Wolfgang Walther

Dominique Devienne:

I wish for DB-specific ROLEs BTW...


Same here. That would be so useful.




Re: Event-Triggers for DB owners instead of just SUPERUSER

2025-04-14 Thread Wolfgang Walther

Dominique Devienne:

Can't event-triggers also be available to DB owners, instead of just SUPERUSER?


There's a recent -hackers thread exactly about this:

https://www.postgresql.org/message-id/flat/CAGRrpzbtYDkg7_xwfzrqByYgCJQbbL38tADyuN%2B6tAkbA-Pnkg%40mail.gmail.com

Best,

Wolfgang


Re: Interesting case of IMMUTABLE significantly hurting performance

2025-04-14 Thread Wolfgang Walther

Tom Lane:

If you err in the other direction, you don't get slapped on the
wrist that way.  We're willing to inline VOLATILE functions,
for instance, whether or not the contained expression is volatile.
Similarly for STRICT, and I think parallel safety as well.
So my own habit when writing a SQL function that I wish to be
inlined is to leave off all those markings.


According to [1], this only applies to inlining of scalar functions, but 
not to table functions, which *need* to be either STABLE or IMMUTABLE.


Just mentioning this for anyone who takes this advice blindly and leaves 
all markers off - that might not always work as expected in the general case.
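
An illustrative sketch (table and function are made up): per the wiki's 
conditions, a set-returning SQL function has to be marked STABLE or 
IMMUTABLE to be a candidate for inlining, so leaving all markers off 
(defaulting to VOLATILE) would prevent inlining here:

CREATE TABLE items (id int, active bool);

CREATE FUNCTION active_items()
  RETURNS SETOF items
  LANGUAGE sql
  STABLE
  AS $$ SELECT * FROM items WHERE active $$;

-- can be inlined into the calling query, as if you had written
-- SELECT * FROM items WHERE active directly
SELECT * FROM active_items();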


Best,

Wolfgang

[1]: https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions