query multiple schemas

2024-04-21 Thread Norbert Sándor

Hello,

I have a database with multiple tenants with a separate schema for each 
tenant.
The structure of each schema is identical, the tenant ID is the name of 
the schema.


What I would like to achieve is to able to query tables in all schemas 
at once with the tenant IDs added to the result  set.


I experimented with typed solutions like described in 
https://dev.to/staab/how-to-query-multiple-schemas-with-postgresql-28kj 
without much success.

So I turned to a more dynamic JSON-based solution.

Please note that I'm new to plpgsql, so /any/ (even a less related) 
advice is welcome :)


My current experimental function is:

   CREATE OR REPLACE FUNCTION demo.tenant_union_query(tbl anyelement)

   RETURNS setof json AS $func$

   declare

   _select text;

   begin

   _select := (select

   string_agg(

   format('select t.*, %L tenantId from %I.%I t', schema_name,
   schema_name, pg_typeof(tbl)),

   E'\n' || ' union all ' || E'\n')

   from (

   SELECT schema_name

   FROM information_schema.schemata

   where schema_name not in ('information_schema') and schema_name not
   like 'pg_%'

   ) tenants

   );

   return query execute 'select row_to_json(r) from (' || _select || ')
   as r';

   END;

   $func$ LANGUAGE plpgsql;

And this is how I use it to query a "usual" result-set-like result with 
the tenant ID in the 1st column, followed by the fields from the given 
table:


   select r->>'tenantid' tenantId, json_populate_record(null::mytable,
   r) from tenant_union_query(null::mytable) r order by tenantId; -- (1)

The above solution seems to work, my questions are:

1. Is there a better way to achieve the same functionality? Maybe
   without using JSON as an intermediate representation?
2. Is there a way to further simplify the usage, like instead of the
   query (1) above something more simple, like: select * from
   tenant_union_query_2(null::mytable) order by tenantId;

Thanks for your help in advance.
Best regards,
Norbi


Re: query multiple schemas

2024-04-23 Thread Norbert Sándor


  
  
> using a dynamic number of schemas

Although there are historical reasons behind our "schema per tenant"
architecture, it provides very good logical separation of data, and
is very convenient that we don't need to include the tenant ID in
each query (I'm sure that it would cause lots of bugs and trouble).
Besides, we use Hibernate and it
  has great support for this architecture.

> In any case, I'm interested in what works well for you.

I went with the dynamic solution I proposed in my original email
(which performs a UNION of the queried table from all tenants).
Performance is currently not a priority in our use-cases, otherwise
I would have probably chosen a more static solution (which wouldn't
be easy to maintain continuously, I'm sure of).

I further simplified its usage, so finally it is fairly comfortable
to use either directly:
select tenantId, (record).*
from tenant_query(null::mytable)
where (record).type=2
order by tenantId, (record).name;

or by using a temporary view:
create temp view
all_tenant_mytable as
select tenantId, (record).* from tenant_query(null::mytable);

select *
from all_tenant_mytable
where type=2
order by tenantId, name;

> In my case, the revision/version of the schema could be
  different as well

This complicates things very much, it is probably not possible to
implement a solution as comfortable as in my case.

Although I think the JSON-based solutions proposed in this thread
would work in your case as well.
In my current solution I also use JSON as intermediate
representation - although I feel it is because of my lack
of deeper knowledge of Postgresql's type system.
The difference is that you would need to use JSON as the final
representation, and reference the JSON fields using Postgres's
  JSON operators.

> And if/when I get back to this issue myself, I'll do the
  same.

My current solution is not much different than the one I posted in
my original question.
My main difficulty was the relatively static nature of Postgresql's
type system, so this solution is a result of lots of trial-and-error
rounds :)
Take a look at it, and you (and maybe others) may have
recommendations, e.g. how I could get rid of the usage of the
intermediate JSON "layer".
(Although I have to admit: it is amazing that it is possible to
implement this at all ;) )

  
CREATE OR REPLACE FUNCTION tenant_query_json(tbl anyelement)RETURNS setof json AS $func$declare	_select text;begin	_select := (select	string_agg(	format('select t.*, %L tenantId from %I.%I t', schema_name, schema_name, pg_typeof(tbl)),	E'\n' || ' union all ' || E'\n')	from (		SELECT schema_name		FROM information_schema.schemata		where schema_name not in ('information_schema') and schema_name not like 'pg_%'	  ) tenants	  );
	return query execute 'select row_to_json(r) from (' || _select || ') as r';END;$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION tenant_query(tbl anyelement)RETURNS table(tenantId text, record anyelement) AS $func$begin	return query	select t.tenantId, t.rec		from (			selectjr->>'tenantid' tenantId,json_populate_record(tbl, jr) rec			from tenant_query_json(tbl) jr		) t;END;$func$ LANGUAGE plpgsql;

  

As you can see in my examples above, I use the tenant_query() function but in your case
(if your schemas are different) something similar to tenant_query_json() may work better.

--
Norbi

On 2024. 04. 23. 9:33, Dominique
  Devienne wrote:


  
  
On Sun, Apr 21, 2024 at 11:12 PM Tom Lane 
  wrote:


  Steve
Baldwin 
writes:
> If the number of tenant schemas is reasonably static,
you could write a
> plpgsql function to create a set of UNION ALL views

Another idea is to build a partitioned table
  
  
  
  Hi Norbert. I asked a [similar question][1] a while back,
  and unfortunately didn't get any actionable input,
perhaps
  because I already mentioned in my message the options
  proposed here so far. Seems like people like us, using a
  dynamic number of schemas, are outliers in database-land.
  
  
  In my case, the revision/version of the schema could be
  different as well, which would complicate the
partitioning idea.
  
  
  In any case, I'm interested in what works well for you.
  And if/when I get back to this issue myself, I'll do the
same.
  
  
  Thanks, --DD