Restoring only a subset of schemas

2025-03-17 Thread Sylvain Cuaz

Hi all,

    I have a DB with one schema named "Common" holding data referenced by other schemas. All other 
schemas have the same structure (tables and fields) and are named "cXXX" where XXX is just an int. 
Thus the only cross-schema foreign keys are in "cXXX" pointing to "Common", and each "cXXX" is 
completely independent of other "cXXX" schemas.
    Now if I want to restore from a full dump of this DB, but with only one "cXXX" and the "Common" 
schema :
- if I pass --create --schema=Common, then the CREATE SCHEMA is missing, i.e. it only emits data 
inside "Common" and the restore fails.
- if I could pass --create --exclude-schema='c*' (fictional notation as patterns are only recognized 
by pg_dump), then all schemas would be created, with no data inside except for "Common". Creating 
all schemas is a waste of time, but more importantly would make restoring other schemas more 
difficult (e.g. rows should be inserted before creating foreign keys).
Note : to check the behaviour of pg_restore above, I pass -f- to check the SQL as it is far quicker 
than to actually restore a DB.


Maybe a new --include-create-schema option should be added to emit CREATE SCHEMA in addition to 
objects inside it ? That way I could :
1. --create --include-create-schema --schema=Common and have a DB with all DB-level properties 
(DEFAULT PRIVILEGES, COMMENT, SET parameter, etc.) and one schema with all of its data and 
schema-level properties (DEFAULT PRIVILEGES, COMMENT, GRANT USAGE).
2. then at any point later without the --create, with as many schemas I need :  
--include-create-schema --schema=cXXX. And if I need to reset a "cXXX" schema, just manually DROP 
SCHEMA and restore again.


Similarly, maybe add --exclude-create-schema to additionally exclude CREATE SCHEMA for schemas 
targeted by --exclude-schema.


IOW --schema and --exclude-schema both target objects inside schemas, these 2 new options would 
allow to also have control on the schemas themselves (and their properties like DEFAULT PRIVILEGES, 
COMMENT, etc.)


Cheers,

Sylvain





Re: Restoring only a subset of schemas

2025-03-19 Thread Sylvain Cuaz

Le 17/03/2025 à 16:29, Tom Lane a écrit :

Sylvain Cuaz  writes:

      Now if I want to restore from a full dump of this DB, but with only one "cXXX" and 
the "Common"
schema :
- if I pass --create --schema=Common, then the CREATE SCHEMA is missing, i.e. 
it only emits data
inside "Common" and the restore fails.
- if I could pass --create --exclude-schema='c*' (fictional notation as 
patterns are only recognized
by pg_dump), then all schemas would be created, with no data inside except for 
"Common". Creating
all schemas is a waste of time, but more importantly would make restoring other 
schemas more
difficult (e.g. rows should be inserted before creating foreign keys).

In general, the solution for edge-case restore selection needs is to
make a list of the dump's contents with "pg_restore -l", edit out what
you don't want using any method you like, then use the edited list with
"pg_restore -L".


Hi,

    I am aware of that feature, but that forces me to know every type of entry that pertains to a 
schema or database (e.g. DEFAULT ACL, ACL, COMMENT, DATABASE PROPERTIES, etc.) and what about new 
ones that will be added in the future ?


Further, I don't see how it's an edge-case, at the core I just want to restore some but not all the 
schemas. This is possible for pg_dump, see my response to Adrian Klaver.



While I'd be in favor of improving pg_restore to accept wild-card
patterns,

That would definitely be appreciated.

  I'm very hesitant to start inventing new kinds of selection
switches for it.  The interactions between such switches would be a
mess.


Which interactions ? It seems to me that the name of the schema should be used as the namespace to 
check in _tocEntryRequired() in pg_backup_archiver.c, and then the dependent entries (e.g. ACL, 
COMMENT) would be handled around line 3050. I've attached a patch with some pseudo-code. In fact, 
were it not for compatibility, I'd argue that my proposed options should be the default, at least 
with --create, so as to neither output invalid SQL (for -n) nor extra unwanted ones (for -N) and to 
behave like pg_dump.


Cheers,

Sylvain

--- pg_backup_archiver.c	2025-03-18 19:43:14.297545537 +0100
+++ pg_backup_archiver_create-schema.c	2025-03-18 21:11:25.681586139 +0100
@@ -3069,17 +3069,20 @@
 		/* Apply selective-restore rules for standalone TOC entries. */
 		if (ropt->schemaNames.head != NULL)
 		{
+			ns = &ropt->include-create-schema && strcmp(te->desc, "SCHEMA") == 0 ? te->tag : te->namespace;
 			/* If no namespace is specified, it means all. */
-			if (!te->namespace)
+			if (!ns)
 return 0;
-			if (!simple_string_list_member(&ropt->schemaNames, te->namespace))
+			if (!simple_string_list_member(&ropt->schemaNames, ns))
 return 0;
 		}
 
-		if (ropt->schemaExcludeNames.head != NULL &&
-			te->namespace &&
-			simple_string_list_member(&ropt->schemaExcludeNames, te->namespace))
-			return 0;
+		if (ropt->schemaExcludeNames.head != NULL)
+		{
+			ns = &ropt->exclude-create-schema && strcmp(te->desc, "SCHEMA") == 0 ? te->tag : te->namespace;
+			if(ns && simple_string_list_member(&ropt->schemaExcludeNames, ns))
+return 0;
+		}
 
 		if (ropt->selTypes)
 		{


Re: Restoring only a subset of schemas

2025-03-19 Thread Sylvain Cuaz

Le 17/03/2025 à 16:21, Adrian Klaver a écrit :

On 3/17/25 07:57, Sylvain Cuaz wrote:

Hi all,

 I have a DB with one schema named "Common" holding data referenced by other schemas. All 
other schemas have the same structure (tables and fields) and are named "cXXX" where XXX is just 
an int. Thus the only cross-schema foreign keys are in "cXXX" pointing to "Common", and each 
"cXXX" is completely independent of other "cXXX" schemas.
 Now if I want to restore from a full dump of this DB, but with only one "cXXX" and the 
"Common" schema :
- if I pass --create --schema=Common, then the CREATE SCHEMA is missing, i.e. it only emits data 
inside "Common" and the restore fails.


I am not seeing that.

For:

pg_dump -d test -U postgres -s --create --schema=other_sch --schema=public

What is the complete command you are using for the pg_dump?


Hi,

    As I said I'm restoring, not dumping. I make daily full backups and sometimes need to restore a 
specific day, but the full database is quite big and I would like to only restore one or two schemas.


As you said, if one passes --create --schema to pg_dump then a valid SQL is produced with CREATE 
DATABASE, CREATE SCHEMA, CREATE TABLE and all objects inside the schema.


But if one passes --create --schema to pg_restore then an invalid SQL is produced because it 
contains CREATE DATABASE, CREATE TABLE but it doesn't contain the CREATE SCHEMA needed for the 
tables. Is there any reason for that discrepancy between dump & restore and for outputting invalid 
SQL ?


My proposed --include-create-schema would just add the CREATE SCHEMA so that pg_restore behaves the 
same as pg_dump, and would allow to output valid SQL. But ideally this option shouldn't even be 
needed because pg_restore would just emit CREATE SCHEMA like pg_dump.




What Postgres version(s) are you using? 


A lot :-) But for this problem I'm using 13 & 15.


Cheers,

Sylvain.