Re: Validating check constraints without a table scan?
Thanks for the suggestion of directly updating the system catalog; that may be the "best" option.

"*Most likely your query is not exactly the same as the check constraint. Think about NULL and similar.*"

No, that's not sufficient to explain it. The SELECT is obviously the negation of the check constraint, so there is a difference there. But besides that, it's very clear that Postgres does not make use of indexes while validating constraints. I suspect it doesn't involve the query planner at all.

On Fri, 15 Nov 2024 at 15:28, Torsten Förtsch wrote:

> On Fri, Nov 15, 2024 at 9:38 AM Philip Couling wrote:
>
>> Is there a solid reason why adding a check constraint does not use
>> existing indexes for validation?
>>
>> We are currently looking at partitioning a multi-TB table, leaving all
>> existing data in place and simply attaching it as a partition to a new
>> table. To prevent locking, we are trying to add an INVALID check
>> constraint first and then validate it.
>>
>> I can trivially prove the invalid constraint is valid with a simple
>> SELECT which will use an existing index and return instantaneously. But
>> AFAIK there's no way to mark a constraint as valid without scanning all
>> the rows.
>
> Most likely your query is not exactly the same as the check constraint.
> Think about NULL and similar.
>
>> This operation is really problematic on a production database with
>> heavy IO load.
>>
>> Is there a solid reason why validating check constraints cannot use
>> existing indexes? If I can prove the constraint is valid so trivially
>> with a SELECT, then why can Postgres not do the same (or similar)?
>
> Here is what has worked for me many times:
>
> 1. Create the check constraint as NOT VALID. From now on, no new or
>    updated row is allowed to violate it.
> 2. Check that the constraint holds with a query on a binary replica.
>    Make sure the query starts only after the constraint is visible on
>    the replica.
> 3. On the master:
>    UPDATE pg_constraint SET convalidated=true
>     WHERE conname='your_constraint_name'
>       AND conrelid='schema.table'::regclass
>
> How you perform step 2 is irrelevant. Checking it on a replica simply
> avoids the load on the master. You just need to make sure there is no
> conflicting data in the table.
>
> WARNING: you need to be damn sure of your data if you do that. But if
> you are, it works.
>
> Here is the procedure with which I solved the same problem for some of
> our multi-TB tables (PG14):
>
> The table has a column called transaction_time. We wanted to partition
> by that column. For some historical reason the column did not have a
> NOT NULL constraint. However, there was no way our processes could
> insert NULL into that column, and there was no row with NULL in that
> field.
> So, the first step was to add the NOT NULL constraint:
>
> BEGIN;
>
> ALTER TABLE my.table
>   ADD CONSTRAINT transaction_time_not_null
>   CHECK (transaction_time IS NOT NULL) NOT VALID;
>
> UPDATE pg_constraint
>    SET convalidated = true
>  WHERE conname = 'transaction_time_not_null'
>    AND conrelid = 'my.table'::REGCLASS
> RETURNING conname, conrelid::REGCLASS, convalidated;
>
> COMMIT;
>
> Now, for cosmetic purposes, we first turn the check constraint above
> into a normal NOT NULL constraint:
>
> BEGIN;
>
> SET LOCAL client_min_messages = 'debug4';
> -- expecting this message:
> -- DEBUG: existing constraints on column "table.transaction_time" are
> -- sufficient to prove that it does not contain nulls
> ALTER TABLE my.table
>   ALTER COLUMN transaction_time SET NOT NULL;
> RESET client_min_messages;
>
> ALTER TABLE my.table
>   DROP CONSTRAINT transaction_time_not_null;
>
> COMMIT;
>
> If you set client_min_messages to something like debug4, the database
> tells you whether it wants to scan the table or whether existing
> constraints are sufficient to prove the condition.
>
> transaction_time in our case is never in the future. Also, database
> transactions are short-lived; nothing exceeds 30 seconds. So I can
> trust that the effect of any action performed more than 30 seconds ago
> is visible in the database.
>
> So I set the time after which new rows go to the new partition at least
> 10 minutes from now, at the next hour boundary. 30 seconds would be
> good enough; I chose 10 minutes just for extra safety.
>
> SELECT date_trunc('hour', now() + '1h 10min'::interval)::timestamp AS switch_time \gset
>
> Next comes the actual change:
>
> BEGIN;
>
> -- rename the existing table
> ALTER TABLE my.table RENAME TO table_old;
>
> -- drop triggers. We will recreate them later.
> DROP TRIGGER ... ON my.table_old;
> DROP TRIGGER ...;
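For the simpler, non-partitioning case, steps 1-3 of the quoted advice can also be run entirely on the primary. Below is a minimal sketch; the table big_table, its created_at column, and the constraint name are hypothetical, not from the thread. The same caveat applies: updating pg_constraint bypasses Postgres's own validation, so you must be certain no violating rows can exist.

BEGIN;

-- 1. add the constraint as NOT VALID; from here on, writes are checked
ALTER TABLE big_table
  ADD CONSTRAINT created_at_bounds
  CHECK (created_at >= DATE '2000-01-01' AND created_at < DATE '2025-01-01')
  NOT VALID;

-- 2. prove it holds; with a b-tree index on created_at this returns
--    immediately. It must return zero rows before proceeding. Note that
--    rows where created_at IS NULL satisfy a CHECK constraint, so they
--    need no special handling here.
SELECT 1
  FROM big_table
 WHERE created_at < DATE '2000-01-01'
    OR created_at >= DATE '2025-01-01'
 LIMIT 1;

-- 3. mark the constraint validated in the catalog (requires superuser;
--    see the RDS caveat elsewhere in this thread)
UPDATE pg_constraint
   SET convalidated = true
 WHERE conname  = 'created_at_bounds'
   AND conrelid = 'big_table'::regclass;

COMMIT;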
Validating check constraints without a table scan?
Is there a solid reason why adding a check constraint does not use existing indexes for validation?

We are currently looking at partitioning a multi-TB table, leaving all existing data in place and simply attaching it as a partition to a new table. To prevent locking, we are trying to add an INVALID check constraint first and then validate it.

I can trivially prove the invalid constraint is valid with a simple SELECT which will use an existing index and return instantaneously. But AFAIK there's no way to mark a constraint as valid without scanning all the rows.

This operation is really problematic on a production database with heavy IO load.

Is there a solid reason why validating check constraints cannot use existing indexes? If I can prove the constraint is valid so trivially with a SELECT, then why can Postgres not do the same (or similar)?
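A minimal sketch of the workflow being described; the table events and its indexed created_at column are illustrative names, not from the post:

-- add the constraint without scanning; only new writes are checked
ALTER TABLE events
  ADD CONSTRAINT created_at_window
  CHECK (created_at >= DATE '2024-01-01' AND created_at < DATE '2025-01-01')
  NOT VALID;

-- the manual proof: with an index on created_at this returns instantly
-- when no violating rows exist
SELECT 1
  FROM events
 WHERE created_at < DATE '2024-01-01'
    OR created_at >= DATE '2025-01-01'
 LIMIT 1;

-- whereas this scans the whole table, regardless of any index:
ALTER TABLE events VALIDATE CONSTRAINT created_at_window;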
Re: Validating check constraints without a table scan?
It looks like updating pg_constraint isn't an option on AWS RDS, because AWS doesn't give you superuser access. Thanks a lot for the suggestion anyway.

On Fri, 15 Nov 2024 at 15:28, Torsten Förtsch wrote:

> On Fri, Nov 15, 2024 at 9:38 AM Philip Couling wrote:
>
>> Is there a solid reason why adding a check constraint does not use
>> existing indexes for validation?
>>
>> We are currently looking at partitioning a multi-TB table, leaving all
>> existing data in place and simply attaching it as a partition to a new
>> table. To prevent locking, we are trying to add an INVALID check
>> constraint first and then validate it.
>>
>> I can trivially prove the invalid constraint is valid with a simple
>> SELECT which will use an existing index and return instantaneously. But
>> AFAIK there's no way to mark a constraint as valid without scanning all
>> the rows.
>
> Most likely your query is not exactly the same as the check constraint.
> Think about NULL and similar.
>
>> This operation is really problematic on a production database with
>> heavy IO load.
>>
>> Is there a solid reason why validating check constraints cannot use
>> existing indexes? If I can prove the constraint is valid so trivially
>> with a SELECT, then why can Postgres not do the same (or similar)?
>
> Here is what has worked for me many times:
>
> 1. Create the check constraint as NOT VALID. From now on, no new or
>    updated row is allowed to violate it.
> 2. Check that the constraint holds with a query on a binary replica.
>    Make sure the query starts only after the constraint is visible on
>    the replica.
> 3. On the master:
>    UPDATE pg_constraint SET convalidated=true
>     WHERE conname='your_constraint_name'
>       AND conrelid='schema.table'::regclass
>
> How you perform step 2 is irrelevant. Checking it on a replica simply
> avoids the load on the master. You just need to make sure there is no
> conflicting data in the table.
>
> WARNING: you need to be damn sure of your data if you do that. But if
> you are, it works.
>
> Here is the procedure with which I solved the same problem for some of
> our multi-TB tables (PG14):
>
> The table has a column called transaction_time. We wanted to partition
> by that column. For some historical reason the column did not have a
> NOT NULL constraint. However, there was no way our processes could
> insert NULL into that column, and there was no row with NULL in that
> field. So, the first step was to add the NOT NULL constraint:
>
> BEGIN;
>
> ALTER TABLE my.table
>   ADD CONSTRAINT transaction_time_not_null
>   CHECK (transaction_time IS NOT NULL) NOT VALID;
>
> UPDATE pg_constraint
>    SET convalidated = true
>  WHERE conname = 'transaction_time_not_null'
>    AND conrelid = 'my.table'::REGCLASS
> RETURNING conname, conrelid::REGCLASS, convalidated;
>
> COMMIT;
>
> Now, for cosmetic purposes, we first turn the check constraint above
> into a normal NOT NULL constraint:
>
> BEGIN;
>
> SET LOCAL client_min_messages = 'debug4';
> -- expecting this message:
> -- DEBUG: existing constraints on column "table.transaction_time" are
> -- sufficient to prove that it does not contain nulls
> ALTER TABLE my.table
>   ALTER COLUMN transaction_time SET NOT NULL;
> RESET client_min_messages;
>
> ALTER TABLE my.table
>   DROP CONSTRAINT transaction_time_not_null;
>
> COMMIT;
>
> If you set client_min_messages to something like debug4, the database
> tells you whether it wants to scan the table or whether existing
> constraints are sufficient to prove the condition.
>
> transaction_time in our case is never in the future.
> Also, database transactions are short-lived; nothing exceeds 30
> seconds. So I can trust that the effect of any action performed more
> than 30 seconds ago is visible in the database.
>
> So I set the time after which new rows go to the new partition at least
> 10 minutes from now, at the next hour boundary. 30 seconds would be
> good enough; I chose 10 minutes just for extra safety.
>
> SELECT date_trunc('hour', now() + '1h 10min'::interval)::timestamp AS switch_time \gset
>
> Next comes the actual change:
>
> BEGIN;
>
> -- rename the existing table
> ALTER TABLE my.table RENAME TO table_old;
>
> -- drop triggers. We will recreate them later.
> DROP TRIGGER ... ON my.table_old;
> DROP TRIGGER ...;
>
> -- create partitioned table
> CREATE TABLE my.table (
>     LIKE my.table_old
>     INCLUDING DEFAULTS
>     INCLUDING CONSTRAINTS
> )
> PARTITION BY RANGE (transaction_time);
>
> -- recreate triggers
> CREATE TRIGGER ...
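The quoted script breaks off here. For completeness, a speculative sketch of how such a procedure typically concludes; this is not Torsten's actual script, and the constraint and partition names are invented. The key point is documented behavior: ATTACH PARTITION skips its validation scan when an existing valid CHECK constraint already implies the partition bound.

-- still inside the same transaction; :'switch_time' is the psql
-- variable captured above

-- a CHECK constraint matching the future partition bound, marked valid
-- via pg_constraint as before (safe here only because transaction_time
-- is never in the future and the switch time is well ahead of now())
ALTER TABLE my.table_old
  ADD CONSTRAINT table_old_bound
  CHECK (transaction_time < :'switch_time') NOT VALID;

UPDATE pg_constraint
   SET convalidated = true
 WHERE conname  = 'table_old_bound'
   AND conrelid = 'my.table_old'::regclass;

-- because the CHECK constraint (plus the NOT NULL set earlier) implies
-- the partition constraint, this does not scan table_old
ALTER TABLE my.table
  ATTACH PARTITION my.table_old
  FOR VALUES FROM (MINVALUE) TO (:'switch_time');

-- partition for rows from the switch time onwards
CREATE TABLE my.table_new
  PARTITION OF my.table
  FOR VALUES FROM (:'switch_time') TO (MAXVALUE);

COMMIT;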
Re: Validating check constraints without a table scan?
Jian He, the context here is constraints for partitioning, as suggested in the documentation:
https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-MAINTENANCE

An example constraint from the documentation:

ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
  CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );

If logdate is indexed, then this constraint can be manually validated very quickly using a SELECT that takes advantage of the index:

SELECT 1
  FROM measurement_y2008m02
 WHERE logdate < DATE '2008-02-01'
    OR logdate >= DATE '2008-03-01'
 LIMIT 1;

If the constraint holds, the query returns quickly with no rows; if any row violates the constraint, it also returns very quickly, but with a single row containing the value 1.

I guess that validating constraints doesn't invoke the query planner, or else the conversion is too complex for the query planner. The conversion being:

- from: NOT (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01')
- to:   logdate < DATE '2008-02-01' OR logdate >= DATE '2008-03-01'

Hope that clarifies it.

On Wed, 20 Nov 2024 at 09:45, jian he wrote:

> On Fri, Nov 15, 2024 at 4:38 PM Philip Couling wrote:
> >
> > Is there a solid reason why adding a check constraint does not use
> > existing indexes for validation.
>
> Can you give an SQL example (except not-null) where indexes can be used
> for check constraint validation? I am not sure I understand it
> correctly.
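One way to see the difference is to ask the planner directly. A sketch using the documentation's measurement_y2008m02 example; the plan shape in the comment is what one would expect given a b-tree index on logdate, not captured output:

EXPLAIN
SELECT 1
  FROM measurement_y2008m02
 WHERE logdate < DATE '2008-02-01'
    OR logdate >= DATE '2008-03-01'
 LIMIT 1;
-- With a b-tree index on logdate, one would expect a Bitmap Heap Scan
-- over a BitmapOr of two Bitmap Index Scans (or a plain Index Scan),
-- whereas ALTER TABLE ... VALIDATE CONSTRAINT y2008m02 performs a
-- sequential scan of every row.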