Re: Validating check constraints without a table scan?

2024-11-20 Thread Philip Couling
Thanks for the suggestion of directly updating the system catalog; that may
be the “best” option.

“*Most likely your query is not exactly the same as the check constraint.
Think about NULL and similar.*”

No, that’s not sufficient to explain it. The SELECT is deliberately the
negation of the check constraint, so there is that difference. But besides
that, it’s very clear that Postgres does not make use of indexes while
validating constraints. I suspect it doesn’t involve the query planner at
all.

On Fri, 15 Nov 2024 at 15:28, Torsten Förtsch 
wrote:

> On Fri, Nov 15, 2024 at 9:38 AM Philip Couling  wrote:
>
>> Is there a solid reason why adding a check constraint does not use
>> existing indexes for validation?
>>
>> We are currently looking at partitioning a multi TB table leaving all
>> existing data in place and simply attaching it as a partition to a new
>> table. To prevent locking, we are trying to add a NOT VALID check constraint
>> first and then validate it.
>>
>> I can trivially prove the invalid constraint is valid with a simple
>> SELECT which will use an existing index and return instantaneously. But
>> AFAIK there's no way to mark a constraint as valid without scanning all the
>> rows.
>>
>
> Most likely your query is not exactly the same as the check constraint.
> Think about NULL and similar.
>
>
>> This operation is really problematic on a production database with heavy
>> IO load.
>>
>> Is there a solid reason why validating check constraints cannot use
>> existing indexes? If I can prove the constraint is valid so trivially with
>> a SELECT, then why can Postgres not do the same (or similar)?
>>
>
> Here is what has worked for me many times:
>
> 1. create the check constraint as NOT VALID. From now on no new or updated
> row is allowed to violate it.
> 2. check if the constraint holds with a query on a binary replica. Make
> sure the query starts only when the constraint is visible on the replica.
> 3. on the master: UPDATE pg_constraint SET convalidated=true WHERE
> conname='your_constraint_name' AND conrelid='schema.table'::regclass
>
> How you perform step 2 is irrelevant. Checking it on a replica would
> simply avoid the load on the master. You just need to make sure there is no
> conflicting data in the table.
>
> WARNING, you need to be damn sure of your data if you do that. But if you
> are, it works.
>
> Here is how I solved the same problem for some of our
> multi-TB tables (PG14):
>
> The table has a column called transaction_time. We wanted to partition by
> that column. For some historical reason the column did not have a NOT NULL
> constraint. However, there was no way our processes could insert NULL in
> that column and there was no row with NULL in that field. So, the first
> step was to add the NOT NULL constraint:
>
> BEGIN;
>
> ALTER TABLE my.table
> ADD CONSTRAINT transaction_time_not_null
> CHECK(transaction_time IS NOT NULL) NOT VALID;
>
> UPDATE pg_constraint
>SET convalidated=true
>  WHERE conname = 'transaction_time_not_null'
>AND conrelid = 'my.table'::REGCLASS
> RETURNING conname, conrelid::REGCLASS, convalidated;
>
> COMMIT;
>
> Now, for cosmetic purposes, we turn the check constraint above into a
> normal NOT NULL constraint:
>
> BEGIN;
>
> SET LOCAL client_min_messages = 'debug4';
> -- expecting this message
> -- DEBUG:  existing constraints on column "table.transaction_time" are
> sufficient to prove that it does not contain nulls
> ALTER TABLE my.table
>   ALTER COLUMN transaction_time SET NOT NULL;
> RESET client_min_messages;
>
> ALTER TABLE my.table
>   DROP CONSTRAINT transaction_time_not_null;
>
> COMMIT;
>
> If you set client_min_messages to something like debug4, then the database
> tells you if it wants to scan the table or if existing constraints are
> sufficient to prove the condition.
>
> transaction_time in our case is never in the future. Also, database
> transactions are short-lived; nothing exceeds 30 seconds. So I can trust
> that the effect of any action performed more than 30 seconds ago is
> visible.
>
> So, I set the switch time, after which new rows go to the new partition,
> to the next hour boundary at least 10 minutes from now. 30 seconds would
> be good enough; I chose 10 minutes just for extra safety.
>
> SELECT date_trunc('hour', now()+'1h 10min'::interval)::timestamp AS switch_time\gset
>
> Next comes the actual change:
>
> BEGIN;
>
> -- rename the existing table
> ALTER TABLE my.table RENAME TO table_old;
>
> -- drop triggers. We will recreate them later.

Validating check constraints without a table scan?

2024-11-15 Thread Philip Couling
Is there a solid reason why adding a check constraint does not use existing
indexes for validation?

We are currently looking at partitioning a multi TB table leaving all
existing data in place and simply attaching it as a partition to a new
table. To prevent locking, we are trying to add a NOT VALID check constraint
first and then validate it.
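
A sketch of the intended sequence (hypothetical names; the real table,
column, and bound differ):

ALTER TABLE big_table ADD CONSTRAINT part_bound
CHECK (created_at >= DATE '2020-01-01') NOT VALID;  -- instant, no scan

ALTER TABLE big_table VALIDATE CONSTRAINT part_bound;  -- full scan, heavy IO

A valid constraint matching the partition bound lets ATTACH PARTITION skip
its own full-table scan.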

I can trivially prove the invalid constraint is valid with a simple SELECT
which will use an existing index and return instantaneously. But AFAIK
there's no way to mark a constraint as valid without scanning all the rows.
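
For instance, with the hypothetical constraint sketched above and an index
on created_at, this returns almost instantly:

SELECT 1 FROM big_table
WHERE NOT (created_at >= DATE '2020-01-01')
LIMIT 1;  -- no rows returned means no violations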

This operation is really problematic on a production database with heavy IO
load.

Is there a solid reason why validating check constraints cannot use existing
indexes? If I can prove the constraint is valid so trivially with a SELECT,
then why can Postgres not do the same (or similar)?


Re: Validating check constraints without a table scan?

2024-11-27 Thread Philip Couling
It looks like updating pg_constraint isn't an option on AWS RDS, because
AWS doesn't give you superuser access.
Thanks a lot for the suggestion anyway.
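
A quick sketch to confirm the limitation: updating system catalogs such as
pg_constraint requires a true superuser, and the RDS master user is not one.

SELECT rolsuper FROM pg_roles WHERE rolname = current_user;
-- false on RDS, so the UPDATE on pg_constraint is rejected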

On Fri, 15 Nov 2024 at 15:28, Torsten Förtsch 
wrote:

> On Fri, Nov 15, 2024 at 9:38 AM Philip Couling  wrote:
>
>> Is there a solid reason why adding a check constraint does not use
>> existing indexes for validation?
>>
>> We are currently looking at partitioning a multi TB table leaving all
>> existing data in place and simply attaching it as a partition to a new
>> table. To prevent locking, we are trying to add a NOT VALID check constraint
>> first and then validate it.
>>
>> I can trivially prove the invalid constraint is valid with a simple
>> SELECT which will use an existing index and return instantaneously. But
>> AFAIK there's no way to mark a constraint as valid without scanning all the
>> rows.
>>
>
> Most likely your query is not exactly the same as the check constraint.
> Think about NULL and similar.
>
>
>> This operation is really problematic on a production database with heavy
>> IO load.
>>
>> Is there a solid reason why validating check constraints cannot use
>> existing indexes? If I can prove the constraint is valid so trivially with
>> a SELECT, then why can Postgres not do the same (or similar)?
>>
>
> Here is what has worked for me many times:
>
> 1. create the check constraint as NOT VALID. From now on no new or updated
> row is allowed to violate it.
> 2. check if the constraint holds with a query on a binary replica. Make
> sure the query starts only when the constraint is visible on the replica.
> 3. on the master: UPDATE pg_constraint SET convalidated=true WHERE
> conname='your_constraint_name' AND conrelid='schema.table'::regclass
>
> How you perform step 2 is irrelevant. Checking it on a replica would
> simply avoid the load on the master. You just need to make sure there is no
> conflicting data in the table.
>
> WARNING, you need to be damn sure of your data if you do that. But if you
> are, it works.
>
> Here is how I solved the same problem for some of our
> multi-TB tables (PG14):
>
> The table has a column called transaction_time. We wanted to partition by
> that column. For some historical reason the column did not have a NOT NULL
> constraint. However, there was no way our processes could insert NULL in
> that column and there was no row with NULL in that field. So, the first
> step was to add the NOT NULL constraint:
>
> BEGIN;
>
> ALTER TABLE my.table
> ADD CONSTRAINT transaction_time_not_null
> CHECK(transaction_time IS NOT NULL) NOT VALID;
>
> UPDATE pg_constraint
>SET convalidated=true
>  WHERE conname = 'transaction_time_not_null'
>AND conrelid = 'my.table'::REGCLASS
> RETURNING conname, conrelid::REGCLASS, convalidated;
>
> COMMIT;
>
> Now, for cosmetic purposes, we turn the check constraint above into a
> normal NOT NULL constraint:
>
> BEGIN;
>
> SET LOCAL client_min_messages = 'debug4';
> -- expecting this message
> -- DEBUG:  existing constraints on column "table.transaction_time" are
> sufficient to prove that it does not contain nulls
> ALTER TABLE my.table
>   ALTER COLUMN transaction_time SET NOT NULL;
> RESET client_min_messages;
>
> ALTER TABLE my.table
>   DROP CONSTRAINT transaction_time_not_null;
>
> COMMIT;
>
> If you set client_min_messages to something like debug4, then the database
> tells you if it wants to scan the table or if existing constraints are
> sufficient to prove the condition.
>
> transaction_time in our case is never in the future. Also, database
> transactions are short-lived; nothing exceeds 30 seconds. So I can trust
> that the effect of any action performed more than 30 seconds ago is
> visible.
>
> So, I set the switch time, after which new rows go to the new partition,
> to the next hour boundary at least 10 minutes from now. 30 seconds would
> be good enough; I chose 10 minutes just for extra safety.
>
> SELECT date_trunc('hour', now()+'1h 10min'::interval)::timestamp AS switch_time\gset
>
> Next comes the actual change:
>
> BEGIN;
>
> -- rename the existing table
> ALTER TABLE my.table RENAME TO table_old;
>
> -- drop triggers. We will recreate them later.
> DROP TRIGGER ... ON my.table_old;
> DROP TRIGGER ...;
>
> -- create partitioned table
> CREATE TABLE my.table (
> LIKE my.table_old
> INCLUDING DEFAULTS
> INCLUDING CONSTRAINTS
> )
> PARTITION BY RANGE (transaction_time);
>
> -- recreate triggers
> CREATE TRIGG

Re: Validating check constraints without a table scan?

2024-11-21 Thread Philip Couling
Jian He

The context here is constraints for partitioning, as suggested in the
documentation:
https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-MAINTENANCE

An example constraint from the documentation:
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
   CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );

If logdate is indexed, then this constraint can be manually validated very
quickly using a SELECT that takes advantage of the index:

SELECT 1 FROM measurement_y2008m02
WHERE logdate < DATE '2008-02-01' OR logdate >= DATE '2008-03-01'
LIMIT 1;

If the constraint is valid, the query returns quickly with no rows; if any
row violates the constraint, it also returns very quickly, with a single
row whose column value is 1.

I guess that validating constraints doesn't invoke the query planner, or
else the conversion is too complex for it. The conversion (demonstrated
below) being:

   - from: NOT (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01')
   - to: logdate < DATE '2008-02-01' OR logdate >= DATE '2008-03-01'
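
Under SQL's three-valued logic the two forms agree even when logdate is NULL
(De Morgan's law: NOT (a AND b) = NOT a OR NOT b, with NULL on both sides). A
quick way to see it, as a sketch over a few sample values:

SELECT logdate,
       NOT (logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01') AS negated,
       (logdate < DATE '2008-02-01' OR logdate >= DATE '2008-03-01') AS rewritten
FROM (VALUES (DATE '2008-01-15'), (DATE '2008-02-15'), (NULL::date)) AS t(logdate);
-- negated and rewritten match row for row, including the NULL case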

Hope that clarifies it.

On Wed, 20 Nov 2024 at 09:45, jian he  wrote:

> On Fri, Nov 15, 2024 at 4:38 PM Philip Couling  wrote:
> >
> > Is there a solid reason why adding a check constraint does not use
> existing indexes for validation.
> >
>
> Can you give an SQL example (except not-null)
> where indexes can be used for check constraint validation?
> I am not sure I understand it correctly.
>