checking for a NULL date in a partitioned table kills performance (accidentally sent to the admin list before)

2024-08-22 Thread Sbob

All;

I am running a select from a partitioned table. The table (and all the 
partitions) have an index on contract_date like this:
CREATE INDEX on part_tab (contract_date) where contract_date > 
'2022-01-01'::date


The table (including all partitions) has 32million rows
The db server is an aurora postgresql instance with 128GB of ram and 16 
vcpu's


The shared buffers is set to 90GB and effective_cache_size is also 90GB
I set default_statistics_target to 1000 and ram a vacuum analyze on the 
table


I am selecting a number of columns and specifying this where clause:

WHERE (
    (contract_date IS NULL)
    OR
    (contract_date > '2022-01-01'::date)
 )

This takes 15 seconds to run and an explain says it's doing a table scan 
on all partitions (the query is not specifying the partition key)

If I change the where clause to look like this:

WHERE (
  (contract_date > '2022-01-01'::date)
 )

Then it performs index scans on all the partitions and runs in about 600ms

If i leave the where clause off entirely it performs table scans of the 
partitions and takes approx 18 seconds to run


I am trying to get the performance to less than 2sec,
I have tried adding indexes on the table and all partitions like this:
CREATE INDEX ON table (contract_date NULLS FIRST) ;
but the performance with the full where clause is the same:

WHERE (
    (contract_date IS NULL)
    OR
    (contract_date > '2022-01-01'::date)
 )

runs in 15 seconds and scans all partitions

I also tried indexes i=on the table and all partitions like this:
CREATE INDEX ON table (contract_date) WHERE contract_date IS NULL;

but I get the same result, table scans on all partitions and it runs in 
15 seconds


Any help or advice ?

Thanks in advance





Re: checking for a NULL date in a partitioned table kills performance (accidentally sent to the admin list before)

2024-08-22 Thread Vitalii Tymchyshyn
Can you put the whole thing into the index where clause?

(contract_date IS NULL)
 OR
(contract_date > '2022-01-01'::date)

Best regards, Vitalii Tymchyshyn

чт, 22 серп. 2024 р. о 14:48 Sbob  пише:

> All;
>
> I am running a select from a partitioned table. The table (and all the
> partitions) have an index on contract_date like this:
> CREATE INDEX on part_tab (contract_date) where contract_date >
> '2022-01-01'::date
>
> The table (including all partitions) has 32million rows
> The db server is an aurora postgresql instance with 128GB of ram and 16
> vcpu's
>
> The shared buffers is set to 90GB and effective_cache_size is also 90GB
> I set default_statistics_target to 1000 and ram a vacuum analyze on the
> table
>
> I am selecting a number of columns and specifying this where clause:
>
> WHERE (
>  (contract_date IS NULL)
>  OR
>  (contract_date > '2022-01-01'::date)
>   )
>
> This takes 15 seconds to run and an explain says it's doing a table scan
> on all partitions (the query is not specifying the partition key)
> If I change the where clause to look like this:
>
> WHERE (
>(contract_date > '2022-01-01'::date)
>   )
>
> Then it performs index scans on all the partitions and runs in about 600ms
>
> If i leave the where clause off entirely it performs table scans of the
> partitions and takes approx 18 seconds to run
>
> I am trying to get the performance to less than 2sec,
> I have tried adding indexes on the table and all partitions like this:
> CREATE INDEX ON table (contract_date NULLS FIRST) ;
> but the performance with the full where clause is the same:
>
> WHERE (
>  (contract_date IS NULL)
>  OR
>  (contract_date > '2022-01-01'::date)
>   )
>
> runs in 15 seconds and scans all partitions
>
> I also tried indexes i=on the table and all partitions like this:
> CREATE INDEX ON table (contract_date) WHERE contract_date IS NULL;
>
> but I get the same result, table scans on all partitions and it runs in
> 15 seconds
>
> Any help or advice ?
>
> Thanks in advance
>
>
>
>


Re: checking for a NULL date in a partitioned table kills performance

2024-08-22 Thread Tom Lane
Sbob  writes:
> 29 million of the 32 million rows in the table have NULL for contract_date

[ blink... ]  So your query is selecting at least 29/32nds of the
table, plus however much matches the contract_date > '2022-01-01'
alternative.  I'm not sure how you expect that to be significantly
cheaper than scanning the whole table.

regards, tom lane