Query Planner not taking advantage of HASH PARTITION

2022-04-17 Thread Benjamin Tingle
Greetings Postgres Developers,

I've recently started taking advantage of the PARTITION BY HASH feature for
my database system. It's a really great fit since my tables can get quite
large (900M+ rows for some) and splitting them up into manageable chunks
should let me upload to them without having to update an enormous index
every time. What's more, since each partition has a write lock independent
of the parent table, it should theoretically be possible to perform a
parallelized insert operation, provided the data to be added is partitioned
beforehand.

What has been disappointing is that the query planner doesn't seem to
recognize this potential. For example, if I have a large list of input
data, and I want to perform a select operation across the target table:

  -- target table is hashed on 'textfield' & has a unique index on
'textfield'
  select * from temp_data td left join target tg on td.textfield =
tg.textfield;

I would expect to get a query plan like this:

  partition temp_data
  parallel scan on
target_p0 using target_p0_textfield_uniq_idx against temp_data_p0
target_p1 using target_p1_textfield_uniq_idx against temp_data_p1
target_p2 using target_p2_textfield_uniq_idx against temp_data_p2
...

Instead, I get a seemingly terrible plan like this:

  hash temp_data
  sequential scan on
target_p0 against temp_data
target_p1 against temp_data
target_p2 against temp_data
...

It doesn't even make use of the index on the textfield! Instead, it opts to
hash all of temp_data and perform a sequential scan against it.

It doesn't help if I partition temp_data by textfield beforehand either
(using the same scheme as the target table). It still opts to concatenate
all of temp_data, hash it, then perform a sequential scan against the
target partitions.

On insert the behaviour is better but it still opts for a sequential insert
instead of a parallel one.

Does the query planner know something I don't? It's my intuition that it
should be faster to do a rough counting sort (partition by hash) first, and
then do N smaller more accurate sorts in parallel afterwards.

Currently I am creating a custom script(s) to emulate my desired behaviour,
but it would be nice if there was a way to get the query planner to do this
automatically. Any tricks to do this would be much appreciated!

-Ben


RE: Query Tunning related to function

2022-04-17 Thread Kumar, Mukesh
Hi Babu ,

Please find below the script for the function from Oracle

Hi babu ,

Please find attached the script for function from Oracle .

Please revert in case of any query.

Thanks and Regards,
Mukesh Kumar

From: Bhupendra Babu 
Sent: Friday, April 15, 2022 3:44 AM
To: Kumar, Mukesh 
Cc: Michel SALAIS ; Ranier Vilela ; 
postgres performance list ; MUKESH KUMAR 
; [email protected]
Subject: Re: Query Tunning related to function

Can you paste from oracle for

Set lines 1
Select text from dba_source
Where name =
UPPER('translate_payment_status')
And owner = 'IMS_APP'

Thanks.


On Thu, Apr 14, 2022, 12:07 PM Kumar, Mukesh 
mailto:[email protected]>> wrote:
Hi Michael ,


We tried dropping the below values from the function, but it did not help.

Also, the values PAID and MANUALLY PAID constitutes about 60 % of the values in 
table ,  and infact we tried creating the partial index and it did not help.

The Strange thing is that we are trying to run this in oracle as we have done 
the migration recently and it is running in less than second with same indexes 
and other database objects . I can understand that comparing to oracle is 
stupidity, but this is only thing where we can compare.

Below is the query we are running on oracle and comparing in postgres

Below is the query and plan for same

https://explain.depesz.com/s/wktl#stats

Any help would be appreciated.



Thanks and Regards,
Mukesh Kumar

From: Michel SALAIS mailto:[email protected]>>
Sent: Thursday, April 14, 2022 11:45 PM
To: Kumar, Mukesh mailto:[email protected]>>; 
'Ranier Vilela' mailto:[email protected]>>
Cc: [email protected]; 
'MUKESH KUMAR' mailto:[email protected]>>
Subject: RE: Query Tunning related to function

Hi,

This part of the function is odd and must be dropped:
 IF (ret_status = payment_rec)
 THEN
  ret_status := payment_rec;

I didn’t look really the function code and stopped on the view referenced by 
the cursor.
The view (we know it just by its name) used in the function is a black box for 
us. Perhaps it is important to begin optimization there!
If values 'PAID' and 'MANUALLYPAID' are an important percentage of table rows 
forcing index use is not a good thing especially when it is done with a 
non-optimized function.

If rows with values 'PAID' and 'MANUALLYPAID'  constitute a little percentage 
of the table, then the partial index plus rewriting the query would be much 
more efficient
Select
  payment_sid_c,
 lms_app.translate_payment_status(payment_sid_c) as paymentstatus
from
  lms_app.lms_payment_check_request
where
 lms_app.translate_payment_status(payment_sid_c) IN ('PAID', 'MANUALLYPAID')
group by
  payment_sid_c

If not, you can gain some performance if you rewrite your query to be like this:

Select
  payment_sid_c,
 lms_app.translate_payment_status(payment_sid_c) as paymentstatus
from
  lms_app.lms_payment_check_request
group by
  payment_sid_c
having
 lms_app.translate_payment_status(payment_sid_c) IN ('PAID', 'MANUALLYPAID')

And you can also try to write the query like this:

Select t.payment_sid_c, lms_app.translate_payment_status(t.payment_sid_c)
From
(
  Select
payment_sid_c
  from
   lms_app.lms_payment_check_request
  group by
payment_sid_c
  having
lms_app.translate_payment_status(payment_sid_c) IN ('PAID', 'MANUALLYPAID')
) t

Regards

Michel SALAIS
De : Kumar, Mukesh mailto:[email protected]>>
Envoyé : jeudi 14 avril 2022 16:45
À : Ranier Vilela mailto:[email protected]>>
Cc : [email protected]; 
MUKESH KUMAR mailto:[email protected]>>
Objet : RE: Query Tunning related to function

Hi Rainer ,

We tried to create the partial ‘index on table but it did not help, and it is 
taking approx. 7 sec now.

Also we tried to force the query to use the index by enabling the parameter at 
session level

set enable_seqscan=false;

and it is still taking the time below is the explain plan for the same

https://explain.depesz.com/s/YRWIW#stats

Also we running the query which is actually used in application and above query 
is used in below query. Below is the explain plan for same.


https://explain.depesz.com/s/wktl#stats

Please assist


Thanks and Regards,
Mukesh Kuma

From: Ranier Vilela mailto:[email protected]>>
Sent: Thursday, April 14, 2022 7:56 PM
To: Kumar, Mukesh mailto:mku...@peabodyen

RE: Query Tunning related to function

2022-04-17 Thread Kumar, Mukesh
Hi All ,

We request you to please provide some assistance on below issue and it is 
impacting the migration project.

Thanks and Regards,
Mukesh Kumar

From: Kumar, Mukesh
Sent: Friday, April 15, 2022 11:43 AM
To: Bhupendra Babu 
Cc: Michel SALAIS ; Ranier Vilela ; 
postgres performance list ; MUKESH KUMAR 
; [email protected]
Subject: RE: Query Tunning related to function

Hi Babu ,

Please find below the script for the function from Oracle

Hi babu ,

Please find attached the script for function from Oracle .

Please revert in case of any query.

Thanks and Regards,
Mukesh Kumar

From: Bhupendra Babu mailto:[email protected]>>
Sent: Friday, April 15, 2022 3:44 AM
To: Kumar, Mukesh mailto:[email protected]>>
Cc: Michel SALAIS mailto:[email protected]>>; Ranier Vilela 
mailto:[email protected]>>; postgres performance list 
mailto:[email protected]>>; 
MUKESH KUMAR mailto:[email protected]>>; 
[email protected]
Subject: Re: Query Tunning related to function

Can you paste from oracle for

Set lines 1
Select text from dba_source
Where name =
UPPER('translate_payment_status')
And owner = 'IMS_APP'

Thanks.


On Thu, Apr 14, 2022, 12:07 PM Kumar, Mukesh 
mailto:[email protected]>> wrote:
Hi Michael ,


We tried dropping the below values from the function, but it did not help.

Also, the values PAID and MANUALLY PAID constitutes about 60 % of the values in 
table ,  and infact we tried creating the partial index and it did not help.

The Strange thing is that we are trying to run this in oracle as we have done 
the migration recently and it is running in less than second with same indexes 
and other database objects . I can understand that comparing to oracle is 
stupidity, but this is only thing where we can compare.

Below is the query we are running on oracle and comparing in postgres

Below is the query and plan for same

https://explain.depesz.com/s/wktl#stats

Any help would be appreciated.



Thanks and Regards,
Mukesh Kumar

From: Michel SALAIS mailto:[email protected]>>
Sent: Thursday, April 14, 2022 11:45 PM
To: Kumar, Mukesh mailto:[email protected]>>; 
'Ranier Vilela' mailto:[email protected]>>
Cc: [email protected]; 
'MUKESH KUMAR' mailto:[email protected]>>
Subject: RE: Query Tunning related to function

Hi,

This part of the function is odd and must be dropped:
 IF (ret_status = payment_rec)
 THEN
  ret_status := payment_rec;

I didn’t look really the function code and stopped on the view referenced by 
the cursor.
The view (we know it just by its name) used in the function is a black box for 
us. Perhaps it is important to begin optimization there!
If values 'PAID' and 'MANUALLYPAID' are an important percentage of table rows 
forcing index use is not a good thing especially when it is done with a 
non-optimized function.

If rows with values 'PAID' and 'MANUALLYPAID'  constitute a little percentage 
of the table, then the partial index plus rewriting the query would be much 
more efficient
Select
  payment_sid_c,
 lms_app.translate_payment_status(payment_sid_c) as paymentstatus
from
  lms_app.lms_payment_check_request
where
 lms_app.translate_payment_status(payment_sid_c) IN ('PAID', 'MANUALLYPAID')
group by
  payment_sid_c

If not, you can gain some performance if you rewrite your query to be like this:

Select
  payment_sid_c,
 lms_app.translate_payment_status(payment_sid_c) as paymentstatus
from
  lms_app.lms_payment_check_request
group by
  payment_sid_c
having
 lms_app.translate_payment_status(payment_sid_c) IN ('PAID', 'MANUALLYPAID')

And you can also try to write the query like this:

Select t.payment_sid_c, lms_app.translate_payment_status(t.payment_sid_c)
From
(
  Select
payment_sid_c
  from
   lms_app.lms_payment_check_request
  group by
payment_sid_c
  having
lms_app.translate_payment_status(payment_sid_c) IN ('PAID', 'MANUALLYPAID')
) t

Regards

Michel SALAIS
De : Kumar, Mukesh mailto:[email protected]>>
Envoyé : jeudi 14 avril 2022 16:45
À : Ranier Vilela mailto:[email protected]>>
Cc : [email protected]; 
MUKESH KUMAR mailto:[email protected]>>
Objet : RE: Query Tunning related to function

Hi Rainer ,

We tried to create the partial ‘index on table but it did not help, and it is 
taking approx. 7 sec now.

Also we tried to force the query to use the index by enabling the parameter at 
session level

set enable_seqscan=false;

and it is still taking the time below is the explain plan for the same

https://explain.depesz.com/s/YRWIW#stats

Re: Query Planner not taking advantage of HASH PARTITION

2022-04-17 Thread Tom Lane
Benjamin Tingle  writes:
> I've recently started taking advantage of the PARTITION BY HASH feature for
> my database system. It's a really great fit since my tables can get quite
> large (900M+ rows for some) and splitting them up into manageable chunks
> should let me upload to them without having to update an enormous index
> every time. What's more, since each partition has a write lock independent
> of the parent table, it should theoretically be possible to perform a
> parallelized insert operation, provided the data to be added is partitioned
> beforehand.

> What has been disappointing is that the query planner doesn't seem to
> recognize this potential.

That's because there isn't any.  The hash partitioning rule has
basically nothing to do with any plausible WHERE condition.  If you're
hoping to see partition pruning happen, you need to be using list or
range partitions, with operators compatible with your likely WHERE
conditions.

(I'm of the opinion that the hash partitioning option is more in the
category of a dangerous nuisance than a useful feature.  There are some
around here who will argue otherwise, but they're wrong for exactly the
reason that it's impossible to prune hash partitions.)

regards, tom lane




Re: Query Tunning related to function

2022-04-17 Thread David G. Johnston
On Sun, Apr 17, 2022 at 8:53 AM Kumar, Mukesh 
wrote:

> We request you to please provide some assistance on below issue and it is
> impacting the migration project.
>

I suggest you try and re-write the loop-based function into a set-oriented
view.

Specifically, I think doing: "array_agg(DISTINCT paymenttype)" and then
checking for various array results will be considerably more efficient.

Or do a combination: write the set-oriented query in an SQL function.  You
should not need pl/pgsql for this and avoiding it should improve
performance.

David J.

p.s., The convention on these lists is to inline post and remove unneeded
context.  Or at least bottom post.


Re: Query Planner not taking advantage of HASH PARTITION

2022-04-17 Thread Benjamin Tingle
Interesting. Why is it impossible to prune hash partitions? Maybe prune
isn’t the best word, more so use to advantage. At the very least, it should
be possible to utilize a parallel insert against a table partitioned by
hash. (Partition query rows, then distribute these rows to parallel workers)

On Sun, Apr 17, 2022 at 9:09 AM Tom Lane  wrote:

> Benjamin Tingle  writes:
> > I've recently started taking advantage of the PARTITION BY HASH feature
> for
> > my database system. It's a really great fit since my tables can get quite
> > large (900M+ rows for some) and splitting them up into manageable chunks
> > should let me upload to them without having to update an enormous index
> > every time. What's more, since each partition has a write lock
> independent
> > of the parent table, it should theoretically be possible to perform a
> > parallelized insert operation, provided the data to be added is
> partitioned
> > beforehand.
>
> > What has been disappointing is that the query planner doesn't seem to
> > recognize this potential.
>
> That's because there isn't any.  The hash partitioning rule has
> basically nothing to do with any plausible WHERE condition.  If you're
> hoping to see partition pruning happen, you need to be using list or
> range partitions, with operators compatible with your likely WHERE
> conditions.
>
> (I'm of the opinion that the hash partitioning option is more in the
> category of a dangerous nuisance than a useful feature.  There are some
> around here who will argue otherwise, but they're wrong for exactly the
> reason that it's impossible to prune hash partitions.)
>
> regards, tom lane
>
-- 

Ben(t).


Re: Query Planner not taking advantage of HASH PARTITION

2022-04-17 Thread Tom Lane
Benjamin Tingle  writes:
> Interesting. Why is it impossible to prune hash partitions? Maybe prune
> isn’t the best word, more so use to advantage. At the very least, it should
> be possible to utilize a parallel insert against a table partitioned by
> hash. (Partition query rows, then distribute these rows to parallel workers)

Your plan-shape complaint had nothing to do with insertions; it had
to do with joining the partitioned table to another table.  That
join can't be optimized.

regards, tom lane




Re: Query Planner not taking advantage of HASH PARTITION

2022-04-17 Thread Alvaro Herrera
On 2022-Apr-14, Benjamin Tingle wrote:

> It doesn't help if I partition temp_data by textfield beforehand either
> (using the same scheme as the target table). It still opts to concatenate
> all of temp_data, hash it, then perform a sequential scan against the
> target partitions.

Does it still do that if you set
  SET enable_partitionwise_join TO on;
?  If the partition strategies are identical, that might get you a
better plan.  (Actually, in pg13 and upwards the strategies don't need
to be exactly identical, just "compatible".)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)