Postgres NOT IN vs NOT EXISTS optimization

2022-06-14 Thread Dirschel, Steve
We are in the process of migrating from Oracle to Postgres and the following 
query does much less work with Oracle vs Postgres.

explain (analyze, buffers)
select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_, favoritegr0_.CHANGED as 
changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as type_dis3_2_,
   favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as 
is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION as 
position7_2_,
   favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID as 
product_9_2_,
   favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE as 
usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
  from cf0.FAVORITE_GROUP favoritegr0_
 where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
   and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView')
   and (favoritegr0_.FAVORITE_GROUP_SID not in
   (select favoriteen1_.FAVORITE_GROUP_SID
  from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
 cross join cf0.CATEGORY_PAGE categorypa2_
 where 
favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
   and categorypa2_.UNIQUE_NAME='Florida'
   and categorypa2_.IS_DELETED=0
   and favoriteen1_.IS_DELETED=0))
   and favoritegr0_.IS_DELETED=0
   and (favoritegr0_.USAGE_TYPE=0 or favoritegr0_.USAGE_TYPE is null)
   and favoritegr0_.PRISM_GUID='ia7448342012ca23eacf87bb0ed56'
order by favoritegr0_.POSITION desc;

Here is the plan in Postgres.  It did 1426 shared block hits.  If you look at 
this plan it is not pushing filtering into the NOT IN subquery-  it is fully 
resolving that part of the query driving off where UNIQUE_NAME = 'Florida'.


 QUERY PLAN
-
Sort  (cost=5198.22..5198.22 rows=1 width=144) (actual time=6.559..6.560 rows=1 
loops=1)
   Sort Key: favoritegr0_."position" DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=1426
   ->  Index Scan using favorite_group_idx01 on favorite_group favoritegr0_  
(cost=5190.18..5198.21 rows=1 width=144) (actual time=6.514..6.515 rows=1 
loops=1)
 Index Cond: (((prism_guid)::text = 
'ia7448342012ca23eacf87bb0ed56'::text) AND (is_deleted = 0))
 Filter: ((NOT (hashed SubPlan 1)) AND ((usage_type = 0) OR (usage_type 
IS NULL)) AND ('FORMS.WESTLAW'::text = (product_sid)::text) AND 
((product_view)::text = 'DefaultProductView'::text))
 Buffers: shared hit=1423
 SubPlan 1
   ->  Nested Loop  (cost=0.70..5189.90 rows=1 width=33) (actual 
time=6.459..6.459 rows=0 loops=1)
 Buffers: shared hit=1417
 ->  Index Scan using category_page_idx04 on category_page 
categorypa2_  (cost=0.42..5131.71 rows=7 width=33) (actual time=0.035..6.138 
rows=92 loops=1)
   Index Cond: ((unique_name)::text = 'Florida'::text)
   Filter: (is_deleted = 0)
   Buffers: shared hit=1233
 ->  Index Scan using favorite_group_member_idx03 on 
favorite_group_member favoriteen1_  (cost=0.28..8.30 rows=1 width=66) (actual 
time=0.003..0.003 rows=0 loops=92)
   Index Cond: ((category_page_sid)::text = 
(categorypa2_.category_page_sid)::text)
   Filter: (is_deleted = 0)
   Buffers: shared hit=184
Planning Time: 1.624 ms
Execution Time: 6.697 ms

If I compare that to the plan Oracle uses it pushes the 
favoritegr0_.FAVORITE_GROUP_SID predicate into the NOT IN.  I'm able to get a 
similar plan with Postgres if I change the NOT IN to a NOT EXISTS:

explain (analyze, buffers)
select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_, favoritegr0_.CHANGED as 
changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as type_dis3_2_,
   favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as 
is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION as 
position7_2_,
   favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID as 
product_9_2_,
   favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE as 
usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
  from cf0.FAVORITE_GROUP favoritegr0_
 where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
   and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView')
   and not exists (
   select 'x'
  from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
 cross join cf0.CATEGORY_PAGE categorypa2_
 where favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
   and categorypa2_.UNIQUE_NAME='Florida'
   and categorypa2_.IS_DELETED=0
   and favoriteen1_.IS_DELETED=0
   and favoritegr0_

Re: Postgres NOT IN vs NOT EXISTS optimization

2022-06-14 Thread Jeremy Smith
I think this explains the situation well:

https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN

On Tue, Jun 14, 2022 at 11:59 AM Dirschel, Steve <
steve.dirsc...@thomsonreuters.com> wrote:

> We are in the process of migrating from Oracle to Postgres and the
> following query does much less work with Oracle vs Postgres.
>
>
>
> explain (analyze, buffers)
>
> select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_,
> favoritegr0_.CHANGED as changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as
> type_dis3_2_,
>
>favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as
> is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION
> as position7_2_,
>
>favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID
> as product_9_2_,
>
>favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE
> as usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
>
>   from cf0.FAVORITE_GROUP favoritegr0_
>
>  where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
>
>and favoritegr0_.PRODUCT_VIEW in ('DefaultProductView')
>
>and (favoritegr0_.FAVORITE_GROUP_SID not in
>
>(select favoriteen1_.FAVORITE_GROUP_SID
>
>   from cf0.FAVORITE_GROUP_MEMBER favoriteen1_
>
>  cross join cf0.CATEGORY_PAGE categorypa2_
>
>  where
> favoriteen1_.CATEGORY_PAGE_SID=categorypa2_.CATEGORY_PAGE_SID
>
>and categorypa2_.UNIQUE_NAME='Florida'
>
>and categorypa2_.IS_DELETED=0
>
>and favoriteen1_.IS_DELETED=0))
>
>and favoritegr0_.IS_DELETED=0
>
>and (favoritegr0_.USAGE_TYPE=0 or favoritegr0_.USAGE_TYPE is null)
>
>and favoritegr0_.PRISM_GUID='ia7448342012ca23eacf87bb0ed56'
>
> order by favoritegr0_.POSITION desc;
>
>
>
> Here is the plan in Postgres.  It did 1426 shared block hits.  If you look
> at this plan it is not pushing filtering into the NOT IN subquery-  it is
> fully resolving that part of the query driving off where UNIQUE_NAME =
> 'Florida'.
>
>
>
>
> QUERY PLAN
>
>
> -
>
> Sort  (cost=5198.22..5198.22 rows=1 width=144) (actual time=6.559..6.560
> rows=1 loops=1)
>
>Sort Key: favoritegr0_."position" DESC
>
>Sort Method: quicksort  Memory: 25kB
>
>Buffers: shared hit=1426
>
>->  Index Scan using favorite_group_idx01 on favorite_group
> favoritegr0_  (cost=5190.18..5198.21 rows=1 width=144) (actual
> time=6.514..6.515 rows=1 loops=1)
>
>  Index Cond: (((prism_guid)::text =
> 'ia7448342012ca23eacf87bb0ed56'::text) AND (is_deleted = 0))
>
>  Filter: ((NOT (hashed SubPlan 1)) AND ((usage_type = 0) OR
> (usage_type IS NULL)) AND ('FORMS.WESTLAW'::text = (product_sid)::text) AND
> ((product_view)::text = 'DefaultProductView'::text))
>
>  Buffers: shared hit=1423
>
>  SubPlan 1
>
>->  Nested Loop  (cost=0.70..5189.90 rows=1 width=33) (actual
> time=6.459..6.459 rows=0 loops=1)
>
>  Buffers: shared hit=1417
>
>  ->  Index Scan using category_page_idx04 on category_page
> categorypa2_  (cost=0.42..5131.71 rows=7 width=33) (actual
> time=0.035..6.138 rows=92 loops=1)
>
>Index Cond: ((unique_name)::text = 'Florida'::text)
>
>Filter: (is_deleted = 0)
>
>Buffers: shared hit=1233
>
>  ->  Index Scan using favorite_group_member_idx03 on
> favorite_group_member favoriteen1_  (cost=0.28..8.30 rows=1 width=66)
> (actual time=0.003..0.003 rows=0 loops=92)
>
>Index Cond: ((category_page_sid)::text =
> (categorypa2_.category_page_sid)::text)
>
>Filter: (is_deleted = 0)
>
>Buffers: shared hit=184
>
> Planning Time: 1.624 ms
>
> Execution Time: 6.697 ms
>
>
>
> If I compare that to the plan Oracle uses it pushes the
> favoritegr0_.FAVORITE_GROUP_SID predicate into the NOT IN.  I'm able to get
> a similar plan with Postgres if I change the NOT IN to a NOT EXISTS:
>
>
>
> explain (analyze, buffers)
>
> select favoritegr0_.FAVORITE_GROUP_SID as favorite1_2_,
> favoritegr0_.CHANGED as changed2_2_, favoritegr0_.TYPE_DISCRIMINATOR as
> type_dis3_2_,
>
>favoritegr0_.GROUP_NAME as group_na4_2_, favoritegr0_.IS_DELETED as
> is_delet5_2_, favoritegr0_.LAST_USED as last_use6_2_, favoritegr0_.POSITION
> as position7_2_,
>
>favoritegr0_.PRISM_GUID as prism_gu8_2_, favoritegr0_.PRODUCT_SID
> as product_9_2_,
>
>favoritegr0_.PRODUCT_VIEW as product10_2_, favoritegr0_.USAGE_TYPE
> as usage_t11_2_, favoritegr0_.ROW_VERSION as row_ver12_2_
>
>   from cf0.FAVORITE_GROUP favoritegr0_
>
>  where 'FORMS.WESTLAW' = favoritegr0_.PRODUCT_SID
>
>and favoritegr0_.PRODUCT_VIEW in ('Def

Re: Postgres NOT IN vs NOT EXISTS optimization

2022-06-14 Thread Tom Lane
"Dirschel, Steve"  writes:
> Is Postgres able to drive the query the same way with the NOT IN as the
> NOT EXISTS is doing or is that only available if the query has a NOT
> EXISTS?

NOT IN is not optimized very well in PG, because of the strange
semantics that the SQL spec demands when the sub-query produces any
null values.  There's been some interest in detecting cases where
we can prove that the subquery produces no nulls and then optimizing
it into NOT EXISTS, but it seems like a lot of work for not-great
return, so nothing's happened (yet).  Perhaps Oracle does something
like that already, or perhaps they're just ignoring the semantics
problem; they do not have a reputation for hewing closely to the
spec on behavior regarding nulls.

regards, tom lane




Recent 11.16 release change

2022-06-14 Thread Daniel Brinzila
I am a bit confused as to the following change:


   -

   Stop using query-provided column aliases for the columns of whole-row
   variables that refer to plain tables (Tom Lane)

   The column names in tuples produced by a whole-row variable (such as
   tbl.* in contexts other than the top level of a SELECT list) are now
   always those of the associated named composite type, if there is one. We'd
   previously attempted to make them track any column aliases that had been
   applied to the FROM entry the variable refers to. But that's
   semantically dubious, because really then the output of the variable is not
   at all of the composite type it claims to be. Previous attempts to deal
   with that inconsistency had bad results up to and including storing
   unreadable data on disk, so just give up on the whole idea.

   In cases where it's important to be able to relabel such columns, a
   workaround is to introduce an extra level of sub-SELECT, so that the
   whole-row variable is referring to the sub-SELECT's output and not to a
   plain table. Then the variable is of type record to begin with and
   there's no issue.


Could someone please give an example of this scenario, one that works in
11.15 and another for 11.16 after the recent change.

Thanks!


Re: Recent 11.16 release change

2022-06-14 Thread Tom Lane
Daniel Brinzila  writes:
> I am a bit confused as to the following change:
>Stop using query-provided column aliases for the columns of whole-row
>variables that refer to plain tables (Tom Lane)

> Could someone please give an example of this scenario, one that works in
> 11.15 and another for 11.16 after the recent change.

Here's the regression test example that changed behavior in that commit:

regression=# select row_to_json(i) from int8_tbl i(x,y);
  row_to_json   

 {"q1":123,"q2":456}
 {"q1":123,"q2":4567890123456789}
 {"q1":4567890123456789,"q2":123}
 {"q1":4567890123456789,"q2":4567890123456789}
 {"q1":4567890123456789,"q2":-4567890123456789}
(5 rows)

The fields of the JSON output used to be labeled "x" and "y", after
the column aliases of the FROM item.  But now that doesn't work and
you get the table's original column names (which happen to be "q1"
and "q2" in this test case).

The workaround proposed in the release note is to do this if you
need to relabel the columns of the whole-row variable "i":

regression=# select row_to_json(i) from (select * from int8_tbl) i(x,y);
 row_to_json  
--
 {"x":123,"y":456}
 {"x":123,"y":4567890123456789}
 {"x":4567890123456789,"y":123}
 {"x":4567890123456789,"y":4567890123456789}
 {"x":4567890123456789,"y":-4567890123456789}
(5 rows)

With the extra sub-select, "i" is no longer of the named composite
type associated with int8_tbl, but of an anonymous record type,
so it can have the column names you want.

regards, tom lane




Re: cast to domain with default collation issue.

2022-06-14 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, May 24, 2022 at 7:42 AM Tom Lane  wrote:
>> Perhaps we could improve matters like this?

>> -specified, the underlying data type's default collation is used.
>> +specified, the domain has the same collation behavior as its
>> +underlying data type.

> +1

Pushed then, thanks for thinking about it.

regards, tom lane