date:20230613

extended statistics n-distinct on multiple columns not used when join two tables

2023-06-13 Thread James Pang (chaolpan)

Hi,
 When join two table on multiple columns equaljoin, rows estimation always 
use selectivity = multiplied by distinct multiple individual columns, possible 
to use  extended n-distinct statistics on multiple columns?
PG v14.8-1, attached please check test case with details.

Thanks,

James

testdb=> \d test1
  Table "csdigital.test1"
 Column  |  Type  | Collation | Nullable | Default
-++---+--+-
 account_key | character varying(36)  |   | not null |
 subscription_id | character varying(36)  |   | not null |
 site_key| character varying(36)  |   | not null |
 site_id | numeric(38,0)  |   |  |
 site_name   | character varying(128) |   |  |
Indexes:
"test1_pk" PRIMARY KEY, btree (account_key, subscription_id, site_key)

testdb=> \d test2
  Table "csdigital.test2"
 Column  |  Type  | Collation | Nullable | Default
-++---+--+-
 account_key | character varying(36)  |   | not null |
 subscription_id | character varying(36)  |   | not null |
 offer_code  | character varying(128) |   | not null |
 offer_name  | character varying(128) |   |  |
 offer_type  | character varying(128) |   |  |
Indexes:
"test2_pk" PRIMARY KEY, btree (account_key, subscription_id, offer_code)

testdb=>  explain select count(*) from test1 t1,test2 t2 where 
t1.account_key=t2.account_key and t1.subscription_id=t2.subscription_id;
  QUERY PLAN
---
 Aggregate  (cost=629339.31..629339.32 rows=1 width=8)
   ->  Merge Join  (cost=1.99..629339.26 rows=22 width=0) 
 Merge Cond: (((t1.account_key)::text = (t2.account_key)::text) AND 
((t1.subscription_id)::text = (t2.subscription_id)::text))
 ->  Index Only Scan using test1_pk on test1 t1  (cost=0.56..263134.00 
rows=4906772 width=61)
 ->  Index Only Scan using test2_pk on test2 t2  (cost=0.56..309937.54 
rows=6346934 width=59)
(5 rows)

Time: 8.018 ms
testdb=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, 
tablename, attname, inherited, null_frac, n_distinct, 
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) 
n_hist, correlation
testdb->  FROM pg_stats WHERE attname in ('account_key','subscription_id') AND 
tablename='test1' ORDER BY 1 DESC;
   frac_mcv| tablename | attname | inherited | null_frac | 
n_distinct | n_mcv | n_hist | correlation
---+---+-+---+---++---++--
 0.0002667 | test1 | account_key | f | 0 | 
249474 | 1 |101 | -0.008081022
 0.0002667 | test1 | subscription_id | f | 0 | 
-0.6047012 | 1 |101 |   0.26008433
(2 rows)

Time: 7.760 ms
testdb=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, 
tablename, attname, inherited, null_frac, n_distinct, 
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) 
n_hist, correlation
testdb->  FROM pg_stats WHERE attname in ('account_key','subscription_id') AND 
tablename='test2' ORDER BY 1 DESC;
 frac_mcv | tablename | attname | inherited | null_frac | n_distinct  | 
n_mcv | n_hist | correlation
--+---+-+---+---+-+---++-
  | test2 | account_key | f | 0 |  325385 | 
  |101 | 0.006174604
  | test2 | subscription_id | f | 0 | -0.68929785 | 
  |101 | 0.123789854
(2 rows)

Time: 1.423 ms
testdb=> select relname,reltuples from pg_class where relname in 
('test1','test2');
 relname |  reltuples
-+--
 test1   | 4.906772e+06
 test2   | 6.346934e+06
(2 rows)

Time: 0.746 ms
testdb=> create statistics 
cs_contract_site_map_account_key_subscription_id(dependencies, ndistinct) on 
account_key, subscription_id from test1;
CREATE STATISTICS
Time: 7.918 ms
testdb=> create statistics 
cs_contract_account_key_subscription_id(dependencies, ndistinct) on 
account_key, subscription_id from test2;
CREATE STATISTICS
Time: 1.558 ms
testdb=> analyze test1;
aANALYZE
Time: 745.952 ms
testdb=> analyze test2;
ANALYZE
Time: 606.735 ms
testdb=> select 
st.stxrelid,st.stxname,st.stxstattarget,st.stxkeys,std.stxdndistinct,std.stxddependencies
testdb-> from pg_statistic_ext st join pg_statistic_ext_data std on 
st.oid=std.stxoid;
ERROR:  permission denied for table pg_statistic_ext_data
Time: 2.464 ms

testdb=# select 
st.stxr

Re: extended statistics n-distinct on multiple columns not used when join two tables

2023-06-13 Thread Pavel Stehule

Hi

út 13. 6. 2023 v 11:21 odesílatel James Pang (chaolpan) 
napsal:

> Hi,
>
>  When join two table on multiple columns equaljoin, rows estimation
> always use selectivity = multiplied by distinct multiple individual
> columns, possible to use  extended n-distinct statistics on multiple
> columns?
>
> PG v14.8-1, attached please check test case with details.
>

There is not any support for multi tables statistic

Regards

Pavel


>
>
> Thanks,
>
>
>
> James
>
>
>

Re: Postgresql equal join on function with columns not use index

2023-06-13 Thread Tom Lane

"James Pang (chaolpan)"  writes:
> Looks like it's the function "regexp_replace" volatile and restrict=false 
> make the difference,  we have our application role with default 
> search_path=oracle,$user,public,pg_catalog.
>  =#select 
> oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from 
> pg_proc where proname='regexp_replace' order by oid;
>   oid  |proname | pronamespace | prosecdef | proisstrict | provolatile
> ---++--+---+-+-
>   2284 | regexp_replace | pg_catalog   | f | t   | i
>   2285 | regexp_replace | pg_catalog   | f | t   | i
>  17095 | regexp_replace | oracle   | f | f   | v 
>  17096 | regexp_replace | oracle   | f | f   | v
>  17097 | regexp_replace | oracle   | f | f   | v
>  17098 | regexp_replace | oracle   | f | f   | v

Why in the world are the oracle ones marked volatile?  That's what's
preventing them from being used in index quals.

regards, tom lane

Re: Postgresql equal join on function with columns not use index

2023-06-13 Thread Pavel Stehule

út 13. 6. 2023 v 15:50 odesílatel Tom Lane  napsal:

> "James Pang (chaolpan)"  writes:
> > Looks like it's the function "regexp_replace" volatile and
> restrict=false make the difference,  we have our application role with
> default search_path=oracle,$user,public,pg_catalog.
> >  =#select
> oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile
> from pg_proc where proname='regexp_replace' order by oid;
> >   oid  |proname | pronamespace | prosecdef | proisstrict |
> provolatile
> >
> ---++--+---+-+-
> >   2284 | regexp_replace | pg_catalog   | f | t   | i
> >   2285 | regexp_replace | pg_catalog   | f | t   | i
> >  17095 | regexp_replace | oracle   | f | f   | v
> >  17096 | regexp_replace | oracle   | f | f   | v
> >  17097 | regexp_replace | oracle   | f | f   | v
> >  17098 | regexp_replace | oracle   | f | f   | v
>
> Why in the world are the oracle ones marked volatile?  That's what's
> preventing them from being used in index quals.
>

It looks like orafce issue

I'll fix it

Regards

Pavel


>
> regards, tom lane
>
>
>

Re: Postgresql equal join on function with columns not use index

2023-06-13 Thread Pavel Stehule

út 13. 6. 2023 v 16:17 odesílatel Pavel Stehule 
napsal:

>
>
> út 13. 6. 2023 v 15:50 odesílatel Tom Lane  napsal:
>
>> "James Pang (chaolpan)"  writes:
>> > Looks like it's the function "regexp_replace" volatile and
>> restrict=false make the difference,  we have our application role with
>> default search_path=oracle,$user,public,pg_catalog.
>> >  =#select
>> oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile
>> from pg_proc where proname='regexp_replace' order by oid;
>> >   oid  |proname | pronamespace | prosecdef | proisstrict |
>> provolatile
>> >
>> ---++--+---+-+-
>> >   2284 | regexp_replace | pg_catalog   | f | t   | i
>> >   2285 | regexp_replace | pg_catalog   | f | t   | i
>> >  17095 | regexp_replace | oracle   | f | f   | v
>> >  17096 | regexp_replace | oracle   | f | f   | v
>> >  17097 | regexp_replace | oracle   | f | f   | v
>> >  17098 | regexp_replace | oracle   | f | f   | v
>>
>> Why in the world are the oracle ones marked volatile?  That's what's
>> preventing them from being used in index quals.
>>
>
> It looks like orafce issue
>
> I'll fix it
>

should be fixed in orafce 4.4.

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>>
>> regards, tom lane
>>
>>
>>

extended statistics n-distinct on multiple columns not used when join two tables

Re: extended statistics n-distinct on multiple columns not used when join two tables

Re: Postgresql equal join on function with columns not use index

Re: Postgresql equal join on function with columns not use index

Re: Postgresql equal join on function with columns not use index

5 matches

Site Navigation

Mail list logo

Footer information