extended statistics n-distinct on multiple columns not used when join two tables
Hi,
When join two table on multiple columns equaljoin, rows estimation always
use selectivity = multiplied by distinct multiple individual columns, possible
to use extended n-distinct statistics on multiple columns?
PG v14.8-1, attached please check test case with details.
Thanks,
James
testdb=> \d test1
Table "csdigital.test1"
Column | Type | Collation | Nullable | Default
-++---+--+-
account_key | character varying(36) | | not null |
subscription_id | character varying(36) | | not null |
site_key| character varying(36) | | not null |
site_id | numeric(38,0) | | |
site_name | character varying(128) | | |
Indexes:
"test1_pk" PRIMARY KEY, btree (account_key, subscription_id, site_key)
testdb=> \d test2
Table "csdigital.test2"
Column | Type | Collation | Nullable | Default
-++---+--+-
account_key | character varying(36) | | not null |
subscription_id | character varying(36) | | not null |
offer_code | character varying(128) | | not null |
offer_name | character varying(128) | | |
offer_type | character varying(128) | | |
Indexes:
"test2_pk" PRIMARY KEY, btree (account_key, subscription_id, offer_code)
testdb=> explain select count(*) from test1 t1,test2 t2 where
t1.account_key=t2.account_key and t1.subscription_id=t2.subscription_id;
QUERY PLAN
---
Aggregate (cost=629339.31..629339.32 rows=1 width=8)
-> Merge Join (cost=1.99..629339.26 rows=22 width=0)
Merge Cond: (((t1.account_key)::text = (t2.account_key)::text) AND
((t1.subscription_id)::text = (t2.subscription_id)::text))
-> Index Only Scan using test1_pk on test1 t1 (cost=0.56..263134.00
rows=4906772 width=61)
-> Index Only Scan using test2_pk on test2 t2 (cost=0.56..309937.54
rows=6346934 width=59)
(5 rows)
Time: 8.018 ms
testdb=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV,
tablename, attname, inherited, null_frac, n_distinct,
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1)
n_hist, correlation
testdb-> FROM pg_stats WHERE attname in ('account_key','subscription_id') AND
tablename='test1' ORDER BY 1 DESC;
frac_mcv| tablename | attname | inherited | null_frac |
n_distinct | n_mcv | n_hist | correlation
---+---+-+---+---++---++--
0.0002667 | test1 | account_key | f | 0 |
249474 | 1 |101 | -0.008081022
0.0002667 | test1 | subscription_id | f | 0 |
-0.6047012 | 1 |101 | 0.26008433
(2 rows)
Time: 7.760 ms
testdb=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV,
tablename, attname, inherited, null_frac, n_distinct,
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1)
n_hist, correlation
testdb-> FROM pg_stats WHERE attname in ('account_key','subscription_id') AND
tablename='test2' ORDER BY 1 DESC;
frac_mcv | tablename | attname | inherited | null_frac | n_distinct |
n_mcv | n_hist | correlation
--+---+-+---+---+-+---++-
| test2 | account_key | f | 0 | 325385 |
|101 | 0.006174604
| test2 | subscription_id | f | 0 | -0.68929785 |
|101 | 0.123789854
(2 rows)
Time: 1.423 ms
testdb=> select relname,reltuples from pg_class where relname in
('test1','test2');
relname | reltuples
-+--
test1 | 4.906772e+06
test2 | 6.346934e+06
(2 rows)
Time: 0.746 ms
testdb=> create statistics
cs_contract_site_map_account_key_subscription_id(dependencies, ndistinct) on
account_key, subscription_id from test1;
CREATE STATISTICS
Time: 7.918 ms
testdb=> create statistics
cs_contract_account_key_subscription_id(dependencies, ndistinct) on
account_key, subscription_id from test2;
CREATE STATISTICS
Time: 1.558 ms
testdb=> analyze test1;
aANALYZE
Time: 745.952 ms
testdb=> analyze test2;
ANALYZE
Time: 606.735 ms
testdb=> select
st.stxrelid,st.stxname,st.stxstattarget,st.stxkeys,std.stxdndistinct,std.stxddependencies
testdb-> from pg_statistic_ext st join pg_statistic_ext_data std on
st.oid=std.stxoid;
ERROR: permission denied for table pg_statistic_ext_data
Time: 2.464 ms
testdb=# select
st.stxr
Re: extended statistics n-distinct on multiple columns not used when join two tables
Hi út 13. 6. 2023 v 11:21 odesílatel James Pang (chaolpan) napsal: > Hi, > > When join two table on multiple columns equaljoin, rows estimation > always use selectivity = multiplied by distinct multiple individual > columns, possible to use extended n-distinct statistics on multiple > columns? > > PG v14.8-1, attached please check test case with details. > There is not any support for multi tables statistic Regards Pavel > > > Thanks, > > > > James > > >
Re: Postgresql equal join on function with columns not use index
"James Pang (chaolpan)" writes: > Looks like it's the function "regexp_replace" volatile and restrict=false > make the difference, we have our application role with default > search_path=oracle,$user,public,pg_catalog. > =#select > oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile from > pg_proc where proname='regexp_replace' order by oid; > oid |proname | pronamespace | prosecdef | proisstrict | provolatile > ---++--+---+-+- > 2284 | regexp_replace | pg_catalog | f | t | i > 2285 | regexp_replace | pg_catalog | f | t | i > 17095 | regexp_replace | oracle | f | f | v > 17096 | regexp_replace | oracle | f | f | v > 17097 | regexp_replace | oracle | f | f | v > 17098 | regexp_replace | oracle | f | f | v Why in the world are the oracle ones marked volatile? That's what's preventing them from being used in index quals. regards, tom lane
Re: Postgresql equal join on function with columns not use index
út 13. 6. 2023 v 15:50 odesílatel Tom Lane napsal: > "James Pang (chaolpan)" writes: > > Looks like it's the function "regexp_replace" volatile and > restrict=false make the difference, we have our application role with > default search_path=oracle,$user,public,pg_catalog. > > =#select > oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile > from pg_proc where proname='regexp_replace' order by oid; > > oid |proname | pronamespace | prosecdef | proisstrict | > provolatile > > > ---++--+---+-+- > > 2284 | regexp_replace | pg_catalog | f | t | i > > 2285 | regexp_replace | pg_catalog | f | t | i > > 17095 | regexp_replace | oracle | f | f | v > > 17096 | regexp_replace | oracle | f | f | v > > 17097 | regexp_replace | oracle | f | f | v > > 17098 | regexp_replace | oracle | f | f | v > > Why in the world are the oracle ones marked volatile? That's what's > preventing them from being used in index quals. > It looks like orafce issue I'll fix it Regards Pavel > > regards, tom lane > > >
Re: Postgresql equal join on function with columns not use index
út 13. 6. 2023 v 16:17 odesílatel Pavel Stehule napsal: > > > út 13. 6. 2023 v 15:50 odesílatel Tom Lane napsal: > >> "James Pang (chaolpan)" writes: >> > Looks like it's the function "regexp_replace" volatile and >> restrict=false make the difference, we have our application role with >> default search_path=oracle,$user,public,pg_catalog. >> > =#select >> oid,proname,pronamespace::regnamespace,prosecdef,proisstrict,provolatile >> from pg_proc where proname='regexp_replace' order by oid; >> > oid |proname | pronamespace | prosecdef | proisstrict | >> provolatile >> > >> ---++--+---+-+- >> > 2284 | regexp_replace | pg_catalog | f | t | i >> > 2285 | regexp_replace | pg_catalog | f | t | i >> > 17095 | regexp_replace | oracle | f | f | v >> > 17096 | regexp_replace | oracle | f | f | v >> > 17097 | regexp_replace | oracle | f | f | v >> > 17098 | regexp_replace | oracle | f | f | v >> >> Why in the world are the oracle ones marked volatile? That's what's >> preventing them from being used in index quals. >> > > It looks like orafce issue > > I'll fix it > should be fixed in orafce 4.4. Regards Pavel > > Regards > > Pavel > > >> >> regards, tom lane >> >> >>
