Hi,
When join two table on multiple columns equaljoin, rows estimation always
use selectivity = multiplied by distinct multiple individual columns, possible
to use extended n-distinct statistics on multiple columns?
PG v14.8-1, attached please check test case with details.
Thanks,
James
testdb=> \d test1
Table "csdigital.test1"
Column | Type | Collation | Nullable | Default
-----------------+------------------------+-----------+----------+---------
account_key | character varying(36) | | not null |
subscription_id | character varying(36) | | not null |
site_key | character varying(36) | | not null |
site_id | numeric(38,0) | | |
site_name | character varying(128) | | |
Indexes:
"test1_pk" PRIMARY KEY, btree (account_key, subscription_id, site_key)
testdb=> \d test2
Table "csdigital.test2"
Column | Type | Collation | Nullable | Default
-----------------+------------------------+-----------+----------+---------
account_key | character varying(36) | | not null |
subscription_id | character varying(36) | | not null |
offer_code | character varying(128) | | not null |
offer_name | character varying(128) | | |
offer_type | character varying(128) | | |
Indexes:
"test2_pk" PRIMARY KEY, btree (account_key, subscription_id, offer_code)
testdb=> explain select count(*) from test1 t1,test2 t2 where
t1.account_key=t2.account_key and t1.subscription_id=t2.subscription_id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=629339.31..629339.32 rows=1 width=8)
-> Merge Join (cost=1.99..629339.26 rows=22 width=0)
Merge Cond: (((t1.account_key)::text = (t2.account_key)::text) AND
((t1.subscription_id)::text = (t2.subscription_id)::text))
-> Index Only Scan using test1_pk on test1 t1 (cost=0.56..263134.00
rows=4906772 width=61)
-> Index Only Scan using test2_pk on test2 t2 (cost=0.56..309937.54
rows=6346934 width=59)
(5 rows)
Time: 8.018 ms
testdb=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV,
tablename, attname, inherited, null_frac, n_distinct,
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1)
n_hist, correlation
testdb-> FROM pg_stats WHERE attname in ('account_key','subscription_id') AND
tablename='test1' ORDER BY 1 DESC;
frac_mcv | tablename | attname | inherited | null_frac |
n_distinct | n_mcv | n_hist | correlation
---------------+-----------+-----------------+-----------+-----------+------------+-------+--------+--------------
0.00026666667 | test1 | account_key | f | 0 |
249474 | 1 | 101 | -0.008081022
0.00026666667 | test1 | subscription_id | f | 0 |
-0.6047012 | 1 | 101 | 0.26008433
(2 rows)
Time: 7.760 ms
testdb=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV,
tablename, attname, inherited, null_frac, n_distinct,
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1)
n_hist, correlation
testdb-> FROM pg_stats WHERE attname in ('account_key','subscription_id') AND
tablename='test2' ORDER BY 1 DESC;
frac_mcv | tablename | attname | inherited | null_frac | n_distinct |
n_mcv | n_hist | correlation
----------+-----------+-----------------+-----------+-----------+-------------+-------+--------+-------------
| test2 | account_key | f | 0 | 325385 |
| 101 | 0.006174604
| test2 | subscription_id | f | 0 | -0.68929785 |
| 101 | 0.123789854
(2 rows)
Time: 1.423 ms
testdb=> select relname,reltuples from pg_class where relname in
('test1','test2');
relname | reltuples
---------+--------------
test1 | 4.906772e+06
test2 | 6.346934e+06
(2 rows)
Time: 0.746 ms
testdb=> create statistics
cs_contract_site_map_account_key_subscription_id(dependencies, ndistinct) on
account_key, subscription_id from test1;
CREATE STATISTICS
Time: 7.918 ms
testdb=> create statistics
cs_contract_account_key_subscription_id(dependencies, ndistinct) on
account_key, subscription_id from test2;
CREATE STATISTICS
Time: 1.558 ms
testdb=> analyze test1;
aANALYZE
Time: 745.952 ms
testdb=> analyze test2;
ANALYZE
Time: 606.735 ms
testdb=> select
st.stxrelid,st.stxname,st.stxstattarget,st.stxkeys,std.stxdndistinct,std.stxddependencies
testdb-> from pg_statistic_ext st join pg_statistic_ext_data std on
st.oid=std.stxoid;
ERROR: permission denied for table pg_statistic_ext_data
Time: 2.464 ms
testdb=# select
st.stxrelid,st.stxname,st.stxstattarget,st.stxkeys,std.stxdndistinct,std.stxddependencies
testdb-# from pg_statistic_ext st join pg_statistic_ext_data std on
st.oid=std.stxoid;
stxrelid | stxname | stxstattarget |
stxkeys | stxdndistinct | stxddependencies
----------+--------------------------------------------------+---------------+---------+-------------------+------------------------------------------
5920481 | cs_contract_site_map_account_key_subscription_id | -1 |
1 2 | {"1, 2": 3552572} | {"1 => 2": 0.891333, "2 => 1": 0.997467}
5920675 | cs_contract_account_key_subscription_id | -1 |
1 2 | {"1, 2": 6173026} | {"1 => 2": 0.917700, "2 => 1": 0.997400}
(2 rows)
testdb=> explain select count(*) from test1 t1,test2 t2 where
t1.account_key=t2.account_key and t1.subscription_id=t2.subscription_id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=629329.20..629329.21 rows=1 width=8)
-> Merge Join (cost=1.28..629329.15 rows=23 width=0) <<< here we expect
extended ndistinct used to rows = much more value, but still use
distinct(account_key)*distinct(subscription_id)
Merge Cond: (((t1.account_key)::text = (t2.account_key)::text) AND
((t1.subscription_id)::text = (t2.subscription_id)::text))
-> Index Only Scan using test1_pk on test1 t1 (cost=0.56..263134.59
rows=4906790 width=61)
-> Index Only Scan using test2_pk on test2 t2 (cost=0.56..309935.05
rows=6346754 width=59)
(5 rows)
Time: 4.307 ms