On Nov 7, 2024, at 9:54 PM, Andrei Lepikhov wrote:
> On 11/8/24 09:45, Ed Sabol wrote:
>> On Nov 7, 2024, at 9:27 PM, Andrei Lepikhov wrote:
>>> Postgres didn't want Materialize in this example because of the low
>>> estimation on its outer subquery. AFAIC, by increasing the *_page_cost's
>>> value, you added extra weight to the inner subquery
>> What kind of extended statistics do you suggest for this? ndistinct,
>> dependencies, or mcv?
>> CREATE STATISTICS tablename_stats () ON relation, type
>> FROM tablename;
>> ANALYZE tablename;
> I'd recommend to use all of them - MCV is helpful in most of the cases (and
> relatively cheap), distinct is actually used in Postgres now to calculate
> number of groups (GROUP-BY, Sort, Memoize, etc.); dependencies - to find
> correlations between columns - usually in scan filters.
OK, I've executed the following:
CREATE STATISTICS tablename_stats_rt_nd (ndistinct) ON relation, type FROM
tablename;
CREATE STATISTICS tablename_stats_rt_mcv (mcv) ON relation, type FROM tablename;
CREATE STATISTICS tablename_stats_rt_dep (dependencies) ON relation, type FROM
tablename;
CREATE STATISTICS tablename_stats_rv_nd (ndistinct) ON relation, value FROM
tablename;
CREATE STATISTICS tablename_stats_rv_mcv (mcv) ON relation, value FROM
tablename;
CREATE STATISTICS tablename_stats_rv_dep (dependencies) ON relation, value FROM
tablename;
CREATE STATISTICS tablename_stats_nr_nd (ndistinct) ON name, relation FROM
tablename;
CREATE STATISTICS tablename_stats_nr_mcv (mcv) ON name, relation FROM tablename;
CREATE STATISTICS tablename_stats_nr_dep (dependencies) ON name, relation FROM
tablename;
CREATE STATISTICS tablename_stats_nt_nd (ndistinct) ON name, type FROM
tablename;
CREATE STATISTICS tablename_stats_nt_mcv (mcv) ON name, type FROM tablename;
CREATE STATISTICS tablename_stats_nt_dep (dependencies) ON name, type FROM
tablename;
CREATE STATISTICS tablename_stats_nv_nd (ndistinct) ON name, value FROM
tablename;
CREATE STATISTICS tablename_stats_nv_mcv (mcv) ON name, value FROM tablename;
CREATE STATISTICS tablename_stats_nv_dep (dependencies) ON name, value FROM
tablename;
ANALYZE tablename;
Now with random_page_cost = 4.0, the optimizer materializes, and it's fast
again:
Nested Loop (cost=1226.12..11129.87 rows=1 width=112) (actual
time=30.965..31.333 rows=1 loops=1)
Join Filter: (a.name = d.name)
Buffers: shared hit=7447
-> Nested Loop (cost=1225.70..2.51 rows=1 width=108) (actual
time=30.921..31.208 rows=1 loops=1)
Buffers: shared hit=7418
-> Hash Join (cost=1225.27..11093.62 rows=1 width=86) (actual
time=30.862..31.078 rows=1 loops=1)
Hash Cond: ((a.name || '.doc'::text) = b_1.name)
Buffers: shared hit=7389
-> Nested Loop (cost=1167.53..11019.89 rows=11 width=70)
(actual time=27.143..27.347 rows=1 loops=1)
Join Filter: (CASE WHEN ("position"(a.name, 'zz'::text) =
1) THEN a.name ELSE ('h_'::text || a.name) END = "*SELECT* 1".table_name)
Rows Removed by Join Filter: 1021
Buffers: shared hit=6268
-> Bitmap Heap Scan on tablename a (cost=456.55..5407.28
rows=1077 width=38) (actual time=2.986..15.865 rows=1022 loops=1)
Recheck Cond: (relation = 'description'::text)
Filter: (type = 'table'::text)
Rows Removed by Filter: 37044
Heap Blocks: exact=4024
Buffers: shared hit=4065
-> Bitmap Index Scan on tablename_r
(cost=0.00..456.29 rows=38915 width=0) (actual time=2.336..2.336 rows=44980
loops=1)
Index Cond: (relation = 'description'::text)
Buffers: shared hit=41
-> Materialize (cost=710.98..5564.15 rows=2 width=64)
(actual time=0.008..0.009 rows=1 loops=1022)
Buffers: shared hit=2203
-> Append (cost=710.98..5564.14 rows=2 width=64)
(actual time=7.519..7.548 rows=1 loops=1)
Buffers: shared hit=2203
-> Subquery Scan on "*SELECT* 1"
(cost=710.98..3537.89 rows=1 width=64) (actual time=6.629..6.636 rows=0 loops=1)
Buffers: shared hit=1380
-> Bitmap Heap Scan on tablename
(cost=710.98..3537.88 rows=1 width=96) (actual time=6.628..6.633 rows=0 loops=1)
Recheck Cond: ((relation = ANY
('{start_time,end_time,dataset_id_column,dataset_id_prefix,original_mission_name,defaultSearchRadius,author,tableType,bibcode,priority,regime,author,includesTypes,Mission,subject}'::text[]))
AND (type = 'table'::text))
Filter: ((CASE relation WHEN
'Mission'::text THEN