Re: Hash Right join and seq scan

2024-07-05 Thread James Pang
David Rowley wrote on Fri, 5 Jul 2024 at 10:15:

> On Fri, 5 Jul 2024 at 12:50, James Pang  wrote:
> >we have a daily vacuumdb and analyze job; generally speaking it's
> > done in seconds, but sometimes it suddenly runs for tens of minutes
> > with the same bind variable values and huge temp space used, and at
> > that time EXPLAIN shows a "Hash Anti Join, Hash Right Join" with seq
> > scans of two tables.
>
> There was talk about adding costing for run-time partition pruning
> factors but nothing was ever agreed, so nothing was done.  It's just
> not that obvious to me how we'd do that.  If the Append had 10
> partitions as subnodes, with an equality join condition, you could
> assume we'll only match to 1 of those 10, but we've no idea at plan
> time which one that'll be and the partitions might drastically vary in
> size.  The best I think we could do is take the total cost of those 10
> and divide by 10 to get the average cost.  It's much harder for range
> conditions as those could match anything from 0 to all partitions. The
> best suggestion I saw for that was to multiply the costs by
> DEFAULT_INEQ_SEL.
>
> I think for now, you might want to lower the random_page_cost or
> increase effective_cache_size to encourage the nested loop -> index
> scan plan. A good range for effective_cache_size is anywhere between
> 50% and 75% of your server's RAM.  However, that might not be ideal if your
> server is under memory pressure from other running processes. It also
> depends on how large shared_buffers are as a percentage of total RAM.
>
> David
>
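The costing idea David sketches above (for an equality join, charge the average per-partition cost since only one of N partitions will match; for a range condition, scale by DEFAULT_INEQ_SEL) can be expressed as rough arithmetic. This is only a sketch of the proposal, not actual planner code; DEFAULT_INEQ_SEL is PostgreSQL's default 1/3 inequality selectivity constant.

```python
# Sketch of the proposed run-time-pruning cost adjustment for an Append
# node.  Not planner code; just the arithmetic of the idea.

DEFAULT_INEQ_SEL = 1.0 / 3.0   # PostgreSQL's default inequality selectivity


def append_cost_with_pruning(partition_costs, join_is_equality):
    total = sum(partition_costs)
    if join_is_equality:
        # Equality join: assume exactly one of N partitions will match,
        # but we don't know which one at plan time, so charge the
        # average partition cost.
        return total / len(partition_costs)
    # Range condition: could match anything from 0 to all partitions;
    # the suggestion was to scale the total cost by DEFAULT_INEQ_SEL.
    return total * DEFAULT_INEQ_SEL


# Ten equally sized partitions costing 100 each:
print(append_cost_with_pruning([100] * 10, True))   # 100.0 instead of 1000
print(append_cost_with_pruning([100] * 10, False))  # about 333.3
```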

  We already have random_page_cost=1.1 and effective_cache_size set to 75%
of physical memory on this database server. For this SQL:
   ->  Nested Loop Anti Join  (cost=40.32..132168227.57 rows=224338 width=78)
         Join Filter: (lower((p.ctinfo)::text) = lower((w.ctinfo)::text))
         ->  Nested Loop Left Join  (cost=39.63..398917.29 rows=299118 width=78)
               ->  Append  (cost=0.56..22.36 rows=8 width=54)
                     ->  Index Scan using wmdata_p0_llid_hhid_stime_idx on wmdata_p0 m_1  (cost=0.56..2.79 rows=1 width=54)
               ->  Append  (cost=39.07..49312.09 rows=54978 width=78)
                     ->  Bitmap Heap Scan on wmvtee_p0 w_1  (cost=39.07..1491.06 rows=1669 width=78)
                           Recheck Cond: ((m.partitionkeyid)::text = (partitionkeyid)::text)
                           ->  Bitmap Index Scan on wmvtee_p0_partitionkeyid_intid_idx  (cost=0.00..38.65 rows=1669 width=0)
                                 Index Cond: ((partitionkeyid)::text = (m.partitionkeyid)::text)
                     ...
         ->  Append  (cost=0.69..516.96 rows=4010 width=78)
               ->  Index Only Scan using wmpct_p0_partitionkeyid_ctinfo_idx on wmpct_p0 p_1  (cost=0.69..15.78 rows=124 width=78)
               ...

   For the nested loop path, since the first Append estimated only "8" rows,
and the tables are joined on partitionkeyid and all are hash partitions, would
it be better to estimate the cost as 8 (loop count) * 1600 = 12800 (each loop
maps to only 1 hash partition's bitmap scan, i.e. the average cost of one
partition)? That's much less than the 398917.29 estimated for all partitions.
For the outer Nested Loop Anti Join, the cost could be 299118 rows * 15.78
(the average index scan cost of one partition) = 4,720,082, which is still
much less than 132168227.57.
   For the Hash Right join, is it possible to estimate based on 8 partition
seq scans instead of all 32 hash partitions, since the first query estimated
only 8 rows?
   Extended statistics may help estimate count(partitionkeyid) based on the
other columns' bind variable values, but that did not seem to help the table
join case.
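The back-of-envelope arithmetic above can be laid out as follows. The 1600 and 15.78 per-partition costs are James's reading of the plan, and this assumes run-time pruning leaves exactly one matching hash partition per outer row; it is the proposed heuristic, not how the planner actually costs these nodes.

```python
# Hypothetical re-costing of the nested loop plan under the assumption
# that each outer row probes only one hash partition.  All numbers are
# taken from the EXPLAIN output and the proposal quoted above.

outer_rows = 8                       # row estimate of the first Append
per_partition_bitmap_cost = 1600     # rough cost of one partition's bitmap scan
inner_join_cost = outer_rows * per_partition_bitmap_cost
print(inner_join_cost)               # 12800, vs. 398917.29 for all partitions

anti_join_rows = 299118              # row estimate of the Nested Loop Left Join
per_partition_index_cost = 15.78     # index-only scan cost of one wmpct_p0 partition
anti_join_cost = anti_join_rows * per_partition_index_cost
print(round(anti_join_cost))         # 4720082, vs. 132168227.57 in the plan
```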

Thanks,

James


Re: Hash Right join and seq scan

2024-07-05 Thread David Rowley
On Sat, 6 Jul 2024 at 02:43, James Pang  wrote:
>for nest loop path, since the first one estimated only "8" rows , 
> and they use partitionkeyid as joinkey and all are hash partitions , is it 
> better to estimate cost to  8 (loop times) * 1600 = 12800 (each one loop  map 
> to only 1 hash partition bitmap scan ,avg one partition cost), that's much 
> less than 398917.29 of all partitions ?

I'm not really sure where you're getting the numbers from here. The
outer side of the deepest nested loop has an 8 row estimate, not the
nested loop itself.  I'm unsure where the 1600 is from. I only see
1669.

As of now, we don't do a great job of costing for partition pruning
that will happen during execution.  We won't be inventing anything to
fix that in existing releases of PostgreSQL, so you'll need to either
adjust the code yourself, or find a workaround.

You've not shown us your schema, but perhaps enable_partitionwise_join
= on might help you. Other things that might help are further lowering
random_page_cost or raising effective_cache_size artificially high.
It's hard to tell from here how much random I/O is being costed into
the index scans.  You could determine this by checking if the nested
loop plan costs change as a result of doing further increases to
effective_cache_size. You could maybe nudge it up enough for it to win
over the hash join plan. It is possible that this won't work, however.
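The experiments David suggests can be tried at session level before re-running EXPLAIN; the values below are illustrative placeholders, not recommendations for this server:

```sql
-- Session-level experiment only; pick values suited to your hardware.
SET enable_partitionwise_join = on;
SET random_page_cost = 1.0;          -- already 1.1 in this case
SET effective_cache_size = '96GB';   -- nudge upward, then re-check plan costs
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;                          -- the problem query
```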

>  for secondary Nest Loop Anti join could be rows 299118 rows *  15.78(avg 
> index scan cost of one partition) = 4,720,082 that still much less than  
> 132168227.57 ?
>  for Hash Right join, is it possible to estimate by  8 seq partition 
> scan instead of all 32 hash partitions since the first query estimated 8 rows 
> only ?
>  extend statistics may help estimate count(partitionkeyid) based on 
> other columns bind variables, but looks like that did not help table join 
> case.

I can't quite follow this. You'll need to better explain where you're
getting these numbers for me to be able to understand.

David