Re: Merge David and Goliath tables efficiently

2023-06-18 Thread nicolas paris
> I assume 2TO is 2TB?

Yes. 2TB


> I don't understand why you're sharing just a part of the plan 


As for the nested loop plan, what I shared is the full plan. Actually
it is repeated many times, since 2M batched by 500 rows. I add it
again:

Merge on goliath_23 ca  (cost=2139.75..11077.17 rows=0 width=0)
  ->  Nested Loop Left Join  (cost=2139.75..11077.17 rows=1000
width=575)
->  Limit  (cost=2139.19..2495.67 rows=1000 width=569)
  ->  Index Scan using david_23_list_id_account_id_idx on
david_23  (cost=0.29..6794.16 rows=19058 width=569)
->  Index Scan using goliath_23_list_id_account_id_idx on
goliath_23 ca  (cost=0.56..8.56 rows=1 width=14)
  Index Cond: (list_id = david_23.list_id)


> Well, I kinda doubt you have 17023331531230 rows (not even physically
> possible with 2TB disk), so that's immediately suspicious.

Below is the full plan for the strategy 2.1 (Indeed the previous email
plan was truncated and wrong, sorry for that). 

Note that both plan acome from the same partitioned by hash table with
100 parts, with a unique index on the list_id + hash_key. For strategy
2.1, I turned on enable_partitionwise_join, since david table has the
same partitioning scheme as goliath including unique indexe. In both
case the query is:

MERGE INTO "goliath" ca   
USING (SELECT * FROM "david" ORDER BY "list_id") AS t 
ON t."list_id" = ca."list_id" 
WHEN MATCHED THEN
UPDATE SET ...
WHEN NOT MATCHED THEN 
INSERT (...)  
VALUES (...)

Except in strategy 3 david is split by limit/offset 500 on each part
tables such:

MERGE INTO "goliath_23" ca
USING (SELECT * FROM "david_23" ORDER BY "list_id" LIMIT 500 OFFSET 0)
AS t  
ON t."list_id" = ca."list_id" 
WHEN MATCHED THEN
UPDATE SET ...
WHEN NOT MATCHED THEN 
INSERT (...)  
VALUES (...)  





Merge on goliath ca  (cost=178016528.81..192778842.44 rows=0 width=0)
  Merge on goliath_0 ca_1
  Merge on goliath_1 ca_2
  Merge on goliath_2 ca_3
  Merge on goliath_3 ca_4
  Merge on goliath_4 ca_5
  Merge on goliath_5 ca_6
  Merge on goliath_6 ca_7
  Merge on goliath_7 ca_8
  Merge on goliath_8 ca_9
  Merge on goliath_9 ca_10
  Merge on goliath_10 ca_11
  Merge on goliath_11 ca_12
  Merge on goliath_12 ca_13
  Merge on goliath_13 ca_14
  Merge on goliath_14 ca_15
  Merge on goliath_15 ca_16
  Merge on goliath_16 ca_17
  Merge on goliath_17 ca_18
  Merge on goliath_18 ca_19
  Merge on goliath_19 ca_20
  Merge on goliath_20 ca_21
  Merge on goliath_21 ca_22
  Merge on goliath_22 ca_23
  Merge on goliath_23 ca_24
  Merge on goliath_24 ca_25
  Merge on goliath_25 ca_26
  Merge on goliath_26 ca_27
  Merge on goliath_27 ca_28
  Merge on goliath_28 ca_29
  Merge on goliath_29 ca_30
  Merge on goliath_30 ca_31
  Merge on goliath_31 ca_32
  Merge on goliath_32 ca_33
  Merge on goliath_33 ca_34
  Merge on goliath_34 ca_35
  Merge on goliath_35 ca_36
  Merge on goliath_36 ca_37
  Merge on goliath_37 ca_38
  Merge on goliath_38 ca_39
  Merge on goliath_39 ca_40
  Merge on goliath_40 ca_41
  Merge on goliath_41 ca_42
  Merge on goliath_42 ca_43
  Merge on goliath_43 ca_44
  Merge on goliath_44 ca_45
  Merge on goliath_45 ca_46
  Merge on goliath_46 ca_47
  Merge on goliath_47 ca_48
  Merge on goliath_48 ca_49
  Merge on goliath_49 ca_50
  Merge on goliath_50 ca_51
  Merge on goliath_51 ca_52
  Merge on goliath_52 ca_53
  Merge on goliath_53 ca_54
  Merge on goliath_54 ca_55
  Merge on goliath_55 ca_56
  Merge on goliath_56 ca_57
  Merge on goliath_57 ca_58
  Merge on goliath_58 ca_59
  Merge on goliath_59 ca_60
  Merge on goliath_60 ca_61
  Merge on goliath_61 ca_62
  Merge on goliath_62 ca_63
  Merge on goliath_63 ca_64
  Merge on goliath_64 ca_65
  Merge on goliath_65 ca_66
  Merge on goliath_66 ca_67
  Merge on goliath_67 ca_68
  Merge on goliath_68 ca_69
  Merge on goliath_69 ca_70
  Merge on goliath_70 ca_71
  Merge on goliath_71 ca_72
  Merge on goliath_72 ca_73
  Merge on goliath_73 ca_74
  Merge on goliath_74 ca_75
  Merge on goliath_75 ca_76
  Merge on goliath_76 ca_77
  Merge on goliath_77 ca_78
  Merge on goliath_78 ca_79
  Merge on goliath_79 ca_80
  Merge on goliath_80 ca_81
  Merge on goliath_81 ca_82
  Merge on goliath_82 ca_83
  Merge on goliath_83 ca_84
  Merge on goliath_84 ca_85
  Merge on goliath_8

RE: Forced to use UNION ALL when having multiple ANY operators and ORDER BY LIMIT

2023-06-18 Thread msalais
Hi,

Do you really need to do “select *”?

In other words, is it necessary to have all columns in the result?

 

Michel SALAIS

 

De : benoit  
Envoyé : lundi 12 juin 2023 23:35
À : Chris Hoover 
Cc : [email protected]
Objet : RE: Forced to use UNION ALL when having multiple ANY operators and
ORDER BY LIMIT

 

This new index is used but still the read is 230mb.

 

https://explain.dalibo.com/plan/b0f28a9e8a136afd

 

  _  

De : Chris Hoover mailto:[email protected]> >
Envoyé : lundi 12 juin 2023 22:55
À : benoit
Cc : [email protected]
 
Objet : Re: Forced to use UNION ALL when having multiple ANY operators and
ORDER BY LIMIT 

 

I normally create my indexes to match the where clause of the query. While
technically, it should not matter, I find a lot of time, it does. 

 

I would create an index on (status, sender_reference, sent_at) and see if
the improves your query performance.

 


 

SELECT * FROM docs WHERE status 

IN ('draft',

'sent')

AND sender_reference 

IN ('Custom/1175',

'Client/362',

'Custom/280')

ORDER BY sent_at DESC

 

 

 

Thanks,

 

 

Chris Hoover

Senior DBA

AWeber.com

Cell: (803) 528-2269

Email: [email protected]  

 

 





On Jun 12, 2023, at 4:17 PM, benoit mailto:[email protected]> > wrote:

 

Hello

 

I have a database with few 60gb tables. Tables rows are requested with
multiple ANY or IN operators. I am not able to find an easy way to make DB
able to use indexes. I often hit the index, but see a a spike of 200mb of IO
or disk read.

 

I am using version 13 but soon 14.

 

I wrote a reproduction script on version 14 with plans included.
https://gist.github.com/benoittgt/ab72dc4cfedea2a0c6a5ee809d16e04d

 

I also have plans on a snapshot of the DB with real data.

- The current query that I try to improve :
https://explain.dalibo.com/plan/8b8f6e0he9feb551

  - I added the DB schema + index in query view. As you can see I have many
indexes for testing purpose and try what the planner can do.

- The optimized query when I have only one ANY and migrate to UNION ALL for
each parameter of the ANY operator
https://explain.dalibo.com/plan/427gg053d07328ga . Query is fast as I would
like but it means generate some merge to be able to get a fast result.

- The new issue I have when I have a new ANY operator on the previous
optimized query. Big IO/read
https://explain.dalibo.com/plan/e7ha9g637b4eh946

 

It seems to me quite undoable to generate for every parameters a query that
will then merge. I have sometimes 3-4 ANY operators with up to 15 elements
in an array.

 

Is there a misusage of my indexes?

Is there a limitation when using ANY or IN operators and ordered LIMIT
behind?

 

Thanks a lot