Re: Merge David and Goliath tables efficiently

2023-06-20 Thread Tomas Vondra
On 6/20/23 12:02, nicolas paris wrote:
> ...
>
> Definitely, it makes sense to add the part_col to the join columns.
> It also helps the planner choose a better plan, since it now goes
> with a per-partition nested loop without having to trick the costs
> (via either enable_hashjoin or random_page_cost),
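
As a concrete sketch of what joining on the partition column could look like here (list_id and hash_key appear elsewhere in this thread; payload is a placeholder column, not the actual schema):

    MERGE INTO goliath ca
    USING david d
      ON  ca.list_id  = d.list_id
      AND ca.hash_key = d.hash_key   -- partition column added to the join
    WHEN MATCHED THEN
      UPDATE SET payload = d.payload
    WHEN NOT MATCHED THEN
      INSERT (list_id, hash_key, payload)
      VALUES (d.list_id, d.hash_key, d.payload);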

Re: Merge David and Goliath tables efficiently

2023-06-20 Thread nicolas paris
> This is absolutely expected. If you partition by hash (id, part_key),
> you can't join on (id) and expect partitionwise join to work. To quote
> the enable_partitionwise_join documentation [1]:
>
>     Enables or disables the query planner's use of partitionwise join,
>     which allows a jo
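
A minimal illustration of that documentation point, with generic names rather than the schema from this thread: the join clause must cover the whole partition key before the planner will consider a partitionwise join.

    SET enable_partitionwise_join = on;

    CREATE TABLE target_tbl (
        id        bigint NOT NULL,
        part_key  int    NOT NULL,
        payload   text,
        PRIMARY KEY (id, part_key)
    ) PARTITION BY HASH (id, part_key);

    -- eligible for a partitionwise join (t = target_tbl, s = source_tbl):
    --   ... JOIN source_tbl s ON s.id = t.id AND s.part_key = t.part_key
    -- not eligible: joining on s.id = t.id alone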

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread Tomas Vondra
On 6/19/23 17:45, nicolas paris wrote:
>> But you wrote that in both cases the query is:
>
> that was indeed yet another typo, hope to do better in the future.
>
>
>> I'm willing to continue to investigate, but only if you prepare a
>> reproducer,
>
> Thanks for your starter script. Please

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread nicolas paris
> But you wrote that in both cases the query is:

that was indeed yet another typo, hope to do better in the future.

> I'm willing to continue to investigate, but only if you prepare a
> reproducer,

Thanks for your starter script. Please find attached 2 scripts which now illustrate two trouble

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread Tomas Vondra
On 6/19/23 14:20, nicolas paris wrote:
>> IMHO the thing that breaks it is the ORDER BY in the merge, which likely
>> acts as an optimization fence and prevents all sorts of smart things
>> including the partitionwise join. I'd bet that if Nicolas replaces
>>
>>   MERGE INTO "goliath" ca
>>   US
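
The idea, roughly, is to replace the ORDER BY subquery with a plain reference to the staging table, so the planner is not fenced off from a partitionwise join. A sketch, with the WHEN clauses and payload column as placeholders rather than the statement from the thread:

    -- instead of
    MERGE INTO "goliath" ca
    USING (SELECT * FROM "david" ORDER BY "list_id") d
      ON ca.list_id = d.list_id
    WHEN MATCHED THEN UPDATE SET payload = d.payload
    WHEN NOT MATCHED THEN INSERT (list_id, payload) VALUES (d.list_id, d.payload);

    -- something along the lines of
    MERGE INTO "goliath" ca
    USING "david" d
      ON ca.list_id = d.list_id
    WHEN MATCHED THEN UPDATE SET payload = d.payload
    WHEN NOT MATCHED THEN INSERT (list_id, payload) VALUES (d.list_id, d.payload);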

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread nicolas paris
> IMHO the thing that breaks it is the ORDER BY in the merge, which likely
> acts as an optimization fence and prevents all sorts of smart things
> including the partitionwise join. I'd bet that if Nicolas replaces
>
>   MERGE INTO "goliath" ca
>   USING (SELECT * FROM "david" ORDER BY "list_id"

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread nicolas paris
On Mon, 2023-06-19 at 13:34 +0200, Tomas Vondra wrote:
> On 6/18/23 22:57, nicolas paris wrote:
> > > ...
> > > Well, I kinda doubt you have 17023331531230 rows (not even physically
> > > possible with 2TB disk), so that's immediately suspicious.
> >
> > Below is the full plan for the

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread Tomas Vondra
On 6/19/23 09:46, Alvaro Herrera wrote:
> I came here to talk about partitionwise join, but then noticed you have
> already thought of that:
>
> On 2023-Jun-18, nicolas paris wrote:
>
>> Note that both plans come from the same table partitioned by hash with
>> 100 parts, with a unique index on

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread Tomas Vondra
On 6/18/23 22:57, nicolas paris wrote:
>> ...
>> Well, I kinda doubt you have 17023331531230 rows (not even physically
>> possible with 2TB disk), so that's immediately suspicious.
>
> Below is the full plan for the strategy 2.1 (Indeed the previous email
> plan was truncated and wrong, sorry f

Re: Merge David and Goliath tables efficiently

2023-06-19 Thread Alvaro Herrera
I came here to talk about partitionwise join, but then noticed you have
already thought of that:

On 2023-Jun-18, nicolas paris wrote:
> Note that both plans come from the same table partitioned by hash with
> 100 parts, with a unique index on the list_id + hash_key. For strategy
> 2.1, I turned o
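
A rough reconstruction of that setup, just to make the discussion concrete (column types, the payload column, and the choice of hash_key as the sole partition key are assumptions; only the 100 partitions and the unique index on list_id + hash_key come from the thread):

    CREATE TABLE goliath (
        list_id   bigint NOT NULL,
        hash_key  bigint NOT NULL,
        payload   text
    ) PARTITION BY HASH (hash_key);

    -- 100 hash partitions, named goliath_0 .. goliath_99
    DO $$
    BEGIN
      FOR i IN 0..99 LOOP
        EXECUTE format(
          'CREATE TABLE goliath_%s PARTITION OF goliath
             FOR VALUES WITH (MODULUS 100, REMAINDER %s)', i, i);
      END LOOP;
    END $$;

    -- the unique index must include the partition key column
    CREATE UNIQUE INDEX ON goliath (list_id, hash_key);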

Re: Merge David and Goliath tables efficiently

2023-06-18 Thread nicolas paris
> I assume 2TO is 2TB?

Yes, 2TB.

> I don't understand why you're sharing just a part of the plan

As for the nested loop plan, what I shared is the full plan. Actually it is
repeated many times, since the 2M rows are batched by 500 rows. I add it again:

Merge on goliath_23 ca  (cost=2139.75..11077.17 rows=
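
A complete plan of this kind can be captured with an EXPLAIN over the merge itself; a generic sketch (placeholder column names, not the exact query from the thread):

    EXPLAIN (ANALYZE, BUFFERS)
    MERGE INTO goliath ca
    USING david d
      ON ca.list_id = d.list_id
    WHEN MATCHED THEN UPDATE SET payload = d.payload
    WHEN NOT MATCHED THEN INSERT (list_id, payload) VALUES (d.list_id, d.payload);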

Re: Merge David and Goliath tables efficiently

2023-06-17 Thread Tomas Vondra
On 6/17/23 23:42, nicolas paris wrote:
>>> My interpretation reading the query plan is: well-sized small
>>> batches of upserts leverage the indexes, while the regular join
>>> chooses the sequential scan, including sorting and hashing, which
>>> takes forever in time and resources, including disk.
>>

Re: Merge David and Goliath tables efficiently

2023-06-17 Thread nicolas paris
> > My interpretation reading the query plan is: well-sized small
> > batches of upserts leverage the indexes, while the regular join
> > chooses the sequential scan, including sorting and hashing, which
> > takes forever in time and resources, including disk.
>
> You may be right, but it's hard to tell
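
The batching described here is driven from the client (2M rows sent in batches of 500), but a server-side sketch of the same shape may help picture the per-batch merges; batch size, ordering, and the payload column are assumptions:

    DO $$
    DECLARE
      batch_size int := 500;
      offs       int := 0;
      batch_rows int;
    BEGIN
      LOOP
        -- stop once the staging table is exhausted
        SELECT count(*) INTO batch_rows
        FROM (SELECT 1 FROM david ORDER BY list_id
              LIMIT batch_size OFFSET offs) b;
        EXIT WHEN batch_rows = 0;

        MERGE INTO goliath ca
        USING (SELECT * FROM david ORDER BY list_id
               LIMIT batch_size OFFSET offs) d
          ON ca.list_id = d.list_id   -- the real key may also include hash_key
        WHEN MATCHED THEN UPDATE SET payload = d.payload
        WHEN NOT MATCHED THEN
          INSERT (list_id, payload) VALUES (d.list_id, d.payload);

        offs := offs + batch_size;
      END LOOP;
    END $$;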

Re: Merge David and Goliath tables efficiently

2023-06-17 Thread Tomas Vondra
On 6/17/23 15:48, Nicolas Paris wrote:
> In my use case I have a 2 billion row / 1To table. I have around 2 million
> rows of daily data to upsert, with say 50% inserts, based on the primary key,
> in a freshly analyzed table.
>
> I have tested multiple strategies to merge the data, all based on a first stage
> to

Merge David and Goliath tables efficiently

2023-06-17 Thread Nicolas Paris
In my use case I have a 2 billion row / 1To table. I have around 2 million rows
of daily data to upsert, with say 50% inserts, based on the primary key, in a
freshly analyzed table.

I have tested multiple strategies to merge the data, all based on a first stage
to copy the 2m dataset into a staging unlogged / index
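
The staging stage described here might look roughly like the following sketch (column names and the COPY source path are placeholders):

    CREATE UNLOGGED TABLE david (
        list_id  bigint NOT NULL,
        payload  text
    );

    COPY david FROM '/path/to/daily_delta.csv' WITH (FORMAT csv);

    CREATE INDEX ON david (list_id);
    ANALYZE david;

    -- second stage: merge the ~2M staged rows into the ~2-billion-row goliath table
    MERGE INTO goliath ca
    USING david d
      ON ca.list_id = d.list_id
    WHEN MATCHED THEN UPDATE SET payload = d.payload
    WHEN NOT MATCHED THEN INSERT (list_id, payload) VALUES (d.list_id, d.payload);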