Merge David and Goliath tables efficiently

2023-06-17 Thread Nicolas Paris
In my use case I have a 2-billion-row / 1 TB table. I have daily data to upsert, 
around 2 million rows with say 50% inserts, matched on the primary key of a 
freshly analyzed table.

I have tested multiple strategies to merge the data, all based on a first stage 
that copies the 2M dataset into a staging unlogged / indexed table (the staging 
step and strategy 2.1 are sketched right after the list):

1. Join insert then join update 
2.1. Use of the new MERGE statement
2.2. Use of MERGE on two hash-partitioned tables with partitionwise join 
enabled
3. Use of MERGE in batches of 1000 rows
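
For reference, a rough sketch of the staging step and of strategy 2.1, with the
big table called goliath and the staging table david (list_id / account_id are
the key columns, "payload" stands in for the non-key columns; this is a
simplified sketch, not the exact statements I ran):

-- staging table for the daily 2M-row delta, unlogged and carrying the same
-- indexes as the target so the join columns are indexed on both sides
CREATE UNLOGGED TABLE david (LIKE goliath INCLUDING ALL);
-- COPY the daily file into david, then ANALYZE david;

-- strategy 2.1: a single MERGE driven by the primary key
MERGE INTO goliath g
USING david d
  ON g.list_id = d.list_id AND g.account_id = d.account_id
WHEN MATCHED THEN
  UPDATE SET payload = d.payload
WHEN NOT MATCHED THEN
  INSERT (list_id, account_id, payload)
  VALUES (d.list_id, d.account_id, d.payload);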

First remark: the MERGE statement is almost 30% faster than the two separate 
statements of strategy 1 in my benchmarks. Thanks to the pg community for this.

While strategies 1 and 2.x are incredibly slow (canceled after 10 hours), 
the third one finishes within 30 minutes.

My interpretation, reading the query plans, is that well-sized small batches of 
upserts leverage the indexes, while the regular join chooses a sequential scan, 
including sorting and hashing, which takes forever in time and resources, 
including disk.

Sadly my incoming dataset is too small to benefit from a seq scan and too large 
to benefit from an index scan join. However, when split manually into N 
portions, the problem can be tackled as N * a small cost, which is cheap anyway.
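
Concretely, one iteration of strategy 3 looks roughly like the following
(simplified: the loop and the bookkeeping of the last merged key run
client-side, the pagination in my real code differs slightly, and in practice
this runs per hash partition pair):

-- merge the next batch of 1000 staged rows, in key order
MERGE INTO goliath g
USING (
  SELECT *
  FROM david
  WHERE (list_id, account_id) > (1234, 5678)  -- last key of previous batch (placeholder values)
  ORDER BY list_id, account_id
  LIMIT 1000
) d
  ON g.list_id = d.list_id AND g.account_id = d.account_id
WHEN MATCHED THEN
  UPDATE SET payload = d.payload
WHEN NOT MATCHED THEN
  INSERT (list_id, account_id, payload)
  VALUES (d.list_id, d.account_id, d.payload);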

Questions:
1. Is there another strategy?
2. Could postgres support a "batched indexed join" itself, leveraging the 
indexes in dynamically sized batches?


It is error-prone to write code to split and iterate. I suspect postgres has 
everything internally (index catalog, planner) to split the job itself, making 
David vs Goliath something trivial.




Re: Merge David and Goliath tables efficiently

2023-06-17 Thread Tomas Vondra
On 6/17/23 15:48, Nicolas Paris wrote:
> In my use case I have a 2-billion-row / 1 TB table. I have daily data to 
> upsert, around 2 million rows with say 50% inserts, matched on the primary 
> key of a freshly analyzed table.
> 
> I have tested multiple strategies to merge the data, all based on a first 
> stage that copies the 2M dataset into a staging unlogged / indexed table:
> 
> 1. Join insert then join update 
> 2.1. Use of the new MERGE statement
> 2.2. Use of MERGE on two hash-partitioned tables with partitionwise join 
> enabled
> 3. Use of MERGE in batches of 1000 rows
> 
> First remark: the MERGE statement is almost 30% faster than the two separate 
> statements of strategy 1 in my benchmarks. Thanks to the pg community for this.
> 
> While strategies 1 and 2.x are incredibly slow (canceled after 10 hours), 
> the third one finishes within 30 minutes.
> 

Seems pretty terrible, provided the data is on reasonable storage (with
acceptable random I/O behavior).

> My interpretation, reading the query plans, is that well-sized small batches 
> of upserts leverage the indexes, while the regular join chooses a sequential 
> scan, including sorting and hashing, which takes forever in time and 
> resources, including disk.

You may be right, but it's hard to tell without seeing the query plan.

> 
> Sadly my incoming dataset is too small to benefit from a seq scan and too 
> large to benefit from an index scan join. However, when split manually into N 
> portions, the problem can be tackled as N * a small cost, which is cheap 
> anyway.
> 

Sounds very much like you'd benefit from tuning some cost parameters to
make the index scan look cheaper.
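
For example, something along these lines (session level, values purely
illustrative and to be validated against your storage and RAM):

-- the planner charges random_page_cost per non-sequentially fetched page
-- (index scans); the default of 4.0 is tuned for spinning disks, SSD-class
-- storage is usually closer to 1.0 - 1.5
SET random_page_cost = 1.1;

-- how much data the OS page cache / shared buffers can be expected to hold
-- (placeholder value, set it to a realistic fraction of the machine's RAM)
SET effective_cache_size = '6GB';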

> Questions:
> 1. Is there another strategy?
> 2. Could postgres support a "batched indexed join" itself, leveraging the 
> indexes in dynamically sized batches?
> 

Not sure what 'batched indexed join' would be, but it very much sounds
like a nested loop with an index scan.

> 
> It is error-prone to write code to split and iterate. I suspect postgres has 
> everything internally (index catalog, planner) to split the job itself, making 
> David vs Goliath something trivial.
> 

What PostgreSQL version are you using, what hardware? Did you tune it in
any way, or is everything just default?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Merge David and Goliath tables efficiently

2023-06-17 Thread nicolas paris
> > My interpretation, reading the query plans, is that well-sized small
> > batches of upserts leverage the indexes, while the regular join
> > chooses a sequential scan, including sorting and hashing, which
> > takes forever in time and resources, including disk.
> 
> You may be right, but it's hard to tell without seeing the query
> plan.

Here are parts of both plans:

Bad case (strategy 2.1):

->  Merge Left Join  (cost=530202629.03..255884257913.32
rows=17023331531230 width=579)
Merge Cond: (david.list_id = ca.list_id)
->  Sort  (cost=2019172.91..2024398.82 rows=2090361 width=569)
  Sort Key: david.list_id
  ->  Append  (cost=0.00..192152.41 rows=2090361 width=569)
->  Seq Scan on david_0 david_1  (cost=0.00..1812.52
rows=20852 width=569)
->  Seq Scan on david_1 david_2  (cost=0.00..1800.09
rows=20709 width=569)
->  Seq Scan on david_2 david_3  (cost=0.00..1794.44
rows=20644 width=569)

Good case (strategy 3):

Merge on goliath_23 ca  (cost=2139.75..11077.17 rows=0 width=0)
  ->  Nested Loop Left Join  (cost=2139.75..11077.17 rows=1000
width=575)
->  Limit  (cost=2139.19..2495.67 rows=1000 width=569)
  ->  Index Scan using david_23_list_id_account_id_idx on
david_23  (cost=0.29..6794.16 rows=19058 width=569)
->  Index Scan using goliath_23_list_id_account_id_idx on
goliath_23 ca  (cost=0.56..8.56 rows=1 width=14)
  Index Cond: (list_id = david_23.list_id)

> 
> Sounds very much like you'd benefit from tuning some cost parameters to
> make the index scan look cheaper.
> Not sure what 'batched indexed join' would be, but it very much sounds
> like a nested loop with an index scan.

Agreed, a 2M-row nested loop over an index scan would likely work as well.
Could tuning the cost params lead to such a large nested loop?
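
I guess I can also test that directly by disabling the other join methods for
one transaction, something like the following (payload is again a placeholder
for the non-key columns):

BEGIN;
-- discourage the merge/hash joins chosen in the bad plan, for this
-- transaction only, to check whether the nested loop + index scan plan
-- gets picked and how it performs
SET LOCAL enable_mergejoin = off;
SET LOCAL enable_hashjoin = off;
EXPLAIN (ANALYZE, BUFFERS)
MERGE INTO goliath g
USING david d
  ON g.list_id = d.list_id AND g.account_id = d.account_id
WHEN MATCHED THEN
  UPDATE SET payload = d.payload
WHEN NOT MATCHED THEN
  INSERT (list_id, account_id, payload)
  VALUES (d.list_id, d.account_id, d.payload);
-- roll back so the test run does not actually modify goliath
ROLLBACK;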

> What PostgreSQL version are you using, what hardware? Did you tune it in
> any way, or is everything just default?

It is pg 15.3, on 2 cores / 8GO / 2TO SSDs, with default cloud
provider parameters (RDS). 




Re: Merge David and Goliath tables efficiently

2023-06-17 Thread Tomas Vondra



On 6/17/23 23:42, nicolas paris wrote:
>>> My interpretation, reading the query plans, is that well-sized small
>>> batches of upserts leverage the indexes, while the regular join
>>> chooses a sequential scan, including sorting and hashing, which
>>> takes forever in time and resources, including disk.
>>
>> You may be right, but it's hard to tell without seeing the query
>> plan.
> 
> Here are parts of both plans:
> 

I don't understand why you're sharing just a part of the plan and not
the whole thing, ideally including the actual update ... Giving us the
info in small pieces just means we need to guess and speculate.

> Bad case (strategy 2.1):
> 
> ->  Merge Left Join  (cost=530202629.03..255884257913.32
> rows=17023331531230 width=579)
> Merge Cond: (david.list_id = ca.list_id)
> ->  Sort  (cost=2019172.91..2024398.82 rows=2090361 width=569)
>   Sort Key: david.list_id
>   ->  Append  (cost=0.00..192152.41 rows=2090361 width=569)
> ->  Seq Scan on david_0 david_1  (cost=0.00..1812.52
> rows=20852 width=569)
> ->  Seq Scan on david_1 david_2  (cost=0.00..1800.09
> rows=20709 width=569)
> ->  Seq Scan on david_2 david_3  (cost=0.00..1794.44
> rows=20644 width=569)
> 

Well, I kinda doubt you have 17023331531230 rows (not even physically
possible with 2TB disk), so that's immediately suspicious. I'd bet the
UPDATE ... FROM ... is missing a condition or something like that, which
results in a cartesian product.
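
The arithmetic alone is telling: 17023331531230 / 2090361 is roughly 8 million
join partners per staged row. If the statement really joins on list_id alone,
as the Merge Cond suggests, the planner's own statistics would show how
non-selective that column is (table/column names guessed from the plan):

-- n_distinct is the planner's estimate of distinct values; a negative value
-- means a fraction of the row count (e.g. -0.001 = 0.1% of the rows)
SELECT schemaname, tablename, attname, n_distinct
FROM pg_stats
WHERE attname = 'list_id'
  AND tablename LIKE 'goliath%';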

> Good case (strategy 3):
> 
> Merge on goliath_23 ca  (cost=2139.75..11077.17 rows=0 width=0)
>   ->  Nested Loop Left Join  (cost=2139.75..11077.17 rows=1000
> width=575)
> ->  Limit  (cost=2139.19..2495.67 rows=1000 width=569)
>   ->  Index Scan using david_23_list_id_account_id_idx on
> david_23  (cost=0.29..6794.16 rows=19058 width=569)
> ->  Index Scan using goliath_23_list_id_account_id_idx on
> goliath_23 ca  (cost=0.56..8.56 rows=1 width=14)
>   Index Cond: (list_id = david_23.list_id)
> 
>>
>> Sounds very much like you'd benefit from tuning some cost parameters to
>> make the index scan look cheaper.
>> Not sure what 'batched indexed join' would be, but it very much sounds
>> like a nested loop with an index scan.
> 
> Agreed, a 2M-row nested loop over an index scan would likely work as well.
> Could tuning the cost params lead to such a large nested loop?
> 

It should be possible, but let's first see if there are other problems in the
query itself. If it's generating a cartesian product, it's pointless to
tune parameters.

>> What PostgreSQL version are you using, what hardware? Did you tune it in
>> any way, or is everything just default?
> 
> It is pg 15.3, on 2 cores / 8GO / 2TO SSDs, with default cloud
> provider parameters (RDS). 
> 

I assume 2TO is 2TB?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company