Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE

2025-01-27 Thread Feike Steenbergen
On Thu, 9 Jan 2025 at 20:39, Tom Lane  wrote:

> Yeah. That prevents pushing down the join condition, since in a FULL join
all
> rows of both sides will contribute to the result; none can be removed
ahead of
> the join.

> Not sure if this pattern is common enough to be worth trying to implement
such
> an optimization.

I'm not going to pursue this much further, however for those reading along/
future reference, what we're trying to do is:

- for a given target table
- merge a subset of the data using a source table (only update any
significant
changes). The subset filter `WHERE device_id=$1` is applied to both the JOIN
as well as the NOT MATCHED BY SOURCE part of the merge.
- when using MERGE ... NOT MATCHED BY SOURCE THEN DELETE currently (pg17)
reads
the whole of the target table before applying the subset filter. As we plan
to
merge only very small subsets (1/10,000 or so), this means that for now,
this
isn't usable for that use case.

For example, for a monitoring system, we have 10's of thousands of remote
systems, for which we want to merge a snapshot of their current state
inside a
central target table. That use case is currently not well supported with
MERGE.
(The workaround is a DELETE inside a CTE).

I would expect this use case to be quite common, however, as this feature is
only available in PG17, it may not be used yet with MERGE, so I don't expect
any others to voice the same concern.


Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE

2025-01-09 Thread Feike Steenbergen
As a followup, these are the `EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)`
plans for the previous version of the query and the one using NOT MATCHED
BY SOURCE.

Previous plan:  Buffers: shared hit=84 dirtied=1 written=1
New plan:   Buffers: shared hit=100690

*Old plan, using delete in CTE, then MERGE*

 Merge on merge_target t (actual time=0.117..0.119 rows=0 loops=1)
   Tuples: inserted=10
   Buffers: shared hit=84 dirtied=1 written=1
   I/O Timings: shared write=0.012
   CTE deleted
 ->  Delete on merge_target t_1 (actual time=0.222..0.222 rows=0
loops=1)
   Buffers: shared hit=306
   ->  Hash Anti Join (actual time=0.034..0.187 rows=100 loops=1)
 Hash Cond: (t_1."time" = s_1."time")
 Buffers: shared hit=106
 ->  Bitmap Heap Scan on merge_target t_1 (actual
time=0.021..0.163 rows=100 loops=1)
   Recheck Cond: (device_id = 19)
   Heap Blocks: exact=102
   Buffers: shared hit=105
   ->  Bitmap Index Scan on merge_target_pkey (actual
time=0.010..0.010 rows=110 loops=1)
 Index Cond: (device_id = 19)
 Buffers: shared hit=3
 ->  Hash (actual time=0.006..0.006 rows=10 loops=1)
   Buffers: shared hit=1
   ->  Seq Scan on merge_source s_1 (actual
time=0.001..0.002 rows=10 loops=1)
 Buffers: shared hit=1
   ->  Nested Loop Left Join (actual time=0.020..0.031 rows=10 loops=1)
 Buffers: shared hit=31
 ->  Seq Scan on merge_source s (actual time=0.006..0.007 rows=10
loops=1)
   Buffers: shared hit=1
 ->  Index Scan using merge_target_pkey on merge_target t (actual
time=0.002..0.002 rows=0 loops=10)
   Index Cond: ((device_id = 19) AND ("time" = s."time"))
   Buffers: shared hit=30
 Planning:
   Memory: used=80kB  allocated=144kB
 Planning Time: 0.189 ms
 Execution Time: 0.412 ms


*New plan, using MERGE WHEN NOT MATCHED BY SOURCE AND device_id=$1*

 Merge on merge_target t (actual rows=0 loops=1)
   Tuples: inserted=10 deleted=100 skipped=99810
   Buffers: shared hit=100690
   ->  Hash Full Join (actual rows=99920 loops=1)
 Hash Cond: (t."time" = s."time")
 Join Filter: (t.device_id = 18)
 Rows Removed by Join Filter: 10
 Buffers: shared hit=639
 ->  Seq Scan on merge_target t (actual rows=99910 loops=1)
   Buffers: shared hit=638
 ->  Hash (actual rows=10 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   Buffers: shared hit=1
   ->  Seq Scan on merge_source s (actual rows=10 loops=1)
 Buffers: shared hit=1
 Planning:
   Memory: used=41kB  allocated=80kB
 Planning Time: 0.642 ms
 Execution Time: 55.213 ms


Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE

2025-01-09 Thread Feike Steenbergen
I'm trying to change a few applications to fully use this, as PostgreSQL 17
added this support.

The application does something like this:

- fetch information from a source system and store it in a temp table
- run a MERGE with a table (target) in this database,
  updating, inserting and deleting in a single statement
- the target table holds information for multiple systems

The temp table (source) doesn't contain the identifier for the system, as
we can inject that at runtime.

This is the shape of the statement:

MERGE INTO
merge_target AS t
USING
merge_source AS s ON (t.time = s.time AND t.device_id = $1)
WHEN MATCHED THEN
UPDATE set
value = s.value
WHEN NOT MATCHED THEN
INSERT (device_id, time, value) VALUES ($1, time, value)
WHEN NOT MATCHED BY SOURCE
AND t.device_id = $1
THEN DELETE;

If we run this however, there is always a Seq Scan against merge_target,
whereas the filter of `AND t.device_id = $1` uses a (Bitmap) Index scan
in other types of queries.

Previously we would - in a CTE - do a delete *and then* the merge.

Which performed much better as the DELETE would not do a Seq Scan.

Attached a full reproducing test case. Some numbers:

Previously, DELETE in CTE, then merge:
Planning Time: 6.700 ms
Execution Time: 7.473 ms

Using the MERGE WHEN MATCHED ON SOURCE THEN DELETE:
Planning Time: 0.994 ms
Execution Time: 65.695 ms

My questions are:

- is the Seq Scan expected by others in this mailing list as well?
- Is it required to do the Seq Scan?
- is this something that could be optimized?

Kind regards,

Feike Steenbergen


merge_delete_source_seq_scan_reproduction.sql.s
Description: Binary data