Which side of a Merge Join gets executed first? Do both sides always get executed?

2023-12-20 Thread Jerry Brenner
The attached query plan is from 11.
We are getting Merge Joins on both sides of the UNION.  In both cases, the
first node under the Merge Join returns 0 rows but the other side of the
Merge Join (the one being sorted) is executed and that's where all of the
time is spent.

On the surface, I don't see any way from the attached explain plan to
determine which side of the Merge Join is executed first.  Some questions:

   - Which side gets executed first?
   - How would one tell that from the json?
   - Have there been any relevant changes to later releases to make that
   more apparent?
   - Whichever side gets executed first, is the execution of the side that
   would be second get short circuited if 0 rows are returned by the first
   side?

Here's a screenshot from pgMustard.

   - Nodes 6 and 14 (the first node under each of the Merge Joins) each
   return 0 rows
   - Nodes 9 and 15 are the expensive sides of the Merge Joins and return
   lots of rows

[image: image.png]

NOTE:

   - The query plan in 13 is slightly different, but still includes the
   Merge Joins.
   - Replacing ANY(ARRAY() fixes the
   performance problem, but we'd still like to understand the execution
   characteristics of Merge Join

Thanks,
Jerry


W_PC_PolicySearchByFirstAndLastNameWithANYARRAYInsteadOfINSubqueryRedacted.json
Description: application/json


Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

2023-12-20 Thread Frédéric Yhuel




Le 20/12/2023 à 15:40, Jerry Brenner a écrit :

The attached query plan is from 11.
We are getting Merge Joins on both sides of the UNION.  In both cases, 
the first node under the Merge Join returns 0 rows but the other side of 
the Merge Join (the one being sorted) is executed and that's where all 
of the time is spent.


On the surface, I don't see any way from the attached explain plan to 
determine which side of the Merge Join is executed first.  Some questions:


  * Which side gets executed first?
  * How would one tell that from the json?
  * Have there been any relevant changes to later releases to make that
more apparent?
  * Whichever side gets executed first, is the execution of the side
that would be second get short circuited if 0 rows are returned by
the first side?

Here's a screenshot from pgMustard.

  * Nodes 6 and 14 (the first node under each of the Merge Joins) each
return 0 rows
  * Nodes 9 and 15 are the expensive sides of the Merge Joins and return
lots of rows


I think those nodes (9 and 15) are expensive because they have to filter 
out 8 millions rows in order to produce their first output row. After 
that, they get short circuited.


Best regards,
Frédéric




Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

2023-12-20 Thread Frédéric Yhuel




Le 20/12/2023 à 15:40, Jerry Brenner a écrit :
Whichever side gets executed first, is the execution of the side that 
would be second get short circuited if 0 rows are returned by the first 
side?


Indeed, if 0 rows are returned from the outer relation, the scan of the 
inner relation is never executed.


Best regards,
Frédéric




Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

2023-12-20 Thread Jerry Brenner
Thanks.  Does this make sense?

   - There are 3 nodes under the Merge Join
   - The first node is an InitPlan, due to the ANY(ARRAY()) - that gets
   executed and finds 0 matching rows
   - The second node is the outer node in the Merge Join and that is the
   expensive node in our query plan
   - The third node is the inner node in the Merge Join and that node
   references the SubPlan generated by the first node. The IndexCond has*
   "id = ANY($2) AND ..."* and the comparison with the result of the
   SubPlan does not find a match, so that's where the short-circuiting happens.

Here are the relevant lines from the node (12) accessing the result of the
SubPlan:

"Plans": [
  {
"Node Type": "Index Only Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Scan Direction": "Forward",
"Index Name":
"policyperi_u_id_1mw8mh83lyyd9",
"Relation Name": "pc_policyperiod",
"Alias": "qroots0",
"Startup Cost": 0.69,
"Total Cost": 18.15,
"Plan Rows": 10,
"Plan Width": 8,
"Actual Startup Time": 0.045,
"Actual Total Time": 0.045,
"Actual Rows": 0,
"Actual Loops": 1,
"Index Cond": "(*(id = ANY ($2)) AND*
(retired = 0) AND (temporarybranch = false))",


Here's the screenshot again:

[image: image.png]

Thanks,
Jerry

On Wed, Dec 20, 2023 at 10:32 AM Frédéric Yhuel 
wrote:

>
>
> Le 20/12/2023 à 15:40, Jerry Brenner a écrit :
> > Whichever side gets executed first, is the execution of the side that
> > would be second get short circuited if 0 rows are returned by the first
> > side?
>
> Indeed, if 0 rows are returned from the outer relation, the scan of the
> inner relation is never executed.
>
> Best regards,
> Frédéric
>
>


Re: Which side of a Merge Join gets executed first? Do both sides always get executed?

2023-12-20 Thread Frédéric Yhuel




Le 20/12/2023 à 20:04, Jerry Brenner a écrit :

Thanks.  Does this make sense?

  * There are 3 nodes under the Merge Join
  * The first node is an InitPlan, due to the ANY(ARRAY()) - that gets
executed and finds 0 matching rows
  * The second node is the outer node in the Merge Join and that is the
expensive node in our query plan
  * The third node is the inner node in the Merge Join and that node
references the SubPlan generated by the first node. The IndexCond
has*"id = ANY($2) AND ..."* and the comparison with the result of
the SubPlan does not find a match, so that's where the
short-circuiting happens.


I think it does.

I'm not very experienced with the customs of these mailing lists, but I 
think the following would help to get more answers :


* TEXT format of EXPLAIN is much more readable (compared to JSON)
* A well formatted query would help
* Screenshots aren't so great

Rather than a screenshot, maybe you could use one of explain.depesz.com, 
explain.dalibo.com, or explain-postgresql.com ?


Best regards,
Frédéric