Which side of a Merge Join gets executed first? Do both sides always get executed?
The attached query plan is from 11. We are getting Merge Joins on both sides of the UNION. In both cases, the first node under the Merge Join returns 0 rows but the other side of the Merge Join (the one being sorted) is executed and that's where all of the time is spent. On the surface, I don't see any way from the attached explain plan to determine which side of the Merge Join is executed first. Some questions: - Which side gets executed first? - How would one tell that from the json? - Have there been any relevant changes to later releases to make that more apparent? - Whichever side gets executed first, is the execution of the side that would be second get short circuited if 0 rows are returned by the first side? Here's a screenshot from pgMustard. - Nodes 6 and 14 (the first node under each of the Merge Joins) each return 0 rows - Nodes 9 and 15 are the expensive sides of the Merge Joins and return lots of rows [image: image.png] NOTE: - The query plan in 13 is slightly different, but still includes the Merge Joins. - Replacing ANY(ARRAY() fixes the performance problem, but we'd still like to understand the execution characteristics of Merge Join Thanks, Jerry W_PC_PolicySearchByFirstAndLastNameWithANYARRAYInsteadOfINSubqueryRedacted.json Description: application/json
Re: Which side of a Merge Join gets executed first? Do both sides always get executed?
Le 20/12/2023 à 15:40, Jerry Brenner a écrit : The attached query plan is from 11. We are getting Merge Joins on both sides of the UNION. In both cases, the first node under the Merge Join returns 0 rows but the other side of the Merge Join (the one being sorted) is executed and that's where all of the time is spent. On the surface, I don't see any way from the attached explain plan to determine which side of the Merge Join is executed first. Some questions: * Which side gets executed first? * How would one tell that from the json? * Have there been any relevant changes to later releases to make that more apparent? * Whichever side gets executed first, is the execution of the side that would be second get short circuited if 0 rows are returned by the first side? Here's a screenshot from pgMustard. * Nodes 6 and 14 (the first node under each of the Merge Joins) each return 0 rows * Nodes 9 and 15 are the expensive sides of the Merge Joins and return lots of rows I think those nodes (9 and 15) are expensive because they have to filter out 8 millions rows in order to produce their first output row. After that, they get short circuited. Best regards, Frédéric
Re: Which side of a Merge Join gets executed first? Do both sides always get executed?
Le 20/12/2023 à 15:40, Jerry Brenner a écrit : Whichever side gets executed first, is the execution of the side that would be second get short circuited if 0 rows are returned by the first side? Indeed, if 0 rows are returned from the outer relation, the scan of the inner relation is never executed. Best regards, Frédéric
Re: Which side of a Merge Join gets executed first? Do both sides always get executed?
Thanks. Does this make sense?
- There are 3 nodes under the Merge Join
- The first node is an InitPlan, due to the ANY(ARRAY()) - that gets
executed and finds 0 matching rows
- The second node is the outer node in the Merge Join and that is the
expensive node in our query plan
- The third node is the inner node in the Merge Join and that node
references the SubPlan generated by the first node. The IndexCond has*
"id = ANY($2) AND ..."* and the comparison with the result of the
SubPlan does not find a match, so that's where the short-circuiting happens.
Here are the relevant lines from the node (12) accessing the result of the
SubPlan:
"Plans": [
{
"Node Type": "Index Only Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Scan Direction": "Forward",
"Index Name":
"policyperi_u_id_1mw8mh83lyyd9",
"Relation Name": "pc_policyperiod",
"Alias": "qroots0",
"Startup Cost": 0.69,
"Total Cost": 18.15,
"Plan Rows": 10,
"Plan Width": 8,
"Actual Startup Time": 0.045,
"Actual Total Time": 0.045,
"Actual Rows": 0,
"Actual Loops": 1,
"Index Cond": "(*(id = ANY ($2)) AND*
(retired = 0) AND (temporarybranch = false))",
Here's the screenshot again:
[image: image.png]
Thanks,
Jerry
On Wed, Dec 20, 2023 at 10:32 AM Frédéric Yhuel
wrote:
>
>
> Le 20/12/2023 à 15:40, Jerry Brenner a écrit :
> > Whichever side gets executed first, is the execution of the side that
> > would be second get short circuited if 0 rows are returned by the first
> > side?
>
> Indeed, if 0 rows are returned from the outer relation, the scan of the
> inner relation is never executed.
>
> Best regards,
> Frédéric
>
>
Re: Which side of a Merge Join gets executed first? Do both sides always get executed?
Le 20/12/2023 à 20:04, Jerry Brenner a écrit : Thanks. Does this make sense? * There are 3 nodes under the Merge Join * The first node is an InitPlan, due to the ANY(ARRAY()) - that gets executed and finds 0 matching rows * The second node is the outer node in the Merge Join and that is the expensive node in our query plan * The third node is the inner node in the Merge Join and that node references the SubPlan generated by the first node. The IndexCond has*"id = ANY($2) AND ..."* and the comparison with the result of the SubPlan does not find a match, so that's where the short-circuiting happens. I think it does. I'm not very experienced with the customs of these mailing lists, but I think the following would help to get more answers : * TEXT format of EXPLAIN is much more readable (compared to JSON) * A well formatted query would help * Screenshots aren't so great Rather than a screenshot, maybe you could use one of explain.depesz.com, explain.dalibo.com, or explain-postgresql.com ? Best regards, Frédéric
