[
https://issues.apache.org/jira/browse/IMPALA-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang resolved IMPALA-14843.
-------------------------------------
Fix Version/s: Impala 5.0.0
Resolution: Fixed
> Show cancelled nodes in ExecSummary
> -----------------------------------
>
> Key: IMPALA-14843
> URL: https://issues.apache.org/jira/browse/IMPALA-14843
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> PlanNode execution could be cancelled when its parent don't need more rows,
> e.g. parent node reaches its limit. Currently, users have to distinguish this
> by checking the "Node Lifecycle Event Timeline" in the profile, to see if the
> node is opened and closed as expected. Take the following query as an example:
> {code:sql}
> with l as (select * from tpch.lineitem UNION ALL select * from tpch.lineitem)
> select STRAIGHT_JOIN count(*) from (select * from tpch.lineitem a LIMIT 1) a
> join (select * from l LIMIT 125000) b on a.l_orderkey =
> -b.l_orderkey{code}
> The UNION ALL operation has a limit of 125000 which is less than the number
> of rows in tpch.lineitem (6M). So the second union operand won't be executed.
> In the ExecSummary, we just show its output rows is 0 (see the line of
> "03:SCAN HDFS"):
> {noformat}
> Operator #Hosts #Inst Avg Time Max Time #Rows Est.
> #Rows Peak Mem Est. Peak Mem Detail
> -----------------------------------------------------------------------------------------------------------------------------------
> F01:ROOT 1 1 16.379us 16.379us
> 0 0
> 05:AGGREGATE 1 1 0.000ns 0.000ns 1
> 1 24.00 KB 16.00 KB FINALIZE
> 04:HASH JOIN 1 1 348.292us 348.292us 0
> 1 9.06 MB 4.75 MB INNER JOIN, BROADCAST
> |--08:EXCHANGE 1 1 407.110us 407.110us 125.00K
> 125.00K 288.00 KB 337.52 KB UNPARTITIONED
> | F05:EXCHANGE SENDER 1 1 1.834ms 1.834ms
> 54.05 KB 48.00 KB
> | 07:EXCHANGE 1 1 221.991us 221.991us 125.00K
> 125.00K 184.00 KB 361.52 KB UNPARTITIONED
> | F04:EXCHANGE SENDER 3 3 3.476ms 5.611ms
> 54.05 KB 48.00 KB
> | 01:UNION 3 3 76.304us 121.649us 375.00K
> 125.00K 0 0
> | |--03:SCAN HDFS 3 3 33.598us 43.943us 0
> 6.00M 0 264.00 MB tpch.lineitem
> | 02:SCAN HDFS 3 3 83.305ms 98.254ms 377.86K
> 6.00M 48.17 MB 264.00 MB tpch.lineitem
> 06:EXCHANGE 1 1 8.285us 8.285us 1
> 1 16.00 KB 16.00 KB UNPARTITIONED
> F00:EXCHANGE SENDER 3 3 32.625us 40.147us
> 31.00 B 48.00 KB
> 00:SCAN HDFS 3 3 169.596ms 174.375ms 3
> 1 48.08 MB 264.00 MB tpch.lineitem a{noformat}
> If users just check the query plan, they would be confused since both
> ScanNodes don't have any predicates.
> {noformat}
> | 01:UNION
> | | pass-through-operands: all
> | | limit: 125000
> | | mem-estimate=0B mem-reservation=0B thread-reservation=0
> | | tuple-ids=4 row-size=8B cardinality=125.00K
> | | in pipelines: 02(GETNEXT), 03(GETNEXT)
> | |
> | |--03:SCAN HDFS [tpch.lineitem, RANDOM]
> | | HDFS partitions=1/1 files=1 size=718.94MB
> | | stored statistics:
> | | table: rows=6.00M size=718.94MB
> | | columns: all
> | | extrapolated-rows=disabled max-scan-range-rows=1.07M
> | | mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1
> | | tuple-ids=3 row-size=8B cardinality=6.00M
> | | in pipelines: 03(GETNEXT)
> | |
> | 02:SCAN HDFS [tpch.lineitem, RANDOM]
> | HDFS partitions=1/1 files=1 size=718.94MB
> | stored statistics:
> | table: rows=6.00M size=718.94MB
> | columns: all
> | extrapolated-rows=disabled max-scan-range-rows=1.07M
> | mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1
> | tuple-ids=2 row-size=8B cardinality=6.00M
> | in pipelines: 02(GETNEXT){noformat}
> The only indication is in the life cycle of ScanNode(id=3):
> {noformat}
> HDFS_SCAN_NODE (id=3):
> Table Name: tpch.lineitem
> Hdfs split stats (<volume id>:<# splits>/<split lengths>):
> 0:2/256.00 MB
> ExecOption: TEXT Codegen Enabled
> Node Lifecycle Event Timeline: 139.781ms
> - Closed: 139.781ms (139.781ms){noformat}
> It's not even opened and finally closed directly. A normal timeline looks
> like this:
> {noformat}
> Node Lifecycle Event Timeline: 196.153ms
> - Open Started: 21.666ms (21.666ms)
> - Open Finished: 21.697ms (31.313us)
> - First Batch Requested: 21.702ms (5.330us)
> - First Batch Returned: 195.956ms (174.254ms)
> - Last Batch Returned: 195.957ms (187.000ns)
> - Closed: 196.153ms (196.518us){noformat}
> It'd be helpful to add a "cancelled" marker in the ExecSummary to indicate a
> node is not fully executed, i.e. don't have "Last Batch Returned" event.
> This is also helpful for HBO when tracking output cardinality of the
> ScanNodes. Such ScanNodes have incomplete ouput so should be skipped.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)