Quanlong Huang created IMPALA-14969:
---------------------------------------
Summary: False completion in HdfsScanNode when scanner failed to open due to cancellation
Key: IMPALA-14969
URL: https://issues.apache.org/jira/browse/IMPALA-14969
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Quanlong Huang
Assignee: Quanlong Huang
False completion in HdfsScanNode when scanner failed to open due to cancellation
When using MT_DOP=0, HdfsScanNode spawns scanner threads to fill its RowBatch
queue asynchronously. The fragment instance thread invokes
HdfsScanNode::GetNext() to get RowBatches from the queue. If it gets nothing
(the queue has been shut down), eos (EndOfStream) is set to true, which means
the execution is complete. In the normal case, HdfsScanNode::GetNext() returns
an OK Status and sets eos=true. Setting eos=true also adds a "Last Batch
Returned" event to the Node Lifecycle Event Timeline in the profile.
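The hand-off described above can be sketched with a minimal, hypothetical model (the class and method names below are illustrative simplifications, not Impala's actual code): a scanner-side queue that yields nullptr once it is shut down, and a GetNext() that flips eos to true whenever the queue is drained, regardless of why the queue was shut down.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <utility>

// Simplified stand-ins for Impala's RowBatch queue hand-off (MT_DOP=0).
struct RowBatch { int num_rows = 0; };

struct BatchQueue {
  std::queue<std::unique_ptr<RowBatch>> q;
  // Returns nullptr once the queue is shut down and drained.
  std::unique_ptr<RowBatch> BlockingGet() {
    if (q.empty()) return nullptr;  // models "shut down, nothing left"
    std::unique_ptr<RowBatch> b = std::move(q.front());
    q.pop();
    return b;
  }
};

struct Status {
  bool ok = true;
  static Status OK() { return Status{true}; }
  static Status Cancelled() { return Status{false}; }
};

// Mirrors the behavior described above: an empty (shut-down) queue flips eos
// to true, even when the scan actually aborted with a non-OK status.
Status GetNext(BatchQueue& queue, std::unique_ptr<RowBatch>* batch, bool* eos,
               const Status& scan_status) {
  *batch = queue.BlockingGet();
  if (*batch == nullptr) {
    *eos = true;          // "Last Batch Returned" event is recorded here
    return scan_status;   // may be non-OK if a scanner thread aborted
  }
  return Status::OK();
}
```

In the bug scenario, scan_status is non-OK (cancellation) yet eos still becomes true, producing the misleading lifecycle event.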
However, the RowBatch queue can also be shut down because a scanner aborts
unexpectedly, e.g. when the fragment instance is cancelled.
Fragment instance cancellation code path:
ControlService::CancelQueryFInstances -> QueryState::Cancel ->
FragmentInstanceState::Cancel -> RuntimeState::Cancel -> sets is_cancelled_ to
true, after which ScannerContext::cancelled() returns true.
Example scanner thread cancellation code path:
HdfsScanNode::ScannerThread -> ProcessSplit -> CreateAndOpenScannerHelper ->
HdfsParquetScanner::Open -> ProcessFooter -> ScannerContext::Stream::ReadBytes
-> GetBytes -> GetBytesInternal -> GetNextBuffer, which finds
ScannerContext::cancelled() is true and returns
Status::CancelledInternal("ScannerContext").
The non-OK Status returned from CreateAndOpenScannerHelper() is then passed to
HdfsScanNode::SetError -> SetDoneInternal -> thread_state_.Shutdown(), which
shuts down the RowBatch queue.
In this case, HdfsScanNode::GetNext() returns a non-OK status but still sets
eos=true. The query profile then shows "Last Batch Returned" for this node,
which is misleading, especially if nothing was actually returned from the node.
It's even more confusing when the cancellation is triggered internally, e.g.
when the coordinator fragment reaches its limit and the coordinator cancels all
the fragment instances. The query finishes successfully, but some HdfsScanNodes
have RowsRead=0 and RowsReturned=0, which makes it look as if the
table/partitions are empty or the files are skipped by runtime filters.
The attached profile_scan_false_completion.txt is an example. In the
ExecSummary, the ScanNode on tpch_nested_parquet.customer returns just 6 rows.
{noformat}
Operator                   #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------------------------------------
F02:ROOT                        1      1   11.344us   11.344us                      4.01 MB        4.00 MB
06:EXCHANGE                     1      1    5.603us    5.603us     10          10  16.00 KB       16.00 KB  UNPARTITIONED
F01:EXCHANGE SENDER             3      3   34.231us   48.626us                     358.00 B       80.00 KB
00:UNION                        3      3    0.000ns    0.000ns     10          10  41.00 KB              0  CANCELLED
02:SUBPLAN                      3      3    0.000ns    0.000ns      0       1.50M  33.00 KB              0  CANCELLED
|--05:NESTED LOOP JOIN          3      3   57.657us   63.183us      0          10  24.00 KB        20.00 B  CROSS JOIN, CANCELLED
|  |--03:SINGULAR ROW SRC       3      3    0.000ns    0.000ns      0           1         0              0  CANCELLED
|  04:UNNEST                    3      3   10.494us   11.744us      0          10         0              0  c.c_orders arr, CANCELLED
01:SCAN HDFS                    3      3  785.984us    1.113ms      6     150.00K  16.01 MB      176.00 MB  tpch_nested_parquet.customer c
{noformat}
Here is one of the misleading node profiles:
{noformat}
HDFS_SCAN_NODE (id=1):(Total: 1.113ms, non-child: 1.113ms, % non-child: 100.00%)
  Table Name: tpch_nested_parquet.customer
  Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:2/192.55 MB
  Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0% 16:0% 17:0% 18:0% 19:0% 20:0% 21:0% 22:0% 23:0% 24:0% 25:0% 26:0% 27:0% 28:0% 29:0% 30:0% 31:0% 32:0%
  File Formats: PARQUET/Unknown(Skipped):2
  ExecOption: Codegen enabled: 2 out of 2, PARQUET Codegen Enabled
  Node Lifecycle Event Timeline: 19.806ms
     - Open Started: 18.758ms (18.758ms)
     - Open Finished: 18.780ms (22.875us)
     - First Batch Requested: 18.788ms (7.723us)
     - First Batch Returned: 19.790ms (1.001ms)
     - Last Batch Returned: 19.790ms (69.000ns)
     - Closed: 19.806ms (16.062us)
  ...
   - RowsRead: 0 (0)
   - RowsReturned: 0 (0){noformat}
Though it shows "Last Batch Returned", RowsRead and RowsReturned are both 0.
"File Formats: PARQUET/Unknown(Skipped):2" explains a little, but there is no
reason for the files to be skipped. Note that there are no runtime filters in
this case.
It makes more sense to skip the "Last Batch Returned" event in this case, which
also helps HBO skip tracking the cardinality of the scan node (IMPALA-14843).
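A hedged sketch of the proposed change (the function name below is illustrative, not Impala's actual API): gate the lifecycle event on the status GetNext() finished with, so a cancelled or failed scan no longer records a clean completion.

```cpp
#include <cassert>

// Illustrative stand-in for Impala's Status.
struct Status { bool ok; };

// Record "Last Batch Returned" only when the node both hit end-of-stream and
// finished with an OK status; skip it on cancellation/error paths.
bool ShouldRecordLastBatchReturned(bool eos, const Status& status) {
  return eos && status.ok;
}
```

With this check, an HdfsScanNode that was cancelled before returning any rows would show no "Last Batch Returned" event, and RowsRead=0/RowsReturned=0 would no longer look like a normal empty scan.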
--
This message was sent by Atlassian Jira
(v8.20.10#820010)