Riza Suminto created IMPALA-12984:
-------------------------------------
Summary: Show inactivity of data exchanges in query profile
Key: IMPALA-12984
URL: https://issues.apache.org/jira/browse/IMPALA-12984
Project: IMPALA
Issue Type: Improvement
Components: Distributed Exec
Reporter: Riza Suminto
Many-to-many data exchanges can be bottlenecked by a hotspot receiver, as in
the scenario described in IMPALA-6692, or when data spilling happens on a
subset of backends. Ideally, these occurrences should be easy to spot in the
query profile. In practice, triaging this kind of issue often requires
correlating several counters in the query profile. Here are a few ideas for
making such cases easier to identify:
# Upon query completion, let the coordinator do some profile analysis and print
a WARNING in the query profile pointing at the skew. One group of EXCHANGE
senders and receivers can only complete simultaneously, since all receivers
need to wait for the EOS signal from all senders. Say we take the max of
TotalNetworkSendTime across all senders and the max of DataWaitTime across all
receivers; a "mutual wait" time of min(TotalNetworkSendTime, DataWaitTime) can
then be used as an indicator of how long the exchanges are waiting for the
query operators above them to progress.
# Add a "Max Inactive" column to the ExecSummary table. The existing "Avg Time"
and "Max Time" columns are derived from RuntimeProfileBase::local_time_ns_. If
ExecSummary also displayed the maximum value of
RuntimeProfileBase::inactive_timer_ for each query operator as "Max Inactive",
we could compare it against "Max Time" and figure out which exchange is mostly
idle, waiting. The calculation relating local_time_ns, children_total_time,
and inactive_timer can be seen at
[https://github.com/apache/impala/blob/0721858/be/src/util/runtime-profile.cc#L935-L938]
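
As a rough illustration of the first idea, the "mutual wait" heuristic could
look like the sketch below. It is written in Python rather than Impala's C++
for brevity. ExchangeGroup, mutual_wait_ns, skew_warning, and the 1-second
threshold are hypothetical names and values chosen for illustration, not
existing Impala code. The sketch assumes the per-instance TotalNetworkSendTime
and DataWaitTime values have already been extracted from the profile in
nanoseconds.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ExchangeGroup:
    """Counters for one exchange (hypothetical container), in nanoseconds."""
    # One TotalNetworkSendTime value per sender instance.
    total_network_send_time_ns: Sequence[int]
    # One DataWaitTime value per receiver instance.
    data_wait_time_ns: Sequence[int]


def mutual_wait_ns(group: ExchangeGroup) -> int:
    """Return min(max sender TotalNetworkSendTime, max receiver DataWaitTime)."""
    if not group.total_network_send_time_ns or not group.data_wait_time_ns:
        return 0  # Nothing to compare if either side has no instances.
    max_send = max(group.total_network_send_time_ns)
    max_wait = max(group.data_wait_time_ns)
    return min(max_send, max_wait)


def skew_warning(exchange_id: int, group: ExchangeGroup,
                 threshold_ns: int = 1_000_000_000) -> Optional[str]:
    """Return a warning line if the mutual wait reaches the assumed threshold."""
    mutual = mutual_wait_ns(group)
    if mutual < threshold_ns:
        return None
    return (f"WARNING: EXCHANGE {exchange_id} senders and receivers were "
            f"mutually waiting for up to {mutual / 1e9:.2f}s")


# Example: two senders and two receivers, values in nanoseconds.
example = ExchangeGroup(
    total_network_send_time_ns=[5_000_000_000, 2_000_000_000],
    data_wait_time_ns=[1_000_000_000, 3_000_000_000],
)
print(skew_warning(5, example))
```

Here the max send time is 5s and the max wait time is 3s, so the mutual wait
is 3s. A real implementation would need to pick the threshold carefully, or
express it relative to the total query time, to avoid noisy warnings.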