[ 
https://issues.apache.org/jira/browse/IMPALA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011319#comment-18011319
 ] 

Joe McDonnell commented on IMPALA-6984:
---------------------------------------

Since this is useful for solving IMPALA-14271, we spent some time reproducing 
the issue from IMPALA-10047 and capturing minidumps while the issue is 
happening. The stacks on the coordinator fall into four groups:
 # One thread is in Coordinator::CancelBackends() calling 
Coordinator::BackendState::Cancel(), which is waiting on a 
CancelQueryFInstances RPC. This RPC is actually local to the coordinator. This 
thread holds the Coordinator::BackendState::lock_.
 # 19 threads are waiting on the mutex in Coordinator::BackendState::IsDone() 
(held by the thread in CancelBackends()). The stack is:

{noformat}
0  libpthread.so.0!__lll_lock_wait
1  libpthread.so.0!__GI___pthread_mutex_lock
2  impalad!impala::Coordinator::BackendState::IsDone()
3  impalad!impala::Coordinator::BackendState::LogFirstInProgress
4  impalad!impala::Coordinator::UpdateBackendExecStatus
5  impalad!impala::ClientRequestState::UpdateBackendExecStatus
6  impalad!impala::ControlService::ReportExecStatus{noformat}

 # 4 threads are spinning on the exec summary spinlock in 
Coordinator::BackendState::ApplyExecStatusReport() with stack:

{noformat}
0  impalad!base::internal::SpinLockDelay
1  impalad!base::SpinLock::SlowLock()
2  impalad!impala::Coordinator::BackendState::ApplyExecStatusReport
3  impalad!impala::Coordinator::UpdateBackendExecStatus
4  impalad!impala::ClientRequestState::UpdateBackendExecStatus
5  impalad!impala::ControlService::ReportExecStatus{noformat}

 # One thread is holding the exec summary spinlock while waiting on 
Coordinator::BackendState::lock_ (held by the thread in CancelBackends()) in 
Coordinator::BackendState::ApplyExecStatusReport():

{noformat}
0  libpthread.so.0!__lll_lock_wait + 0x30
1  libpthread.so.0!__GI___pthread_mutex_lock + 0xe3
2  impalad!impala::Coordinator::BackendState::ApplyExecStatusReport
3  impalad!impala::Coordinator::UpdateBackendExecStatus
4  impalad!impala::ClientRequestState::UpdateBackendExecStatus
5  impalad!impala::ControlService::ReportExecStatus{noformat}

There are 24 threads stuck in ControlService::ReportExecStatus(). That number 
is significant, because it is also the number of CPUs on the machine. By 
default, the control service sizes its thread pool to the number of CPUs. All 
24 service threads on the coordinator are occupied, so the 
CancelQueryFInstances RPC can't be processed. There is a temporary deadlock 
until the RPC times out.

The deadlock can be avoided by skipping the RPC to self. Regular executors 
wouldn't have so many service threads tied up and should always be able to 
respond to the CancelQueryFInstances RPC; the coordinator is particularly 
susceptible. There are other options as well: we could issue 
CancelQueryFInstances asynchronously, or avoid holding a lock while doing the 
RPC. LogFirstInProgress()'s use of IsDone() across so many backends could also 
be revisited.

One way to reproduce this is to run many more impalads than there are CPUs 
(the problem originally showed up on clusters with 40 executors and 16 CPUs on 
the coordinator). I can reproduce it intermittently on my development machine 
with 40 impalads running TPC-DS Q61 with mt_dop.

The other way is to artificially reduce the number of control service threads 
by setting control_service_num_svc_threads=1 (high mt_dop values help; 
mt_dop=64 can reproduce the issue with just 3 impalads).
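For a local minicluster, the second repro might look like the following (a sketch: bin/start-impala-cluster.py and its -s/--impalad_args flags are from the Impala dev environment; the flag and query option names are the ones mentioned above):

```shell
# Start 3 impalads with the control service starved down to 1 thread.
bin/start-impala-cluster.py -s 3 \
  --impalad_args="--control_service_num_svc_threads=1"

# Then, in impala-shell, run a high-degree-of-parallelism query, e.g.:
#   set mt_dop=64;
#   <run TPC-DS Q61 against a loaded TPC-DS database>
```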

> Coordinator should cancel backends when returning EOS
> -----------------------------------------------------
>
>                 Key: IMPALA-6984
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6984
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 3.0
>            Reporter: Daniel Hecht
>            Assignee: Joe McDonnell
>            Priority: Major
>              Labels: query-lifecycle
>
> Currently, the Coordinator waits for backends rather than proactively 
> cancelling them in the case of hitting EOS. There's a tangled mess that makes 
> it tricky to proactively cancel the backends, related to how 
> {{Coordinator::ComputeQuerySummary()}} works – we can't update the summary 
> until the profiles are no longer changing (which also makes sense given that 
> we want the exec summary to be consistent with the final profile).  But we 
> currently tie together the FIS status and the profile, and cancellation of 
> backends causes the FIS to return CANCELLED, which then means that the 
> remaining FIS on that backend won't produce a final profile.
> With the rework of the protocol for IMPALA-2990 we should make it possible to 
> sort this out such that a final profile can be requested regardless of how a 
> FIS ends execution.
> This also relates to IMPALA-5783.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
