Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/23229 )

Change subject: [mini-cluster] explicitly shut down Messenger
......................................................................

[mini-cluster] explicitly shut down Messenger

Since many components keep a reference to Messenger objects through
shared pointers, it's not trivial to track their lifecycle given the
resulting 'shared ownership' semantics.  Because of that, it makes
sense to explicitly shut down ExternalMinicluster's messenger in the
ExternalMiniCluster::Shutdown() method to prevent unexpected activity
while winding down the test harness.

Below is a redacted snippet from the log produced by the
MasterReplicationAndRpcSizeLimitTest.TabletReports scenario, running
master_replication-itest built with TSAN.  The TSAN detected a data
race between the main thread that was destructing static objects
in the global namespace after exiting the 'main()' function and RPC
connection negotiation task still running on a Messenger reactor thread.
Since all Kudu servers' messengers are explicitly shut down in
KuduServer::Shutdown() and ServerBase::ShutdownImpl() that must
had been called already because the main thread was out of the 'main()'
function, and MasterReplicationAndRpcSizeLimitTest::client_ KuduClient
instance isn't active in the TabletReports scenario and TestWorkload
has completed successfully with no pending RPCs running, the only
Messenger instance that might be running such an activity seemed to be
ExternalMiniCluster::messenger_.  I'm not sure what exact RPC
it was running, but I guess it might be ExternalDaemon::CheckForLeaks()
that's run only in TSAN builds.

  ==================
  WARNING: ThreadSanitizer: data race (pid=16235)
    Write of size 8 at 0x7b0800003cc8 by main thread:
      #0 operator delete(void*) .../tsan_new_delete.cpp:126 
(master_replication-itest+0x345c79)
      ...
      #9 std::__1::set<kudu::rpc::RpcFeatureFlag, ...>::~set() ... 
(libkrpc.so+0x133129)
      #10 cxa_at_exit_wrapper(void*) .../tsan_interceptors_posix.cpp:394 
(master_replication-itest+0x2c59ff)

    Previous read of size 8 at 0x7b0800003cc8 by thread T361:
      #1 std::__1::__tree_const_iterator<kudu::rpc::RpcFeatureFlag, 
...>::operator++() .../__tree:929:11 (libkrpc.so+0x133df2)
      ...
      #5 kudu::rpc::ClientNegotiation::SendNegotiate() 
.../client_negotiation.cc:327:20 (libkrpc.so+0x12b3b6)
      ...
      #9 kudu::rpc::ReactorThread::StartConnectionNegotiation()... 
.../reactor.cc:631:3 (libkrpc.so+0x19f46c)
      ...
      #17 kudu::ThreadPool::CreateThread()... .../threadpool.cc:815:48 
(libkudu_util.so+0x461921)
      ...

    As if synchronized via sleep:
      ...
      #2 kudu::SleepFor(kudu::MonoDelta const&) ...
      #3 kudu::Subprocess::KillAndWait(int) .../subprocess.cc:692:7 
(libkudu_util.so+0x434720)
      #4 kudu::clock::MiniChronyd::Stop() .../mini_chronyd.cc:251:16 
(libmini_chronyd.so+0x15170)
      #5 kudu::clock::MiniChronyd::~MiniChronyd() .../mini_chronyd.cc:166:5 
(libmini_chronyd.so+0x14e6e)
      ...
      #17 kudu::cluster::ExternalMiniCluster::~ExternalMiniCluster() 
.../external_mini_cluster.cc:173:45 (libmini_cluster.so+0x846b9)
      ...
      #22 
kudu::master::MasterReplicationAndRpcSizeLimitTest_TabletReports_Test::~MasterReplicationAndRpcSizeLimitTest_TabletReports_Test()
 .../master_replication-itest.cc:536:1 (master_replication-itest+0x351169)
      #23 testing::Test::DeleteSelf_() ...
      ...
      #31 testing::UnitTest::Run() .../gtest.cc:5444:10 
(libgtest.so.1.12.1+0x58dcc)
      #32 RUN_ALL_TESTS() .../gtest.h:2293:73 
(master_replication-itest+0x36896b)
      #33 main .../test_main.cc:156:10 (master_replication-itest+0x3675ac)

    Thread T361 'client-negotiat' (tid=22269, finished) created by thread T329 
at:
      ...
      #3 kudu::ThreadPool::CreateThread(...) ...
      #4 kudu::ThreadPool::DoSubmit(...) ...
      #5 kudu::ThreadPool::Submit(...) ...
      #6 kudu::rpc::ReactorThread::StartConnectionNegotiation() ...
      #7 kudu::rpc::ReactorThread::FindOrStartConnection(...) ...
      #8 kudu::rpc::ReactorThread::AssignOutboundCall(...) ...
      ...
      #13 kudu::rpc::ReactorThread::InvokePendingCb(...) ...
      #14 ev_run ...
      #15 ev::loop_ref::run(int) ...
      #16 kudu::rpc::ReactorThread::RunThread() .../reactor.cc:510:9 
(libkrpc.so+0x19b1b5)
      ...

  SUMMARY: ThreadSanitizer: data race .../tsan_new_delete.cpp:126 in operator 
delete(void*)
  ==================

Change-Id: I4712afe598ab63efaf56729d47a07ce4247a80db
Reviewed-on: http://gerrit.cloudera.org:8080/23229
Reviewed-by: Marton Greber <[email protected]>
Tested-by: Marton Greber <[email protected]>
---
M src/kudu/mini-cluster/external_mini_cluster.cc
1 file changed, 10 insertions(+), 0 deletions(-)

Approvals:
  Marton Greber: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/23229
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4712afe598ab63efaf56729d47a07ce4247a80db
Gerrit-Change-Number: 23229
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>

Reply via email to