vagetablechicken opened a new issue #3247: BE slow restart
URL: https://github.com/apache/incubator-doris/issues/3247
 
 
   After restarting one BE, we got a log like this:
   ```
   I0330 14:48:25.071900 119900 daemon.cpp:250] // the first line
   ...
   I0330 14:51:21.214937 119900 thrift_server.cpp:364] ThriftServer 'heartbeat' started on port: 905
   ...
   I0330 14:51:24.605083 123493 heartbeat_server.cpp:56] get heartbeat from FE.host:xx.xx.xx.xx, port:xx, cluster id:xxx, counter:1
   ```
   The restarted BE received its first heartbeat from the FE about 3 minutes 
after startup, because the heartbeat thrift server did not start until 
`14:51:21.214937`.
   The root cause is that `OLAPStatus StorageEngine::open()` took most of 
that time.
   
   Let's take a closer look.
   ```
   I0330 14:48:25.165302 119900 storage_engine.cpp:91] starting backend using uid:be4c84dd9a186a2a-9da3d1418e2c4089
   I0330 14:48:25.165720 119900 data_dir.cpp:1021] path: /xxx/be/hdd1 total capacity: 7937766936576, available capacity: 6886818156544
   I0330 14:48:25.172628 119900 data_dir.cpp:261] path: /xxx/be/hdd1, hash: -5512340829184430668
   ... // the same log of hdd2-10
   I0330 14:50:46.662498 119900 data_dir.cpp:1021] path: /xxx/be/hdd11 total capacity: 7937766936576, available capacity: 6788491837440
   I0330 14:50:46.681075 119900 data_dir.cpp:261] path: /xxx/be/hdd11, hash: 484046396608113747
   I0330 14:50:59.586606 119900 data_dir.cpp:1021] path: /xxx/be/hdd12 total capacity: 7937766936576, available capacity: 6824911462400
   I0330 14:50:59.599603 119900 data_dir.cpp:261] path: /xxx/be/hdd12, hash: 5929757045717874164
   ```
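   As the log shows, the data dirs are initialized one after another on the 
same thread (119900), and their init() accounts for most of the startup time: 
about 13 seconds pass between the first hdd11 line (14:50:46) and the first 
hdd12 line (14:50:59). Within init(), DataDir::_init_meta() is the most 
time-consuming part.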
   
https://github.com/apache/incubator-doris/blob/390f462f552fe18949ff3a7c76d41f5a1cf840ac/be/src/olap/data_dir.cpp#L108
   DataDir::_init_meta() in turn calls rocksdb::DB::Open().
   
https://github.com/apache/incubator-doris/blob/390f462f552fe18949ff3a7c76d41f5a1cf840ac/be/src/olap/olap_meta.cpp#L81
   
   The official RocksDB guidance covers this case:
   
https://github.com/facebook/rocksdb/wiki/Speed-Up-DB-Open#opening-too-many-dbs-one-by-one
   The simplest fix is to open those DBs in parallel.
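
   A minimal sketch of what "open in parallel" could look like, assuming one 
task per data dir. `open_meta` and `open_all_parallel` are hypothetical names 
used only for illustration; `open_meta` is a stand-in for 
DataDir::_init_meta() that just sleeps to simulate a slow 
rocksdb::DB::Open(). This is not the actual Doris code.

   ```cpp
   #include <cassert>
   #include <chrono>
   #include <future>
   #include <string>
   #include <thread>
   #include <vector>

   // Hypothetical stand-in for DataDir::_init_meta(). The real function opens
   // a RocksDB instance; here we only sleep to simulate a slow open.
   bool open_meta(const std::string& path, std::chrono::milliseconds cost) {
       std::this_thread::sleep_for(cost);
       return !path.empty();
   }

   // Starts one asynchronous task per data dir, then waits for all of them.
   // Returns the per-dir result in the same order as `paths`.
   std::vector<bool> open_all_parallel(const std::vector<std::string>& paths,
                                       std::chrono::milliseconds cost) {
       std::vector<std::future<bool>> tasks;
       tasks.reserve(paths.size());
       for (const auto& p : paths) {
           // std::launch::async forces each open onto its own thread.
           tasks.push_back(std::async(std::launch::async, open_meta, p, cost));
       }

       std::vector<bool> results;
       results.reserve(tasks.size());
       for (auto& t : tasks) {
           results.push_back(t.get());  // blocks until that dir has finished
       }
       assert(results.size() == paths.size());
       return results;
   }
   ```

   With this shape, total open time should be close to that of the slowest 
dir rather than the sum over all dirs, provided the disks are independent. 
Calling `open_all_parallel({"/xxx/be/hdd1", "/xxx/be/hdd2"}, 
std::chrono::milliseconds(200))` returns after roughly one simulated open, not 
two.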
