vagetablechicken opened a new issue #3247: BE slow restart
URL: https://github.com/apache/incubator-doris/issues/3247

After restarting one BE, we got logs like this:

```
I0330 14:48:25.071900 119900 daemon.cpp:250] // the first line
...
I0330 14:51:21.214937 119900 thrift_server.cpp:364] ThriftServer 'heartbeat' started on port: 905
...
I0330 14:51:24.605083 123493 heartbeat_server.cpp:56] get heartbeat from FE.host:xx.xx.xx.xx, port:xx, cluster id:xxx, counter:1
```

The restarted BE received its first heartbeat from the FE about 3 minutes after startup (14:48:25 to 14:51:24), because the heartbeat thrift server only started at `14:51:21.214937`. The root cause is that `OLAPStatus StorageEngine::open()` takes up most of that time.

Let's analyze it in more depth:

```
I0330 14:48:25.165302 119900 storage_engine.cpp:91] starting backend using uid:be4c84dd9a186a2a-9da3d1418e2c4089
I0330 14:48:25.165720 119900 data_dir.cpp:1021] path: /xxx/be/hdd1 total capacity: 7937766936576, available capacity: 6886818156544
I0330 14:48:25.172628 119900 data_dir.cpp:261] path: /xxx/be/hdd1, hash: -5512340829184430668
... // the same log of hdd2-10
I0330 14:50:46.662498 119900 data_dir.cpp:1021] path: /xxx/be/hdd11 total capacity: 7937766936576, available capacity: 6788491837440
I0330 14:50:46.681075 119900 data_dir.cpp:261] path: /xxx/be/hdd11, hash: 484046396608113747
I0330 14:50:59.586606 119900 data_dir.cpp:1021] path: /xxx/be/hdd12 total capacity: 7937766936576, available capacity: 6824911462400
I0330 14:50:59.599603 119900 data_dir.cpp:261] path: /xxx/be/hdd12, hash: 5929757045717874164
```

As the log above shows, the `init()` of the data dirs is the main cost. All dirs are initialized one after another in the same thread (119900), and each one takes roughly 13 seconds (e.g. from hdd11 at 14:50:46.68 to hdd12 at 14:50:59.59), so from hdd1 to hdd12 alone about 2.5 minutes pass. Within each dir, `DataDir::_init_meta()` is the most time-consuming portion:

https://github.com/apache/incubator-doris/blob/390f462f552fe18949ff3a7c76d41f5a1cf840ac/be/src/olap/data_dir.cpp#L108

`DataDir::_init_meta()` in turn calls `rocksdb::DB::Open()`:
https://github.com/apache/incubator-doris/blob/390f462f552fe18949ff3a7c76d41f5a1cf840ac/be/src/olap/olap_meta.cpp#L81

As the official RocksDB guidance describes (https://github.com/facebook/rocksdb/wiki/Speed-Up-DB-Open#opening-too-many-dbs-one-by-one), when many DBs are opened one by one, the simplest fix is to open them in parallel.