morningman commented on issue #3893:
URL: https://github.com/apache/incubator-doris/issues/3893#issuecomment-646393889


   The `report version` is mainly used to avoid errors in scenarios like the
   following:
   
   ```
                                             Time Line

                                                 +
                   +-------------------------+   |   +--------------------------+
                   | FE: Create Table Thread |   |   | BE: Tablet Report Thread |
                   +-------------------------+   |   +--------------------------+
                                                 |
                                                 |
                                                 |   1. Fetch all tablets that currently exist
                                                 |      on the BE (none yet) and get ready to
                                                 |      report.
                                                 |
   2. FE sends a Create Tablet Task to the BE.   |
      BE receives the task and creates Tablet X. |
                                                 |
                                                 |
   3. BE finishes the Create Tablet Task         |
      and sends a "FinishTaskReport" to FE.      |
                                                 |
                                                 |
   4. FE receives the "FinishTaskReport", and    |
      finally, Tablet X takes effect in FE's     |
      meta.                                      |
                                                 |
                                                 |    5. The tablet report arrives, which
                                                 |       contains no tablets.
                                                 |       Therefore, the FE will think that
                                                 |       Tablet X does not exist on the BE,
                                                 |       and will delete the Tablet X information
                                                 |       from the metadata, thereby causing
                                                 |       information loss.
                                                 v
   ```
   
   With the report version, this scenario is handled as follows. In step 1,
   the report thread takes report version 0 along with the tablet list.
   In steps 3 and 4, the report version of the BE is updated to 1.
   So in step 5, the FE finds that the report version is stale and
   ignores that tablet report.
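
The five steps can be replayed with a small model. This is only a sketch, not Doris code: the `Backend` and `Frontend` classes, their method names, and the dict-shaped reports are all hypothetical, and it assumes the FE learns the BE's new report version from the finish-task report.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical BE model: a tablet set plus an increasing report version."""
    tablets: set = field(default_factory=set)
    report_version: int = 0

    def snapshot_report(self):
        # Step 1: capture the current version together with the tablet list.
        return {"version": self.report_version, "tablets": set(self.tablets)}

    def create_tablet(self, tablet_id):
        # Steps 2-3: create the tablet, bump the version, return a finish report.
        self.tablets.add(tablet_id)
        self.report_version += 1
        return {"version": self.report_version, "tablet": tablet_id}


@dataclass
class Frontend:
    """Hypothetical FE model: tablet metadata plus the newest BE version seen."""
    meta: set = field(default_factory=set)
    known_be_version: int = 0

    def on_finish_task(self, finish):
        # Step 4: the tablet takes effect in FE metadata (assumed: the finish
        # report carries the BE's new report version).
        self.meta.add(finish["tablet"])
        self.known_be_version = max(self.known_be_version, finish["version"])

    def on_tablet_report(self, report):
        # Step 5: ignore reports taken before the newest version the FE knows.
        if report["version"] < self.known_be_version:
            return "ignored"
        # Otherwise trust the report and drop tablets that are missing from it.
        self.meta &= report["tablets"]
        return "applied"


be, fe = Backend(), Frontend()
stale = be.snapshot_report()               # step 1: version 0, no tablets
fe.on_finish_task(be.create_tablet("X"))   # steps 2-4: version becomes 1
result = fe.on_tablet_report(stale)        # step 5: the stale report arrives
```

Without the version check in `on_tablet_report`, the empty report would remove Tablet X from `fe.meta`, which is the information loss shown in the diagram.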
   
   
   The reason there are so many `out-of-date` reports in production
   environments is that we update the report version in some unnecessary
   places. For example, when the BE processes a publish task, we also increase
   the report version of the BE. If loads are very frequent, this results in a
   large number of `out-of-date` reports.
   
   It is not necessary to update the report version after a publish task;
   this is a leftover from history. In the reporting logic of the current
   version, we no longer decrease the version information of a replica in the
   FE metadata based on a report. So even if we receive a report that carries
   a stale replica version, it does not matter.
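
The "never decrease" rule can be illustrated with a one-line merge. This is a minimal sketch under the assumption that replica versions are plain integers; `merge_reported_version` is a hypothetical helper, not a function in the FE code.

```python
def merge_reported_version(meta_version: int, reported_version: int) -> int:
    """Return the replica version to keep in FE metadata (hypothetical helper)."""
    # A report may raise the recorded version but never lower it.
    return max(meta_version, reported_version)


replica_version = 5
# A report taken before the latest publish still carries the older version.
replica_version = merge_reported_version(replica_version, 3)
after_stale = replica_version
# A newer report moves the recorded version forward as usual.
replica_version = merge_reported_version(replica_version, 6)
after_fresh = replica_version
```

Because a stale value cannot lower the recorded version, a report taken just before a publish is harmless, so bumping the report version for publish tasks only produces extra discarded reports.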
   
   In our test environment (8 BEs), `out-of-date` reports occurred on average
   60-80 times per hour before the upgrade and 1-2 times per hour after it.
   At that rate they no longer affect the cluster.




