weizuo93 opened a new issue, #10720:
URL: https://github.com/apache/doris/issues/10720

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Version
   
   trunk version. commit id : a5efda68829c0873800b62d7e2e2c3b1807d1734
   
   ### What's Wrong?
   
   During a BE restart, we mistakenly replaced the original disk on the BE 
node with a new disk. After discovering the mistake, we added the original disk 
back. Once the cluster had reported no unhealthy replicas for a period of time, 
we removed the wrong disk. Queries then failed with error code `-3109`, which 
means `failed to open segment`. We found that the segment files for some 
tablets on this BE node had been removed as trash, while the tablet meta on the 
original disk was still normal. FE cannot detect or repair these abnormal 
tablets.
   
   ### What You Expected?
   
   Tablet metadata should be consistent with the data files of a tablet. When 
segment files are removed as trash, the tablet should be dropped on the BE node 
so that the FE node can detect and repair the bad replica.
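
The expected check can be illustrated with a minimal sketch, written in Python for brevity rather than as BE code. Everything here is an assumption made for illustration: the `TabletMeta` class, the helper names, and the `<rowset_id>_<segment_index>.dat` file naming are hypothetical and are not taken from the Doris source. The idea is only that a tablet whose metadata references files that are absent from disk should be treated as unservable when it is loaded.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class TabletMeta:
    """Hypothetical, simplified tablet metadata: rowset id -> segment count."""
    tablet_id: int
    rowsets: dict  # e.g. {"rs1": 2} means rs1_0.dat and rs1_1.dat are expected


def missing_segments(meta: TabletMeta, tablet_dir: Path) -> list:
    """Return the segment file names the metadata expects but the disk lacks."""
    missing = []
    for rowset_id, num_segments in sorted(meta.rowsets.items()):
        for seg in range(num_segments):
            name = f"{rowset_id}_{seg}.dat"  # assumed naming scheme
            if not (tablet_dir / name).is_file():
                missing.append(name)
    return missing


def should_drop_on_load(meta: TabletMeta, tablet_dir: Path) -> bool:
    """A tablet whose metadata references absent files is not servable."""
    return bool(missing_segments(meta, tablet_dir))


def demo() -> tuple:
    """Build a tablet directory with one of two expected segments present."""
    meta = TabletMeta(tablet_id=10001, rowsets={"rs1": 2})
    with tempfile.TemporaryDirectory() as tmp:
        tablet_dir = Path(tmp)
        (tablet_dir / "rs1_0.dat").write_bytes(b"")  # rs1_1.dat is absent
        return missing_segments(meta, tablet_dir), should_drop_on_load(meta, tablet_dir)
```

In a real fix the BE would presumably drop such a tablet or report it as bad, so that the normal replica repair path on FE can re-clone it. How that report is made is outside the scope of this sketch.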
   
   ### How to Reproduce?
   
   Cluster: 1 FE + 3 BE (BE01, BE02 and BE03; BE01 has a single disk called 
`disk-1`).
   
   STEP 1: create a table on the cluster and ensure there are 3 replicas of 
each tablet.
   
   STEP 2: insert data into the table.
   
   STEP 3: remove `disk-1` from BE01 and add a new disk called `disk-2`, then 
restart BE01.
               When the daemon starts, there are no replicas on BE01 because 
its only disk, `disk-2`, is empty, so the replica repair task clones replicas 
to `disk-2` on BE01.
   
   STEP 4: once the cluster has reported no unhealthy replicas for a period of 
time, add `disk-1` back and restart BE01 (it now has two disks, `disk-1` and 
`disk-2`).
               When the daemon starts, data on different disks is loaded in 
parallel. Tablets on `disk-2` are loaded, and tablets on `disk-1` that have the 
same id as a tablet on `disk-2` are not loaded. Specifically, if a tablet on 
`disk-1` is loaded after the tablet with the same id on `disk-2` has already 
been loaded successfully, the `disk-1` tablet fails to load and its segment 
files are removed as trash, but its metadata stays normal.
   
   STEP 5: remove `disk-2` from BE01, keep `disk-1`, then restart BE01.
              When the daemon starts, the tablets on `disk-1` are loaded. These 
tablets have normal metadata but no segment files.
   
   STEP 6: query the table. If the query hits one of these replicas on BE01, 
an exception occurs with error code `-3109`, which means `failed to open 
segment`. FE cannot detect or repair these abnormal tablets.
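
The sequence in STEP 3 to STEP 6 can be modelled with a small, deterministic sketch. It is a toy model and not BE code: the `Disk` class and the `restart` function are hypothetical, and iterating over disks in list order is an assumption that stands in for "whichever disk registers a tablet id first wins" during the real parallel load.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical model of one BE data directory."""
    name: str
    meta: set = field(default_factory=set)      # tablet ids with metadata
    segments: set = field(default_factory=set)  # tablet ids with segment files
    trash: set = field(default_factory=set)     # tablet ids whose files were trashed


def restart(disks: list) -> dict:
    """Load tablets disk by disk; return tablet id -> name of the serving disk.

    The real BE loads disks in parallel. List order is used here only to make
    the outcome of that race explicit and repeatable.
    """
    loaded = {}
    for disk in disks:
        for tablet_id in sorted(disk.meta):
            if tablet_id in loaded:
                # Duplicate id: segment files go to trash, metadata is kept.
                if tablet_id in disk.segments:
                    disk.segments.discard(tablet_id)
                    disk.trash.add(tablet_id)
            else:
                loaded[tablet_id] = disk.name
    return loaded


def reproduce() -> tuple:
    disk1 = Disk("disk-1", meta={1, 2}, segments={1, 2})  # original replicas
    disk2 = Disk("disk-2", meta={1, 2}, segments={1, 2})  # clones from STEP 3
    restart([disk2, disk1])    # STEP 4: tablets on disk-2 are registered first
    loaded = restart([disk1])  # STEP 5: only disk-1 remains
    # STEP 6: loaded tablets whose segment files are gone fail at query time.
    broken = sorted(t for t in loaded if t not in disk1.segments)
    return loaded, broken
```

In this model, `reproduce()` ends with both tablets loaded from `disk-1` and both listed as broken, which matches the state described in STEP 5: the metadata is present, the segment files are in trash, and nothing marks the replica as unhealthy.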
   
   
   
   ### Anything Else?
   
   NO.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

