liutang123 opened a new issue #6015:
URL: https://github.com/apache/incubator-doris/issues/6015


   **Describe the bug**
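   We have a table with many replicas.
   A schema change job can fail to create its new tablets, because the balancer may delete a replica of the base tablet after the job has already fixed the locations of the new replicas.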
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   1. Create a schema change job:
    ```
   ALTER TABLE test_db.test_table   ADD COLUMN name varchar(100) comment 'xxx'
   ```
   2. FE creates the schema change job:
   ```
   2021-06-11 14:17:57,739 INFO (thrift-server-pool-150|359) [SchemaChangeHandler.createJob():1383] finished to create schema change job: 64494481
   ```
   This step generates the `partitionIndexTabletMap` of `SchemaChangeJobV2`, so the locations of the new tablet replicas are fixed from this point on.
   3. The job waits for the table to become stable:
   ```
   2021-06-11 14:18:12,816 INFO (schema change|25) [OlapTable.isStable():1391] table 23196651 is not stable because tablet 60422768 status is REDUNDANT. replicas: [[replicaId=60422770, BackendId=10003], [replicaId=60422771, BackendId=10006], [replicaId=62269543, BackendId=61958307, version=2], [replicaId=64494415, BackendId=61958301]]
   ```
   Tablet `60422768` is REDUNDANT.
   4. TabletScheduler removes the replica of tablet 60422768 on backend 10006 from FE meta:
   ```
   2021-06-11 14:18:26,495 INFO (tablet scheduler|38) [TabletScheduler.deleteReplicaInternal():982] delete replica. tablet id: 60422768, backend id: 10006. reason: DECOMMISSION state, force: false
   ```
   
   5. When backend 10006 next reports its tablets, FE deletes the replica from the backend because it is no longer in FE meta:
   ```
   2021-06-11 14:19:12,757 WARN (Thread-33|79) [ReportHandler.deleteFromBackend():677] failed add to meta. tablet[60422768], backend[10006]. errCode = 2, detailMessage = replica is enough[3-3]
   2021-06-11 14:19:12,757 WARN (Thread-33|79) [ReportHandler.deleteFromBackend():690] delete tablet[60422768 - 118915135] from backend[10006] because not found in meta
   ```
   
   6. The table is now stable, so the job starts creating tablets:
   ```
   2021-06-11 14:20:12,947 INFO (schema change|25) [AlterJobV2.checkTableStable():209] table 23196651 is stable, start SCHEMA_CHANGE job {}
   ```
   
   7. BE fails to create the new tablet because it cannot find base tablet 60422768:
   ```
   W0611 14:20:46.569319 425891 tablet_manager.cpp:244] fail to create tablet(change schema), base tablet does not exist. new_tablet_id=64530888, new_schema_hash=1683434764, base_tablet_id=60422768, base_schema_hash=118915135
   ```
   
   8. The schema change fails:
   ```
   2021-06-11 14:20:46,628 WARN (schema change|25) [SchemaChangeJobV2.runPendingJob():309] failed to create replicas for job: 64494481, 10006: []
   ```
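
The ordering problem in steps 2 to 8 can be sketched with a small toy model. This is not Doris code: `Tablet`, `SchemaChangeJob`, `create_job` and `run_pending_job` are hypothetical names used only for illustration. The backend IDs are taken from the log in step 3, and the sketch assumes that the new replicas are planned on the same backends as the base tablet's replicas.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tablet:
    """Toy stand-in for FE meta: the backends holding a replica of a base tablet."""
    tablet_id: int
    backends: set[int] = field(default_factory=set)


@dataclass
class SchemaChangeJob:
    """Toy job that freezes its target backends at creation time (hypothetical)."""
    base_tablet_id: int
    planned_backends: frozenset[int]


def create_job(tablet: Tablet) -> SchemaChangeJob:
    # Step 2: the locations of the new replicas are fixed here (assumption:
    # one new replica per backend that currently holds the base tablet).
    return SchemaChangeJob(tablet.tablet_id, frozenset(tablet.backends))


def run_pending_job(job: SchemaChangeJob, tablet: Tablet) -> list[int]:
    """Return the backends where creation fails because the base tablet is gone."""
    return sorted(b for b in job.planned_backends if b not in tablet.backends)


# Four replicas, as listed in the step 3 log; one of them is redundant.
base = Tablet(60422768, {10003, 10006, 61958307, 61958301})

job = create_job(base)               # step 2: plan is frozen
base.backends.discard(10006)         # steps 4-5: redundant replica is removed
failed = run_pending_job(job, base)  # steps 6-8: job runs with the stale plan

print(failed)  # [10006]
```

In this model the job fails on backend 10006 only because the plan was captured before the replica was removed, which matches the order of events in the logs above.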
   
   **Expected behavior**
   