bobhan1 opened a new pull request, #63960:
URL: https://github.com/apache/doris/pull/63960

   ## Proposed changes
   
   This PR fixes the remaining merge-on-write (MOW) schema-change delete-bitmap 
path after #62256.
   
   #62256, whose master commit is `dd59f479af5a855401e3f862c751e8416070a1e2`, 
fixed the final schema-change commit path by deleting local rowsets in `[2, 
alter_version]` before adding the schema-change output rowsets to the real new 
tablet. That keeps the committed tablet rowset graph aligned with the Meta 
Service result.
   
   However, the delete-bitmap recompute path still builds and uses a temporary 
tablet in `CloudSchemaChangeJob::_process_delete_bitmap()`. That temporary 
tablet is initialized with the schema-change output rowsets, but after each 
`sync_tablet_rowsets(tmp_tablet)` it can again contain non-schema-change local 
rowsets in `[2, alter_version]`, such as double-write rowsets or compaction 
output rowsets.
   
   If the temporary tablet graph contains both:
   
   - schema-change output rowsets, for example `[2]`, `[3]`, ...
   - a wider local/compaction rowset, for example `[2-3]`
   
   then `capture_consistent_rowsets()` can choose the wider non-schema-change 
rowset from the temporary graph instead of the schema-change output rowsets. 
The delete bitmap is then recomputed against a rowset path that is not the one 
finally committed for the schema-changed tablet. A later MOW compaction may 
observe delete-bitmap coverage inconsistent with the visible rowset graph and 
fail row-count/delete-bitmap correctness checks.
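
The selection problem above can be reproduced with a toy version graph. The sketch below is not Doris code: `Version` and `capture_path` are hypothetical names, and the greedy "take the widest edge" policy is an assumption made for illustration of how a wider rowset can win over the per-version schema-change output rowsets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: a rowset covers the closed version range [start, end].
struct Version {
    int64_t start;
    int64_t end;
};

// Hypothetical greedy capture: at each position, take the edge that starts
// there and reaches furthest without passing `to`. The preference for wide
// edges is an assumption for illustration, not Doris's actual algorithm.
// Returns an empty path if [from, to] cannot be covered.
std::vector<Version> capture_path(const std::vector<Version>& graph,
                                  int64_t from, int64_t to) {
    std::vector<Version> path;
    int64_t next = from;
    while (next <= to) {
        const Version* best = nullptr;
        for (const Version& v : graph) {
            if (v.start == next && v.end <= to &&
                (best == nullptr || v.end > best->end)) {
                best = &v;
            }
        }
        if (best == nullptr) {
            return {};  // gap in the graph: no consistent path
        }
        path.push_back(*best);
        next = best->end + 1;
    }
    return path;
}
```

Under this assumed policy, a graph holding `[2]`, `[3]`, and `[2-3]` yields the single edge `[2-3]` for the range `[2, 3]`, while a graph holding only `[2]` and `[3]` yields both schema-change output rowsets.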
   
   The fix is to normalize the temporary tablet rowset graph immediately after 
every `sync_tablet_rowsets(tmp_tablet)` and before capturing rowsets for 
delete-bitmap recomputation.
   
   Concretely this PR:
   
   - extracts `CloudTablet::replace_rowsets_with_schema_change_output()`;
   - removes non-schema-change local rowsets in `[2, alter_version]` from both 
`_rs_version_map` and the version graph before adding schema-change output 
rowsets;
   - reuses the helper in the real schema-change commit path;
   - calls the same helper after both tmp-tablet syncs in 
`_process_delete_bitmap()`;
   - keeps cache/delete-bitmap cleanup only for the real tablet, while the 
temporary tablet only normalizes its local graph;
   - adds a unit test that simulates a polluted tmp graph with `[2]`, `[3]`, 
and a stale compaction rowset `[2-3]`.
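
The normalization step listed above can be sketched roughly as follows. This is a simplified model, not the code in this PR: `ToyTablet` and `replace_with_sc_output` are hypothetical names, a single `std::map` stands in for both `_rs_version_map` and the version graph, and the sketch clears every rowset inside `[2, alter_version]` before installing the output rowsets, which gives the same final graph as removing only the non-schema-change ones.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using Version = std::pair<int64_t, int64_t>;  // closed range [first, second]

// Toy stand-in for a tablet's local rowset bookkeeping (hypothetical type).
struct ToyTablet {
    std::map<Version, std::string> rowsets;  // version range -> rowset label
};

// Hypothetical normalization: drop every local rowset that lies fully inside
// [2, alter_version], then install the schema-change output rowsets.
void replace_with_sc_output(ToyTablet& tablet, int64_t alter_version,
                            const std::map<Version, std::string>& sc_output) {
    for (auto it = tablet.rowsets.begin(); it != tablet.rowsets.end();) {
        const Version& v = it->first;
        if (v.first >= 2 && v.second <= alter_version) {
            it = tablet.rowsets.erase(it);  // e.g. a stale compaction [2-3]
        } else {
            ++it;  // keep [0-1] and anything newer than alter_version
        }
    }
    for (const auto& kv : sc_output) {
        tablet.rowsets[kv.first] = kv.second;
    }
}
```

Calling such a helper after each sync of the temporary tablet would leave only the schema-change output rowsets for the historical range, so a later path capture cannot pick up a stale `[2-3]` rowset.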
   
   ## Root cause
   
   #62256 fixed the final commit graph but not the earlier delete-bitmap 
recompute graph.
   
   The final tablet graph and the temporary delete-bitmap tablet graph must use 
the same schema-change output rowset path for historical versions. Otherwise 
delete bitmap recomputation may be based on a different rowset path from the 
one that becomes visible after schema change.
   
   This is why the issue can surface in a compaction after schema change has 
finished: the compaction output itself does not need to contain duplicate rows. 
The failure comes from delete bitmap state being recomputed from a polluted 
temporary rowset graph and later being applied to the committed schema-change 
graph.
   
   ## Testing
   
   ```
   ./run-be-ut.sh --run --filter=CloudTabletDeleteRowsetsForSchemaChangeTest.* -j100
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

