Yukang-Lian opened a new issue, #40879:
URL: https://github.com/apache/doris/issues/40879

   ### Search before asking
   
   - [X] I have searched the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Description
   
   When rowsets are merged during partial column updates on MOW 
(merge-on-write) tables, `total_disk_size` is not updated. As a result, the 
data size seen by FE (as observed through MySQL's `SHOW DATA`) differs from 
the size seen by BE (as observed through `curl compaction_status`). The 
reported sizes are therefore incorrect: FE reports more data than it should, 
and BE reports less. The issue affects MOW tables with partial column updates 
in both compute-storage separated and compute-storage integrated deployments. 
It is particularly significant in cloud environments where user billing is 
involved.
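
   The failure mode described above can be illustrated with a minimal Python 
sketch. It is not Doris code: `RowsetMeta`, `merge_without_total`, and 
`merge_with_total` are hypothetical names, and the relationship 
`total_disk_size = data_disk_size + index_disk_size` is an assumption made 
only for the illustration.

```python
from dataclasses import dataclass


@dataclass
class RowsetMeta:
    """Hypothetical stand-in for a rowset's size metadata (not the Doris class)."""
    num_rows: int
    data_disk_size: int   # bytes used by data
    index_disk_size: int  # bytes used by indexes
    total_disk_size: int  # assumed here to equal data + index


def merge_without_total(dst: RowsetMeta, src: RowsetMeta) -> None:
    """Merge src into dst but skip total_disk_size (the reported symptom)."""
    dst.num_rows += src.num_rows
    dst.data_disk_size += src.data_disk_size
    dst.index_disk_size += src.index_disk_size
    # total_disk_size is deliberately left untouched, so it goes stale.


def merge_with_total(dst: RowsetMeta, src: RowsetMeta) -> None:
    """Same merge, but total_disk_size is kept in step with its parts."""
    merge_without_total(dst, src)
    dst.total_disk_size += src.total_disk_size


if __name__ == "__main__":
    stale = RowsetMeta(100, 4096, 512, 4608)
    merge_without_total(stale, RowsetMeta(10, 1024, 128, 1152))
    # A reader of total_disk_size and a reader of data + index now disagree.
    print(stale.total_disk_size, stale.data_disk_size + stale.index_disk_size)
```

   In this model, any component that reads `total_disk_size` and any component 
that sums the individual sizes will report different values after the first 
merge, which is the kind of divergence described in this issue.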
   
   ### Solution
   
   1. **Update `total_disk_size` during partial column updates in the merge 
rowset logic** so that the data seen by BE and FE is consistent. The sizes may 
still not be entirely correct after this step, but the testing team can start 
validating the changes. The relevant commits can be found 
[here](https://github.com/Yukang-Lian/selectdb-core/commits/Fix-Mow-Table-Size/).
   
   2. **Clarify the relationships between `total_disk_size`, `data_disk_size`, 
and `index_disk_size`.** Review all code logic involving these sizes to ensure 
data accuracy.
   
   3. **Modify the compaction logic** so that the rowset output size after each 
compaction is accurate. Update this data on both FE and BE sides, gradually 
correcting the previously erroneous data to achieve data convergence.
   
   4. **Add a tool to verify table size data.** This tool should read the 
rowset meta PB (ensuring accuracy) and update the data on both the FE and BE 
sides.
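
   A rough sketch of what the checking part of such a tool might look like is 
given below. It is only an illustration under stated assumptions: 
`RowsetSize`, `expected_table_size`, `find_inconsistent_rowsets`, and 
`check_reported_sizes` are hypothetical names, the rowset sizes are assumed 
to have already been read from the rowset meta PB, and 
`total_disk_size = data_disk_size + index_disk_size` is assumed as the 
relationship that step 2 would need to confirm.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RowsetSize:
    """Hypothetical view of the size fields read from one rowset meta."""
    data_disk_size: int
    index_disk_size: int
    total_disk_size: int


def expected_table_size(rowsets: Iterable[RowsetSize]) -> int:
    """Sum data + index over all rowsets (assumed definition of table size)."""
    return sum(r.data_disk_size + r.index_disk_size for r in rowsets)


def find_inconsistent_rowsets(rowsets: Iterable[RowsetSize]) -> List[int]:
    """Return positions of rowsets whose stored total differs from data + index."""
    return [
        i
        for i, r in enumerate(rowsets)
        if r.total_disk_size != r.data_disk_size + r.index_disk_size
    ]


def check_reported_sizes(
    rowsets: Iterable[RowsetSize], reported: Dict[str, int]
) -> Dict[str, int]:
    """Return reported-minus-expected drift for each side that disagrees."""
    expected = expected_table_size(list(rowsets))
    return {
        side: size - expected
        for side, size in reported.items()
        if size != expected
    }
```

   For example, calling `check_reported_sizes(rowsets, {"FE": fe_size, "BE": 
be_size})` would return only the sides whose reported size differs from the 
value derived from rowset metadata. Writing the corrected values back to FE 
and BE is left out of the sketch.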
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

