bobhan1 opened a new pull request, #22270: URL: https://github.com/apache/doris/pull/22270
## Proposed changes

Currently, when Doris executes an `insert into select` statement converted from a delete statement, it reads the values of all non-key columns from the previous rows with the same key in the storage layer. This is costly and serves no purpose.

For example, for a table with the following schema

```
CREATE TABLE test (
    `k1` int NOT NULL,
    `c1` int,
    `c2` int,
    `c3` int,
    `c4` int)
UNIQUE KEY(`k1`)
DISTRIBUTED BY HASH(`k1`) BUCKETS 1
PROPERTIES("enable_unique_key_merge_on_write" = "true")
```

and data

```
k1,c1,c2,c3,c4
1,1,1,1,1
2,2,2,2,2
3,3,3,3,3
4,4,4,4,4
5,5,5,5,5
```

after executing delete statements (which are transformed into `insert into select` statements) that delete the rows with k1=1,2,3, the data, including the delete-sign column, looks like

```
k1,c1,c2,c3,c4,__DORIS_DELETE_SIGN__
1,1,1,1,1,0
2,2,2,2,2,0
3,3,3,3,3,0
4,4,4,4,4,0
5,5,5,5,5,0
1,1,1,1,1,1
2,2,2,2,2,1
3,3,3,3,3,1
```

It is evident that this reads a lot of useless data. This PR therefore eliminates the read described above and fills the non-key columns with default or null values when inserting rows marked with the delete sign, since the values in these columns are useless and will never be read.
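The idea under "Proposed changes" can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in this PR: `build_delete_rows`, `visible_rows`, and `VALUE_DEFAULTS` are hypothetical names, rows are plain dictionaries, and every non-key column is assumed to default to NULL (modelled as `None`). It shows that a delete-marked row can be built from the key alone, without looking up the previous row, and that the visible result is unchanged.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"

# Assumption for this sketch: all non-key columns default to NULL (None).
VALUE_DEFAULTS = {"c1": None, "c2": None, "c3": None, "c4": None}


def build_delete_rows(keys):
    """Build delete-marked rows from keys only, with no lookup of old values."""
    rows = []
    for key in keys:
        row = {"k1": key}
        row.update(VALUE_DEFAULTS)  # fill non-key columns with defaults
        row[DELETE_SIGN] = 1        # mark the row as deleted
        rows.append(row)
    return rows


def visible_rows(stored):
    """Keep the latest row per key, then drop rows marked as deleted."""
    latest = {}
    for row in stored:  # later rows override earlier ones with the same key
        latest[row["k1"]] = row
    return [r for r in latest.values() if r[DELETE_SIGN] == 0]


# The five rows from the example, all with delete sign 0.
existing = [
    {"k1": k, "c1": k, "c2": k, "c3": k, "c4": k, DELETE_SIGN: 0}
    for k in range(1, 6)
]

# Delete k1 = 1, 2, 3 without reading c1..c4 from the existing rows.
stored = existing + build_delete_rows([1, 2, 3])
remaining_keys = [r["k1"] for r in visible_rows(stored)]
assert remaining_keys == [4, 5]
```

In this model the values in `c1` to `c4` of a delete-marked row never reach the reader, which is why filling them with defaults does not change what queries return.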