Youngwb opened a new issue #3930: URL: https://github.com/apache/incubator-doris/issues/3930
### Background

Doris currently uses REPLACE to update data, but the replacement order cannot be guaranteed for rows imported in the same batch. To get a deterministic result, the user has to ensure that no two rows in the same batch share the same key columns, which is very inconvenient. To solve this problem, we can use a **version** column to specify the replacement order.

### Goal

The user specifies a **version column** when creating the table. Doris relies on this column to update data of REPLACE type: a row with a larger version value can REPLACE a row with a smaller version value, while a row with a smaller version value cannot REPLACE a row with a larger one.

### Create Table Interface

```
CREATE TABLE `test` (
    `id` bigint(20) NOT NULL,
    `date` date NOT NULL,
    `group_id` bigint(20) NOT NULL,
    `version` int MAX NOT NULL,
    `keyword` varchar(128) REPLACE NOT NULL,
    `clicks` bigint(20) SUM NULL DEFAULT "0",
    `cost` bigint(20) SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(`id`, `date`, `group_id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 16
PROPERTIES (
    "replace_version_column" = "version"
);
```

When creating a table, the user simply adds the **replace_version_column** property in PROPERTIES to identify the version column. The version column must use the MAX aggregation type, so that only the largest version is retained for each key.

### Query

When a user's query does not contain a REPLACE column, the original logic applies. When a user's query contains REPLACE columns, BE needs to add the version column on which the REPLACE columns depend to the columns it reads, and compare the version values when the REPLACE columns are aggregated.
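
The version comparison during aggregation can be sketched as follows. This is only an illustrative model in Python, not Doris code: the `Row` type and the `aggregate` function are hypothetical, the columns follow the `test` table above, and keeping the existing value when two versions are equal is an assumption that the proposal does not specify.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: tuple     # (id, date, group_id)
    version: int   # MAX column named by replace_version_column
    keyword: str   # REPLACE column
    clicks: int    # SUM column


def aggregate(rows):
    """Merge rows that share a key, using version to order REPLACE."""
    merged = {}
    for row in rows:
        current = merged.get(row.key)
        if current is None:
            # First row for this key: store a copy so inputs stay untouched.
            merged[row.key] = Row(row.key, row.version, row.keyword, row.clicks)
            continue
        # REPLACE only when the incoming version is strictly larger
        # (assumption: on a tie the existing value is kept).
        if row.version > current.version:
            current.keyword = row.keyword
        # MAX and SUM columns aggregate as usual.
        current.version = max(current.version, row.version)
        current.clicks += row.clicks
    return list(merged.values())
```

With this rule, two rows for the same key yield the keyword of the larger version regardless of the order in which they arrive.
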
These operations can be done by extending the **Reader return columns**. In FE, **isPreAggregation** is OFF because the REPLACE column is a value column in the StorageEngine, which means the storage engine must aggregate the data before returning it to the scan node. This guarantees that rows with the same key columns are aggregated in Reader.

### Compaction

Base and Cumulative Compaction use Reader to aggregate data, and they use all tablet columns as return columns. So, as in query processing, Reader can perform the replacement based on the version column.

### Load

Within one load batch, Doris uses one or more **MemTable** instances. We need to ensure that, for rows with the same key columns in one MemTable, columns of REPLACE type are replaced according to the version column. The replacement order across different MemTables does not need to be guaranteed at load time, because Query and Compaction guarantee it.

### RollUp

If a rollup contains a column of REPLACE type, either the user must add the replace version column to the rollup, or Doris extends the rollup with that column automatically.
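
The automatic column extension mentioned under RollUp, which is the same idea as extending the Reader return columns for queries, could look roughly like this. It is a minimal Python sketch, not Doris code: `extend_with_version_column`, the `schema` mapping from column name to aggregation type, and the choice to append the version column at the end are all hypothetical.

```python
def extend_with_version_column(columns, schema, version_column):
    """Return the column list, plus the version column if a REPLACE column is present.

    columns: column names requested by a query or listed in a rollup.
    schema: maps column name to aggregation type (None for key columns).
    version_column: the name given by replace_version_column.
    """
    has_replace = any(schema.get(name) == "REPLACE" for name in columns)
    if has_replace and version_column not in columns:
        # The REPLACE column cannot be ordered without its version column.
        return list(columns) + [version_column]
    return list(columns)
```

A column list with no REPLACE column is returned unchanged, which matches the statement that such queries follow the original logic.
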