Youngwb opened a new issue #3930:
URL: https://github.com/apache/incubator-doris/issues/3930
### Background
Doris currently uses REPLACE to update data, but the replacement order is not
guaranteed for rows imported in the same batch. To get a deterministic result,
the user must ensure that no two rows in a batch share the same key columns,
which is very inconvenient. To solve this problem, we can use a **version**
column to specify the replacement order.
### Goal
The user specifies a **version column** when creating the table. Doris
relies on this column when updating REPLACE-type columns: a row with a larger
version value can replace the data of a row with a smaller version value, but
a row with a smaller version value cannot replace the data of a row with a
larger one.
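The rule above can be sketched in a few lines of Python. This is only an illustration of the intended semantics, not Doris code: the `Row` type and the `aggregate` function are hypothetical, and the tie-breaking behaviour (equal versions fall back to "newer arrival wins") is an assumption, since the proposal does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: tuple      # (id, date, group_id)
    version: int    # version column, aggregated with MAX
    keyword: str    # REPLACE column
    clicks: int     # SUM column


def aggregate(old: Row, new: Row) -> Row:
    """Merge two rows that share the same key under the proposed rule."""
    assert old.key == new.key
    # Assumption: on a version tie the newer arrival wins (plain REPLACE).
    winner = new if new.version >= old.version else old
    return Row(
        key=old.key,
        version=max(old.version, new.version),  # MAX aggregation
        keyword=winner.keyword,                 # version-guarded REPLACE
        clicks=old.clicks + new.clicks,         # SUM is unaffected
    )
```

With distinct versions, the REPLACE value no longer depends on arrival order: `aggregate(a, b)` and `aggregate(b, a)` keep the keyword of whichever row has the larger version.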
### Create Table Interface
```
CREATE TABLE `test` (
`id` bigint(20) NOT NULL,
`date` date NOT NULL,
`group_id` bigint(20) NOT NULL,
`version` int MAX NOT NULL,
`keyword` varchar(128) REPLACE NOT NULL,
`clicks` bigint(20) SUM NULL DEFAULT "0" ,
`cost` bigint(20) SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(`id`, `date`, `group_id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 16
PROPERTIES (
"replace_version_column" = "version"
);
```
When creating a table, the user simply adds the **replace_version_column**
property in PROPERTIES to identify the version column. The version column must
use the MAX aggregation type, so that only the largest version value is
retained for rows with the same key columns.
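As a rough sketch of the check this implies at table-creation time, assuming the schema is available as a mapping from column name to aggregation type. The function `validate_version_column` and its error messages are hypothetical and do not correspond to existing FE code.

```python
def validate_version_column(columns, properties):
    """Return the version column name, or None if the feature is not used.

    columns: column name -> aggregation type (None for key columns).
    properties: the PROPERTIES map from CREATE TABLE.
    """
    name = properties.get("replace_version_column")
    if name is None:
        return None  # property not set: plain REPLACE semantics
    if name not in columns:
        raise ValueError(f"version column {name!r} is not defined")
    if columns[name] != "MAX":
        raise ValueError(f"version column {name!r} must use MAX aggregation")
    return name
```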
### Query
When a user's query does not contain a REPLACE column, the original logic is
followed. When a query contains REPLACE columns, BE needs to add the version
column that the REPLACE columns depend on to the columns it reads, and compare
the version values when aggregating the REPLACE columns. This can be done by
extending the **Reader return columns**. In FE, **isPreAggregation** is OFF
because the REPLACE column is a value column in StorageEngine, which means the
storage engine must aggregate the data before returning it to the scan node.
This guarantees that rows with the same key columns are aggregated in Reader.
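A minimal sketch of the "extend return columns" step, assuming the schema is a mapping from column name to aggregation type. The function `extend_return_columns` and the `SCHEMA` dictionary are illustrative names, not part of the BE code.

```python
def extend_return_columns(requested, schema, version_column):
    """Return the column list the reader should actually fetch.

    schema maps column name -> aggregation type (None for key columns).
    """
    columns = list(requested)
    # The version column is only needed when a REPLACE column is read.
    needs_version = any(schema[c] == "REPLACE" for c in requested)
    if needs_version and version_column not in columns:
        columns.append(version_column)
    return columns


SCHEMA = {
    "id": None, "date": None, "group_id": None,
    "version": "MAX", "keyword": "REPLACE", "clicks": "SUM", "cost": "SUM",
}
```

A query on `id, keyword` would then read `id, keyword, version`, while a query on `id, clicks` is left unchanged.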
### Compaction
Base and Cumulative Compaction use Reader to aggregate data, and they use all
tablet columns as return columns. So, as with query processing, Reader can
perform the replacement based on the version column.
### Load
For a single batch of loaded data, Doris uses one or more **MemTable**s.
Within one MemTable, we need to ensure that for rows with the same key
columns, REPLACE-type columns are replaced according to the version column.
The replacement order across different MemTables does not need to be
guaranteed during load, because Query and Compaction guarantee it.
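The division of work between load and read can be illustrated with a toy model. The `MemTable` class and `read_merge` function below are simplified stand-ins written for this example (one key, one REPLACE column), not the real BE classes, and the tie-breaking rule is an assumption.

```python
class MemTable:
    """Toy in-memory table: key -> (version, keyword)."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, version, keyword):
        current = self.rows.get(key)
        # Keep the REPLACE value that carries the larger version.
        # Assumption: on a tie, the later insert wins.
        if current is None or version >= current[0]:
            self.rows[key] = (version, keyword)

    def flush(self):
        return dict(self.rows)


def read_merge(*flushed):
    """Stand-in for Reader: merge flushed MemTables with the same rule."""
    merged = MemTable()
    for rows in flushed:
        for key, (version, keyword) in rows.items():
            merged.insert(key, version, keyword)
    return merged.flush()
```

Because each MemTable has already resolved its own duplicates, and the read-time merge applies the same comparison, the result for distinct versions is the same regardless of the order in which the flushed MemTables are merged.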
### RollUp
If a rollup contains a REPLACE-type column, it must also contain the version
column: either the user adds it explicitly, or Doris extends the rollup with
it automatically.
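The two options (require the user to add the column, or extend automatically) could look roughly like this. The function `resolve_rollup_columns` and its `auto_extend` flag are hypothetical and only meant to contrast the two behaviours.

```python
def resolve_rollup_columns(requested, schema, version_column, auto_extend=True):
    """Return the final rollup column list.

    schema maps column name -> aggregation type (None for key columns).
    """
    columns = list(requested)
    has_replace = any(schema[c] == "REPLACE" for c in columns)
    if has_replace and version_column not in columns:
        if not auto_extend:
            # Option 1: reject and ask the user to add the column.
            raise ValueError(
                f"rollup with REPLACE columns must include {version_column!r}"
            )
        # Option 2: add the version column on the user's behalf.
        columns.append(version_column)
    return columns
```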