Youngwb commented on issue #3930:
URL:
https://github.com/apache/incubator-doris/issues/3930#issuecomment-654853902
Following the suggestions from @morningman and @yangzhg, I made some corrections.
## Name
Use the name `sequence` column instead of `version` column, so that it is easier for users to understand.
## Create table
Use UNIQUE_KEYS instead of AGG_KEYS. Because the sequence column is a hidden
column, there is no need to create a `version` column with MAX AGG_TYPE.
```
CREATE TABLE `test_1` (
`pin_id` bigint(20) NOT NULL COMMENT "",
`date` date NOT NULL COMMENT "",
`group_id` bigint(20) NOT NULL COMMENT "",
`keyword` varchar(128) NOT NULL
) ENGINE=OLAP
UNIQUE KEY(`pin_id`, `date`, `group_id`)
PROPERTIES (
"sequence_type" = "int"
);
```
As in this example, the user needs to add the `sequence_type` property to
specify the type of the sequence column. Only integer types (int, bigint,
largeint) and time types (date, datetime) are supported. The user cannot query
the hidden `sequence_column` in the table, but can add a regular column whose
value is equal to `sequence_column`, like this:
```
CREATE TABLE `test_2` (
`pin_id` bigint(20) NOT NULL COMMENT "",
`date` date NOT NULL COMMENT "",
`group_id` bigint(20) NOT NULL COMMENT "",
`sequence_visiable` int NOT NULL,
`keyword` varchar(128) NOT NULL
) ENGINE=OLAP
UNIQUE KEY(`pin_id`, `date`, `group_id`)
PROPERTIES (
"sequence_type" = "int"
);
```
The column name does not have to be "sequence_visiable"; this is just an
example. The user ensures that the two values are the same by specifying
parameters at LOAD time.
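The replacement behaviour this design aims for can be modelled in a few lines of Python. This is only a sketch, not Doris code: `SequenceTable`, `upsert` and `select` are hypothetical names, and it assumes that among rows sharing a unique key the row with the larger sequence value wins regardless of load order. How equal sequence values are resolved is an assumption here (the later load wins).

```python
# Types accepted by "sequence_type" according to the text above.
ALLOWED_SEQUENCE_TYPES = {"int", "bigint", "largeint", "date", "datetime"}


class SequenceTable:
    """Toy in-memory model of a UNIQUE KEY table with a hidden sequence column."""

    def __init__(self, key_columns, sequence_type):
        if sequence_type.lower() not in ALLOWED_SEQUENCE_TYPES:
            raise ValueError(f"unsupported sequence_type: {sequence_type}")
        self.key_columns = tuple(key_columns)
        self.rows = {}  # key tuple -> (hidden sequence value, visible row)

    def upsert(self, row, sequence_value):
        key = tuple(row[c] for c in self.key_columns)
        current = self.rows.get(key)
        # Assumption: a row replaces the stored one only if its sequence
        # value is not smaller; ties go to the later load.
        if current is None or sequence_value >= current[0]:
            self.rows[key] = (sequence_value, dict(row))

    def select(self):
        # The hidden sequence value is not part of the query result.
        return [row for _, row in self.rows.values()]


table = SequenceTable(["pin_id", "date", "group_id"], "int")
key = {"pin_id": 1, "date": "2020-07-07", "group_id": 10}
table.upsert({**key, "keyword": "new"}, sequence_value=2)
table.upsert({**key, "keyword": "old"}, sequence_value=1)  # arrives late, ignored
print(table.select())  # the stored keyword is still "new"
```

The point of the model is that the result depends only on the sequence values, not on the order in which the two rows were loaded.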
## LOAD
### Stream Load
`curl --location-trusted -u root -H "columns:
pin_id,date,group_id,source_sequence,keyword" -H "sequence_col:
source_sequence" -T test_load
http://127.0.0.1:8030/api/test/test_1/_stream_load`
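For readers who assemble this request programmatically, the URL and the two headers can be built as below. This is a sketch under assumptions: `build_stream_load_request` is a hypothetical helper, it only constructs the URL and headers and sends nothing, and the check that `sequence_col` names one of the listed columns reflects the rule stated in this comment rather than confirmed server behaviour.

```python
def build_stream_load_request(host, port, db, table, columns, sequence_col):
    """Return (url, headers) for a Stream Load call that sets sequence_col."""
    # Assumption: the sequence source must be one of the mapped columns.
    if sequence_col not in columns:
        raise ValueError(f"sequence_col {sequence_col!r} is not in columns")
    url = f"http://{host}:{port}/api/{db}/{table}/_stream_load"
    headers = {
        "columns": ",".join(columns),
        "sequence_col": sequence_col,
    }
    return url, headers


url, headers = build_stream_load_request(
    "127.0.0.1", 8030, "test", "test_1",
    ["pin_id", "date", "group_id", "source_sequence", "keyword"],
    "source_sequence",
)
print(url)
print(headers)
```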
### Broker Load
```
LOAD LABEL test.test11
(
DATA INFILE("hdfs://path/to/load_file")
INTO TABLE `test_1`
FORMAT AS "parquet"
(pin_id,date,group_id,source_sequence,keyword)
) with BROKER broker_name (...)
PROPERTIES
(
"sequence_col" = "source_sequence"
);
```
### Routine Load
```
CREATE ROUTINE LOAD test_1_job ON test_1
COLUMNS TERMINATED BY ",",
COLUMNS(pin_id,date,group_id,source_sequence,keyword)
PROPERTIES
(
"desired_concurrent_number"="3",
"max_batch_interval" = "30",
"max_batch_rows" = "1000000",
"max_batch_size" = "509715200",
"sequence_col" = "source_sequence"
) FROM KAFKA
(
"kafka_broker_list" = "...",
"kafka_topic" = "...",
"property.client.id" = "...",
"property.group.id" = "..."
);
```
I added a `sequence_col` parameter to identify the source data for the
sequence column at load time. Because the sequence column is hidden, the user
needs to identify its source column in `columns_mapping`.
For table `test_2`, which has the column `sequence_visiable`, the user can set
"sequence_col" = "sequence_visiable" in the properties. The hidden sequence
column then holds the same value as "sequence_visiable" in the table, so the
user can query the column "sequence_visiable" instead of the hidden column.
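The difference between loading into `test_1` and `test_2` can be sketched as follows. `map_loaded_row` is a hypothetical function, not part of Doris; it assumes that each source line has already been split into values matching the `columns` list, and it returns the visible row together with the value that would go into the hidden sequence column.

```python
def map_loaded_row(columns, sequence_col, values, table_columns):
    """Split one source row into (visible row, hidden sequence value)."""
    if sequence_col not in columns:
        raise ValueError(f"sequence_col {sequence_col!r} is not in columns")
    source = dict(zip(columns, values))
    # Only columns that exist in the table schema are stored visibly.
    visible = {c: source[c] for c in table_columns}
    return visible, source[sequence_col]


# test_1: the source column only feeds the hidden sequence column.
row1, seq1 = map_loaded_row(
    ["pin_id", "date", "group_id", "source_sequence", "keyword"],
    "source_sequence",
    [1, "2020-07-07", 10, 5, "doris"],
    ["pin_id", "date", "group_id", "keyword"],
)

# test_2: the same value is stored in a visible column and the hidden one.
row2, seq2 = map_loaded_row(
    ["pin_id", "date", "group_id", "sequence_visiable", "keyword"],
    "sequence_visiable",
    [1, "2020-07-07", 10, 5, "doris"],
    ["pin_id", "date", "group_id", "sequence_visiable", "keyword"],
)
print(row1, seq1)
print(row2, seq2)
```

In the `test_2` case the visible `sequence_visiable` value and the hidden sequence value are equal by construction, which is why querying the visible column is enough.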