Youngwb edited a comment on issue #3930: URL: https://github.com/apache/incubator-doris/issues/3930#issuecomment-654853902
Following the suggestions from @morningman and @yangzhg, I made some corrections.

## Name

Use a `sequence` column instead of a `version` column, so that it is easier for users to understand.

## Create table

Use UNIQUE_KEYS instead of AGG_KEYS. Because the sequence column is a hidden column, there is no need to create a `version` column with MAX AGG_TYPE.

```
CREATE TABLE `test_1` (
  `pin_id` bigint(20) NOT NULL COMMENT "",
  `date` date NOT NULL COMMENT "",
  `group_id` bigint(20) NOT NULL COMMENT "",
  `keyword` varchar(128) NOT NULL
) ENGINE=OLAP
UNIQUE KEY(`pin_id`, `date`, `group_id`)
PROPERTIES (
  "function_column.sequence_type" = "int"
);
```

As in this example, the user needs to add `sequence_type` to identify the type of the sequence column. Only integer types (int, bigint, largeint) and time types (date, datetime) are supported.

Users can't query the `sequence_column` hidden in the table, but they can add a visible column whose value is equal to `sequence_column`, like this:

```
CREATE TABLE `test_2` (
  `pin_id` bigint(20) NOT NULL COMMENT "",
  `date` date NOT NULL COMMENT "",
  `group_id` bigint(20) NOT NULL COMMENT "",
  `sequence_visiable` int NOT NULL,
  `keyword` varchar(128) NOT NULL
) ENGINE=OLAP
UNIQUE KEY(`pin_id`, `date`, `group_id`)
PROPERTIES (
  "function_column.sequence_type" = "int"
);
```

The column does not have to be named `sequence_visiable`; this is just an example. The user ensures that the two values are the same by specifying parameters at LOAD time.

## LOAD

### Stream Load

`curl --location-trusted -u root -H "columns: pin_id,date,group_id,source_sequence,keyword" -H "function_column.sequence_col: source_sequence" -T test_load http://127.0.0.1:8030/api/test/test_1/_stream_load`

### Broker Load

```
LOAD LABEL test.test11
(
  DATA INFILE("hdfs://path/to/load_file")
  INTO TABLE `test_1`
  FORMAT AS "parquet"
  (pin_id,date,group_id,source_sequence,keyword)
  ORDER BY source_sequence
)
WITH BROKER broker_name (...)
```

### Routine Load

```
CREATE ROUTINE LOAD test_1_job ON test_1
COLUMNS TERMINATED BY ",",
(pin_id,date,group_id,source_sequence,keyword)
ORDER BY source_sequence
PROPERTIES (
  "desired_concurrent_number" = "3",
  "max_batch_interval" = "30",
  "max_batch_rows" = "1000000",
  "max_batch_size" = "509715200"
)
FROM KAFKA (
  "kafka_broker_list" = "...",
  "kafka_topic" = "...",
  "property.client.id" = "...",
  "property.group.id" = "..."
);
```

I added a `sequence_col` parameter that identifies the source data for the sequence column at load time. Because the sequence column is hidden, the user needs to identify its source column in `columns_mapping`.

For table `test_2`, which has the column `sequence_visiable`, the user can set `"sequence_col" = "sequence_visiable"` in the properties. The hidden sequence column then holds the same value as `sequence_visiable` in the table, so the user can query the column `sequence_visiable` instead of the hidden one.
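The comment does not spell out how the sequence column decides which row survives, so the following is only a sketch of the intended behaviour, written in Python because it cannot be shown without a running cluster. It assumes that, for rows sharing the same unique key (`pin_id`, `date`, `group_id`), the row with the largest sequence value is kept regardless of load order, which is what the earlier `version` column with MAX AGG_TYPE suggests. The function name `load_batch`, the in-memory `table` dictionary, and the tie rule (an equal sequence value replaces the stored row) are all assumptions made for illustration, not part of the proposal.

```python
from typing import Dict, Iterable, Tuple

# Unique key columns of test_1: (pin_id, date, group_id)
Key = Tuple[int, str, int]
# Source row layout from the load examples:
# (pin_id, date, group_id, source_sequence, keyword)
SourceRow = Tuple[int, str, int, int, str]


def load_batch(table: Dict[Key, Tuple[int, str]],
               batch: Iterable[SourceRow]) -> Dict[Key, Tuple[int, str]]:
    """Hypothetical model of a load into a UNIQUE KEY table with a sequence column.

    The table maps each unique key to (hidden sequence value, keyword).
    A new row replaces the stored one only if its sequence value is not smaller.
    """
    for pin_id, date, group_id, source_sequence, keyword in batch:
        key = (pin_id, date, group_id)
        stored = table.get(key)
        # Assumed tie rule: an equal sequence value also replaces the stored row.
        if stored is None or source_sequence >= stored[0]:
            table[key] = (source_sequence, keyword)
    return table


table: Dict[Key, Tuple[int, str]] = {}
# The newer row (sequence 2) happens to be loaded first.
load_batch(table, [(1, "2020-07-01", 10, 2, "new")])
# The older row (sequence 1) arrives later and must not overwrite it.
load_batch(table, [(1, "2020-07-01", 10, 1, "old")])
# A row with a different key is stored independently.
load_batch(table, [(2, "2020-07-01", 10, 5, "other")])

print(table[(1, "2020-07-01", 10)])
```

In this model the late-arriving row with sequence 1 is ignored, so the stored keyword stays `"new"`. Without a sequence column, a plain UNIQUE KEY table would keep whichever row was loaded last.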