[GitHub] [incubator-doris] Youngwb commented on issue #3930: [Proposal] Doris support version column for REPLACE aggregate type

GitBox Tue, 07 Jul 2020 06:21:39 -0700


Youngwb commented on issue #3930:
URL: 
https://github.com/apache/incubator-doris/issues/3930#issuecomment-654853902



   According to @morningman @yangzhg 's suggest, I made some corrections
   ## name
   use `sequence` column instead of `version` column  for user understand easy.
   
   ## Create table
   Use UNIQUE_KEYS instead of AGG_KEYS. Because sequence column is a hidden 
column,  there is no need to create a `version` column with MAX AGG_TYPE.
   
   ```
   CREATE TABLE `test_1` (
   `pin_id` bigint(20) NOT NULL COMMENT "",
   `date` date NOT NULL COMMENT "",
   `group_id` bigint(20) NOT NULL COMMENT "",
   `keyword` varchar(128)  NOT NULL
   ) ENGINE=OLAP
   UNIQUE KEY(`pin_id`, `date`, `group_id`)
   PROPERTIES (
     "sequence_type" = "int"
   );
   ```
   like such example, user need to add `sequence_type` to Identify the sequence 
column type. It only support the Integer types (int, bigint, largeint) and  
time types(date, datetime). User can't query the `sequence_column` hidden in 
table ,  but can add one column which value is equal to `sequence_column`.  
like this
   ```
   CREATE TABLE `test_2` (
   `pin_id` bigint(20) NOT NULL COMMENT "",
   `date` date NOT NULL COMMENT "",
   `group_id` bigint(20) NOT NULL COMMENT "",
   `sequence_visiable` int NOT NULL,
   `keyword` varchar(128)  NOT NULL
   ) ENGINE=OLAP
   UNIQUE KEY(`pin_id`, `date`, `group_id`)
   PROPERTIES (
   "sequence_type" = "int"
   );
   ```
   Column names are not necessarily "sequence_visiable", this is just an 
example. The user ensures that the values are same by specifying parameters at 
LOAD time
   
   ## LOAD
   ### Stream Load
   `curl --location-trusted -u root -H "columns: 
pin_id,date,group_id,source_sequence,keyword"  -H "sequence_col: 
source_sequence" -T test_load  
http://127.0.01:8030/api/test/test_1/_stream_load`
   
   ### Broker Load
   ```
   LOAD LABEL test.test11
   (
       DATA INFILE("hdfs://path/to/load_file")
       INTO TABLE `test_1`
       FORMAT AS "parquet"
       (pin_id,date,group_id,source_sequence,keyword)
   ) with BROKER broker_name (...)
   PROPERTIES
   (
        "sequence_col" = "source_sequence"
   );
   ```
   
   ### Routine Load
   ```
   CREATE ROUTINE LOAD test_1_job ON test_1
   COLUMNS TERMINATED BY ",",
   (pin_id,date,group_id,source_sequence,keyword)
   PROPERTIES
   (
       "desired_concurrent_number"="3",
       "max_batch_interval" = "30",
       "max_batch_rows" = "1000000",
       "max_batch_size" = "509715200",
       "sequence_col" = "source_sequence"
   ) FROM KAFKA
   (
       "kafka_broker_list" = "...",
       "kafka_topic" = "...",
       "property.client.id" = "...",
       "property.group.id" = "..."
   );
   ```
   
   I added a parameter `sequence_col` to identify the source data for the 
sequence column at load,  because it's hidden column, user need to identify the 
source column in `columns_mapping`. 
   
   For table `test_2` which has column `sequence_visiable`, user can set 
"sequence_col" = "sequence_visiable" at properties,  which means the hidden 
column "sequence_col" is same with the "sequence_visiable" in table, user can 
query the the column "sequence_visiable" instead of "sequence_col"


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [incubator-doris] Youngwb commented on issue #3930: [Proposal] Doris support version column for REPLACE aggregate type

Reply via email to