imay commented on a change in pull request #3230:
URL: https://github.com/apache/incubator-doris/pull/3230#discussion_r419906166



##########
File path: be/src/exec/file_reader.h
##########
@@ -34,6 +34,16 @@ class FileReader {
     // is set to zero.
     virtual Status read(uint8_t* buf, size_t* buf_len, bool* eof) = 0;
     virtual Status readat(int64_t position, int64_t nbytes, int64_t* bytes_read, void* out) = 0;
+
+    /**
+     * If EOF is reached, returns Status::OK with length set to 0 and buf set
+     * to NULL; otherwise returns the bytes read.
+     *
+     * !! Important !!
+     * The caller must delete buf; otherwise the memory leaks.
+     * !! Important !!
+     */
+    virtual Status read_all(uint8_t** buf, size_t *length) = 0;

Review comment:
       Is read_one_message a better name?
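       To make the ownership rule in the header comment concrete, here is a minimal sketch of a caller that wraps the returned buffer in `std::unique_ptr<uint8_t[]>` so it is freed on every path. `MemoryMessageReader` and the one-method `Status` are hypothetical stand-ins, not Doris classes, and the sketch assumes `read_all` allocates with `new[]`; the header comment only says the buffer must be deleted by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Doris's Status (assumption: only success is modeled).
class Status {
public:
    static Status OK() { return Status(); }
    bool ok() const { return true; }
};

// Hypothetical reader that hands out each queued message once via read_all().
class MemoryMessageReader {
public:
    explicit MemoryMessageReader(std::vector<std::string> messages)
            : _messages(std::move(messages)) {}

    // Follows the header comment: on EOF, return OK with *length == 0 and
    // *buf == nullptr; otherwise return a heap buffer the caller must free.
    // Assumption: the buffer is allocated with new[], so it is freed with delete[].
    Status read_all(uint8_t** buf, size_t* length) {
        assert(buf != nullptr && length != nullptr);
        if (_next >= _messages.size()) {
            *buf = nullptr;
            *length = 0;
            return Status::OK();
        }
        const std::string& msg = _messages[_next++];
        *length = msg.size();
        *buf = new uint8_t[msg.size()];
        std::memcpy(*buf, msg.data(), msg.size());
        return Status::OK();
    }

private:
    std::vector<std::string> _messages;
    size_t _next = 0;
};

// Caller side: take ownership immediately so the buffer is released with
// delete[] on every path, including the EOF and error exits.
std::vector<std::string> drain(MemoryMessageReader& reader) {
    std::vector<std::string> messages;
    while (true) {
        uint8_t* raw = nullptr;
        size_t len = 0;
        Status st = reader.read_all(&raw, &len);
        std::unique_ptr<uint8_t[]> owned(raw);  // owns raw (nullptr at EOF is fine)
        if (!st.ok() || len == 0) {
            break;  // EOF, or an error in a real reader
        }
        messages.emplace_back(reinterpret_cast<const char*>(owned.get()), len);
    }
    return messages;
}
```

       If the interface can change, returning a `std::unique_ptr<uint8_t[]>` from the method itself might remove the leak risk the comment warns about, whichever name is chosen.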

##########
File path: docs/zh-CN/sql-reference/sql-statements/Data Manipulation/STREAM LOAD.md
##########
@@ -67,11 +67,11 @@ under the License.
         For example, to load into partitions p1 and p2: -H "partitions: p1, p2"
 
         timeout: Specifies the load timeout, in seconds. The default is 600 seconds. The allowed range is 1 to 259200 seconds.
-        
+
         strict_mode: Specifies whether strict mode is enabled for this load; it is enabled by default. To disable it, use -H "strict_mode: false".
 
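         timezone: Specifies the time zone used for this load. The default is UTC+8. This parameter affects the results of all time-zone-related functions involved in the load.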
-        
+
         exec_mem_limit: Memory limit for the load. The default is 2GB. The unit is bytes.

Review comment:
       The explanations of format, jsonpath, and jsonpath_file are missing.

##########
File path: docs/zh-CN/sql-reference/sql-statements/Data Manipulation/ROUTINE LOAD.md
##########
@@ -310,6 +318,84 @@ under the License.
             "property.client.id" = "my_client_id"
         );
 
+    4. Create a Kafka routine load job named test1 for example_tbl in example_db that loads simple JSON data.
+        1) Sample data (doris_data is a fixed keyword):
+        {
+            "doris_data":[
+                {"category":"a9jadhx","author":"test","price":895},
+                {"category":"axdfa1","author":"EvelynWaugh","price":1299}
+            ]
+        }
+        2) Create the job; jsonpath and jsonpath_file can be omitted:
+        CREATE ROUTINE LOAD example_db.test1 ON example_tbl
+        COLUMNS(category, author, price)
+        PROPERTIES
+        (
+            "desired_concurrent_number"="3",
+            "max_batch_interval" = "20",
+            "max_batch_rows" = "300000",
+            "max_batch_size" = "209715200",
+            "strict_mode" = "false",
+            "format" = "json"
+        )
+        FROM KAFKA
+        (
+            "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
+            "kafka_topic" = "my_topic",
+            "kafka_partitions" = "0,1,2",
+            "kafka_offsets" = "0,0,0"
+        );
+
+    5. Create a Kafka routine load job named test1 for example_tbl in example_db using the jsonpath property; the data format is JSON.
+
+        CREATE ROUTINE LOAD example_db.test1 ON example_tbl
+        COLUMNS(category, author, price)
+        PROPERTIES
+        (
+            "desired_concurrent_number"="3",
+            "max_batch_interval" = "20",
+            "max_batch_rows" = "300000",
+            "max_batch_size" = "209715200",
+            "strict_mode" = "false",
+            "format" = "json",
+            "jsonpath" = "{\"jsonpath\":[{\"column\":\"category\",\"value\":\"$.store.book.category\"},{\"column\":\"author\",\"value\":\"$.store.book.author\"},,{\"column\":\"price\",\"value\":\"$.store.book.price\"}]}"

Review comment:
       Why not use `jsonpaths=["$.store.book.category", "$.store.book.author", "$.store.book.price"]`?
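       A minimal sketch of the semantics the suggested `jsonpaths` property could have, assuming `jsonpaths[i]` supplies the value for the i-th column in `COLUMNS(...)` and that paths are simple `$.key.key` chains (no array indices or wildcards). `JsonNode`, `split_path`, `resolve`, and `extract_row` are hypothetical names, and the JSON array is taken as already parsed into a `std::vector<std::string>`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal JSON tree: a node holds a scalar or named child objects.
struct JsonNode {
    std::string scalar;
    std::map<std::string, std::shared_ptr<JsonNode>> fields;
};

// Split a simple path such as "$.store.book.category" into its keys.
// Only "$" followed by ".key" segments is supported in this sketch.
std::vector<std::string> split_path(const std::string& path) {
    if (path.rfind("$.", 0) != 0) {
        throw std::invalid_argument("path must start with \"$.\": " + path);
    }
    std::vector<std::string> keys;
    std::stringstream ss(path.substr(2));
    std::string key;
    while (std::getline(ss, key, '.')) {
        keys.push_back(key);
    }
    return keys;
}

// Walk the tree along the path; std::nullopt if any key is missing.
std::optional<std::string> resolve(const JsonNode& root, const std::string& path) {
    const JsonNode* node = &root;
    for (const std::string& key : split_path(path)) {
        auto it = node->fields.find(key);
        if (it == node->fields.end()) {
            return std::nullopt;
        }
        node = it->second.get();
        assert(node != nullptr && "tree must not contain null children");
    }
    return node->scalar;
}

// Assumed semantics: jsonpaths[i] provides the value of columns[i];
// a path that does not match yields an empty value (NULL).
std::map<std::string, std::optional<std::string>> extract_row(
        const JsonNode& root,
        const std::vector<std::string>& columns,
        const std::vector<std::string>& jsonpaths) {
    if (columns.size() != jsonpaths.size()) {
        throw std::invalid_argument("columns and jsonpaths must have the same length");
    }
    std::map<std::string, std::optional<std::string>> row;
    for (std::size_t i = 0; i < columns.size(); ++i) {
        row[columns[i]] = resolve(root, jsonpaths[i]);
    }
    return row;
}
```

       In this sketch, a document whose `store.book` object has `category` and `author` but no `price` yields a row with an empty `price`; how a missing path is handled is one of the points the documentation would need to state.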
   

##########
File path: docs/zh-CN/sql-reference/sql-statements/Data Manipulation/ROUTINE LOAD.md
##########
@@ -137,15 +137,23 @@ under the License.
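             The maximum number of error rows allowed within the sampling window. It must be greater than or equal to 0. The default is 0, meaning no error rows are allowed.
             The sampling window is max_batch_rows * 10. That is, if the number of error rows within the sampling window exceeds max_error_number, the routine job is paused and manual intervention is required to check for data quality problems.
             Rows filtered out by the WHERE condition are not counted as error rows.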
-        
+
         4. strict_mode
 
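             Whether to enable strict mode; it is enabled by default. When enabled, a row is filtered out if converting a non-null source value to the column type yields NULL. Specify it as "strict_mode" = "true".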
 
         5. timezone
-            
+
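             Specifies the time zone used by the load job. By default, the session's timezone variable is used. This parameter affects the results of all time-zone-related functions involved in the load.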
 
+        6. format
+
+            Specifies the format of the data to load. The default is csv; json is also supported.
+
+        7. jsonpath, jsonpath_file

Review comment:
       1. jsonpaths and jsonpaths_file seem like better names.
   2. Please explain the format of jsonpaths.
   3. Please explain how this works.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.