worker24h opened a new issue #3509: URL: https://github.com/apache/incubator-doris/issues/3509
此issue主要讨论,导入json数据接口定义, 欢迎大家进行讨论 前提: 表1有 category,price,author三列, 以routine load导入举例说明 分类: 由于json支持嵌套,所以json数据可能很复杂,因为这里设计默认导入和精准导入 一、默认导入(不提供jsonpath) 默认导入,表示表列的顺序与json中key/value一一对应, **即表列名子等于json key的名字** 1.1 创建routine load语句, 该语句同事支持 场景1.1 和 场景1.2 CREATE ROUTINE LOAD example_db.test_json_label_1 ON table1 COLUMNS(category,price,author) PROPERTIES ( "desired_concurrent_number"="3", "max_batch_interval" = "20", "max_batch_rows" = "300000", "max_batch_size" = "209715200", "strict_mode" = "false", "format" = "json" ) FROM KAFKA ( "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092", "kafka_topic" = "my_topic", "kafka_partitions" = "0,1,2", "kafka_offsets" = "0,0,0" ); format: 目前支持csv, json两种,默认是csv 1.2 单object数据 kafka中一条消息内容 => {"category":"a1", "price":10, "author":"c1"} 导入后表数据: category price author ------------------------------- a1 10 c1 1.3 多object数据 为了提升吞吐,支持一次性导入多条数据,**其中RECORDS是关键字且为数组** kafka中一条消息内容 => { "RECORDS":[ {"category":"a1", "price":10, "author":"c1"}, {"author":"c2", "category":"a2"} {"category":"a1", "price":30, "author":"c3"} ] } 导入后表数据: category price author ------------------------------- a1 10 c1 a2 null c2 a3 30 c3 【特别说明】 如果没有匹配到数据,默认是null 二、精准导入(提供jsonpaths) 2.1 json数据格式 { "store": { "book": [ {"category": "a1", "author": "c1", "price": 10}, {"category": "a2", "author": "c2"}, {"category": "a3", "author": "c3", "price": 30} ] } 2.2 创建routine load语句 CREATE ROUTINE LOAD example_db.test_json_label_1 ON table1 COLUMNS(category, price, author) PROPERTIES ( "desired_concurrent_number"="3", "max_batch_interval" = "20", "max_batch_rows" = "300000", "max_batch_size" = "209715200", "strict_mode" = "false", "format" = "json", "jsonpaths" = "[\"$.store.book.category\",\"$.store.book.price\",\"$.store.book.author\"]" ) FROM KAFKA ( "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092", "kafka_topic" = "my_topic", "kafka_partitions" = "0,1,2", "kafka_offsets" = "0,0,0" ); 导入后表数据: category price author ------------------------------- a1 10 c1 a2 null c2 a3 30 c3 【特别说明】 1)如果jsonpath未匹配到数据,则默认是null 2)**doris内部使用rapidJson第三方库进行解析json,该库只支持双引号(""),表示字符串,因此在定义jsonpath时我们需要对其进行转义** 2.3 jsonpaths_file 当我们列比较多时可以将jsonpaths放到一个固定文件中且将文件上传到doris中 CREATE ROUTINE LOAD example_db.test_json_label_1 ON table1 COLUMNS(category, price, author) PROPERTIES ( "desired_concurrent_number"="3", "max_batch_interval" = "20", "max_batch_rows" = "300000", "max_batch_size" = "209715200", "strict_mode" = "false", "format" = "json", "jsonpaths_file" = "DEFAULT:filename.json" ) FROM KAFKA ( "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092", "kafka_topic" = "my_topic", "kafka_partitions" = "0,1,2", "kafka_offsets" = "0,0,0" ); 导入后表数据,与2.2一致。 【特别说明】 当jsonpaths 和 jsonpaths_file同时出现时, 将使用jsonpaths作为查询依据 三、备注 由于jsonpath语法比较复杂,目前doris只实现简单语法格式 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org