Shield0814 opened a new issue, #23810:
URL: https://github.com/apache/doris/issues/23810

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   2.0.0
   
   ### What's Wrong?
   
   When importing a Hive partition (Parquet format) through the hdfs broker, the load fails with the following error:
   [CORRUPTION]Invalid magic number in parquet file, bytes read: 131072, file size: 3795236,
   
   The same data loads successfully through the hdfs broker on version 0.15.0.
   
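   After repeated attempts, it appears that the load mostly succeeds when a table partition holds a small amount of data, and fails frequently once the data volume is somewhat larger.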
   
   The failing table has about 100,000 rows, and its Parquet files are only about 500 KB in size.
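
The error above complains about the Parquet magic number. An unencrypted Parquet file begins and ends with the 4-byte marker `PAR1`, so one way to rule out a damaged source file is to check those bytes outside Doris. The following is a minimal sketch, not part of the original report: it assumes the file has first been copied from HDFS to local disk (for example with `hdfs dfs -get`), and the helper name `check_parquet_magic` is hypothetical.

```python
import os

# 4-byte marker found at both ends of an unencrypted Parquet file.
PARQUET_MAGIC = b"PAR1"


def check_parquet_magic(path):
    """Return (size, head_ok, tail_ok) for the local file at `path`.

    Hypothetical helper for illustration; it inspects only the first and
    last 4 bytes and does not validate the footer metadata.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        if size >= 4:
            # The trailing magic occupies the last 4 bytes of the file.
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
        else:
            tail = b""
    return size, head == PARQUET_MAGIC, tail == PARQUET_MAGIC
```

If both checks pass for every file under the partition directory, the files themselves are probably well-formed, and the reported size can be compared with the `file size` value in the error message.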
   
   
   
   ### What You Expected?
   
   How can this problem be resolved so that the data loads reliably?
   
   ### How to Reproduce?
   
   ### 1. Doris table creation statement
   CREATE TABLE `dws_xj_dwsdw_cp_variable_daily_bin` (
     `dt` int(11) NULL COMMENT 'daily partition',
     `variable_name` varchar(500) NULL COMMENT 'variable name',
     `result_day` varchar(500) NULL COMMENT 'date the result was computed',
     `binning_day` varchar(500) NULL COMMENT 'binning result date cp_variable_history_bin-date',
     `history_day` varchar(500) NULL COMMENT 'historical date to compute',
     `variable_type` int(11) NULL COMMENT 'variable type (INTEGER-1, LONG-2, DOUBLE-5, BOOLEAN-7, STRING-9)',
     `bin_name` varchar(500) NULL COMMENT 'value bin name',
     `value_count` int(11) NULL COMMENT 'value count',
     `value_percent` double NULL COMMENT 'value distribution percentage',
     `his_value_count` varchar(500) NULL COMMENT 'historical value count',
     `his_value_percent` double NULL COMMENT 'historical value distribution percentage',
     `his_left_value` varchar(500) NULL COMMENT 'bin start value',
     `his_right_value` varchar(500) NULL COMMENT 'bin end value'
   ) ENGINE=OLAP
   DUPLICATE KEY(`dt`, `variable_name`)
   COMMENT 'dws_xj_dwsdw_cp_variable_daily_bin'
   PARTITION BY RANGE(`dt`)
   (
   PARTITION p20230901 VALUES [("20230901"), ("20230902")),
   PARTITION p20230902 VALUES [("20230902"), ("20230903")),
   PARTITION p20230904 VALUES [("20230904"), ("20230905")))
   DISTRIBUTED BY HASH(`dt`, `variable_name`) BUCKETS 4
   PROPERTIES (
   "replication_allocation" = "tag.location.default: 3",
   "is_being_synced" = "false",
   "storage_format" = "V2",
   "disable_auto_compaction" = "false",
   "enable_single_replica_compaction" = "false"
   ); 
   ### 2. Broker load command
   LOAD LABEL xj_dws.dws_xj_dwsdw_cp_variable_daily_bin_20230901_1693558145000
   (
   DATA INFILE('hdfs://xjprc-hadoop/user/sql_prc/warehouse/xj_dws.db/dws_xj_dwsdw_cp_variable_daily_bin/dt=20230901/*')
   INTO TABLE `dws_xj_dwsdw_cp_variable_daily_bin`
   TEMPORARY PARTITIONS(tp20230901)
   FORMAT AS "parquet"
   (`variable_name`,`result_day`,`binning_day`,`history_day`,`variable_type`,`bin_name`,`value_count`,`value_percent`,`his_value_count`,`his_value_percent`,`his_left_value`,`his_right_value`)
   set (`dt`="20230901",`variable_name`=`variable_name`,`result_day`=`result_day`,`binning_day`=`binning_day`,`history_day`=`history_day`,`variable_type`=`variable_type`,`bin_name`=`bin_name`,`value_count`=`value_count`,`value_percent`=`value_percent`,`his_value_count`=`his_value_count`,`his_value_percent`=`his_value_percent`,`his_left_value`=`his_left_value`,`his_right_value`=`his_right_value`)
   )
   WITH BROKER hdfs
   (
   "hadoop.security.authentication"="kerberos",
   "kerberos_principal"="s_xiaojin@XIAOMI.HADOOP",
   "kerberos_keytab"="/home/work/soft/infra-client/s_xiaojin.keytab"
   )
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
